2018grep_motifs

Differences

This shows you the differences between two versions of the page.

2018grep_motifs [2018/09/10 11:18]
david [Variable length patterns]
2018grep_motifs [2018/09/10 11:37] (current)
david
Line 55: Line 55:
 </​code>​ </​code>​
  
-This added a match from R13A5.5: aatttttg
 +This added a match from R13A5.5: ''aatttttg''
 + 
 +Of course, we may not want that many. That match has 5 t's, so let's set an upper bound of 4. This requires a different operator: a length range can be specified after any symbol or character set using curly braces.
 +<code>
 +{n,m} - example {2,4}
 +{n,}  - example {2,}
 +{,m}  - example {,4}
 +</code>
 + 
 +  - The first example matches a repetition of 2-4 of the preceding symbol or character set. 
 +  - The second example matches a repetition of AT LEAST 2 of the preceding symbol or character set. 
 +  - The third example matches a repetition of AT MOST 4 of the preceding symbol or character set. 
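The three range forms above can be checked on a small made-up input. This is a minimal sketch: the test strings are hypothetical (they are not taken from the ''output/'' files), and ''{0,4}'' is used in place of ''{,4}'' on the assumption that it is the more portable spelling of the same bound.

```shell
# Hypothetical test strings: "aa", then 1 to 6 t's, then "g".
seqs=$(printf '%s\n' aatg aattg aatttg aattttg aatttttg aattttttg)

# {2,4}: between 2 and 4 t's
range=$(printf '%s\n' "$seqs" | grep -E -c '^aat{2,4}g$')
# {2,}: at least 2 t's
at_least=$(printf '%s\n' "$seqs" | grep -E -c '^aat{2,}g$')
# {0,4}: at most 4 t's (same bound as {,4})
at_most=$(printf '%s\n' "$seqs" | grep -E -c '^aat{0,4}g$')

echo "range=$range at_least=$at_least at_most=$at_most"
```

The ''^'' and ''$'' anchors force the whole line to match, so each count reflects only the number of t's in that string.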
 + 
 +Searching for such a pattern requires two changes to the command:
 +
 +  * egrep - uses "extended" regular expressions (the same as ''grep -E''), in which curly braces work as repetition operators without backslashes.
 +  * quote the pattern - Curly braces have a special meaning in BASH, so we use single quotes (''''') to prevent the shell from interpreting them.
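To see why the quotes matter, it can help to look at what bash does to the pattern before any command receives it. The sketch below assumes ''bash'' is installed and uses ''echo'' as a stand-in for ''egrep'', so that only the shell's behaviour is visible.

```shell
# bash -c runs each command in bash explicitly, so brace expansion behaves
# the same no matter which shell this snippet is started from.

# Unquoted: bash performs brace expansion before echo (or egrep) sees the pattern.
unquoted=$(bash -c 'echo AAT{,4}G')

# Single-quoted: the braces reach the command unchanged.
quoted=$(bash -c "echo 'AAT{,4}G'")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

Without quotes the pattern is split into two words, so egrep would treat the first as the pattern and the second as a file name.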
 + 
 +<code bash> 
 +$ egrep -bi 'AAT{,4}G' output/*
 +output/F56H9.5.1.fa:24:catccatttatactattgcaccgaatattgggttaatgtcggtgtttgaa
 +output/F56H9.5.1.fa:75:tatattttggttacagtttaaatgcttcaaatttaaatcaattaaatc
 +output/F56H9.5.2.fa:24:ttaaatgcttcaaatttaaatcaattaaatc
 +</​code>​ 
 + 
 +We lost our R13A5.5.1.fa match. We can restore it by increasing the upper bound to 5 or higher. 
 +<code bash> 
 +$ egrep -bi 'AAT{,5}G' output/*
 +output/F56H9.5.1.fa:24:catccatttatactattgcaccgaatattgggttaatgtcggtgtttgaa
 +output/F56H9.5.1.fa:75:tatattttggttacagtttaaatgcttcaaatttaaatcaattaaatc
 +output/F56H9.5.2.fa:24:ttaaatgcttcaaatttaaatcaattaaatc
 +output/R13A5.5.1.fa:24:ttactaatttttgttatcttatcaaacaaatatattttccagc
 +</​code>​ 
 + 
 +The ''grep'' man page lists the full set of repetition operators that grep supports.
 +<code bash> 
 +$ man grep 
 +</​code>​ 
 + 
 +...Scrolling way down... 
 + 
 +<code>
 +   Repetition
 +       A regular expression may be followed by one of several repetition operators:
 +       ?      The preceding item is optional and matched at most once.
 +       *      The preceding item will be matched zero or more times.
 +       +      The preceding item will be matched one or more times.
 +       {n}    The preceding item is matched exactly n times.
 +       {n,}   The preceding item is matched n or more times.
 +       {,m}   The preceding item is matched at most m times.  This is a GNU extension.
 +       {n,m}  The preceding item is matched at least n times, but not more than m times.
 +</code>
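The three single-character operators in this list can be compared side by side in the same way as the curly-brace ranges. Again a sketch on hypothetical test strings, not on the sequence files used above:

```shell
# Hypothetical test strings: "aa", then 0, 1, or 2 t's, then "g".
seqs=$(printf '%s\n' aag aatg aattg)

# ? : the t is optional (zero or one)
optional=$(printf '%s\n' "$seqs" | grep -E -c '^aat?g$')
# * : any number of t's, including none
zero_or_more=$(printf '%s\n' "$seqs" | grep -E -c '^aat*g$')
# + : at least one t
one_or_more=$(printf '%s\n' "$seqs" | grep -E -c '^aat+g$')

echo "optional=$optional zero_or_more=$zero_or_more one_or_more=$one_or_more"
```

Only ''*'' accepts all three strings; ''?'' rejects the double t and ''+'' rejects the string with no t.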
 + 
 +Regular expressions are a general concept, but each command supports its own subset of the syntax. For example, the man page excerpt above marks ''{,m}'' as a GNU extension, so other versions of grep may not accept it.
 +There are more operators, patterns, and capabilities of regular expressions, but like most things we've encountered, YOU MUST TEST each command to make sure it works as expected.
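One way to run such a test is to feed the same pattern to two commands and compare the results. The sketch below does this for plain ''grep'' (basic regular expressions) and ''grep -E'' (extended, the same as egrep). The input string is hypothetical, and the result for plain ''grep'' assumes GNU grep behaviour, where an unescaped ''{'' in a basic regular expression is an ordinary character.

```shell
line=aattg

# Plain grep (basic regular expressions): unescaped braces are literal text,
# so this looks for the characters "aat{2}g" and finds nothing.
basic=$(printf '%s\n' "$line" | grep -c 'aat{2}g' || true)

# Plain grep with backslash-escaped braces: a repetition count.
basic_escaped=$(printf '%s\n' "$line" | grep -c 'aat\{2\}g' || true)

# grep -E (extended regular expressions): bare braces are a repetition count.
extended=$(printf '%s\n' "$line" | grep -E -c 'aat{2}g' || true)

echo "basic=$basic basic_escaped=$basic_escaped extended=$extended"
```

The ''|| true'' is there because ''grep -c'' exits with a non-zero status when the count is 0, which would otherwise stop a script that runs with ''set -e''.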
2018grep_motifs.1536599925.txt.gz · Last modified: 2018/09/10 11:18 by david