
2018grep_motifs

Differences

This shows you the differences between two versions of the page.

2018grep_motifs [2018/09/10 11:18]
david
2018grep_motifs [2018/09/10 11:37] (current)
david
Line 55: Line 55:
 </code> </code>
  
 +This added a match from R13A5.5: ''aatttttg''
 +
 +Of course, we may not want that many. This match has 5 t's, so let's set an upper bound of 4. To do that, we must change how we write the pattern: a length range can be specified after any symbol or character set via curly braces.
 +<code>
 +{n,m} - example {2,4}
 +{n,}  - example {2,}
 +{,m}  - example {,4}
 +</code>
 +
 +  - The first example matches a repetition of 2-4 of the preceding symbol or character set.
 +  - The second example matches a repetition of AT LEAST 2 of the preceding symbol or character set.
 +  - The third example matches a repetition of AT MOST 4 of the preceding symbol or character set.
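
As a quick sanity check of the three range forms, here is a minimal sketch that runs each one against a few made-up strings. The sample strings and the ''count_matches'' helper are illustrative assumptions, not part of the original data. Each pattern is anchored with ''^'' and ''$'' so the whole line must fit the range, and the at-most form is written ''{0,4}'', which should behave the same as ''{,4}'' but does not rely on the GNU extension.

```shell
# Made-up test strings: an 'a', then 1 to 5 t's, then a 'g'.
samples='atg
attg
atttg
attttg
atttttg'

# count_matches PATTERN: number of sample lines matching the extended regex.
# (Hypothetical helper, named here only for illustration.)
count_matches() {
    printf '%s\n' "$samples" | grep -Ec "$1"
}

range_count=$(count_matches '^at{2,4}g$')     # 2 to 4 t's
at_least_count=$(count_matches '^at{2,}g$')   # 2 or more t's
at_most_count=$(count_matches '^at{0,4}g$')   # at most 4 t's

echo "{2,4}: $range_count   {2,}: $at_least_count   {0,4}: $at_most_count"
```

Counting matching lines rather than printing them makes it easy to compare the three forms side by side.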
 +
 +Two things change in the way we search for such a pattern.
 +
 +  * egrep - "extended" regular expressions (equivalent to ''grep -E''), where curly braces act as repetition operators without backslashes
 +  * quote the pattern - Curly braces have a special meaning in BASH, so we use single quotes (''''') to prevent the shell from interpreting them.
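
To see why the quotes matter, the sketch below prints what a command would actually receive with and without them. It assumes ''bash'' is installed and calls it explicitly, since brace expansion is a BASH feature; the variable names are illustrative.

```shell
# Unquoted: bash brace expansion rewrites the pattern before the command sees it.
unquoted=$(bash -c 'echo AAT{,4}G')

# Single-quoted inside bash: the pattern is passed through unchanged.
quoted=$(bash -c "echo 'AAT{,4}G'")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

Without quotes the single pattern turns into two separate words, so a search command would treat the second word as a file name rather than as part of the pattern.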
 +
 +<code bash>
 +$ egrep -bi 'AAT{,4}G' output/*
 +output/F56H9.5.1.fa:24:catccatttatactattgcaccgaatattgggttaatgtcggtgtttgaa
 +output/F56H9.5.1.fa:75:tatattttggttacagtttaaatgcttcaaatttaaatcaattaaatc
 +output/F56H9.5.2.fa:24:ttaaatgcttcaaatttaaatcaattaaatc
 +</code>
 +
 +We lost our R13A5.5.1.fa match, because its run of 5 t's exceeds the upper bound of 4. We can restore it by increasing the upper bound to 5 or higher.
 +<code bash>
 +$ egrep -bi 'AAT{,5}G' output/*
 +output/F56H9.5.1.fa:24:catccatttatactattgcaccgaatattgggttaatgtcggtgtttgaa
 +output/F56H9.5.1.fa:75:tatattttggttacagtttaaatgcttcaaatttaaatcaattaaatc
 +output/F56H9.5.2.fa:24:ttaaatgcttcaaatttaaatcaattaaatc
 +output/R13A5.5.1.fa:24:ttactaatttttgttatcttatcaaacaaatatattttccagc
 +</code>
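
The effect of the upper bound can be reproduced on a single line without the ''output/'' files. This sketch pipes the R13A5.5.1.fa sequence line shown above into ''grep -E'' with bounds of 4 and 5. The ''check_bound'' helper and variable names are assumptions made for illustration, and ''{0,N}'' is used as the portable spelling of ''{,N}''.

```shell
# Sequence line copied from the R13A5.5.1.fa match above.
seq='ttactaatttttgttatcttatcaaacaaatatattttccagc'

# check_bound MAX: report whether the motif matches with at most MAX t's.
# -q suppresses output, so only the exit status is used.
check_bound() {
    if printf '%s\n' "$seq" | grep -Eqi "AAT{0,$1}G"; then
        echo "match"
    else
        echo "no match"
    fi
}

bound4=$(check_bound 4)
bound5=$(check_bound 5)
echo "upper bound 4: $bound4"
echo "upper bound 5: $bound5"

# -o prints only the matched text rather than the whole line.
motif=$(printf '%s\n' "$seq" | grep -Eoi 'AAT{0,5}G')
echo "matched motif: $motif"
```

Printing only the matched text with ''-o'' is a handy way to confirm which part of a long sequence line triggered the hit.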
 +
 +Here is the full set of repetition operators that grep supports, as listed in its man page.
 +<code bash>
 +$ man grep
 +</code>
 +
 +...Scrolling way down...
 +
 +<code>
 +   Repetition
 +       A regular expression may be followed by one of several repetition operators:
 +       ?      The preceding item is optional and matched at most once.
 +       *      The preceding item will be matched zero or more times.
 +       +      The preceding item will be matched one or more times.
 +       {n}    The preceding item is matched exactly n times.
 +       {n,}   The preceding item is matched n or more times.
 +       {,m}   The preceding item is matched at most m times.  This is a GNU extension.
 +       {n,m}  The preceding item is matched at least n times, but not more than m times.
 +</code>
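
The remaining operators from the man page excerpt can be compared the same way. This is a small sketch over made-up lines; the sample strings and the ''show'' helper are hypothetical and exist only for this illustration.

```shell
# Made-up lines: 'aa', then zero to three t's, then 'g'.
samples='aag
aatg
aattg
aatttg'

# show PATTERN: list the sample lines matching the extended regex,
# joined onto one line with spaces.
show() {
    printf '%s\n' "$samples" | grep -E "$1" | paste -s -d ' ' -
}

optional=$(show '^aat?g$')     # ?    zero or one t
any=$(show '^aat*g$')          # *    zero or more t's
one_plus=$(show '^aat+g$')     # +    one or more t's
exactly2=$(show '^aat{2}g$')   # {n}  exactly two t's

echo "?   : $optional"
echo "*   : $any"
echo "+   : $one_plus"
echo "{2} : $exactly2"
```

Note that ''?'' and ''*'' both accept the line with no t's at all, while ''+'' does not.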
 +
 +Regular expressions are a general concept, but different commands may have different limitations on what they support.
 +There are more operators, patterns, and capabilities of regular expressions, but like most things we've encountered, YOU MUST TEST each command to make sure it works as expected.
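
One way to do that testing is to try a pattern on small inputs whose answer is already known before running it on real files. The sketch below assumes two hypothetical helpers, ''expect_match'' and ''expect_no_match'', and made-up test strings; it is one possible approach, not a standard tool.

```shell
# expect_match PATTERN TEXT: PASS if TEXT matches PATTERN, FAIL otherwise.
expect_match() {
    if printf '%s\n' "$2" | grep -Eqi "$1"; then echo "PASS"; else echo "FAIL"; fi
}

# expect_no_match PATTERN TEXT: PASS if TEXT does NOT match PATTERN.
expect_no_match() {
    if printf '%s\n' "$2" | grep -Eqi "$1"; then echo "FAIL"; else echo "PASS"; fi
}

pattern='AAT{0,4}G'   # portable spelling of AAT{,4}G

r1=$(expect_match    "$pattern" 'ccaatttg')   # 3 t's: within the bound
r2=$(expect_no_match "$pattern" 'aatttttg')   # 5 t's: over the bound
r3=$(expect_match    "$pattern" 'ccaagcc')    # 0 t's also counts as "at most 4"

echo "within bound: $r1"
echo "over bound:   $r2"
echo "zero t's:     $r3"
```

The third case is worth noticing: an at-most range also allows zero repetitions, so ''aag'' matches. If that is not wanted, a lower bound such as ''{1,4}'' would exclude it.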
2018grep_motifs.1536599892.txt.gz · Last modified: 2018/09/10 11:18 by david