


Using regular expressions to find sequence motifs

Sequence motifs can contain degenerate symbols, meaning a single symbol can match more than one nucleotide at a given position.

Example:

WGATAR - [A or T] G A T A [A or G]

This is very close to the regular expression you would use to search a sequence.

grep '[AT]GATA[AG]' sequence.fa

The brackets in regular expression syntax denote a character set: that position matches any single one of the characters inside the brackets. Because each set stands for exactly one character, every match for the above example has the same length: 6 characters.
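As a rough sketch of how the pattern behaves, the snippet below runs it against three made-up sequences. The helper example_fasta and its sequences are invented for illustration and are not part of the course data. It also assumes a grep that supports the -o option (GNU and BSD grep both do).

```shell
# example_fasta: hypothetical helper that prints a tiny, made-up FASTA file.
example_fasta() {
    printf '>seq1\nCCAGATAAGG\n>seq2\nTTTTGATAGCC\n>seq3\nGGGGATACCC\n'
}

# -o prints only the matched text, one match per line.
# seq1 and seq2 contain a match; seq3 has GATA but with G before it and C after it.
example_fasta | grep -o '[AT]GATA[AG]'
# prints:
# AGATAA
# TGATAG

# -c counts the lines that contain at least one match.
example_fasta | grep -c '[AT]GATA[AG]'
# prints: 2
```

Both reported matches are 6 characters long, as expected for a pattern made only of literal characters and single character sets.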

The degenerate symbols are from the IUPAC standard:

Symbol  Stands for    Character set
W       Weak          [AT]
S       Strong        [CG]
R       pUrine        [AG]
Y       pYrimidine    [CT]
M       aMino group   [AC]
K       Keto group    [GT]
N       aNy           [ACGT]
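
The table can be applied mechanically to turn a motif into a grep pattern. The following is a minimal sketch rather than a definitive tool: the function name iupac_to_regex is made up for this example, it only handles the seven symbols listed above, and it assumes the motif is written in upper case.

```shell
# iupac_to_regex: hypothetical helper that rewrites each degenerate symbol
# from the table above as its character set. A, C, G and T pass through unchanged.
# Assumption: upper-case input using only the symbols listed in the table.
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/W/[AT]/g' \
        -e 's/S/[CG]/g' \
        -e 's/R/[AG]/g' \
        -e 's/Y/[CT]/g' \
        -e 's/M/[AC]/g' \
        -e 's/K/[GT]/g' \
        -e 's/N/[ACGT]/g'
}

iupac_to_regex WGATAR
# prints: [AT]GATA[AG]
```

The result can be passed straight to grep, for example: grep "$(iupac_to_regex WGATAR)" sequence.fa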
2018grep_motifs.1536599329.txt.gz · Last modified: 2018/09/10 11:08 by david