Using regular expressions to find sequence motifs

Sequence motifs can contain degenerate symbols: symbols that stand for more than one possible nucleotide at a given position.


WGATAR - [A or T] G A T A [A or G]

This is very close to the regular expression you would use to search a sequence.

grep '[AT]GATA[AG]' sequence.fa

Square brackets in regular expression syntax denote a character set: that position of the match is exactly one of the characters inside the brackets. Because each bracketed set consumes a single character, every match for the example above has the same length: 6 characters.
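As a quick check of the fixed match length, here is a minimal sketch that runs the same pattern on a made-up test string. The sequence in `seq` is hypothetical and not from any real genome, and the `-o` option (print only the matched text) is assumed to be available, as it is in GNU and BSD grep.

```shell
# Hypothetical test sequence containing two sites that fit WGATAR.
seq='CCAGATAAGGTTTGATAGCC'

# -o prints each match on its own line instead of the whole input line.
matches=$(printf '%s\n' "$seq" | grep -o '[AT]GATA[AG]')

printf '%s\n' "$matches"
# prints:
# AGATAA
# TGATAG
```

Both matches are 6 characters long, one character per position of the motif.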

The degenerate symbols come from the IUPAC nucleotide code. The table lists the two-base symbols and N:

Symbol  Stands for   Character set
W       Weak         [AT]
S       Strong       [CG]
R       puRine       [AG]
Y       pYrimidine   [CT]
M       aMino group  [AC]
K       Keto group   [GT]
N       aNy          [ACGT]
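The table can be applied mechanically to turn a motif into a grep pattern. The following is a minimal sketch, assuming an uppercase motif that uses only A, C, G, T and the symbols listed above. The function name `iupac_to_regex` is a hypothetical helper, not an existing tool.

```shell
# Hypothetical helper: expand each IUPAC degenerate symbol into the
# bracketed character set from the table. Plain A, C, G, T pass through.
# The substitutions cannot interfere with each other, because the
# replacement text contains only A, C, G, T and brackets.
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/W/[AT]/g' \
        -e 's/S/[CG]/g' \
        -e 's/R/[AG]/g' \
        -e 's/Y/[CT]/g' \
        -e 's/M/[AC]/g' \
        -e 's/K/[GT]/g' \
        -e 's/N/[ACGT]/g'
}

pattern=$(iupac_to_regex WGATAR)
printf '%s\n' "$pattern"
# prints: [AT]GATA[AG]

# The result can then be passed to grep, for example:
#   grep "$(iupac_to_regex WGATAR)" sequence.fa
```

Building the pattern this way avoids typing the character sets by hand for longer motifs.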
2018grep_motifs.1536599329.txt.gz · Last modified: 2018/09/10 11:08 by david