| matchDNAPattern {Biostrings} | R Documentation |
Generic function that finds all matches of a pattern in a DNA string. Three algorithms are currently implemented; see the algorithm argument for the valid values and the default. The Boyer-Moore implementation is an extension of the standard algorithm that allows some wildcards in addition to the symbols for the bases and gap. The forward search is a simple scan that examines, from beginning to end, every substring of the full string that has the same length as the pattern. The shift-or algorithm handles short patterns and can also do inexact matching.
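The forward search described above can be sketched in a few lines of base R. This is only an illustration of the idea, not the Biostrings implementation: the helper name forward_search is hypothetical, it works on plain character strings rather than DNAString objects, and it assumes that N is the only wildcard letter.

```r
# Hypothetical helper, not part of Biostrings: naive forward search.
# Assumption: "N" in the pattern matches any letter; all other letters
# must match exactly.
forward_search <- function(pattern, subject) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  m <- length(p)
  n <- length(s)
  starts <- integer(0)
  if (m >= 1 && m <= n) {
    fixed <- which(p != "N")              # positions that must match exactly
    for (i in seq_len(n - m + 1)) {       # every window of pattern length
      window <- s[i:(i + m - 1)]
      if (all(window[fixed] == p[fixed])) starts <- c(starts, i)
    }
  }
  # One row per match, with its start and end position in the subject
  cbind(start = starts, end = starts + m - 1L)
}

forward_search("GCNNNAT", "AAGCGCGATATG")
```

On the pattern and string used in the examples, the sketch reports two windows, positions 3 to 9 and 5 to 11. Its cost grows with the product of the pattern and string lengths, which is why it is competitive only for very simple patterns.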
matchDNAPattern(pattern, x, algorithm, mismatch)
pattern |
An object representing the pattern string. The string in
pattern can use any of the standard DNA pattern letters. See
DNAPatternAlphabet for all valid letters. |
x |
An object representing a DNA string. |
algorithm |
Currently the only valid values are
"boyer-moore", "forward-search"
and "shift-or". The forward search algorithm is often as
fast as the more sophisticated Boyer-Moore algorithm when the
patterns being matched are very simple. The shift-or algorithm is
even faster, but it can only be used for patterns of length at
most 32 or 64, depending on the number of bits in a machine word. The
shift-or algorithm can also do inexact matching for a given number of
mismatches. The default is "shift-or" where valid and "boyer-moore"
otherwise. |
mismatch |
An integer, the number of mismatches allowed. The default is 0. If mismatch is non-zero, an inexact matching algorithm is used. |
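The pattern length limit of the shift-or algorithm, and the way it accommodates mismatches, may be easier to see in a small bit-parallel sketch. The code below is an assumption-laden illustration in base R, not the Biostrings code: the helper name shift_and_search and its argument k are hypothetical, it uses the shift-and formulation (shift-or with the bits complemented), it treats only N as a wildcard, and it caps the pattern at 30 letters because R's bitwise functions operate on 32-bit integers.

```r
# Hypothetical helper, not part of Biostrings: bit-parallel shift-and search
# allowing up to k substitutions. Bit i-1 of a state word is set when the
# first i pattern letters match the text ending at the current position.
shift_and_search <- function(pattern, subject, k = 0L) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  m <- length(p)
  # The whole pattern must fit in one integer word (30 bits used here).
  stopifnot(m >= 1, m <= 30, k >= 0)

  # mask[b]: bit i-1 is set when pattern position i accepts base b.
  bases <- c("A", "C", "G", "T")
  mask <- setNames(integer(4), bases)
  for (b in bases) {
    mask[[b]] <- as.integer(sum(2^(which(p == b | p == "N") - 1)))
  }
  text_mask <- unname(mask[s])
  text_mask[is.na(text_mask)] <- 0L      # unknown text letters match nothing

  full   <- as.integer(2^m - 1)          # keeps states within m bits
  accept <- bitwShiftL(1L, m - 1L)       # bit marking a complete match
  state  <- integer(k + 1L)              # state[j + 1]: at most j mismatches
  ends   <- integer(0)

  for (pos in seq_along(s)) {
    old <- state
    for (j in 0:k) {
      # Extend prefixes whose next pattern letter accepts the text letter
      new <- bitwAnd(bitwOr(bitwShiftL(old[j + 1L], 1L), 1L), text_mask[pos])
      if (j > 0) {
        # Or spend one mismatch: extend a (j - 1)-mismatch prefix regardless
        new <- bitwOr(new, bitwOr(bitwShiftL(old[j], 1L), 1L))
      }
      state[j + 1L] <- bitwAnd(new, full)
    }
    if (bitwAnd(state[k + 1L], accept) != 0) ends <- c(ends, pos)
  }
  cbind(start = ends - m + 1L, end = ends)
}

shift_and_search("GCNNNAT", "AAGCGCGATATG")          # exact, with wildcards
shift_and_search("GATT", "AAGCGCGATATG", k = 1L)     # one mismatch allowed
```

Each text letter costs a constant number of word operations per allowed mismatch, independent of the pattern length, which is consistent with the speed claim above. The price is that the pattern has to fit in a single machine word.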
An object of class "BioString" with the same length as the number of
matches. Each element in the "BioString" object is a match. To obtain
the start and end points of the matches, use as.matrix on the
return value. See documentation for the "BioString" class for more
details.
Saikat DebRoy
Dan Gusfield (1997). Algorithms on Strings, Trees, and Sequences. Cambridge University Press.
BioString-class for the type of the return value.
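# Match a pattern containing the wildcard N against a short DNA string
x <- DNAString("AAGCGCGATATG")
m1 <- matchDNAPattern("GCNNNAT", x)
m1
as.matrix(m1)  # start and end points of the matches
# Same search using the simple forward search
m2 <- matchDNAPattern("GCNNNAT", x, algorithm="forward-search")
m2
as.matrix(m2)
# Search yeast chromosome 1 for a restriction enzyme site
data('yeastSEQCHR1')
yeast1 <- DNAString(yeastSEQCHR1)
PpiI <- "GAACNNNNNCTC" # a restriction enzyme pattern
match1.PpiI <- matchDNAPattern(PpiI, yeast1)
match2.PpiI <- matchDNAPattern(PpiI, yeast1, algorithm="forward-search")
match1.PpiI
match2.PpiI
# Inexact matching: allow one mismatch
match3.PpiI <- matchDNAPattern(PpiI, yeast1, mismatch=1)
match3.PpiI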