Anagrams combinatorial complexity

As explained before, we allow the user to input a defined nucleotide composition for k-mer generation. This option has been implemented for two reasons:

  1. to reduce the number of sequences that have to be tested;
  2. to drive the search towards k-mers with primer-like features.

keeSeek starts from the selected nucleotide composition and permutes its symbols to generate all possible anagrams.
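As an illustration of this step, here is a minimal Python sketch that enumerates every distinct anagram of a given composition. It is not keeSeek's actual implementation: the function name `distinct_anagrams` and the dictionary input format are assumptions made for this example. The idea is to build each word one symbol at a time from the remaining counts, so equivalent words are never produced twice.

```python
from typing import Dict, Iterator


def distinct_anagrams(composition: Dict[str, int]) -> Iterator[str]:
    """Yield each distinct word with the given symbol counts exactly once.

    `composition` maps a nucleotide to its number of occurrences,
    e.g. {"A": 2, "C": 1, "G": 1} (hypothetical input format).
    """
    counts = dict(composition)      # work on a copy, leave the input untouched
    length = sum(counts.values())   # k, the length of every generated k-mer
    prefix = []                     # symbols chosen so far

    def extend() -> Iterator[str]:
        if len(prefix) == length:
            yield "".join(prefix)
            return
        # Try each symbol that still has occurrences left; choosing by
        # symbol (not by position) is what avoids duplicate words.
        for symbol in sorted(counts):
            if counts[symbol] > 0:
                counts[symbol] -= 1
                prefix.append(symbol)
                yield from extend()
                prefix.pop()
                counts[symbol] += 1

    return extend()


kmers = list(distinct_anagrams({"A": 2, "C": 1, "G": 1}))
print(len(kmers), kmers[:3])
```

Because the function is a generator, words can be tested one by one without holding the whole set in memory, which matters once the composition grows.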

For a sequence of length N made of N different symbols, the number of anagrams is N! (the factorial of N). DNA strings use only four symbols, so symbols repeat and many permutations spell the same word. If we count each distinct word only once, the number of anagrams for a word of length N is N! / (S1! * S2! * S3! * S4!), where S1, S2, S3, and S4 are the numbers of occurrences of each of the four symbols in the string (so S1 + S2 + S3 + S4 = N).
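The count of distinct anagrams is the multinomial coefficient N! / (S1! * S2! * S3! * S4!). The short Python sketch below evaluates it for a given composition. The function name `count_anagrams` and the dictionary input are assumptions for illustration, not part of keeSeek.

```python
from math import factorial
from typing import Dict


def count_anagrams(composition: Dict[str, int]) -> int:
    """Return N! / (S1! * S2! * ...) for the given symbol counts."""
    n = sum(composition.values())   # N, the word length
    total = factorial(n)
    for occurrences in composition.values():
        # Each partial quotient is itself an integer, so exact
        # integer division is safe here.
        total //= factorial(occurrences)
    return total


small = count_anagrams({"A": 2, "C": 1, "G": 1})
balanced_20mer = count_anagrams({"A": 5, "C": 5, "G": 5, "T": 5})
all_20mers = 4 ** 20                # every 20-mer, with no composition constraint

print(small)                        # 12
print(balanced_20mer, all_20mers)
```

For a 20-mer with five occurrences of each nucleotide, this gives 11,732,745,024 words, roughly 1% of the 4^20 possible 20-mers, which shows how fixing the composition reduces the number of sequences to test.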

In the following graph we show the total number of anagrams resulting from different nucleotide compositions.

keeseek/anagrams.txt · Last modified: 2014/02/07 17:51 by admin