Anagrams combinatorial complexity

As explained before, we allow the user to input a defined nucleotide composition for k-mer generation. This option has been implemented for two reasons:

to reduce the number of sequences that have to be tested;
to drive the search towards k-mers with primer-like features.

keeSeek starts from the selected nucleotide composition and permutes its symbols to generate all possible anagrams.
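As an illustration of this step, the following minimal Python sketch enumerates every distinct anagram of a given nucleotide composition. It is not keeSeek's actual implementation; the function name distinct_anagrams and the dictionary-based representation of the composition are assumptions made for this example only.

```python
def distinct_anagrams(composition):
    """Yield each distinct k-mer that uses exactly the given nucleotide counts.

    `composition` maps a nucleotide to its count, e.g. {"A": 2, "C": 1}.
    (Hypothetical helper for illustration, not part of keeSeek.)
    """
    counts = dict(composition)       # work on a copy so the input is not modified
    length = sum(counts.values())    # k, the length of every generated k-mer

    def extend(prefix):
        if len(prefix) == length:
            yield prefix
            return
        # Choosing among distinct symbols (not positions) at each step
        # guarantees that no word is produced twice.
        for base in sorted(counts):
            if counts[base] > 0:
                counts[base] -= 1
                yield from extend(prefix + base)
                counts[base] += 1

    yield from extend("")


print(list(distinct_anagrams({"A": 2, "C": 1})))  # ['AAC', 'ACA', 'CAA']
```

Because each recursion level picks the next symbol rather than the next position, equivalent words are never generated and no deduplication is needed afterwards.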

The number of anagrams of a sequence of length N made of N different symbols is N! (the factorial of N). DNA strings contain only four symbols, so symbols repeat and many permutations produce the same word. Counting each distinct word only once, the number of anagrams of a word of length N is N! / (S₁! × S₂! × S₃! × S₄!), where S₁, S₂, S₃, and S₄ are the numbers of occurrences of each of the four symbols in the string (so that S₁ + S₂ + S₃ + S₄ = N).
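The count of distinct anagrams is the multinomial coefficient N! / (S₁! × S₂! × S₃! × S₄!), which can be computed directly without generating any sequence. The short Python sketch below does this; the function name count_anagrams and the dictionary input are assumptions chosen for illustration, not part of keeSeek.

```python
from math import factorial


def count_anagrams(composition):
    """Return the number of distinct anagrams for a nucleotide composition.

    `composition` maps a nucleotide to its count, e.g. {"A": 2, "C": 1}.
    (Hypothetical helper for illustration, not part of keeSeek.)
    """
    n = sum(composition.values())    # N, the total sequence length
    total = factorial(n)             # N! orderings if all symbols were distinct
    for count in composition.values():
        total //= factorial(count)   # divide out orderings of identical symbols
    return total


print(count_anagrams({"A": 2, "C": 1}))                  # 3
print(count_anagrams({"A": 2, "C": 2, "G": 2, "T": 2}))  # 2520
```

For a fixed length, the count is largest when the four nucleotides are equally represented and falls quickly as the composition becomes more skewed.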

The following graph shows the total number of anagrams resulting from different nucleotide compositions.