| Reference Genome | Neverword | Min. num. of mismatches | Generation time (min : sec : msec) |
|---|---|---|---|
| Homo sapiens (GRCh37.64 ENSEMBL release) | TGATCGTAATCGTCGCGACA | 4 | 4:36:241 |
| | GCAACCGTTCACGTGTTAGA | 3 | 1:08:592 |
| | CTGAACGATTGATGCTCGAC | 3 | 1:08:560 |
| Arabidopsis thaliana (NC_003070.9, NC_003071.7, NC_003074.8, NC_003075.7, NC_003076.8) | TCGGTGTACGGTAATCACCA | 4 | 0:01:987 |
| | CTGGTCGAAGTACGCATATC | 4 | 0:02:275 |
| | CTTGAGTGCAACAGCGTATC | 4 | 0:01:989 |
| Mycobacterium tuberculosis (NC_000962.2) | GGATTGCCCCTTAGACTAGA | 7 | 0:02:053 |
| | CGTGTCTCCAATAAGTGAGC | 5 | 0:00:084 |
| | CAGCTTAGGCTATCATCGAG | 5 | 0:00:175 |

Results were obtained by running keeSeek three times with the parameters “-a 5:5:5:5 -R 1 -K 3000 <genome.fasta>”. The first parameter specifies the nucleotide composition, namely 5 A, 5 C, 5 G, and 5 T, and yields ordered permutations of 20-mers. The second parameter tells keeSeek to create a random seed and use it to prepare a random mapping for reshuffling the permutations; in this way the order among permutations is preserved while the variability of the codes increases. Finally, “-K 3000” limits the computation to the first 3000 results. Candidates that pass the filters are evaluated in blocks.
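
To make the effect of the composition and limit parameters concrete, the following Python sketch enumerates 20-mers with a fixed nucleotide composition in lexicographic order, stops after a given number of candidates, and scores a candidate by its minimum number of mismatches against every same-length window of a reference string. It is an illustration only and not keeSeek's implementation: the function names `ordered_kmers`, `count_kmers`, and `min_mismatches` are hypothetical, the random remapping enabled by “-R” is not modelled, and the mismatch count is assumed here to be a plain Hamming distance on a single strand.

```python
from itertools import islice
from math import factorial


def ordered_kmers(counts):
    """Yield every string with the given letter counts, in lexicographic order."""
    letters = sorted(counts)
    remaining = dict(counts)
    total = sum(counts.values())
    prefix = []

    def walk():
        if len(prefix) == total:
            yield "".join(prefix)
            return
        for letter in letters:
            if remaining[letter] > 0:
                # Use one copy of this letter, recurse, then put it back.
                remaining[letter] -= 1
                prefix.append(letter)
                yield from walk()
                prefix.pop()
                remaining[letter] += 1

    yield from walk()


def count_kmers(counts):
    """Number of distinct strings with the given letter counts (multinomial)."""
    n = factorial(sum(counts.values()))
    for c in counts.values():
        n //= factorial(c)
    return n


def min_mismatches(candidate, reference):
    """Smallest Hamming distance between candidate and any window of reference.

    Assumption: single strand only, brute force; fine for toy inputs,
    far too slow for a real genome.
    """
    k = len(candidate)
    return min(
        sum(a != b for a, b in zip(candidate, reference[i:i + k]))
        for i in range(len(reference) - k + 1)
    )


# Composition corresponding to "-a 5:5:5:5": 5 A, 5 C, 5 G, 5 T.
composition = {"A": 5, "C": 5, "G": 5, "T": 5}

# Analogue of a result limit: take only the first few ordered candidates.
first_three = list(islice(ordered_kmers(composition), 3))
print(first_three)
print(count_kmers(composition))
```

With this composition the search space contains 20!/(5!)^4 = 11,732,745,024 distinct 20-mers, which suggests why capping the number of results keeps a run short. A lazy generator is used so that only the requested candidates are ever built.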