The larger the genome, the longer keeSeek takes to produce output. A good starting point for generating primer-like neverwords is anagram mode with an even distribution of nucleotides (-a 5:5:5:5). Even so, large genomes (e.g. the human genome) need some time before the first results appear. The tables below report estimates for different genome sizes, for both the GPU and CPU versions of keeSeek. These are only estimates measured on our hardware: (1) a server without a GPU, (2) a server with a GPU, (3) a desktop with a GPU and (4) a notebook with a GPU. Times can vary widely on other machines. The CPU version of keeSeek (-N) is discouraged for genomes larger than 100 MB, and professional GPU cards are recommended.
Table 1. Expected times to produce the first 128 primer-like neverwords using sequential mode (-w 20) on different genomes.
Reference genome | Genome size | -w option | server1 (no GPU) | server2 (with GPU) | desktop1 (with GPU) | notebook1 (with GPU) |
---|---|---|---|---|---|---|
Mycobacterium tuberculosis | 4.4 MB | 20 | 0m01.52s | 0m1.3s | 0m0.408s | 0m23.81s |
Amycolatopsis mediterranei | 10.2 MB | 20 | 0m33.1s | 0m1.6s | 0m1.036s | 0m54.08s |
Arabidopsis thaliana | 120 MB | 20 | 6m25.6s | 0m8.5s | 0m9.937s | 7m0.889s |
Pyrus sp. | 500 MB | 20 | 27m31.2s | 0m36.3s | 0m41.847s | NOT POSSIBLE |
Xenopus tropicalis | 1.5 GB | 20 | 1h10m28s | 1m50.6s | 2m2.168s | NOT POSSIBLE |
Homo sapiens | 3 GB | 20 | 2h44m36s | 3m38.2s | NOT POSSIBLE | NOT POSSIBLE |
Table 2. Expected times to produce the first 128 primer-like neverwords using anagram mode (-a) on different genomes.
Reference genome | Genome size | -a option | server1 (no GPU) | server2 (with GPU) | desktop1 (with GPU) | notebook1 (with GPU) |
---|---|---|---|---|---|---|
Mycobacterium tuberculosis | 4.4 MB | 5:5:5:5 | 0m01.4s | 0m1.1s | 0m0.612s | 0m23.28s |
Amycolatopsis mediterranei | 10.2 MB | 5:5:5:5 | 0m33.1s | 0m1.8s | 0m0.868s | 0m53.99s |
Arabidopsis thaliana | 120 MB | 5:5:5:5 | 6m23.3s | 0m8.5s | 0m9.553s | 7m2.203s |
Pyrus sp. | 500 MB | 5:5:5:5 | 27m18.8s | 0m36.7s | 0m41.843s | NOT POSSIBLE |
Xenopus tropicalis | 1.5 GB | 5:5:5:5 | 1h10m1s | 1m50.7s | 2m1.396s | NOT POSSIBLE |
Homo sapiens | 3 GB | 5:5:5:5 | 2h44m1s | 3m09.1s | NOT POSSIBLE | NOT POSSIBLE |
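
To compare the CPU and GPU columns, the table entries can be converted to seconds and divided. The following is a minimal Python sketch, not part of keeSeek: the helper names `parse_duration` and `speedup` are made up for this illustration, and the only data used are the Homo sapiens timings from Table 1.

```python
import re

# Matches entries such as "2h44m36s", "0m8.5s" or "0m01.52s".
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?")


def parse_duration(text):
    """Convert a table entry to seconds.

    Returns None for entries that are not durations (e.g. "NOT POSSIBLE").
    """
    match = _DURATION.fullmatch(text.strip())
    if match is None or not any(match.groups()):
        return None
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def speedup(cpu_entry, gpu_entry):
    """Return how many times faster the GPU run was, or None if not comparable."""
    cpu = parse_duration(cpu_entry)
    gpu = parse_duration(gpu_entry)
    if cpu is None or not gpu:
        return None
    return cpu / gpu


# Homo sapiens, sequential mode (Table 1): server1 (no GPU) vs server2 (with GPU).
ratio = speedup("2h44m36s", "3m38.2s")
print(f"{ratio:.1f}x")  # prints 45.3x
```

On our hardware this puts the GPU server at roughly 45 times faster than the CPU-only server for the human genome. Like the timings themselves, the ratio is only indicative and will differ on other machines.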
On the importance of reshuffling: Sometimes you will notice that generation times are unusually long. In that case, changing the seed for reshuffling (-R) may help. Use option -v 2 to inspect the generated sequences and look at their variation patterns. If, for example, the last 2 nucleotides are 'A' or 'T' and the reshuffling mask does not affect them, the 3' end filter will discard every k-mer until the lexicographic order of k-mer generation changes those nucleotides.
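
The stall described above can be illustrated with a toy model. This Python sketch is not keeSeek's implementation. It assumes a simplified 3' end filter (reject a k-mer when both of its last two bases are 'A' or 'T') and a hypothetical `position_order` that decides which sequence positions change fastest during lexicographic enumeration. It counts how many candidates are discarded before the first one passes.

```python
from itertools import product

ALPHABET = "ACGT"


def passes_3prime_filter(kmer):
    # Assumed toy filter: reject when both of the last two bases are A or T.
    return not all(base in "AT" for base in kmer[-2:])


def candidates_before_first_pass(k, position_order):
    """Count the k-mers discarded before the first one passes the filter.

    position_order[i] is the sequence position driven by counter digit i.
    Digit 0 changes slowest and digit k-1 changes fastest, as in
    lexicographic enumeration.
    """
    for count, digits in enumerate(product(ALPHABET, repeat=k)):
        kmer = [""] * k
        for digit_index, position in enumerate(position_order):
            kmer[position] = digits[digit_index]
        if passes_3prime_filter("".join(kmer)):
            return count
    return None


# 3' end driven by the fastest-changing digits: only 1 candidate is discarded.
print(candidates_before_first_pass(6, [0, 1, 2, 3, 4, 5]))  # prints 1

# 3' end driven by the slowest-changing digits: 4**4 = 256 candidates are discarded.
print(candidates_before_first_pass(6, [5, 4, 0, 1, 2, 3]))  # prints 256
```

In this toy model the wait grows as 4^(k-2) when the 3' end positions are the slowest to change, which is why an ordering that varies those positions early can shorten generation times considerably.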