Expected time to see the first results

The larger your genome, the longer keeSeek takes to produce output. A good starting point for generating primer-like neverwords is the anagram mode with an even distribution of nucleotides (-a 5:5:5:5). Even so, large genomes (e.g. the human genome) need a noticeable amount of time before the first results appear. The tables below give estimates for different genome sizes, for both the GPU and CPU versions of keeSeek. These are only estimates measured on our own hardware, namely a server without GPU 1), a server with GPU 2), a desktop with GPU 3) and a notebook with GPU 4); timings can vary widely on other machines. We discourage using the CPU version of keeSeek (-N) on genomes larger than 100 MB, and we recommend professional GPU cards.


Table 1. Expected times to produce the first 128 primer-like neverwords using sequential mode (-w 20) on different genomes.

Reference genome            Genome size  -w option  server1 (no GPU)  server2 (with GPU)  desktop1 (with GPU)  notebook1 (with GPU)
Mycobacterium tuberculosis  4.4 MB       20         0m01.52s          0m1.3s              0m0.408s             0m23.81s
Amycolatopsis mediterranei  10.2 MB      20         0m33.1s           0m1.6s              0m1.036s             0m54.08s
Arabidopsis thaliana        120 MB       20         6m25.6s           0m8.5s              0m9.937s             7m0.889s
Pyrus sp.                   500 MB       20         27m31.2s          0m36.3s             0m41.847s            NOT POSSIBLE
Xenopus tropicalis          1.5 GB       20         1h10m28s          1m50.6s             2m2.168s             NOT POSSIBLE
Homo sapiens                3 GB         20         2h44m36s          3m38.2s             NOT POSSIBLE         NOT POSSIBLE
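To compare machines, the timings in Table 1 can be converted to seconds and turned into speed-up factors. The following is a minimal sketch, not part of keeSeek: the helper names to_seconds and speedup are our own, and only the server1 (no GPU) and server2 (with GPU) columns are copied from the table.

```python
import re

def to_seconds(timing):
    """Convert a timing such as '2h44m36s' or '0m8.5s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(\d+)m([\d.]+)s", timing)
    if match is None:
        raise ValueError(f"unrecognised timing: {timing!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)

def speedup(cpu_timing, gpu_timing):
    """Ratio of CPU time to GPU time for the same genome."""
    return to_seconds(cpu_timing) / to_seconds(gpu_timing)

# (server1 without GPU, server2 with GPU), copied from Table 1
TABLE1 = {
    "Mycobacterium tuberculosis": ("0m01.52s", "0m1.3s"),
    "Amycolatopsis mediterranei": ("0m33.1s", "0m1.6s"),
    "Arabidopsis thaliana": ("6m25.6s", "0m8.5s"),
    "Pyrus sp.": ("27m31.2s", "0m36.3s"),
    "Xenopus tropicalis": ("1h10m28s", "1m50.6s"),
    "Homo sapiens": ("2h44m36s", "3m38.2s"),
}

for genome, (cpu, gpu) in TABLE1.items():
    print(f"{genome}: {speedup(cpu, gpu):.1f}x faster on server2")
```

On these figures the GPU server is about 45 times faster than the CPU server for Homo sapiens, while for the 4.4 MB Mycobacterium tuberculosis genome the two are nearly equal. This matches the advice to avoid the CPU version on genomes larger than 100 MB.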


Table 2. Expected times to produce the first 128 primer-like neverwords using anagram mode (-a) on different genomes.

Reference genome            Genome size  -a option  server1 (no GPU)  server2 (with GPU)  desktop1 (with GPU)  notebook1 (with GPU)
Mycobacterium tuberculosis  4.4 MB       5:5:5:5    0m01.4s           0m1.1s              0m0.612s             0m23.28s
Amycolatopsis mediterranei  10.2 MB      5:5:5:5    0m33.1s           0m1.8s              0m0.868s             0m53.99s
Arabidopsis thaliana        120 MB       5:5:5:5    6m23.3s           0m8.5s              0m9.553s             7m2.203s
Pyrus sp.                   500 MB       5:5:5:5    27m18.8s          0m36.7s             0m41.843s            NOT POSSIBLE
Xenopus tropicalis          1.5 GB       5:5:5:5    1h10m1s           1m50.7s             2m1.396s             NOT POSSIBLE
Homo sapiens                3 GB         5:5:5:5    2h44m1s           3m09.1s             NOT POSSIBLE         NOT POSSIBLE


On the importance of reshuffling: Sometimes you will notice that generation times are unusually long. In that case, changing the seed for reshuffling (-R) can help. Inspect the generated sequences with option -v 2 and look at their variation patterns. If, for example, the last 2 nucleotides are 'A' or 'T' and the reshuffling mask does not affect them, the 3' end filter will discard every k-mer until the lexicographic order of k-mer generation changes those nucleotides.
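The effect can be illustrated with a toy model. The sketch below is not keeSeek's implementation: it assumes a simplified 3' end filter that rejects a k-mer when both of its last two nucleotides are 'A' or 'T', and it uses a plain position permutation (the hypothetical order argument) as a stand-in for the reshuffling mask. It counts how many candidates are enumerated in lexicographic order before the first one passes the filter.

```python
from itertools import product

ALPHABET = "ACGT"

def passes_3prime_filter(kmer):
    # Assumed rule for illustration only: reject the k-mer when
    # both of its last two nucleotides are 'A' or 'T'.
    return not all(base in "AT" for base in kmer[-2:])

def candidates_until_first_pass(k, order):
    """Count candidates enumerated before one passes the filter.

    order[i] is the k-mer position that receives counter digit i.
    Digit 0 changes slowest, digit k-1 changes fastest
    (lexicographic order). Returns None if nothing passes.
    """
    for count, digits in enumerate(product(ALPHABET, repeat=k), start=1):
        kmer = [""] * k
        for digit_index, position in enumerate(order):
            kmer[position] = digits[digit_index]
        if passes_3prime_filter("".join(kmer)):
            return count
    return None

k = 6
# The 3' end is driven by the fastest-changing digits.
print(candidates_until_first_pass(k, list(range(k))))
# The 3' end is driven by the slowest-changing digits.
print(candidates_until_first_pass(k, list(reversed(range(k)))))
```

With k = 6 the first ordering finds a passing k-mer after 2 candidates, the second only after 257, because all 4^4 = 256 combinations of the other positions are exhausted first. Under the same assumptions, k = 20 would mean 4^18 (about 6.9e10) rejected candidates, which is why a different reshuffling seed can shorten the wait considerably.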

1) server1: Intel Xeon E5540 quad core, 2.53 GHz, 32 GB RAM
2) server2: AMD Opteron 6128 quad core, 2.6 GHz, 64 GB RAM; GPU Nvidia Fermi M2050, 6 GB global memory
3) desktop1: Intel Q6600, 2.40 GHz, 3 GB RAM; GPU Nvidia GeForce GT 640 GDDR5, 2 GB global memory
4) notebook1: Intel i7 M260, 2.67 GHz, 4 GB RAM; GPU Nvidia GeForce 310M, 0.5 GB global memory
expected_time_to_see_the_first_results.txt · Last modified: 2014/04/15 18:38 by admin