Computational Medicine group [expected_time_to_see_the_first_results]

Expected time to see the first results

The larger your genome, the longer keeSeek takes to produce output. A good starting point for generating primer-like neverwords is the anagram mode with an even distribution of nucleotides (-a 5:5:5:5). Even so, large genomes (e.g. the human genome) need some time to produce the first results. The following tables report estimates for several genome sizes, for both the GPU and CPU versions of keeSeek. Note that these are only estimates measured on our hardware, namely a server without GPU ¹⁾, a server with GPU ²⁾, a desktop with GPU ³⁾ and a notebook with GPU ⁴⁾, and they can vary widely on other machines. The CPU version of keeSeek (-N) is discouraged when analysing genomes larger than 100 MB, and professional GPU cards are recommended.
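To get a feel for how many candidates the anagram mode has to consider, the short Python sketch below counts the distinct sequences for a given nucleotide composition. It assumes that -a 5:5:5:5 means 5 copies of each of the four nucleotides (a 20-mer); the function name anagram_count is ours and is not part of keeSeek.

```python
from math import factorial

def anagram_count(counts):
    """Number of distinct sequences with the given per-nucleotide counts
    (multinomial coefficient)."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

# Assumption: "-a 5:5:5:5" means 5 copies of each of the four nucleotides.
counts = (5, 5, 5, 5)
k = sum(counts)                    # k-mer length implied by the composition
anagrams = anagram_count(counts)   # sequences with exactly this composition
all_kmers = 4 ** k                 # every possible k-mer of that length

print(f"k = {k}")
print(f"anagrams of 5:5:5:5: {anagrams}")
print(f"all {k}-mers: {all_kmers}")
print(f"fraction covered by the anagram set: {anagrams / all_kmers:.4f}")
```

Under this assumption the even composition covers only about 1% of all 20-mers, which is still several billion candidates.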

Table 1. Expected times to produce the first 128 primer-like neverwords using sequential mode (-w 20) on different genomes.

Reference genome	Genome size	-w option	server1 (no GPU)	server2 (with GPU)	desktop1 (with GPU)	notebook1 (with GPU)
Mycobacterium tuberculosis	4.4 MB	20	0m01.52s	0m1.3s	0m0.408s	0m23.81s
Amycolatopsis mediterranei	10.2 MB	20	0m33.1s	0m1.6s	0m1.036s	0m54.08s
Arabidopsis thaliana	120 MB	20	6m25.6s	0m8.5s	0m9.937s	7m0.889s
Pyrus sp.	500 MB	20	27m31.2s	0m36.3s	0m41.847s	NOT POSSIBLE
Xenopus tropicalis	1.5 GB	20	1h10m28s	1m50.6s	2m2.168s	NOT POSSIBLE
Homo sapiens	3 GB	20	2h44m36s	3m38.2s	NOT POSSIBLE	NOT POSSIBLE

Table 2. Expected times to produce the first 128 primer-like neverwords using anagram mode (-a) on different genomes.

Reference genome	Genome size	-a option	server1 (no GPU)	server2 (with GPU)	desktop1 (with GPU)	notebook1 (with GPU)
Mycobacterium tuberculosis	4.4 MB	5:5:5:5	0m01.4s	0m1.1s	0m0.612s	0m23.28s
Amycolatopsis mediterranei	10.2 MB	5:5:5:5	0m33.1s	0m1.8s	0m0.868s	0m53.99s
Arabidopsis thaliana	120 MB	5:5:5:5	6m23.3s	0m8.5s	0m9.553s	7m2.203s
Pyrus sp.	500 MB	5:5:5:5	27m18.8s	0m36.7s	0m41.843s	NOT POSSIBLE
Xenopus tropicalis	1.5 GB	5:5:5:5	1h10m1s	1m50.7s	2m1.396s	NOT POSSIBLE
Homo sapiens	3 GB	5:5:5:5	2h44m1s	3m09.1s	NOT POSSIBLE	NOT POSSIBLE
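The timings in the tables are easier to compare when converted to seconds. The following Python sketch parses the time format used in the tables and computes the ratio between the CPU-only server and the GPU server for a few sequential-mode rows. The helper name to_seconds and the choice of rows are ours; the numbers are copied from the sequential-mode table.

```python
import re

# Matches strings such as "0m01.52s", "27m31.2s" or "2h44m36s".
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?")

def to_seconds(text):
    """Convert a table entry like '1h10m28s' to seconds.
    Raises ValueError for entries such as 'NOT POSSIBLE'."""
    match = _DURATION.fullmatch(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"not a duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)

# (genome, server1 without GPU, server2 with GPU), sequential mode, -w 20
rows = [
    ("Mycobacterium tuberculosis", "0m01.52s", "0m1.3s"),
    ("Arabidopsis thaliana", "6m25.6s", "0m8.5s"),
    ("Homo sapiens", "2h44m36s", "3m38.2s"),
]

for genome, cpu, gpu in rows:
    speedup = to_seconds(cpu) / to_seconds(gpu)
    print(f"{genome}: GPU server is {speedup:.1f}x faster")
```

On our measurements the two servers are nearly equal for the smallest genome, while for Arabidopsis thaliana and Homo sapiens the GPU server is roughly 45 times faster.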

On the importance of reshuffling: Generation times are sometimes unusually long. In that case, changing the seed for reshuffling (-R) may help. Inspect the generated sequences with option -v 2 and look at their variation patterns. If, for example, the last 2 nucleotides are 'A' or 'T' and the reshuffling mask does not affect them, the 3' end filter will discard every k-mer until the lexicographic order of k-mer generation changes those nucleotides.
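The effect described above can be illustrated with a toy model in Python. This is not keeSeek's implementation: the filter rule (reject k-mers whose last two nucleotides are both 'A' or 'T'), the representation of the mask as a position permutation, and all function names are assumptions made for illustration only.

```python
from itertools import product

ALPHABET = "ACGT"

def passes_3prime_filter(kmer):
    """Toy 3' end rule (assumed): reject k-mers whose last two
    nucleotides are both 'A' or 'T'."""
    return not all(nucleotide in "AT" for nucleotide in kmer[-2:])

def candidates(k, mask):
    """Enumerate k-mers in lexicographic order, then rearrange positions.
    mask[i] is the index of the lexicographic digit placed at position i
    (a hypothetical stand-in for a reshuffling mask)."""
    for digits in product(ALPHABET, repeat=k):
        yield "".join(digits[j] for j in mask)

def discarded_before_first_hit(k, mask):
    """Count the candidates rejected before the first one passes the filter."""
    for n, kmer in enumerate(candidates(k, mask)):
        if passes_3prime_filter(kmer):
            return n
    return None

k = 6
fast_end = list(range(k))            # 3' end holds the fastest-changing digits
slow_end = list(reversed(range(k)))  # 3' end holds the slowest-changing digits

print("fast-changing 3' end:", discarded_before_first_hit(k, fast_end))
print("slow-changing 3' end:", discarded_before_first_hit(k, slow_end))
```

In this toy model, when the 3' end is tied to the slowly changing positions, a long run of consecutive candidates is rejected before any k-mer passes. A different arrangement of positions, which is what a different seed produces in this picture, shortens the wait.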

¹⁾ server1: Intel Xeon E5540 quad core, 2.53 GHz, 32 GB RAM

²⁾ server2: AMD Opteron 6128 quad core, 2.6 GHz, 64 GB RAM; GPU Nvidia Fermi M2050, 6 GB global memory

³⁾ desktop1: Intel Q6600, 2.40 GHz, 3 GB RAM; GPU Nvidia GeForce GT 640 GDDR5, 2 GB global memory

⁴⁾ notebook1: Intel i7 M260, 2.67 GHz, 4 GB RAM; GPU Nvidia GeForce 310M, 0.5 GB global memory