Instructions for generating input files for Argot2.5 batch processing

If you want to annotate a large set of sequences (or entire genomes), you can use the Argot2.5 batch processing section. In this case we cannot provide the computational resources needed for the BLAST and HMMER analyses, so you need to run them yourself. You then submit the BLAST and HMMER output files in the correct format (submitting both is recommended, but only one is required).

Below is a list of MANDATORY instructions that MUST be followed carefully to generate Argot2.5 input files correctly.

Input files for BLAST and HMMER

Your sequences must be in FASTA format. Each header line must start with '>' immediately followed by a unique alphanumeric string, which we call the unique ID, containing no spaces and no longer than 20 characters. Any comment that follows the unique ID in the header, separated from it by one or more white spaces, is ignored. If you want to perform both BLAST and HMMER searches, you must use protein sequences only, and the input file must be the same for both searches.
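The header rules above can be checked before launching any search. The following is a minimal Python sketch, not part of Argot2.5: the function name check_fasta_ids is hypothetical, and it assumes that "alphanumeric" means only the characters A-Z, a-z and 0-9.

```python
import re

# Assumption: "alphanumeric" is read strictly as A-Z, a-z, 0-9, at most 20 characters.
ID_PATTERN = re.compile(r"[A-Za-z0-9]{1,20}")


def check_fasta_ids(lines):
    """Return a list of (line_number, message) problems found in FASTA header lines."""
    problems = []
    seen = {}  # unique ID -> line number where it first appeared
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        # The unique ID must follow '>' directly, so "> abc" is rejected.
        if not fields or line[1:2].isspace():
            problems.append((number, "missing unique ID right after '>'"))
            continue
        unique_id = fields[0]  # anything after the first white space is a comment
        if not ID_PATTERN.fullmatch(unique_id):
            problems.append((number, f"invalid unique ID {unique_id!r}"))
        elif unique_id in seen:
            problems.append(
                (number, f"duplicate unique ID {unique_id!r}, first seen on line {seen[unique_id]}")
            )
        else:
            seen[unique_id] = number
    return problems
```

The function accepts any iterable of lines, so an open file handle can be passed directly, for example check_fasta_ids(open("your_sequences")). An empty list means no problems were found under the stated assumptions.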

Databases

BLAST and HMMER searches MUST be performed against the UniProt and Pfam databases, respectively.

You can download the UniProt database from http://www.uniprot.org/ (our server uses both the Swiss-Prot and TrEMBL datasets from UniProt).

You can download the Pfam database at one of the following sites (our server uses both Pfam-A and Pfam-B):

BLAST

Download and install ncbi-blast-2.2.24+ (http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download).

Once BLAST is working on your machine, launch blastp or blastx (for protein or nucleotide sequences, respectively) with the option for custom tabular output WITH COMMENTS (-outfmt "7 qseqid sseqid evalue").

An example command line for launching a protein BLAST search is:

blastp -outfmt "7 qseqid sseqid evalue" -query your_sequences -db Uniprot -out output_file

HMMER

Download and install HMMER3-3.0 package (http://hmmer.janelia.org/).

Once HMMER is working on your machine, launch the hmmscan program on your protein dataset with the option for tabular output (--tblout output_file).

An example command line for launching hmmscan is:

hmmscan --tblout output_file P-fam_database your_protein_sequences
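Before submitting, it can help to confirm that the --tblout file contains hits for the query IDs you expect. The following Python sketch is only an illustration: the function name tblout_hits is hypothetical, and it assumes the usual HMMER3 table layout, in which lines starting with '#' are comments and the white-space-separated columns begin with target name, target accession, query name, query accession and full-sequence E-value.

```python
def tblout_hits(lines):
    """Yield (query_name, target_name, evalue) for each hit line of an hmmscan --tblout file."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) < 5:
            continue  # too short to be a hit line under the assumed layout
        # Assumed layout: column 1 = target name, column 3 = query name,
        # column 5 = full-sequence E-value.
        yield fields[2], fields[0], float(fields[4])
```

For example, {query for query, _, _ in tblout_hits(open("output_file"))} collects the set of query IDs that received at least one Pfam hit, which can then be compared with the unique IDs in your FASTA file.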

Argot2.5 submission

Once you have completed the BLAST and/or HMMER searches, compress the tabular output files in zip format. The resulting .zip files are ready to be submitted to Argot2.5.
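Any zip tool can be used for this step. As one possibility, the following Python sketch uses the standard zipfile module. The function name zip_output is hypothetical, and the sketch assumes that each tabular output file goes into its own archive, named after the file with ".zip" appended.

```python
import zipfile
from pathlib import Path


def zip_output(result_path):
    """Compress one tabular output file into '<file name>.zip' next to it and return the archive path."""
    result_path = Path(result_path)
    zip_path = result_path.parent / (result_path.name + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name so the archive does not contain local directories.
        archive.write(result_path, arcname=result_path.name)
    return zip_path
```

Calling zip_output("output_file") once for the BLAST output and once for the HMMER output produces two archives under these assumptions.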

Examples

BLAST

The spaces shown between fields in the example are TAB characters.
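Because the fields must be separated by TAB characters and not by ordinary spaces, a quick check of the BLAST output can catch formatting problems before submission. The following Python sketch is an illustration only: the function name check_blast_table is hypothetical, and it assumes that lines starting with '#' are the comment lines of output format 7 and that every other line holds exactly the three requested fields (qseqid, sseqid, evalue).

```python
def check_blast_table(lines):
    """Return a list of (line_number, message) problems in BLAST tabular output with comments."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # blank line or comment line
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append((number, f"expected 3 TAB-separated fields, found {len(fields)}"))
            continue
        try:
            float(fields[2])  # the third field should be the e-value
        except ValueError:
            problems.append((number, f"e-value {fields[2]!r} is not a number"))
    return problems
```

An empty result from check_blast_table(open("output_file")) means every hit line has three TAB-separated fields with a numeric e-value.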

HMMER