G4-virus: a comprehensive database for G-quadruplexes in human viruses

G-quadruplexes (G4s) are non-canonical nucleic acid structures involved in the regulation of transcription, replication and translation, as well as in genome recombination and in genetic and epigenetic instability. G-quadruplexes are present in eukaryotes, prokaryotes and viruses. On this platform, the analyses performed on all human viruses can be easily examined and downloaded.


For each viral class of the Baltimore classification, a compressed tarball is available for download (see Data download below). Each tarball contains:

  • For each virus, multiple alignments of all full genome sequences deposited in GenBank (FASTA format; we suggest using Jalview for easy visualization).
  • For each virus, the list of G-quadruplex coordinates and their conservation in the multiple alignment.
  • For each virus, the list of G-quadruplex coordinates and their conservation with respect to a reference sequence selected for that virus (gaps introduced by the multiple alignment are removed so that coordinates refer to the original reference sequence).
  • One file with all “loop_conservation” values, calculated as the difference between G4_SCI (G4 scaffold conservation index) and LSC (local sequence conservation). See the main text of the paper for further details.
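The gap-removal step in the third item (referring alignment coordinates back to the ungapped reference) can be illustrated with a minimal sketch. It assumes the reference sequence's aligned row is available as a plain string using "-" or "." as gap characters; the function names are hypothetical and are not taken from the published scripts.tar.gz.

```python
from __future__ import annotations


def alignment_to_reference_coords(aligned_ref: str, gap_chars: str = "-.") -> dict[int, int]:
    """Map 1-based alignment columns to 1-based positions in the ungapped reference.

    Columns where the reference row contains a gap character have no
    reference position and are left out of the mapping.
    """
    mapping: dict[int, int] = {}
    ref_pos = 0
    for col, base in enumerate(aligned_ref, start=1):
        if base in gap_chars:
            continue  # gap in the reference: no corresponding reference base
        ref_pos += 1
        mapping[col] = ref_pos
    return mapping


def project_interval(start_col: int, end_col: int,
                     mapping: dict[int, int]) -> tuple[int, int] | None:
    """Project an inclusive alignment interval onto reference coordinates.

    Returns the first and last reference positions covered by the interval,
    or None if every column in it is a gap in the reference.
    """
    covered = [mapping[c] for c in range(start_col, end_col + 1) if c in mapping]
    if not covered:
        return None
    return covered[0], covered[-1]


if __name__ == "__main__":
    # Toy aligned reference row (hypothetical): gaps at columns 3, 7 and 8.
    cols = alignment_to_reference_coords("GG-GAT--TGGG")
    print(project_interval(1, 6, cols))  # (1, 5)
    print(project_interval(7, 8, cols))  # None
```

Projecting an interval to its first and last covered reference bases is one simple convention; a G4 whose boundaries fall inside reference gaps is trimmed to the nearest real bases rather than discarded.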


A list of all analyzed viruses is provided for each class. Each virus name links to a dedicated page with visual information on the G-quadruplexes found in that virus:

  • Conservation of G-quadruplexes and viral genomes.
  • Sequence complexity measures.
  • Statistical evidence comparing the G4s found in the real viral genomes with those found in randomized genomes (i.e. genomes with the same length and nucleotide composition, but shuffled nucleotide order).
  • Further links for the visualization of G-quadruplexes in the multiple alignments and in the annotated reference sequence.
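The comparison with randomized genomes described above can be sketched as a shuffle-based permutation test. This is a minimal illustration under stated assumptions, not the authors' method: it assumes a simplified G4 motif (four runs of at least three Gs separated by 1-7 nt loops, plus the C-rich counterpart for the opposite strand) and an empirical p-value. The threshold alpha = 0.10 only echoes the 10% level mentioned for Supplementary Table S3; the published statistics may differ.

```python
import random
import re

# ASSUMPTION: simplified G4 motif (four runs of >= 3 Gs, loops of 1-7 nt) and its
# C-rich counterpart for the complementary strand. The published analysis may
# use different motif definitions and parameters.
G_MOTIF = re.compile(r"G{3,}.{1,7}?G{3,}.{1,7}?G{3,}.{1,7}?G{3,}")
C_MOTIF = re.compile(r"C{3,}.{1,7}?C{3,}.{1,7}?C{3,}.{1,7}?C{3,}")


def count_g4(seq: str) -> int:
    """Count non-overlapping putative G4 motifs on both strands."""
    seq = seq.upper()
    return len(G_MOTIF.findall(seq)) + len(C_MOTIF.findall(seq))


def shuffle_preserving_composition(seq: str, rng: random.Random) -> str:
    """Return a random permutation of seq: same length and base counts, new order."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def empirical_p_value(seq: str, n_shuffles: int = 1000, seed: int = 0) -> float:
    """Fraction of shuffled genomes with at least as many G4s as the real one.

    Uses the (k + 1) / (n + 1) estimator so the p-value is never exactly zero.
    """
    rng = random.Random(seed)
    observed = count_g4(seq)
    k = sum(
        count_g4(shuffle_preserving_composition(seq, rng)) >= observed
        for _ in range(n_shuffles)
    )
    return (k + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    toy_genome = "ATGGGAGGGTTGGGCAGGGAC" * 3  # hypothetical toy sequence
    p = empirical_p_value(toy_genome, n_shuffles=500, seed=1)
    # With alpha = 0.10 (assumed), the genome would be flagged if p < 0.10.
    print(f"observed G4s: {count_g4(toy_genome)}, p = {p:.3f}")
```

Shuffling individual nucleotides preserves length and base composition exactly, which matches the null model described above; it does not preserve dinucleotide frequencies, so a stricter null model would need a different shuffling scheme.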

Supplementary files and scripts are available for download.


Lavezzo E, Berselli M, Frasson I, Perrone R, Palù G, Brazzale AR, Richter SN, Toppo S. (2018) G-quadruplex forming sequences in the genome of all known human viruses: A comprehensive guide. PLoS Comput Biol 14(12): e1006675. https://doi.org/10.1371/journal.pcbi.1006675.


Search virus by class

Class 1: dsDNA (double-stranded DNA viruses)

Class 2: ssDNA (single-stranded DNA viruses)

Class 3: dsRNA (double-stranded RNA viruses)

Class 4: (+)ssRNA (positive-sense single-stranded RNA viruses)

Class 5: (-)ssRNA (negative-sense single-stranded RNA viruses)

Class 6: ssRNA-RT (positive-sense single-stranded RNA viruses that replicate through a DNA intermediate)

Class 7: dsDNA-RT (double-stranded DNA viruses that replicate through a single-stranded RNA intermediate)

Data download

Filename                          File size   Last modified
Reference_sequences.tar.gz        1.6 MiB     2018/08/01 16:38
dsDNA.tar.gz                      37.7 MiB    2018/08/01 18:40
dsDNA_RT.tar.gz                   2.7 MiB     2018/08/01 18:40
dsRNA.tar.gz                      5.8 MiB     2018/08/01 18:40
ssDNA.tar.gz                      125.6 KiB   2018/08/01 18:40
ssRNA_RT.tar.gz                   32.6 MiB    2018/08/01 18:40
ssRNA_negative_polarity.tar.gz    12.8 MiB    2018/08/01 18:40
ssRNA_positive_polarity.tar.gz    14.5 MiB    2018/08/01 18:40

  1. Reference_sequences.tar.gz: FASTA files of the sequences used as references
  2. dsDNA.tar.gz: Class 1, double-stranded DNA viruses
  3. dsDNA_RT.tar.gz: Class 7, double-stranded DNA viruses that replicate through a single-stranded RNA intermediate
  4. dsRNA.tar.gz: Class 3, double-stranded RNA viruses
  5. ssDNA.tar.gz: Class 2, single-stranded DNA viruses
  6. ssRNA_RT.tar.gz: Class 6, positive-sense single-stranded RNA viruses that replicate through a DNA intermediate
  7. ssRNA_negative_polarity.tar.gz: Class 5, negative-sense single-stranded RNA viruses
  8. ssRNA_positive_polarity.tar.gz: Class 4, positive-sense single-stranded RNA viruses
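For scripted access, the class-to-archive mapping listed above can be combined with Python's standard tarfile module to inspect a downloaded archive. This is a minimal sketch assuming the archives have already been saved to a local directory; the function name list_archive and the file suffixes it filters on (".fasta", ".fa") are assumptions, since the internal layout of the tarballs is not described here.

```python
from __future__ import annotations

import tarfile
from pathlib import Path

# Archive names as listed on the download page, keyed by Baltimore class.
CLASS_ARCHIVES = {
    1: "dsDNA.tar.gz",
    2: "ssDNA.tar.gz",
    3: "dsRNA.tar.gz",
    4: "ssRNA_positive_polarity.tar.gz",
    5: "ssRNA_negative_polarity.tar.gz",
    6: "ssRNA_RT.tar.gz",
    7: "dsDNA_RT.tar.gz",
}


def list_archive(baltimore_class: int, download_dir: Path,
                 suffixes: tuple[str, ...] = (".fasta", ".fa")) -> list[str]:
    """List regular files in a class archive whose names end with one of suffixes.

    ASSUMPTION: the suffixes are guesses; adjust them to the actual contents.
    """
    archive = download_dir / CLASS_ARCHIVES[baltimore_class]
    with tarfile.open(archive, mode="r:gz") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.endswith(suffixes)
        )


if __name__ == "__main__":
    # Hypothetical local directory holding the downloaded tarballs.
    for name in list_archive(2, Path("downloads")):
        print(name)
```

Reading the member list with getmembers() avoids extracting the larger archives (e.g. dsDNA.tar.gz, 37.7 MiB) just to see what they contain.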

Scripts and supplementary files

Filename                       File size    Last modified
Supplementary_Material.pdf     1019.6 KiB   2018/08/01 15:55
Supplementary_table_S3.xlsx    48.7 KiB     2018/08/01 15:55
scripts.tar.gz                 204.2 KiB    2018/10/09 12:22

  1. Supplementary_Material.pdf: file containing all supplementary figures and tables cited in the paper.
  2. Supplementary_table_S3.xlsx: list of viruses whose G4 content is significant at the 10% level compared with randomized sequences.
  3. scripts.tar.gz: scripts used to perform the analyses.

Dept. Molecular Medicine, University of Padova

contact: enrico.lavezzo@unipd.it, berselli.michele@gmail.com

