Bio Databases 2020: Viruses and COVID-19

NAR databases since 1996. 2019 values are highlighted in red.

NAR Database List

I always look forward to the Nucleic Acids Research (NAR) database issue. Typically I cover the issue in January, but I got busy, and then a pandemic broke out, so this year I'm later than usual. During the delay, viruses emerged as a good topic. But first, the numbers.

The opening editorial by DJ Rigden and XM Fernández indicates that 65 new databases were added to the NAR compendium and 125 databases were removed in 2019. As stated in the editorial, the archive now lists 1637 databases, showing a slight downward trend. However, if the semi-alphabetical listing of databases is used as the source for counting, then, like last year, there are 1697 databases. The decrease in database growth is due, in part, to the overall trend that the number of new databases submitted to the archive each year has been slowly decreasing since 2004. It is also due to an increase in the number of databases being removed from the archive. Both issues were discussed at the end of last year's blog post.
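The two counts can be cross-checked with a little arithmetic. Here is a minimal sketch that uses only the numbers quoted above; the variable names are mine and nothing here comes from the editorial beyond those four figures.

```python
# Figures quoted from the editorial and from counting the database listing.
added = 65              # new databases added in 2019
removed = 125           # databases removed in 2019
editorial_total = 1637  # total stated in the editorial
listing_total = 1697    # total counted from the semi-alphabetical listing

# Net change implied by the additions and removals.
net_change = added - removed

# Gap between the two ways of counting.
gap = listing_total - editorial_total

print(f"net change: {net_change}")
print(f"gap between counts: {gap}")
```

The gap between the two counts is the same size as the net loss. That would be consistent with the listing not yet reflecting this year's additions and removals, but that is only my guess.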

NAR Virus Databases

As indicated by this year's title, and apropos to COVID-19, I'm going to focus on databases related to viruses. When the text of the NAR list is searched with the terms "virus" or "viral" (using the web browser's "find" command), 31 databases can be identified. Databases named after specific viruses (like HIV) and lacking virus or viral in their title or description are missed, so we can say NAR lists more than 31 virus-related databases. As the database issue was assembled at the end of 2019, none of the databases were named COVID-19. One database named SARS-CoV SS RNA also had coronavirus in its description, so it was found via the virus/viral search. I also added the VIOLIN (Vaccine Investigation and Online Information Network) database, because it was amongst the "V's" and was spotted alongside the viral/virus databases, and vaccines are always an important topic, even more than ever now. The result is a list containing 32 databases. A thumbnail image summary of the list is shown below to the right.
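The keyword search described above can be sketched in a few lines of Python. This is only an illustration of the approach, not how the list was actually built (I used the browser's "find" command). The function name is hypothetical and the sample descriptions are made up for the example rather than copied from the NAR list.

```python
import re

# Case-insensitive substring match, similar to a browser's "find" command.
# "coronavirus" matches because it contains "virus".
VIRUS_TERMS = re.compile(r"virus|viral", re.IGNORECASE)


def find_virus_entries(entries):
    """Return names of entries whose name or description mentions virus/viral."""
    return [
        name
        for name, description in entries
        if VIRUS_TERMS.search(name) or VIRUS_TERMS.search(description)
    ]


# Illustrative (name, description) pairs; the descriptions are assumptions.
sample_entries = [
    ("SARS-CoV SS RNA", "Secondary structures of coronavirus RNA"),
    ("VIOLIN", "Vaccine Investigation and Online Information Network"),
    ("Hypothetical HIV database", "HIV sequence and immunology data"),
    ("ViralZone", "Fact sheets on virus families"),
]

hits = find_virus_entries(sample_entries)
print(hits)
```

With these sample entries the search finds SARS-CoV SS RNA and ViralZone, but misses VIOLIN and the hypothetical HIV entry. That is the same limitation noted above: a database has to say "virus" or "viral" somewhere to be found, so some have to be added by hand.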

As some know, Digital World Biology is working with Shoreline Community College to develop an immuno-bioinformatics class. Hence, it is worthwhile exploring databases related to immunology. Last year we discussed the NAR Immunological Databases Category, and summarized this category in a table, as background for the development of class exercises. As noted in that post, one component of the class will explore vaccine development using epitope prediction tools hosted by IEDB (Immune Epitope Database) and molecular structure visualization. In the preliminary development of this exercise we used examples from Ebola virus to demonstrate how epitopes predicted from protein sequences can be verified and complemented by examining those sequences in 3D molecular visualization software. With the onset of COVID-19, it became a good idea to revamp that exercise to include COVID-19 vaccine development. As a complement, it also became worthwhile to explore virus-related resources. The NAR list is a good place to start.

Click the image to view the table

Twenty-two of the 32 virus databases listed by NAR are still active. Of these 22, four (DPVweb, HCV Database, Vir-Mir db, and VirOligo) are likely at end of life (see the table). Some of the databases are virus- or disease-specific, focusing exclusively on influenza, papilloma virus, hemorrhagic fever, or hepatitis. Others are very general and provide resources for genomics and phylogenetics. Three of the databases (VIRBase, VIRsiRNAdb, and Vir-Mir db) capture information about non-coding (nc) RNA interactions. Non-coding RNA is important in regulating gene activity and likely plays a role in the regulation of viral development cycles and their pathogenicity. One database, VIPERdb, is devoted to icosahedral virus capsid structures, and another, AVPdb, includes experimentally validated anti-viral peptides. VIOLIN (mentioned earlier) is a vaccine database that organizes data for 215 organisms and viruses, 4090 vaccines (2209 licensed), 4044 references, and related pathogen (1649) and host (2482) genes. If one wants to develop a good understanding of virology and related molecular biology, these databases are a great place to visit.

SARS-CoV-2 / COVID-19

Five of the databases listed in NAR (NCBI Viral genomes, ViPR, ViralZone, VirHostNet, and Virus Taxonomy) have specific pages and resources devoted to COVID-19. Each is further described:

  • NCBI Viral genomes has a link that prefilters NCBI's large virus resource for SARS-CoV-2. This page also provides links to a coronavirus BLAST tool, articles in PubMed, and CDC outbreak information.
     
  • ViPR - Virus Pathogen Database and Analysis Resource - is maintained by the J. Craig Venter Institute. ViPR is a general resource that includes sequence, immune epitope, 3D structure, antiviral drug, host factor, and other data and information. The SARS-CoV-2 page contains links to CoV viral subtype genome sequences. From this page several kinds of analyses can be launched. Additional information includes news, bioinformatics analysis reports, epidemiology reports, and publications.
     
  • ViralZone is produced by ExPASy (Expert Protein Analysis System) and the Swiss Institute of Bioinformatics. As a general database it has a host of bioinformatic resources and tools, including the Virosaurus (a portmanteau of virus and thesaurus).
     
  • VirHostNet, a knowledgebase of virus-host molecular interaction networks, captures information about protein-protein interactions, which can be useful in designing drugs and biologics. Indeed, data from VirHostNet was used to develop a bioRxiv preprint entitled "A SARS-CoV-2-Human Protein-Protein Interaction Map Reveals Drug Targets and Potential Drug-Repurposing." Its companion in the NAR list, Virus Mentha, did not have a SARS-CoV-2 specific page, but may also be useful for this purpose.
     
  • Virus Taxonomy is maintained by the people who name viruses. It's not so easy to name a virus. As expected, this site has a panel devoted to SARS-CoV-2, which includes a link to the Nature Microbiology paper describing how SARS-CoV-2 became SARS-CoV-2.

In addition to the NAR resources several others are useful for tracking the COVID-19 pandemic. These include:

 

 

Posted from Discovering Biology in a Digital World by Todd Smith on Mon Apr 13, 2020