Module 3: DNA Databases and Sequence Queries


Major DNA databases

Three major DNA databases, located on different continents, together form the International Nucleotide Sequence Database Collaboration. Between them they hold the published (and some unpublished) sequence data produced by sequencing projects worldwide. The three databases are linked over the Internet and exchange data daily, so a sequence submitted to one becomes available from all three. Most journals, in any field of biological research, require authors of articles that contain sequences to submit their data to one of these databases. The article then cites each sequence by the reference number (accession number) that the database assigns. This makes it simple and globally uniform for any researcher to retrieve the submitted sequence(s). Each of these databases also has corresponding protein data banks related to the stored DNA (gene) sequences.
  • GenBank (USA): This database is accessed through the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine at the National Institutes of Health (NIH). As of August 2000, it held approximately 9.546 billion bases of sequence in 8.214 million sequence records. Once every two months, the NCBI compiles all the available sequence entries and publishes them as a new release.
  • EMBL (Europe): The European repository for DNA sequences is at the European Bioinformatics Institute (EBI) of the European Molecular Biology Laboratory (EMBL) in Hinxton, United Kingdom.
  • DDBJ (Japan): The DNA Data Bank of Japan (DDBJ) is the third major repository for DNA sequence data.
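
To make the role of the accession number concrete, the following is a minimal Python sketch of how a sequence cited in a publication might be looked up by its accession number. It assumes NCBI's E-utilities efetch service and its "nuccore" nucleotide database as the access route, which the text above does not name; the helper names build_fetch_url and parse_fasta are hypothetical and exist only for this illustration. The sketch only constructs the request URL and parses a FASTA-formatted reply, so it runs without a network connection.

```python
from urllib.parse import urlencode

# Assumption: NCBI's E-utilities "efetch" endpoint is used for retrieval.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fetch_url(accession):
    """Return an efetch URL requesting one nucleotide record in FASTA format."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # accession number cited in the publication
        "rettype": "fasta",   # ask for FASTA-formatted output
        "retmode": "text",    # plain text rather than XML
    }
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Split a single FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    # The header is the first line without its leading '>';
    # the sequence is all remaining lines joined together.
    return lines[0][1:], "".join(lines[1:])
```

As a usage example, build_fetch_url("U49845") yields a URL that could be opened with urllib.request.urlopen (U49845 is the accession of GenBank's sample record), and parse_fasta applied to the reply separates the description line from the bases. Because the three databases exchange data daily, the same accession number should identify the same sequence at any of them, although each site has its own retrieval interface.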

