Module 3: DNA Databases and Sequence Queries
Databases of Sequence Sets
Specialized databases catering to different types of DNA data sets have been established. Most of these originally held
human genome data from the Human Genome Project (HGP), but have since been extended to include sequences from other organisms.
- Expressed Sequence Tags (dbEST): Expressed Sequence Tags (ESTs) are DNA
sequences derived from cDNA clones (Adams MD et al., 1991).
A database of ESTs can be utilized in the discovery of new human genes, and identification of coding regions in genomic
sequences. A newly identified open reading frame can be searched against this database to look for potential candidate cDNAs or
- Sequence Tagged Sites (dbSTS): Sequence Tagged Sites (STSs) are short
(about 200-500 bp) sequences that are operationally unique in a genome (i.e., can be specifically detected by PCR in the
presence of all other genomic sequences), and that define a specific position on the physical map (Olson et al., 1989).
An STS database can therefore be used to map a newly determined DNA sequence to a single position within the genome,
provided that it is a unique sequence.
- Single Nucleotide Polymorphisms (dbSNP): The most common sequence variations
in populations with heritable phenotypes are Single Nucleotide Polymorphisms (SNPs), which occur approximately once every 100 to 300
bases. Recently, there has been a great deal of interest in the discovery and detection of SNPs associated with "multigenic"
diseases, such as diabetes and environmental factor-related illnesses. The dbSNP database can be analyzed for polymorphisms linked
to a disease within any given racial population.
- Genome Survey Sequences (dbGSS): Genome Survey Sequences (GSSs) are
similar to ESTs, except that they are genomic in origin rather than derived from cDNA. They are mostly random "single pass read"
genome survey sequences that have not been mapped in the genome.
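
The "operationally unique" criterion given for STSs above can be illustrated with a small in-silico PCR check. The following is only a minimal Python sketch, not the method used by dbSTS or any real tool. It assumes exact primer matches, a genome held in memory as a plain string, and a forward primer that binds only in the orientation shown. The function names (reverse_complement, find_all, sts_amplicons, is_operationally_unique) and the 200-500 bp default size window are hypothetical choices made for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_all(text: str, pattern: str) -> list[int]:
    """Return the start index of every (possibly overlapping) exact match."""
    hits = []
    i = text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def sts_amplicons(genome: str, fwd_primer: str, rev_primer: str,
                  min_len: int = 200, max_len: int = 500) -> list[tuple[int, int]]:
    """List (start, end) spans that the primer pair would amplify.

    Simplifying assumptions: exact matches only, and only the orientation
    in which fwd_primer matches the given strand is considered.
    """
    genome = genome.upper()
    fwd = fwd_primer.upper()
    # The reverse primer anneals to the opposite strand, so on the given
    # strand its binding site appears as its reverse complement.
    rev_site = reverse_complement(rev_primer)
    amplicons = []
    for start in find_all(genome, fwd):
        for rev_start in find_all(genome, rev_site):
            end = rev_start + len(rev_site)
            if min_len <= end - start <= max_len:
                amplicons.append((start, end))
    return amplicons


def is_operationally_unique(genome: str, fwd_primer: str, rev_primer: str) -> bool:
    """True if the primer pair defines exactly one amplicon in the genome."""
    return len(sts_amplicons(genome, fwd_primer, rev_primer)) == 1
```

A primer pair for which is_operationally_unique returns True would mark a single position in the toy genome, which is the property that lets an STS serve as a landmark on the physical map. Real primer searches also tolerate mismatches and consider both strand orientations, which this sketch deliberately omits.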