Module 3: DNA Databases and Sequence Queries

 

Sequence Search and Retrieval Methods



When a researcher sequences a new cDNA or genomic clone isolated in their field of research (tissue-, organ-, disease- or species-specific), the obvious next step is to search the databases for any sequences that are the same as, or similar to, the new sequence, and to retrieve them. A variety of sequence search services handle this well, and most of them let the user retrieve other data besides DNA sequences.
  • Entrez: This service from NCBI is a very powerful tool for finding any nucleotide or protein sequence in GenBank. It can also find the Medline research articles associated with a sequence. Entrez links its component databases through the following relationships:
    • Sequence relationships are computed with BLAST (see Module 4 for discussion). Entrez does not accept a sequence, in any format, as a search term, so one cannot search it directly for similarities between a test sequence and the database sequences. Instead, run a BLAST search first; the sequences that appear highly similar to the test sequence can then be retrieved through Entrez.
    • Relationships between DNA and protein sequences rely on accession numbers.
    • Shared keywords and accession numbers can be used to move back and forth between sequences and the related Medline articles and abstracts (approximately 11 million of the publications in the Medline database).
    • Related articles can be found through shared keywords called "MeSH" (Medical Subject Headings) terms.
  • Sequence Retrieval System (SRS): This system is a service created at the EBI of EMBL. It is a global sequence retrieval system that, besides retrieving DNA and protein sequences, includes other applications, such as searches for mutations and transcription factor binding sites.
  • WWW search launcher: Baylor College of Medicine has a global portal for searching any type of database (similar to the SRS). The nucleotide search site on this server allows the user to enter DNA sequences (up to 7,000 bases) for similarity searches using BLAST. There is also an application that converts sequences in any format to the FASTA format (see Module 4 for more details) for similarity searches.
  • E-mail servers (non-interactive): Each of the three major databases has some form of e-mail service through which one can retrieve sequences. In these cases, the request to the database should contain a keyword (e.g., MeSH terms, gene name) and/or a unique identifier (e.g., accession number).
    • Query Server from NCBI can be searched by sending e-mail to [email protected] with the following two-line message:
      DB [domain]
      UID [Unique identifier or text term]

      The domain can be "n" for a nucleotide sequence or "p" for a protein sequence. The unique identifier (UID) can be an accession number. Optional search parameters and formatting specifications are placed on the lines that follow. For a fuller explanation of the search terms and other help with this service, request the help documentation by sending e-mail to the above address with the word "HELP" in the body of the message.
    • EMBL sequences can be obtained by sending e-mail to [email protected] with no subject heading and the following one-line message:
      get nuc:[accession number]

      This message retrieves the annotated DNA sequence with the requested accession number.
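
The two e-mail request formats described above can be composed programmatically. The following Python sketch only builds the messages; it does not send them. The server addresses are placeholders (the real addresses are not reproduced here), the accession number in the usage note is made up, and the function names are illustrative rather than part of any NCBI or EMBL tool.

```python
from email.message import EmailMessage

# Placeholder addresses (assumptions): substitute the real server addresses.
NCBI_QUERY_ADDRESS = "query-server@example.org"
EMBL_SERVER_ADDRESS = "embl-server@example.org"


def ncbi_query_body(domain: str, uid: str) -> str:
    """Return the two-line Query Server request: a DB line and a UID line."""
    if domain not in ("n", "p"):
        raise ValueError("domain must be 'n' (nucleotide) or 'p' (protein)")
    return f"DB {domain}\nUID {uid}\n"


def embl_get_body(accession: str) -> str:
    """Return the one-line EMBL request for an annotated DNA sequence."""
    return f"get nuc:{accession}\n"


def build_request(to_address: str, body: str) -> EmailMessage:
    """Wrap a request body in an e-mail message with no subject heading."""
    msg = EmailMessage()
    msg["To"] = to_address
    # Deliberately no Subject header: the EMBL request is sent without one.
    msg.set_content(body)
    return msg
```

For example, `build_request(EMBL_SERVER_ADDRESS, embl_get_body("X00000"))` yields a message whose body is the single line `get nuc:X00000`, where `X00000` stands in for a real accession number.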

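The WWW search launcher entry above mentions converting a sequence to the FASTA format before a similarity search. As a rough illustration of what such a conversion produces, the Python sketch below turns a raw DNA listing into a FASTA record: a header line starting with ">" followed by the sequence in fixed-width lines. The function name, the 60-column line width and the cleanup rules are assumptions made for this example, not the behavior of the Baylor converter; the 7,000-base limit is the one quoted above.

```python
MAX_BASES = 7000  # size limit quoted above for the nucleotide search form


def to_fasta(name: str, raw_sequence: str, width: int = 60) -> str:
    """Format a raw sequence listing as a single FASTA record."""
    # Raw listings often carry spaces, line breaks and position numbers;
    # keep only the letters and normalize them to upper case.
    seq = "".join(ch for ch in raw_sequence if ch.isalpha()).upper()
    if not seq:
        raise ValueError("no sequence letters found")
    if len(seq) > MAX_BASES:
        raise ValueError(f"sequence has {len(seq)} bases; limit is {MAX_BASES}")
    # Split the cleaned sequence into lines of at most `width` characters.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"
```

Checking the length before submission avoids pasting a sequence that the search form would reject.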



The Southwest Biotechnology and Informatics Center WWW server is located at "http://www.swbic.org/".
Please send comments and suggestions to: [email protected]
© SWBIC 2001