SeqEST Help

Run Program

Contents

Input Sequences
The Form
Information about each field
Download Page and Warnings

Overview

SeqEST is a utility that helps prepare an Expressed Sequence Tag (EST) submission for the NCBI dbEST database. Full instructions for submitting to dbEST are found at the NCBI dbEST information page. In SeqEST, you supply the EST sequences in FASTA format and fill in a form with the information to be supplied with each sequence. SeqEST then prepares a file, which you may download and edit further before submitting to NCBI. The values supplied in the form are placed in fields in each sequence header. Some fields hold information that is the same for all sequences; SeqEST fills these in. Other fields hold information that is specific to each sequence; SeqEST creates these fields, but you will need to edit the output file to add the specific information. You may choose which fields SeqEST outputs in each sequence header.

Note: The NCBI dbEST submission page allows submission of several file “TYPES.” The only “TYPE” of file output by SeqEST is EST.

Terms:

  • Field: a specific type of information to be supplied with an EST sequence; it consists of a tag and descriptive information
  • Tag: a short, capitalized word that defines the type of field (e.g., STATUS and CITATION)
  • Sequence header: all tags and their descriptions associated with an EST sequence. Following the header is the tag SEQUENCE and the EST sequence.
  • Output file: the downloadable file created by the SeqEST program; you will want to edit this file further to include information specific to each sequence.

Input Sequences

The EST sequences are input from a file on your computer. This must be a text file with the sequences in FASTA format: each sequence has a definition line starting with the “>” character, followed by one or more sequence lines.
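To check an input file before uploading it, the FASTA rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SeqEST's own parser: the function name `parse_fasta` is hypothetical, blank lines are ignored, and a definition line with no sequence lines is treated as an error.

```python
def parse_fasta(text):
    """Return a list of (definition, sequence) pairs from FASTA text.

    Raises ValueError if the text contains no sequences, if sequence
    data appears before the first ">" line, or if a definition line
    is not followed by at least one sequence line.
    """
    records = []
    name = None      # definition line of the record being read
    chunks = []      # sequence lines collected for that record

    def flush():
        if not chunks:
            raise ValueError(f"no sequence lines for {name!r}")
        records.append((name, "".join(chunks)))

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # assumption: blank lines are ignored
        if line.startswith(">"):
            if name is not None:
                flush()                   # close the previous record
            name = line[1:].strip()
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence data before the first '>' line")
            chunks.append(line)

    if name is None:
        raise ValueError("no sequences found")
    flush()                               # close the last record
    return records
```

For example, `parse_fasta(">est1\nACGT\nTTGA\n")` yields one record whose sequence is the two sequence lines joined together.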

The Form

The EST information form is separated into two sections: obligatory and non-obligatory. The first section consists of information that must be filled in for an output file to be created. The information in the second section (non-obligatory fields) is printed in the output file only if the associated check box is checked. If a box is checked but no information is entered in the text box for that field, the tag is still output, so that you may edit the file on your computer to add the information specific to each sequence.

Also note the following:

  • Information in the form text fields is included in every sequence header in the output file.
  • Information in the form text fields is not checked for errors; it is copied directly into the output file headers.
  • The tag “EST#” is not in the form, but it is printed in the output file. This is because the EST number is unique to every sequence and must be entered separately.
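The rules above can be summarized as a small Python sketch of how one sequence header might be assembled. It is an approximation under stated assumptions: the function names `build_header` and `build_record` are hypothetical, the `TAG: value` line layout and the position of the EST# tag are guesses, and the authoritative layout is the one given in the NCBI dbEST instructions.

```python
def build_header(obligatory, optional, checked):
    """Return the header lines shared by every sequence.

    obligatory: dict of tag -> text for the obligatory fields
    optional:   dict of tag -> text typed into the non-obligatory fields
    checked:    list of non-obligatory tags whose check box is ticked
    """
    # Obligatory values are copied as typed (no error checking).
    lines = [f"{tag}: {value}".rstrip() for tag, value in obligatory.items()]

    # EST# is not on the form; it is emitted empty for later editing.
    # Its position in the header is an assumption of this sketch.
    lines.append("EST#:")

    # A checked field with no text still gets its tag, left empty.
    for tag in checked:
        lines.append(f"{tag}: {optional.get(tag, '')}".rstrip())
    return lines


def build_record(header_lines, sequence):
    """Join a header with the SEQUENCE tag and the EST sequence."""
    return "\n".join(header_lines + ["SEQUENCE:", sequence])
```

Unchecked fields never reach the output, while checked but empty fields appear as a bare tag that you fill in per sequence afterwards.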

Information about each field

Fields are presented in the following order in the form.

Obligatory Fields (tag: suggested information; example or size of field)

  • STATUS: New entry or update of an existing EST entry. Choose an option: New or Update.
  • CONT_NAME: Name of contact. Size: 40 characters.
  • CITATION: Journal citation. Size: 400 characters.
  • LIBRARY: Library name. Size: 40 characters.
  • PUBLIC: Date of public release. See below for more information.

Non-Obligatory Fields (tag: suggested information; example or size of field)

  • GB#: GenBank accession number. Size: 40 characters.
  • GB_SEC: Secondary GenBank accessions. Size: 40 characters.
  • GDB#: Genome database accession number. Size: 40 characters.
  • GDB_DSEG: Genome database Dsegment number. Size: 40 characters.
  • CLONE: Clone id. Size: 40 characters.
  • SOURCE: Source providing clone. Size: 40 characters.
  • SOURCE_DNA: Source id number for the clone as a pure DNA. Size: 40 characters.
  • SOURCE_INHOST: Source id number for the clone stored in the host. Size: 40 characters.
  • OTHER_EST: Other ESTs on this clone. Size: 40 characters.
  • DBNAME: Database name for cross-reference to another database. Size: 40 characters.
  • DBXREF: Database cross-reference accession. Size: 40 characters.
  • PCR_F: Forward PCR primer sequence. Size: 40 characters.
  • PCR_B: Backward PCR primer sequence. Size: 40 characters.
  • INSERT: Insert length (in bases). Size: 40 characters.
  • ERROR: Estimated error in insert length (bases). Size: 40 characters.
  • PLATE: Plate number or code. Size: 40 characters.
  • ROW: Row number or letter. Size: 40 characters.
  • COLUMN: Column number or letter. Size: 40 characters.
  • SEQ_PRIMER: Sequencing primer description or sequence. Size: 40 characters.
  • P_END: Which end sequenced. Size: 40 characters.
  • HIQUAL_START: Base position of start of highest quality sequence (default = 1). Size: 40 characters.
  • HIQUAL_STOP: Base position of last base of highest quality sequence. Size: 40 characters.
  • DNA_TYPE: Choose an option: cDNA (default), Genomic, Viral, Synthetic, Other.
  • PUT_ID: Putative identification of sequence by submitter. Size: 40 characters.
  • TAG_LIB: Name of library whose tag is found in this sequence. Size: 40 characters.
  • TAG_TISSUE: Tissue that was source for tagged library, if a library tag was found. Size: 40 characters.
  • TAG_SEQ: The actual sequence of the library tag found in the EST read. If the tag was searched for and not found, “None found” is entered. Size: 40 characters; “None found” is the default.
  • POLYA: Whether a polyA tail was found in the EST sequence. Choose an option: Yes or No.
  • COMMENT: Comments about EST. Size: 500 characters.
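
Because the size limits listed above are easy to overshoot when pasting text such as a citation, you may want to check your values before entering them. The following Python sketch shows one way to do that; the names `SIZE_LIMITS` and `oversized_fields` are hypothetical, only a few limits are copied in, and nothing here reflects how SeqEST itself treats over-long values.

```python
# Size limits (in characters) copied from the field list above.
# Only a few tags are shown; extend the table as needed.
SIZE_LIMITS = {
    "CONT_NAME": 40,
    "CITATION": 400,
    "LIBRARY": 40,
    "COMMENT": 500,
}


def oversized_fields(values, limits=SIZE_LIMITS):
    """Return (tag, length, limit) for every value longer than its limit.

    Tags that have no entry in `limits` are skipped rather than reported.
    """
    return [
        (tag, len(text), limits[tag])
        for tag, text in values.items()
        if tag in limits and len(text) > limits[tag]
    ]
```

An empty result means every listed value fits within its limit.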

Information for the PUBLIC field. There are two options:

  • Enter an exact date of release. The first field is the day, second is the month and the third is the year. Use the pull-down menu to choose day, month and year.
  • “Immediate” option. This indicates that the information is for immediate release. Choosing this option will leave the date blank.

Download Page and Warnings

This page contains a link to the dbEST submission file, which you may view or download using your browser's “save as” function. If the lines run together in the downloaded file, load the file into a word processing program (which will display it correctly) and save it as text.
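
Lines that run together are usually a sign that the file's line endings do not match what your editor expects. Assuming that is the cause here, the file can also be repaired with a short Python sketch instead of a word processor. The function name `normalize_newlines` and the file name in the usage note are hypothetical.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (CR LF) and old Mac (CR) line endings to LF."""
    # Replace CR LF first so the lone-CR pass does not double the breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

To apply it, read the downloaded file in binary mode, pass its contents through the function, and write the result back, for example with `open("submission.txt", "rb")` and `open("submission.txt", "wb")`.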

If the input sequence file is not in valid FASTA format, the output file will not be created and a warning will be given.

If invalid information is entered in the form, the output file will be created but a warning will be shown with a message about the invalid information.