Module 6: Structure-Function Relationships


Exercise 6


We will perform two different exercises to illustrate structure-function relationships.

1. Analyze a DNA sequence for open reading frames (ORFs), then predict the secondary structure of the resulting polypeptide.

  • Get an unfinished human DNA sequence (a "working draft" that may contain sequencing errors).
    • Go to the Sanger Centre HGP web site to get a piece of unfinished sequence. This link will take you to a search engine for sequences in human chromosomes. First, pick chromosome 9 from the drop-down box at the top (under the "Seq Search" tab), and then select chromosome 9 from the last drop-down box (in the "Sequencing Information" section).
    • You should arrive at an FTP site with links to a list of unfinished chromosome 9 "contig" sequences. Pick bA4E23 as the sequence for this exercise. This will show you four contigs of varying sizes (~2.5 to 3.2 kb). Notice that the four sequences are in FASTA format.
    • Copy and paste all of these sequences into a new file using a plain-text editor such as Notepad.
  • Check for ORFs in the sequence of interest using ORF Finder.
    • Link to the simple ORF Finder tool at NCBI.
    • Copy and paste any one of the contigs from chromosome 9 in FASTA format and click "OrfFind".
    • You will get all potential ORFs of at least 100 bp in size (i.e., 33 amino acids) in all six reading frames. ORF Finder is a relatively simple-minded algorithm: it looks for a start codon in each of the six reading frames and reports an ORF if the distance to the next in-frame stop codon is more than 100 bp.
    • Save the results in a file for later comparisons.
    • Some issues to be considered before you proceed:
      1. How will sequencing errors (substitutions, deletions, frameshifts, etc.) in this working draft alter the results from ORF Finder?
      2. Do short ORFs have any significance within a gene? (Remember that this is genomic DNA and hence will have exons and introns within the gene sequence.)
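
The six-frame scan described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the NCBI tool: the function names (`read_fasta`, `find_orfs`) are made up for this exercise, only ATG is treated as a start codon, and the 100 bp minimum is counted from the start codon through the stop codon, which may differ from how ORF Finder measures length.

```python
from typing import Dict, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: alternative start codons are ignored
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: Dict[str, List[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 100) -> List[Tuple[str, int, int, str]]:
    """Return (frame, start, end, orf) for every ATG...stop stretch >= min_len bp.

    Frames are labelled +1..+3 and -1..-3. Start and end are 0-based offsets
    on the strand that was scanned (the reverse complement for minus frames).
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for offset in range(3):
            start = None
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # first start codon opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon in the ORF
                    if end - start >= min_len:
                        orfs.append((f"{strand}{offset + 1}", start, end, s[start:end]))
                    start = None
    return orfs


if __name__ == "__main__":
    # Synthetic test sequence (not real genomic data): one 126 bp ORF.
    seq = "CC" + "ATG" + "GCT" * 40 + "TAA" + "GG"
    print(len(find_orfs(seq)), "ORF(s) in the intact sequence")
    # Simulate a sequencing error: delete a single base inside the ORF.
    damaged = seq[:60] + seq[61:]
    print(len(find_orfs(damaged)), "ORF(s) after a one-base deletion")
```

Running the script on the synthetic sequence shows why question 1 matters: a single deleted base shifts the reading frame, so the stop codon is no longer read in frame and the ORF is not reported.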

  • Use more advanced searches for ORF or exon sequences.
    • Link to the Baylor College of Medicine comprehensive gene searcher site (BCM Search Launcher).
    • Choose the GRAIL-1.3 option. You may want to check the help [H] option for data input and output formats and analysis of results.
    • Copy and paste a sequence from human chromosome 9 (as mentioned above) or this sequence. Be aware that the latter sequence is from the intron region of a gene (in this case, murine p53 intron 1) and, by definition, introns do not contain coding sequences.
    • Click on "Perform Search".
    • Notice that the number of potential exons (ORFs) may be zero or quite small (1 or 2) compared to the ORF Finder results.
    • Answer the following questions regarding this search:
      1. Why is the number of ORFs lower in this case? (Hint: Check the help section of GRAIL for answers.)
      2. What happens when you perform the search using the BCM Gene Finder feature (FGENESH), which predicts human gene structure using a hidden Markov model (HMM)? Do you find the same exon(s) as with GRAIL or different one(s)? Why or why not?
  • Find the best possible ORF and translate it into polypeptide.
    • Pick the best possible ORF using the following criteria.
    • Translate the ORF into its corresponding polypeptide at the University of Minnesota DNA sequence translation server. Choose the query type to be "Raw" and copy the ORF sequence into the box for "raw sequence".
  • Perform secondary structure prediction on the translated polypeptide, using the following tools.
    • Protein Hydrophilicity/Hydrophobicity Server to detect hydrophobic regions in the polypeptide.
    • Statistical Analysis of Protein Sequences (SAPS) for membrane-spanning, coiled-coil, helix-turn-helix, or signal peptide domains.
    • Secondary Structural Content Prediction (SSCP) to determine if there is a predominant secondary structure (alpha helix or beta sheet) present.
    • Check for motifs, domains and patterns in the polypeptide.
      • TMpred for transmembrane domains.
      • BLOCKS for blocks of highly conserved regions of proteins.
      • Pfam for families of related proteins.
    • Answer the following questions regarding these searches.
      1. Do you identify any consensus secondary structure features of this ORF from "unfinished" human sequences?
      2. Do you find any conflicting results from the different algorithms? For example, do the hydrophobic regions from Kyte-Doolittle plots generally "match up" with the transmembrane domains?
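
To see what the translation and hydropathy steps above compute, here is a minimal Python sketch. It is not the code used by any of the servers listed. It assumes the standard genetic code and the published Kyte-Doolittle hydropathy values; the function names and the default window of 9 residues are choices made for this illustration. A window of about 19 residues with an average above roughly 1.6 is a commonly cited rule of thumb for candidate membrane-spanning segments, so treat any threshold as a guide rather than a definitive call.

```python
from typing import List

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(orf: str) -> str:
    """Translate a raw DNA ORF in frame 1, stopping at the first stop codon.

    Codons containing characters other than A, C, G, T become "X".
    """
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        residue = CODON_TABLE.get(orf[i:i + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def hydropathy_profile(protein: str, window: int = 9) -> List[float]:
    """Return the mean Kyte-Doolittle score of each full sliding window.

    Unknown residues (such as "X") are scored as 0.0, an assumption made here.
    """
    scores = [KYTE_DOOLITTLE.get(residue, 0.0) for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Synthetic ORF for illustration only: Met, nine Leu, then a stop codon.
    protein = translate("ATG" + "CTG" * 9 + "TAA")
    print(protein)
    print([round(value, 2) for value in hydropathy_profile(protein)])
```

Comparing the positions of the highest windows with the TMpred output is one way to approach question 2.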

2. View a T cell receptor (TCR) molecule with Protein Explorer (an advanced version of RasMol and Chime combined).

  • Go to the RasMol server at the University of Massachusetts and connect to the Protein Explorer site. This is the latest RasMol derivative, with features of Chime incorporated into it.
  • Enter the PDB (Protein Data Bank) identification code "1NFD" (without quotes) for the alpha-beta TCR molecule.
  • Resize your window to the 1,024 x 768 pixel size and start Explorer.
  • Once you see the rotating molecule, try different options in the panels on the left side. Rotate the molecule by dragging on it with the mouse.
  • Try the advanced features of this software by clicking on "Explore More!". Also, check out the "Form for Recording Observations" and answer the questions in that form.
  • Repeat the above steps for a molecule of your interest. The only requirement is that the molecule should have 3-dimensional atomic coordinates stored in PDB.
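
For a sense of what "3-dimensional atomic coordinates" means in practice, the following Python sketch reads the fixed-column ATOM records of a legacy PDB-format file, which is the kind of data a viewer such as Protein Explorer displays. It is a simplified illustration: the helper names are invented here, only ATOM records are read, and the demo coordinates are placeholder values, not taken from 1NFD. To try it on a real entry, you would pass the text of a downloaded PDB file to `parse_atoms`.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA" for an alpha carbon
    res_name: str  # three-letter residue name
    chain: str     # one-letter chain identifier
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float


def format_atom(serial, name, res_name, chain, res_seq, x, y, z) -> str:
    """Build a simplified ATOM line using the legacy PDB column layout."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Extract ATOM records; fields are read by column position, not by splitting."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6] == "ATOM  " and len(line) >= 54:
            atoms.append(Atom(
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms


def residues_per_chain(atoms: List[Atom]) -> Counter:
    """Count alpha carbons per chain as a rough residue count."""
    return Counter(atom.chain for atom in atoms if atom.name == "CA")


def centroid(atoms: List[Atom]) -> Tuple[float, float, float]:
    """Return the unweighted geometric center of the given atoms."""
    n = len(atoms)
    return (sum(a.x for a in atoms) / n,
            sum(a.y for a in atoms) / n,
            sum(a.z for a in atoms) / n)


if __name__ == "__main__":
    # Placeholder coordinates for illustration only (not from a real structure).
    demo = "\n".join([
        format_atom(1, "CA", "ALA", "A", 1, 0.0, 0.0, 0.0),
        format_atom(2, "CA", "GLY", "A", 2, 3.8, 0.0, 0.0),
        format_atom(3, "CA", "SER", "B", 1, 0.0, 3.8, 0.0),
    ])
    atoms = parse_atoms(demo)
    print(dict(residues_per_chain(atoms)))
    print(centroid(atoms))
```

Counting residues per chain is a quick check against what you see on screen: the alpha and beta chains of a TCR should appear as separate chain identifiers in the file.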



The Southwest Biotechnology and Informatics Center WWW server is located at "".
SWBIC 2001