Module 2: Storage, Retrieval and Manipulation of DNA Information (General Themes)

 

Exercise 2

How to manipulate DNA sequences for various applications? (Elaborated in subsequent lectures) module 2 contents back to the index of modules


  • This exercise will familiarize the student with the concepts of a "working draft" and a "finished sequence" of a genome being sequenced.

    Calculate how long it will take to sequence the genome of an imaginary organism, Bioinformatica coursicum.
    1. B. coursicum has 12 chromosomes with a genome size of 2.56 x 10^9 (2.56 billion) base pairs.
    2. Let us assume the current rate of sequencing averages 1,000 bases of raw data per second.
    3. You should know about the concept of "depth of coverage" of any sequenced data. Depth of coverage refers to the number of times any given region of a chromosome has been sequenced. Typically, a working draft sequence has a 4-5X depth of coverage; i.e., that sequence has been sequenced at least 4 to 5 times to reduce errors in the data. In general, a finished sequence has a depth of 9-10X. Sometimes, these are referred to as 4-5 "pass" or 9-10 "pass" sequences, respectively.
    4. In this case, assume the working draft and the finished sequence will have depths of 5X and 10X respectively.
    5. Assuming that the sequence data is generated 24 hours a day, 7 days a week (which is exactly what happens at the genome centers around the world), how long will it take to get a working draft of the B. coursicum genome if the sequencing starts now?
    6. How long will it take to get a finished sequence of the entire genome?
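
The arithmetic behind steps 5 and 6 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name sequencing_time_days and the constant names are hypothetical, and it assumes, as the exercise does, a constant rate of raw data with no downtime and no time spent on assembly or gap closing.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # sequencing runs around the clock

def sequencing_time_days(genome_size_bp, depth, bases_per_second):
    """Days of continuous sequencing needed to reach a given depth of coverage.

    Assumes a constant raw-data rate (a simplification made by the exercise).
    """
    total_raw_bases = genome_size_bp * depth      # every base is read `depth` times
    seconds = total_raw_bases / bases_per_second
    return seconds / SECONDS_PER_DAY

# Values taken from the exercise text
GENOME_SIZE_BP = 2_560_000_000   # 2.56 x 10^9 base pairs
RATE_BASES_PER_SECOND = 1_000    # raw bases per second

draft_days = sequencing_time_days(GENOME_SIZE_BP, 5, RATE_BASES_PER_SECOND)
finished_days = sequencing_time_days(GENOME_SIZE_BP, 10, RATE_BASES_PER_SECOND)

print(f"Working draft (5X):      {draft_days:.1f} days")
print(f"Finished sequence (10X): {finished_days:.1f} days")
```

Because the time scales linearly with depth, the 10X finished sequence takes exactly twice as long as the 5X working draft under these assumptions.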

  • Go to the HGP weekly report site.
    1. Notice that the amount of data sequenced and stored is more than the equivalent of the whole genome (the length of the human genome is 3 x 10^9 base pairs).
    2. The finished, non-redundant sequence published is 23.6% of the genome (as of August 20, 2000). The projected date of completion of the HGP is the year 2003. Why do you think it will take this long before the complete sequence is available?
    3. What are the factors that could speed up or slow down the completion of the HGP?

  • Link to The Institute for Genomic Research (TIGR) Databases main page and find out the approximate size of the completely sequenced genomes of the following three infectious, microbial organisms.
    1. Haemophilus influenzae
    2. Mycobacterium tuberculosis
    3. Vibrio cholerae




The Southwest Biotechnology and Informatics Center WWW server is located at "http://www.swbic.org/".
Please send comments and suggestions to: [email protected]
SWBIC 2001