Tutorial in Uses of Web Tools in Molecular Biology


Tutorial 1 - How to retrieve a gene sequence or a family of gene sequences.
Tutorial 2 - How to construct a family of aligned protein sequences.
Tutorial 3 - How to display the 3-dimensional structure of a protein.


Tutorial 1 - How to retrieve a gene sequence or a family of gene sequences.

First, get yourself a gene. Go to NCBI and search GenBank by keyword. For example, to find the gene for the cytoskeletal protein actin, enter the keyword actin. To find the actin genes in potato, enter the keywords actin and potato. The search "actin and potato" retrieved 16 documents from GenBank (version 100.0, 4/13/99).
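The same keyword search could also be expressed as a URL. The sketch below assumes NCBI's E-utilities ESearch endpoint rather than the web form used in this tutorial; the function name `build_search_url` and the default database and result limit are illustrative choices, not part of the original instructions. It only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Address of the NCBI E-utilities search endpoint. Using it here is an
# assumption of this sketch; the tutorial itself uses the GenBank web form.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(keywords, database="nucleotide", max_results=20):
    """Return an ESearch URL that requires every keyword to match."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    # Joining with AND mirrors the "actin and potato" search in the text.
    term = " AND ".join(keywords)
    params = {"db": database, "term": term, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url(["actin", "potato"]))
```

Opening the printed URL in a browser should return a list of matching record identifiers; the number of hits today will differ from the 16 documents reported above, since GenBank has grown.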

Among these 16 genes are examples of genomic clones, e.g., STPOAC101 [GenBank accession # X55752], and cDNA clones, e.g., STU60483 [GenBank accession # U60483]. Inspecting the GenBank reports shows whether the sequence contains a full-length gene, where the introns and exons are located, and what the predicted translation product is. Each report also describes the source of the sequenced DNA (e.g., the organism, potato, and the library, a phage library of leaf transcripts).


Tutorial 2 - How to construct a family of aligned protein sequences.

You can either build the family yourself (a good project for students) or look at examples of aligned protein families at sites like Pfam.

The example in this tutorial is actin. Because we want to look at plants in particular, we used the keywords actin and viridiplantae. This retrieved 152 documents from the 4/17/97 version of GenBank. We then narrowed the search to actin and solanaceae, which retrieved 30 documents.

To align protein sequences, you must first create a file of properly formatted amino acid sequences. You can do this by cutting and pasting the predicted protein translations from the GenBank reports. The file is then submitted to ClustalW.
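As an alternative to cutting and pasting by hand, the translation can be pulled out of a saved GenBank report with a short script. This is a minimal sketch that assumes the report is in GenBank flat-file format, where the predicted protein appears in a `/translation="..."` qualifier that may wrap across lines. The `SAMPLE_REPORT` fragment is a hypothetical excerpt written for illustration (it reuses the first residues of the tutorial's example sequence), not a real record.

```python
import re


def extract_translations(genbank_text):
    """Return the protein sequences found in /translation qualifiers."""
    proteins = []
    for match in re.finditer(r'/translation="([^"]*)"', genbank_text):
        # The qualifier value is wrapped and indented; drop all whitespace.
        proteins.append(re.sub(r"\s+", "", match.group(1)))
    return proteins


# Hypothetical fragment of a feature table, for illustration only.
SAMPLE_REPORT = '''
     CDS             <1..>132
                     /product="hypothetical example"
                     /translation="AGFAGDDAPRAVFPSIVGRPRHTGVMVGMGQKD
                     AYVGDEAQSKR"
'''

if __name__ == "__main__":
    for protein in extract_translations(SAMPLE_REPORT):
        print(protein)
```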

An example of a properly formatted set of protein sequences is shown below.

>STPOAC101 potato actin A
AGFAGDDAPRAVFPSIVGRPRHTGVMVGMGQKDAYVGDEAQSKR
GILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVSPDEHPVLLTEAPLNPKANREKMT
QIMFETFNVPAMYVAIQAVLSLYASGRTTGIVLDSGDGVSHTVPIYEGYALPHAILRL
DLAGRDLTDCLMKILTERGYSFTTSAEREIVRDMKEKLAYVALDYEQELETAKSSSAV
EKSYELPDGQVITIGAERFRCPEVLFQPSLVGMEAAGIHETTYNSIMKCDVDIRKDLY
GNIVLSGGTTMFPGIADRMSKEITALAPSSMKIKVVAPPERKYSVWIGGSILASLSTF
QQMWITKGEYDESGPSIVHRKCF

>STU60483 potato actin B
AGFAGDDAPRAVFPSIVGRPRHTGVMVGMGQKDAYVGDEAQSKR
GILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT
QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVSHTVPIYEGYALPHAILRL
DLAGRDLTDHLMKILTERGYSFTTTAEREIVRDVKEKLSYIALDYEQELDTSKTSSSV
EKSYELPDGQVITIGAERFRCPEVLFQPSMIGMEAAGIHETTYNSIMKCDVDIRKDLY
GNIVLSGGTTMFNGIADRMSKEITALAPSSMKIKVVAPPERKYSVWIGGSILASLSTF
QQ
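When the family grows beyond a few sequences, assembling the file by hand becomes error-prone. The sketch below shows one way to produce the format above from a list of names and sequences. The function `to_fasta`, its tuple layout, and the 60-character line width are assumptions made for illustration; stray spaces and line breaks picked up while pasting are removed before wrapping.

```python
import textwrap


def to_fasta(records, width=60):
    """Format (name, description, sequence) tuples as FASTA text."""
    entries = []
    for name, description, sequence in records:
        # Remove spaces and line breaks left over from cut and paste.
        cleaned = "".join(sequence.split()).upper()
        if not cleaned.isalpha():
            raise ValueError(f"{name}: sequence must contain only letters")
        header = f">{name} {description}".rstrip()
        # Long unbroken strings are split at exactly `width` characters.
        entries.append("\n".join([header] + textwrap.wrap(cleaned, width)))
    # Separate entries with a blank line, as in the example above.
    return "\n\n".join(entries) + "\n"


if __name__ == "__main__":
    toy_records = [
        ("seqA", "toy example A", "AGFAGDDAPR AVFPSIVGRP"),
        ("seqB", "toy example B", "AGFAGDDAPR AVFPSIVGRA"),
    ]
    print(to_fasta(toy_records, width=10))
```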

This text can be pasted into the input window on the ClustalW web page. The alignment results are returned in the browser, usually within minutes. The alignment of the two potato actin protein sequences is shown below. This example uses just two protein sequences; to align a larger set, simply assemble a longer list in the format shown above. The Clustal alignment results can also be used as the input file for generating phylogenies, cladograms, and phenograms. To do this, however, you need to download Clustal and run it locally on your own PC or Mac.

ClustalW Multiple Sequence Alignment Results

Page 1.1

                1            15 16           30 31           45 46           60
    1 STPOAC101 MADAEDIEPLVCDNG TGMVKAGFAGDDAPR AVFPSIVGRPRHTGV MVGMGQKDAYVGDEA
    2 STU60483  --------------- -----AGFAGDDAPR AVFPSIVGRPRHTGV MVGMGQKDAYVGDEA

                61           75 76           90
    1 STPOAC101 QSKRGILTLKYPIEH GIVSNWDDMEKIWHH     90
    2 STU60483  QSKRGILTLKYPIEH GIVSNWDDMEKIWHH     70

Page 2.1

                91          105 106         120 121         135 136         150
    1 STPOAC101 TFYNELRVSPDEHPV LLTEAPLNPKANREK MTQIMFETFNVPAMY VAIQAVLSLYASGRT
    2 STU60483  TFYNELRVAPEEHPV LLTEAPLNPKANREK MTQIMFETFNTPAMY VAIQAVLSLYASGRT

                151         165 166         180
    1 STPOAC101 TGIVLDSGDGVSHTV PIYEGYALPHAILRL    180
    2 STU60483  TGIVMDSGDGVSHTV PIYEGYALPHAILRL    160

Page 3.1

                181         195 196         210 211         225 226         240
    1 STPOAC101 DLAGRDLTDCLMKIL TERGYSFTTSAEREI VRDMKEKLAYVALDY EQELETAKSSSAVEK
    2 STU60483  DLAGRDLTDHLMKIL TERGYSFTTTAEREI VRDVKEKLSYIALDY EQELDTSKTSSSVEK

                241         255 256         270
    1 STPOAC101 SYELPDGQVITIGAE RFRCPEVLFQPSLVG    270
    2 STU60483  SYELPDGQVITIGAE RFRCPEVLFQPSMIG    250

Page 4.1

                271         285 286         300 301         315 316         330
    1 STPOAC101 MEAAGIHETTYNSIM KCDVDIRKDLYGNIV LSGGTTMFPGIADRM SKEITALAPSSMKIK
    2 STU60483  MEAAGIHETTYNSIM KCDVDIRKDLYGNIV LSGGTTMFNGIADRM SKEITALAPSSMKIK

                331         345 346         360
    1 STPOAC101 VVAPPERKYSVWIGG SILASLSTFQQMWIT    360
    2 STU60483  VVAPPERKYSVWIGG SILASLSTFQQ----    336

Page 5.1

                361         375 376         390
    1 STPOAC101 KGEYDESGPSIVHRK CF                 377
    2 STU60483  --------------- --                 336

Alignment Data (Fasta format)

>STPOAC101
MADAEDIEPLVCDNGTGMVKAGFAGDDAPRAVFPSIVGRPRHTGVMVGMG
QKDAYVGDEAQSKRGILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVSP
DEHPVLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQAVLSLYASGRT
TGIVLDSGDGVSHTVPIYEGYALPHAILRLDLAGRDLTDCLMKILTERGY
SFTTSAEREIVRDMKEKLAYVALDYEQELETAKSSSAVEKSYELPDGQVI
TIGAERFRCPEVLFQPSLVGMEAAGIHETTYNSIMKCDVDIRKDLYGNIV
LSGGTTMFPGIADRMSKEITALAPSSMKIKVVAPPERKYSVWIGGSILAS
LSTFQQMWITKGEYDESGPSIVHRKCF
>STU60483
--------------------AGFAGDDAPRAVFPSIVGRPRHTGVMVGMG
QKDAYVGDEAQSKRGILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVAP
EEHPVLLTEAPLNPKANREKMTQIMFETFNTPAMYVAIQAVLSLYASGRT
TGIVMDSGDGVSHTVPIYEGYALPHAILRLDLAGRDLTDHLMKILTERGY
SFTTTAEREIVRDVKEKLSYIALDYEQELDTSKTSSSVEKSYELPDGQVI
TIGAERFRCPEVLFQPSMIGMEAAGIHETTYNSIMKCDVDIRKDLYGNIV
LSGGTTMFNGIADRMSKEITALAPSSMKIKVVAPPERKYSVWIGGSILAS
LSTFQQ---------------------
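Once the alignment is available in FASTA format, simple comparisons can be scripted. The sketch below counts identical residues and lists the mismatching columns between two aligned sequences, skipping any column where either sequence has a gap. The function names `read_fasta` and `compare_aligned` are illustrative, and `EXCERPT` holds only the first 100 columns of the alignment data above, so its counts describe that excerpt rather than the full proteins.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def compare_aligned(seq_a, seq_b, gap="-"):
    """Return (columns compared, identical columns, list of mismatches)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    mismatches = []
    for column, (a, b) in enumerate(zip(seq_a, seq_b), start=1):
        if a == gap or b == gap:
            continue  # gapped columns are not counted
        compared += 1
        if a == b:
            identical += 1
        else:
            mismatches.append((column, a, b))
    return compared, identical, mismatches


# First 100 alignment columns, copied from the alignment data above.
EXCERPT = """\
>STPOAC101
MADAEDIEPLVCDNGTGMVKAGFAGDDAPRAVFPSIVGRPRHTGVMVGMG
QKDAYVGDEAQSKRGILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVSP
>STU60483
--------------------AGFAGDDAPRAVFPSIVGRPRHTGVMVGMG
QKDAYVGDEAQSKRGILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVAP
"""

if __name__ == "__main__":
    seqs = read_fasta(EXCERPT)
    compared, identical, mismatches = compare_aligned(
        seqs["STPOAC101"], seqs["STU60483"]
    )
    print(f"{identical} of {compared} compared columns are identical")
    for column, a, b in mismatches:
        print(f"column {column}: {a} vs {b}")
```

Pasting the complete alignment data into `EXCERPT` would extend the comparison to the full length of both sequences.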


Tutorial 3 - How to display the 3-dimensional structure of a protein.

Finally, you can view the 3D structure of a member of the actin gene family. Go to the Protein Data Bank, use the 3DB browser, and search for an exact match to actin using the PDB ID 2btf. Click the link to view the structure and choose the First Glance or Protein Explorer option. You need Chime installed as a Netscape plug-in to see the structures. Instructions for installing Chime are available at PDB.
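If a browser plug-in is not available, the coordinate file for an entry can be downloaded and opened in a standalone viewer instead. The sketch below only constructs the download address for a PDB ID. It assumes the RCSB file server layout `https://files.rcsb.org/download/<ID>.pdb`, which is not mentioned in the tutorial, and it assumes the classic four-character ID format (a digit from 1 to 9 followed by three letters or digits). The function name `pdb_file_url` is illustrative.

```python
import re

# Assumed base address of the RCSB file download service.
RCSB_DOWNLOAD_BASE = "https://files.rcsb.org/download"


def pdb_file_url(pdb_id):
    """Return the assumed download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip()
    # Classic PDB IDs: one digit (1-9) followed by three letters or digits.
    if not re.fullmatch(r"[1-9][A-Za-z0-9]{3}", pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"{RCSB_DOWNLOAD_BASE}/{pdb_id.upper()}.pdb"


if __name__ == "__main__":
    print(pdb_file_url("2btf"))
```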

 

 
Copyright SWBIC