Tutorial in Uses of Web Tools in Molecular Biology




Tutorial 1 – How to retrieve a gene sequence or a family of gene sequences.
Tutorial 2 – How to construct a family of aligned protein sequences.
Tutorial 3 – How to display the 3-dimensional structure of a protein.

Tutorial 1 – How to retrieve a gene sequence or a family of gene sequences.

First, get yourself a gene. Go to NCBI and search GenBank using a text search. For example, to find the gene for the cytoskeletal protein actin, enter the keyword actin. To find the actin genes in potato, enter the keywords actin and potato. The search “actin and potato” retrieved 16 documents from GenBank (version 100.0, 4/13/99).
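The same keyword search can also be expressed as a URL rather than typed into the web form. The sketch below is not part of the original tutorial: it assumes the public NCBI E-utilities ESearch endpoint and the Entrez nucleotide database name nuccore, and the helper name build_genbank_search_url is purely illustrative. It only builds the query URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (an assumption of this sketch;
# the tutorial itself uses the interactive search form).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_genbank_search_url(keywords, max_records=20):
    """Return an ESearch URL that joins the given keywords with AND."""
    term = " AND ".join(keywords)
    params = {"db": "nuccore", "term": term, "retmax": max_records}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Keywords taken from the tutorial's potato actin example.
    print(build_genbank_search_url(["actin", "potato"]))
```

Opening the printed URL in a browser returns a list of matching record identifiers. GenBank has grown since this tutorial was written, so the number of hits will not match the 16 documents reported here.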

Among these 16 genes are examples of genomic clones, e.g., STPOAC101 [GenBank accession # X55752], and cDNA clones, e.g., STU60483 [GenBank accession # U60483]. Inspecting the GenBank reports shows whether the sequence contains a full-length gene, where the introns and exons are located, and what the predicted translation product is. The report also includes information about the source of the sequenced DNA [organism, e.g., potato; phage library of leaf transcripts].


Tutorial 2 – How to construct a family of aligned protein sequences.

You can either make the family yourself (a good project for students) or see examples of aligned protein families at sites like Pfam.

The example in this tutorial is actin. Because we want to look at plants in particular, we used the keywords actin and viridiplantae. This retrieved 152 documents from the 4/17/97 version of GenBank. We then narrowed the search to actin and solanaceae, which retrieved 30 documents.

To align protein sequences, you must first create a file with properly formatted amino acid sequences. This can be done by cutting and pasting the information from the GenBank reports. The file is then sent to ClustalW.
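The cut-and-paste step can also be scripted. The sketch below is one possible way to turn translations copied from GenBank reports into FASTA text: it removes the spaces and line breaks that come along with the copied text and wraps each sequence at a fixed width. The function name to_fasta and the default width of 60 are illustrative choices, not something the tutorial or ClustalW prescribes.

```python
def to_fasta(records, width=60):
    """Format (name, description, sequence) tuples as FASTA text."""
    lines = []
    for name, description, sequence in records:
        # Remove spaces and line breaks picked up while copying from a report.
        cleaned = "".join(sequence.split()).upper()
        lines.append(f">{name} {description}".rstrip())
        # Wrap the cleaned sequence into fixed-width lines.
        for start in range(0, len(cleaned), width):
            lines.append(cleaned[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The first residues of the potato actin A translation, including the
    # stray space left behind by the copy.
    pasted = "AGFAGDDAPRAVFPSIVGRPRHTGVMVGMGQKDAYVGDEAQSKR GILTLKYPIEH"
    print(to_fasta([("STPOAC101", "potato actin A", pasted)], width=30), end="")
```

Each record becomes a header line starting with ">" followed by the sequence lines, which is the layout ClustalW expects.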

An example of a properly formatted set of protein sequences is shown below.

>STPOAC101 potato actin A
AGFAGDDAPRAVFPSIVGRPRHTGVMVGMGQKDAYVGDEAQSKR
GILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVSPDEHPVLLTEAPLNPKANREKMT
QIMFETFNVPAMYVAIQAVLSLYASGRTTGIVLDSGDGVSHTVPIYEGYALPHAILRL
DLAGRDLTDCLMKILTERGYSFTTSAEREIVRDMKEKLAYVALDYEQELETAKSSSAV
EKSYELPDGQVITIGAERFRCPEVLFQPSLVGMEAAGIHETTYNSIMKCDVDIRKDLY
GNIVLSGGTTMFPGIADRMSKEITALAPSSMKIKVVAPPERKYSVWIGGSILASLSTF
QQMWITKGEYDESGPSIVHRKCF

>STU60483 potato actin B
AGFAGDDAPRAVFPSIVGRPRHTGVMVGMGQKDAYVGDEAQSKR
GILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT
QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVSHTVPIYEGYALPHAILRL
DLAGRDLTDHLMKILTERGYSFTTTAEREIVRDVKEKLSYIALDYEQELDTSKTSSSV
EKSYELPDGQVITIGAERFRCPEVLFQPSMIGMEAAGIHETTYNSIMKCDVDIRKDLY
GNIVLSGGTTMFNGIADRMSKEITALAPSSMKIKVVAPPERKYSVWIGGSILASLSTF
QQ

This information can be pasted into a window on the ClustalW web page. The alignment results will be returned in the browser, usually within minutes. The alignment of the two potato actin protein sequences is shown below. This example was done with just two protein sequences; to align a larger set of sequences, simply assemble a longer list using the format indicated above. The Clustal alignment results can also be used as the input file for generating phylogenies, cladograms and phenograms. To do this, though, you need to download Clustal and run it locally on your own PC or Mac.

ClustalW Multiple Sequence Alignment Results

Page 1.1
            1            15 16           30 31           45 46           60 61           75 76           90
1 STPOAC101 MADAEDIEPLVCDNG TGMVKAGFAGDDAPR AVFPSIVGRPRHTGV MVGMGQKDAYVGDEA QSKRGILTLKYPIEH GIVSNWDDMEKIWHH   90
2 STU60483  --------------- -----AGFAGDDAPR AVFPSIVGRPRHTGV MVGMGQKDAYVGDEA QSKRGILTLKYPIEH GIVSNWDDMEKIWHH   70

Page 2.1
            91          105 106         120 121         135 136         150 151         165 166         180
1 STPOAC101 TFYNELRVSPDEHPV LLTEAPLNPKANREK MTQIMFETFNVPAMY VAIQAVLSLYASGRT TGIVLDSGDGVSHTV PIYEGYALPHAILRL  180
2 STU60483  TFYNELRVAPEEHPV LLTEAPLNPKANREK MTQIMFETFNTPAMY VAIQAVLSLYASGRT TGIVMDSGDGVSHTV PIYEGYALPHAILRL  160

Page 3.1
            181         195 196         210 211         225 226         240 241         255 256         270
1 STPOAC101 DLAGRDLTDCLMKIL TERGYSFTTSAEREI VRDMKEKLAYVALDY EQELETAKSSSAVEK SYELPDGQVITIGAE RFRCPEVLFQPSLVG  270
2 STU60483  DLAGRDLTDHLMKIL TERGYSFTTTAEREI VRDVKEKLSYIALDY EQELDTSKTSSSVEK SYELPDGQVITIGAE RFRCPEVLFQPSMIG  250

Page 4.1
            271         285 286         300 301         315 316         330 331         345 346         360
1 STPOAC101 MEAAGIHETTYNSIM KCDVDIRKDLYGNIV LSGGTTMFPGIADRM SKEITALAPSSMKIK VVAPPERKYSVWIGG SILASLSTFQQMWIT  360
2 STU60483  MEAAGIHETTYNSIM KCDVDIRKDLYGNIV LSGGTTMFNGIADRM SKEITALAPSSMKIK VVAPPERKYSVWIGG SILASLSTFQQ----  336

Page 5.1
            361         375 376         390 391         405 406         420 421         435 436         450
1 STPOAC101 KGEYDESGPSIVHRK CF  377
2 STU60483  --------------- --  336

Alignment Data (Fasta format)

>STPOAC101
MADAEDIEPLVCDNGTGMVKAGFAGDDAPRAVFPSIVGRPRHTGVMVGMG
QKDAYVGDEAQSKRGILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVSP
DEHPVLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQAVLSLYASGRT
TGIVLDSGDGVSHTVPIYEGYALPHAILRLDLAGRDLTDCLMKILTERGY
SFTTSAEREIVRDMKEKLAYVALDYEQELETAKSSSAVEKSYELPDGQVI
TIGAERFRCPEVLFQPSLVGMEAAGIHETTYNSIMKCDVDIRKDLYGNIV
LSGGTTMFPGIADRMSKEITALAPSSMKIKVVAPPERKYSVWIGGSILAS
LSTFQQMWITKGEYDESGPSIVHRKCF
>STU60483
--------------------AGFAGDDAPRAVFPSIVGRPRHTGVMVGMG
QKDAYVGDEAQSKRGILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVAP
EEHPVLLTEAPLNPKANREKMTQIMFETFNTPAMYVAIQAVLSLYASGRT
TGIVMDSGDGVSHTVPIYEGYALPHAILRLDLAGRDLTDHLMKILTERGY
SFTTTAEREIVRDVKEKLSYIALDYEQELDTSKTSSSVEKSYELPDGQVI
TIGAERFRCPEVLFQPSMIGMEAAGIHETTYNSIMKCDVDIRKDLYGNIV
LSGGTTMFNGIADRMSKEITALAPSSMKIKVVAPPERKYSVWIGGSILAS
LSTFQQ---------------------
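Once the alignment is available in FASTA format, the two sequences can be compared column by column. The sketch below is one simple way to do that, not something ClustalW itself reports: it parses aligned FASTA text, skips any column where either sequence has a gap, and lists the positions that differ. The function names parse_fasta and compare_aligned are illustrative. The usage example uses only alignment columns 91 to 105 from the results above.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    sequences = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is the first word after the ">" marker.
            name = line[1:].split()[0]
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line.replace(" ", "")
    return sequences


def compare_aligned(seq_a, seq_b, gap="-"):
    """Return (identical, compared, differences) for two aligned sequences.

    Columns where either sequence has a gap are skipped. Differences are
    (column, residue_a, residue_b) tuples with 1-based column numbers.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = compared = 0
    differences = []
    for column, (a, b) in enumerate(zip(seq_a, seq_b), start=1):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
        else:
            differences.append((column, a, b))
    return identical, compared, differences


if __name__ == "__main__":
    # Alignment columns 91-105 of the two potato actins.
    excerpt = """
>STPOAC101
TFYNELRVSPDEHPV
>STU60483
TFYNELRVAPEEHPV
"""
    seqs = parse_fasta(excerpt)
    print(compare_aligned(seqs["STPOAC101"], seqs["STU60483"]))
    # prints (13, 15, [(9, 'S', 'A'), (11, 'D', 'E')])
```

Column numbers in the result are relative to the text passed in, so running the same functions on the full alignment reports positions in the numbering used by the paged output.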


Tutorial 3 – How to display the 3-dimensional structure of a protein.

Finally, you can also see the 3D structure of a member of the actin gene family. Go to the Protein Data Bank, use the 3DB browser, and search for an exact match to actin using the PDB ID 2btf. Click on the link to view the structure and use the First Glance or Protein Explorer option. You need to have Chime installed as a plug-in for Netscape to see the structures. Instructions for installing Chime are available at PDB.
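If you would rather save the coordinate file and open it in a viewer of your choice, the PDB ID is all you need. The sketch below is an assumption-laden convenience, not part of the tutorial's workflow: it assumes the RCSB file server's download URL pattern and the classic four-character PDB ID format, and the helper name pdb_download_url is illustrative. It only builds the URL and does not download anything.

```python
import re

# Download location on the RCSB PDB file server (an assumption of this
# sketch; the tutorial itself only describes the interactive browser).
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_download_url(pdb_id):
    """Return the download URL for a classic four-character PDB ID."""
    cleaned = pdb_id.strip().upper()
    # Classic PDB IDs are one digit followed by three letters or digits.
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", cleaned):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD_URL.format(pdb_id=cleaned)


if __name__ == "__main__":
    # PDB ID from the tutorial's actin example.
    print(pdb_download_url("2btf"))
```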