Tutorial in Uses of Web Tools in Molecular Biology
- Tutorial 1 – How to retrieve a gene sequence or a family of gene sequences.
- Tutorial 2 – How to construct a family of aligned protein sequences.
- Tutorial 3 – How to display the 3-dimensional structure of a protein.
Tutorial 1 – How to retrieve a gene sequence or a family of gene sequences.
First, get yourself a gene. Go to NCBI and search GenBank using text browsing. For example, to find the gene for the cytoskeletal protein actin, enter the keyword actin. To find the actin genes in potato, enter the keywords actin and potato. The search “actin and potato” retrieved 16 documents from GenBank (version 100.0, 4/13/99).
Among these 16 genes are examples of genomic clones, e.g., STPOAC101 [GenBank accession # X55752], and cDNA clones, e.g., STU60483 [GenBank accession # U60483]. Inspecting the GenBank reports will show whether the sequence contains a full-length gene, where the introns and exons are located, and what the predicted translated product is. The report also includes information about the source of the sequenced DNA [organism, e.g., potato; phage library of leaf transcripts].
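The keyword search described above can also be written as a query URL. The sketch below assumes NCBI's E-utilities esearch service and its "nucleotide" database name; the helper name build_genbank_search_url and the max_records parameter are hypothetical. It only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed base address of the NCBI E-utilities search service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_genbank_search_url(keywords, max_records=20):
    """Join keywords with AND and return an esearch query URL (hypothetical helper)."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    term = " AND ".join(keywords)
    params = {"db": "nucleotide", "term": term, "retmax": max_records}
    # urlencode escapes spaces and other special characters in the term.
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # The two searches used in this tutorial.
    print(build_genbank_search_url(["actin"]))
    print(build_genbank_search_url(["actin", "potato"]))
```

Adding a keyword narrows the search in the same way that typing a second keyword into the search box does.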
Tutorial 2 – How to construct a family of aligned protein sequences.
You can either make the family yourself (a good project for students) or see examples of aligned protein families at sites like Pfam.
The example in this tutorial is actin. Because we want to look at plants in particular, we used the keywords actin and viridiplantae. This retrieved 152 documents from the 4/17/97 version of GenBank. We then narrowed the search to actin and solanaceae, which retrieved 30 documents.
To align protein sequences, you must first create a file with properly formatted amino acid sequences. This can be done by cutting and pasting the information from the GenBank reports. The file is then sent to ClustalW.
An example of a properly formatted set of protein sequences is shown below.
>STPOAC101 potato actin A
AGFAGDDAPRAVFPSIVGRPRHTGVMVGMGQKDAYVGDEAQSKR
GILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVSPDEHPVLLTEAPLNPKANREKMT
QIMFETFNVPAMYVAIQAVLSLYASGRTTGIVLDSGDGVSHTVPIYEGYALPHAILRL
DLAGRDLTDCLMKILTERGYSFTTSAEREIVRDMKEKLAYVALDYEQELETAKSSSAV
EKSYELPDGQVITIGAERFRCPEVLFQPSLVGMEAAGIHETTYNSIMKCDVDIRKDLY
GNIVLSGGTTMFPGIADRMSKEITALAPSSMKIKVVAPPERKYSVWIGGSILASLSTF
QQMWITKGEYDESGPSIVHRKCF
>STU60483 potato actin B
AGFAGDDAPRAVFPSIVGRPRHTGVMVGMGQKDAYVGDEAQSKR
GILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT
QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVSHTVPIYEGYALPHAILRL
DLAGRDLTDHLMKILTERGYSFTTTAEREIVRDVKEKLSYIALDYEQELDTSKTSSSV
EKSYELPDGQVITIGAERFRCPEVLFQPSMIGMEAAGIHETTYNSIMKCDVDIRKDLY
GNIVLSGGTTMFNGIADRMSKEITALAPSSMKIKVVAPPERKYSVWIGGSILASLSTF
QQ
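Cutting and pasting from GenBank reports tends to leave stray spaces and line breaks in the sequences. As a minimal sketch, the hypothetical helper to_fasta below removes that whitespace and writes each record in the format shown above: a ">" header line followed by the residues. The 60-residue line width is an assumption, not a ClustalW requirement stated in this tutorial, and the sequences in the example are only short fragments of the two potato actins.

```python
def to_fasta(records, width=60):
    """Format (name, description, sequence) tuples as FASTA text (hypothetical helper)."""
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = []
    for name, description, sequence in records:
        # Remove spaces and line breaks left over from cutting and pasting.
        cleaned = "".join(sequence.split()).upper()
        lines.append(f">{name} {description}".rstrip())
        # Break the sequence into fixed-width lines.
        for start in range(0, len(cleaned), width):
            lines.append(cleaned[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Short fragments only; paste the full sequences in practice.
    records = [
        ("STPOAC101", "potato actin A", "AGFAGDDAPR AVFPSIVGRP RHTGV"),
        ("STU60483", "potato actin B", "AGFAGDDAPR AVFPSIVGRP RHTGV"),
    ]
    print(to_fasta(records), end="")
```

A longer list of sequences is handled the same way: add one tuple per sequence.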
This information can be pasted into a window on the ClustalW web page. The alignment results will be returned in the browser, usually within minutes. The alignment of the two potato actin protein sequences is shown below. This example was done with just two protein sequences; to align a larger set of sequences, simply assemble a longer list using the format indicated above. The Clustal alignment results can also be used as the input file for generating phylogenies, cladograms and phenograms. To do this, though, you need to download Clustal and run it locally on your own PC or Mac.
ClustalW Multiple Sequence Alignment Results
Page 1.1
             1            15 16           30 31           45 46           60 61           75 76           90
1 STPOAC101  MADAEDIEPLVCDNG TGMVKAGFAGDDAPR AVFPSIVGRPRHTGV MVGMGQKDAYVGDEA QSKRGILTLKYPIEH GIVSNWDDMEKIWHH  90
2 STU60483   --------------- -----AGFAGDDAPR AVFPSIVGRPRHTGV MVGMGQKDAYVGDEA QSKRGILTLKYPIEH GIVSNWDDMEKIWHH  70

Page 2.1
             91          105 106         120 121         135 136         150 151         165 166         180
1 STPOAC101  TFYNELRVSPDEHPV LLTEAPLNPKANREK MTQIMFETFNVPAMY VAIQAVLSLYASGRT TGIVLDSGDGVSHTV PIYEGYALPHAILRL  180
2 STU60483   TFYNELRVAPEEHPV LLTEAPLNPKANREK MTQIMFETFNTPAMY VAIQAVLSLYASGRT TGIVMDSGDGVSHTV PIYEGYALPHAILRL  160

Page 3.1
             181         195 196         210 211         225 226         240 241         255 256         270
1 STPOAC101  DLAGRDLTDCLMKIL TERGYSFTTSAEREI VRDMKEKLAYVALDY EQELETAKSSSAVEK SYELPDGQVITIGAE RFRCPEVLFQPSLVG  270
2 STU60483   DLAGRDLTDHLMKIL TERGYSFTTTAEREI VRDVKEKLSYIALDY EQELDTSKTSSSVEK SYELPDGQVITIGAE RFRCPEVLFQPSMIG  250

Page 4.1
             271         285 286         300 301         315 316         330 331         345 346         360
1 STPOAC101  MEAAGIHETTYNSIM KCDVDIRKDLYGNIV LSGGTTMFPGIADRM SKEITALAPSSMKIK VVAPPERKYSVWIGG SILASLSTFQQMWIT  360
2 STU60483   MEAAGIHETTYNSIM KCDVDIRKDLYGNIV LSGGTTMFNGIADRM SKEITALAPSSMKIK VVAPPERKYSVWIGG SILASLSTFQQ----  336

Page 5.1
             361         375 376         390
1 STPOAC101  KGEYDESGPSIVHRK CF  377
2 STU60483   --------------- --  336
Alignment Data (Fasta format)
>STPOAC101
MADAEDIEPLVCDNGTGMVKAGFAGDDAPRAVFPSIVGRPRHTGVMVGMG
QKDAYVGDEAQSKRGILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVSP
DEHPVLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQAVLSLYASGRT
TGIVLDSGDGVSHTVPIYEGYALPHAILRLDLAGRDLTDCLMKILTERGY
SFTTSAEREIVRDMKEKLAYVALDYEQELETAKSSSAVEKSYELPDGQVI
TIGAERFRCPEVLFQPSLVGMEAAGIHETTYNSIMKCDVDIRKDLYGNIV
LSGGTTMFPGIADRMSKEITALAPSSMKIKVVAPPERKYSVWIGGSILAS
LSTFQQMWITKGEYDESGPSIVHRKCF
>STU60483
--------------------AGFAGDDAPRAVFPSIVGRPRHTGVMVGMG
QKDAYVGDEAQSKRGILTLKYPIEHGIVSNWDDMEKIWHHTFYNELRVAP
EEHPVLLTEAPLNPKANREKMTQIMFETFNTPAMYVAIQAVLSLYASGRT
TGIVMDSGDGVSHTVPIYEGYALPHAILRLDLAGRDLTDHLMKILTERGY
SFTTTAEREIVRDVKEKLSYIALDYEQELDTSKTSSSVEKSYELPDGQVI
TIGAERFRCPEVLFQPSMIGMEAAGIHETTYNSIMKCDVDIRKDLYGNIV
LSGGTTMFNGIADRMSKEITALAPSSMKIKVVAPPERKYSVWIGGSILAS
LSTFQQ---------------------
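Alignment data in Fasta format is easy to summarize with a short script. The sketch below is one possible approach, not part of ClustalW: the hypothetical helpers read_aligned_fasta and compare_aligned read the aligned sequences and count the columns that are identical, mismatched, or gapped. The two sequences in the example (demoA, demoB) are made-up toy data, not the actin alignment.

```python
def read_aligned_fasta(text):
    """Parse Fasta-format alignment text into {name: aligned_sequence}."""
    sequences = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is the first word after ">".
            header = line[1:].split()
            name = header[0] if header else ""
            sequences[name] = ""
        elif name is not None:
            # Sequence lines are joined; embedded spaces are dropped.
            sequences[name] += "".join(line.split())
    return sequences


def compare_aligned(seq_a, seq_b):
    """Count identical, mismatched, and gapped columns of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    counts = {"identical": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            counts["gap"] += 1
        elif a == b:
            counts["identical"] += 1
        else:
            counts["mismatch"] += 1
    return counts


if __name__ == "__main__":
    # Toy alignment for illustration only.
    toy = ">demoA\nMADAGFAGDD\n>demoB\n---AGFTGDD\n"
    aligned = read_aligned_fasta(toy)
    print(compare_aligned(aligned["demoA"], aligned["demoB"]))
```

To use it on real results, save the Alignment Data section to a file, read the file into a string, and pass that string to read_aligned_fasta.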
Tutorial 3 – How to display the 3-dimensional structure of a protein.
Finally, you can also view the 3D structure of a member of the actin gene family. Go to the Protein Data Bank, use the 3DB browser, and search for an exact match to actin by using the PDB ID 2btf. Click on the link to view the structure and use the First Glance or Protein Explorer option. You need to have Chime installed as a plug-in for Netscape to see the structures. Instructions for installing Chime are available at PDB.
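If you would rather fetch the coordinate file yourself and open it in a viewer of your choice, the PDB ID is all you need. The sketch below assumes the download address pattern used by the RCSB Protein Data Bank file server; the helper name pdb_file_url is hypothetical, and the code only checks the ID and builds the URL without downloading anything.

```python
import re

# Assumed download location for PDB-format files at the RCSB Protein Data Bank.
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_file_url(pdb_id):
    """Check a four-character PDB ID and return a download URL (hypothetical helper)."""
    candidate = pdb_id.strip().upper()
    # Classic PDB IDs are one digit followed by three letters or digits.
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", candidate):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD_URL.format(pdb_id=candidate)


if __name__ == "__main__":
    # The actin structure used in this tutorial.
    print(pdb_file_url("2btf"))
```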