Module 5: Sequence Alignments

 

Multiple Alignment



Multiple alignment can be used to extend the knowledge gained about a group of sequences through similarity searches or pairwise alignments. In general, it highlights regions of sequence similarity across the group of sequences being aligned. Multiple alignment is also often one of the first steps in phylogenetic analysis (see module 8).
  • Multidimensional dynamic programming can be used to align multiple proteins simultaneously with a guaranteed optimal solution. A common program implementing this approach is appropriately called MSA (Multiple Sequence Alignment). This method is slow because the dynamic programming table gains one dimension for every sequence, so computing time and memory grow exponentially with the number of sequences. A useful site for performing an MSA alignment is the http://www.ibc.wustl.edu/msa.html server.

  • A faster multiple alignment, usually good but no longer guaranteed to be optimal, can be achieved by progressive techniques: two sequences are first optimally aligned, and further, potentially homologous sequences are then added to the alignment one at a time.

  • ClustalW [Ref: Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) Nucleic Acids Res. 22: 4673-4680] is the most popular program for performing multiple alignments. It follows the progressive alignment approach described above.

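    • Incorporates position-specific gap penalties, sequence weighting, and weight matrix choice: Gaps are introduced into the sequences so that matching residues can be brought into register. Each gap incurs a gap penalty, a score deducted from the overall alignment score, so a gap is introduced only where the resulting gain in matched residues outweighs the penalty. The larger the gap penalty, the fewer gaps appear in the alignment; a large penalty does not by itself indicate greater homology between two sequences. In ClustalW the penalties are position-specific: they vary along the sequences, for example being lowered in hydrophilic stretches that are likely to be loops.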

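      Sequence weighting gives less weight to near-duplicate sequences and more weight to divergent ones, so that a cluster of closely related sequences does not dominate the alignment. Weight matrix choice means that different amino acid similarity matrices, such as members of the PAM or BLOSUM series (see Module 4), are used at different stages of the alignment according to the divergence of the sequences being aligned.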

    • Can perform the initial pairwise alignments with a fast, approximate, FASTA-like word method (rapid) or with a full dynamic programming method (slow, but more accurate).

    • Can add individual sequences to an existing alignment or align two groups of pre-aligned sequences with each other.

    • Can realign selected sequences or selected regions of globally-aligned sequences ignoring unselected regions within the sequences.

    • Servers: The following sites offer good to excellent services for performing multiple alignments of proteins using the ClustalW algorithm. Some of these sites are comprehensive in that they have other multiple alignment resources besides ClustalW.

    • ClustalX: A Windows-based tool for aligning multiple sequences using the ClustalW algorithm. It can be downloaded by FTP from ftp://ftp.ebi.ac.uk/pub/software/dos/clustalx. A similar Macintosh version can be downloaded from ftp://ftp.ebi.ac.uk/pub/software/mac/clustalx.

  • Alignment editors and formatters: Once a raw alignment of a newly identified or sequenced polypeptide with similar sequences has been completed (e.g., with ClustalW), it can be refined manually or with automatic tools (alignment editing). Manual alignment editing requires an excellent knowledge of protein structure and evolutionary trends and is usually a tedious process. Computer-aided alignment editing, and additional formatting for viewing and publication, can be carried out with the tools below.

    • CINEMA (Colour INteractive Editor for Multiple Alignments): This is an excellent interactive editor for multiple alignments, available as a Java applet and as a downloadable version. The applet version can load a variety of "pluglets" corresponding to different sequence formats. (http://www.bioinf.man.ac.uk/dbbrowser/CINEMA2.1/)

    • MACAW (Multiple Alignment Construction & Analysis Workbench): This is a program for locating, analyzing, and editing blocks of localized sequence similarity among multiple sequences and linking them into a composite multiple alignment. (ftp://ncbi.nlm.nih.gov/pub/macaw/)

    • Boxshade: This program is a formatter that shades and prints residues that are identical or similar in aligned sequences. (http://bioweb.pasteur.fr/seqanal/interfaces/boxshade.html)

    • Viseur MSF formatter: This server provides HTML conversion services for MSF (multiple sequence format) and SWISSPROT formats. (http://www.lctn.u-nancy.fr/viseur/services.html)
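
The progressive approach described above (align two sequences, then add further sequences one at a time) can be illustrated with a minimal sketch. It is not ClustalW's implementation. Several things here are simplifying assumptions made for illustration: the function names are hypothetical, sequences are added in input order rather than along a guide tree, a simple match/mismatch score stands in for a PAM or BLOSUM matrix, and a single uniform gap penalty replaces position-specific penalties and sequence weighting.

```python
def column_score(col_a, col_b, match=1, mismatch=-1):
    """Average pairwise score between two alignment columns.

    Pairs that involve a gap character score 0 (an assumption of this sketch).
    """
    total = 0
    for x in col_a:
        for y in col_b:
            if x != "-" and y != "-":
                total += match if x == y else mismatch
    return total / (len(col_a) * len(col_b))


def align_profiles(rows_a, rows_b, gap=-2):
    """Globally align two alignments (lists of equal-length rows), column by column."""
    cols_a = ["".join(col) for col in zip(*rows_a)]
    cols_b = ["".join(col) for col in zip(*rows_b)]
    n, m = len(cols_a), len(cols_b)

    # Fill the dynamic programming table; each gap column costs `gap`.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover the aligned columns.
    out_a, out_b = [], []
    gap_a, gap_b = "-" * len(rows_a), "-" * len(rows_b)
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j]
                == score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1])):
            out_a.append(cols_a[i - 1])
            out_b.append(cols_b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            out_a.append(cols_a[i - 1])
            out_b.append(gap_b)  # gap column inserted into the second alignment
            i -= 1
        else:
            out_a.append(gap_a)  # gap column inserted into the first alignment
            out_b.append(cols_b[j - 1])
            j -= 1

    merged = [a + b for a, b in zip(reversed(out_a), reversed(out_b))]
    return ["".join(row) for row in zip(*merged)]


def progressive_align(sequences, gap=-2):
    """Start from the first sequence and add the others one at a time."""
    alignment = [sequences[0]]
    for seq in sequences[1:]:
        alignment = align_profiles(alignment, [seq], gap)
    return alignment


if __name__ == "__main__":
    for row in progressive_align(["GATTACA", "GATACA", "GATTAA"]):
        print(row)
```

Calling `progressive_align` with a more negative `gap` value makes gaps more expensive, so the resulting alignment tends to contain fewer of them. Because gaps placed at an early step are never revisited, the result depends on the order in which sequences are added, which is one reason progressive methods are not guaranteed to be optimal.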

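The kind of formatting performed by a shading tool such as Boxshade can be approximated in plain text. The sketch below is not Boxshade's algorithm or output format; the function name `shade` and the `threshold` parameter are hypothetical. It marks a residue in upper case when it agrees with the most common residue in its column and that residue occurs in at least the given fraction of the sequences, and in lower case otherwise. Only identity is considered; shading of similar (not identical) residues would additionally need residue similarity groups.

```python
from collections import Counter


def shade(rows, threshold=0.5):
    """Upper-case residues that match a sufficiently common column consensus."""
    shaded = [[] for _ in rows]
    for column in zip(*rows):
        # Gap characters never count towards the consensus.
        residues = [c.upper() for c in column if c != "-"]
        consensus = None
        if residues:
            residue, count = Counter(residues).most_common(1)[0]
            if count / len(column) >= threshold:
                consensus = residue
        for out, c in zip(shaded, column):
            out.append(c.upper() if c.upper() == consensus else c.lower())
    return ["".join(out) for out in shaded]


if __name__ == "__main__":
    for row in shade(["GATTACA", "GAT-ACA", "GCTTAGA"]):
        print(row)
```

With the default threshold, a column is treated as conserved when at least half of the sequences share the same residue; raising `threshold` to 1.0 highlights only columns that are identical in every sequence.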



The Southwest Biotechnology and Informatics Center WWW server is located at "http://www.swbic.org/".
Please send comments and suggestions to: [email protected]
© SWBIC 2001