
Module 4: Similarity Searches
 A dot plot is a simple graphical representation of the similarity between two sequences. It lets one visually compare
the two sequences for regions of close similarity. The scoring matrices used to compute similarity in more complex
comparisons can be considered extensions of dot plots.
 When we arrange two DNA sequences along the X and Y axes, respectively, of a simple graph and place a dot at every
position with identical nucleotides in the two sequences, we get a dot plot as shown below (Figure 1). Note that a dot is
placed at the intersection of a row and a column with the same nucleotide.
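
 The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the
program used to draw Figure 1: the function names dot_plot and render and the example sequences are made up for this
illustration, and the text rendering ("*" for a dot, "." for no dot) is an arbitrary choice.

```python
def dot_plot(seq_x, seq_y):
    """Build an unfiltered dot plot as a grid of booleans.

    grid[row][col] is True when seq_y[row] and seq_x[col] are the same
    nucleotide, i.e. when a dot belongs at that row/column intersection.
    """
    return [[base_y == base_x for base_x in seq_x] for base_y in seq_y]


def render(grid, seq_x, seq_y):
    """Return the grid as text, with seq_x along the top and seq_y down the side."""
    lines = ["  " + " ".join(seq_x)]
    for base_y, row in zip(seq_y, grid):
        lines.append(base_y + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example sequences, chosen only for illustration.
    x = "GATTACA"
    y = "GATCACA"
    print(render(dot_plot(x, y), x, y))
```

 Because the two example sequences differ at a single position, the printed grid shows a main diagonal of dots with one
gap, plus scattered off-diagonal dots where the same nucleotide occurs by chance.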
 A diagonal stretch of dots appears in regions where the two sequences are similar. Although this similarity is
noticeable in the raw plot, one can use statistical filters to make it stand out. For example, in the following plot
(Figure 2), we have used a filter such that a dot is placed only when at least 3 of the 4 bases in a window match (75%
identity).
 To detect similarities in more distantly related DNA, one should make the window larger (perhaps 30-50 bases) and
decrease the required percentage of identity (perhaps 50% instead of 75%).
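
 The window filter can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not
the program used to produce Figure 2: the function name filtered_dot_plot and its parameters window and min_matches are
hypothetical, and the dot is recorded at the starting position of each window, a convention the text does not specify.

```python
def filtered_dot_plot(seq_x, seq_y, window=4, min_matches=3):
    """Build a window-filtered dot plot as a grid of booleans.

    grid[i][j] is True when the window of length `window` starting at
    seq_y[i] and the window starting at seq_x[j] agree in at least
    `min_matches` positions. Placing the dot at the window start is an
    assumption made for this sketch.
    """
    n_rows = max(len(seq_y) - window + 1, 0)
    n_cols = max(len(seq_x) - window + 1, 0)
    grid = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            # Count identical bases position by position inside the window.
            matches = sum(seq_y[i + k] == seq_x[j + k] for k in range(window))
            row.append(matches >= min_matches)
        grid.append(row)
    return grid


if __name__ == "__main__":
    # Hypothetical example sequences, chosen only for illustration.
    x = "GATTACAGATTACA"
    y = "GATCACAGATTACA"
    for row in filtered_dot_plot(x, y):
        print(" ".join("*" if hit else "." for hit in row))
```

 With the defaults, the function applies the 3-out-of-4 (75%) filter. For more distantly related sequences one could
call, for instance, filtered_dot_plot(x, y, window=30, min_matches=15), which corresponds to a 30-base window at 50%
identity.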
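 Dot plotting can also be adapted to amino acids in protein sequences. The PAM matrices can, in fact, be regarded as a
more elaborate form of the same idea: they assign a score to every pair of amino acids rather than marking only
identical ones, and different PAM matrices suit different expected percentages of identity between the sequences.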

