Motif, Domain, Profile, Pattern, & Repeat Searches
Bioinformatics & Genomics
3Dee - Database of Protein Domain Definitions
[European Bioinformatics Institute] 3Dee contains structural domain definitions for all protein chains in the Protein Data Bank (PDB) [EBI-MSD/RCSB] that have 20 or more residues and are not theoretical models.
BCM Search Launcher: General Protein Sequence/Pattern Searches
[Baylor College of Medicine] A site with multiple tools to search a protein sequence for patterns.
Bioinformatics & Pattern Discovery
[IBM] Information and bioinformatics-related servers developed at IBM. Services include sequence pattern analysis, gene expression analysis, and multiple sequence alignment. The servers may be used online, or the code may be downloaded.
Blocks WWW Server
[Fred Hutchinson Cancer Research Center] Tools for the detection and verification of protein sequence homology.
[San Diego Supercomputer Center] The Conserved Key Amino Acid Positions (CKAAPs) database provides access to an analysis of structurally similar proteins with dissimilar sequences, in which key residues within a common fold are identified. The database provides CKAAPs for the representative set of proteins derived from the Combinatorial Extension algorithm and the FSSP database.
Construction of profiles for PROSITE
[ISREC] This is a guide on how to generate profile entries for the PROSITE database.
[Uppsala Software Factory] The input to the server is a PDB file containing a secondary structure motif.
The secondary structure elements (SSEs) of the PDB file are assigned first (or given explicitly in a file).
You may supply superpositioning criteria, which the server uses to find similar secondary structure motifs.
[The International Center for Genetic Engineering and Biotechnology] This is an introductory tutorial for biologists interested in weak protein sequence similarities that cannot be found with a simple database search.
[K. Hinsen] (Centre National de la Recherche Scientifique) A program for the determination and characterization of dynamical domains in proteins.
[ISREC] This is a Java applet that draws dot plots (pairwise sequence comparisons). This site also contains examples of dot plots with interpretations, including protein repeat regions and intron and exon patterns in DNA.
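For readers unfamiliar with dot plots, the idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used by the applet: the function names `dot_plot` and `render`, and the `window` and `min_matches` parameters, are hypothetical choices made for this example. A dot is placed at (i, j) when a short window starting at position i of one sequence matches a window starting at position j of the other.

```python
def dot_plot(seq_a, seq_b, window=3, min_matches=3):
    """Return the set of 0-based (i, j) positions where the windows
    starting at seq_a[i] and seq_b[j] share at least min_matches
    identical residues. Names and defaults are illustrative only."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the plot as text: one row per residue of seq_a,
    one column per residue of seq_b."""
    rows = []
    for i in range(len(seq_a)):
        rows.append(
            "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        )
    return "\n".join(rows)


if __name__ == "__main__":
    # Comparing a sequence with itself: the main diagonal is the trivial
    # self-match, and off-diagonal dots reveal the internal repeat.
    repeat = "ACDACD"
    print(render(repeat, repeat, dot_plot(repeat, repeat)))
```

In a self-comparison such as the one above, runs of dots parallel to the main diagonal are what indicate repeat regions.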
[Karolinska Institutet, Center for Genomics Research] Compares two related sequences and finds matches, creating a dotplot. Accompanying the dotplot are excellent statistics and user-friendly adjustment of thresholds. Download the program to run on Unix. A server version is available at http://www.hgmp.mrc.ac.uk/Registered/Option/dotter.html for registered users.
[Stanford Univ.] Ranks the motifs that it finds by both their specificity (expected false positives) and the number of supplied sequences that they cover (true positives). The twenty highest-scoring motifs are returned. This site also contains several other tools for sequence alignment and similarity searching, protein function identification, and genome analysis.
Compares the numerical order of protein-coding genes in two genomes, using GenBank genome files, or compares a user-created list of protein sequences with a genome or another list. The program creates a dotplot of matches between the proteins coded by the genes, plotted in numerical order along the genome (or list). Points are plotted at different levels of significance for matching amino acid sequences. The program also generates a clickable list of matches. Genome size is limited to less than 250 kb.
[European Bioinformatics Institute] InterPro is an integrated documentation resource for protein families, domains and sites, developed initially as a means of rationalising the complementary efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Each combined InterPro entry includes functional descriptions and literature references, and links are made back to the relevant member database(s), allowing users to see at a glance whether a particular family or domain has associated patterns, profiles, fingerprints, etc.
[NCGR] MAR-Finder uses statistical inference to deduce the presence of matrix association regions. A user name and password are required, but registration is free.
[San Diego Supercomputer Center] Discovers motifs (highly conserved regions) in groups of related DNA or protein sequences using MEME, and searches sequence databases with those motifs using MAST.
Molecular Sequence Megaclassification
[Washington Univ., St. Louis] Provides access to a non-redundant molecular sequence collection that can be accessed by domain type of sequence.
[CBC] A tool developed to search the PIR or Arabidopsis databases using a protein or DNA sequence.
Motif Search Tool
[NCBI] (MoST) "MoST will scan the indicated database iteratively until convergence, by adding segments selected at each iteration to the original block."
MOTIF: Searching Protein and Nucleic Acid Sequence Motifs
[Genome Net] Finds protein motifs in query sequence and gives structural information on the found motifs.
PANAL Protein Analysis
[University of Minnesota] PANAL is an integrated resource for protein sequence analysis. The tool allows the user to simultaneously search a protein sequence for motifs from several databases, and to view the results as an intuitive graphical summary.
[Argonne National Laboratory] A pattern matcher that searches protein or nucleotide (DNA, RNA, tRNA, etc.) sequence archives for instances of a user-supplied pattern.
[Pole Bio-Informatique Lyonnais] A form allowing submission of a sequence to perform a pattern search.
[ISREC] Searches the PROSITE database with a Perl script.
[Washington University, St. Louis] Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains based on the Swissprot 38 and SP-TrEMBL 11 protein sequence databases.
Pfam ftp Site
[Washington University, St. Louis] The ftp site for downloading the Pfam protein domain database and the associated tools.
[consortium of several research laboratories] This genomics site is oriented to plant functional genomics. The site maintains servers for analyzing protein sequence, searching for patterns, and plotting motifs. The site also discusses plant protein phosphorylation.
Pratt - A Pattern Discovery Tool
[University of Bergen] Discovers patterns conserved in sets of unaligned protein sequences.
Pratt Pattern Discovery
[EMBL] A tool that allows the user to search for patterns conserved in a set of protein sequences.
PRINTS BLAST Search
[University of Manchester] "This is an interface to a BLAST search of all protein sequences contained within the PRINTS database. The user entered sequence may be a protein or DNA sequence."
PRINTS: Protein Fingerprint Database
[University of Manchester] "PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used
to characterise a protein family; its diagnostic power is refined by iterative scanning of a
ProDom: The Protein Domain Database
[INRA] "The ProDom protein domain database consists of an automatic compilation of homologous domains."
[ISREC] Searches a single sequence against currently available profile databases. Also available is Frame-ProfileScan Server, which uses the new frame search option of the pfscan program to search a single DNA sequence against currently available protein profile databases.
[ExPASy] A searchable dictionary of proteins and patterns. Identifies to which family of proteins a sequence belongs.
[ISREC] Uses a Perl script to scan Amos Bairoch's PROSITE database.
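As background on what such a scan involves: PROSITE patterns use a compact syntax in which elements are separated by "-", "x" stands for any residue, "[ST]" lists allowed residues, "{P}" lists forbidden residues, "(n)" or "(n,m)" gives a repeat count, and "<" and ">" anchor the pattern to the N- or C-terminus. The Python sketch below translates that syntax into a regular expression and scans one sequence. It is an illustration under stated assumptions, not the server's Perl script: the function names `prosite_to_regex` and `scan` are hypothetical, and only the common syntax elements listed here are handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string.
    Handles x, [..], {..}, (n), (n,m), '<' and '>' only (a sketch)."""
    pattern = pattern.rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as x(2,4) or A(3).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # forbidden residues
        # A single letter or a [..] class is already valid regex.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (1-based start, matched text) for every hit, including
    overlapping ones, by wrapping the regex in a lookahead."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-glycosylation site pattern (PROSITE PS00001) on a made-up sequence.
    print(scan("MNGTANPSQNKSA", "N-{P}-[ST]-{P}."))
```

The lookahead wrapper is a design choice that reports overlapping hits, which a plain left-to-right regex search would skip.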
[Virtual Genome Center] Searches to determine if a protein motif is encoded by a DNA sequence or a database of DNA sequences.
Protein Sequence Search Tool
[Indian Institute of Science] The service provides six utilities for analyzing protein sequences: sequence retrieval; amino acid composition, molecular weight, and volume; sequence length search; molecular weight search; identical pattern matching; and similar pattern matching.
[ExPASy] An extensive annotated list of links to tools that provide analysis of protein sequences and structures.
[Munich Information Center for Protein Sequences] ProtFam by MIPS is a curated protein classification database. In a joint effort, MIPS and PIR-NBRF classify sequences into superfamilies and families and annotate homology domains. This is part of the joint project to produce the PIR-International Protein Sequence Database.
[Stanford University] "An exhaustive classification of all the proteins in the SWISSPROT and TrEMBL databases, into groups of related proteins. The analysis uses transitivity to identify homologous proteins, and within each group, every two members are either directly or transitively related."
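The notion of grouping by transitivity quoted above can be illustrated with a small Python sketch. This is not the site's actual algorithm; it only shows the general idea that if A is related to B and B to C, then all three fall into one group. The function name `transitive_groups` and the input format (a list of pairs already judged to be related) are assumptions made for this example.

```python
def transitive_groups(pairs):
    """Group items so that any two members of a group are connected
    directly or through a chain of related pairs (union-find sketch)."""
    parent = {}

    def find(x):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical protein identifiers: P1~P2 and P2~P3 imply one group.
    related = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
    print(transitive_groups(related))
```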
[ICGEBnet] A protein domain library with structural, functional, ligand-binding, and topogenic segments of proteins.
scan for matches
[Bioweb Pasteur] A tool that searches a protein or nucleic acid sequence for patterns. The user may enter the sequence or data file to be searched, and the pattern to search for.
SMART: Simple Modular Architecture Research Tool
[EMBL] Performs a search for proteins with specific combinations of domains in defined taxonomic ranges, and allows for rapid identification and annotation of signaling domain sequences.
[Uppsala Software Factory] The input to the SPASM Server is a file in PDB format containing a motif. The server finds similar motifs by searching the PDB. The search is controlled by criteria for how strict the atomic superpositioning should be and which substitutions are allowed for some (or all) of the residue types.
[Medical Research Council Laboratory of Molecular Biology, Cambridge] The purpose of this server is to provide structural (and hence implied functional) assignments to protein sequences at the Structural Classification of Proteins (SCOP) superfamily level. SCOP provides a classification of all proteins in the Protein Data Bank (PDB). Whole genome assignment results are also provided.
UCLA/DOE Fold Server
[UCLA-DOE] The Motif-Based Fold Assignment method combines the functional information contained in the SwissProt keyword annotation with the PROSITE motif database to improve the performance of any conventional sequence/structure-based prediction.
Uncharacterized Protein Families List
[ExPASy] (UPF) A list of uncharacterized protein families and an index of members. Information provided includes: name, origin, general information and a description.