Sequence Alignment and Modeling System


SAM-T02 HMM WWW Servers

SAM 3.5 (July 2005) is available!
The SAM documentation (the 175-page manual is also available in PDF and PS) discusses the changes from previous versions.

If you are a college, university, U.S. government lab, or nonprofit, you can download the software from the SAM distribution page. If you are interested in SAM for commercial use, please request more information from sam-info@cse.ucsc.edu.


Martin Madera and Julian Gough have written a Perl converter between SAM and HMMer 2.0 formats. You can get it from them (be sure to read their excellent documentation!) or download a 10/24/2000 copy.
Please read the ISMB99 tutorial on using HMMs.
A linear hidden Markov model is a sequence of nodes, each corresponding to a column in a multiple alignment. In our HMMs, each node has a match state (square), insert state (diamond) and delete state (circle). Each sequence uses a series of these states to traverse the model from start to end. Using a match state indicates that the sequence has a character in that column, while using a delete state indicates that the sequence does not. Insert states allow sequences to have additional characters between columns. In many ways, these models correspond to profiles.
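The way a sequence traverses match, insert, and delete states can be illustrated with a minimal sketch. This is not SAM code: the function name `trace_path`, the `(kind, node)` path encoding, and the example path are assumptions made only for illustration.

```python
def trace_path(path, sequence, num_nodes):
    """Map a state path through a linear HMM onto alignment columns.

    path is a list of (kind, node) pairs, where kind is "M" (match),
    "I" (insert), or "D" (delete), and node is a 1-based node number.
    This encoding is hypothetical, not SAM's internal representation.

    Returns (columns, inserts):
      columns[k] is the character placed in column k+1, or "-" if that
              node was skipped with a delete state;
      inserts[k] holds the characters inserted after node k
              (k = 0 means before the first column).
    """
    columns = ["-"] * num_nodes
    inserts = [""] * (num_nodes + 1)
    pos = 0  # index of the next unread character in the sequence
    for kind, node in path:
        if kind == "M":
            # A match state consumes one character for this column.
            columns[node - 1] = sequence[pos]
            pos += 1
        elif kind == "I":
            # An insert state consumes one character between columns.
            inserts[node] += sequence[pos]
            pos += 1
        elif kind == "D":
            # A delete state consumes nothing; the column stays "-".
            pass
        else:
            raise ValueError(f"unknown state kind: {kind!r}")
    if pos != len(sequence):
        raise ValueError("path does not consume the whole sequence")
    return columns, inserts


# The sequence ACGT uses four nodes: C is inserted after node 1,
# and node 3 is skipped with a delete state.
example_path = [("M", 1), ("I", 1), ("M", 2), ("D", 3), ("M", 4)]
columns, inserts = trace_path(example_path, "ACGT", num_nodes=4)
print(columns)  # ['A', 'G', '-', 'T']
print(inserts)  # ['', 'C', '', '', '']
```

Tracing several sequences through the same model and stacking their `columns` lists gives rows that share column positions, which is the profile-like view described above.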

The primary advantage of these models over standard methods of sequence search is their ability to characterize an entire family of sequences. Each position has its own distribution over bases, and the transitions between states have their own position-specific probabilities. That is, these linear HMMs have position-dependent character distributions and position-dependent insertion and deletion gap penalties. Aligning each member of a family to a trained model automatically yields a multiple alignment among those sequences.
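As a rough illustration of position-dependent character distributions, the sketch below estimates one distribution per column from a tiny DNA alignment. It is not SAM's training procedure: the function name `column_distributions`, the four-letter alphabet, and the uniform pseudocount (a simple stand-in for a real regularizer) are all assumptions made for this example.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet for this example


def column_distributions(alignment, pseudocount=1.0):
    """Estimate a character distribution for each alignment column.

    alignment is a list of equal-length strings in which "-" marks a
    gap. Gaps are ignored when counting characters. A uniform
    pseudocount is added to every character so that unseen characters
    keep a nonzero probability; this is a simplification chosen for
    illustration only.
    """
    num_cols = len(alignment[0])
    if any(len(row) != num_cols for row in alignment):
        raise ValueError("all rows must have the same length")
    distributions = []
    for col in range(num_cols):
        counts = Counter(row[col] for row in alignment if row[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        distributions.append(
            {c: (counts[c] + pseudocount) / total for c in ALPHABET}
        )
    return distributions


# Three aligned sequences; the second has a gap in the last column.
dists = column_distributions(["ACG", "AC-", "ATG"])
for i, dist in enumerate(dists, start=1):
    print(i, {c: round(p, 3) for c, p in dist.items()})
```

Each column ends up with a different distribution, which is what lets a model of this kind score the same character differently depending on where it occurs.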

The SAM software system is a collection of tools for creating and using these models.

The algorithms and methods used by SAM and other HMM systems were initially described in several papers from the University of California, Santa Cruz. These papers, several of which are described below, are available in the UCSC Computational Biology group's Protein FTP directory.

The complete SAM documentation is available in compressed (.gz) PostScript and as a series of WWW pages. We also have a 2-page overview of SAM in PostScript.

SAM runs on Unix workstations. Building a model using SAM can take from minutes to several hours on a workstation, depending on the length of the model, the number of sequences, and other factors.

SAM makes use of UCSC's Dirichlet mixture regularizer research.

The creation and distribution of SAM have been supported in part by NSF grants CDA-9115268, IRI-9123692, DBI-9408579 and DBI-9808007; ONR grant N00014-91-J-1162; NIH grants GM17129 and 1 R01 GM068570-01; DOE grant DE-FG03-95ER62112; a grant from the Danish Natural Science Research Council; and the UCSC Center for Biomolecular Science and Engineering.

Sean Eddy has written another program suite based on these methods called HMMER, which may also be of interest. SAM includes conversion programs between the two systems' formats.

Hidden Markov models are used extensively in speech recognition.

UCSC Specific papers of interest (Click here to see abstracts as well)

Other papers and pointers of interest (please email new pointers!)


sam-info@cse.ucsc.edu
UCSC Computational Biology Group