This book is a very well-written overview of hidden Markov models and context-free grammar methods in computational biology. The authors have written a book that is useful to both biologists and mathematicians. Biologists with a background in probability theory equivalent to a senior-level course should be able to follow along without any trouble. The approach the authors take is very intuitive: they motivate the concepts with elementary examples before moving on to the more abstract definitions. Exercises abound in the book; they are straightforward enough to work out, and should be worked if one desires an in-depth understanding of the main text. In addition, there is a software package called HMMER, developed by one of the authors (Eddy), that is in the public domain and can be downloaded from the Internet. The package uses hidden Markov models to perform sequence analysis with the methods outlined in the book.
Probabilistic modeling has been applied to many different areas, including speech recognition, network performance analysis, and computational radiology. An overview of probabilistic modeling is given in the first chapter, and the authors effectively introduce the concepts without heavy abstract formalism, which for completeness they relegate to the last chapter of the book. Bayesian parameter estimation is introduced as well as maximum likelihood estimation. The authors take a pragmatic attitude toward the utility of these different approaches, and both are developed in the book.
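To make the contrast between the two estimation styles concrete, here is a minimal sketch of my own, not code from the book. It estimates nucleotide frequencies from a short sequence, first by maximum likelihood and then as a posterior mean under a symmetric Dirichlet prior, which amounts to adding pseudocounts. The function names, the toy sequence, and the pseudocount of 1 are assumptions chosen for illustration.

```python
from collections import Counter

def ml_estimate(sequence, alphabet="ACGT"):
    """Maximum likelihood: the observed frequency of each symbol."""
    counts = Counter(sequence)
    total = sum(counts[a] for a in alphabet)
    return {a: counts[a] / total for a in alphabet}

def posterior_mean_estimate(sequence, alphabet="ACGT", pseudocount=1.0):
    """Posterior mean under a symmetric Dirichlet prior.

    The prior shows up as a pseudocount added to every observed count
    (the value 1.0 is an illustrative assumption).
    """
    counts = Counter(sequence)
    total = sum(counts[a] for a in alphabet) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}

ml = ml_estimate("AACA")
bayes = posterior_mean_estimate("AACA")
print(ml)     # G and T get probability zero because they were never seen
print(bayes)  # every symbol keeps some probability mass
```

With so little data the maximum likelihood estimate assigns zero probability to unseen symbols, while the pseudocount version does not, which is the practical reason for developing both.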
This is followed by a treatment of pairwise alignment in Chapter 2, which begins with substitution matrices. The authors point out, via some exercises, the role of physics in influencing particular alignments (hydrophobicity, for example). Global alignment via the Gotoh algorithm and local alignment via the Smith-Waterman algorithm are both discussed very effectively. Finite state machines with accompanying diagrams are used to discuss dynamic programming approaches to sequence alignment. The BLAST and FASTA packages are briefly discussed, along with the PAM and BLOSUM matrices.
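For readers who have not seen local alignment written out, the following is a minimal sketch of the Smith-Waterman recurrence. It assumes a simple match/mismatch score and a linear gap penalty rather than a PAM or BLOSUM matrix with affine gaps, and it returns only the best score, not the alignment. The function name and the scoring values are my own illustrative choices.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh local alignment here
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

The zero in the maximum is what makes the alignment local: a poorly scoring prefix is discarded instead of being carried forward.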
Hidden Markov models are treated thoroughly in Chapter 3, with the Viterbi and Baum-Welch algorithms playing the central role. Hidden Markov models are then used in Chapter 4 for pairwise alignment. State diagrams are again used very effectively to illustrate the relevant ideas. Profile hidden Markov models, which according to the authors are the most popular application of hidden Markov models, are treated in detail in Chapter 5. A very surprising application of Voronoi diagrams from computational geometry to weighting training sequences is given.
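As a rough illustration of the Viterbi algorithm, here is a minimal log-space sketch for a discrete hidden Markov model. It is not the book's code or HMMER's. It assumes a non-empty observation sequence and strictly positive probabilities (so that every logarithm is defined), and the two-state coin model at the bottom is a made-up example.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log probability)."""
    # v[t][s]: log probability of the best path that ends in state s at time t
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
          for s in states}]
    back = []  # back[t][s]: best predecessor of state s at time t + 1
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: v[-1][r] + math.log(trans_p[r][s]))
            scores[s] = (v[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        v.append(scores)
        back.append(pointers)
    # Trace back from the best final state
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, v[-1][last]

# Hypothetical model: a fair coin and a coin biased toward heads
coin_states = ("Fair", "Biased")
coin_start = {"Fair": 0.5, "Biased": 0.5}
coin_trans = {"Fair": {"Fair": 0.9, "Biased": 0.1},
              "Biased": {"Fair": 0.1, "Biased": 0.9}}
coin_emit = {"Fair": {"H": 0.5, "T": 0.5},
             "Biased": {"H": 0.9, "T": 0.1}}
print(viterbi("HHHHTTTT", coin_states, coin_start, coin_trans, coin_emit))
```

Working in log space avoids the numerical underflow that products of many small probabilities would otherwise cause on long sequences.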
Several different approaches to multiple sequence alignment, such as Barton-Sternberg, CLUSTALW, Feng-Doolittle, MSA, simulated annealing, and Gibbs sampling, are covered in Chapter 6. The chapter is very well written, the only disappointment being that it contains just one exercise. Phylogenetic trees are covered in Chapter 7, with emphasis placed on tree-building algorithms using parsimony. Chapter 8 discusses the same topic from a probabilistic perspective. To me this was the most interesting part of the book, as it connects the sequence alignment algorithms with evolutionary models.
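To give a flavor of what parsimony means computationally, here is a minimal sketch of the Fitch counting rule for a single site on a fixed rooted binary tree. This is my own illustration, not the book's presentation. The nested-tuple tree encoding, the function name, and the leaf names are assumptions.

```python
def fitch_cost(tree, leaf_states):
    """Return (candidate state set, minimum number of substitutions).

    tree is either a leaf name (str) or a pair (left, right) of subtrees;
    leaf_states maps each leaf name to its observed character.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch_cost(tree[0], leaf_states)
    right_set, right_cost = fitch_cost(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra substitution needed
        return common, left_cost + right_cost
    # The children disagree: one substitution is forced on this branch pair
    return left_set | right_set, left_cost + right_cost + 1

tree = (("a", "b"), ("c", "d"))
print(fitch_cost(tree, {"a": "A", "b": "A", "c": "G", "d": "G"}))
```

A tree-building method based on parsimony would evaluate this cost over many candidate topologies and all sites, and keep the tree with the smallest total.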
The authors switch gears starting with the next chapter on transformational grammars. It is intriguing to see how concepts used in compiler construction can be generalized to the probabilistic case and then applied to computational biology. The PROSITE database is given as an example of the application of regular grammars to sequence matching. This chapter is fascinating reading, and there are some straightforward exercises illustrating the main points.
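The link between PROSITE patterns and regular grammars can be seen by translating a pattern into an ordinary regular expression. The sketch below is my own and handles only a subset of the PROSITE pattern syntax: `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repetition, and `<` / `>` as terminus anchors. The function name is hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Translate a subset of PROSITE pattern syntax into a regex string."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition count such as x(2) or x(2,4)
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"      # forbidden residues
        # Single letters and [..] classes are already valid regex syntax
        count = "{" + repeat + "}" if repeat else ""
        parts.append(prefix + core + count + suffix)
    return "".join(parts)

regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex, bool(re.search(regex, "AANGSAA")))
```

Every construct in this subset maps onto a regular-expression construct, which is what one would expect if the patterns describe a regular language.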
The last topical chapter covers RNA structure analysis and introduces the concept of a pseudoknot. Pseudoknots are not to be confused with the usual knot constructions that can be applied to the topology of DNA; they result instead from the existence of non-nested base pairs in RNA sequences. The authors discuss many other techniques used in RNA sequence analysis and take care to point out which ones are more practical from a computational point of view. Surprisingly, genetic algorithms and algorithms based on Monte Carlo sampling are not discussed in the book, but the authors do give references for the interested reader.
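The non-nested condition is easy to state in code. In the minimal sketch below, a secondary structure is given as a list of base pairs (i, j) of sequence positions, and a pseudoknot is reported whenever two pairs cross, that is, i < k < j < l. The representation and the function name are my own assumptions for illustration.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross, i.e. i < k < j < l."""
    # Order each pair as (smaller, larger) and sort by opening position
    normalized = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(normalized):
        for k, l in normalized[a + 1:]:
            if i < k < j < l:
                return True
    return False

print(has_pseudoknot([(0, 9), (1, 8), (2, 7)]))  # nested stem
print(has_pseudoknot([(0, 5), (3, 8)]))          # crossing pairs
```

Nested pairs are exactly what a context-free grammar can generate, which is why crossing pairs fall outside the grammar-based methods and need separate treatment.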
The best attribute of this book is the authors' pragmatic view of how mathematics can be applied to problems in computational biology. They are not dogmatic about any particular approach, but instead fit the algorithm to the problem at hand.