Bioinformatics
The Machine Learning Approach
Summary
Machine learning approaches (e.g., neural networks, hidden Markov models, and belief networks) are ideally suited for areas where there is a lot of data but little theory, which is exactly the situation in molecular biology. As with its predecessor, statistical model fitting, the goal in machine learning is to extract useful information from a body of data by building good probabilistic models. The particular twist behind machine learning, however, is to automate the process as much as possible.
In this book, Pierre Baldi and Søren Brunak present the key machine learning approaches and apply them to the computational problems encountered in the analysis of biological data. The book is aimed at two types of researchers and students. First are the biologists and biochemists who need to understand new data-driven algorithms, such as neural networks and hidden Markov models, in the context of biological sequences and their molecular structure and function. Second are those with a primary background in physics, mathematics, statistics, or computer science who need to know more about specific applications in molecular biology.
Table of contents
- Series Foreword
- Preface
- 1 Introduction
- 1.1 Biological Data in Digital Symbol Sequences
- 1.2 Genomes--Diversity, Size, and Structure
- 1.3 Proteins and Proteomes
- 1.4 On the Information Content of Biological Sequences
- 1.5 Prediction of Molecular Function and Structure
- 2 Machine Learning Foundations: The Probabilistic Framework
- 2.1 Introduction: Bayesian Modeling
- 2.2 The Cox-Jaynes Axioms
- 2.3 Bayesian Inference and Induction
- 2.4 Model Structures: Graphical Models and Other Tricks
- 2.5 Summary
- 3 Probabilistic Modeling and Inference: Examples
- 3.1 The Simplest Sequence Models
- 3.2 Statistical Mechanics
- 4 Machine Learning Algorithms
- 4.1 Introduction
- 4.2 Dynamic Programming
- 4.3 Gradient Descent
- 4.4 EM/GEM Algorithms
- 4.5 Markov Chain Monte Carlo Methods
- 4.6 Simulated Annealing
- 4.7 Evolutionary and Genetic Algorithms
- 4.8 Learning Algorithms: Miscellaneous Aspects
- 5 Neural Networks: The Theory
- 5.1 Introduction
- 5.2 Universal Approximation Properties
- 5.3 Priors and Likelihoods
- 5.4 Learning Algorithms: Backpropagation
- 6 Neural Networks: Applications
- 6.1 Sequence Encoding and Output Interpretation
- 6.2 Prediction of Protein Secondary Structure
- 6.3 Prediction of Signal Peptides and Their Cleavage Sites
- 6.4 Applications for DNA and RNA Nucleotide Sequences
- 7 Hidden Markov Models: The Theory
- 7.1 Introduction
- 7.2 Prior Information and Initialization
- 7.3 Likelihood and Basic Algorithms
- 7.4 Learning Algorithms
- 7.5 Applications of HMMs: General Aspects
- 8 Hidden Markov Models: Applications
- 8.1 Protein Applications
- 8.2 DNA and RNA Applications
- 8.3 Conclusion: Advantages and Limitations of HMMs
- 9 Hybrid Systems: Hidden Markov Models and Neural Networks
- 9.1 Introduction to Hybrid Models
- 9.2 The Single-Model Case
- 9.3 The Multiple-Model Case
- 9.4 Simulation Results
- 9.5 Summary
- 10 Probabilistic Models of Evolution: Phylogenetic Trees
- 10.1 Introduction to Probabilistic Models of Evolution
- 10.2 Substitution Probabilities and Evolutionary Rates
- 10.3 Rates of Evolution
- 10.4 Data Likelihood
- 10.5 Optimal Trees and Learning
- 10.6 Parsimony
- 10.7 Extensions
- 11 Stochastic Grammars and Linguistics
- 11.1 Introduction to Formal Grammars
- 11.2 Formal Grammars and the Chomsky Hierarchy
- 11.3 Applications of Grammars to Biological Sequences
- 11.4 Prior Information and Initialization
- 11.5 Likelihood
- 11.6 Learning Algorithms
- 11.7 Applications of SCFGs
- 11.8 Experiments
- 11.9 Future Directions
- 12 Internet Resources and Public Databases
- 12.1 A Rapidly Changing Set of Resources
- 12.2 Databases over Databases and Tools
- 12.3 Databases over Databases
- 12.4 Databases
- 12.5 Sequence Similarity Searches
- 12.6 Alignment
- 12.7 Selected Prediction Servers
- 12.8 Molecular Biology Software Links
- 12.9 Ph.D. Courses over the Internet
- 12.10 HMM/NN Simulator
- A Statistics
- A.1 Decision Theory and Loss Functions
- A.2 Quadratic Loss Functions
- A.3 The Bias/Variance Trade-off
- A.4 Combining Estimators
- A.5 Error Bars
- A.6 Sufficient Statistics
- A.7 Exponential Family
- A.8 Gaussian Process Models
- A.9 Variational Methods
- B Information Theory, Entropy, and Relative Entropy
- B.1 Entropy
- B.2 Relative Entropy
- B.3 Mutual Information
- B.4 Jensen's Inequality
- B.5 Maximum Entropy
- B.6 Minimum Relative Entropy
- C Probabilistic Graphical Models
- C.1 Notation and Preliminaries
- C.2 The Undirected Case: Markov Random Fields
- C.3 The Directed Case: Bayesian Networks
- D HMM Technicalities, Scaling, Periodic Architectures, State Functions, and Dirichlet Mixtures
- D.1 Scaling
- D.2 Periodic Architectures
- D.3 State Functions: Bendability
- D.4 Dirichlet Mixtures
- E List of Main Symbols and Abbreviations
- References
- Index
Technical specifications
PAPER | |
Publisher(s) | The MIT Press |
Author(s) | Pierre Baldi, Søren Brunak |
Publication date | 10/05/1998 |
Number of pages | 232 |
EAN13 | 9780262024426 |