Bioinformatics

The Machine Learning Approach

Pierre Baldi, Søren Brunak

232 pages, published 10 May 1998

Unavailable

Summary

An unprecedented wealth of data is being generated by genome sequencing projects and other experimental efforts to determine the structure and function of biological molecules. The demands and opportunities for interpreting these data are expanding more than ever. Biotechnology, pharmacology, and medicine will be particularly affected by the new results and the increased understanding of life at the molecular level. Bioinformatics is the development and application of computer methods for analysis, interpretation, and prediction, as well as for the design of experiments. It has emerged as a strategic frontier between biology and computer science.

Machine learning approaches (e.g., neural networks, hidden Markov models, and belief networks) are ideally suited for areas where there is a lot of data but little theory--and this is exactly the situation in molecular biology. As with its predecessor, statistical model fitting, the goal in machine learning is to extract useful information from a body of data by building good probabilistic models. The particular twist behind machine learning, however, is to automate the process as much as possible.

In this book, Pierre Baldi and Søren Brunak present the key machine learning approaches and apply them to the computational problems encountered in the analysis of biological data. The book is aimed at two types of researchers and students. First are the biologists and biochemists who need to understand new data-driven algorithms, such as neural networks and hidden Markov models, in the context of biological sequences and their molecular structure and function. Second are those with a primary background in physics, mathematics, statistics, or computer science who need to know more about specific applications in molecular biology.

Table of contents

Series Foreword
Preface
1 Introduction: 1.1 Biological Data in Digital Symbol Sequences; 1.2 Genomes--Diversity, Size, and Structure; 1.3 Proteins and Proteomes; 1.4 On the Information Content of Biological Sequences; 1.5 Prediction of Molecular Function and Structure
2 Machine Learning Foundations: The Probabilistic Framework: 2.1 Introduction: Bayesian Modeling; 2.2 The Cox-Jaynes Axioms; 2.3 Bayesian Inference and Induction; 2.4 Model Structures: Graphical Models and Other Tricks; 2.5 Summary
3 Probabilistic Modeling and Inference: Examples: 3.1 The Simplest Sequence Models; 3.2 Statistical Mechanics
4 Machine Learning Algorithms: 4.1 Introduction; 4.2 Dynamic Programming; 4.3 Gradient Descent; 4.4 EM/GEM Algorithms; 4.5 Markov Chain Monte Carlo Methods; 4.6 Simulated Annealing; 4.7 Evolutionary and Genetic Algorithms; 4.8 Learning Algorithms: Miscellaneous Aspects
5 Neural Networks: The Theory: 5.1 Introduction; 5.2 Universal Approximation Properties; 5.3 Priors and Likelihoods; 5.4 Learning Algorithms: Backpropagation
6 Neural Networks: Applications: 6.1 Sequence Encoding and Output Interpretation; 6.2 Prediction of Protein Secondary Structure; 6.3 Prediction of Signal Peptides and Their Cleavage Sites; 6.4 Applications for DNA and RNA Nucleotide Sequences
7 Hidden Markov Models: The Theory: 7.1 Introduction; 7.2 Prior Information and Initialization; 7.3 Likelihood and Basic Algorithms; 7.4 Learning Algorithms; 7.5 Applications of HMMs: General Aspects
8 Hidden Markov Models: Applications: 8.1 Protein Applications; 8.2 DNA and RNA Applications; 8.3 Conclusion: Advantages and Limitations of HMMs
9 Hybrid Systems: Hidden Markov Models and Neural Networks: 9.1 Introduction to Hybrid Models; 9.2 The Single-Model Case; 9.3 The Multiple-Model Case; 9.4 Simulation Results; 9.5 Summary
10 Probabilistic Models of Evolution: Phylogenetic Trees: 10.1 Introduction to Probabilistic Models of Evolution; 10.2 Substitution Probabilities and Evolutionary Rates; 10.3 Rates of Evolution; 10.4 Data Likelihood; 10.5 Optimal Trees and Learning; 10.6 Parsimony; 10.7 Extensions
11 Stochastic Grammars and Linguistics: 11.1 Introduction to Formal Grammars; 11.2 Formal Grammars and the Chomsky Hierarchy; 11.3 Applications of Grammars to Biological Sequences; 11.4 Prior Information and Initialization; 11.5 Likelihood; 11.6 Learning Algorithms; 11.7 Applications of SCFGs; 11.8 Experiments; 11.9 Future Directions
12 Internet Resources and Public Databases: 12.1 A Rapidly Changing Set of Resources; 12.2 Databases over Databases and Tools; 12.3 Databases over Databases; 12.4 Databases; 12.5 Sequence Similarity Searches; 12.6 Alignment; 12.7 Selected Prediction Servers; 12.8 Molecular Biology Software Links; 12.9 Ph.D. Courses over the Internet; 12.10 HMM/NN Simulator
A Statistics: A.1 Decision Theory and Loss Functions; A.2 Quadratic Loss Functions; A.3 The Bias/Variance Trade-off; A.4 Combining Estimators; A.5 Error Bars; A.6 Sufficient Statistics; A.7 Exponential Family; A.8 Gaussian Process Models; A.9 Variational Methods
B Information Theory, Entropy, and Relative Entropy: B.1 Entropy; B.2 Relative Entropy; B.3 Mutual Information; B.4 Jensen's Inequality; B.5 Maximum Entropy; B.6 Minimum Relative Entropy
C Probabilistic Graphical Models: C.1 Notation and Preliminaries; C.2 The Undirected Case: Markov Random Fields; C.3 The Directed Case: Bayesian Networks
D HMM Technicalities, Scaling, Periodic Architectures, State Functions, and Dirichlet Mixtures: D.1 Scaling; D.2 Periodic Architectures; D.3 State Functions: Bendability; D.4 Dirichlet Mixtures
E List of Main Symbols and Abbreviations
References
Index

Technical details

	PRINT
Publisher(s)	The MIT Press
Author(s)	Pierre Baldi, Søren Brunak
Published	10 May 1998
Pages	232
EAN13	9780262024426
