Algebraic Geometry and Statistical Learning Theory (Cambridge Monographs on Applied and Computational Mathematics) (English) Hardcover – 2009/8/13
Sure to be influential, this book lays the foundations for the use of algebraic geometry in statistical learning theory. Many widely used statistical models and learning machines applied to information science have a parameter space that is singular: mixture models, neural networks, HMMs, Bayesian networks, and stochastic context-free grammars are major examples. Algebraic geometry and singularity theory provide the necessary tools for studying such non-smooth models. Four main formulas are established:

1. the log likelihood function can be given a common standard form using resolution of singularities, even when applied to more complex models;
2. the asymptotic behaviour of the marginal likelihood, or 'the evidence', is derived based on zeta function theory;
3. new methods are derived to estimate the generalization errors in Bayes and Gibbs estimations from training errors;
4. the generalization errors of maximum likelihood and a posteriori methods are clarified by empirical process theory on algebraic varieties.
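To give a flavour of item 2, the book's central asymptotic result for the stochastic complexity (the negative logarithm of the evidence) can be sketched as follows; the notation here is a loose paraphrase, not the book's exact statement:

```latex
% Sketch of the asymptotic expansion of the stochastic complexity
%   F_n = -\log \int \prod_{i=1}^n p(X_i \mid w)\, \varphi(w)\, dw .
% \lambda is the real log canonical threshold (the "learning coefficient")
% of the Kullback-Leibler function K(w), and m is its multiplicity;
% for a regular d-dimensional model, \lambda = d/2 and m = 1.
F_n \;=\; n S_n \;+\; \lambda \log n \;-\; (m-1)\log\log n \;+\; O_p(1)
```

Here nS_n is the empirical entropy term, and the pair (λ, m) is read off from the poles of a zeta function built from the Kullback-Leibler distance; for regular models the expansion reduces to the familiar (d/2) log n penalty behind BIC.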
"Overall, the many insightful remarks and simple direct language make the book a pleasure to read."
Shaowei Lin, Mathematical Reviews
The initial sections of the book also review concepts from statistical learning theory, including the very important method of comparing two probability density functions: the Kullback-Leibler distance (called relative entropy in the physics literature). The reader will need a good grounding in functional analysis to follow the discussion, for example to appreciate the difference between convergence in different norms on function space. From a theoretical standpoint, learning can differ between norms, a fact that becomes readily apparent throughout the book (from a practical standpoint, however, it is difficult to distinguish between norms, because all data sets are finite).

Of particular importance in the early discussion is the need for a "singular" statistical learning theory, which, as the author shows, boils down to finding a mathematical formalism that can cope with learning problems where the Fisher information matrix is not positive definite (in this case there is no guarantee that unbiased estimators will be available). This is where (real) algebraic geometry comes in, for it allows the removal of the singularities in parameter space by recursively applying "blow-up" (birational) maps. The author lists several examples of singular theories, such as hidden Markov models, Boltzmann machines, and Bayesian networks. He also shows how to generalize some of the standard constructions of "ordinary" or "regular" statistical learning to singular theories, such as the Akaike information criterion and the Bayesian information criterion.

Some of the definitions he makes differ from what some readers are used to, such as the notion of stochastic complexity. In this book it is defined simply as the negative logarithm of the 'evidence', whereas in information theory it is a measure of the code length of a sequence of data relative to a family of models. The methods for calculating the stochastic complexity in both cases are of course similar.
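As a minimal, hypothetical illustration of what a non-positive-definite Fisher information matrix looks like (this toy model is my own, not one of the book's examples): consider the regression model y = a*b*x + noise, in which only the product a*b is identifiable. Its Fisher information matrix is degenerate everywhere, and vanishes entirely at the singular point a = b = 0.

```python
import numpy as np

def fisher_information(a, b, xs):
    """Fisher information of the toy regression model y = a*b*x + N(0, 1).

    The score components are
        d/da log p(y|x) = b*x*(y - a*b*x),
        d/db log p(y|x) = a*x*(y - a*b*x),
    so, averaging over x and over y ~ N(a*b*x, 1),
        I(a, b) = E[x^2] * [[b^2, a*b], [a*b, a^2]].
    """
    ex2 = np.mean(xs ** 2)
    return ex2 * np.array([[b * b, a * b],
                           [a * b, a * a]])

xs = np.linspace(-1.0, 1.0, 101)

# det I = 0 everywhere (only the product a*b is identifiable),
# and at the singular point a = b = 0 the matrix vanishes entirely.
for a, b in [(1.0, 0.5), (0.0, 0.0)]:
    I = fisher_information(a, b, xs)
    print((a, b), "eigenvalues:", np.linalg.eigvalsh(I))
```

The same rank collapse occurs, in subtler forms, in mixture models and neural networks when components coincide or hidden units are switched off.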
In singular theories, one must deal with such things as the divergence of the maximum likelihood estimator and the failure of asymptotic normality. The author shows how to deal with these situations after the singularities are resolved, and he gives a convincing argument as to why his strategies are generic enough to cover situations where the set of singular parameters, i.e. the set where the Fisher information matrix is degenerate, has measure zero. In this case, he correctly points out that one still needs to know if the true parameter is contained in the singular set, and this entails dealing with "non-generic" situations using hypothesis testing, etc.
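The key technical device behind this, given here as a rough paraphrase of the book's "standard form" rather than its precise statement, is Hironaka's resolution of singularities: there is a birational map w = g(u) under which the Kullback-Leibler distance to the true distribution becomes a normal crossing monomial, and the prior picks up a monomial Jacobian factor:

```latex
% Rough sketch of the standard form obtained after resolution of singularities.
% K(w) is the Kullback-Leibler distance from the true distribution to p(x|w),
% w = g(u) is the resolution map, \varphi is the prior, and b(u) > 0.
K(g(u)) \;=\; u_1^{2k_1} u_2^{2k_2} \cdots u_d^{2k_d},
\qquad
\varphi(g(u))\,|g'(u)| \;=\; b(u)\,\bigl|u_1^{h_1} u_2^{h_2} \cdots u_d^{h_d}\bigr|.
```

In these coordinates the evidence integral factors into one-dimensional integrals whose behaviour is governed by the poles of the zeta function ζ(z) = ∫ K(w)^z φ(w) dw, which is where the learning coefficient λ mentioned above comes from.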
Examples of singular learning machines are given towards the end of the book, one being a hidden Markov model and another a multilayer perceptron. The latter example is very important since slow learning in multilayer perceptrons is widely encountered in practice (and depends largely on the training samples). The author shows how this is related to the singularities in the parameter space of the learning machine, even when the true distribution lies outside the parametric model and the collection of parameters is finite. This example lends credence to the motto that "singularities affect learning", and the author goes on to show to what extent this is a "universal" phenomenon. By this he means that a "small" number of training samples brings out the complexity of the singular parameter space, while increasing the number of training samples brings out its simplicity. He concludes from this that the singularities make the learning curve smaller than that of any nonsingular learning machine. Most interestingly, he speculates that "brain-like systems utilize the effect of singularities in the real world."
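The learning-curve claim can be made concrete with one of the book's asymptotic results, again quoted loosely rather than verbatim: for Bayes estimation in the realizable case, the expected generalization error decays like

```latex
% Sketch of the Bayes learning curve.
% G_n is the generalization error (Kullback-Leibler distance from the true
% distribution to the Bayes predictive distribution after n samples), and
% \lambda is the learning coefficient from the expansion of F_n above.
\mathbb{E}[G_n] \;=\; \frac{\lambda}{n} \;+\; o\!\left(\frac{1}{n}\right),
\qquad \lambda \le \frac{d}{2},
```

whereas a regular d-parameter model has expected generalization error of roughly d/(2n). When the true parameter sits on a singularity, λ is typically strictly smaller than d/2, which is the precise sense in which singularities flatten the learning curve.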
Machine learning in particular has always been a set of magical techniques, especially when applied to linguistic problems.
Prof. Watanabe is paving the way to a world where it is an actual science.
This is one of the most important books I know for the future of computer science engineering.
The problem is that reading this requires an übernerd-level mathematics background *and* mindset (do not expect much pedagogy). So chances are, if you are of the mathematician breed you will love it; if your background is computer science engineering, it will make you cry.