An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) (English) Hardcover – 2013/8/12
Gareth James is a professor of data sciences and operations at the University of Southern California. He has published an extensive body of methodological work in the domain of statistical learning with particular emphasis on high-dimensional and functional data. The conceptual framework for this book grew out of his MBA elective courses in this area.
Daniela Witten is an associate professor of statistics and biostatistics at the University of Washington. Her research focuses largely on statistical machine learning in the high-dimensional setting, with an emphasis on unsupervised learning.
Trevor Hastie and Robert Tibshirani are professors of statistics at Stanford University, and are co-authors of the successful textbook Elements of Statistical Learning. Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap.
Since data science is a fast-moving field, there are people who want to jump on the deep learning bandwagon straightaway. This is not the correct way to enter the field. You have to have your statistical bases covered before you touch the more advanced topics. This is especially true for CS students, who learn more discrete math, which doesn't lend itself well to the world of AI/ML.
So for those learners I would recommend this book. If you consider yourself good at Maths and an advanced learner, I would recommend the authors' other book, Elements of Statistical Learning.
A set of tools used to analyze data. It includes most general techniques in AI, excluding Neural Networks. Kinds of tools covered: Regression, Logistic Regression, Linear Discriminant Analysis, Decision Trees, Random Forests, Boosting, Cross-Validation, SVM, PCA, K-means clustering.
Standouts (Strong topics):
1) Great coverage of Linear Regression. Absolutely brilliant. For the first time in my life (I've been in data science for ten-odd years) I learnt about the t-statistic and F-statistic the way they should be taught.
2) Good mathematical coverage of cross validation
3) Good coverage of Logistic Regression, PCA, Random Forest and trees, Clustering.
Bonus: Great coverage of the relationship between SVM and logistic regression, and of the history of the hype behind kernel methods in SVM
Not so good parts:
1) I mentioned in my headline that I have a love-hate relationship with this book. The reason for the lower rating is that there were many parts where I had to look for external references. The book often leaves you in the wilderness of mathematics by stating a conclusion without a proof. To me this is the same mistake made by several Indian books, and it is unlike the authors' other book, ESL.
My advice to the authors is to cover a topic fully or not at all, and not to assume that the reader has no knowledge of mathematics, especially given -
2) The book is mathematically hard in parts. The authors don't treat the reader with kid gloves. Several times the summations used take some disambiguation to understand. This is, in my opinion, good. But there are other parts of the book where they do not assume the same level of expertise from the reader and just state a complex formula without derivation or justification. The consistency is not there.
3) It felt like different chapters in the book were written by different people, which would explain the difference in the level of mathematics and the tone of teaching. I would advise the authors to produce another edition in which one author does the additions and editing throughout and the tougher parts are moved to an appendix.
The copy I got from Springer was simply a delight to read. It was made of silky paper and page turning was so easy. It will be one of the books in my collection.
These authors are very famous. They are pioneers in the field of statistical learning. The term was coined by them to cover all methods of learning from data excluding neural networks (which came from the AI world).
The MOOC is available for free from Stanford Lagunita. Do check it out if you are buying the book. I easily recommend the book over the video lectures. The reason is that the book is the best "Introductory statistics for ML", but there are several better MOOCs than the Lagunita one for ML.
I would rate it 5/5 for applied learning, as the authors run a parallel stream through the book that teaches you R as well. For those of you who don't know, R was the lingua franca for data scientists before the TensorFlow age. Though it has marginally decreased in popularity since then, it is still the best non-production data science language available.
Note: Because several R paradigms (libraries) have changed since the book was published, I would not recommend it for learning R. It can give you a taste of R, but you will have to learn the language fully elsewhere. I recommend the MOOC The Analytics Edge for this.
The book is great for the right audience. Decide whether
1) You are medium to advanced in the field. Then buy ESL (Elements of Statistical Learning) over ISLR.
2) You are from a different field and are not thrown off by mathematical notation.
3) You are disappointed with how regular statistics books cover what Data Science requires.
4) You want to go "the right way" to learning AI and ML, and don't want to jump to the advanced topics straightaway without understanding the basics.
This is THE book for a first- or second-year undergraduate taking a first course in AI or ML. But you have to be ready to work through another book after this. The foundations you learn in this book will hold you steady as you trudge into the world of data science.
Free availability of book:
The authors have officially made the book available for free as a PDF from the book website. I have personally found it extremely hard to read books on a laptop because our computers are filled with all kinds of distractions. Further, the print quality of the book was extremely good.
But if you cannot afford it (as a college student, etc.), then by all means read the PDF.
Note: Advanced learners can straightaway go for the book by the same authors, ESL (Elements of Statistical Learning).
Note 2: I didn't have the opportunity to work through the exercises, but I have to note that they are extensive, which again makes the book suitable for college learners.
Would be nice to have a chapter on using the tidyverse to simplify tasks.
Nothing on cleaning data in here; you'll need another reference for that.