Programming Collective Intelligence: Building Smart Web 2.0 Applications Paperback – August 26, 2007
Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.
Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:
- Collaborative filtering techniques that enable online retailers to recommend products or media
- Methods of clustering to detect groups of similar items in a large dataset
- Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
- Optimization algorithms that search millions of possible solutions to a problem and choose the best one
- Bayesian filtering, used in spam filters for classifying documents based on word types and other features
- Using decision trees not only to make predictions, but to model the way decisions are made
- Predicting numerical values rather than classifications to build price models
- Support vector machines to match people in online dating sites
- Non-negative matrix factorization to find the independent features in a dataset
- Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game
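As a rough illustration of the first item in the list above, collaborative filtering, here is a minimal sketch in Python, the language the book's examples use. It is not code from the book: the ratings data and the function names `pearson` and `recommend` are hypothetical, and the approach shown (user-based filtering weighted by Pearson correlation) is just one common way to do it.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 2.0, "C": 3.0, "D": 5.0},
    "carol": {"A": 1.0, "B": 5.0, "C": 2.0, "D": 1.0},
}


def pearson(prefs, u, v):
    """Pearson correlation of two users over the items both have rated."""
    shared = [item for item in prefs[u] if item in prefs[v]]
    n = len(shared)
    if n < 2:
        return 0.0  # not enough overlap to measure similarity
    xs = [prefs[u][item] for item in shared]
    ys = [prefs[v][item] for item in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a user who rates everything the same carries no signal
    return cov / sqrt(var_x * var_y)


def recommend(prefs, user):
    """Rank items the user has not rated by similarity-weighted average rating."""
    totals = {}
    sim_sums = {}
    for other in prefs:
        if other == user:
            continue
        sim = pearson(prefs, user, other)
        if sim <= 0:
            continue  # ignore users with no or opposite taste
        for item, rating in prefs[other].items():
            if item not in prefs[user]:
                totals[item] = totals.get(item, 0.0) + sim * rating
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    ranked = [(totals[item] / sim_sums[item], item) for item in totals]
    ranked.sort(reverse=True)
    return ranked


print(recommend(ratings, "alice"))
```

In this toy data, bob's ratings track alice's closely while carol's run the other way, so only bob's opinion of item D contributes to alice's recommendation. A sketch like this compares every pair of users and is only suitable for small datasets.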
"Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details."
-- Dan Russell, Google
"Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths."
-- Tim Wolters, CTO, Collective Intellect
Toby Segaran is the author of Programming Collective Intelligence, a very popular O'Reilly title. He was the founder of Incellico, a biotech software company later acquired by Genstruct. He currently holds the title of Data Magnate at Metaweb Technologies and is a frequent speaker at technology conferences.
This book is for those who realise that programming, no matter what the language, can do amazing things once you understand some simple concepts for telling a story through data. It gets you out of the mindset of, "I have some data stored here, and I will present it here". Instead, you start asking, "I have some data stored here; how do I show it, create understanding, explore, wedge out, predict, and recommend from it here?"
Most of the topics presented in this book are not new in any sense; however, they are not old either. They're tried and proven methods for creating meaning from datasets. They will be used for decades to come because they work! There are other books on the topics presented (like I said, they are not new); however, the simplicity of Python provides a frictionless entry for anyone wanting to get up and running without a bloated IDE or framework to make it happen.
To those who are thinking, "Well, it's Python, and Python can't do X", I say: a language does not determine what can and cannot be done; the developer does. At the end of the day, the capability of the developer determines what the language can and can't do. If it seriously can't do something, then build an extension to the language! With this thinking you can port what is presented in this book to any language. Python was chosen for its simple constructs and readability.
If you're ever going to buy a book on this topic, buy this one. Not the Kindle edition, but the hard copy. I've found that the Kindle version doesn't present the code sections well.
Overall, this book is a great reference and also a great primer if you want to go deeper. It will allow you to tackle your next project with a different mindset and allow your users to discover and learn new things about their online surroundings and themselves!
- The book gives a good survey of common Machine Learning algorithms. It explains what kind of problems these algorithms are good for. That's perfect for someone who wants to get a quick overview and has no background in Machine Learning.
- The book is very easy to understand. The writing style is very casual. Even people without formal training in Computer Science should have no problem. The only thing that's required is basic programming knowledge, preferably in Python.
- Among all of the theoretical ML books out there it's refreshing to find a book that applies the algorithms to real-world problems.
Now the negative points. The following are not necessarily negatives for everyone, as it really depends on what you were looking for in this book. However, I was expecting a bit more, and was disappointed about the following:
- Half of the book is code. I just don't see the point in printing full listings of Python code. Why not give shorter pseudocode and make the Python code available on the website? The long code listings only obfuscate the ideas instead of demonstrating how to apply them. If you take away the code listings there are maybe 150 pages of "real" content left.
- The very casual and easy-to-understand style comes at a price. The book does not go into the mathematical details of any of the algorithms. I understand that this wasn't the book's intention to begin with, but I would argue that some mathematical background is necessary in order to efficiently apply complex algorithms. If you want to apply the algorithms presented in the book to slightly different or more complex problems, or wish to understand the advantages and disadvantages of each of the algorithms, you'll have to know the basic math behind them.
- The algorithms are very poorly implemented. Looking at some of the code makes me cringe. While the code in the book may work for "Building a search engine" for a few thousand pages, or optimizing problems with a handful of variables, it certainly won't work for more interesting problems that involve real-world data, which is orders of magnitude larger. And the real-world scale is where these algorithms actually become interesting. The code in this book will only work for small examples where efficiency plays no role. I understand the author wanted to keep the code as simple as possible, but in my opinion a few notes about how the algorithms can be made more efficient would have been necessary. I can see many people trying to apply these algorithms to their real-world data and getting stuck because of the poor implementation.
"Collective intelligence is shared or group intelligence that emerges from the collaboration, collective efforts, and competition of many individuals and appears in consensus decision making." (from Wikipedia)
Also you can do anything with scikit-learn nowadays. Outdated.