When I first started working on Web application development teams, I was a bit overwhelmed by the number of skills and range of knowledge needed to drive the project through establishing the technical foundations; design, development and testing iterations; to final staged release. Lots of things got discussed in team meetings that I had barely a clue about. Not only do I wish I had this book *then,* I wish all members of my teams could have it *now.*
Cal Henderson has a wide background in the area and is lead developer for Flickr, the photo sharing site that has gained deserved popularity and is often mentioned as the quintessential Web 2.0 application.
The author does an excellent job of spreading out before you the whole process at a high enough level so the book can be valuable for managers, designers, and all sorts of people involved in putting out the final product.
His focus is on program design and design implementation issues, not programming as such. Code is not neglected. Many points regarding design implementation are made with code examples and solutions.
I find this book so personally valuable in grounding me in a complex process that I give it the highest Amazon rating, even though I found aspects of the book's organization to be completely incomprehensible. This is a book about scalable Web sites and applications, but the author does not define scalability, nor does he deal with the broad Web issues (like the scaling myth), until 60% into the book! Chapter 9, titled "Scaling Web Applications", should have been much closer to the beginning since it was a high-level view with no code, as the other chapters. I was also befuddled by the placement of the chapter on internationalization, localization, and Unicode so early in the book -- even before the chapter on data integrity and security. There is nothing like a mind-numbing discussion of Unicode glyphs and graphemes to kill the pacing of a book! Skip the chapter entirely or read it last.
Organizational anomalies aside, the author has a good writing style and he does not view humor as a blunt instrument. His four-page analogy between layered architecture and an English trifle was worth the space he took. That's high praise from a guy who does a job that Henderson likens to whipped cream.
Unfortunately, I'm finding that there are still some in the software industry--from "two guys in a garage" to the largest corporation--who don't know, follow, or believe in software best practices. Suddenly when something goes wrong (e.g. the wrong version of a file was deployed, changes can't be rolled back, the application won't scale), everyone scrambles in an effort to figure out what happened. Oftentimes, if simple software practices were followed, many of these issues would never surface.
This book does a tremendous job of identifying many of these best practices, showing how to easily implement them in almost any situation, and discussing application scaling techniques. As the book mentions, scalability is made up of three characteristics:
* The application can accommodate an increase in users
* The application can accommodate an increase in data
* The application is maintainable
Like any good book on application scalability, this one begins by discussing the tiered architecture that is common in so many modern applications and is a fundamental step in creating any truly scalable application. This leads into a discussion of source control--another fundamental part of keeping the application maintainable.
The author briefly discusses security issues by touching on cross-site scripting (XSS), SQL injection, and the like. The discussion is well written and thorough for the amount of time spent on the topic.
Finally, the author discusses many of the issues related to deployment of web applications, including system monitoring and alerting. There is also an excellent section on load balancing, techniques to keep databases scalable, and caching. The author ties the final section together by showing how to take data from a live production environment and use that information to continually improve the application.
This is an excellent read--a must if you are in the business of creating web applications. Whether your applications expect loads of 10 users or a million users, the techniques discussed in this book will make your application perform better and be easier to maintain.
The title should be "Overview Of Building Scalable Web Sites".
I give it 2 stars not because it is a bad book but because I was tricked into thinking it was going to be useful as a scalable website builder. What you should do is look at the table of contents and research those topics and not bother reading this book.
The book is more of an overview of the topics you need to consider when building scalable web sites. For example, if you are building a scalable website and the powers that be put someone who knows nothing about web sites in charge of managing you, this really is the perfect book to give to your new manager. Your new manager will get a clue, but your new manager won't know a thing about HOW to build anything, but will know ABOUT what is being built.
The thing that got me is that the first 188 pages of the book just don't seem all that useful. On page 1 there is a definition of "What Is a Web Application". I'd expect a book like this to assume you know what it is (it even suggests you do know what it is), and to save space by not bothering to write about it.
Some sections and my summaries:
Layered Software Architecture - could summarize into: DB layer, app code, html, css on top
Layered Technologies - get appropriate book on actual topic such as DB book, and use a template language
Getting from A to B - separate program from markup, use a template system
Hardware Platforms - dedicated, co-located, self hosting, space/power consumption, networking
It took 26 pages to get through all of that. Indeed they are all very important topics (for the web builder and your new manager to know), but as a builder (if you've gone past the first "hello world" website) you should really know that you'll be using a database and writing web app code and using html and css. You should already know that in order to run a website, you'll need to run it on a computer which takes up space and power and needs to be networked. It's good to know that dedicated/colo hosting exists, but no need to write so much about it.
It's almost like a book titled "Building huge skyscrapers" that then goes on to say you are going to need construction equipment, concrete and steel. You'd hope the person interested in that book has already built houses or commercial buildings and has used construction equipment and concrete and steel already. I'm probably being too harsh here, but that's the gist of it.
My "favorite" chapter is 3, "Development Environments". Use source control, have a good build system, track bugs. Those are very good rules, but to have 19 pages on source control AND 3 of those pages on RCS/CVS, it's like, "Are you kidding me? Isn't this book about building scalable websites?". Nowadays people probably have never even heard of RCS... (the book is a bit dated though).
Chapter 9, Scaling Web Applications, has some stuff about load balancing and database replication/master-slave info, but after reading the chapter, you still won't have the first clue of what load balancing system to use or how to set up database replication or clustering... but you'll know that load balancing and database replication exist and know a little about them.
The actual best chapter is chapter 10, Statistics, Monitoring and Alerting; there is information there that is useful. For your own sake though, look at the Nagios, Zabbix, etc. monitoring packages and that'll get you started in the right direction.
As for the reviews which say this book is technical, I couldn't disagree more. If it were actually technical, I wouldn't be so annoyed with this book. If it were technical, then you'd know HOW to do something after reading it...
In conclusion, I think it's a good overview on the topics involved, but it's not really about building anything, it's about some topics you need to know that are involved with building a scalable website.
I. Code Freeley
...it's all there.
Maybe it's my background, but I found the first seven chapters to be... dull, and not directly about scalability. To be honest, I almost set the book aside and considered it money NOT well-spent. Then things started to heat up in Chapter 8, and in Chapter 9 it all comes together. That one chapter (9) has the highest density of useful information about website scaling that I've ever seen. There are literally gems on every page.
Make no mistake. This book is more of an overview of the landscape, with brief asides that are clearly brain-dumps from his Flickr experience. The author manages to touch on every topic area that matters, and provide simple overviews of the options available and when they should be applied. In that sense it's more like an informal design patterns book (lots of "yeah, I knew that" and "Ah! I had a feeling there was a pattern there" moments), with just enough detail to let me do intelligent googling for deeper insights on analysis, design, and construction of scalable systems.
Chapters 8, 9, and 10 make the book worth every penny.
I came to "Building Scalable Web Sites" as our team was expanding, introducing two new developers to the existing pair of us. For that reason alone, I jumped at the opportunity to review this book, and once I realized it was from one of the designers of Flickr, that piqued my interest even more. We'd recently moved from hosting our family albums locally to a Flickr website, and so their background and scalability affected me directly.
The introductory chapter started off quite dry, as the author attempted to introduce "What is a Web Application", a step probably not necessary for this volume's intended audience.
The second chapter opened right up, however. Henderson's analogy of a trifle to describe Architecture was genius, with your sponge base all the way to your garnish of sprinkles on top. I went racing into the third chapter, excited about the prospects of Source Control (a problem with our current environment, and one I only see getting worse).
Unfortunately, the book slowed right back down again, dragging through overlong segments on Release Management, Issue Management Strategies, and the like. I took longer and longer breaks before coming back to this section, almost leaving the book to the side here.
The book continued in this fashion, some bits of great insight and interest, but scattered with wordy, heavy sections that seemed to strangle the pace. As a Higher Education Programmer, Unicode was completely irrelevant to my projects, but the section on Canonical Holes brought me right back in again.
SQL Injections kept me reading right along, but a whole chapter on email in your web application had me drifting again. In summary, Henderson goes into great detail where you need it in some great areas on Scalability, but I wouldn't read it straight through. Find the chapters that relate to your project or your goal, and you'll find a great resource.
The index is great for this purpose, with well-thought-out keywords that I've already found myself referencing even though I've only just finished the book. The lean of the volume is pretty heavily LAMP, with several Linux/Unix-only references and software leads, which would be great for some audiences, but in our ColdFusion/IIS environment, I found myself searching for a tool that was described in the book only to realize that it didn't support Windows Servers.
It also focused heavily on scaling up to millions of users, and I think many system administrators would be more interested in a quicker, dirtier look at taking their dozens to hundreds of users into the thousands and tens of thousands of users instead. Preparing that heavily for growth at these early stages would slow production far too much, in my opinion.
And enjoy the trifle analogy. Mmm...trifle.