If you are like me, you have probably dabbled in PHP and MySQL driven CMS solutions, simply because that combination is so prevalent. I consider myself to be primarily a front-end developer / graphic designer, and am not terribly interested in knowing every server-side language there is, so long as whatever I am using can get the job done. Suffice it to say that while I am eager to learn and expand my horizons, my exposure to more than LAMP has been limited.
So, when Jason Gilmore sent me a review copy of his latest book, that he co-authored with Robert Treat, I was glad to be learning about this powerful database. The title is Beginning PHP and PostgreSQL 8, and the structure of the text mirrors closely that of Jason's other best-selling book on PHP and MySQL 5, now in its second edition. To get a feel of the PHP aspect of both these books, read my other previous reviews on the First Edition and the Second Edition...
This saves me having to repeat myself, covering the PHP side of things as much. Suffice it to say that it is extensive and well worth reading. Now, allow me to focus on the second aspect of this book, PostgreSQL. While MySQL bosts being the world's most popular open source database, PostgreSQL has stake to the claim of being the world's most advanced open source database. Are these two in competition? Well, yes and no. I will expound further on that, but first a brief history lesson.
In 1986 at UC Berkeley, professor Michael Stonebreaker set out to build a better open source database than his previous project, called INGRES. Since the first project was a huge success, he decided to entitle the follow-up Post-GRES. So, while the first name was an acronym, Postgres is simply a derived nick-name that grew out of it. Postgres became so popular that providing tech support was becoming far too time consuming for Stonebreaker and his team. Eventually they ceased development to focus on teaching.
But, since it was released under the BSD license, development was picked up by Andrew Yu and Jolly Chen, who added in SQL functionality. Eventually this hybrid grew into the PostgreSQL database we know today. Their mantra has always been stability first, speed second. Whereas MySQL is built for responsiveness, PostgreSQL is designed to be rock-solid. This mentality is reflected in the choices of database mascots: a Dolphin vs. an Elephant.
You might have been hearing about PostgreSQL more lately because of the rising interest around the Django Project, which recommends PG as their database of choice. While it is of course driven by Python and not PHP, this further illustrates the scalable versatility of PostgreSQL. When reading this book, I was blown away by some of the server strain that PG can handle.
Consider these examples: Afilias Incorporated, the Internet registrar company responsible for managing the .info domain name extension, handles over 1000 database inserts per second! The NOAA weather service Weather.gov has scaled their operations across 150 PG servers. Whitepages.com runs PG databases exceeding 375 gigabytes with over 250 million rows. Big newspaper sites also run PG, such as Lawrence.com, KUSports.com and LJWorld.com.
Mega-churches such as FellowshipChurch.com also make use of PostgreSQL. When you have 20,000 people that go to your church, each one needing to check up on what's happening on a dynamically driven PHP website, that can be quite a server strain. To handle this, you either need an expensive proprietary platform, or a scalable open source solution. Fellowship opted to roll their own CMS and power it with PG.
Okay, so now you get the point that PostgreSQL is a force to be reckoned with. Allow me to touch briefly on the code aspects of what I thought was cool in this book. Robert has done a great job of flowing with Jason's proven writing-style. Many times, multi-author books differing topics can seem disjointed, but that is not the case here. He explains things very thoroughly without mincing words.
One of the cool features of PG is Multi Version Concurrency Control or MVCC for short. This allows "snapshots" of your database to be taken at set intervals, so that you can serve up cached data without repeatedly hitting the database, risking a hardware lock-up. This is commonly referred to as the "Digg Effect" or becoming a victim of being "Slashdotted," in which high-traffic websites send a flurry of incoming visitors to a lesser server which cannot handle the load.
PostgreSQL also supports MySQL style row-level locking, but the MVCC method is preferred because of its stability. Another nice feature is the ability to store commonly used queries, and even create abbreviations for frequently called procedures. You could think of it like referring to a best friend by a nick-name. This set of features has been native to PG for some time now, and is something that MySQL has recently implemented as of version 5.
The way I think of the differences between MySQL and PostgreSQL is that PG is a more transactional database. Meaning, if you want large enterprise level sites that handle many insertions and modifications throughout any given day, PostgreSQL is probably the way to go. MySQL can do this with with the InnoDB table type, but PG has better native support for it. PG also sports Foreign key support, Views, Stored procedures, Triggers, Unions and Full joins.
On the flip-side of that coin is that MySQL is typically a bit faster, has a more simplified database layout, and does not require vacuuming. Vacuum is a command that must be run from time to time, and typically is set as an automated process. One of the ways PG maintains consistent stability is through the previously mentioned MVCC method, which disperses duplicate data in order to avoid hitting one particular location repeatedly. Periodically, these duplicates need to be reigned in or vacuumed, to clean up disk space.
In comparing PostgreSQL with MySQL side by side, there is not a clear-cut winner, because it is a bit like comparing apples and oranges. If you want a fast and responsive site, MySQL is probably the way to go. If you are managing one that has a high amount of traffic and needs to handle enterprise level or "mission critical" data, PostgreSQL should be one of the solutions you consider. I think of it this way: MySQL is a ninja, PostgreSQL is a samurai. One is quick and nimble, the other more heavily armored. If you want to learn more about the latter, then this is the book for you; Also available as an eBook, PDF file from the Apress website...