Cassandra: The Definitive Guide (English) Paperback – 2016/7/22
With this hands-on guide, you'll learn how Apache Cassandra handles hundreds of terabytes of data while remaining highly available across multiple data centers, capabilities that have attracted Facebook, Twitter, and other data-intensive companies. Updated for Cassandra 3.0, this second edition provides the technical details and practical examples you need to assess this database management system and put it to work in a production environment.
Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra's nonrelational design, and pay special attention to data modeling. If you're a developer, DBA, application architect, or manager looking to solve a database scaling issue or future-proof your application, this guide shows you how to harness Cassandra's speed and flexibility.
Eben Hewitt is Director of Application Architecture at a publicly traded company where he is responsible for the design of their mission-critical, global-scale web, mobile and SOA integration projects. He has written several programming books, including Java SOA Cookbook (O'Reilly).
However, the book is in need of additional editing – it contains enough sections that are confusing, misleading and in some cases, completely wrong, that it is not really suitable as an authoritative reference or (as its title claims) a definitive guide.
A few examples:
Page 70 contains a warning about counters, stating that “the increment and decrement operators are not idempotent”, with no additional explanation. Without further explanation, this statement is useless to most people new to Cassandra because incrementing and decrementing are normally not idempotent operations – incrementing a counter twice should be expected to leave the counter in a state different than incrementing a counter once. The passage goes on to say “There is no operation to reset a counter directly, but you can approximate a reset by reading the counter value and decrementing by that value. Unfortunately, this is not guaranteed to work perfectly, as the counter may have been changed elsewhere in between reading and writing.” While that passage may be correct, it has nothing to do with idempotence; instead it is due to the fact that read-modify-write of counters is not performed atomically by Cassandra. As it happens, there may be an issue with Cassandra counters and idempotence in versions of Cassandra prior to 2.1, and with counter inaccuracies resulting from timeouts in all versions of Cassandra, but these issues are nowhere described in the book. The book’s handling of counters is deficient in other ways as well – e.g. no detailed examples are given to illustrate how counters might be profitably employed in a real-world data model.
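To make the distinction concrete, here is a minimal sketch using a toy in-memory dictionary rather than Cassandra itself. The function names and the `page_views` key are hypothetical, chosen only for illustration. It shows that repeating a plain write leaves the state unchanged while repeating an increment does not, and, separately, that the read-then-decrement "reset" goes wrong when another update lands in between.

```python
# Toy in-memory model (NOT Cassandra): all names here are illustrative assumptions.

def set_value(state, key, value):
    """Plain overwrite: applying it twice gives the same result as once."""
    new_state = dict(state)
    new_state[key] = value
    return new_state

def increment(state, key, delta):
    """Counter-style update: every application changes the result."""
    new_state = dict(state)
    new_state[key] = new_state.get(key, 0) + delta
    return new_state

start = {"page_views": 10}

# Idempotence: a retried overwrite is harmless, a retried increment double-counts.
set_once = set_value(start, "page_views", 11)
set_twice = set_value(set_once, "page_views", 11)
inc_once = increment(start, "page_views", 1)
inc_twice = increment(inc_once, "page_views", 1)
print(set_once == set_twice)   # overwrite is idempotent
print(inc_once == inc_twice)   # increment is not

# Separate issue: a non-atomic "reset" by reading, then decrementing by that value.
observed = inc_once["page_views"]                               # we read 11
after_other = increment(inc_once, "page_views", 5)              # another client adds 5
after_reset = increment(after_other, "page_views", -observed)   # we subtract 11
print(after_reset["page_views"])  # 5 remains instead of 0
```

The first half is the idempotence question (what happens on a retry); the second half is the read-modify-write race. The book's passage runs the two together.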
Even more concerning is the discussion of “wide rows”, which first occurs on page 59 and continues at various points throughout the book. Page 59 defines a wide row as a row that has “lots and lots (perhaps tens of thousands or even millions) of columns”. But the following page illustrates a wide row as being synonymous with a partition, i.e. the set of rows of a table that share the same values for the columns that compose the partition key. These are two different notions, and the book does not make it clear which is the correct definition of “wide row”. A later section of the book (on page 90) uses the hotel model (introduced in the logical data modeling section) as an example of the “wide row” model. However, no table in the hotel model has more than 7 columns, hardly “lots and lots”, so presumably this section is using “wide row” to mean “partition” rather than “a row with lots and lots of columns”.
More partition confusion occurs on page 97 under the heading “Calculating Partition Size”. We are warned that we need to calculate a maximum partition size to look for whether “our tables will have partitions that will be overly large”, and that “Cassandra’s hard limit is 2 billion cells per partition, but we’ll likely run into performance issues before reaching that point”. A few paragraphs later, the book calculates the partition size (in columns) of the available_rooms_by_hotel_date table from its hotel data model as the number of rows times the number of non-primary-key columns. For the number of rows, it uses 5000 hotels * 100 rooms/hotel * 730 days = 365,000,000. But this is the number of rows in the whole table. Since this table’s partition key is hotel_id, there is one partition per hotel, and so the number of rows per partition is actually 100 rooms/hotel * 730 days = 73,000, a far cry from 365,000,000!
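The arithmetic is easy to check. The sketch below uses only the figures quoted above (5000 hotels, 100 rooms per hotel, 730 days); the variable names are my own, and the number of non-primary-key columns is left out because it multiplies both totals equally.

```python
# Figures quoted from the book's example; variable names are illustrative.
HOTELS = 5000
ROOMS_PER_HOTEL = 100
DAYS = 730

# What the book computes: every row in the whole table.
rows_in_table = HOTELS * ROOMS_PER_HOTEL * DAYS

# What a partition holds when hotel_id is the partition key:
# only the rows belonging to a single hotel.
rows_per_partition = ROOMS_PER_HOTEL * DAYS

print(f"rows in table:      {rows_in_table:,}")       # 365,000,000
print(f"rows per partition: {rows_per_partition:,}")  # 73,000
print(f"overestimate:       {rows_in_table // rows_per_partition:,}x")  # 5,000x
```

In other words, the book's figure overstates the partition size by a factor equal to the number of hotels.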
Page 186 contains a misleading statement about inserting with lightweight transactions. It states that when inserting rows with the “with not exists” qualifier (written IF NOT EXISTS in CQL), if a row already exists with the same values for the primary key columns as the row that we are trying to insert, the CQL interpreter will return a failure, along with the “values that we tried to enter”. However, a few paragraphs above, it is said that “if a transaction fails because the existing values did not match the one you expected, Cassandra will include the current ones so you can decide whether to retry or abort without making an extra request”, which sounds like Cassandra is returning the values that are already in the database rather than the ones that we tried to enter.
A final example of misleading text occurs on page 305, where the sizing for machines used as Cassandra nodes is described. This section recommends that Cassandra nodes in development environments should have at least 2 cores and 8 GB of memory, and that Cassandra nodes in production environments should have at least “eight cores (although four cores are acceptable for virtual machines), and anywhere from 16 MB to 64 MB of memory”. This section raises two questions:
1. Why would a virtual machine need fewer cores than a physical server? This assertion seems dubious. And, even if true (which seems unlikely), it is sufficiently counterintuitive as to require explanation, but none is given.
2. Is 16 MB really sufficient RAM for a production Cassandra node? Presumably the author intended to say 16 GB to 64 GB (rather than MB).
In summary, the book’s scope and engaging text make it a useful text for those new to Cassandra. However, it is in need of editing, and its numerous inaccuracies and misleading sections preclude it from being useful as an authoritative reference or definitive guide. Hopefully the third edition will address these issues.
I would suggest this book as a must-buy for anyone who wants to learn about Cassandra. If possible, try to grab the Kindle edition, which allows you to read across platforms.