Cassandra: The Definitive Guide (English), paperback, 2016/7/22
Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you'll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This expanded second edition, updated for Cassandra 3.0, provides the technical details and practical examples you need to put this database to work in a production environment.
Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra's non-relational design, with special attention to data modeling. If you're a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra's speed and flexibility.
- Understand Cassandra's distributed and decentralized structure
- Use the Cassandra Query Language (CQL) and cqlsh, the CQL shell
- Create a working data model and compare it with an equivalent relational model
- Develop sample applications using client drivers for languages including Java, Python, and Node.js
- Explore cluster topology and learn how nodes exchange data
- Maintain a high level of performance in your cluster
- Deploy Cassandra on-premises, in the cloud, or with Docker
- Integrate Cassandra with Spark, Hadoop, Elasticsearch, Solr, and Lucene
Jeff Carpenter is a software and systems architect with experience in the hospitality and defense industries. Jeff cut his teeth as an architect in the early days of Service-Oriented Architecture (SOA) and has worked on projects ranging from a complex battle-planning system in an austere network environment to a cloud-based hotel reservation system. Jeff is passionate about projects and technologies that change industries, helping troubled projects find architectural solutions, and mentoring other architects and developers.
Eben Hewitt is Director of Application Architecture at a publicly traded company where he is responsible for the design of their mission-critical, global-scale web, mobile and SOA integration projects. He has written several programming books, including Java SOA Cookbook (O'Reilly).
Most helpful customer reviews on Amazon.com (may include reviews from the "Early Reviewer Program")
However, the book needs additional editing: it contains enough confusing, misleading, and in some cases completely wrong sections that it is not really suitable as an authoritative reference or (as its title claims) a definitive guide.
A few examples:
Page 70 contains a warning about counters, stating that “the increment and decrement operators are not idempotent”, with no further explanation. On its own, this statement is useless to most people new to Cassandra, because incrementing and decrementing are normally not idempotent operations: incrementing a counter twice should be expected to leave it in a different state from incrementing it once. The passage goes on to say “There is no operation to reset a counter directly, but you can approximate a reset by reading the counter value and decrementing by that value. Unfortunately, this is not guaranteed to work perfectly, as the counter may have been changed elsewhere in between reading and writing.” While that passage may be correct, it has nothing to do with idempotence; the problem is that Cassandra does not perform a read-modify-write of a counter atomically. As it happens, there may be an issue with counters and idempotence in Cassandra versions prior to 2.1, and counts can become inaccurate after timeouts in all versions, but the book describes neither issue. Its handling of counters is deficient in other ways as well; for example, it gives no detailed example of how counters might be profitably employed in a real-world data model.
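The two separate problems distinguished above (a non-atomic reset versus a non-idempotent increment) can be sketched with a tiny in-memory model. This is a hypothetical simulation, not a Cassandra driver API: `CounterColumn` and its `read`/`add` methods are invented names that mimic a counter column, which can only be read or changed by a delta (as in `UPDATE ... SET c = c + 1`). Concurrency is written out as a fixed sequential interleaving, and the timeout case assumes the first attempt was actually applied before the client gave up on it.

```python
class CounterColumn:
    """Hypothetical stand-in for a Cassandra counter column: it can only
    be read or changed by a delta, never set to an absolute value."""

    def __init__(self) -> None:
        self.value = 0

    def read(self) -> int:
        return self.value

    def add(self, delta: int) -> None:
        # Mimics `UPDATE ... SET c = c + ?`: a delta is applied, nothing is set.
        self.value += delta


def approximate_reset_with_interleaving(counter: CounterColumn) -> int:
    """'Reset' by read-then-decrement while another client writes in between.
    Returns the value left after the attempted reset."""
    observed = counter.read()  # client A reads the current value
    counter.add(5)             # client B increments in the meantime
    counter.add(-observed)     # client A subtracts only what it saw
    return counter.read()


def retried_increment(counter: CounterColumn) -> int:
    """An increment that was applied but whose acknowledgement timed out,
    followed by a client retry of the same logical +1."""
    counter.add(1)  # first attempt: applied, but the client sees a timeout
    counter.add(1)  # retry: increments are not idempotent, so it counts again
    return counter.read()


c = CounterColumn()
c.add(10)
print(approximate_reset_with_interleaving(c))  # 5, not 0

d = CounterColumn()
print(retried_increment(d))  # 2, although the client intended +1
```

The reset fails because the read and the decrement are two independent operations with no atomic read-modify-write between them; the retry fails because applying the same increment twice is not the same as applying it once.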
Even more concerning is the discussion of “wide rows”, which first appears on page 59 and recurs at various points throughout the book. Page 59 defines a wide row as a row that has “lots and lots (perhaps tens of thousands or even millions) of columns”. But the following page illustrates a wide row as synonymous with a partition, i.e. the set of rows of a table that share the same values for the columns that make up the partition key. These are two different notions, and the book does not make clear which is the correct definition of “wide row”. A later section (on page 90) uses the hotel model (introduced in the logical data modeling section) as an example of the “wide row” model. However, no table in the hotel model has more than 7 columns, hardly “lots and lots”, so presumably this section uses “wide row” to mean “partition” rather than “a row with lots and lots of columns”.
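The difference between the two definitions of “wide row” can be made concrete with a toy table. In this hypothetical sketch the rows are invented, with `hotel_id` assumed to be the partition key and `day` and `room_number` the clustering columns; grouping rows by the partition key shows that a partition grows by accumulating rows, while every individual row keeps the same small number of columns.

```python
from collections import defaultdict
from itertools import product

# Assumed partition key for this illustration.
PARTITION_KEY = ("hotel_id",)


def build_rows(hotels, days, rooms):
    """Hypothetical availability rows: one row per (hotel, day, room)."""
    return [
        {"hotel_id": h, "day": d, "room_number": r, "is_available": True}
        for h, d, r in product(hotels, days, rooms)
    ]


def group_into_partitions(rows, partition_key=PARTITION_KEY):
    """Collect rows that share the same partition key values."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_key)
        partitions[key].append(row)
    return dict(partitions)


rows = build_rows(["H1", "H2"], range(3), [101, 102])
partitions = group_into_partitions(rows)

print(len(rows[0]))                                 # 4 columns in every row
print({k: len(v) for k, v in partitions.items()})  # {('H1',): 6, ('H2',): 6}
```

Adding more days or rooms makes each partition “wider” in rows, which is the sense the hotel example on page 90 appears to rely on; no row ever gains columns.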
More partition confusion occurs on page 97 under the heading “Calculating Partition Size”. We are warned that we need to calculate a maximum partition size to check whether “our tables will have partitions that will be overly large”, and that “Cassandra’s hard limit is 2 billion cells per partition, but we’ll likely run into performance issues before reaching that point”. A few paragraphs later, the book calculates the partition size (in columns) of the available_rooms_by_hotel_date table from its hotel data model as the number of rows times the number of non-primary-key columns. For the number of rows, it uses 5,000 hotels × 100 rooms/hotel × 730 days = 365,000,000. But that is the number of rows in the whole table. Since this table’s partition key is hotel_id, there is one partition per hotel, so the number of rows per partition is actually 100 rooms/hotel × 730 days = 73,000, a far cry from 365,000,000!
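The arithmetic above can be checked directly. The figures (5,000 hotels, 100 rooms per hotel, 730 days, the 2 billion cell limit) come from the passage; the single non-primary-key column (for example `is_available`) is an assumption made only for this sketch, since the review does not list the table's columns.

```python
# Figures quoted in the review of page 97.
HOTELS = 5_000
ROOMS_PER_HOTEL = 100
DAYS = 730
CELL_LIMIT = 2_000_000_000  # "2 billion cells per partition", as the book states

# Assumption for this sketch: one non-primary-key column per row.
NON_PK_COLUMNS = 1


def cells(rows: int, non_pk_columns: int) -> int:
    """Partition size estimate described in the passage: rows x non-PK columns."""
    return rows * non_pk_columns


rows_in_table = HOTELS * ROOMS_PER_HOTEL * DAYS
# The partition key is hotel_id, so each partition holds a single hotel's rows.
rows_per_partition = ROOMS_PER_HOTEL * DAYS

print(rows_in_table)                                            # 365000000
print(rows_per_partition)                                       # 73000
print(cells(rows_per_partition, NON_PK_COLUMNS))                # 73000
print(cells(rows_per_partition, NON_PK_COLUMNS) < CELL_LIMIT)  # True
```

Dividing the table-wide row count by the number of partitions (one per hotel) gives the per-partition figure, so using 365,000,000 overstates the partition size by a factor of 5,000.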
Page 186 contains a misleading statement about inserting with lightweight transactions. It states that when inserting a row with the IF NOT EXISTS qualifier, if a row already exists with the same primary key values as the row we are trying to insert, the CQL interpreter will return a failure along with the “values that we tried to enter”. However, a few paragraphs earlier, the book says that “if a transaction fails because the existing values did not match the one you expected, Cassandra will include the current ones so you can decide whether to retry or abort without making an extra request”, which suggests that Cassandra returns the values already in the database rather than the ones we tried to enter.
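The second quoted passage matches how Cassandra reports a lightweight transaction that is not applied: the result carries `[applied] = False` together with the row that is currently stored. The sketch below models only that result shape in memory; `LwtTable` and `insert_if_not_exists` are hypothetical names, and the single-threaded dictionary stands in for the Paxos-based compare-and-set that Cassandra actually performs.

```python
from typing import Dict, Optional, Tuple

Row = Dict[str, object]


class LwtTable:
    """Hypothetical in-memory table that mimics the result of
    INSERT ... IF NOT EXISTS (not a real driver API)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[object, ...], Row] = {}

    def insert_if_not_exists(
        self, pk: Tuple[object, ...], values: Row
    ) -> Tuple[bool, Optional[Row]]:
        existing = self._rows.get(pk)
        if existing is not None:
            # Not applied: report the values already stored,
            # not the values the caller tried to insert.
            return False, dict(existing)
        self._rows[pk] = dict(values)
        return True, None


table = LwtTable()
print(table.insert_if_not_exists(("H1",), {"hotel_id": "H1", "name": "Original"}))
# (True, None)
print(table.insert_if_not_exists(("H1",), {"hotel_id": "H1", "name": "Duplicate"}))
# (False, {'hotel_id': 'H1', 'name': 'Original'})
```

Returning the stored row is what lets a client decide whether to retry or abort without issuing a separate read.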
A final example of misleading text occurs on page 305, where the sizing for machines used as Cassandra nodes is described. This section recommends that Cassandra nodes in development environments should have at least 2 cores and 8 GB of memory, and that Cassandra nodes in production environments should have at least “eight cores (although four cores are acceptable for virtual machines), and anywhere from 16 MB to 64 MB of memory”. This section raises two questions:
1. Why would a virtual machine need fewer cores than a physical server? This assertion seems dubious. Even if it were true (which seems unlikely), it is counterintuitive enough to require an explanation, but none is given.
2. Is 16 MB really sufficient RAM for a production Cassandra node? Presumably the author meant 16 GB to 64 GB rather than MB.
In summary, the book’s scope and engaging text make it a useful text for those new to Cassandra. However, it is in need of editing, and its numerous inaccuracies and misleading sections preclude it from being useful as an authoritative reference or definitive guide. Hopefully the third edition will address these issues.