High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark Paperback – 2017/6/16
Holden Karau is a transgender Canadian and an active open source contributor. When not in San Francisco working as a software development engineer at IBM's Spark Technology Center, Holden talks internationally on Apache Spark and holds office hours at coffee shops at home and abroad. She is a Spark committer with frequent contributions, specializing in PySpark and Machine Learning. Prior to IBM she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of software she enjoys playing with fire, welding, scooters, poutine, and dancing.
Rachel Warren is a data scientist and software engineer at Alpine Data Labs, where she uses Spark to address real world data processing challenges. She has experience working as an analyst both in industry and academia. She graduated with a degree in Computer Science from Wesleyan University in Connecticut.
- ASIN : 1491943203
- Publisher : O'Reilly Media; 1st edition (2017/6/16)
- Publication date : 2017/6/16
- Language : English
- Paperback : 358 pages
- ISBN-10 : 1491943203
- ISBN-13 : 978-1491943205
- Dimensions : 17.78 x 1.88 x 23.34 cm
- Amazon Best Sellers Rank: #145,433 in Foreign Language Books
For beginner Spark users, the book may feel overwhelming, particularly as it focuses on Spark RDDs rather than the more widely used Spark SQL API. I would highly recommend Zaharia and Chambers's Spark: The Definitive Guide as an alternative purchase, as it is both more comprehensive and easier to understand. For those hoping to learn Scala or Spark's Scala API, this book also probably dives in far too fast; I would recommend Chiusano and Bjarnason's excellent (although quite hard) Functional Programming in Scala and Alexander's Functional Programming Simplified.
On the positive side, the chapter on key/value data, although its material is perhaps fairly widely known, was both well explained and clarifying, as was some of the information about how to write more effective transformations.
Some of the code examples are very difficult to read. On top of this, large parts of the book "build upon" earlier examples, but in practice this amounts to a complete refactor of each earlier example to improve it. As a result, the book can't be used as a handbook without first reading it through. The code examples should have been small and self-contained.
Despite these complaints this is a truly fantastic guide, full of straight answers that are difficult or impossible to find online via trial and error.