Data Visualization: A Practical Introduction (English) Paperback – 2018/12/18
An accessible primer on how to create effective graphics from data
This book provides students and researchers a hands-on introduction to the principles and practice of data visualization. It explains what makes some graphs succeed while others fail, how to make high-quality figures from data using powerful and reproducible methods, and how to think about data visualization in an honest and effective way.
Data Visualization builds the reader's expertise in ggplot2, a versatile visualization library for the R programming language. Through a series of worked examples, this accessible primer then demonstrates how to create plots piece by piece, beginning with summaries of single variables and moving on to more complex graphics. Topics include plotting continuous and categorical variables; layering information on graphics; producing effective small multiple plots; grouping, summarizing, and transforming data for plotting; creating maps; working with the output of statistical models; and refining plots to make them more comprehensible.
Effective graphics are essential to communicating ideas and a great way to better understand data. This book provides the practical skills students and practitioners need to visualize quantitative data and get the most out of their research findings.
- Provides hands-on instruction using R and ggplot2
- Shows how the tidyverse of data analysis tools makes working with R easier and more consistent
- Includes a library of data sets, code, and functions
"Finally! A data visualization guide that is simultaneously practical and elegant. Healy combines the beauty and insight of Tufte with the concrete helpfulness of Stack Exchange. Data Visualization is brimming with insights into how quantitative analysts can use visualization as a tool for understanding and communication. A must-read for anyone who works with data."--Elizabeth Bruch, University of Michigan
"Data Visualization is a brilliant book that not only teaches the reader how to visualize data but also carefully considers why data visualization is essential for good social science. The book is broadly relevant, beautifully rendered, and engagingly written. It is easily accessible for students at any level and will be an incredible teaching resource for courses on research methods, statistics, and data visualization. It is packed full of clear-headed and sage insights."--Becky Pettit, University of Texas at Austin
"Healy provides a unique introduction to the process of visualizing quantitative data, offering a remarkably coherent treatment that will appeal to novices and advanced analysts alike. There is no other book quite like this."--Thomas J. Leeper, London School of Economics
"Kieran Healy has written a wonderful book that fills an important niche in an increasingly crowded landscape of materials about software in R. Data Visualization is clear, beautifully formatted, and full of careful insights."--Brandon Stewart, Princeton University
"Healy's prose is clear and direct. I came away from this book with a much better understanding of both visualizations and R."--Neal Caren, University of North Carolina, Chapel Hill
"Innovative and extraordinarily well-written."--Jeremy Freese, Stanford University
"Healy's fun and readable book is unusual in covering the 'why do' as well as the 'how to' of data visualization, demonstrating how dataviz is a key step in all stages of social science--from theory construction to measurement to modeling and interpretation of analyses--and giving readers the tools to integrate visualization into their own work."--Andrew Gelman, author of Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do
There's also one case where the text associated with a figure is both inaccurate and essentially "left as an exercise for the reader."
The example I'll give is from p.47, but it's not the first instance of this problem -- just the straw that broke my back.
FWIW, I'm not a newbie: I've been a programmer for 40 years, and have worked in at least a dozen high-level languages, including APL, FORTRAN, COBOL, Pascal, C/C++, Java, Lisp, Prolog, and Perl. Some of these (e.g., Perl & APL) have pretty exotic syntaxes, so I'm not unfamiliar with compact and hieroglyphic notation. I'm not bragging -- after 40 years, I'd be a slacker if I hadn't accumulated a decent amount of experience -- just saying that I do have (at least in theory) enough background that I don't think I'm just being stupid.
In this example, Healy has presented a data frame, and then a tibble created from it. He says "Look carefully at the top and bottom of the output to see what additional information the tibble class gives you over and above the data frame version."
The only _substantive_ difference at the top is a row between the first row (var names) and the first observation that reads as follows:
<fct> <fct> <dbl> <dbl>
"dbl" is easy, but WTF is fct? "function"? "fact"? looks like string data (perished, survived; male, female), but my intuition says that we have two defined enumerated data types, each with two values ... but ...
I should not have to rely on guesswork: the meaning of <fct> should be easily accessible.
There's no index entry for "fct"; I see one for "facet" -- is this a facet? I won't find out for another 30 pages. If this were the Kindle version, I could search for <fct>, and for fct if I find nothing for <fct> -- but (a) this is the paperback and (b) I shouldn't _have to_ search for the meaning of a notation in an example.
Returning to the tibble, there is _nothing_ added at the bottom: the last row is the same as the last row of the data frame (modulo the addition of a decimal point after 344, the value of the "n" variable).
I haven't decided yet whether to return it, but I'm put off by sloppy proofreading & copy-editing, partly because it's unprofessional, but largely because I'm a publisher and have _done_ proofing & copy-editing for 6 books, and I am scrupulous about it (and have at least 2 other people check me before I release a title). It's hard, and time-consuming, but to have an otherwise decent book marred by random glitches like this decreases the value & utility of the book.
If I can find errata for this book, and they address the many glitches I've found, I may keep it; o/w I'll wait for the 2nd edition.
My main goal is to introduce you to both the ideas and the methods of data visualization in a sensible, comprehensible, reproducible way.
Well, mission accomplished. The book is at once enormously readable, and sufficiently technically detailed as to make it easy to implement the principles introduced.
The book itself is also beautifully designed. The use of figures and margin notes gives you a sense of being guided through the ideas rather than just being told what they are. I've had lots of fun going back to some of my own visualizations made with R and ggplot2 and improving them based on what I learned here.
I absolutely recommend this to beginners and experts alike. Healy gives you everything you'd need to know if you're starting from scratch, but in such a way as to not slow things down for the more experienced reader. For that reason, it would also make a great book for a course on applied use of R.
The reason is that the author provides a narrative that is easy to read and that focuses on the basic logic of the gg structure (rather than just the syntax). This orientation can be helpful even when the user already knows the syntax (mostly) because it helps create a mental framework that enables more creative use of ggplot2 beyond examples that are provided in the many books available.
I recommend the book even if you have other intro books already and even if you already have basic capabilities in using ggplot2. It is up-to-date and adds a perspective that expands both appreciation of and facility with ggplot2.