Analyzing Linguistic Data: A Practical Introduction to Statistics using R, Paperback, March 6, 2008
Statistical analysis is a useful skill for linguists and psycholinguists, allowing them to understand the quantitative structure of their data. This textbook provides a straightforward introduction to the statistical analysis of language. Designed for linguists with a non-mathematical background, it clearly introduces the basic principles and methods of statistical analysis, using 'R', the leading computational statistics programme. The reader is guided step-by-step through a range of real data sets, allowing them to analyse acoustic data, construct grammatical trees for a variety of languages, quantify register variation in corpus linguistics, and measure experimental data using state-of-the-art models. The visualization of data plays a key role, both in the initial stages of data exploration and later on when the reader is encouraged to criticize various models. Containing over 40 exercises with model answers, this book will be welcomed by all linguists wishing to learn more about working with and presenting quantitative data.
R. H. Baayen is Professor of Quantitative Linguistics at Radboud University of Nijmegen and the Max Planck Institute for Psycholinguistics, Nijmegen.
With that said, I should mention a few things. First, some computing background will help a lot when reading this book. I imagine that without prior exposure to statistics or any sort of coding, the book would be a bit overwhelming, so knowing some statistics or coding beforehand pays off. It should go without saying that the reader should know some linguistics as well.
Second, the book is starting to show its age. R has changed somewhat in the five years since the book came out. The example code (which conveniently ships with languageR, the R package written specifically for the book) has a few inconsistencies and is even obsolete in some places, especially in the last chapter. For example, the Design package has been replaced by rms.
Third, I found it helpful to have a dataset of my own ready to be analyzed. I gained a lot more from this book because I could apply its methods to my own data in addition to the examples.
I can't say for certain how this book compares to similar books. The two main ones I know about are Gries' Statistics for Linguistics with R and Johnson's Quantitative Methods In Linguistics. I haven't read either in much depth, but it appears that Gries' is more mathematical, while Johnson's is more explicit about how to apply things to specific subfields. And while you'll get a ton more background from a standard introductory statistics book, the actual application to your own linguistic data, and how to do it in R, might be the best part of this book.
[Edit: One thing about Baayen's book that I really miss in other books (Gries' and Johnson's) is that all the code for the entire book is in one file. Instead of having to sift through dozens of short snippets, you can run everything from a single script, which makes things a lot easier. Also, something I took for granted while reading this book is that it comes with its own R package, "languageR", which has all the datasets built in, not to mention a couple of handy functions. Instead of going to the companion website, downloading files, and having to worry about that, it's fully integrated into R itself. Extremely helpful!]
I thought the book was great. Again, I went from nothing to a pro in just four months. The statistics jargon I've heard and read in papers now makes a lot more sense, and I can critically analyze others' methodologies. While variationist sociolinguists and laboratory phonologists would probably get the most out of something like this, I think any linguistics student should get a statistics background like the kind in this book.
R. Baayen's R book is well organized. The first two chapters encourage readers to sit down at their computer and type commands into the R command line. This hands-on introduction is supplemented by guidance on managing R sessions and creating command files. Subsequent chapters teach basic graphing techniques and statistical probability. The book then steps through the standard curriculum of introductory statistics, from simple t-tests through advanced regression modeling. Chapter 6, on clustering and classification, gives this topic more attention than it receives in most introductory stats texts. The data sets and analysis tasks are drawn from applied linguistics and seem realistic and interesting, to this psychologist anyway.
The book's instructional chapters are supported by helpful resources. The data sets and associated files are easily downloaded from the author's web site. The chapters are filled with example R code and output, allowing readers to follow examples closely and check their work. Back-of-the-book materials include answers to chapter exercises, a topical organization of R functions, and a very complete and up-to-date reference section. Three separate indices help readers find references to datasets and R commands as well as general topics.
My knowledge of and skill with R have increased as a result of using this book. I feel well prepared to conduct analyses in R that I previously did in SPSS, because I have become familiar not only with specific commands but also with the R way of doing things. Next, I'll be moving on to Quantitative Corpus Linguistics with R: A Practical Introduction, to supplement my experience in text mining with a better understanding of what computational linguists do in R.