Modern Information Retrieval (ACM Press Series) (English) Paperback – 1999/5/15
This is a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective. The advent of the Internet and the enormous increase in the volume of electronically stored information generally have led to substantial work on IR from the computer science perspective; this book provides an up-to-date, student-oriented treatment of the subject.
Ricardo Baeza-Yates received his Ph.D. in Computer Science from the University of Waterloo, Canada in 1989. In 1992 and 1996, he was elected president of the Chilean Computer Science Society. In 1993, he received the Organization of American States award for young researchers in exact sciences. He has several papers in various journals and is a member of ACM, AMS, EATCS, IEEE, SCC and SIAM. He is currently a full professor at the Computer Science Department of the University of Chile, Santiago.
Berthier Ribeiro-Neto received his Ph.D. in Computer Science from the University of California, Los Angeles in 1995. He is involved with various research projects financed by Brazilian agencies; the two main projects deal with wireless information systems and video on demand. He has chaired distinguished conferences in South America and is a member of ACM, IEEE and ASIS. He is currently an associate professor at the Computer Science Department of the Federal University of Minas Gerais in Belo Horizonte, Brazil.
Considering the low price and the timely and well-written coverage of new topics in IR, I think the book is a terrific buy for any scientist or student conducting research/studies in IR.
Chapter 1 just acts as a guide to the rest of the book. The book is basically divided into four parts: text IR, human-computer interfacing for IR, multimedia IR, and applications of IR. The part on text IR is best for beginners trying to learn the overall subject of IR, and consists of chapters 2 through 9. Chapter 2 is a long and important chapter that introduces fundamental concepts in IR and lays foundations for later chapters. Models for "ranking" documents based on queries are presented, including the boolean, vector, probabilistic, and fuzzy models. Chapter 3 is far less technical than chapter 2 and focuses on evaluation of IR models. Chapter 4 is an introduction to query languages, which are necessary for the elegant presentation of complex queries. Chapter 5 deals with query operations, that is, the transformation of queries from simple keywords into weighted sets of terms; it also covers user feedback. As in previous chapters, there is quite a bit of mathematics involved. Chapter 6 is devoted to text languages such as HTML and SGML, since the user might refer to the structure of a document in his/her query, and that structure must be defined somewhere. Chapter 7 is about operations on the documents themselves for the purpose of simplifying them for quick search. For example, it saves time to eliminate common words such as "the" and to reduce words to their grammatical roots. The potentially large size of document collections requires special indexing techniques for efficient retrieval; this is the subject of Chapter 8. Query processing can be further accelerated by using the parallel and distributed IR techniques discussed in Chapter 9, which concludes the book's discussion of text IR.
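To make the text IR pipeline described above a little more concrete, here is a minimal Python sketch of three of the ideas mentioned: removing common words and reducing words to rough roots, building an inverted index, and ranking documents with a vector-style (tf-idf, cosine-normalized) score. It is not taken from the book. The stopword list, the crude suffix-stripping `stem` function, the function names, and the sample documents are all assumptions made up for illustration; a real system would use a proper stemmer and a far more careful weighting scheme.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not the book's list).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

def stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, strip punctuation, drop stopwords, and stem each token."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    return [stem(t) for t in tokens if t and t not in STOPWORDS]

def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(preprocess(text)).items():
            index[term][doc_id] = tf
    return index

def rank(query, index, n_docs):
    """Rank documents by tf-idf dot product, normalized by document length.

    The query norm is omitted because it is the same for every document
    and so does not change the ordering.
    """
    idf = {t: math.log(n_docs / len(p)) for t, p in index.items()}
    norms = defaultdict(float)
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            norms[doc_id] += (tf * idf[term]) ** 2
    scores = defaultdict(float)
    for term, qtf in Counter(preprocess(query)).items():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += (qtf * idf[term]) * (tf * idf[term])
    ranked = [(d, s / math.sqrt(norms[d])) for d, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda pair: -pair[1])

# Hypothetical three-document collection.
docs = {
    "d1": "Indexing the text of large collections",
    "d2": "Searching the web",
    "d3": "Ranking documents and indexing queries",
}
index = build_index(docs)
for doc_id, score in rank("indexing text", index, len(docs)):
    print(doc_id, round(score, 3))
```

With this toy collection, the query "indexing text" ranks d1 ahead of d3, and d2 is not returned because it shares no terms with the query. The inverted index is what keeps the scoring loop from touching documents that contain none of the query terms.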
Chapter 10 is a stand-alone chapter on HCI for IR that discusses the design of user interfaces that assist the user in forming a query and current approaches for visualization of large data sets.
Multimedia IR is discussed in chapters 11 and 12. Models and query languages for office and medical information systems are discussed in Chapter 11. Efficient indexing and searching of multimedia objects is discussed in Chapter 12.
The final three chapters of the book are about the applications of IR. There is a chapter each about searching the web, bibliographic systems, and digital libraries.
The chapter on text languages is starting to show its age, as are the chapters on IR applications at the end of the book. The chapters on algorithms, and particularly the algorithmic portions of the chapters on text IR, keep this book a worthwhile read. There is quite a bit of mathematics used in this book, probability theory in particular. Thus, the reader should already be familiar with probability theory and the basics of pattern recognition to get the most from this book.
They also give fair warning when they are only covering the outline of subject matter (which is rare), and they give extensive footnotes for anyone who needs to go deeper. The writing is always clear; the authors never engage in the type of handwaving that other authors use to get past material you have the impression they themselves don't fully grasp.
If you need to implement search for a database and don't know where to start or what might be involved, this is the book for you. If you need to implement the GUI for search results and are wondering what the state of the art is and what issues are involved, then this is the book for you. If you need a well-structured framework to help you understand how internet search engines work, then this is the book for you. If you want to press the research forward on any of these topics and you are not already fluent in the literature, then this is the book for you.