Practical File System Design (English) Paperback – January 15, 1999
This is the new guide to the design and implementation of file systems in general, and the Be File System (BFS) in particular. This book covers all topics related to file systems, going into considerable depth where traditional operating systems books often stop. Advanced topics such as journaling, attributes, indexing and query processing are covered in detail. Built from scratch as a modern 64-bit, journaled file system, BFS is the primary file system for the Be Operating System (BeOS), which was designed for high-performance multimedia applications.
You do not have to be a kernel architect or file system engineer to use Practical File System Design. Neither do you have to be a BeOS developer or user. Only basic knowledge of C is required. If you have ever wondered about how file systems work, how to implement one, or want to learn more about the Be File System, this book is all you will need.
* Review of other file systems, including Linux ext2, BSD FFS, Macintosh HFS, NTFS and SGI's XFS.
* Allocation policies for placing data on disks and discussion of on-disk data structures used by BFS
* How to implement journaling
* How a disk cache works, including cache interactions with the file system journal
* File system performance tuning and benchmarks comparing BFS, NTFS, XFS, and ext2
* A file system construction kit that allows the user to experiment and create their own file systems
Dominic Giampaolo has a Master's degree in Computer Science from Worcester Polytechnic and is one of the principal kernel engineers for Be Inc. His responsibilities include the file system and various other parts of the kernel. Dominic Giampaolo joined Be as one of its principal engineers. He has had the primary responsibility for designing and implementing many of the low-level features of the operating system, including the file system.
So I asked for help, and they sent another copy of the book.
Although I would have appreciated it much more if the book had reached me the first time, I still appreciate their service. They were more than happy to help me find the package.
Perhaps the most annoying thing about this book, though, is that he doesn't finish his thoughts. I felt that often, just as he was getting to the interesting part after cutting through the fluffy descriptions of his design choices, he would leave the topic and not come back. The most frustrating part of this was that after skipping over many pertinent details of how he actually built the BeFS, he spends an excruciating amount of time describing the vnode layer and the exact API that the file system driver must write to -- something I feel would have been better left to a Be-specific API programming manual.
The editing could have used some help. Grammar and flow were pretty good, and the book was readable. However, the author too often finished discussions by saying, "we didn't have time." This is annoying and gets old.
Also annoying was the unnecessary repetition of some lines of thought. For instance, he talks about B+ trees and then, instead of finishing the conversation, wanders away and talks about something else, and then wanders back and repeats himself before continuing.
Overall organization was a little sloppy, with summary after summary of what was just discussed or what will be discussed next. Focus was also affected by the author's understandable tendency to spend too much time describing the minute details of this or that "tricky" implementation issue that he encountered while building his piece of the BeOS. Obviously he is proud of his accomplishments, and he should be, but I felt the subtle back-patting going on at various points in the book's explanations complicated them unnecessarily with testaments to the author's debugging and optimization skill, rather than providing a complete road-map and "beware" warnings to feature implementors. As I've already pointed out, the book doesn't have enough detail to implement something from, so it's kind of awkward for these very complete descriptions of certain types of problems to be present in the text.
I thought the "summary" sections at the end of the chapters were too creampuff to be useful -- I didn't pay for Cliffs' Notes in my book.
Overall, a reasonably worthwhile purchase, especially given the dearth of material in this area, but there are more technical, better explained resources on the net that should also be consulted for more info about file system design.
If you're not familiar with the internals of a filesystem, this is an excellent way to learn. The Be file system is advanced enough to be useful in the real world (better than many in use today), but simple enough to be understood by the average programmer or the well-educated layman. It's not the be-all of filesystems (pun intended), but it's damned good and quite comprehensible. Recommended if you want to see a good example of a file system. If you want cutting edge, you need to start reading the journals and looking over ZFS, next-gen Linux filesystems, Lustre, etc.
Finally, you can't beat the price for this book, as it's free at the author's home page. That's now at [...], where you can find both the free PDF version of the book and the File System construction toolkit mentioned in the appendix.
The ideas are in general sound and representative of the current state of file system practice. The historical view is a bit Unix-centric - to state that the Berkeley Fast File System is the ancestor of most modern file systems is to ignore arguably superior and significantly earlier implementations from IBM, DEC, and others. This bias carries over into aspects of implementation as well, such as use of the Unix direct/indirect/double-indirect mapping mechanism to manage contiguous 'block runs' without adding file address information to the mapping blocks to eliminate the need to scan them sequentially (save for the double-indirect blocks, which avoid the scan by establishing a standard run-length of 4 blocks - arrgh!) when positioning within the file - and the unbalanced Unix-style tree itself would almost certainly be better implemented as a b-tree variant (with its root in-line in the i-node) indexed on file address. And the text occasionally blurs the distinction between what the BFS chose to implement (a journal system that forced meta-data update transactions to be serialized) and what is possible (a multi-threaded journal supporting concurrent transactions simply by allowing each transaction to submit a log record for each individual change it makes - which would also support staged execution of extremely large transactions eliminating the log size as a constraint on them).
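The complaint about scanning block runs sequentially can be made concrete with a small sketch. This is not BFS code: `BlockRun`, `lookup_by_scan`, `build_file_starts` and `lookup_by_search` are hypothetical names invented for illustration. The first lookup has to add up run lengths from the start of the list, as a mapping without file addresses must. The second assumes each run also records the file block at which it begins, so a binary search finds the right run.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class BlockRun(NamedTuple):
    """Hypothetical extent record: a contiguous range of disk blocks."""
    disk_start: int  # first disk block of the run
    length: int      # number of blocks in the run


def lookup_by_scan(runs: List[BlockRun], file_block: int) -> int:
    """Map a file block to a disk block by walking the runs in order, O(n)."""
    if file_block < 0:
        raise IndexError("negative file block")
    covered = 0  # file blocks covered by the runs seen so far
    for run in runs:
        if file_block < covered + run.length:
            return run.disk_start + (file_block - covered)
        covered += run.length
    raise IndexError("file block beyond end of file")


def build_file_starts(runs: List[BlockRun]) -> List[int]:
    """Compute the starting file block of each run (the extra 'file address')."""
    starts, covered = [], 0
    for run in runs:
        starts.append(covered)
        covered += run.length
    return starts


def lookup_by_search(runs: List[BlockRun], starts: List[int], file_block: int) -> int:
    """Same mapping via binary search over the stored file addresses, O(log n)."""
    if file_block < 0 or not runs:
        raise IndexError("file block outside file")
    i = bisect_right(starts, file_block) - 1  # last run starting at or before file_block
    offset = file_block - starts[i]
    if offset >= runs[i].length:
        raise IndexError("file block beyond end of file")
    return runs[i].disk_start + offset


# Three runs of differing length, covering file blocks 0-3, 4-19 and 20-21.
runs = [BlockRun(100, 4), BlockRun(500, 16), BlockRun(40, 2)]
starts = build_file_starts(runs)
```

Both functions return the same disk block for any valid file block; only the cost of finding the run differs. A fixed run length, as the review describes for double-indirect blocks, avoids the scan a third way, by turning the lookup into a division.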
Some of the choices made in BFS can be questioned, even in its particular use context. The 'allocation group' mechanism interacts in subtle ways with the basic file system block size, and given the relative and on-going improvement of disk seek time vs. rotational latency the value of locating related structures relatively near each other (though not actually adjacent) on disk may no longer justify the added complexity (though the effort to place file inodes immediately following the parent directory inode is likely worthwhile if a read-ahead facility exists to take advantage of it). The discussion of on-disk placement also ignores 'disks' that may in fact be composed of multiple striped units, which would further dilute the benefits of allocation groups; note that this would also complicate the read-ahead facility just mentioned, as would a shared-disk environment unless the disk unit itself performed the read-ahead and replication if present was taken into account (as in the Veritas file system, as I remember).
Even the fundamental decision to make attributes indexable deserves closer examination, given the costs of indexing. Current hardware can perform a complete inode scan on a single-user workstation fast enough to satisfy the occasional random query and can scan the inodes for files within some limited sub-tree of the directory structure (e.g., a cluster of e-mail directories) relatively quickly for more common queries, and in a multi-user environment indexing individual attributes across all users is frequently not the behavior desired. Placing index management under explicit application control may be a better approach, perhaps by allowing the application to specify on attribute creation the index, if any, in which its value should be entered (thus preserving the ability to encapsulate the operation within a system-controlled transaction without the need for user-level transaction support) - and storing the index (perhaps by its inode) with the attribute for later change or deletion.
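The two alternatives weighed above, scanning inodes on demand versus an index the application opts into, can be sketched roughly as follows. Everything here is a hypothetical in-memory stand-in rather than anything from BFS: the `inodes` dictionary, the `MAIL:from` attribute name, and the `AppIndex` and `set_attr` helpers are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for on-disk inodes: inode number -> attributes.
inodes = {
    1: {"path": "/mail/in/a", "MAIL:from": "alice"},
    2: {"path": "/mail/in/b", "MAIL:from": "bob"},
    3: {"path": "/docs/c", "MAIL:from": "alice"},
}


def query_by_scan(inodes, name, value, subtree="/"):
    """Answer a query with no index by visiting every inode under `subtree`."""
    return sorted(
        ino
        for ino, attrs in inodes.items()
        if attrs["path"].startswith(subtree) and attrs.get(name) == value
    )


class AppIndex:
    """An index that is only updated when the application asks for it."""

    def __init__(self):
        self.entries = defaultdict(set)  # attribute value -> inode numbers

    def add(self, ino, value):
        self.entries[value].add(ino)

    def remove(self, ino, value):
        self.entries[value].discard(ino)

    def lookup(self, value):
        return sorted(self.entries.get(value, ()))


def set_attr(inodes, ino, name, value, index=None):
    """Write an attribute; touch `index` only if the caller supplied one."""
    old = inodes[ino].get(name)
    inodes[ino][name] = value
    if index is not None:
        if old is not None:
            index.remove(ino, old)  # drop the stale entry, if any
        index.add(ino, value)
```

Calling `query_by_scan(inodes, "MAIL:from", "alice", subtree="/mail/")` visits every inode but pays nothing at write time, while `set_attr(..., index=some_index)` pays on each write in exchange for a direct `lookup`. In a real file system the attribute write and the index update would need to sit inside one transaction, which is the encapsulation the review asks to preserve.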
Conspicuous by their omission are any mentions of how to manage very large allocation bit-maps (which one really must expect when other parts of the system are carefully crafted to handle 2**58-byte files) or the impact of a shared-disk environment (if BFS was intended to be limited to desk-top use this may be more understandable, but even desk-tops may soon have high-availability configurations). Security is mentioned briefly as a concern to be addressed later - but BFS's dynamic allocation of inodes from the general space pool makes this impossible, given that directory inode addresses can apparently be fed in from user-mode (the author does note this near the book's end, but fails to discuss possible remedies).
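A quick calculation shows why the bitmap size matters. It is only a sketch under stated assumptions: one bit per block, a volume of exactly 2**58 bytes (just large enough to hold one maximal file), and block sizes of 1024 and 8192 bytes chosen purely for illustration. The helper `bitmap_bytes` is a name invented here.

```python
def bitmap_bytes(volume_bytes: int, block_size: int) -> int:
    """Size in bytes of an allocation bitmap holding one bit per block."""
    blocks = volume_bytes // block_size
    return (blocks + 7) // 8  # round up to a whole number of bytes


volume = 2**58  # assumption: volume just large enough for one maximal file
for block_size in (1024, 8192):  # illustrative block sizes
    size = bitmap_bytes(volume, block_size)
    # size is an exact power of two here, so bit_length() - 1 is its exponent
    print(f"{block_size}-byte blocks: bitmap is 2**{size.bit_length() - 1} bytes")
```

Under these assumptions the bitmap comes to 2**45 bytes (32 TiB) with 1024-byte blocks and 2**42 bytes (4 TiB) with 8192-byte blocks, far too large to scan linearly or to hold in memory.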
The author also expresses regret in the introduction at not having had time to include more comparative information on other file systems, both current and historical. Perhaps he is leaving himself room to write a second book. I hope so: despite my comments above, this one was worthwhile - both on its own merits, and because of the lack of competition in this subject area.