Intel IRP-TR-03-10 Computer Drive User Manual


 
there should be no compromise of robustness, consis-
tency or security. There should also be no added com-
plexity in sharing and collaboration. Finally, the design
should be tolerant of human error: improper use of the
portable storage device (such as using the wrong de-
vice or forgetting to copy the latest version of a file to
it) should not hurt correctness.
Lookaside caching is an extension of AFS2-style
whole-file caching [7] that meets the above goals. It is
based on the observation that virtually all distributed
file system protocols provide separate remote proce-
dure calls (RPCs) for access of meta-data and access of
data content. Lookaside caching extends the definition
of meta-data to include a cryptographic hash of data
content. This extension only increases the size of meta-
data by a modest amount: just 20 bytes if SHA-1 [11]
is used as the hash. Since hash size does not depend on
file length, it costs very little to obtain and cache hash
information even for many large files. Using POSIX
terminology, caching the results of “ls -lR” of a large
tree is feasible on a small client, even if there is not
enough cache space for the contents of all the files in
the tree. This continues to be true even if one augments
stat information for each file or directory in the tree
with its SHA-1 hash.
Once a client possesses valid meta-data for an ob-
ject, it can use the hash to redirect the fetch of data
content. If a mounted portable storage device has a file
with matching length and hash, the client can obtain the
contents of the file from the device rather than from the
file server. Whether it is beneficial to do this depends,
of course, on factors such as file size, network band-
width, and device transfer rate. The important point is
that possession of the hash gives a degree of freedom
that clients of a distributed file system do not possess
today.
Since lookaside caching treats the hash as part of
the meta-data, there is no compromise in consistency.
The underlying cache coherence protocol of the dis-
tributed file system determines how closely client state
tracks server state. There is no degradation in the ac-
curacy of this tracking if the hash is used to redirect
access of data content. To ensure no compromise in se-
curity, the file server should return a null hash for any
object on which the client only has permission to read
the meta-data.
Lookaside caching can be viewed as a degenerate
case of the use of file recipes, as described by Tolia et
al. [22]. In that work, a recipe is an XML description of
file content that enables block-level reassembly of the
file from content-addressable storage. One can view
the hash of a file as the smallest possible recipe for it.
The implementation using recipes is considerably more
complex than our support for lookaside caching. In re-
turn for this complexity, synthesis from recipes may
succeed in many situations where lookaside fails.
4 Prototype Implementation
We have implemented lookaside caching in the
Coda file system on Linux. The user-level implemen-
tation of Coda client cache manager and server code
greatly simplified our effort since no kernel changes
were needed. The implementation consists of four
parts: a small change to the client-server protocol; a
quick index check (the “lookaside”) in the code path
for handling a cache miss; a tool for generating looka-
side indexes; and a set of user commands to include or
exclude specific lookaside devices.
The protocol change replaces two RPCs,
ViceGetAttr() and ViceValidateAttrs()
with the extended calls ViceGetAttrPlusSHA()
and ViceValidateAttrsPlusSHA() that have an
extra parameter for the SHA-1 hash of the file.
ViceGetAttr() is used to obtain meta-data for a
file or directory, while ViceValidateAttrs() is
used to revalidate cached meta-data for a collection
of files or directories when connectivity is restored to
a server. Our implementation preserves compatibility
with legacy servers. If a client connects to a server that
has not been upgraded to support lookaside caching, it
falls back to using the original RPCs mentioned above.
The lookaside occurs just before the execution of
the ViceFetch() RPC to fetch file contents. Before
network communication is attempted, the client con-
sults one or more lookaside indexes to see if a local file
with identical SHA-1 value exists. Trusting in the colli-
sion resistance of SHA-1 [10], a copy operation on the
local file can then be a substitute for the RPC. To de-
tect version skew between the local file and its index,
the SHA-1 hash of the local file is re-computed. In case
of a mismatch, the local file substitution is suppressed
and the cache miss is serviced by contacting the file
server. Coda’s consistency model is not compromised,
although some small amount amount of work is wasted
on the lookaside path.
The index generation tool walks the file tree rooted
3