Chapter 2 HPSS Planning
66 September 2002 HPSS Installation Guide
Release 4.5, Revision 2
storage class reaches the threshold configured in the purge policy for that storage class. Remember
that simply adding migration and purge policies to a storage class will cause MPS to begin running
against the storage class, but it is also critical that the hierarchies to which that storage class belongs
be configured with proper migration targets in order for migration and purge to perform as
expected.
The purpose of disk migration is to make one or more copies of data stored in a disk storage class
to lower levels in the storage hierarchy. BFS uses a metadata queue to pass migration records to MPS.
When a disk file needs to be migrated (because it has been created, modified, or undergone a class
of service change), BFS places a migration record on this queue. During a disk migration run on a
given storage class, MPS uses the records on this queue to identify files which are migration
candidates. Migration records on this queue are ordered by storage hierarchy, file family, and record
create time, in that order. This ordering determines the order in which files are migrated.
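
The queue ordering described above can be illustrated with a short sketch. This is only a model of the sort order, not HPSS code: the MigrationRecord class and its field names (hierarchy_id, family_id, create_time, file_name) are hypothetical and do not correspond to actual HPSS metadata fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationRecord:
    # Hypothetical fields for illustration; not real HPSS metadata names.
    hierarchy_id: int
    family_id: int
    create_time: float
    file_name: str


def queue_order(records):
    """Sort by storage hierarchy, then file family, then record create time."""
    return sorted(
        records,
        key=lambda r: (r.hierarchy_id, r.family_id, r.create_time),
    )


records = [
    MigrationRecord(2, 1, 10.0, "d"),
    MigrationRecord(1, 2, 5.0, "c"),
    MigrationRecord(1, 1, 9.0, "b"),
    MigrationRecord(1, 1, 3.0, "a"),
]
ordered = [r.file_name for r in queue_order(records)]
print(ordered)  # ['a', 'b', 'c', 'd']
```

Record "c" was created before "b", but it sorts later because file family takes precedence over create time within a hierarchy.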
MPS allows disk storage classes to be used atop multiple hierarchies (to avoid fragmenting disk
resources). To avoid unnecessary tape mounts, it is desirable to migrate all of the files in one
hierarchy before moving on to the next. At the beginning of each run MPS selects a starting
hierarchy. This is stored in the MPS checkpoint metadata between runs. The starting hierarchy
alternates to ensure that, when errors are encountered or the migration target is not 100 percent, all
hierarchies are served equally. For example, if a disk storage class is being used in three hierarchies,
1, 2, and 3, successive runs will migrate the hierarchies in the following order: 1-2-3, 3-1-2, 2-3-1,
1-2-3, etc. A migration run ends when either the migration target is reached or all of the eligible files
in every hierarchy are migrated. Files are ordered by file family for the same reason, although
families are not checkpointed as hierarchies are. Finally, the record create time is simply the time at
which BFS adds the migration record to the queue, and so files in the same storage class, hierarchy,
and family tend to migrate in the order in which they are written (actually the order in which the write
completes).
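
The alternation of the starting hierarchy in the example above (1-2-3, 3-1-2, 2-3-1, 1-2-3) can be reproduced with a simple rotation. The sketch below is an illustration only: MPS keeps the starting hierarchy in its checkpoint metadata, whereas this model stands in for that state with a hypothetical run counter, and the function name hierarchy_order is an assumption.

```python
def hierarchy_order(hierarchies, run_number):
    """Return the order in which hierarchies are visited on a given run.

    Each successive run starts one position earlier in the list, which
    reproduces the 1-2-3, 3-1-2, 2-3-1 pattern from the text.
    """
    n = len(hierarchies)
    start = (n - run_number % n) % n
    return hierarchies[start:] + hierarchies[:start]


orders = [hierarchy_order([1, 2, 3], run) for run in range(4)]
for order in orders:
    print("-".join(str(h) for h in order))
```

Over any three consecutive runs, each hierarchy is served first exactly once, which is the fairness property the alternation is meant to provide.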
When a migration run for a given storage class starts work on a hierarchy, it sets a pointer in the
migration record queue to the first migration record for the given hierarchy and file family.
Following this, migration attempts to build lists of 256 migration candidates. Each migration record
read is evaluated against the values in the migration policy. If the file in question is eligible for
migration its migration record is added to the list. If the file is not eligible, it is skipped and it will
not be considered again until the next migration run. When 256 eligible files are found, MPS stops
reading migration records and does the actual work to migrate these files. This cycle continues until
either the migration target is reached or all of the migration records for the hierarchy in question
are exhausted.
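
The candidate-list cycle described above might be modeled as follows. This is a minimal sketch under stated assumptions, not the MPS implementation: the callbacks is_eligible, migrate_batch, and target_reached are hypothetical stand-ins for the migration policy check, the actual data movement, and the migration target test. It also assumes that a final partial list is migrated once the records run out, which the text does not state explicitly.

```python
BATCH_SIZE = 256  # list size given in the text


def run_hierarchy(records, is_eligible, migrate_batch, target_reached,
                  batch_size=BATCH_SIZE):
    """Process one hierarchy's migration records; return files migrated."""
    migrated = 0
    batch = []
    for rec in records:
        if not is_eligible(rec):
            continue  # skipped; not reconsidered until the next run
        batch.append(rec)
        if len(batch) == batch_size:
            migrate_batch(batch)  # stop reading and do the actual work
            migrated += len(batch)
            batch = []
            if target_reached():
                return migrated
    if batch:
        # Assumption: records exhausted, so migrate the last partial list.
        migrate_batch(batch)
        migrated += len(batch)
    return migrated


sizes = []
count = run_hierarchy(
    range(600),                       # 600 fake migration records
    lambda rec: rec % 2 == 0,         # pretend even-numbered files are eligible
    lambda batch: sizes.append(len(batch)),
    lambda: False,                    # target never reached in this demo
)
print(count, sizes)  # 300 [256, 44]
```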
The purpose of disk purge is to maintain a given amount of free space in a disk storage class by
removing data for which copies exist at lower levels in the hierarchy. BFS uses another metadata
queue to pass purge records to MPS. A purge record is created for any disk file which may be
removed from a given level in the hierarchy (because it has been migrated or staged). During a disk
purge run on a given storage class, MPS uses the records on this queue to identify files which are
purge candidates. The order in which purge records are sorted may be configured on the purge
policy, and this determines the order in which files are purged. Note that all of the ordering
options except purge record create time require additional metadata updates and can impose extra
overhead on SFS. Also, if the purge record ordering is changed while purge records already exist in
the system, purge behavior may be unpredictable until those existing records are cleared. Purge
operates strictly on a storage class basis, and makes no consideration of hierarchies or file families.
MPS builds lists of 32 purge records, and each file is evaluated for purge at the point when its purge
record is read. If a file is deemed to be ineligible, it will not be considered again until the next purge run.
A purge run ends when either the supply of purge records is exhausted or the purge target is
reached.
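
A purge run as described above can be sketched in the same style. Again, this is an illustrative model rather than MPS code: the purge target is represented here as a free-space figure in bytes, purge records are simplified to (name, size) pairs, and is_eligible is a hypothetical stand-in for the purge policy check. Testing the target after every file, rather than only between lists, is also an assumption.

```python
from itertools import islice

PURGE_LIST_SIZE = 32  # list size given in the text


def purge_run(purge_records, is_eligible, free_bytes, target_free_bytes):
    """Purge files until the target is met or the records run out.

    Returns (names of purged files, resulting free bytes).
    """
    purged = []
    records = iter(purge_records)
    while free_bytes < target_free_bytes:
        batch = list(islice(records, PURGE_LIST_SIZE))
        if not batch:
            break  # supply of purge records exhausted
        for name, size in batch:
            if free_bytes >= target_free_bytes:
                break  # purge target reached
            if is_eligible(name):  # evaluated when the record is read
                free_bytes += size
                purged.append(name)
            # Ineligible files are not reconsidered until the next run.
    return purged, free_bytes


demo_records = [("f%d" % i, 10) for i in range(100)]
purged, free_after = purge_run(demo_records, lambda name: name != "f0", 0, 500)
print(len(purged), free_after)  # 50 500
```

Because purge works strictly per storage class, the sketch takes a single stream of records and makes no reference to hierarchies or file families.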