HP (Hewlett-Packard) D2D Computer Drive User Manual


 
Verify
By default most backup applications perform a verify pass on each backup job, in which they read the backup
data back from the D2D and check it against the original data.
Due to the nature of deduplication, reading data is slower than writing because the data must be rehydrated,
so running a verify pass will more than double the overall backup time. If possible, verify should be
disabled for all backup jobs that target the D2D.
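The arithmetic behind that claim can be sketched as follows. This is an illustrative model only, not HP-supplied tooling; the function name and the throughput figures (100 MB/s ingest, 60 MB/s rehydrated read) are assumptions chosen to show the effect.

```python
# Illustrative sketch: why a verify pass can more than double total job
# time on a deduplicating target. The verify pass re-reads every byte
# from the D2D, and rehydrating deduplicated data is slower than
# ingesting it. All throughput numbers below are assumed, not measured.

def total_job_time_hours(data_gb, write_mb_s, read_mb_s, verify=True):
    """Return elapsed hours for a backup job, optionally with a verify pass."""
    backup_h = data_gb * 1024 / write_mb_s / 3600
    verify_h = data_gb * 1024 / read_mb_s / 3600 if verify else 0.0
    return backup_h + verify_h

# Example: a 2 TB job, 100 MB/s ingest, 60 MB/s rehydrated read.
no_verify = total_job_time_hours(2048, 100, 60, verify=False)   # ~5.8 h
with_verify = total_job_time_hours(2048, 100, 60, verify=True)  # ~15.5 h
```

Because the rehydrated read rate is below the write rate, the verify pass alone takes longer than the backup did, which is why disabling verify for D2D targets saves more than half the job window.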
Synthetic full backups
Some backup applications have introduced the concept of a “Synthetic Full” backup, where after an initial full
backup only file- or block-based incremental backups are taken. The backup application then constructs
a full system recovery image for a specific point in time from the original full backup plus all of the changes up
to the specified recovery point.
In most cases this model will not work well with a NAS target on a D2D backup system, for one of two reasons:
• The backup application may post-process each incremental backup to apply the changes to the
original full backup. This involves a large amount of random read, write, and write-in-place I/O, which is
very inefficient for the deduplication system and results in poor performance and a poor deduplication ratio.
• If the backup application does not post-process the data, it must perform a reconstruction
operation at restore time. This requires opening and reading a large number of incremental backup
files, each of which contributes only a small amount of the final recovery image, so the
access pattern is very random and the restore is slow.
An exception to this restriction is the HP Data Protector Synthetic full backup, which works well. However, the HP
Data Protector Virtual Synthetic full backup, which uses a distributed file system and creates thousands of open
files, does not.
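The restore penalty in the second case can be sketched with a crude cost model. Everything here is an assumption for illustration (the file counts, throughput figures, and per-file overhead are invented): with one full backup file the read is sequential, while spreading the image over many incremental files shifts most of the I/O to random-access rates and adds per-file open overhead.

```python
# Illustrative sketch (assumed numbers, not from HP): reconstructing a
# recovery image from many small incremental files is slow because most
# of the data is read with random-access throughput, plus open/seek
# overhead for every file touched.

def restore_time_s(image_gb, files, seq_mb_s, random_mb_s,
                   per_file_overhead_s=2.0):
    """Crude restore-time model for an image spread over `files` files."""
    random_fraction = 1 - 1 / files      # 0 for a single full backup file
    data_mb = image_gb * 1024
    seq_time = data_mb * (1 - random_fraction) / seq_mb_s
    rand_time = data_mb * random_fraction / random_mb_s
    return seq_time + rand_time + files * per_file_overhead_s

# A 500 GB image: one full backup file vs 200 incremental files.
one_full = restore_time_s(500, files=1, seq_mb_s=80, random_mb_s=10)
many_incr = restore_time_s(500, files=200, seq_mb_s=80, random_mb_s=10)
```

Under these assumed figures the 200-file reconstruction takes several times longer than restoring a single full backup, which is the behaviour the paragraph above describes.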
Housekeeping impact on maximum file size selection
The housekeeping process, which reclaims disk space when data is overwritten or deleted, runs as a
background task; for NAS devices it runs on any backup file as soon as the file is closed by the backup
application.
Housekeeping is governed by a back-off algorithm that attempts to reduce its impact on the performance of
other operations by throttling back when performance may be affected; however, housekeeping still has some
impact on the performance of other operations such as backup.
When choosing the maximum size to which a backup file may grow, it is important to consider housekeeping. If the
file size is small (e.g. 4-5 GB), so that many files make up a single backup job, housekeeping will by
default run as soon as the first and each subsequent file is overwritten and closed by the backup application.
Housekeeping will therefore run in parallel with the backup, reducing backup performance. Using larger files
generally means that housekeeping does not run until after the backup completes.
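The effect of the maximum file size can be sketched as a simple file count. This is an illustrative model of the behaviour described above (the job size and file sizes are assumed): every file except the last closes while the job is still running, and each closed, overwritten file is eligible for housekeeping mid-backup.

```python
import math

# Illustrative sketch: how many files a backup job closes (and thereby
# exposes to housekeeping) before the job itself finishes. Job and file
# sizes are assumptions for the arithmetic only.

def files_closed_before_job_ends(job_gb, max_file_gb):
    """Files closed mid-job: every file of the job except the last one."""
    total_files = math.ceil(job_gb / max_file_gb)
    return total_files - 1

# A 400 GB job with small (5 GB) files vs one large (400 GB) file.
small_files = files_closed_before_job_ends(400, 5)    # 79 mid-job closes
one_big_file = files_closed_before_job_ends(400, 400) # 0 mid-job closes
```

With small files there are dozens of opportunities for housekeeping to start while the backup is still writing; with one large file there are none, which is why larger files push housekeeping past the end of the backup.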
In some situations, however, it may be preferable to have a small amount of housekeeping running throughout the
backup rather than a larger amount that starts at the end. For example, if backup performance is already limited
by other bottlenecks such as the network, the impact of housekeeping running during the backup may be negligible,
so the total time to complete both backup and housekeeping is actually shorter.
Some backup applications, however, will always generate housekeeping at the start of a backup, because they
delete expired backup files before the new backup begins.
The housekeeping process can be temporarily delayed by applying housekeeping blackout windows to cover the
period of time when backups are running; this is considered best practice. In general it is best to use larger
backup files as previously described.
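The blackout-window logic itself is simple to reason about, and can be sketched as below. This is a hypothetical illustration, not the D2D's implementation or interface (real windows are configured through the appliance's management software); the function name and example times are assumptions, and note the wrap-past-midnight case, since backup windows typically span midnight.

```python
from datetime import time

# Illustrative sketch: a housekeeping blackout window is a daily time
# range during which housekeeping is suppressed. Handles windows that
# wrap past midnight (e.g. 22:00-06:00 to cover a nightly backup).

def in_blackout(now: time, start: time, end: time) -> bool:
    """True if `now` falls inside the blackout window [start, end)."""
    if start <= end:
        return start <= now < end
    # Window wraps midnight: blackout covers late evening and early morning.
    return now >= start or now < end

# Nightly backups run 22:00-06:00, so black out housekeeping over that span.
overnight = in_blackout(time(23, 30), time(22, 0), time(6, 0))  # inside
midday = in_blackout(time(12, 0), time(22, 0), time(6, 0))      # outside
```

A window sized to the backup schedule defers housekeeping until the daytime, matching the best practice stated above.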
G1 and G2 products running 1.1.X and 2.1.X or later software contain functionality that allows the user to monitor
and exert some control over the housekeeping process. This software provides user-configurable blackout
windows for housekeeping, during which it will not run and therefore will not impact the performance of other