HP StoreOnce Technology
A basic understanding of how HP StoreOnce Technology works is necessary to understand the factors that may affect the performance of the overall system and to ensure optimal performance of your backup solution.
HP StoreOnce Technology is an “inline” data deduplication process. It uses hash-based chunking technology, which splits incoming backup data into “chunks” that average approximately 4 KB in size. The hashing algorithm generates a unique hash value that identifies each chunk and points to its location in the deduplication store. Hash values are stored in an index that is referenced when subsequent backups are performed. When incoming data generates a hash value that already exists in the index, the data is not stored a second time. Instead, an entry with the hash value is simply added to the “recipe file” for that backup session.
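The flow can be illustrated with a short Python sketch. This is only a simplified model, not HP's implementation: StoreOnce chunking is variable-length with an average chunk size of about 4 KB, whereas the sketch below uses fixed 4 KB chunks, an in-memory index, and a hypothetical backup() helper purely to show the “store once, then reference” behaviour.

    import hashlib

    CHUNK_SIZE = 4096      # illustrative fixed size; real chunking is variable, averaging ~4 KB

    chunk_store = {}       # hash -> chunk data (stands in for the deduplication store)
    hash_index = set()     # hashes of chunks already held in the store

    def backup(data: bytes) -> list[str]:
        """Deduplicate a backup stream and return its 'recipe' of chunk hashes."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()   # unique hash identifying the chunk
            if digest not in hash_index:                 # new data: store the chunk once
                hash_index.add(digest)
                chunk_store[digest] = chunk
            recipe.append(digest)                        # known data: only a recipe entry is added
        return recipe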
Key factors for performance considerations with deduplication:
• The inline nature of the deduplication process means that there will always be some performance trade-off for the benefits of increased disk space utilization.
• Each Virtual Library or NAS Share created has its own dedicated deduplication store. “Global” deduplication across all backups therefore occurs only if a single virtual library or NAS share is configured and all backups are sent to it.
• The best deduplication ratio is achieved by configuring a minimum number of libraries/shares. The best performance is gained by configuring a larger number of libraries/shares, which reduces the complexity of each individual deduplication store.
• If servers with a lot of similar data are to be backed up, a higher deduplication ratio can be achieved by backing them all up to the same library/share.
• If servers contain dissimilar data types, the best deduplication ratio/performance compromise is achieved by grouping servers with similar data types together into their own dedicated libraries/shares. For example, a requirement to back up a set of Exchange servers, SQL database servers, file servers, and application servers would be best served by creating four virtual libraries or NAS shares, one for each server set.
• When restoring data, a deduplicating device must reconstruct the original un-deduplicated data stream from all of its constituent chunks. This can result in lower performance than that of the backup (see the restore sketch after this list).
• Full backup jobs result in higher deduplication ratios and better restore performance (because only one piece of media is needed for a full restore). Incremental and differential backups do not deduplicate as well.
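Continuing the simplified sketch above (reusing the hypothetical chunk_store and backup() helper), a restore must read back every chunk referenced by the recipe and reassemble the original stream, which is why restore throughput can be lower than backup throughput:

    def restore(recipe: list[str]) -> bytes:
        """Rebuild the original, un-deduplicated stream from its recipe of chunk hashes."""
        return b"".join(chunk_store[digest] for digest in recipe)

    # Example: two backups that share most of their data, then a restore of the second.
    recipe_a = backup(b"A" * 8192 + b"B" * 4096)
    recipe_b = backup(b"A" * 8192 + b"C" * 4096)   # the repeated "A" chunks deduplicate against backup A
    assert restore(recipe_b) == b"A" * 8192 + b"C" * 4096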
Replication overview
Deduplication is the key enabling technology for efficient replication, because only the new data created at the source site needs to be replicated to the target site. Knowing precisely which data needs to be replicated can result in bandwidth savings in excess of 95% compared with transmitting the full contents of a cartridge from the source site. The actual saving depends on the backup change rate at the source site.
There is some overhead of control data that also needs to pass across the replication link; this is known as manifest data. In addition, any hash codes that are not already present on the remote site may also need to be transferred. Typically these “overhead components” amount to less than 2% of the total virtual cartridge size to be replicated.
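As a rough, hypothetical illustration of the figures above, the following sketch estimates how much data actually crosses the replication link for a given change rate, assuming the overhead components stay at about 2% of the cartridge size; the function name and numbers are illustrative only.

    def replication_transfer_gb(cartridge_gb: float,
                                change_rate: float,
                                overhead: float = 0.02) -> float:
        """Estimate the data sent over the replication link: only the changed
        data plus the manifest/hash-code overhead (assumed ~2%) is transferred."""
        return cartridge_gb * change_rate + cartridge_gb * overhead

    # Hypothetical example: a 400 GB cartridge with a 2% change rate since the last backup.
    sent = replication_transfer_gb(400, 0.02)   # ~16 GB instead of the full 400 GB
    saving = 1 - sent / 400                     # ~96% bandwidth saving
    print(f"{sent:.0f} GB sent, {saving:.0%} saved")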
Replication can be “throttled” by using bandwidth limits as a percentage of an existing link, so as not to affect
the performance of other applications running on the same link.
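As an illustration only (the helper and figures below are hypothetical, not product behaviour), the effect of a percentage bandwidth limit on the time needed to replicate a given amount of data can be estimated as follows:

    def replication_window_hours(data_gb: float, link_mbit_s: float, limit_pct: float) -> float:
        """Time to replicate data_gb over a link of link_mbit_s when replication
        is capped at limit_pct of the link's bandwidth."""
        throttled_mbit_s = link_mbit_s * limit_pct / 100
        seconds = data_gb * 8 * 1024 / throttled_mbit_s   # GB -> Mbit, then divide by Mbit/s
        return seconds / 3600

    # Hypothetical example: 16 GB to replicate over a 10 Mbit/s link capped at 50% for replication.
    print(f"{replication_window_hours(16, 10, 50):.1f} hours")   # roughly 7 hours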
Key factors for performance considerations with replication:
• Seed replication using physical tape or co-location to improve the performance of the first replication.