HP (Hewlett-Packard) D2D Computer Drive User Manual


 
• Amount of data in each backup
• Data change per backup (deduplication ratio)
• Number of D2D systems replicating
• Number of concurrent replication jobs from each source
• Number of concurrent replication jobs to each target
As a general rule of thumb, however, a minimum bandwidth of 2 Mb/s per replication job should be allowed.
For example, if a replication target is capable of accepting 8 concurrent replication jobs (HP D2D4112) and
there are enough concurrently running source jobs to reach that maximum, the WAN link needs to be able to
provide 16 Mb/s to ensure that replication will run correctly at maximum efficiency. Below this threshold,
replication jobs will begin to pause and restart due to link contention. It is important to note that this minimum
value does not ensure that replication will meet the performance requirements of the replication solution;
considerably more bandwidth may be required to deliver optimal performance.
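The rule of thumb above can be expressed as a short calculation. The following is a minimal sketch only: the function name and parameters are illustrative and are not part of any HP tool, and the result is the minimum needed to avoid link contention, not a figure that guarantees the required performance.

```python
MIN_MBPS_PER_JOB = 2  # rule-of-thumb minimum per replication job, in Mb/s


def min_wan_bandwidth_mbps(target_max_jobs: int, source_jobs: int) -> int:
    """Return the minimum WAN bandwidth (Mb/s) for the jobs that can run at once.

    target_max_jobs: concurrent replication jobs the target can accept.
    source_jobs: replication jobs the sources may try to run concurrently.
    """
    if target_max_jobs < 0 or source_jobs < 0:
        raise ValueError("job counts must be non-negative")
    # The target's concurrency limit caps how many jobs actually run together.
    concurrent_jobs = min(target_max_jobs, source_jobs)
    return concurrent_jobs * MIN_MBPS_PER_JOB


# The example from the text: a target accepting 8 concurrent jobs,
# with enough source jobs queued to reach that maximum.
print(min_wan_bandwidth_mbps(target_max_jobs=8, source_jobs=12))  # 16
```

If fewer source jobs run than the target can accept, the minimum falls accordingly (for example, 3 jobs need 6 Mb/s).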
Seeding and why it is required
One of the benefits of deduplication is the ability to identify unique data, which then enables us to replicate
between a source and a target D2D, only transferring the unique data identified. This process only requires low
bandwidth WAN links, which is a great advantage to the customer because it delivers automated disaster
recovery in a very cost-effective manner.
However, prior to being able to replicate only unique data between source and target D2D, we must first ensure
that each site has the same hash codes or “bulk data” loaded on it – this can be thought of as the reference data
against which future backups are compared to see if the hash codes exist already on either source or target. The
process of getting the same bulk data or reference data loaded on the D2D source and D2D target is known as
“seeding”.
Seeding is generally a one-time operation which must take place before steady-state, low bandwidth
replication can commence. Seeding can take place in a number of ways:
• Over the WAN link, although this can take some time for large volumes of data.
• Using co-location, where two devices are physically in the same location and can use a GbE replication
  link for seeding. After seeding is complete, one unit is physically shipped to its permanent destination.
• Using a form of removable media (physical tape or portable USB disks) to “ship data” between sites.
Once seeding is complete, there will typically be a hit rate of 90% or more, meaning most of the hash codes are
already loaded on the source and target and only the unique data will be transferred during replication.
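The effect of seeding on the hit rate can be illustrated with a toy model. This is a sketch under stated assumptions, not the appliance's actual algorithm: fixed-size 4 KB chunks, SHA-256 hash codes, and an in-memory dictionary standing in for the target store are all choices made here for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen for illustration only


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the backup stream into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def replicate(backup: bytes, target_store: dict[str, bytes]) -> tuple[int, float]:
    """Send only chunks whose hash code is missing on the target.

    Returns (bytes_sent, hit_rate), where hit_rate is the fraction of
    chunks whose hash code was already present on the target.
    """
    chunks = split_chunks(backup)
    hits = 0
    bytes_sent = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in target_store:
            hits += 1                      # already on the target: nothing to send
        else:
            target_store[digest] = chunk   # unique data: transfer and store it
            bytes_sent += len(chunk)
    hit_rate = hits / len(chunks) if chunks else 1.0
    return bytes_sent, hit_rate


target: dict[str, bytes] = {}

# First replication acts as seeding: the target is empty, so every chunk is sent.
first = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
sent, rate = replicate(first, target)
print(sent, rate)    # 40960 0.0

# Next backup differs in one chunk out of ten: only that chunk crosses the link.
second = first[:-CHUNK_SIZE] + b"\xff" * CHUNK_SIZE
sent2, rate2 = replicate(second, target)
print(sent2, rate2)  # 4096 0.9
```

The first run moves the full data set, which is why seeding needs a fast link or shipped media; the second run moves a tenth of it, which is the steady-state behavior that low bandwidth WAN links can sustain.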
It is good practice to plan for seeding time in your D2D deployment plan as it can sometimes be very time
consuming or manually intensive work.
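When planning seeding time, a rough transfer estimate helps in choosing between the WAN link and co-location. The sketch below is an assumption-laden estimate: the 1000 GB data size is hypothetical, gigabytes are taken as decimal (10^9 bytes), and the `efficiency` parameter is a made-up knob for protocol overhead and link sharing. Real seeding may be slower than the raw link rate suggests.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate hours needed to move data_gb gigabytes over a link_mbps link.

    efficiency is the assumed usable fraction of the link (0 < efficiency <= 1).
    """
    if data_gb < 0:
        raise ValueError("data_gb must be non-negative")
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000          # GB -> megabits (decimal units)
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical 1000 GB of data to seed:
print(round(transfer_hours(1000, 16), 1))    # 138.9 hours over a 16 Mb/s WAN link
print(round(transfer_hours(1000, 1000), 2))  # 2.22 hours over a co-located GbE link
```

Under these assumptions the same data set takes almost six days over the WAN link but a little over two hours when the devices are co-located, which illustrates why the seeding method should be decided early in the deployment plan.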
During the seeding process it is recommended that no other operations, such as further backups or tape copies,
take place on the source D2D. It is also important to ensure that the D2D has no failed disks and that RAID
parity initialization is complete, because either condition will reduce performance.
When seeding over fast networks (co-located D2D devices), expect replication of a cartridge or file to perform
similarly to the original backup. If, however, many replication jobs are running to a single target appliance
from several source appliances, performance will be reduced due to the amount of disk activity required on the
target system.
Replication models and seeding
The diagrams in Replication usage models starting on page 49 indicate the different replication models
supported by HP D2D Backup Systems; the complexity of the replication models has a direct influence on which
seeding process is best. For example, an Active/Passive replication model can easily use co-location to quickly
seed the target device, whereas co-location may not be the best seeding method to use with a 50:1 many-to-one
replication model.