HP (Hewlett-Packard) 600 Computer Drive User Manual


 
Troubleshooting 31
CAUTION: When fault tolerance is compromised, data loss can occur. However, it may be
possible to recover the data. For more information, see "Recovering from compromised fault
tolerance" (on page 31).
If more drives fail than the fault-tolerance method can manage, fault tolerance is compromised, and the
logical drive fails. If this failure occurs, the operating system rejects all requests and indicates unrecoverable
errors.
For example, fault tolerance can be compromised when a drive in an array fails while another drive in the
array is being rebuilt.
Compromised fault tolerance can also be caused by problems unrelated to drives. In such cases, replacing
the physical drives is not required.
Recovering from compromised fault tolerance
If fault tolerance is compromised, inserting replacement drives does not improve the condition of the logical
volume. Perform the following procedure to recover data:
1. Check for loose, dirty, broken, or bent cabling and connectors on all devices.
2. Power down the MDS600 ("Power down" on page 13).
3. Power up the MDS600 ("Power up" on page 12).
In some cases, a marginal drive is operational long enough to allow backup of important files.
4. Make copies of important data, if possible.
5. Replace any failed drives ("Installing the hard drives" on page 19).
Factors to consider before replacing hard drives
You can replace hard drives without powering down the system. However, before replacing a degraded
drive:
• Open HP SIM and inspect the Error Counter window for each physical drive in the same array to
  confirm that no other drives have any errors. (For details, refer to the HP SIM documentation on the
  Management CD.)
• Be sure that the array has a current, valid backup.
• Use replacement drives that have a capacity at least as great as that of the smallest drive in the array.
  The controller immediately fails drives that have insufficient capacity.
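The capacity rule above can be checked before a replacement drive is inserted. The following Python sketch is illustrative only and is not part of any HP tool. The function name and the use of gigabytes as the unit are assumptions made for this example, and the capacities must be entered by the user.

```python
def undersized_replacements(array_capacities_gb, replacement_capacities_gb):
    """Return the replacement capacities that are below the array minimum.

    Hypothetical helper: capacities are supplied by the user and are not
    read from the controller.
    """
    if not array_capacities_gb:
        raise ValueError("the array must contain at least one drive")
    smallest = min(array_capacities_gb)
    # A replacement is acceptable only if it is at least as large as the
    # smallest drive already in the array.
    return [c for c in replacement_capacities_gb if c < smallest]


if __name__ == "__main__":
    # Array of 450 GB and 600 GB drives; the 300 GB candidate is too small.
    print(undersized_replacements([450, 600, 600], [600, 450, 300]))
```

An empty result means that every candidate drive meets the minimum capacity. It does not replace the error-counter and backup checks listed above.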
To minimize the likelihood of fatal system errors when removing failed drives, take the following precautions:
• Do not remove a degraded drive if any other drive in the array is offline (the online LED is off). In this
  situation, removing any other drive in the array causes data loss.
  Exceptions:
  o When RAID 1+0 is used, drives are mirrored in pairs. Several drives can be in a failed condition
    simultaneously (and they can all be replaced simultaneously) without data loss, if no two failed
    drives belong to the same mirrored pair.
  o When RAID 6 with ADG is used, two drives can fail simultaneously (and be replaced
    simultaneously) without data loss.
  o If the offline drive is a spare, the degraded drive can be replaced.
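The removal precaution and its exceptions can be expressed as a small decision function. The following Python sketch is a simplified model of the rules stated above, not controller firmware logic. The function name, the drive identifiers, and the way mirrored pairs and spares are passed in are all assumptions made for this example.

```python
def removal_is_safe(raid_level, unavailable, mirror_pairs=(), spares=()):
    """Return True if the rules above predict no data loss.

    unavailable:  the degraded drive to be removed plus every drive that
                  is already offline (hypothetical drive identifiers).
    mirror_pairs: (drive, drive) tuples; used only for "RAID 1+0".
    spares:       identifiers of spare drives in the array.
    """
    # An offline spare does not count against fault tolerance.
    data_drives_down = set(unavailable) - set(spares)
    if len(data_drives_down) <= 1:
        return True
    if raid_level == "RAID 1+0":
        if not mirror_pairs:
            raise ValueError("mirror_pairs is required for RAID 1+0")
        # Safe only if no mirrored pair has lost both of its members.
        return not any(
            a in data_drives_down and b in data_drives_down
            for a, b in mirror_pairs
        )
    if raid_level == "RAID 6":
        # RAID 6 with ADG tolerates two simultaneous drive failures.
        return len(data_drives_down) <= 2
    # Any other level: a second unavailable data drive causes data loss.
    return False


if __name__ == "__main__":
    pairs = [("d1", "d2"), ("d3", "d4")]
    print(removal_is_safe("RAID 1+0", {"d1", "d3"}, mirror_pairs=pairs))
    print(removal_is_safe("RAID 1+0", {"d1", "d2"}, mirror_pairs=pairs))
```

In this example, the first call models two failed drives in different mirrored pairs, and the second call models two failed drives in the same pair. The function treats any RAID level other than the two named exceptions conservatively.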