A database update usually involves three dependent write I/Os to ensure data consistency,
even in the event of a failure during this process:
1. Write the update intent to the logging files (logging may go to two logging files).
2. Update the data.
3. Indicate update complete to the logging files.
This sequence of I/Os is also called a two-phase commit process.
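The following minimal sketch (plain Python, not specific to any database subsystem or to the DS6000; the file names and record formats are purely illustrative) shows why these writes are dependent: each I/O is forced to stable storage before the next one is issued, so a failure at any point leaves enough log information behind to recover consistently.

```python
import os

def write_sync(path: str, data: bytes) -> None:
    """Append data and force it to stable storage before returning."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # the write counts as done only once it is on disk

def update_record(record_id: int, new_value: str) -> None:
    # 1. Write the update intent to the log (A1); a second log copy may also be written.
    write_sync("log_a1.dat", f"INTENT {record_id}\n".encode())

    # 2. Update the data itself (A2). This starts only after step 1 has completed.
    write_sync("data_a2.dat", f"{record_id}={new_value}\n".encode())

    # 3. Mark the update complete in the log (A1), only after step 2 has completed.
    write_sync("log_a1.dat", f"COMPLETE {record_id}\n".encode())

if __name__ == "__main__":
    update_record(42, "new-balance")
```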
In a remote copy environment, the following sequence may occur during an outage; see Figure 22-2 on page 247:
1. Write update intent to logging volume A1.
2. The update to the database on volume A2 fails because of a replication problem from the primary site to the secondary site, with the volume pairs defined with CRIT(HEAVY). This may also be a severe primary storage failure and the beginning of a rolling disaster.
3. The database subsystem recognizes the failed write I/O and indicates, in a configuration file on volume A3, that one or more databases on A2 have to be recovered because of an I/O error. This indication is replicated to the secondary site, because the paths for this storage disk subsystem pair are still working and this particular primary storage disk subsystem has not failed yet.
4. The failure eventually progresses and takes down the primary site completely. After the switch to the secondary site, the database subsystem is restarted there.
5. At startup, the database subsystem discovers in its configuration file that the database on A2-B2 has to be recovered, as indicated before. This is actually not necessary, because the data was synchronously replicated and both related volumes, A2 and B2, are identical and at the very same level of data currency: that of the moment before the error. Nevertheless, the database subsystem will still recover all databases that are marked for recovery according to the information found in the configuration files on A3-B3.
6. This recovery takes place even though the data on A2-B2 is perfectly in sync thanks to the synchronous replication. It happens because no automation was in place to freeze the configuration; a freeze would have removed the paths between the primary and the secondary sites and thus ensured that no further I/Os take place. The sketch after this list retraces this sequence.
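The following minimal Python simulation is a sketch only: the volume names A1 through B3 follow Figure 22-2, while the data structures and the CRIT(HEAVY) behavior are simplified assumptions. It shows how the recovery flag written to A3 still reaches B3, so the restart at the secondary site performs a recovery that the data on B2 does not need.

```python
# Primary volumes A1 (log), A2 (data), A3 (configuration) and their secondaries.
primary   = {"A1": [], "A2": [], "A3": []}
secondary = {"B1": [], "B2": [], "B3": []}
paths_ok  = {"A1": True, "A2": True, "A3": True}   # replication paths per volume pair

class ReplicationError(Exception):
    pass

def mirrored_write(vol: str, record: str) -> None:
    """Synchronous remote copy: the write succeeds only if it reaches both sites."""
    if not paths_ok[vol]:
        raise ReplicationError(vol)   # CRIT(HEAVY): a failed mirror fails the host I/O
    primary[vol].append(record)
    secondary["B" + vol[1]].append(record)

# Step 1: write the update intent to logging volume A1.
mirrored_write("A1", "INTENT update-1")

# Step 2: the A2-B2 pair loses its paths; the data update fails back to the host.
paths_ok["A2"] = False
try:
    mirrored_write("A2", "update-1")
except ReplicationError:
    # Step 3: the database subsystem flags the database for recovery on A3.
    # The A3-B3 paths still work, so the flag is mirrored to the secondary site.
    mirrored_write("A3", "RECOVERY-NEEDED database-1")

# Steps 4-6: the primary site is lost; the database subsystem restarts on the B volumes.
if any("RECOVERY-NEEDED" in r for r in secondary["B3"]):
    print("Restart at secondary: running a database recovery that is not actually")
    print("required, because B2 is identical to A2 as of the moment before the error.")
```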
This does not happen with an automated solution such as GDPS, which makes use of the freeze capability of the IBM System Storage DS6000. Had GDPS been in place, after failing over to the secondary site, the database subsystem would simply have restarted, without the need for a lengthy database recovery.
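Continuing the sketch above, the fragment below illustrates the freeze idea only; it is not the GDPS or DS6000 command interface. On the first mirroring failure, the automation removes the paths for all pairs in the consistency group, so later dependent writes, including the recovery flag destined for A3, never reach the secondary site, and the B volumes stay at one consistent point in time.

```python
def freeze_all(paths_ok: dict) -> None:
    """Suspend every pair in the consistency group at the same point in time."""
    for vol in paths_ok:
        paths_ok[vol] = False

# Applied to the scenario above: on the first ReplicationError the automation
# calls freeze_all() before the database subsystem can mirror its recovery flag
# to B3. The flag then exists only at the primary site, the B volumes remain
# consistent as of the freeze point, and the restart at the secondary site
# needs no database recovery.
paths_ok = {"A1": True, "A2": True, "A3": True}
freeze_all(paths_ok)
print(paths_ok)   # {'A1': False, 'A2': False, 'A3': False}
```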
Figure 22-3 on page 249 illustrates a rolling disaster situation in which automation software makes use of the freeze capability of the DS6000.