IBM 440 Server User Manual


 
IBM eServer xSeries 440 Planning and Installation Guide
for RAID components and hard disk drives. This reduces labor and service
costs by providing replacement part information in the alert message so that
the correct part can be obtained for the service call.
Software Rejuvenation. In networked servers, software often exhibits an
increasing failure rate over time, due to programming errors, data corruption,
numerical error accumulation, etc. These errors can spawn threads or
processes that are never terminated, or they can result in memory leaks or file
systems that fill up over time. These effects constitute a phenomenon known
as software aging, which can lead to unplanned server outages. Advanced
IBM analytical techniques allow IBM Director Software Rejuvenation to
monitor trends and predict system outages based on the history of previous
outages on a given server. Alerts of this sort act as Predictive Failure
Analysis for software, giving an administrator the opportunity to schedule
servicing (rejuvenation) at a convenient time in advance of an actual failure
and avoid costly downtime.
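
The guide does not describe the analytical techniques IBM Director uses, so the following is only a minimal sketch of the general idea of trend-based prediction: fit a straight line to resource-usage samples and extrapolate to the point where the resource would be exhausted. The function names, the least-squares method, and the sample figures are all assumptions made for illustration and are not part of IBM Director.

```python
from typing import Optional, Sequence, Tuple


def fit_line(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit. Returns (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("sample times must not all be equal")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def hours_until_exhaustion(
    times: Sequence[float], values: Sequence[float], capacity: float
) -> Optional[float]:
    """Hours after the last sample until the fitted trend reaches capacity.

    Returns None when usage is flat or shrinking (no exhaustion predicted).
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    t_exhaust = (capacity - intercept) / slope
    return max(0.0, t_exhaust - times[-1])


if __name__ == "__main__":
    # Hypothetical samples: memory in use (MB) measured once per hour.
    hours = [0, 1, 2, 3, 4]
    used_mb = [1000, 1100, 1200, 1300, 1400]
    print(hours_until_exhaustion(hours, used_mb, capacity=2000))  # 6.0
```

In this hypothetical case the leak grows by 100 MB per hour, so the estimate leaves six hours in which to schedule a rejuvenation before memory runs out.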
Software Rejuvenation can be scheduled to reset all or part of the software
system with no need for operator intervention. When Software Rejuvenation
reinitializes a server, the server's software failure rate returns to its initial lower
level because resources have been freed up and the cumulative effects of
numerical errors have been removed.
When Software Rejuvenation is invoked within a clustered environment,
cluster management failover services (such as Microsoft Cluster Services and
Microsoft Datacenter Server) may be used to stop the offending subsystem
and restart it on the same or another node in the cluster in a controlled
manner. In a clustered environment, xSeries servers can be set to fail over to
another server, then be reset by IBM Director without downtime.
IBM Director 3.1 includes a Trend Viewer feature to graphically monitor the
software aging process and an application culprit list that identifies the
applications most likely to be causing the aging.
System Availability. System Availability accurately measures
uptime/downtime for individual servers or groups of servers, and provides a
variety of graphical views of this information. This enables users to track the
improvements in their server availability in order to verify the benefits of the
systems management processes and tools. IBM Director 3.1 includes the
ability to distinguish between planned and unplanned outages.
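
To make the planned versus unplanned distinction concrete, here is a minimal sketch of how an availability figure could be computed from a list of outages. It is not IBM Director's implementation: the `Outage` type, the `availability_report` function, and the choice to report both an overall figure and an unplanned-only figure are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Outage:
    hours: float   # duration of the outage
    planned: bool  # True for scheduled maintenance, False for a failure


def availability_report(period_hours: float, outages: Iterable[Outage]) -> Dict[str, float]:
    """Summarize downtime and availability over one measurement period."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    planned = 0.0
    unplanned = 0.0
    for outage in outages:
        if outage.hours < 0:
            raise ValueError("outage duration must not be negative")
        if outage.planned:
            planned += outage.hours
        else:
            unplanned += outage.hours
    if planned + unplanned > period_hours:
        raise ValueError("total downtime exceeds the measurement period")
    return {
        "planned_hours": planned,
        "unplanned_hours": unplanned,
        # Fraction of the period the server was up, counting every outage.
        "overall": (period_hours - planned - unplanned) / period_hours,
        # Same fraction when only unplanned outages count as downtime.
        "unplanned_only": (period_hours - unplanned) / period_hours,
    }


if __name__ == "__main__":
    # Hypothetical 30-day month with one maintenance window and one failure.
    report = availability_report(720, [Outage(4, planned=True), Outage(2, planned=False)])
    for name, value in report.items():
        print(name, round(value, 4))
```

Reporting the two fractions side by side shows how much of the lost time was scheduled maintenance rather than failure, which is the comparison needed to judge whether systems management changes are reducing unplanned outages.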
Electronic Service Agent. Electronic Service Agent enables the Director
server to contact IBM automatically in the event of a fault condition. Data
gathered by IBM Director that is relevant to the fault is included in the
message, in most cases allowing IBM service to respond to the condition
without the need for additional details. Once IBM has been notified of the
event, the course of action is the same as if a service call had been placed
manually. Electronic Service Agent support requires registering the systems