Sun Microsystems 5310 NAS Server User Manual


 
Chapter 2 NAS Head 2-73
After reviewing the case, engineering may make specific recommendations and
modifications, or they may recommend that you proceed with the filesystem repair.
For instructions on how to complete a filesystem repair, see “Filesystem check
procedure” under Diagnostic Procedures at the end of this document.
Recurrence of filesystem-related error messages / mount
problems after repair
Running a filesystem check until no errors are reported, or recreating the
volume, should permanently resolve the filesystem errors. If the errors return,
the underlying cause has not been removed, and the most likely cause is a
hardware problem. A good first step is to replace the system board memory and
the RAID controller or, failing that, the entire system. Once the source of the
problem has been resolved, repeat the repair by following the “Filesystem check
procedure” under Diagnostic Procedures at the end of this document.
Checkpoint database problems reported in system log
Can’t delete checkpoints
A checkpoint database problem is indicated either by a hard error (e.g. cannot
write) in the system log when attempting to delete a checkpoint, or by an error
message that specifically states “error in checkpoint database”. Because the
checkpoint filesystem is read-only and is treated in many ways as a separate
filesystem, this problem must be addressed at the filesystem level, specifically
by running the chkpntabort command followed by a filesystem check.
It is generally recommended that this issue be escalated, both for help in
accurately identifying the problem and for locating its source. The messages
can vary considerably from those described above, and similar checkpoint-related
messages could lead to an unnecessarily severe solution being applied.
A diagnostic email, with all attachments, is required to escalate this type of
issue. The primary source of information for this case is the system log.
Capture the diagnostic as close as possible to the time the messages occur, so
that they can be seen in context in the system log. Also collect as much
information as possible about the circumstances surrounding the failure, e.g.
when the messages first appeared, what was happening at the time, and what
symptoms users reported.
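When reviewing a saved copy of the system log before escalating, it can help to
pull out the checkpoint messages together with the lines around them. The
following is a minimal sketch, not a StorEdge tool. It assumes the system log
has been exported to a plain text file with one message per line (the log format
is not specified here). The function name find_checkpoint_errors and the
“cannot write” pattern are illustrative assumptions; only the phrase “error in
checkpoint database” is taken from the text above.

```python
import re

# Assumption: only "error in checkpoint database" is quoted in the manual.
# "cannot write" is an illustrative stand-in for a hard error and may need
# to be adjusted to the wording that actually appears in the log.
PATTERNS = [
    re.compile(r"error in checkpoint database", re.IGNORECASE),
    re.compile(r"cannot write", re.IGNORECASE),
]


def find_checkpoint_errors(lines, context=2):
    """Return (line_number, excerpt) pairs for each matching log line.

    line_number is 1-based; excerpt is the matching line plus up to
    `context` lines before and after it, so the message is seen in context.
    """
    hits = []
    for i, line in enumerate(lines):
        if any(p.search(line) for p in PATTERNS):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, lines[start:end]))
    return hits
```

To use it, read the exported log into a list of lines (for example with
open(path).read().splitlines()) and pass that list to find_checkpoint_errors.
The excerpts can then be pasted into the escalation notes alongside the
diagnostic email. They do not replace the full diagnostic.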
In this case it is typically necessary to abort the checkpoints on the volume.
This is done from the CLI. After verifying the diagnosis with engineering,
access the CLI and enter “chkpntabort <volumename>”. StorEdge will prompt for
confirmation. Answering “y” or “yes” to the prompt results in the immediate
deletion of all checkpoints. A filesystem check is required as soon as possible
after aborting