Monitoring Portal Server
Portal Server 6 2005Q1 • Deployment Planning Guide
For most applications, the usual recommendation is to give the new generation a
larger percentage of the total heap. For Portal Server, however, using only one
eighth of the heap for the young generation is appropriate, because most memory
used by Portal Server is long-lived. The sooner that memory is copied to the old
generation, the better the garbage collection (GC) performance.
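As a rough illustration of the one-eighth guideline, the following Python sketch derives young-generation settings from a total heap size. The helper name `young_gen_flags` is hypothetical. Using the standard HotSpot options `-Xms`, `-Xmx`, `-Xmn`, and `-XX:NewRatio` is an assumption, because this guide states only the ratio, not how to configure it.

```python
def young_gen_flags(heap_mb, denominator=8):
    """Return HotSpot options that give the young generation 1/denominator of the heap.

    Assumption: these are standard HotSpot options; the guide itself only
    states the one-eighth ratio, not which flags to set.
    """
    if heap_mb <= 0 or denominator < 2:
        raise ValueError("heap_mb must be positive and denominator at least 2")
    young_mb = heap_mb // denominator
    # NewRatio is the old:young ratio, so young = heap / (NewRatio + 1).
    new_ratio = denominator - 1
    heap = ["-Xms%dm" % heap_mb, "-Xmx%dm" % heap_mb]
    return {
        "explicit": heap + ["-Xmn%dm" % young_mb],
        "ratio": heap + ["-XX:NewRatio=%d" % new_ratio],
    }

flags = young_gen_flags(2048)
print(" ".join(flags["explicit"]))  # -Xms2048m -Xmx2048m -Xmn256m
print(" ".join(flags["ratio"]))     # -Xms2048m -Xmx2048m -XX:NewRatio=7
```

The two variants express the same sizing: one fixes the young generation in megabytes, the other states it as a ratio of old to young.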
Even with a large heap size, after a portal instance has been running under
moderate load for a few days, most of the heap appears to be used because of the
lazy nature of the GC. The GC does not perform full garbage collections until the
resident set size (RSS) reaches approximately 85 percent of the total heap space; at
that point the garbage collections can have a measurable impact on performance.
For example, on a 900 MHz UltraSPARC III™, a full GC on a 2 GB heap can take
over ten seconds. During that time, the system cannot respond to web requests.
During a reliability test, full GCs are clearly visible as spikes in the response
time. You must understand both how often full GCs occur and how they affect
performance. In production, full GCs go unnoticed most of the time, but any
monitoring scripts that measure the performance of the system need to account for
the possibility that a full GC might occur.
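One way a monitoring script can allow for an occasional full GC is to alert only when several consecutive probes are slow, rather than on a single slow sample. The sketch below is one possible approach, not a method prescribed by this guide. The names `should_alert`, `threshold_secs`, and `consecutive` are hypothetical, and the sample values are invented for illustration.

```python
def should_alert(latencies, threshold_secs, consecutive=3):
    """Return True only if `consecutive` samples in a row exceed the threshold.

    A single slow probe (for example, one that coincides with a full GC)
    does not trigger an alert on its own.
    """
    run = 0
    for latency in latencies:
        if latency > threshold_secs:
            run += 1
            if run >= consecutive:
                return True
        else:
            run = 0
    return False

# Illustrative probe results in seconds (invented values, not measurements).
one_gc_spike = [0.4, 0.5, 11.2, 0.4, 0.6]
sustained_slowdown = [0.4, 6.0, 7.5, 9.1, 8.3]

print(should_alert(one_gc_spike, threshold_secs=5.0))        # False
print(should_alert(sustained_slowdown, threshold_secs=5.0))  # True
```

Suitable values for the threshold and the run length depend on the probe interval and on how long a full GC takes on the deployed hardware.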
Measuring the frequency of full GCs is sometimes the only way to determine if the
system has a memory leak. Establish the expected frequency of full GCs on a
baseline system, and compare it to the observed rate. To record the frequency of
GCs, use the -verbose:gc JVM™ parameter.
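The comparison against a baseline can be sketched as a small log parser. This example assumes the classic HotSpot `-verbose:gc` line format and assumes that `-XX:+PrintGCTimeStamps` is also enabled so that each line starts with the JVM uptime in seconds. The function names, the sample log, and the baseline value are all hypothetical, and other JVM versions might print a different format.

```python
import re

# Assumed line shape (classic HotSpot -verbose:gc with -XX:+PrintGCTimeStamps):
#   3600.000: [Full GC 1843200K->912384K(2097152K), 10.5000000 secs]
FULL_GC = re.compile(r"^\s*(\d+(?:\.\d+)?):\s*\[Full GC\b.*?(\d+(?:\.\d+)?) secs\]")

def full_gc_events(lines):
    """Return (uptime_secs, pause_secs) for every full GC line found."""
    events = []
    for line in lines:
        match = FULL_GC.match(line)
        if match:
            events.append((float(match.group(1)), float(match.group(2))))
    return events

def mean_interval(events):
    """Average seconds between consecutive full GCs, or None if fewer than two."""
    if len(events) < 2:
        return None
    times = [uptime for uptime, _ in events]
    return (times[-1] - times[0]) / (len(times) - 1)

# Invented sample log, for illustration only.
sample = [
    "100.000: [GC 325407K->83000K(776768K), 0.2300771 secs]",
    "3600.000: [Full GC 1843200K->912384K(2097152K), 10.5000000 secs]",
    "7200.000: [Full GC 1850000K->930000K(2097152K), 10.2500000 secs]",
    "9000.000: [Full GC 1860000K->950000K(2097152K), 10.7500000 secs]",
]
events = full_gc_events(sample)
observed = mean_interval(events)
baseline = 3600.0  # hypothetical interval measured on a baseline system

print(len(events), observed)  # 3 2700.0
print(observed < baseline)    # True
```

An observed interval that is shorter than the baseline, as in this invented sample, is a prompt to investigate a possible leak, not proof of one.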
CPU Utilization
When deployed using the building module concept (as described in Chapter 5,
“Creating Your Portal Design”), Portal Server has a capable, scalable CPU
architecture that also degrades gracefully under high loads.
However, when monitoring a production site, track CPU utilization over time.
Load usually comes in spikes, and keeping ahead of them involves a careful
assessment of availability capabilities.
Most organizations find that portal sites are “sticky” in nature. This means that site
usage grows over time, even when the size of the user community is fixed, as users
become more comfortable with the site. When the user community also grows, a
successful portal site can see substantial growth in CPU requirements over a short
period.
When monitoring a portal server’s CPU utilization, determine the average page
latency during peak load and how it differs from the overall average latency.
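The comparison of peak and overall latency can be sketched as follows. This assumes latency samples tagged with the hour of day and a known peak window. The function name, the sample values, and the peak window are all hypothetical.

```python
def peak_vs_overall(samples, peak_start_hour, peak_end_hour):
    """Compare mean page latency in the peak window with the overall mean.

    samples: list of (hour_of_day, latency_secs) pairs.
    Returns (peak_mean, overall_mean, peak_to_overall_ratio).
    """
    overall = [latency for _, latency in samples]
    peak = [latency for hour, latency in samples
            if peak_start_hour <= hour < peak_end_hour]
    if not overall or not peak:
        raise ValueError("need at least one sample overall and one in the peak window")
    peak_mean = sum(peak) / len(peak)
    overall_mean = sum(overall) / len(overall)
    return peak_mean, overall_mean, peak_mean / overall_mean

# Invented samples: (hour of day, page latency in seconds).
samples = [(2, 0.5), (6, 0.5), (9, 2.0), (10, 2.0), (14, 0.5), (22, 0.5)]
peak_mean, overall_mean, ratio = peak_vs_overall(samples, 9, 11)
print(peak_mean, overall_mean, ratio)  # 2.0 1.0 2.0
```

Tracking the ratio over time, alongside CPU utilization, gives a simple indication of whether peak load is degrading response time faster than average load.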