Sun Microsystems T5120 Server User Manual


 
The Evolution of Chip Multithreading (CMT)
Sun Microsystems, Inc.
estate to build increasingly complex processors, with instruction-level parallelism (ILP)
as a goal. Today these traditional processors employ very high frequencies along with a
variety of sophisticated tactics to accelerate a single instruction pipeline, including:
Large caches
Superscalar designs
Out-of-order execution
Very high clock rates
Deep pipelines
Speculative pre-fetches
While these techniques have produced faster processors with impressive-sounding
multiple-gigahertz frequencies, they have largely resulted in complex, hot, and power-
hungry processors that are not well suited to the types of workloads often found in
modern datacenters. In fact, many datacenter workloads are simply unable to take
advantage of the hard-won ILP provided by these processors. Applications with high
shared memory and high simultaneous user or transaction counts are typically more
focused on processing a large number of simultaneous threads (thread-level
parallelism, TLP) rather than running a single thread as quickly as possible (ILP).
Making matters worse, most of the ILP in existing applications has already been
extracted, and further gains promise to be small. In addition, microprocessor frequency
scaling has leveled off in the 2-3 GHz range because of power issues: with higher clock
speeds, each successive processor generation has seemingly demanded more power than
the last. Deploying pipelined superscalar processors requires still more power, so this
approach is ultimately limited by the fundamental ability to cool the processors.
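The link between clock rate and power can be made concrete with the textbook CMOS switching-power approximation, P = a * C * V^2 * f. The sketch below is illustrative only: the function name, the normalized capacitance, and both operating points are assumptions chosen for the arithmetic, not measurements of any processor discussed here.

```python
def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """Textbook CMOS switching-power approximation: P = a * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency


# Hypothetical, normalized operating points (not data for any real chip).
base = dynamic_power(capacitance=1.0, voltage=1.0, frequency=2.0)

# Assumption: a 50% higher clock needs about 20% more supply voltage.
fast = dynamic_power(capacitance=1.0, voltage=1.2, frequency=3.0)

# Power grows faster than frequency because voltage enters squared.
power_ratio = fast / base
print(f"frequency x1.5 -> power x{power_ratio:.2f}")
```

Under these assumed numbers, a 1.5x increase in frequency costs a little over 2x the switching power, which is the kind of scaling that makes cooling the limiting factor.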
Chip Multiprocessing with Multicore Processors
To address these issues, many in the microprocessor industry have used the transistor
budget provided by Moore's Law to group two or even four conventional processor
cores on a single physical die — creating multicore processors (or chip multiprocessors,
CMP). The individual processor cores introduced by many CMP designs have no greater
performance than previous single-processor chips, and in fact, have been observed to
run single-threaded applications more slowly than single-core processor versions.
However, the aggregate chip performance increases, since multiple programs (or
multiple threads) can be accommodated in parallel (thread-level parallelism).
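The trade-off between single-thread speed and aggregate throughput can be sketched with a toy model. Everything here is an assumption for illustration: the function names, the relative core speeds, and above all the idealized premise that threads are fully independent and never contend for memory.

```python
def single_thread_time(work, core_speed):
    """Time for one thread to finish `work` units on one core."""
    return work / core_speed


def aggregate_throughput(num_cores, core_speed):
    """Work units completed per unit time when every core is kept busy.

    Idealized: assumes independent threads and no memory contention.
    """
    return num_cores * core_speed


# Hypothetical chips: one fast core versus four slightly slower cores.
single_core = {"cores": 1, "speed": 1.0}
quad_core = {"cores": 4, "speed": 0.8}

for name, chip in (("single-core", single_core), ("quad-core CMP", quad_core)):
    latency = single_thread_time(1.0, chip["speed"])
    throughput = aggregate_throughput(chip["cores"], chip["speed"])
    print(f"{name}: one-thread time {latency:.2f}, throughput {throughput:.2f}")
```

In this model the quad-core chip runs any one thread more slowly yet completes more total work per unit time, provided there are enough threads to occupy every core. Real designs fall short of this ideal once memory speed becomes the bottleneck.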
Unfortunately, most currently available (or soon-to-be-available) chip multiprocessors
simply replicate cores from existing (single-threaded) processor designs. This approach
typically yields only slight improvements in aggregate performance since it ignores key
performance issues such as memory speed and hardware thread context switching. As