5 The Evolution of Chip Multithreading (CMT)
Sun Microsystems, Inc.
a result, while these designs provide some additional throughput and scalability, they
can consume considerable power and generate significant heat — without a
commensurate increase in overall performance.
Chip Multithreading (CMT) with CoolThreads™ Technology
Sun engineers were early to recognize the disparity between processor speeds and
memory access rates. While processor speeds continue to double every two years,
memory speeds have typically doubled only every six years. As a result, memory latency
now dominates the performance of many applications, erasing even very impressive gains in
clock rates. This growing disconnect is the result of memory suppliers focusing on
density and cost, rather than speed, as their design center.
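The doubling rates quoted above imply a gap that compounds over time. As a rough worked example, assume (as an idealization, not a measured trend) that both rates hold steady, and let $t$ be elapsed time in years, with $s_p(t)$ and $s_m(t)$ the processor and memory speeds relative to their starting values:

$$
s_p(t) = 2^{t/2}, \qquad s_m(t) = 2^{t/6}, \qquad \frac{s_p(t)}{s_m(t)} = 2^{t/2 - t/6} = 2^{t/3}
$$

Under these assumptions the relative gap itself doubles every three years. After six years processors are 8 times faster and memory is 2 times faster, so the gap is 4 times wider; after twelve years it is 16 times wider.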
Unfortunately, this relative gap between processor and memory speeds leaves ultra-fast
processors idle as much as 85 percent of the time, waiting for memory transactions to
complete. Ironically, as traditional processor execution pipelines get faster and more
complex, the effect of memory latency grows — fast, expensive processors spend more
cycles doing nothing. Worse still, idle processors continue to draw power and generate
heat. It is easy to see that frequency (gigahertz) is truly a misleading indicator of real
performance.
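The arithmetic behind this point can be sketched in a few lines. The example below is illustrative only: the helper `effective_ghz`, the two clock rates, and the 50 percent idle figure are hypothetical values chosen for the comparison, and only the 85 percent idle figure comes from the text above. It also ignores everything other than stall time, such as instructions completed per cycle.

```python
def effective_ghz(clock_ghz: float, idle_fraction: float) -> float:
    """Scale a clock rate by the fraction of cycles spent on useful work."""
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")
    return clock_ghz * (1.0 - idle_fraction)


# Hypothetical 3.0 GHz processor that waits on memory 85 percent of the time.
fast_but_stalled = effective_ghz(3.0, 0.85)

# Hypothetical 1.2 GHz processor that waits on memory 50 percent of the time.
slow_but_busy = effective_ghz(1.2, 0.50)

print(f"3.0 GHz, 85% idle: {fast_but_stalled:.2f} GHz of useful cycles")
print(f"1.2 GHz, 50% idle: {slow_but_busy:.2f} GHz of useful cycles")
```

In this toy comparison the slower clock delivers more useful cycles per second, which is the sense in which frequency alone can mislead.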
First introduced with the UltraSPARC T1 processor, chip multithreading takes advantage
of CMP advances, but adds a critical capability: the ability to scale with threads
rather than frequency. Unlike traditional single-threaded processors and even most
current multicore (CMP) processors, hardware multithreaded processor cores allow
rapid switching between active threads as other threads stall for memory. Figure 1
illustrates the difference between CMP, fine-grained hardware multithreading (FG-MT),
and chip multithreading. The key to this approach is that each core in a CMT processor
is designed to switch between multiple threads on each clock cycle. As a result, the
processor’s execution pipeline remains active doing real useful work, even as memory
operations for stalled threads continue in parallel.
Figure 1. Chip multithreading combines CMP and fine-grained hardware multithreading
[Figure 1 panels: Chip Multiprocessing (CMP), n cores per processor; Fine-Grained Multithreading (FG-MT), m strands per core; Chip Multithreading (CMT), n x m threads per processor. Legend: Memory Latency, Compute.]
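The effect of switching threads on each clock cycle can be illustrated with a small simulation of a single core. This is a minimal sketch under stated assumptions, not a model of any real processor: the function `simulate_core`, the round-robin selection policy, and the workload of 3 compute cycles followed by a 17-cycle memory stall are all hypothetical. The workload was chosen so that a single thread leaves the pipeline idle 85 percent of the time.

```python
def simulate_core(num_threads: int, compute_cycles: int,
                  stall_cycles: int, total_cycles: int) -> float:
    """Return the fraction of cycles in which the pipeline issued an instruction.

    Each thread repeatedly issues `compute_cycles` one-cycle instructions and
    then stalls for `stall_cycles` cycles while a memory access completes.
    On every cycle the core issues from the next ready thread, round-robin.
    """
    ready_at = [0] * num_threads                 # first cycle each thread may issue
    remaining = [compute_cycles] * num_threads   # instructions left before next stall
    next_thread = 0                              # round-robin starting point
    busy = 0

    for cycle in range(total_cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                busy += 1
                remaining[t] -= 1
                if remaining[t] == 0:
                    # Thread starts a memory access and stalls.
                    ready_at[t] = cycle + 1 + stall_cycles
                    remaining[t] = compute_cycles
                next_thread = (t + 1) % num_threads
                break
        # If no thread was ready, the pipeline sits idle for this cycle.

    return busy / total_cycles


for threads in (1, 4, 32):
    utilization = simulate_core(threads, compute_cycles=3,
                                stall_cycles=17, total_cycles=10_000)
    print(f"{threads:2d} thread(s) per core: pipeline busy {utilization:.0%}")
```

In this toy model, pipeline utilization rises as threads are added, because the stalls of one thread overlap with the compute cycles of others. Real workloads have irregular stall patterns, so the specific percentages carry no meaning beyond the illustration.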