HP (Hewlett-Packard) C7S14A Computer Hardware User Manual


 
Performance of the K20X Module
2688 CUDA cores
1.32 Tflops of double-precision peak performance
3.95 Tflops of single-precision peak performance
GDDR5 memory optimizes performance and reduces data transfers by keeping large data sets in 6 GB of local memory that is
attached to the GPU
The NVIDIA Parallel DataCache™ accelerates algorithms such as physics solvers, ray-tracing, and sparse matrix multiplication
where data addresses are not known beforehand. This includes a configurable L1 cache per Streaming Multiprocessor block and
a unified L2 cache for all of the processor cores.
Asynchronous transfer turbocharges system performance by transferring data over the PCIe bus while the computing cores are
crunching other data. Even applications with heavy data-transfer requirements, such as seismic processing, can maximize the
computing efficiency by transferring data to local memory before it is needed.
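The overlap described above can be sketched with CUDA streams and pinned host memory. This is a minimal illustration, not code from HP or NVIDIA documentation; the buffer sizes, chunk count, and the `process` kernel are assumptions for the example.

```cuda
// Sketch: overlap PCIe transfers with kernel execution using CUDA streams.
#include <cuda_runtime.h>

__global__ void process(float *d, int n) { /* crunch this chunk */ }

int main() {
    const int N = 1 << 20, CHUNKS = 4, CHUNK = N / CHUNKS;
    float *h, *d;
    cudaMallocHost(&h, N * sizeof(float));  // pinned host memory: required for
                                            // truly asynchronous copies
    cudaMalloc(&d, N * sizeof(float));
    cudaStream_t s[CHUNKS];
    for (int i = 0; i < CHUNKS; ++i) cudaStreamCreate(&s[i]);
    for (int i = 0; i < CHUNKS; ++i) {
        int off = i * CHUNK;
        // Within a stream, the kernel waits for its own chunk's copy;
        // across streams, copies and kernels on different chunks overlap.
        cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, s[i]);
        process<<<CHUNK / 256, 256, 0, s[i]>>>(d + off, CHUNK);
    }
    cudaDeviceSynchronize();
    for (int i = 0; i < CHUNKS; ++i) cudaStreamDestroy(s[i]);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```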
Dynamic Parallelism capability that enables GPU threads to automatically spawn new threads.
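A minimal illustration of Dynamic Parallelism, in which a kernel launches another kernel directly on the GPU with no round trip to the CPU. The kernel names are hypothetical; this requires a compute capability 3.5 device such as the K20/K20X and compilation with `nvcc -arch=sm_35 -rdc=true`.

```cuda
// Sketch: a parent kernel spawning child grids on the device itself.
#include <cstdio>

__global__ void childKernel(int parent) {
    printf("child thread %d launched by parent thread %d\n",
           threadIdx.x, parent);
}

__global__ void parentKernel() {
    // Each parent thread launches a child grid from GPU code.
    childKernel<<<1, 4>>>(threadIdx.x);
}

int main() {
    parentKernel<<<1, 2>>>();
    cudaDeviceSynchronize();  // wait for parent and child grids
    return 0;
}
```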
Hyper-Q feature that enables multiple CPU cores to simultaneously utilize the CUDA cores on a single GPU.
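One way Hyper-Q is exercised in practice: several CPU threads each submit work in their own stream, and the GPU accepts them through separate hardware work queues instead of serializing them through a single queue. A sketch using OpenMP (the thread count and kernel are illustrative; compile with `nvcc -Xcompiler -fopenmp`):

```cuda
// Sketch: multiple CPU threads concurrently feeding one GPU via Hyper-Q.
#include <omp.h>
#include <cuda_runtime.h>

__global__ void work(int id) { /* per-thread workload */ }

int main() {
    #pragma omp parallel num_threads(8)
    {
        cudaStream_t s;
        cudaStreamCreate(&s);
        // Each CPU thread issues its kernel into its own stream.
        work<<<1, 64, 0, s>>>(omp_get_thread_num());
        cudaStreamSynchronize(s);
        cudaStreamDestroy(s);
    }
    return 0;
}
```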
The high speed PCIe Gen 2.0 data transfer maximizes bandwidth between the HP ProLiant server and the Tesla processors.
Reliability
ECC Memory meets a critical requirement for computing accuracy and reliability for datacenters and supercomputing centers. It
protects data in memory to enhance data integrity and reliability for applications. On the M2075, M2070Q, M2090, K20, and
K20X, the register files, L1/L2 caches, shared memory, and DRAM are all ECC protected. On the K10, only the external DRAM is
ECC protected. Double-bit errors are detected and can trigger alerts through the HP Cluster Management Utility. Also, the
Platform LSF job scheduler, available as part of HP HPC Linux Value Pack, can be configured to report when jobs encounter
double-bit errors.
Passive heatsink design eliminates moving parts and cables, increasing the mean time between failures.
Programming and Management Ecosystem
The CUDA programming environment has broad support of programming languages and APIs. Choose C, C++, OpenCL,
DirectCompute, or Fortran to express application parallelism and take advantage of the innovative Tesla architectures. The
CUDA software, as well as the GPU drivers, can be automatically installed on HP ProLiant servers, by HP Insight Cluster
Management Utility.
"Exclusive mode" enables application-exclusive access to a particular GPU. CUDA environment variables enable cluster
management software such as the Platform LSF job scheduler (available as part of HP HPC Linux Value Pack) to limit the Tesla
GPUs an application can use.
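From the application's point of view, the restriction works through GPU enumeration: when the scheduler exports CUDA_VISIBLE_DEVICES, only the listed GPUs are visible to the CUDA runtime, renumbered from zero. A sketch (the device index 1 is an assumption for illustration):

```cuda
// Sketch: launch the job as  CUDA_VISIBLE_DEVICES=1 ./my_app
// so the process can only see (and use) physical GPU 1.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    // With CUDA_VISIBLE_DEVICES=1 set, exactly one device is visible,
    // and inside this process it is renumbered as device 0.
    printf("visible GPUs: %d\n", count);
    cudaSetDevice(0);  // the single permitted Tesla module
    return 0;
}
```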
With HP ProLiant servers, application programmers can control the mapping between processes running on individual cores and
the GPUs with which those processes communicate. Judicious mapping optimizes GPU bandwidth, and thus overall performance.
The technique is described in a white paper available to HP customers at www.hp.com/go/hpc. A heuristic version of this
affinity mapping has also been implemented by HP as an option to the mpirun command, as used for example with HP-MPI,
available as part of HP HPC Linux Value Pack.
GPU control is available through the nvidia-smi tool, which lets you set the compute mode (e.g., exclusive), enable, disable,
and report ECC, and check or reset the double-bit error count. IPMI and iLO gather data such as GPU temperature. HP Cluster
Management Utility has incorporated these sensors into its monitoring features, so that cluster-wide GPU data can be presented
in real time, stored for historical analysis, and easily used to set up management alerts.
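The controls described above correspond roughly to the following nvidia-smi invocations (illustrative; some require root privileges, and flag availability varies by driver version):

```shell
nvidia-smi -c EXCLUSIVE_PROCESS   # restrict the GPU to one compute process
nvidia-smi -e 1                   # enable ECC (takes effect after reboot)
nvidia-smi -q -d ECC              # report ECC mode and error counts
nvidia-smi -p 1                   # reset the aggregate ECC error counts
```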
QuickSpecs
NVIDIA Tesla GPU Modules for HP ProLiant Servers
Standard Features
DA - 13743 North America — Version 16 — September 30, 2013
Page 4