... | ... | @@ -2,7 +2,7 @@ |
|
|
|
|
|
## Introduction
|
|
|
|
|
|
This wiki gives an overview of the various server machines offered by the ASC chair.
|
|
|
This wiki page gives an overview of the various server machines offered by the ASC group.
|
|
|
|
|
|
|
|
|
## Interactive nodes
|
... | ... | @@ -36,30 +36,119 @@ This wiki gives an overview of the various server machines offered by the ASC ch |
|
|
|
|
|
## CPU architecture overview
|
|
|
|
|
|
| feature | Xeon E5504 | Xeon E5530 | Xeon E3-1585 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
| --------------------------- | ----------- | ----------- | ------------------ | ---------------- | -------------- | ------------------ | ---------------- |
| architecture | Nehalem | Nehalem | Skylake-H | Skylake-SP | Skylake-SP | Skylake-SP | Knights Landing |
| vector instructions | SSE4.2 | SSE4.2 | AVX2 | AVX-512 | AVX-512 | AVX-512 | AVX-512 |
| cores per socket | 4 | 4 | 4 | 8 | 16 | 24 | 64 |
| SMT | | | 2 threads per core | | | | |
| base frequency | 2.0 GHz | 2.4 GHz | 3.5..3.9 GHz | 2.1 GHz | 2.1 GHz | 2.7 GHz | 1.3 GHz |
| AVX2 frequency | | | 3.5..3.9 GHz | 1.7 GHz | 1.7 GHz | 2.3 GHz | 1.1 GHz |
| AVX-512 frequency | | | | 1.0 GHz | 1.3 GHz | 1.9 GHz | 1.1 GHz |
| L1 instruction cache | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB |
| L1 data cache | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB |
| L2 cache (per core) | 256 KB | 256 KB | 256 KB | 1 MB | 1 MB | 1 MB | 1 MB per 2 cores |
| L2 cache (per socket) | 1 MB | 1 MB | 1 MB | 8 MB | 16 MB | 24 MB | 32 MB |
| L3 cache type | inclusive | inclusive | inclusive | exclusive | exclusive | exclusive | |
| L3 cache (per core) | | | 2 MB | 1.375 MB | 1.375 MB | 1.375 MB | |
| L3 cache (per socket) | 4 MB | 8 MB | 8 MB | 11 MB | 22 MB | 33 MB | |
| eDRAM | | | 128 MB | | | | |
| HBM | | | | | | | 16 GB |
| L1 read bandwidth | | | 448..499 GB/s | 269 GB/s | 269 GB/s | 346 GB/s | 166 GB/s |
| L2 bandwidth | | | 224..250 GB/s | 135 GB/s | 135 GB/s | 173 GB/s | 83 GB/s |
| eDRAM bandwidth, ST | | | | | | | |
| eDRAM bandwidth, concurrent | | | | | | | |
| HBM bandwidth, ST | | | | | | | |
| HBM bandwidth, concurrent | | | | | | | |
| SSE2 GFlops/s | | | | | | | |
| AVX2 GFlops/s | | | | | | | |
| AVX-512 GFlops/s | | | | | | | |

| feature | [Xeon E5504][1] | [Xeon E5530][2] | [Xeon E3-1585][3] | [Xeon Silver 4110][4] | [Xeon Gold 6130][5] | [Xeon Platinum 8168][6] | [Xeon Phi 7210][7] |
| --------------------------- | -------------- | -------------- | ------------------ | --------------------- | ------------------- | ----------------------- | --------------------- |
| architecture | [Nehalem][8] | [Nehalem][8] | [Skylake-H][9] | [Skylake-SP][10] | [Skylake-SP][10] | [Skylake-SP][10] | [Knights Landing][11] |
| vector instructions | SSE4.2 | SSE4.2 | AVX2 | AVX-512 | AVX-512 | AVX-512 | AVX-512 |
| cores per socket | 4 | 4 | 4 | 8 | 16 | 24 | 64 |
| SMT | | | 2 threads per core | | | | |
| base frequency | 2.0 GHz | 2.4 GHz | 3.5..3.9 GHz | 2.4..3.0 GHz | 2.8..3.7 GHz | 3.4..3.7 GHz | 1.3..1.5 GHz |
| AVX2 frequency | | | 3.5..3.9 GHz | 2.1..2.9 GHz | 2.4..3.6 GHz | 3.0..3.6 GHz | 1.1 GHz |
| AVX-512 frequency | | | | 1.3..1.8 GHz | 1.9..3.5 GHz | 2.5..3.5 GHz | 1.1 GHz |
| L1 instruction cache | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB |
| L1 data cache | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB |
| L2 cache (per core) | 256 KB | 256 KB | 256 KB | 1 MB | 1 MB | 1 MB | 1 MB per 2 cores |
| L2 cache (per socket) | 1 MB | 1 MB | 1 MB | 8 MB | 16 MB | 24 MB | 32 MB |
| L3 cache type | inclusive | inclusive | inclusive | exclusive | exclusive | exclusive | |
| L3 cache (per core) | | | 2 MB | 1.375 MB | 1.375 MB | 1.375 MB | |
| L3 cache (per socket) | 4 MB | 8 MB | 8 MB | 11 MB | 22 MB | 33 MB | |
| eDRAM | | | 128 MB | | | | |
| HBM | | | | | | | 16 GB |
|
|
|
|
|
|
On more recent CPUs, the clock frequency depends on the number of cores under load and the instruction set used. A tabular overview can be found on the WikiChip pages linked above. The value ranges given in the table ("A..B GHz") state the clock frequencies at full load (A) and at single-core load (B).

Simultaneous multithreading ("hyperthreading") is enabled on the mp-media* machines and disabled on all other machines.

eDRAM refers to the embedded DRAM in the Skylake-H CPU, a memory-side cache shared between CPU and GPU, cf. [https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)](https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#eDRAM_architectural_changes).

HBM is the configurable high-bandwidth memory available on the Knights Landing CPUs, cf. [https://colfaxresearch.com/knl-mcdram/](https://colfaxresearch.com/knl-mcdram/).
|
|
|
|
|
|
|
|
|
## Peak system bandwidth, theoretical
|
|
|
|
|
|
| measure | Xeon E5504 | Xeon E5530 | Xeon E3-1585 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
| ---------------------------- | ----------- | ----------- | ---------------- | ---------------- | ---------------- | ------------------ | ---------------- |
| L1 read per core | | | f/GHz × 128 GB/s | f/GHz × 128 GB/s | f/GHz × 128 GB/s | f/GHz × 128 GB/s | f/GHz × 128 GB/s |
| L1 write per core | | | f/GHz × 64 GB/s | f/GHz × 64 GB/s | f/GHz × 64 GB/s | f/GHz × 64 GB/s | f/GHz × 64 GB/s |
| L1 read+write per core | | | f/GHz × 192 GB/s | f/GHz × 192 GB/s | f/GHz × 192 GB/s | f/GHz × 192 GB/s | f/GHz × 192 GB/s |
| L2 per core | | | f/GHz × 64 GB/s | f/GHz × 64 GB/s | f/GHz × 64 GB/s | f/GHz × 64 GB/s | f/GHz × 64 GB/s |
| eDRAM, single-threaded | | | | | | | |
| eDRAM, concurrent | | | | | | | |
| HBM, single-threaded | | | | | | | |
| HBM, concurrent | | | | | | | ~400 GB/s |
| DRAM, single-threaded | | | 17.1 GB/s | 19.2 GB/s | 21.3 GB/s | 21.3 GB/s | |
| DRAM, concurrent, per socket | | | 34.1 GB/s | 115.2 GB/s | 128 GB/s | 128 GB/s | ~90 GB/s |
|
|
|
|
|
|
Note that all bandwidths are given with decimal unit prefixes, i.e. 1 GB = $10^9$ B.
|
|
|
|
|
|
The cache bandwidths scale with the clock frequency $f$, which depends on the number of cores under load and the instruction set used. The figures are based on the number of 64 B cache lines each core can fetch or store per cycle.
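As a rough illustration of how the table entries translate into absolute numbers, the following sketch multiplies the bytes-per-cycle factors from the table by a clock frequency. The function name `cache_bandwidth_gbs` and the example frequency of 2.1 GHz are chosen for illustration only; the frequency actually reached depends on the load and the instruction set.

```python
# Bytes moved per cycle and core, copied from the table above.
# 128 B corresponds to two 64 B cache lines, 64 B to one.
BYTES_PER_CYCLE = {
    "L1 read": 128,
    "L1 write": 64,
    "L1 read+write": 192,
    "L2": 64,
}


def cache_bandwidth_gbs(level: str, f_ghz: float) -> float:
    """Theoretical per-core bandwidth in GB/s (decimal prefix, 1 GB = 10^9 B).

    f_ghz is the clock frequency in GHz; bytes/cycle * 10^9 cycles/s = GB/s.
    """
    return BYTES_PER_CYCLE[level] * f_ghz


# Illustrative clock frequency (an assumption, not a measured value).
EXAMPLE_F_GHZ = 2.1

for level in BYTES_PER_CYCLE:
    print(f"{level}: {cache_bandwidth_gbs(level, EXAMPLE_F_GHZ):.1f} GB/s")
```

Because the factor is fixed per cache level, the same function can be evaluated at both ends of a frequency range to bracket the achievable bandwidth.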
|
|
|
|
|
|
|
|
|
## Peak system floating-point performance, theoretical
|
|
|
|
|
|
| instruction set | Xeon E5504 | Xeon E5530 | Xeon E3-1585 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
| --------------- | ----------- | ----------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- |
| SSE2 | | | | | | | |
| AVX2 | | | f/GHz × 16 GFlops/s | f/GHz × 16 GFlops/s | f/GHz × 16 GFlops/s | f/GHz × 16 GFlops/s | f/GHz × 16 GFlops/s |
| AVX-512 | | | | f/GHz × 16 GFlops/s | f/GHz × 32 GFlops/s | f/GHz × 32 GFlops/s | f/GHz × 32 GFlops/s |
|
|
|
|
|
|
All figures are per core and refer to double precision computations.
|
|
|
|
|
|
Numbers are computed as the number of lanes per SIMD vector × number of vector units per core × number of flops per instruction / cycles per instruction. All Skylake and Knights Landing CPUs have an arithmetic unit that serves either as two AVX2 pipelines or as one AVX-512 pipeline. Xeon Gold, Xeon Platinum, and Knights Landing CPUs have an additional arithmetic unit solely for AVX-512 instructions. All Skylake and Knights Landing CPUs can compute a fused multiply-add, which counts as two flops per lane, in one cycle.
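The rule above can be written out as a small calculation. This is only a sketch of the arithmetic; the function name `peak_gflops` and its parameters are hypothetical and not part of any tool used on these machines.

```python
def peak_gflops(f_ghz: float, simd_bits: int, vector_units: int,
                fma: bool = True, dtype_bits: int = 64) -> float:
    """Theoretical per-core peak in GFlops/s.

    lanes per SIMD vector x vector units per core x flops per instruction,
    assuming one instruction per unit and cycle, scaled by the clock in GHz.
    """
    lanes = simd_bits // dtype_bits       # e.g. 256 bit / 64 bit = 4 doubles
    flops_per_lane = 2 if fma else 1      # a fused multiply-add counts as two flops
    return lanes * vector_units * flops_per_lane * f_ghz


# Factors per GHz, matching the table entries:
print(peak_gflops(1.0, simd_bits=256, vector_units=2))  # AVX2, two pipelines
print(peak_gflops(1.0, simd_bits=512, vector_units=1))  # AVX-512, one pipeline
print(peak_gflops(1.0, simd_bits=512, vector_units=2))  # AVX-512, two pipelines
```

Setting `dtype_bits=32` would give the corresponding single-precision factors, which are twice as large because twice as many lanes fit into a vector.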
|
|
|
|
|
|
|
|
|
## Peak system bandwidth, measured
|
|
|
|
|
|
| measure | Xeon E5504 | Xeon E5530 | Xeon E3-1585 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
| ---------------------------- | ----------- | ----------- | ---------------- | ---------------- | ---------------- | ------------------ | ---------------- |
| L1 read per core | | | | | | | |
| L1 write per core | | | | | | | |
| L1 read+write per core | | | | | | | |
| L2 per core | | | | | | | |
| eDRAM, single-threaded | | | | | | | |
| eDRAM, concurrent | | | | | | | |
| HBM, single-threaded | | | | | | | |
| HBM, concurrent | | | | | | | |
| DRAM, single-threaded | | | | | | | |
| DRAM, concurrent, per socket | | | | | | | |
|
|
|
|
|
|
Note that all bandwidths are given with decimal unit prefixes, i.e. 1 GB = $10^9$ B.
|
|
|
|
|
|
|
|
|
## Peak system floating-point performance, measured
|
|
|
|
|
|
| instruction set | Xeon E5504 | Xeon E5530 | Xeon E3-1585 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
| --------------- | ----------- | ----------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- |
| SSE2 | | | | | | | |
| AVX2 | | | | | | | |
| AVX-512 | | | | 28.7 GFlops/s | 109.1 GFlops/s | 111.8 GFlops/s | |
|
|
|
|
|
|
All figures are per core and refer to double precision computations. Measurements were conducted with a single thread.
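To put the measured values in relation to the theoretical peak, one can divide them by the per-GHz factor times a clock frequency. The sketch below assumes that the single-threaded runs reached the upper end of the AVX-512 frequency range listed in the CPU architecture overview (1.8, 3.5 and 3.5 GHz); this is an assumption for illustration, the actual clock during the measurements is not recorded here. All names in the code are hypothetical.

```python
# Per-core AVX-512 factors in GFlops/s per GHz (theoretical table).
GFLOPS_PER_GHZ = {
    "Xeon Silver 4110": 16,
    "Xeon Gold 6130": 32,
    "Xeon Platinum 8168": 32,
}

# ASSUMPTION: upper end of the AVX-512 frequency range in the overview table.
ASSUMED_F_GHZ = {
    "Xeon Silver 4110": 1.8,
    "Xeon Gold 6130": 3.5,
    "Xeon Platinum 8168": 3.5,
}

# Measured single-thread AVX-512 performance in GFlops/s (table above).
MEASURED_GFLOPS = {
    "Xeon Silver 4110": 28.7,
    "Xeon Gold 6130": 109.1,
    "Xeon Platinum 8168": 111.8,
}


def efficiency(cpu: str) -> float:
    """Measured performance as a fraction of the theoretical peak."""
    peak = GFLOPS_PER_GHZ[cpu] * ASSUMED_F_GHZ[cpu]
    return MEASURED_GFLOPS[cpu] / peak


for cpu in MEASURED_GFLOPS:
    print(f"{cpu}: {efficiency(cpu):.1%} of theoretical peak")
```

If the clock during a run was lower than assumed, the resulting fraction would exceed 100 %, which would indicate that the frequency assumption needs to be revisited.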
|
|
|
|
|
|
|
|
|
|
|
|
[1]: https://ark.intel.com/products/40711
[2]: https://ark.intel.com/products/37103
[3]: https://en.wikichip.org/wiki/intel/xeon_e3/e3-1585_v5
[4]: https://en.wikichip.org/wiki/intel/xeon_silver/4110
[5]: https://en.wikichip.org/wiki/intel/xeon_gold/6130
[6]: https://en.wikichip.org/wiki/intel/xeon_platinum/8168
[7]: https://ark.intel.com/de/products/94033
[8]: https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)
[9]: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)
[10]: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)
[11]: https://en.wikichip.org/wiki/intel/microarchitectures/knights_landing