... | ... | @@ -7,114 +7,122 @@ This wiki page gives an overview of the various server machines offered by the A |
|
|
|
|
|
## Interactive nodes
|
|
|
|
|
|
| hostname | CPU | frequency | cores | architecture | RAM | description |
|
|
|
| -------- | ------------------------| ------------ | ----- | ------------ | ------ | --------------------------------------------------------------------------------------------------- |
|
|
|
| mp-login | | | | | | accessible to everyone with an ASC username |
|
|
|
| mp-headnode | 2× Intel Xeon Gold 6130 | 2.8..3.7 GHz | 2×16 | Skylake-SP | 192 GB | headnode of the compute cluster;<br>compile and debug your applications and submit your jobs here |
|
|
|
| hostname | CPU | frequency | cores | architecture | RAM |
|
|
|
| -------- | -------------------------| ------------ | ------ | ------------ | ------ |
|
|
|
| `mp-login` | | | | | |
|
|
|
| `mp-headnode` | 2 × Intel Xeon Gold 6130 | 2.8..3.7 GHz | 2 × 16 | Skylake-SP | 192 GB |
|
|
|
|
|
|
- `mp-login` is accessible to everyone with an ASC username; you can use this node
|
|
|
to change your password if you do not have cluster access.
|
|
|
- `mp-headnode` is the headnode of the compute cluster. Compile and debug your
|
|
|
applications and submit your jobs there.
|
|
|
|
|
|
|
|
|
## Compute nodes
|
|
|
|
|
|
| hostname | CPU | frequency | cores | architecture | RAM | GPU | NVMe | description |
|
|
|
| ------------ | --------------------------- | ------------ | ----- | --------------- | ------ | -------------------------- | --------- | ------------------|
|
|
|
| mp-skl2s8c | 2× Intel Xeon Silver 4110 | 2.4..3.0 GHz | 2×8 | Skylake-SP | 96 GB | NVIDIA GeForce GTX 1080 Ti | | |
|
|
|
| mp-skl2s24c | 2× Intel Xeon Platinum 8168 | 3.4..3.7 GHz | 2×24 | Skylake-SP | 96 GB | | | |
|
|
|
| mp-media1 | Intel Xeon E3-1585 v5 | 3.5..3.9 GHz | 4 (8) | Skylake-H | 64 GB | Intel Iris Pro P580 | 2× 1.6 TB | |
|
|
|
| mp-media2 | Intel Xeon E3-1585 v5 | 3.5..3.9 GHz | 4 (8) | Skylake-H | 64 GB | Intel Iris Pro P580 | 1.6 TB | |
|
|
|
| mp-media3 | Intel Xeon E3-1585 v5 | 3.5..3.9 GHz | 4 (8) | Skylake-H | 64 GB | Intel Iris Pro P580 | | |
|
|
|
| mp-media4 | Intel Xeon E3-1585 v5 | 3.5..3.9 GHz | 4 (8) | Skylake-H | 64 GB | Intel Iris Pro P580 | | |
|
|
|
| mp-knl1 | Intel Xeon Phi 7210 | 1.3..1.5 GHz | 64 | Knights Landing | 112 GB | | | Quadrant, Cache |
|
|
|
| mp-knl2 | Intel Xeon Phi 7210 | 1.3..1.5 GHz | 64 | Knights Landing | 96 GB | | | Hemisphere, Cache |
|
|
|
| mp-knl3 | Intel Xeon Phi 7210 | 1.3..1.5 GHz | 64 | Knights Landing | 96 GB | | | All2All, Cache |
|
|
|
| mp-knl4 | Intel Xeon Phi 7210 | 1.3..1.5 GHz | 64 | Knights Landing | 96 GB | | | SNC, Cache |
|
|
|
| mp-process01 | Intel Xeon E5504 | 2.0 GHz | 4 | Nehalem | 8 GB | NVIDIA GeForce GTX 1070 | | |
|
|
|
| mp-process02 | Intel Xeon E5504 | 2.0 GHz | 4 | Nehalem | 8 GB | NVIDIA GeForce GTX 1070 | | |
|
|
|
| mp-process03 | Intel Xeon E5504 | 2.0 GHz | 4 | Nehalem | 8 GB | NVIDIA GeForce GTX 1070 | | |
|
|
|
| mp-process04 | Intel Xeon E5504 | 2.0 GHz | 4 | Nehalem | 8 GB | NVIDIA GeForce GTX 1070 | | |
|
|
|
| mp-capture01 | Intel Xeon E5530 | 2.4 GHz | 4 | Nehalem | 48 GB | 2× NVIDIA GeForce GTX 1070 | | |
|
|
|
| mp-capture02 | Intel Xeon E5530 | 2.4 GHz | 4 | Nehalem | 48 GB | 2× NVIDIA GeForce GTX 1070 | | |
|
|
|
| hostname | CPU | frequency | cores | architecture | RAM | GPU | NVMe | description |
|
|
|
| ---------------- | ---------------------------- | ------------ | ------ | --------------- | ------ | --------------------------- | ---------- | ------------------|
|
|
|
| `mp-skl2s8c` | 2 × Intel Xeon Silver 4110 | 2.4..3.0 GHz | 2 × 8 | Skylake-SP | 96 GiB | NVIDIA GeForce GTX 1080 Ti | | |
|
|
|
| `mp-skl2s24c` | 2 × Intel Xeon Platinum 8168 | 3.4..3.7 GHz | 2 × 24 | Skylake-SP | 96 GiB | | | |
|
|
|
| `mp-media1` | Intel Xeon E3-1585 v5 | 3.5..3.9 GHz | 4 | Skylake-H | 64 GiB | Intel Iris Pro P580 | 2 × 1.6 TB | |
|
|
|
| `mp-media2` | Intel Xeon E3-1585 v5 | 3.5..3.9 GHz | 4 | Skylake-H | 64 GiB | Intel Iris Pro P580 | 1.6 TB | |
|
|
|
| `mp-media3` | Intel Xeon E3-1585 v5 | 3.5..3.9 GHz | 4 | Skylake-H | 64 GiB | Intel Iris Pro P580 | | |
|
|
|
| `mp-media4` | Intel Xeon E3-1585 v5 | 3.5..3.9 GHz | 4 | Skylake-H | 64 GiB | Intel Iris Pro P580 | | |
|
|
|
| `mp-knl1` | Intel Xeon Phi 7210 | 1.3..1.5 GHz | 64 | Knights Landing | 96 GiB | | | Quadrant, Cache |
|
|
|
| `mp-knl2` | Intel Xeon Phi 7210 | 1.3..1.5 GHz | 64 | Knights Landing | 96 GiB | | | Hemisphere, Cache |
|
|
|
| `mp-knl3` | Intel Xeon Phi 7210 | 1.3..1.5 GHz | 64 | Knights Landing | 96 GiB | | | All2All, Cache |
|
|
|
| `mp-knl4` | Intel Xeon Phi 7210 | 1.3..1.5 GHz | 64 | Knights Landing | 96 GiB | | | SNC, Cache |
|
|
|
| `mp-process01` | Intel Xeon E5504 | 2.0 GHz | 4 | Nehalem | 8 GiB | NVIDIA GeForce GTX 1070 | | |
|
|
|
| `mp-process02` | Intel Xeon E5504 | 2.0 GHz | 4 | Nehalem | 8 GiB | NVIDIA GeForce GTX 1070 | | |
|
|
|
| `mp-process03` | Intel Xeon E5504 | 2.0 GHz | 4 | Nehalem | 8 GiB | NVIDIA GeForce GTX 1070 | | |
|
|
|
| `mp-process04` | Intel Xeon E5504 | 2.0 GHz | 4 | Nehalem | 8 GiB | NVIDIA GeForce GTX 1070 | | |
|
|
|
| `mp-capture01` | 2 × Intel Xeon E5530 | 2.4 GHz | 2 × 4 | Nehalem | 48 GiB | 2 × NVIDIA GeForce GTX 1070 | | |
|
|
|
| `mp-capture02` | 2 × Intel Xeon E5530 | 2.4 GHz | 2 × 4 | Nehalem | 48 GiB | 2 × NVIDIA GeForce GTX 1070 | | |
|
|
|
|
|
|
- Cache and DRAM capacities are given with binary unit prefixes, i.e. 1 GiB = 2³⁰ B.
|
|
|
- I/O storage device capacities are given with decimal unit prefixes, i.e. 1 TB = 10¹² B.
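
The two unit conventions above differ by about 7 % at the giga scale, which matters when comparing RAM sizes with storage sizes. The following is a minimal sketch of the conversion; the helper names `gib_to_gb` and `tb_to_gib` are made up for illustration and are not part of any cluster tooling.

```python
GIB = 2**30   # binary prefix, used for cache and DRAM capacities
GB = 10**9    # decimal prefix
TB = 10**12   # decimal prefix, used for I/O storage capacities


def gib_to_gb(gib: float) -> float:
    """Convert a capacity given in GiB to decimal GB."""
    return gib * GIB / GB


def tb_to_gib(tb: float) -> float:
    """Convert a capacity given in decimal TB to GiB."""
    return tb * TB / GIB


if __name__ == "__main__":
    # 96 GiB of RAM and a 1.6 TB NVMe device, as listed in the table above
    print(f"96 GiB = {gib_to_gb(96):.1f} GB")
    print(f"1.6 TB = {tb_to_gib(1.6):.0f} GiB")
```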
|
|
|
|
|
|
|
|
|
## CPU architecture overview
|
|
|
|
|
|
| feature \\ CPU | [Xeon E5504](https://ark.intel.com/products/40711) | [Xeon E5530](https://ark.intel.com/products/37103) | [Xeon E3-1585](https://en.wikichip.org/wiki/intel/xeon_e3/e3-1585_v5) | [Xeon Silver 4110](https://en.wikichip.org/wiki/intel/xeon_silver/4110) | [Xeon Gold 6130](https://en.wikichip.org/wiki/intel/xeon_gold/6130) | [Xeon Platinum 8168](https://en.wikichip.org/wiki/intel/xeon_platinum/8168) | [Xeon Phi 7210](https://ark.intel.com/de/products/94033) |
|
|
|
| feature \\ CPU | [Xeon E5504](https://ark.intel.com/products/40711) | [Xeon E5530](https://ark.intel.com/products/37103) | [Xeon E3-1585 v5](https://en.wikichip.org/wiki/intel/xeon_e3/e3-1585_v5) | [Xeon Silver 4110](https://en.wikichip.org/wiki/intel/xeon_silver/4110) | [Xeon Gold 6130](https://en.wikichip.org/wiki/intel/xeon_gold/6130) | [Xeon Platinum 8168](https://en.wikichip.org/wiki/intel/xeon_platinum/8168) | [Xeon Phi 7210](https://ark.intel.com/products/94033) |
|
|
|
| --------------------------- | -------------- | -------------- | ------------------ | --------------------- | ------------------- | ----------------------- | --------------------- |
|
|
|
| architecture | [Nehalem](https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)) | [Nehalem](https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)) | [Skylake-H](https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)) | [Skylake-SP](https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)) | [Skylake-SP](https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)) | [Skylake-SP](https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)) | [Knights Landing](https://en.wikichip.org/wiki/intel/microarchitectures/knights_landing) |
|
|
|
| vector instructions | SSE4.2 | SSE4.2 | AVX2 | AVX-512 | AVX-512 | AVX-512 | AVX-512 |
|
|
|
| cores per socket | 4 | 4 | 4 | 8 | 16 | 24 | 64 |
|
|
|
| simultaneous multithreading | | | 2 threads per core | | | | |
|
|
|
| base frequency | 2.0 GHz | 2.4 GHz | 3.5..3.9 GHz | 2.4..3.0 GHz | 2.8..3.7 GHz | 3.4..3.7 GHz | 1.3..1.5 GHz |
|
|
|
| AVX2 frequency | | | 3.5..3.9 GHz | 2.1..2.9 GHz | 2.4..3.6 GHz | 3.0..3.6 GHz | 1.1 GHz |
|
|
|
| AVX-512 frequency | | | | 1.3..1.8 GHz | 1.9..3.5 GHz | 2.5..3.5 GHz | 1.1 GHz |
|
|
|
| L1 instruction cache | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB |
|
|
|
| L1 data cache | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB | 32 KB |
|
|
|
| L2 cache (per core) | 256 KB | 256 KB | 256 KB | 1 MB | 1 MB | 1 MB | 1 MB per 2 cores |
|
|
|
| L2 cache (per socket) | 1 MB | 1 MB | 1 MB | 8 MB | 16 MB | 24 MB | 32 MB |
|
|
|
| SMT threads per core | | 2 (disabled) | 2 (disabled) | 2 (disabled) | | 2 (disabled) | 4 (disabled) |
|
|
|
| base frequency | 2.0 GHz | 2.4 GHz | 3.5..3.9 GHz | 2.4..3.0 GHz | 2.8..3.7 GHz | 3.4..3.7 GHz | 1.3..1.5 GHz |
|
|
|
| AVX2 frequency | | | 3.5..3.9 GHz | 2.1..2.9 GHz | 2.4..3.6 GHz | 3.0..3.6 GHz | 1.1 GHz |
|
|
|
| AVX-512 frequency | | | | 1.3..1.8 GHz | 1.9..3.5 GHz | 2.5..3.5 GHz | 1.1 GHz |
|
|
|
| L1 instruction cache | 32 KiB | 32 KiB | 32 KiB | 32 KiB | 32 KiB | 32 KiB | 32 KiB |
|
|
|
| L1 data cache | 32 KiB | 32 KiB | 32 KiB | 32 KiB | 32 KiB | 32 KiB | 32 KiB |
|
|
|
| L2 cache (per core) | 256 KiB | 256 KiB | 256 KiB | 1 MiB | 1 MiB | 1 MiB | 1 MiB per 2 cores |
|
|
|
| L2 cache (per socket) | 1 MiB | 1 MiB | 1 MiB | 8 MiB | 16 MiB | 24 MiB | 32 MiB |
|
|
|
| L3 cache type | inclusive | inclusive | inclusive | exclusive | exclusive | exclusive | |
|
|
|
| L3 cache (per core) | | | 2 MB | 1.375 MB | 1.375 MB | 1.375 MB | |
|
|
|
| L3 cache (per socket) | 4 MB | 8 MB | 8 MB | 11 MB | 22 MB | 33 MB | |
|
|
|
| eDRAM | | | 128 MB | | | | |
|
|
|
| HBM | | | | | | | 16 GB |
|
|
|
|
|
|
On more recent CPUs, the clock frequency depends on the number of cores under load and the
|
|
|
instruction set used. A tabular overview can be found on the WikiChip pages linked above. The
|
|
|
value ranges given in the table ("A..B GHz") state the clock frequencies for single-core load and
|
|
|
full load.
|
|
|
|
|
|
[Simultaneous multithreading](https://en.wikipedia.org/wiki/Simultaneous_multithreading) (also
|
|
|
known as "hyperthreading") is enabled on the mp-media* machines and disabled on all other
|
|
|
machines.
|
|
|
|
|
|
eDRAM refers to the Embedded DRAM in the Skylake-H CPU, a memory side cache shared between CPU
|
|
|
and GPU, cf.
|
|
|
| L3 cache (per core) | | | 2 MiB | 1.375 MiB | 1.375 MiB | 1.375 MiB | |
|
|
|
| L3 cache (per socket) | 4 MiB | 8 MiB | 8 MiB | 11 MiB | 22 MiB | 33 MiB | |
|
|
|
| eDRAM | | | 128 MiB | | | | |
|
|
|
| HBM | | | | | | | 4 × 4 GiB |
|
|
|
|
|
|
- Cache and DRAM capacities are given with binary unit prefixes, i.e. 1 MiB = 2²⁰ B.
|
|
|
- On more recent CPUs, the clock frequency depends on the number of cores under load and the
|
|
|
instruction set used. A tabular overview can be found on the WikiChip pages linked above. The
|
|
|
value ranges given in the table ("A..B GHz") state the clock frequencies for full load (A) and
|
|
|
single-core load (B).
|
|
|
- SMT refers to [Simultaneous multithreading](https://en.wikipedia.org/wiki/Simultaneous_multithreading)
|
|
|
(also known as "hyperthreading"). SMT is currently disabled on all machines.
|
|
|
- eDRAM refers to the embedded DRAM in the Skylake-H CPU, a memory-side cache shared between CPU
|
|
|
and GPU, cf.
|
|
|
[https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#eDRAM_architectural_changes](https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#eDRAM_architectural_changes).
|
|
|
|
|
|
HBM refers to the configurable high-bandwidth memory available on the Knights Landing CPUs, cf.
|
|
|
[https://colfaxresearch.com/knl-mcdram/](https://colfaxresearch.com/knl-mcdram/).
|
|
|
- HBM refers to the configurable high-bandwidth memory available on the Knights Landing CPUs, cf.
|
|
|
[https://colfaxresearch.com/knl-mcdram/](https://colfaxresearch.com/knl-mcdram/).
|
|
|
|
|
|
|
|
|
## Peak system bandwidth, theoretical
|
|
|
|
|
|
| measure \\ CPU | Xeon E5504 | Xeon E5530 | Xeon E3-1585 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
|
|
|
| measure \\ CPU | Xeon E5504 | Xeon E5530 | Xeon E3-1585 v5 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
|
|
|
| ---------------------------- | ----------- | ----------- | -------------------- | -------------------- | -------------------- | ---------------------- | -------------------- |
|
|
|
| L1 read per core | | | $`f`$/GHz × 128 GB/s | $`f`$/GHz × 128 GB/s | $`f`$/GHz × 128 GB/s | $`f`$/GHz × 128 GB/s | $`f`$/GHz × 128 GB/s |
|
|
|
| L1 write per core | | | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s |
|
|
|
| L1 read+write per core | | | $`f`$/GHz × 192 GB/s | $`f`$/GHz × 192 GB/s | $`f`$/GHz × 192 GB/s | $`f`$/GHz × 192 GB/s | $`f`$/GHz × 192 GB/s |
|
|
|
| L2 per core | | | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s |
|
|
|
| L1 read per core | | | $`f`$/GHz × 128 GB/s | $`f`$/GHz × 128 GB/s | $`f`$/GHz × 128 GB/s | $`f`$/GHz × 128 GB/s | $`f`$/GHz × 128 GB/s |
|
|
|
| L1 write per core | | | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s |
|
|
|
| L1 read+write per core | | | $`f`$/GHz × 192 GB/s | $`f`$/GHz × 192 GB/s | $`f`$/GHz × 192 GB/s | $`f`$/GHz × 192 GB/s | $`f`$/GHz × 192 GB/s |
|
|
|
| L2 per core | | | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s | $`f`$/GHz × 64 GB/s |
|
|
|
| eDRAM, single-threaded | | | | | | | |
|
|
|
| eDRAM, concurrent | | | | | | | |
|
|
|
| HBM, single-threaded | | | | | | | |
|
|
|
| HBM, concurrent | | | | | | | ~400 GB/s |
|
|
|
| DRAM, single-threaded | | | 17.1 GB/s | 19.2 GB/s | 21.3 GB/s | 21.3 GB/s | |
|
|
|
| DRAM, concurrent, per socket | | | 34.1 GB/s | 115.2 GB/s | 128 GB/s | 128 GB/s | ~90 GB/s |
|
|
|
| HBM, concurrent | | | | | | | 563 GB/s |
|
|
|
| DRAM, single-threaded | | | 17.1 GB/s | 19.2 GB/s | 21.3 GB/s | 21.3 GB/s | |
|
|
|
| DRAM, concurrent, per socket | | | 34.1 GB/s | 115.2 GB/s | 128 GB/s | 128 GB/s | 102 GB/s |
|
|
|
|
|
|
Note that all bandwidth indications use decimal unit prefixes, i.e. 1 GB = 10⁹ B.
|
|
|
|
|
|
The cache bandwidths scale with the clock frequency $`f`$ which depends on the number of cores
|
|
|
under load and the instruction set used. Computations are based on the number of 64B cache lines
|
|
|
the CPUs can fetch or store in each cycle.
|
|
|
- Bandwidth indications use decimal unit prefixes, i.e. 1 GB = 10⁹ B.
|
|
|
- Cache bandwidths scale with the clock frequency $`f`$ which depends on the number of
|
|
|
cores under load and the instruction set used.
|
|
|
- Cache bandwidths are computed as *core clock frequency* / GHz × *number of cache lines*
|
|
|
*fetched or stored per cycle* × *cache line size* (64 B).
|
|
|
- DRAM bandwidths are computed as *DRAM transfer rate* [T/s] × *bus width* (8 B/T) ×
|
|
|
*number of channels*.
|
|
|
- TODO how is HBM bandwidth computed?
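
The cache and DRAM formulas in the list above can be checked against the table with a few lines of Python. This is only a sketch under stated assumptions: the memory configuration used in the example (DDR4-2666 with 6 channels for the Xeon Gold 6130) is not given on this page, and a 64-bit channel, i.e. 8 B per transfer, is assumed because that value reproduces the DRAM figures in the table. The function names are hypothetical.

```python
CACHE_LINE_BYTES = 64  # cache line size stated above


def cache_bandwidth_gbs(freq_ghz: float, lines_per_cycle: int) -> float:
    """Peak cache bandwidth per core in GB/s (decimal prefix).

    freq_ghz        -- core clock frequency in GHz
    lines_per_cycle -- cache lines fetched or stored per cycle
    """
    return freq_ghz * lines_per_cycle * CACHE_LINE_BYTES


def dram_bandwidth_gbs(transfer_rate_mts: float, channels: int,
                       bytes_per_transfer: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s (decimal prefix).

    transfer_rate_mts  -- DRAM transfer rate in MT/s (e.g. 2666 for DDR4-2666)
    channels           -- number of memory channels
    bytes_per_transfer -- assumed 8 B (64-bit channel)
    """
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    # L1 read: 2 cache lines per cycle corresponds to f/GHz x 128 GB/s
    print(cache_bandwidth_gbs(1.0, 2))
    # Assumed configuration: DDR4-2666, one channel vs. six channels
    print(round(dram_bandwidth_gbs(2666, 1), 1))
    print(round(dram_bandwidth_gbs(2666, 6), 1))
```

With these assumptions the single-channel and six-channel results round to the 21.3 GB/s and 128 GB/s entries for the Xeon Gold 6130.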
|
|
|
|
|
|
|
|
|
## Peak system floating-point performance, theoretical
|
|
|
|
|
|
| instruction set \\ CPU | Xeon E5504 | Xeon E5530 | Xeon E3-1585 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
|
|
|
| instruction set \\ CPU | Xeon E5504 | Xeon E5530 | Xeon E3-1585 v5 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
|
|
|
| ---------------------- | ----------- | ----------- | ----------------------- | ----------------------- | ----------------------- | ----------------------- | ----------------------- |
|
|
|
| SSE2 | | | | | | | |
|
|
|
| AVX2 | | | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s |
|
|
|
| AVX-512 | | | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 32 GFlops/s | $`f`$/GHz × 32 GFlops/s | $`f`$/GHz × 32 GFlops/s |
|
|
|
|
|
|
All figures are per core and refer to double precision computations.
|
|
|
| AVX2 | | | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s |
|
|
|
| AVX-512 | | | | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 32 GFlops/s | $`f`$/GHz × 32 GFlops/s | $`f`$/GHz × 32 GFlops/s |
|
|
|
|
|
|
Numbers are computed as *number of lanes per SIMD vector* × *number of vector units per core* ×
|
|
|
*number of flops* / *cycles per flop*. All Skylake and Knights Landing CPUs have an arithmetic
|
|
|
unit that either serves as two AVX2 pipelines or as one AVX-512 pipeline. Xeon Gold, Xeon
|
|
|
Platinum, and Knights Landing CPUs have an additional arithmetic unit solely for AVX-512
|
|
|
instructions. All Skylake and Knights Landing can compute a fused multiplication and addition in
|
|
|
one cycle.
|
|
|
- All figures are per core and refer to double precision computations.
|
|
|
- Figures are computed as *number of lanes per SIMD vector* × *number of vector*
|
|
|
*units per core* × *number of flops* / *cycles per flop*. All Skylake and Knights
|
|
|
Landing CPUs have an arithmetic unit that either serves as two AVX2 pipelines or
|
|
|
as one AVX-512 pipeline. Xeon Gold, Xeon Platinum, and Knights Landing CPUs have
|
|
|
an additional arithmetic unit solely for AVX-512 instructions. All Skylake and
|
|
|
Knights Landing CPUs can compute a fused multiply-add (counted as 2 flops) in one cycle.
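
As a sanity check, the per-core factors in the table follow from the formula above. The sketch below assumes double precision (4 lanes per 256-bit AVX2 vector, 8 lanes per 512-bit AVX-512 vector) and one fused multiply-add per lane and cycle, counted as 2 flops; the function name is hypothetical.

```python
def peak_gflops_per_core(freq_ghz: float, lanes: int, vector_units: int,
                         flops_per_lane_per_cycle: int = 2) -> float:
    """Theoretical peak in GFlops/s for one core.

    lanes                    -- doubles per SIMD vector (4 for AVX2, 8 for AVX-512)
    vector_units             -- pipelines per core usable for this instruction set
    flops_per_lane_per_cycle -- 2 for a fused multiply-add issued every cycle
    """
    return freq_ghz * lanes * vector_units * flops_per_lane_per_cycle


if __name__ == "__main__":
    # Factors at f = 1 GHz, matching the "f/GHz x N GFlops/s" table entries
    print("AVX2, two pipelines:   ", peak_gflops_per_core(1.0, 4, 2))
    print("AVX-512, one pipeline: ", peak_gflops_per_core(1.0, 8, 1))
    print("AVX-512, two pipelines:", peak_gflops_per_core(1.0, 8, 2))
```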
|
|
|
|
|
|
|
|
|
## Peak system bandwidth, measured
|
|
|
|
|
|
| measure \\ CPU | Xeon E5504 | Xeon E5530 | Xeon E3-1585 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
|
|
|
| measure \\ CPU | Xeon E5504 | Xeon E5530 | Xeon E3-1585 v5 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
|
|
|
| ---------------------------- | ----------- | ----------- | ---------------- | ---------------- | ---------------- | ------------------ | ---------------- |
|
|
|
| L1 read per core | | | | | | | |
|
|
|
| L1 write per core | | | | | | | |
|
... | ... | @@ -123,20 +131,20 @@ one cycle. |
|
|
| eDRAM, single-threaded | | | | | | | |
|
|
|
| eDRAM, concurrent | | | | | | | |
|
|
|
| HBM, single-threaded | | | | | | | |
|
|
|
| HBM, concurrent | | | | | | | |
|
|
|
| HBM, concurrent | | | | | | | ~400 GB/s |
|
|
|
| DRAM, single-threaded | | | | | | | |
|
|
|
| DRAM, concurrent, per socket | | | | | | | |
|
|
|
| DRAM, concurrent, per socket | | | | | | | ~90 GB/s |
|
|
|
|
|
|
Note that all bandwidth indications use decimal unit prefixes, i.e. 1 GB = 10⁹ B.
|
|
|
- Bandwidth indications use decimal unit prefixes, i.e. 1 GB = 10⁹ B.
|
|
|
|
|
|
|
|
|
## Peak system floating-point performance, measured
|
|
|
|
|
|
| instruction set \\ CPU | Xeon E5504 | Xeon E5530 | Xeon E3-1585 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
|
|
|
| instruction set \\ CPU | Xeon E5504 | Xeon E5530 | Xeon E3-1585 v5 | Xeon Silver 4110 | Xeon Gold 6130 | Xeon Platinum 8168 | Xeon Phi 7210 |
|
|
|
| ---------------------- | ----------- | ----------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- |
|
|
|
| SSE2 | | | | | | | |
|
|
|
| AVX2 | | | | | | | |
|
|
|
| AVX-512 | | | | 28.7 GFlops/s | 109.1 GFlops/s | 111.8 GFlops/s | |
|
|
|
| AVX-512 | | | | 28.7 GFlops/s | 109.1 GFlops/s | 111.8 GFlops/s | |
|
|
|
|
|
|
All figures are per core and refer to double precision computations. Measurements were conducted
|
|
|
with a single thread. |
|
|
- All figures are per core and refer to double precision computations.
|
|
|
Measurements were conducted with a single thread. |
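
To put the measured AVX-512 figures in context, they can be compared with the theoretical per-core peak. The sketch below assumes that a single-threaded run clocks at the upper end of the AVX-512 frequency range listed in the CPU architecture overview (1.8 GHz, 3.5 GHz and 3.5 GHz); that mapping is an assumption, not something the measurements on this page state.

```python
# Measured single-thread AVX-512 performance in GFlops/s (from the table above)
MEASURED = {
    "Xeon Silver 4110": 28.7,
    "Xeon Gold 6130": 109.1,
    "Xeon Platinum 8168": 111.8,
}

# Assumed single-core AVX-512 clock in GHz (upper end of the listed range)
SINGLE_CORE_AVX512_GHZ = {
    "Xeon Silver 4110": 1.8,
    "Xeon Gold 6130": 3.5,
    "Xeon Platinum 8168": 3.5,
}

# Double-precision flops per cycle and core (theoretical table)
FLOPS_PER_CYCLE = {
    "Xeon Silver 4110": 16,
    "Xeon Gold 6130": 32,
    "Xeon Platinum 8168": 32,
}


def efficiency(cpu: str) -> float:
    """Ratio of measured to theoretical peak performance for one core."""
    peak = SINGLE_CORE_AVX512_GHZ[cpu] * FLOPS_PER_CYCLE[cpu]
    return MEASURED[cpu] / peak


if __name__ == "__main__":
    for cpu in MEASURED:
        print(f"{cpu}: {efficiency(cpu):.1%} of theoretical peak")
```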