... | ... | @@ -74,18 +74,18 @@ HBM refers to the configurable high-bandwidth memory available on the Knights La |
## Peak system bandwidth, theoretical

| measure                      | Xeon E5504  | Xeon E5530  | Xeon E3-1585         | Xeon Silver 4110     | Xeon Gold 6130       | Xeon Platinum 8168     | Xeon Phi 7210        |
| ---------------------------- | ----------- | ----------- | -------------------- | -------------------- | -------------------- | ---------------------- | -------------------- |
| L1 read per core             |             |             | $`f`$/GHz × 128 GB/s | $`f`$/GHz × 128 GB/s | $`f`$/GHz × 128 GB/s | $`f`$/GHz × 128 GB/s   | $`f`$/GHz × 128 GB/s |
| L1 write per core            |             |             | $`f`$/GHz × 64 GB/s  | $`f`$/GHz × 64 GB/s  | $`f`$/GHz × 64 GB/s  | $`f`$/GHz × 64 GB/s    | $`f`$/GHz × 64 GB/s  |
| L1 read+write per core       |             |             | $`f`$/GHz × 192 GB/s | $`f`$/GHz × 192 GB/s | $`f`$/GHz × 192 GB/s | $`f`$/GHz × 192 GB/s   | $`f`$/GHz × 192 GB/s |
| L2 per core                  |             |             | $`f`$/GHz × 64 GB/s  | $`f`$/GHz × 64 GB/s  | $`f`$/GHz × 64 GB/s  | $`f`$/GHz × 64 GB/s    | $`f`$/GHz × 64 GB/s  |
| eDRAM, single-threaded       |             |             |                      |                      |                      |                        |                      |
| eDRAM, concurrent            |             |             |                      |                      |                      |                        |                      |
| HBM, single-threaded         |             |             |                      |                      |                      |                        |                      |
| HBM, concurrent              |             |             |                      |                      |                      |                        | ~400 GB/s            |
| DRAM, single-threaded        |             |             | 17.1 GB/s            | 19.2 GB/s            | 21.3 GB/s            | 21.3 GB/s              |                      |
| DRAM, concurrent, per socket |             |             | 34.1 GB/s            | 115.2 GB/s           | 128 GB/s             | 128 GB/s               | ~90 GB/s             |

Note that all bandwidth indications use decimal unit prefixes, i.e. 1 GB = 10⁹ B.
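
The per-core cache entries in the table can be evaluated for a concrete clock frequency with a few lines of Python. This is only a sketch: it assumes that an entry such as $`f`$/GHz × 128 GB/s corresponds to 128 bytes transferred per cycle, the function names are made up for illustration, and the 2.0 GHz clock is an arbitrary example value rather than the specification of any CPU listed here.

```python
# Sketch: evaluate "f/GHz x N GB/s" table entries for a given clock frequency.
# Assumption: the factor N is the number of bytes moved per core per cycle.

def cache_bandwidth_gb_per_s(frequency_ghz: float, bytes_per_cycle: int) -> float:
    """Peak per-core bandwidth in decimal GB/s (1 GB = 10**9 B)."""
    # f [GHz] is 10**9 cycles/s, so the two factors of 10**9 cancel.
    return frequency_ghz * bytes_per_cycle

def gb_to_gib(gigabytes: float) -> float:
    """Convert decimal gigabytes (10**9 B) to binary gibibytes (2**30 B)."""
    return gigabytes * 10**9 / 2**30

# Factors taken from the table rows (bytes per cycle under the assumption above).
BYTES_PER_CYCLE = {"L1 read": 128, "L1 write": 64, "L1 read+write": 192, "L2": 64}

example_frequency_ghz = 2.0  # illustrative value only
for measure, factor in BYTES_PER_CYCLE.items():
    bandwidth = cache_bandwidth_gb_per_s(example_frequency_ghz, factor)
    print(f"{measure}: {bandwidth:.1f} GB/s = {gb_to_gib(bandwidth):.1f} GiB/s")
```

The second helper illustrates why the choice of prefix matters: the same figure is roughly 7 % smaller when expressed in binary units.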
... | ... | @@ -96,19 +96,20 @@ the CPUs can fetch or store in each cycle. |
## Peak system floating-point performance, theoretical

| instruction set | Xeon E5504  | Xeon E5530  | Xeon E3-1585            | Xeon Silver 4110        | Xeon Gold 6130          | Xeon Platinum 8168      | Xeon Phi 7210           |
| --------------- | ----------- | ----------- | ----------------------- | ----------------------- | ----------------------- | ----------------------- | ----------------------- |
| SSE2            |             |             |                         |                         |                         |                         |                         |
| AVX2            |             |             | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s |
| AVX-512         |             |             | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 16 GFlops/s | $`f`$/GHz × 32 GFlops/s | $`f`$/GHz × 32 GFlops/s | $`f`$/GHz × 32 GFlops/s |

All figures are per core and refer to double-precision computations.

Numbers are computed as *number of lanes per SIMD vector* × *number of vector units per core* ×
*number of flops* / *cycles per flop*. All Skylake and Knights Landing CPUs have an arithmetic
unit that serves either as two AVX2 pipelines or as one AVX-512 pipeline. Xeon Gold, Xeon
Platinum, and Knights Landing CPUs have an additional arithmetic unit solely for AVX-512
instructions. All Skylake and Knights Landing CPUs can compute a fused multiplication and
addition, which counts as two flops, in one cycle. For AVX2 in double precision, for example,
this gives 4 lanes × 2 pipelines × 2 flops per cycle = 16 flops per cycle.
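
The rule above can be written out as a short Python sketch that reproduces the factors 16 and 32 in the table. The function and parameter names are hypothetical, the lane counts assume 64-bit doubles in 256-bit (AVX2) and 512-bit (AVX-512) vectors, and the 2.0 GHz clock is an arbitrary example value, not a specification of any CPU listed here.

```python
# Sketch: lanes x vector units x flops per instruction / cycles per instruction.
# Assumption: a fused multiply-add counts as 2 flops and takes 1 cycle.

def lanes_per_vector(vector_bits: int, element_bits: int = 64) -> int:
    """Number of SIMD lanes, e.g. 256-bit vector / 64-bit double = 4 lanes."""
    return vector_bits // element_bits

def peak_flops_per_cycle(lanes: int, vector_units: int,
                         flops_per_instruction: int = 2,
                         cycles_per_instruction: int = 1) -> float:
    """Theoretical peak flops per core and cycle."""
    return lanes * vector_units * flops_per_instruction / cycles_per_instruction

def peak_gflops(frequency_ghz: float, flops_per_cycle: float) -> float:
    """Theoretical peak in GFlops/s per core at the given clock frequency."""
    return frequency_ghz * flops_per_cycle

# One shared unit acts as two AVX2 pipelines or as one AVX-512 pipeline.
avx2 = peak_flops_per_cycle(lanes_per_vector(256), vector_units=2)
avx512_shared_unit_only = peak_flops_per_cycle(lanes_per_vector(512), vector_units=1)
# CPUs with the additional AVX-512-only unit have two AVX-512 pipelines.
avx512_with_extra_unit = peak_flops_per_cycle(lanes_per_vector(512), vector_units=2)

example_frequency_ghz = 2.0  # illustrative value only
for name, per_cycle in [("AVX2", avx2),
                        ("AVX-512, one unit", avx512_shared_unit_only),
                        ("AVX-512, two units", avx512_with_extra_unit)]:
    gflops = peak_gflops(example_frequency_ghz, per_cycle)
    print(f"{name}: {per_cycle:.0f} flops/cycle, {gflops:.0f} GFlops/s")
```

Keeping the flops per instruction and cycles per instruction as parameters makes it easy to model instructions other than a fused multiply-add, for instance a plain addition with one flop per instruction.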
## Peak system bandwidth, measured
... | ... | |