Intel Xeon Phi and High Bandwidth Memory
Edited by Pthomadakis
Revision as of 15:55, 1 June 2018
High Bandwidth Memory
High Bandwidth Memory (HBM) is a high-bandwidth RAM interface for 3D-stacked DRAM, developed by AMD. The Hybrid Memory Cube (HMC) interface, developed by Micron Technology, is a similar technology but is not compatible with HBM. HBM is designed to provide higher bandwidth than DDR4 (for CPUs) and GDDR5 (for GPUs) while requiring less power and board space. Furthermore, unlike conventional DRAM and NAND technologies, it is compatible with on-chip integration, which yields an additional performance benefit.
Technology
HBM consists of a number of stacked DRAM dies, each providing a set of independent channels that the processor can access. Each channel interface has a 128-bit data bus operating at DDR data rates. The DRAM dies connect to the CPU or GPU through an ultra-fast interconnect called the “interposer”. Several HBM stacks are plugged into the interposer alongside a CPU or GPU, and the assembled module connects to a circuit board. Although the HBM stacks are not physically integrated with the CPU or GPU, the interposer connects them so closely and quickly that HBM’s characteristics are nearly indistinguishable from those of on-chip integrated RAM.
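The channel description above implies a simple peak-bandwidth calculation: channels × bus width × per-pin data rate. The sketch below assumes first-generation HBM figures (8 channels per stack, 500 MHz clock, hence 1 Gb/s per pin with DDR signalling); the function name `hbm_stack_bandwidth_gbs` is illustrative, and real parts differ in channel count and clock.

```python
def hbm_stack_bandwidth_gbs(channels: int, bus_width_bits: int, clock_ghz: float) -> float:
    """Peak bandwidth of one HBM stack in GB/s.

    DDR signalling transfers data on both clock edges, so each data pin
    moves 2 * clock_ghz gigabits per second.
    """
    gbits_per_pin = 2 * clock_ghz
    total_gbits = channels * bus_width_bits * gbits_per_pin
    return total_gbits / 8  # convert gigabits to gigabytes


# Assumed first-generation HBM figures: 8 channels of 128 bits, 500 MHz clock.
per_stack = hbm_stack_bandwidth_gbs(channels=8, bus_width_bits=128, clock_ghz=0.5)
print(per_stack)      # 128.0
# Hypothetical module with four stacks on one interposer.
print(4 * per_stack)  # 512.0
```

Because the stacks sit side by side on the interposer, aggregate bandwidth in this model scales linearly with the number of stacks.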
Resources
- AMD High Bandwidth Memory
- HBM (High Bandwidth Memory) DRAM Technology and Architecture
- Nvidia on HBM
Intel Xeon Phi
Xeon Phi is a series of x86 manycore processors designed and made entirely by Intel. They are intended for use in supercomputers, servers and high-end workstations. Their architecture allows the use of standard programming languages and APIs such as OpenMP. Xeon Phis were originally based on a GPU-like architecture and were used as external co-processors that required a host processor. The main difference from AMD and Nvidia GPUs is that Xeon Phis are x86-compatible, so moving code from a standard x86 processor to a Phi requires less effort.
Knights Landing
The second Xeon Phi generation, codenamed "Knights Landing" (KNL), is the first Phi that can self-boot: it can be used as a standalone x86 processor running a standard operating system rather than as a co-processor. Current versions have up to 72 cores with 4 threads per core, for a total of 288 threads per processor.
Architecture
The 72 cores run at ~1.3 GHz and are grouped into tiles of 2 cores each. Each core has 2 VPUs (Vector Processing Units, 512 bits wide), and the 2 cores of a tile share 1 MB of L2 cache, for a total of 36 MB of L2 across the 36 tiles. The tiles are connected by a mesh interconnect. KNL incorporates Intel's version of high-bandwidth memory, named MCDRAM (Multi-Channel DRAM), which promises more than 400 GB/s of bandwidth when all of it is accessed in parallel. Because of the cost of the memory and the need to keep it small and in-package, its capacity is limited to 16 GB. KNL also has DDR4 memory controllers; DDR4 memory is a completely separate memory component.
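The architecture figures can be sanity-checked with a few lines of arithmetic. The 36 MB L2 total implies 1 MB per two-core tile; the peak-FLOPS estimate additionally assumes that each 512-bit VPU completes one double-precision fused multiply-add per lane per cycle at the nominal 1.3 GHz, so it is a theoretical upper bound rather than a measured figure. All names are hypothetical.

```python
# Figures from the text, except where marked as assumptions.
CORES = 72
THREADS_PER_CORE = 4
CORES_PER_TILE = 2
L2_MB_PER_TILE = 1   # assumption: 1 MB shared by the two cores of a tile
VPUS_PER_CORE = 2
VECTOR_BITS = 512
CLOCK_GHZ = 1.3


def knl_summary():
    """Return (tiles, hardware threads, total L2 in MB, peak DP GFLOPS)."""
    tiles = CORES // CORES_PER_TILE
    threads = CORES * THREADS_PER_CORE
    l2_total_mb = tiles * L2_MB_PER_TILE
    doubles_per_vector = VECTOR_BITS // 64
    # Assumption: one FMA (2 FLOPs) per vector lane per VPU per cycle.
    flops_per_cycle = CORES * VPUS_PER_CORE * doubles_per_vector * 2
    peak_gflops = flops_per_cycle * CLOCK_GHZ
    return tiles, threads, l2_total_mb, peak_gflops


tiles, threads, l2_mb, gflops = knl_summary()
print(tiles, threads, l2_mb)  # 36 288 36
print(round(gflops))          # 2995
```

The result, roughly 3 TFLOPS of double-precision peak, is only as good as the assumed clock; sustained vector clocks under load may be lower.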
MCDRAM
MCDRAM is a high-bandwidth (~4x more than DDR4), low-capacity (up to 16 GB) memory packaged with the Knights Landing silicon. MCDRAM can be configured as a third-level cache (memory-side cache), as a distinct NUMA node (allocatable memory), or as a mix of the two. Because the system can be booted in several memory modes, it is challenging from a software perspective to determine which mode best suits a given application. At the same time, the application must use the available MCDRAM bandwidth efficiently so as not to leave performance on the table.
HBM Modes
HBM (MCDRAM on KNL) can operate in three modes: cache, flat and hybrid. The mode is selected in the BIOS, so switching between modes requires a reboot.
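The three modes differ in how the 16 GB of MCDRAM is divided between memory-side cache and addressable memory (which, in flat mode, appears as a separate NUMA node). The model below is a hypothetical illustration, not a BIOS or OS interface; the hybrid-mode cache fractions of 25% and 50% are an assumption about the options a KNL BIOS typically offers.

```python
from typing import Tuple

MCDRAM_GB = 16  # capacity stated in the text


def mcdram_split(mode: str, cache_fraction: float = 0.0) -> Tuple[float, float]:
    """Return (GB used as memory-side cache, GB exposed as addressable memory)."""
    if mode == "cache":
        return float(MCDRAM_GB), 0.0
    if mode == "flat":
        return 0.0, float(MCDRAM_GB)
    if mode == "hybrid":
        # Assumption: hybrid mode offers 25% or 50% of MCDRAM as cache.
        if cache_fraction not in (0.25, 0.5):
            raise ValueError("hybrid mode expects cache_fraction of 0.25 or 0.5")
        cache_gb = MCDRAM_GB * cache_fraction
        return cache_gb, MCDRAM_GB - cache_gb
    raise ValueError(f"unknown mode: {mode!r}")


for mode, fraction in [("cache", 0.0), ("flat", 0.0), ("hybrid", 0.25), ("hybrid", 0.5)]:
    print(mode, fraction, mcdram_split(mode, fraction))
```

Since the split is fixed at boot, an application has to be tuned to whichever partition the node was started with.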
Cache Mode
- Advantages
  - No software modifications required
  - Bandwidth benefit (over DDR)
- Disadvantages
  - Higher latency for DDR accesses: on a cache miss, the request goes through MCDRAM first before reaching DDR
  - Bandwidth on cache misses is limited by DDR bandwidth
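The cache-mode trade-offs can be made concrete with a deliberately simplified model in which hits are served at MCDRAM bandwidth, misses at DDR bandwidth, and the two never overlap, so the time per byte is a hit-rate-weighted sum. The function name and the DDR4 figure (6 channels of DDR4-2400 at 8 bytes per transfer, about 115 GB/s peak) are assumptions; the model also ignores the extra MCDRAM traffic generated by cache fills, so real numbers will be lower.

```python
def cache_mode_bandwidth(hit_rate: float, mcdram_gbs: float, ddr_gbs: float) -> float:
    """Simplified effective bandwidth (GB/s) of MCDRAM in cache mode."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    # Seconds needed to move one GB: hits at MCDRAM speed, misses at DDR speed.
    seconds_per_gb = hit_rate / mcdram_gbs + (1.0 - hit_rate) / ddr_gbs
    return 1.0 / seconds_per_gb


MCDRAM_GBS = 400.0  # from the text
DDR_GBS = 115.2     # assumption: 6 x DDR4-2400 channels x 8 bytes
for h in (1.0, 0.9, 0.5, 0.0):
    bw = cache_mode_bandwidth(h, MCDRAM_GBS, DDR_GBS)
    print(f"hit rate {h:.0%}: {bw:.1f} GB/s")
```

With these numbers a 90% hit rate already lowers the effective bandwidth to roughly 320 GB/s, and as the hit rate falls the result approaches DDR bandwidth, which is the second disadvantage listed above.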