Samsung Demos In-Memory Processing for HBM2, GDDR6, DDR4 and LPDDR5X
If Samsung is successful, memory chips in future desktops, laptops, or GPUs might think for themselves. At Hot Chips 33, Samsung announced that it would expand its in-memory processing technology to DDR4, GDDR6, and LPDDR5X modules in addition to its HBM2 chips. Earlier this year, Samsung announced its HBM2 memory with a built-in processor that can compute up to 1.2 TFLOPS for AI workloads, allowing the memory itself to perform operations typically reserved for CPUs, GPUs, ASICs, or FPGAs. Today marks further progress with that chip, and Samsung also has more powerful variants on the roadmap with its next-gen HBM3. Given the rise of AI-based rendering techniques like upscaling, we might even see this technology make its way into gaming GPUs.
Today’s announcement reveals the official branding of the Aquabolt-XL HBM2 memory, along with the unveiling of AXDIMM DDR4 and LPDDR5 memory that also comes with built-in computing power. We’ve previously covered the details of the first HBM-PIM (Processing-In-Memory) chips. Simply put, the chips have an AI engine embedded inside each DRAM bank. This allows the memory itself to process data, so the system doesn’t have to move data between memory and the processor, saving both time and energy. Of course, the technology comes with a capacity trade-off in current memory types, but Samsung says that HBM3 and future memories will have the same capacities as regular memory chips.
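To make the data-movement argument concrete, here is a minimal Python sketch, not Samsung’s implementation. It assumes a hypothetical memory with a configurable number of banks, FP16 operands (2 bytes each), and a simple sum as the workload, and it counts how many bytes would cross the memory bus in each case. The names `pim_sum`, `host_side_bytes`, and `pim_bytes` are illustrative only.

```python
# Toy model of why processing-in-memory saves data movement.
# Hypothetical assumptions (not Samsung's design): operands are FP16
# (2 bytes each), the workload is a sum, and data is spread across banks.

BYTES_PER_FP16 = 2


def host_side_bytes(num_elements: int) -> int:
    """Conventional path: every operand crosses the memory bus to the processor."""
    return num_elements * BYTES_PER_FP16


def pim_bytes(num_banks: int) -> int:
    """PIM path: each bank returns only one FP16-sized partial result."""
    return num_banks * BYTES_PER_FP16


def pim_sum(data: list[float], num_banks: int) -> tuple[float, int]:
    """Reduce each bank's slice 'inside' the bank, then combine partials on the host.

    Python computes in double precision; only the byte counts model FP16.
    """
    # Round-robin placement of elements into banks (hypothetical layout).
    banks = [data[i::num_banks] for i in range(num_banks)]
    partials = [sum(bank) for bank in banks]  # per-bank compute
    return sum(partials), pim_bytes(num_banks)


if __name__ == "__main__":
    values = [1.0] * 1_000_000
    total, moved = pim_sum(values, num_banks=16)
    print(total, moved, host_side_bytes(len(values)))
```

With 16 banks and a million FP16 values, the conventional path moves about 2 MB while the PIM path returns only 32 bytes of partial sums. Real PIM hardware still has to issue commands and manage data layout, which this toy model ignores.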
Samsung’s Aquabolt-XL HBM-PIM slots directly into the company’s product stack and works with standard JEDEC-compliant HBM2 memory controllers, making it a drop-in replacement for standard HBM2 memory. Samsung recently demonstrated this by swapping its HBM-PIM in for the standard HBM2 memory on a Xilinx Alveo FPGA without any board modifications, resulting in a 2.5x system performance gain with a 62% reduction in power consumption.
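The headline figures are easier to compare when expressed as fractions of the baseline. The short Python sketch below does that arithmetic for the Alveo demo. The energy estimate rests on an assumption the article doesn’t confirm: that the 62% figure is a reduction in average power over the run, so that energy per task is power multiplied by runtime. If Samsung’s figure already describes total energy, the last step does not apply. The function names are illustrative.

```python
# Express the Alveo demo claims as fractions of the baseline.
# ASSUMPTION: the 62% figure is average power, so energy = power x time.


def time_fraction(speedup: float) -> float:
    """Runtime relative to baseline for a given speedup (2.5x -> 0.4)."""
    return 1.0 / speedup


def remaining_fraction(percent_reduction: float) -> float:
    """Fraction left after a percentage reduction (62% -> 0.38)."""
    return 1.0 - percent_reduction / 100.0


def energy_fraction(speedup: float, power_reduction_pct: float) -> float:
    """Estimated energy per task relative to baseline, under the power x time assumption."""
    return time_fraction(speedup) * remaining_fraction(power_reduction_pct)


if __name__ == "__main__":
    print(
        f"time: {time_fraction(2.5):.2f}, "
        f"power: {remaining_fraction(62.0):.2f}, "
        f"energy: {energy_fraction(2.5, 62.0):.3f}"
    )
```

Under that assumption, a 2.5x speedup leaves 40% of the runtime and 38% of the power, or roughly 15% of the baseline energy per task.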
While Samsung’s PIM technology already works with any standard memory controller, better support from processor vendors will unlock more performance in certain scenarios (for example, by not requiring as many threads to fully utilize the processing elements). Samsung tells us that it is testing HBM2-PIM with an unnamed processor vendor for use in its future products. That could be any number of manufacturers on either the x86 or Arm side of the fence: Intel’s Sapphire Rapids, AMD’s Genoa, and Arm’s Neoverse platforms all support HBM memory (among others).
Naturally, Samsung’s PIM technology is a good fit for data centers, largely because it suits memory-bound AI workloads that don’t require heavy computation, like speech recognition. Nonetheless, the company is also eyeing more mainstream use cases. To that end, it also demonstrated its AXDIMM, a new prototype accelerator DIMM that performs processing in the buffer chip. Like the HBM2 chip, it can perform FP16 processing using standard TensorFlow and Python code, although Samsung is working to extend support to other types of software. Samsung says this type of DIMM can drop into any DDR4-equipped server with LRDIMMs or UDIMMs, and we imagine DDR5 support will follow in due course.
The company says its tests (performed on a Facebook AI workload) found a 1.8-fold increase in performance, a 42.6% reduction in power consumption, and a 70% reduction in tail latency with a two-rank kit. That is very impressive, especially considering that Samsung plugged the DIMMs into a standard server without modifications. Samsung is already testing this in customer servers, so we can expect the technology to reach the market in the near future.
Samsung’s PIM technology is transferable to any of its memory processes or products, so it has even begun experimenting with PIM in LPDDR5 chips, which means the technology could come to laptops, tablets, and even phones in the future. Samsung is still in the simulation phase with this technology. Still, its testing of a simulated LPDDR5X-6400 chip claims a 2.3-fold performance improvement in speech recognition workloads, a 1.8-fold improvement in transformer-based translation, and a 2.4-fold increase in GPT-2 text generation. These performance improvements come paired with power reductions of 3.85X, 2.17X, and 4.35X, respectively.
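“N-fold” reductions are easy to misread: a 3.85X power reduction means power falls to 1/3.85 of the baseline, not that it drops by 385%. This Python sketch converts the simulated LPDDR5X-6400 power figures quoted above into percentage cuts; the function and dictionary names are illustrative.

```python
# Convert "N-fold" power reductions into percentage cuts.
# Factors are Samsung's simulated LPDDR5X-6400 figures as quoted in the article.

POWER_REDUCTION_FACTORS = {
    "speech recognition": 3.85,
    "transformer-based translation": 2.17,
    "GPT-2 text generation": 4.35,
}


def fold_to_percent_reduction(factor: float) -> float:
    """An N-fold reduction leaves 1/N of the original (4x -> a 75% cut)."""
    return (1.0 - 1.0 / factor) * 100.0


if __name__ == "__main__":
    for workload, factor in POWER_REDUCTION_FACTORS.items():
        print(f"{workload}: power down {fold_to_percent_reduction(factor):.1f}%")
```

That works out to roughly 74%, 54%, and 77% less power for the three workloads.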
This technology is evolving rapidly and works with standard memory controllers and existing infrastructure, but it has yet to be certified by the JEDEC standards committee, a key hurdle Samsung must overcome before it sees widespread adoption. However, the company hopes that the initial PIM specification will be accepted into the HBM3 standard later this year.
Speaking of HBM3, Samsung says it will move from FP16 SIMD processing in HBM2 to FP64 in HBM3, which means the chips will have extended capabilities. FP16 and FP32 will be reserved for data center uses, while INT8 and INT16 will serve the LPDDR5, DDR5 and GDDR6 segments.
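The split between FP64, FP32, FP16, INT16, and INT8 is a trade between precision and parallelism: the narrower the data type, the more elements a fixed-width SIMD unit can process at once. The article doesn’t give the datapath width of Samsung’s processing units, so this Python sketch assumes a hypothetical 256-bit datapath purely for illustration; `lanes` is an illustrative name.

```python
# Elements processed per SIMD instruction, by data type.
# ASSUMPTION: a hypothetical 256-bit datapath; Samsung's actual width is not stated.

BITS_PER_TYPE = {"FP64": 64, "FP32": 32, "FP16": 16, "INT16": 16, "INT8": 8}


def lanes(datapath_bits: int, dtype: str) -> int:
    """Number of elements one SIMD instruction handles for the given type."""
    width = BITS_PER_TYPE[dtype]
    if datapath_bits % width:
        raise ValueError(f"{datapath_bits}-bit datapath not divisible by {width}-bit {dtype}")
    return datapath_bits // width


if __name__ == "__main__":
    for dtype in BITS_PER_TYPE:
        print(dtype, lanes(256, dtype))
```

Under that assumption, an FP64 operation covers 4 elements per instruction while INT8 covers 32, which illustrates why narrower formats can deliver more operations per cycle when full precision isn’t needed.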
In addition, you lose half the capacity of an 8GB chip if you want the computing power of HBM2-PIM, but there will be no such capacity trade-off in the future: the chips will offer full standard capacity regardless of their compute capabilities.
Samsung will also bring this capability to other types of memory, such as GDDR6, and expand the possible applications. CXL support could also be on the horizon. Samsung says its Aquabolt-XL HBM2 chips are available for purchase and integration today, with its other products already in development.
Who knows, with the rise of AI-based upscaling and rendering techniques, this technology could be even more of a game-changer for enthusiasts than it appears on the surface. Going forward, it’s plausible that GPU memory could handle some of the compute workloads to boost GPU performance and reduce power consumption.