AMD unveiled its next-gen Instinct MI300 accelerator at CES 2023, and we were fortunate enough to get some hands-on time and snap a few close-up pictures of the mammoth chip.
Make no mistake, the Instinct MI300 is a game-changing design – the data center APU blends a total of 13 chiplets, many of them 3D-stacked, to create a chip with twenty-four Zen 4 CPU cores fused with a CDNA 3 graphics engine and eight stacks of HBM3. Overall the chip weighs in at 146 billion transistors, making it the largest chip AMD has pushed into production.
The MI300's 146 billion total transistors easily outweigh Intel's 100 billion transistor Ponte Vecchio, and they're tied together with 128GB of HBM3 memory. The delidded chip is incredibly hard to photograph given its mirror-like surface, but you can clearly see the eight stacks of HBM3 that flank the center dies. Small slivers of structural silicon are placed between those HBM stacks to ensure stability when a cooling solution is torqued down atop the package.
The computing portion of the chip consists of nine 5nm chiplets that are either CPU or GPU cores, but AMD hasn't shared details on how many of each are employed. Zen 4 cores are typically deployed as eight-core dies, so we could be looking at three CPU dies and six GPU dies. The GPU dies use AMD's CDNA 3 architecture, the third revision of AMD's data center-specific graphics architecture. AMD hasn't specified the CU count.
Those nine dies are 3D-stacked atop four 6nm base dies that aren't merely passive interposers – we're told these dies are active and handle I/O and various other functions. AMD representatives showed us another MI300 sample that had its top dies removed with a belt sander to reveal the four active interposer dies beneath, and there we could clearly see the structures that enable communication not only between the I/O tiles, but also the memory controllers that interface with the HBM3 stacks. We weren't allowed to photograph this second sample.
The 3D design allows for incredible data throughput between the CPU, GPU and memory dies while also allowing the CPU and GPU to work on the same data in memory simultaneously (zero-copy), which saves power, boosts performance, and simplifies programming. It will be interesting to see if this device can be used without standard DRAM, as we see with Intel's Xeon Max CPUs that also employ on-package HBM.
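The zero-copy idea can be sketched with a loose host-side analogy – this is not AMD's API, just two ordinary Python buffers standing in for CPU and GPU memory to contrast an explicit-copy model with a shared-allocation model:

```python
# Copy-based model: a discrete CPU and GPU each hold their own copy of the data.
host = bytearray([1, 2, 3, 4])
device = bytes(host)            # explicit transfer, e.g. over PCIe
host[0] = 99
# device[0] is still 1: the device copy is now stale and must be re-synchronized.

# Zero-copy model: both processors address the same physical allocation.
shared = bytearray([1, 2, 3, 4])
gpu_view = memoryview(shared)   # another view of the same buffer; no data moves
shared[0] = 99
# gpu_view[0] is now 99: both sides observe the update with no transfer at all.
```

The elimination of those synchronizing transfers is where the claimed power and performance savings come from, and the programmer no longer has to manage two diverging copies.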
AMD's representatives were coy with details, so it isn't clear whether AMD uses a standard TSV approach to fuse the upper and lower dies together, or a more advanced hybrid bonding technique. We're told AMD will share more details about the packaging soon.
AMD claims the MI300 delivers eight times the AI performance of the Instinct MI250 and five times the performance per watt (measured with FP8 with sparsity). AMD also says it can reduce the training time for ultra-large AI models, like ChatGPT and DALL-E, from months to weeks, saving millions of dollars in electricity.
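AMD hasn't published the details of its FP8 support, but assuming the OCP-style E4M3 encoding commonly used for AI workloads (1 sign bit, 4 exponent bits, 3 mantissa bits), a minimal decoder shows how little precision these 8-bit values carry – which is exactly why FP8 math is so much cheaper than FP16 or FP32. The `decode_fp8_e4m3` helper is purely illustrative:

```python
def decode_fp8_e4m3(byte: int) -> float:
    """Decode one byte as an FP8 E4M3 value (1 sign, 4 exponent, 3 mantissa bits).

    Illustrative only: follows the common OCP E4M3 convention, where the
    exponent bias is 7 and only the all-ones pattern encodes NaN.
    """
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0:                         # subnormal: no implicit leading 1
        return sign * man * 2.0 ** -9
    if exp == 0xF and man == 0x7:        # E4M3's single NaN encoding
        return float("nan")
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

print(decode_fp8_e4m3(0x38))  # 1.0
print(decode_fp8_e4m3(0x7E))  # 448.0, the largest finite E4M3 value
```

With only 256 representable values topping out at 448, FP8 tensors halve the memory traffic of FP16, and structured sparsity can skip roughly half the multiplies on top of that – the combination behind headline figures like the 8x claim above.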
The current-gen Instinct MI250 powers the Frontier supercomputer, the world's first exascale machine, and the Instinct MI300 will power the forthcoming two-exaflop El Capitan supercomputer. AMD tells us these halo MI300 chips will be expensive and comparatively rare; they aren't a high-volume product, so they won't see wide deployment like the EPYC Genoa data center CPUs. However, the tech will filter down to multiple variants in different form factors.
This chip will also vie with Nvidia's Grace Hopper Superchip, which combines a Hopper GPU and a Grace CPU on the same board. Those chips are expected to arrive this year. The Neoverse-based Grace CPUs support the Arm v9 instruction set, and systems come with two chips fused together with Nvidia's newly branded NVLink-C2C interconnect tech. AMD's approach is designed to offer superior throughput and energy efficiency, as combining these devices into a single package typically allows higher throughput between the units than connecting two separate devices.
The MI300 will also compete with Intel's Falcon Shores, a chip that will feature a varying number of compute tiles with x86 cores, GPU cores, and memory in a dizzying number of possible configurations, but those aren't slated to arrive until 2024.
Here we can see the bottom of the MI300 package with the contact pads used for an LGA mounting system. AMD didn't share details about the socketing mechanism, but we'll be sure to learn more soon – the chip is currently in AMD's labs, and the company expects to ship the Instinct MI300 in the second half of 2023. The El Capitan supercomputer will be the world's fastest when it's deployed in 2023, and it's currently on schedule.