The Apple M1 family of chips, built on the ARM instruction set, is growing with a monstrous new member, the M1 Ultra, whose design follows a well-known approach: it combines two M1 Max chips in a single package, connected by a silicon bridge that the manufacturer calls UltraFusion. The result is an ARM processor with 20 cores, 16 of them performance cores and 4 low-power cores. The M1 Ultra will make its debut in the Mac Studio, a workstation priced at €3,679 excluding tax that was also announced yesterday. According to Johny Srouji, Apple’s senior vice president of hardware technologies, combining two M1 Max chips in a single package is a way around the physical limits on scaling the M1 silicon any further. “This Apple-created SoC delivers blazing performance,” Srouji said. It could also compete with high-end processors in the PC market.

M1 Ultra, nearly 114 billion transistors

As Mr. Srouji pointed out, the M1 Ultra has 20 cores in total, double the M1 Max’s CPU core count (and potentially double its performance), on paper at least. Apple did not provide details on the manufacturing process used for the M1 Ultra, but the manufacturer is known to have used a 5nm process for its previous M1 chips. The M1 Ultra is, in all likelihood, huge. The M1 Max already had 57 billion transistors, so the M1 Ultra logically has double that, or about 114 billion. The illustration below, provided by Apple, gives an idea of its size:

Sure enough, Apple’s M1 Ultra is nothing short of massive. The two M1 Max chips and the UltraFusion bridge that connects them are clearly visible. (Credit: Apple)

The two processors are linked by the UltraFusion interconnect architecture, which provides 2.5 TB/s of total bandwidth between the two chips. Srouji says this is four times the bandwidth of the leading multi-chip interconnect technology. Memory bandwidth also increases, to 800 GB/s, with support for up to 128 GB of unified memory.

Connecting multiple silicon dies is not particularly new. In fact, AMD announced its Ryzen Threadripper Pro 5000 chip on the same day. The Santa Clara vendor engineered Threadripper specifically to offer a high CPU core count by combining multiple multi-core silicon dies. For example, the 24-core Threadripper Pro 3945WX includes three 8-core CCDs (Core Chiplet Dies), which connect the individual cores together. Note that previous Threadrippers built their CCDs around what AMD called a CCX, or Core Complex, and linked the chips with AMD’s Infinity Fabric. Apple is taking a similar approach here. It should also be noted that several companies, including Intel, ARM, Advanced Semiconductor Engineering (ASE), AMD, Qualcomm, Google Cloud, Meta, Microsoft, Samsung and TSMC, are working together on a standard called Universal Chiplet Interconnect Express, or UCIe, to speed up exchanges between the different components of a chip.

Combining the two M1 Max dies gives the M1 Ultra chip 20 CPU cores. (Credit: Apple)

Plenty of graphics cores in this doubled-up chip, but an external AMD or Nvidia card will always do better. (Credit: Apple)

As for CPU performance, Srouji said the M1 Ultra delivers 90% better performance, within the same power envelope, than the fastest 16-core chip for desktop PCs, Intel’s Core i9-12900K, without specifying which benchmark Apple relied on to back up the claim. Put another way, the Cupertino company says its M1 Ultra can deliver the same level of performance while consuming 100 watts less than the 12900K. Note, however, that power efficiency is a strong selling point for a laptop, whose chassis struggles to dissipate heat, but much less so for a tower workstation built for performance and fitted with one or two discrete AMD or Nvidia graphics cards.

Without information on the test tool used for these two benchmarks, Apple’s measurements cannot be validated. (Credit: IDG)

Presumably, the graphics performance of the M1 Ultra has also doubled. The manufacturer said its chip now has a GPU with 64 graphics cores, twice as many as the M1 Max. “Its performance also exceeds that of the highest-end GPU available (namely the Nvidia GeForce RTX 3090 GPU) while consuming 200 watts less,” Srouji said, again without specifying the benchmark used. More surprisingly, it seems that the UltraFusion bridge also connects the two GPUs together, which few people expected. Finally, according to Apple, the M1 Ultra has 32 neural processing units that can perform up to 22 trillion operations per second to accelerate machine learning. The media engine, used for compression and decompression, is also twice as fast.
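To make the "two M1 Max chips in one package" arithmetic explicit, here is a minimal Python sketch that doubles the M1 Max figures quoted in this article. The dictionary keys and the `combine` helper are names invented for this illustration; the sketch only reproduces the on-paper totals and says nothing about real-world performance.

```python
# On-paper M1 Max figures as quoted in this article.
M1_MAX = {
    "performance_cores": 8,
    "efficiency_cores": 2,
    "gpu_cores": 32,
    "transistors_billion": 57,
}

# The M1 Ultra joins two M1 Max dies with the UltraFusion bridge.
DIES_IN_ULTRA = 2


def combine(die, count):
    """Multiply every per-die figure by the number of dies (hypothetical helper)."""
    return {name: value * count for name, value in die.items()}


m1_ultra = combine(M1_MAX, DIES_IN_ULTRA)
total_cpu_cores = m1_ultra["performance_cores"] + m1_ultra["efficiency_cores"]

for name, value in m1_ultra.items():
    print(f"{name}: {value}")
print(f"total_cpu_cores: {total_cpu_cores}")
```

The totals match the figures Apple quoted: 16 performance cores plus 4 low-power cores (20 in all), 64 graphics cores and about 114 billion transistors.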

The M1 Ultra chip against the x86 competition

To recap, the M1 Pro chip offered 8 performance cores, 2 low-power cores and a GPU with 16 graphics cores; the M1 Max offered 8 performance cores, 2 low-power cores and a GPU with 32 graphics cores. Now the M1 Ultra offers double what the M1 Max offers, and the M1 Max was already very competitive in terms of performance. Going back to our performance evaluation of the M1 Max, we can see that the 10-core M1 Max lagged slightly behind Intel’s 14-core, 12th-generation Core i9, but really not by much. Does that mean the performance of the M1 Ultra will be double? Maybe not literally, but surely not far off. Whichever way you look at it, the M1 Ultra CPU gives Apple a nice lead.

Theoretically doubling the M1 Max’s score, to around 3,000 in this benchmark, sets the bar very high for a PC industry looking to compete with the M1 Ultra. (Credit: IDG)
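The "maybe not literally double, but not far" reasoning can be sketched as a simple scaling estimate. In the Python sketch below, the baseline of 1,500 is only what the "around 3,000" figure above implies for a single M1 Max, and the scaling efficiencies are hypothetical values chosen for illustration, not measurements of any kind.

```python
def projected_score(single_die_score, dies=2, scaling_efficiency=1.0):
    """Estimate a multi-die score from a single-die score.

    scaling_efficiency is the fraction of each additional die's
    performance that actually shows up in the benchmark
    (1.0 means perfect doubling for two dies).
    """
    if not 0.0 < scaling_efficiency <= 1.0:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    return single_die_score * (1 + (dies - 1) * scaling_efficiency)


# Assumption: roughly half of the "around 3,000" doubled figure.
M1_MAX_SCORE = 1500

# Hypothetical efficiencies, for illustration only.
for efficiency in (1.0, 0.9, 0.8):
    estimate = projected_score(M1_MAX_SCORE, scaling_efficiency=efficiency)
    print(f"efficiency {efficiency:.0%}: about {estimate:.0f}")
```

Even under the less optimistic of these assumed efficiencies, the estimate stays well above the single-die score, which is why a not-quite-doubled result would still set a high bar.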

Again, in the popular Geekbench benchmark, doubling the performance of the M1 Max shown here would easily propel the M1 Ultra to the top. However, Apple has a few reasons to be concerned. Once GPU performance is taken into account, previous tests show that the current M1’s GPU already lags behind, probably by too much to catch up with the PC ecosystem. The Geekbench Compute benchmark uses the GPU to run 11 workloads such as depth of field, face detection and particle physics. The test uses two cross-platform APIs (that is, agreed instructions and protocols for performing functions on different hardware), namely OpenCL and Vulkan, as well as CUDA, Nvidia’s proprietary API, and Metal, Apple’s proprietary API. The benchmark essentially reveals that Apple’s integrated GPU could be overtaken by a discrete GPU at any time.

Still, overall, there’s no denying that the M1 Ultra CPU is an absolute monster, as Apple claims. Now let’s see how the PC industry, and ARM chip vendors like Fujitsu, Qualcomm and Texas Instruments, will react.