
AI’s skyrocketing power demands have served as a strong catalyst for innovation in the world of data center cooling. Traditional air cooling has given way to advanced liquid cooling methods like two-phase direct-to-chip, which delivers stronger thermal performance and considerable headroom to power the next era of AI.

Recent articles by Microsoft and TSMC have announced the next link in the evolutionary chain of cooling, known as direct-to-die. Unlike direct-to-chip, which uses a cold plate (or vaporator) and a thermal interface material (or TIM) to cool a chip package, a direct-to-die approach does away with the cold plate and TIM entirely. Instead, it uses microfluidics to flow cooling fluid through complex grooves in the silicon die itself: either a water-based fluid for single-phase direct-to-die, or an electrically non-conductive refrigerant for two-phase direct-to-die.

While etching and optimizing these complex grooves in silicon poses a unique design and manufacturing challenge — a challenge that industry leaders indicate may take a decade to solve — it’s easy to understand why direct-to-die has so rapidly captured the industry’s attention. Its ability to cool future processors with ultra-high Thermal Design Power (TDP) and heat flux requirements, combined with its potential to approach a near-zero thermal resistance, means it may prove to be “magnitudes better” vs. other popular cooling methods, according to Akshith Narayanan, Product Development Thermal Engineer at Accelsius.

To be clear: we’re equally enthusiastic about direct-to-die. “It’s something you hear about in academia but never thought would ever be deployed in the industry,” says Narayanan. While chip manufacturers work over the next 5-10 years to get direct-to-die ready for widespread adoption, we’ll also be hard at work preparing our solution to hit the ground running once direct-to-die makes its mainstream debut. 

And when that happens? Largely thanks to our dielectric fluid, two-phase is perfectly positioned to emerge as the leading direct-to-die liquid cooling method. 

Why two-phase will triumph

Single-phase direct-to-chip (1P D2C) and two-phase direct-to-chip (2P D2C) both rely on a cold plate and TIM, which add thermal resistance, so both will benefit from direct-to-die’s removal of cold plates and TIMs. With 2P and 1P gaining a similar reduction in thermal resistance, we’ll have to look at the fluids to determine which method gains a clear competitive advantage.
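To make the thermal-resistance argument concrete, here is a minimal sketch that treats the path from die to fluid as resistances in series. The layer names and every number are placeholder assumptions chosen for illustration, not measured values from any product.

```python
def stack_resistance(layers):
    """Total thermal resistance (°C/W) of layers stacked in series."""
    return sum(layers.values())


def temperature_rise(power_w, layers):
    """Die temperature rise above the coolant (°C) for a given heat load."""
    return power_w * stack_resistance(layers)


# Placeholder resistances in °C/W; illustrative assumptions, not measured data.
DIRECT_TO_CHIP = {"tim": 0.010, "cold_plate": 0.005, "fluid_convection": 0.015}
DIRECT_TO_DIE = {"fluid_convection": 0.015}  # cold plate and TIM removed

if __name__ == "__main__":
    power_w = 1000.0  # assumed heat load in watts
    for name, layers in (("direct-to-chip", DIRECT_TO_CHIP),
                         ("direct-to-die", DIRECT_TO_DIE)):
        print(f"{name}: {stack_resistance(layers):.3f} °C/W, "
              f"{temperature_rise(power_w, layers):.1f} °C rise")
```

Because the cold plate and TIM terms drop out of the sum for either fluid, the remaining difference between 1P and 2P sits in the fluid term, which is why the comparison comes down to the fluids.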

We’ve written elsewhere about why the inherent qualities of two-phase’s refrigerant vs. single-phase’s water translate to 2P D2C’s victory in terms of thermal performance, energy savings, and much more. Two-phase’s advantages likewise apply to direct-to-die. In fact, not only will two-phase deliver stronger performance and greater savings once direct-to-die is ready for industry adoption; unlike single-phase, it could even accelerate the timeline for adoption itself.

Zero risk of failure due to leaks

It’s basic science: if water interacts with exposed electronics, disaster occurs. That isn’t the case with a non-conductive refrigerant like ours. 

Switching from direct-to-chip to direct-to-die makes a water-based cooling method even riskier. Simply put, you aren’t just feeding water into a cold plate; you’re running it through the chip itself. Any leak, however small, would lead to instant failure.

If chip manufacturers rely on a water-based approach, they’ll have to ensure their etchings are theoretically leak-proof, which could substantially delay direct-to-die’s widespread adoption — and still wouldn’t fully prevent leak-induced disasters. 

With our refrigerant, however, “we have the flexibility to play with alternative thermal designs, because if leaks occur, it’s not a catastrophe,” Narayanan states. This ability for experimentation can empower chip manufacturers to release designs for two-phase direct-to-die much sooner — delivering the same boost in cooling performance with zero concern for damage due to leaks.

Our isothermal cooling equals lower complexity

Single-phase’s non-uniform (or non-isothermal) cooling will become more pronounced with direct-to-die; without a cold plate laid over the entire chip to help spread the heat, it’s harder to address the localized hotspots and thermal gradients that naturally occur with single-phase. If this non-uniformity isn’t properly addressed, the chip can bend or warp. Compensating for this will create additional design hurdles that chip manufacturers and single-phase cooling providers alike will have to overcome, potentially adding further delays before direct-to-die’s full release.

Meanwhile, two-phase’s isothermal properties mean that it’ll be even easier (and quicker) to manufacture two-phase direct-to-die solutions. A simpler design is often simpler to manufacture, after all.

Less chance of erosion, zero chance of corrosion

There’s another critical factor to consider with water: namely, the probability of bio-growths in your system. According to CTO Dr. Richard Bonner, if water invites microbes and other contaminants to fester in a direct-to-die system, “you aren’t just gumming up or corroding your cold plate; you’re destroying or needing to deep clean the GPU itself.”

Similarly, because water needs to operate at markedly higher flow rates to achieve the same results as two-phase, a single-phase direct-to-die system is more likely to cause erosion of a GPU or CPU’s silicon; in essence, you’re blasting cold water at higher pressures onto hundreds of thousands of dollars’ worth of AI investments.

Therefore, it’s best to stick with two-phase fluids, which are non-corrosive and operate at a 75-90% lower flow rate, extending your chip’s lifecycle and removing the need for constant maintenance to prevent bacteria from eating your GPU.
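The flow-rate gap follows from how each fluid carries heat: single-phase water absorbs sensible heat as its temperature rises, while a two-phase refrigerant absorbs latent heat as it boils. The sketch below compares the mass flow each would need under a simple energy balance. The refrigerant latent heat, exit vapor quality, heat load, and water temperature rise are all placeholder assumptions, and the result is a mass flow, so it is not meant to reproduce the 75-90% figure, which depends on the actual fluid and operating point.

```python
WATER_CP_J_PER_KG_K = 4186.0  # approximate specific heat of liquid water


def single_phase_flow(power_w, delta_t_k, cp=WATER_CP_J_PER_KG_K):
    """Mass flow (kg/s) needed to absorb power_w as sensible heat."""
    return power_w / (cp * delta_t_k)


def two_phase_flow(power_w, latent_heat_j_per_kg, exit_quality):
    """Mass flow (kg/s) needed to absorb power_w as latent heat.

    exit_quality is the fraction of the fluid vaporized at the outlet (0-1).
    """
    return power_w / (latent_heat_j_per_kg * exit_quality)


if __name__ == "__main__":
    power_w = 1000.0          # assumed heat load
    water_delta_t_k = 10.0    # assumed water temperature rise
    latent_heat = 150_000.0   # placeholder refrigerant latent heat, J/kg
    exit_quality = 0.7        # placeholder outlet vapor quality

    m_1p = single_phase_flow(power_w, water_delta_t_k)
    m_2p = two_phase_flow(power_w, latent_heat, exit_quality)
    print(f"single-phase: {m_1p:.4f} kg/s, two-phase: {m_2p:.4f} kg/s")
    print(f"reduction: {100 * (1 - m_2p / m_1p):.0f}%")
```

A tighter allowed water temperature rise, which is one way to fight hotspots in single-phase, pushes the water flow rate up further in this model.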

Ready for our research?

As previously mentioned, our engineering team has initiated its R&D testing on two-phase direct-to-die cooling. Already, the research conducted has yielded promising results.

Using heat blocks that mimicked NVIDIA Blackwell B200 GPUs, our team found that even a bare die (with no additional design modifications) delivered 2X the performance of a direct-to-chip approach, with significant reductions in thermal resistance. In turn, this reduced thermal resistance allowed the system to operate at 50-55°C facility water (FW) temperatures, enabling a data center to shift more energy from cooling to compute.
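The link between lower thermal resistance and warmer facility water can be sketched with one rearranged energy balance: for a fixed die temperature limit and heat load, the warmest usable coolant is the limit minus power times resistance. The numbers below are placeholder assumptions, not the test data, and the sketch ignores the temperature difference across any heat exchanger between the coolant loop and the facility water. Reading "2X the performance" as a halved thermal resistance is likewise an assumption made for illustration.

```python
def max_coolant_temp(die_limit_c, power_w, resistance_c_per_w):
    """Warmest coolant temperature (°C) that keeps the die at or below its limit."""
    return die_limit_c - power_w * resistance_c_per_w


if __name__ == "__main__":
    die_limit_c = 85.0   # assumed die temperature limit
    power_w = 1000.0     # assumed heat load
    r_d2c = 0.040        # placeholder direct-to-chip resistance, °C/W
    r_d2d = r_d2c / 2    # assumes "2X" means half the resistance

    for name, r in (("direct-to-chip", r_d2c), ("direct-to-die", r_d2d)):
        print(f"{name}: coolant up to {max_coolant_temp(die_limit_c, power_w, r):.0f} °C")
```

Warmer allowable water matters because it reduces how much chilling the facility has to do before returning water to the racks.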

Eager to hear the full results? Attend our session at ASME InterPACK at 3:45 PM on Wednesday, October 29th. You’ll get a glimpse of liquid cooling’s exciting future — and the pivotal role we’ll play in making it happen.