You’re More Prepared Than You Think

This is a confusing time for data center operators. They need to evaluate a range of architectures for power conditioning and cooling, and they need to do it quickly. GPU advances are forcing liquid cooling adoption at a pace the required infrastructure changes struggle to keep up with.

This is a quick note of reassurance to those operators. Most of your investments in shell infrastructure, chiller infrastructure, and facility water loops support most types of liquid cooling. By and large, you can connect immersion, single-phase water, or two-phase cooled racks to that water loop and achieve the needed heat rejection to the atmosphere. So, rest easy. Continued investments in supporting infrastructure are not at risk, and little to no investment should be wasted if you change IT loop cooling architectures.

This doesn’t alleviate the need to find the best fit for your specific use case, however. Although these options can all connect to your facility water loop, there are substantial differences between them. As you begin to assess potential solutions, treat metrics such as TCO and heat flux capacity as “make-or-break” factors prior to adoption.

Key Questions to Guide Your Cooling Strategy

Of course, there are many criteria for deciding on your “best-fit” solution. To name a few:

  1. Can the technology keep up with higher TDPs and thermally denser chips? Heat fluxes (W/cm²) will continue to increase rapidly. With chip roadmaps generally available, have you confirmed that your chosen solution can handle heat fluxes above 250 W/cm²?
  2. Does the solution risk the equipment it is designed to protect (conductive vs. non-conductive fluid)? Leaks happen. There are a lot of hands on these racks and a lot of potential points of failure. Have you quantified the cost and operational downtime of a leak?
  3. Have you calculated Total Cost of Ownership (TCO) in a comprehensive way? The facility water loop temperature has a huge impact on energy use and operating expense. Vertiv estimates that for every 1 °C you don’t have to cool the water loop, you save roughly 4% in annual energy. Some architectures are much more efficient than others and will allow a substantially warmer water loop. Five or six degrees therefore enable a meaningful expansion of the “free cooling zone,” the conditions under which energy-hungry chiller compressors are not required (see the sketch after this list for a rough sense of what that adds up to).

    Other OpEx factors include reduced preventive maintenance. Water requires consistent testing and frequent loop flushes, which translate into staff time and downtime. These considerations are not rounding errors; they add up.
  4. Are you concerned about thermal non-uniformity shortening the life span or increasing the failure rate of your GPUs/CPUs? This happens when a single-phase fluid enters a cold plate cooler than it exits, creating a temperature gradient that strains the silicon. With each GPU costing around $30K, this is worth considering. High-availability inference environments should take particular note.
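
To make the arithmetic concrete, here is a minimal sketch of how the rule of thumb in point 3 translates into annual savings. Only the roughly 4% per 1 °C figure comes from the discussion above; the baseline chiller load, electricity rate, and function name are illustrative assumptions, not measured values or a chiller-plant model.

```python
# Rough rule-of-thumb estimate of annual savings from a warmer facility water loop.
# Assumes the ~4% energy savings per +1 degC cited above; all other numbers are
# hypothetical placeholders chosen only to show the shape of the calculation.

def estimate_loop_savings(baseline_chiller_kwh: float,
                          delta_t_c: float,
                          savings_per_deg_c: float = 0.04,
                          usd_per_kwh: float = 0.08) -> dict:
    """Linear rule-of-thumb estimate, not a detailed chiller-plant model."""
    fraction_saved = min(savings_per_deg_c * delta_t_c, 1.0)
    kwh_saved = baseline_chiller_kwh * fraction_saved
    return {
        "fraction_saved": fraction_saved,
        "kwh_saved_per_year": kwh_saved,
        "usd_saved_per_year": kwh_saved * usd_per_kwh,
    }

# Example: a hypothetical 2,000,000 kWh/yr chiller load and a loop that runs 6 degC warmer.
print(estimate_loop_savings(2_000_000, delta_t_c=6))
# -> about 24% saved: ~480,000 kWh and ~$38,400 per year under these assumptions.
```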

Now’s the Best Time to Choose the Best Fit

It’s entirely natural to feel pressured as liquid cooling rapidly shifts from a “nice-to-have” to a “must-have.” However, data center operators can take comfort in knowing that most of their existing infrastructure is already compatible with a wide range of cooling architectures. There’s still time to take a step back and properly assess your data center’s unique performance, risk, and cost objectives prior to adoption.

Because that’s where the real challenge lies: not in compatibility, but in choosing a solution that properly aligns with your data center’s unique needs. While evaluating your options, be sure to prioritize solutions like NeuCool™, which can easily surpass the standards we’ve outlined above.

Ultimately, the decisions you make today will shape the resilience and efficiency of your data center tomorrow. Make sure you choose wisely.