Date published

14 May 2024


Dr Kelley Mullick

Artificial intelligence (AI) is dominating the digital infrastructure industry and revolutionizing how we interact with technology. From personalized recommendations on streaming platforms to autonomous vehicles navigating our roads, AI is seemingly everywhere, yet still in the early stages of its development. This poses an important question: how are data center operators managing the impact of AI?

Much of the initial focus has been on capacity, as wholesale space in most major global data center markets is limited due to a “land grab” by cloud providers to support AI workloads. Also emerging are constraints on power infrastructure and on the ability to meet sustainability objectives. Global headlines across Europe, Asia, and the US are showcasing the tension between power, sustainability, and data center growth. The International Energy Agency (IEA) has projected that electricity demand from data centers, driven in part by AI growth, could double by 2026.

This surge in power consumption poses significant challenges for data center operators striving to maintain efficiency, sustainability, and total cost of ownership (TCO). The energy-intensive nature of AI increases the carbon footprint of data centers, amplifying environmental sustainability concerns. Cloud Service Providers (CSPs) are particularly focused on TCO optimization as they grapple with the implications of AI on their operations. Similarly, telco operators in Europe and Asia prioritize improving TCO and sustainability while relying on data centers to support AI-driven services.

Data centers must also allocate a greater proportion of their resources to cooling power-hungry CPUs and GPUs to meet the computational demand of AI workloads. Nvidia made headlines with the announcement of its 1200W Blackwell GPU, calling it “a new class of AI superchip”. The solution is designed to build and run real-time generative AI on trillion-parameter large language models. Given the compute density required for AI, the rising thermal design power of IT equipment overall, and the need for sustainable solutions, liquid cooling is rapidly emerging as the solution of choice for these challenges.

Liquid cooling systems offer a more efficient means of dissipating heat than air cooling. By circulating a coolant fluid directly over the hottest components, heat is rapidly transferred away, maintaining optimal operating temperatures for AI systems. As chips continue to get hotter, data center operators need to know they are future-proofing their infrastructure investment for 1000W CPUs and GPUs and beyond. Choosing technologies that can meet the demands of processor and chip roadmaps and future server generations will be key.
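The efficiency argument comes down to simple physics: the heat a coolant loop can carry away is governed by Q = ṁ · c_p · ΔT. The back-of-envelope sketch below illustrates this relationship with assumed values (water-like properties, a 10 °C coolant rise); these figures are illustrative, not vendor specifications.

```python
# Back-of-envelope coolant flow estimate for a liquid-cooled device.
# All figures are illustrative assumptions, not vendor specifications.

def required_flow_lpm(heat_w, delta_t_c, cp_j_per_kg_c=4180.0, density_kg_m3=1000.0):
    """Coolant flow (litres/min) needed to remove heat_w watts with a
    coolant temperature rise of delta_t_c degrees (Q = m_dot * cp * dT).
    Defaults approximate water; dielectric coolants have a lower cp,
    so they need proportionally higher flow for the same heat load."""
    mass_flow = heat_w / (cp_j_per_kg_c * delta_t_c)          # kg/s
    return mass_flow / density_kg_m3 * 1000.0 * 60.0          # L/min

# A 1000 W chip with a 10 degC coolant rise needs only ~1.4 L/min:
print(round(required_flow_lpm(1000, 10), 2))  # -> 1.44
```

Even a kilowatt-class device needs only a trickle of liquid, which is why direct liquid contact is so much more space- and energy-efficient than moving large volumes of air.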

Iceotope Labs recently conducted tests to validate how single-phase liquid cooling technology, like Precision Liquid Cooling, can go beyond the perceived 1000W limit to compete head-to-head with other cooling technologies. Initial testing showed that single-phase liquid cooling maintained a constant thermal resistance at a given flow rate as the power was increased from 250W to 1000W. More excitingly, a second round of testing found consistent thermal resistance up to 1500W, a threshold not yet matched elsewhere in the industry. These results showcase single-phase liquid cooling technology as an indispensable solution for managing the escalating thermal demands of AI workloads in data centers.
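Why does constant thermal resistance matter? Thermal resistance is defined as R = (T_device − T_coolant) / P, so a constant R means device temperature rises only linearly and predictably with power. The sketch below illustrates that relationship with hypothetical numbers; the R value and coolant temperature are assumptions for illustration, not Iceotope's measured data.

```python
# Thermal resistance: R = (T_device - T_coolant) / P.
# A cooling solution whose R stays constant at a given flow rate has a
# device temperature that scales linearly with power, as described above.
# R and the coolant temperature below are hypothetical illustration values.

def device_temp_c(power_w, r_thermal_c_per_w, coolant_c):
    """Predicted device temperature under a constant thermal resistance."""
    return coolant_c + r_thermal_c_per_w * power_w

R = 0.04        # degC/W, hypothetical constant thermal resistance
COOLANT = 30.0  # degC, hypothetical coolant supply temperature

for p in (250, 500, 1000, 1500):
    print(f"{p:>5} W -> {device_temp_c(p, R, COOLANT):.1f} degC")
```

With these assumed values, quadrupling power from 250W to 1000W raises the device from 40 °C to 70 °C, and even 1500W stays at a manageable 90 °C; a cooling solution whose R crept upward with power would see temperatures climb much faster.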

Liquid cooling is a leading solution for efficiently accommodating modern compute requirements. Embracing this technology enhances operational efficiency, lowers energy consumption, and aligns with emerging sustainability standards. While much of the market hasn't reached 1500W operation yet, it's poised to do so soon. Liquid cooling efficiently dissipates heat from high computational power and denser hardware configurations, addressing the thermal challenges of AI and optimizing performance, energy efficiency, and hardware reliability. It's indispensable for AI workloads and key to unlocking their future.