
How Does Coolant Work? The Real Significance of Liquid Cooling


WHEN DOES LIQUID COOLING MAKE SENSE?

Brian Medina | August 1, 2021

Contrary to popular belief, water has a place in tomorrow's data centers.


Data center efficiency continues to be a hot topic, and few would argue that cooling efficiency is a significant factor in data center power usage effectiveness (PUE).

Previously, we noted that the rapid and widespread adoption of power-hungry technologies like artificial intelligence, high-performance computing, and hyperscale cloud deployments continues to put demands on data center deployments' already increasing size and power density, with some in the neighborhood of 1,000 W/SF or 75 kW/rack.

The GPUs and CPUs that power these resource-intensive applications can collectively produce as much as 350% more heat than previous generations, challenging data center operators to find new and more efficient ways of rapidly cooling data halls, especially in legacy facilities not designed or equipped to handle high densities.

For example, previous generations of CPUs would not exceed 130 W, whereas the maximum power consumption of an NVIDIA Quadro RTX 8000 GPU is 295 W. A Dell PowerEdge C4140 1U server has two Intel Xeon CPUs, four NVIDIA GPUs, and a 2,400 W power supply. It's easy to see how these configurations quickly add up in a rack to create some large power densities.
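To put those numbers in context, here is a minimal back-of-the-envelope sketch. The server count per rack is an assumption for illustration, not a figure from the article; only the 2,400 W nameplate rating comes from the example above.

```python
# Illustrative rack-density arithmetic (assumed fill, not data from the article).
SERVER_POWER_W = 2400      # nameplate PSU rating of the 1U server cited above
SERVERS_PER_RACK = 20      # hypothetical fill of a 42U rack; real deployments vary

rack_load_kw = SERVER_POWER_W * SERVERS_PER_RACK / 1000.0
print(f"Nameplate rack load: {rack_load_kw:.0f} kW")   # -> 48 kW

# Essentially all of that electrical load becomes heat the cooling system
# must remove, which is why densities of this order strain air-cooled designs.
```

Nameplate ratings overstate typical draw, but the exercise shows how a rack of such servers reaches tens of kilowatts, the same order of magnitude as the 75 kW/rack figure mentioned earlier.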

Can liquid cooling address these new, fast-emerging needs? Here's what you need to know so you can make the right technical decisions for your organization.

Designing A Liquid Cooling System With Temperature Thresholds In Mind

Practically speaking, water has a much higher thermal capacity than air, creating significant savings opportunities over air-cooling systems that require large fans with huge horsepower requirements. The savings extend beyond power usage to other connected components like feeders, switchboards, transformers, generators, and cooling units.
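As a rough illustration of that thermal-capacity gap, the sensible-heat relation Q = ρ·V̇·c_p·ΔT shows how much more heat a given volumetric flow of water carries than the same flow of air at the same temperature rise. The property values below are approximate textbook figures and the 10 K rise is an assumption, not numbers from the article.

```python
# Sensible heat carried per unit volumetric flow: Q = rho * V_dot * cp * dT.
def heat_rate_w(rho_kg_m3: float, cp_j_kgk: float, flow_m3_s: float, dt_k: float) -> float:
    """Heat removed (W) by a fluid stream for a given flow and temperature rise."""
    return rho_kg_m3 * flow_m3_s * cp_j_kgk * dt_k

FLOW = 0.001   # 1 L/s expressed in m^3/s
DT = 10.0      # assumed 10 K temperature rise across the heat exchanger

q_water = heat_rate_w(rho_kg_m3=997.0, cp_j_kgk=4186.0, flow_m3_s=FLOW, dt_k=DT)
q_air   = heat_rate_w(rho_kg_m3=1.2,   cp_j_kgk=1005.0, flow_m3_s=FLOW, dt_k=DT)

print(f"Water: {q_water/1000:.1f} kW per L/s")   # ~41.7 kW
print(f"Air:   {q_air:.0f} W per L/s")           # ~12 W
print(f"Ratio: ~{q_water/q_air:.0f}x")           # roughly 3,500x on a volumetric basis
```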

Of course, there are still some considerations that come with liquid cooling implementation as well. ASHRAE TC 9.9 has published guidelines for liquid cooling to accompany its more well-known air-cooled thermal guidelines. Similar to the recommended and allowable equipment class ranges for air-cooled IT equipment (ITE), liquid-cooled ITE has its own class system of W1 through W5. Within classes W1 and W2, chillers are still required, as the facility supply water temperature should not exceed 17°C or 27°C (62.6°F or 80.6°F), respectively. More specialized equipment (Class W4 and W5) can accept water temperatures up to 45°C (113°F).
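For quick sanity checks, the supply-temperature limits quoted above can be captured in a small helper. This sketch only encodes the figures cited in this article (W1, W2, and the 45°C upper end attributed to W4/W5 equipment); refer to the ASHRAE TC 9.9 liquid cooling guidelines for the complete and current class definitions.

```python
# Facility supply water temperature limits as quoted in this article (degC).
# Not the full ASHRAE table: W3 is omitted, and W5 equipment accepts even warmer water.
SUPPLY_LIMITS_C = {
    "W1": 17.0,
    "W2": 27.0,
    "W4": 45.0,
}

def classes_satisfied(supply_temp_c: float) -> list:
    """Return the quoted ITE classes whose supply-temperature limit covers this loop."""
    return [cls for cls, limit in SUPPLY_LIMITS_C.items() if supply_temp_c <= limit]

print(classes_satisfied(16.0))   # ['W1', 'W2', 'W4']
print(classes_satisfied(32.0))   # ['W4'] -> too warm for W1/W2 gear; needs warm-water-class ITE
```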

It's also important to be mindful of "approach" temperatures while designing these systems. Heat exchangers are not perfect at transferring heat, and their effectiveness is characterized by the log mean temperature difference between the hot and cold sides of the exchanger. Previous studies from the Lawrence Berkeley National Laboratory have shown that liquid leaving a chip heat sink can be as much as 40°F warmer than the supply water entering the data center.
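For reference, the log mean temperature difference (LMTD) for a counterflow exchanger is the standard expression below. The inlet and outlet temperatures in the example are assumptions for illustration, not figures from the LBNL study.

```python
import math

def lmtd(dt_hot_end_k: float, dt_cold_end_k: float) -> float:
    """Log mean temperature difference between the hot and cold streams of a heat exchanger."""
    if math.isclose(dt_hot_end_k, dt_cold_end_k):
        return dt_hot_end_k
    return (dt_hot_end_k - dt_cold_end_k) / math.log(dt_hot_end_k / dt_cold_end_k)

# Assumed counterflow example: TCS loop cooling 45 -> 35 degC against FWS water warming 25 -> 33 degC.
dt_hot_end = 45.0 - 33.0   # TCS inlet vs. FWS outlet
dt_cold_end = 35.0 - 25.0  # TCS outlet vs. FWS inlet

print(f"LMTD ~ {lmtd(dt_hot_end, dt_cold_end):.1f} K")   # ~11 K
# Since Q = U * A * LMTD, a tighter approach (smaller LMTD) requires more
# exchanger area for the same duty.
```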


The study assumed the use of a "typical" Intel Xeon CPU commonly used in HPC applications. The critical temperature for the processor is the maximum accepted case temperature of 77.5°C (172°F). Further, it suggested a ~22°F design margin to account for higher-powered chips, various cooling topologies, ambient design considerations, and emerging technologies that could otherwise render the system undersized or unusable in the future.
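Putting those figures together gives a feel for why the temperature thresholds matter. The arithmetic below is my simplified reading of the stack-up described above, not the study's exact method: start at the 172°F case-temperature limit, subtract the ~22°F design margin and the ~40°F rise between the facility supply and the liquid leaving the heat sink, and you land near the warmest facility water such a system could tolerate.

```python
# Simplified temperature stack-up using the figures quoted above (deg F).
CASE_TEMP_LIMIT_F = 172.0          # max accepted CPU case temperature (77.5 degC)
DESIGN_MARGIN_F = 22.0             # suggested design margin
SUPPLY_TO_HEATSINK_RISE_F = 40.0   # observed rise from facility supply to heat-sink exit

max_supply_f = CASE_TEMP_LIMIT_F - DESIGN_MARGIN_F - SUPPLY_TO_HEATSINK_RISE_F
max_supply_c = (max_supply_f - 32.0) * 5.0 / 9.0

print(f"Rough max facility supply water: ~{max_supply_f:.0f} F (~{max_supply_c:.0f} C)")
# ~110 F (~43 C) -- in the same range as the warm-water W4 class mentioned earlier.
```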

The study concluded that there are three natural liquid temperature breakpoints, depending on the type of heat rejection utilized and the corresponding building supply temperature, as shown below. This is the main reason why facilities with chilled fluid are better equipped to handle liquid-cooled applications than data centers utilizing direct/indirect evaporation or DX cooling.

[Figure: the study's liquid supply temperature breakpoints by heat-rejection type and corresponding building supply temperature]

Other considerations

Here are a few other rules of the road to consider when designing, planning, and executing your liquid cooling approach:

Mixing systems:

It's uncommon to envision a facility that would be 100% liquid-cooled without any air-cooled infrastructure. Most meet-me rooms (MMRs) and intermediate distribution frames (IDFs) will continue to have air-cooled ITE, and electrical infrastructure like UPSs and PDUs will reject heat. The data center white space itself will still have a large process cooling load, even if the majority of the racks are liquid-cooled, and will still require at least some appreciable volume of air. Many advanced cooling technologies do not remove all of the heat generated by the server, thus still requiring supplemental cooling. Liquid cooling will not eliminate the need for chillers and cooling units.

Filtration:

The most common liquid cooling implementation utilizes a coolant distribution unit (CDU). These may be integral or external to the rack, but they protect the integrity of the technology cooling system (TCS) and maintain separation from the facility process cooling water system (FWS). Facility water loops can be dirty, especially if they are served by an open cooling tower.

The CDU's plate-type heat exchanger has very narrow plate spacing (2-8 mm), which helps drive efficiency. It is necessary to provide endpoint filtration on the FWS side of the CDU, and it is recommended to provide filtration on the TCS side as well. Liquid cooling systems without a CDU are far less common and should warrant an even higher level of scrutiny with respect to filtration.

Condensation:

The CDU is capable of monitoring the ambient air dew point and elevating the TCS liquid temperature to 2°C above it to prevent condensation. The CDU has several preheating functions that will warm the water before it reaches the chip, where condensation can be catastrophic. As with the filtration discussion above, owners and operators should be especially mindful of the possibility of condensation on equipment that lacks this safeguard.
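To make the dew-point behavior concrete, here is a minimal sketch of that control idea. The Magnus approximation for dew point is a standard formula; the setpoint logic is a simplified illustration of what the article describes, not any particular CDU's actual control firmware.

```python
import math

def dew_point_c(dry_bulb_c: float, rel_humidity_pct: float) -> float:
    """Approximate dew point (degC) via the Magnus formula."""
    b, c = 17.62, 243.12
    gamma = math.log(rel_humidity_pct / 100.0) + (b * dry_bulb_c) / (c + dry_bulb_c)
    return (c * gamma) / (b - gamma)

def tcs_supply_setpoint_c(dry_bulb_c: float, rel_humidity_pct: float,
                          nominal_setpoint_c: float, margin_c: float = 2.0) -> float:
    """Keep the TCS liquid at least margin_c above the room dew point to avoid condensation."""
    return max(nominal_setpoint_c, dew_point_c(dry_bulb_c, rel_humidity_pct) + margin_c)

# Example: a 24 degC room at 55% RH has a dew point of ~14.4 degC, so a 17 degC nominal
# setpoint is already safe; at 75% RH the dew point climbs to ~19.3 degC and the
# supply setpoint is raised to ~21.3 degC.
print(tcs_supply_setpoint_c(24.0, 55.0, nominal_setpoint_c=17.0))
print(tcs_supply_setpoint_c(24.0, 75.0, nominal_setpoint_c=17.0))
```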

With poor control of humidification systems, condensation that would otherwise have formed on cold cooling coils may end up forming on the IT equipment instead. Furthermore, densifying legacy data centers that use depressed chilled water temperatures or high levels of outside air could also prove problematic.

Additional infrastructure:

It may be necessary to install separate heat exchangers and pumping loops to segregate liquid cooling systems. Whether driven by technical requirements or best practices, this can create ambiguity between the facility operator and the client over ownership and maintenance of the additional infrastructure. Where it is physically located also affects access and, potentially, leasable square footage.

Redundancy and uptime are further concerns. The pump and heat exchanger will have to undergo preventative maintenance at some point, but flow to the ITE must be maintained. Like all critical equipment, concurrent maintainability should be addressed, and this can be designed for. Similarly, on a transfer to generator, the water must circulate continuously, especially in high-density applications. Arguably, pumps should be UPS-backed to provide uninterrupted cooling, and the pump motor voltage needs to be coordinated. In any case, the service level agreement between the operator and the end user needs to clearly state all of these expectations and define who is responsible for what.

Logistics:

The most obvious challenge to implementing a liquid-cooled solution is getting the liquid over to the rack. While it may seem advantageous to have a raised floor to route and conceal piping, it can be challenging to snake it between floor pedestals or visually verify its integrity, especially with power and network infrastructure already in place.

There is additional hazard potential in open floor tiles in operational data centers. If the piping is sizable, it will likely be prefabricated steel (versus copper) bolted together in the field to avoid onsite welding. This is challenging and time-consuming whether it's overhead or underfloor. Raised floor tiles are only 24" x 24", so stringers need to be removed or cut out in order to install long sticks of pipe. This can call into question the floor's structural integrity, especially when it's holding up densely loaded cabinets.

Finally, most owners do what they can to keep plumbing out of the data hall. However, newly installed piping needs to be flushed and filled with hundreds or thousands of gallons of water before it can become operational. Routing hoses from back-of-house areas into live data hall white space to flush and fill pipe will cause operators considerable stress and consternation. These steps need to be carefully assessed, planned, and executed to avoid significant disruptions to the pristine computing environment.

Several groups, including the Open Compute Project's Advanced Cooling Facilities and Advanced Cooling Solutions subcommittees, are developing reference designs for deploying liquid cooling in new and existing facilities.

Is It Right For Your Data Center?

Densification is driving demand for more efficient and effective cooling technologies. As semiconductor manufacturing technology continues to evolve, we will soon be knocking on the door of the 3 nm MOSFET technology node. It's entirely possible that we will reach the upper limits of air-cooling effectiveness, or even see regulatory mandates forcing more widespread adoption of liquid cooling, in the not-too-distant future. Furthermore, certain designs will simply require liquid cooling. While liquid cooling is poised to become the go-to cooling methodology for colocation deployments, it's imperative to weigh the obvious benefits against the potential downsides.

DeepKnowledge

Translated by:

Qian Tao (钱涛)

Senior Operations Engineer, Shanghai Bilibili Technology Co., Ltd.

DKV (Deep Knowledge Volunteer) regular member

Proofread by:

Wang Luqi (王璐琦)

Assistant Manager, Operations Management Department, Shanghai Stock Exchange Data Service Co., Ltd. (上海上证数据服务有限责任公司)

DKV (Deep Knowledge Volunteer) elite member

Official account statement:

This translation is not an officially endorsed Chinese edition of the original text. It is provided for domestic readers for learning and reference only and may not be used for any commercial purpose; the original English version shall prevail for the article's content. Please do not repost the translation without written authorization from the DeepKnowledge official account.



