PopYard:Today's Tech.-NVIDIA CEO: AI Workloads Will “Flood” Data Centers

Mon Nov 25 10:48:11 2024

NVIDIA CEO: AI Workloads Will “Flood” Data Centers
Source: Yevgeniy Sverdlik

During a keynote at his company’s big annual conference in Silicon Valley last week, NVIDIA CEO Jensen Huang took several hours to announce the chipmaker’s latest products and innovations, but also to drive home the inevitability of the force that is Artificial Intelligence.

NVIDIA is the top maker of GPUs used in computing systems for Machine Learning, currently the part of the AI field where most of the action is happening. GPUs work in tandem with CPUs, accelerating the processing needed both to train machines to perform certain tasks and to execute those tasks.

“Machine Learning is one of the most important computer revolutions ever,” Huang said. “The number of [research] papers in Deep Learning is just absolutely explosive.” (Deep Learning is a class of Machine Learning algorithms where innovation has skyrocketed in recent years.) “There’s no way to keep up. There is now 10 times as much investment in AI companies since 10 years ago. There’s no question we’re seeing explosive growth.”

AI and Machine Learning together make up one of Gartner’s top 10 strategic technology trends for 2017, and most of the other trends on the list (such as conversational systems, virtual and augmented reality, the Internet of Things, and intelligent apps) are accelerating in large part because of advances in Machine Learning.

“Over the next 10 years, virtually every app, application and service will incorporate some level of AI,” Gartner fellow and VP David Cearley said in a statement. “This will form a long-term trend that will continually evolve and expand the application of AI and machine learning for apps and services.”
No Longer Just for Hyper-Scalers

Growth in Machine Learning means a “flood” of AI workloads is headed for the world’s data center floors, Huang said. Up until now, the most impactful production applications of Deep Learning have been developed and deployed by a handful of hyper-scale cloud giants – such as Google, Microsoft, Facebook, and Baidu – but NVIDIA sees the technology starting to proliferate beyond the massive cloud data centers.

“AI is just another kind of computing, and it’s going to hit many, many markets,” Ian Buck, the NVIDIA VP in charge of the company’s Accelerated Computing unit, told Data Center Knowledge in an interview. While there’s no doubt that Machine Learning will keep growing as a portion of the total computing power inside cloud data centers, he expects to see it in data centers operated by everybody in the near future, from managed service providers to banks. “It’s going to be everywhere.”

To prepare for this flood, data center managers need to answer some basic questions: Will it make more sense for my company to host Deep Learning workloads in the cloud or on-premises? Will it be a hybrid of the two? How much of the on-premises infrastructure will be needed for training Deep Learning algorithms? How much of it will be needed for inference? If we will have a lot of power-hungry training servers, will we go for maximum performance or give up some performance in exchange for higher efficiency across the whole data center? Will we need inference capabilities at the edge?
Cloud or On-Premises? Probably Both

Today, many companies large and small are in early research phases, looking for ways Deep Learning can benefit their specific businesses. One data center provider that specializes in hosting infrastructure for Deep Learning told us most of their customers hadn’t yet deployed their AI applications in production.

This drives demand for rentable GPUs in the cloud, which Amazon Web Services, Microsoft Azure, and Google Cloud Platform are happy to provide. By using their services, researchers can access lots of GPUs without having to spend a fortune on on-premises hardware.

“We’re seeing a lot of demand for it [in the] cloud,” Buck said. “Cloud is one of the reasons why all the hyper-scalers and cloud providers are excited about GPUs.”

A common approach, however, is combining some on-premises systems with cloud services. Berlin-based AI startup Twenty Billion Neurons, for example, synthesizes video material to train its AI algorithm to understand the way physical objects interact with their environment. Because those videos are so data-intensive, TwentyBN uses an on-premises compute cluster at its lab in Toronto to handle them, while outsourcing the actual training workloads to cloud GPUs in a Cirrascale data center outside San Diego.

Cloud GPUs are also a good way for a company to start exploring Deep Learning without committing a lot of capital upfront. “We find that cloud is a nice lubricant to getting adoption up for GPUs in general,” Buck said.
Efficiency v. Performance

If your on-premises Deep Learning infrastructure will do a lot of training (the computationally intensive process of teaching neural networks tasks such as speech and image recognition), prepare for power-hungry servers with lots of GPUs on every motherboard. That means higher power densities than most of the world’s data centers were designed to support (we’re talking up to 30 kW per rack).

However, it doesn’t automatically mean you need the highest-density cooling infrastructure possible. Here, the tradeoff is between performance and the number of users, or workloads, the infrastructure can support simultaneously. Maximum performance means the highest-power GPUs money can buy, but that is not necessarily the most efficient way to go.

NVIDIA’s latest Volta GPUs, expected to hit the market in the third quarter, deliver maximum performance at 300 watts, but if you slash the power in half you will still get 80 percent of the number-crunching muscle, Buck said. If “you back off power a little bit, you still maintain quite a bit of performance. It means I can up the number of servers in a rack and max out my data center. It’s just an efficiency choice.”
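Buck’s efficiency argument can be checked with simple arithmetic. The sketch below assumes a hypothetical fixed power budget spent entirely on GPUs (30 kW, the upper end of the rack densities cited earlier) and uses only the two figures Buck gave: full performance at 300 watts and 80 percent of it at 150 watts. It ignores CPUs, memory, networking, and cooling overhead, so the result is illustrative, not a sizing guide.

```python
# Illustrative arithmetic only. Assumes the whole power budget goes to GPUs
# and that performance scales exactly as Buck described (100% at 300 W,
# 80% at 150 W). Real servers also draw power for CPUs, memory, and fans.

RACK_BUDGET_W = 30_000  # hypothetical budget: the 30 kW upper bound cited in the article


def aggregate_throughput(budget_w: float, gpu_power_w: float, relative_perf: float) -> tuple[int, float]:
    """Return (GPU count, total throughput in units of one full-power GPU)."""
    gpu_count = int(budget_w // gpu_power_w)  # whole GPUs that fit in the budget
    return gpu_count, gpu_count * relative_perf


full_count, full_perf = aggregate_throughput(RACK_BUDGET_W, 300, 1.0)
half_count, half_perf = aggregate_throughput(RACK_BUDGET_W, 150, 0.8)

print(f"300 W GPUs: {full_count} GPUs, {full_perf:.0f} units of throughput")
print(f"150 W GPUs: {half_count} GPUs, {half_perf:.0f} units of throughput")
print(f"Gain from capping power: {half_perf / full_perf:.2f}x")
```

Under these assumptions, each GPU loses 20 percent of its performance, but twice as many fit in the same budget (200 instead of 100), so aggregate throughput rises by about 60 percent. That is the “efficiency choice” Buck describes.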
What about the Edge?

Inferencing workloads (the applications in which trained neural networks apply what they have learned) require fewer GPUs and less power, but they have to perform extremely fast. (Alexa wouldn’t be much fun to use if it took even 5 seconds to respond to a voice query.)

Inferencing servers are not particularly difficult to handle on-premises, but one big question for the data center manager is how close they have to be to where input data originates. If your corporate data centers are in Ashburn, Virginia, but your Machine Learning application has to provide real-time suggestions to users in Dallas or Portland, chances are you’ll need some inferencing servers in or near Dallas and Portland to make it actually feel close to real-time. If your application has to do with public safety (analyzing video data at intersections to help navigate autonomous vehicles, for example), it’s very likely that you’ll need some inferencing horsepower right at those intersections.
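Physics sets a floor on how “real-time” a distant data center can feel. Signals in optical fiber travel at roughly 200,000 km per second, about two-thirds of the speed of light in a vacuum, so every 1,000 km of fiber adds about 5 ms in each direction. The sketch below computes that lower bound for a round trip. The distances are hypothetical round numbers, not measured routes between Ashburn and Dallas or Portland. Real latency is higher once routing detours, switching, queuing, and the inference itself are added.

```python
# Lower bound on network round-trip time from fiber propagation alone.
# Assumes ~200,000 km/s signal speed in glass; ignores routing detours,
# switching and queuing delays, and the time the model takes to run.

FIBER_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(fiber_km: float) -> float:
    """Minimum round-trip propagation delay, in milliseconds, over fiber_km of fiber."""
    return 2 * fiber_km / FIBER_SPEED_KM_PER_S * 1000


# Hypothetical fiber path lengths, chosen only for illustration.
for label, km in [("same metro", 50), ("regional", 1_000), ("cross-country", 4_000)]:
    print(f"{label:>13}: at least {min_round_trip_ms(km):.1f} ms round trip")
```

The bound grows linearly with distance, so moving inference servers closer to users removes a delay that no amount of GPU horsepower in a faraway data center can make up.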
“Second Era of Computing”

Neither shopping suggestions on Amazon.com (one of the earliest uses of Machine Learning in production) nor Google search predictions were written out by software engineers as sequences of specific if/then instructions, Huang said, describing the rise of Machine Learning as a “second era of computing.”
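The distinction Huang draws can be shown at toy scale. Below, the first function is a rule a programmer writes by hand. The second learns an equivalent rule, a single-feature threshold (sometimes called a decision stump), from labeled examples. The scenario (“did a user buy an item after viewing it N times?”) and the data are invented for illustration. Production systems at companies like Amazon or Google use far larger models, but the idea of deriving behavior from data instead of hand-coded if/then logic is the same.

```python
# Toy contrast between hand-written logic and logic learned from data.
# The scenario and the examples are hypothetical.


def hand_written_rule(views: int) -> bool:
    # A programmer picks the threshold explicitly.
    return views > 3


def learn_threshold(examples: list[tuple[int, bool]]) -> int:
    """Pick the threshold t for the rule `views > t` that misclassifies
    the fewest (views, bought) examples."""
    observed = sorted({views for views, _ in examples})
    # Also try a threshold below every observed value (predicts True for all).
    candidates = [observed[0] - 1] + observed

    def errors(t: int) -> int:
        return sum((views > t) != bought for views, bought in examples)

    return min(candidates, key=errors)


# Hypothetical training data: (times a user viewed an item, whether they bought it).
data = [(1, False), (2, False), (2, False), (3, False),
        (4, True), (5, True), (6, True), (7, True)]

t = learn_threshold(data)


def learned_rule(views: int) -> bool:
    # Same shape as the hand-written rule, but the threshold came from data.
    return views > t


print(f"learned threshold: {t}")
print([learned_rule(v) for v in (1, 4, 8)])
```

With this data the learned threshold happens to match the hand-picked one. The difference is that retraining on new examples would adjust it automatically, with no engineer rewriting the rule.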

And it’s growing quickly, permeating all industry verticals, which means data center managers in every industry have some homework to do.
