NVIDIA Research comprises more than 200 scientists around the world driving innovation across a range of industries. One of its central figures is David Luebke, who founded the team in 2006 and is now the company’s vice president of graphics research.
Luebke spoke with AI Podcast host Noah Kravitz about what he’s working on. He’s especially focused on the interaction between AI and graphics. Rather than viewing the two as conflicting endeavors, Luebke argues that AI and graphics go together “like peanut butter and jelly.”
NVIDIA Research proved that with StyleGAN2, the second iteration of the generative adversarial network StyleGAN. Trained on high-resolution images, StyleGAN2 takes numerical input and produces realistic portraits.
While film-quality imagery can take up to weeks to render for a single frame, even the first version of StyleGAN takes only 24 milliseconds to produce a comparable image.
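The generation step itself is conceptually simple: a vector of random numbers is mapped by a trained generator network to an image. The sketch below illustrates that numerical-input-to-portrait idea in PyTorch with a tiny, hypothetical generator standing in for StyleGAN; it is not the actual StyleGAN2 code, just an illustration of the flow.

```python
import torch

# Hypothetical stand-in for a trained StyleGAN-style generator:
# it maps a 512-dimensional latent vector to an RGB image tensor.
# The real StyleGAN2 generator is far larger; this only sketches the idea.
class TinyGenerator(torch.nn.Module):
    def __init__(self, latent_dim=512, img_size=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(latent_dim, 3 * img_size * img_size),
            torch.nn.Tanh(),          # pixel values in [-1, 1]
        )
        self.img_size = img_size

    def forward(self, z):
        out = self.net(z)
        return out.view(-1, 3, self.img_size, self.img_size)

generator = TinyGenerator().eval()
z = torch.randn(1, 512)               # the "numerical input": a random latent vector
with torch.no_grad():
    image = generator(z)              # one forward pass produces one image
print(image.shape)                    # torch.Size([1, 3, 64, 64])
```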
Luebke envisions the future of GANs as an even larger collaboration between AI and graphics. He predicts that GANs such as those used in StyleGAN will learn to produce the key elements of graphics: shapes, materials, illumination and even animation.
Key Points From This Episode:
AI is especially useful in graphics by replacing or augmenting components of the traditional computer graphics pipeline, from content creation to mesh generation to realistic character animation.
Luebke researches a range of topics, one of which is virtual and augmented reality. It was, in fact, what inspired him to pursue graphics research — learning about VR led him to switch majors from chemical engineering.
Displays are a major stumbling block in virtual and augmented reality, he says. He emphasizes that VR requires high frame rates, low latency and very high pixel density.
Tweetables:
“Artificial intelligence, deep neural networks — that is the future of computer graphics” — David Luebke [2:34]
“[AI], like a renaissance artist, puzzled out the rules of perspective and rotation” — David Luebke [16:08]
Real-time graphics technology, namely, GPUs, sparked the modern AI boom. Now modern AI, driven by GPUs, is remaking graphics. This episode’s guest is Aaron Lefohn, senior director of real-time rendering research at NVIDIA. Aaron’s international team of scientists played a key role in founding the field of AI computer graphics.
Ever wondered what it takes to produce the complex imagery in films like Star Wars or Transformers? Here to explain the magic is Colie Wertz, a conceptual artist and modeler who works on film, television and video games. Wertz discusses his specialty of hard modeling, in which he produces digital models of objects with hard surfaces like vehicles, robots and computers.
DOOM, of course, is foundational to 3D gaming. 3D gaming, of course, is foundational to GPUs. GPUs, of course, are foundational to deep learning, which is, now, thanks to a team of Italian researchers, two of whom we’re bringing to you with this podcast, being used to make new levels for … DOOM.
Every year, 60,000 people go blind from diabetic retinopathy, a condition caused by damage to the blood vessels in the eye, for which high blood sugar is a major risk factor.
Digital Diagnostics, a software-defined AI medical imaging company formerly known as IDx, is working to help those people retain their vision, using NVIDIA technology to do so.
The startup was founded a decade ago by Michael Abramoff, a retinal surgeon with a Ph.D. in computer science. While training as a surgeon, Abramoff often saw patients with diabetic retinopathy, or DR, that had progressed too far to be treated effectively, leading to permanent vision loss.
With the mission of increasing access to and quality of DR diagnosis, as well as decreasing its cost, Abramoff and his team have created an AI-based solution.
The company’s product, IDx-DR, takes images of the back of the eye, analyzes them and provides a diagnosis within minutes — referring the patient to a specialist for treatment if a more than mild case is detected.
The system is optimized on NVIDIA GPUs and its deep learning pipeline was built using the NVIDIA cuDNN library for high-performance GPU-accelerated operations. Training occurs using Amazon EC2 P3 instances featuring NVIDIA V100 Tensor Core GPUs and is based on images of DR cases confirmed by retinal specialists.
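The article doesn't publish Digital Diagnostics' code, but the general pattern it describes — training a convolutional classifier on specialist-labeled retinal images with cuDNN-accelerated GPU operations — roughly resembles the hedged PyTorch sketch below. The dataset path, image size and model architecture are placeholders, not details of IDx-DR.

```python
import torch
import torchvision
from torchvision import transforms

# Assumed layout: a folder of fundus images labeled by specialists,
# e.g. retina_images/{no_dr, more_than_mild_dr}/... (placeholder path).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

dataset = torchvision.datasets.ImageFolder(
    "retina_images",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]),
)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

# A generic CNN classifier; its convolutions run through cuDNN on the GPU.
model = torchvision.models.resnet18(num_classes=2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```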
IDx-DR enables diagnostic tests to be completed in easily accessible settings like drugstores or primary care providers’ offices, rather than only at ophthalmology clinics, said John Bertrand, CEO at Digital Diagnostics.
“Moving care to locations the patient is already visiting improves access and avoids extra visits that overwhelm specialty physician schedules,” he said. “Patients avoid an extra copay and don’t have to take time off work for a second appointment.”
Autonomous, Not Just Assistive
“There are lots of good AI products specifically created to assist physicians and increase the detection rate of finding an abnormality,” said Bertrand. “But to allow physicians to practice to the top of their license, and to reduce the costs of these low-complexity tests, you need to use autonomous AI.”
IDx-DR was the first FDA-cleared autonomous AI system: while the FDA has cleared many AI-based applications, IDx-DR was the first that doesn't require physician oversight.
In clinical trials, IDx-DR was operated by people with no prior experience taking retinal photographs, simulating the way the product would be used in the real world, according to Bertrand.
“Anyone with a high school diploma can perform the exam,” he said.
The platform has been deployed in more than 20 sites across the U.S., including Blessing Health System, in Illinois, where family medicine doctor Tim Beth said, “Digital Diagnostics has done well in developing an algorithm that can detect the possibility of early disease. We would be missing patients if we didn’t use IDx-DR.”
In addition to DR, Digital Diagnostics has created prototypes for products that diagnose glaucoma and age-related macular degeneration. The company is also looking to provide solutions for healthcare issues beyond eye-related conditions, including those related to the skin, nose and throat.
For many of the tens of millions of employees working from home amid the pandemic, their change of scenery is likely to stick.
Fifty-two percent of global IT and business leaders surveyed by IDC in June said their work-at-home employment models are likely to be permanently changed.*
To cope, enterprises are turning to the cloud as it provides the simplified, flexible management of IT resources that are required to support remote workers, wherever they may be. With NVIDIA GPUs and virtualization software, cloud infrastructure can support all kinds of compute and visualization workloads — AI, data science, computer-aided design, rendering, content creation and more — without compromising performance.
This surge in remote work has helped the NVIDIA Cloud Service Provider program, a pillar of the NVIDIA Partner Network, grow by over 60 percent in the first half of the year alone.
The program provides partners with resources and tools to grow their businesses and ensure customer success. Twenty-two new partners recently joined in Europe and more than 10 in North America; together, the two regions account for over 80 percent of new CSP partner adoption, bringing the program to more than 100 partners worldwide.
“As the world continues to adapt to working remote, we see unprecedented demand for high-performance managed desktop as a service across all industries,” said Robert Green, president and CTO of Dizzion. Jviation, an aviation engineering firm, relies on Dizzion to optimize its end-user experience, especially for high-end graphics, video collaboration and other media-intense workloads.
“With innovative NVIDIA GPUs, Dizzion cloud desktops enable any global team member to work from home — or anywhere — and keep things business as usual,” said Green.
Daniel Kobran, chief operating officer at Paperspace, said, “GPUs and the new era of accelerated computing are powering applications previously thought impossible. The Paperspace cloud platform provides on-demand GPU processing power behind a unified hub to facilitate collaboration across large, distributed teams for customers such as Medivis, which is using Paperspace to build AI-assisted, real-time analysis to provide surgeons key insights during surgery.”
Cloud service providers in the NPN program have expertise in designing, developing, delivering and managing cloud-based workloads, applications and services. Customers choosing providers that offer NVIDIA GPU-accelerated infrastructure can gain additional benefits, such as:
Broad NVIDIA GPU options from the cloud, such as Quadro RTX 6000 and 8000 and NVIDIA T4 and V100 Tensor Core GPUs.
Management software to easily unify enterprise private and multi-cloud infrastructure.
Services and offerings that ease adoption and migration to the cloud, including deep vertical and workload expertise. For example, desktop-as-a-service options configured with NVIDIA Quadro Virtual Workstation to support graphics and compute workloads required by creative and technical professionals. Many offerings can be tailored to each enterprise’s unique needs.
Compliance with local data sovereignty laws.
More information on program benefits and how to sign up as a partner is available here.
* Source: IDC, “From Rigid to Resilient Organizations: Enabling the Future of Work”, Doc # US45799820, July 2020
If you want to create a world-class recommendation system, follow this recipe from a global team of experts: Blend a big helping of GPU-accelerated AI with a dash of old-fashioned cleverness.
The proof was in the pudding for a team from NVIDIA that won this year’s ACM RecSys Challenge. The competition is a highlight of an annual gathering of more than 500 experts who present the latest research in recommendation systems, the engines that deliver personalized suggestions for everything from restaurants to real estate.
At the Sept. 22-26 online event, the team will describe its dish, already available as open source code. They’re also sharing lessons learned with colleagues who build NVIDIA products like RAPIDS and Merlin, so customers can enjoy the fruits of their labor.
In an effort to bring more people to the table, NVIDIA will donate the contest’s $15,000 cash prize to Black in AI, a nonprofit dedicated to mentoring the next generation of Black specialists in machine learning.
GPU Server Doles Out Recommendations
This year’s contest, sponsored by Twitter, asked researchers to comb through a dataset of 146 million tweets to predict which ones a user would like, reply to or retweet. The NVIDIA team’s work led a field of 34 competitors, thanks in part to a system with four NVIDIA V100 Tensor Core GPUs that cranked through hundreds of thousands of options.
Their numbers were eye-popping. GPU-accelerated software took less than a minute to engineer features that required nearly an hour on a CPU, a 500x speedup. The four-GPU system trained the team’s AI models 120x faster than a CPU. And GPUs gave the group’s end-to-end solution a 280x speedup compared to an initial implementation on a CPU.
“I’m still blown away when we pull off something like a 500x speedup in feature engineering,” said Even Oldridge, who holds a Ph.D. in machine learning and in the past year quadrupled the size of the group that designs NVIDIA Merlin, a framework for recommendation systems.
GPUs and frameworks such as UCX provided up to 500x speedups compared to CPUs.
Competition Sparks Ideas for Software Upgrades
The competition spawned work on data transformations that could enhance future versions of NVTabular, a Merlin library that eases engineering new features with the spreadsheet-like tables that are the basis of recommendation systems.
“We won in part because we could prototype fast,” said Benedikt Schifferer, one of three specialists in recommendation systems on the team that won the prize.
Schifferer also credits two existing tools. Dask, an open-source scheduling tool, let the team split memory-hungry jobs across multiple GPUs. And cuDF, part of NVIDIA’s RAPIDS framework for accelerated data science, let the group run the equivalent of the popular Pandas library on GPUs.
“Searching for features in the data using Pandas on CPUs took hours for each new feature,” said Chris Deotte, one of a handful of data scientists on the team who have earned the title Kaggle grandmaster for their prowess in competitions.
“When we converted our code to RAPIDS, we could explore features in minutes. It was life changing, we could search hundreds of features and that eventually led to discoveries that won that competition,” said Deotte, one of only two grandmasters who hold that title in all four Kaggle categories.
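As a rough illustration of the kind of conversion Deotte describes, feature-engineering code written against pandas can often run on the GPU simply by switching to the cuDF library and its pandas-like API; the file name and column names below are hypothetical, not the team's actual Twitter features.

```python
import cudf  # GPU DataFrame library from NVIDIA RAPIDS; API mirrors pandas

# Hypothetical engagement log with made-up column names.
df = cudf.read_parquet("tweets.parquet")

# A typical engineered feature: per-user average engagement rates,
# computed entirely on the GPU with pandas-like syntax.
user_stats = df.groupby("user_id").agg(
    {"like": "mean", "retweet": "mean", "reply": "mean"}
).reset_index()

# Join the aggregated statistics back onto each tweet as new features.
df = df.merge(user_stats, on="user_id", suffixes=("", "_user_mean"))
```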
More enhancements for recommendation systems are on the way. For example, customers can look forward to improvements in text handling on GPUs, a key data type for recommendation systems.
An Aha! Moment Fuels the Race
Deotte credits a colleague in Brazil, Gilberto Titericz, with an insight that drove the team forward.
“He tracked changes in Twitter followers over time which turned out to be a feature that really fueled our accuracy — it was incredibly effective,” Deotte said.
“I saw patterns changing over time, so I made several plots of them,” said Titericz, who ranked as the top Kaggle grandmaster worldwide for a couple of years.
“When I saw a really great result, I thought I made a mistake, but I took a chance, submitted it and to my surprise it scored high on the leaderboard, so my intuition was right,” he added.
In the end, the team used a mix of complementary AI models designed by Titericz, Schifferer and a colleague in Japan, Kazuki Onodera, all based on XGBoost, an algorithm well suited for recommendation systems.
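The article doesn't include the team's code, but training a GPU-accelerated XGBoost classifier for an engagement target such as "like" follows a well-known pattern, sketched below with hypothetical feature and label arrays standing in for the engineered tweet features.

```python
import numpy as np
import xgboost as xgb

# Hypothetical engineered features and binary "did the user like it?" labels.
X = np.random.rand(100_000, 50).astype(np.float32)
y = np.random.randint(0, 2, size=100_000)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "binary:logistic",
    "tree_method": "gpu_hist",   # build the trees on the GPU
    "max_depth": 8,
    "eval_metric": "logloss",
}
model = xgb.train(params, dtrain, num_boost_round=500)

# Predicted probabilities that the first few tweets would be liked.
preds = model.predict(xgb.DMatrix(X[:5]))
print(preds)
```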
Several members of the team are part of an elite group of Kaggle grandmasters that NVIDIA founder and CEO Jensen Huang dubbed KGMON, a playful takeoff on Pokemon. The team has won dozens of competitions over the last four years.
Recommenders Getting Traction in B2C
For many members, including team leader Jean-Francois Puget in southern France, it’s more than a 9-to-5 job.
“We spend nights and weekends in competitions, too, trying to be the best in the world,” said Puget, who earned his Ph.D. in machine learning two decades before deep learning took off commercially.
Now the technology is spreading fast.
This year’s ACM RecSys includes three dozen papers and talks from companies like Amazon and Netflix that helped establish the field with recommenders that help people find books and movies. Now, consumer companies of all stripes are getting into the act, including IKEA and Etsy, which are presenting at ACM RecSys this year.
“For the last three or four years, it’s more focused on delivering a personalized experience, really understanding what users want,” said Schifferer. It’s a cycle where “customers’ choices influence the training data, so some companies retrain their AI models every four hours, and some say they continuously train,” he added.
That’s why the team works hard to create frameworks like Merlin to make recommendation systems run easily and fast at scale on GPUs. Other members of NVIDIA’s winning team were Christof Henkel (Germany), Jiwei Liu and Bojan Tunguz (U.S.), Gabriel De Souza Pereira Moreira (Brazil) and Ahmet Erdem (Netherlands).
To get tips on how to design recommendation systems from the winning team, tune in to an online tutorial here on Friday, Sept. 25.
Apple’s iPad 2 launch in 2011 ignited a touch tablet craze, but when David Cann and Marc DeVidts got their hands on one they saw something different: They rigged it to a remote-controlled golf caddy and posted a video of it in action on YouTube.
Next came phone calls from those interested in buying such a telepresence robot.
Hacks like this were second nature for the friends who met in 2002 while working on the set of the BattleBots TV series, featuring team-built robots battling before live audiences.
That’s how Double Robotics began in 2012. The startup went on to attend Y Combinator’s accelerator, and it has sold more than 12,000 units. That cash flow has allowed the small team with just $1.8 million in seed funding to carry on without raising capital, a rarity in hardware.
Much has changed since they began. Double Robotics, based in Burlingame, Calif., today launched its third-generation model, the Double 3, sporting an NVIDIA Jetson TX2 for AI workloads.
“We did a bunch of custom CUDA code to be able to process all of the depth data in real time, so it’s much faster than before, and it’s highly tailored to the Jetson TX2 now,” said Cann.
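Double Robotics' custom CUDA kernels aren't public, but the flavor of the work — turning a raw depth frame into an obstacle signal on the GPU in real time — can be sketched with CuPy, which runs array operations as CUDA kernels. The frame size and distance thresholds below are invented for illustration, not taken from the Double 3.

```python
import cupy as cp

# A fake 480x640 depth frame in millimeters (stand-in for stereo sensor output).
depth_mm = cp.random.uniform(200, 4000, size=(480, 640)).astype(cp.float32)

# Mark pixels closer than 60 cm as obstacles; ignore invalid (zero) readings.
# The comparisons and logical ops run as CUDA kernels on the Jetson's GPU.
valid = depth_mm > 0
obstacles = valid & (depth_mm < 600.0)

# Fraction of the field of view occupied by nearby obstacles,
# a simple signal a navigation loop might react to.
obstacle_ratio = float(obstacles.astype(cp.float32).mean())
print(f"obstacle coverage: {obstacle_ratio:.1%}")
```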
Remote Worker Presence
The Double helped engineers inspect Selene while it was under construction.
The Double device, as it’s known, was designed for remote workers to visit offices in the form of the robot so they could see their co-workers in meetings. Video-over-internet call connections allow people to see and hear their remote colleague on the device’s tablet screen.
The Double had been a popular ticket at tech companies on the East and West Coasts in the five years prior to the pandemic, and interest remains strong but in different use cases, according to the company. It has also proven useful in rural communities across the country, where people travel long distances to get anywhere, the company said.
NVIDIA purchased a telepresence robot from Double Robotics so that non-essential designers sheltering at home could maintain daily contact with work on Selene, the world’s seventh-fastest computer.
Some customers who use it say it breaks down communication barriers for remote workers, with the robot’s physical presence allowing for better interaction than video conferencing platforms.
Also, COVID-19 has spurred interest in contact-free work using the Double. Pharmaceutical companies have contacted Double Robotics asking how the robot might aid in international development efforts, according to Cann. The biggest use case amid the pandemic is using the Double robots in place of international business travel, he said. Instead of flying visitors in, the destination office could offer a Double to would-be travelers.
Double 3 Jetson Advances
Now shipping, the Double 3 features wide-angle and zoom cameras and can support night vision. It also uses two stereovision sensors for depth vision, five ultrasonic range finders, two wheel encoders and an inertial measurement unit sensor.
Double Robotics will sell the head of the new Double 3, which includes the Jetson TX2, to existing customers seeking to upgrade their units’ brains for access to increasing levels of autonomy.
To enable the autonomous capabilities, Double Robotics relied on the NVIDIA Jetson TX2 to process all of the camera and sensor data in real time, utilizing its CUDA-enabled GPU and accelerated multimedia and image processors.
The company is working on autonomous features for improved self-navigation and safety features for obstacle avoidance, as well as other capabilities such as improved auto-docking for recharging and autopilot all the way into offices.
Right now the Double can do automated assisted driving to help people avoid hitting walls. The company next aims for full office autonomy and ways to help it get through closed doors.
“One of the reasons we chose the NVIDIA Jetson TX2 is that it comes with the Jetpack SDK that makes it easy to get started and there’s a lot that’s already done for you — it’s certainly a huge help to us,” said Cann.
Life choices can change a person’s DNA — literally.
Gene changes that occur in human cells over a person’s lifetime, known as somatic mutations, cause the vast majority of cancers. They can be triggered by environmental or behavioral factors such as exposure to ultraviolet light or radiation, drinking or smoking.
By using NVIDIA GPUs to analyze the signature, or molecular fingerprint, of these mutations, researchers can better understand known causes of cancer, discover new risk factors and investigate why certain cancers are more common in certain areas of the world than others.
The Cancer Grand Challenges’ Mutographs team, an international research group funded by Cancer Research U.K., is using NVIDIA GPU-accelerated machine learning models to study DNA from the tumors of 5,000 patients with five cancer types: pancreas, kidney and colorectal cancer, as well as two kinds of esophageal cancer.
Using powerful NVIDIA DGX systems, researchers from the Wellcome Sanger Institute — a world leader in genomics — and the University of California, San Diego, collaborated with NVIDIA developers to achieve more than 30x acceleration when running their machine learning software SigProfiler.
“Research projects such as the Mutographs Grand Challenge are just that — grand challenges that push the boundary of what’s possible,” said Pete Clapham, leader of the Informatics Support Group at the Wellcome Sanger Institute. “NVIDIA DGX systems provide considerable acceleration that enables the Mutographs team to not only meet the project’s computational demands, but to drive it even further, efficiently delivering previously impossible results.”
Molecular Detective Work
Just as every person has a unique fingerprint, cancer-causing somatic mutations have unique patterns that show up in a cell’s DNA.
“At a crime scene, investigators will lift fingerprints and run those through a database to find a match,” said Ludmil Alexandrov, computational lead on the project and an assistant professor of cellular and molecular medicine at UCSD. “Similarly, we can take a molecular fingerprint from cells collected in a patient’s biopsy and see if it matches a risk factor like smoking or ultraviolet light exposure.”
Some somatic mutations have known sources, like those Alexandrov mentions. But the machine learning model can pull out other mutation patterns that occur repeatedly in patients with a specific cancer, but have no known source.
When that happens, Alexandrov teams up with other scientists to test hypotheses and perform large-scale experiments to discover the cancer-causing culprit.
Discovering a new risk factor can help improve cancer prevention. In 2018, researchers traced a skin cancer mutational signature back to an immunosuppressant drug, which now lists the condition among its possible side effects, helping doctors better monitor patients being treated with it.
Enabling Whirlwind Tours of Global Data
In cases where the source of a mutational signature is known, researchers can analyze trends in the occurrence of specific kinds of somatic mutations (and their corresponding cancers) in different regions of the world as well as over time.
“Certain cancers are very common in one part of the world, and very rare in others. And when people migrate from one country to another, they tend to acquire the cancer risk of the country they move to,” said Alexandrov. “What that tells you is that it’s mostly environmental.”
Researchers on the Mutographs project are studying a somatic mutation linked to esophageal cancer, a condition some studies have correlated with the drinking of scalding beverages like tea or maté.
Esophageal cancer is much more common in Eastern South America, East Africa and Central Asia than in North America or West Africa. Finding the environmental or lifestyle factor that puts people at higher risk can help with prevention and early detection of future cases.
Cases of esophageal squamous cell carcinoma vary greatly around the world. (Image courtesy of Mutographs project. Data source: GLOBOCAN 2012.)
The Mutographs researchers teamed up with NVIDIA to accelerate the most time-consuming parts of the SigProfiler AI framework on NVIDIA GPUs. When running pipeline jobs with double precision on NVIDIA DGX systems, the team observed more than 30x acceleration compared to using CPU hardware. With single precision, Alexandrov says, SigProfiler runs significantly faster, achieving around a 50x speedup.
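The precision trade-off Alexandrov mentions is easy to see in isolation: the same GPU operation run in float32 typically completes much faster than in float64, which is one reason dropping to single precision speeds up a pipeline. The snippet below is a generic CuPy illustration of that effect, not SigProfiler code; the matrix size is arbitrary.

```python
import time
import cupy as cp

a64 = cp.random.rand(4096, 4096)        # float64 by default
a32 = a64.astype(cp.float32)            # single-precision copy

def time_matmul(a):
    cp.cuda.Stream.null.synchronize()   # make sure the GPU is idle before timing
    start = time.perf_counter()
    for _ in range(10):
        cp.matmul(a, a)
    cp.cuda.Stream.null.synchronize()   # wait for all kernels to finish
    return (time.perf_counter() - start) / 10

print(f"float64: {time_matmul(a64) * 1e3:.1f} ms per multiply")
print(f"float32: {time_matmul(a32) * 1e3:.1f} ms per multiply")
```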
The DGX system’s optimized software and NVLink interconnect technology also enable the scaling of AI models across all eight NVIDIA V100 Tensor Core GPUs within the system for maximum performance in both model development and deployment.
For research published in Nature this year, Alexandrov’s team analyzed data from more than 20,000 cancer patients, an analysis that previously took almost a month.
“With NVIDIA DGX, we can now do that same analysis in less than a day,” he said. “That means we can do much more testing, validation and exploration.”
Hands-free phone calls and touchless soap dispensers have been the norm for years. Next up, contact-free hospitals.
San Francisco-based startup Ouva has created a hospital intelligence platform that monitors patient safety, acts as a patient assistant and provides a sensory experience in waiting areas — without the need for anyone to touch anything.
The platform uses the NVIDIA Clara Guardian application framework so its optical sensors can take in, analyze and provide healthcare professionals with useful information, like whether a patient with high fall-risk is out of bed. The platform is optimized on NVIDIA GPUs and its edge deployments use the NVIDIA Jetson TX1 module.
Ouva is a member of NVIDIA Inception, a program that provides AI startups with go-to-market support, expertise and technology. Inception partners also have access to NVIDIA’s technical team.
Dogan Demir, founder and CEO of Ouva, said, “The Inception program informs us of hardware capabilities that we didn’t even know about, which really speeds up our work.”
Patient Care Automation
The Ouva platform automates patient monitoring, which is critical during the pandemic.
“To prevent the spread of COVID-19, we need to minimize contact between staff and patients,” said Demir. “With our solution, you don’t need to be in the same room as a patient to make sure that they’re okay.”
More and more hospitals use video monitoring to ensure patient safety, he said, but without intelligent video analytics, this can entail a single nurse trying to keep an eye on up to 100 video feeds at once to catch an issue in a patient’s room.
By detecting changes in patient movement and alerting workers of them in real time, the Ouva platform allows nurses to pay attention to the right patient at the right time.
The Ouva platform alerts nurses to changes in patient movement.
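Ouva hasn't published its models, but the core idea of flagging changes in patient movement from a video feed can be illustrated with simple frame differencing in OpenCV. A production system would rely on trained neural networks rather than this toy heuristic; the video path and alert threshold here are arbitrary placeholders.

```python
import cv2

cap = cv2.VideoCapture("patient_room.mp4")   # placeholder video source
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pixels that changed between consecutive frames indicate motion.
    diff = cv2.absdiff(gray, prev_gray)
    motion = (diff > 25).mean()              # fraction of pixels that moved
    if motion > 0.05:                        # arbitrary alert threshold
        print("significant movement detected; notify nursing staff")
    prev_gray = gray

cap.release()
```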
“The platform minimizes the time that nurses may be in the dark about how a patient is doing,” said Demir. “This in turn reduces the need for patients to be transferred to the ICU due to situations that could’ve been prevented, like a fall or brain injury progression due to a seizure.”
According to Ouva’s research, the average hospitalization cost for a fall injury is $35,000, with an additional $43,000 estimated per person with a pressure injury like an ulcer from the hospital bed. This means that by preventing falls and monitoring a patient’s position changes, Ouva could help save $4 million per year for a 100-bed facility.
Ouva’s system also performs personal protective equipment checks and skin temperature screenings, as well as flags contaminated areas for cleaning, which can reduce a nurse’s hours and contact with patients.
Radboud University Medical Center in the Netherlands recently integrated Ouva’s platform for 10 of its neurology wards.
“Similar solutions typically require contact with the patient’s body, which creates an infection and maintenance risk,” said Dr. Harry van Goor from the facility. “The Ouva solution centrally monitors patient safety, room hygiene and bed turnover in real time while preserving patients’ privacy.”
Patient Assistant and Sensory Experience
The platform can also guide patients through a complex hospital facility by providing answers to voice-activated questions about building directions. Medical City Hospital in Dallas was the first to pick up this voice assistant solution for their Heart and Spine facilities at the start of COVID-19.
In waiting areas, patients can participate in Ouva’s touch-free sensory experience by gesturing at 60-foot video screens that wrap around walls, featuring images of gardens, beaches and other interactive locations.
The goal of the sensory experience, made possible by NVIDIA GPUs, is to reduce waiting room anxiety and improve patient health outcomes, according to Demir.
“The amount of pain that a patient feels during treatment can be based on their perception of the care environment,” said Demir. “We work with physical and occupational therapists to design interactive gestures that allow people to move their bodies in ways that both improve their health and their perception of the hospital environment.”
The new NVIDIA Deep Learning Institute (DLI) instructor-led workshops cover the fundamentals of deep learning, recommender systems and Transformer-based applications. Anyone connected online can join for a nominal fee, and participants will have access to a fully configured, GPU-accelerated server in the cloud.
DLI instructor-led trainings consist of hands-on remote learning taught by NVIDIA-certified experts in virtual classrooms. Participants can interact with their instructors and peers in real time. They can whiteboard ideas, tackle interactive coding challenges and earn a DLI certificate of subject competency to support their professional growth.
DLI at GTC is offered globally, with several courses available in Korean, Japanese and Simplified Chinese for attendees in their respective time zones.
New DLI workshops launching at GTC include:
Fundamentals of Deep Learning — Build the confidence to take on a deep learning project by learning how to train a model, work with common data types and model architectures, use transfer learning between models, and more.
Building Intelligent Recommender Systems — Create different types of recommender systems: content-based, collaborative filtering, hybrid, and more. Learn how to use the open-source cuDF library, Apache Arrow, alternating least squares, CuPy and TensorFlow 2 to do so.
Building Transformer-Based Natural Language Processing Applications — Learn about NLP topics like Word2Vec and recurrent neural network-based embeddings, as well as Transformer architecture features and how to improve them. Use pre-trained NLP models for text classification, named-entity recognition and question answering, and deploy refined models for live applications (see the short sketch after this list).
Applications of AI for Predictive Maintenance — Leverage predictive maintenance and identify anomalies to manage failures and avoid costly unplanned downtimes, use time-series data to predict outcomes using machine learning classification models with XGBoost, and more.
Fundamentals of Accelerated Data Science with RAPIDS — Learn how to use cuDF and Dask to ingest and manipulate large datasets directly on the GPU, applying GPU-accelerated machine learning algorithms including XGBoost, cuGraph and cuML to perform data analysis at massive scale.
Fundamentals of Accelerated Computing with CUDA C/C++ — Find out how to accelerate CPU-only applications by exposing their latent parallelism on GPUs, using techniques like essential CUDA memory management to optimize accelerated applications.
Fundamentals of Deep Learning for Multi-GPUs — Scale deep learning training to multiple GPUs, significantly shortening the time required to train on large datasets and making it feasible to solve complex problems with deep learning.
Applications of AI for Anomaly Detection — Discover how to implement multiple AI-based solutions to identify network intrusions, using accelerated XGBoost, deep learning-based autoencoders and generative adversarial networks.
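As a taste of the Transformer-based workflow referenced in the NLP workshop above, the snippet below runs a pre-trained model for text classification using the open-source Hugging Face Transformers library. It is an illustrative sketch, not material from the DLI course itself, and the example sentences are invented.

```python
from transformers import pipeline

# Download a pre-trained Transformer and wrap it for text classification.
classifier = pipeline("sentiment-analysis")

results = classifier([
    "The GPU-accelerated training finished in minutes.",
    "The model keeps running out of memory on long sequences.",
])
for r in results:
    print(r["label"], round(r["score"], 3))
```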
With more than 2 million registered NVIDIA developers working on technological breakthroughs to solve the world’s toughest problems, the demand for deep learning expertise is greater than ever. The full DLI course catalog includes a variety of topics for anyone interested in learning more about AI, accelerated computing and data science.
Get a glimpse of the DLI experience:
Workshops have limited seating, with the early bird deadline on Sept. 25. Register now.
Machine learning, one part of the broad field of AI, is set to become as mainstream as software applications. That’s why the process of running ML needs to be as buttoned down as the job of running IT systems.
Machine Learning Layered on DevOps
MLOps is modeled on the existing discipline of DevOps, the modern practice of efficiently writing, deploying and running enterprise applications. DevOps got its start a decade ago as a way for the warring tribes of software developers (the Devs) and IT operations teams (the Ops) to collaborate.
MLOps adds to the team the data scientists, who curate datasets and build AI models that analyze them. It also includes ML engineers, who run those datasets through the models in disciplined, automated ways.
MLOps combines machine learning, applications development and IT operations. Source: Neal Analytics
It’s a big challenge in raw performance as well as management rigor. Datasets are massive and growing, and they can change in real time. AI models require careful tracking through cycles of experiments, tuning and retraining.
So, MLOps needs a powerful AI infrastructure that can scale as companies grow. For this foundation, many companies use NVIDIA DGX systems, CUDA-X and other software components available on NVIDIA’s software hub, NGC.
Lifecycle Tracking for Data Scientists
With an AI infrastructure in place, an enterprise data center can layer on the following elements of an MLOps software stack:
Data sources and the datasets created from them
A repository of AI models tagged with their histories and attributes
An automated ML pipeline that manages datasets, models and experiments through their lifecycles
Software containers, typically based on Kubernetes, to simplify running these jobs
It’s a heady set of related jobs to weave into one process.
Data scientists need the freedom to cut and paste datasets together from external sources and internal data lakes. Yet their work and those datasets need to be carefully labeled and tracked.
Likewise, they need to experiment and iterate to craft great models well torqued to the task at hand. So they need flexible sandboxes and rock-solid repositories.
And they need ways to work with the ML engineers who run the datasets and models through prototypes, testing and production. It’s a process that requires automation and attention to detail so models can be easily interpreted and reproduced.
Today, these capabilities are becoming available as part of cloud-computing services. Companies that see machine learning as strategic are creating their own AI centers of excellence using MLOps services or tools from a growing set of vendors.
Gartner’s view of the machine-learning pipeline
Data Science in Production at Scale
In the early days, companies such as Airbnb, Facebook, Google, NVIDIA and Uber had to build these capabilities themselves.
“We tried to use open source code as much as possible, but in many cases there was no solution for what we wanted to do at scale,” said Nicolas Koumchatzky, a director of AI infrastructure at NVIDIA.
“When I first heard the term MLOps, I realized that’s what we’re building now and what I was building before at Twitter,” he added.
Koumchatzky’s team at NVIDIA developed MagLev, the MLOps software that hosts NVIDIA DRIVE, our platform for creating and testing autonomous vehicles. As part of its foundation for MLOps, it uses the NVIDIA Container Runtime and Apollo, a set of components developed at NVIDIA to manage and monitor Kubernetes containers running across huge clusters.
Laying the Foundation for MLOps at NVIDIA
Koumchatzky’s team runs its jobs on NVIDIA’s internal AI infrastructure based on GPU clusters called DGX PODs. Before the jobs start, the infrastructure crew checks whether they are using best practices.
First, “everything must run in a container — that spares an unbelievable amount of pain later looking for the libraries and runtimes an AI application needs,” said Michael Houston, whose team builds NVIDIA’s AI systems including Selene, a DGX SuperPOD recently ranked the most powerful industrial computer in the U.S.
Among the team’s other checkpoints, jobs must:
Launch containers with an approved mechanism
Prove the job can run across multiple GPU nodes
Show performance data to identify potential bottlenecks
Show profiling data to ensure the software has been debugged
The maturity of MLOps practices used in business today varies widely, according to Edwin Webster, a data scientist who started the MLOps consulting practice a year ago for Neal Analytics and wrote an article defining MLOps. At some companies, data scientists still squirrel away models on their personal laptops; others turn to big cloud-service providers for a soup-to-nuts service, he said.
Two MLOps Success Stories
Webster shared success stories from two of his clients.
One involves a large retailer that used MLOps capabilities in a public cloud service to create an AI service that reduced waste 8-9 percent with daily forecasts of when to restock shelves with perishable goods. A budding team of data scientists at the retailer created datasets and built models; the cloud service packed key elements into containers, then ran and managed the AI jobs.
Another involves a PC maker that developed software using AI to predict when its laptops would need maintenance so it could automatically install software updates. Using established MLOps practices and internal specialists, the OEM wrote and tested its AI models on a fleet of 3,000 notebooks. The PC maker now provides the software to its largest customers.
Many, but not all, Fortune 100 companies are embracing MLOps, said Shubhangi Vashisth, a senior principal analyst following the area at Gartner. “It’s gaining steam, but it’s not mainstream,” she said.
Vashisth co-authored a white paper that lays out three steps for getting started in MLOps: Align stakeholders on the goals, create an organizational structure that defines who owns what, then define responsibilities and roles — Gartner lists a dozen of them.
Gartner refers to the overall MLOps process as the machine learning development lifecycle (MLDLC).
Beware Buzzwords: AIOps, DLOps, DataOps, and More
Don’t get lost in a forest of buzzwords that have grown up along this avenue. The industry has clearly coalesced its energy around MLOps.
By contrast, AIOps is a narrower practice of using machine learning to automate IT functions. One part of AIOps is IT operations analytics, or ITOA. Its job is to examine the data AIOps generates to figure out how to improve IT practices.
Similarly, some have coined the terms DataOps and ModelOps to refer to the people and processes for creating and managing datasets and AI models, respectively. Those are two important pieces of the overall MLOps puzzle.
Interestingly, every month thousands of people search for the meaning of DLOps. They may imagine DLOps are IT operations for deep learning. But the industry uses the term MLOps, not DLOps, because deep learning is a part of the broader field of machine learning.
Despite the many queries, you’d be hard pressed to find anything online about DLOps. By contrast, household names like Google and Microsoft as well as up-and-coming companies like Iguazio and Paperspace have posted detailed white papers on MLOps.
MLOps: An Expanding Software and Services Smorgasbord
Those who prefer to let someone else handle their MLOps have plenty of options.
Major cloud-service providers like Alibaba, AWS and Oracle are among several that offer end-to-end services accessible from the comfort of your keyboard.
For users who spread their work across multiple clouds, Databricks’ MLflow supports MLOps services that work with multiple providers and multiple programming languages, including Python, R and SQL. Other cloud-agnostic alternatives include open source software such as Polyaxon and Kubeflow.
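For teams getting started, even the experiment-tracking slice of MLOps that a tool like MLflow covers goes a long way toward the lifecycle discipline described above. Below is a minimal sketch of logging a training run with MLflow's Python API; the parameters, metric value and artifact are placeholders rather than output from a real model.

```python
import mlflow

# Record one training run: its parameters, a metric and an output artifact.
with mlflow.start_run(run_name="baseline-model"):
    mlflow.log_param("learning_rate", 1e-3)   # placeholder hyperparameters
    mlflow.log_param("batch_size", 64)

    validation_accuracy = 0.91                # would come from a real training loop
    mlflow.log_metric("val_accuracy", validation_accuracy)

    with open("model_notes.txt", "w") as f:
        f.write("trained on dataset snapshot 2020-09-01\n")
    mlflow.log_artifact("model_notes.txt")    # attach the note to this run
```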
Companies that believe AI is a strategic resource they want behind their firewall can choose from a growing list of third-party providers of MLOps software. Compared to open-source code, these tools typically add valuable features and are easier to put into use.
All six vendors provide software to manage datasets and models that can work with Kubernetes and NGC.
It’s still early days for off-the-shelf MLOps software.
Gartner tracks about a dozen vendors offering MLOps tools, including ModelOp and ParallelM, now part of DataRobot, said analyst Vashisth. Beware offerings that don’t cover the entire process, she warns. They force users to import and export data between programs they must stitch together themselves, a tedious and error-prone process.
The edge of the network, especially for partially connected or unconnected nodes, is another underserved area for MLOps so far, said Webster of Neal Analytics.
Koumchatzky, of NVIDIA, puts tools for curating and managing datasets at the top of his wish list for the community.
“It can be hard to label, merge or slice datasets or view parts of them, but there is a growing MLOps ecosystem to address this. NVIDIA has developed these internally, but I think it is still undervalued in the industry,” he said.
Long term, MLOps needs the equivalent of IDEs, the integrated software development environments like Microsoft Visual Studio that apps developers depend on. Meanwhile Koumchatzky and his team craft their own tools to visualize and debug AI models.
The good news is there are plenty of products for getting started in MLOps.
In addition to software from its partners, NVIDIA provides a suite of mainly open-source tools for managing an AI infrastructure based on its DGX systems, and that’s the foundation for MLOps. These software tools include:
Foreman and MAAS (Metal as a Service) for provisioning individual systems
Ansible and Git for cluster configuration management
And DeepOps for scripts and instructions on how to deploy and orchestrate all of the elements above
Many are available on NGC and other open source repositories. Pulling these ingredients into a recipe for success, NVIDIA provides a reference architecture for creating GPU clusters called DGX PODs.
In the end, each team needs to find the mix of MLOps products and practices that best fits its use cases. They all share a goal of creating an automated way to run AI smoothly as a daily part of a company’s digital life.
The Mercedes-Benz S-Class has always combined the best in engineering with a legendary heritage of craftsmanship. Now, the flagship sedan is adding intelligence to the mix, fusing AI with the embodiment of automotive luxury.
At a world premiere event, the legendary premium automaker debuted the redesigned flagship S-Class sedan. It features the all-new MBUX AI cockpit system, with an augmented reality head-up display, AI voice assistant and rich interactive graphics to enable every passenger in the vehicle, not just the driver, to enjoy personalized, intelligent features.
“This S-Class is going to be the most intelligent Mercedes ever,” said Mercedes-Benz CEO Ola Källenius during the virtual launch.
Like its predecessor, the next-gen MBUX system runs on the high-performance, energy-efficient compute of NVIDIA GPUs for instantaneous AI processing and sharp graphics.
“Mercedes-Benz is a perfect match for NVIDIA, because our mission is to use AI to solve problems no ordinary computers can,” said NVIDIA founder and CEO Jensen Huang, who took the new S-Class for a spin during the launch. “The technology in this car is remarkable.”
Jensen was featured alongside Grammy award-winning artist Alicia Keys and Formula One driver Lewis Hamilton at the premiere event, each showcasing the latest innovations of the premium sedan.
The S-Class’s new intelligent system represents a significant step toward a software-defined, autonomous future. When more automated and self-driving features are integrated into the car, the driver and passengers alike can enjoy the same entertainment and productivity features, experiencing a personalized ride, no matter where they’re seated.
Unparalleled Performance
AI cockpits orchestrate crucial safety and convenience features, constantly learning to continuously deliver joy to the customer.
“For decades, the magic moment in car manufacturing was when the chassis received its engine,” Källenius said. “Today, there’s another magic moment that is incredibly important — the ‘marriage’ of the car’s body and its brain — the all-new head unit with the next-level MBUX-system.”
A vehicle’s cockpit typically requires a collection of electronic control units and switches to perform basic functions, such as powering entertainment or adjusting temperature. Leveraging NVIDIA technology, Mercedes-Benz was able to consolidate these components into an AI platform — removing 27 switches and buttons — to simplify the architecture while creating more space to add new features.
And the S-Class’s new compute headroom is as massive as its legroom. With NVIDIA at the helm, the premium sedan contains about the same computing power as 60 average vehicles. Just one chip each controls the 3D cluster, infotainment and rear seat displays.
“There’s more computing power packed into this car than any car, ever — three powerful computer chips with NVIDIA GPUs,” Jensen said. “Those three computer chips represent the brain and the nervous system of this car.”
Effortless Convenience
The new MBUX system makes the cutting edge in graphics, passenger detection and natural language processing seem effortless.
The S-Class features five large, brilliant displays, including a 12.8-inch central infotainment screen with OLED technology, making vehicle and comfort controls even more user-friendly for every passenger. The new 3D driver display gives a spatial view at the touch of a button, providing a realistic view of the car in its surroundings.
The system delivers even more security, enabling fingerprint, face and voice recognition, alongside a traditional PIN to access personal features. Its cameras can detect if a passenger is about to exit into oncoming traffic and warn them before they open the door. The same technology is used to monitor whether a child seat is correctly attached and if the driver is paying attention to the road.
MBUX can even carry on more of a conversation. It can answer a wider range of questions, some without the key phrase “Hey Mercedes,” and can interact in 27 languages, including Thai and Czech.
These futuristic functions are the result of over 30 million lines of code written by hundreds of engineers, who are continuously developing new and innovative ways for customers to enjoy their drive.
“These engineers are practically in your garage and they’re constantly working on the software, improving it, enhancing it, creating more features, and will update it over the air,” Jensen said. “Your car can now get better and better over time.”