NVIDIA Advances Physical AI at CVPR With Largest Indoor Synthetic Dataset

NVIDIA contributed the largest-ever indoor synthetic dataset to the Computer Vision and Pattern Recognition (CVPR) conference’s annual AI City Challenge — helping researchers and developers advance solutions for smart cities and industrial automation.

The challenge, which drew more than 700 teams from nearly 50 countries, tasks participants with developing AI models that enhance operational efficiency in physical settings, such as retail and warehouse environments, and in intelligent traffic systems.

Teams tested their models on the datasets that were generated using NVIDIA Omniverse, a platform of application programming interfaces (APIs), software development kits (SDKs) and services that enable developers to build Universal Scene Description (OpenUSD)-based applications and workflows.

Creating and Simulating Digital Twins for Large Spaces

In large indoor spaces like factories and warehouses, daily activities involve a steady stream of people, small vehicles and future autonomous robots. Developers need solutions that can observe and measure activities, optimize operational efficiency, and prioritize human safety in complex, large-scale settings.

Researchers are addressing that need with computer vision models that can perceive and understand the physical world. These models can be used in applications like multi-camera tracking, in which a model follows multiple entities within a given environment.

To ensure their accuracy, the models must be trained on large, ground-truth datasets for a variety of real-world scenarios. But collecting that data can be a challenging, time-consuming and costly process.

AI researchers are turning to physically based simulations — such as digital twins of the physical world — to enhance AI simulation and training. These virtual environments can help generate synthetic data used to train AI models. Simulation also provides a way to run a multitude of “what-if” scenarios in a safe environment while addressing privacy and AI bias issues.

Synthetic data is important for AI training because it can be generated at large scale and expanded as needed. Teams can produce a diverse set of training data by varying many parameters, including lighting, object locations, textures and colors.

Building Synthetic Datasets for the AI City Challenge

This year’s AI City Challenge consists of five computer vision challenge tracks spanning topics from traffic management to worker safety.

NVIDIA contributed datasets for the first track, Multi-Camera Person Tracking, which saw the highest participation, with over 400 teams. The challenge used a benchmark and the largest synthetic dataset of its kind — comprising 212 hours of 1080p videos at 30 frames per second spanning 90 scenes across six virtual environments, including a warehouse, retail store and hospital.

Created in Omniverse, these scenes simulated nearly 1,000 cameras and featured around 2,500 digital human characters. The dataset also gave researchers a way to generate data of the right size and fidelity to achieve their desired outcomes.

The benchmarks were created using Omniverse Replicator in NVIDIA Isaac Sim, a reference application that enables developers to design, simulate and train AI for robots, smart spaces or autonomous machines in physically based virtual environments built on NVIDIA Omniverse.

Omniverse Replicator, an SDK for building synthetic data generation pipelines, automated many manual tasks involved in generating quality synthetic data, including domain randomization, camera placement and calibration, character movement, and semantic labeling of data and ground-truth for benchmarking.
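
As a rough illustration, below is a minimal sketch of the kind of domain-randomization script Replicator supports, assuming Omniverse’s `omni.replicator.core` Python API; the environment asset path, semantic labels and parameter ranges are placeholders, and this is not the pipeline used to build the challenge dataset.

```python
import omni.replicator.core as rep

with rep.new_layer():
    # Hypothetical USD environment and a single camera (asset path is a placeholder).
    env = rep.create.from_usd("/assets/warehouse_scene.usd")
    camera = rep.create.camera(position=(0, 5, 10), look_at=(0, 0, 0))
    render_product = rep.create.render_product(camera, (1920, 1080))

    # Domain randomization: vary lighting and object placement on every frame.
    def randomize_scene():
        lights = rep.create.light(
            light_type="Sphere",
            intensity=rep.distribution.uniform(500, 5000),
        )
        boxes = rep.get.prims(semantics=[("class", "box")])  # placeholder semantic class
        with boxes:
            rep.modify.pose(
                position=rep.distribution.uniform((-5, 0, -5), (5, 0, 5)),
                rotation=rep.distribution.uniform((0, 0, 0), (0, 360, 0)),
            )
        return lights.node

    rep.randomizer.register(randomize_scene)

    with rep.trigger.on_frame(num_frames=100):
        rep.randomizer.randomize_scene()

    # Write RGB frames plus ground-truth annotations for benchmarking.
    writer = rep.WriterRegistry.get("BasicWriter")
    writer.initialize(
        output_dir="_out_synthetic",
        rgb=True,
        bounding_box_2d_tight=True,
        semantic_segmentation=True,
    )
    writer.attach([render_product])
```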

Ten institutions and organizations are collaborating with NVIDIA for the AI City Challenge:

  • Australian National University, Australia
  • Emirates Center for Mobility Research, UAE
  • Indian Institute of Technology Kanpur, India
  • Iowa State University, U.S.
  • Johns Hopkins University, U.S.
  • National Yang Ming Chiao Tung University, Taiwan
  • Santa Clara University, U.S.
  • The United Arab Emirates University, UAE
  • University at Albany – SUNY, U.S.
  • Woven by Toyota, Japan

Driving the Future of Generative Physical AI 

Researchers and companies around the world are developing infrastructure automation and robots powered by physical AI — models that can understand instructions and autonomously perform complex tasks in the real world.

Generative physical AI uses reinforcement learning in simulated environments, where it perceives the world using accurately simulated sensors, performs actions grounded by laws of physics, and receives feedback to reason about the next set of actions.
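
As a schematic illustration of that loop, here is a minimal sketch using the Gymnasium toolkit as a generic stand-in for a physics simulator; it is not NVIDIA’s simulation stack, and the random policy stands in for a learned one.

```python
import gymnasium as gym

# Generic perceive-act-learn loop: the agent observes simulated sensor data,
# acts under the simulator's physics, and receives a reward signal as feedback.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

for step in range(1000):
    action = env.action_space.sample()  # stand-in for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    # A real training loop would update the policy here using the reward.
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```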

Developers can tap into developer SDKs and APIs, such as the NVIDIA Metropolis developer stack — which includes a multi-camera tracking reference workflow — to add enhanced perception capabilities for factories, warehouses and retail operations. And with the latest release of NVIDIA Isaac Sim, developers can supercharge robotics workflows by simulating and training AI-based robots in physically based virtual spaces before real-world deployment.

Researchers and developers are also combining high-fidelity, physics-based simulation with advanced AI to bridge the gap between simulated training and real-world application. This helps ensure that synthetic training environments closely mimic real-world conditions for more seamless robot deployment.

NVIDIA is taking the accuracy and scale of simulations further with the recently announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines.

This technology will allow autonomous systems, whether a factory, vehicle or robot, to gather essential data to effectively perceive, navigate and interact with the real world. Using these microservices, developers can run large-scale tests on sensor perception within realistic, virtual environments, significantly reducing the time and cost associated with real-world testing.

Omniverse Cloud Sensor RTX microservices will be available later this year. Sign up for early access.

Showcasing Advanced AI With Research

Participants submitted research papers for the AI City Challenge and a few achieved top rankings, including:

All accepted papers will be presented at the AI City Challenge 2024 Workshop, taking place on June 17.

At CVPR 2024, NVIDIA Research will present over 50 papers, introducing generative physical AI breakthroughs with potential applications in areas like autonomous vehicle development and robotics.

Papers that used NVIDIA Omniverse to generate synthetic data or digital twins of environments for model simulation, testing and validation include:

Read more about NVIDIA Research at CVPR, and learn more about the AI City Challenge.

Get started with NVIDIA Omniverse by downloading the standard license for free, access OpenUSD resources and learn how Omniverse Enterprise can connect teams. Follow Omniverse on Instagram, Medium, LinkedIn and X. For more, join the Omniverse community on the forums, Discord server, Twitch and YouTube channels.

‘Believe in Something Unconventional, Something Unexplored,’ NVIDIA CEO Tells Caltech Grads

NVIDIA founder and CEO Jensen Huang on Friday encouraged Caltech graduates to pursue their craft with dedication and resilience — and to view setbacks as new opportunities.

“I hope you believe in something. Something unconventional, something unexplored. But let it be informed, and let it be reasoned, and dedicate yourself to making that happen,” he said. “You may find your GPU. You may find your CUDA. You may find your generative AI. You may find your NVIDIA.”

Trading his signature leather jacket for black and yellow academic regalia, Huang addressed the nearly 600 graduates at their commencement ceremony in Pasadena, Calif., starting with the tale of the computing industry’s decades-long evolution to reach this pivotal moment of AI transformation.

“Computers today are the single most important instrument of knowledge, and it’s foundational to every single industry in every field of science,” Huang said. “As you enter industry, it’s important you know what’s happening.”

He shared how, over a decade ago, NVIDIA — a small company at the time — bet on deep learning, investing billions of dollars and years of engineering resources to reinvent every computing layer.

“No one knew how far deep learning could scale, and if we didn’t build it, we’d never know,” Huang said. Referencing the famous line from Field of Dreams — if you build it, he will come — he said, “Our logic is: If we don’t build it, they can’t come.”

Looking to the future, Huang said, the next wave of AI is robotics, a field where NVIDIA’s journey resulted from a series of setbacks.

He reflected on a period in NVIDIA’s past when the company each year built new products that “would be incredibly successful, generate enormous amounts of excitement. And then one year later, we were kicked out of those markets.”

These roadblocks pushed NVIDIA to seek out untapped areas — what Huang refers to as “zero-billion-dollar markets.”

“With no more markets to turn to, we decided to build something where we are sure there are no customers,” Huang said. “Because one of the things you can definitely guarantee is where there are no customers, there are also no competitors.”

Robotics was that new market. NVIDIA built the first robotics computer, Huang said, one that processed a deep learning algorithm. Over a decade later, that pivot has given the company the opportunity to create the next wave of AI.

“One setback after another, we shook it off and skated to the next opportunity. Each time, we gain skills and strengthen our character,” Huang said. “No setback that comes our way doesn’t look like an opportunity these days.”

Huang stressed the importance of resilience and agility as superpowers that strengthen character.

“The world can be unfair and deal you with tough cards. Swiftly shake it off,” he said, with a tongue-in-cheek reference to one of Taylor Swift’s biggest hits. “There’s another opportunity out there — or create one.”

Huang concluded by sharing a story from his travels to Japan, where, as he watched a gardener painstakingly tending to Kyoto’s famous moss garden, he realized that when a person is dedicated to their craft and prioritizes doing their life’s work, they always have plenty of time.

“Prioritize your life,” he said, “and you will have plenty of time to do the important things.”

Main image courtesy of Caltech. 

NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

NVIDIA today announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models (LLMs) for commercial applications across healthcare, finance, manufacturing, retail and every other industry.

High-quality training data plays a critical role in the performance, accuracy and quality of responses from a custom LLM — but robust datasets can be prohibitively expensive and difficult to access.

Through a uniquely permissive open model license, Nemotron-4 340B gives developers a free, scalable way to generate synthetic data that can help build powerful LLMs.

The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs. The models are optimized to work with NVIDIA NeMo, an open-source framework for end-to-end model training, including data curation, customization and evaluation. They’re also optimized for inference with the open-source NVIDIA TensorRT-LLM library.

Nemotron-4 340B can be downloaded now from Hugging Face. Developers will soon be able to access the models at ai.nvidia.com, where they’ll be packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.

Navigating Nemotron to Generate Synthetic Data

LLMs can help developers generate synthetic training data in scenarios where access to large, diverse labeled datasets is limited.

The Nemotron-4 340B Instruct model creates diverse synthetic data that mimics the characteristics of real-world data, helping improve data quality to increase the performance and robustness of custom LLMs across various domains.

Then, to boost the quality of the AI-generated data, developers can use the Nemotron-4 340B Reward model to filter for high-quality responses. Nemotron-4 340B Reward grades responses on five attributes: helpfulness, correctness, coherence, complexity and verbosity. It currently ranks first on the Hugging Face RewardBench leaderboard, created by AI2 to evaluate the capabilities, safety and pitfalls of reward models.

In this synthetic data generation pipeline, (1) the Nemotron-4 340B Instruct model is first used to produce synthetic text-based output. An evaluator model, (2) Nemotron-4 340B Reward, then assesses this generated text — providing feedback that guides iterative improvements and ensures the synthetic data is accurate, relevant and aligned with specific requirements.
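
A minimal sketch of that two-step pattern follows, assuming the models are served behind an OpenAI-compatible endpoint (as NVIDIA NIM microservices typically expose); the endpoint URL, model identifiers and score parsing are illustrative assumptions rather than the documented API.

```python
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint serving both Nemotron-4 340B models.
client = OpenAI(base_url="https://example-endpoint/v1", api_key="YOUR_KEY")

def generate_candidates(prompt: str, n: int = 4) -> list[str]:
    """Step 1: use the Instruct model to produce several synthetic responses."""
    out = client.chat.completions.create(
        model="nemotron-4-340b-instruct",  # illustrative model id
        messages=[{"role": "user", "content": prompt}],
        n=n,
        temperature=0.9,
    )
    return [choice.message.content for choice in out.choices]

def score(prompt: str, response: str) -> float:
    """Step 2: ask the Reward model to grade a response (score parsing is assumed)."""
    out = client.chat.completions.create(
        model="nemotron-4-340b-reward",  # illustrative model id
        messages=[
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ],
    )
    # Assumption: the reward endpoint returns attribute scores in the message text.
    return float(out.choices[0].message.content.split()[-1])

prompt = "Write a short FAQ answer about resetting a router."
candidates = generate_candidates(prompt)
best = max(candidates, key=lambda r: score(prompt, r))  # keep only high-quality data
```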

Researchers can also create their own instruct or reward models by customizing the Nemotron-4 340B Base model using their proprietary data, combined with the included HelpSteer2 dataset.

Fine-Tuning With NeMo, Optimizing for Inference With TensorRT-LLM

Using open-source NVIDIA NeMo and NVIDIA TensorRT-LLM, developers can optimize the efficiency of their instruct and reward models to generate synthetic data and to score responses.

All Nemotron-4 340B models are optimized with TensorRT-LLM to take advantage of tensor parallelism, a type of model parallelism in which individual weight matrices are split across multiple GPUs and servers, enabling efficient inference at scale.
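
To make the idea concrete, here is a toy NumPy illustration of splitting a weight matrix column-wise across two hypothetical devices and recombining the partial results; real tensor-parallel inference in TensorRT-LLM coordinates this across GPUs with high-speed communication, which this sketch does not attempt to model.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 1024))     # one token's activations
W = rng.standard_normal((1024, 4096))  # full weight matrix of a layer

# Tensor parallelism: split W column-wise across two "devices".
W0, W1 = np.split(W, 2, axis=1)

# Each device computes a partial matmul on its shard...
y0 = x @ W0
y1 = x @ W1

# ...and the shards are gathered to reassemble the full output.
y_parallel = np.concatenate([y0, y1], axis=1)
assert np.allclose(y_parallel, x @ W)
```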

Nemotron-4 340B Base, trained on 9 trillion tokens, can be customized using the NeMo framework to adapt to specific use cases or domains. This fine-tuning process benefits from extensive pretraining data and yields more accurate outputs for specific downstream tasks.

A variety of customization methods are available through the NeMo framework, including supervised fine-tuning and parameter-efficient fine-tuning methods such as low-rank adaptation, or LoRA.
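
For readers unfamiliar with LoRA, the sketch below shows the core idea in plain PyTorch, a frozen weight matrix plus a trainable low-rank update; it is only an illustration of the technique, not NeMo’s implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update A @ B."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, out_features))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A @ self.lora_B) * self.scaling

layer = LoRALinear(1024, 1024, rank=8)
out = layer(torch.randn(2, 1024))  # only lora_A and lora_B receive gradients
```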

To boost model quality, developers can align their models with NeMo Aligner and datasets annotated by Nemotron-4 340B Reward. Alignment is a key step in training LLMs, where a model’s behavior is fine-tuned using algorithms like reinforcement learning from human feedback (RLHF) to ensure its outputs are safe, accurate, contextually appropriate and consistent with its intended goals.

Businesses seeking enterprise-grade support and security for production environments can also access NeMo and TensorRT-LLM through the cloud-native NVIDIA AI Enterprise software platform, which provides accelerated and efficient runtimes for generative AI foundation models.

Evaluating Model Security and Getting Started

The Nemotron-4 340B Instruct model underwent extensive safety evaluation, including adversarial tests, and performed well across a wide range of risk indicators. Users should still perform careful evaluation of the model’s outputs to ensure the synthetically generated data is suitable, safe and accurate for their use case.

For more information on model security and safety evaluation, read the model card.

Download Nemotron-4 340B models via Hugging Face. For more details, read the research papers on the model and dataset.

See notice regarding software product information.

‘The Proudest Refugee’: How Veronica Miller Charts Her Own Path at NVIDIA

When she was five years old, Veronica Miller (née Teklai) and her family left their homeland of Eritrea, in the Horn of Africa, to escape an ongoing war with Ethiopia and create a new life in the U.S.

She grew up in East Orange, New Jersey, watching others judge her parents and turn them away from jobs they were qualified for because of their appearance, their accented English or their unfamiliar names.

After working in the shipping industry for 20 years, Miller’s dad eventually became a New York City cab driver, an often-dangerous job in the 1980s. Her mom, despite earning a computer science degree in the U.S., trained to become a home health aide, a field where jobs were more available.

“My parents’ resilience and courage made my life possible,” Miller said.

After graduating from Ramapo College of New Jersey with a degree in international business, Miller worked at large automotive companies in client support, production support and project management.

Now working as a technical program manager in product security at NVIDIA, she feels like her family’s journey has come full circle.

“It’s the honor of my life being here at NVIDIA: I’m the proudest refugee,” she said.

In her role, Miller functions like a conductor in an orchestra. She works with engineers to bridge gaps and understand challenges to define solutions — always trying to create opportunities to turn a “no” into a “yes” through collaboration.

At NVIDIA, Miller feels like she can be herself, helping her thrive. She no longer feels the pressure to conform to fit in, allowing her creativity to flow freely and solve problems.

“Previously in my career, I never wore my hair curly. After someone once asked to touch my curly hair, I believed it would be easier to make myself look like everyone else. I thought it was the best way to let my work be the focus instead of my hair,” she said. “NVIDIA is the first employer that encouraged me to bring my full self to work.”

Outside of work, Veronica and her husband, Nathan, are passionate about paying it forward and helping local youth in Trenton, New Jersey. Together, they’ve developed The Miller Family Foundation to help with community needs, including education. The foundation’s scholarship fund has donated $20,000 to low-income high school students to provide support for college tuition and career mentorship.

“I truly believe anyone could get here. There wasn’t anyone that showed me the path. It was belief in myself, a ton of research and endless hard work,” she said. “We’re in a special place where my husband and I can give the next generation some of the financial support and career guidance we didn’t have.”

Learn more about NVIDIA life, culture and careers

Cloud Ahoy! Treasure Awaits With ‘Sea of Thieves’ on GeForce NOW

Set sail for adventure, pirates. Sea of Thieves makes waves in the cloud this week. It’s an adventure-filled GFN Thursday with four new games joining the GeForce NOW library.

#GreetingsFromGFN by Cloud Gaming Photography.

Plus, members are sharing their favorite locations they can access from the cloud. Follow along all month on @NVIDIAGFN social media accounts and post your own favorite cloud screenshots using #GreetingsfromGFN.

Seas the Day

Live the pirate life in the smash-hit pirate adventure game from Rare and Xbox Game Studios. Sea of Thieves takes place in an open world where players can explore the vast seas, engage in ship battles, hunt for treasure and embark on exciting quests.

Sea of Thieves on GeForce NOW: come sea what’s possible in the cloud.

The Sea of Thieves environment is always changing, as various seasons bring new features to the game and offer rich rewards for pirates old and new. Visit uncharted islands in search of treasure, dive deep into narrative-focused Tall Tales, take part in events and forge a path to become a true Pirate Legend. The newest season features the mysterious Sunken City, Cursed Sloop skeleton ships and fresh cosmetics.

Every pirate needs a crew, so grab some mateys and carve a fearsome reputation across the open seas, or adventure solo to keep all the bountiful treasure. Make the journey more rewarding with a GeForce NOW Ultimate membership, and play with gamers across the world with up to eight-hour gaming sessions for a kraken good time.

New Games Zoom Onto the Cloud

Disney Speedstorm on GeForce NOW: take the tracks by storm.

Drift into the ultimate hero-based combat racing game in Disney Speedstorm, a free-to-play kart-racing game that features characters and high-speed circuits inspired by beloved Disney and Pixar worlds. Customize racers and karts, master each character’s unique skills and engage in thrilling multiplayer races. Whether exploring the docks of the Pirates’ Island track from Pirates of the Caribbean or the wilds of the Jungle Ruins map from The Jungle Book, players can experience iconic environments in the game.

Check out the list of new games this week:

  • SunnySide (New release on Steam, June 14)
  • Disney Speedstorm (Steam and Xbox, available on PC Game Pass)
  • Sea of Thieves (Steam and Xbox, available on PC Game Pass)
  • Bodycam (Steam)

What are you planning to play this weekend? Let us know on X or in the comments below.

Every Company’s Data Is Their ‘Gold Mine,’ NVIDIA CEO Says at Databricks Data + AI Summit

Accelerated computing is transforming data processing and analytics for enterprises, declared NVIDIA founder and CEO Jensen Huang Wednesday during an on-stage chat with Databricks cofounder and CEO Ali Ghodsi at the Databricks Data + AI Summit 2024.

“Every company’s business data is their gold mine,” Huang said, explaining that every company has enormous amounts of data, but extracting insights and distilling intelligence from it has been challenging.

Databricks Leverages NVIDIA’s Full Stack to Accelerate Generative AI Applications

To unlock all that intelligence, Huang and Ghodsi announced the integration of NVIDIA’s accelerated computing with Databricks Photon, Databricks’ engine for fast data processing, designed to power Databricks SQL with top-tier performance and cost efficiency.

“This is a big announcement,” Huang said, adding that accelerated computing and generative AI are the two most important technological trends today. “NVIDIA and Databricks are going to partner to combine our skills in these areas and bring them to all of you.”

Huang shared that it’s taken NVIDIA five years to build a set of libraries that make it possible to accelerate Photon, allowing users to “wrangle data faster, more cost-effectively and consume a lot less energy.”

“We are super-excited to partner with you to use GPU acceleration on the Photon engine to enhance core data processing and get them to also run on NVIDIA GPUs,” Ghodsi said.

Creating Generative AI Factories With NVIDIA NIM

NVIDIA and Databricks also announced that Databricks’ open-source model DBRX is now available as an NVIDIA NIM microservice hosted on the NVIDIA API catalog.

NVIDIA NIM inference microservices provide models as fully optimized, pre-built containers for deployment anywhere.

“Creating these endpoints is complicated,” Huang explained. “We optimized everything into a microservice, which runs on every cloud and on premises.”

Microservices dramatically increase enterprise developer productivity by providing a simple, standardized way to add generative AI models to applications.

Launched in March, DBRX was built entirely on top of Databricks, leveraging all the tools and techniques available to Databricks customers and partners, and was trained with NVIDIA DGX Cloud, a scalable end-to-end AI platform for developers.

Organizations can customize DBRX with enterprise data to create high-quality, organization-specific models, or use it as a reference architecture for building custom DBRX-style mixture-of-experts models.

Huang said that accelerating data processing is a huge opportunity, encouraging everyone to put accelerated computing and generative AI to work.

“Whatever you do, just start — you have to engage in this incredibly fast-moving train,” Huang said. “Remember, generative AI is growing exponentially — you don’t want to wait and observe an exponential trend, because in a couple of years, you’ll be so far behind.”

Joining the Conversation

Attendees at the summit are encouraged to participate in sessions and engage with NVIDIA experts to learn more about how NVIDIA and Databricks are driving the future of AI and data intelligence.

Key sessions, taking place June 13, include:

  • “Development and Deployment of Generative AI with NVIDIA” at 12:30 p.m. PT
  • “Architecture Analysis for ETL Processing: CPU vs. GPU” at 4:30 p.m. PT
  • “Spark RAPIDS ML: GPU Accelerated Distributed ML in Spark Clusters” at 1:30 p.m. PT

Scaling to New Heights: NVIDIA MLPerf Training Results Showcase Unprecedented Performance and Elasticity

The full-stack NVIDIA accelerated computing platform has once again demonstrated exceptional performance in the latest MLPerf Training v4.0 benchmarks.

NVIDIA more than tripled the performance on the large language model (LLM) benchmark, based on GPT-3 175B, compared to the record-setting NVIDIA submission made last year. Using Eos, an AI supercomputer featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, NVIDIA achieved this remarkable feat through larger scale — more than triple that of the 3,584 H100 GPU submission a year ago — and extensive full-stack engineering.

Thanks to the scalability of the NVIDIA AI platform, Eos can now train massive AI models like GPT-3 175B even faster, and this great AI performance translates into significant business opportunities. For example, in NVIDIA’s recent earnings call, we described how LLM service providers can turn a single dollar invested into seven dollars in just four years running the Llama 3 70B model on NVIDIA HGX H200 servers. This return assumes an LLM service provider serving Llama 3 70B at $0.60/M tokens, with an HGX H200 server throughput of 24,000 tokens/second.
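
As a back-of-envelope check of that claim, the arithmetic below works directly from the stated throughput and price; the implied all-in server cost is derived from the 7:1 figure rather than published by NVIDIA.

```python
# Back-of-envelope check of the "one dollar in, seven dollars out" claim.
tokens_per_second = 24_000        # stated HGX H200 server throughput
price_per_million_tokens = 0.60   # stated serving price, in dollars
seconds_in_four_years = 4 * 365 * 24 * 3600

total_tokens = tokens_per_second * seconds_in_four_years
revenue = total_tokens / 1e6 * price_per_million_tokens
print(f"Token revenue over four years: ${revenue:,.0f}")  # ~ $1.8 million

# The 7:1 return then implies an all-in cost in the neighborhood of:
implied_cost = revenue / 7
print(f"Implied server + operating cost: ${implied_cost:,.0f}")  # ~ $260,000 (derived, not published)
```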

NVIDIA H200 GPU Supercharges Generative AI and HPC 

The NVIDIA H200 Tensor Core GPU builds upon the strength of the Hopper architecture, with 141GB of HBM3e memory and over 40% more memory bandwidth compared to the H100 GPU. Pushing the boundaries of what’s possible in AI training, the H200 extended the H100’s performance by up to 47% in its MLPerf Training debut.

NVIDIA Software Drives Unmatched Performance Gains

Additionally, our submissions using a 512 H100 GPU configuration are now up to 27% faster compared to just one year ago due to numerous optimizations to the NVIDIA software stack. This improvement highlights how continuous software enhancements can significantly boost performance, even with the same hardware.

This work also delivered nearly perfect scaling. As the number of GPUs increased by 3.2x — going from 3,584 H100 GPUs last year to 11,616 H100 GPUs with this submission — so did the delivered performance.

Learn more about these optimizations on the NVIDIA Technical Blog.

Excelling at LLM Fine-Tuning

As enterprises seek to customize pretrained large language models, LLM fine-tuning is becoming a key industry workload. MLPerf introduced a new LLM fine-tuning benchmark this round, based on the popular low-rank adaptation (LoRA) technique applied to Meta Llama 2 70B.

The NVIDIA platform excelled at this task, scaling from eight to 1,024 GPUs, with the largest-scale NVIDIA submission completing the benchmark in a record 1.5 minutes.

Accelerating Stable Diffusion and GNN Training

NVIDIA also accelerated Stable Diffusion v2 training performance by up to 80% at the same system scales submitted last round. These advances reflect numerous enhancements to the NVIDIA software stack, showcasing how software and hardware improvements go hand-in-hand to deliver top-tier performance.

On the new graph neural network (GNN) test based on R-GAT, the NVIDIA platform with H100 GPUs excelled at both small and large scales. The H200 delivered a 47% boost on single-node GNN training compared to the H100. This showcases the powerful performance and high efficiency of NVIDIA GPUs, which make them ideal for a wide range of AI applications.

Broad Ecosystem Support

Reflecting the breadth of the NVIDIA AI ecosystem, 10 NVIDIA partners submitted results, including ASUS, Dell Technologies, Fujitsu, GIGABYTE, Hewlett Packard Enterprise, Lenovo, Oracle, Quanta Cloud Technology, Supermicro and Sustainable Metal Cloud. This broad participation, along with the partners’ own impressive benchmark results, underscores the widespread adoption of and trust in NVIDIA’s AI platform across the industry.

MLCommons’ ongoing work to bring benchmarking best practices to AI computing is vital. By enabling peer-reviewed comparisons of AI and HPC platforms, and keeping pace with the rapid changes that characterize AI computing, MLCommons provides companies everywhere with crucial data that can help guide important purchasing decisions.

And with the NVIDIA Blackwell platform, next-level AI performance on trillion-parameter generative AI models for both training and inference is coming soon.

TOPS of the Class: Decoding AI Performance on RTX AI PCs and Workstations

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for RTX PC users.

The era of the AI PC is here, and it’s powered by NVIDIA RTX and GeForce RTX technologies. With it comes a new way to evaluate performance for AI-accelerated tasks, and a new language that can be daunting to decipher when choosing between the desktops and laptops available.

While PC gamers understand frames per second (FPS) and similar stats, measuring AI performance requires new metrics.

Coming Out on TOPS

The first baseline is TOPS, or trillions of operations per second. Trillions is the important word here — the processing numbers behind generative AI tasks are absolutely massive. Think of TOPS as a raw performance metric, similar to an engine’s horsepower rating. More is better.

Compare, for example, Microsoft’s recently announced Copilot+ PC lineup, which includes neural processing units (NPUs) capable of upwards of 40 TOPS. That’s sufficient for some light AI-assisted tasks, like asking a local chatbot where yesterday’s notes are.

But many generative AI tasks are more demanding. NVIDIA RTX and GeForce RTX GPUs deliver unprecedented performance across all generative tasks — the GeForce RTX 4090 GPU offers more than 1,300 TOPS. This is the kind of horsepower needed to handle AI-assisted digital content creation, AI super resolution in PC gaming, generating images from text or video, querying local large language models (LLMs) and more.

Insert Tokens to Play

TOPS is only the beginning of the story. LLM performance is measured in the number of tokens generated by the model.

Tokens are the output of the LLM. A token can be a word in a sentence, or even a smaller fragment like punctuation or whitespace. Performance for AI-accelerated tasks can be measured in “tokens per second.”
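
One way to make “tokens per second” concrete is to time a generation call and divide the token count by the elapsed wall-clock time. The helper below is framework-agnostic: `generate_fn` and `count_tokens` are placeholders for whatever your inference stack provides, not a specific NVIDIA API.

```python
import time
from typing import Callable

def measure_tokens_per_second(generate_fn: Callable[[str], str],
                              count_tokens: Callable[[str], int],
                              prompt: str) -> float:
    """Time one generation call and report throughput in tokens per second."""
    start = time.perf_counter()
    output = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    return count_tokens(output) / elapsed

# Example usage with stand-in functions (replace with your LLM runtime):
tps = measure_tokens_per_second(
    generate_fn=lambda p: "hello " * 128,         # placeholder for model.generate(...)
    count_tokens=lambda text: len(text.split()),  # placeholder tokenizer
    prompt="Summarize yesterday's notes.",
)
print(f"{tps:,.0f} tokens/s")
```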

Another important factor is batch size, or the number of inputs processed simultaneously in a single inference pass. As an LLM will sit at the core of many modern AI systems, the ability to handle multiple inputs (e.g. from a single application or across multiple applications) will be a key differentiator. While larger batch sizes improve performance for concurrent inputs, they also require more memory, especially when combined with larger models.

The more you batch, the more (time) you save.

RTX GPUs are exceptionally well-suited for LLMs due to their large amounts of dedicated video random access memory (VRAM), Tensor Cores and TensorRT-LLM software.

GeForce RTX GPUs offer up to 24GB of high-speed VRAM, and NVIDIA RTX GPUs up to 48GB, which can handle larger models and enable higher batch sizes. RTX GPUs also take advantage of Tensor Cores — dedicated AI accelerators that dramatically speed up the computationally intensive operations required for deep learning and generative AI models. That maximum performance is easily accessed when an application uses the NVIDIA TensorRT software development kit (SDK), which unlocks the highest-performance generative AI on the more than 100 million Windows PCs and workstations powered by RTX GPUs.

The combination of memory, dedicated AI accelerators and optimized software gives RTX GPUs massive throughput gains, especially as batch sizes increase.

Text-to-Image, Faster Than Ever

Measuring image generation speed is another way to evaluate performance. One of the most straightforward ways uses Stable Diffusion, a popular image-based AI model that allows users to easily convert text descriptions into complex visual representations.

With Stable Diffusion, users can quickly create and refine images from text prompts to achieve their desired output. When using an RTX GPU, these results can be generated faster than processing the AI model on a CPU or NPU.
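
For reference, a text-to-image call with the SDXL Base checkpoint looks roughly like the sketch below, here using the Hugging Face diffusers library on an RTX GPU; the TensorRT extensions described in the following paragraphs accelerate this kind of workload rather than change the basic pattern.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL Base checkpoint in half precision and move it to the GPU.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a cozy reading nook by a rainy window, warm light, photorealistic"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("nook.png")
```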

That performance is even higher when using the TensorRT extension for the popular Automatic1111 interface. RTX users can generate images from prompts up to 2x faster with the SDXL Base checkpoint — significantly streamlining Stable Diffusion workflows.

ComfyUI, another popular Stable Diffusion user interface, added TensorRT acceleration last week. RTX users can now generate images from prompts up to 60% faster, and can even convert these images to videos using Stable Video Diffusion up to 70% faster with TensorRT.

TensorRT acceleration can be put to the test in the new UL Procyon AI Image Generation benchmark, which delivers speedups of 50% on a GeForce RTX 4080 SUPER GPU compared with the fastest non-TensorRT implementation.

TensorRT acceleration will soon be released for Stable Diffusion 3 — Stability AI’s new, highly anticipated text-to-image model — boosting performance by 50%. Plus, the new TensorRT-Model Optimizer enables accelerating performance even further. This results in a 70% speedup compared with the non-TensorRT implementation, along with a 50% reduction in memory consumption.

Of course, seeing is believing — the true test is the real-world use case of iterating on an original prompt. Users can refine image generation by tweaking prompts significantly faster on RTX GPUs, taking seconds per iteration compared with minutes on a MacBook Pro with an M3 Max chip. Plus, users get both speed and security, with everything remaining private when running locally on an RTX-powered PC or workstation.

The Results Are in and Open Sourced

But don’t just take our word for it. The team of AI researchers and engineers behind the open-source Jan.ai recently integrated TensorRT-LLM into its local chatbot app, then tested these optimizations for themselves.

The researchers tested Jan’s TensorRT-LLM implementation against the open-source llama.cpp inference engine across a variety of GPUs and CPUs used by the community. They found that TensorRT is “30-70% faster than llama.cpp on the same hardware,” as well as more efficient on consecutive processing runs. The team also included its methodology, inviting others to measure generative AI performance for themselves.

From games to generative AI, speed wins. TOPS, images per second, tokens per second and batch size are all considerations when determining performance champs.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

Nerding About NeRFs: How Neural Radiance Fields Transform 2D Images Into Hyperrealistic 3D Models

Let’s talk about NeRFs — no, not the neon-colored foam dart blasters, but neural radiance fields, a technology that might just change the nature of images forever. In this episode of NVIDIA’s AI Podcast recorded live at GTC, host Noah Kravitz speaks with Michael Rubloff, founder and managing editor of radiancefields.com, about radiance field-based technologies. NeRFs let users turn a series of 2D images or video into a hyperrealistic 3D model — something like a photograph of a scene, but one that can be viewed from multiple angles. Tune in to learn more about the technology’s creative and commercial applications and how it might transform the way people capture and experience the world.

Watch the replay of Rubloff’s GTC session on the intersection of generative AI and extended reality.

Time Stamps

1:18: What are NeRFs?
3:01: How do NeRFs work?
3:44: What’s the difference between NeRFs and Gaussian splatting?
4:36: How are NeRFs being used?
7:22: What is a radiance field?
14:18: How might radiance fields affect creative applications?
17:50: Examples of NeRFs in action in the media right now
21:00: Rubloff’s insight on where NeRFs will go in the future

You Might Also Like…

Media.Monks’ Lewis Smithingham on Enhancing Media and Marketing With AI – Ep. 222

Meet Media.Monks’ Wormhole, an alien-like, conversational robot with a quirky personality and the ability to offer keen marketing expertise. Lewis Smithingham, senior vice president of innovation and special ops at Media.Monks, a global marketing and advertising company, discusses the creation of Wormhole and AI’s potential to enhance media and entertainment.

Living Optics CEO Robin Wang on Democratizing Hyperspectral Imaging – Ep. 219

Step into the realm of the unseen with Robin Wang, CEO of Living Optics. The startup cofounder discusses the power of Living Optics’ hyperspectral imaging camera, which can capture visual data across 96 colors, revealing details invisible to the human eye.

Exploring Filmmaking With Cuebric’s AI: Insights from Pinar Seyhan Demirdag – Ep. 214

Pinar Seyhan Demirdag, co-founder and CEO of Cuebric, discusses the company’s mission to offer new filmmaking and content creation solutions through immersive, two-and-a-half-dimensional cinematic environments.

Deepdub’s Ofir Krakowski on Redefining Dubbing From Hollywood to Bollywood – Ep. 202

Deepdub acts as a digital bridge, providing access to content by using generative AI to break down language and cultural barriers. The Israel-based startup’s co-founder and CEO, Ofir Krakowski, describes how Deepdub uses AI-driven dubbing to help entertainment companies boost efficiency and cut costs while increasing accessibility.

Subscribe to the AI Podcast

Get the AI Podcast through iTunes, Google Play, Amazon Music, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

Why Accelerated Data Processing Is Crucial for AI Innovation in Every Industry

Across industries, AI is supercharging innovation with machine-powered computation. In finance, bankers are using AI to detect fraud more quickly and keep accounts safe; telecommunications providers are improving networks to deliver superior service; scientists are developing novel treatments for rare diseases; utility companies are building cleaner, more reliable energy grids; and automotive companies are making self-driving cars safer and more accessible.

The backbone of top AI use cases is data. Effective and precise AI models require training on extensive datasets. Enterprises seeking to harness the power of AI must establish a data pipeline that involves extracting data from diverse sources, transforming it into a consistent format and storing it efficiently.

Data scientists work to refine datasets through multiple experiments to fine-tune AI models for optimal performance in real-world applications. These applications, from voice assistants to personalized recommendation systems, require rapid processing of large data volumes to deliver real-time performance.

As AI models become more complex and begin to handle diverse data types such as text, audio, images, and video, the need for rapid data processing becomes more critical. Organizations that continue to rely on legacy CPU-based computing are struggling with hampered innovation and performance due to data bottlenecks, escalating data center costs, and insufficient computing capabilities.

Many businesses are turning to accelerated computing to integrate AI into their operations. This method leverages GPUs, specialized hardware, software, and parallel computing techniques to boost computing performance by as much as 150x and increase energy efficiency by up to 42x.

Leading companies across different sectors are using accelerated data processing to spearhead groundbreaking AI initiatives.

Finance Organizations Detect Fraud in a Fraction of a Second

Financial organizations face a significant challenge in detecting patterns of fraud due to the vast amount of transactional data that requires rapid analysis. Additionally, the scarcity of labeled data for actual instances of fraud poses a difficulty in training AI models. Conventional data science pipelines lack the required acceleration to handle the large data volumes associated with fraud detection. This leads to slower processing times that hinder real-time data analysis and fraud detection capabilities.

To overcome these challenges, American Express, which handles more than 8 billion transactions per year, uses accelerated computing to train and deploy long short-term memory (LSTM) models. These models excel in sequential analysis and detection of anomalies, and can adapt and learn from new data, making them ideal for combating fraud.

Leveraging parallel computing techniques on GPUs, American Express significantly speeds up the training of its LSTM models. GPUs also enable live models to process huge volumes of transactional data, performing the high-throughput computations needed to detect fraud in real time.
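
As a rough sketch of the modeling approach (not American Express’ production system), an LSTM over a customer’s transaction sequence can be expressed in a few lines of PyTorch; the feature dimensions here are arbitrary.

```python
import torch
import torch.nn as nn

class FraudLSTM(nn.Module):
    """Classify a sequence of transaction feature vectors as fraudulent or not."""

    def __init__(self, n_features: int = 32, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence_length, n_features), e.g. a customer's recent transactions
        _, (h_n, _) = self.lstm(x)
        return torch.sigmoid(self.head(h_n[-1]))  # probability of fraud

model = FraudLSTM().cuda()  # train and serve on the GPU
scores = model(torch.randn(64, 50, 32, device="cuda"))
```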

The system operates within two milliseconds of latency to better protect customers and merchants, delivering a 50x improvement over a CPU-based configuration. By combining the accelerated LSTM deep neural network with its existing methods, American Express has improved fraud detection accuracy by up to 6% in specific segments.

Financial companies can also use accelerated computing to reduce data processing costs. Running data-heavy Spark 3 workloads on NVIDIA GPUs, PayPal confirmed the potential to reduce cloud costs by up to 70% for big data processing and AI applications.

By processing data more efficiently, financial institutions can detect fraud in real time, enabling faster decision-making without disrupting transaction flow and minimizing the risk of financial loss.

Telcos Simplify Complex Routing Operations

Telecommunications providers generate immense amounts of data from various sources, including network devices, customer interactions, billing systems, and network performance and maintenance.

Managing national networks that handle hundreds of petabytes of data every day requires complex technician routing to ensure service delivery. To optimize technician dispatch, advanced routing engines perform trillions of computations, taking into account factors like weather, technician skills, customer requests and fleet distribution. Success in these operations depends on meticulous data preparation and sufficient computing power.

AT&T, which operates one of the nation’s largest field dispatch teams to service its customers, is enhancing data-heavy routing operations with NVIDIA cuOpt, which relies on heuristics, metaheuristics and optimizations to solve complex vehicle routing problems.

In early trials, cuOpt delivered routing solutions in 10 seconds, achieving a 90% reduction in cloud costs and enabling technicians to complete more service calls daily. NVIDIA RAPIDS, a suite of software libraries that enables acceleration of data science and analytics pipelines, further accelerates cuOpt, allowing companies to integrate local search heuristics and metaheuristics like Tabu search for continuous route optimization.

AT&T is adopting NVIDIA RAPIDS Accelerator for Apache Spark to enhance the performance of Spark-based AI and data pipelines. This has helped the company boost operational efficiency on everything from training AI models to maintaining network quality to reducing customer churn and improving fraud detection. With RAPIDS Accelerator, AT&T is reducing its cloud computing spend for target workloads while enabling faster performance and reducing its carbon footprint.
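
Enabling the RAPIDS Accelerator is largely a configuration change on an existing Spark job. A minimal PySpark sketch follows; the plugin jar must already be available on the cluster, the data path is hypothetical, and exact settings vary by deployment.

```python
from pyspark.sql import SparkSession

# Minimal configuration to route supported SQL/DataFrame operations to GPUs.
spark = (
    SparkSession.builder.appName("gpu-etl")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")  # RAPIDS Accelerator plugin
    .config("spark.rapids.sql.enabled", "true")
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "0.25")
    .getOrCreate()
)

# Existing Spark code runs unchanged; eligible operators execute on the GPU.
df = spark.read.parquet("s3://example-bucket/transactions/")  # hypothetical path
df.groupBy("region").count().show()
```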

Accelerated data pipelines and processing will be critical as telcos seek to improve operational efficiency while delivering the highest possible service quality.

Biomedical Researchers Condense Drug Discovery Timelines

As researchers utilize technology to study the roughly 25,000 genes in the human genome to understand their relationship with diseases, there has been an explosion of medical data and peer-reviewed research papers. Biomedical researchers rely on these papers to narrow down the field of study for novel treatments. However, conducting literature reviews of such a massive and expanding body of relevant research has become an impossible task.

AstraZeneca, a leading pharmaceutical company, developed a Biological Insights Knowledge Graph (BIKG) to aid scientists across the drug discovery process, from literature reviews to screen hit rating, target identification and more. This graph integrates public and internal databases with information from scientific literature, modeling between 10 million and 1 billion complex biological relationships.

BIKG has been effectively used for gene ranking, aiding scientists in hypothesizing high-potential targets for novel disease treatments. At NVIDIA GTC, the AstraZeneca team presented a project that successfully identified genes linked to resistance in lung cancer treatments.

To narrow down potential genes, data scientists and biological researchers collaborated to define the criteria and gene features ideal for targeting in treatment development. They trained a machine learning algorithm to search the BIKG databases for genes that have those designated features and are mentioned in the literature as treatable. Using NVIDIA RAPIDS for faster computations, the team reduced the initial gene pool from 3,000 to just 40 target genes, a task that previously took months but now takes mere seconds.
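
That filtering step is the kind of dataframe work RAPIDS accelerates. Below is a hedged cuDF sketch; the file, column names and thresholds are invented for illustration and are not AstraZeneca’s actual criteria.

```python
import cudf

# Load gene features into a GPU dataframe (path and columns are hypothetical).
genes = cudf.read_parquet("bikg_gene_features.parquet")

# Apply the kind of criteria scientists might define: literature support,
# predicted tractability and a model-assigned relevance score.
shortlist = genes[
    (genes["literature_mentions"] >= 5)
    & (genes["tractability_score"] > 0.7)
    & (genes["model_score"] > 0.9)
].sort_values("model_score", ascending=False)

print(shortlist.head(40))  # e.g. narrow ~3,000 candidates down to a few dozen targets
```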

By supplementing drug development with accelerated computing and AI, pharmaceutical companies and researchers can finally use the enormous troves of data building up in the medical field to develop novel drugs faster and more safely, ultimately having a life-saving impact.

Utility Companies Build the Future of Clean Energy 

There’s been a significant push to shift to carbon-neutral energy sources in the energy sector. With the cost of harnessing renewable resources such as solar energy falling drastically over the last 10 years, the opportunity to make real progress toward a clean energy future has never been greater.

However, this shift toward integrating clean energy from wind farms, solar farms and home batteries has introduced new complexities in grid management. As energy infrastructure diversifies and two-way power flows must be accommodated, managing the grid has become more data-intensive. New smart grids are now required to handle high-voltage areas for vehicle charging. They must also manage the availability of distributed stored energy sources and adapt to variations in usage across the network.

Utilidata, a prominent grid-edge software company, has collaborated with NVIDIA to develop a distributed AI platform, Karman, for the grid edge using a custom NVIDIA Jetson Orin edge AI module. This custom chip and platform, embedded in electricity meters, transforms each meter into a data collection and control point, capable of handling thousands of data points per second.

Karman processes real-time, high-resolution data from meters at the network’s edge. This enables utility companies to gain detailed insights into grid conditions, predict usage and seamlessly integrate distributed energy resources in seconds, rather than minutes or hours. Additionally, with inference models on edge devices, network operators can anticipate and quickly identify line faults to predict potential outages and conduct preventative maintenance to increase grid reliability.

Through the integration of AI and accelerated data analytics, Karman helps utility providers transform existing infrastructure into efficient smart grids. This allows for tailored, localized electricity distribution to meet fluctuating demand patterns without extensive physical infrastructure upgrades, facilitating a more cost-effective modernization of the grid.

Automakers Enable Safer, More Accessible, Self-Driving Vehicles

As auto companies strive for full self-driving capabilities, vehicles must be able to detect objects and navigate in real time. This requires high-speed data processing tasks, including feeding live data from cameras, lidar, radar and GPS into AI models that make navigation decisions to keep roads safe.

The autonomous driving inference workflow is complex and includes multiple AI models along with necessary preprocessing and postprocessing steps. Traditionally, these steps were handled on the client side using CPUs. However, this can lead to significant bottlenecks in processing speeds, which is an unacceptable drawback for an application where fast processing equates to safety.

To enhance the efficiency of autonomous driving workflows, electric vehicle manufacturer NIO integrated NVIDIA Triton Inference Server into its inference pipeline. NVIDIA Triton is open-source, multi-framework inference-serving software. By centralizing data processing tasks, NIO reduced latency by 6x in some core areas and increased overall data throughput by up to 5x.

NIO’s GPU-centric approach made it easier to update and deploy new AI models without the need to change anything on the vehicles themselves. Additionally, the company could use multiple AI models at the same time on the same set of images without having to send data back and forth over a network, which saved on data transfer costs and improved performance.
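
On the client side, sending a preprocessed camera frame to a Triton server looks roughly like the sketch below, using the `tritonclient` HTTP API; the model name, tensor names and shapes are placeholders, not NIO’s deployment.

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# One preprocessed camera frame (batch of 1, 3x544x960 float32); shapes are illustrative.
frame = np.random.rand(1, 3, 544, 960).astype(np.float32)

inp = httpclient.InferInput("images", list(frame.shape), "FP32")
inp.set_data_from_numpy(frame)
out = httpclient.InferRequestedOutput("detections")

result = client.infer(model_name="object_detector", inputs=[inp], outputs=[out])
detections = result.as_numpy("detections")  # post-process into boxes/classes downstream
```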

By using accelerated data processing, autonomous vehicle software developers ensure they can reach a high-performance standard to avoid traffic accidents, lower transportation costs and improve mobility for users.

Retailers Improve Demand Forecasting

In the fast-paced retail environment, the ability to process and analyze data quickly is critical to adjusting inventory levels, personalizing customer interactions and optimizing pricing strategies on the fly. The larger a retailer is and the more products it carries, the more complex and compute-intensive its data operations will be.

Walmart, the largest retailer in the world, turned to accelerated computing to significantly improve forecasting accuracy for 500 million item-by-store combinations across 4,500 stores.

As Walmart’s data science team built more robust machine learning algorithms to take on this mammoth forecasting challenge, the existing computing environment began to falter, with jobs failing to complete or generating inaccurate results. The company found that data scientists were having to remove features from algorithms just so they would run to completion.

To improve its forecasting operations, Walmart started using NVIDIA GPUs and RAPIDS. The company now uses a forecasting model with 350 data features to predict sales across all product categories. These features encompass sales data, promotional events, and external factors like weather conditions and major events like the Super Bowl, which influence demand.
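
As a simplified sketch of GPU-accelerated training on tabular features, the example below uses XGBoost’s GPU support as a stand-in; Walmart’s actual models, features and tooling are not detailed beyond what’s described above.

```python
import numpy as np
import xgboost as xgb

# Stand-in feature matrix: rows are item-by-store samples, columns are the kinds
# of features described above (sales history, promotions, weather, events).
rng = np.random.default_rng(0)
X = rng.standard_normal((100_000, 350))
y = rng.standard_normal(100_000)

model = xgb.XGBRegressor(
    n_estimators=500,
    tree_method="hist",
    device="cuda",  # train on the GPU (assumes XGBoost >= 2.0)
)
model.fit(X, y)
forecast = model.predict(X[:10])
```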

Advanced models helped Walmart improve forecast accuracy from 94% to 97% while eliminating an estimated $100 million in fresh produce waste and reducing stockout and markdown scenarios. GPUs also ran models 100x faster, with jobs completing in just four hours, an operation that would’ve taken several weeks in a CPU environment.

By shifting data-intensive operations to GPUs and accelerated computing, retailers can lower both their cost and their carbon footprint while delivering best-fit choices and lower prices to shoppers.

Public Sector Improves Disaster Preparedness 

Drones and satellites capture huge amounts of aerial image data that public and private organizations use to predict weather patterns, track animal migrations and observe environmental changes. This data is invaluable for research and planning, enabling more informed decision-making in fields like agriculture, disaster management and efforts to combat climate change. However, the value of this imagery can be limited if it lacks specific location metadata.

A federal agency working with NVIDIA needed a way to automatically pinpoint the location of images missing geospatial metadata, which is essential for missions such as search and rescue, responding to natural disasters and monitoring the environment. However, identifying a small area within a larger region using an aerial image without metadata is extremely challenging, akin to locating a needle in a haystack. Algorithms designed to help with geolocation must address variations in image lighting and differences due to images being taken at various times, dates and angles.

To identify non-geotagged aerial images, NVIDIA, Booz Allen and the government agency collaborated on a solution that uses computer vision algorithms to extract information from image pixel data to scale the image similarity search problem.

When attempting to solve this problem, an NVIDIA solutions architect first used a Python-based application. Initially running on CPUs, processing took more than 24 hours. GPUs supercharged this to just minutes, performing thousands of data operations in parallel versus only a handful of operations on a CPU. By shifting the application code to CuPy, an open-source GPU-accelerated library, the application experienced a remarkable 1.8-million-x speedup, returning results in 67 microseconds.
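
As a toy sketch of the underlying pattern, the snippet below scores one query image’s feature vector against a large reference set entirely on the GPU with CuPy; feature extraction is out of scope here and the dimensions are arbitrary.

```python
import cupy as cp

# Reference features: e.g. embeddings of map tiles with known coordinates.
reference = cp.random.standard_normal((100_000, 256), dtype=cp.float32)
query = cp.random.standard_normal((256,), dtype=cp.float32)

# Cosine similarity of the query against every reference tile, computed in parallel.
ref_norm = reference / cp.linalg.norm(reference, axis=1, keepdims=True)
q_norm = query / cp.linalg.norm(query)
scores = ref_norm @ q_norm

best = int(cp.argmax(scores))
print(best, float(scores[best]))  # index and score of the most similar reference tile
```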

With a solution that can process images and the data of large land masses in just minutes, organizations can gain access to the critical information needed to respond more quickly and effectively to emergencies and plan proactively, potentially saving lives and safeguarding the environment.

Accelerate AI Initiatives and Deliver Business Results

Companies using accelerated computing for data processing are advancing AI initiatives and positioning themselves to innovate and perform at higher levels than their peers.

Accelerated computing handles larger datasets more efficiently, enables faster model training and selection of optimal algorithms, and facilitates more precise results for live AI solutions.

Enterprises that use it can achieve superior price-performance ratios compared to traditional CPU-based systems and enhance their ability to deliver outstanding results and experiences to customers, employees and partners.

Learn how accelerated computing helps organizations achieve AI objectives and drive innovation. 
