How AI Is Personalizing Customer Service Experiences Across Industries

How AI Is Personalizing Customer Service Experiences Across Industries

Customer service departments across industries are facing increased call volumes, high customer service agent turnover, talent shortages and shifting customer expectations.

Customers expect both self-help options and real-time, person-to-person support. These expectations for seamless, personalized experiences extend across digital communication channels, including live chat, text and social media.

Despite the rise of digital channels, many consumers still prefer picking up the phone for support, placing strain on call centers. As companies strive to enhance the quality of customer interactions, operational efficiency and costs remain a significant concern.

To address these challenges, businesses are deploying AI-powered customer service software to boost agent productivity, automate customer interactions and harvest insights to optimize operations.

In nearly every industry, AI systems can help improve service delivery and customer satisfaction. Retailers are using conversational AI to help manage omnichannel customer requests, telecommunications providers are enhancing network troubleshooting, financial institutions are automating routine banking tasks, and healthcare facilities are expanding their capacity for patient care.

What Are the Benefits of AI for Customer Service?

With strategic deployment of AI, enterprises can transform customer interactions through intuitive problem-solving to build greater operational efficiencies and elevate customer satisfaction.

By harnessing customer data from support interactions, documented FAQs and other enterprise resources, businesses can develop AI tools that tap into their organization’s unique collective knowledge and experiences to deliver personalized service, product recommendations and proactive support.

Customizable, open-source generative AI technologies such as large language models (LLMs), combined with natural language processing (NLP) and retrieval-augmented generation (RAG), are helping industries accelerate the rollout of use-case-specific customer service AI. According to McKinsey, over 80% of customer care executives are already investing in AI or planning to do so soon.

With cost-efficient, customized AI solutions, businesses are automating management of help-desk support tickets, creating more effective self-service tools and supporting their customer service agents with AI assistants. This can significantly reduce operational costs and improve the customer experience.

Developing Effective Customer Service AI

For satisfactory, real-time interactions, AI-powered customer service software must return accurate, fast and relevant responses. Some  tricks of the trade include:

Open-source foundation models can fast-track AI development. Developers can flexibly adapt and enhance these pretrained machine learning models, and enterprises can use them to launch AI projects without the high costs of building models from scratch.

RAG frameworks connect foundation or general-purpose LLMs to proprietary knowledge bases and data sources, including inventory management and customer relationship management systems and customer service protocols. Integrating RAG into conversational chatbots, AI assistants and copilots tailors responses to the context of customer queries.

Human-in-the-loop processes remain crucial to both AI training and live deployments. After initial training of foundation models or LLMs, human reviewers should judge the AI’s responses and provide corrective feedback. This helps to guard against issues such as hallucination —  where the model generates false or misleading information, and other errors including toxicity or off-topic responses. This type of human involvement ensures fairness, accuracy and security is fully considered during AI development.

Human participation is even more important for AI in production. When an AI is unable to adequately resolve a customer question, the program must be able to route the call to customer support teams. This collaborative approach between AI and human agents ensures that customer engagement is efficient and empathetic.

What’s the ROI of Customer Service AI?   

The return on investment of customer service AI should be measured primarily based on efficiency gains and cost reductions. To quantify ROI, businesses can measure key indicators such as reduced response times, decreased operational costs of contact centers, improved customer satisfaction scores and revenue growth resulting from AI-enhanced services.

For instance, the cost of implementing an AI chatbot using open-source models can be compared with the expenses incurred by routing customer inquiries through traditional call centers. Establishing this baseline helps assess the financial impact of AI deployments on customer service operations.

To solidify understanding of ROI before scaling AI deployments, companies can consider a pilot period. For example, by redirecting 20% of call center traffic to AI solutions for one or two quarters and closely monitoring the outcomes, businesses can obtain concrete data on performance improvements and cost savings. This approach helps prove ROI and informs decisions for further investment.

Businesses across industries are using AI for customer service and measuring their success:

Retailers Reduce Call Center Load 

Modern shoppers expect smooth, personalized and efficient shopping experiences, whether in store or on an e-commerce site. Customers of all generations continue prioritizing live human support, while also desiring the option to use different channels. But complex customer issues coming from a diverse customer base can make it difficult for support agents to quickly comprehend and resolve incoming requests.

To address these challenges, many retailers are turning to conversational AI and AI-based call routing. According to NVIDIA’s 2024 State of AI in Retail and CPG report, nearly 70% of retailers believe that AI has already boosted their annual revenue.

CP All, Thailand’s sole licensed operator for 7-Eleven convenience stores, has implemented conversational AI chatbots in its call centers, which rack up more than 250,000 calls per day. Training the bots presented unique challenges due to the complexities of the Thai language, which includes 21 consonants, 18 pure vowels, three diphthongs and five tones.

To manage this, CP All used NVIDIA NeMo, a framework designed for building, training and fine-tuning GPU-accelerated speech and natural language understanding models. With automatic speech recognition and NLP models powered by NVIDIA technologies, CP All’s chatbot achieved a 97% accuracy rate in understanding spoken Thai.

With the conversational chatbot handling a significant number of customer conversations, the call load on human agents was reduced by 60%. This allowed customer service teams to focus on more complex tasks. The chatbot also helped reduce wait times and provided quicker, more accurate responses, leading to higher customer satisfaction levels.

With AI-powered support experiences, retailers can enhance customer retention, strengthen brand loyalty and boost sales.

Telecommunications Providers Automate Network Troubleshooting

Telecommunications providers are challenged to address complex network issues while adhering to service-level agreements with end customers for network uptime. Maintaining network performance requires rapid troubleshooting of network devices, pinpointing root causes and resolving difficulties at network operations centers.

With its abilities to analyze vast amounts of data, troubleshoot network problems autonomously and execute numerous tasks simultaneously, generative AI is ideal for network operations centers. According to an IDC survey, 73% of global telcos have prioritized AI and machine learning investments for operational support as their top transformation initiative, underscoring the industry’s shift toward AI and advanced technologies.

Infosys, a leader in next-generation digital services and consulting, has built AI-driven solutions to help its telco partners overcome customer service challenges. Using NVIDIA NIM microservices and RAG, Infosys developed an AI chatbot to support network troubleshooting.

By offering quick access to essential, vendor-agnostic router commands for diagnostics and monitoring, the generative AI-powered chatbot significantly reduces network resolution times, enhancing overall customer support experiences.

To ensure accuracy and contextual responses, Infosys trained the generative AI solution on telecom device-specific manuals, training documents and troubleshooting guides. Using NVIDIA NeMo Retriever to query enterprise data, Infosys achieved 90% accuracy for its LLM output. By fine-tuning and deploying models with NVIDIA technologies, Infosys achieved a latency of 0.9 seconds, a 61% reduction compared with its baseline model. The RAG-enabled chatbot powered by NeMo Retriever also attained 92% accuracy, compared with the baseline model’s 85%.

With AI tools supporting network administrators, IT teams and customer service agents, telecom providers can more efficiently identify and resolve network issues.

Financial Services Institutions Pinpoint Fraud With Ease

While customers expect anytime, anywhere banking and support, financial services require a heightened level of data sensitivity. And unlike other industries that may include one-off purchases, banking is typically based on ongoing transactions and long-term customer relationships.

At the same time, user loyalty can be fleeting, with up to 80% of banking customers willing to switch institutions for a better experience. Financial institutions must continuously improve their support experiences and update their analyses of customer needs and preferences.

Many banks are turning to AI virtual assistants that can interact directly with customers to manage inquiries, execute transactions and escalate complex issues to human customer support agents. According to NVIDIA’s 2024 State of AI in Financial Services report, more than one-fourth of survey respondents are using AI to enhance customer experiences, and 34% are exploring the use of generative AI and LLMs for customer experience and engagement.

Bunq, a European digital bank with more than 2 million customers and 8 billion euros worth of deposits, is deploying generative AI to meet user needs. With proprietary LLMs, the company built Finn, a personal AI assistant available to both customers and bank employees. Finn can answer finance-related inquiries such as “How much did I spend on groceries last month?” or “What is the name of the Indian restaurant I ate at last week?”

Plus, with a human-in-the-loop process, Finn helps employees more quickly identify fraud. By collecting and analyzing data for compliance officers to review, bunq now identifies fraud in just three to seven minutes, down from 30 minutes without Finn.

By deploying AI tools that can use data to protect customer transactions, execute banking requests and act on customer feedback, financial institutions can serve customers at a higher level, building the trust and satisfaction necessary for long-term relationships.

Healthcare and Life Sciences Organizations Overcome Staffing Shortages

In healthcare, patients need quick access to medical expertise, precise and tailored treatment options, and empathetic interactions with healthcare professionals. But with the World Health Organization estimating a 10 million personnel shortage by 2030, access to quality care could be jeopardized.

AI-powered digital healthcare assistants are helping medical institutions do more with less. With LLMs trained on specialized medical corpuses, AI copilots can save physicians and nurses hours of daily work by helping with clinical note-taking, automating order-placing for prescriptions and lab tests, and following up with after-visit patient notes.

Multimodal AI that combines language and vision models can make healthcare settings safer by extracting insights and providing summaries of image data for patient monitoring. For example, such technology can alert staff of patient fall risks and other patient room hazards.

To support healthcare professionals, Hippocratic AI has trained a generative AI healthcare agent to perform low-risk, non-diagnostic routine tasks, like reminding patients of necessary appointment prep and following up after visits to make sure medication routines are being followed and no adverse side effects are being experienced.

Hippocratic AI trained its models on evidence-based medicine and completed rigorous testing with a large group of certified nurses and doctors. The constellation architecture of the solution comprises 20 models, one of which communicates with patients while the other 19 supervise its output. The complete system contains 1.7 trillion parameters.

The possibility of every doctor and patient having their own AI-powered digital healthcare assistant means reduced clinician burnout and higher-quality medical care.

Raising the Bar for Customer Experiences With AI 

By integrating AI into customer service interactions, businesses can offer more personalized, efficient and prompt service, setting new standards for omnichannel support experiences across platforms. With AI virtual assistants that process vast amounts of data in seconds, enterprises can equip their support agents to deliver tailored responses to the complex needs of a diverse customer base.

To develop and deploy effective customer service AI, businesses can fine-tune AI models and deploy RAG solutions to meet diverse and specific needs.

NVIDIA offers a suite of tools and technologies to help enterprises get started with customer service AI.

NVIDIA NIM microservices, part of the NVIDIA AI Enterprise software platform, accelerate generative AI deployment and support various optimized AI models for seamless, scalable inference. NVIDIA NIM Agent Blueprints provide developers with packaged reference examples to build innovative solutions for customer service applications.

By taking advantage of AI development tools, enterprises can build accurate and high-speed AI applications to transform employee and customer experiences.

Learn more about improving customer service with generative AI.

Read More

Three Ways to Ride the Flywheel of Cybersecurity AI

Three Ways to Ride the Flywheel of Cybersecurity AI

The business transformations that generative AI brings come with risks that AI itself can help secure in a kind of flywheel of progress.

Companies who were quick to embrace the open internet more than 20 years ago were among the first to reap its benefits and become proficient in modern network security.

Enterprise AI is following a similar pattern today. Organizations pursuing its advances — especially with powerful generative AI capabilities — are applying those learnings to enhance their security.

For those just getting started on this journey, here are ways to address with AI three of the top security threats industry experts have identified for large language models (LLMs).

AI Guardrails Prevent Prompt Injections

Generative AI services are subject to attacks from malicious prompts designed to disrupt the LLM behind it or gain access to its data. As the report cited above notes, “Direct injections overwrite system prompts, while indirect ones manipulate inputs from external sources.”

The best antidote for prompt injections are AI guardrails, built into or placed around LLMs. Like the metal safety barriers and concrete curbs on the road, AI guardrails keep LLM applications on track and on topic.

The industry has delivered and continues to work on solutions in this area. For example, NVIDIA NeMo Guardrails software lets developers protect the trustworthiness, safety and security of generative AI services.

AI Detects and Protects Sensitive Data

The responses LLMs give to prompts can on occasion reveal sensitive information. With multifactor authentication and other best practices, credentials are becoming increasingly complex, widening the scope of what’s considered sensitive data.

To guard against disclosures, all sensitive information should be carefully removed or obscured from AI training data. Given the size of datasets used in training, it’s hard for humans — but easy for AI models — to ensure a data sanitation process is effective.

An AI model trained to detect and obfuscate sensitive information can help safeguard against revealing anything confidential that was inadvertently left in an LLM’s training data.

Using NVIDIA Morpheus, an AI framework for building cybersecurity applications, enterprises can create AI models and accelerated pipelines that find and protect sensitive information on their networks. Morpheus lets AI do what no human using traditional rule-based analytics can: track and analyze the massive data flows on an entire corporate network.

AI Can Help Reinforce Access Control

Finally, hackers may try to use LLMs to get access control over an organization’s assets. So, businesses need to prevent their generative AI services from exceeding their level of authority.

The best defense against this risk is using the best practices of security-by-design. Specifically, grant an LLM the least privileges and continuously evaluate those permissions, so it can only access the tools and data it needs to perform its intended functions. This simple, standard approach is probably all most users need in this case.

However, AI can also assist in providing access controls for LLMs. A separate inline model can be trained to detect privilege escalation by evaluating an LLM’s outputs.

Start the Journey to Cybersecurity AI

No one technique is a silver bullet; security continues to be about evolving measures and countermeasures. Those who do best on that journey make use of the latest tools and technologies.

To secure AI, organizations need to be familiar with it, and the best way to do that is by deploying it in meaningful use cases. NVIDIA and its partners can help with full-stack solutions in AI, cybersecurity and cybersecurity AI.

Looking ahead, AI and cybersecurity will be tightly linked in a kind of virtuous cycle, a flywheel of progress where each makes the other better. Ultimately, users will come to trust it as just another form of automation.

Learn more about NVIDIA’s cybersecurity AI platform and how it’s being put to use. And listen to cybersecurity talks from experts at the NVIDIA AI Summit in October.

Read More

19 New Games to Drop for GeForce NOW in September

19 New Games to Drop for GeForce NOW in September

Fall will be here soon, so leaf it to GeForce NOW to bring the games, with 19 joining the cloud in September.

Get started with the seven games available to stream this week, and a day one PC Game Pass title, Age of Mythology: Retold, from the creators of the award-winning Age of Empires franchise World’s Edge, Forgotten Empires and Xbox Game Studios.

The Open Beta for Call of Duty: Black Ops 6 runs Sept. 6-9, offering everyone a chance to experience game-changing innovations before the title officially launches on Oct. 25. Members can stream the Battle.net and Steam versions of the Open Beta instantly this week on GeForce NOW to jump right into the action.

Where Myths and Heroes Collide

Age of Mythology on GeForce NOW
A vast, mythical world to explore with friends? Say no more…

Age of Mythology: Retold revitalizes the classic real-time strategy game by merging its beloved elements with modern visuals.

Get immersed in a mythical universe, command legendary units and call upon the powers of various gods from the Atlantean, Greek, Egyptian and Norse pantheons. The single-player experience features a 50-mission campaign, including engaging battles and myth exploration in iconic locations like Troy and Midgard. Challenge friends in head-to-head matches or cooperate to take on advanced, AI-powered opponents.

Call upon the gods from the cloud with an Ultimate and Priority membership and stream the game across devices. Games update automatically in the cloud, so members can dive into the action without having to wait.

September Gets Better With New Games

The Casting of Frank Stone on GeForce NOW
Choose your fate.

Catch the storytelling prowess of Supermassive Games in The Casting of Frank Stone, available to stream this week for members. The shadow of Frank Stone looms over Cedar Hills, a town forever altered by his violent past. Delve into the mystery of Cedar Hills alongside an original cast of characters bound together on a twisted journey where nothing is quite as it seems. Every decision shapes the story and impacts the fate of the characters.

In addition, members can look for the following games this week:

  • The Casting of Frank Stone (New release on Steam, Sept. 3)
  • Age of Mythology (New release on Steam and Xbox, available on PC Game Pass, Sept.4 )
  • Sniper Ghost Warrior Contracts  (New release on Epic Games Store, early access Sept. 5)
  • Warhammer 40,000: Space Marine 2 (New release on Steam, early access Sept. 5)
  • Crime Scene Cleaner (Steam)
  • FINAL FANTASY XVI Demo (Epic Games Store)
  • Sins of a Solar Empire II (Steam)

Here’s what members can expect for the rest of September:

  • Frostpunk 2 (New release on Steam and Xbox available  on PC Game Pass, Sept. 17)
  • FINAL FANTASY XVI (New release on Steam and Epic Games Store, Sept. 17)
  • The Plucky Squire (New release on Steam, Sept. 17)
  • Tiny Glade (New release on Steam, Sept. 23)
  • Disney Epic Mickey: Rebrushed (New release on Steam, Sept. 24)
  • Greedfall II: The Dying World (New release on Steam, Sept. 24)
  • Mechabellum ( Steam)
  • Blacksmith Master (New release on Steam, Sept. 26)
  • Breachway (New release on Steam, Sept. 26)
  • REKA (New release on Steam)
  • Test Drive Unlimited Solar Crown (New release on Steam)
  • Rider’s Republic (New release on PC Game Pass, Sept. 11). To begin playing, members need to activate access, and can refer to the help article for instructions.

Additions to August

In addition to the 18 games announced last month, 48 more joined the GeForce NOW library:

  • Prince of Persia: The Lost Crown (Day zero release on Steam, Aug. 8)
  • FINAL FANTASY XVI Demo (New release on Steam, Aug. 19)
  • Black Myth: Wukong (New release on Steam and Epic Games Store, Aug. 20)
  • GIGANTIC: RAMPAGE EDITION (Available on Epic Games Store, free Aug. 22)
  • Skull and Bones (New release on Steam, Aug. 22)
  • Endzone 2 (New release on Steam, Aug. 26)
  • Age of Mythology: Retold (Advanced access on Steam, Xbox, available on PC Game Pass, Aug. 27)
  • Core Keeper (New release on Xbox, available on PC Game Pass, Aug. 27)
  • Alan Wake’s American Nightmare (Xbox, available on Microsoft Store)
  • Car Manufacture (Steam)
  • Cat Quest III (Steam)
  • Commandos 3 – HD Remaster (Xbox, available on Microsoft Store)
  • Cooking Simulator (Xbox, available on PC Game Pass)
  • Crown Trick (Xbox, available on Microsoft Store)
  • Darksiders Genesis (Xbox, available on Microsoft Store)
  • Desperados III (Xbox, available on Microsoft Store)
  • The Dungeon of Naheulbeuk: The Amulet of Chaos (Xbox, available on Microsoft Store)
  • Expeditions: Rome (Xbox, available on Microsoft Store)
  • The Flame in the Flood (Xbox, available on Microsoft Store)
  • FTL: Faster Than Light (Xbox, available on Microsoft Store)
  • Genesis Noir (Xbox, available on PC Game Pass)
  • House Flipper (Xbox, available on PC Game Pass)
  • Into the Breach (Xbox, available on Microsoft Store)
  • Iron Harvest (Xbox, available on Microsoft Store)
  • The Knight Witch (Xbox, available on Microsoft Store)
  • Lightyear Frontier (Xbox, available on PC Game Pass)
  • Medieval Dynasty (Xbox, available on PC Game Pass)
  • Metro Exodus Enhanced Edition (Xbox, available on Microsoft Store)
  • My Time at Portia (Xbox, available on PC Game Pass)
  • Night in the Woods (Xbox, available on Microsoft Store )
  • Offworld Trading Company (Xbox, available on PC Game Pass)
  • Orwell: Keeping an Eye on You (Xbox, available on Microsoft Store)
  • Outlast 2 (Xbox, available on Microsoft Store)
  • Project Winter (Xbox, available on Microsoft Store)
  • Psychonauts (Steam)
  • Psychonauts 2 (Steam and Xbox, available on PC Game Pass)
  • Shadow Tactics: Blades of the Shogun (Xbox, available on Microsoft Store)
  • Sid Meier’s Civilization VI (Steam, Epic Games Store and Xbox, available on the Microsoft store)
  • Sid Meier’s Civilization V (Steam)
  • Sid Meier’s Civilization IV (Steam)
  • Sid Meier’s Civilization: Beyond Earth (Steam)
  • Spirit of the North (Xbox, available on PC Game Pass)
  • SteamWorld Heist II (Steam, Xbox, available on Microsoft Store)
  • Visions of Mana Demo (Steam)
  • This War of Mine (Xbox, available on PC Game Pass)
  • We Were Here Too (Steam)
  • Wreckfest (Xbox, available on PC Game Pass)
  • Yoku’s Island Express (Xbox, available on Microsoft Store)

Breachway was originally included in the August games list, but the launch date was moved to September by the developer. Stay tuned to GFN Thursday for updates.

Starting in October, members will no longer see the option of launching “Epic Games Store” versions of games published by Ubisoft on GeForce NOW.  To play these supported games, members can select the “Ubisoft Connect” option on GeForce NOW and will need to connect their Ubisoft Connect and Epic game store accounts the first time they play the game. Check out more details.

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

Volvo Cars EX90 SUV Rolls Out, Built on NVIDIA Accelerated Computing and AI

Volvo Cars EX90 SUV Rolls Out, Built on NVIDIA Accelerated Computing and AI

Volvo Cars’ new, fully electric EX90 is making its way from the automaker’s assembly line in Charleston, South Carolina, to dealerships around the U.S.

To ensure its customers benefit from future improvements and advanced safety features and capabilities, the Volvo EX90 is built on the NVIDIA DRIVE Orin system-on-a-chip (SoC), capable of more than 250 trillion operations per second (TOPS).

Running NVIDIA DriveOS, the system delivers high-performance processing in a package that’s literally the size of a postage stamp. This core compute architecture handles all vehicle functions, ranging from enabling safety and driving assistance features to supporting the development of autonomous driving capabilities — all while delivering an excellent user experience.

The state-of-the-art SUV is an intelligent mobile device on wheels, equipped with the automaker’s most advanced sensor suite to date, including radar, lidar, cameras, ultrasonic sensors and more. NVIDIA DRIVE Orin enables real-time, redundant and advanced 360-degree surround-sensor data processing, supporting Volvo Cars’ unwavering commitment to safety.

DRIVE Thor Powering the Next Generation of Volvo Cars

Setting its sights on the future, Volvo Cars also announced plans to migrate to the next-generation NVIDIA DRIVE Thor SoC for its upcoming fleets.

Before the end of the decade, Volvo Cars will move to NVIDIA DRIVE Thor, which boasts 1,000 TOPS —  quadrupling the processing power of a single DRIVE Orin SoC, while improving energy efficiency sevenfold.

The next-generation DRIVE Thor autonomous vehicle processor incorporates the latest NVIDIA Blackwell GPU architecture, helping unlock a new realm of possibilities and capabilities both in and around the car. This advanced platform will facilitate the deployment of safe advanced driver-assistance system (ADAS) and self-driving features — and pave the way for a new era of in-vehicle experiences powered by generative AI.

Highlighting Volvo Cars’ leap to NVIDIA’s next-generation processor, Volvo Cars CEO Jim Rowan noted, “With NVIDIA DRIVE Thor in our future cars, our in-house developed software becomes more scalable across our product lineup, and it helps us to continue to improve the safety in our cars, deliver best-in-class customer experiences — and increase our margins.”

Zenseact Strategic Investment in NVIDIA Technology

Volvo Cars and its software subsidiary, Zenseact, are also investing in NVIDIA DGX systems for AI model training in the cloud, helping ensure that future fleets are equipped with the most advanced and well-tested AI-powered safety features.

Managing the massive amount of data needed to safely train the next generation of AI-enabled vehicles demands data-center-level compute and infrastructure.

NVIDIA DGX systems provide the computational performance essential for training AI models with unprecedented efficiency. Transportation companies use them to speed autonomous technology development in a cost-effective, enterprise-ready and easy-to-deploy way.

Volvo Cars and Zenseact’s AI training hub, based in the Nordics, will use the systems to help catalyze multiple facets of ADAS and autonomous driving software development. A key benefit is the optimization of the data annotation process — a traditionally time-consuming task involving the identification and labeling of objects for classification and recognition.

The cluster of DGX systems will also enable processing of the required data for safety assurance, delivering twice the performance and potentially halving time to market.

“The NVIDIA DGX AI supercomputer will supercharge our AI training capabilities, making this in-house AI training center one of the largest in the Nordics,” said Anders Bell, chief engineering and technology officer at Volvo Cars. “By leveraging NVIDIA technology and setting up the data center, we pave a quick path to high-performing AI, ultimately helping make our products safer and better.”

With NVIDIA technology as the AI brain inside the car and in the cloud, Volvo Cars and Zenseact can deliver safe vehicles that allow customers to drive with peace of mind, wherever the road may lead.

Read More

Manufacturing Intelligence: Deltia AI Delivers Assembly Line Gains With NVIDIA Metropolis and Jetson

Manufacturing Intelligence: Deltia AI Delivers Assembly Line Gains With NVIDIA Metropolis and Jetson

It all started at Berlin’s Merantix venture studio in 2022, when Silviu Homoceanu and Max Fischer agreed AI could play a big role in improving manufacturing. So the two started Deltia.ai, which runs NVIDIA Metropolis vision AI on NVIDIA Jetson AGX Orin modules to measure and help optimize assembly line processes.

Hailing from AI backgrounds, Homoceanu had previously led self-driving software at Volkswagen, while Fischer had founded a startup that helped digitize more than 40 factories.

Deltia, an NVIDIA Metropolis partner, estimates that today its software platform can provide as much as a 20% performance jump on production lines for its customers.

Customers using the Deltia platform include Viessman, a maker of heating pumps, and industrial electronics company ABB, among others. Viessman is running Deltia at 15 stations, and plans to add it to even more lines in the future. Once all lines are linked to Deltia, production managers say that they expect up to a 50% increase in overall productivity.

“We provide our users with a dashboard that is basically the Google Analytics of manufacturing,” said Homoceanu, Deltia’s CTO. “We install these sensors, and two weeks later they get the keys to this dashboard, and the magic happens in the background.”

Capturing Assembly Line Insights for Digital Transformations  

Once the cameras start gathering data on assembly lines, Deltia uses that information to train models on NVIDIA-accelerated computing that can monitor activities on the line. It then uses those models deployed on Jetson AGX Orin modules at the edge to gather operational insights.

These Jetson-based systems continuously monitor the camera streams and extract metadata. This metadata identifies the exact points in time when a product arrives at a specific station, when it is being worked on and when it leaves the station. This digital information is available to line managers and process improvement personnel via Deltia’s custom dashboard, helping to identify bottlenecks and accelerate line output.

“TensorRT helps us compress complex AI models to a level where we can serve, in an economical fashion, multiple stations with a single Jetson device,” said Homoceanu.

Tapping Into Jetson Orin for Edge AI-Based Customer Insights 

Beyond identifying quick optimizations, Deltia’s analytics help visualize production flows hour-by-hour. This means that Deltia can send rapid alerts when production slips away from predicted target ranges, and it can continuously track output, cycle times and other critical key performance indicators.

It also helps map how processes flow throughout a factory floor, and it suggests improvements for things like walking routes and shop-floor layouts. One of Deltia’s customers used the platform to identify that materials shelves were too far from workers, which caused unnecessarily long cycle times and limited production. Once the shelves were moved, production went up more than 30%.

Deltia’s applications extend beyond process improvements. The platform can be used to help monitor machine states at a granular level, assisting to predict when machine parts are worn out and recommend preemptive replacements, saving time and money down the line. The platform can also suggest optimizations for energy usage, saving on operational costs and reducing maintenance expenses.

“Our vision is to empower manufacturers with the tools to achieve unprecedented efficiency,” said Fischer, CEO of Deltia.ai. “Seeing our customers experience as much as a 30% increase in productivity with our vision models running on NVIDIA Jetson Orin validates the transformative potential of our technology.”

Deltia is a member of the NVIDIA Inception program for cutting-edge startups.

Learn more about NVIDIA Metropolis and NVIDIA Jetson.

Read More

Hammer Time: Machina Labs’ Edward Mehr on Autonomous Blacksmith Bots and More

Hammer Time: Machina Labs’ Edward Mehr on Autonomous Blacksmith Bots and More

Edward Mehr works where AI meets the anvil.  The company he cofounded, Machina Labs, blends the latest advancements in robotics and AI to form metal into countless shapes for use in defense, aerospace, and more. The company’s applications accelerate design and innovation, enabling rapid iteration and production in days instead of the months required by conventional processes. NVIDIA AI Podcast host Noah Kravitz speaks with Mehr, CEO of Machina Labs, on how the company uses AI to develop the first-ever robotic blacksmith. Its Robotic Craftsman platform integrates seven-axis robots that can shape, scan, trim and drill a wide range of materials — all capabilities made possible through AI.

Time Stamps

1:12: What does Machina Labs do?
3:37: Mehr’s background
8:45: Machina Lab’s manufacturing platform, the Robotic Craftsman
10:39: Machina Lab’s history and how AI plays a role in its work
15:07: The versatility of the Robotic Craftsman
21:48: How the Robotic Craftsman was trained in simulations using AI-generated manufacturing data
28:10: From factory to household — Mehr’s insight on the future of robotic applications

You Might Also Like:

How Two Stanford Students Are Building Robots for Handling Household Chores – Ep. 224

BEHAVIOR-1K is a robot that can perform 1,000 household chores, including picking up fallen objects or cooking. In this episode, Stanford Ph.D. students Chengshu Eric Li and Josiah David Wong discuss the breakthroughs and challenges they experienced while developing BEHAVIOR-1K.

Hittin’ the Sim: NVIDIA’s Matt Cragun on Conditioning Autonomous Vehicles in Simulation – Ep. 185

NVIDIA DRIVE Sim, built on Omniverse, provides a virtual proving ground for AV testing and validation. It’s a highly accurate simulation platform that can enable groundbreaking tools — including synthetic data and neural reconstruction — to build digital twins of driving environments. In this episode, Matt Cragun, senior product manager for AV simulation at NVIDIA, details the origins and inner workings of DRIVE Sim.

NVIDIA’s Liila Torabi Talks the New Era of Robotics Through Isaac Sim – Ep. 147

Robotics are not just limited to the assembly line. Liila Torabi, senior product manager for NVIDIA Isaac Sim, works on making the next generation of robotics possible. In this episode, she discusses the new era of robotics — one driven by making robots smarter through AI.

Art(ificial) Intelligence: Pindar Van Arman Builds Robots That Paint – Ep. 129

Pindar Van Arman is an American artist and roboticist, designing painting robots that explore the intersection of human and computational creativity. He’s built multiple artificially creative robots, the most famous of which being Cloud Painter, which was awarded first place at Robotart 2018. Tune in to hear how Van Arman deconstructs his own artistic process and teaches it to robots.

Subscribe to the AI Podcast

Get the AI Podcast through iTunes, Google Play, Amazon Music, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

Read More

Do the Math: New RTX AI PC Hardware Delivers More AI, Faster

Do the Math: New RTX AI PC Hardware Delivers More AI, Faster

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for RTX PC users.

At the IFA Berlin consumer electronics and home appliances trade show this week, new RTX AI PCs will be announced, powered by RTX GPUs for advanced AI in gaming, content creation, development and academics and a neural processing unit (NPU) for offloading lightweight AI.

RTX GPUs, built with specialized AI hardware called Tensor Cores, provide the compute performance needed to run the latest and most demanding AI models. They now accelerate more than 600 AI-enabled games and applications, with more than 100 million GeForce RTX and NVIDIA RTX GPUs in users’ hands worldwide.

Since the launch of NVIDIA DLSS — the first widely deployed PC AI technology — more than five years ago, on-device AI has expanded beyond gaming to livestreaming, content creation, software development, productivity and STEM use cases.

Accelerating AI 

AI boils down to massive matrix multiplication — in other words, incredibly complex math. CPUs can do math, but, as serial processors, they can only perform one operation per CPU core at a time. This makes them far too slow for practical use with AI.

GPUs, on the other hand, are parallel processors, performing multiple operations at once. With hundreds of Tensor Cores each and being optimized for AI, RTX GPUs can accelerate incredibly complex mathematical operations.

RTX-powered systems give users a powerful GPU accelerator for demanding AI workloads in gaming, content creation, software development and STEM subjects. Some also include an NPU, a lightweight accelerator for offloading select low-power workloads.

Local accelerators make AI capabilities always available (even without an internet connection), offer low latency for high responsiveness and increase privacy so that users don’t have to upload sensitive materials to an online database before they become usable by an AI model.

Advanced Processing Power

NVIDIA powers much of the world’s AI — from data center to the edge to an install base of over 100 million PCs worldwide.

The GeForce RTX and NVIDIA RTX GPUs found in laptops, desktops and workstations share the same architecture as cloud servers and provide up to 686 AI trillion operations operations per second (TOPS) across the GeForce RTX 40 Series Laptop GPU lineup.

RTX GPUs unlock top-tier performance and power a wider range of AI and generative AI than systems with just an integrated system-on-a-chip (SoC).

“Many projects, especially within Windows, are built for and expect to run on NVIDIA cards. In addition to the wide software support base, NVIDIA GPUs also have an advantage in terms of raw performance.” — Jon Allman, industry analyst at Puget Systems

Gamers can use DLSS for AI-enhanced performance and can look forward to NVIDIA ACE digital human technology for next-generation in-game experiences. Creators can use AI-accelerated video and photo editing tools, asset generators, AI denoisers and more. Everyday users can tap RTX Video Super Resolution and RTX Video HDR for improved video quality, and NVIDIA ChatRTX and NVIDIA Broadcast for productivity improvements. And developers can use RTX-powered coding and debugging tools, as well as the NVIDIA RTX AI Toolkit to build and deploy AI-enabled apps for RTX.

Large language models — like Google’s Gemma, Meta’s Llama and Microsoft’s Phi — all run faster on RTX AI PCs, as systems with GPUs load LLMs into VRAM. Add in NVIDIA TensorRT-LLM acceleration and RTX GPUs can run LLMs 10-100x faster than on CPUs.

New RTX AI PCs Available Now

New systems from ASUS and MSI are now shipping with up to GeForce RTX 4070 Laptop GPUs — delivering up to 321 AI TOPS of performance — and power-efficient SoCs with Windows 11 AI PC capabilities. Windows 11 AI PCs will receive a free update to Copilot+ PC experiences when available.

ASUS’ Zephyrus G16 comes with up to a GeForce RTX 4070 Laptop GPU to supercharge photo and video editing, image generation and coding, while game-enhancing features like DLSS create additional high-quality frames and improve image quality. The 321 TOPS of local AI processing power available from the GeForce RTX 4070 Laptop GPU enables multiple AI applications to run simultaneously, changing the way gamers, creators and engineers work and play.

The ASUS ProArt P16 is the first AI PC built for advanced AI workflows across creativity, gaming, productivity and more. Its GeForce RTX 4070 Laptop GPU provides creatives with RTX AI acceleration in top 2D, 3D, video editing and streaming apps. The ASUS ProArt P13 also comes with state-of-the-art graphics and an OLED touchscreen for ease of creation. Both laptops also come NVIDIA Studio-validated, enabling and accelerating your creativity.

The MSI Stealth A16 AI+ features the latest GeForce RTX 40 Series Laptop GPUs, delivering up to 321 AI TOPS with a GeForce RTX 4070 Laptop GPU. This fast and intelligent AI-powered PC is designed to excel in gaming, creation and productivity, offering access to next-level technology.

These laptops join hundreds of RTX AI PCs available today from top manufacturers, with support for the 600+ AI applications and games accelerated by RTX.

Generative AI is transforming graphics and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

Read More

From RAG to Richness: Startup Uplevels Retrieval-Augmented Generation for Enterprises

From RAG to Richness: Startup Uplevels Retrieval-Augmented Generation for Enterprises

Well before OpenAI upended the technology industry with its release of ChatGPT in the fall of 2022, Douwe Kiela already understood why large language models, on their own, could only offer partial solutions for key enterprise use cases.

The young Dutch CEO of Contextual AI had been deeply influenced by two seminal papers from Google and OpenAI, which together outlined the recipe for creating fast, efficient transformer-based generative AI models and LLMs.

Soon after those papers were published in 2017 and 2018, Kiela and his team of AI researchers at Facebook, where he worked at that time, realized LLMs would face profound data freshness issues.

They knew that when foundation models like LLMs were trained on massive datasets, the training not only imbued the model with a metaphorical “brain” for “reasoning” across data. The training data also represented the entirety of a model’s knowledge that it could draw on to generate answers to users’ questions.

Kiela’s team realized that, unless an LLM could access relevant real-time data in an efficient, cost-effective way, even the smartest LLM wouldn’t be very useful for many enterprises’ needs.

So, in the spring of 2020, Kiela and his team published a seminal paper of their own, which introduced the world to retrieval-augmented generation. RAG, as it’s commonly called, is a method for continuously and cost-effectively updating foundation models with new, relevant information, including from a user’s own files and from the internet. With RAG, an LLM’s knowledge is no longer confined to its training data, which makes models far more accurate, impactful and relevant to enterprise users.

Today, Kiela and Amanpreet Singh, a former colleague at Facebook, are the CEO and CTO of Contextual AI, a Silicon Valley-based startup, which recently closed an $80 million Series A round, which included NVIDIA’s investment arm, NVentures. Contextual AI is also a member of NVIDIA Inception, a program designed to nurture startups. With roughly 50 employees, the company says it plans to double in size by the end of the year.

The platform Contextual AI offers is called RAG 2.0. In many ways, it’s an advanced, productized version of the RAG architecture Kiela and Singh first described in their 2020 paper.

RAG 2.0 can achieve roughly 10x better parameter accuracy and performance over competing offerings, Kiela says.

That means, for example, that a 70-billion-parameter model that would typically require significant compute resources could instead run on far smaller infrastructure, one built to handle only 7 billion parameters without sacrificing accuracy. This type of optimization opens up edge use cases with smaller computers that can perform at significantly higher-than-expected levels.

“When ChatGPT happened, we saw this enormous frustration where everybody recognized the potential of LLMs, but also realized the technology wasn’t quite there yet,” explained Kiela. “We knew that RAG was the solution to many of the problems. And we also knew that we could do much better than what we outlined in the original RAG paper in 2020.”

Integrated Retrievers and Language Models Offer Big Performance Gains 

The key to Contextual AI’s solutions is its close integration of its retriever architecture, the “R” in RAG, with an LLM’s architecture, which is the generator, or “G,” in the term. The way RAG works is that a retriever interprets a user’s query, checks various sources to identify relevant documents or data and then brings that information back to an LLM, which reasons across this new information to generate a response.

Since around 2020, RAG has become the dominant approach for enterprises that deploy LLM-powered chatbots. As a result, a vibrant ecosystem of RAG-focused startups has formed.

One of the ways Contextual AI differentiates itself from competitors is by how it refines and improves its retrievers through back propagation, a process of adjusting algorithms — the weights and biases — underlying its neural network architecture.

And, instead of training and adjusting two distinct neural networks, that is, the retriever and the LLM, Contextual AI offers a unified state-of-the-art platform, which aligns the retriever and language model, and then tunes them both through back propagation.

Synchronizing and adjusting weights and biases across distinct neural networks is difficult, but the result, Kiela says, leads to tremendous gains in precision, response quality and optimization. And because the retriever and generator are so closely aligned, the responses they create are grounded in common data, which means their answers are far less likely than other RAG architectures to include made up or “hallucinated” data, which a model might offer when it doesn’t “know” an answer.

“Our approach is technically very challenging, but it leads to much stronger coupling between the retriever and the generator, which makes our system far more accurate and much more efficient,” said Kiela.

Tackling Difficult Use Cases With State-of-the-Art Innovations

RAG 2.0 is essentially LLM-agnostic, which means it works across different open-source language models, like Mistral or Llama, and can accommodate customers’ model preferences. The startup’s retrievers were developed using NVIDIA’s Megatron LM on a mix of NVIDIA H100 and A100 Tensor Core GPUs hosted in Google Cloud.

One of the significant challenges every RAG solution faces is how to identify the most relevant information to answer a user’s query when that information may be stored in a variety of formats, such as text, video or PDF.

Contextual AI overcomes this challenge through a “mixture of retrievers” approach, which aligns different retrievers’ sub-specialties with the different formats data is stored in.

Contextual AI deploys a combination of RAG types, plus a neural reranking algorithm, to identify information stored in different formats which, together, are optimally responsive to the user’s query.

For example, if some information relevant to a query is stored in a video file format, then one of the RAGs deployed to identify relevant data would likely be a Graph RAG, which is very good at understanding temporal relationships in unstructured data like video. If other data were stored in a text or PDF format, then a vector-based RAG would simultaneously be deployed.

The neural reranker would then help organize the retrieved data and the prioritized information would then be fed to the LLM to generate an answer to the initial query.

“To maximize performance, we almost never use a single retrieval approach — it’s usually a hybrid because they have different and complementary strengths,” Kiela said. “The exact right mixture depends on the use case, the underlying data and the user’s query.”

By essentially fusing the RAG and LLM architectures, and offering many routes for finding relevant information, Contextual AI offers customers significantly improved performance. In addition to greater accuracy, its offering lowers latency thanks to fewer API calls between the RAG’s and LLM’s neural networks.

Because of its highly optimized architecture and lower compute demands, RAG 2.0 can run in the cloud, on premises or fully disconnected. And that makes it relevant to a wide array of industries, from fintech and manufacturing to medical devices and robotics.

“The use cases we’re focusing on are the really hard ones,” Kiela said. “Beyond reading a transcript, answering basic questions or summarization, we’re focused on the very high-value, knowledge-intensive roles that will save companies a lot of money or make them much more productive.”

Read More

Crystal-Clear Gaming: ‘Visions of Mana’ Sharpens on GeForce NOW

Crystal-Clear Gaming: ‘Visions of Mana’ Sharpens on GeForce NOW

It’s time to mana-fest the spirit of adventure with Square Enix’s highly anticipated action role-playing game, Visions of Mana, launching today in the cloud.

Members can also head to a galaxy far, far away, from the comfort of their homes, with the power of the cloud and Ubisoft’s Star Wars Outlaws, with early access available on GeForce NOW.

Plus, be among the first to get early access to the Call of Duty: Black Ops 6 Open Beta on GeForce NOW without having to wait around for downloads — early access runs Aug. 30-Sept. 4 for those who preorder the game. Call of Duty: Black Ops 6 will join the cloud when the full game is released Oct. 25.

These triple-A titles are part of 26 titles joining the GeForce NOW library of over 2,000 games this week.

Cloudy With a Chance of Mana

Visions of Mana details a whimsical journey through the enchanting world of Mana. Step into the shoes of Val, a curious young man on an epic quest to escort his childhood friend Hina to the sacred Tree of Mana. Along the way, encounter a colorful cast of characters and face off against endearing yet formidable enemies.

Visions of Mana on GeForce NOW
Get ready for mana madness in the cloud.

The game’s combat system blends action and strategy for real-time battles where party members can be switched on the fly. Different party members offer unique skills, such as Val’s powerful sword strikes or magical spells cast by Careena’s dragon companion Ramcoh. Plus, traverse the game’s expansive, semi-open world on Pikuls — adorable, rideable creatures that can also ram through enemies.

Stream the game on an Ultimate or Priority membership for a seamless magical adventure. Experience the lush landscapes and dynamic combat in stunning detail — at up to 4K resolution and 120 frames per second for Ultimate members. GeForce NOW makes it easy to stay connected to the world of Mana whether at home or on the go, ensuring the journey to the Tree of Mana is always within reach.

Join the Galaxy’s Most Wanted

Star Wars Outlaws on GeForce NOW
Make your own destiny in the cloud.

In Star Wars Outlaws — the highly anticipated single-player action-adventure game from Ubisoft — explore the depths of the galaxy’s underworld as part of the beloved Star Wars franchise’s first-ever open-world game, set between the events of The Empire Strikes Back and Return of the Jedi.

Step into the shoes of Kay Vess, a daring scoundrel seeking freedom and adventure. Navigate distinct locations across the galaxy — both iconic and new — and encounter bustling cities, cantinas and sprawling outdoor landscapes. Fight, steal and outwit others while joining the ranks of the galaxy’s most wanted. Plus, play alongside Nix, a loyal companion who helps turn any situation to Kay’s advantage through blaster combat, stealth and clever distractions.

Get ready for high-stakes missions, space dogfights and an ever-changing reputation based on player choices. Unlock the power of the Force with a GeForce NOW Ultimate membership and stream from GeForce RTX 4080 SuperPODs at up to 4K resolution at 120 fps, without the need for upgraded hardware. The AI-powered graphics of NVIDIA DLSS 3.5 with Ray Reconstruction enhance the game for maximum performance, offering the clarity of a Jedi’s vision, and NVIDIA Reflex technology enables unbeatable responsiveness.

Answering the Call

Get ready for the anticipated addition to the Call of Duty franchise — Call of Duty: Black Ops 6. GeForce NOW will support the title’s PC Game Pass, Battle.net and Steam versions in the cloud.

Explore a range of new features, including innovative mechanics such as Omnimovement, which allows players to sprint, slide and dive in any direction to enhance the fluidity of combat. The Black Ops 6 Campaign features a spy action thriller set in the early 90s, a period of transition and upheaval in global politics, characterized by the end of the Cold War and the rise of the United States. With a mind-bending narrative and unbound by the rules of engagement, the title embodies signature Black Ops.

Experience the new gameplay mechanics and features before the title’s official launch with early access to the preorder beta from Aug. 30-Sept. 4 — available for those who preorder the game or have an active PC Game Pass subscription. The Open Beta follows soon after on Sept. 6-9 and will be available for all gamers to hop into the action, even without preordering the game.

GeForce NOW Ultimate members can gain an advantage on the field with ultra low-latency gaming, streaming from GeForce RTX 4080 SuperPODs in the cloud.

WOW, New Games

WoW The World Within on GeForce NOW
Join the underground party from the cloud.

The newest World of Warcraft expansion, The War Within, is available to play from the cloud today. Head to the subterranean realm of Khaz Algar, featuring four new zones, including the Isle of Dorn — home to the Earthen, a newly playable allied race. Experience game additions such as Delves, bite-sized world instances for solo or small-group play, and Warbands, which allow players to manage and share achievements across multiple characters.

In addition, GeForce NOW recently added support for over 25 of the top AddOns from CurseForge, a leading platform for WoW customization, enabling members to explore new adventures under the surface from the cloud.

In addition, members can look for the following:

  • Endzone 2 (New release on Steam, Aug. 26)
  • Age of Mythology: Retold (New release on Steam, Xbox, available on PC Game Pass, Advanced Access on Aug. 27)
  • Core Keeper (New release on Xbox, available on PC Game Pass, Aug. 27)
  • Star Wars Outlaws (New release on Ubisoft Connect, early access Aug. 27)
  • Akimbot (New release on Steam, Aug. 29)
  • Gori: Cuddly Carnage (New release on Steam, Aug. 29)
  • MEMORIAPOLIS (New release on Steam, Aug. 29)
  • Visions of Mana (New release on Steam, Aug. 29)
  • Avatar: Frontiers of Pandora (Steam)
  • Cat Quest III (Steam)
  • Cooking Simulator (Xbox, available on PC Game Pass)
  • Crown Trick (Xbox, available on Microsoft Store)
  • Darksiders Genesis (Xbox, available on Microsoft Store)
  • Expeditions: Rome (Xbox, available on Microsoft Store)
  • Heading Out (Steam)
  • Into the Breach (Xbox, available on Microsoft Store)
  • Iron Harvest (Xbox, available on Microsoft Store)
  • The Knight Witch (Xbox, available on Microsoft Store)
  • Lightyear Frontier (Xbox, available on PC Game Pass)
  • Metro Exodus Enhanced Edition (Xbox, available on Microsoft Store)
  • Outlast 2 (Xbox, available on Microsoft Store)
  • Saturnalia (Steam)
  • SteamWorld Heist II (Steam, Xbox, available on Microsoft Store)
  • This War of Mine (Xbox, available on PC Game Pass)
  • We Were Here Too (Steam)
  • Yoku’s Island Express (Xbox, available on Microsoft Store)

What are you planning to play this weekend? Let us know on X or in the comments below.

 

Read More

NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Debut

NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Debut

As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge, but delivering LLM-powered real-time services is another.

In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform revealed up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf’s biggest LLM workload, Llama 2 70B, thanks to its use of a second-generation Transformer Engine and FP4 Tensor Cores.

The NVIDIA H200 Tensor Core GPU delivered outstanding results on every benchmark in the data center category — including the latest addition to the benchmark, the Mixtral 8x7B mixture of experts (MoE) LLM, which features a total of 46.7 billion parameters, with 12.9 billion parameters active per token.

MoE models have gained popularity as a way to bring more versatility to LLM deployments, as they’re capable of answering a wide variety of questions and performing more diverse tasks in a single deployment. They’re also more efficient since they only activate a few experts per inference — meaning they deliver results much faster than dense models of a similar size.

The continued growth of LLMs is driving the need for more compute to process inference requests. To meet real-time latency requirements for serving today’s LLMs, and to do so for as many users as possible, multi-GPU compute is a must. NVIDIA NVLink and NVSwitch provide high-bandwidth communication between GPUs based on the NVIDIA Hopper architecture and provide significant benefits for real-time, cost-effective large model inference. The Blackwell platform will further extend NVLink Switch’s capabilities with larger NVLink domains with 72 GPUs.

In addition to the NVIDIA submissions, 10 NVIDIA partners — ASUSTek, Cisco, Dell Technologies, Fujitsu, Giga Computing, Hewlett Packard Enterprise (HPE), Juniper Networks, Lenovo, Quanta Cloud Technology and Supermicro — all made solid MLPerf Inference submissions, underscoring the wide availability of NVIDIA platforms.

Relentless Software Innovation

NVIDIA platforms undergo continuous software development, racking up performance and feature improvements on a monthly basis.

In the latest inference round, NVIDIA offerings, including the NVIDIA Hopper architecture, NVIDIA Jetson platform and NVIDIA Triton Inference Server, saw leaps and bounds in performance gains.

The NVIDIA H200 GPU delivered up to 27% more generative AI inference performance over the previous round, underscoring the added value customers get over time from their investment in the NVIDIA platform.

Triton Inference Server, part of the NVIDIA AI platform and available with NVIDIA AI Enterprise software, is a fully featured open-source inference server that helps organizations consolidate framework-specific inference servers into a single, unified platform. This helps lower the total cost of ownership of serving AI models in production and cuts model deployment times from months to minutes.

In this round of MLPerf, Triton Inference Server delivered near-equal performance to NVIDIA’s bare-metal submissions, showing that organizations no longer have to choose between using a feature-rich production-grade AI inference server and achieving peak throughput performance.

Going to the Edge

Deployed at the edge, generative AI models can transform sensor data, such as images and videos, into real-time, actionable insights with strong contextual awareness. The NVIDIA Jetson platform for edge AI and robotics is uniquely capable of running any kind of model locally, including LLMs, vision transformers and Stable Diffusion.

In this round of MLPerf benchmarks, NVIDIA Jetson AGX Orin system-on-modules achieved more than a 6.2x throughput improvement and 2.4x latency improvement over the previous round on the GPT-J  LLM workload. Rather than developing for a specific use case, developers can now use this general-purpose 6-billion-parameter model to seamlessly interface with human language, transforming generative AI at the edge.

Performance Leadership All Around

This round of MLPerf Inference showed the versatility and leading performance of NVIDIA platforms — extending from the data center to the edge — on all of the benchmark’s workloads, supercharging the most innovative AI-powered applications and services. To learn more about these results, see our technical blog.

H200 GPU-powered systems are available today from CoreWeave — the first cloud service provider to announce general availability — and server makers ASUS, Dell Technologies, HPE, QTC and Supermicro.

See notice regarding software product information.

Read More