NVIDIA Unveils Isaac Nova Orin to Accelerate Development of Autonomous Mobile Robots

The next time socks, cereal or sandpaper arrives at your doorstep within hours of ordering, consider the behind-the-scenes logistics acrobatics that get it there so fast.

Order fulfillment is a massive industry of moving parts. Warehouses, heavily supported by autonomous mobile robots (AMRs), can span 1 million square feet, expanding and reconfiguring to meet demand. For hospitals, retailers, airports, manufacturers and others, these facilities are an obstacle course of workers and bottlenecks.

To accelerate development of these AMRs, we’ve introduced Isaac Nova Orin, a state-of-the-art compute and sensor reference platform. It’s built on the powerful new NVIDIA Jetson AGX Orin edge AI system, available today. The platform includes the latest sensor technologies and high-performance AI compute capability.

New Isaac Software Arrives for AMR Ecosystem

In addition to Nova Orin, which will be available later this year, we’re delivering new software and simulation capabilities to accelerate AMR deployments — including hardware-accelerated modules, or Isaac ROS GEMs, that are essential for enabling robots to visually navigate. That’s key for mobile robots to better perceive their environment to safely avoid obstacles and efficiently plan paths.

New simulation capabilities, available in the NVIDIA Isaac Sim April release, will help save time when building virtual environments to test and train AMRs. Using 3D building blocks, developers can rapidly create realistic complex warehouse scenes and configurations to validate the robot’s performance on a breadth of logistics tasks.

Isaac Nova Orin Key Features 

Nova Orin comes with all of the compute and sensor hardware needed to design, build and test autonomy in AMRs.

Its two Jetson AGX Orin units provide up to 550 TOPS of AI compute for perception, navigation and human-machine interaction. These modules process data in real time from the AMR’s central nervous system — essentially the sensor suite comprising up to six cameras, three lidars and eight ultrasonic sensors.

Nova Orin includes tools necessary to simulate the robot in Isaac Sim on Omniverse, as well as support for numerous ROS software modules designed to accelerate perception and navigation tasks. Tools are also provided for accurately mapping the robots’ environment using NVIDIA DeepMap.

The entire platform is calibrated and tested to work out of the box, giving developers valuable time to innovate on new features and capabilities.

3D sensor field of Nova Orin

Enabling the Future

Much is at stake in intralogistics for AMRs, a market expected to top $46 billion by 2030, up from under $8 billion in 2021, according to estimates from ABI Research.

The old method of designing the AMR compute and sensor stack from the ground up is too costly in time and effort. Tapping into an existing platform allows manufacturers to focus on building the right software stack for the right robot application.

Improving productivity for factories and warehouses will depend on AMRs working safely and efficiently side by side at scale. High levels of autonomy, powered by 3D perception from Nova Orin, will help drive that revolution.

As AMRs evolve, the need for secure deployment and management of the critical AI software on board is paramount. Over-the-air software management support is already preintegrated in Nova Orin.

Learn more about Nova Orin and the complete Isaac for AMR platform.

 


Driving on Air: Lucid Group Builds Intelligent EVs on NVIDIA DRIVE

Lucid Group may be a newcomer to the electric vehicle market, but its entrance has been grand.

The electric automaker announced at GTC that its current and future fleets are built on NVIDIA DRIVE Hyperion for programmable, intelligent capabilities. By developing on the scalable, software-defined platform, Lucid ensures its vehicles are always at the cutting edge, receiving continuous improvements over the air.

The EV maker launched its first vehicle, the Lucid Air, late last year to widespread acclaim. The luxury sedan won MotorTrend’s 2022 Car of the Year, with industry-leading battery range and fast charging.

And Lucid isn’t stopping there — the automaker recently announced Project Gravity, a long-range electric SUV slated for launch in 2024.

A defining feature of the Lucid Air is the DreamDrive Pro advanced driver assistance system — standard on Dream Edition and Grand Touring trims, optional on other models — which leverages the high-performance compute of NVIDIA DRIVE to provide a seamless automated driving experience.

Future-Ready Intelligence

DreamDrive Pro is designed to continuously improve via over-the-air software updates, with the scalable and high-performance AI compute of NVIDIA DRIVE at the center of the system.

It uses a rich suite of 14 cameras, one lidar, five radars and 12 ultrasonics for robust automated driving and intelligent cockpit features.

In addition to a diversity of sensors, Lucid’s dual-rail power system and proprietary Ethernet Ring offer a high degree of redundancy for key systems, such as braking and steering.

“The seamless integration of the software-defined NVIDIA DRIVE platform provides a powerful basis for Lucid to further enhance what DreamDrive can do in the future — all of which can be delivered to vehicles over the air,” said Mike Bell, Senior Vice President of Digital, Lucid Group.

Together, Lucid and NVIDIA will support these intelligent vehicles, enhancing the customer experience with new functions throughout the life of the car.

A DreamDrive Come True

Lucid plans to build on its success in deploying industry-leading electric vehicles, continuing to develop on NVIDIA DRIVE for future generations.

By starting with a programmable, high-performance compute architecture in the Lucid Air, the automaker can take advantage of the scalability of NVIDIA DRIVE and always incorporate the latest AI technology as it expands with more models.

The ability to continuously deliver innovative and exciting features to its vehicles will have Lucid customers driving on air.


NVIDIA DRIVE Continues Industry Momentum With $11 Billion Pipeline as DRIVE Orin Enters Production

NVIDIA DRIVE Hyperion and DRIVE Orin are gaining ground in the industry.

At NVIDIA GTC, BYD, the world’s second-largest electric vehicle maker, announced it is building its next-generation fleets on the DRIVE Hyperion architecture. This platform, based on DRIVE Orin, is now in production and powering a wide ecosystem of 25 EV makers building software-defined vehicles on high-performance, energy-efficient AI compute.

A wave of innovative startups also joined the DRIVE Hyperion ecosystem this week, including DeepRoute, Pegasus, UPower and WeRide, while luxury EV maker Lucid Motors announced its automated driving system is built on NVIDIA DRIVE.

Altogether, this growing ecosystem makes up an automotive pipeline that exceeds $11 billion.

The open DRIVE Hyperion 8 platform allows these companies to individualize this programmable architecture to their needs, leveraging end-to-end solutions to accelerate autonomous driving development.

The NVIDIA DRIVE Orin system-on-a-chip achieves up to 254 trillion operations per second (TOPS) and is designed to handle the large number of applications and deep neural networks (DNNs) that run simultaneously in autonomous vehicles, with the ability to achieve systematic safety standards such as ISO 26262 ASIL-D.

Together, DRIVE Hyperion and DRIVE Orin act as the nervous system and brain of the vehicle, processing massive amounts of sensor data in real time to safely perceive, plan and act.

The World Leader in NEVs

New energy vehicles are disrupting the transportation industry. They’re introducing a novel architecture that is purpose-built for software-defined functionality, enabling continuous improvement and exciting business models.

BYD is an NEV pioneer, leveraging its heritage as a rechargeable battery maker to introduce the world’s first plug-in hybrid, the F3, in 2008.

The F3 became China’s best-selling sedan the following year, and since then BYD has continued to push the limits of what’s possible for alternative powertrains, with more than 780,000 BYD electric vehicles in operation.

And now, it’s adding software-defined vehicles to the BYD fleet resume, building its coming generation of EVs on DRIVE Hyperion 8.

These vehicles will feature a programmable compute platform based on DRIVE Orin for intelligent driving and parking.

Even More Intelligent Solutions

In addition to automakers, autonomous driving startups are developing on DRIVE Hyperion to deliver software-defined vehicles.

DeepRoute, a self-driving company building robotaxis, said it is integrating DRIVE Hyperion into its level 4 system. The automotive-grade platform is key to the company’s plans to bring its production-ready vehicles to market next year.

Self-driving startup Pegasus Technology is also developing intelligent driving solutions for taxis, trucks and buses on DRIVE Hyperion to seamlessly operate on complex city roads. Its autonomous system is designed to handle lane changes, busy intersections, roundabouts, highway entrances and exits, and more.

UPower is a startup dedicated to streamlining the EV development process with its Super Board skateboard chassis. This next-generation foundation for electric vehicles will include DRIVE Hyperion for automated and autonomous driving capabilities.

Autonomous driving technology company WeRide has been building self-driving platforms for urban transportation on NVIDIA DRIVE since 2017. During GTC, the startup announced it will develop its coming generation of intelligent driving solutions on DRIVE Hyperion.

As the DRIVE Hyperion ecosystem expands, software-defined transportation will become more prevalent around the world, delivering safer, more efficient driving experiences.


Announcing NVIDIA DRIVE Map: Scalable, Multi-Modal Mapping Engine Accelerates Deployment of Level 3 and Level 4 Autonomy

With a detailed knowledge of the world and everything in it, maps provide the foresight AI uses to make advanced and safe driving decisions.

At his GTC keynote, NVIDIA founder and CEO Jensen Huang introduced NVIDIA DRIVE Map, a multimodal mapping platform designed to enable the highest levels of autonomy while improving safety. It combines the accuracy of DeepMap survey mapping with the freshness and scale of AI-based crowdsourced mapping.

With three localization layers — camera, lidar and radar — DRIVE Map provides the redundancy and versatility required by the most advanced AI drivers.

DRIVE Map will provide survey-level ground truth mapping coverage to 500,000 kilometers of roadway in North America, Europe and Asia by 2024. This map will be continuously updated and expanded with millions of passenger vehicles.

NVIDIA DRIVE Map is available to the entire autonomous vehicle industry.

Multi-Layered 

DRIVE Map contains multiple localization layers of data for use with camera, radar and lidar modalities. The AI driver can localize to each layer of the map independently, providing the diversity and redundancy required for the highest levels of autonomy.

The camera localization layer consists of map attributes such as lane dividers, road markings, road boundaries, traffic lights, signs and poles.

DRIVE Map semantic localization layer

The radar localization layer is an aggregate point cloud of radar returns. It’s particularly useful in poor lighting conditions, which are challenging for cameras, and in poor weather conditions, which are challenging for cameras and lidars.

DRIVE Map radar localization layer

Radar localization is also useful in suburban areas where typical map attributes aren’t available, enabling the AI driver to localize based on surrounding objects that generate a radar return.

The lidar voxel layer provides the most precise and reliable representation of the environment. It builds a 3D representation of the world at 5-centimeter resolution — accuracy impossible to achieve with camera and radar.

DRIVE Map lidar voxel localization layer

Once localized to the map, the AI can use the detailed semantic information provided by the map to plan ahead and safely perform driving decisions.

Best of Both Worlds 

DRIVE Map is built with two map engines — a ground truth survey map engine and a crowdsourced map engine — to gather and maintain a collective memory of an Earth-scale fleet.

This unique approach combines the best of both worlds, achieving centimeter-level accuracy with dedicated survey vehicles, as well as the freshness and scale that can only be achieved with millions of passenger vehicles continuously updating and expanding the map.

The ground truth engine is based on the DeepMap survey map engine — proven technology that has been developed and verified over the past six years.

The AI-based crowdsourced engine gathers map updates from millions of cars, which constantly upload new data to the cloud as they drive. The data is then aggregated at full fidelity in NVIDIA Omniverse and used to update the map, providing the real-world fleet with fresh over-the-air map updates within hours.

DRIVE Map also provides a data interface, DRIVE MapStream, to allow any passenger car that meets the DRIVE Map requirements to continuously update the map using camera, radar and lidar data.

Earth-Scale Digital Twin

In addition to helping the AI make optimal driving decisions, DRIVE Map accelerates AV deployment, from generating ground-truth training data for deep neural networks to testing and validation.

These workflows are centered on Omniverse, where real-world map data is loaded and stored. Omniverse maintains an Earth-scale representation of the digital twin that is continuously updated and expanded by survey map vehicles and millions of passenger vehicles.

Using automated content generation tools built on Omniverse, the detailed map is converted into a drivable simulation environment that can be used with NVIDIA DRIVE Sim. Features such as road elevation, road markings, islands, traffic signals, signs and vertical posts are accurately replicated at centimeter-level accuracy.

With physically based sensor simulation and domain randomization, AV developers can use the simulated environment to generate training scenarios that aren’t available in real data.

AV developers can also apply scenario generation tools to test AV software in digital twin environments before deploying it in the real world. Finally, the digital twin provides fleet operators a complete virtual view of where the vehicles are driving in the world, assisting remote operation when needed.

As a highly versatile and scalable platform, DRIVE Map equips the AI driver with the understanding of the world needed to continuously advance autonomous capabilities.


Introducing NVIDIA DRIVE Hyperion 9: Next-Generation Platform for Software-Defined Autonomous Vehicle Fleets

NVIDIA DRIVE Hyperion is taking software-defined vehicle architectures to the next level.

At his GTC keynote, NVIDIA founder and CEO Jensen Huang announced DRIVE Hyperion 9, the next generation of the open platform for automated and autonomous vehicles. The programmable architecture, slated for 2026 production vehicles, is built on multiple DRIVE Atlan computers to achieve intelligent driving and in-cabin functionality.

DRIVE Hyperion is designed to be compatible across generations, with the same computer form factor and NVIDIA DriveWorks APIs. Partners can leverage current investments in the DRIVE Orin platform and seamlessly migrate to NVIDIA DRIVE Atlan and beyond.

The platform includes the computer architecture, sensor set and full NVIDIA DRIVE Chauffeur and Concierge applications. It is designed to be open and modular, so customers can select what they need. Current-generation systems scale from NCAP to level 3 driving and level 4 parking with advanced AI cockpit capabilities.

Core Compute

DRIVE Hyperion incorporates redundancy into the architecture’s compute.

With the DRIVE Atlan SoC, the next-generation platform will feature more than double the performance of the current DRIVE Orin-based architecture at the same power envelope. This compute is capable of handling level 4 autonomous driving, as well as the convenience and safety features provided by NVIDIA DRIVE Concierge.

DRIVE Atlan is a technical marvel for safe and secure AI computing, fusing all of NVIDIA’s technologies in AI, automotive, robotics, safety and BlueField data centers.

Leveraging NVIDIA’s high-performance GPU architecture, Arm CPU cores and deep learning and computer vision accelerators, it provides ample compute horsepower for redundant and diverse deep neural networks and leaves headroom for developers to continue adding features and improvements.

DRIVE Hyperion is the nervous system of the vehicle, and DRIVE Atlan serves as the brain.

Heightened Sensing

With DRIVE Atlan’s compute performance, DRIVE Hyperion 9 can process even more sensor data as the car drives, improving redundancy and diversity.

This upgraded sensor suite includes surround imaging radar, enhanced cameras with higher frame rates, two additional side lidars and improved undercarriage sensing with better camera and ultrasonic placement.

In total, the DRIVE Hyperion 9 architecture includes 14 cameras, nine radars, three lidars and 20 ultrasonics for automated and autonomous driving, as well as three cameras and one radar for interior occupant sensing.

By incorporating a rich sensor set and high-performance compute, the entire system is architected to the highest levels of functional safety and cybersecurity.

DRIVE Hyperion 9 will begin production in 2026, giving the industry continuous access to the cutting edge in AI technology as it begins to roll out more intelligent transportation.


Siemens Gamesa Taps NVIDIA Digital Twin Platform for Scientific Computing to Accelerate Clean Energy Transition

Siemens Gamesa Renewable Energy is working with NVIDIA to create physics-informed digital twins of wind farms — groups of wind turbines used to produce electricity.

The company has thousands of turbines around the globe that light up schools, homes, hospitals and factories with clean energy. In total they generate over 100 gigawatts of wind power, enough to power nearly 87 million households annually.

Virtual representations of Siemens Gamesa’s wind farms will be built using NVIDIA Omniverse and Modulus, which together comprise NVIDIA’s digital twin platform for scientific computing.

The platform will help Siemens Gamesa achieve quicker calculations to optimize wind farm layouts, which is expected to lead to farms capable of producing up to 20 percent more power than previous designs.

With the global level of annual wind power installations likely to quadruple between 2020 and 2025, it’s more important than ever to maximize the power produced by each turbine.

The global trillion-dollar renewable energy industry is turning to digital twins, like those of Siemens Gamesa’s wind farms — and one of Earth itself — to further climate research and accelerate the clean energy transition.

And the world’s rapid clean energy technology improvements mean that a dollar spent on wind and solar conversion systems today results in 4x more electricity than a dollar spent on the same systems a decade ago. This has tremendous bottom-line implications for the transition towards a greener Earth.

With NVIDIA Modulus, an AI framework for developing physics-informed machine learning models, and Omniverse, a 3D design collaboration and world simulation platform, researchers can now simulate computational fluid dynamics up to 4,000x faster than traditional methods — and view the simulations at high fidelity.

“The collaboration between Siemens Gamesa and NVIDIA has meant a great step forward in accelerating the computational speed and the deployment speed of our latest algorithms development in such a complex field as computational fluid dynamics,” said Sergio Dominguez, onshore digital portfolio manager at Siemens Gamesa.

Maximizing Wind Power

Adding a turbine next to another on a farm can change the wind flow and create wake effects — that is, decreases in downstream wind speed — which lead to a reduction in the farm’s production of electricity.
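
Wake effects can also be approximated analytically. As a point of reference (and not the physics-ML approach described in this article), the sketch below uses the classic Jensen (Park) engineering model to estimate the velocity deficit a downstream turbine sees; the thrust coefficient, wake decay constant and rotor diameter are illustrative assumptions.

```python
import math

def jensen_wake_deficit(x_downstream_m, rotor_diameter_m=120.0,
                        thrust_coefficient=0.8, wake_decay=0.05):
    """Fractional velocity deficit on the wake centerline, per the Jensen (Park)
    model: deficit = (1 - sqrt(1 - Ct)) / (1 + 2 * k * x / D)**2."""
    expansion = 1.0 + 2.0 * wake_decay * x_downstream_m / rotor_diameter_m
    return (1.0 - math.sqrt(1.0 - thrust_coefficient)) / expansion**2

# Wind speed seen by a turbine seven rotor diameters downstream of another,
# given a 10 m/s free-stream wind (all values are illustrative).
free_stream = 10.0
deficit = jensen_wake_deficit(7 * 120.0)
print(f"Downstream wind speed: {free_stream * (1.0 - deficit):.2f} m/s")
```

Because a turbine's power output scales roughly with the cube of wind speed, even a modest deficit like this translates into a significant loss of electricity, which is why layout optimization pays off.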

Omniverse digital twins of wind farms will help Siemens Gamesa to accurately simulate the effect that a turbine might have on another when placed in close proximity.

Using NVIDIA Modulus and physics-ML models running on GPUs, researchers can now run computational fluid dynamics simulations orders of magnitude faster than with traditional methods, like those based on Reynolds-averaged Navier-Stokes equations or large eddy simulations, which can take over a month to run, even on a 100-CPU cluster.

This up to 4,000x speedup allows the rapid and accurate simulation of wake effects.

Analyzing and minimizing potential wake effects in real time, while simultaneously optimizing wind farms for a variety of other wind and weather scenarios, requires hundreds or thousands of iterations and simulation runs, which were traditionally prohibitively time-consuming and costly.

NVIDIA Omniverse and Modulus enable accurate simulations of the complex interactions between the turbines, using high-fidelity and high-resolution models that are based on low-resolution inputs.

Learn more about NVIDIA Omniverse and Modulus at GTC, running through March 24.

Watch NVIDIA founder and CEO Jensen Huang’s GTC keynote address.


NVIDIA Unveils Onramp to Hybrid Quantum Computing

We’re working with leaders in quantum computing to build the tools developers will need to program tomorrow’s ultrahigh performance systems.

Today’s high-performance computers are simulating quantum computing jobs at scale and with performance far beyond what’s possible on today’s smaller and error-prone quantum systems. In this way, classical HPC systems are helping quantum researchers chart the right path forward.

As quantum computers improve, researchers share a vision of a hybrid computing model where quantum and classical computers work together, each addressing the challenges they’re best suited to. To be broadly useful, these systems will need a unified programming environment that’s efficient and easy to use.

We’re building this onramp to the future of computing today. Starting with commercially available tools, like NVIDIA cuQuantum, we’re collaborating with IBM, Oak Ridge National Laboratory, Pasqal and many others.

A Common Software Layer

As a first step, we’re developing a new quantum compiler. Called nvq++, it targets the Quantum Intermediate Representation (QIR), a specification of a low-level machine language that quantum and classical computers can use to talk to each other.

Researchers at Oak Ridge National Laboratory, Quantinuum, Quantum Circuits Inc., and others have embraced the QIR Alliance, led by the Linux Foundation. It enables an agnostic programming approach that will deliver the best from both quantum and classical computers.

Researchers at the Oak Ridge National Laboratory will be among the first to use this new software.

Ultimately, we believe the HPC community will embrace this unified programming model for hybrid systems.

Ready-to-Use Quantum Tools

You don’t have to wait for hybrid quantum systems. Any developer can start world-class quantum research today using accelerated computing and our tools.

NVIDIA cuQuantum is now in general release. It runs complex quantum circuit simulations with libraries for tensor networks and state vectors.

And our cuQuantum DGX Appliance, a container with all the components needed to run cuQuantum jobs optimized for NVIDIA DGX A100 systems, is available in beta release.

Researchers are already using these products to tackle real-world challenges.

For example, QC Ware is running quantum chemistry and quantum machine learning algorithms using cuQuantum on the Perlmutter supercomputer at the Lawrence Berkeley National Laboratory. The work aims to advance drug discovery and climate science.

An Expanding Quantum Ecosystem

Our quantum products are supported by an expanding ecosystem of companies.

For example, Xanadu has integrated cuQuantum into PennyLane, an open-source framework for quantum machine learning and quantum chemistry. The Oak Ridge National Lab is using cuQuantum in TNQVM, a framework for tensor network quantum circuit simulations.
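
From the user’s side, that integration means familiar quantum frameworks can target GPU-backed simulators with a one-line device change. The sketch below is a minimal PennyLane example assuming the pennylane-lightning-gpu plugin (whose "lightning.gpu" device is backed by cuQuantum’s cuStateVec) is installed; the toy circuit itself is an illustration, not code from either project.

```python
import pennylane as qml
from pennylane import numpy as np

# GPU-accelerated state-vector simulator; assumes the pennylane-lightning-gpu
# plugin (built on cuQuantum's cuStateVec) is installed alongside PennyLane.
dev = qml.device("lightning.gpu", wires=2)

@qml.qnode(dev)
def circuit(theta):
    qml.RY(theta, wires=0)      # parameterized single-qubit rotation
    qml.CNOT(wires=[0, 1])      # entangle the two qubits
    return qml.expval(qml.PauliZ(1))

print(circuit(np.pi / 4))       # expectation value of Z on the second qubit
```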

In addition, other companies now support cuQuantum in their commercially available quantum simulators and frameworks, such as the Classiq Quantum Algorithm Design platform from Classiq, and Orquestra from Zapata Computing.

They join existing collaborators including Google Quantum AI, IBM, IonQ and Pasqal, that announced support for our software in November.

Learn More at GTC

Register free for this week’s GTC to hear QC Ware discuss its research on quantum chemistry.

It’s among at least ten sessions on quantum computing at GTC. And to get the big picture, watch NVIDIA CEO Jensen Huang’s GTC keynote here.


Speed Dialer: How AT&T Rings Up New Opportunities With Data Science

AT&T’s wireless network connects more than 100 million subscribers from the Aleutian Islands to the Florida Keys, spawning a big data sea.

Abhay Dabholkar runs a research group that acts like a lighthouse on the lookout for the best tools to navigate it.

“It’s fun, we get to play with new tools that can make a difference for AT&T’s day-to-day work, and when we give staff the latest and greatest tools it adds to their job satisfaction,” said Dabholkar, a distinguished AI architect who’s been with the company more than a decade.

Recently, the team tested the NVIDIA RAPIDS Accelerator for Apache Spark on GPU-powered servers. The software lets Spark offload data processing work to GPUs across the nodes of a cluster.

It processed a month’s worth of mobile data — 2.8 trillion rows of information — in just five hours. That’s 3.3x faster at 60 percent lower cost than any prior test.

A Wow Moment

“It was a wow moment because on CPU clusters it takes more than 48 hours to process just seven days of data — in the past, we had the data but couldn’t use it because it took such a long time to process it,” he said.

Specifically, the test benchmarked what’s called ETL, the extract, transform and load process that cleans up data before it can be used to train the AI models that uncover fresh insights.

“Now we’re thinking GPUs can be used for ETL and all sorts of batch-processing workloads we do in Spark, so we’re exploring other RAPIDS libraries to extend work from feature engineering to ETL and machine learning,” he said.

Today, AT&T runs ETL on CPU servers, then moves data to GPU servers for training. Doing everything in one GPU pipeline can save time and cost, he added.

Pleasing Customers, Speeding Network Design

The savings could show up across a wide variety of use cases.

For example, users could find out more quickly where they get optimal connections, improving customer satisfaction and reducing churn. “We could decide parameters for our 5G towers and antennas more quickly, too,” he said.

Identifying what area in the AT&T fiber footprint to roll out a support truck can require time-consuming geospatial calculations, something RAPIDS and GPUs could accelerate, said Chris Vo, a senior member of the team who supervised the RAPIDS tests.

“We probably get 300-400 terabytes of fresh data a day, so this technology can have incredible impact — reports we generate over two or three weeks could be done in a few hours,” Dabholkar said.

Three Use Cases and Counting

The researchers are sharing their results with members of AT&T’s data platform team.

“We recommend that if a job is taking too long and you have a lot of data, turn on GPUs — with Spark, the same code that runs on CPUs runs on GPUs,” he said.
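
As a rough sketch of what that looks like in practice, the PySpark session below enables the RAPIDS Accelerator plugin through configuration alone, leaving the DataFrame code untouched; it assumes a cluster that already has GPUs and the accelerator jar available, and the paths, resource amounts and column names are hypothetical.

```python
from pyspark.sql import SparkSession

# The DataFrame code below is the same code that runs on CPUs; only the session
# configuration enables the RAPIDS Accelerator so supported operations run on GPUs.
spark = (
    SparkSession.builder
    .appName("gpu-etl-sketch")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")   # RAPIDS Accelerator plugin
    .config("spark.rapids.sql.enabled", "true")
    .config("spark.executor.resource.gpu.amount", "1")       # one GPU per executor
    .getOrCreate()
)

df = spark.read.parquet("/data/mobile_events")               # hypothetical input path
daily = df.groupBy("cell_id", "event_date").count()          # hypothetical columns
daily.write.mode("overwrite").parquet("/data/daily_counts")  # hypothetical output path
```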

So far, separate teams have found their own gains across three different use cases; other teams have plans to run tests on their workloads, too.

Dabholkar is optimistic business units will take their test results to production systems.

“We are a telecom company with all sorts of datasets processing petabytes of data daily, and this can significantly improve our savings,” he said.

Other users, including the U.S. Internal Revenue Service, are on a similar journey. It’s a path many will take, given that Apache Spark is used by more than 13,000 companies, including 80 percent of the Fortune 500.

Register free for GTC to hear AT&T’s Chris Vo talk about his work, learn more about data science at these sessions and hear NVIDIA CEO Jensen Huang’s keynote.


NVIDIA Hopper GPU Architecture Accelerates Dynamic Programming Up to 40x Using New DPX Instructions

The NVIDIA Hopper GPU architecture unveiled today at GTC will accelerate dynamic programming — a problem-solving technique used in algorithms for genomics, quantum computing, route optimization and more — by up to 40x with new DPX instructions.

An instruction set built into NVIDIA H100 GPUs, DPX will help developers write code to achieve speedups on dynamic programming algorithms in multiple industries, boosting workflows for disease diagnosis, quantum simulation, graph analytics and routing optimizations.

What Is Dynamic Programming? 

Developed in the 1950s, dynamic programming is a popular approach for solving complex problems with two key techniques: recursion and memoization.

Recursion involves breaking a problem down into simpler sub-problems, saving time and computational effort. In memoization, the answers to these sub-problems — which are reused several times when solving the main problem — are stored. Memoization increases efficiency because sub-problems don’t need to be recomputed when they come up again later in the main problem.
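
As a toy illustration of those two ideas (not one of the algorithms DPX targets), the sketch below counts the ways to climb a staircase taking one or two steps at a time: the function recurses on smaller sub-problems, and a cache memoizes each answer so it is computed only once.

```python
from functools import lru_cache

@lru_cache(maxsize=None)          # memoization: each sub-problem is solved only once
def ways_to_climb(steps: int) -> int:
    """Number of ways to climb a staircase taking 1 or 2 steps at a time."""
    if steps <= 1:
        return 1
    # Recursion: the answer is assembled from two simpler sub-problems.
    return ways_to_climb(steps - 1) + ways_to_climb(steps - 2)

print(ways_to_climb(40))          # returns instantly thanks to the cache
```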

DPX instructions accelerate dynamic programming algorithms by up to 7x on an NVIDIA H100 GPU, compared with NVIDIA Ampere architecture-based GPUs. In a node with four NVIDIA H100 GPUs, that acceleration can be boosted even further.

Use Cases Span Healthcare, Robotics, Quantum Computing, Data Science

Dynamic programming is commonly used in many optimization, data processing and omics algorithms. To date, most developers have run these kinds of algorithms on CPUs or FPGAs — but can unlock dramatic speedups using DPX instructions on NVIDIA Hopper GPUs.

Omics 

Omics covers a range of biological fields including genomics (focused on DNA), proteomics (focused on proteins) and transcriptomics (focused on RNA). These fields, which inform the critical work of disease research and drug discovery, all rely on algorithmic analyses that can be sped up with DPX instructions.

For example, the Smith-Waterman and Needleman-Wunsch dynamic programming algorithms are used for DNA sequence alignment, protein classification and protein folding. Both use a scoring method to measure how well genetic sequences from different samples align.

Smith-Waterman produces highly accurate results, but takes more compute resources and time than other alignment methods. By using DPX instructions on a node with four NVIDIA H100 GPUs, scientists can speed this process 35x to achieve real-time processing, where the work of base calling and alignment takes place at the same rate as DNA sequencing.

This acceleration will help democratize genomic analysis in hospitals worldwide, bringing scientists closer to providing patients with personalized medicine.
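
For a concrete sense of the scoring recurrence behind these alignment algorithms, here is a minimal pure-Python Smith-Waterman local-alignment score with a linear gap penalty; the scoring values are illustrative, and production implementations (GPU-accelerated or otherwise) are far more elaborate.

```python
def smith_waterman_score(a: str, b: str, match=3, mismatch=-3, gap=-2) -> int:
    """Best local-alignment score between sequences a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]        # dynamic programming score matrix
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

print(smith_waterman_score("GATTACA", "GCATGCU"))
```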

Route Optimization

Finding the optimal route for multiple moving pieces is essential for autonomous robots moving through a dynamic warehouse, or even a sender transferring data to multiple receivers in a computer network.

To tackle this optimization problem, developers rely on Floyd-Warshall, a dynamic programming algorithm used to find the shortest distances between all pairs of destinations in a map or graph. In a server with four NVIDIA H100 GPUs, Floyd-Warshall acceleration is boosted 40x compared to a traditional dual-socket CPU-only server.
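
For reference, here is the Floyd-Warshall recurrence in a minimal CPU-side Python sketch; the four-node graph and its travel times are made up, and the GPU-accelerated versions described above operate on vastly larger graphs.

```python
INF = float("inf")

def floyd_warshall(dist):
    """All-pairs shortest paths; dist is an n x n matrix of edge weights (INF = no edge)."""
    n = len(dist)
    d = [row[:] for row in dist]                  # work on a copy of the input
    for k in range(n):                            # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Four destinations with illustrative travel times.
graph = [
    [0,   5,   INF, 10 ],
    [INF, 0,   3,   INF],
    [INF, INF, 0,   1  ],
    [INF, INF, INF, 0  ],
]
print(floyd_warshall(graph)[0][3])  # shortest route from node 0 to node 3: 9
```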

Paired with the NVIDIA cuOpt AI logistics software, this speedup in routing optimization could be used for real-time applications in factories, autonomous vehicles, or mapping and routing algorithms in abstract graphs.

Quantum Simulation

Countless other dynamic programming algorithms could be accelerated on NVIDIA H100 GPUs with DPX instructions. One promising field is quantum computing, where dynamic programming is used in tensor optimization algorithms for quantum simulation. DPX instructions could help developers accelerate the process of identifying the right tensor contraction order.

SQL Query Optimization

Another potential application is in data science. Data scientists working with the SQL programming language often need to perform several “join” operations on a set of tables.  Dynamic programming helps find an optimal order for these joins, often saving orders of magnitude in execution time and thus speeding up SQL queries.
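
To make that idea concrete, the sketch below runs a simplified bitmask dynamic program over join orders, minimizing the estimated number of intermediate rows materialized; real query optimizers use far richer cost models, and the table cardinalities and selectivity here are made up.

```python
from math import prod
from functools import lru_cache

# Hypothetical base-table row counts and a single per-join selectivity factor.
TABLE_ROWS = [1_000_000, 50_000, 200, 10_000]
SELECTIVITY = 0.001

def result_rows(mask: int) -> float:
    """Estimated cardinality of joining the tables selected by the bitmask."""
    tables = [TABLE_ROWS[i] for i in range(len(TABLE_ROWS)) if mask & (1 << i)]
    return prod(tables) * SELECTIVITY ** (len(tables) - 1)

@lru_cache(maxsize=None)
def best_cost(mask: int) -> float:
    """Minimum total intermediate rows materialized to join the tables in `mask`."""
    if mask & (mask - 1) == 0:                   # a single table needs no join
        return 0.0
    best = float("inf")
    sub = (mask - 1) & mask
    while sub:                                   # enumerate proper non-empty subsets
        other = mask ^ sub
        cost = best_cost(sub) + best_cost(other) + result_rows(mask)
        best = min(best, cost)
        sub = (sub - 1) & mask
    return best

full = (1 << len(TABLE_ROWS)) - 1
print(f"Estimated best plan cost: {best_cost(full):,.0f} rows materialized")
```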

Learn more about the NVIDIA Hopper GPU architecture. Register free for GTC, running online through March 24. And watch the replay of NVIDIA founder and CEO Jensen Huang’s keynote address.


H100 Transformer Engine Supercharges AI Training, Delivering Up to 6x Higher Performance Without Losing Accuracy

The largest AI models can require months to train on today’s computing platforms. That’s too slow for businesses.

AI, high performance computing and data analytics are growing in complexity with some models, like large language ones, reaching trillions of parameters.

The NVIDIA Hopper architecture is built from the ground up to accelerate these next-generation AI workloads with massive compute power and fast memory to handle growing networks and datasets.

Transformer Engine, part of the new Hopper architecture, will significantly speed up AI performance and capabilities, and help train large models within days or hours.

Training AI Models With Transformer Engine

Transformer models are the backbone of language models used widely today, such as BERT and GPT-3. Initially developed for natural language processing use cases, their versatility is increasingly being applied to computer vision, drug discovery and more.

However, model size continues to increase exponentially, now reaching trillions of parameters. This is causing training times to stretch into months due to huge amounts of computation, which is impractical for business needs.

Transformer Engine uses 16-bit floating-point precision and a newly added 8-bit floating-point data format combined with advanced software algorithms that will further speed up AI performance and capabilities.

AI training relies on floating-point numbers, which have fractional components, like 3.14. Introduced with the NVIDIA Ampere architecture, the TensorFloat32 (TF32) floating-point format is now the default 32-bit format in the TensorFlow and PyTorch frameworks.

Most AI floating-point math is done using 16-bit “half” precision (FP16), 32-bit “single” precision (FP32) and, for specialized operations, 64-bit “double” precision (FP64). By reducing the math to just eight bits, Transformer Engine makes it possible to train larger networks faster.
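
FP8 training with Transformer Engine is delivered through NVIDIA’s software stack rather than hand-written kernels, but the general pattern of reduced-precision training is already familiar from FP16. The sketch below is ordinary PyTorch automatic mixed precision (FP16 compute with an FP32 safety net), shown only to illustrate that pattern; it is not the FP8 Transformer Engine path, and the toy model and data are placeholders.

```python
import torch
from torch import nn

# Standard FP16/FP32 automatic mixed precision in PyTorch, shown only to
# illustrate reduced-precision training; the model and data are toy placeholders.
device = "cuda"
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()            # rescales gradients to avoid FP16 underflow

for step in range(10):
    x = torch.randn(32, 1024, device=device)
    target = torch.randn(32, 1024, device=device)

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():             # matmuls run in FP16, reductions stay FP32
        loss = nn.functional.mse_loss(model(x), target)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```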

When coupled with other new features in the Hopper architecture — like the NVLink Switch system, which provides a direct high-speed interconnect between nodes — H100-accelerated server clusters will be able to train enormous networks that were nearly impossible to train at the speed necessary for enterprises.

Diving Deeper Into Transformer Engine

Transformer Engine uses software and custom NVIDIA Hopper Tensor Core technology designed to accelerate training for models built from the prevalent AI model building block, the transformer. These Tensor Cores can apply mixed FP8 and FP16 formats to dramatically accelerate AI calculations for transformers. Tensor Core operations in FP8 have twice the throughput of 16-bit operations.

The challenge for models is to intelligently manage the precision to maintain accuracy while gaining the performance of smaller, faster numerical formats. Transformer Engine enables this with custom, NVIDIA-tuned heuristics that dynamically choose between FP8 and FP16 calculations and automatically handle re-casting and scaling between these precisions in each layer.

Transformer Engine uses per-layer statistical analysis to determine the optimal precision (FP16 or FP8) for each layer of a model, achieving the best performance while preserving model accuracy.

The NVIDIA Hopper architecture also advances fourth-generation Tensor Cores by tripling the floating-point operations per second compared with prior-generation TF32, FP64, FP16 and INT8 precisions. Combined with Transformer Engine and fourth-generation NVLink, Hopper Tensor Cores enable an order-of-magnitude speedup for HPC and AI workloads.

Revving Up Transformer Engine

Much of the cutting-edge work in AI revolves around large language models like Megatron 530B. The chart below shows the growth of model size in recent years, a trend that is widely expected to continue. Many researchers are already working on trillion-plus parameter models for natural language understanding and other applications, showing an unrelenting appetite for AI compute power.

Growth in natural language understanding models continues at a vigorous pace. Source: Microsoft.

Meeting the demand of these growing models requires a combination of computational power and a ton of high-speed memory. The NVIDIA H100 Tensor Core GPU delivers on both fronts, with the speedups made possible by Transformer Engine to take AI training to the next level.

When combined, these innovations deliver higher throughput and a 9x reduction in time to train, from seven days to just 20 hours:

NVIDIA H100 Tensor Core GPU delivers up to 9x more training throughput compared to previous generation, making it possible to train large models in reasonable amounts of time.

Transformer Engine can also be used for inference without any data format conversions. Previously, INT8 was the go-to precision for optimal inference performance. However, it requires that the trained networks be converted to INT8 as part of the optimization process, something the NVIDIA TensorRT inference optimizer makes easy.

Using models trained with FP8 will allow developers to skip this conversion step altogether and do inference operations using that same precision. And like INT8-formatted networks, deployments using Transformer Engine can run in a much smaller memory footprint.

On Megatron 530B, NVIDIA H100 inference per-GPU throughput is up to 30x higher than NVIDIA A100, with a 1-second response latency, showcasing it as the optimal platform for AI deployments:

Transformer Engine will also increase inference throughput by as much as 30x for low-latency applications.

To learn more about NVIDIA H100 GPU and the Hopper architecture, watch the GTC 2022 keynote from Jensen Huang. Register for GTC 2022 for free to attend sessions with NVIDIA and industry leaders.
