Vulkan Fan? Six Reasons to Run It on NVIDIA

Many different platforms, same great performance. That’s why Vulkan is a very big deal.

With the release Tuesday of Vulkan 1.3, NVIDIA continues its unparalleled record of day one driver support for this cross-platform GPU application programming interface for 3D graphics and computing.

Vulkan was created by experts from across the industry working together at the Khronos Group, an open standards consortium. From the start, NVIDIA has worked to advance this effort. NVIDIA’s Neil Trevett has been Khronos president since its earliest days.

“NVIDIA has consistently been at the forefront of computer graphics with new, enhanced tools, and technologies for developers to create rich game experiences,” said Jon Peddie, president of Jon Peddie Research.

“Their guidance and support for Vulkan 1.3 development, and release of a new compatible driver on day one across NVIDIA GPUs contributes to the successful cross-platform functionality and performance for games and apps this new API will bring,” he said.

With a simpler, thinner driver and efficient CPU multi-threading capabilities, Vulkan has less latency and overhead than alternatives, such as OpenGL or older versions of Direct3D.

If you use Vulkan, NVIDIA GPUs are a no-brainer. Here’s why:

  1. NVIDIA consistently provides industry leadership to evolve new Vulkan functionality and is often the first to make leading-edge computer graphics techniques available to developers. This ensures cutting-edge titles are supported on Vulkan and, by extension, made available to more gamers.
  2. NVIDIA designs hardware to provide the fastest Vulkan performance for your games and applications. For example, NVIDIA GPUs perform over 30 percent faster than the nearest competition on games such as Doom Eternal with advanced rendering techniques such as ray tracing.
  3. NVIDIA provides the broadest range of Vulkan functionality to ensure you can run the games and apps that you want and need. NVIDIA’s production drivers support advanced features such as ray-tracing and DLSS AI rendering across multiple platforms, including Windows and popular Linux distributions like Ubuntu, Kylin and RHEL.
  4. NVIDIA works hard to be the platform of choice for Vulkan development with tools that are often the first to support the latest Vulkan functionality, encouraging apps and games to be optimized first for NVIDIA. NVIDIA Nsight, our suite of development tools, has integrated support for Vulkan, including debugging and optimizing of applications using full ray-tracing functionality. NVIDIA also provides extensive Vulkan code samples, tutorials and best practice guidance so developers can get the very best performance from their code.
  5. NVIDIA makes Vulkan available across a wider range of platforms and hardware than anyone else for easier cross-platform portability. NVIDIA ships Vulkan on PCs, embedded platforms, automotive and the data center. And gamers enjoy ongoing support of the latest Vulkan API changes with older GPUs.
  6. NVIDIA aims to bulletproof your games with highly reliable game-ready drivers. NVIDIA treats Vulkan as a first-class citizen API with focused development and support. In fact, developers can download our zero-day Vulkan 1.3 drivers right now at https://developer.nvidia.com/vulkan-driver.

Look for more details about our commitment and leadership in Vulkan on NVIDIA’s Vulkan web page. And if you’re not already a member of NVIDIA’s Developer Program, sign up. Developers can download new tools and drivers from NVIDIA for Vulkan 1.3 today. 

The post Vulkan Fan? Six Reasons to Run It on NVIDIA appeared first on The Official NVIDIA Blog.


Accurate Alpha Matting for Portrait Mode Selfies on Pixel 6

Image matting is the process of extracting a precise alpha matte that separates foreground and background objects in an image. This technique has been traditionally used in the filmmaking and photography industry for image and video editing purposes, e.g., background replacement, synthetic bokeh and other visual effects. Image matting assumes that an image is a composite of foreground and background images, and hence, the intensity of each pixel is a linear combination of the foreground and the background.
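The linear-combination assumption above is commonly written per pixel as I = alpha * F + (1 - alpha) * B. The following is a minimal Python sketch of that relationship, assuming grayscale intensities in [0, 1] stored as nested lists; the function name `composite` and the sample values are illustrative and do not come from the original post.

```python
def composite(foreground, background, alpha):
    """Blend two same-sized grayscale images with a per-pixel alpha matte.

    Each output pixel follows I = alpha * F + (1 - alpha) * B.
    """
    return [
        [a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha)
    ]


# Hypothetical 1x3 image: an opaque subject pixel, a half-transparent
# hair pixel, and a pure background pixel.
foreground = [[0.8, 0.8, 0.8]]
background = [[0.2, 0.2, 0.2]]
alpha = [[1.0, 0.5, 0.0]]
image = composite(foreground, background, alpha)
```

A binary segmentation mask is the special case in which every alpha value is exactly 0 or 1, which is why it cannot represent the half-transparent middle pixel.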

In the case of traditional image segmentation, the image is segmented in a binary manner, in which a pixel either belongs to the foreground or background. This type of segmentation, however, is unable to deal with natural scenes that contain fine details, e.g., hair and fur, which require estimating a transparency value for each pixel of the foreground object.

Alpha mattes, unlike segmentation masks, are usually extremely precise, preserving strand-level hair details and accurate foreground boundaries. While recent deep learning techniques have shown their potential in image matting, many challenges remain, such as generating accurate ground truth alpha mattes, improving generalization on in-the-wild images and running inference on high-resolution images on mobile devices.

With the Pixel 6, we have significantly improved the appearance of selfies taken in Portrait Mode by introducing a new approach to estimate a high-resolution and accurate alpha matte from a selfie image. When synthesizing the depth-of-field effect, the usage of the alpha matte allows us to extract a more accurate silhouette of the photographed subject and have a better foreground-background separation. This allows users with a wide variety of hairstyles to take great-looking Portrait Mode shots using the selfie camera. In this post, we describe the technology we used to achieve this improvement and discuss how we tackled the challenges mentioned above.

Portrait Mode effect on a selfie shot using a low-resolution and coarse alpha matte compared to using the new high-quality alpha matte.

Portrait Matting
In designing Portrait Matting, we trained a fully convolutional neural network consisting of a sequence of encoder-decoder blocks to progressively estimate a high-quality alpha matte. We concatenate the input RGB image with a coarse alpha matte (generated using a low-resolution person segmenter) and pass the result as input to the network. The new Portrait Matting model first uses a MobileNetV3 backbone and a shallow decoder (i.e., one with few layers), operating on a low-resolution image, to predict a refined low-resolution alpha matte. Then we use a shallow encoder-decoder and a series of residual blocks to process a high-resolution image and the refined alpha matte from the previous step. The shallow encoder-decoder relies more on lower-level features than the previous MobileNetV3 backbone, focusing on high-resolution structural features to predict final transparency values for each pixel. In this way, the model is able to refine an initial foreground alpha matte and accurately extract very fine details like hair strands. The proposed neural network architecture runs efficiently on Pixel 6 using TensorFlow Lite.

The network predicts a high-quality alpha matte from a color image and an initial coarse alpha matte. We use a MobileNetV3 backbone and a shallow decoder to first predict a refined low-resolution alpha matte. Then we use a shallow encoder-decoder and a series of residual blocks to further refine the initially estimated alpha matte.

Most recent deep learning work for image matting relies on manually annotated per-pixel alpha mattes, generated with image editing tools or green screens, to separate the foreground from the background. This process is tedious and does not scale for the generation of large datasets. Also, it often produces inaccurate alpha mattes and foreground images that are contaminated (e.g., by reflected light from the background, or “green spill”). Moreover, this does nothing to ensure that the lighting on the subject appears consistent with the lighting in the new background environment.

To address these challenges, Portrait Matting is trained using a high-quality dataset generated using a custom volumetric capture system, Light Stage. Compared with previous datasets, this is more realistic, as relighting allows the illumination of the foreground subject to match the background. Additionally, we supervise the training of the model using pseudo–ground truth alpha mattes from in-the-wild images to improve model generalization, explained below. This ground truth data generation process is one of the key components of this work.

Ground Truth Data Generation
To generate accurate ground truth data, Light Stage produces near-photorealistic models of people using a geodesic sphere outfitted with 331 custom color LED lights, an array of high-resolution cameras, and a set of custom high-resolution depth sensors. Together with Light Stage data, we compute accurate alpha mattes using time-multiplexed lights and a previously recorded “clean plate”. This technique is also known as ratio matting.

This method works by recording an image of the subject silhouetted against an illuminated background as one of the lighting conditions. In addition, we capture a clean plate of the illuminated background. The silhouetted image, divided by the clean plate image, provides a ground truth alpha matte.
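The division step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject is taken to contribute no light of its own in the silhouette frame, so the ratio of silhouette to clean plate measures how much of the backlight shows through, and the sketch takes alpha as one minus that ratio. The complement is an interpretation of the convention rather than something the post spells out, and the function name `ratio_matte`, the `eps` guard, and the sample values are hypothetical.

```python
def ratio_matte(silhouette, clean_plate, eps=1e-6):
    """Estimate alpha from a backlit silhouette frame and a clean plate.

    Assumes silhouette = (1 - alpha) * clean_plate at every pixel,
    i.e. the subject itself is unlit in the silhouette frame.
    """
    matte = []
    for s_row, c_row in zip(silhouette, clean_plate):
        row = []
        for s, c in zip(s_row, c_row):
            transmittance = s / max(c, eps)  # fraction of backlight visible
            alpha = 1.0 - transmittance
            row.append(min(1.0, max(0.0, alpha)))  # clamp against noise
        matte.append(row)
    return matte


# Hypothetical 1x3 capture: fully blocked, half blocked, unobstructed.
clean_plate = [[0.9, 0.9, 0.9]]
silhouette = [[0.0, 0.45, 0.9]]
matte = ratio_matte(silhouette, clean_plate)
```

Because both frames see the same illuminated background, per-pixel variations in backlight brightness cancel in the ratio, which is what makes the clean plate useful.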

Then, we extrapolate the recorded alpha mattes to all the camera viewpoints in Light Stage using a deep learning–based matting network that leverages captured clean plates as an input. This approach allows us to extend the alpha mattes computation to unconstrained backgrounds without the need for specialized time-multiplexed lighting or a clean background. This deep learning architecture was solely trained using ground truth mattes generated using the ratio matting approach.

Computed alpha mattes from all camera viewpoints at the Light Stage.

Leveraging the reflectance field for each subject and the alpha matte generated with our ground truth matte generation system, we can relight each portrait using a given HDR lighting environment. We composite these relit subjects into backgrounds corresponding to the target illumination following the alpha blending equation. The background images are then generated from the HDR panoramas by positioning a virtual camera at the center and ray-tracing into the panorama from the camera’s center of projection. We ensure that the projected view into the panorama matches its orientation as used for relighting. We use virtual cameras with different focal lengths to simulate the different fields-of-view of consumer cameras. This pipeline produces realistic composites by handling matting, relighting, and compositing in one system, which we then use to train the Portrait Matting model.

Composited images on different backgrounds (high-resolution HDR maps) using ground truth generated alpha mattes.

Training Supervision Using In-the-Wild Portraits
To bridge the gap between portraits generated using Light Stage and in-the-wild portraits, we created a pipeline to automatically annotate in-the-wild photos generating pseudo–ground truth alpha mattes. For this purpose, we leveraged the Deep Matting model proposed in Total Relighting to create an ensemble of models that computes multiple high-resolution alpha mattes from in-the-wild images. We ran this pipeline on an extensive dataset of portrait photos captured in-house using Pixel phones. Additionally, during this process we performed test-time augmentation by doing inference on input images at different scales and rotations, and finally aggregating per-pixel alpha values across all estimated alpha mattes.
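The ensemble and test-time augmentation steps can be illustrated with a small Python sketch. Several details here are assumptions made for brevity: a horizontal flip stands in for the scale and rotation augmentations, the per-pixel median is one plausible aggregation rule (the post does not say which rule was used), and the two toy models are placeholders for real matting networks.

```python
from statistics import median


def hflip(image):
    """Mirror an image (nested lists) left to right."""
    return [row[::-1] for row in image]


def aggregate(mattes):
    """Per-pixel median across same-sized alpha mattes (assumed rule)."""
    height, width = len(mattes[0]), len(mattes[0][0])
    return [
        [median(m[y][x] for m in mattes) for x in range(width)]
        for y in range(height)
    ]


def pseudo_ground_truth(models, image):
    """Run every model on the image and its mirror, then merge the results."""
    candidates = []
    for model in models:
        candidates.append(model(image))
        # Predict on the flipped input, then flip back to input coordinates.
        candidates.append(hflip(model(hflip(image))))
    return aggregate(candidates)


# Toy stand-ins for matting models: one is exact, one drifts to the right.
def exact_model(image):
    return [list(row) for row in image]


def biased_model(image):
    return [[min(1.0, p + 0.1 * x) for x, p in enumerate(row)] for row in image]


image = [[0.2, 0.2]]
matte = pseudo_ground_truth([exact_model, biased_model], image)
```

In this toy case the median discards the biased model's outlier at each pixel, which hints at why aggregating many augmented predictions can be more reliable than trusting any single one.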

Generated alpha mattes are visually evaluated with respect to the input RGB image. The alpha mattes that are perceptually correct, i.e., following the subject’s silhouette and fine details (e.g., hair), are added to the training set. During training, both datasets are sampled using different weights. Using the proposed supervision strategy exposes the model to a larger variety of scenes and human poses, improving its predictions on photos in the wild (model generalization).

Estimated pseudo–ground truth alpha mattes using an ensemble of Deep Matting models and test-time augmentation.

Portrait Mode Selfies
The Portrait Mode effect is particularly sensitive to errors around the subject boundary (see image below). For example, errors caused by the usage of a coarse alpha matte keep sharp focus on background regions near the subject boundaries or hair area. The usage of a high-quality alpha matte allows us to extract a more accurate silhouette of the photographed subject and improve foreground-background separation.

Try It Out Yourself
We have made front-facing camera Portrait Mode on the Pixel 6 better by improving alpha matte quality, which results in fewer errors in the final rendered image and a better-looking blurred background around the hair region and subject boundary. Additionally, our ML model uses diverse training datasets that cover a wide variety of skin tones and hair styles. You can try this improved version of Portrait Mode by taking a selfie shot with the new Pixel 6 phones.

Portrait Mode effect on a selfie shot using a coarse alpha matte compared to using the new high quality alpha matte.

Acknowledgments
This work wouldn’t have been possible without Sergio Orts Escolano, Jana Ehmann, Sean Fanello, Christoph Rhemann, Junlan Yang, Andy Hsu, Hossam Isack, Rohit Pandey, David Aguilar, Yi Jinn, Christian Hane, Jay Busch, Cynthia Herrera, Matt Whalen, Philip Davidson, Jonathan Taylor, Peter Lincoln, Geoff Harvey, Nisha Masharani, Alexander Schiffhauer, Chloe LeGendre, Paul Debevec, Sofien Bouaziz, Adarsh Kowdle, Thabo Beeler, Chia-Kai Liang and Shahram Izadi. Special thanks to our photographers James Adamson, Christopher Farro and Cort Muller who took numerous test photographs for us.


3 Questions: Anuradha Annaswamy on building smart infrastructures

Much of Anuradha Annaswamy’s research hinges on uncertainty. How does cloudy weather affect a grid powered by solar energy? How do we ensure that electricity is delivered to the consumer if a grid is powered by wind and the wind does not blow? What’s the best course of action if a bird hits a plane engine on takeoff? How can you predict the behavior of a cyber attacker?

A senior research scientist in MIT’s Department of Mechanical Engineering, Annaswamy spends most of her research time dealing with decision-making under uncertainty. Designing smart infrastructures that are resilient to uncertainty can lead to safer, more reliable systems, she says.

Annaswamy serves as the director of MIT’s Active Adaptive Control Laboratory. A world-leading expert in adaptive control theory, she was named president of the Institute of Electrical and Electronics Engineers Control Systems Society for 2020. Her team uses adaptive control and optimization to account for various uncertainties and anomalies in autonomous systems. In particular, they are developing smart infrastructures in the energy and transportation sectors.

Using a combination of control theory, cognitive science, economic modeling, and cyber-physical systems, Annaswamy and her team have designed intelligent systems that could someday transform the way we travel and consume energy. Their research includes a diverse range of topics such as safer autopilot systems on airplanes, the efficient dispatch of resources in electrical grids, better ride-sharing services, and price-responsive railway systems.

In a recent interview, Annaswamy spoke about how these smart systems could help support a safer and more sustainable future.

Q: How is your team using adaptive control to make air travel safer?

A: We want to develop an advanced autopilot system that can safely recover the airplane in the event of a severe anomaly — such as the wing becoming damaged mid-flight, or a bird flying into the engine. In the airplane, you have a pilot and autopilot to make decisions. We’re asking: How do you combine those two decision-makers?

The answer we landed on was developing a shared pilot-autopilot control architecture. We collaborated with David Woods, an expert in cognitive engineering at The Ohio State University, to develop an intelligent system that takes the pilot’s behavior into account. For example, all humans have something known as “capacity for maneuver” and “graceful command degradation” that inform how we react in the face of adversity. Using mathematical models of pilot behavior, we proposed a shared control architecture where the pilot and the autopilot work together to make an intelligent decision on how to react in the face of uncertainties. In this system, the pilot reports the anomaly to an adaptive autopilot system that ensures resilient flight control.

Q: How does your research on adaptive control fit into the concept of smart cities?

A: Smart cities are an interesting way we can use intelligent systems to promote sustainability. Our team is looking at ride-sharing services in particular. Services like Uber and Lyft have provided new transportation options, but their impact on the carbon footprint has to be considered. We’re looking at developing a system where the number of passenger-miles per unit of energy is maximized through something called “shared mobility on demand services.” Using the alternating minimization approach, we’ve developed an algorithm that can determine the optimal route for multiple passengers traveling to various destinations.

As with the pilot-autopilot dynamic, human behavior is at play here. In sociology there is an interesting concept of behavioral dynamics known as Prospect Theory. If we give passengers options with regards to which route their shared ride service will take, we are empowering them with free will to accept or reject a route. Prospect Theory shows that if you can use pricing as an incentive, people are much more loss-averse so they would be willing to walk a bit extra or wait a few minutes longer to join a low-cost ride with an optimized route. If everyone utilized a system like this, the carbon footprint of ride-sharing services could decrease substantially.

Q: What other ways are you using intelligent systems to promote sustainability?

A: Renewable energy and sustainability are huge drivers for our research. To enable a world where all of our energy is coming from renewable sources like solar or wind, we need to develop a smart grid that can account for the fact that the sun isn’t always shining and wind isn’t always blowing. These uncertainties are the biggest hurdles to achieving an all-renewable grid. Of course, there are many technologies being developed for batteries that can help store renewable energy, but we are taking a different approach.

We have created algorithms that can optimally schedule distributed energy resources within the grid — this includes making decisions on when to use onsite generators, how to operate storage devices, and when to call upon demand response technologies, all in response to the economics of using such resources and their physical constraints. If we can develop an interconnected smart grid where, for example, the air conditioning setting in a house is set to 72 degrees instead of 69 degrees automatically when demand is high, there could be a substantial savings in energy usage without impacting human comfort. In one of our studies, we applied a distributed proximal atomic coordination algorithm to the grid in Tokyo to demonstrate how this intelligent system could account for the uncertainties present in a grid powered by renewable resources.


Separating Birdsong in the Wild for Classification

Birds are all around us, and just by listening, we can learn many things about our environment. Ecologists use birds to understand food systems and forest health — for example, if there are more woodpeckers in a forest, that means there’s a lot of dead wood. Because birds communicate and mark territory with songs and calls, it’s most efficient to identify them by ear. In fact, experts may identify up to 10x as many birds by ear as by sight.

In recent years, autonomous recording units (ARUs) have made it easy to capture thousands of hours of audio in forests that could be used to better understand ecosystems and identify critical habitat. However, manually reviewing the audio data is very time consuming, and experts in birdsong are rare. But an approach based on machine learning (ML) has the potential to greatly reduce the amount of expert review needed for understanding a habitat.

However, ML-based audio classification of bird species can be challenging for several reasons. For one, birds often sing over one another, especially during the “dawn chorus” when many birds are most active. Also, there aren’t clear recordings of individual birds to learn from — almost all of the available training data is recorded in noisy outdoor conditions, where other sounds from the wind, insects, and other environmental sources are often present. As a result, existing birdsong classification models struggle to identify quiet, distant and overlapping vocalizations. Additionally, some of the most common species often appear unlabeled in the background of training recordings for less common species, leading models to discount the common species. These difficult cases are very important for ecologists who want to identify endangered or invasive species using automated systems.

To address the general challenge of training ML models to automatically separate audio recordings without access to examples of isolated sounds, we recently proposed a new unsupervised method called mixture invariant training (MixIT) in our paper, “Unsupervised Sound Separation Using Mixture Invariant Training”. Moreover, in our new paper, “Improving Bird Classification with Unsupervised Sound Separation,” we use MixIT training to separate birdsong and improve species classification. We found that including the separated audio in the classification improves precision and classification quality on three independent soundscape datasets. We are also happy to announce the open-source release of the birdsong separation models on GitHub.

Birdsong Audio Separation
MixIT learns to separate single-channel recordings into multiple individual tracks, and can be trained entirely with noisy, real-world recordings. To train the separation model, we create a “mixture of mixtures” (MoM) by mixing together two real-world recordings. The separation model then learns to take the MoM apart into many channels to minimize a loss function that uses the two original real-world recordings as ground-truth references. The loss function uses these references to group the separated channels such that they can be mixed back together to recreate the two original real-world recordings. Since there’s no way to know how the different sounds in the MoM were grouped together in the original recordings, the separation model has no choice but to separate the individual sounds themselves, and thus learns to place each singing bird in a different output audio channel, also separate from wind and other background noise.
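The grouping step in the loss can be sketched as a brute-force search over every way of assigning the separated channels to the two reference recordings. This is a simplified illustration, not the published implementation: it uses mean squared error as a stand-in for the paper's loss, it searches assignments exhaustively, and the toy signals pretend that a separator has already produced perfect output channels.

```python
from itertools import product


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mixit_loss(separated, mix1, mix2):
    """Lowest total error over all splits of the channels between two mixtures."""
    best_loss, best_assignment = float("inf"), None
    for assignment in product((0, 1), repeat=len(separated)):
        remix = [[0.0] * len(mix1), [0.0] * len(mix2)]
        for channel, target in zip(separated, assignment):
            for t, sample in enumerate(channel):
                remix[target][t] += sample
        loss = mse(remix[0], mix1) + mse(remix[1], mix2)
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment


# Toy recordings: the first contains two birds, the second contains only wind.
bird_a = [1.0, 0.0, 0.0, 0.0]
bird_b = [0.0, 1.0, 0.0, 0.0]
wind = [0.0, 0.0, 1.0, 0.0]
mix1 = [a + b for a, b in zip(bird_a, bird_b)]
mix2 = wind

# Pretend the separator emitted these three channels from mix1 + mix2.
loss, assignment = mixit_loss([bird_a, wind, bird_b], mix1, mix2)
```

Here the search assigns the first and third channels to the first recording and the second channel to the other, with zero loss. Note that the references are the original mixtures only; no isolated bird recording is ever needed as a training target.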

We trained a new MixIT separation model using birdsong recordings from Xeno-Canto and the Macaulay Library. We found that for separating birdsong, this new model outperformed a MixIT separation model trained on a large amount of general audio from the AudioSet dataset. We measure the quality of the separation by mixing two recordings together, applying separation, and then remixing the separated audio channels such that they reconstruct the original two recordings. We measure the signal-to-noise ratio (SNR) of the remixed audio relative to the original recordings. We found that the model trained specifically for birds achieved 6.1 decibels (dB) better SNR than the model trained on AudioSet (10.5 dB vs 4.4 dB). Subjectively, we also found many examples where the system worked incredibly well, separating very difficult to distinguish calls in real-world data.
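For readers unfamiliar with the metric, SNR in decibels compares the energy of the reference signal with the energy of the reconstruction error. The sketch below uses the standard definition; the sample signals are made up for illustration, and only the 10.5 dB and 4.4 dB figures come from the text above.

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio of an estimate against a reference, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise == 0.0:
        return float("inf")  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)


# Hypothetical remix that is 10 percent too quiet everywhere.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
example_snr = snr_db(reference, estimate)

# The gap between the two models reported in the text.
improvement_db = 10.5 - 4.4
# Convert the dB gap back to a ratio of error energies.
error_energy_ratio = 10.0 ** (improvement_db / 10.0)
```

Since decibels are logarithmic, the 6.1 dB gap corresponds to roughly a fourfold reduction in residual error energy relative to the signal.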

The following videos demonstrate separation of birdsong from two different regions (Caples and the High Sierras). The videos show the mel-spectrogram of the mixed audio (a 2D image that shows the frequency content of the audio over time) and highlight the audio separated into different tracks.

High Sierras
  
Caples

Classifying Bird Species
To classify birds in real-world audio captured with ARUs, we first split the audio into five-second segments and then create a mel-spectrogram of each segment. We then train an EfficientNet classifier to identify bird species from the mel-spectrogram images, training on audio from Xeno-Canto and the Macaulay Library. We trained two separate classifiers, one for species in the Sierra Nevada mountains and one for upstate New York. Note that these classifiers are not trained on separated audio; that’s an area for future improvement.
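The segmentation step can be sketched as follows. The post only states that audio is split into five-second segments, so the non-overlapping windows and the zero-padding of the final segment are assumptions, as is the function name `split_into_segments`.

```python
def split_into_segments(samples, sample_rate, segment_seconds=5.0):
    """Cut a waveform into fixed-length windows, zero-padding the last one."""
    segment_len = int(sample_rate * segment_seconds)
    segments = []
    for start in range(0, len(samples), segment_len):
        segment = list(samples[start:start + segment_len])
        segment += [0.0] * (segment_len - len(segment))  # pad a short tail
        segments.append(segment)
    return segments


# Toy input: 12 seconds of audio at an artificially low 4 samples per second.
sample_rate = 4
samples = [1.0] * (12 * sample_rate)
segments = split_into_segments(samples, sample_rate)
```

Each resulting segment would then be converted to a mel-spectrogram before being passed to the classifier.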

We also introduced some new techniques to improve classifier training. Taxonomic training asks the classifier to provide labels for each level of the species taxonomy (genus, family, and order), which allows the model to learn groupings of species before learning the sometimes-subtle differences between similar species. Taxonomic training also allows the model to benefit from expert information about the taxonomic relationships between different species. We also found that random low-pass filtering was helpful for simulating distant sounds during training: As an audio source gets further away, the high-frequency parts fade away before the low-frequency parts. This was particularly effective for identifying species from the High Sierras region, where birdsongs cover very long distances, unimpeded by trees.
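To illustrate the low-pass augmentation, here is a minimal Python sketch. The post does not describe the filter design, so the one-pole filter and the 1 to 8 kHz cutoff range are assumptions chosen for brevity, and the test tones are synthetic.

```python
import math
import random


def low_pass(samples, cutoff_hz, sample_rate):
    """One-pole low-pass filter: attenuates content above cutoff_hz."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)  # move part of the way toward the new sample
        out.append(y)
    return out


def random_cutoff(rng, min_hz=1000.0, max_hz=8000.0):
    """Draw a cutoff frequency; the range is an illustrative assumption."""
    return rng.uniform(min_hz, max_hz)


rng = random.Random(0)  # seeded so the example is repeatable
sample_rate = 32000
cutoff = random_cutoff(rng)

# A 200 Hz tone and a tone at the highest representable frequency (16 kHz).
low_tone = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(640)]
high_tone = [1.0 if n % 2 == 0 else -1.0 for n in range(640)]

# Compare peak levels after the filter has settled (second half of the clip).
low_peak = max(abs(v) for v in low_pass(low_tone, cutoff, sample_rate)[320:])
high_peak = max(abs(v) for v in low_pass(high_tone, cutoff, sample_rate)[320:])
```

Whatever cutoff is drawn from this range, the high tone comes out quieter than the low tone, mimicking how high-frequency content fades first as a source moves away.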

Classifying Separated Audio
We found that separating audio with the new MixIT model before classification improved the classifier performance on three independent real-world datasets. The separation was particularly successful for identification of quiet and background birds, and in many cases helped with overlapping vocalizations as well.

Top: A mel-spectrogram of two birds, an American pipit (amepip) and gray-crowned rosy finch (gcrfin), from the Sierra Nevadas. The legend shows the log-probabilities for the two species given by the pre-trained classifiers. Higher values indicate more confidence, and values greater than -1.0 are usually correct classifications. Bottom: A mel-spectrogram for the automatically separated audio, with the classifier log probabilities from the separated channels. Note that the classifier only identifies the gcrfin once the audio is separated.
Top: A complex mixture with three vocalizations: A golden-crowned kinglet (gockin), mountain chickadee (mouchi), and Steller’s jay (stejay). Bottom: Separation into three channels, with classifier log probabilities for the three species. We see good visual separation of the Steller’s jay (shown by the distinct pink marks), even though the classifier isn’t sure what it is.

The separation model does have some potential limitations. Occasionally we observe over-separation, where a single song is broken into multiple channels, which can cause misclassifications. We also notice that when multiple birds are vocalizing, the most prominent song often gets a lower score after separation. This may be due to loss of environmental context or other artifacts introduced by separation that do not appear during classifier training. For now, we get the best results by running the classifier on the separated channels and the original audio, and taking the maximum score for each species. We expect that further work will allow us to reduce over-separation and find better ways to combine separation and classification. You can see and hear more examples of the full system at our GitHub repo.
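The max-score combination described above can be sketched as follows. The species codes are borrowed from the figure captions, while the log-probability values and the function name `combine_scores` are hypothetical.

```python
def combine_scores(original_scores, channel_scores):
    """Per-species maximum over the original audio and every separated channel."""
    combined = dict(original_scores)
    for scores in channel_scores:
        for species, score in scores.items():
            combined[species] = max(score, combined.get(species, float("-inf")))
    return combined


# Hypothetical classifier log-probabilities for one five-second segment.
original = {"amepip": -0.4, "gcrfin": -3.2}
separated_channels = [
    {"amepip": -0.9, "gcrfin": -5.0},
    {"amepip": -6.0, "gcrfin": -0.3},
]
combined = combine_scores(original, separated_channels)
```

In this example the prominent bird keeps its higher score from the original audio, while the quieter bird is recovered from a separated channel, so neither detection is lost.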

Future Directions
We are currently working with partners at the California Academy of Sciences to understand how habitat and species mix changes after prescribed fires and wildfires, applying these models to ARU audio collected over many years.

We also foresee many potential applications for the unsupervised separation models in ecology, beyond just birds. For example, the separated audio can be used to create better acoustic indices, which could measure ecosystem health by tracking the total activity of birds, insects, and amphibians without identifying particular species. Similar methods could also be adapted for use underwater to track coral reef health.

Acknowledgements
We would like to thank Mary Clapp, Jack Dumbacher, and Durrell Kapan from the California Academy of Sciences for providing extensive annotated soundscapes from the Sierra Nevadas. Stefan Kahl and Holger Klinck from the Cornell Lab of Ornithology provided soundscapes from Sapsucker Woods. Training data for both the separation and classification models came from Xeno-Canto and the Macaulay Library. Finally, we would like to thank Julie Cattiau, Lauren Harrell, Matt Harvey, and our co-author, John Hershey, from the Google Bioacoustics and Sound Separation teams.


Meta Works with NVIDIA to Build Massive AI Research Supercomputer

Meta Platforms gave a big thumbs up to NVIDIA, choosing our technologies for what it believes will be its most powerful research system to date.

The AI Research SuperCluster (RSC), announced today, is already training new models to advance AI.

Once fully deployed, Meta’s RSC is expected to be the largest customer installation of NVIDIA DGX A100 systems.

“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they could seamlessly collaborate on a research project or play an AR game together,” the company said in a blog.

Training AI’s Largest Models

When RSC is fully built out, later this year, Meta aims to use it to train AI models with more than a trillion parameters. That could advance fields such as natural-language processing for jobs like identifying harmful content in real time.

In addition to performance at scale, Meta cited extreme reliability, security, privacy and the flexibility to handle “a wide range of AI models” as its key criteria for RSC.

Meta RSC system
Meta’s AI Research SuperCluster features hundreds of NVIDIA DGX systems linked on an NVIDIA Quantum InfiniBand network to accelerate the work of its AI research teams.

Under the Hood

The new AI supercomputer currently uses 760 NVIDIA DGX A100 systems as its compute nodes. They pack a total of 6,080 NVIDIA A100 GPUs linked on an NVIDIA Quantum 200Gb/s InfiniBand network to deliver 1,895 petaflops of TF32 performance.
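As a quick sanity check on these figures, the quoted totals can be divided out per system and per GPU. The snippet uses only the numbers stated above; the variable names are illustrative.

```python
dgx_systems = 760
total_gpus = 6080
tf32_petaflops = 1895

# GPUs in each DGX A100 system implied by the totals.
gpus_per_system = total_gpus // dgx_systems

# Aggregate TF32 throughput divided evenly across GPUs (1 petaflop = 1,000 teraflops).
tf32_teraflops_per_gpu = tf32_petaflops * 1000 / total_gpus
```

The totals work out to 8 GPUs per system and a little under 312 teraflops of TF32 throughput per GPU.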

Despite challenges from COVID-19, RSC took just 18 months to go from an idea on paper to a working AI supercomputer, thanks in part to the NVIDIA DGX A100 technology at the foundation of Meta RSC.



20x Performance Gains

It’s the second time Meta has picked NVIDIA technologies as the base for its research infrastructure. In 2017, Meta built the first generation of this infrastructure for AI research with 22,000 NVIDIA V100 Tensor Core GPUs that handle 35,000 AI training jobs a day.

Meta’s early benchmarks showed RSC can train large NLP models 3x faster and run computer vision jobs 20x faster than the prior system.

In a second phase later this year, RSC will expand to 16,000 GPUs that Meta believes will deliver a whopping 5 exaflops of mixed precision AI performance. And Meta aims to expand RSC’s storage system to deliver up to an exabyte of data at 16 terabytes per second.

A Scalable Architecture

NVIDIA AI technologies are available to enterprises of any size.

NVIDIA DGX, which includes a full stack of NVIDIA AI software, scales easily from a single system to a DGX SuperPOD running on-premises or at a colocation provider. Customers can also rent DGX systems through NVIDIA DGX Foundry.

The post Meta Works with NVIDIA to Build Massive AI Research Supercomputer appeared first on The Official NVIDIA Blog.


How the Intelligent Supply Chain Broke and AI Is Fixing It

Let’s face it, the global supply chain may not be the most scintillating subject matter. Yet in homes and businesses around the world, it’s quickly become the topic du jour: empty shelves; record price increases; clogged ports and sick truckers leading to disruptions near and far.

The business of organizing resources to supply a product or service to its final user feels like it has never been challenged by so many variables. Shortages of raw materials, everything from resin and aluminum to paint and semiconductors, are nearing historic levels. Products that do get manufactured sit on cargo ships or in warehouses because of shortages of containers and of the workers and truck drivers who help deliver them to their final destinations. And consumer pocketbooks and paychecks are getting squeezed by rising prices.

The $9 trillion logistics industry is responding by investing in automation and using AI and big data to gain more insights throughout the supply chain. Big money is being poured into supply-chain technology startups, which raised $24.3 billion in venture funding in the first three quarters of 2021, 58 percent more than the full-year total for 2020, according to analytics firm PitchBook Data Inc.

Investing in AI

Behind these investments, businesses see technology and accelerated computing as key to finding firmer ground. At Manifest 2022, a logistics and supply chain conference taking place in Las Vegas, the industry is discussing how to refine supply chains and create cost efficiencies using AI and machine learning. Among their goals: address labor shortages, improve throughput in distribution centers, and route deliveries more efficiently.

Take a box of cereal. Getting it from the warehouse to a home has never been more expensive. Employee turnover rates of 30 percent to 46 percent in warehouses and distribution centers are just part of the problem.

To mitigate the challenge, Dematic, a global materials-handling company, is evaluating software from companies like Kinetic Vision, which has developed computer vision applications on the NVIDIA AI platform that add intelligence to automated warehouse systems.

Companies like Kinetic Vision and SF Technology use video data from cameras to optimize every step of the package lifecycle, accelerating throughput by up to 20 percent and reducing conveyor downtime, which can cost retailers $3,000 to $5,000 a minute.

Autonomous robot companies such as Gideon, 6 River Systems and Symbotic also use the NVIDIA AI platform to improve distribution center throughput with autonomous guided vehicles that transport material efficiently within warehouses and distribution centers.

And with NVIDIA Fleet Command, which securely deploys, manages and scales AI applications via the cloud across distributed edge infrastructure, these solutions can be rolled out and managed remotely across hundreds of distribution centers.

Digital Twins and Simulation

Improving layouts of stores and distribution centers also has become key to achieving cost efficiencies. NVIDIA Omniverse, a virtual world simulation and 3D design collaboration platform, makes it possible to virtually design and simulate distribution centers at full fidelity. Users can improve workflows and throughput with photorealistic, physically accurate virtual environments.

Retailers could, for example, develop a solution on the Omniverse platform to design, test and simulate the flow of material and employee processes in digital twins of their distribution centers and then bring those optimizations into the real world.

Digital human simulations could test new workflows for employee ergonomics and productivity. And robots can be trained and operated with the NVIDIA Isaac robotics platform to help find the most efficient layout and workflows.

Kinetic Vision is using NVIDIA Omniverse to deliver digital twin technology and simulation to optimize factories as well as distribution centers for retail and consumer packaged goods.

Leaning In

While manufacturers, supply chain operators and retailers each will have their own approaches to solving challenges, they’re leaning in on AI as a key differentiator.

Successfully implementing AI-enabled supply-chain management has enabled early adopters to improve logistics costs by 15 percent, inventory levels by 35 percent and service levels by 65 percent, compared with slower-moving competitors, according to McKinsey.

With some experts predicting the global supply chain won’t return to a new normal until at least 2023, companies are moving to take measures that matter most to the bottom line.

For more on how NVIDIA AI is powering the most innovative AI solutions for the supply chain and logistics industry, attend the following talks at Manifest:

  • A fireside chat, “Bringing Agility and Flexibility to Distribution Centers with AI,” on Wednesday, Jan. 26, at 2 p.m. Pacific, in Champagne 4 with Azita Martin, vice president and general manager of AI for retail at NVIDIA, and Michael Larsson, CEO of the North America region at Dematic.
  • A presentation, “The Next Frontier in Warehouse Intelligence,” on the same date, at 11:30 a.m. Pacific, in Champagne 4 with Azita Martin; Omer Rashid, vice president of Solutions Designs at DHL Supply Chain; and Renato Bottiglieri, chief logistics officer at Eggo Kitchen & House.

The post How the Intelligent Supply Chain Broke and AI Is Fixing It appeared first on The Official NVIDIA Blog.
