Beyond Tabula Rasa: Reincarnating Reinforcement Learning

Reinforcement learning (RL) is an area of machine learning that focuses on training intelligent agents using related experiences so they can learn to solve decision-making tasks, such as playing video games, flying stratospheric balloons, and designing hardware chips. Due to the generality of RL, the prevalent trend in RL research is to develop agents that can efficiently learn tabula rasa, that is, from scratch without using previously learned knowledge about the problem. However, in practice, tabula rasa RL systems are the exception rather than the norm for solving large-scale RL problems. Large-scale RL systems, such as OpenAI Five, which achieved human-level performance on Dota 2, undergo multiple design changes (e.g., algorithmic or architectural changes) during their development cycle. This modification process can last months and necessitates incorporating such changes without re-training from scratch, which would be prohibitively expensive.

Furthermore, the inefficiency of tabula rasa RL research can exclude many researchers from tackling computationally-demanding problems. For example, the quintessential benchmark of training a deep RL agent on 50+ Atari 2600 games in ALE for 200M frames (the standard protocol) requires 1,000+ GPU days. As deep RL moves towards more complex and challenging problems, the computational barrier to entry in RL research will likely become even higher.

To address the inefficiencies of tabula rasa RL, we present “Reincarnating Reinforcement Learning: Reusing Prior Computation To Accelerate Progress” at NeurIPS 2022. Here, we propose an alternative approach to RL research, where prior computational work, such as learned models, policies, logged data, etc., is reused or transferred between design iterations of an RL agent or from one agent to another. While some sub-areas of RL leverage prior computation, most RL agents are still largely trained from scratch. Until now, there has been no broader effort to leverage prior computational work for the training workflow in RL research. We have also released our code and trained agents to enable researchers to build on this work.

Tabula rasa RL vs. Reincarnating RL (RRL). While tabula rasa RL focuses on learning from scratch, RRL is based on the premise of reusing prior computational work (e.g., prior learned agents) when training new agents or improving existing agents, even in the same environment. In RRL, new agents need not be trained from scratch, except for initial forays into new problems.

Why Reincarnating RL?

Reincarnating RL (RRL) is a more compute and sample-efficient workflow than training from scratch. RRL can democratize research by allowing the broader community to tackle complex RL problems without requiring excessive computational resources. Furthermore, RRL can enable a benchmarking paradigm where researchers continually improve and update existing trained agents, especially on problems where improving performance has real-world impact, such as balloon navigation or chip design. Finally, real-world RL use cases will likely be in scenarios where prior computational work is available (e.g., existing deployed RL policies).

RRL as an alternative research workflow. Imagine a researcher who has trained an agent A1 for some time, but now wants to experiment with better architectures or algorithms. While the tabula rasa workflow requires retraining another agent from scratch, RRL provides the more viable option of transferring the existing agent A1 to another agent and training this agent further, or simply fine-tuning A1.

While there have been some ad hoc large-scale reincarnation efforts with limited applicability, e.g., model surgery in Dota 2, policy distillation in Rubik's cube, PBT in AlphaStar, and RL fine-tuning of a behavior-cloned policy in AlphaGo / Minecraft, RRL has not been studied as a research problem in its own right. To this end, we argue for developing general-purpose RRL approaches as opposed to prior ad hoc solutions.

Case Study: Policy to Value Reincarnating RL

Different RRL problems can be instantiated depending on the kind of prior computational work provided. As a step towards developing broadly applicable RRL approaches, we present a case study on the setting of Policy to Value reincarnating RL (PVRL) for efficiently transferring an existing sub-optimal policy (teacher) to a standalone value-based RL agent (student). While a policy directly maps a given environment state (e.g., a game screen in Atari) to an action, value-based agents estimate the effectiveness of an action at a given state in terms of achievable future rewards, which allows them to learn from previously collected data.

For a PVRL algorithm to be broadly useful, it should satisfy the following requirements:

  • Teacher Agnostic: The student shouldn’t be constrained by the existing teacher policy’s architecture or training algorithm.
  • Weaning off the teacher: It is undesirable to maintain dependency on past suboptimal teachers for successive reincarnations.
  • Compute / Sample Efficient: Reincarnation is only useful if it is cheaper than training from scratch.

Given the PVRL algorithm requirements, we evaluate whether existing approaches, designed with closely related goals, will suffice. We find that such approaches either result in small improvements over tabula rasa RL or degrade in performance when weaning off the teacher.

To address these limitations, we introduce a simple method, QDagger, in which the agent distills knowledge from the suboptimal teacher via an imitation algorithm while simultaneously using its environment interactions for RL. We start with a deep Q-network (DQN) agent trained for 400M environment frames (a week of single-GPU training) and use it as the teacher for reincarnating student agents trained on only 10M frames (a few hours of training), where the teacher is weaned off over the first 6M frames. For benchmark evaluation, we report the interquartile mean (IQM) metric from the RLiable library. As shown below for the PVRL setting on Atari games, we find that the QDagger RRL method outperforms prior approaches.
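
To make the mechanics concrete, below is a minimal PyTorch-style sketch of a QDagger-style student update: a standard TD loss is combined with a distillation loss toward the teacher's policy, and the distillation weight is annealed to zero as the teacher is weaned off. The function and tensor names here are illustrative assumptions, not the paper's actual implementation, which includes additional details such as offline pre-training on the teacher's replay data.

import torch
import torch.nn.functional as F

def qdagger_loss(student_q, target_q, teacher_probs, batch, gamma=0.99, distill_weight=1.0):
    # batch: dict of tensors with "obs", "next_obs", long "actions", float "rewards" and "dones".
    # teacher_probs: the teacher's action probabilities for batch["obs"] (assumed given).
    q = student_q(batch["obs"])                                   # [B, num_actions]
    q_taken = q.gather(1, batch["actions"].unsqueeze(1)).squeeze(1)

    with torch.no_grad():                                         # standard TD(0) target
        next_q = target_q(batch["next_obs"]).max(dim=1).values
        td_target = batch["rewards"] + gamma * (1.0 - batch["dones"]) * next_q
    td_loss = F.smooth_l1_loss(q_taken, td_target)

    # Distillation: match the teacher's policy with a softmax over the student's Q-values.
    distill_loss = -(teacher_probs * F.log_softmax(q, dim=1)).sum(dim=1).mean()

    # distill_weight is decayed toward 0 over the first few million frames
    # ("weaning off" the teacher), after which only the RL term remains.
    return td_loss + distill_weight * distill_loss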

Benchmarking PVRL algorithms on Atari, with teacher-normalized scores aggregated across 10 games. Tabula rasa DQN (–·–) obtains a normalized score of 0.4. Standard baseline approaches include kickstarting, JSRL, rehearsal, offline RL pre-training and DQfD. Among all methods, only QDagger surpasses teacher performance within 10 million frames and outperforms the teacher in 75% of the games.

Reincarnating RL in Practice

We further examine the RRL approach on the Arcade Learning Environment, a widely used deep RL benchmark. First, we take a Nature DQN agent that uses the RMSProp optimizer and fine-tune it with the Adam optimizer to create a DQN (Adam) agent. While it is possible to train a DQN (Adam) agent from scratch, we demonstrate that fine-tuning Nature DQN with the Adam optimizer matches the from-scratch performance using 40x less data and compute.
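
As a rough illustration of this reincarnation-by-fine-tuning step (a PyTorch-style sketch with a hypothetical checkpoint path, not the paper's actual code), the network weights and replay data are reused and only the optimizer changes:

import torch
import torch.nn as nn

# Nature DQN's architecture: 3 conv layers + 2 fully connected layers,
# operating on 4 stacked 84x84 grayscale frames.
def nature_dqn(num_actions: int) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
        nn.Linear(512, num_actions),
    )

q_network = nature_dqn(num_actions=18)
# Reuse prior computation: load the RMSProp-trained agent's weights (and, in the
# paper, its final replay buffer) instead of reinitializing from scratch.
q_network.load_state_dict(torch.load("nature_dqn_rmsprop.pt"))   # hypothetical checkpoint path

# The only algorithmic change: Adam with a reduced learning rate replaces RMSProp.
optimizer = torch.optim.Adam(q_network.parameters(), lr=1e-5)
# Training then continues with the usual DQN update loop for ~20M frames.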

Reincarnating DQN (Adam) via Fine-Tuning. The vertical separator corresponds to loading network weights and replay data for fine-tuning. Left: Tabula rasa Nature DQN nearly converges in performance after 200M environment frames. Right: Fine-tuning this Nature DQN agent using a reduced learning rate with the Adam optimizer for 20 million frames obtains similar results to DQN (Adam) trained from scratch for 400M frames.

Given the DQN (Adam) agent as a starting point, fine-tuning is restricted to the 3-layer convolutional architecture. So, we consider a more general reincarnation approach that leverages recent architectural and algorithmic advances without training from scratch. Specifically, we use QDagger to reincarnate another RL agent that uses a more advanced RL algorithm (Rainbow) and a better neural network architecture (Impala-CNN ResNet) from the fine-tuned DQN (Adam) agent.

Reincarnating a different architecture / algorithm via QDagger. The vertical separator is the point at which we apply offline pre-training using QDagger for reincarnation. Left: Fine-tuning DQN with Adam. Right: Comparison of a tabula rasa Impala-CNN Rainbow agent (sky blue) to an Impala-CNN Rainbow agent (pink) trained using QDagger RRL from the fine-tuned DQN (Adam). The reincarnated Impala-CNN Rainbow agent consistently outperforms its scratch counterpart. Note that further fine-tuning DQN (Adam) results in diminishing returns (yellow).

Overall, these results indicate that past research could have been accelerated by incorporating an RRL approach to designing agents, instead of re-training agents from scratch. Our paper also contains results on the Balloon Learning Environment, where we demonstrate that RRL allows us to make progress on the problem of navigating stratospheric balloons using only a few hours of TPU compute by reusing a distributed RL agent trained on TPUs for more than a month.

Discussion

Fairly comparing reincarnation approaches requires using the exact same computational work and workflow. Furthermore, the RRL findings that generalize broadly would be about how effective an algorithm is given access to existing computational work; for example, we successfully applied QDagger, developed using Atari, for reincarnation on the Balloon Learning Environment. As such, we speculate that research in reincarnating RL can branch out in two directions:

  • Standardized benchmarks with open-sourced computational work: Akin to NLP and vision, where a small set of pre-trained models is commonly shared, research in RRL may also converge to a small set of open-sourced computational work (e.g., pre-trained teacher policies) on a given benchmark.
  • Real-world domains: In domains where higher performance has real-world impact, the community is incentivized to reuse state-of-the-art agents and try to improve their performance.

See our paper for a broader discussion on scientific comparisons, generalizability and reproducibility in RRL. Overall, we hope that this work motivates researchers to release computational work (e.g., model checkpoints) on which others could directly build. In this regard, we have open-sourced our code and trained agents with their final replay buffers. We believe that reincarnating RL can substantially accelerate research progress by building on prior computational work, as opposed to always starting from scratch.

Acknowledgements

This work was done in collaboration with Pablo Samuel Castro, Aaron Courville and Marc Bellemare. We’d like to thank Tom Small for the animated figure used in this post. We are also grateful for feedback by the anonymous NeurIPS reviewers and several members of the Google Research team, DeepMind and Mila.

Read More

Improving stability and flexibility of ML pipelines at Amazon Packaging Innovation with Amazon SageMaker Pipelines

To delight customers and minimize packaging waste, Amazon must select the optimal packaging type for billions of packages shipped every year. If too little protection is used for a fragile item such as a coffee mug, the item will arrive damaged and Amazon risks losing customer trust. Using too much protection results in increased costs and overfull recycling bins. With hundreds of millions of products available, a scalable decision mechanism is needed to continuously learn from product testing and customer feedback.

To solve these problems, the Amazon Packaging Innovation team developed machine learning (ML) models that classify whether products are suitable for Amazon packaging types such as mailers, bags, or boxes, or could even be shipped with no additional packaging. Previously, the team developed a custom pipeline based on AWS Step Functions to perform weekly training and daily or monthly inference jobs. However, over time the pipeline didn't provide sufficient flexibility to launch models with new architectures. Developing new pipelines added overhead and required coordination between data scientists and developers. To overcome these difficulties and improve the speed of deploying new models and architectures, the team chose to orchestrate model training and inference with Amazon SageMaker Pipelines.

In this post, we discuss the previous orchestration architecture based on Step Functions, outline training and inference architectures using Pipelines, and highlight the flexibility the Amazon Packaging Innovation team achieved.

Challenges of the former ML pipeline at Amazon Packaging Innovation

To incorporate continuous feedback about performance of packages, a new model is trained every week using a growing number of labels. The inference for the entire inventory of products is performed monthly, and a daily inference is performed to deliver just-in-time predictions for the newly added inventory.

To automate the process of training multiple models and provide predictions, the team had developed a custom pipeline based on Step Functions to orchestrate the following steps:

  • Data preparation for training and inference jobs and loading of predictions to the database (Amazon Redshift) with AWS Glue.
  • Model training and inference with Amazon SageMaker.
  • Calculation of model performance metrics on the validation set with AWS Batch.
  • Using Amazon DynamoDB to store model configurations (such as data split ratio for training and validation, model artifact location, model type, and number of instances for training and inference), model performance metrics, and the latest successfully trained model version.
  • Calculation of the differences in the model performance scores, changes in the distribution of the training labels, and comparing the size of the input data between the previous and the new model versions with AWS Lambda functions.
  • Given the large number of steps, the pipeline also required a reliable alarming system at each step to alert the stakeholders of any issues. This was accomplished via a combination of Amazon Simple Queue Service (Amazon SQS) and Amazon Simple Notification Service (Amazon SNS). The alarms were created to notify the business stakeholders, data scientists, and developers about any failed steps and large deviations in the model and data metrics.

After using this solution for nearly 2 years, the team realized that this implementation only worked well for a typical ML workflow where a single model was trained and scored on a validation dataset. However, the solution wasn't sufficiently flexible for complex models and wasn't resilient to failures. For example, the architecture didn't easily accommodate sequential model training. It was difficult to add or remove a step without duplicating the entire pipeline and modifying the infrastructure. Even simple changes in the data processing steps, such as adjusting the data split ratio or selecting a different set of features, required coordination from both a data scientist and a developer. When the pipeline failed at any step, it had to be restarted from the beginning, which resulted in repeated runs and increased cost. To avoid repeated runs and be able to restart from the failed step, the team would create a new copy of an abridged state machine. This troubleshooting led to a proliferation of state machines, each starting from the commonly failing steps. Finally, if a training job encountered a deviation in the distribution of labels, model score, or number of labels, a data scientist had to review the model and its metrics manually. Then a data scientist would access a DynamoDB table with the model versions and update the table to ensure that the correct model was used for the next inference job.

The maintenance of this architecture required at least one dedicated resource and an additional full-time resource for development. Given the difficulties of expanding the pipeline to accommodate new use cases, the data scientists had begun developing their own workflows, which in turn had led to a growing code base, multiple data tables with similar data schemes, and decentralized model monitoring. Accumulation of these issues had resulted in lower team productivity and increased overhead.

To address these challenges, the Amazon Packaging Innovation team evaluated other existing solutions for MLOps, including SageMaker Pipelines (announced in December 2020). Pipelines is a capability of SageMaker for building, managing, automating, and scaling end-to-end ML workflows. Pipelines allows you to reduce the number of steps across the entire ML workflow and is flexible enough to allow data scientists to define a custom ML workflow. It takes care of monitoring and logging the steps. It also comes with a model registry that automatically versions new models. The model registry has built-in approval workflows to select models for inference in production. Pipelines also caches steps called with the same arguments: if a previous run with identical arguments is found, its result is reused, which allows a pipeline to restart from the point of failure instead of recomputing the successfully completed steps.

In the evaluation process, Pipelines stood out from the other solutions for its flexibility and availability of features for supporting and expanding current and future workflows. Switching to Pipelines freed up developers’ time from platform maintenance and troubleshooting and redirected attention towards the addition of the new features. In this post, we present the design for training and inference workflows at the Amazon Packaging Innovation team using Pipelines. We also discuss the benefits and the reduction in costs the team realized by switching to Pipelines.

Training pipeline

The Amazon Packaging Innovation team trains models for every package type using a growing number of labels. The following diagram outlines the entire process.


The workflow begins by extracting labels and features from an Amazon Redshift database and unloading the data to Amazon Simple Storage Service (Amazon S3) via a scheduled extract, transform, and load (ETL) job. Along with the input data, a file object with the model type and parameters is placed in the S3 bucket. This file serves as the pipeline trigger via a Lambda function.
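
A Lambda handler that starts a SageMaker pipeline can be as small as the sketch below; the pipeline name is an illustrative placeholder rather than the team's actual configuration.

import boto3

sagemaker_client = boto3.client("sagemaker")

def lambda_handler(event, context):
    # Triggered by the S3 put of the parameter file; kick off the training pipeline.
    response = sagemaker_client.start_pipeline_execution(
        PipelineName="packaging-training"          # illustrative pipeline name
    )
    return {"executionArn": response["PipelineExecutionArn"]}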

The next steps are completely customizable and defined entirely by a data scientist using the SageMaker Python SDK for Pipelines. In the scenario we present in this post, the input data is split into training and validation sets and saved back in an S3 bucket by launching a SageMaker Processing job.

When the data is ready in Amazon S3, a SageMaker training job starts. After the model is successfully trained and created, the model evaluation step is performed on the validation data via a SageMaker batch transform job. The model metrics are then compared to the previous week’s model metrics using a SageMaker Processing job. The team has defined multiple custom criteria for evaluating deviations in the model performance. The model is either rejected or approved based on these criteria. If the model is rejected, the previous approved model is used for the next inference jobs. If the model is approved, its version is registered and that model is used for inference jobs. The stakeholders receive a notification about the outcome via Amazon CloudWatch alarms.
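
For illustration, here is a heavily condensed sketch of how such a workflow can be expressed with the SageMaker Python SDK. The step names, scripts, container image, and S3 paths are placeholders rather than the team's actual configuration, and the evaluation, condition, and model-registration steps are omitted for brevity.

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
from sagemaker.workflow.pipeline import Pipeline

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# 1) Split the unloaded data into training and validation sets.
split_step = ProcessingStep(
    name="SplitData",
    processor=SKLearnProcessor(framework_version="0.23-1", role=role,
                               instance_type="ml.m5.xlarge", instance_count=1),
    code="split.py",                                   # placeholder script
)

# 2) Train the model on the prepared data.
train_step = TrainingStep(
    name="TrainModel",
    estimator=Estimator(image_uri="<training-image>", role=role,
                        instance_type="ml.m5.xlarge", instance_count=1,
                        sagemaker_session=session),
    inputs={"train": TrainingInput(s3_data="s3://<bucket>/train/")},
)

# 3) Evaluation, metric comparison, and conditional model registration would follow here.
pipeline = Pipeline(name="packaging-training", steps=[split_step, train_step])
pipeline.upsert(role_arn=role)    # create or update the pipeline definition
# pipeline.start()                # trigger a run (e.g., from the Lambda function)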

The following screenshot from Amazon SageMaker Studio shows the steps of the training pipeline.


Pipelines tracks each pipeline run, which you can monitor in Studio. Alternatively, you can query the progress of the run using Boto3 or the AWS Command Line Interface (AWS CLI). You can visualize the model metrics in Studio and compare different model versions.

Inference pipeline

The Amazon Packaging Innovation team refreshes predictions for the entire inventory of products monthly. Daily predictions are generated to provide just-in-time packaging recommendations for newly added inventory using the latest trained model. This requires the inference pipeline to run daily with different volumes of data. The following diagram illustrates this workflow.


Similar to the training pipeline, the inference begins with unloading the data from Amazon Redshift to an S3 bucket. A file object placed in Amazon S3 triggers the Lambda function that initiates the inference pipeline. The features are prepared for inference and the data is split into appropriately sized files using a SageMaker Processing job. Next, the pipeline identifies the latest approved model, runs the predictions, and loads them to an S3 bucket. Finally, the predictions are loaded back to Amazon Redshift using the Boto3 data API within the SageMaker Processing job.
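
Looking up the latest approved model version in the SageMaker model registry can be done with a Boto3 call like the one sketched below; the model package group name is a hypothetical placeholder.

import boto3

sm = boto3.client("sagemaker")

# Fetch the most recently approved model version registered by the training pipeline.
response = sm.list_model_packages(
    ModelPackageGroupName="packaging-type-models",   # hypothetical group name
    ModelApprovalStatus="Approved",
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)
latest_model_arn = response["ModelPackageSummaryList"][0]["ModelPackageArn"]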

The following screenshot from Studio shows the inference pipeline details.

Benefits of choosing to architect ML workflows with SageMaker Pipelines

In this section, we discuss the gains the Amazon Packaging Innovation team realized by switching to Pipelines for model training and inference.

Out-of-the-box production-level MLOps features

While comparing different internal and external options for the next ML pipeline solution, a single data scientist was able to prototype and develop a full version of an ML workflow with Pipelines in a Studio Jupyter environment in less than 3 weeks. Even at the prototyping stage, it became clear that Pipelines provided all the infrastructure components required for a production-level workflow: model versioning, caching, and alarms. The immediate availability of these features meant that no additional time would be spent developing and customizing them. This was a clear demonstration of value, which convinced the Amazon Packaging Innovation team that Pipelines was the right solution.

Flexibility in developing ML models

The biggest gain for the data scientists on the team was the ability to experiment easily and iterate through different models. Regardless of what framework they preferred for their ML work and the number of steps and features it involved, Pipelines accommodated their needs. The data scientists were empowered to experiment without having to wait to get on the software development sprint to add an additional feature or step.

Reduced Costs

The Pipelines capability of SageMaker is free: you pay only for the compute resources and the storage associated with training and inference. However, when thinking about cost, you need to account not only for the cost of the services used but also for the developer hours needed to maintain, debug, and patch the workflow. Orchestrating with Pipelines is simpler because it consists of fewer pieces and familiar infrastructure. Previously, adding a new feature required at least two people (a data scientist and a software engineer) on the Amazon Packaging Innovation team to implement it. With the redesigned pipeline, engineering efforts are now directed towards additional custom infrastructure around the pipeline, such as creating a single repository for tracking the machine learning code, simplifying model deployment across AWS accounts, and developing integrated ETL jobs and common reusable functions.

The ability to cache steps called with the same input also contributed to the reduction in cost, because the teams were less likely to rerun the entire pipeline. Instead, they could easily restart it from the point of failure.
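
In the SageMaker Python SDK, caching is enabled per step with a cache configuration. A brief sketch follows; the processor settings, role, and script are placeholders, defined as in the training pipeline sketch above.

from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.steps import CacheConfig, ProcessingStep

# Reuse a step's result when it is invoked again with the same arguments, so a
# rerun after a failure skips the successfully completed steps.
cache_config = CacheConfig(enable_caching=True, expire_after="P30D")      # ISO 8601: 30 days

prepare_step = ProcessingStep(
    name="PrepareFeatures",
    processor=SKLearnProcessor(framework_version="0.23-1", role=role,     # role as defined earlier
                               instance_type="ml.m5.xlarge", instance_count=1),
    code="prepare.py",                                                    # placeholder script
    cache_config=cache_config,
)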

Conclusion

The Amazon Packaging Innovation team trains ML models on a monthly basis and regularly updates predictions for the recommended product packaging types. These recommendations helped them achieve multiple team- and company-wide goals by reducing waste and delighting customers with each order. The training and inference pipelines must run reliably on a regular basis yet allow for constant improvement of the models.

Transitioning to Pipelines allowed the team to deploy four new multi-modal model architectures to production in under 2 months. Deploying a new model using the previous architecture would have required 5 days (with the same model architecture) to 1 month (with a new model architecture). Using Pipelines, the team reduced the deployment time to 4 hours for the same model architecture and to 5 days for a new model architecture. That equates to a savings of almost 80 percent in working hours.

About the Authors

Ankur Shukla is a Principal Data Scientist at AWS-ProServe based in Palo Alto. Ankur has more than 15 years of consulting experience working directly with customers and helping them solve business problems with technology. He leads multiple global applied science and ML-Ops initiatives within AWS. In his free time, he enjoys reading and spending time with family.

Akash Singla is a Sr. System Dev Engineer with the Amazon Packaging Innovation team. He has more than 17 years of experience solving critical business problems through technology for several business verticals. He currently focuses on upgrading NAWS infrastructure for a variety of packaging-centric applications to scale them better.

Vitalina Komashko is a Data Scientist with AWS Professional Services. She holds a PhD in Pharmacology and Toxicology but transitioned to data science from experimental work because she wanted "to own data generation and the interpretation of the results". Earlier in her career she worked with biotech and pharma companies. At AWS she enjoys solving problems for customers from a variety of industries and learning about their unique challenges.

Prasanth Meiyappan has been a Sr. Applied Scientist with Amazon Packaging Innovation for 4+ years. He has 6+ years of industry experience in machine learning and has shipped products to improve search customer experience and customer packaging experience. Prasanth is passionate about sustainability and has a PhD in statistical modeling of climate change.

Matthew Bales is a Sr. Research Scientist working to optimize package type selection using customer feedback and machine learning. Prior to Amazon, Matt worked as a postdoc performing particle physics simulations in Germany and, in a previous life, as a production manager of radioactive medical implant devices at a startup. He holds a Ph.D. in Physics from the University of Michigan.

Read More

Take the Green Train: NVIDIA BlueField DPUs Drive Data Center Efficiency

The numbers are in, and they paint a picture of data centers going a deeper shade of green, thanks to energy-efficient networks accelerated with data processing units (DPUs).

A suite of tests run with help from Ericsson, Red Hat and VMware shows power reductions of up to 24% on servers using NVIDIA BlueField-2 DPUs. In one case, they delivered 54x the performance of CPUs.

The work, described in a recent whitepaper, offloaded core networking jobs from power-hungry host processors to DPUs designed to run them more efficiently.

Accelerated computing with DPUs for networking, security and storage jobs is one of the next big steps for making data centers more power efficient. It’s the latest of a handful of optimizations, described in the whitepaper, for data centers moving into the era of green computing.

DPUs Tested on VMware vSphere

Seeing the trend toward energy-efficient networks, VMware enabled DPUs to run its virtualization software, used by thousands of companies worldwide. NVIDIA has run several tests with VMware since its vSphere 8 software release this fall.

For example, on VMware vSphere Distributed Services Engine — software that offloads and accelerates networking and security functions using DPUs — BlueField-2 delivered higher performance while freeing up 20% of the CPU’s resources required without DPUs.

That means users can deploy fewer servers to run the same workload, or run more applications on the same servers.

Power Costs Cut Nearly $2 Million

Few data centers face a more demanding job than those run by telecoms providers. Their networks shuttle every bit of data that smartphone users generate or request between their cellular networks and the internet.

Researchers at Ericsson tested whether operators could reduce their power consumption on this massive workload using SmartNICs, the network interface cards that handle DPU functions. Their test let CPUs slow down or sleep while an NVIDIA ConnectX SmartNIC handled the networking tasks.

The results, detailed in a recent article, were stunning.

Energy consumption of server CPUs fell 24%, from 190 to 145 watts on a fully loaded network. This single DPU application could cut power costs by nearly $2 million over three years for a large data center.

Ericsson tests with BlueField DPUs
Ericsson ran the user-plane function for 5G networks on DPUs in three scenarios.

In the article, Ericsson’s CTO, Erik Ekudden, underscored the importance of the work.

“There’s a growing sense of urgency among communication service providers to find and implement innovative solutions that reduce network energy consumption,” he wrote. And the DPU techniques “save energy across a wide range of traffic conditions.”

70% Less Overhead, 54x More Performance

Results were even more dramatic for tests on Red Hat OpenShift, used by half of all Fortune 500 banks, airlines and telcos to manage software containers.

In the tests, BlueField-2 DPUs handled virtualization, encryption and networking jobs needed to manage these portable packages of applications and code.

The DPUs slashed networking demands on CPUs by 70%, freeing them up to run other applications. What’s more, they accelerated networking jobs by a whopping 54x.

A technical blog provides more detail on the tests.

Speeding the Way to Zero Trust

Across every industry, businesses are embracing a philosophy of zero trust to improve network security. So, NVIDIA tested IPsec, one of the most popular data center encryption protocols, on BlueField DPUs.

The test showed data centers could improve performance and cut power consumption 21% for servers and 34% for clients on networks running IPsec on DPUs. For large data centers, that could translate to nearly $9 million in savings on electric bills over three years.

NVIDIA and its partners continue to put DPUs to the test in an expanding portfolio of use cases, but the big picture is clear.

“In a world facing rising energy costs and rising demand for green IT infrastructure, the use of DPUs will become increasingly popular,” the whitepaper concludes.

It’s good to know the numbers, but seeing is believing. So apply to run your own test of DPUs on VMware’s vSphere.


Read More

Unearthing Data: Vision AI Startup Digs Into Digital Twins for Mining and Construction

Skycatch, a San Francisco-based startup, has been helping companies mine both data and minerals for nearly a decade.

The software-maker is now digging into the creation of digital twins, with an initial focus on the mining and construction industry, using the NVIDIA Omniverse platform for connecting and building custom 3D pipelines.

SkyVerse, which is a part of Skycatch’s vision AI platform, is a combination of computer vision software and custom Omniverse extensions that enables users to enrich and animate virtual worlds of mines and other sites with near-real-time geospatial data.

“With Omniverse, we can turn massive amounts of non-visual data into dynamic visual information that’s easy to contextualize and consume,” said Christian Sanz, founder and CEO of Skycatch. “We can truly recreate the physical world.”

SkyVerse can help industrial sites simulate variables such as weather, broken machines and more up to five years into the future — while learning from happenings up to five years in the past, Sanz said.

The platform automates the entire visualization pipeline for mining and construction environments.

First, it processes data from drones, lidar and other sensors across the environment, whether at the edge using the NVIDIA Jetson platform or in the cloud.

It then creates 3D meshes from 2D images, using neural networks built from NVIDIA’s pretrained models to remove unneeded objects like dump trucks and other equipment from the visualizations.

Next, SkyVerse stitches this into a single 3D model that’s converted to the Universal Scene Description (USD) framework. The master model is then brought into Omniverse Enterprise for the creation of a digital twin that’s live-synced with real-world telemetry data.

“The simulation of machines in the environment, different weather conditions, traffic jams — no other platform has enabled this, but all of it is possible in Omniverse with hyperreal physics and object mass, which is really groundbreaking,” Sanz said.

Skycatch is a Premier partner in NVIDIA Inception, a free, global program that nurtures startups revolutionizing industries with cutting-edge technologies. Premier partners receive additional go-to-market support, exposure to venture capital firms and technical expertise to help them scale faster.

Processing and Visualizing Data

Companies have deployed Skycatch’s fully automated technologies to gather insights from aerial data across tens of thousands of sites at several top mining companies.

The Skycatch team first determines optimal positioning of the data-collection sensors across mine vehicles using the NVIDIA Isaac Sim platform, a robotics simulation and synthetic data generation (SDG) tool for developing, testing and training AI-based robots.

“Isaac Sim has saved us a year’s worth of testing time — going into the field, placing a sensor, testing how it functions and repeating the process,” Sanz said.

The team also plans to integrate the Omniverse Replicator software development kit into SkyVerse to generate physically accurate 3D synthetic data and build SDG tools to accelerate the training of perception networks beyond the robotics domain.

Once data from a site is collected, SkyVerse uses edge devices powered by the NVIDIA Jetson Nano and Jetson AGX Xavier modules to automatically process up to terabytes of it per day and turn it into kilobyte-size analytics that can be easily transferred to frontline users.

This data processing was sped up 3x by the NVIDIA CUDA parallel computing platform, according to Sanz. The team is also looking to deploy the new Jetson Orin modules for next-level performance.

“It’s not humanly possible to go through tens of thousands of images a day and extract critical analytics from them,” Sanz said. “So we’re helping to expand human eyesight with neural networks.”

Using pretrained models from the NVIDIA TAO Toolkit, Skycatch also built neural networks that can remove extraneous objects and vehicles from the visualizations, and texturize over these spots in the 3D mesh.

The digital terrain model, which has sub-five-centimeter precision, can then be brought into Omniverse for the creation of a digital twin using the easily extensible USD framework, custom SkyVerse Omniverse extensions and NVIDIA RTX GPUs.

“It took just around three months to build the Omniverse extensions, despite the complexity of our extensions’ capabilities, thanks to access to technical experts through NVIDIA Inception,” Sanz said.

Skycatch is working with one of Canada’s leading mining companies, Teck Resources, to implement the use of Omniverse-based digital twins for its project sites.

“Teck Resources has been using Skycatch’s compute engine across all of our mine sites globally and is now expanding visualization and simulation capabilities with SkyVerse and our own digital twin strategy,” said Preston Miller, lead of technology and innovation at Teck Resources. “Delivering near-real-time visual data will allow Teck teams to quickly contextualize mine sites and make faster operational decisions on mission-critical, time-sensitive projects.”

The Omniverse extensions built by Skycatch will be available soon.

Safety and Sustainability

AI-powered data analysis and digital twins can make operational processes for mining and construction companies safer, more sustainable and more efficient.

For example, according to Sanz, mining companies need the ability to quickly locate the toe and crest (or bottom and top) of “benches,” narrow strips of land beside an open-pit mine. When a machine is automated to go in and out of a mine, it must be programmed to stay 10 meters away from the crest at all times to avoid the risk of sliding, Sanz said.

Previously, surveying and analyzing landforms to determine precise toes and crests typically took up to five days. With the help of NVIDIA AI, SkyVerse can now generate this information within minutes, Sanz said.

In addition, SkyVerse eliminates 10,000 open-pit interactions for customers per year, per site, Sanz said. These are situations in which humans and vehicles can intersect within a mine, posing a safety threat.

“At its core, Skycatch’s goal is to provide context and full awareness for what’s going on at a mining or construction site in near-real time — and better environmental context leads to enhanced safety for workers,” Sanz said.

Skycatch aims to boost sustainability efforts for the mining industry, too.

“In addition to mining companies, governmental organizations want visibility into how mines are operating — whether their surrounding environments are properly taken care of — and our platform offers these insights,” Sanz said.

Plus, minerals like cobalt, nickel and lithium are required for electrification and the energy transition. These all come from mine sites, Sanz said, which can become safer and more efficient with the help of SkyVerse’s digital twins and vision AI.

Dive deeper into technology for a sustainable future with Skycatch and other Inception partners in the on-demand webinar, Powering Energy Startup Success With NVIDIA Inception.

Creators and developers across the world can download NVIDIA Omniverse for free, and enterprise teams can use the platform for their 3D projects.

Learn more about and apply to join NVIDIA Inception.


Read More

Check Out 26 New Games Streaming on GeForce NOW in November

It’s a brand new month, which means this GFN Thursday is all about the new games streaming from the cloud.

In November, 26 titles will join the GeForce NOW library. Kick off with 11 additions this week, like Total War: THREE KINGDOMS and new content updates for Genshin Impact and Apex Legends.

Plus, leading 5G provider Rain has announced it will be introducing “GeForce NOW powered by Rain” to South Africa early next year. Look forward to more updates to come.

And don’t miss out on the 40% discount for GeForce NOW 6-month Priority memberships. This offer is only available for a limited time.

Build Your Empire

Lead the charge this week with Creative Assembly and Sega’s Total War: THREE KINGDOMS, a turn-based, empire-building strategy game and the 13th entry in the award-winning Total War franchise. Become one of many great leaders from history and conquer enemies to build a formidable empire.

Total War Three Kingdoms
Resolve an epic conflict in ancient China to unify the country and rebuild the empire.

The game is set in ancient China, and gamers must save the country from the oppressive rule of a warlord. Choose from a cast of a dozen legendary heroic characters to unify the nation and dominate enemies. Each has their own agenda, and there are plenty of different tactics for players to employ.

Extend your campaign with up to six-hour gaming sessions at 1080p 60 frames per second for Priority members. With an RTX 3080 membership, gain support for 1440p 120 FPS streaming and up to 8-hour sessions, with performance that will bring foes to their knees.

Sega Row on GeForce NOW
Members can find ‘Total War: THREE KINGDOMS’ and other Sega games in a dedicated row in the GeForce NOW app.

More to Explore

Alongside the 11 new games streaming this week, members can jump into updates for the hottest free-to-play titles on GeForce NOW.

Genshin Impact Version 3.2, “Akasha Pulses, the Kalpa Flame Rises,” is available to stream on the cloud. This latest update introduces the last chapter of the Sumeru Archon Quest, two new playable characters — Nahida and Layla — as well as new events and game play. Stream it now from devices, whether PC, Mac, Chromebook or on mobile with enhanced touch controls.

Genshin Impact 3.2 on GeForce NOW
Catch the conclusion of the main storyline for Sumeru, the newest region added to ‘Genshin Impact.’

Or squad up in Apex Legends: Eclipse, available to stream now on the cloud. Season 15 brings with it the new Broken Moon map, the newest defensive Legend — Catalyst — and much more.

Apex Legends on GeForce NOW
Don’t mess-a with Tressa.

Also, after working closely with Square Enix, we’re happy to share that members can stream STAR OCEAN THE DIVINE FORCE on GeForce NOW beginning this week.

Here’s the full list of games joining this week:

  • Against the Storm (Epic Games and New release on Steam)
  • Horse Tales: Emerald Valley Ranch (New release on Steam, Nov. 3)
  • Space Tail: Every Journey Leads Home (New release on Steam, Nov. 3)
  • The Chant (New release on Steam, Nov. 3)
  • The Entropy Centre (New release on Steam, Nov. 3)
  • WRC Generations — The FIA WRC Official Game (New Release on Steam, Nov. 3)
  • Filament (Free on Epic Games, Nov. 3-10)
  • STAR OCEAN THE DIVINE FORCE (Steam)
  • PAGUI (Steam)
  • RISK: Global Domination (Steam)
  • Total War: THREE KINGDOMS (Steam)

Arriving in November

But wait, there’s more! Among the total 26 games joining GeForce NOW in November is the highly anticipated Warhammer 40,000: Darktide, with support for NVIDIA RTX and DLSS.

Here’s a sneak peek:

  • The Unliving (New release on Steam, Nov. 7)
  • TERRACOTTA (New release on Steam and Epic Games, Nov. 7)
  • A Little to the Left (New Release on Steam, Nov. 8)
  • Yum Yum Cookstar (New Release on Steam, Nov. 11)
  • Nobody — The Turnaround (New release on Steam, Nov. 17)
  • Goat Simulator 3 (New release on Epic Games, Nov. 17)
  • Evil West (New release on Steam, Nov. 22)
  • Colortone: Remixed (New Release on Steam, Nov. 30)
  • Warhammer 40,000: Darktide (New Release on Steam, Nov. 30)
  • Heads Will Roll: Downfall (Steam)
  • Guns Gore and Cannoli 2 (Steam)
  • Hidden Through Time (Steam)
  • Cave Blazers (Steam)
  • Railgrade (Epic Games)
  • The Legend of Tianding (Steam)

While The Unliving was originally announced in October, the release date of the game shifted to Monday, Nov. 7.

Howlin’ for More

October brought more treats for members. Don’t miss the 14 extra titles added last month. 

With all of these sweet new titles coming to the cloud, getting your game on is as easy as pie. Speaking of pie, we’ve got a question for you. Let us know your answer on Twitter or in the comments below.


Read More

Extending TorchVision’s Transforms to Object Detection, Segmentation & Video tasks

TorchVision is extending its Transforms API! Here is what’s new:

  • You can use them not only for Image Classification but also for Object Detection, Instance & Semantic Segmentation and Video Classification.
  • You can import directly from TorchVision several SoTA data-augmentations such as MixUp, CutMix, Large Scale Jitter and SimpleCopyPaste.
  • You can use new functional transforms for transforming Videos, Bounding Boxes and Segmentation Masks.

The interface remains the same to assist the migration and adoption. The new API is currently in Prototype and we would love to get early feedback from you to improve its functionality. Please reach out to us if you have any questions or suggestions.

Limitations of current Transforms

The stable Transforms API of TorchVision (aka V1) only supports single images. As a result it can only be used for classification tasks:

from torchvision import transforms
trans = transforms.Compose([
    transforms.ColorJitter(contrast=0.5),
    transforms.RandomRotation(30),
    transforms.CenterCrop(480),
])
imgs = trans(imgs)

The above approach doesn’t support Object Detection, Segmentation or Classification transforms that require the use of Labels (such as MixUp & CutMix). This limitation made any non-classification Computer Vision tasks second-class citizens as one couldn’t use the Transforms API to perform the necessary augmentations. Historically this made it difficult to train high-accuracy models using TorchVision’s primitives and thus our Model Zoo lagged by several points from SoTA.

To circumvent this limitation, TorchVision offered custom implementations in its reference scripts that showcased how one could perform augmentations in each task. Though this practice enabled us to train high-accuracy classification, object detection and segmentation models, it was a hacky approach that made those transforms impossible to import from the TorchVision binary.

The new Transforms API

The Transforms V2 API supports videos, bounding boxes, labels and segmentation masks meaning that it offers native support for many Computer Vision tasks. The new solution is a drop-in replacement:

from torchvision.prototype import transforms
# Exactly the same interface as V1:
trans = transforms.Compose([
    transforms.ColorJitter(contrast=0.5),
    transforms.RandomRotation(30),
    transforms.CenterCrop(480),
])
imgs, bboxes, labels = trans(imgs, bboxes, labels)

The new Transform Classes can receive any arbitrary number of inputs without enforcing specific order or structure:

# Already supported:
trans(imgs)  # Image Classification
trans(videos)  # Video Tasks
trans(imgs_or_videos, labels)  # MixUp/CutMix-style Transforms
trans(imgs, bboxes, labels)  # Object Detection
trans(imgs, bboxes, masks, labels)  # Instance Segmentation
trans(imgs, masks)  # Semantic Segmentation
trans({"image": imgs, "box": bboxes, "tag": labels})  # Arbitrary Structure
# Future support:
trans(imgs, bboxes, labels, keypoints)  # Keypoint Detection
trans(stereo_images, disparities, masks)  # Depth Perception
trans(image1, image2, optical_flows, masks)  # Optical Flow

The Transform Classes make sure that they apply the same random transforms to all the inputs to ensure consistent results.

The functional API has been updated to support all necessary signal processing kernels (resizing, cropping, affine transforms, padding etc) for all inputs:

from torchvision.prototype.transforms import functional as F
# High-level dispatcher, accepts any supported input type, fully BC
F.resize(inpt, resize=[224, 224])
# Image tensor kernel
F.resize_image_tensor(img_tensor, resize=[224, 224], antialias=True)
# PIL image kernel
F.resize_image_pil(img_pil, resize=[224, 224], interpolation=BILINEAR)
# Video kernel
F.resize_video(video, resize=[224, 224], antialias=True)
# Mask kernel
F.resize_mask(mask, resize=[224, 224])
# Bounding box kernel
F.resize_bounding_box(bbox, resize=[224, 224], spatial_size=[256, 256])

The API uses Tensor subclassing to wrap input, attach useful meta-data and dispatch to the right kernel. Once the Datasets V2 work is complete, which makes use of TorchData’s Data Pipes, the manual wrapping of input won’t be necessary. For now, users can manually wrap the input by:

from torchvision.prototype import features
imgs = features.Image(images, color_space=ColorSpace.RGB)
vids = features.Video(videos, color_space=ColorSpace.RGB)
masks = features.Mask(target["masks"])
bboxes = features.BoundingBox(target["boxes"], format=BoundingBoxFormat.XYXY, spatial_size=imgs.spatial_size)
labels = features.Label(target["labels"], categories=["dog", "cat"])

In addition to the new API, we now provide importable implementations for several data augmentations that are used in SoTA research such as MixUp, CutMix, Large Scale Jitter, SimpleCopyPaste, AutoAugmentation methods and several new Geometric, Colour and Type Conversion transforms.

The API continues to support both PIL and Tensor backends for Images, single or batched input and maintains JIT-scriptability on the functional API. It allows deferring the casting of images from uint8 to float which can lead to performance benefits. It is currently available in the prototype area of TorchVision and can be imported from the nightly builds. The new API has been verified to achieve the same accuracy as the previous implementation.

Current Limitations

Though the functional API (kernels) remains JIT-scriptable and fully BC, the Transform Classes, while offering the same interface, can't be scripted. This is because they use Tensor subclassing and receive an arbitrary number of inputs, which is not supported by JIT. We are currently working to reduce the dispatching overhead of the new API and to improve the speed of the existing kernels.

An end-to-end example

Here is an example of the new API using the following image. It works both with PIL images and Tensors:

import PIL
from torchvision import io, utils
from torchvision.prototype import features, transforms as T
from torchvision.prototype.transforms import functional as F
# Defining and wrapping input to appropriate Tensor Subclasses
path = "COCO_val2014_000000418825.jpg"
img = features.Image(io.read_image(path), color_space=features.ColorSpace.RGB)
# img = PIL.Image.open(path)
bboxes = features.BoundingBox(
    [[2, 0, 206, 253], [396, 92, 479, 241], [328, 253, 417, 332],
     [148, 68, 256, 182], [93, 158, 170, 260], [432, 0, 438, 26],
     [422, 0, 480, 25], [419, 39, 424, 52], [448, 37, 456, 62],
     [435, 43, 437, 50], [461, 36, 469, 63], [461, 75, 469, 94],
     [469, 36, 480, 64], [440, 37, 446, 56], [398, 233, 480, 304],
     [452, 39, 463, 63], [424, 38, 429, 50]],
    format=features.BoundingBoxFormat.XYXY,
    spatial_size=F.get_spatial_size(img),
)
labels = features.Label([59, 58, 50, 64, 76, 74, 74, 74, 74, 74, 74, 74, 74, 74, 50, 74, 74])
# Defining and applying Transforms V2
trans = T.Compose(
    [
        T.ColorJitter(contrast=0.5),
        T.RandomRotation(30),
        T.CenterCrop(480),
    ]
)
img, bboxes, labels = trans(img, bboxes, labels)
# Visualizing results
viz = utils.draw_bounding_boxes(F.to_image_tensor(img), boxes=bboxes)
F.to_pil_image(viz).show()

Development milestones and future work

Here is where we are in development:

  • Design API
  • Write Kernels for transforming Videos, Bounding Boxes, Masks and Labels
  • Rewrite all existing Transform Classes (stable + references) on the new API:
    • Image Classification
    • Video Classification
    • Object Detection
    • Instance Segmentation
    • Semantic Segmentation
  • Verify the accuracy of the new API for all supported Tasks and Backends
  • Speed Benchmarks and Performance Optimizations (in progress – planned for Dec)
  • Graduate from Prototype (planned for Q1)
  • Add support of Depth Perception, Keypoint Detection, Optical Flow and more (future)

We are currently in the process of Benchmarking each Transform Class and Functional Kernel in order to measure and improve their performance. The scope includes optimizing existing kernels which will be adopted from V1. Early findings indicate that some improvements might need to be upstreamed on the C++ kernels of PyTorch Core. Our plan is to continue iterating throughout Q4 to improve the speed performance of the new API and enhance it with additional SoTA transforms with the help of the community.

We would love to get early feedback from you to improve its functionality. Please reach out to us if you have any questions or suggestions.

Read More

In machine learning, synthetic data can offer real performance improvements

Teaching a machine to recognize human actions has many potential applications, such as automatically detecting workers who fall at a construction site or enabling a smart home robot to interpret a user’s gestures.

To do this, researchers train machine-learning models using vast datasets of video clips that show humans performing actions. However, not only is it expensive and laborious to gather and label millions or billions of videos, but the clips often contain sensitive information, like people’s faces or license plate numbers. Using these videos might also violate copyright or data protection laws. And this assumes the video data are publicly available in the first place — many datasets are owned by companies and aren’t free to use.

So, researchers are turning to synthetic datasets. These are made by a computer that uses 3D models of scenes, objects, and humans to quickly produce many varying clips of specific actions — without the potential copyright issues or ethical concerns that come with real data.

But are synthetic data as “good” as real data? How well does a model trained with these data perform when it’s asked to classify real human actions? A team of researchers at MIT, the MIT-IBM Watson AI Lab, and Boston University sought to answer this question. They built a synthetic dataset of 150,000 video clips that captured a wide range of human actions, which they used to train machine-learning models. Then they showed these models six datasets of real-world videos to see how well they could learn to recognize actions in those clips.

The researchers found that the synthetically trained models performed even better than models trained on real data for videos that have fewer background objects.

This work could help researchers use synthetic datasets in such a way that models achieve higher accuracy on real-world tasks. It could also help scientists identify which machine-learning applications could be best-suited for training with synthetic data, in an effort to mitigate some of the ethical, privacy, and copyright concerns of using real datasets.

“The ultimate goal of our research is to replace real data pretraining with synthetic data pretraining. There is a cost in creating an action in synthetic data, but once that is done, then you can generate an unlimited number of images or videos by changing the pose, the lighting, etc. That is the beauty of synthetic data,” says Rogerio Feris, principal scientist and manager at the MIT-IBM Watson AI Lab, and co-author of a paper detailing this research.

The paper is authored by lead author Yo-whan “John” Kim ’22; Aude Oliva, director of strategic industry engagement at the MIT Schwarzman College of Computing, MIT director of the MIT-IBM Watson AI Lab, and a senior research scientist in the Computer Science and Artificial Intelligence Laboratory (CSAIL); and seven others. The research will be presented at the Conference on Neural Information Processing Systems.   

Building a synthetic dataset

The researchers began by compiling a new dataset using three publicly available datasets of synthetic video clips that captured human actions. Their dataset, called Synthetic Action Pre-training and Transfer (SynAPT), contained 150 action categories, with 1,000 video clips per category.

They selected as many action categories as possible, such as people waving or falling on the floor, depending on the availability of clips that contained clean video data.

Once the dataset was prepared, they used it to pretrain three machine-learning models to recognize the actions. Pretraining involves training a model for one task to give it a head-start for learning other tasks. Inspired by the way people learn — we reuse old knowledge when we learn something new — the pretrained model can use the parameters it has already learned to help it learn a new task with a new dataset faster and more effectively.
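
As a rough illustration of this pretrain-then-transfer recipe (not the paper's actual models, code, or datasets; the model choice and class counts below are assumptions), one can pretrain a video classifier on the synthetic clips and later replace its classification head to fine-tune on a real-world action dataset:

import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

# Pretraining: a video classifier with one output per SynAPT action category.
model = r3d_18(num_classes=150)
# ... train on the synthetic video clips here ...

# Transfer: keep the pretrained backbone, replace the head for a real-world
# dataset with a different set of action classes, and fine-tune.
num_real_classes = 51                          # e.g., a downstream benchmark's class count
model.fc = nn.Linear(model.fc.in_features, num_real_classes)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# ... fine-tune on real clips (or freeze the backbone for a linear probe) ...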

They tested the pretrained models using six datasets of real video clips, each capturing classes of actions that were different from those in the training data.

The researchers were surprised to see that all three synthetic models outperformed models trained with real video clips on four of the six datasets. Their accuracy was highest for datasets that contained video clips with “low scene-object bias.”

Low scene-object bias means that the model cannot recognize the action by looking at the background or other objects in the scene — it must focus on the action itself. For example, if the model is tasked with classifying diving poses in video clips of people diving into a swimming pool, it cannot identify a pose by looking at the water or the tiles on the wall. It must focus on the person’s motion and position to classify the action.

“In videos with low scene-object bias, the temporal dynamics of the actions is more important than the appearance of the objects or the background, and that seems to be well-captured with synthetic data,” Feris says.

“High scene-object bias can actually act as an obstacle. The model might misclassify an action by looking at an object, not the action itself. It can confuse the model,” Kim explains.

Boosting performance

Building off these results, the researchers want to include more action classes and additional synthetic video platforms in future work, eventually creating a catalog of models that have been pretrained using synthetic data, says co-author Rameswar Panda, a research staff member at the MIT-IBM Watson AI Lab.

“We want to build models which have very similar performance or even better performance than the existing models in the literature, but without being bound by any of those biases or security concerns,” he adds.

They also want to combine their work with research that seeks to generate more accurate and realistic synthetic videos, which could boost the performance of the models, says SouYoung Jin, a co-author and CSAIL postdoc. She is also interested in exploring how models might learn differently when they are trained with synthetic data.

“We use synthetic datasets to prevent privacy issues or contextual or social bias, but what does the model actually learn? Does it learn something that is unbiased?” she says.

Now that they have demonstrated this use potential for synthetic videos, they hope other researchers will build upon their work.

“Despite there being a lower cost to obtaining well-annotated synthetic data, currently we do not have a dataset with the scale to rival the biggest annotated datasets with real videos. By discussing the different costs and concerns with real videos, and showing the efficacy of synthetic data, we hope to motivate efforts in this direction,” adds co-author Samarth Mishra, a graduate student at Boston University (BU).

Additional co-authors include Hilde Kuehne, professor of computer science at Goethe University in Germany and an affiliated professor at the MIT-IBM Watson AI Lab; Leonid Karlinsky, research staff member at the MIT-IBM Watson AI Lab; Venkatesh Saligrama, professor in the Department of Electrical and Computer Engineering at BU; and Kate Saenko, associate professor in the Department of Computer Science at BU and a consulting professor at the MIT-IBM Watson AI Lab.

This research was supported by the Defense Advanced Research Projects Agency LwLL, as well as the MIT-IBM Watson AI Lab and its member companies, Nexplore and Woodside.

Read More