RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning

Figure 1: Overview of RLPrompt for discrete prompt optimization. All language models (LMs) are frozen. We build our policy network by training a task-specific multi-layer perceptron (MLP) network inserted into a frozen pre-trained LM. The figure above illustrates generation of a prompt (left), example usages in a masked LM for classification (top right) and a left-to-right LM for generation (bottom right), and update of the MLP using RL reward signals (red arrows).

TL;DR: Prompting enables large language models (LLMs) to perform various NLP tasks without changing the model. Discrete prompts have many desirable properties, but are difficult to optimize. We propose an efficient approach using reinforcement learning, which shows superior performance and facilitates rich interpretations and analyses. You can easily adapt it for your own tasks using our code base here.

Prompting has emerged as a promising approach to solving a wide range of NLP problems using large pre-trained language models (LMs), including left-to-right models such as GPTs (Radford et al., 2019; Brown et al., 2020) and masked LMs such as BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), etc.

Compared to conventional fine-tuning that expensively updates the massive LM parameters for each downstream task, prompting concatenates the inputs with an additional piece of text that steers the LM to produce the desired outputs. A key question with prompting is how to find the optimal prompts to improve the LM’s performance on various tasks, often with only a few training examples.

Most existing work resorts to tuning soft prompts (e.g., continuous embeddings), which fall short in interpretability, reusability across LMs, and applicability when gradients are not accessible. Discrete prompts, on the other hand, are difficult to optimize and are often created by “enumeration (e.g., paraphrasing)-then-selection” heuristics that do not explore the prompt space systematically.

Instead, we propose RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL). RLPrompt is flexibly applicable to different types of LMs (e.g., BERT and GPTs) for both classification and generation tasks. Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing fine-tuning or prompting methods.

Interestingly, the resulting optimized prompts are often ungrammatical gibberish text; surprisingly, those gibberish prompts are transferable between different LMs and retain significant performance, indicating that LMs may have grasped shared structures for prompting that do not follow human language patterns.

Discrete Prompt Optimization with RL

This paper presents RLPrompt, a new discrete prompt optimization approach based on reinforcement learning (RL). This approach brings together a wide range of desirable properties for efficient use on diverse tasks and LMs (see the table below). 

Table 1: RLPrompt unites the desirable properties of a wide range of previous prompt optimization approaches

Crucially, rather than directly editing the discrete tokens, which has been difficult and inefficient, RLPrompt trains a policy network that generates the desired prompts. Discrete prompt optimization thus amounts to learning a small number of policy parameters, which we set as an MLP layer inserted into a frozen compact model such as distilGPT-2 (HuggingFace, 2019). We describe the specific formulations in Sections §2.1-2.3 of our paper.
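
To make this concrete, below is a minimal sketch (ours, not the authors' released code) of such a policy using Hugging Face Transformers; the residual connection and hidden size are illustrative assumptions.

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

class PromptPolicy(nn.Module):
    """Frozen distilGPT-2 with a small trainable MLP inserted before the LM head."""

    def __init__(self, model_name="distilgpt2", mlp_hidden=2048):
        super().__init__()
        self.lm = AutoModelForCausalLM.from_pretrained(model_name)
        for p in self.lm.parameters():      # freeze every LM parameter
            p.requires_grad = False
        d = self.lm.config.n_embd           # 768 for distilGPT-2
        self.mlp = nn.Sequential(           # the only trainable weights
            nn.Linear(d, mlp_hidden), nn.ReLU(), nn.Linear(mlp_hidden, d)
        )

    def forward(self, input_ids):
        h = self.lm.transformer(input_ids).last_hidden_state  # frozen features
        h = h + self.mlp(h)                 # task-specific adaptation (residual)
        return self.lm.lm_head(h)           # logits over candidate prompt tokens
```

Prompt tokens are then sampled from these logits one position at a time, and only the MLP's parameters receive RL gradient updates.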

This formulation also allows us to employ off-the-shelf RL algorithms (e.g., Guo et al., 2021) that learn the policy with arbitrary reward functions—defined either with available data (e.g., in few-shot classification) or other weak signals when no supervised data is accessible (e.g., in controllable text generation).

Reward Stabilization 

On the other hand, RL for prompt optimization poses new challenges to learning efficiency: the large black-box LM presents a highly complex environment that, given the prompt (i.e., actions), goes through a long series of complex transitions (e.g., reading the input and inferring the output) before computing the rewards. This makes the reward signals extremely unstable and hard to learn from. 

To overcome this difficulty, we propose two simple yet surprisingly effective ways to stabilize the rewards and improve the optimization efficiency.

  1. Normalizing the training signal by computing the z-score of rewards for the same input.
  2. Designing piecewise reward functions that provide a sparse, qualitative bonus to desirable behaviors (e.g., reaching a certain accuracy on a given class).

We describe more details in Section §2.4 of our paper.
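
As a rough illustration, the two techniques might look like the sketch below; the threshold and bonus values are hypothetical, not the paper's exact design.

```python
import torch

def z_score_rewards(rewards: torch.Tensor) -> torch.Tensor:
    # Normalize rewards computed for the same input across sampled prompts:
    # z = (r - mean) / std stabilizes the scale of the training signal.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def piecewise_reward(correct_prob: float, threshold: float = 0.5,
                     bonus: float = 10.0) -> float:
    # Sparse, qualitative bonus once the desired behavior is reached,
    # e.g., the correct class becoming sufficiently probable.
    return correct_prob + bonus if correct_prob > threshold else correct_prob
```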

Experiments

We evaluate our approach on both classification (in the few-shot setting) and generation (unsupervised text style transfer), and perform rich analyses for new insights on LM prompting. We describe implementation details such as reward function design in Section §3 of our paper, and publish the code in our GitHub codebase.

Few-Shot Text Classification

For few-shot classification, we follow previous work and experiment on popular sentiment and topic classification tasks, using 16 examples per class for both training and validation (Perez et al., 2021). Results using RoBERTa-large (left table below) show that our approach improves over a wide range of fine-tuning and prompting methods, and is as efficient to optimize as comparable methods that tune soft prompts (e.g., right figure below). We report detailed dataset-level results in Section §3.1 of our paper.

Table 2: Average accuracy for few-shot text classification across all tested datasets. All methods use RoBERTa-large for fine-tuning or prompting.
Figure 2: Comparison of our method (orange) and BlackBox (BB) Tuning (Sun et al., 2022) (blue) in terms of training efficiency. The solid curves are the mean and the shaded regions are the max. and min. test accuracies over 5 trials.

Unsupervised Text Style Transfer

For text style transfer, we evaluate on the popular Yelp sentiment transfer dataset (Shen et al., 2017) using standard automatic metrics for content preservation, style accuracy, and fluency, and report their sentence-level joint product J(·) below. Our full paper also includes few-shot experiments on the Shakespeare (Xu et al., 2012) dataset and human evaluations.

Results using GPT-2 (left table below) show our method outperforms or competes with various fine-tuning and prompting baselines, including DiRR (Liu et al., 2021c), which expensively fine-tunes all parameters of a GPT-2 model. An ablation study (right figure below) shows that our proposed reward normalization technique is crucial to optimization success. We describe the full evaluation results in Section §3.2 of our paper.

Table 3: Automatic evaluation of our method vs. baselines on the Yelp (Shen et al., 2017) sentiment transfer dataset. J(·) is our main metric, which measures the average joint sentence-level score of content preservation, style accuracy, and fluency. Numbers in (parentheses) are standard deviations across 3 sets of prompts.
Figure 3: Comparison of our method with (orange) and without (purple) z-score reward normalization. The format is the same as Figure 2.

Analysis

Optimal Prompts Don’t Follow Human Language

The resulting discrete prompts also facilitate rich interpretations and analyses for new insights into LM prompting. In particular, the optimized prompts, though inducing strong task performance, tend to be gibberish text without clear human-understandable meaning (e.g., table below), echoing recent research (Webson and Pavlick, 2021; Zhao et al., 2021; Prasad et al., 2022) that LMs making use of prompts do not necessarily follow human language patterns. 

Table 4: Comparison of our method (RLPrompt) with manually-written (Manual) prompts for text style transfer performance on Yelp (Shen et al., 2017). For the manual prompts, we take one from Reif et al. (2021) and write two more for this experiment. J(·) is the main metric introduced in Table 3. All outputs are generated using GPT-2-xl and metrics are averaged over 5 runs.

Learned Prompts Transfer Trivially Across LMs

Perhaps surprisingly, gibberish prompts learned with one LM can be transferred to other LMs while retaining significant performance, indicating that these different pre-trained LMs have grasped shared structures for prompting (e.g., figures below).

Figure 4: Heatmap of sentiment analysis performance with transferred discrete prompts of 2 tokens. The columns represent the models used to learn the prompts, and the rows represent the models we perform classification with. Brighter color represents higher accuracy.
Figure 5: Heatmap of text style transfer performance with transferred discrete prompts. The columns represent the models used to learn the prompts, and the rows represent the models we perform text generation with. Manual and Random refer to manual prompts and random tokens, respectively. Brighter color represents a better joint score J(·).

Conclusion

We have presented RLPrompt, an efficient and flexible approach for discrete prompt optimization using RL, which improves over a wide range of fine-tuning and prompting methods in experiments on few-shot classification and unsupervised text style transfer.

Analysis reveals that strong optimized prompts are incoherent but transferable between LMs while retaining remarkable performance. This observation opens up many promising possibilities for prompting, such as learning prompts cheaply from smaller models and performing inference with larger models. We are excited to explore further.

Read More

Modular functions design for Advanced Driver Assistance Systems (ADAS) on AWS

Over the last 10 years, a number of players have developed autonomous vehicle (AV) systems using deep neural networks (DNNs). These systems have evolved from simple rule-based systems to Advanced Driver Assistance Systems (ADAS) and fully autonomous vehicles. These systems require petabytes of data and thousands of compute units (vCPUs and GPUs) to train.

This post covers build approaches, different functional units of ADAS, design approaches to building a modular pipeline, and the challenges of building an ADAS system.

DNN training methods and design

AV systems are built with deep neural networks. When it comes to the design of an AV system, there are two main approaches, which differ in how the DNNs are trained and where the system boundaries are drawn.

  • Modular training – With a modular pipeline design, the system is split into individual functional units (for example, perception, localization, prediction, and planning). This is a common design paradigm used by many AV system providers. With the whole system split into individual modules, they can be built and trained independently.
  • End-to-end training – This approach involves training a DNN model that takes raw sensor data as input and outputs the driving command. This is a monolithic architecture and is mainly explored by researchers. The DNN is typically trained with reinforcement learning (RL), using a reward/penalty system, or with imitation learning (IL), by observing a human driving the vehicle. Although the overall architecture is simple, the monolith is hard to interpret and diagnose. However, annotations are cheap because the system learns from data collected through human behavior.

In addition to these two approaches, researchers are also exploring a hybrid approach that trains two different DNNs that are connected by an intermediate representation.

This post explains the functions based on a modular pipeline approach.

Automation levels

The SAE International (formerly the Society of Automotive Engineers) J3016 standard defines six levels of driving automation and is the most cited source on the subject. These levels range from Level 0 (no automation) to Level 5 (full driving automation), as shown in the following table.

Level   Name                              Feature
0       No Driving Automation             Human drives
1       Driver Assistance                 Human drives
2       Partial Driving Automation        Human drives
3       Conditional Driving Automation    System drives with human as backup
4       High Driving Automation           System drives
5       Full Driving Automation           System drives

Modular functions

The following diagram provides an overview of a modular functions design.

At the higher levels of automation (Level 2 and above), the automated driving (AD) system performs multiple functions:

  • Data collection – The AV system gathers information about the vehicle’s surroundings in real time with centimeter accuracy. The vehicle is equipped with various devices whose functions vary and overlap in a number of ways. AV is still an evolving space, and there is no consensus on or standardization of the types of sensors and devices attached. In addition to the devices listed below, vehicles might also have GPS for navigation, and use maps and inertial measurement units (IMUs) to measure linear and angular acceleration. Depending on the type of ADAS system, you will see a combination of the following devices:
    • Cameras – Visual devices conceptually similar to human perception. They support high resolution but are poor at depth estimation and at handling extreme weather conditions.
    • LiDAR – Expensive devices that provide data about the surroundings as a 3D point cloud, with accurate depth and speed estimation.
    • Ultrasonics – Small, inexpensive sensors that work well only at short range.
    • Radar – Supports both long and short ranges and works well in low visibility and extreme weather conditions.
  • Data fusion – The multiple devices in an AV system each provide signals with their own limitations, but signals across the devices carry complementary information. AV systems fuse data from these integrated devices to build a comprehensive perception of the surroundings. This integrated dataset is used to train the DNN.
  • Perception – AV systems analyze the raw data collected from the devices to construct information about the environment around the vehicle, including obstacles, traffic signs, and other objects. This is called road scene perception or simply perception. It involves detecting the objects and classifying them as nearby vehicles, pedestrians, traffic lights, and traffic signs. This function measures depth and performs lane detection, lane curvature estimation, curb detection, and occlusion. This information is key for path planning and route optimization.
  • Localization and mapping – To operate the vehicle safely and efficiently, AV systems need to understand the location of the objects detected by perception. The AV system constructs a 3D map and updates the position of the host vehicle (ego vehicle) and its surroundings in the map. It tracks the detected objects and their current location. Advanced systems predict the kinematics of the objects that are in motion.
  • Prediction – With the information collected from other modules, AV systems predict how the immediate future of the environment is going to change. The DNN running on the vehicle predicts the position of the ego vehicle and the surrounding object interactions by projecting the kinematic states over time (position, velocity, acceleration, jerk). It can predict potential traffic violations and collisions or near collisions.
  • Path planning – This function is responsible for drawing out the possible routes the vehicle can take as the next action based on inputs from perception, localization, and prediction. To plan the best possible route, the AV system takes localization, maps, GPS data, and predictions as input. Some AV systems construct a bird’s-eye view by projecting kinematics of the ego vehicle and other objects onto a static route to provide a 3D map. Some also fuse data from other vehicles. Overall, the planning function finds the optimal route from all the possible ones with a goal to maximize driver comfort (for example, smooth turns vs. sharp turns, slow down vs. stop abruptly at stop signs).
  • Control and execution – Takes the input from the route planner to perform actions to accelerate, decelerate, stop, and rotate the steering wheel. The goal of the controller is to maintain the planned trajectory.
  • Training pipeline – DNNs providing predictions on the vehicle need to be trained. They are typically trained in an offline fashion with data collected from the vehicles. Training requires thousands of compute units for an extended period of time. The amount of data required for training and the required compute power vary based on the model architecture and the AV system provider. To train the DNNs, the AV system provider requires labeled data that is partly annotated by humans and partly automated. Typically, personally identifiable information (PII) such as license plate numbers and faces is anonymized via blurring. Many providers augment the labeled data with simulation, which provides the ability to generate data for specific scenarios and augment real-world data. AV system providers also utilize tools to mine relevant data for training, fine-tuning, and handling edge cases. The trained models are validated for accuracy with offline simulation. Some providers use a dormant model strategy and deploy candidate (dormant) models side by side with the production models. Although predictions from the dormant models aren’t used to control the vehicle, they help the providers validate the models’ accuracy in real-world scenarios.

Challenges

DNNs for AV workloads need to be trained with huge volumes of data. You need a compute infrastructure that can scale to train the DNNs and handle large volumes of training data, and you must consider model and data parallelism to optimize training.

Training with large volumes of data

AV systems collect a large volume of data from the devices attached to the vehicle. Depending on the AV system provider, the vehicle fleet ranges from a handful to thousands of vehicles. The following are some typical challenges an AV system provider may encounter:

  • Collection, preprocessing, and storage of petabytes of data – Each vehicle collects more than 40 TB of data for every 8 hours of driving.
  • Identification of relevant, representative data from a huge volume of data – This is essential to reduce biases in the datasets so that common scenarios (for example, driving at normal speed without obstruction) don’t create class imbalance. To yield better accuracy, DNNs require large volumes of diverse, good-quality data.
  • Volume of corner cases – ML models need to handle a wide range of corner cases. This is essential to ensure the safety of the AV system.
  • Training time – Given a huge volume of data, training time is often in multiple days or even weeks. This reduces the development velocity and ability to fail fast.

To address the large-volume challenge, you can utilize the Amazon SageMaker distributed data parallel library (SMDDP). SageMaker is a fully managed machine learning (ML) service. With data parallelism, a large volume of data is split into batches. Blocks of data are sent to multiple CPUs or GPUs, called nodes, and the results are combined. Each node has a copy of the DNN. SageMaker’s distributed data parallel library splits the data per node and optimizes the communication between the nodes. You can use the SageMaker Python SDK to trigger a job with data parallelism with minimal modifications to the training script. Data parallelism supports popular deep learning frameworks, including PyTorch, PyTorch Lightning, TensorFlow, and Hugging Face Transformers.
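
As an illustration, enabling SMDDP through the SageMaker Python SDK looks roughly like the following; the training script name, IAM role, and S3 paths are placeholders, not real resources.

```python
from sagemaker.pytorch import PyTorch

# Hypothetical script and resource names; the `distribution` argument is what
# turns on the SageMaker distributed data parallel library.
estimator = PyTorch(
    entry_point="train_perception.py",                    # your training script
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    instance_type="ml.p4d.24xlarge",                      # 8 x A100 GPUs per node
    instance_count=8,                                     # 64 GPUs in total
    framework_version="1.12",
    py_version="py38",
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
estimator.fit({"train": "s3://my-bucket/av-training-data/"})  # placeholder URI
```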

Hyundai Motor Company utilized SageMaker data parallelism to reduce training time for their autonomous driving models and achieved more than 90% scaling efficiency with eight instances, each with 8 GPUs. The following diagram illustrates this architecture.

For more details, refer to Hyundai reduces ML model training time for autonomous driving models using Amazon SageMaker.

For more information about distributed training with SageMaker, refer to the AWS re:Invent 2020 video Fast training and near-linear scaling with DataParallel in Amazon SageMaker and The science behind Amazon SageMaker’s distributed-training engines.

Labeling a large volume of data

The training pipeline requires a large volume of labeled datasets. One common challenge our customers face is the development of annotation tools to label image, video, and sensor data (for example, 3D point clouds), along with custom workflows for object detection and semantic segmentation tasks. You need the ability to customize your workflows.

Amazon SageMaker Ground Truth is a fully managed data labeling service that provides flexibility to build and manage custom workflows. With Ground Truth, you can label image, video, and point cloud data for object detection, object tracking, and semantic segmentation tasks. You can transfer data collected from the vehicles and stored on premises to AWS using a data transfer mechanism such as AWS Storage Gateway, AWS Direct Connect, AWS DataSync, AWS Snowball, or AWS Transfer Family. After the data is preprocessed (such as blurring faces and license plates), the cleaned dataset is ready for labeling. Ground Truth supports sensor fusion of LiDAR data with video inputs from cameras. You can choose to use human annotators through Amazon Mechanical Turk, trusted third-party vendors, or your own private workforce.

In the following figure, we provide a reference architecture to preprocess data using AWS Batch and using Ground Truth to label the datasets.

For more information, refer to Field Notes: Automating Data Ingestion and Labeling for Autonomous Vehicle Development and Labeling data for 3D object tracking and sensor fusion in Amazon SageMaker Ground Truth.

For more information on using Ground Truth to label 3D point cloud data, refer to Use Ground Truth to Label 3D Point Clouds.

Training infrastructure

As AV systems mature, the DNNs need to be trained to handle more edge cases (for example, humans walking on highways), and the models become larger and more complex. This results in training the DNNs with more data, mined from the recorded data or generated through simulations, to handle newer scenarios. This demands more compute capacity and a scalable compute infrastructure.

To support the computing needs for ML workloads, SageMaker provides multiple instance types for training. Each family is designed for a few specific workloads; you can choose based on the vCPU, GPU, memory, storage, and networking configurations of the instances. For full, end-to-end AV development, companies largely rely on the m, c, g, and p families.

Some of our customers use our Deep Learning AMIs (DLAMI) to launch NVIDIA GPU-based Amazon Elastic Compute Cloud (Amazon EC2) instances in the p family. Each EC2 p family instance generation integrates the latest NVIDIA technology, including the p2 instances (Tesla K80), p3 instances (Volta V100), and p4d instances (Ampere A100).

The following figure summarizes the available instances:

When the DNNs are complex and can’t fit in memory of one GPU, you can use the SageMaker model parallelism library. This splits the layers across GPUs and instances. You can use the library to automatically partition your TensorFlow and PyTorch models across multiple GPUs and multiple nodes with minimal code changes.
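
As a minimal sketch, reusing the estimator pattern from the data parallelism example above; the partition counts and script name are illustrative assumptions, not tuned recommendations.

```python
from sagemaker.pytorch import PyTorch

# Illustrative model parallelism settings.
smp_options = {
    "enabled": True,
    "parameters": {
        "partitions": 4,       # split the model into 4 partitions across GPUs
        "microbatches": 8,     # pipeline micro-batches to keep GPUs busy
        "ddp": True,           # combine with data parallelism
    },
}

estimator = PyTorch(
    entry_point="train_large_dnn.py",                     # hypothetical script
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    instance_type="ml.p4d.24xlarge",
    instance_count=2,
    framework_version="1.12",
    py_version="py38",
    distribution={
        "smdistributed": {"modelparallel": smp_options},
        "mpi": {"enabled": True, "processes_per_host": 8},
    },
)
```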

MLOps

When it comes to operationalizing, from data scientists conducting experiments on revised models to deploying across thousands of vehicles, AV system providers need a set of tools that work end to end seamlessly for various needs:

  • Data collection and transformation at scale
  • Automated analysis and evaluation of models
  • Standardization of data pipelines
  • The ability to define and conduct experiments for data scientists
  • Monitoring model performance
  • Establishing a repeatable process and eliminating human intervention with end-to-end automation
  • Automated model deployment, which enables you to quickly deploy a trained model across millions of vehicles

SageMaker provides comprehensive MLOps tools. Data scientists can use Amazon SageMaker Experiments, which automatically tracks the inputs, parameters, configurations, and results of iterations as trials. You can further assign, group, and organize these trials into experiments. Amazon SageMaker Model Monitor helps continuously monitor the quality of your ML models in real time. You can set up automated alerts to notify you when there are deviations in model quality, such as data drift and anomalies. When it comes to orchestration, you can choose from a number of options, including the SageMaker Pipelines SDK, AWS Step Functions, Amazon Managed Workflows for Apache Airflow (Amazon MWAA), and open-source tools like Kubeflow.
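
For instance, a minimal SageMaker Pipelines sketch, assuming an estimator defined as in the earlier training examples and placeholder names, might look like this:

```python
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

# `estimator` is assumed to be defined as in the earlier training sketches.
train_step = TrainingStep(
    name="TrainPerceptionModel",
    estimator=estimator,
    inputs={"train": TrainingInput("s3://my-bucket/av-training-data/")},
)

pipeline = Pipeline(name="adas-training-pipeline", steps=[train_step])
pipeline.upsert(role_arn="arn:aws:iam::111122223333:role/SageMakerRole")
pipeline.start()  # each run is tracked and repeatable, with no manual steps
```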

Conclusion

In this post, we covered the build approaches and different functional units of ADAS, design approaches to building a modular pipeline, and the challenges of building an ADAS system. We provided reference architectures and links to case studies and blog posts that explain how our customers use SageMaker and other AWS services to build a scalable AV system. The proposed solutions can help our customers address these challenges while building a scalable AV system. In a later post, we will do a deep dive into the DNNs used by ADAS systems.


About the Authors

Shreyas Subramanian is a Principal AI/ML Specialist Solutions Architect who helps customers solve their business challenges with Machine Learning on the AWS platform. Shreyas has a background in large-scale optimization and Machine Learning, and in the use of Machine Learning and Reinforcement Learning to accelerate optimization tasks.

Gopi Krishnamurthy is a Senior AI/ML Solutions Architect at Amazon Web Services based in New York City. He works with large Automotive customers as their trusted advisor to transform their Machine Learning workloads and migrate to the cloud. His core interests include deep learning and serverless technologies. Outside of work, he likes to spend time with his family and explore a wide range of music.

Read More

Google Research, 2022 & beyond: Health

(This is Part 8 in our series of posts covering different topical areas of research at Google. You can find other posts in the series here.)

Google’s focus on AI stems from the conviction that this transformational technology will benefit society through its capacity to assist, complement, and empower people in almost every field and sector. In no area is the magnitude of this opportunity greater than in the spheres of healthcare and medicine. Commensurate with our mission to demonstrate these societal benefits, Google Research’s programs in applied machine learning (ML) have helped place Alphabet among the top five most impactful corporate research institutions in health and life sciences publications on the Nature Impact Index in every year from 2019 through 2022.

Our Health research publications have had broad impact, spanning the fields of biomarkers, consumer sensors, dermatology, endoscopy, epidemiology, medicine, genomics, oncology, ophthalmology, pathology, public & environmental health, and radiology. Today we examine three specific themes that came to the fore in the last year: the criticality of technology partnerships, the shift towards mobile medicine, and generative ML in health.

In each section, we emphasize the importance of a measured and collaborative approach to innovation in health. Unlike the “launch and iterate” approach typical in consumer product development, applying ML to health requires thoughtful assessment, ecosystem awareness, and rigorous testing. All healthcare technologies must demonstrate to regulators that they are safe and effective prior to deployment, and they need to meet rigorous patient privacy and performance monitoring standards. But ML systems, as new entrants to the field, additionally must discover their best uses in health workflows and earn the trust of healthcare professionals and patients. This domain-specific integration and validation work is not something tech companies should embark upon alone; they should do so only in close collaboration with expert health partners.

Criticality of technology partnerships

Responsible innovation requires the patience and sustained investment to collectively follow the long arc from primary research to human impact. In our own journey to promote the use of ML to prevent blindness in underserved diabetic populations, six years elapsed between our publication of the primary algorithmic research, and the recent deployment study demonstrating the real-world accuracy of the integrated ML solution in a community-based screening setting. Fortunately, we have found that we can radically accelerate this journey from benchtop-ML to AI-at-the-bedside with thoughtfully constructed technology partnerships.

The need for accelerated release of health-related ML technologies is apparent, for example, in oncology. Breast cancer and lung cancer are two of the most common cancer types, and for both, early detection is key. If ML can yield greater accuracy and expanded availability of screening for these cancers, patient outcomes will improve — but the longer we wait to deploy these advances, the fewer people will be helped. Partnership can allow new technologies to safely reach patients with less delay — established med-tech companies can integrate new AI capabilities into existing product suites, seek the appropriate regulatory clearances, and use their existing customer base to rapidly deploy these technologies.

We’ve seen this play out first hand. Just two and a half years after sharing our primary research using ML to improve breast cancer screening, we partnered with iCAD, a leading purveyor of mammography software, to begin integrating our technology into their products. We see this same accelerated pattern in translating our research on deep learning for low-dose CT scans to lung cancer screening workflows through our partnership with RadNet’s Aidence.

Genomics is another area where partnership has proven a powerful accelerant for ML technology. This past year, we collaborated with Stanford University to rapidly diagnose genetic disease by combining novel sequencing technologies and ML to sequence a patient’s entire genome in record-setting time, allowing life-saving interventions. Separately, we announced a partnership with Pacific Biosciences to further advance genomic technologies in research and the clinic by layering our ML techniques on top of their sequencing methods, building on our long-running open source projects in deep learning genomics. Later in the same year, PacBio announced Revio, a new genome sequencing tool powered by our technology.

Partnerships between med-tech companies and AI-tech companies can accelerate translation of technology, but these partnerships are a complement to, not a substitute for, open research and open software that moves the entire field forward. For example, within our medical imaging portfolio, we introduced a new approach to simplify transfer learning for chest x-ray model development, methods to accelerate the life-cycle of ML systems for medical imaging via robust and efficient self-supervision, and techniques to make medical imaging systems more robust to outliers — all within 2022.

Moving forward, we believe this mix of scientific openness and cross-industry partnerships will be a critical catalyst in realizing the benefits of human-centered AI in healthcare and medicine.

Shift towards mobile medicine

In healthcare overall, and recapitulated in ML research in health applications, there has been a shift in emphasis away from concentrated centralized care (e.g., hospitalizations) and towards distributed care (e.g., reaching patients in their communities). Thus, we’re working to develop mobile ML solutions that can be brought to the patient, rather than bringing the patient to the (ML-powered) clinic. In 2021, we shared some of our early work using smartphone cameras to measure heart rate and to help identify skin conditions. In 2022, we shared new research on the potential for smartphone camera selfies to assess cardiovascular health and metabolic risks to eyesight, and the potential for smartphone microphones held to the chest to help interpret heart and lung sounds.

These examples all use the sensors that already exist on every smartphone. While these advances are valuable, there is still great potential in extending mobile health capabilities by developing new sensing technologies. One of our most exciting research projects in this area leverages new sensors that easily connect to modern smartphones to enable mobile maternal ultrasound in under-resourced communities.

Each year, complications from pregnancy and childbirth contribute to 295,000 maternal deaths and 2.4 million neonatal deaths, disproportionately impacting low-income populations globally. Obstetric ultrasound is an important component of quality antenatal care, but up to 50% of women in low-and-middle-income countries receive no ultrasound screening during pregnancy. Innovators in ultrasound hardware have made rapid progress towards low-cost, handheld, portable ultrasound probes that can be driven with just a smartphone, but there’s a critical missing piece: a shortage of field technicians with the skills and expertise to operate the ultrasound probe and interpret its shadowy images. Remote interpretation is feasible, of course, but impractical in settings with unreliable or slow internet connectivity.

With the right ML-powered mobile ultrasounds, providers such as midwives, nurses, and community health workers could have the potential to bring obstetric ultrasound to those most in need and catch problems before it’s too late. Previous work had shown that convolutional neural networks (CNNs) could interpret ultrasounds acquired by trained sonographers using a standardized acquisition protocol. Recognizing this opportunity for AI to unblock access to potentially lifesaving information, we’ve spent the last couple of years working in collaboration with academic partners and researchers in the US and Zambia to improve and expand the ability to automatically interpret ultrasound video captures acquired by simply sweeping an ultrasound probe across the mother’s abdomen, a procedure that can easily be taught to non-experts.

This ultrasound acquisition procedure can be performed by novices with a few hours of ultrasound training.

Using just a low-cost, battery-powered ultrasound device and a smartphone, the accuracy of this method is on par with that of professional sonographers following existing clinical standards when estimating gestational age and fetal malpresentation.

The accuracy of this AI enabled procedure is on-par with the clinical standard for estimating gestational age.

We are in the early stages of a wide-spread transformation in portable medical imaging. In the future, ML-powered mobile ultrasound will augment the phone’s built-in sensors to allow in-the-field triage and screening for a wide range of medical issues, all with minimal training, extending access to care for millions.

Generative ML in Health

As the long arc of the application of ML to health plays out, we expect generative modeling to settle into a role complementary to the pattern recognition systems that are now relatively commonplace. In the past we’ve explored the suitability of generative image models for data augmentation, discussed how generative models might be used to capture interactions among correlated clinical events, and even used them to generate realistic, but entirely synthetic, electronic medical records for research purposes.

Generating synthetic data from the original data with EHR-Safe.

Any discussion of today’s outlook on applied generative modeling would be incomplete without mention of recent developments in the field of large language models (LLMs). Nearly a decade of research in the making, publicly available demonstrations of text synthesis via generative neural networks have captured the world’s imagination. These technologies undoubtedly have real-world applications; in fact, Google was among the first to deploy earlier variants of these networks in live consumer products. But when considering their applications to health, we must again return to our mantra of measurement: we have a fundamental responsibility to test technologies responsibly and proceed with caution. The gravity of building an ML system that might one day impact real people with real health issues cannot be overstated.

To that end, in December of last year we published a pre-print on LLMs and the encoding of clinical knowledge which (1) collated and expanded benchmarks for evaluating automated medical question answering systems, and (2) introduced our own research-grade medical question answering LLM, Med-PaLM. For example, if one asked Med-PaLM, “Does stress cause nosebleeds?” the LLM would generate a response explaining that yes, stress can cause nosebleeds, and detail some possible mechanisms. The purpose of Med-PaLM is to allow researchers to experiment with and improve upon the representation, retrieval, and communication of health information by LLMs, but it is not a finished medical question answering product.

We were excited to report that Med-PaLM substantially outperformed other systems on these benchmarks, across the board. That said, a critical take-away of our paper is that merely receiving a “passing” mark on a set of medical exam questions (which ours and some other ML systems do) still falls well short of the safety and accuracy required to support real-world use for medical question answering. We expect that progress in this area will be brisk — but that much like our journey bringing CNNs to medical imaging, the maturation of LLMs for applications in health will require further research, partnership, care, and patience.

Our model, Med-PaLM, obtains state-of-the-art performance on the MedQA USMLE dataset, exceeding the previous best by 7%.

Concluding thoughts

We expect all these trends to continue, and perhaps even accelerate, in 2023. In a drive to more efficiently map the arc from innovation to impact in AI for healthcare, we will see increased collaboration between academic, med-tech, AI-tech, and healthcare organizations. This is likely to interact positively with the measured, but nonetheless transformational, expansion of the role of phones and mobile sensors in the provisioning of care, potentially well beyond what we presently imagine telehealth to be. And of course, it’s hard to be in the field of AI these days, and not be excited at the prospects for generative AI and large language models. But particularly in the health domain, it is essential that we use the tools of partnership, and the highest standards of testing to realize this promise. Technology will keep changing, and what we know about human health will keep changing too. What will remain the same is the people caring for each other, and trying to do things better than before. We are excited about the role AI can play in improving healthcare in years to come.

Google Research, 2022 & beyond

This was the eighth blog post in the “Google Research, 2022 & Beyond” series. Other posts in this series are listed here.

Read More

Continuous Pseudo-Labeling from the Start

Self-training (ST), or pseudo-labeling has sparked significant interest in the automatic speech recognition (ASR) community recently because of its success in harnessing unlabeled data. Unlike prior semi-supervised learning approaches that relied on iteratively regenerating pseudo-labels (PLs) from a trained model and using them to train a new model, recent state-of-the-art methods perform ‘continuous training’ where PLs are generated using a very recent version of the model being trained. Nevertheless, these approaches still rely on bootstrapping the ST using an initial supervised learning…Apple Machine Learning Research

DIY Urban AI: Researchers Drive Hyper-Local Climate Modeling Movement

The do-it-yourself climate modeling movement is here.

Researchers from Northwestern University and Argonne National Laboratory have been launching NVIDIA Jetson-driven edge computing Waggle devices across the globe to collect hyper-local climate information. Waggle is an open source sensor platform for edge computing developed by Argonne.

Working with Waggle, scientists share open-source AI code designed for the edge through an app store within the Sage web portal, funded by the National Science Foundation (NSF).

The pioneering work is supporting environmental studies around the world. As a result, more and more researchers and scientists are jumping in to study climate issues with edge computing and sensors.

Waggle’s installed base studies everything from micro-local Chicago weather, helping researchers understand urban heat islands and their impact on residents, to climate effects on wild rice on the Ojibwe tribe’s lands in Wisconsin.

More recently, the University of Oregon’s Hazards Lab began using edge computing with Waggle. This work aims to help understand and identify wildfires as part of the ALERTWildfire system that provides local residents, firefighters and municipalities live data streams from smart cameras.

The efforts, on several continents, underscore the accessibility of edge computing paired with a digital infrastructure for delivering open AI models for use in these climate-related applications.

“Many climate models focus on large geographic scales — and therefore the impact can be difficult to understand for specific communities —  but the Department of Energy wants to understand how our changing climate will impact humans, especially in an urban environment,” said Pete Beckman, an Argonne distinguished fellow and co-director of the Northwestern University Argonne Institute of Science and Engineering.

At GTC 2021, NVIDIA announced Earth-2, an AI supercomputer for climate research worldwide.

Waggle Nodes Plus Sage AI

It all began in 2015 with an NSF project called the “Array of Things,” or AoT, led by Charlie Catlett, which introduced advanced sensors and edge computing for studying urban environments.

The AoT was built using the Waggle edge computing platform, which had recently been developed at Argonne National Laboratory. Waggle brings together powerful edge AI computing like NVIDIA Jetson with industry-standard software toolkits like Kubernetes, PyTorch and TensorFlow to provide a programmable intelligent platform that can support cameras, microphones, software-defined radios, lidar and infrared imagers. To support the rapidly growing AI and sensor landscape, the team chose the NVIDIA platform, which offers the largest ecosystem, the most flexibility and industry-leading performance.

The energy efficiency of Jetson is key, as Waggle nodes are often mounted outside of buildings or on light posts.

The Sage project began with a grant from the NSF in 2022 to build a national-scale, software-defined sensor network to support AI at the edge.

Sage nodes are open resources for scientific exploration. Scientists can develop new AI models, upload them to the Sage app store and then deploy them to mountaintops in Oregon or prairies in Illinois.
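
As a rough sketch of what such an edge app can look like, the following uses the open-source pywaggle SDK; the measurement name and the sensor helper are hypothetical.

```python
from waggle.plugin import Plugin

def read_sensor() -> float:
    # Hypothetical stand-in for real hardware I/O on the node.
    return 21.5

def main():
    # A Sage/Waggle plugin publishes named measurements from the edge;
    # real apps are packaged with a Dockerfile and shared via the app store.
    with Plugin() as plugin:
        plugin.publish("env.temperature.celsius", read_sensor())

if __name__ == "__main__":
    main()
```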

Monitoring Chicago for Heat Waves

The same core technology is being deployed in Chicago, which uses the Waggle-Sage platform.

The U.S. Department of Energy wanted to understand what was happening with climate change in the urban environment. It put out a call for proposals for an urban integrated field lab. The effort pairs the supercomputing of the Waggle nodes with the open-source Sage models for hyper-local data analysis.

Argonne and partners are establishing an urban integrated field lab, dubbed Community Research on Climate and Urban Science (CROCUS), to focus on the Chicago region. The plans are for it to take community input to identify questions and specific areas of urban climate change to study, ensuring that research results directly benefit local residents.

“How do we build AI systems that are hyperlocal, that can give us real insight into an urban environment?” said Beckman.

Modeling Wild Rice Migration

In Wisconsin, researchers deployed a node with the Ojibwe tribe in an effort to help understand wild rice, a food source with important cultural significance.

“Wild rice is a species that is shifting because of climate change, so they want to understand what is happening,” said Beckman.

Identifying Birds by Sounds

What else do you get when you combine open AI models, edge computing and researchers on a climate mission?

A lot of useful applications for many people, including bird identification AI.

Now, birders can download the Merlin Bird ID app — available for iOS and Android devices — and start identifying birds by the sounds they make. The models have also been moved to some Waggle devices and can identify birds wherever Sage is deployed.

This AI is music to the ears.

Read More

A new Kaggle competition to help parents of deaf children learn sign language

By Sam Sepah, ML Research Program Manager

As a Deaf person who learned sign language from my family at an early age, I consider myself lucky. Every day, 33 babies are born with permanent hearing loss in the U.S. (kdhe.ks.gov, deafchildren.org) The majority of deaf children are born to hearing parents who do not know how to sign, like my own. My parents were determined to provide me with the ability to communicate effectively anywhere, anytime, and anyplace. Because of this rich language environment, today I can achieve my dreams and live the life I want to live.

But most hearing parents do not know how to sign and might not have the resources to learn. Because of this, they may not be able to have a conversation with their deaf child, even at the family dinner table.

Deaf children who grow up in homes where sign language is not used are at risk for language deprivation. Language deprivation is a delay in language development that occurs when sufficient exposure to language, spoken or signed, is not provided in the first few years of a child’s life. Language deprivation is very common in deaf children, but it can happen to hearing children as well. It often leads to a life of challenges with employment, relationships, and the ability to be successful in one’s life goals.

So, what can be done?

You can help reduce the risk of language deprivation for deaf children by joining our new Isolated Sign Language Recognition competition on Kaggle and training an accurate, real-time sign language recognition (TensorFlow Lite) model!
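
As a rough starting point, a baseline might train a small classifier over the competition's landmark features and convert it to TensorFlow Lite for on-device use; the hidden layer size below is an illustrative choice, not a recommended architecture.

```python
import tensorflow as tf

# Toy landmark classifier: the competition provides hand, face and pose
# landmarks per frame; the 543-landmark input and 250-sign output reflect
# the dataset, while the hidden layer is an arbitrary illustrative choice.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(543 * 3,)),           # flattened (x, y, z)
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(250, activation="softmax"),  # one unit per sign
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Convert to TensorFlow Lite for real-time inference inside a smartphone app.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open("sign_model.tflite", "wb") as f:
    f.write(tflite_model)
```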

We plan to open source the winning model and add it to the PopSign smartphone game app. PopSign* is a smartphone app that makes learning American Sign Language (ASL) fun and interactive. Players match videos of ASL signs with bubbles containing written English words to pop them.

PopSign is designed to help parents with deaf children learn ASL, but it’s open to anyone who wants to learn sign language vocabulary. The app is a great resource for parents who want to learn sign language and help their children develop language and social skills. By adding a sign language recognizer from this competition, PopSign players will be able to sign the type of bubble they want to shoot, providing the player with the opportunity to form the sign instead of just watching videos of other people signing.

We are grateful to our partners, the National Technical Institute for the Deaf at Rochester Institute of Technology, the Georgia Institute of Technology, and Deaf Professional Arts Network, for developing the PopSign game app, creating the dataset and helping us prepare for this competition. The game, dataset, and model you train will help us improve access to communication for so many families!

Join the competition today and together, we can make a difference for deaf children worldwide.

*PopSign is an app developed by the Georgia Institute of Technology and the National Technical Institute for the Deaf at Rochester Institute of Technology. The app is available in beta on Android and iOS.

Read More

NVIDIA Celebrates 1 Million Jetson Developers Worldwide at GTC

A million developers across the globe are now using the NVIDIA Jetson platform for edge AI and robotics to build innovative technologies. Plus, more than 6,000 companies — a third of which are startups — have integrated the platform with their products.

These milestones and more will be celebrated during the NVIDIA Jetson Edge AI Developer Days at GTC, a global conference for the era of AI and the metaverse, taking place online March 20-23.

Register free to learn more about the Jetson platform and begin developing the next generation of edge AI and robotics.

One in a Million

Atlanta-based Kris Kersey, the mind behind the popular YouTube channel Kersey Fabrications, is one developer using the NVIDIA Jetson platform for his one-in-a-million technological innovations.

He created a fully functional Iron Man helmet that could be straight out of the Marvel Comics films. It uses the NVIDIA Jetson Xavier NX 8GB developer kit as the core of the “Arc Reactor” powering its heads-up display — a transparent display that presents information wherever the user’s looking.

In just over two years, Kersey built from scratch the wearable helmet, complete with object detection and other on-screen sensors that would make Tony Stark proud.

“The software design was more than half the work on the project, and for me, this is the most exciting, interesting part,” Kersey said. “The software takes all of the discrete hardware components and makes them into a remarkable system.”

To get started, Kersey turned to GitHub, where he found “Hello AI World,” a guide for deploying deep-learning inference networks and deep vision primitives with the NVIDIA TensorRT software development kit and NVIDIA Jetson. He then wrote wrapper code to connect it to his own project.
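
A minimal example in the spirit of Hello AI World, using the jetson-inference Python bindings, looks like the following; the camera URI and model choice are illustrative.

```python
from jetson_inference import detectNet
from jetson_utils import videoSource, videoOutput

net = detectNet("ssd-mobilenet-v2", threshold=0.5)  # pretrained detector
camera = videoSource("csi://0")                     # on-board CSI camera
display = videoOutput("display://0")                # local display or HUD feed

while display.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)   # TensorRT-accelerated object detection
    display.Render(img)
    display.SetStatus(f"{len(detections)} objects | {net.GetNetworkFPS():.0f} FPS")
```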

Watch Kersey document his Iron Man project from start to finish:

This 3D-printed helmet is just the beginning for Kersey, who’s aiming to build a full Iron Man suit later this year. He plans to make the entire project’s code open source, so anyone who dreams of becoming a superhero can try it for themselves.

Jetson Edge AI Developer Days at GTC

Developers like Kersey can register for the free Jetson Edge AI Developer Days at GTC, which feature NVIDIA experts who’ll cover the latest Jetson hardware, software and partners. Sessions include:

  • Level Up Edge AI and Robotics With NVIDIA Jetson Orin Platform
  • Accelerate Edge AI With NVIDIA Jetson Software
  • Getting the Most Out of Your Jetson Orin Using NVIDIA Nsight Developer Tools
  • Bring Your Products to Market Faster With the NVIDIA Jetson Ecosystem
  • Design a Complex Architecture on NVIDIA Isaac ROS

Plus, there’ll be a Connect with Experts session focusing on the Jetson platform that provides a deep-dive Q&A with embedded platform engineers from NVIDIA on Tuesday, March 21, at 12 p.m. PT. This interactive session offers a unique opportunity to meet, in a group or individually, with the minds behind NVIDIA products and get your questions answered. Space is limited and on a first-come, first-served basis.

Additional Sessions by Category

GTC sessions will also cover robotics, intelligent video analytics and smart spaces. Below are some of the top sessions in these categories.

Robotics:

Computer Vision and AI Video Analytics:

Smart Cities and Spaces:

Check out the latest Jetson community projects for ideas to replicate or be inspired by.

Grab the latest Jetson modules and developer kits from the NVIDIA Jetson store.

And sign up for the NVIDIA Developer Program to connect with Jetson developers from around the world and get access to the latest software and software development kits, including NVIDIA JetPack.

Read More

Mercedes-Benz Taking Vehicle Product Lifecycle Digital With NVIDIA AI and Omniverse

To drive the automotive industry forward, NVIDIA and Mercedes-Benz are taking the virtual road.

NVIDIA founder and CEO Jensen Huang joined Mercedes-Benz CEO Ola Källenius on stage at the automaker’s strategy update event yesterday in Silicon Valley, showcasing progress in their landmark partnership to digitalize the entire product lifecycle, plus the ownership and automated driving experience.

The automotive industry is undergoing a massive transformation, driven by advancements in accelerated computing, AI and the industrial metaverse.

“Digitalization is streamlining every aspect of the automotive lifecycle: from styling and design, software development and engineering, manufacturing, simulation and safety testing, to customer buying and driving experiences,” said Huang.

Since its founding, Mercedes-Benz has set the bar in automotive innovation and ingenuity, backed by superior craftsmanship. The automaker is shaping the future with its intelligent and software-defined vehicles, which are powered by NVIDIA’s end-to-end solutions.

The Fleet of the Future

Next-generation Mercedes-Benz vehicles will be built on a revolutionary centralized computing architecture that includes sophisticated software and features that will turn these future vehicles into high-performance, perpetually upgradable supercomputers on wheels.

During the event, the automaker took the wraps off its new operating system, MB.OS, a purpose-built, chip-to-cloud architecture that will be standard across its entire vehicle portfolio — delivering exceptional software capabilities and ease of use.

MB.OS benefits from full access to all vehicle domains, including infotainment, automated driving, body and comfort, and driving and charging, an approach that gives Mercedes-Benz customers a differentiated, superior product experience.

“MB.OS is a platform that connects all parts of our business,” Källenius noted during the event.

Safe Has Never Felt So Powerful

At the heart of this architecture is NVIDIA DRIVE Orin, which delivers high-performance, energy-efficient AI compute to support a comprehensive sensor suite and software to safely enable enhanced assisted driving and, ultimately, level 3 conditionally automated driving.

Running on DRIVE Orin is the flexible and scalable software stack jointly developed by NVIDIA and Mercedes-Benz. Sarah Tariq, NVIDIA vice president of autonomous driving software, joined Magnus Östberg, chief software officer at Mercedes-Benz, on stage to delve deeper into this full-stack software architecture, which includes the MB.OS, middleware and deep neural networks to enable advanced autonomy.

Tariq said, “The companies are working in close collaboration to develop a software stack that can comfortably and safely handle all the complexities that the automaker’s customers may encounter during day-to-day commutes all over the world.”

This includes enhanced level 2 features in urban environments where there are pedestrians or dense, complex traffic patterns. Using advanced AI, Mercedes-Benz can deliver a comfortable driving experience that consumers have come to expect, backed by uncompromised safety and security.

With the ability to perform 254 trillion operations per second, DRIVE Orin has ample compute headroom to continuously advance this software with new capabilities and subscription services over the life of the vehicle, through over-the-air software updates, via an app, web or from inside the car.

Additionally, Mercedes-Benz is accelerating the development of these systems with the high-fidelity NVIDIA DRIVE Sim platform, built on NVIDIA Omniverse. This cloud-native platform delivers physically based, scalable simulation for automakers to develop and test autonomous vehicle systems on a wide range of rare and hazardous scenarios.

Manufacturing in the Industrial Metaverse

This software-defined platform is just one piece of Mercedes-Benz’s intelligent vehicle strategy.

At CES last month, Mercedes-Benz previewed its first step in digitalization of its production process using NVIDIA Omniverse — a platform for building and operating metaverse applications — to plan and operate its manufacturing and assembly facilities.

With Omniverse, Mercedes-Benz can create an AI-enabled digital twin of the factory to review and optimize floor layouts, unlocking operational efficiencies. With enhanced predictive analysis, software and process automation, the digital twin can maximize productivity and help maintain faultless operation.

By implementing a digital-first approach to its operations, Mercedes-Benz can also ensure production activities won’t be disrupted as new models and architectures are introduced. And this blueprint can be deployed to other areas within the automaker’s global production network for scalable, more agile vehicle manufacturing.

Revolutionizing the Customer Experience

Digitalization is also improving the car-buying experience, migrating from physical retail showrooms to immersive online digital spaces.

With Omniverse, automakers can bridge the gap between the digital and physical worlds, making the online car-research experience more realistic and interactive. These tools include online car configurators, 3D visualizations of vehicles, demonstrations of cars in augmented reality and virtual test drives.

Östberg summed up, “The partnership with NVIDIA is already living up to its promise, and the potential is huge.”

Read More

A New Window in the Cloud: NVIDIA and Microsoft to Bring Top PC Games to GeForce NOW


The cloud just got bigger. NVIDIA and Microsoft announced this week they’re working to bring top PC Xbox Game Studios games to the GeForce NOW library, including titles from Bethesda, Mojang Studios and Activision, pending closure of Microsoft’s acquisition.

With six new games joining the cloud this week for members to stream, it’s a jam-packed GFN Thursday.

Plus, Ultimate members can now access cloud-based RTX 4080-class servers in and around Paris, the latest city to light up on the update map. Keep checking GFN Thursday to see which RTX 4080 SuperPOD upgrade is completed next.

Game On

GeForce NOW beyond-fast gaming expands to Xbox PC games.

NVIDIA and Microsoft’s 10-year deal to bring the Xbox PC game library to GeForce NOW is a major boost for cloud gaming and brings incredible choice to gamers. It’s the perfect bow to wrap up GeForce NOW’s anniversary month, adding to the more than 1,500 titles already available to stream.

Work to bring top Xbox PC game franchises and titles to GeForce NOW, such as Halo, Minecraft and Elder Scrolls, will begin immediately. Games from Activision like Call of Duty and Overwatch are on the horizon once Microsoft’s acquisition of Activision closes. GeForce NOW members will be able to stream these titles across their devices, with the flexibility to easily switch between underpowered PCs, Macs, Chromebooks, smartphones and more.

Xbox Game Studios PC games available on third-party stores, like Steam or Epic Games Store, will be among the first streamed through GeForce NOW. The partnership also marks the first games that will be streamed from the Windows Store, support for which will begin soon.

It’s an exciting time for all gamers, as the partnership will give people more choice and higher performance. Stay tuned to GFN Thursday for news on the latest Microsoft titles coming to GeForce NOW.

Ready, Set, Action!

Sons of the Forest on GeForce NOW: find a way to survive alone or with a buddy.

A new week means new GFN Thursday games. Sons of the Forest, the highly anticipated sequel to The Forest from Endnight Games, places gamers on a cannibal-infested island after a crash landing. Survive alone or pair up with a buddy online.

Earlier in the week, members started streaming Atomic Heart, the action role-playing game from Mundfish, day-and-date from the cloud. Check out the full list of new titles available to stream this week:

With the wrap-up of GeForce NOW’s #3YearsOfGFN celebrations, members are sharing their winning GeForce NOW moments on Twitter and Facebook for a chance to win an MSI Ultrawide Gaming monitor — the perfect companion to an Ultimate membership. Join the conversation and add your own favorite moments.

Let us know in the comments or on GeForce NOW social channels what you’ll be streaming next.

Read More

Boomi uses BYOC on Amazon SageMaker Studio to scale custom Markov chain implementation


This post is co-written with Swagata Ashwani, Senior Data Scientist at Boomi.

Boomi is an enterprise-level software as a service (SaaS) independent software vendor (ISV) that creates developer enablement tooling for software engineers. These tools integrate via API into Boomi’s core service offering.

In this post, we discuss how Boomi used the bring-your-own-container (BYOC) approach to develop a new AI/ML enabled solution for their customers to tackle the “blank canvas” problem. Boomi’s machine learning (ML)-powered solution facilitates the rapid development of integrations on their platform, and enables faster time to market for their customers. Boomi funded this solution using the AWS PE ML FastStart program, a customer enablement program meant to take ML-enabled solutions from idea to production in a matter of weeks. Boomi built this solution using Amazon SageMaker Studio, an end-to-end browser-based IDE for AI/ML workloads, and Amazon Elastic Container Registry (Amazon ECR).

The blank canvas problem describes the productivity and creativity issues developers face when starting a new task. An experienced developer knows at the onset of a new task roughly what their code base will look like, but the process of building that code base is extensive and there’s no clear starting point. As the developer begins making progress on the blank canvas, their productivity is still low. The code written early on is usually boilerplate, providing the foundation for the business logic that can’t be written until that foundation is laid.

Boomi built a novel solution for the blank canvas problem using traditional development techniques. Boomi’s ML and data engineering teams needed the solution to be deployed quickly, in a repeatable and consistent way, at scale. The Boomi team used the SageMaker BYOC paradigm to support their custom model, then used SageMaker Projects and SageMaker Pipelines to automate the training, testing, monitoring, and deployment of their custom model solution.

Customer use case

Markov chains are specialized structures for making predictive recommendations within a state machine, and are best known for their applications in web crawling and search engines. Boomi’s data science team implemented a Markov chain model that could be applied to common integration sequences, or steps, on their platform, hence the name Step Suggest.

Markov chains are built using a state machine and a probability matrix describing the likelihood of state transitions. Given a starting state, a Markov chain calculates the probability of a transition to another state allowed by the state machine. The data science team at Boomi applied the Markov chain approach to the Step Suggest problem by treating integration steps as states in a state machine. Boomi’s Markov chain implementation takes the previous integration step and predicts the next integration step with significant accuracy.
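
To make this concrete, the following is a minimal sketch of how a transition matrix learned from step sequences can back next-step prediction. The step names and sequences here are hypothetical placeholders, not Boomi’s proprietary data or algorithm:

```python
from collections import defaultdict

# Hypothetical integration sequences; Boomi's actual training data is proprietary.
sequences = [
    ["start", "connector", "map", "decision", "stop"],
    ["start", "connector", "map", "message", "stop"],
    ["start", "connector", "decision", "stop"],
]

# Count state transitions, then normalize each row into probabilities.
counts = defaultdict(lambda: defaultdict(int))
for seq in sequences:
    for prev_step, next_step in zip(seq, seq[1:]):
        counts[prev_step][next_step] += 1

transitions = {
    state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
    for state, nexts in counts.items()
}

def suggest_next(step):
    """Return the most probable next integration step, if the state is known."""
    candidates = transitions.get(step)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

print(suggest_next("connector"))  # -> "map" (2 of 3 observed transitions)
```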

Boomi had significant success with their application of Markov chains. However, the underlying algorithm for Step Suggest is complicated and proprietary. SageMaker has built-in support for several popular ML algorithms, but Boomi already had a working solution. Instead of starting from scratch, Boomi used the BYOC approach to import their existing models to SageMaker. As a result, Boomi’s team was able to use SageMaker for inference, CI/CD, and monitoring, without having to rebuild their Markov chain from scratch.

Solution details

The most important criteria for this solution were the reusability of existing models and the ease of deployment of those models to production. Boomi’s Step Suggest solution needs automated training and inference pipelines. At the time of the migration to SageMaker’s BYOC deployment model, Boomi’s solution was largely built and tested on individual laptops.

Boomi used Amazon ECR to store versions of their Step Suggest model. Amazon ECR stores and versions containerized applications in a container registry. The Boomi team packaged the laptop-built model into a Docker container and uploaded that container to an Amazon ECR repository. When the upload was complete, Boomi attached the image to their SageMaker domain, where it could be imported and used for additional ML tasks like inference deployments to a hosted endpoint.
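
As a rough illustration, registering and deploying a custom container from Amazon ECR with the SageMaker Python SDK might look like the sketch below. The image URI, IAM role, and endpoint settings are placeholders; Boomi’s exact configuration is not published:

```python
import sagemaker
from sagemaker.model import Model

# Assumes the container was already built and pushed (docker build / docker push)
# to the ECR repository referenced below.
session = sagemaker.Session()

# Placeholder image URI and IAM role; substitute your own account's values.
image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/step-suggest:latest"
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

# Wrap the custom container as a SageMaker Model object.
model = Model(image_uri=image_uri, role=role, sagemaker_session=session)

# Stand up a hosted real-time inference endpoint backed by the container.
model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="step-suggest-endpoint",
)
```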

The exact steps to replicate this process are outlined in Train and deploy deep learning models using JAX with Amazon SageMaker. That post discusses how to bring the JAX framework into your SageMaker domain. JAX is an up-and-coming ML framework for which SageMaker has no built-in support. Boomi implemented a similar workflow for their proprietary framework, extending the capabilities of their SageMaker deployment to satisfy the requirements of the Step Suggest project. There are a few prerequisites; complete the next steps before following the guide in the JAX post to practice the BYOC deployment paradigm with SageMaker.

Alternatives to SageMaker

Boomi was already an AWS customer before the AWS PE ML FastStart program. In fact, most of their data science team was using SageMaker notebook instances for model development. Models were trained on notebook instances, which come pre-installed with the Jupyter Notebook software, using data stored in Amazon Simple Storage Service (Amazon S3). This worked for model development, but Boomi needed a more robust solution to scale this workload to their customers.

The AWS PE ML FastStart program conducted a deep-dive session with Boomi’s data science and engineering teams. We decided SageMaker Studio would be a better fit for Boomi’s team to scale this solution quickly to their customers.

Why SageMaker?

SageMaker Studio offers several key advantages that SageMaker notebook instances couldn’t deliver alone. First and foremost, Studio makes it easier to share notebook assets across a large team of data scientists like the one at Boomi. Boomi’s analysts were free to use SageMaker Data Wrangler for data preparation tasks, while Boomi’s data scientists could continue to use Jupyter notebooks. Most importantly, Studio maintained BYOC functionality. This was absolutely crucial because it meant the team could reuse the model assets they had already built.

Secondly, SageMaker Pipelines made it easy for Boomi’s team to visualize and modify their complex CI/CD requirements. The BYOC deployment paradigm requires additional integrations with Amazon ECR, so the training and inference pipelines used by Boomi’s MLOps team necessitated additional steps for automated deployment, rollback, and monitoring. SageMaker Pipelines and the AWS Step Functions Data Science SDK addressed this requirement.
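
For illustration, a minimal SageMaker Pipelines definition that trains with a custom BYOC image might look like the following sketch. The names, URIs, and single training step are illustrative stand-ins, not Boomi’s actual pipeline, which included additional deployment, rollback, and monitoring steps:

```python
from sagemaker.estimator import Estimator
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

# Reuse the custom BYOC image for training; the URI, role, and S3 paths
# are placeholders for your own account's values.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/step-suggest:latest",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/step-suggest/models",
)

# A single training step; a production pipeline would surround it with
# processing, evaluation, and conditional model-registration steps.
train_step = TrainingStep(
    name="TrainStepSuggest",
    estimator=estimator,
    inputs={"train": "s3://my-bucket/step-suggest/train"},
)

pipeline = Pipeline(name="StepSuggestPipeline", steps=[train_step])
# pipeline.upsert(role_arn=...) followed by pipeline.start() would
# register the pipeline definition and kick off an execution.
```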

Finally, SageMaker Projects gave the team the ability to create AWS CloudFormation templates that standardized their ML development environments. Infrastructure as code (IaC) solutions like AWS CloudFormation reduce digital waste and standardize resource deployments in an AWS account. When CloudFormation templates are deployed through the AWS Service Catalog, as is done with SageMaker Projects, data science teams can operate freely without fear of violating any organizational guardrails or best practices. Boomi’s cloud engineering team agreed this would be an important factor in scaling their data science team.
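
As a hedged sketch of this pattern, a SageMaker project can be provisioned from an organization-approved Service Catalog product with a single boto3 call. The project name and the product and artifact IDs below are placeholders, not values from Boomi’s environment:

```python
import boto3

sm = boto3.client("sagemaker")

# Hypothetical Service Catalog product backing an organization-approved
# MLOps template; the IDs below are placeholders.
response = sm.create_project(
    ProjectName="step-suggest-mlops",
    ProjectDescription="Standardized MLOps environment for Step Suggest",
    ServiceCatalogProvisioningDetails={
        "ProductId": "prod-examplename123",
        "ProvisioningArtifactId": "pa-exampleversion1",
    },
)
print(response["ProjectArn"])
```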

Feature deep dive

The following diagram illustrates the solution architecture and workflow.


The SageMaker BYOC paradigm enabled Boomi’s team to reuse a highly customized implementation of a proprietary ML algorithm. Boomi also had several custom preprocessing and postprocessing steps for their models. These proprietary steps allowed Boomi to bridge the gap between their data science and core product engineering teams. Reimplementing that processing logic within Studio, although possible, is a better fit for a built-in algorithm than for a custom model. The Studio BYOC paradigm enabled Boomi’s data science team to do what they did best without sacrificing speed and agility in their product’s development.

Because Boomi is a large organization with a strong cloud governance team, and because there are so many teams actively contributing to this project, having robust CI/CD is necessary. The CI/CD enabled by SageMaker Pipelines made it possible for the various contributing parties to collaborate. Boomi’s analysts contributed to preprocessing and postprocessing; the data science team customized, tuned, and built the model within a container; and the MLOps and systems engineering team were able to integrate Step Suggest into their core platform.

Conclusion

By leveraging Amazon SageMaker Studio, Amazon SageMaker Projects, and Amazon SageMaker Pipelines, Boomi made it easier to build MLOps solutions at scale.

“The SageMaker Pipelines-based solution has reduced the time needed to build, deploy, and manage our model by ~30%, thanks to its intuitive and user-friendly interface. By using this solution, we were able to deploy our model in just 4 weeks, 2x faster than if we had used traditional infrastructure.”

Boomi has an active relationship with their AWS account team. AWS account teams connect customers like Boomi with programs designed to address their business and technology needs. Connect with your AWS account team to learn more about programs like AWS PE ML FastStart to improve your time to market for new, innovative products built on or with AWS.


About the Authors

Dan Ferguson is an AI/ML Specialist Solutions Architect (SA) on the Private Equity Solutions Architecture team at Amazon Web Services. Dan helps private equity-backed portfolio companies leverage AI/ML technologies to achieve their business objectives.

Swagata Ashwani is a Senior Data Scientist at Boomi with over six years of experience in data science. Her interests include MLOps, natural language processing, and data visualization. She also actively volunteers with Women in Data/AI, spreading awareness and outreach within the AI community.
In her spare time, she can be found strumming her guitar, sipping masala chai and enjoying spicy Indian street food.

Read More