Generating 3D Molecular Conformers via Equivariant Coarse-Graining and Aggregated Attention

Figure 1: CoarsenConf architecture.

(I) The encoder $q_\phi(z \mid X, \mathcal{R})$ takes the fine-grained (FG) ground-truth conformer $X$, the RDKit approximate conformer $\mathcal{R}$, and the coarse-grained (CG) conformer $\mathcal{C}$ (derived from $X$ and a predefined CG strategy) as inputs, and outputs a variable-length equivariant CG representation via equivariant message passing and point convolutions.
(II) Equivariant MLPs are applied to learn the mean and log variance of both the posterior and prior distributions.
(III) The posterior (training) or prior (inference) is sampled and fed into the Channel Selection module, where an attention layer is used to learn the optimal pathway from CG to FG structure.
(IV) Given the FG latent vector and the RDKit approximation, the decoder $p_\theta(X \mid \mathcal{R}, z)$ learns to recover the low-energy FG structure through autoregressive equivariant message passing. The entire model can be trained end-to-end by optimizing the KL divergence of the latent distributions and the reconstruction error of the generated conformers.

Molecular conformer generation is a fundamental task in computational chemistry. The objective is to predict stable low-energy 3D molecular structures, known as conformers, given the 2D molecule. Accurate molecular conformations are crucial for various applications that depend on precise spatial and geometric qualities, including drug discovery and protein docking.

We introduce CoarsenConf, an SE(3)-equivariant hierarchical variational autoencoder (VAE) that pools information from fine-grain atomic coordinates to a coarse-grain subgraph level representation for efficient autoregressive conformer generation.

Optimizing LibTorch-based inference engine memory usage and thread-pooling

Outline

In this blog post we show how to optimize a LibTorch-based inference engine to maximize throughput by reducing memory usage and optimizing the thread-pooling strategy. We apply these optimizations to Pattern Recognition engines for audio data, for example, music and speech recognition or acoustic fingerprinting. The optimizations discussed in this blog post reduce memory usage by 50% and end-to-end inference latency by 37.5%. These optimizations are also applicable to computer vision and natural language processing.

Audio Recognition Inferencing

Audio Recognition (AR) engines can be used to recognize and identify sound patterns, for example identifying the type and species of a bird from audio recordings, distinguishing music from a singer’s voice, or detecting an abnormal sound indicating a breach in a building. To identify sounds of interest, AR engines process audio through 4 stages:

  1. File Validation: The AR engine validates the input audio file.
  2. Feature Extraction: Features are extracted from each segment within the audio file.
  3. Inference: LibTorch performs inference using CPUs or accelerators. In our case, inference runs on Intel processors on an Amazon Elastic Compute Cloud (Amazon EC2) instance.
  4. Post-processing: A post-processing model decodes the results and calculates scores that are used to convert inference output into tags or transcripts.

Of these 4 steps, inference is the most computationally intensive and can take up to 50% of the pipeline processing time depending on the model complexity. This means that any optimization at this stage has a significant impact on the overall pipeline. 

Optimizing the Audio Recognition engine with concurrency…is not so simple

Our objective for this processing pipeline is to convert audio segments into tags or transcripts. The input data is an audio file composed of several short sound segments (S1 to S6 in Figure 1). The output data corresponds to tags or transcripts ordered by timestamps.

Figure 1: Example audio file with segment boundaries

Each segment can be processed independently and in an out-of-order fashion. This offers the opportunity to process segments concurrently and in parallel to optimize the overall inference throughput as well as maximize the usage of the resources.

Parallelization on an instance can be achieved through multi-threading (pthreads, std::thread, OpenMP) or multi-processing. The advantage of multi-threading over multi-processing is the ability to use shared memory, which lets developers minimize data duplication by sharing data across threads; in our case, the AR models (Figure 2). Furthermore, the reduced memory footprint allows us to run more pipelines in parallel by increasing the number of engine threads, utilizing all vCPUs on our Amazon EC2 instance (a c5.4xlarge, which offers 16 vCPUs). In theory, we expect to see higher hardware utilization and higher throughput for our AR engine as a result.

Figure 2: Multi-threaded AR Engine

But we found these assumptions to be wrong: increasing the number of application threads increased the end-to-end latency of each audio segment and decreased the engine throughput. For example, increasing concurrency from 1 to 5 threads led to a 4x increase in latency, with a proportional drop in throughput. In fact, metrics showed that within the pipeline, the latency of the inference stage alone was 3x higher than its single-threaded baseline.

Using a profiler, we found that CPU spin time increased, potentially due to CPU oversubscription, which impacts system and application performance. Given our control over the application’s multi-threaded implementation, we chose to dive deeper into the stack and identify potential conflicts with LibTorch’s default settings.

Diving deeper on LibTorch’s multi-threading and its impact on concurrency

LibTorch’s parallel implementations for CPU inference are based on global thread pools. The two forms, inter-op and intra-op parallelism, can be chosen depending on the model’s properties. In both cases, it is possible to set the number of threads in each thread pool to optimize latency and throughput.

To test whether LibTorch’s default parallelism settings had a counter effect on our inference latency, we ran an experiment on a 16 vCPU machine with a 35-minute audio file, keeping the number of LibTorch inter-op threads constant at 1 (because our models didn’t utilize the inter-op thread pool). We collected the data shown in Figures 3 and 4.

Figure 3: CPU Utilization for different number of engine threads

Figure 4: Processing times for different number of engine threads

Execution time in Figure 4 is the end-to-end processing time for all the segments of the given audio file. We tested 4 LibTorch intra-thread configurations (1, 4, 8, and 16 threads) and varied the number of engine threads from 1 to 16 for each configuration. As we see in Figure 3, CPU utilization increases with the number of engine threads for all LibTorch intra-thread configurations. But as we see in Figure 4, higher CPU utilization doesn’t translate into lower execution time. We found that in all but one case, as the number of engine threads went up, so did execution time. The one exception was the case where the intra-thread pool size was 1.

Resolving the global thread pool issue

Using too many threads with a global thread pool led to performance degradation and caused an over-subscription problem. Without disabling LibTorch global thread pools, it was difficult to match the performance of the multi-process engine.

Disabling the LibTorch global thread pool is as simple as setting the intra-op/inter-op parallelism threads to 1, as shown here:

at::set_num_threads(1);           // Disables the intra-op thread pool.
at::set_num_interop_threads(1);   // Disables the inter-op thread pool.
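
If you drive the same models from the PyTorch Python API rather than LibTorch, the equivalent settings are exposed on the torch module; a minimal sketch (note that the inter-op setting can only be applied once, before any parallel work starts):

import torch

torch.set_num_threads(1)          # Disables the intra-op thread pool.
torch.set_num_interop_threads(1)  # Disables the inter-op thread pool.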

As shown in Figure 4, the lowest processing time was measured when the LibTorch global thread pool was disabled.

This solution improved AR engine throughput in several cases. However, when evaluating long datasets (audio files longer than 2 hours in our load test), we found that the memory footprint of the engine gradually started to increase.

Optimizing memory usage

We ran a load test on the system with two-hour-long audio files and found that the observed memory increase was the result of memory fragmentation during multi-threaded LibTorch inference. We resolved this using jemalloc, a general-purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. Using jemalloc, our peak memory usage decreased by an average of 34% and average memory usage decreased by 53%.

Figure 5: Memory usage over time using the same input file with and without jemalloc

Summary

To optimize the performance of multi-threaded LibTorch-based inference engines, we recommend verifying that there is no oversubscription problem in LibTorch. In our case, all threads in the multi-threaded engine were sharing the LibTorch global thread pools, which caused an oversubscription problem. This was remedied by setting the inter-op and intra-op thread counts to 1, disabling the global thread pools. To optimize the memory usage of a multi-threaded engine, we recommend using jemalloc as the memory allocator rather than the default malloc.

Read More

Capture public health insights more quickly with no-code machine learning using Amazon SageMaker Canvas

Public health organizations have a wealth of data about different types of diseases, health trends, and risk factors. Their staff has long used statistical models and regression analyses to make important decisions such as targeting populations with the highest risk factors for a disease with therapeutics, or forecasting the progression of concerning outbreaks.

When public health threats emerge, data velocity increases, incoming datasets can grow larger, and data management becomes more challenging. This makes it more difficult to analyze data holistically and capture insights from it. And when time is of the essence, speed and agility in analyzing data and drawing insights from it are critical to forming rapid and robust health responses.

Typical questions public health organizations face during times of stress include:

  • Will there be sufficient therapeutics in a certain location?
  • What risk factors are driving health outcomes?
  • Which populations have a higher risk of reinfection?

Because answering these questions requires understanding complex relationships between many different factors—often changing and dynamic—one powerful tool we have at our disposal is machine learning (ML), which can be deployed to analyze, predict, and solve these complex quantitative problems. We have increasingly seen ML applied to address difficult health-related problems such as classifying brain tumors with image analysis and predicting the need for mental health care in order to deploy early intervention programs.

But what happens if public health organizations are in short supply of the skills required to apply ML to these questions? The application of ML to public health problems is impeded, and public health organizations lose the ability to apply powerful quantitative tools to address their challenges.

So how do we remove these bottlenecks? The answer is to democratize ML and allow a larger number of health professionals with deep domain expertise to use it and apply it to the questions they want to solve.

Amazon SageMaker Canvas is a no-code ML tool that empowers public health professionals such as epidemiologists, informaticians, and bio-statisticians to apply ML to their questions, without requiring a data science background or ML expertise. They can spend their time on the data, apply their domain expertise, quickly test hypotheses, and quantify insights. Canvas helps make public health more equitable by democratizing ML, allowing health experts to evaluate large datasets and empowering them with advanced insights using ML.

In this post, we show how public health experts can forecast on-hand demand for a certain therapeutic for the next 30 days using Canvas. Canvas provides you with a visual interface that allows you to generate accurate ML predictions on your own without requiring any ML experience or having to write a single line of code.

Solution overview

Let’s say we are working on data that we collected from states across the US. We may form a hypothesis that a certain municipality or location doesn’t have enough therapeutics in the coming weeks. How can we test this quickly and with a high degree of accuracy?

For this post, we use a publicly available dataset from the US Department of Health and Human Services, which contains state-aggregated time series data related to COVID-19, including hospital utilization, availability of certain therapeutics, and much more. The dataset (COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries (RAW)) is downloadable from healthdata.gov, and has 135 columns and over 60,000 rows. The dataset is updated periodically.

In the following sections, we demonstrate how to perform exploratory data analysis and preparation, build the ML forecasting model, and generate predictions using Canvas.

Perform exploratory data analysis and preparation

When doing a time series forecast in Canvas, we need to reduce the number of features or columns according to the service quotas. Initially, we reduce the number of columns to the 12 that are likely to be the most relevant. For example, we dropped the age-specific columns because we’re looking to forecast total demand. We also dropped columns whose data was similar to other columns we kept. In future iterations, it is reasonable to experiment with retaining other columns and using feature explainability in Canvas to quantify the importance of these features and decide which ones to keep. We also rename the state column to location.

Looking at the dataset, we also decide to remove all the rows for 2020, because there were limited therapeutics available at that time. This allows us to reduce the noise and improve the quality of the data for the ML model to learn from.

Reducing the number of columns can be done in different ways. You can edit the dataset in a spreadsheet, or directly inside Canvas using the user interface.

You can import data into Canvas from various sources, including from local files from your computer, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Athena, Snowflake (see Prepare training and validation dataset for facies classification using Snowflake integration and train using Amazon SageMaker Canvas), and over 40 additional data sources.

After our data has been imported, we can explore and visualize our data to get additional insights into it, such as with scatterplots or bar charts. We also look at the correlation between different features to ensure that we have selected what we think are the best ones. The following screenshot shows an example visualization.

Build the ML forecasting model

Now we’re ready to create our model, which we can do with just a few clicks. We choose the column identifying on-hand therapeutics as our target. Canvas automatically identifies our problem as a time series forecast based on the target column we just selected, and we can configure the parameters needed.

We configure the item_id, the unique identifier, as location because our dataset is provided by location (US states). Because we’re creating a time series forecast, we need to select a time stamp, which is date in our dataset. Finally, we specify how many days into the future we want to forecast (for this example, we choose 30 days). Canvas also offers the ability to include a holiday schedule to improve accuracy. In this case, we use US holidays because this is a US-based dataset.

With Canvas, you can get insights from your data before you build a model by choosing Preview model. This saves you time and cost by not building a model if the results are unlikely to be satisfactory. By previewing our model, we realize that the impact of some columns is low, meaning the expected value of the column to the model is low. We remove columns by deselecting them in Canvas (red arrows in the following screenshot) and see an improvement in an estimated quality metric (green arrow).

Moving on to building our model, we have two options, Quick build and Standard build. Quick build produces a trained model in less than 20 minutes, prioritizing speed over accuracy. This is great for experimentation, and is a more thorough model than the preview model. Standard build produces a trained model in under 4 hours, prioritizing accuracy over latency, iterating through a number of model configurations to automatically select the best model.

First, we experiment with Quick build to validate our model preview. Then, because we’re happy with the model, we choose Standard build to have Canvas help build the best possible model for our dataset. If the Quick build model had produced unsatisfactory results, then we would go back and adjust the input data to capture a higher level of accuracy. We could accomplish this by, for instance, adding or removing columns or rows in our original dataset. The Quick build model supports rapid experimentation without having to rely on scarce data science resources or wait for a full model to be completed.

Generate predictions

Now that the model has been built, we can predict the availability of therapeutics by location. Let’s look at what our estimated on-hand inventory looks like for the next 30 days, in this case for Washington, DC.

Canvas outputs probabilistic forecasts for therapeutic demand, allowing us to understand both the median value as well as upper and lower bounds. In the following screenshot, you can see the tail end of the historical data (the data from the original dataset). You can then see three new lines: the median (50th quantile) forecast in purple, the lower bound (10th quantile) in light blue, and upper bound (90th quantile) in dark blue.

Examining upper and lower bounds provides insight into the probability distribution of the forecast and allows us to make informed decisions about desired levels of local inventory for this therapeutic. We can add this insight to other data (for example, disease progression forecasts, or therapeutic efficacy and uptake) to make informed decisions about future orders and inventory levels.

Conclusion

No-code ML tools empower public health experts to quickly and effectively apply ML to public health threats. This democratization of ML makes public health organizations more agile and more efficient in their mission of protecting public health. Ad hoc analyses that can identify important trends or inflection points in public health concerns can now be performed directly by specialists, without having to compete for limited ML expert resources, which slows down response times and decision-making.

In this post, we showed how someone without any knowledge of ML can use Canvas to forecast the on-hand inventory of a certain therapeutic. This analysis can be performed by any analyst in the field, through the power of cloud technologies and no-code ML. Doing so distributes capabilities broadly and allows public health agencies to be more responsive, and to more efficiently use centralized and field office resources to deliver better public health outcomes.

What are some of the questions you might be asking, and how may low-code/no-code tools be able to help you answer them? If you are interested in learning more about Canvas, refer to Amazon SageMaker Canvas and start applying ML to your own quantitative health questions.


About the authors

Henrik Balle is a Sr. Solutions Architect at AWS supporting the US Public Sector. He works closely with customers on a range of topics from machine learning to security and governance at scale. In his spare time, he loves road biking, motorcycling, or you might find him working on yet another home improvement project.

Dan Sinnreich leads Go to Market product management for Amazon SageMaker Canvas and Amazon Forecast. He is focused on democratizing low-code/no-code machine learning and applying it to improve business outcomes. Prior to AWS, Dan built enterprise SaaS platforms and time-series risk models used by institutional investors to manage risk and construct portfolios. Outside of work, he can be found playing hockey, scuba diving, traveling, and reading science fiction.

Read More

Safe image generation and diffusion models with Amazon AI content moderation services

Generative AI technology is improving rapidly, and it’s now possible to generate text and images based on text input. Stable Diffusion is a text-to-image model that empowers you to create photorealistic applications. You can easily generate images from text using Stable Diffusion models through Amazon SageMaker JumpStart.

The following are examples of input texts and the corresponding output images generated by Stable Diffusion. The inputs are “A boxer dancing on a table,” “A lady on the beach in swimming wear, water color style,” and “A dog in a suit.”

Sample images

Although generative AI solutions are powerful and useful, they can also be vulnerable to manipulation and abuse. Customers using them for image generation must prioritize content moderation, implementing strong moderation practices to create a safe and positive user experience while protecting their users, platform, and brand reputation.

In this post, we explore using AWS AI services Amazon Rekognition and Amazon Comprehend, along with other techniques, to effectively moderate Stable Diffusion model-generated content in near-real time. To learn how to launch and generate images from text using a Stable Diffusion model on AWS, refer to Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart.

Solution overview

Amazon Rekognition and Amazon Comprehend are managed AI services that provide pre-trained and customizable ML models via an API interface, eliminating the need for machine learning (ML) expertise. Amazon Rekognition Content Moderation automates and streamlines image and video moderation. Amazon Comprehend utilizes ML to analyze text and uncover valuable insights and relationships.

The following reference architecture illustrates the creation of a RESTful proxy API for moderating Stable Diffusion text-to-image model-generated images in near-real time. In this solution, we launched and deployed a Stable Diffusion model (v2-1 base) using JumpStart. The solution uses negative prompts and text moderation solutions such as Amazon Comprehend and a rule-based filter to moderate input prompts. It also utilizes Amazon Rekognition to moderate the generated images. The RESTful API returns the generated image and the moderation warnings to the client if unsafe information is detected.

Architecture diagram

The steps in the workflow are as follows:

  1. The user sends a prompt to generate an image.
  2. An AWS Lambda function coordinates image generation and moderation using Amazon Comprehend, JumpStart, and Amazon Rekognition:
    1. Apply a rule-based condition to input prompts in Lambda functions, enforcing content moderation with forbidden word detection.
    2. Use the Amazon Comprehend custom classifier to analyze the prompt text for toxicity classification.
    3. Send the prompt to the Stable Diffusion model through the SageMaker endpoint, passing both the prompts as user input and negative prompts from a predefined list.
    4. Send the image bytes returned from the SageMaker endpoint to the Amazon Rekognition DetectModerationLabels API for image moderation.
    5. Construct a response message that includes the image bytes and warnings if the previous steps detected any inappropriate information in the prompt or generated image.
  3. Send the response back to the client.

The following screenshot shows a sample app built using the described architecture. The web UI sends user input prompts to the RESTful proxy API and displays the image and any moderation warnings received in the response. The demo app blurs the actual generated image if it contains unsafe content. We tested the app with the sample prompt “A sexy lady.”

Demo screenshot

You can implement more sophisticated logic for a better user experience, such as rejecting the request if the prompts contain unsafe information. Additionally, you could have a retry policy to regenerate the image if the prompt is safe, but the output is unsafe.

Predefine a list of negative prompts

Stable Diffusion supports negative prompts, which let you specify prompts for the model to avoid during image generation. Creating a predefined list of negative prompts is a practical and proactive approach to prevent the model from producing unsafe images. Including prompts like “naked,” “sexy,” and “nudity,” which are known to lead to inappropriate or offensive images, helps the model recognize and avoid them, reducing the risk of generating unsafe content.

This can be implemented in the Lambda function when calling the SageMaker endpoint to run inference on the Stable Diffusion model, passing both the prompts from user input and the negative prompts from a predefined list.
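
As a rough illustration, the endpoint call from the Lambda function could look like the following boto3 sketch. The endpoint name and negative-prompt list are placeholders, and the payload keys and Accept header are assumptions based on common JumpStart Stable Diffusion deployments; check the request and response schema of your deployed model version.

import base64
import json

import boto3

sagemaker_runtime = boto3.client('sagemaker-runtime')

# Placeholder endpoint name and an example predefined negative prompt list.
SD_ENDPOINT_NAME = 'jumpstart-stable-diffusion-v2-1-base'
NEGATIVE_PROMPTS = ['naked', 'sexy', 'nudity']

def generate_image(prompt):
    # Pass both the user prompt and the predefined negative prompts to the model.
    # Some JumpStart model versions accept a list for 'negative_prompt'; here we join into one string.
    payload = {
        'prompt': prompt,
        'negative_prompt': ', '.join(NEGATIVE_PROMPTS)
    }
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=SD_ENDPOINT_NAME,
        ContentType='application/json',
        Accept='application/json;jpeg',  # assumption: request base64-encoded JPEG output
        Body=json.dumps(payload)
    )
    body = json.loads(response['Body'].read())
    # Assumed response key; adjust to your endpoint's response schema.
    return base64.b64decode(body['generated_images'][0])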

Although this approach is effective, it could impact the results generated by the Stable Diffusion model and limit its functionality. It’s important to consider it as one of the moderation techniques, combined with other approaches such as text and image moderation using Amazon Comprehend and Amazon Rekognition.

Moderate input prompts

A common approach to text moderation is to use a rule-based keyword lookup method to identify whether the input text contains any forbidden words or phrases from a predefined list. This method is relatively easy to implement, with minimal performance impact and lower costs. However, the major drawback of this approach is that it’s limited to only detecting words included in the predefined list and can’t detect new or modified variations of forbidden words not included in the list. Users can also attempt to bypass the rules by using alternative spellings or special characters to replace letters.
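
As a minimal sketch, a rule-based check might look like the following; the forbidden-word list is purely illustrative, and the simple normalization only partially mitigates the bypass techniques described above.

import re

# Illustrative list; a real deployment would maintain a larger, curated list.
FORBIDDEN_WORDS = {'naked', 'sexy', 'nudity'}

def contains_forbidden_words(prompt):
    # Lowercase and strip punctuation/special characters to catch simple obfuscation such as 'n.a.k.e.d'.
    normalized = re.sub(r'[^a-z0-9\s]', '', prompt.lower())
    return bool(set(normalized.split()) & FORBIDDEN_WORDS)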

To address the limitations of rule-based text moderation, many solutions have adopted a hybrid approach that combines rule-based keyword lookup with ML-based toxicity detection. The combination of both approaches allows for a more comprehensive and effective text moderation solution, capable of detecting a wider range of inappropriate content and improving the accuracy of moderation outcomes.

In this solution, we use an Amazon Comprehend custom classifier to train a toxicity detection model, which we use to detect potentially harmful content in input prompts in cases where no explicit forbidden words are detected. With the power of machine learning, we can teach the model to recognize patterns in text that may indicate toxicity, even when such patterns aren’t easily detectable by a rule-based approach.

With Amazon Comprehend as a managed AI service, training and inference are simplified. You can easily train and deploy Amazon Comprehend custom classification with just two steps. Check out our workshop lab for more information about the toxicity detection model using an Amazon Comprehend custom classifier. The lab provides a step-by-step guide to creating and integrating a custom toxicity classifier into your application. The following diagram illustrates this solution architecture.

Comprehend custom classification

This sample classifier uses a social media training dataset and performs binary classification. However, if you have more specific requirements for your text moderation needs, consider using a more tailored dataset to train your Amazon Comprehend custom classifier.
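
Once the custom classifier is trained and deployed to a real-time endpoint, calling it from the Lambda function is a single API call. The following is a minimal sketch; the endpoint ARN and the 'Toxic' label name are placeholders for whatever your own classifier exposes.

import boto3

comprehend = boto3.client('comprehend')

# Placeholder ARN of the deployed Amazon Comprehend custom classifier endpoint.
TOXICITY_ENDPOINT_ARN = 'arn:aws:comprehend:us-east-1:111122223333:document-classifier-endpoint/prompt-toxicity'

def is_toxic(prompt, threshold=0.5):
    # Classify the prompt and flag it if the toxic class score exceeds the threshold.
    response = comprehend.classify_document(Text=prompt, EndpointArn=TOXICITY_ENDPOINT_ARN)
    return any(c['Name'] == 'Toxic' and c['Score'] >= threshold for c in response['Classes'])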

Moderate output images

Although moderating input text prompts is important, it doesn’t guarantee that all images generated by the Stable Diffusion model will be safe for the intended audience, because the model’s outputs can contain a certain level of randomness. Therefore, it’s equally important to moderate the images generated by the Stable Diffusion model.

In this solution, we utilize Amazon Rekognition Content Moderation, which employs pre-trained ML models to detect inappropriate content in images and videos. Specifically, we use the Amazon Rekognition DetectModerationLabels API to moderate images generated by the Stable Diffusion model in near-real time. Amazon Rekognition Content Moderation provides pre-trained APIs to analyze a wide range of inappropriate or offensive content, such as violence, nudity, hate symbols, and more. For a comprehensive list of Amazon Rekognition Content Moderation taxonomies, refer to Moderating content.

The following code demonstrates how to call the Amazon Rekognition DetectModerationLabels API to moderate images within a Lambda function using the Python Boto3 library. This function takes the image bytes returned from SageMaker and sends them to the Image Moderation API for moderation.

import base64

import boto3

# Initialize the Amazon Rekognition client object
rekognition = boto3.client('rekognition')

# img_bytes holds the base64-encoded image returned from the SageMaker endpoint
# Call the Rekognition Image moderation API and store the results
response = rekognition.detect_moderation_labels(
    Image={
        'Bytes': base64.b64decode(img_bytes)
    }
)

# Print out the API response
print(response)
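
The response contains a ModerationLabels list with a label name, parent category, and confidence score for each detection. The following is a small sketch of how the Lambda function might turn that into a warning; the confidence threshold is an arbitrary illustration, and you can also pass MinConfidence directly to the API call.

# Collect labels detected above an illustrative confidence threshold.
unsafe_labels = [
    label['Name']
    for label in response['ModerationLabels']
    if label['Confidence'] >= 60.0
]

if unsafe_labels:
    warning = 'Generated image flagged for: ' + ', '.join(unsafe_labels)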

For additional examples of the Amazon Rekognition Image Moderation API, refer to our Content Moderation Image Lab.

Effective image moderation techniques for fine-tuning models

Fine-tuning is a common technique used to adapt pre-trained models to specific tasks. In the case of Stable Diffusion, fine-tuning can be used to generate images that incorporate specific objects, styles, and characters. Content moderation is critical when training a Stable Diffusion model to prevent the creation of inappropriate or offensive images. This involves carefully reviewing and filtering out any data that could lead to the generation of such images. By doing so, the model learns from a more diverse and representative range of data points, improving its accuracy and preventing the propagation of harmful content.

JumpStart makes fine-tuning the Stable Diffusion Model easy by providing the transfer learning scripts using the DreamBooth method. You just need to prepare your training data, define the hyperparameters, and start the training job. For more details, refer to Fine-tune text-to-image Stable Diffusion models with Amazon SageMaker JumpStart.

The dataset for fine-tuning needs to be a single Amazon Simple Storage Service (Amazon S3) directory including your images and instance configuration file dataset_info.json, as shown in the following code. The JSON file will associate the images with the instance prompt like this: {'instance_prompt':<<instance_prompt>>}.

input_directory 
|---instance_image_1.png 
|---instance_image_2.png 
|---instance_image_3.png 
|---instance_image_4.png 
|---instance_image_5.png 
|---dataset_info.json

You can manually review and filter the images, but this can be time-consuming and even impractical at scale across many projects and teams. In such cases, you can automate a batch process to centrally check all the images against the Amazon Rekognition DetectModerationLabels API and automatically flag or remove images so they don’t contaminate your training data.
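
A rough sketch of such a batch check is shown below, assuming the dataset sits in an S3 bucket that Amazon Rekognition can read directly (same Region, supported image formats); the bucket name and prefix are placeholders.

import boto3

s3 = boto3.client('s3')
rekognition = boto3.client('rekognition')

# Placeholder bucket and prefix for the fine-tuning dataset.
BUCKET = 'my-training-data-bucket'
PREFIX = 'input_directory/'

flagged = []
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get('Contents', []):
        key = obj['Key']
        if not key.lower().endswith(('.png', '.jpg', '.jpeg')):
            continue  # skip dataset_info.json and other non-image files
        response = rekognition.detect_moderation_labels(
            Image={'S3Object': {'Bucket': BUCKET, 'Name': key}},
            MinConfidence=60  # illustrative threshold
        )
        if response['ModerationLabels']:
            flagged.append(key)

print('Images to review or remove before fine-tuning:', flagged)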

Moderation latency and cost

In this solution, a sequential pattern is used to moderate text and images: a rule-based function and Amazon Comprehend moderate the text before invoking Stable Diffusion, and Amazon Rekognition moderates the generated image afterward. Although this approach effectively moderates input prompts and output images, it may increase the overall cost and latency of the solution, which is something to consider.

Latency

Both Amazon Rekognition and Amazon Comprehend offer managed APIs that are highly available and have built-in scalability. Despite potential latency variations due to input size and network speed, the APIs used in this solution from both services offer near-real-time inference. Amazon Comprehend custom classifier endpoints can respond in less than 200 milliseconds for input text sizes of less than 100 characters, while the Amazon Rekognition Image Moderation API takes approximately 500 milliseconds for average file sizes of less than 1 MB. (These results are based on tests conducted with the sample application and qualify as near-real-time performance.)

In total, the moderation API calls to Amazon Rekognition and Amazon Comprehend add up to 700 milliseconds to the API call. It’s important to note that the Stable Diffusion request usually takes longer, depending on the complexity of the prompts and the underlying infrastructure capability. In the test account, using an instance type of ml.p3.2xlarge, the average response time for the Stable Diffusion model via a SageMaker endpoint was around 15 seconds. Therefore, the latency introduced by moderation is approximately 5% of the overall response time, a minimal impact on the overall performance of the system.

Cost

The Amazon Rekognition Image Moderation API employs a pay-as-you-go model based on the number of requests. The cost varies depending on the AWS Region used and follows a tiered pricing structure. As the volume of requests increases, the cost per request decreases. For more information, refer to Amazon Rekognition pricing.

In this solution, we utilized an Amazon Comprehend custom classifier and deployed it as an Amazon Comprehend endpoint to facilitate real-time inference. This implementation incurs both a one-time training cost and ongoing inference costs. For detailed information, refer to Amazon Comprehend Pricing.

JumpStart enables you to quickly launch and deploy the Stable Diffusion model as a single package. Running inference on the Stable Diffusion model will incur costs for the underlying Amazon Elastic Compute Cloud (Amazon EC2) instance as well as inbound and outbound data transfer. For detailed information, refer to Amazon SageMaker Pricing.

Summary

In this post, we provided an overview of a sample solution that showcases how to moderate Stable Diffusion input prompts and output images using Amazon Comprehend and Amazon Rekognition. Additionally, you can define negative prompts in Stable Diffusion to prevent generating unsafe content. By implementing multiple moderation layers, the risk of producing unsafe content can be greatly reduced, ensuring a safer and more dependable user experience.

Learn more about content moderation on AWS and our content moderation ML use cases, and take the first step towards streamlining your content moderation operations with AWS.


About the Authors

Lana Zhang is a Senior Solutions Architect on the AWS WWSO AI Services team, specializing in AI and ML for content moderation, computer vision, and natural language processing. With her expertise, she is dedicated to promoting AWS AI/ML solutions and assisting customers in transforming their business solutions across diverse industries, including social media, gaming, e-commerce, and advertising & marketing.

James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in marketing and advertising industries.

Kevin Carlson is a Principal AI/ML Specialist with a focus on Computer Vision at AWS, where he leads Business Development and GTM for Amazon Rekognition. Prior to joining AWS, he led Digital Transformation globally at Fortune 500 Engineering company AECOM, with a focus on artificial intelligence and machine learning for generative design and infrastructure assessment. He is based in Chicago, where outside of work he enjoys time with his family, and is passionate about flying airplanes and coaching youth baseball.

John Rouse is a Senior AI/ML Specialist at AWS, where he leads global business development for AI services focused on Content Moderation and Compliance use cases. Prior to joining AWS, he held senior-level business development and leadership roles with cutting edge technology companies. John is working to put machine learning in the hands of every developer with the AWS AI/ML stack. Small ideas bring about small impact. John’s goal for customers is to empower them with big ideas and opportunities that open doors so they can make a major impact with their customers.

Read More

Watch This Space: New Field of Spatial Finance Uses AI to Estimate Risk, Monitor Assets, Analyze Claims

When making financial decisions, it’s important to look at the big picture — say, one taken from a drone, satellite or AI-powered sensor.

The emerging field of spatial finance harnesses AI insights from remote sensors and aerial imagery to help banks, insurers, investment firms and businesses analyze risks and opportunities, enable new services and products, measure the environmental impact of their holdings, and assess damage after a crisis.

Spatial finance applications include monitoring assets, modeling energy efficiency, tracking emissions and pollution, detecting illegal mining and deforestation, and analyzing the risks of natural disasters. NVIDIA AI software and hardware can help the industry combine their business data with geospatial data to accelerate these applications.

By better understanding the environmental and social risks associated with an investment, the financial sector can choose to prioritize those that are more likely to support sustainable development — a framework known as environmental, social and governance (ESG).

Focus on sustainable investments is growing: A Bloomberg Intelligence analysis estimated that ESG assets will represent more than a third of total managed assets worldwide by 2025. And a report by the European Union Agency for the Space Programme predicts that the insurance and finance industry will become the top consumer of Earth observation data and services over the next decade — resulting in more than $1 billion in total revenue by 2031.

Several members of NVIDIA Inception, a global program that supports cutting-edge startups, are advancing these efforts with GPU-accelerated AI applications that can track water pollution near industrial plants, quantify the financial risk of wildfires, assess damage after storms and more.

Powerful Compute for Large-Scale Data

GPU-accelerated AI and data science can rapidly extract insights from complex, unstructured data — enabling banks and businesses to set up real-time streaming and analysis of data as it’s captured from satellites, drones, antennas and edge sensors.

By monitoring aerial imagery — available for free from public space agencies, or at higher granularity from private companies — analysts can get a clear view of how much water is being used from a reservoir over time, how many trees are being cut down for a construction project or how many homes were damaged by a tornado.

This capability can help audit investments by verifying the accuracy of written records such as government-mandated disclosures, environmental impact reports or even insurance claims.

For example, investors might track the supply chain of a company that reports it has achieved net zero in its production line, and discover that it actually relies on an overseas plant emitting coal ash visible in satellite images. Or, sensors that analyze heat emissions from buildings could help identify low-emitting businesses for a tax credit.

NVIDIA’s edge computing solutions, including the NVIDIA Jetson platform for autonomous machines and other embedded applications, are powering numerous AI initiatives in spatial finance.

In addition to using NVIDIA hardware to speed up their applications, developers are adopting software including the NVIDIA DeepStream software development kit for streaming analytics, part of the NVIDIA Metropolis platform for vision AI. They’re also using the NVIDIA Omniverse platform for building and operating metaverse applications for detailed, 3D visualizations of geospatial data.

Insuring Property — From Assessing Risks to Accelerating Claims

NVIDIA Inception members are developing GPU-accelerated applications that turn geospatial data into insights for insurance companies, reducing the number of expensive onsite visits needed to monitor the status of insured properties.

RSS-Hydro, based in Luxembourg, uses GPU computing on premises and in the cloud to train FloodSENS, a machine learning app that maps flood impact from satellite images. The company also uses NVIDIA Omniverse to animate FloodSENS in 3D, helping the team more effectively communicate flood risks and inform resource allocation planning during emergencies.

Toronto-based Ecopia AI uses deep learning-based mapping systems to mine geospatial data, helping to produce next-generation digital maps with highly accurate segmentation of buildings, roads, forests and more. These maps power diverse applications across the public and private sectors, including government climate resilience initiatives and insurance risk assessment. Ecopia uses NVIDIA GPUs to develop its AI models.

CrowdAI, based in the San Francisco Bay Area, uses deep learning tools to accelerate the insurance claims process by automatically analyzing aerial images and videos to detect assets that were damaged or destroyed in natural disasters. The company uses NVIDIA GPUs for both training and inference.

CrowdAI’s deep learning model detected buildings from this aerial image taken in the aftermath of Hurricane Michael in 2018. The AI also categorizes the level of damage – ranging from green representing no damage; to yellow and orange for minor and major damage, respectively; to purple for destroyed buildings. Image credit: CrowdAI, Inc., DigitalGlobe, NOAA, and Nearmap.

Predicting Risks and Opportunities for Businesses

Inception startups are also using geospatial data to help government groups and banks quantify the risks and opportunities of their investments — such as predicting crop yields, detecting industrial pollution and measuring the land and water use of an asset.

Switzerland-based Picterra is supporting sustainable finance with a geospatial MLOps platform that enables banks, insurance companies and financial consultancies to analyze ESG metrics. The company’s AI-driven insights can help the financial industry make investment decisions, model risk and quickly quantify vulnerabilities and opportunities in investment portfolios. The company uses NVIDIA Tensor Core GPUs and the NVIDIA CUDA Toolkit to develop its AI models, which process raw data from satellite, drone and aerial imagery.

London-based Satellite Vu, a startup applying satellite technology to address global challenges, will be able to monitor the temperature of any building on the planet in near real time using infrared camera data. These infrared images will provide its customers with insights about the economic activity, the energy efficiency of buildings, the urban heat island effect and more.

And Sourcenergy, based in Houston, uses geospatial data to power an energy supply chain intelligence platform that can help the financial services industry with market research. Its AI tools, developed using NVIDIA A100 GPUs, enable investors to independently create real-time models of energy companies’ well inventories and project costs, giving them insights even before the companies share data in their quarterly earnings reports.

Learn more about NVIDIA’s work in financial services, and read more on geospatial AI in investment management in chapter 10 of this handbook.

Read More

Who’ll Stop the Rain? Scientists Call for Climate Collaboration

A trio of top scientists is helping lead one of the most ambitious efforts in the history of computing — building a digital twin of Earth.

Peter Bauer, Bjorn Stevens and Francisco “Paco” Doblas-Reyes agree that a digital twin of Earth needs to support resolutions down to a kilometer so a growing set of users can explore the risks of climate change and how to adapt to them. They say the work will require accelerated computing, AI and lots of collaboration.

Their Herculean efforts, some already using NVIDIA technologies, inspired Earth-2, NVIDIA’s contribution to the common cause.

“We will dedicate ourselves and our significant resources to direct NVIDIA’s scale and expertise in computational sciences, to join with the world’s climate science community,” said Jensen Huang, founder and CEO of NVIDIA, when he announced the Earth-2 initiative in late 2021.

Collaborating on an Unprecedented Scale

Huang’s commitment signaled support for efforts like Destination Earth (DestinE), a pan-European project to create digital twins of the planet.

“No single computer may be enough to do it, so it needs a distributed, international effort,” said Bauer, a veteran with more than 20 years at Europe’s top weather forecasting center who now leads the project that aims to make planet-scale models available by 2030.

Last year, he co-authored a Nature article that said the work “requires collaboration on an unprecedented scale.”

Bauer calls for broad international cooperation on a new Earth information system.

In a March GTC talk, Bauer envisioned a federation that “mobilizes resources from many countries, including private players, and NVIDIA could be one that would be very interesting.”

Peter Bauer

Such resources would enable the enormous work of developing new numeric and machine-learning models, then running them in massive inference jobs to make predictions that stretch across multiple decades.

DestinE has its roots in a 2008 climate conference. It’s the fruit of a number of programs, including many Bauer led in his years with the European Centre for Medium-Range Weather Forecasts — based in Reading, England — which develops some of the most advanced weather forecast models in the world.

Consuming a Petabyte a Day

The collaboration is broad because the computing requirements are massive.

Francisco Doblas-Reyes

“We’re talking about producing petabytes of data a day that have to be delivered very quickly,” said Doblas-Reyes, director of the Earth sciences department at the Barcelona Supercomputing Center, a lead author at the Intergovernmental Panel on Climate Change — a group that creates some of the most definitive reports on climate change — and a contributor to the DestinE program.

The digital twin effort will turn the traditional approach to weather and climate forecasting “upside down so users can be the drivers of the process,” he said in a March talk at GTC, NVIDIA’s developer conference. The goal is to “put the user at the helm of producing climate information that’s more useful for climate adaptation,” he said.

His talk described the new models, workflows and systems needed to capture in detail the chaotic nature of climate systems.

Articulating the Vision

The vision for a digital twin crystallized in a keynote at the SC20 supercomputing conference from Stevens, a director at the Max Planck Institute for Meteorology, in Hamburg. He leads work on one of the world’s top weather models for climate applications, as well as an effort to enable simulations at kilometer-level resolution, an order of magnitude finer than today’s best work.

“We need a new type of computing capability … for planetary information systems that let us work through the consequences of our actions and policies, so we can build a more sustainable future,” he said.

Stevens’ landmark talk at SC20 crystallized the vision of a digital twin of Earth.

Stevens described a digital twin that’s accurate and interactive. For example, he imagined people querying it to see how a warming climate could affect flooding in northern Europe or food security in Africa.

AI Enables Interactive Simulations

AI will play a lead role in giving users that level of interactivity, he said in a talk at GTC last year.

“We need AI to get to where we need to be,” he said, giving shout-outs to NVIDIA and colleagues, including Bauer and Doblas-Reyes. “Real steps forward come from people bringing their different perspectives together and rethinking how we work.”

Climate simulations pursue ultra-high resolution for greater accuracy.

Doblas-Reyes agreed in his GTC talk this year.

“In my opinion, AI is a necessary complement for the digital twin — it’s the only way to offer true interactivity to users and help provide a good trajectory of what’s to come in our climate,” he said.

On a Journey Together

All three scientists gave examples of how NVIDIA technologies have been used in a wide variety of projects addressing climate change.

In his GTC talk, Stevens took a characteristically playful turn. He showed a cartoon version of Huang, like Isaac Newton, struck with a falling apple and an insight for how to engage with the scientific effort.

“We need you Jensen, and you need us,” Stevens said.

Stevens playfully portrayed Huang as Isaac Newton in his GTC talk.

The MareNostrum 5 system coming to the Barcelona center provides one example. It’s expected to accelerate some of the DestinE work on NVIDIA H100 Tensor Core GPUs.

Building a digital twin of Earth is “an exciting opportunity to re-think the future of HPC with AI on top,” said Mike Pritchard, a veteran climate scientist who directs climate research at NVIDIA.

NVIDIA Omniverse for connecting 3D tools and developing metaverse applications, NVIDIA Modulus for physics-informed machine learning and NVIDIA Triton for AI inference all have roles to play in the broad effort, he said.

It’s a long and evolving collaboration, Bauer said in his GTC talk. “I sent my first email to NVIDIA on these issues 14 years ago, and NVIDIA has been with us on this journey ever since.”

To learn more, read the concept paper developed for the Berlin Summit for Earth Virtualization Engines, July 3-7, where Huang will deliver a keynote address.

Read More