Create custom images for geospatial analysis with Amazon SageMaker Distribution in Amazon SageMaker Studio

Amazon SageMaker Studio provides a comprehensive suite of fully managed integrated development environments (IDEs) for machine learning (ML), including JupyterLab, Code Editor (based on Code-OSS), and RStudio. It supports all stages of ML development, from data preparation to deployment, and allows you to launch a preconfigured JupyterLab IDE for efficient coding within seconds. Additionally, its flexible interface and artificial intelligence (AI)-powered coding assistant simplify and enhance ML workflow configuration, debugging, and code testing.

Geospatial data such as satellite images, coordinate traces, or aerial maps that are enriched with characteristics or attributes of other business and environmental datasets is becoming increasingly available. This unlocks valuable use cases in fields such as environmental monitoring, urban planning, agriculture, disaster response, transportation, and public health.

To effectively utilize the wealth of information contained in such datasets for ML and analytics, access to the right tools for geospatial data handling is crucial. This is especially relevant given that geospatial data often comes in specialized file formats such as Cloud Optimized GeoTIFF (COG), Zarr files, GeoJSON, and GeoParquet that require dedicated software tools and libraries to work with.

To address these specific needs within SageMaker Studio, this post shows you how to extend Amazon SageMaker Distribution with additional dependencies to create a custom container image tailored for geospatial analysis. Although the example in this post focuses on geospatial data science, the methodology presented can be applied to any kind of custom image based on SageMaker Distribution.

SageMaker Distribution images are Docker images that come with preinstalled data science packages and a preconfigured JupyterLab IDE, which allows you to use them both in the SageMaker Studio UI and for non-interactive workflows like processing or training jobs. Because the same runtime is shared across SageMaker Studio notebooks and asynchronous jobs, you get a seamless transition from local experimentation to batch execution while only having to maintain a single Docker image.

In this post, we provide step-by-step guidance on how you can build and use custom container images in SageMaker Studio. Specifically, we demonstrate how you can customize SageMaker Distribution for geospatial workflows by extending it with open-source geospatial Python libraries. We explain how to build and deploy the image on AWS using continuous integration and delivery (CI/CD) tools and how to make the deployed image accessible in SageMaker Studio. All code used in this post, including the Dockerfile and infrastructure as code (IaC) templates for quick deployment, is available as a GitHub repository.

Solution overview

You can build a custom container image and use it in SageMaker Studio with the following steps:

  1. Create a Dockerfile that includes the additional Python libraries and tools.
  2. Build a custom container image from the Dockerfile.
  3. Push the custom container image to a private repository on Amazon Elastic Container Registry (Amazon ECR).
  4. Attach the image to your Amazon SageMaker Studio domain.
  5. Access the image from your JupyterLab space.

The following diagram illustrates the solution architecture.
Solution overview

The solution uses AWS CodeBuild, a fully managed service that compiles source code and produces deployable software artifacts, to build a new container image from a Dockerfile. CodeBuild supports a broad selection of git version control sources like AWS CodeCommit, GitHub, and GitLab. For this post, we host our build files on Amazon Simple Storage Service (Amazon S3) and use it as the source provider for the CodeBuild project. You can extend this solution to work with alternative CI/CD tooling, including GitLab, Jenkins, Harness, or other tools.

CodeBuild retrieves the build files from Amazon S3, runs a Docker build, and pushes the resulting container image to a private ECR repository. Amazon ECR is a managed container registry that facilitates the storage, management, and deployment of container images.

The custom image is then attached to a SageMaker Studio domain and can be used by data scientists and data engineers as an IDE or as runtime for SageMaker processing or training jobs.

Prerequisites

This post covers the default approach for SageMaker Studio, which involves a managed network interface that allows internet communication. We also include steps to adapt this for use within a private virtual private cloud (VPC).

Before you get started, verify that you have the following prerequisites:

If you intend to follow this post and deploy the CodeBuild project and the ECR repository using IaC, you also need to install the AWS Cloud Development Kit (AWS CDK) on your local machine. For instructions, see Getting started with the AWS CDK. If you’re using a cloud-based IDE like AWS Cloud9, the AWS CDK will usually come preinstalled.

If you want to securely deploy your custom container using your private VPC, you also need the following:

To set up a SageMaker Studio domain with a private VPC, see Connect Studio notebooks in a VPC to external resources.

Extend SageMaker Distribution

By default, SageMaker Studio provides a selection of curated, pre-built Docker images as part of SageMaker Distribution. These images include popular frameworks for ML, data science, and visualization, including deep learning frameworks like PyTorch, TensorFlow, and Keras; popular Python packages like NumPy, scikit-learn, and pandas; and IDEs like JupyterLab and Code Editor. All installed libraries and packages are mutually compatible and are provided with their latest compatible versions. Each distribution version is available in two variants, CPU and GPU, and is hosted on the Amazon ECR Public Gallery. To work with geospatial data in SageMaker Studio, you need to extend SageMaker Distribution by adding the required geospatial libraries like gdal, geopandas, leafmap, and rioxarray, and make the resulting image accessible to users through SageMaker Studio.
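
If you want to inspect the base image locally before customizing it, you can pull it directly from the Amazon ECR Public Gallery. The following command is a minimal example that assumes Docker is installed locally; the tag matches the version pinned in the Dockerfile shown later in this post (1.8.0, CPU variant):

docker pull public.ecr.aws/sagemaker/sagemaker-distribution:1.8.0-cpu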

Let’s first review how to extend SageMaker Distribution for geospatial analyses and ML. To do so, we largely follow the provided template for creating custom Docker files in SageMaker, with a few subtle but important differences specific to the geospatial libraries we want to install. The full Dockerfile is as follows:

# set distribution type (cpu or gpu)
ARG DISTRIBUTION_TYPE

# get SageMaker Distribution base image
# use fixed version for reproducibility, use "latest" for most recent version
FROM public.ecr.aws/sagemaker/sagemaker-distribution:1.8.0-$DISTRIBUTION_TYPE

#set SageMaker specific parameters and arguments
#see here for supported values: https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-image-specifications.html#studio-updated-jl-admin-guide-custom-images-user-and-filesystem
ARG NB_USER="sagemaker-user"
ARG NB_UID=1000
ARG NB_GID=100

ENV MAMBA_USER=$NB_USER

USER root

#set environment variables required for GDAL
ARG CPLUS_INCLUDE_PATH=/usr/include/gdal
ARG C_INCLUDE_PATH=/usr/include/gdal

#install GDAL and other required Linux packages
RUN apt-get --allow-releaseinfo-change update -y -qq \
   && apt-get update \
   && apt install -y software-properties-common \
   && add-apt-repository --yes ppa:ubuntugis/ppa \
   && apt-get update \
   && apt-get install -qq -y groff unzip libgdal-dev gdal-bin ffmpeg libsm6 libxext6 \
   && apt-get install -y --reinstall build-essential \
   && apt-get clean \
   && rm -fr /var/lib/apt/lists/*

# use micromamba package manager to install required geospatial python packages
USER $MAMBA_USER

RUN micromamba install gdal==3.6.4 --yes --channel conda-forge --name base \
   && micromamba install geopandas==0.13.2 rasterio==1.3.8 leafmap==0.31.3 rioxarray==0.15.1 --yes --channel conda-forge --name base \
   && micromamba clean -a

# set entrypoint and jupyter server args
ENTRYPOINT ["jupyter-lab"]
CMD ["--ServerApp.ip=0.0.0.0", "--ServerApp.port=8888", "--ServerApp.allow_origin=*", "--ServerApp.token=''", "--ServerApp.base_url=/jupyterlab/default"]

Let’s break down the key geospatial-specific modifications.

First, you install the Geospatial Data Abstraction Library (GDAL) on Linux. GDAL is an open source library that provides drivers for reading and writing raster and vector geospatial data formats. It provides the backbone for many open source and proprietary GIS applications, including the libraries used in this post. This is implemented as follows (see Install GDAL for Python for more details):

#install GDAL and other required Linux packages
RUN apt-get --allow-releaseinfo-change update -y -qq \
   && apt-get update \
   && apt install -y software-properties-common \
   && add-apt-repository --yes ppa:ubuntugis/ppa \
   && apt-get update \
   && apt-get install -qq -y groff unzip libgdal-dev gdal-bin ffmpeg libsm6 libxext6 \
   && apt-get install -y --reinstall build-essential \
   && apt-get clean \
   && rm -fr /var/lib/apt/lists/*

You also need to set the following GDAL-specific environment variables:

ARG CPLUS_INCLUDE_PATH=/usr/include/gdal
ARG C_INCLUDE_PATH=/usr/include/gdal

With GDAL installed, you can now install the required geospatial Python libraries using the recommended micromamba package manager. This is implemented in the following code block:

# use micromamba package manager to install required geospatial python packages
USER $MAMBA_USER

RUN micromamba install gdal==3.6.4 --yes --channel conda-forge --name base \
   && micromamba install geopandas==0.13.2 rasterio==1.3.8 leafmap==0.31.3 rioxarray==0.15.1 --yes --channel conda-forge --name base \
   && micromamba clean -a

The versions defined here have been tested with the underlying SageMaker Distribution. You can freely add additional libraries that you may need. Identifying the right version may require some level of experimentation.

Now that you have created your custom geospatial Dockerfile, you can build it and push the image to Amazon ECR.

Build a custom geospatial image

To build the Docker image, you need a build environment equipped with Docker and the AWS Command Line Interface (AWS CLI). This environment can be set up on your local machine, in a cloud-based IDE like AWS Cloud9, or as part of a continuous integration service like CodeBuild.

Before you build the Docker image, identify the ECR repository where you will push the image. Your image must be tagged in the following format: <your-aws-account-id>.dkr.ecr.<your-aws-region>.amazonaws.com/<your-repository-name>:<tag>. Without this tag, pushing it to an ECR repository is not possible. If you’re deploying the solution using the AWS CDK, an ECR repository is automatically created, and a CodeBuild project is configured to use this repository as the target for pushing the image. When you initiate the CodeBuild build, the image is built, tagged, and then pushed to the previously created ECR repository.

The following steps are applicable only if you choose to perform these actions manually.
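
If you build and push manually, first define the shell variables used by the commands below and make sure the target ECR repository exists. This is a minimal sketch; the Region and repository name are example values that you should adapt to your environment:

ECR_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
ECR_REGION=us-east-1 # example Region, adapt as needed
ECR_REPO_NAME=custom-geospatial-sm-dist # example repository name, adapt as needed

# create the target ECR repository if it doesn't exist yet (skip this if you deployed with the AWS CDK)
aws ecr create-repository --repository-name ${ECR_REPO_NAME} --region ${ECR_REGION}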

To build the image manually, run the following command in the same directory as the Dockerfile:

docker build --build-arg DISTRIBUTION_TYPE=cpu -t ${ECR_ACCOUNT_ID}.dkr.ecr.${ECR_REGION}.amazonaws.com/${ECR_REPO_NAME}:latest-cpu .

After building your image, you must log in to the ECR repository with this command before pushing the image:

aws ecr get-login-password --region ${ECR_REGION} | docker login --username AWS --password-stdin ${ECR_ACCOUNT_ID}.dkr.ecr.${ECR_REGION}.amazonaws.com

Next, push your Docker image using the following command:

docker push ${ECR_ACCOUNT_ID}.dkr.ecr.${ECR_REGION}.amazonaws.com/${ECR_REPO_NAME}:latest-cpu

Your image has now been pushed to the ECR repository and you can proceed to attach it to SageMaker.
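
Optionally, verify that the image arrived in the repository before moving on. This check is a minimal sketch that reuses the shell variables from the build step:

aws ecr describe-images --repository-name ${ECR_REPO_NAME} --image-ids imageTag=latest-cpu --region ${ECR_REGION}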

Attach the custom geospatial image to SageMaker Studio

After your custom image has been successfully pushed to Amazon ECR, you need to attach it to a SageMaker domain to be able to use it within SageMaker Studio.

  1. On the SageMaker console, choose Domains under Admin configurations in the navigation pane.

If you don’t have a SageMaker domain set up yet, you can create one.

  2. From the list of available domains, choose the domain to which you want to attach the geospatial image.
  3. On the Domain details page, choose the Environment tab.
  4. In the Custom images for personal Studio apps section, choose Attach image.

Studio Attach Image

  5. Choose New image and enter the ECR image URI from the build pipeline output. This should have the following format: <your-aws-account-id>.dkr.ecr.<your-aws-region>.amazonaws.com/<your-repository-name>:<tag>
  6. Choose Next.
  7. For Image name, enter a custom image name (for this post, we use custom-geospatial-sm-dist).
  8. For Image display name, enter a custom display name (for this post, we use Geospatial SageMaker Distribution (CPU)).
  9. For Description, enter an image description.

Attach image 01

  10. Choose JupyterLab image as the application type and choose Submit.

Attach image 02

When returning to the Environment tab on the Domain details page, you should now see your image listed under Custom images for personal Studio apps.

Attach the custom geospatial image using the AWS CLI

You can also automate the process using the AWS CLI.

First, register the image in SageMaker and create an image version:

SAGEMAKER_IMAGE_NAME=sagemaker-dist-custom-geospatial # adapt with your image name
ECR_IMAGE_URL='<account_id>.dkr.ecr.<region>.amazonaws.com/<ecr-repo-name>:latest-cpu' # replace with your ECR repository url
ROLE_ARN='The ARN of an IAM role for the execution role you want to use' # replace with the desired execution role

aws sagemaker create-image \
    --image-name ${SAGEMAKER_IMAGE_NAME} \
    --role-arn ${ROLE_ARN}

aws sagemaker create-app-image-config \
    --app-image-config-name ${SAGEMAKER_IMAGE_NAME}-app-image-config \
    --jupyter-lab-app-image-config {}

aws sagemaker create-image-version \
    --image-name ${SAGEMAKER_IMAGE_NAME} \
    --base-image ${ECR_IMAGE_URL}
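
Optionally, you can confirm that the image version was created successfully before attaching it to the domain; the ImageVersionStatus field in the response should report CREATED. This is an optional sanity check:

aws sagemaker describe-image-version --image-name ${SAGEMAKER_IMAGE_NAME}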

Next, create a file containing the following content. You can add multiple custom images by adding additional entries to the CustomImages list.

{
  "DefaultUserSettings": {
    "JupyterLabAppSettings": {
      "CustomImages": [
        {
          "ImageName": "sagemaker-dist-custom-geospatial",
          "ImageVersionNumber": 1,
          "AppImageConfigName": "sagemaker-dist-custom-geospatial-app-image-config"
        }
      ]
    }
  }
}

The next step assumes that you named the file from the previous step default-user-settings.json. The following command attaches the SageMaker image to the specified Studio domain:

DOMAIN_ID=d-####### # replace with your SageMaker Studio domain id
aws sagemaker update-domain --domain-id ${DOMAIN_ID} --cli-input-json file://default-user-settings.json
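
To confirm that the image is now attached, you can inspect the domain's default user settings. This is an optional check using the same domain ID:

aws sagemaker describe-domain --domain-id ${DOMAIN_ID} --query 'DefaultUserSettings.JupyterLabAppSettings.CustomImages'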

Use the custom geospatial image in the JupyterLab app

In the previous section, we demonstrated how to attach the image to a SageMaker domain. When you create a new (or modify an existing) JupyterLab space inside this domain, the newly created custom image will now be available. You can choose it on the Image dropdown menu, where it now appears alongside the default AWS curated SageMaker Distribution image versions under Custom.

To run a space using the custom geospatial image, choose Geospatial SageMaker Distribution (CPU) as your image, then choose Run space.

Studio Run Space

After the space has been provisioned and is in the Running state, choose Open JupyterLab. This opens the JupyterLab IDE in a new browser tab. Select a notebook with Python3 (ipykernel) to start a new Jupyter notebook running on top of the custom geospatial image.
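
As a quick sanity check, you can open a terminal in the JupyterLab space (or run the equivalent imports in a notebook cell) to confirm that the geospatial stack from the custom image is available. This is a minimal sketch based on the libraries installed in the Dockerfile:

# confirm the GDAL CLI tools and the Python bindings from the custom image are available
gdalinfo --version
python -c "from osgeo import gdal; import geopandas, rasterio, leafmap, rioxarray; print('GDAL', gdal.__version__)"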

Run interactive geospatial data analyses and large-scale processing jobs in SageMaker

After you build the custom geospatial image and attach it to your SageMaker domain, you can use it in one of two main ways:

  • You can use the image as the base to run a JupyterLab notebook kernel to perform in-notebook interactive development and geospatial analytics.
  • You can use the image in a SageMaker processing job to run highly parallelized geospatial processing pipelines. Reusing the interactive kernel image for asynchronous batch processing can be advantageous because only a single image has to be maintained, and routines developed interactively in a notebook can be expected to work seamlessly in the processing job. If startup latency caused by longer image load times is a concern, you can choose to build a dedicated, more lightweight image just for processing (see Build Your Own Processing Container for details).

For hands-on examples of both approaches, refer to the accompanying GitHub repository.

In-notebook interactive development using a custom image

After you choose the custom geospatial image as the base image for your JupyterLab space, SageMaker provides you with access to many geospatial libraries that can now be imported without the need for additional installs. For example, you can run the following code to initialize a geometry object and plot it on a map within the familiar environment of a notebook:

import shapely
import leafmap
import geopandas

coords = [[-102.00723310488662,40.596123257503024],[-102.00723310488662,40.58168585757733],[-101.9882214495914,40.58168585757733],[-101.9882214495914,40.596123257503024],[-102.00723310488662,40.596123257503024]]
polygon = shapely.Polygon(coords)
gdf = geopandas.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[polygon])
Map = leafmap.Map(center=[40.596123257503024, -102.00723310488662], zoom=13)
Map.add_basemap("USGS NAIP Imagery")
Map.add_gdf(gdf, layer_name="test", style={"color": "yellow", "fillOpacity": 0.3, "clickable": True,})
Map

Geospatial notebook

Highly parallelized geospatial processing pipelines using a SageMaker processing job and a custom image

You can specify the custom image as the image to run a SageMaker processing job. This enables you to use specialist geospatial processing frameworks to run large-scale distributed data processing pipelines with just a few lines of code. The following code snippet initializes and then runs a SageMaker ScriptProcessor object that uses the custom geospatial image (specified using the geospatial_image_uri variable) to run a geospatial processing routine (specified in a processing script) on 20 ml.m5.2xlarge instances:

import sagemaker
from sagemaker import get_execution_role
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

region = sagemaker.Session().boto_region_name
role = get_execution_role()

geospatial_image_uri = "<GEOSPATIAL-IMAGE-URI>" #<-- set to uri of the custom geospatial image

processor_geospatial_data_cube = ScriptProcessor(
    command=['python3'],
    image_uri=geospatial_image_uri,
    role=role,
    instance_count=20,
    instance_type='ml.m5.2xlarge',
    base_job_name='aoi-data-cube'
)

processor_geospatial_data_cube.run(
    code='scripts/generate_aoi_data_cube.py', #<-- processing script
    inputs=[
        ProcessingInput(
            source=f"s3://{bucket_name}/{bucket_prefix_aoi_meta}/",
            destination='/opt/ml/processing/input/aoi_meta/', #<-- meta data (incl. geography) of the area of observation
            s3_data_distribution_type="FullyReplicated" #<-- sharding strategy for distribution across nodes
        ),        
        ProcessingInput(
            source=f"s3://{bucket_name}/{bucket_prefix_sentinel2_meta}/",
            destination='/opt/ml/processing/input/sentinel2_meta/', #<-- Sentinel-2 scene metadata (1 file per scene)
            s3_data_distribution_type="ShardedByS3Key" #<-- sharding strategy for distribution across nodes
        ),
    ],
    outputs=[
        ProcessingOutput(
            source='/opt/ml/processing/output/',
            destination=f"s3://{bucket_name}/processing/geospatial-data-cube/{execution_id}/output/" #<-- output S3 path
        )
    ]
)

A typical processing routine involving raster file loading, clipping to an area of observation, resampling specific bands, and masking clouds, among other steps, across 134 Sentinel-2 scenes (each 110x110 km) completes in under 15 minutes, as shown in the following Amazon CloudWatch dashboard.

CloudWatch Metrics

Clean up

After you’re done running the notebook, don’t forget to stop the SageMaker Studio JupyterLab application to avoid incurring unnecessary costs. If you deployed the additional infrastructure using the AWS CDK, you can delete the deployed stack by running the following command in your local code checkout:

cd <path to repository>
cd deployment && cdk destroy

Conclusion

This post has equipped you with the knowledge and tools to build and use custom container images tailored for geospatial analysis in SageMaker Studio. By extending SageMaker Distribution with specialized geospatial libraries, you can tailor your environment to these use cases. This empowers you to unlock the vast potential of geospatial data for applications such as environmental monitoring, urban planning, and precision agriculture—all within the familiar and user-friendly environment of SageMaker Studio.

Although this post focused on geospatial workflows, the methodology presented is broadly applicable. You can utilize the same principles to tailor container images for any domain requiring specific libraries or tools beyond the scope of SageMaker Distribution. This empowers you to create a truly customized development experience within SageMaker Studio, catering to your unique project needs.

The provided resources, including sample code and IaC templates, offer a solid foundation for building your own custom images. Experiment and explore how this approach can streamline your ML workflows involving geospatial data or any other specialized domain. To get started, visit the accompanying GitHub repository.


About the Authors

Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions and building ML platforms on AWS. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in domains such as autonomous driving.

Dr. Karsten Schroer is a Senior Machine Learning (ML) Prototyping Architect at AWS, focused on helping customers leverage artificial intelligence (AI), ML, and generative AI technologies. With deep ML expertise, he collaborates with companies across industries to design and implement data- and AI-driven solutions that generate business value. Karsten holds a PhD in applied ML.

Anirudh Viswanathan is a Senior Product Manager, Technical, at AWS with the SageMaker team, where he focuses on Machine Learning. He holds a Master’s in Robotics from Carnegie Mellon University and an MBA from the Wharton School of Business. Anirudh is a named inventor on more than 50 AI/ML patents. He enjoys long-distance running, exploring art galleries, and attending Broadway shows.

Read More

Automating model customization in Amazon Bedrock with AWS Step Functions workflow

Large language models have become indispensable in generating intelligent and nuanced responses across a wide variety of business use cases. However, enterprises often have unique data and use cases that require customizing large language models beyond their out-of-the-box capabilities. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. To enable secure and scalable model customization, Amazon Web Services (AWS) announced support for customizing models in Amazon Bedrock at AWS re:Invent 2023. This allows customers to further pre-train selected models using their own proprietary data to tailor model responses to their business context. The quality of the custom model depends on multiple factors including the training data quality and hyperparameters used to customize the model. This requires customers to perform multiple iterations to develop the best customized model for their requirement.

To address this challenge, AWS announced native integration between Amazon Bedrock and AWS Step Functions. This empowers customers to orchestrate repeatable and automated workflows for customizing Amazon Bedrock models.

In this post, we will demonstrate how Step Functions can help overcome key pain points in model customization. You will learn how to configure a sample workflow that orchestrates model training, evaluation, and monitoring. Automating these complex tasks through a repeatable framework reduces development timelines and unlocks the full value of Amazon Bedrock for your unique needs.

Architecture

Architecture Diagram

We will use a summarization use case with the Cohere Command Light model in Amazon Bedrock for this demonstration. However, this workflow can be used for the summarization use case with other models by passing the base model ID and the required hyperparameters and making minor model-specific changes in the workflow. See the Amazon Bedrock user guide for the full list of supported models for customization. All the required infrastructure will be deployed using the AWS Serverless Application Model (AWS SAM).

The following is a summary of the functionality of the architecture:

  • The user uploads the training data in JSON Lines format to an Amazon Simple Storage Service (Amazon S3) training data bucket, and the validation and reference inference data to the validation data bucket.
  • The Step Functions CustomizeBedrockModel state machine is started with input parameters such as the model to customize, hyperparameters, training data locations, and other parameters discussed later in this post.
    • The workflow invokes the Amazon Bedrock CreateModelCustomizationJob API synchronously to fine-tune the base model with the training data from the S3 bucket and the passed-in hyperparameters (see the example call after this list).
    • After the custom model is created, the workflow invokes the Amazon Bedrock CreateProvisionedModelThroughput API to create a provisioned throughput with no commitment.
    • The parent state machine calls the child state machine to evaluate the performance of the custom model with respect to the base model.
    • The child state machine invokes the base model and the customized model provisioned throughput with the same validation data from the S3 validation bucket and stores the inference results into the inference bucket.
    • An AWS Lambda function is called to evaluate the quality of the summarization done by the custom model and the base model using the BERTScore metric. If the custom model performs worse than the base model, the provisioned throughput is deleted.
    • A notification email is sent with the outcome.
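
For reference, the customization step performed by the state machine corresponds to the Amazon Bedrock CreateModelCustomizationJob API. The following AWS CLI call is an illustrative, standalone sketch of that request; the job name, model name, and role ARN are placeholders, and the hyperparameters mirror the ones passed to the workflow later in this post:

aws bedrock create-model-customization-job \
    --job-name my-customization-job \
    --custom-model-name my-custom-model \
    --role-arn arn:aws:iam::{your-account-id}:role/{your-bedrock-customization-role} \
    --base-model-identifier cohere.command-light-text-v14:7:4k \
    --training-data-config s3Uri=s3://{TrainingDataBucket}/training-data.jsonl \
    --output-data-config s3Uri=s3://{CustomizationOutputBucket}/output/ \
    --hyper-parameters epochCount=1,batchSize=8,learningRate=0.00001 \
    --region {your-region}

In the solution, the state machine issues this call for you, so you don't need to run it manually; it's shown here only to illustrate the request shape.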

Prerequisites

  • Create an AWS account if you do not already have one.
  • Access to the AWS account through the AWS Management Console and the AWS Command Line Interface (AWS CLI). The AWS Identity and Access Management (IAM) user that you use must have permissions to make the necessary AWS service calls and manage AWS resources mentioned in this post. While providing permissions to the IAM user, follow the principle of least-privilege.
  • Git Installed.
  • AWS Serverless Application Model (AWS SAM) installed.
  • Docker must be installed and running.
  • You must enable the Cohere Command Light Model access in the Amazon Bedrock console in the AWS Region where you’re going to run the AWS SAM template. We will customize the model in this demonstration. However, the workflow can be extended with minor model-specific changes to support customization of other supported models. See the Amazon Bedrock user guide for the full list of supported models for customization. You must have no commitment model units reserved for the base model to run this demo.

Demo preparation

The resources in this demonstration will be provisioned in the US East (N. Virginia) AWS Region (us-east-1). We will walk through the following phases to implement our model customization workflow:

  1. Deploy the solution using the AWS SAM template
  2. Upload proprietary training data to the S3 bucket
  3. Run the Step Functions workflow and monitor
  4. View the outcome of training the base foundation model
  5. Clean up

Step 1: Deploy the solution using the AWS SAM template

Refer to the GitHub repository for the latest instructions. Run the following steps to deploy the Step Functions workflow using the AWS SAM template.

  1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:
git clone https://github.com/aws-samples/amazon-bedrock-model-customization.git
  2. Change directory to the solution directory:
cd amazon-bedrock-model-customization
  3. Run build.sh to create the container image:
bash build.sh
  4. When prompted, enter the following parameter values:
image_name=model-evaluation
repo_name=bedrock-model-customization
aws_account={your-AWS-account-id}
aws_region={your-region}
  5. From the command line, use AWS SAM to deploy the AWS resources for the pattern as specified in the template.yml file:
sam deploy --guided
  6. Provide the following inputs when prompted:
Enter a stack name.
Enter us-east-1 or your AWS Region where you enabled Amazon Bedrock Cohere Command Light Model.
Enter SenderEmailId - After the model customization is complete, the notification email will be sent from this email ID. You need to have access to this email ID to verify ownership.
Enter RecipientEmailId - The user will be notified at this email ID.
Enter ContainerImageURI - ContainerImageURI is available from the output of the `bash build.sh` step.
Keep default values for the remaining fields.
  7. Note the outputs from the SAM deployment process. These contain the resource names and/or ARNs which are used in the subsequent steps.

Step 2: Upload proprietary training data to the S3 bucket

Our proprietary training data will be uploaded to the dedicated S3 bucket created in the previous step and used to fine-tune the Amazon Bedrock Cohere Command Light model. The training data needs to be in JSON Lines format, with every line containing a valid JSON object with two attributes: prompt and completion.
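
For illustration, each line in the training data file is a standalone JSON object of the following shape (the text here is placeholder content, not taken from the actual dataset):

{"prompt": "Summarize the following text: <source text>", "completion": "<reference summary>"}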

I used this public dataset from HuggingFace and converted it to JSON Line format.

  1. Upload the provided training data files to the S3 bucket using the command that follows. Replace TrainingDataBucket with the value from the sam deploy --guided output. Update your-region with the Region that you provided while running the SAM template.
aws s3 cp training-data.jsonl s3://{TrainingDataBucket}/training-data.jsonl --region {your-region}
  2. Upload the validation-data.json file to the S3 bucket using the command that follows. Replace ValidationDataBucket with the value from the sam deploy --guided output. Update your-region with the Region that you provided while running the SAM template:
aws s3 cp validation-data.json s3://{ValidationDataBucket}/validation-data.json --region {your-region}
  3. Upload the reference-inference.json file to the S3 bucket using the command that follows. Replace ValidationDataBucket with the value from the sam deploy --guided output. Update your-region with the Region that you provided while running the SAM template.
aws s3 cp reference-inference.json s3://{ValidationDataBucket}/reference-inference.json --region {your-region}
  4. You should have also received an email for verification of the sender email ID. Verify the email ID by following the instructions given in the email.

Email Address Verification Request

Step 3: Run the Step Functions workflow and monitor

We will now start the Step Functions state machine to fine tune the Cohere Command Light model in Amazon Bedrock based on the training data uploaded into the S3 bucket in the previous step. We will also pass the hyperparameters. Feel free to change them.

  1. Run the following AWS CLI command to start the Step Functions workflow. Replace StateMachineCustomizeBedrockModelArn and TrainingDataBucket with the values from the  sam deploy --guided output. Replace UniqueModelName and UniqueJobName with unique values. Change the values of the hyperparameters based on the selected model. Update your-region with the region that you provided while running the SAM template.
aws stepfunctions start-execution --state-machine-arn "{StateMachineCustomizeBedrockModelArn}" --input '{"BaseModelIdentifier": "cohere.command-light-text-v14:7:4k", "CustomModelName": "{UniqueModelName}", "JobName": "{UniqueJobName}", "HyperParameters": {"evalPercentage": "20.0", "epochCount": "1", "batchSize": "8", "earlyStoppingPatience": "6", "earlyStoppingThreshold": "0.01", "learningRate": "0.00001"}, "TrainingDataFileName": "training-data.jsonl"}' --region {your-region}

Example output:

{
"executionArn": "arn:aws:states:{your-region}:123456789012:execution:{stack-name}-wcq9oavUCuDH:2827xxxx-xxxx-xxxx-xxxx-xxxx6e369948",
"startDate": "2024-01-28T08:00:26.030000+05:30"
}

The foundation model customization and evaluation might take 1 hour to 1.5 hours to complete! You will get a notification email after the customization is done.

  2. Run the following AWS CLI command or sign in to the AWS Step Functions console to check the Step Functions workflow status. Wait until the workflow completes successfully. Replace the executionArn from the previous step output and update your-region.
aws stepfunctions describe-execution --execution-arn {executionArn} --query status --region {your-region}

Step 4: View the outcome of training the base foundation model

After the Step Functions workflow completes successfully, you will receive an email with the outcome of the quality of the customized model. If the customized model isn’t performing better than the base model, the provisioned throughput will be deleted. The following is a sample email:

Model Customization Complete

If the quality of the inference response is not satisfactory, you will need to retrain the base model based on the updated training data or hyperparameters.

See the ModelInferenceBucket for the inferences generated from both the base foundation model and custom model.

Step 5: Clean up

Properly decommissioning provisioned AWS resources is an important best practice to optimize costs and enhance security posture after concluding proofs of concept and demonstrations. The following steps will remove the infrastructure components deployed earlier in this post:

  1. Delete the Amazon Bedrock provisioned throughput of the custom model. Ensure that the correct ProvisionedModelArn is provided to avoid an accidental unwanted deletion. Also update your-region.
aws bedrock delete-provisioned-model-throughput --provisioned-model-id {ProvisionedModelArn} --region {your-region}
  2. Delete the Amazon Bedrock custom model. Ensure that the correct CustomModelName is provided to avoid an accidental unwanted deletion. Also update your-region.
aws bedrock delete-custom-model --model-identifier {CustomModelName} --region {your-region}
  3. Delete the content in the S3 buckets using the following commands. Ensure that the correct bucket names are provided to avoid accidental data loss:
aws s3 rm s3://{TrainingDataBucket} --recursive --region {your-region}
aws s3 rm s3://{CustomizationOutputBucket} --recursive --region {your-region}
aws s3 rm s3://{ValidationDataBucket} --recursive --region {your-region}
aws s3 rm s3://{ModelInferenceBucket} --recursive --region {your-region}
  4. To delete the resources deployed to your AWS account through AWS SAM, run the following command:
sam delete

Conclusion

This post outlined an end-to-end workflow for customizing an Amazon Bedrock model using AWS Step Functions as the orchestration engine. The automated workflow trains the foundation model on customized data and tunes hyperparameters. It then evaluates the performance of the customized model against the base foundation model to determine the efficacy of the training. Upon completion, the user is notified through email of the training results.

Customizing large language models requires specialized machine learning expertise and infrastructure. AWS services like Amazon Bedrock and Step Functions abstract these complexities so enterprises can focus on their unique data and use cases. By having an automated workflow for customization and evaluation, customers can customize models for their needs more quickly and with fewer operational challenges.

Further study


About the Author

Biswanath Mukherjee is a Senior Solutions Architect at Amazon Web Services. He works with large strategic customers of AWS by providing them technical guidance to migrate and modernize their applications on AWS Cloud. With his extensive experience in cloud architecture and migration, he partners with customers to develop innovative solutions that leverage the scalability, reliability, and agility of AWS to meet their business needs. His expertise spans diverse industries and use cases, enabling customers to unlock the full potential of the AWS cloud.

Read More

‘Once Human,’ Twice the Thrills on GeForce NOW

Unlock new experiences every GFN Thursday. Whether post-apocalyptic survival adventures, narrative-driven games or vast, open worlds, GeForce NOW always has something fresh for members to explore.

This week, GeForce NOW brings the survival game Once Human from Starry Studio to the cloud, part of three new titles.

Survive the Stardust

Once Human on GeForce NOW
We’re all just made of stardust.

Step into a post-apocalyptic world where cosmic energy has transformed humanity in Once Human. As a Meta-Human, survive the contamination and use the powers of Stardust to navigate a new and bizarre open-world universe.

The game blends survival, crafting, and combat, challenging players to gather resources, build shelters, and fend off human and monstrous threats. Uncover the rich lore through interactions with various characters and artifacts scattered throughout the world.

Delve into the truth of Stardust — discover where it came from and what it wants. Play alone or grab a squad to fight, build and explore together. Level up with an Ultimate or Priority membership to stream across devices at higher resolutions and frame rates than free members. Gaming sessions are up to six hours for Priority members and eight hours for Ultimate members, plenty of time to unravel the cosmic mysteries of Once Human.

Happy New Games

Anger Foot on GeForce NOW
Taking names and kicking butt.

Unleash the world’s deadliest feet on a colorful cast of anthropomorphic enemies in Anger Foot from Devolver Digital. Clear out slums, sewers and skyscrapers, grab new weapons, unlock new sneakers and upgrade powers in absurd and wonderful ways. Kick and shoot to get to the exit — and leave behind a smoldering trail of shattered doors, broken bones and crumpled energy drinks.

Check out the list of new games this week:

  • Cricket 24 (New release on Xbox and available on PC Game Pass, July 9)
  • Once Human (New release on Steam, July 9)
  • Anger Foot (New release on Steam, July 11)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

Collaborators: Sustainable electronics with Jake Smith and Aniruddh Vashisth

photos of Jake Smith and Aniruddh Vashisth for the Microsoft Research Collaborators podcast

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with.

Printed circuit boards (PCBs) are abundant—in the items we use daily and then in landfills when they’ve reached end of life. In this episode, Senior Researcher Jake Smith (opens in new tab) and Aniruddh Vashisth (opens in new tab), assistant professor of mechanical engineering at the University of Washington, join host Gretchen Huizinga to talk about the development of vitrimer-based PCBs, or vPCBs, that perform comparably to traditional circuit boards but have less environmental impact. Smith and Vashisth explore machine learning’s role in accelerating the discovery of more sustainable materials and what the more healable vitrimer polymer could mean not only for e-waste but more broadly for aerospace, the automotive industry, and beyond.

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

ANIRUDDH VASHISTH: From the computation point of view, we always thought that if somebody gave us, like, a hundred different chemistries, we can do a bunch of simulations; tell you, like, 10 of these actually work. What we’ve been able to do specifically for vitrimers is that we’re able to look at the problem from the other side, and we are able to say that if you tell me a particular application, this particular chemistry would work best for you. In essence, what we were thinking of is that if aliens abducted all the chemists from the world, can we actually come up with a framework? [LAUGHTER]

JAKE SMITH: If all of this work is successful, in 10 years, maybe our materials design process looks completely different, where we’ve gone from this kind of brute-force screening to an approach where you start with the properties that you care about—they’re defined by the application that you have in mind—and we use this, like, “need space” to define the material that we would like, and we can use machine learning, artificial intelligence, in order to get us to the structure that we need to make in order to actually achieve this design space.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research Podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.

[MUSIC FADES]


I’m thrilled to be in the booth today, IRL, with Dr. Jake Smith, a senior researcher at Microsoft Research and part of the Microsoft Climate Research Initiative, or MCRI. And with him is Dr. Aniruddh Vashisth. He’s an assistant professor of mechanical engineering at the University of Washington and director of the Vashisth Research Lab. Jake and Aniruddh are working on a project that uses machine learning to help scientists design sustainable polymers with a particularly exciting application in the field of the ubiquitous printed circuit board, or PCB. But before we get all sustainable, let’s meet our collaborators!

Jake, I’ll start with you. You’re a self-described “chemist with relatively broad interests across applications” and you’ve done some pretty cool things in your career. Tell us about those interests and where they’ve led you and how they’ve contributed to the work you’re doing now in MCRI, or the Microsoft Climate Research Initiative.

JAKE SMITH: Yes. Thank you very much for having me. So I started, like most chemists, poking things around in the lab and learning really fundamentally about how atoms interact with one another and how this affects what we do or what we see at our microscopic level. And so after I left grad school doing this super-basic research, I wanted to do something more applied, and so I did a couple of postdocs, first, looking at how we can more effectively modify proteins after we’ve synthesized them so they might have a property that we care about and then later doing similar work on small molecules in a more traditional drug-design sense. But after I finished that, I wound up here at Microsoft. We were very interested in one molecule in particular, one family of molecules, which is DNA, and we wanted to know, how do we make DNA at just gigantic scale so that we can take that DNA and we could store digital data in it? And because DNA has this nice property that it kind of lasts forever, …

HUIZINGA: Yeah.

SMITH: … at least on our, you know, human scale, it makes a very, you know, nice archival storage medium. So we worked on this project for a while, and at some point, we determined we can, kind of, watch it blossom and find the next challenge to go work on.

HUIZINGA: Interesting …

SMITH: And the challenge that we, you know, wound up at I’ll describe as the Microsoft Climate Research Initiative, the MCRI. We were a group of applied scientists from, like, natural scientist backgrounds within Microsoft, and we said, how can we make a difference for Microsoft? And the difference that we thought was Microsoft has climate goals.

HUIZINGA: Oh, yeah!

SMITH: Microsoft wants to be carbon negative, it wants to be water positive, and it wants to be zero waste. And in order to make this happen, we need novel materials, which really are a macroscopic view of, once again, atomic behavior. And we said, hey, we understand atomic behavior. We’re interested in this.

HUIZINGA: [LAUGHS] We can help! We’re from the government …

SMITH: Yeah, maybe this is something we could help on. Yeah. And so here we are. We wound up with Aniruddh, and we’ll go into that later, I’m sure.

HUIZINGA: Yeah, yeah. So just quickly back to the DNA thing. Was that another collaboration? I had Karin Strauss on the podcast a while ago, and she talked about that.

SMITH: Oh, absolutely. Yeah, this was with Karin, and we had great collaborators, also at the University of Washington in the Molecular Information Systems Lab, or MISL, who did a lot of work with us on the practicalities of working with DNA once it’s synthesized and how would you do things like retrieve information from a big pool of DNA.

HUIZINGA: Right. Right. They could … people could go back to that podcast because she does unpack that quite a bit. Well, Aniruddh, you describe yourself as a “trained mechanician who hangs out with chemists,” hence your friendship with Jake here, but for your day job, you’re a professor and you have your own lab that conducts interdisciplinary research at the intersection, as you say, of mechanics and material science. So what made you want to move to that neighborhood, and what goes on there?

ANIRUDDH VASHISTH: Yeah. Well, again, thank you so much for having me here. I’m super excited about this. Yeah, just a little bit of background about me. So I started off with my undergrad in civil and mechanics from IIT BHU, did a PhD in mechanics at Penn State, and moved to Texas …

HUIZINGA: Go back … go back to, what’s the first one?

VASHISTH: It’s Indian Institute of Technology, in India, so that’s …

HUIZINGA: IIT …

VASHISTH: … IIT. I did my undergrad there and then straight away came to the US to do my PhD in mechanics at Penn State and then ended up going to Texas, to Texas A&M University, and postdoc-ed in a chemical engineering lab, and that’s how I became, like, super familiar and fond of chemical engineers and chemists! [LAUGHTER] And we moved to Seattle, when I got the job at University of Washington in 2021, with my wife and my daughter. And what we do in our lab is we make and break things now! [LAUGHS] We try to see, like, you know, when we are making and breaking these things, we try to see them from an experimental and a simulation point of view and try to gain some understanding of the mechanics of these different types of materials. Especially, we are very interested in polymers. I always joke with my students and my class that go about one day without touching a polymer, and I’m always surprised by the smiles or the smirks that I get! But in general, like, we have been super, super excited and interested about sustainable polymers, making sustainable composites. Particularly, we are very excited and interested in vitrimer polymers. So let me just take, like, a step back. I’ll probably wear my professor hat straight away here.

HUIZINGA: Yeah. Let’s do! Let’s go. [LAUGHTER]

VASHISTH: And I’ll tell you, just, like, taking a step back, what are the different types of polymers. So in general, you can think of polymers as thermosets or thermoplastics. So to Jake’s point, let’s just go to the molecular scale there, and you can think of polymers as bunch of these pasta noodles which can slide over each other, right. Or these bunch of pasta noodles which are packed together. So thermoset, as the name suggests, it’s a set network. The pasta noodles are kind of, like, set in their place. Thermoplastics is when these pasta noodles can slide over each other. So you’ve probably put too much sauce in there! [LAUGHTER] Yeah, so a good analogy there would be a lot of the adhesives that we use are thermosets because they set after a while. Thermoplastic … we use plastics for 3D printing a lot, so those are thermoplastics. So they’re solid. You can heat them up, you can make them flow, print something, and they solidify. Vitrimers are very exciting because, just like thermoplastics, they have this flowability associated to them but more at a molecular scale. Like, if you think of a single pasta noodle, it can unclick and re-click back again. So it’s like, you know, it’s made up of these small LEGO blocks that can unclick and re-click back again …

HUIZINGA: LEGO pasta …

VASHISTH: LEGO pasta …

HUIZINGA: I like that! [LAUGHS]

VASHISTH: Exactly. So this unclicking and re-clicking can make them re-processable, reusable, recyclable. Gives them, like, much longer life because you can heal them. And then vitrimers basically become the vampires of the polymer universe!

HUIZINGA: Meaning they don’t die?

VASHISTH: Well …

HUIZINGA: Or …

VASHISTH: They have like much longer life! [LAUGHTER]

SMITH: They sleep every now and then to regenerate! Yes … [LAUGHS]

HUIZINGA: Aniruddh, sticking with you for a minute, before we get into the collaboration, let’s do a quick level set on what we might call “The Secret Life of Circuit Boards.” For this, I’d like you to channel David Attenborough and narrate this PCB documentary. Where do we find printed circuit boards in their natural habitat? How many species are there? What do they do during the day? How long do they live? And what happens when they die?

VASHISTH: OK, so do I have to speak like David … ?

HUIZINGA: Yes, I’d appreciate it if you’d try. [LAUGHTER] … No. Just be your voice.

VASHISTH: Yeah. Yeah. So PCBs are, if you think about it, they are everywhere. PCBs are in these laptops that we have in front of us. Probably there are PCBs in these mics. Automobiles. Medical devices. So PCBs are, they’re just, like, everywhere. And depending upon, like, what is their end applications, they have a composite part of it, where you have, like, some sort of a stiff inclusion in a polymeric matrix, which is holding this part together and has bunch of electronics on top of it. And depending on the end application, it might come in different flavors: something that can sustain much higher temperatures; something which is flexible. Things of that sort. And they live as long as we use the material for, like, you know, as long as we are using these laptops or as long as we end up using our cars. And unfortunately, there is a lot of e-waste which is created at the end.

HUIZINGA: Right …

VASHISTH: There’s been a lot of effort in recycling and reusing these materials, but I’m confident we can do more.

HUIZINGA: Right.

VASHISTH: I think there’s like close to 50 million metric tons of …

HUIZINGA: Wow!

VASHISTH: … of e-waste which is generated—more than that actually—every year, so …

HUIZINGA: OK.

VASHISTH: … a lot of scope for us to work there.

HUIZINGA: Um, so right now, are they sort of uniform? The printed circuit board? I know we’re going to talk about vitrimer-based ones, but I mean, other than that, are there already multiple materials used for these PCBs? Jake, you can even address that.

SMITH: Yeah. Of course. So there are, like, kind of, graded ranks of circuit board materials …

HUIZINGA: OK.

SMITH: … that as Aniruddh said, you know, might be for specialty applications where you need higher-temperature tolerance than normal or you need lower noise out of your circuit board.

HUIZINGA: Gotcha.

SMITH: But, kind of, the bog-standard circuit board, the green one that you think about if you’ve ever seen a circuit board, this is like anti-flammability coating on a material called FR-4. So FR-4—which is an industrial name for a class of polymers that are flame-retardant, thus FR, and 4 gives you the general class—this is the circuit board material …

HUIZINGA: OK …

SMITH: … that, you know, we really targeted with this effort.

HUIZINGA: Interesting. So, Jake, let’s zoom out for a minute and talk about the big picture and why this is interesting to Microsoft Research. I keep hearing two phrases: sustainable electronics and a circular economy. So talk about how the one feeds into the other and what an ultimate success story would look like here.

SMITH: Yeah, absolutely. So I’ll start with the latter. When we set out to start the Microsoft Climate Research Initiative, we started with this vision of a circular economy that would do things that avoid what we, you know, can avoid using. But there are many cases where you can’t avoid using something that is nonrenewable. And there, what we really want to do is we want to recapture what we can’t avoid. And this project, you know, falls in the latter. There’s a lot of things that fall in the latter case. So, you know, we were looking at this at a very carbon dioxide-centric viewpoint where CO2 is ultimately the thing that we’re thinking about in the circle, although you can draw a circular economy diagram with a lot of things in the circle. But from the CO2 viewpoint, you know, what led us to this project with Aniruddh is we thought, we need to capture CO2, but once you capture CO2, you know, what do you do with it? [LAUGHTER] You can pump some of it back into the ground, but this is, you know, an economically non-productive activity. And so it’s something we have to do. It’s not something we want to do.

HUIZINGA: Right.

SMITH: And so what could we want to do with the CO2 that we’ve captured? And the thought was we do something economically viable with it. We, you know, upcycle the CO2 into something interesting, and what we really want, and what we still really want, is to be able to take that CO2, convert it down into a useful chemical feedstock—and there are great laboratories …

HUIZINGA: Oh, interesting …

SMITH: … doing work on this—and then we could, you know, look at our plastic design problem and say, hey, we have all this FR-4 in the world. How could we replace the FR-4—the, you know, explicit atoms that are in the FR-4—with atoms that have come from CO2 that we pulled out of the air? And so this is, you know, the circular economy portion. We come down to, you know, the specific problem here. Aniruddh talked a lot about e-waste.

HUIZINGA: Yeah.

SMITH: And I have great colleagues who also collaborated with us on this project—Bichlien Nguyen, Kali Frost—who have been doing work with our product teams here at Microsoft on, you know, what can we do to reduce the amount of e-waste that they put out towards Microsoft’s climate goals?

HUIZINGA: Right.

SMITH: And Microsoft, as a producer of consumer electronics and a consumer of, you know, industrial electronics, has a big e-waste problem itself that we need to, you know, actually take research steps in order to ultimately address, and so what we thought was, you know, we have this end-of-life electronic. We can do things like desolder the components. We can recapture those ICs, which have a lot of embedded carbon in them in the silicon that’s actually there. We can take and we can etch out the copper that has been put over this to form the traces, and we can precipitate out that electrochemically to recapture the copper, but at the end of the day, we’re left with this big chunk of plastic, and it’s got some glass inside of it, too, for completeness sake, and the thought was, you know, how do we do this? You can’t recapture this with FR-4. FR-4, to go back to the spaghetti thing, …

HUIZINGA: Right … [LAUGHS]

SMITH: … spaghetti is glued to itself. It doesn’t come apart. It rips apart if you try and take it apart. And so we wanted to say, you know, what could we do and, you know, what could we do with Aniruddh and his lab in order to get at this problem and to get us at a FR-4 replacement that we could actually reach this complete circularity with.

HUIZINGA: Interesting! Well, Jake, that is an absolutely perfect segue into “how I met your mother,” which is, you know, how you all started working together. Who thought of who first, and so on. I’m always interested to hear both sides of the meet-up. So, Aniruddh, why don’t you take the baton from Jake right there and talk about, from your perspective, how you saw this coming together, who approached who, what happened—and then Jake can confirm or deny the story! [LAUGHTER]

VASHISTH: Yeah, yeah. So it actually started off, I have a fantastic colleague and a very good friend in CS department, Professor Vikram Iyer, and he actually introduced me to Bichlien Nguyen from Microsoft, and we got a coffee together and we were talking about vitrimers, like the work that we do in our lab, and I had this one schematic—I forget if it was on my phone or I was carrying around one paper in my pocket—and I showed them. I was like, you know, if we can actually do a bunch of simulations, guide an ML model, we can create, for lack of a better word, like a ChatGPT-type of model where instead of telling like, “This is the chemistry; tell me what the properties are,” we can go from the other side. You can ask the model, “Hey, I want a vitrimer chemistry which is recyclable, re-processable, that I can make airplanes out of or I can make glasses out of. Tell me what that chemistry would look like.” And I think, you know, Bichlien was excited about this idea, and she connected me with Jake, and I think I’ve been enjoying this collaboration for the last couple of years, …

HUIZINGA: Right …

VASHISTH: … working on that.

HUIZINGA: Was there a paper that started the talk, or was it just this napkin drawing? [LAUGHS]

VASHISTH: I think, to give myself a little bit of credit there, I think there was a paper with a nice drawing on it.

HUIZINGA: Right?

VASHISTH: Yeah. There was a white paper. Yeah.

HUIZINGA: That’s good. Well, Jake, what’s your side of this story?

SMITH: Ah, this is awesome! We got the first half that I didn’t know, so …

HUIZINGA: Oh—filling in gaps!

SMITH: This was the Bichlien-mediated half! [LAUGHTER] I was sharing an office with Bichlien, who apparently came up from this meeting, and, you know, I saw the mythical paper! She put this on my desk. And I’ll plug another MCRI project that we were working on there where—or at the time—where we were attempting to do reverse design, or inverse design, of metal organic frameworks, which are these really interesting molecules that have the possibility to actually serve as carbon capture absorbents, …

HUIZINGA: Oh, wow.

SMITH: … but the approach there was to use machine learning to help us, you know, sample this giant space of metal organic frameworks and find ones that had the property that we cared about. I mean, you draw this diagram that’s much like Aniruddh just described, where you’ve got this model that you train and out the other side comes what you want, and so this paper came down on my desk, and I looked at it and I said, “Hey, that’s what we’re doing!” [LAUGHTER] And it, kind of, you know, went from there. We had a chat. We determined, hey, we’re both interested in, you know, this general approach to getting to novel materials.

HUIZINGA: Right.

SMITH: And then, you know, we’ve already talked about the synergy between our interests and Microsoft’s interests and the, you know, great work or the great particular applications that are possible with the type of polymer work that Aniruddh does.

HUIZINGA: Yeah. So the University of Washington and Microsoft meet again. [LAUGHTER] Well, Jake, let’s do another zoom out question because I know there’s more than just the Microsoft Climate Research Initiative. This project is a perfect example of another broader initiative within Microsoft which has the potential to quote “accelerate and enhance current research,” and that’s AI for Science. So talk about the vision behind AI for Science, and then if you have any success stories—maybe including this one—tell us how it’s working out.

SMITH: Yeah, absolutely. We are—and by we, I mean myself and my immediate colleagues—are certainly not the only ones interested in applying AI to scientific discovery at Microsoft. And it turned out, a year or two after we started this collaboration, a bigger organization named AI for Science arose, and we became part of it. And it’s, you know, generally a group of people who—along with our kind of sister organization in research called Health Futures, who work more on the biology side—are interested in how AI can help us do science in (a) a faster way, but (b) maybe a smarter, better-use-of-resources way, or the ultimate goal, or the ultimate dream, is (c) a way that we just can’t think of doing right now. A way that, you know, it just is fundamentally incompatible with the way that research has historically been done in, you know, small groups of grad students directed by a professor who are themselves, you know, the actual engine behind the work that happens. And so the AI for Science vision, you know, it’s got a couple of parts that really map very well onto this project. The first part is we want to be able to simulate bigger systems. We want to be able to run simulations for longer, and we want to be able to do simulations at higher accuracy. When we get into the details of, you know, the particulars of the vitrimer project, you’ll see that one of the fundamental blocks here is the ability to run simulations, and Aniruddh’s excellent grad student Yiwen, you know, spent a ton of time trying to identify the appropriate simulation parameters in order to capture the behavior that we care about here. And so, the first AI for Science vision says we don’t need Yiwen to do that, you know, we’re going to have a drop-in solution or we’re going to have, you know, a set of drop-in solutions that can, you know, take this work away from you and make it much easier for you to go straight to running the simulations that you care about.

HUIZINGA: Yeah. A couple questions. Not on the list here, but you prompted them. No pun intended. Are these specialized models with the kinds of information … I mean, if I go to ChatGPT and ask it to do what you guys are doing, I’m not going to get the same return am I?

SMITH: Absolutely.

HUIZINGA: Am I?

SMITH: Oh, no, no, no, no! [LAUGHTER] I was saying you were absolutely correct. [LAUGHS] You can ask ChatGPT, and it will tell you all sorts of things that are very interesting. It can tell you, probably, a vitrimer. It could give you Aniruddh’s spiel about the spaghetti, I’m sure, if you prompted it in the correct way. But what it can’t tell you is, you know, “Hey, I have this particular vitrimer composition, and I would like to know at what temperature it’s going to melt when I heat it up.”

HUIZINGA: Right. OK, so I have one more question. You talk about the simulations. Those take a lot of compute. Am I right? Am I right?

SMITH: You’re absolutely right.

VASHISTH: Yeah.

HUIZINGA: So is that something that Microsoft brings to the party in terms of … I mean, does the University of Washington have the same access to that compute, or what’s the deal?

VASHISTH: I think especially on the scale, we were super happy and excited that we were collaborating with Microsoft. I think one of these simulations took, like, close to a couple of weeks, and we ended up doing, I would say, like, close to more than 30,000 simulations. So that’s a lot of compute time if you think about it.

HUIZINGA: To put that in perspective, how long would it take a human to do those simulations? [LAUGHS]

SMITH: [LAUGHS] Oh, man, to try and actually, like, go do all this in the lab …

HUIZINGA: Right!

SMITH: First, you got to make these 30,000, like, starting materials. This in itself … let’s say you could buy those. Then to actually run the experiments, how long does it take to do one …

HUIZINGA: And how much money?

VASHISTH: That’s … that’s like you’re talking about like one PhD student there.

HUIZINGA: Right?

VASHISTH: That’s like, you know, it takes like a couple of years just to synthesize something properly and then characterize it, and it’s …

HUIZINGA: Yeah …

VASHISTH: Yeah, no, I think the virtual world does have some pluses to it.

HUIZINGA: So this is a really good argument for AI for Science, meaning the things that it can do, artificial intelligence can do, at a scale that’s much smaller than what it would take a human to do.

SMITH: Yeah, absolutely. And I’ll plug the other big benefit now, which is, hey, we can run simulations. This is fantastic. But the other thing that I think all of us really hope AI can do is it can help us determine which simulations to run …

HUIZINGA: Ooh …

SMITH: … so we need less compute overall, we need less experiments if we have to go do the experiments, and this is …

HUIZINGA: So it’s the winnowing process.

SMITH: Exactly.

HUIZINGA: OK. That’s actually really interesting.

SMITH: And this is, like, the second, or maybe even the largest, vector for acceleration that we could see.

HUIZINGA: Cool. Well, every show I ask, what could possibly go wrong if you got everything right? And, Aniruddh, I want to call this the “Defense Against the Dark Arts” question for you. You’re using generative AI to propose what you call novel chemistries, which can sound really cool or really scary, depending on how you look at it. But you can’t just take advice from a chatbot and apply it directly to aerospace. You have to kind of go through some processes before. So what role do people, particularly experts in other disciplines, play here, and what other things do you need to be mindful of to ensure the outputs you get from this research are valid?

VASHISTH: Yeah, yeah. That’s a fantastic question. And I’ll actually piggyback on what Jake just said here, about Yiwen Zheng, who’s like a fantastic graduate student that we have in our lab. He figured out how to run these simulations at the first point. It was like six months of … like, really long ordeal. How to make sure that in the virtual world, we are synthesizing these polymers correctly and we are testing them correctly. So that human touch is essential, I feel like, at every step of this research, not just like doing virtual characterization or virtual synthesis of these materials, training the models, but eventually, when you train the models also and the model tells you that, well, these are, like, the 10 best polymers that would work out, there you need people like Jake who are like chemists, you know. They come in [LAUGHTER] and they’re like, hey, you know what? Like, out of these 10 chemistries, this one you can actually synthesize. It’s a one-step reaction or things of that sort. So we have a chemist in our lab also, Dr. Agni Biswal, who’s a postdoc. So we actually show him all these chemistries, apart from Jake and Bichlien. We show the chemistries to all the chemists and say, like, OK, what do you think about this? How do these look like? Are they totally insane, or can we actually make them? [LAUGHTER]

SMITH: Yeah, we still need that, like, human evaluation step at the end, at this point.

HUIZINGA: Yeah …

VASHISTH: Exactly.

HUIZINGA: Ask a chemist! Well, and I would imagine it would be further than just, “This would be the best one,” or something like, “You better not do that one.” Are there ever like crazy responses or replies from the model?

SMITH: [LAUGHS] It’s fascinating. Models are very good—and particularly we’ll talk about models that generate small organic structures—at generating things that look reasonable. They follow all the rules. But there’s this next step beyond that. And you see this when you talk to people who’ve worked in med chem for, you know, 30 years of their life. Well, they’ll look at a structure and they’ll, like, get this gut feeling like, you know, a storm is coming in and their knee hurts, and they really don’t like that molecule. [LAUGHTER] And if you push them a little bit, you know, sometimes they can figure out why. They’ll be like, oh, I worked on, you know, a molecule that looked like that 20 years ago, and it, you know, turned out to have this toxicity, and so I don’t want to touch that again. But oftentimes, people can’t even tell you. They’ve just got this instinct …

HUIZINGA: Really?

SMITH: … that they’ve built up, and trying to, you know, capture that intuition is a really interesting next frontier for this sort of research.

HUIZINGA: Wow. You know, you guys are just making my brain fry because it’s like so many other questions I want to ask, but we’re actually getting there to some of them, and I’m hoping we’ll address those questions with the other things I have. So, Jake, I want to come … Well, first of all, Aniruddh, have you finished your defense against the dark arts? [LAUGHS]

VASHISTH: I think I can point out one more thing very quickly there, and as Jake said, like, we are learning a lot, particularly about these materials, like, the vitrimer materials. These are new chemistries, and we are still learning about, like, the mechanical, thermorheological properties; how to handle these materials. So I think there’s a lot that we don’t know right now. So it’s like a bunch of, like, unknown unknowns that are there. So …

HUIZINGA: Well, and that’s research, right? The unknown unknowns. Jake, I want to come back to the vision of the climate research initiative for a minute. One goal is to develop technologies that reduce the raw tonnage of e-waste, obviously. But if we’re honest, advances in technology have almost encouraged us to throw stuff away. It’s like before it even wears out. And I think we talked earlier about, you know, this will last as long as my car lasts or whatever, but I don’t like my car in five years. I want a different one, right? So I wonder if you’ve given any thought to what things, in addition to the work on reusable and recyclable components, we might do to reverse engineer the larger throwaway culture?

SMITH: This was interesting. I feel like this gets into real questions about social psychology and our own behaviors …

HUIZINGA: Yeah …

SMITH: … with individual things. Why do I have this can of carbonated water here when I could have a glass of carbonated water? But I want to, kind of, completely sidestep that because …

HUIZINGA: Yeah … Well, we know why! Because it’s convenient, and you can take it in your car and not spill.

SMITH: Agreed. Yes. All right. [LAUGHTER] I also have this cup, and it could not spill, as well.

HUIZINGA: True! Recyclable—reusable.

SMITH: Ahhh … no, no … this is like a—it’s an ingrained consumer behavior that I’ve developed that might … I’ll slip into “Jake’s Personal Perspectives” here, which is that it should not be on the individual consumer behavior changes to ultimately drive a shift towards reusable and recyclable things. And so one of the fundamental, like, hypotheses that we had with the, you know, design of the projects we put together with the MCRI was that if we put appropriate economic incentives in place, then we can naturally guide behavior at a much bigger scale than the individual consumer. And maybe we’ll see that trickle down to the consumer. Or maybe this means that the actual actors, the large-scale actors, then have the economic incentive to follow it themselves.

HUIZINGA: Right.

SMITH: And so with the e-waste question in particular, we talked a lot about FR-4 and, you know, it’s the part of the circuit board that you’re left over with at the end that there’s just nothing to do with …

HUIZINGA: Right.

SMITH: … and so you toss into landfill, you burn it, you do something like this. But, you know, with a project like this, where our goal was to take that material and now make it reusable, we can add this actual economic value to the waste there.

HUIZINGA: Yeah. I realized even as I asked that question, that I had the answer embedded in the question because, in part, how we design technologies drives how people use things.

SMITH: Yeah, absolutely.

VASHISTH: Yeah.

HUIZINGA: And usually, the drivers are convenience and economics. So if upstream of consumer … consumption? [LAUGHTER] Upstream of that, the design drives environmental health and so on, that’s actually … that’s up to you guys! So let’s get out of this booth and get back to work! [LAUGHTER] Well, Jake, to that point, talk about the economics. We talk about a circular economy. And I know that recycling is expensive. Can you talk a little bit about how that could be impacted by work that you guys do?

SMITH: Recycling absolutely is expensive relative to landfilling or a similar alternative.

HUIZINGA: Right …

SMITH: One of the things that makes us target e-waste is that there are things of value in e-waste that are, like, innately valuable. When you go recollect that copper or the gold that you’ve put into this, when you recollect the integrated circuits, you know, they had value, and so a lot of the economic drive is already there to get you to the point where you have these circuit boards. And then, you know, the question was, how do we get that next bit of economic value so that you’ve taken steps this far, you have this pile of circuit boards, so you’ve already been incentivized to get to here and it will be easy to make this—even if it’s not a completely economically productive material—versus synthesizing a circuit board from virgin plastic, but it’s offset enough. We’ve taken enough of that penalty for reuse out that it can be justifiable to go do.

HUIZINGA: Right. OK. So talk—again, off script a little bit—but talk a little bit about how vitrimers help take it to the last mile.

VASHISTH: Yeah, I think the inherent property of the polymer to kind of unclick and re-click back again, the heal-ability of the polymer, that’s something that, kind of, drives this reusability and re-processability of the material. I’ll just, like, point out, like, you know, particularly to the PCB case, where we recently published a collaborative paper where we showed that we can actually make PCB boards using vitrimers. We can unassemble everything. We can take out the electronics, and even the composite, the glass fiber and the polymer composite, we can actually separate that, as well, which is, in my mind, like, a pretty big success.

HUIZINGA: Yeah.

VASHISTH: And then we can actually put everything back together and remake a PCB board, and, you know, keep on doing that. So …

HUIZINGA: OK, so you had talked to me before about “Ring Around the Rosie” and the hands and the feet. Can you … ?

SMITH: [LAUGHS] His favorite analogy!

HUIZINGA: Do that one just for our audience because it’s good.

VASHISTH: OK. So I’ll talk a little bit about thermoset/thermoplastic again, and then I’ll just give you a much broader perspective there.

HUIZINGA: Yeah.

VASHISTH: So the FR-4 PCBs that are made, they are usually made with thermosetting polymers. So if you think about thermosetting polymers, just think of kids playing “Ring of Roses,” right? Like their hands are fixed and their feet are fixed. Once the network is formed, there’s no way you can actually destroy that network. The nice thing about vitrimers is that when you provide an external stimulus, like, just think about these kids playing “Ring of Roses” again. Their feet can move and their handshakes can change, but the number of handshakes remain the same. So the polymer is kind of, like, unclicking and re-clicking back again.

HUIZINGA: OK.

VASHISTH: And if you can cleverly use this mechanism, you can actually recycle, reprocess the polymer itself. But what we showed, particularly for the PCB paper, was that you can actually separate all the other constituents that are associated with this composite, yeah.

HUIZINGA: OK. That’s … I love that. Well, sticking with you for a second, Aniruddh, talking about mechanical reality—not just chemical reality, but mechanical reality—even the best composites wear out, from wear and tear. Talk about the goal of this work on novel polymers from an engineering perspective. How do you think about designing for reality in this way?

VASHISTH: Yeah, yeah. That’s a fantastic question. So we were really motivated by what type of mechanical or thermal loadings materials see in day-to-day life. You know, I sit in my car, I drive it, it drives over the road, there is some fatigue loadings, there’s dynamic loading, and that dynamic loading actually leads to some mechanical flaws in the material, which damages it. And the thought was always that, can we restrict that flaw, or can we go a step further? Can we actually reverse that damage in these composites? And that’s where, you know, that unclicking/re-clicking behavior of vitrimer becomes, like, really powerful. So actually, the first work that we did on these type of materials was that we took a vitrimer composite and we applied fatigue loading on it, cyclic loading on it, mechanical loading. And then we saw that when there was enough damage accumulated in the system, we healed the system. And then we did this again. And we were able to do it again and again until I was like, I’ve spent too much money on this test frame! [LAUGHS] But it was really exciting because for a particular loading case that we were looking at, traditional composites were able to sustain that for 10,000 cycles, but for vitrimers, if we did periodic healing in the material, we were able to go up to a million cycles. So I think that’s really powerful.

HUIZINGA: Orders of magnitude.

VASHISTH: Yeah, exactly.

HUIZINGA: Wow. Jake, I want to broaden the conversation right now, beyond just you and Aniruddh, and talk about the larger teams you need to assemble to ensure success of projects like this. Do you have any stories you could share about how you go about building a team? You kind of alluded to it at the beginning. There’s sort of a pickup basketball metaphor there. Hey, he’s doing that. We’re doing this. But you have some intentionality about people you bring in. So what strengths do each institution bring, and how do you build a team?

SMITH: Yeah, absolutely. We’ve tried a bunch of these collaborations, and we’ve definitely got some learnings about which ones work better than others. This has been a super productive one. I think it’s because it has that right mix of skills and the right mix of things that each side are bringing. So what we want from a Microsoft side for a successful collaboration is we want a collaborator who is really a domain expert in, you know, something that we don’t necessarily understand but who can tell us, in great detail, these are the actual design criteria; these are, you know, where I run into trouble with my traditional research; this is the area that, you know, I’d like to do faster, but I don’t necessarily know how. And this was the critical part, I think, you know, from the get-go. They need to, themselves, be an extremely, you know, capable subject matter expert. Otherwise, we’re just kind of chatting. We don’t have anyone that really knows what the problem truly is and you make no progress or you … worse, you spend a whole lot of resources to make “progress”—I’m doing air quotes …

HUIZINGA: Yeah. I love air quotes on a podcast!

SMITH: [LAUGHS]—that is actually just completely tangential to what the field needs or what the actual device needs. So this was, you know, the fundamental ingredient. And then on top of that, we need to find a problem that’s of joint interest where, in particular, …

HUIZINGA: Right …

SMITH: … computation can help. You talked about the amount of computation that we have at our disposal as researchers at Microsoft, which is a tremendous strength. And so we want to be able to leverage that. And so for a collaboration like this, where running a large number of simulations was a fundamental ingredient to doing it, this was, you know, a really good fit, that we could come in and we could enable them to have more data to train the models that we build together.

HUIZINGA: Mm-hm. Well, as researchers, are you each kind of always scanning the horizon for who else is doing things in your field that—or tangential to your field but necessary? How does that work for recruiting, I would say?

VASHISTH: Yeah, that’s a good question. I think … I mean, that’s kind of like the job, right. For the machine learning work we did, we saw a lot of inspiration from biology, where people have been designing biomolecules. The challenges are different for us. Like, we are designing much larger chains. But we saw some inspiration from there. So always, like, looking out for, like, who is doing what is super helpful, and it leads to, like, really nice collaborations, as well. We’ve had, like, really fruitful collaborations with the professor Sid Kumar at TU Delft, and we always get his wisdom on some of these things, as well. But yeah, recruiting students also becomes, like, very interesting and how, like, people who can help us achieve our idea …

HUIZINGA: Yeah. Jake, what’s your take on it from the other seat? I mean, do you look actively at universities around the world—and even in your backyard—to … like U Dub … ? [LAUGHTER]

SMITH: My perspective on, like, how collaborations come in to be is they’re really serendipitous. You know, we talked about how this one came in to be, and it was because we all happen to know Vikram, and Vikram happened to connect Bichlien with Aniruddh, and it kind of rolled from there. But you can have serendipitous, you know, meetings at a conference, where you happen to, you know, sit next to someone at a talk and you both share the same perspective on, you know, how a research problem should be tackled, and something could come out of that. Or in some cases, you go actually shopping for a collaborator.

HUIZINGA: Right. [LAUGHTER]

SMITH: You know, you need to talk to 10 people to find the one that has that same research perspective as you. I’ll second Aniruddh’s, you know, observation that you get a very different perspective if you go find someone who, they may have the same, like, perspective on how research should be tackled, but they have a different perspective on what the ultimate output of that research would be. But, you know, they can often point you in areas where your research could be helpful that you can’t necessarily see because you lack the domain knowledge or you lack that particular angle on it.

HUIZINGA: Which is another interesting thing in my mind is, you know, the role that papers, published papers, play—that’s a lot of p’s in a sentence [LAUGHTER] … alliteration—that you would be reading or hearing about either in a lightning talk or a presentation at a conference. Does that broaden your perspective, as well? And how do you … like, do you call people up? “I read your paper … ”?

SMITH: [LAUGHS] I have cold-emailed people. You know, this works sometimes! Sometimes this is just the introduction that you need. But the interesting thing in my mind is how much the computer science conferences and things like ChemRxiv and arXiv have really replaced, for me, the traditional chemistry literature or the traditional publishing literature where you can have a conversation with this person while they’re still actively doing the work because they put their initial draft up there and it still needs revision, and there’s opportunities even earlier on in the research process than we’ve had in the past.

HUIZINGA: Huh. And to your earlier point, I’m envisioning an Amazon shopping cart for research collaborators. [LAUGHTER] “Oh, he looks good. Into my cart.” Aniruddh, I always like to know where a project is on the spectrum from what I call lab to life, and I know there are different development stages when it comes to technology finding its way into production and then into broader use. So to use another analogy I like, pretend this is a relay race and research is the first leg. Who else has to run, and who brings it across the line?

VASHISTH: Yeah, yeah. So I think the initial work that we have done, I think it’s been super fruitful, and to Jake’s point, like, converging to, like, a nice output. It took a bunch of chemists, mechanical engineers, simulation folks, machine learning scientists to get where we are. And, as Jake mentioned, we’ve actually put some of our publications on arXiv, and it’s getting traction now. So we’ve had some excitement from startups and companies which make polymers asking us, “Oh, can you actually … can we get a slice of this framework that you’re developing for designing vitrimers?” Which is very promising. So we have done very fundamental work, but now, like, what’s called “the valley of death” in research, [LAUGHTER] like taking it from lab to like production scale, …

HUIZINGA: Yeah.

VASHISTH: … it’s usually a very tightly knit collaboration between industry, labs, and sometimes national labs, too. So we’re excited that, actually, a couple of national labs have been interested in the work that we have been doing, so super optimistic about it.

HUIZINGA: So would you say that the vitrimer-based printed circuit board is a proof of concept right now? Or have you made prototypes? Where is that now?

SMITH: Yeah, absolutely. We’ve mentioned our other collaborator, Vikram Iyer, a couple of times. And in collaboration with his lab, we did actually make a prototype circuit board. We showed that it works as you expect. We showed that it can be disassembled. It can be put back together, and it still works as expected …

HUIZINGA: The “break stuff/make stuff back” thing …

VASHISTH: Yeah, exactly.

SMITH: But, you know, I think to the spirit of the question, it’s still individual kind of one-off experiments being run in a lab, and Aniruddh is right. There’s a long way to go from, like, Technology Readiness Level 3, where we’re doing it ourselves on bench scale, up to, you know, the 7, 8, 9, where it’s actually commercially viable and someone has been able to reproduce this at scale.

HUIZINGA: Right. … So that’s when you bring investors in or labs that can make stuff in and scale.

VASHISTH: Yeah. Yeah, I think once you’re, like, close to 7, I think that’s where you’re pretty much ready for the big show.

HUIZINGA: So where are you now? 2? 3?

VASHISTH: I would say, like, 2 or 3 …

SMITH: 2, 3, somewhere in that range.

VASHISTH: Yeah.

HUIZINGA: OK.

SMITH: The scales, kind of, differ depending on which agencies you see put it out.

HUIZINGA: So, Jake, before we close, I want to talk briefly about other applications of recyclable vitrimer-based polymers, in light of their importance to the climate research initiative and AI for Science. So what other industries have polymer components that have nowhere to go after they die but the landfill, and will this research transfer across to those industries?

SMITH: An excellent question. So my personal view on this is that there’s a couple of classes of polymers. There’s these very high-value application uses of polymers where we’re talking about the printed circuit boards; we’re talking about aerospace composite; we’re talking about the panels on your car; we’re talking about things like wind turbines …

HUIZINGA: Oh, yeah.

SMITH: … where there’s a long life cycle. You have this device that’s going to be in use for five years, 50 years, and at the end of that, the polymer itself is still probably pretty good. You could still use it and regenerate it. And so Aniruddh’s lab has done great work showing that you can take things like the side panel of a plane and actually disassemble this thing, heal it, keep it in use longer, and use it at the end of its lifetime. There’s this other class of polymers, which I think are the ones that most people think about—your Coke bottle—and vitrimers seem like a much harder sell there. I think this is more the domain of, you know, biodegradable polymers in the long run to really tackle the issues there. But I’m very excited in this, you know, high-value polymer, this long-lifetime polymer, this, like, permanent install polymer, however you want to think about it, for work like this to have an impact.

HUIZINGA: Yeah. From your lab’s perspective, Aniruddh, where do you see other applications with great promise?

VASHISTH: Yeah. So as Jake said, places where we need high-performance polymers is where we can go. So PCBs is one, aerospace and automotive industry is one, and maybe medical industry is, …

HUIZINGA: Oh, interesting…

VASHISTH: … yeah, is another one where we can actually … if you can make prosthetics out of vitrimers … prosthetics actually lose a little bit of their stiffness, you know, as you use them, and that’s because of localized damage. It’s the fatigue cycle, right. So what if you can actually heal your prosthetics and reuse them? So, yeah, I feel like, you know, there’s so many different applications, so many different routes that we can go down.

HUIZINGA: Yeah. Well, I like to end our Collaborators shows with a little vision casting, and I feel like this whole podcast is that. I should also say, you know, back in the ’50s, there was the big push to make plastics! Your word is vitrimers! So let’s do a little vision casting for vitrimer-based polymers. Assuming your research is wildly successful and becomes a truly game-changing technology, what does the future look like—I mean, specified future, not general future—and how has your work disrupted this field and made the world a better place? I’ll let you each have the last word. Who’d like to go first?

VASHISTH: Sure, I can go first. I’ll try to make sure that I break it up into computation and experiments …

HUIZINGA: Good.

VASHISTH: … so that once I go back, like, my lab does not, like, pounce on me. [LAUGHS] Yeah, so I think from the computation point of view, we always thought that if somebody gave us, like, a hundred different chemistries, we can actually bottle it down to, like, we can do a bunch of simulations; tell you, like, 10 of these actually work. What we’ve been able to do specifically for vitrimers is that we’re able to look at the problem from the other side, and we are able to say that if you tell me a particular application, this particular chemistry would work best for you. In essence, what we were thinking of is that if aliens abducted all the chemists from the world, can we actually come up with a framework? [LAUGHS] So I think it’ll be difficult to get there because as I said earlier that, you know, you need that human touch. But I think we are happy that that we are getting there. And I think what remains to be seen now is, like, you know, now that we have this type of a framework, like what are the next challenges? Like, we are going from the lab to the large scale; like, what challenges are associated there? And I think similarly for the experimental side of things also, we know a lot—we have developed frameworks—but there’s a lot of work that still needs to be done in understanding and translating these technologies to real-life applications.

HUIZINGA: I like that you’re kind of hedging your bets there, saying, I’m not going to paint a picture of the perfect world because my lab is going to be responsible for delivering it. [LAUGHTER] Jake, assuming you haven’t been abducted by aliens, what’s your take on this?

SMITH: I view, kind of, the goal of this work and the ideal impact of this work as an acceleration of getting us to these polymers being deployed in all these other applications that we’ve talked about, and we can go broader than this.

HUIZINGA: Yeah …

SMITH: I think that there’s a lot of work, both within the MCRI, within Microsoft, and outside of Microsoft in the bigger field, focused on acceleration towards a specific goal. And if all of this work is successful, in 10 years, maybe our materials design process looks completely different, where we’ve gone from this kind of brute-force screening that Aniruddh has talked about to an approach where you start with the properties that you care about; they’re defined by the application that you have in mind. You want to make your vitrimer PCB, it needs to have, you know, a specific temperature where it becomes gummy; it needs to have a specific resistance to burning; it needs to be able to effectively serve as the dielectric for your bigger circuits. And we use this, like, “need space” to define the material that we would like, and we can use machine learning, artificial intelligence, in order to get us to the structure that we need to make in order to actually achieve this design space. And so, this was, you know, our big bet within AI for Science. This is the big bet of this project. And with this project, you know, we take one step towards showing that you can do this in one case. And the future casting would be we can do this in every materials design case that you can think about.

HUIZINGA: Hmmm. You know, I’m thinking of lanes—track analogy again—but, you know, you’ve got mechanical engineering, you’ve got chemistry, and you’ve got artificial intelligence, and each of those sciences is advancing, and they’re using each other to, sort of, help advance in various ways, so this is an exciting, exciting project and collaboration.

[MUSIC]

Jake, Aniruddh, thanks for joining us today on Collaborators. This has been really fun for me. [LAUGHTER] So thanks for coming in and sharing your stories today.

VASHISTH: Thank you so much.

SMITH: Yeah. Of course. Thank you.

[MUSIC FADES]


Read More

Japan Enhances AI Sovereignty With Advanced ABCI 3.0 Supercomputer

Japan Enhances AI Sovereignty With Advanced ABCI 3.0 Supercomputer

Enhancing Japan’s AI sovereignty and strengthening its research and development capabilities, Japan’s National Institute of Advanced Industrial Science and Technology (AIST) will integrate thousands of NVIDIA H200 Tensor Core GPUs into its AI Bridging Cloud Infrastructure 3.0 supercomputer (ABCI 3.0). The HPE Cray XD system will feature NVIDIA Quantum-2 InfiniBand networking for superior performance and scalability.

ABCI 3.0 is the latest iteration of Japan’s large-scale Open AI Computing Infrastructure designed to advance AI R&D. This collaboration underlines Japan’s commitment to advancing its AI capabilities and fortifying its technological independence.

“In August 2018, we launched ABCI, the world’s first large-scale open AI computing infrastructure,” said AIST Executive Officer Yoshio Tanaka. “Building on our experience over the past several years managing ABCI, we’re now upgrading to ABCI 3.0. In collaboration with NVIDIA we aim to develop ABCI 3.0 into a computing infrastructure that will advance further research and development capabilities for generative AI in Japan.”

“As generative AI prepares to catalyze global change, it’s crucial to rapidly cultivate research and development capabilities within Japan,” said AIST Solutions Co. Producer and Head of ABCI Operations Hirotaka Ogawa. “I’m confident that this major upgrade of ABCI in our collaboration with NVIDIA and HPE will enhance ABCI’s leadership in domestic industry and academia, propelling Japan towards global competitiveness in AI development and serving as the bedrock for future innovation.”

The ABCI 3.0 supercomputer will be housed in Kashiwa at a facility run by Japan’s National Institute of Advanced Industrial Science and Technology. Credit: Courtesy of National Institute of Advanced Industrial Science and Technology.

ABCI 3.0: A New Era for Japanese AI Research and Development

ABCI 3.0 is constructed and operated by AIST, its business subsidiary, AIST Solutions, and its system integrator, Hewlett Packard Enterprise (HPE).

The ABCI 3.0 project follows support from Japan’s Ministry of Economy, Trade and Industry, known as METI, for strengthening its computing resources through the Economic Security Fund and is part of a broader $1 billion initiative by METI that includes both ABCI efforts and investments in cloud AI computing.

NVIDIA is closely collaborating with METI on research and education following a visit last year by company founder and CEO, Jensen Huang, who met with political and business leaders, including Japanese Prime Minister Fumio Kishida, to discuss the future of AI.

NVIDIA’s Commitment to Japan’s Future

Huang pledged to collaborate on research, particularly in generative AI, robotics and quantum computing, to invest in AI startups and provide product support, training and education on AI.

During his visit, Huang emphasized that “AI factories” — next-generation data centers designed to handle the most computationally intensive AI tasks — are crucial for turning vast amounts of data into intelligence.

“The AI factory will become the bedrock of modern economies across the world,” Huang said during a meeting with the Japanese press in December.

With its ultra-high-density data center and energy-efficient design, ABCI provides a robust infrastructure for developing AI and big data applications.

The system is expected to come online by the end of this year and offer state-of-the-art AI research and development resources. It will be housed in Kashiwa, near Tokyo.

Unmatched Computing Performance and Efficiency

The facility will offer:

  • 6 AI exaflops of computing capacity, a measure of AI-specific performance without sparsity
  • 410 double-precision petaflops, a measure of general computing capacity
  • 200 GB/s of bisectional bandwidth per node via the Quantum-2 InfiniBand platform

NVIDIA technology forms the backbone of this initiative, with hundreds of nodes each equipped with 8 NVLink-connected H200 GPUs providing unprecedented computational performance and efficiency.

NVIDIA H200 is the first GPU to offer over 140 gigabytes (GB) of HBM3e memory at 4.8 terabytes per second (TB/s). The H200’s larger and faster memory accelerates generative AI and LLMs, while advancing scientific computing for HPC workloads with better energy efficiency and lower total cost of ownership.

NVIDIA H200 GPUs are 15X more energy-efficient than ABCI’s previous-generation architecture for AI workloads such as LLM token generation.

The integration of advanced NVIDIA Quantum-2 InfiniBand with In-Network computing — where networking devices perform computations on data, offloading the work from the CPU — ensures efficient, high-speed, low-latency communication, crucial for handling intensive AI workloads and vast datasets.

ABCI boasts world-class computing and data processing power, serving as a platform to accelerate joint AI R&D with industries, academia and governments.

METI’s substantial investment is a testament to Japan’s strategic vision to enhance AI development capabilities and accelerate the use of generative AI.

By subsidizing AI supercomputer development, Japan aims to reduce the time and costs of developing next-generation AI technologies, positioning itself as a leader in the global AI landscape.

Read More

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Attention, as a core layer of the ubiquitous Transformer architecture, is a bottleneck for large language models and long-context applications. FlashAttention (and FlashAttention-2) pioneered an approach to speed up attention on GPUs by minimizing memory reads/writes, and is now used by most libraries to accelerate Transformer training and inference. This has contributed to a massive increase in LLM context length in the last two years, from 2-4K (GPT-3, OPT) to 128K (GPT-4), or even 1M (Llama 3). However, despite its success, FlashAttention has yet to take advantage of new capabilities in modern hardware, with FlashAttention-2 achieving only 35% utilization of theoretical max FLOPs on the H100 GPU. In this blogpost, we describe three main techniques to speed up attention on Hopper GPUs: exploiting asynchrony of the Tensor Cores and TMA to (1) overlap overall computation and data movement via warp-specialization and (2) interleave block-wise matmul and softmax operations, and (3) incoherent processing that leverages hardware support for FP8 low-precision.

We’re excited to release FlashAttention-3 that incorporates these techniques. It’s 1.5-2.0x faster than FlashAttention-2 with FP16, up to 740 TFLOPS, i.e., 75% utilization of H100 theoretical max FLOPS. With FP8, FlashAttention-3 reaches close to 1.2 PFLOPS, with 2.6x smaller error than baseline FP8 attention.

FlashAttention-3 is available at: https://github.com/Dao-AILab/flash-attention
Paper

FlashAttention Recap

FlashAttention is an algorithm that reorders the attention computation and leverages tiling and recomputation to significantly speed it up and reduce memory usage from quadratic to linear in sequence length. We use tiling to load blocks of inputs from HBM (GPU memory) to SRAM (fast cache), perform attention with respect to that block, and update the output in HBM. By not writing the large intermediate attention matrices to HBM, we reduce the amount of memory reads/writes, which brings 2-4x wallclock time speedup.

Here we show a diagram of FlashAttention forward pass: with tiling and softmax rescaling, we operate by blocks and avoid having to read/write from HBM, while obtaining the correct output with no approximation.

[Figure: FlashAttention forward pass with tiling and softmax rescaling]
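
To make the rescaling concrete, here is a minimal NumPy sketch of the tiled forward pass with an online softmax. This is a sketch of the algorithm only: the block sizes, the single-head layout, and running on the CPU are simplifications for illustration, not the actual CUDA kernel.

```python
# Minimal NumPy sketch of tiled attention with online softmax rescaling
# (the algorithmic core of FlashAttention). Illustrative only.
import numpy as np

def tiled_attention(Q, K, V, block_q=128, block_k=128):
    seqlen, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)
    for i in range(0, seqlen, block_q):
        q = Q[i:i + block_q] * scale
        m = np.full(q.shape[0], -np.inf)    # running row-wise max
        l = np.zeros(q.shape[0])            # running softmax denominator
        acc = np.zeros((q.shape[0], d))     # unnormalized output accumulator
        for j in range(0, seqlen, block_k):
            s = q @ K[j:j + block_k].T      # scores for this K block
            m_new = np.maximum(m, s.max(axis=1))
            p = np.exp(s - m_new[:, None])  # block-local softmax numerator
            correction = np.exp(m - m_new)  # rescale earlier partial sums
            l = l * correction + p.sum(axis=1)
            acc = acc * correction[:, None] + p @ V[j:j + block_k]
            m = m_new
        O[i:i + block_q] = acc / l[:, None]
    return O

# Matches the naive reference up to floating-point error:
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
S = (Q / np.sqrt(64)) @ K.T
ref = np.exp(S - S.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), ref)
```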

New hardware features on Hopper GPUs – WGMMA, TMA, FP8

While FlashAttention-2 can achieve up to 70% theoretical max FLOPS on Ampere (A100) GPUs, it does not yet take advantage of new features on Hopper GPUs to maximize performance. We describe some of the new Hopper-specific features here, and why they are important.

1. WGMMA (Warpgroup Matrix Multiply-Accumulate). This new feature makes use of the new Tensor Cores on Hopper, with much higher throughput [1] than the older mma.sync instruction in Ampere (image from the H100 white paper).

[Figure: WGMMA throughput comparison, from the H100 white paper]

2. TMA (Tensor Memory Accelerator). This is a special hardware unit that accelerates the transfer of data between global memory and shared memory, taking care of all index calculation and out-of-bound predication. This frees up registers, which is a valuable resource to increase tile size and efficiency.

[Figure: TMA block diagram showing asynchronous data transfer between global and shared memory]

3. Low-precision with FP8. This doubles the Tensor Core throughput (e.g. 989 TFLOPS with FP16 and 1978 TFLOPS with FP8), but trades off accuracy by using fewer bits to represent floating point numbers.

[Figure: Tensor Core throughput comparison (6x throughput)]

FlashAttention-3 makes use of all of these new features of Hopper, using powerful abstractions from NVIDIA’s CUTLASS library.

By rewriting FlashAttention to use these new features, we can already significantly speed it up (e.g., from 350 TFLOPS in FlashAttention-2 FP16 forward pass to around 540-570 TFLOPS). However, the asynchronous nature of the new instructions on Hopper (WGMMA and TMA) opens up additional algorithmic opportunities to overlap operations and thereby extract even greater performance. For this blogpost, we’ll explain two such techniques specific to attention. The generic technique of warp specialization, with separate producer and consumer warps doing TMA and WGMMA, is well-covered elsewhere in the context of GEMM and works the same here.

Asynchrony: Overlapping GEMM and Softmax

Why overlap?

Attention has GEMMs (those matmuls between Q and K and between attention probability P and V) and softmax as its two main operations. Why do we need to overlap them? Isn’t most of the FLOPS in the GEMMs anyway? As long as the GEMMs are fast (e.g., computed using WGMMA instructions), shouldn’t the GPU be going brrrr?

The problem is that non-matmul operations are much slower than matmul operations on modern accelerators. Special functions such as exponential (for the softmax) have even lower throughput than floating point multiply-add; they are evaluated by the multi-function unit, a unit separate from floating point multiply-add or matrix multiply-add. As an example, the H100 GPU SXM5 has 989 TFLOPS of FP16 matrix multiply, but only 3.9 TFLOPS (256x less throughput) for special functions [2]! For head dimension 128, there are 512x more matmul FLOPS than exponentials, which means that the exponentials can take 50% of the time compared to matmul. The situation is even worse for FP8, where the matmul throughput doubles yet the exponential throughput stays the same. Ideally we want matmul and softmax to operate in parallel. While the Tensor Cores are busy with matmul, the multi-function units should be calculating exponential!
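
A rough back-of-the-envelope version of that arithmetic, as a sketch using only the throughput figures quoted above (everything except FLOP counts is ignored):

```python
# Rough sanity check of the matmul vs. exponential imbalance on H100 SXM5,
# using only the throughput numbers quoted above. Order-of-magnitude estimate.
matmul_tflops = 989.0     # FP16 Tensor Core matmul
exp_tflops = 3.9          # special function unit (exponential)
head_dim = 128

# Per (query, key) pair: ~4*d matmul FLOPs (QK^T and PV) vs. 1 exponential.
matmul_flops_per_score = 4 * head_dim           # = 512
throughput_gap = matmul_tflops / exp_tflops     # ~256x

# time(exp) / time(matmul) = throughput_gap / matmul_flops_per_score
time_ratio = throughput_gap / matmul_flops_per_score
print(f"exp time is ~{time_ratio:.0%} of matmul time")   # ~50%
```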

Inter-warpgroup overlapping with pingpong scheduling

The first and easiest way to overlap GEMM and softmax is to do nothing at all! The warp schedulers already try to schedule warps so that if some warps are blocked (e.g., waiting for GEMM results), other warps can run. That is, the warp schedulers do some of this overlapping for us, for free.

However, we can improve on this by doing some of the scheduling manually. As an example, if we have 2 warpgroups (labeled 1 and 2 – each warpgroup is a group of 4 warps), we can use synchronization barriers (bar.sync) so that warpgroup 1 first does its GEMMs (e.g., GEMM1 of one iteration and GEMM0 of the next iteration), and then warpgroup 2 does its GEMMs while warpgroup 1 does its softmax, and so on. This “pingpong” schedule is illustrated in the figure below, where the same color denotes the same iteration.

[Figure: pingpong scheduling across two warpgroups; the same color denotes the same iteration]

This would allow us to perform the softmax in the shadow of the GEMMs of the other warpgroup. Of course, this figure is just a caricature; in practice the scheduling is not really this clean. Nevertheless, pingpong scheduling can improve FP16 attention forward pass from around 570 TFLOPS to 620 TFLOPS (head dim 128, seqlen 8K).

Intra-warpgroup overlapping of GEMM and Softmax

Even within one warpgroup, we can have some part of the softmax running while the GEMMs of that warpgroup are running. This is illustrated in this figure, where the same color denotes the same iteration.

[Figure: intra-warpgroup overlapping of GEMM and softmax; the same color denotes the same iteration]

This pipelining increases throughput from around 620 TFLOPS to around 640-660 TFLOPS for FP16 attention forward, at the cost of higher register pressure. We need more registers to hold both accumulators of the GEMMs, and the input/output of softmax. Overall, we find this technique to offer a favorable tradeoff.

Low-precision: reduce quantization error with incoherent processing

LLM activations can have outliers with much larger magnitude than the rest of the features. These outliers make quantization difficult, producing much larger quantization errors. We leverage incoherent processing, a technique used in the quantization literature (e.g. from QuIP) that multiplies the query and key with a random orthogonal matrix to “spread out” the outliers and reduce quantization error. In particular, we use the Hadamard transform (with random signs), which can be done per attention head in O(d log d) instead of O(d^2) time, where d is the head dimension. Since the Hadamard transform is memory-bandwidth bound, it can be fused with previous operations such as rotary embedding (also memory-bandwidth bound) “for free”.
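
A minimal NumPy/SciPy sketch of the idea, using an explicit d x d matrix for clarity; the fused kernel applies the fast O(d log d) transform instead, and the function and variable names here are illustrative:

```python
# Sketch of incoherent processing: multiply Q and K by the same random sign
# flip followed by a normalized Hadamard matrix. Head dimension must be a
# power of two for scipy.linalg.hadamard.
import numpy as np
from scipy.linalg import hadamard

def incoherent_qk(Q, K, seed=0):
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=d)
    H = hadamard(d) / np.sqrt(d)   # orthogonal, so attention scores are preserved
    R = np.diag(signs) @ H         # random orthogonal "spreading" matrix
    return Q @ R, K @ R

Q = np.random.randn(1024, 128)
K = np.random.randn(1024, 128)
Qr, Kr = incoherent_qk(Q, K)
# Scores are unchanged (up to floating point), but per-coordinate outliers in
# Qr/Kr are spread out, which reduces FP8 quantization error.
assert np.allclose(Q @ K.T, Qr @ Kr.T)
```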

In our experiment where Q, K, V are generated from a standard normal distribution but 0.1% of the entries have large magnitudes (to simulate outliers), we found that incoherent processing can reduce the quantization error by 2.6x. We show numerical error comparison in the table below. Please see the paper for details.

[Table: numerical error comparison of FP8 attention with and without incoherent processing]

Attention benchmark

We show some results with FlashAttention-3, and compare it to FlashAttention-2, as well as the implementation in Triton and cuDNN (both of which already use new hardware features of Hopper GPUs).

For FP16, we see about 1.6x-1.8x speedup over FlashAttention-2.

[Figures: FP16 attention speed of FlashAttention-3 vs. FlashAttention-2, Triton, and cuDNN]

For FP8, we can reach close to 1.2 PFLOPS!

[Figure: FP8 attention speed of FlashAttention-3, approaching 1.2 PFLOPS]

Discussion

This blogpost highlights some of the optimizations for FlashAttention available on Hopper GPUs. Other optimizations (e.g., variable length sequences, persistent kernel, and in-kernel transpose for FP8) are covered in the paper.

We have seen that designing algorithms that take advantage of the hardware they run on can bring significant efficiency gains and unlock new model capabilities such as long context. We look forward to future work on optimization for LLM inference, as well as generalizing our techniques to other hardware architectures.

We also look forward to FlashAttention-3 being integrated in a future release of PyTorch.

Notes

  1. Without the wgmma instruction, the older mma.sync instruction can only reach about ⅔ the peak throughput of Hopper Tensor Cores: https://arxiv.org/abs/2402.13499v1 

  2. The CUDA programming guide specifies that the throughput for special functions is 16 operations per streaming multiprocessor (SM) per clock cycle. We multiply 16 by 132 SMs and 1830 MHz (the clock speed used to calculate 989 TFLOPS of FP16 matmul) to get 3.9 TFLOPS.

Read More

Knowledge Bases for Amazon Bedrock now supports advanced parsing, chunking, and query reformulation giving greater control of accuracy in RAG based applications

Knowledge Bases for Amazon Bedrock now supports advanced parsing, chunking, and query reformulation giving greater control of accuracy in RAG based applications

Knowledge Bases for Amazon Bedrock is a fully managed service that helps you implement the entire Retrieval Augmented Generation (RAG) workflow from ingestion to retrieval and prompt augmentation without having to build custom integrations to data sources and manage data flows, pushing the boundaries for what you can do in your RAG workflows.

However, it’s important to note that in RAG-based applications, when dealing with large or complex input text documents such as PDFs or .txt files, querying the indexes might yield subpar results. For example, a document might have complex semantic relationships in its sections or tables that require more advanced chunking techniques to represent accurately; otherwise, the retrieved chunks might not address the user query. To address these performance issues, several factors can be controlled. In this blog post, we discuss new features in Knowledge Bases for Amazon Bedrock that can improve the accuracy of responses in applications that use RAG: advanced data chunking options, query decomposition, and CSV and PDF parsing improvements. These features give you greater control and precision over the accuracy of your RAG workflows. In the next section, we go over each feature and its benefits.

Features for improving accuracy of RAG based applications

In this section, we go through the new features provided by Knowledge Bases for Amazon Bedrock to improve the accuracy of generated responses to user queries.

Advanced parsing

Advanced parsing is the process of analyzing and extracting meaningful information from unstructured or semi-structured documents. It involves breaking down the document into its constituent parts, such as text, tables, images, and metadata, and identifying the relationships between these elements.

Parsing documents is important for RAG applications because it enables the system to understand the structure and context of the information contained within the documents.

There are several techniques for parsing or extracting data from different document formats; one of them is using foundation models (FMs) to parse the data within the documents. This is most helpful when documents contain complex data, such as nested tables, text within images, or graphical representations of text, that holds important information.

Using the advanced parsing option offers several benefits:

  • Improved accuracy: FMs can better understand the context and meaning of the text, leading to more accurate information extraction and generation.
  • Adaptability: Prompts for these parsers can be optimized on domain-specific data, enabling them to adapt to different industries or use cases.
  • Extracting entities: It can be customized to extract entities based on your domain and use case.
  • Complex document elements: It can understand and extract information represented in graphical or tabular format.

Parsing documents using FMs is particularly useful in scenarios where the documents are complex, unstructured, or contain domain-specific terminology. FMs can handle ambiguities, interpret implicit information, and extract relevant details thanks to their ability to understand semantic relationships, which is essential for generating accurate and relevant responses in RAG applications. These parsers might incur additional fees; see the pricing details before selecting this parser option.

In Knowledge Bases for Amazon Bedrock, we provide our customers the option to use FMs for parsing complex documents such as .pdf files with nested tables or text within images.

From the AWS Management Console for Amazon Bedrock, you can start creating a knowledge base by choosing Create knowledge base. In Step 2: Configure data source, select Advanced (customization) under Chunking & parsing configurations, as shown in the following image. You can select one of the two models (Anthropic Claude 3 Sonnet or Haiku) currently available for parsing the documents.

If you want to customize the way the FM will parse your documents, you can optionally provide instructions based on your document structure, domain, or use case.
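
If you prefer to configure FM-based parsing programmatically, the following minimal sketch uses the AWS SDK for Python (Boto3) to create a data source with advanced parsing enabled. The IDs, ARNs, and bucket name are placeholders, and the configuration field names reflect our reading of the CreateDataSource API shape, so verify them against the current Amazon Bedrock API reference before use.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Placeholders: replace with your knowledge base ID, bucket, and preferred parsing model.
response = bedrock_agent.create_data_source(
    knowledgeBaseId="YOUR_KB_ID",
    name="docs-with-fm-parsing",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::your-document-bucket"},
    },
    vectorIngestionConfiguration={
        # Advanced parsing: use a foundation model to parse complex documents.
        "parsingConfiguration": {
            "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
            "bedrockFoundationModelConfiguration": {
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
                # Optional custom instructions based on your document structure or domain.
                "parsingPrompt": {
                    "parsingPromptText": "Extract the text, tables, and figure captions from this document."
                },
            },
        },
    },
)
print(response["dataSource"]["dataSourceId"])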

Based on your configuration, the ingestion process will parse and chunk documents, enhancing the overall response accuracy. We will now explore the advanced data chunking options, namely semantic and hierarchical chunking, which split documents into smaller units and organize and store the chunks in a vector store, improving the quality of the chunks retrieved.

Advanced data chunking options

The objective shouldn’t be to chunk data merely for the sake of chunking, but rather to transform it into a format that facilitates anticipated tasks and enables efficient retrieval for future value extraction. Instead of inquiring, “How should I chunk my data?”, the more pertinent question should be, “What is the most optimal approach to use to transform the data into a form the FM can use to accomplish the designated task?”[1]

To achieve this goal, we introduced two new data chunking options within Knowledge Bases for Amazon Bedrock in addition to the fixed chunking, no chunking, and default chunking options:

  • Semantic chunking: Segments your data based on its semantic meaning, helping to ensure that the related information stays together in logical chunks. By preserving contextual relationships, your RAG model can retrieve more relevant and coherent results.
  • Hierarchical chunking: Organizes your data into a hierarchical structure, allowing for more granular and efficient retrieval based on the inherent relationships within your data.

Let’s do a deeper dive on each of these techniques.

Semantic chunking

Semantic chunking analyzes the relationships within a text and divides it into meaningful and complete chunks, which are derived based on the semantic similarity calculated by the embedding model. This approach preserves the information’s integrity during retrieval, helping to ensure accurate and contextually appropriate results.

By focusing on the text’s meaning and context, semantic chunking significantly improves the quality of retrieval. It should be used in scenarios where maintaining the semantic integrity of the text is crucial.

From the console, you can start creating a knowledge base by choosing Create knowledge base. In Step 2: Configure data source, select Advanced (customization) under the Chunking & parsing configurations and then select Semantic chunking from the Chunking strategy drop down list, as shown in the following image.

The following are the parameters that you need to configure:

  • Max buffer size for grouping surrounding sentences: The number of sentences to group together when evaluating semantic similarity. If you select a buffer size of 1, the previous sentence, the target sentence, and the next sentence are grouped together. The recommended value of this parameter is 1.
  • Max token size for a chunk: The maximum number of tokens that a chunk of text can contain. It can range from a minimum of 20 to a maximum of 8,192, based on the context length of the embeddings model. For example, if you’re using the Cohere Embeddings model, the maximum size of a chunk can be 512. The recommended value of this parameter is 300.
  • Breakpoint threshold for similarity between sentence groups: Specify (by a percentage threshold) how similar the groups of sentences should be when semantically compared to each other. It should be a value between 50 and 99. The recommended value of this parameter is 95.
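
Programmatically, these parameters map to the chunking configuration of the data source. The following Boto3 sketch shows how the recommended values could be passed when creating a data source; the field names are our assumptions about the CreateDataSource API shape, and the IDs and bucket name are placeholders.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

response = bedrock_agent.create_data_source(
    knowledgeBaseId="YOUR_KB_ID",
    name="semantic-chunking-source",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::your-document-bucket"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "SEMANTIC",
            "semanticChunkingConfiguration": {
                "maxTokens": 300,                    # max token size for a chunk
                "bufferSize": 1,                     # sentences grouped around the target sentence
                "breakpointPercentileThreshold": 95, # similarity threshold between sentence groups
            },
        },
    },
)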

Knowledge Bases for Amazon Bedrock first divides documents into chunks based on the specified token size. Embeddings are created for each chunk, and similar chunks in the embedding space are combined based on the similarity threshold and buffer size, forming new chunks. Consequently, the chunk size can vary across chunks.

Although this method is more computationally intensive than fixed-size chunking, it can be beneficial for chunking documents where contextual boundaries aren’t clear—for example, legal documents or technical manuals.[2]

Example:

Consider a legal document discussing various clauses and sub-clauses. The contextual boundaries between these sections might not be obvious, making it challenging to determine appropriate chunk sizes. In such cases, the semantic chunking approach can be advantageous, because it can automatically identify and group related content into coherent chunks based on the semantic similarity among neighboring sentences.

Now that you understand the concept of semantic chunking, including when to use it, let’s do a deeper dive into hierarchical chunking.

Hierarchical chunking

With hierarchical chunking, you can organize your data into a hierarchical structure, allowing for more granular and efficient retrieval based on the inherent relationships within your data. Organizing your data into a hierarchical structure enables your RAG workflow to efficiently navigate and retrieve information from complex, nested datasets.

From the console, start creating a knowledge base by choosing Create knowledge base. In Step 2: Configure data source, select Advanced (customization) under the Chunking & parsing configurations and then select Hierarchical chunking from the Chunking strategy drop-down list, as shown in the following image.

The following are some parameters that you need to configure.

  • Max parent token size: This is the maximum number of tokens that a parent chunk can contain. The value can range from 1 to 8,192 and is independent of the context length of the embeddings model because the parent chunk isn’t embedded. The recommended value of this parameter is 1,500.
  • Max child token size: This is the maximum number of tokens that a child chunk can contain. The value can range from 1 to 8,192 based on the context length of the embeddings model. The recommended value of this parameter is 300.
  • Overlap tokens between chunks: This is the percentage overlap between child chunks. Parent chunk overlap depends on the child token size and child percentage overlap that you specify. The recommended value for this parameter is 20 percent of the max child token size value.
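
For comparison with the semantic chunking sketch shown earlier, a hierarchical chunking configuration using the recommended values could look like the following Python fragment, which you would pass inside vectorIngestionConfiguration when creating the data source. The field names are assumptions to verify against the API reference.

# Hypothetical hierarchical chunking configuration with the recommended values.
hierarchical_chunking_configuration = {
    "chunkingConfiguration": {
        "chunkingStrategy": "HIERARCHICAL",
        "hierarchicalChunkingConfiguration": {
            "levelConfigurations": [
                {"maxTokens": 1500},  # max parent token size
                {"maxTokens": 300},   # max child token size
            ],
            "overlapTokens": 60,      # roughly 20 percent of the max child token size
        },
    }
}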

After the documents are parsed, the first step is to chunk the documents based on the parent and child chunking sizes. The chunks are then organized into a hierarchical structure, where parent chunks (higher level) represent larger chunks (for example, documents or sections), and child chunks (lower level) represent smaller chunks (for example, paragraphs or sentences). The relationship between the parent and child chunks is maintained. This hierarchical structure allows for efficient retrieval and navigation of the corpus.

Some of the benefits include:

  • Efficient retrieval: The hierarchical structure allows faster and more targeted retrieval of relevant information; semantic search is first performed on the child chunks, and the corresponding parent chunks are returned during retrieval. By replacing the child chunks with the parent chunk, we provide a larger and more comprehensive context to the FM.
  • Context preservation: Organizing the corpus in a hierarchical manner helps preserve the contextual relationships between chunks, which can be beneficial for generating coherent and contextually relevant text.

Note: In hierarchical chunking, parent chunks are returned and semantic search is performed on child chunks; therefore, you might see fewer search results returned, because one parent can have multiple children.

Hierarchical chunking is best suited for complex documents that have a nested or hierarchical structure, such as technical manuals, legal documents, or academic papers with complex formatting and nested tables. You can combine the FM parsing discussed previously to parse the documents and select hierarchical chunking to improve the accuracy of generated responses.

By organizing the document into a hierarchical structure during the chunking process, the model can better understand the relationships between different parts of the content, enabling it to provide more contextually relevant and coherent responses.

Now that you understand the concepts of semantic and hierarchical chunking, you might want even more flexibility. You can use an AWS Lambda function to add custom processing logic to chunks, such as metadata processing, or to define your own chunking logic. In the next section, we discuss custom processing using Lambda functions in Knowledge Bases for Amazon Bedrock.

Custom processing using Lambda functions

For those seeking more control and flexibility, Knowledge Bases for Amazon Bedrock now offers the ability to define custom processing logic using AWS Lambda functions. Using Lambda functions, you can customize the chunking process to align with the unique requirements of your RAG application. Furthermore, you can extend it beyond chunking, because Lambda can also be used to streamline metadata processing, which can help unlock additional avenues for efficiency and precision.

You can begin by writing a Lambda function with your custom chunking logic or use any of the chunking methodologies provided by your favorite open source framework such as LangChain or LlamaIndex. Make sure to create the Lambda layer for the specific open source framework. After writing and testing the Lambda function, start creating a knowledge base by choosing Create knowledge base. In Step 2: Configure data source, select Advanced (customization) under the Chunking & parsing configurations and then select the corresponding Lambda function from the Select Lambda function drop-down menu, as shown in the following image:

From the drop-down menu, you can select any Lambda function created in the same AWS Region, including the verified version of the Lambda function. Next, you provide the Amazon Simple Storage Service (Amazon S3) path where you want to store the input documents that your Lambda function runs on, and where you want to store the output documents.
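
To make the idea concrete, the following is a deliberately simplified Lambda handler that splits each input document into paragraph-level chunks. The event and response field names here are illustrative assumptions only; the exact input and output contract expected by Knowledge Bases for Amazon Bedrock is defined in the custom transformation documentation, so adapt the handler accordingly.

import json
import boto3

s3 = boto3.client("s3")

def chunk_text(text, max_chars=1000):
    # Toy chunking strategy: split on blank lines and cap each chunk's length.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [p[:max_chars] for p in paragraphs]

def lambda_handler(event, context):
    # Assumption: the event provides a working bucket and a list of input object keys.
    bucket = event["bucketName"]
    output_files = []
    for key in event["inputObjectKeys"]:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        chunks = [{"contentBody": c} for c in chunk_text(body)]
        out_key = f"{key}.chunks.json"
        s3.put_object(Bucket=bucket, Key=out_key, Body=json.dumps({"fileContents": chunks}))
        output_files.append({"outputObjectKey": out_key})
    # Assumption: the response lists the transformed objects written back to Amazon S3.
    return {"outputFiles": output_files}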

So far, we have discussed advanced parsing using FMs and advanced data chunking options to improve the quality of your search results and accuracy of the generated responses. In the next section, we will discuss some optimizations that have been added to Knowledge Bases for Amazon Bedrock to improve the accuracy of parsing .csv files.

Metadata customization for .csv files

Knowledge Bases for Amazon Bedrock now offers an enhanced .csv file processing feature that separates content and metadata. This update streamlines the ingestion process by allowing you to designate specific columns as content fields and others as metadata fields. Consequently, it reduces the number of required files and enables more efficient data management, especially for large .csv file datasets. Moreover, the metadata customization feature introduces a dynamic approach to storing additional metadata alongside data chunks from .csv files. This contrasts with the current static process of maintaining metadata.

This customization capability unlocks new possibilities for data cleaning, normalization, and enrichment processes, enabling augmentation of your data. To use the metadata customization feature, you need to provide metadata files alongside the source .csv files, with the same name as the source data file and a <filename>.csv.metadata.json suffix. This metadata file specifies the content and metadata fields of the source .csv file. Here’s an example of the metadata file content:

{
    "metadataAttributes": {
        "docSpecificMetadata1": "docSpecificMetadataVal1",
        "docSpecificMetadata2": "docSpecificMetadataVal2"
    },
    "documentStructureConfiguration": {
        "type": "RECORD_BASED_STRUCTURE_METADATA",
        "recordBasedStructureMetadata": {
            "contentFields": [
                {
                    "fieldName": "String"
                }
            ],
            "metadataFieldsSpecification": {
                "fieldsToInclude": [
                    {
                         "fieldName": "String"
                    }
                ],
                "fieldsToExclude": [
                    {
                        "fieldName": "String"
                    }
                ]
            }
        }
    }
}

Use the following steps to experiment with the .csv file improvement feature:

  1. Upload the .csv file and corresponding <filename>.csv.metadata.json file in the same Amazon S3 prefix.
  2. Create a knowledge base using either the console or the Amazon Bedrock SDK.
  3. Start ingestion using either the console or the SDK.
  4. Use the Retrieve API and the RetrieveAndGenerate API to query the structured .csv file data, using either the console or the SDK.
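
For step 4, a query through the RetrieveAndGenerate API could look like the following Boto3 sketch; the knowledge base ID, model ARN, and question are placeholders.

import boto3

bedrock_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_runtime.retrieve_and_generate(
    input={"text": "Which records mention a product recall in 2023?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])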

Query reformulation

Often, input queries can be complex, with many questions and complex relationships. With such complex prompts, the resulting query embeddings might have some semantic dilution, resulting in retrieved chunks that might not address such a multi-faceted query, which reduces accuracy and leads to a less than desirable response from your RAG application.

Now with query reformulation supported by Knowledge Bases for Amazon Bedrock, we can take a complex input query and break it into multiple sub-queries. These sub-queries will then separately go through their own retrieval steps to find relevant chunks. In this process, the subqueries having less semantic complexity might find more targeted chunks. These chunks will then be pooled and ranked together before passing them to the FM to generate a response.

Example: Consider the following complex query to a financial document for the fictitious company Octank asking about multiple unrelated topics:

“Where is the Octank company waterfront building located and how does the whistleblower scandal hurt the company and its image?”

We can decompose the query into multiple subqueries:

  1. Where is the Octank Waterfront building located?
  2. What is the whistleblower scandal involving Octank?
  3. How did the whistleblower scandal affect Octank’s reputation and public image?

Now, we have more targeted questions that might help retrieve chunks from the knowledge base from more semantically relevant sections of the documents without some of the semantic dilution that can occur from embedding multiple asks in a single complex query.

Query reformulation can be enabled in the console after creating a knowledge base by going to Test Knowledge Base Configurations and turning on Break down queries under Query modifications.

Query reformulation can also be enabled during runtime using the RetrieveAndGenerate API by adding an additional element to the KnowledgeBaseConfiguration as follows:

    "orchestrationConfiguration": {
        "queryTransformationConfiguration": {
        "type": "QUERY_DECOMPOSITION"
    }
}
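
Putting it together, a complete runtime call with query decomposition enabled could look like the following Boto3 sketch (IDs and ARNs are placeholders; confirm the parameter names against the RetrieveAndGenerate API reference):

import boto3

bedrock_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_runtime.retrieve_and_generate(
    input={
        "text": "Where is the Octank company waterfront building located and how does "
                "the whistleblower scandal hurt the company and its image?"
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            # Enable query reformulation (decomposition into sub-queries).
            "orchestrationConfiguration": {
                "queryTransformationConfiguration": {"type": "QUERY_DECOMPOSITION"}
            },
        },
    },
)
print(response["output"]["text"])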

Query reformulation is another tool that might help increase accuracy for complex queries that you might encounter in production, giving you another way to optimize for the unique interactions your users might have with your application.

Conclusion

With the introduction of these advanced features, Knowledge Bases for Amazon Bedrock solidifies its position as a powerful and versatile solution for implementing RAG workflows. Whether you’re dealing with complex queries, unstructured data formats, or intricate data organizations, Knowledge Bases for Amazon Bedrock empowers you with the tools and capabilities to unlock the full potential of your knowledge base.

By using advanced data chunking options, query decomposition, and .csv file processing, you have greater control over the accuracy and customization of your retrieval processes. These features not only help improve the quality of your knowledge base, but also can facilitate more efficient and effective decision-making, enabling your organization to stay ahead in the ever-evolving world of data-driven insights.

Embrace the power of Knowledge Bases for Amazon Bedrock and unlock new possibilities in your retrieval and knowledge management endeavors. Stay tuned for more exciting updates and features from the Amazon Bedrock team as they continue to push the boundaries of what’s possible in the realm of knowledge bases and information retrieval.

For more detailed information, code samples, and implementation guides, see the Amazon Bedrock documentation and AWS blog posts.

References:

[1] LlamaIndex: Chunking Strategies for Large Language Models. Part — 1
[2] How to Choose the Right Chunking Strategy for Your LLM Application


About the authors

Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in Generative AI, Artificial Intelligence, Machine Learning, and System Design. He is passionate about developing state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors of the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focused on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.

Read More

Streamline generative AI development in Amazon Bedrock with Prompt Management and Prompt Flows (preview)

Streamline generative AI development in Amazon Bedrock with Prompt Management and Prompt Flows (preview)

Today, we’re excited to introduce two powerful new features for Amazon Bedrock: Prompt Management and Prompt Flows, in public preview. These features are designed to accelerate the development, testing, and deployment of generative artificial intelligence (AI) applications, enabling developers and business users to create more efficient and effective solutions that are easier to maintain. You can use the Prompt Management and Flows features graphically on the Amazon Bedrock console or Amazon Bedrock Studio, or programmatically through the Amazon Bedrock SDK APIs.

As the adoption of generative AI continues to grow, many organizations face challenges in efficiently developing and managing prompts. Also, modern applications often require chaining or routing logic that adds complexity to the development. With the Prompt Management and Flows features, Amazon Bedrock addresses these pain points by providing intuitive tools for designing and storing prompts, creating complex workflows, and advancing collaboration among team members.

Before introducing the details of the new capabilities, let’s review how prompts are typically developed, managed, and used in a generative AI application.

The prompt lifecycle

Developing effective prompts for generative AI applications is an iterative process that requires careful design, testing, and refinement. Understanding this lifecycle is crucial for creating high-quality, reliable AI-powered solutions. Let’s explore the key stages of a typical prompting lifecycle:

  • Prompt design – This initial stage involves crafting prompts that effectively communicate the desired task or query to the foundation model. Prompts are often built as prompt templates that contain variables, dynamic context, or other content to be provided at inference time. Good prompt design considers factors such as clarity, specificity, and context to elicit the most relevant and accurate responses.
  • Testing and evaluation – After they’re designed, prompts or prompt templates are tested with various inputs to assess their performance and robustness. This stage often involves comparing multiple variations to identify the most effective formulations.
  • Refinement – Based on the testing results, prompts are iteratively refined to improve their effectiveness. This often involves adjusting wording, adding or removing context, or modifying the structure of the prompt.
  • Versioning and cataloging – As prompts are developed and refined, it’s crucial to maintain versions and organize them in a prompt catalog. This allows teams to track changes, compare performance across versions, and access proven prompts for reuse.
  • Deployment – After prompts have been optimized, they can be deployed as part of a generative AI application. This involves integrating the prompt into a larger system or workflow.
  • Monitoring and iteration – After deployment, teams continually monitor the performance of prompts in live applications and iterate to maintain or improve their effectiveness.

Throughout this lifecycle, the prompt design and prompt catalog play critical roles. A well-designed prompt significantly enhances the quality and relevance of AI-generated responses. A comprehensive prompt catalog is a valuable resource for developers, enabling them to use proven prompts and best practices across projects, saving both time and money.

For more complex generative AI applications, developers often employ patterns such as prompt chaining or prompt routing. These approaches allow for the definition of more sophisticated logic and dynamic workflows, often called prompt flows.

Prompt chaining uses the output of one prompt as input for another, creating a sequence of interactions with the foundation model (FM) to accomplish more complex tasks. For example, a customer service chatbot could initially use an FM to extract key information about a customer and their issue, then pass the details as input for calling a function to open a support ticket. The following diagram illustrates this workflow.

Prompt routing refers to the process of dynamically selecting and applying different prompts based on certain conditions or the nature of the input, allowing for more flexible and context-aware AI applications. For example, a user request to a banking assistant could dynamically decide if the answer can be best found with Retrieval Augmented Generation (RAG) when asked about the available credit cards details, or calling a function for running a query when the user asks about their account balance. The following diagram illustrates this workflow.

Combining these two patterns is common in modern generative AI application development. By understanding and optimizing each stage of the prompting lifecycle and using techniques like chaining and routing, you can create more powerful, efficient, and effective generative AI solutions.

Let’s dive into the new features in Amazon Bedrock and explore how they can help you transform your generative AI development process.

Prompt management: Optimize your AI interactions

The Prompt Management feature streamlines the creation, evaluation, deployment, and sharing of prompts. This feature helps developers and business users obtain the best responses from FMs for their specific use cases.

Key benefits of Prompt Management include the following:

  • Rapid prompt creation and iteration – Create your prompts and variations with the built-in prompt builder on the Amazon Bedrock console or with the CreatePrompt API. Incorporate dynamic information using inputs for building your prompt templates.
  • Seamless testing and deployment – Quickly test individual prompts, setting variables and their test values. Create prompt versions stored in the built-in prompt library for cataloging and management using the Amazon Bedrock console or the GetPrompt, ListPrompts, and CreatePromptVersion APIs.
  • Collaborative prompt development – Use your prompts and prompt templates in flows or Amazon Bedrock Studio. Prompt management enables team members to collaborate on prompt creation, evaluation, and deployment, improving efficiencies in the development process.

There are no prerequisites for using the Prompt Management feature beyond access to the Amazon Bedrock console. For information on AWS Regions and models supported, refer to Prompt management in Amazon Bedrock. If you don’t currently have access to the Amazon Bedrock console, refer to Set up Amazon Bedrock.

To get started with the Prompt Management feature on the Amazon Bedrock console, complete the following steps:

  1. On the Amazon Bedrock console, under Builder tools in the navigation pane, choose Prompt management.
  2. Create a new prompt or select an existing one from the prompt library.
  3. Use the prompt builder to select a model, set parameters, and write the prompt content.
  4. Configure variables for creating prompt templates and test your prompts dynamically.
  5. Create and manage prompt versions for use in your generative AI flows.
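
If you prefer to work programmatically, the following sketch uses the CreatePrompt and CreatePromptVersion APIs mentioned above to register a prompt template with one input variable. The variant structure shown here is our assumption about the API shape, and the model ID is a placeholder; verify both against the Amazon Bedrock API reference.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Create a prompt template with a single input variable.
prompt = bedrock_agent.create_prompt(
    name="product-summary-prompt",
    description="Summarizes a product description in two sentences.",
    variants=[
        {
            "name": "variant-1",
            "templateType": "TEXT",
            "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
            "templateConfiguration": {
                "text": {
                    "text": "Summarize the following product description in two sentences:\n{{product_description}}",
                    "inputVariables": [{"name": "product_description"}],
                }
            },
        }
    ],
)

# Create an immutable version for use in flows or applications.
bedrock_agent.create_prompt_version(promptIdentifier=prompt["id"])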

Prompt flows: Visualize and accelerate your AI workflows

The Amazon Bedrock Flows feature introduces a visual builder that simplifies the creation of complex generative AI workflows. This feature allows you to link multiple FMs, prompts, and other AWS services, reducing development time and effort.

Key benefits of prompt flows include:

  • Intuitive visual builder – Drag and drop components to create a flow, linking prompts with other prompts, AI services, knowledge bases, and business logic. This visual approach helps eliminate the need for extensive coding and provides a comprehensive overview of your application’s structure. Alternatively, you can use the CreateFlow API for a programmatic creation of flows that help you automate processes and development pipelines.
  • Rapid testing and deployment – Test your flows directly on the Amazon Bedrock console for faster iteration, or using the InvokeFlow API. At any time, you can snapshot the flow for integration into your generative AI application. The flow is surfaced through an Agents for Amazon Bedrock runtime endpoint. You can create flow versions on the Amazon Bedrock console or with the CreateFlowVersion API. Creating an alias on the Amazon Bedrock console or with the CreateFlowAlias API enables straightforward rollbacks and A/B testing between different versions of the flow without impacting your service or development pipelines.
  • Manage and templatize – Accelerate your development with flow templates for repeated common use cases. You can manage your flows on the Amazon Bedrock console or with the GetFlow and ListFlows APIs.

Before you get started in your account, refer to How Flows for Amazon Bedrock works for details on the permissions required and quotas. When you’re ready, complete the following steps to get started with flows on the Amazon Bedrock console:

  1. On the Amazon Bedrock console, under Builder tools in the navigation pane, choose Flows.
  2. Create a flow by providing a name, description, and AWS Identity and Access Management (IAM) role.
  3. Access the visual builder in the working draft of your flow.
  4. Drag and drop individual components or nodes, including prompt templates from your prompt catalog, and link them together. You can edit the properties of each node and use other elements available in Amazon Bedrock.
  5. Use the available nodes to implement conditions, code hooks with AWS Lambda functions, or integrations with AI services such as Amazon Lex, among many other options to be added soon. You can chain or route steps to define your own logic and processing outputs.
  6. Test your prompt flows dynamically and set up your outputs for deploying your generative AI applications.

In our example, we create a flow for dynamically routing the user question to either query a knowledge base in Amazon Bedrock or respond directly from the LLM. We can now invoke this flow from our application frontend.
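
For reference, invoking a flow alias from application code could look like the following Boto3 sketch. The flow ID, alias ID, and default node names shown here are placeholders and assumptions; check the InvokeFlow API reference for the exact request and streaming response shapes.

import boto3

bedrock_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_runtime.invoke_flow(
    flowIdentifier="YOUR_FLOW_ID",
    flowAliasIdentifier="YOUR_FLOW_ALIAS_ID",
    inputs=[
        {
            "nodeName": "FlowInputNode",   # assumption: name of the flow input node
            "nodeOutputName": "document",  # assumption: default output name of that node
            "content": {"document": "What credit cards do you offer?"},
        }
    ],
)

# The result is an event stream; print the flow output events as they arrive.
for event in response["responseStream"]:
    if "flowOutputEvent" in event:
        print(event["flowOutputEvent"]["content"]["document"])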

Example use case: Optimizing ecommerce customer service chatbots

To illustrate the power of these new features, let’s consider Octank, a fictional large ecommerce company that was facing challenges in efficiently creating, testing, and deploying AI-powered customer service chatbots for different product categories. This resulted in inconsistent performance and slow iteration cycles.

In the following notebook, we provide a guided example that you can follow to get started with Prompt Management and Prompt Flows programmatically.

Using prompt management and flows in Amazon Bedrock, Octank’s development and prompt engineering teams can now accomplish the following:

  • Create visual and programmatic workflows for each product category chatbot, incorporating different FMs and AI services as needed
  • Rapidly prototype and test prompt variations for each chatbot, optimizing for accuracy and relevance
  • Collaborate across teams to refine prompts and share best practices
  • Deploy and A/B test different chatbot versions to identify the most effective configurations

As a result, Octank has significantly reduced their development time, improved chatbot response quality, and achieved more consistent performance across product lines with increased reuse of artifacts.

Conclusion

The new Prompt Management and Flows features in Amazon Bedrock represent a significant leap forward in generative AI development. By streamlining workflow creation, prompt management, and team collaboration, these tools enable faster time-to-market and higher-quality AI-powered solutions.

We invite you to explore these new features in preview and experience firsthand how they can improve your generative AI development process. To get started, open the Amazon Bedrock console or discover the new APIs in the Amazon Bedrock SDK, and begin creating your prompts and flows today.

We’re excited to see the innovative applications you’ll build with these new capabilities. As always, we welcome your feedback through AWS re:Post for Amazon Bedrock or your usual AWS contacts. Join the generative AI builder community at community.aws to share your experiences and learn from others.

Stay tuned for more updates as we continue to enhance Amazon Bedrock and empower you to build the next generation of AI-powered applications!

To learn more, refer to the documentation on prompt management and prompt flows for Amazon Bedrock.


About the Authors

Antonio Rodriguez is a Sr. Generative AI Specialist Solutions Architect at AWS. He helps companies of all sizes solve their challenges, embrace innovation, and create new business opportunities with Amazon Bedrock. Apart from work, he loves to spend time with his family and play sports with his friends.

Jared Dean is a Principal AI/ML Solutions Architect at AWS. Jared works with customers across industries to develop machine learning applications that improve efficiency. He is interested in all things AI, technology, and BBQ.

Read More

Empowering everyone with GenAI to rapidly build, customize, and deploy apps securely: Highlights from the AWS New York Summit

Empowering everyone with GenAI to rapidly build, customize, and deploy apps securely: Highlights from the AWS New York Summit

Imagine this—all employees relying on generative artificial intelligence (AI) to get their work done faster, every task becoming less mundane and more innovative, and every application providing a more useful, personal, and engaging experience. To realize this future, organizations need more than a single, powerful large language model (LLM) or chat assistant. They need a full range of capabilities to build and scale generative AI applications that are tailored to their business and use case —including apps with built-in generative AI, tools to rapidly experiment and build their own generative AI apps, a cost-effective and performant infrastructure, and security controls and guardrails. That’s why we are investing in a comprehensive generative AI stack. At the top layer, which includes generative AI-powered applications, we have Amazon Q, the most capable generative AI-powered assistant. The middle layer has Amazon Bedrock, which provides tools to easily and rapidly build, deploy, and scale generative AI applications leveraging LLMs and other foundation models (FMs). And at the bottom, there’s our resilient, cost-effective infrastructure layer, which includes chips purpose-built for AI, as well as Amazon SageMaker to build and run FMs. All of these services are secure by design, and we keep adding features that are critical to deploying generative AI applications tailored to your business. During the last 18 months, we’ve launched more than twice as many machine learning (ML) and generative AI features into general availability than the other major cloud providers combined. That’s another reason why hundreds of thousands of customers are now using our AI services.

Today at the AWS New York Summit, we announced a wide range of capabilities for customers to tailor generative AI to their needs and realize the benefits of generative AI faster. We’re enabling anyone to build generative AI applications with Amazon Q Apps by writing a simple natural language prompt—in seconds. We’re making it easier to leverage your data, supercharge agents, and quickly, securely, and responsibly deploy generative AI into production with new features in Amazon Bedrock. And we announced new partnerships with innovators like Scale AI to help you customize your applications quickly and easily.

Generative AI-powered apps transform business as usual

Generative AI democratizes information, gives more people the ability to create and innovate, and provides access to productivity-enhancing assistance that was never available before. That’s why we’re building generative AI-powered applications for everyone.

Amazon Q, which includes Amazon Q Developer and Amazon Q Business, is the most capable generative AI-powered assistant for software development and helping employees make better decisions—faster—leveraging their company’s data. Not only does Amazon Q generate the industry’s most accurate coding suggestions, it can also autonomously perform multistep tasks like upgrading Java applications and generating and implementing new features. Amazon Q is available where developers need it: on the AWS Management Console and in popular integrated development environments, including IntelliJ IDEA, Visual Studio, VS Code, and Amazon SageMaker Studio. You can securely customize Amazon Q Developer with your internal code base to get more relevant and useful recommendations for in-line coding and save even more time. For instance, National Australia Bank has seen increased acceptance rates of 60%, up from 50%, and Amazon Prime developers have already seen a 30% increase in acceptance rates. Amazon Q can also help employees do more with the vast troves of data and information contained in their company’s documents, systems, and applications by answering questions, providing summaries, generating business intelligence (BI) dashboards and reports, and even generating applications that automate key tasks. We’re super excited about the productivity gains customers and partners have seen, with early signals that Amazon Q could help their employees become over 80% more productive at their jobs.

To enable all employees to create their own generative AI applications to automate tasks, today we announced the general availability of Amazon Q Apps, a feature of Amazon Q Business. With Amazon Q Apps employees can go from conversation to generative AI-powered app based on their company data in seconds. Users simply describe the application they want in a prompt and Amazon Q instantly generates it. Amazon Q also gives employees the option to generate an app from an existing conversation with a single click. During preview, we saw users generate applications for diverse tasks, including summarizing feedback, creating onboarding plans, writing copy, drafting memos, and many more. For instance, Druva, a data security provider, created an Amazon Q App to support their request for proposal (RFP) process by summarizing the required information almost instantly, reducing RFP response times by up to 25%.

In addition to Amazon Q Apps, which makes it easy for any employee to automate their individual tasks, today we announced AWS App Studio (preview), a generative AI-powered service that enables technical professionals such as IT project managers, data engineers, and enterprise architects to use natural language to create, deploy, and manage enterprise applications across an organization. With App Studio, a user simply describes the application they want, what they want it to do, and the data sources they want to integrate with, and App Studio builds an application in minutes that could have taken a professional developer days to build from scratch. App Studio’s generative AI-powered assistant eliminates the learning curve of typical low-code tools, accelerating the application creation process and simplifying common tasks like designing the UI, building workflows, and testing the application. Each application can be immediately scaled to thousands of users and is secure and fully managed by AWS, eliminating the need for any operational expertise.

New features and capabilities supercharge Amazon Bedrock—speeding development of generative AI apps

Amazon Bedrock is the fastest and easiest way to build and scale secure generative AI applications with the broadest selection of leading LLMs and FMs as well as easy-to-use capabilities for developers. Tens of thousands of customers are already using Amazon Bedrock, and it’s one of AWS’s fastest growing services over the last decade. For example, Ferrari is rapidly introducing new experiences for customers, dealers, and internal teams to run faster simulations, create new knowledge bases that assist dealers and technical users, enhance the racing fan experience, and create hyper-personalized vehicle recommendations for customers from the millions of options offered by Ferrari in seconds.

Since the start of 2024, we have announced the general availability of more features and capabilities in Amazon Bedrock than comparable services from other leading cloud providers to help customers get generative AI apps from proof of concept to production faster. This includes support for new industry-leading models from Anthropic, Meta, Mistral, and more, as well as the recent addition of Anthropic Claude 3.5 Sonnet, their most advanced model to date, which was made available immediately for Amazon Bedrock customers. Thousands of customers have already used Anthropic’s Claude 3.5 since its release.

Today, we announced some major new Amazon Bedrock innovations that enable you to:

Customize generative AI applications with your data. You can customize generative AI applications with your data to make them specific to your use case, your organization, and your industry:

  • Fine tune Anthropic’s Claude 3 Haiku in Amazon Bedrock – With Amazon Bedrock, you can privately and securely fine tune Amazon Titan, Cohere Command and Command Lite, and Meta Llama 2 models by providing labeled data in Amazon Simple Storage Service (Amazon S3) to specialize the model for your business and use case. Starting today, Amazon Bedrock is also the only fully managed service that provides you with the ability to fine tune Anthropic’s Claude 3 Haiku (in preview). Read more in the News Blog.
  • Leverage even more data sources for Retrieval Augmented Generation (RAG) – With RAG, you can provide a model with new knowledge or up-to-date info from multiple sources, including document repositories, databases, and APIs. For example, the model might use RAG to retrieve search results from Amazon OpenSearch Service or documents from Amazon S3. Knowledge Bases for Amazon Bedrock fully manages this experience by connecting to your private data sources, including Amazon Aurora, Amazon OpenSearch Serverless, MongoDB, Pinecone, and Redis Enterprise Cloud. Today, we’ve expanded the list to include connectors for Salesforce, Confluence, and SharePoint (in preview), so organizations can leverage more business data to customize models for their specific needs. More knowledge base updates can be found in the News Blog.
  • Get the fastest vector search available – To further enhance your RAG workflows, we’ve added vector search to some of our most popular data services, including OpenSearch Service and OpenSearch Serverless, Aurora, Amazon Relational Database Service (Amazon RDS), and more. Customers can co-locate vector data with operational data, reducing the overhead of managing another database. Today, we’re also excited to announce the general availability of vector search for Amazon MemoryDB. Amazon MemoryDB delivers the fastest vector search performance at the highest recall rates among popular vector databases on AWS, making it a great fit for use cases that require single-digit millisecond latency. For example, Amazon Advertising, IBISWorld, Mediaset, and other organizations are using it to deliver real-time semantic search, and Broadridge Financial is running RAG while delivering the same real-time response rates that their customers are accustomed to. You can use MemoryDB vector search standalone today, and soon, you’ll be able to access it through Knowledge Bases for Amazon Bedrock. Read more about MemoryDB in the News Blog.

Create more advanced, personalized customer experiences. With Agents for Amazon Bedrock, applications can take action, executing multistep tasks using company systems and data sources, making generative AI applications substantially more useful. Today, we’re adding key capabilities to Agents for Amazon Bedrock. Previously, agents were limited to taking action based on information from within a single session. Now agents can retain memory across multiple interactions to remember where you last left off and provide better recommendations based on prior interactions. For instance, in a flight booking application, a developer can create an agent that can remember the last time you traveled or that you opt for a vegetarian meal. Agents can also now interpret code to tackle complex data-driven use cases, such as data analysis, data visualization, text processing, solving equations, and optimization problems. For instance, an application user can ask to analyze the historical real estate prices across various zip codes to identify investment opportunities. Check out the News Blogs for more on these capabilities.

De-risk generative AI with Guardrails for Amazon Bedrock. Customers are concerned about hallucinations, where LLMs generate incorrect responses by conflating multiple pieces of information, providing incorrect information, or inventing new information. These results can misinform employees and customers and harm brands, limiting the usefulness of generative AI. Today, we’re adding contextual grounding checks in Guardrails for Amazon Bedrock to detect hallucinations in model responses for RAG and summarization applications. Contextual grounding checks add to the industry-leading safety protection in Guardrails for Amazon Bedrock by making sure the LLM response is based on the right enterprise source data and evaluating the LLM response to confirm that it’s relevant to the user’s query or instruction. Contextual grounding checks can detect and filter over 75% of hallucinated responses for RAG and summarization workloads. Read more about our commitments to responsible AI on the AWS Machine Learning Blog.

We’re excited to see how our customers leverage these ever-expanding capabilities of Amazon Bedrock to customize their generative AI applications for vertical industries and business functions. For example, Deloitte is using Amazon Bedrock’s advanced customization capabilities to build their C-Suite AI™ solution, designed specifically for CFOs. It leverages Deloitte’s proprietary data and industry depth across the finance function. C-Suite AI provides customized AI models tailored to the needs of CFOs, with applications that span critical finance areas, generative analytics for data-driven insights, contract intelligence, and investor relations support.

New partners and trainings help customers along the AI journey

Our extensive partner network helps our customers along the journey to realizing the potential of generative AI. For example, BrainBox AI—which worked with our generative AI competency partner, Caylent—developed its AI assistant ARIA on AWS to help reduce energy costs and emissions in buildings. We have been building out our partner network and training offerings to help customers move quickly from experiment to broad usage. Our AWS Generative AI Competency Partner Program is designed to identify, validate, and promote AWS Partners with demonstrated AWS technical expertise and proven customer success. Today, 19 new partners joined the program, giving customers access to 60 Generative AI Competency Partners across the globe. New partners include C3.ai, Cognizant, IBM, and LG CNS, and we have significantly expanded customer offerings into Korea, Greater China, LATAM, and Saudi Arabia.

We’re also announcing a new partnership with Scale AI, our first model customization and evaluation partner. Through this collaboration, enterprise and public sector organizations can use Scale GenAI Platform and Scale Donovan to evaluate their generative AI applications and further customize, configure, and fine tune models to ensure trust and high performance in production, all built on Amazon Bedrock. Scale AI upholds the highest standards of privacy and regulatory compliance working with some of the most stringent government customers, such as the US Department of Defense. Customers can access Scale AI through an engagement with the AWS Generative AI Innovation Center, a program offered by AWS that pairs you with AWS science and strategy experts, or through the AWS Marketplace.

To help upskill your workforce, we’re making a new interactive online learning experience available, AWS SimuLearn, that pairs generative AI-powered simulations and hands-on training, to help people learn how to translate business problems into technical solutions. This is part of our broader commitment to provide free cloud computing skills training to 29 million people worldwide by 2025. Today, we announced that we surpassed this milestone, more than a year ahead of schedule.

We’re giving customers tools that put the power of generative AI into all employees’ hands, providing more ways to create personalized and relevant generative AI-powered applications, and working on the tough problems like reducing hallucinations so more companies can gain benefits from generative AI. We’re energized by the progress our customers have already made in making generative AI a reality for their organizations and will continue to innovate on their behalf. To watch the New York Summit keynote for an in-depth look at these announcements, visit our AWS New York Summit page or learn more about our generative AI services.


About the author

Swami Sivasubramanian is Vice President of Data and Machine Learning at AWS. In this role, Swami oversees all AWS Database, Analytics, and AI & Machine Learning services. His team’s mission is to help organizations put their data to work with a complete, end-to-end data solution to store, access, analyze, visualize, and predict.

Read More

A progress update on our commitment to safe, responsible generative AI

A progress update on our commitment to safe, responsible generative AI

Responsible AI is a longstanding commitment at Amazon. From the outset, we have prioritized responsible AI innovation by embedding safety, fairness, robustness, security, and privacy into our development processes and educating our employees. We strive to make our customers’ lives better while also establishing and implementing the necessary safeguards to help protect them. Our practical approach to transform responsible AI from theory into practice, coupled with tools and expertise, enables AWS customers to implement responsible AI practices effectively within their organizations. To date, we have developed over 70 internal and external offerings, tools, and mechanisms that support responsible AI, published or funded over 500 research papers, studies, and scientific blogs on responsible AI, and delivered tens of thousands of hours of responsible AI training to our Amazon employees. Amazon also continues to expand its portfolio of free responsible AI training courses for people of all ages, backgrounds, and levels of experience.

Today, we are sharing a progress update on our responsible AI efforts, including the introduction of new tools, partnerships, and testing that improve the safety, security, and transparency of our AI services and models.

Launched new tools and capabilities to build and scale generative AI safely, supported by adversarial style testing (i.e., red teaming)

In April 2024, we announced the general availability of Guardrails for Amazon Bedrock and Model Evaluation in Amazon Bedrock to make it easier to introduce safeguards, prevent harmful content, and evaluate models against key safety and accuracy criteria. Guardrails is the only solution offered by a major cloud provider that enables customers to build and customize safety and privacy protections for their generative AI applications in a single solution. It helps customers block up to 85% of harmful content on top of the native protection from FMs on Amazon Bedrock.

In May, we published a new AI Service Card for Amazon Titan Text Premier to further support our investments in responsible, transparent generative AI. AI Service Cards are a form of responsible AI documentation that provide customers with a single place to find information on the intended use cases and limitations, responsible AI design choices, and deployment and performance optimization best practices for our AI services and models. We’ve created more than 10 AI Service Cards thus far to deliver transparency for our customers as part of our comprehensive development process that addresses fairness, explainability, veracity and robustness, governance, transparency, privacy and security, safety, and controllability.

AI systems can also have performance flaws and vulnerabilities that can increase risk around security threats or harmful content. At Amazon, we test our AI systems and models, such as Amazon Titan, using a variety of techniques, including manual red-teaming. Red-teaming engages human testers to probe an AI system for flaws in an adversarial style, and complements our other testing techniques, which include automated benchmarking against publicly available and proprietary datasets, human evaluation of completions against proprietary datasets, and more. For example, we have developed proprietary evaluation datasets of challenging prompts that we use to assess development progress on Titan Text. We test against multiple use cases, prompts, and data sets because it is unlikely that a single evaluation dataset can provide an absolute picture of performance. Altogether, Titan Text has gone through multiple iterations of red-teaming on issues including safety, security, privacy, veracity, and fairness.

Introduced watermarking to enable users to determine if visual content is AI-generated

A common use case for generative AI is the creation of digital content, like images, videos, and audio, but to help prevent disinformation, users need to be able to identify AI-generated content. Techniques such as watermarking can be used to confirm whether content comes from a particular AI model or provider. To help reduce the spread of disinformation, all images generated by Amazon Titan Image Generator have an invisible watermark by default. It is designed to be tamper-resistant, helping increase transparency around AI-generated content and combat disinformation. We also introduced a new API (preview) in Amazon Bedrock that checks for the existence of this watermark and helps you confirm whether an image was generated by Titan Image Generator.

Promoted collaboration among companies and governments regarding trust and safety risks

Collaboration among companies, governments, researchers, and the AI community is critical to foster the development of AI that is safe, responsible, and trustworthy. In February 2024, Amazon joined the U.S. Artificial Intelligence Safety Institute Consortium, established by the National Institute of Standards and Technology (NIST). Amazon is collaborating with NIST to establish a new measurement science to enable the identification of scalable, interoperable measurements and methodologies to promote development of trustworthy AI. We are also contributing $5 million in AWS compute credits to the Institute for the development of tools and methodologies to evaluate the safety of foundation models. Also in February, Amazon joined the “Tech Accord to Combat Deceptive Use of AI in 2024 Elections” at the Munich Security Conference. This is an important part of our collective work to advance safeguards against deceptive activity and protect the integrity of elections.

We continue to find new ways to engage in and encourage information-sharing among companies and governments as the technology continues to evolve. This includes our work with Thorn and All Tech is Human to safely design our generative AI services to reduce the risk that they will be misused for child exploitation. We’re also a member of the Frontier Model Forum to advance the science, standards, and best practices in the development of frontier AI models.

Used AI as a force for good to address society’s greatest challenges and supported initiatives that foster education

At Amazon, we are committed to promoting the safe and responsible development of AI as a force for good. We continue to see examples across industries where generative AI is helping to address climate change and improve healthcare. Brainbox AI, a pioneer in commercial building technology, launched the world’s first generative AI-powered virtual building assistant on AWS to deliver insights to facility managers and building operators that will help optimize energy usage and reduce carbon emissions. Gilead, an American biopharmaceutical company, accelerates life-saving drug development with AWS generative AI by understanding a clinical study’s feasibility and optimizing site selection through AI-driven protocol analysis utilizing both internal and real-world datasets.

As we navigate the transformative potential of these technologies, we believe that education is the foundation for realizing their benefits while mitigating risks. That’s why we offer education on potential risks surrounding generative AI systems. Amazon employees have spent tens of thousands of training hours since July 2023, covering a range of critical topics like risk assessments, as well as deep dives into complex considerations surrounding fairness, privacy, and model explainability. As part of Amazon’s “AI Ready” initiative to provide free AI skills training to 2 million people globally by 2025, we’ve launched new free training courses about safe and responsible AI use on our digital learning centers. The courses include “Introduction to Responsible AI” for new-to-cloud learners on AWS Educate and courses like “Responsible AI Practices” and “Security, Compliance, and Governance for AI Solutions” on AWS Skill Builder.

Delivering groundbreaking innovation with trust at the forefront

As an AI pioneer, Amazon continues to foster the safe, responsible, and trustworthy development of AI technology. We are dedicated to driving innovation on behalf of our customers while also establishing and implementing the necessary safeguards. We’re also committed to working with companies, governments, academia, and researchers alike to deliver groundbreaking generative AI innovation with trust at the forefront.


About the author

Vasi Philomin is VP of Generative AI at AWS. He leads generative AI efforts, including Amazon Bedrock and Amazon Titan.

Read More