
Run ML inference on unplanned and spiky traffic using Amazon SageMaker multi-model endpoints

February 19, 2024

by Ram Vegiraju Amazon AWS

Amazon SageMaker multi-model endpoints (MMEs) are a fully managed capability of SageMaker inference that allows you to deploy thousands of models on a single endpoint. Previously, MMEs used Multi Model Server (MMS) as the model server and statically allocated CPU compute to each model, regardless of that model’s traffic load. In this post, we discuss a solution in which an MME can dynamically adjust the compute power assigned to each model based on the model’s traffic pattern. This solution enables you to use the underlying compute of MMEs more efficiently and save costs.

MMEs dynamically load and unload models based on incoming traffic to the endpoint. When utilizing MMS as the model server, MMEs allocate a fixed number of model workers for each model. For more information, refer to Model hosting patterns in Amazon SageMaker, Part 3: Run and optimize multi-model inference with Amazon SageMaker multi-model endpoints.

However, this can lead to a few issues when your traffic pattern is variable. Let’s say one or a few models receive a large amount of traffic. You can configure MMS to allocate a high number of workers for these models, but because it’s a static configuration, that number is assigned to every model behind the MME. This leads to a large number of workers consuming hardware compute, even for idle models. The opposite problem can happen if you set a small value for the number of workers: the popular models won’t have enough workers at the model server level to use the hardware available behind the endpoint. The main issue is that it’s difficult to remain traffic pattern agnostic if you can’t dynamically scale your workers at the model server level to allocate the necessary amount of compute.
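To make the trade-off concrete, the following back-of-the-envelope sketch compares a single static worker count with a per-model allocation. The model names, the worker counts, and both helper functions are made up for illustration; they are not MMS or DJL Serving settings.

```python
def static_workers(loaded_models, workers_per_model):
    """Total workers when every loaded model gets the same fixed count."""
    return len(loaded_models) * workers_per_model


def per_model_workers(allocation):
    """Total workers when each model has its own count."""
    return sum(allocation.values())


# Hypothetical endpoint: one hot model and 99 mostly idle ones.
models = ["hot-model"] + [f"idle-model-{i}" for i in range(99)]

# Static: sizing for the hot model (8 workers) applies to all 100 models.
print(static_workers(models, 8))  # 800

# Per-model: 8 workers for the hot model, 1 for each idle model.
allocation = {name: (8 if name == "hot-model" else 1) for name in models}
print(per_model_workers(allocation))  # 107
```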

The solution we discuss in this post uses DJLServing as the model server, which helps mitigate these issues by enabling per-model scaling, so that MMEs can be traffic pattern agnostic.

MME architecture

SageMaker MMEs enable you to deploy multiple models behind a single inference endpoint that may contain one or more instances. Each instance is designed to load and serve multiple models up to its memory and CPU/GPU capacity. With this architecture, a software as a service (SaaS) business can break the linearly increasing cost of hosting multiple models and achieve reuse of infrastructure consistent with the multi-tenancy model applied elsewhere in the application stack. The following diagram illustrates this architecture.

A SageMaker MME dynamically loads models from Amazon Simple Storage Service (Amazon S3) when invoked, instead of downloading all the models when the endpoint is first created. As a result, an initial invocation to a model might see higher inference latency than the subsequent inferences, which are completed with low latency. If the model is already loaded on the container when invoked, then the download step is skipped and the model returns the inferences with low latency. For example, assume you have a model that is only used a few times a day. It’s automatically loaded on demand, whereas frequently accessed models are retained in memory and invoked with consistently low latency.

Behind each MME are model hosting instances, as depicted in the following diagram. These instances load and evict multiple models to and from memory based on the traffic patterns to the models.

SageMaker continues to route inference requests for a model to the instance where the model is already loaded such that the requests are served from a cached model copy (see the following diagram, which shows the request path for the first prediction request vs. the cached prediction request path). However, if the model receives many invocation requests, and there are additional instances for the MME, SageMaker routes some requests to another instance to accommodate the increase. To take advantage of automated model scaling in SageMaker, make sure you have instance auto scaling set up to provision additional instance capacity. Set up your endpoint-level scaling policy with either custom parameters or invocations per minute (recommended) to add more instances to the endpoint fleet.
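As a rough sketch of the endpoint-level scaling setup described above, the following helper builds the two requests that Application Auto Scaling expects for a SageMaker variant: registering the variant as a scalable target, and attaching a target tracking policy on the predefined invocations-per-instance metric. The endpoint name, capacity limits, target value, and policy name are placeholders, not recommendations.

```python
def scaling_requests(endpoint_name, variant_name, min_instances, max_instances,
                     invocations_per_instance_per_minute):
    """Build kwargs for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(invocations_per_instance_per_minute),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return target, policy


def apply_scaling(autoscaling_client, target, policy):
    """Register the variant as a scalable target, then attach the policy."""
    autoscaling_client.register_scalable_target(**target)
    autoscaling_client.put_scaling_policy(**policy)


# Placeholder values for illustration only.
target, policy = scaling_requests("my-mme-endpoint", "sklearnvariant", 2, 20, 1000)
# With AWS credentials configured:
#   import boto3
#   apply_scaling(boto3.client("application-autoscaling"), target, policy)
```

Building the request dictionaries separately from the API calls makes it possible to review or log the policy before applying it.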

Model server overview

A model server is a software component that provides a runtime environment for deploying and serving machine learning (ML) models. It acts as an interface between the trained models and client applications that want to make predictions using those models.

The primary purpose of a model server is to allow effortless integration and efficient deployment of ML models into production systems. Instead of embedding the model directly into an application or a specific framework, the model server provides a centralized platform where multiple models can be deployed, managed, and served.

Model servers typically offer the following functionalities:

Model loading – The server loads the trained ML models into memory, making them ready for serving predictions.
Inference API – The server exposes an API that allows client applications to send input data and receive predictions from the deployed models.
Scaling – Model servers are designed to handle concurrent requests from multiple clients. They provide mechanisms for parallel processing and managing resources efficiently to ensure high throughput and low latency.
Integration with backend engines – Model servers have integrations with backend frameworks like DeepSpeed and FasterTransformer to partition large models and run highly optimized inference.

DJL architecture

DJL Serving is an open source, high performance, universal model server. DJL Serving is built on top of DJL, a deep learning library written in the Java programming language. It can take a deep learning model, several models, or workflows and make them available through an HTTP endpoint. DJL Serving supports deploying models from multiple frameworks like PyTorch, TensorFlow, Apache MXNet, ONNX, TensorRT, Hugging Face Transformers, DeepSpeed, FasterTransformer, and more.

DJL Serving offers many features that allow you to deploy your models with high performance:

Ease of use – DJL Serving can serve most models out of the box. Just bring the model artifacts, and DJL Serving can host them.
Multiple device and accelerator support – DJL Serving supports deploying models on CPU, GPU, and AWS Inferentia.
Performance – DJL Serving runs multithreaded inference in a single JVM to boost throughput.
Dynamic batching – DJL Serving supports dynamic batching to increase throughput.
Auto scaling – DJL Serving will automatically scale workers up and down based on the traffic load.
Multi-engine support – DJL Serving can simultaneously host models using different frameworks (such as PyTorch and TensorFlow).
Ensemble and workflow models – DJL Serving supports deploying complex workflows comprised of multiple models, and runs parts of the workflow on CPU and parts on GPU. Models within a workflow can use different frameworks.

In particular, the auto scaling feature of DJL Serving makes it straightforward to ensure the models are scaled appropriately for the incoming traffic. By default, DJL Serving determines the maximum number of workers for a model that can be supported based on the hardware available (CPU cores, GPU devices). You can set lower and upper bounds for each model to make sure that a minimum traffic level can always be served, and that a single model doesn’t consume all available resources.

DJL Serving uses a Netty frontend on top of backend worker thread pools. The frontend uses a single Netty setup with multiple HttpRequestHandlers. Different request handlers will provide support for the Inference API, Management API, or other APIs available from various plugins.

The backend is based around the WorkLoadManager (WLM) module. The WLM takes care of multiple worker threads for each model along with the batching and request routing to them. When multiple models are served, WLM checks the inference request queue size of each model first. If the queue size is greater than two times a model’s batch size, WLM scales up the number of workers assigned to that model.
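The scale-up rule described above can be written as a small predicate. This is an illustration of the rule as stated in this post, not DJL Serving's actual code; the function name and the check against an upper worker bound are assumptions added for the example.

```python
def should_scale_up(queue_size, batch_size, current_workers, max_workers):
    """True when the queue exceeds twice the batch size and workers can still grow."""
    return queue_size > 2 * batch_size and current_workers < max_workers


# Queue of 9 with batch size 4: 9 > 8, so another worker is added.
print(should_scale_up(queue_size=9, batch_size=4, current_workers=1, max_workers=4))   # True
# Queue of 8 is not strictly greater than 2 * 4.
print(should_scale_up(queue_size=8, batch_size=4, current_workers=1, max_workers=4))   # False
# Already at the upper bound, so no further scale-up.
print(should_scale_up(queue_size=20, batch_size=4, current_workers=4, max_workers=4))  # False
```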

Solution overview

The implementation of DJL with an MME differs from the default MMS setup. For DJL Serving with an MME, we compress the following files in the model.tar.gz format that SageMaker Inference is expecting:

model.joblib – For this implementation, we package the serialized model artifact directly in the tarball. In this case, we are working with a .joblib file, so we provide that file in our tarball for our inference script to read. If the artifact is too large, you can also push it to Amazon S3 and point towards that in the serving configuration you define for DJL.
serving.properties – Here you can configure any model server-related environment variables. The power of DJL here is that you can configure minWorkers and maxWorkers for each model tarball. This allows each model to scale up and down at the model server level. For instance, if a single model is receiving the majority of the traffic for an MME, the model server will scale that model’s workers up dynamically. In this example, we don’t configure these variables and let DJL determine the necessary number of workers depending on our traffic pattern.
model.py – This is the inference script for any custom preprocessing or postprocessing you would like to implement. The model.py expects your logic to be encapsulated in a handle method by default.
requirements.txt (optional) – By default, DJL comes installed with PyTorch, but any additional dependencies you need can be pushed here.

For this example, we showcase the power of DJL with an MME by taking a sample SKLearn model. We run a training job with this model and then create 1,000 copies of this model artifact to back our MME. We then showcase how DJL can dynamically scale to handle any type of traffic pattern that your MME may receive. This can include an even distribution of traffic across all models or even a few popular models receiving the majority of the traffic. You can find all the code in the following GitHub repo.

Prerequisites

For this example, we use a SageMaker notebook instance with a conda_python3 kernel and ml.c5.xlarge instance. To perform the load tests, you can use an Amazon Elastic Compute Cloud (Amazon EC2) instance or a larger SageMaker notebook instance. In this example, we scale to over a thousand transactions per second (TPS), so we suggest testing on a heavier EC2 instance such as an ml.c5.18xlarge so that you have more compute to work with.

Create a model artifact

We first need to create our model artifact and data that we use in this example. For this case, we generate some artificial data with NumPy and train using an SKLearn linear regression model with the following code snippet:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import joblib

# Generate dummy data
np.random.seed(0)
X = np.random.rand(100, 1)
y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a Linear Regression model
model = LinearRegression()
# Train the model on the training data
model.fit(X_train, y_train)

# Create serialized model artifact
model_filename = "model.joblib"
joblib.dump(model, model_filename)

After you run the preceding code, you should have a model.joblib file created in your local environment.

Pull the DJL Docker image

The Docker image djl-inference:0.23.0-cpu-full-v1.0 is our DJL serving container used in this example. You can adjust the following URL depending on your Region:

inference_image_uri = "474422712127.dkr.ecr.us-east-1.amazonaws.com/djl-serving-cpu:latest"

Optionally, you can also use this image as a base image and extend it to build your own Docker image on Amazon Elastic Container Registry (Amazon ECR) with any other dependencies you need.

Create the model file

First, we create a file called serving.properties. This instructs DJLServing to use the Python engine. We also set the max_idle_time of a worker to 600 seconds, so that idle workers are kept longer before the number of workers per model is scaled down. We don’t set minWorkers and maxWorkers; instead, we let DJL dynamically compute the number of workers needed depending on the traffic each model is receiving. The serving.properties file is shown as follows. To see the complete list of configuration options, refer to Engine Configuration.

engine=Python
max_idle_time=600
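If you do want to bound the workers for a particular model, one way to generate the file is sketched below. The helper is hypothetical, and the bounds of 1 and 4 are arbitrary; the minWorkers and maxWorkers keys are the names used in this post, so check the Engine Configuration reference for their exact semantics.

```python
def render_serving_properties(engine="Python", max_idle_time=600,
                              min_workers=None, max_workers=None):
    """Return serving.properties text; worker bounds are written only if given."""
    lines = [f"engine={engine}", f"max_idle_time={max_idle_time}"]
    if min_workers is not None:
        lines.append(f"minWorkers={min_workers}")
    if max_workers is not None:
        lines.append(f"maxWorkers={max_workers}")
    return "\n".join(lines) + "\n"


# Same content as the file above: DJL decides the number of workers.
print(render_serving_properties(), end="")

# Illustrative bounds for a model that must always have one worker available.
print(render_serving_properties(min_workers=1, max_workers=4), end="")
```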

Next, we create our model.py file, which defines the model loading and inference logic. For MMEs, each model.py file is specific to a model. Models are stored in their own paths under the model store (usually /opt/ml/model/). When loading models, they will be loaded under the model store path in their own directory. The full model.py example in this demo can be seen in the GitHub repo.

We create a model.tar.gz file that includes our model (model.joblib), model.py, serving.properties, and the optional requirements.txt:

import subprocess

# Build the tar file with the model data and inference code; replace model.joblib with your own model
tar_command = "tar -cvpzf model.tar.gz model.joblib requirements.txt model.py serving.properties"
subprocess.run(tar_command.split(), check=True)

For demonstration purposes, we make 1,000 copies of the same model.tar.gz file to represent the large number of models to be hosted. In production, you need to create a model.tar.gz file for each of your models.

Lastly, we upload these models to Amazon S3.
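The copy and upload steps could be sketched as follows. The sklearn-<index>.tar.gz naming mirrors the TargetModel values used in the load test scripts later in this post; the bucket name and key prefix are placeholders, and the demonstration at the bottom uses a stand-in file and makes no AWS calls.

```python
import os
import shutil
import tempfile


def make_copies(src_tarball, out_dir, count, prefix="sklearn"):
    """Copy one tarball to <prefix>-0.tar.gz ... <prefix>-<count-1>.tar.gz."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i in range(count):
        dst = os.path.join(out_dir, f"{prefix}-{i}.tar.gz")
        shutil.copyfile(src_tarball, dst)
        paths.append(dst)
    return paths


def upload_all(s3_client, paths, bucket, key_prefix):
    """Upload each tarball under s3://<bucket>/<key_prefix>/."""
    for path in paths:
        s3_client.upload_file(path, bucket, f"{key_prefix}/{os.path.basename(path)}")


# Local demonstration with a stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "model.tar.gz")
    with open(src, "wb") as f:
        f.write(b"placeholder")
    copies = make_copies(src, os.path.join(tmp, "models"), count=3)
    print([os.path.basename(p) for p in copies])

# With AWS credentials configured (names are placeholders):
#   import boto3
#   upload_all(boto3.client("s3"), copies, "my-bucket", "djl-mme-sklearn")
```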

Create a SageMaker model

We now create a SageMaker model. We use the ECR image defined earlier and the model artifact from the previous step to create the SageMaker model. In the model setup, we configure Mode as MultiModel. This tells DJLServing that we’re creating an MME.

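from time import gmtime, strftime

# sm_client (a Boto3 SageMaker client), role, and mme_artifacts (the S3 prefix
# that holds the model tarballs) are assumed to be defined earlier
mme_model_name = "sklearn-djl-mme" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Model name: " + mme_model_name)

create_model_response = sm_client.create_model(
    ModelName=mme_model_name,
    ExecutionRoleArn=role,
    PrimaryContainer={"Image": inference_image_uri, "Mode": "MultiModel", "ModelDataUrl": mme_artifacts},
)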

Create a SageMaker endpoint

In this demo, we use 20 ml.c5d.18xlarge instances to scale to a TPS in the thousands range. Make sure to get a limit increase on your instance type, if necessary, to achieve the TPS you are targeting.

mme_epc_name = "sklearn-djl-mme-epc" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=mme_epc_name,
    ProductionVariants=[
        {
            "VariantName": "sklearnvariant",
            "ModelName": mme_model_name,
            "InstanceType": "ml.c5d.18xlarge",
            "InitialInstanceCount": 20,
        },
    ],
)
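The endpoint configuration alone doesn't start any instances. A sketch of the remaining steps, creating the endpoint, waiting for it to be in service, and invoking one model behind it, might look like the following. The helper names and the request payload shape are assumptions; the payload has to match whatever your model.py expects.

```python
import json


def create_and_wait(sm_client, endpoint_name, endpoint_config_name):
    """Create the endpoint and block until it reaches InService."""
    sm_client.create_endpoint(
        EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
    )
    sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)


def invoke_model(runtime_client, endpoint_name, target_model, rows):
    """Invoke one model behind the MME.

    TargetModel is the tarball path relative to the model's ModelDataUrl prefix.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel=target_model,
        ContentType="application/json",
        Body=json.dumps({"inputs": rows}),  # payload shape is an assumption
    )
    return json.loads(response["Body"].read())


# With AWS credentials configured (the endpoint name is a placeholder):
#   import boto3
#   create_and_wait(boto3.client("sagemaker"), "sklearn-djl-mme-ep", mme_epc_name)
#   invoke_model(boto3.client("sagemaker-runtime"), "sklearn-djl-mme-ep",
#                "sklearn-0.tar.gz", [[0.5]])
```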

Load testing

At the time of writing, the SageMaker in-house load testing tool Amazon SageMaker Inference Recommender doesn’t natively support testing for MMEs. Therefore, we use the open source Python tool Locust. Locust is straightforward to set up and can track metrics such as TPS and end-to-end latency. For a full understanding of how to set it up with SageMaker, see Best practices for load testing Amazon SageMaker real-time inference endpoints.

In this use case, we have three different traffic patterns we want to simulate with MMEs, so we have the following three Python scripts that align with each pattern. Our goal here is to prove that, regardless of what our traffic pattern is, we can achieve the same target TPS and scale appropriately.

Evenly distributed traffic (locust_distributed.py)
90% traffic to 10 popular models (locust_top_n.py)
90% traffic to a single hot model (locust_script_one.py)

We can specify a weight in our Locust script to assign traffic across different portions of our models. For instance, with our single hot model, we implement two methods as follows:

# popular model
def sendPopular(self):
    request_meta = {
        "request_type": "InvokeEndpoint",
        "name": "SageMaker",
        "start_time": time.time(),
        "response_length": 0,
        "response": None,
        "context": {},
        "exception": None,
    }
    start_perf_counter = time.perf_counter()
    try:
        response = self.sagemaker_client.invoke_endpoint(
            EndpointName=self.endpoint_name,
            Body=self.payload,
            ContentType=self.content_type,
            TargetModel="sklearn-0.tar.gz",
        )
    # ... (the rest of this method is omitted in this excerpt)

# rest of the models
def sendRest(self):
    request_meta = {
        "request_type": "InvokeEndpoint",
        "name": "SageMaker",
        "start_time": time.time(),
        "response_length": 0,
        "response": None,
        "context": {},
        "exception": None,
    }
    start_perf_counter = time.perf_counter()

    try:
        response = self.sagemaker_client.invoke_endpoint(
            EndpointName=self.endpoint_name,
            Body=self.payload,
            ContentType=self.content_type,
            TargetModel=f"sklearn-{random.randint(1,989)}.tar.gz",
        )
        response_body = response["Body"].read()
    # ... (the rest of this method is omitted in this excerpt)

We can then assign a weight to each method, which determines the share of traffic that the method receives:

# assign weights to models
class MyUser(BotoUser):

    # 90% of traffic to singular model
    @task(9)
    def send_request(self):
        self.client.sendPopular()

    @task
    def send_request_major(self):
        self.client.sendRest()
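To see why weights of 9 and 1 produce a 90/10 split, the following standalone sketch mimics weighted task selection with the standard library. It doesn't use Locust; the pick_task helper and the sample size are illustrative only.

```python
import random
from collections import Counter


def pick_task(weights, rng):
    """Pick a task name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = {"sendPopular": 9, "sendRest": 1}  # mirrors @task(9) and @task

# Expected share of the popular model: 9 / (9 + 1).
expected_share = weights["sendPopular"] / sum(weights.values())
print(expected_share)  # 0.9

# Empirical share over many simulated requests lands close to 0.9.
rng = random.Random(0)
counts = Counter(pick_task(weights, rng) for _ in range(10_000))
observed_share = counts["sendPopular"] / 10_000
print(observed_share)
```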

For 20 ml.c5d.18xlarge instances, we see the following invocation metrics on the Amazon CloudWatch console. These values remain fairly consistent across all three traffic patterns. To understand CloudWatch metrics for SageMaker real-time inference and MMEs better, refer to SageMaker Endpoint Invocation Metrics.

You can find the rest of the Locust scripts in the locust-utils directory in the GitHub repository.

Summary

In this post, we discussed how an MME can dynamically adjust the compute power assigned to each model based on the model’s traffic pattern. This newly launched feature is available in all AWS Regions where SageMaker is available. Note that at the time of announcement, only CPU instances are supported. To learn more, refer to Supported algorithms, frameworks, and instances.

About the Authors

Ram Vegiraju is an ML Architect with the SageMaker Service team. He focuses on helping customers build and optimize their AI/ML solutions on Amazon SageMaker. In his spare time, he loves traveling and writing.

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in marketing & advertising industries.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch and spending time with his family.

Xu Deng is a Software Engineer Manager with the SageMaker team. He focuses on helping customers build and optimize their AI/ML inference experience on Amazon SageMaker. In his spare time, he loves traveling and snowboarding.

Siddharth Venkatesan is a Software Engineer in AWS Deep Learning. He currently focuses on building solutions for large model inference. Prior to AWS, he worked in the Amazon Grocery org building new payment features for customers worldwide. Outside of work, he enjoys skiing, the outdoors, and watching sports.

Rohith Nallamaddi is a Software Development Engineer at AWS. He works on optimizing deep learning workloads on GPUs, building high performance ML inference and serving solutions. Prior to this, he worked on building microservices based on AWS for Amazon F3 business. Outside of work he enjoys playing and watching sports.

Use Amazon Titan models for image generation, editing, and searching

February 19, 2024

by Rohit Mittal Amazon AWS

Amazon Bedrock provides a broad range of high-performing foundation models from Amazon and other leading AI companies, including Anthropic, AI21, Meta, Cohere, and Stability AI, and covers a wide range of use cases, including text and image generation, searching, chat, reasoning and acting agents, and more. The new Amazon Titan Image Generator model allows content creators to quickly generate high-quality, realistic images using simple English text prompts. The advanced AI model understands complex instructions with multiple objects and returns studio-quality images suitable for advertising, ecommerce, and entertainment. Key features include the ability to refine images by iterating on prompts, automatic background editing, and generating multiple variations of the same scene. Creators can also customize the model with their own data to output on-brand images in a specific style. Importantly, Titan Image Generator has in-built safeguards, like invisible watermarks on all AI-generated images, to encourage responsible use and mitigate the spread of disinformation. This innovative technology makes producing custom images in large volume for any industry more accessible and efficient.

The new Amazon Titan Multimodal Embeddings model helps build more accurate search and recommendations by understanding text, images, or both. It converts images and English text into semantic vectors, capturing meaning and relationships in your data. You can combine text and images like product descriptions and photos to identify items more effectively. The vectors power speedy, accurate search experiences. Titan Multimodal Embeddings is flexible in vector dimensions, enabling optimization for performance needs. An asynchronous API and Amazon OpenSearch Service connector make it easy to integrate the model into your neural search applications.

In this post, we walk through how to use the Titan Image Generator and Titan Multimodal Embeddings models via the AWS Python SDK.

Image generation and editing

In this section, we demonstrate the basic coding patterns for using the AWS SDK to generate new images and perform AI-powered edits on existing images. Code examples are provided in Python, and JavaScript (Node.js) is also available in this GitHub repository.

Before you can write scripts that use the Amazon Bedrock API, you need to install the appropriate version of the AWS SDK in your environment. For Python scripts, you can use the AWS SDK for Python (Boto3). Python users may also want to install the Pillow module, which facilitates image operations like loading and saving images. For setup instructions, refer to the GitHub repository.

Additionally, enable access to the Amazon Titan Image Generator and Titan Multimodal Embeddings models. For more information, refer to Model access.

Helper functions

The following function sets up the Amazon Bedrock Boto3 runtime client and generates images by taking payloads of different configurations (which we discuss later in this post):

import boto3
import json, base64, io
from random import randint
from PIL import Image

bedrock_runtime_client = boto3.client("bedrock-runtime")


def titan_image(
    payload: dict,
    num_image: int = 2,
    cfg: float = 10.0,
    seed: int = None,
    modelId: str = "amazon.titan-image-generator-v1",
) -> list:
    #   ImageGenerationConfig Options:
    #   - numberOfImages: Number of images to be generated
    #   - quality: Quality of generated images, can be standard or premium
    #   - height: Height of output image(s)
    #   - width: Width of output image(s)
    #   - cfgScale: Scale for classifier-free guidance
    #   - seed: The seed to use for reproducibility
    seed = seed if seed is not None else randint(0, 214783647)
    body = json.dumps(
        {
            **payload,
            "imageGenerationConfig": {
                "numberOfImages": num_image,  # Range: 1 to 5
                "quality": "premium",  # Options: standard/premium
                "height": 1024,  # Supported height list above
                "width": 1024,  # Supported width list above
                "cfgScale": cfg,  # Range: 1.0 (exclusive) to 10.0
                "seed": seed,  # Range: 0 to 214783647
            },
        }
    )

    response = bedrock_runtime_client.invoke_model(
        body=body,
        modelId=modelId,
        accept="application/json",
        contentType="application/json",
    )

    response_body = json.loads(response.get("body").read())
    images = [
        Image.open(io.BytesIO(base64.b64decode(base64_image)))
        for base64_image in response_body.get("images")
    ]
    return images

Generate images from text

Scripts that generate a new image from a text prompt follow this implementation pattern:

Configure a text prompt and optional negative text prompt.
Use the BedrockRuntime client to invoke the Titan Image Generator model.
Parse and decode the response.
Save the resulting images to disk.

Text-to-image

The following is a typical image generation script for the Titan Image Generator model:

# Text-to-image
# textToImageParams Options:
#   text: prompt that describes the image to generate
#   negativeText: prompts to guide the model on what you don't want in image
images = titan_image(
    {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
            "text": "two dogs walking down an urban street, facing the camera",  # Required
            "negativeText": "cars",  # Optional
        },
    }
)
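The last step of the pattern, saving the results to disk, isn't shown in the script. A minimal sketch follows, assuming images is the list of PIL images returned by titan_image; the helper name, output directory, and file naming are hypothetical.

```python
import os


def save_images(images, out_dir, stem="titan"):
    """Save each image as <out_dir>/<stem>_<index>.png and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, image in enumerate(images):
        path = os.path.join(out_dir, f"{stem}_{index}.png")
        image.save(path)  # PIL infers the PNG format from the file extension
        paths.append(path)
    return paths


# Usage, with `images` returned by titan_image(...):
#   save_images(images, "output", stem="text_to_image")
```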

This will produce images similar to the following.

Response Image 1	Response Image 2

Image variants

Image variation provides a way to generate subtle variants of an existing image. The following code snippet uses one of the images generated in the previous example to create variant images:

# Import an input image like this (only PNG/JPEG supported):
with open("<YOUR_IMAGE_FILE_PATH>", "rb") as image_file:
    input_image = base64.b64encode(image_file.read()).decode("utf8")

# Image Variation
# ImageVariationParams Options:
#   text: prompt to guide the model on how to generate variations
#   negativeText: prompts to guide the model on what you don't want in image
#   images: base64 string representation of the input image, only 1 is supported
images = titan_image(
    {
        "taskType": "IMAGE_VARIATION",
        "imageVariationParams": {
            "text": "two dogs walking down an urban street, facing the camera",  # Required
            "images": [input_image],  # One image is required
            "negativeText": "cars",  # Optional
        },
    },
)

This will produce images similar to the following.

Original Image	Response Image 1	Response Image 2

Edit an existing image

The Titan Image Generator model allows you to add, remove, or replace elements or areas within an existing image. You specify which area to affect by providing one of the following:

Mask image – A mask image is a binary image in which the 0-value pixels represent the area you want to affect and the 255-value pixels represent the area that should remain unchanged.
Mask prompt – A mask prompt is a natural language description of the elements you want to affect. An in-house text-to-segmentation model uses it to locate those elements.

For more information, refer to Prompt Engineering Guidelines.
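The mask image convention described above can be sketched in code. The following example builds a rectangular mask in which the box to edit is 0 and everything else is 255, then encodes it as a base64 PNG. The helper names and the box coordinates are illustrative, and the sketch assumes the mask has the same dimensions as the image being edited.

```python
import base64
import io


def rectangle_mask_bytes(width, height, box):
    """Raw 8-bit grayscale pixels: 0 inside box (area to edit), 255 elsewhere."""
    left, top, right, bottom = box  # right and bottom are exclusive
    left, right = max(0, left), min(width, right)
    span = max(0, right - left)
    rows = []
    for y in range(height):
        row = bytearray(b"\xff" * width)
        if top <= y < bottom and span:
            row[left:left + span] = bytes(span)  # zero-filled span
        rows.append(bytes(row))
    return b"".join(rows)


def mask_to_base64_png(width, height, box):
    """Encode the mask as a base64 PNG string for the maskImage parameter."""
    from PIL import Image  # Pillow, as used elsewhere in this post

    pixels = rectangle_mask_bytes(width, height, box)
    image = Image.frombytes("L", (width, height), pixels)
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf8")


# A 1024x1024 mask that marks the central 512x512 square for editing.
pixels = rectangle_mask_bytes(1024, 1024, (256, 256, 768, 768))
print(pixels.count(0))  # 262144 pixels will be affected
```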

Scripts that apply an edit to an image follow this implementation pattern:

Load the image to be edited from disk.
Convert the image to a base64-encoded string.
Configure the mask through one of the following methods:
1. Load a mask image from disk, encoding it as base64 and setting it as the maskImage parameter.
2. Set the maskPrompt parameter to a text description of the elements to affect.
Specify the new content to be generated using one of the following options:
1. To add or replace an element, set the text parameter to a description of the new content.
2. To remove an element, omit the text parameter completely.
Use the BedrockRuntime client to invoke the Titan Image Generator model.
Parse and decode the response.
Save the resulting images to disk.

Object editing: Inpainting with a mask image

The following is a typical image editing script for the Titan Image Generator model using maskImage. We take one of the images generated earlier and provide a mask image, where 0-value pixels are rendered as black and 255-value pixels as white. We also replace one of the dogs in the image with a cat using a text prompt.

with open("<YOUR_MASK_IMAGE_FILE_PATH>", "rb") as image_file:
    mask_image = base64.b64encode(image_file.read()).decode("utf8")

# Import an input image like this (only PNG/JPEG supported):
with open("<YOUR_ORIGINAL_IMAGE_FILE_PATH>", "rb") as image_file:
    input_image = base64.b64encode(image_file.read()).decode("utf8")

# Inpainting
# inPaintingParams Options:
#   text: prompt to guide inpainting
#   negativeText: prompts to guide the model on what you don't want in image
#   image: base64 string representation of the input image
#   maskImage: base64 string representation of the input mask image
#   maskPrompt: prompt used for auto editing to generate mask

images = titan_image(
    {
        "taskType": "INPAINTING",
        "inPaintingParams": {
            "text": "a cat",  # Optional
            "negativeText": "bad quality, low res",  # Optional
            "image": input_image,  # Required
            "maskImage": mask_image,
        },
    },
    num_image=3,
)

This will produce images similar to the following.

Original Image	Mask Image	Edited Image

Object removal: Inpainting with a mask prompt

In another example, we use maskPrompt to specify an object in the image, taken from the earlier steps, to edit. By omitting the text prompt, the object will be removed:

# Import an input image like this (only PNG/JPEG supported):
with open("<YOUR_IMAGE_FILE_PATH>", "rb") as image_file:
    input_image = base64.b64encode(image_file.read()).decode("utf8")

images = titan_image(
    {
        "taskType": "INPAINTING",
        "inPaintingParams": {
            "negativeText": "bad quality, low res",  # Optional
            "image": input_image,  # Required
            "maskPrompt": "white dog",  # One of "maskImage" or "maskPrompt" is required
        },
    },
)

This will produce images similar to the following.

Original Image	Response Image

Background editing: Outpainting

Outpainting is useful when you want to replace the background of an image. You can also extend the bounds of an image for a zoom-out effect. In the following example script, we use maskPrompt to specify which object to keep; you can also use maskImage. The parameter outPaintingMode specifies whether to allow modification of the pixels inside the mask. If set as DEFAULT, pixels inside of the mask are allowed to be modified so that the reconstructed image will be consistent overall. This option is recommended if the maskImage provided doesn’t represent the object with pixel-level precision. If set as PRECISE, the modification of pixels inside of the mask is prevented. This option is recommended if using a maskPrompt or a maskImage that represents the object with pixel-level precision.

# Import an input image like this (only PNG/JPEG supported):
with open("<YOUR_IMAGE_FILE_PATH>", "rb") as image_file:
    input_image = base64.b64encode(image_file.read()).decode("utf8")

# OutPaintingParams Options:
#   text: prompt to guide outpainting
#   negativeText: prompts to guide the model on what you don't want in image
#   image: base64 string representation of the input image
#   maskImage: base64 string representation of the input mask image
#   maskPrompt: prompt used for auto editing to generate mask
#   outPaintingMode: DEFAULT | PRECISE
images = titan_image(
    {
        "taskType": "OUTPAINTING",
        "outPaintingParams": {
            "text": "forest",  # Required
            "image": input_image,  # Required
            "maskPrompt": "dogs",  # One of "maskImage" or "maskPrompt" is required
            "outPaintingMode": "PRECISE",  # One of "PRECISE" or "DEFAULT"
        },
    },
    num_image=3,
)

This will produce images similar to the following.

Original Image	Text	Response Image
	“beach”
	“forest”

In addition, the effects of different values for outPaintingMode, with a maskImage that doesn’t outline the object with pixel-level precision, are as follows.

Original Image	Mask Image	Text	outPaintingMode	Response Image
		“forest”	DEFAULT
		“forest”	PRECISE

This section has given you an overview of the operations you can perform with the Titan Image Generator model. Specifically, these scripts demonstrate text-to-image, image variation, inpainting, and outpainting tasks. You should be able to adapt the patterns for your own applications by referencing the parameter details for those task types detailed in Amazon Titan Image Generator documentation.

Multimodal embedding and searching

You can use the Amazon Titan Multimodal Embeddings model for enterprise tasks such as image search and similarity-based recommendation, and it has built-in mitigation that helps reduce bias in search results. It offers multiple embedding dimension sizes so you can choose the best latency/accuracy trade-off for your needs, and it can be customized through a simple API to adapt to your own data while preserving data security and privacy. Amazon Titan Multimodal Embeddings is provided as simple APIs for real-time or asynchronous batch transform search and recommendation applications, and can be connected to different vector databases, including Amazon OpenSearch Service.

Helper functions

The following function converts an image, and optionally text, into multimodal embeddings:

def titan_multimodal_embedding(
    image_path: str = None,  # maximum 2048 x 2048 pixels
    description: str = None,  # English only and max input tokens 128
    dimension: int = 1024,  # 1,024 (default), 384, 256
    model_id: str = "amazon.titan-embed-image-v1",
):
    payload_body = {}
    embedding_config: dict = {"embeddingConfig": {"outputEmbeddingLength": dimension}}

    # You can specify either text or image or both
    if image_path:
        # Maximum image size supported is 2048 x 2048 pixels
        with open(image_path, "rb") as image_file:
            payload_body["inputImage"] = base64.b64encode(image_file.read()).decode(
                "utf8"
            )
    if description:
        payload_body["inputText"] = description

    assert payload_body, "please provide either an image and/or a text description"
    print("\n".join(payload_body.keys()))

    response = bedrock_runtime_client.invoke_model(
        body=json.dumps({**payload_body, **embedding_config}),
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
    )

    return json.loads(response.get("body").read())

The following function returns the multimodal embeddings most similar to a given query embedding. Note that in practice, you can use a managed vector database, such as OpenSearch Service. The following example is for illustration purposes:

from scipy.spatial.distance import cdist
import numpy as np

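def search(query_emb: np.ndarray, indexes: np.ndarray, top_k: int = 1):
    # cdist expects 2-D inputs: query_emb is (1, dim), indexes is (n_items, dim)
    dist = cdist(query_emb, indexes, metric="cosine")
    # Return the indices and the distances of the top_k nearest items
    return dist.argsort(axis=-1)[0, :top_k], np.sort(dist, axis=-1)[0, :top_k]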

Synthetic dataset

For illustration purposes, we use Anthropic’s Claude 2.1 model in Amazon Bedrock to randomly generate seven different products, each with three variants, using the following prompt:

Generate a list of 7 items description for an online e-commerce shop, each comes with 3 variants of color or type. All with separate full sentence description.

The following is the list of returned outputs:

1. T-shirt
- A red cotton t-shirt with a crew neck and short sleeves.
- A blue cotton t-shirt with a v-neck and short sleeves.
- A black polyester t-shirt with a scoop neck and cap sleeves.

2. Jeans
- Classic blue relaxed fit denim jeans with a mid-rise waist.
- Black skinny fit denim jeans with a high-rise waist and ripped details at the knees.
- Stonewash straight leg denim jeans with a standard waist and front pockets.

3. Sneakers
- White leather low-top sneakers with an almond toe cap and thick rubber outsole.
- Gray mesh high-top sneakers with neon green laces and a padded ankle collar.
- Tan suede mid-top sneakers with a round toe and ivory rubber cupsole.

4. Backpack
- A purple nylon backpack with padded shoulder straps, front zipper pocket and laptop sleeve.
- A gray canvas backpack with brown leather trims, side water bottle pockets and drawstring top closure.
- A black leather backpack with multiple interior pockets, top carry handle and adjustable padded straps.

5. Smartwatch
- A silver stainless steel smartwatch with heart rate monitor, GPS tracker and sleep analysis.
- A space gray aluminum smartwatch with step counter, phone notifications and calendar syncing.
- A rose gold smartwatch with activity tracking, music controls and customizable watch faces.

6. Coffee maker
- A 12-cup programmable coffee maker in brushed steel with removable water tank and keep warm plate.
- A compact 5-cup single serve coffee maker in matt black with travel mug auto-dispensing feature.
- A retro style stovetop percolator coffee pot in speckled enamel with stay-cool handle and glass knob lid.

7. Yoga mat
- A teal 4mm thick yoga mat made of natural tree rubber with moisture-wicking microfiber top.
- A purple 6mm thick yoga mat made of eco-friendly TPE material with integrated carrying strap.
- A patterned 5mm thick yoga mat made of PVC-free material with towel cover included.

Assign the preceding response to the variable response_cat. We then use the Titan Image Generator model to create a product image for each item:

import re

def extract_text(input_string):
    pattern = r"- (.*?)($|\n)"
    matches = re.findall(pattern, input_string)
    extracted_texts = [match[0] for match in matches]
    return extracted_texts

product_description = extract_text(response_cat)

titles = []
for prompt in product_description:
    images = titan_image(
        {
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {
                "text": prompt,  # Required
            },
        },
        num_image=1,
    )
    title = "_".join(prompt.split()[:4]).lower()
    titles.append(title)
    images[0].save(f"{title}.png", format="png")

All the generated images can be found in the appendix at the end of this post.
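As a quick check of the parsing and file-naming steps above, the following minimal sketch runs them on a shortened sample of the product list without calling any model. The helper names `extract_descriptions` and `to_title` are illustrative and not part of the original code. Because each file name is built from only the first four words of a description, it can be worth confirming that the names are unique before saving images, so that one product does not overwrite another.

```python
import re


def extract_descriptions(listing: str) -> list:
    # Capture the text after "- " on every line that starts with a dash.
    return [m.strip() for m in re.findall(r"^- (.*)$", listing, flags=re.MULTILINE)]


def to_title(description: str, n_words: int = 4) -> str:
    # Same naming rule as the loop above: first four words, lowercased, joined by "_".
    return "_".join(description.split()[:n_words]).lower()


sample = """1. T-shirt
- A red cotton t-shirt with a crew neck and short sleeves.
- A blue cotton t-shirt with a v-neck and short sleeves.

2. Jeans
- Classic blue relaxed fit denim jeans with a mid-rise waist.
"""

descriptions = extract_descriptions(sample)
titles = [to_title(d) for d in descriptions]

# Duplicate titles would cause later images to overwrite earlier ones.
duplicates = {t for t in titles if titles.count(t) > 1}
print(titles, "duplicates:", duplicates)
```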

Multimodal dataset indexing

Use the following code for multimodal dataset indexing:

multimodal_embeddings = []
for image_filename, description in zip(titles, product_description):
    embedding = titan_multimodal_embedding(f"{image_filename}.png", dimension=1024)["embedding"]
    multimodal_embeddings.append(embedding)

Multimodal searching

Use the following code for multimodal searching:

query_prompt = "<YOUR_QUERY_TEXT>"
query_embedding = titan_multimodal_embedding(description=query_prompt, dimension=1024)["embedding"]
# If searching via Image
# query_image_filename = "<YOUR_QUERY_IMAGE>"
# query_embedding = titan_multimodal_embedding(image_path=query_image_filename, dimension=1024)["embedding"]
idx_returned, dist = search(np.array(query_embedding)[None], np.array(multimodal_embeddings))

The following are some search results.

Query	Results
“sneaker”
“white sneaker”
“leather backpack”
“purple backpack”
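The search step returns positions in the embedding list, so they still need to be mapped back to product names. The following is a minimal NumPy-only sketch of that mapping, assuming toy three-dimensional vectors in place of real Titan embeddings. The function name `cosine_top_k` and the vectors are illustrative assumptions, and the labels follow the file-naming rule used earlier.

```python
import numpy as np


def cosine_top_k(query: np.ndarray, index: np.ndarray, top_k: int = 1):
    # Normalize so that the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    dist = 1.0 - m @ q  # cosine distance per indexed item, shape (n_items,)
    order = np.argsort(dist)[:top_k]  # smallest distance first
    return order, dist[order]


# Toy vectors for illustration only; real embeddings have 256, 384, or 1,024 dimensions.
labels = [
    "white_leather_low-top_sneakers",
    "a_purple_nylon_backpack",
    "a_black_leather_backpack",
]
index = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]])
query = np.array([0.0, 1.0, 0.05])

ids, dists = cosine_top_k(query, index, top_k=2)
for i, d in zip(ids, dists):
    print(f"{labels[i]}: cosine distance {d:.4f}")
```

With a managed vector database, the same lookup would typically be handled by the store's nearest-neighbor query instead of an in-memory array.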

Conclusion

The post introduces the Amazon Titan Image Generator and Amazon Titan Multimodal Embeddings models. Titan Image Generator enables you to create custom, high-quality images from text prompts. Key features include iterating on prompts, automatic background editing, and data customization. It has safeguards like invisible watermarks to encourage responsible use. Titan Multimodal Embeddings converts text, images, or both into semantic vectors to power accurate search and recommendations. We then provided Python code samples for using these services, and demonstrated generating images from text prompts and iterating on those images; editing existing images by adding, removing, or replacing elements specified by mask images or mask text; creating multimodal embeddings from text, images, or both; and searching for similar multimodal embeddings to a query. We also demonstrated using a synthetic e-commerce dataset indexed and searched using Titan Multimodal Embeddings. The aim of this post is to enable developers to start using these new AI services in their applications. The code patterns can serve as templates for custom implementations.

All the code is available on the GitHub repository. For more information, refer to the Amazon Bedrock User Guide.

About the Authors

Rohit Mittal is a Principal Product Manager at Amazon AI building multi-modal foundation models. He recently led the launch of the Amazon Titan Image Generator model as part of the Amazon Bedrock service. Experienced in AI/ML, NLP, and Search, he is interested in building products that solve customer pain points with innovative technology.

Dr. Ashwin Swaminathan is a Computer Vision and Machine Learning researcher, engineer, and manager with 12+ years of industry experience and 5+ years of academic research experience. Strong fundamentals and proven ability to quickly gain knowledge and contribute to newer and emerging areas.

Dr. Yusheng Xie is a Principal Applied Scientist at Amazon AGI. His work focuses on building multi-modal foundation models. Before joining AGI, he led various multi-modal AI development efforts at AWS, such as Amazon Titan Image Generator and Amazon Textract Queries.

Dr. Hao Yang is a Principal Applied Scientist at Amazon. His main research interests are object detection and learning with limited annotations. Outside work, Hao enjoys watching films, photography, and outdoor activities.

Dr. Davide Modolo is an Applied Science Manager at Amazon AGI, working on building large multimodal foundational models. Before joining Amazon AGI, he was a manager/lead for 7 years in AWS AI Labs (Amazon Bedrock and Amazon Rekognition). Outside of work, he enjoys traveling and playing any kind of sport, especially soccer.

Dr. Baichuan Sun is a Sr. AI/ML Solutions Architect at AWS, focusing on generative AI. He applies his knowledge of data science and machine learning to provide practical, cloud-based business solutions. With experience in management consulting and AI solution architecture, he addresses a range of complex challenges, including robotics computer vision, time series forecasting, and predictive maintenance, among others. His work is grounded in a solid background of project management, software R&D, and academic pursuits. Outside of work, Dr. Sun enjoys the balance of traveling and spending time with family and friends.

Dr. Kai Zhu currently works as a Cloud Support Engineer at AWS, helping customers with issues in AI/ML-related services such as SageMaker and Bedrock. He is a SageMaker Subject Matter Expert. Experienced in data science and data engineering, he is interested in building generative AI-powered projects.

Kris Schultz has spent over 25 years bringing engaging user experiences to life by combining emerging technologies with world class design. In his role as Senior Product Manager, Kris helps design and build AWS services to power Media & Entertainment, Gaming, and Spatial Computing.

Appendix

In the following sections, we demonstrate challenging sample use cases like text insertion, hands, and reflections to highlight the capabilities of the Titan Image Generator model. We also include the sample output images produced in earlier examples.

Text

The Titan Image Generator model excels at complex workflows like inserting readable text into images. This example demonstrates Titan’s ability to clearly render uppercase and lowercase letters in a consistent style within an image.

a corgi wearing a baseball cap with text “genai”	a happy boy giving a thumbs up, wearing a tshirt with text “generative AI”

Hands

The Titan Image Generator model also has the ability to generate detailed AI images. The image shows realistic hands and fingers with visible detail, going beyond more basic AI image generation that may lack such specificity. In the following examples, notice the precise depiction of the pose and anatomy.

a person’s hand viewed from above	a close look at a person’s hands holding a coffee mug

Mirror

The images generated by the Titan Image Generator model spatially arrange objects and accurately reflect mirror effects, as demonstrated in the following examples.

A cute fluffy white cat stands on its hind legs, peering curiously into an ornate golden mirror. In the reflection the cat sees itself	beautiful sky lake with reflections on the water

Synthetic product images

The following are the product images generated earlier in this post for the Titan Multimodal Embeddings model.

Build a contextual chatbot application using Knowledge Bases for Amazon Bedrock

February 19, 2024

by Manish Chugh Amazon AWS

Modern chatbots can serve as digital agents, providing a new avenue for delivering 24/7 customer service and support across many industries. Their popularity stems from the ability to respond to customer inquiries in real time and handle multiple queries simultaneously in different languages. Chatbots also offer valuable data-driven insights into customer behavior while scaling effortlessly as the user base grows; therefore, they present a cost-effective solution for engaging customers. Chatbots use the advanced natural language capabilities of large language models (LLMs) to respond to customer questions. They can understand conversational language and respond naturally. However, chatbots that merely answer basic questions have limited utility. To become trusted advisors, chatbots need to provide thoughtful, tailored responses.

One way to enable more contextual conversations is by linking the chatbot to internal knowledge bases and information systems. Integrating proprietary enterprise data from internal knowledge bases enables chatbots to contextualize their responses to each user’s individual needs and interests. For example, a chatbot could suggest products that match a shopper’s preferences and past purchases, explain details in language adapted to the user’s level of expertise, or provide account support by accessing the customer’s specific records. The ability to intelligently incorporate information, understand natural language, and provide customized replies in a conversational flow allows chatbots to deliver real business value across diverse use cases.

The popular architecture pattern of Retrieval Augmented Generation (RAG) is often used to augment user query context and responses. RAG combines the capabilities of LLMs with the grounding in facts and real-world knowledge that comes from retrieving relevant texts and passages from a corpus of data. These retrieved texts are then used to inform and ground the output, reducing hallucination and improving relevance.

In this post, we illustrate contextually enhancing a chatbot by using Knowledge Bases for Amazon Bedrock, a fully managed serverless service. The Knowledge Bases for Amazon Bedrock integration allows our chatbot to provide more relevant, personalized responses by linking user queries to related information data points. Internally, Amazon Bedrock uses embeddings stored in a vector database to augment user query context at runtime and enable a managed RAG architecture solution. We use the Amazon letters to shareholders dataset to develop this solution.

Retrieval Augmented Generation

RAG is an approach to natural language generation that incorporates information retrieval into the generation process. RAG architecture involves two key workflows: data preprocessing through ingestion, and text generation using enhanced context.

The data ingestion workflow uses LLMs to create embedding vectors that represent the semantic meaning of texts. Embeddings are created for documents and user questions. The documents are split into chunks, and the chunk embeddings are stored as indexes in a vector database. The text generation workflow then takes a question’s embedding vector and uses it to retrieve the most similar document chunks based on vector similarity. It augments prompts with these relevant chunks to generate an answer using the LLM. For more details, refer to the Primer on Retrieval Augmented Generation, Embeddings, and Vector Databases section in Preview – Connect Foundation Models to Your Company Data Sources with Agents for Amazon Bedrock.

The following diagram illustrates the high-level RAG architecture.
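To make the two workflows concrete, the following is a minimal, self-contained sketch of the retrieve-then-augment step. It assumes a toy word-count "embedding" in place of a real embedding model and stops at building the augmented prompt instead of calling an LLM. All function names here are illustrative, and the sample chunks paraphrase facts from the shareholder letters used later in this post.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in embedding: lowercase word counts (not a real embedding model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list, top_k: int = 1) -> list:
    # Rank stored chunks by similarity to the question and keep the best ones.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list) -> str:
    # Augment the user question with the retrieved context.
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "AWS revenue grew 29% year-over-year in 2022.",
    "Graviton3 chips provide 25% better performance than Graviton2 processors.",
    "The first inference chip, Inferentia, launched in 2019.",
]
question = "How much better is Graviton3 performance?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)
```

In a real system, `embed` would call an embedding model and `retrieve` would query a vector database; the control flow stays the same.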

Although the RAG architecture has many advantages, it involves multiple components, including a database, retrieval mechanism, prompt, and generative model. Managing these interdependent parts can introduce complexities in system development and deployment. The integration of retrieval and generation also requires additional engineering effort and computational resources. Some open source libraries provide wrappers to reduce this overhead; however, changes to libraries can introduce errors and add additional overhead of versioning. Even with open source libraries, significant effort is required to write code, determine optimal chunk size, generate embeddings, and more. This setup work alone can take weeks depending on data volume.

Therefore, a managed solution that handles these undifferentiated tasks could streamline and accelerate the process of implementing and managing RAG applications.

Knowledge Bases for Amazon Bedrock

Knowledge Bases for Amazon Bedrock is a serverless option to build powerful conversational AI systems using RAG. It offers fully managed data ingestion and text generation workflows.

For data ingestion, it handles creating, storing, managing, and updating text embeddings of document data in the vector database automatically. It splits the documents into manageable chunks for efficient retrieval. The chunks are then converted to embeddings and written to a vector index, while allowing you to see the source documents when answering a question.

For text generation, Amazon Bedrock provides the RetrieveAndGenerate API to create embeddings of user queries, and retrieves relevant chunks from the vector database to generate accurate responses. It also supports source attribution and short-term memory needed for RAG applications.

This enables you to focus on your core business applications and removes the undifferentiated heavy lifting.

Solution overview

The solution presented in this post uses a chatbot created using a Streamlit application and includes the following AWS services:

Amazon Simple Storage Service (Amazon S3) as source
Knowledge Bases for Amazon Bedrock for data ingestion
An Amazon OpenSearch Serverless vector store to save text embeddings
AWS Lambda as an API function to invoke the Knowledge Bases API

The following diagram is a common solution architecture pattern you can use to integrate any chatbot application to Knowledge Bases for Amazon Bedrock.

This architecture includes the following steps:

A user interacts with the Streamlit chatbot interface and submits a query in natural language
This triggers a Lambda function, which invokes the Knowledge Bases RetrieveAndGenerate API. Internally, Knowledge Bases uses an Amazon Titan embedding model to convert the user query to a vector and find chunks that are semantically similar to the user query. The user prompt is then augmented with the chunks that are retrieved from the knowledge base. The prompt alongside the additional context is then sent to an LLM for response generation. In this solution, we use Anthropic Claude Instant as our LLM to generate user responses using additional context. Note that this solution is supported in Regions where Anthropic Claude on Amazon Bedrock is available.
A contextually relevant response is sent back to the chatbot application and user.
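The Lambda step in this flow can be sketched as follows. This is a minimal illustration, not the function shipped in the GitHub repo: the event fields (`question`, `sessionId`), the `handle` signature, and the placeholder knowledge base ID are assumptions, and the model ARN is only an example that you should replace with the one for your Region and model. The `client` argument is expected to be a Boto3 `bedrock-agent-runtime` client, passed in so the function can be exercised with a stub.

```python
import json

# Placeholder values (assumptions): substitute your own knowledge base ID and model ARN.
KNOWLEDGE_BASE_ID = "<YOUR_KNOWLEDGE_BASE_ID>"
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1"


def build_request(question: str, session_id: str = None) -> dict:
    # Request shape for the RetrieveAndGenerate API with a knowledge base.
    request = {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KNOWLEDGE_BASE_ID,
                "modelArn": MODEL_ARN,
            },
        },
    }
    if session_id:
        # Only sent on follow-up turns; the first call lets the service create one.
        request["sessionId"] = session_id
    return request


def handle(event: dict, client) -> dict:
    # client: expected to be boto3.client("bedrock-agent-runtime").
    response = client.retrieve_and_generate(
        **build_request(event["question"], event.get("sessionId"))
    )
    body = {"answer": response["output"]["text"], "sessionId": response["sessionId"]}
    return {"statusCode": 200, "body": json.dumps(body)}
```

In an actual Lambda function, you would create the client once outside the handler and call `handle(event, client)` from the standard `lambda_handler(event, context)` entry point.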

Prerequisites

Amazon Bedrock users need to request access to foundation models before they are available for use. This is a one-time action and takes less than a minute. For this solution, you’ll need to enable access to the Titan Embeddings G1 – Text and Claude Instant – v1.2 models in Amazon Bedrock. For more information, refer to Model access.

Clone the GitHub repo

The solution presented in this post is available in the following GitHub repo. You need to clone the GitHub repository to your local machine. Open a terminal window and run the following command. Note this is one single git clone command.

git clone --depth 2 --filter=blob:none --no-checkout https://github.com/aws-samples/amazon-bedrock-samples && cd amazon-bedrock-samples && git checkout main rag-solutions/contextual-chatbot-using-knowledgebase

Upload your knowledge dataset to Amazon S3

We download the dataset for our knowledge base and upload it into an S3 bucket. This dataset will feed and power the knowledge base. Complete the following steps:

Navigate to the Annual reports, proxies and shareholder letters data repository and download the last few years of Amazon shareholder letters.
On the Amazon S3 console, choose Buckets in the navigation pane.
Choose Create bucket.
Name the bucket knowledgebase-<your-awsaccount-number>.
Leave all other bucket settings as default and choose Create.
Navigate to the knowledgebase-<your-awsaccount-number> bucket.
Choose Create folder and name it dataset.
Leave all other folder settings as default and choose Create.
Navigate back to the bucket home and choose Create folder to create a new folder and name it lambdalayer.
Leave all other settings as default and choose Create.
Navigate to the dataset folder.
Upload the annual reports, proxies and shareholder letters dataset files you downloaded earlier to this bucket and choose Upload.
Navigate to the lambdalayer folder.
Upload the knowledgebase-lambdalayer.zip file available under the /lambda/layer folder in the GitHub repo you cloned earlier and choose Upload. You will use this Lambda layer code later to create the Lambda function.

Create a knowledge base

In this step, we create a knowledge base using the Amazon shareholder letters dataset we uploaded to our S3 bucket in the previous step.

On the Amazon Bedrock console, under Orchestration in the navigation pane, choose Knowledge base.
Choose Create knowledge base.
In the Knowledge base details section, enter a name and optional description.
In the IAM permissions section, select Create and use a new service role and enter a name for the role.
Add tags as needed.
Choose Next.
Leave Data source name as the default name.
For S3 URI, choose Browse S3 to choose the S3 bucket knowledgebase-<your-awsaccount-number>/dataset/. You need to point to the bucket and dataset folder you created in the previous steps.
In the Advanced settings section, leave the default values (if you want, you can change the default chunking strategy and specify the chunk size and overlap percentage).
Choose Next.
For Embeddings model, select Titan Embedding G1 – Text.
For Vector database, you can either select Quick create a new vector store or Choose a vector store you have created. Note that, to use the vector store of your choice, you need to have a vector store preconfigured. We currently support four vector engine types: the vector engine for Amazon OpenSearch Serverless, Amazon Aurora, Pinecone, and Redis Enterprise Cloud. For this post, we select Quick create a new vector store, which by default creates a new OpenSearch Serverless vector store in your account.
Choose Next.
On the Review and create page, review all the information, or choose Previous to modify any options.
Choose Create knowledge base. Note that the knowledge base creation process begins and the status is In progress. It will take a few minutes to create the vector store and knowledge base. Don’t navigate away from the page, otherwise creation will fail.
When the knowledge base status is in the Ready state, note down the knowledge base ID. You will use it in the next steps to configure the Lambda function.
Now that the knowledge base is ready, we need to sync our Amazon shareholder letters data to it. In the Data Source section of the knowledge base details page, choose Sync to trigger the data ingestion process from the S3 bucket to the knowledge base.

This sync process splits the document files into smaller chunks of the chunk size specified earlier, generates vector embeddings using the selected text embedding model, and stores them in the vector store managed by Knowledge Bases for Amazon Bedrock.
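The chunking step can be pictured with the following minimal sketch of fixed-size chunking with a percentage overlap. It is an illustration of the general technique, not the service's internal implementation: whitespace-separated words stand in for model tokens, and the default values for `chunk_size` and `overlap_pct` are illustrative assumptions.

```python
def chunk_tokens(tokens: list, chunk_size: int = 300, overlap_pct: int = 20) -> list:
    # Split a token list into fixed-size chunks that share a percentage of tokens.
    if chunk_size <= 0 or not 0 <= overlap_pct < 100:
        raise ValueError("chunk_size must be positive and overlap_pct in [0, 100)")
    overlap = chunk_size * overlap_pct // 100
    step = chunk_size - overlap  # how far each new chunk advances; always >= 1
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the document
    return chunks


text = "one two three four five six seven eight nine ten"
for chunk in chunk_tokens(text.split(), chunk_size=4, overlap_pct=50):
    print(" ".join(chunk))
```

Overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk, at the cost of storing some tokens twice.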

When the dataset sync is complete, the status of the data source will change to the Ready state. Note that, if you add any additional documents in the S3 data folder, you need to re-sync the knowledge base.

Congratulations, your knowledge base is ready.

Note that you can also use Knowledge Bases for Amazon Bedrock service APIs and the AWS Command Line Interface (AWS CLI) to programmatically create a knowledge base. You will need to run various sections of the Jupyter notebook provided under the /notebook folder in the GitHub repo.
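As one example of the programmatic route, the sync step can be started and polled through the Boto3 `bedrock-agent` client. The following is a minimal sketch under stated assumptions: the function name, polling interval, and the set of terminal statuses checked are illustrative, and the client is passed in as an argument so the logic can be tried with a stub. The notebook in the repo remains the reference for the full creation flow.

```python
import time

TERMINAL_STATUSES = ("COMPLETE", "FAILED")


def sync_data_source(client, knowledge_base_id: str, data_source_id: str,
                     poll_seconds: float = 5.0, max_polls: int = 120) -> str:
    # client: expected to be boto3.client("bedrock-agent").
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id, dataSourceId=data_source_id
    )["ingestionJob"]
    for _ in range(max_polls):
        if job["status"] in TERMINAL_STATUSES:
            return job["status"]
        time.sleep(poll_seconds)
        # Re-read the job to pick up its latest status.
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    if job["status"] in TERMINAL_STATUSES:
        return job["status"]
    raise TimeoutError("ingestion job did not finish within the polling budget")
```

You would call it with the knowledge base ID noted earlier and the ID of its data source, and treat any result other than COMPLETE as a signal to inspect the job in the console.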

Create a Lambda function

This Lambda function is deployed using an AWS CloudFormation template available in the GitHub repo under the /cfn folder. The template requires two parameters: the S3 bucket name and the knowledge base ID.

On the AWS CloudFormation service home page, choose Create stack to create a new stack.
Select Template is ready for Prepare template.
Select Upload the template file for Template source.
Choose Choose file, navigate to the GitHub repo you cloned earlier, and choose the .yaml file under the /cfn folder.
Choose Next.
For Stack name, enter a name.
In the Parameters section, enter the knowledge base ID and S3 bucket name you noted down earlier.
Choose Next.
Leave all default options as is, choose Next, and choose Submit.
Verify that the CloudFormation template ran successfully, and there are no errors.

Congratulations, you have created a Lambda function, related roles, and policies successfully.

Test the contextual chatbot application

To test your chatbot application, complete the following steps:

Open a new terminal or a command line window on your machine.
Run the following command to install the AWS SDK for Python (Boto3). Boto3 makes it straightforward to integrate a Python application, library, or script with AWS services.
```
pip install boto3
```
Run the following command to install and set up a local Python development environment to run the Streamlit application:
```
pip install streamlit
```
Navigate to the /streamlit folder in the GitHub repository folder you cloned earlier.
Run the following command to instantiate the chatbot application:
```
python -m streamlit run chatbot.py
```

This should open a web-based chat application powered by Streamlit in your default web browser.

Use this Streamlit chatbot application to post natural language questions to start the conversations powered by Knowledge Bases for Amazon Bedrock.

When you submit a prompt, the Streamlit app triggers the Lambda function, which invokes the Knowledge Bases RetrieveAndGenerate API to search and generate responses.

The following table includes some sample questions and related knowledge base responses. Try out some of these questions by using prompts.

Questions	Answers
What is Amazon doing in the field of generative AI?	Amazon has been working on their own large language models (LLMs) for generative AI and believes it will transform and improve every customer experience. They plan to continue investing substantially in these models across all their consumer, seller, brand, and creator experiences.
What is AWS year-over-year revenue in 2022?	AWS revenue grew 29% year-over-year in 2022 on a $62 billion revenue base.
How many days has Amazon asked employees to come to work in office?	Amazon has asked corporate employees to come back to office at least three days a week beginning May 2022.
By what percentage did AWS revenue grow year-over-year in 2022?	AWS had a 29% year-over-year (‘YoY’) revenue in 2022.
Compared to Graviton2 processors, what performance improvement did Graviton3 chips deliver according to the passage?	In 2022, AWS delivered their Graviton3 chips, providing 25% better performance than the Graviton2 processors.
Which was the first inference chip launched by AWS according to the passage?	AWS launched their first inference chips (“Inferentia”) in 2019, and they have saved companies like Amazon over a hundred million dollars in capital expense.
According to the context, in what year did Amazon’s annual revenue increase from $245B to $434B?	Amazon’s annual revenue increased from $245B in 2019 to $434B in 2022.
Tell me again what was the revenue in 2019?	Amazon’s revenue in 2019 was $245 billion.
and, 2021?	Amazon’s revenue in 2021 was $469.8 billion, an increase of 22% over 2020.
And, remind me again when was the first inference chip was launched?	Amazon’s first inference chip was Inferentia, which launched in 2019.

During the first call to the Lambda function, the RetrieveAndGenerate API returns a sessionId, which is then passed by the Streamlit app along with the subsequent user prompt as an input to the RetrieveAndGenerate API to continue the conversation in the same session. The RetrieveAndGenerate API manages the short-term memory and uses the chat history as long as the same sessionId is passed as an input in the successive calls.
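The session handling described here amounts to remembering the ID returned by the previous call and sending it with the next prompt. The following is a minimal sketch of that pattern on the client side. The `ChatSession` class and the `fake_backend` function are hypothetical stand-ins for the Streamlit app's state and the Lambda call, not code from the repo.

```python
from typing import Callable, List, Optional, Tuple

# ask_fn stands in for the Lambda call: (prompt, session_id or None) -> (answer, session_id)
AskFn = Callable[[str, Optional[str]], Tuple[str, str]]


class ChatSession:
    def __init__(self, ask_fn: AskFn):
        self._ask = ask_fn
        self.session_id: Optional[str] = None  # unknown until the first response
        self.history: List[Tuple[str, str]] = []

    def send(self, prompt: str) -> str:
        # Pass the stored ID (None on the first turn) and keep the one returned.
        answer, self.session_id = self._ask(prompt, self.session_id)
        self.history.append((prompt, answer))
        return answer

    def reset(self) -> None:
        # Dropping the ID makes the next call start a fresh conversation.
        self.session_id = None
        self.history.clear()


def fake_backend(prompt: str, session_id: Optional[str]) -> Tuple[str, str]:
    # Hypothetical stand-in: issues an ID on the first call and echoes it afterwards.
    return f"echo: {prompt}", session_id or "session-1"


chat = ChatSession(fake_backend)
chat.send("Tell me again what was the revenue in 2019?")
chat.send("and, 2021?")
print(chat.session_id, len(chat.history))
```

Follow-up questions such as "and, 2021?" in the preceding table only make sense because the same session ID is sent on each turn.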

Congratulations, you have successfully created and tested a chatbot application using Knowledge Bases for Amazon Bedrock.

Clean up

Failing to delete resources such as the S3 bucket, OpenSearch Serverless collection, and knowledge base will incur charges. To clean up these resources, delete the CloudFormation stack, delete the S3 bucket (including any document folders and files stored in that bucket), delete the OpenSearch Serverless collection, delete the knowledge base, and delete any roles, policies, and permissions that you created earlier.

Conclusion

In this post, we provided an overview of contextual chatbots and explained why they’re important. We described the complexities involved in data ingestion and text generation workflows for a RAG architecture. We then introduced how Knowledge Bases for Amazon Bedrock creates a fully managed serverless RAG system, including a vector store. Finally, we provided a solution architecture and sample code in a GitHub repo to retrieve and generate contextual responses for a chatbot application using a knowledge base.

By explaining the value of contextual chatbots, the challenges of RAG systems, and how Knowledge Bases for Amazon Bedrock addresses those challenges, this post aimed to showcase how Amazon Bedrock enables you to build sophisticated conversational AI applications with minimal effort.

For more information, see the Amazon Bedrock Developer Guide and Knowledge Base APIs.

About the Authors

Manish Chugh is a Principal Solutions Architect at AWS based in San Francisco, CA. He specializes in machine learning and generative AI. He works with organizations ranging from large enterprises to early-stage startups on problems related to machine learning. His role involves helping these organizations architect scalable, secure, and cost-effective workloads on AWS. He regularly presents at AWS conferences and other partner events. Outside of work, he enjoys hiking on East Bay trails, road biking, and watching (and playing) cricket.

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Pallavi Nargund is a Principal Solutions Architect at AWS. In her role as a cloud technology enabler, she works with customers to understand their goals and challenges, and gives prescriptive guidance to achieve their objectives with AWS offerings. She is passionate about women in technology and is a core member of Women in AI/ML at Amazon. She speaks at internal and external conferences such as AWS re:Invent, AWS Summits, and webinars. Outside of work, she enjoys volunteering, gardening, cycling, and hiking.

Amazon and UCLA announce 2023 Science Hub awards

February 16, 2024

by Amazon AWS

Gift project and sponsored awards examine topics that range from improving 3D garment modeling to reducing false positives in code analysis.

Code Llama 70B is now available in Amazon SageMaker JumpStart

February 16, 2024

by Kyle Ulrich Amazon AWS

Today, we are excited to announce that Code Llama foundation models, developed by Meta, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Code Llama model via SageMaker JumpStart.

Code Llama

Code Llama is a model released by Meta that is built on top of Llama 2. This state-of-the-art model is designed to improve productivity for programming tasks for developers by helping them create high-quality, well-documented code. The models excel in Python, C++, Java, PHP, C#, TypeScript, and Bash, and have the potential to save developers’ time and make software workflows more efficient.

It comes in three variants, engineered to cover a wide variety of applications: the foundational model (Code Llama), a Python specialized model (Code Llama Python), and an instruction-following model for understanding natural language instructions (Code Llama Instruct). All Code Llama variants come in four sizes: 7B, 13B, 34B, and 70B parameters. The 7B and 13B base and instruct variants support infilling based on surrounding content, making them ideal for code assistant applications. The models were designed using Llama 2 as the base and then trained on 500 billion tokens of code data, with the Python specialized version trained on an incremental 100 billion tokens. The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.

The model is made available under the same community license as Llama 2.

Foundation models in SageMaker

SageMaker JumpStart provides access to a range of models from popular model hubs, including Hugging Face, PyTorch Hub, and TensorFlow Hub, which you can use within your ML development workflow in SageMaker. Recent advances in ML have given rise to a new class of models known as foundation models, which typically have billions of parameters and are adaptable to a wide category of use cases, such as text summarization, digital art generation, and language translation. Because these models are expensive to train, customers want to use existing pre-trained foundation models and fine-tune them as needed, rather than train these models themselves. SageMaker provides a curated list of models that you can choose from on the SageMaker console.

You can find foundation models from different model providers within SageMaker JumpStart, enabling you to get started with foundation models quickly. You can find foundation models based on different tasks or model providers, and easily review model characteristics and usage terms. You can also try out these models using a test UI widget. When you want to use a foundation model at scale, you can do so without leaving SageMaker by using pre-built notebooks from model providers. Because the models are hosted and deployed on AWS, you can rest assured that your data, whether used for evaluating or using the model at scale, is never shared with third parties.

Discover the Code Llama model in SageMaker JumpStart

To deploy the Code Llama 70B model, complete the following steps in Amazon SageMaker Studio:

On the SageMaker Studio home page, choose JumpStart in the navigation pane.
Search for Code Llama models and choose the Code Llama 70B model from the list of models shown.

You can find more information about the model on the Code Llama 70B model card.

The following screenshot shows the endpoint settings. You can change the options or use the default ones.
Accept the End User License Agreement (EULA) and choose Deploy.

This will start the endpoint deployment process, as shown in the following screenshot.

Deploy the model with the SageMaker Python SDK

Alternatively, you can deploy through the example notebook by choosing Open Notebook on the model detail page in Classic Studio. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using a notebook, we start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-codellama-70b")
predictor = model.deploy(accept_eula=False)  # Change EULA acceptance to True

This deploys the model on SageMaker with default configurations, including default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. Note that by default, accept_eula is set to False. You need to set accept_eula=True to deploy the endpoint successfully. By doing so, you accept the user license agreement and acceptable use policy as mentioned earlier. You can also download the license agreement.

Invoke a SageMaker endpoint

After the endpoint is deployed, you can carry out inference by using Boto3 or the SageMaker Python SDK. In the following code, we use the SageMaker Python SDK to call the model for inference and print the response:

def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response[0]['generated_text']}")
    print("\n==================================\n")

The function print_response takes the request payload and the model response and prints the prompt followed by the generated text. Code Llama supports many parameters while performing inference:

max_length – The model generates text until the output length (which includes the input context length) reaches max_length. If specified, it must be a positive integer.
max_new_tokens – The model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.
num_beams – This specifies the number of beams used in the beam search. If specified, it must be an integer greater than or equal to num_return_sequences.
no_repeat_ngram_size – The model ensures that a sequence of words of no_repeat_ngram_size is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
temperature – This controls the randomness in the output. Higher temperature results in an output sequence with more low-probability words, and lower temperature results in an output sequence with more high-probability words. As temperature approaches 0, the output approaches greedy decoding. If specified, it must be a positive float.
early_stopping – If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be Boolean.
do_sample – If True, the model samples the next word as per the likelihood. If specified, it must be Boolean.
top_k – In each step of text generation, the model samples from only the top_k most likely words. If specified, it must be a positive integer.
top_p – In each step of text generation, the model samples from the smallest possible set of words with cumulative probability top_p. If specified, it must be a float between 0 and 1.
return_full_text – If True, the input text will be part of the output generated text. If specified, it must be Boolean. The default value for it is False.
stop – If specified, it must be a list of strings. Text generation stops if any one of the specified strings is generated.
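The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SageMaker SDK: build_payload is a hypothetical helper name, it covers only some of the parameters, and the endpoint performs its own validation regardless.

```python
def build_payload(prompt, **parameters):
    """Build an inference payload after checking a few of the documented constraints.

    Hypothetical helper for illustration; it is not part of any AWS library.
    """
    positive_ints = ("max_length", "max_new_tokens", "top_k")
    booleans = ("early_stopping", "do_sample", "return_full_text")

    for name in positive_ints:
        value = parameters.get(name)
        # bool is a subclass of int in Python, so reject it explicitly
        if value is not None and (
            isinstance(value, bool) or not isinstance(value, int) or value <= 0
        ):
            raise ValueError(f"{name} must be a positive integer, got {value!r}")

    for name in booleans:
        value = parameters.get(name)
        if value is not None and not isinstance(value, bool):
            raise ValueError(f"{name} must be Boolean, got {value!r}")

    top_p = parameters.get("top_p")
    if top_p is not None and not (0 <= top_p <= 1):
        raise ValueError(f"top_p must be between 0 and 1, got {top_p!r}")

    stop = parameters.get("stop")
    if stop is not None and not (
        isinstance(stop, list) and all(isinstance(s, str) for s in stop)
    ):
        raise ValueError("stop must be a list of strings")

    # Same payload shape as the examples in this post
    return {"inputs": prompt, "parameters": parameters}
```

A call such as build_payload(prompt, max_new_tokens=256, temperature=0.2, top_p=0.9) returns a dictionary in the same shape as the payloads used in this post, and a value such as top_p=1.5 raises a ValueError before any request is made.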

You can specify any subset of these parameters while invoking an endpoint. Next, we show an example of how to invoke an endpoint with these arguments.
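The examples that follow use the SageMaker Python SDK. For the Boto3 route mentioned earlier, a minimal sketch might look like the following. The function name invoke_code_llama is hypothetical, and the sketch assumes the endpoint accepts and returns JSON in the same shape that print_response expects.

```python
import json


def invoke_code_llama(runtime_client, endpoint_name, payload):
    """Send a JSON payload to a SageMaker endpoint and return the decoded response.

    runtime_client is expected to be a Boto3 SageMaker Runtime client, for example:
        import boto3
        runtime_client = boto3.client("sagemaker-runtime")
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload).encode("utf-8"),
    )
    # The response body is a stream; read it fully before decoding the JSON
    return json.loads(response["Body"].read())
```

With a deployed predictor, the endpoint name is available as predictor.endpoint_name, so a call could look like invoke_code_llama(runtime_client, predictor.endpoint_name, payload).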

Code completion

The following examples demonstrate how to perform code completion where the expected endpoint response is the natural continuation of the prompt.

We first run the following code:

prompt = """
import socket

def ping_exponential_backoff(host: str):
"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)

We get the following output:

"""
    Pings the given host with exponential backoff.
    """
    timeout = 1
    while True:
        try:
            socket.create_connection((host, 80), timeout=timeout)
            return
        except socket.error:
            timeout *= 2

For our next example, we run the following code:

prompt = """
import argparse
def main(string: str):
    print(string)
    print(string[::-1])
if __name__ == "__main__":
"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
predictor.predict(payload)

We get the following output:

parser = argparse.ArgumentParser(description='Reverse a string')
    parser.add_argument('string', type=str, help='String to reverse')
    args = parser.parse_args()
    main(args.string)

Code generation

The following examples show Python code generation using Code Llama.

We first run the following code:

prompt = """
Write a python function to traverse a list in reverse.
"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)

We get the following output:

def reverse(list1):
    for i in range(len(list1)-1,-1,-1):
        print(list1[i])

list1 = [1,2,3,4,5]
reverse(list1)

For our next example, we run the following code:

prompt = """
Write a python function to to carry out bubble sort.
"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.1, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)

We get the following output:

def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
    return arr

arr = [64, 34, 25, 12, 22, 11, 90]
print(bubble_sort(arr))

These are some of the examples of code-related tasks using Code Llama 70B. You can use the model to generate even more complicated code. We encourage you to try it using your own code-related use cases and examples!

Clean up

After you have tested the endpoints, make sure you delete the SageMaker inference endpoints and the model to avoid incurring charges. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we introduced Code Llama 70B on SageMaker JumpStart. Code Llama 70B is a state-of-the-art model for generating code from natural language prompts as well as code. You can deploy the model with a few simple steps in SageMaker JumpStart and then use it to carry out code-related tasks such as code generation and code completion. As a next step, try using the model with your own code-related use cases and data.

About the authors

Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker JumpStart team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.

Dr. Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from the University of Texas at Austin and an MS in Computer Science from Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and go on long road trips.

June Won is a product manager with SageMaker JumpStart. He focuses on making foundation models easily discoverable and usable to help customers build generative AI applications. His experience at Amazon also includes mobile shopping applications and last-mile delivery.

Detect anomalies in manufacturing data using Amazon SageMaker Canvas

February 15, 2024

by Helge Aufderheide Amazon AWS

With the use of cloud computing, big data and machine learning (ML) tools like Amazon Athena or Amazon SageMaker have become available and usable by anyone without much effort in creation and maintenance. Industrial companies increasingly look at data analytics and data-driven decision-making to increase resource efficiency across their entire portfolio, from operations to performing predictive maintenance or planning.

Due to the velocity of change in IT, customers in traditional industries are facing a dilemma of skillset. On the one hand, analysts and domain experts have a very deep knowledge of the data in question and its interpretation, yet often lack the exposure to data science tooling and high-level programming languages such as Python. On the other hand, data science experts often lack the experience to interpret the machine data content and filter it for what is relevant. This dilemma hampers the creation of efficient models that use data to generate business-relevant insights.

Amazon SageMaker Canvas addresses this dilemma by providing domain experts a no-code interface to create powerful analytics and ML models, such as forecasts, classification, or regression models. It also allows you to deploy and share these models with ML and MLOps specialists after creation.

In this post, we show you how to use SageMaker Canvas to curate and select the right features in your data, and then train a prediction model for anomaly detection, using the no-code functionality of SageMaker Canvas for model tuning.

Anomaly detection for the manufacturing industry

At the time of writing, SageMaker Canvas focuses on typical business use cases, such as forecasting, regression, and classification. For this post, we demonstrate how these capabilities can also help detect complex abnormal data points. This use case is relevant, for instance, to pinpoint malfunctions or unusual operations of industrial machines.

Anomaly detection is important in the industrial domain, because machines (from trains to turbines) are normally very reliable, with times between failures spanning years. Most data from these machines, such as temperature sensor readings or status messages, describes the normal operation and has limited value for decision-making. Engineers look for abnormal data when investigating root causes for a fault or as warning indicators for future faults, and performance managers examine abnormal data to identify potential improvements. Therefore, the typical first step in moving towards data-driven decision-making relies on finding that relevant (abnormal) data.

In this post, we use SageMaker Canvas to curate and select the right features in data, and then train a prediction model for anomaly detection, using SageMaker Canvas no-code functionality for model tuning. Then we deploy the model as a SageMaker endpoint.

Solution overview

For our anomaly detection use case, we train a prediction model to predict a characteristic feature for the normal operation of a machine, such as the motor temperature indicated in a car, from influencing features, such as the speed and recent torque applied in the car. For anomaly detection on a new sample of measurements, we compare the model predictions for the characteristic feature with the observations provided.

For the example of the car motor, a domain expert obtains measurements of the normal motor temperature, recent motor torque, ambient temperature, and other potential influencing factors. These allow you to train a model to predict the temperature from the other features. Then we can use the model to predict the motor temperature on a regular basis. When the predicted temperature for that data is similar to the observed temperature in that data, the motor is working normally; a discrepancy will point to an anomaly, such as the cooling system failing or a defect in the motor.

The following diagram illustrates the solution architecture.

The solution consists of four key steps:

The domain expert creates the initial model, including data analysis and feature curation using SageMaker Canvas.
The domain expert shares the model via the Amazon SageMaker Model Registry or deploys it directly as a real-time endpoint.
An MLOps expert creates the inference infrastructure and code translating the model output from a prediction into an anomaly indicator. This code typically runs inside an AWS Lambda function.
When an application requires anomaly detection, it calls the Lambda function, which uses the model for inference and provides the response (whether or not it’s an anomaly).

Prerequisites

To follow along with this post, you must meet the following prerequisites:

The domain expert user has access to SageMaker Canvas.
The MLOps expert user has access to a SageMaker notebook and the AWS Management Console. For more information, refer to Getting Started with the AWS Management Console.
The domain expert has access to the dataset they want to use to train their anomaly detection model in CSV or another format that SageMaker Canvas supports.

Create the model using SageMaker

The model creation process follows the standard steps to create a regression model in SageMaker Canvas. For more information, refer to Getting started with using Amazon SageMaker Canvas.

First, the domain expert loads relevant data into SageMaker Canvas, such as a time series of measurements. For this post, we use a CSV file containing the (synthetically generated) measurements of an electrical motor. For details, refer to Import data into Canvas. The sample data used is available for download as a CSV.

Curate the data with SageMaker Canvas

After the data is loaded, the domain expert can use SageMaker Canvas to curate the data used in the final model. For this, the expert selects those columns that contain characteristic measurements for the problem in question. More precisely, the expert selects columns that are related to each other, for instance, by a physical relationship such as a pressure-temperature curve, and where a change in that relationship is a relevant anomaly for their use case. The anomaly detection model will learn the normal relationship between the selected columns and indicate when data doesn’t conform to it, such as an abnormally high motor temperature given the current load on the motor.

In practice, the domain expert needs to select a set of suitable input columns and a target column. The inputs are typically the collection of quantities (numeric or categorical) that determine a machine’s behavior, from demand settings, to load, speed, or ambient temperature. The output is typically a numeric quantity that indicates the performance of the machine’s operation, such as a temperature measuring energy dissipation or another performance metric changing when the machine runs under suboptimal conditions.

To illustrate the concept of what quantities to select for input and output, let’s consider a few examples:

For rotating equipment, such as the model we build in this post, typical inputs are the rotation speed, torque (current and history), and ambient temperature, and the targets are the resulting bearing or motor temperatures, which indicate whether the rotating parts are operating under good conditions
For a wind turbine, typical inputs are the current and recent history of wind speed and rotor blade settings, and the target quantity is the produced power or rotational speed
For a chemical process, typical inputs are the percentage of different ingredients and the ambient temperature, and targets are the heat produced or the viscosity of the end product
For moving equipment such as sliding doors, typical inputs are the power input to the motors, and the target value is the speed or completion time for the movement
For an HVAC system, typical inputs are the achieved temperature difference and load settings, and the target quantity is the energy consumption measured

Ultimately, the right inputs and targets for a given equipment will depend on the use case and anomalous behavior to detect, and are best known to a domain expert who is familiar with the intricacies of the specific dataset.

In most cases, selecting suitable input and target quantities means selecting the right columns only and marking the target column (for this example, bearing_temperature). However, a domain expert can also use the no-code features of SageMaker Canvas to transform columns and refine or aggregate the data. For instance, you can extract specific dates or timestamps from the data, or filter out those that are not relevant. SageMaker Canvas supports this process by showing statistics on the quantities selected, which lets you see whether a quantity has outliers or a spread that may affect the results of the model.

Train, tune, and evaluate the model

After the domain expert has selected suitable columns in the dataset, they can train the model to learn the relationship between the inputs and outputs. More precisely, the model will learn to predict the target value selected from the inputs.

Normally, you can use the SageMaker Canvas Model Preview option. This provides a quick indication of the model quality to expect, and allows you to investigate the effect that different inputs have on the output metric. For instance, in the following screenshot, the model is most affected by the motor_speed and ambient_temperature metrics when predicting bearing_temperature. This is sensible, because these temperatures are closely related. At the same time, additional friction or other means of energy loss are likely to affect this.

For the model quality, the RMSE of the model is an indicator of how well the model was able to learn the normal behavior in the training data and reproduce the relationships between the input and output measures. For instance, in the following model, the model should be able to predict the correct bearing_temperature within 3.67 degrees Celsius, so we can consider a deviation of the real temperature from a model prediction that is larger than, for example, 7.4 degrees (roughly twice the RMSE) as an anomaly. The real threshold that you would use, however, will depend on the sensitivity required in the deployment scenario.
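To make the arithmetic concrete, the following minimal sketch computes an RMSE from actual and predicted values and flags samples whose absolute error exceeds a chosen threshold. The function names and the factor of 2 are illustrative assumptions; SageMaker Canvas reports the RMSE for you, so you would normally only need the thresholding step.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences of numbers."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def flag_anomalies(actual, predicted, threshold):
    """Return True for each sample whose absolute error exceeds the threshold."""
    return [abs(a - p) > threshold for a, p in zip(actual, predicted)]


# Illustrative threshold: twice the RMSE reported for the model in this post
threshold = 2 * 3.67  # about 7.34 degrees Celsius
```

With this threshold, an observed temperature of 60 degrees against a prediction of 50 degrees would be flagged, whereas an observation of 50 degrees against a prediction of 51 degrees would not.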

Finally, after the model evaluation and tuning is finished, you can start the complete model training that will create the model to use for inference.

Deploy the model

Although SageMaker Canvas can use a model for inference, production deployment for anomaly detection requires you to deploy the model outside of SageMaker Canvas. More precisely, we need to deploy the model as an endpoint.

In this post and for simplicity, we deploy the model as an endpoint from SageMaker Canvas directly. For instructions, refer to Deploy your models to an endpoint. Make sure to take note of the deployment name and consider the pricing of the instance type you deploy to (for this post, we use ml.m5.large). SageMaker Canvas will then create a model endpoint that can be called to obtain predictions.

In industrial settings, a model needs to undergo thorough testing before it can be deployed. For this, the domain expert will not deploy it, but instead share the model to the SageMaker Model Registry. Here, an MLOps expert can take over. Typically, that expert will test the model endpoint, evaluate the size of computing equipment required for the target application, and determine the most cost-efficient deployment, such as deployment for serverless inference or batch inference. These steps are normally automated (for instance, using Amazon SageMaker Pipelines or the AWS SDK).

Use the model for anomaly detection

In the previous step, we created a model deployment in SageMaker Canvas, called canvas-sample-anomaly-model. We can use it to obtain predictions of a bearing_temperature value based on the other columns in the dataset. Now, we want to use this endpoint to detect anomalies.

To identify anomalous data, our code will use the prediction model endpoint to get the expected value of the target metric and then compare the predicted value against the actual value in the data. The predicted value indicates the expected value for our target metric based on the training data. The difference between the predicted and actual values is therefore a metric for the abnormality of the actual data observed. We can use the following code:

# We use pandas DataFrames for data handling
import json

import boto3
import pandas as pd

sm_runtime_client = boto3.client("sagemaker-runtime")

# Configuration of the actual model invocation
endpoint_name = "canvas-sample-anomaly-model"
# Name of the column in the input data to compare with predictions
TARGET_COL = "bearing_temperature"


def do_inference(data, endpoint_name):
    # Example code provided by SageMaker Canvas
    body = data.to_csv(header=False, index=True).encode("utf-8")
    response = sm_runtime_client.invoke_endpoint(
        Body=body,
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="application/json",
    )
    return json.loads(response["Body"].read())


def input_transformer(input_data, drop_cols=(TARGET_COL,)):
    # Transform the input: drop the target column
    return input_data.drop(columns=list(drop_cols))


def output_transformer(input_data, response):
    # Take the initial input data and compare it to the response of the prediction model
    scored = input_data.copy()
    predicted = pd.DataFrame(response["predictions"], index=input_data.index)["score"]
    scored["prediction_" + TARGET_COL] = predicted
    # The absolute difference between actual and predicted values is the anomaly score
    scored["error"] = (scored[TARGET_COL] - scored["prediction_" + TARGET_COL]).abs()
    return scored


# Run the inference
raw_input = pd.read_csv(MYFILE)  # Read the data for inference; MYFILE is a placeholder for your CSV path
to_score = input_transformer(raw_input)  # Prepare the data
predictions = do_inference(to_score, endpoint_name)  # Create predictions
# Pass the original data, which still contains TARGET_COL, to compare predictions and actuals
results = output_transformer(raw_input, predictions)

The preceding code performs the following actions:

The input data is filtered down to the right features (function input_transformer).
The SageMaker model endpoint is invoked with the filtered data (function do_inference), where we handle input and output formatting according to the sample code provided when opening the details page of our deployment in SageMaker Canvas.
The result of the invocation is joined to the original input data and the difference is stored in the error column (function output_transformer).

Find anomalies and evaluate anomalous events

In a typical setup, the code to obtain anomalies is run in a Lambda function. The Lambda function can be called from an application or Amazon API Gateway. The main function returns an anomaly score for each row of the input data, which in this case is a time series of anomaly scores.
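As a rough illustration of that setup, the following sketch shows the shape such a Lambda handler could take. Everything specific here is an assumption made for the example: the event format (a "rows" list of dictionaries), the make_handler factory, and the predict callable, which stands in for the call to the model endpoint so that the scoring logic can be tested without AWS access.

```python
import json


def make_handler(predict, target_col="bearing_temperature"):
    """Create a Lambda-style handler around a prediction function.

    predict is a hypothetical callable that takes a list of feature dictionaries
    and returns one predicted target value per row. In a real deployment it
    would wrap the call to the SageMaker endpoint.
    """

    def lambda_handler(event, context):
        rows = event["rows"]  # assumed event shape: {"rows": [{column: value, ...}, ...]}
        # Remove the target column before asking the model for predictions
        features = [
            {key: value for key, value in row.items() if key != target_col}
            for row in rows
        ]
        predictions = predict(features)
        # Anomaly score: absolute difference between observed and predicted target
        scores = [abs(row[target_col] - pred) for row, pred in zip(rows, predictions)]
        return {"statusCode": 200, "body": json.dumps({"anomaly_scores": scores})}

    return lambda_handler
```

Keeping the endpoint call behind a callable is a design choice for this sketch; it lets you exercise the handler locally with a stub before wiring it to the real endpoint.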

For testing, we can also run the code in a SageMaker notebook. The following graphs show the inputs and output of our model when using the sample data. Peaks in the deviation between predicted and actual values (anomaly score, shown in the lower graph) indicate anomalies. For instance, in the graph, we can see three distinct peaks where the anomaly score (difference between expected and real temperature) surpasses 7 degrees Celsius: the first after a long idle time, the second at a steep drop of bearing_temperature, and the last where bearing_temperature is high compared to motor_speed.

In many cases, knowing the time series of the anomaly score is already sufficient; you can set up a threshold for when to warn of a significant anomaly based on the need for model sensitivity. The current score then indicates that a machine has an abnormal state that needs investigation. For instance, for our model, the absolute value of the anomaly score is distributed as shown in the following graph. This confirms that most anomaly scores are below roughly twice the RMSE found during training (about 7 to 8 degrees), which we treat as the typical error bound. The graph can help you choose a threshold manually, such that the right percentage of the evaluated samples are marked as anomalies.
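If you prefer to derive the threshold from the score distribution rather than read it off a graph, a minimal sketch could look like the following. The function name and the rounding rule are assumptions for illustration; the idea is simply to pick the score above which roughly the desired fraction of samples falls.

```python
def threshold_for_fraction(scores, anomaly_fraction):
    """Return a threshold such that roughly anomaly_fraction of scores exceed it."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < anomaly_fraction < 1:
        raise ValueError("anomaly_fraction must be between 0 and 1 (exclusive)")
    ordered = sorted(scores)
    # Number of samples we want to end up above the threshold
    n_anomalies = round(len(ordered) * anomaly_fraction)
    # The threshold is the largest score that is still considered normal
    index = max(len(ordered) - n_anomalies - 1, 0)
    return ordered[index]
```

For ten scores from 1.0 to 10.0 and an anomaly_fraction of 0.2, this returns 8.0, so the two largest scores (9.0 and 10.0) are marked as anomalies.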

If the desired output is a set of anomaly events, then the anomaly scores provided by the model require refinement to be relevant for business use. For this, the ML expert will typically add postprocessing to remove noise or large peaks on the anomaly score, such as adding a rolling mean. In addition, the expert will typically evaluate the anomaly score by a logic similar to raising an Amazon CloudWatch alarm, such as monitoring for the breach of a threshold over a specific duration. For more information about setting up alarms, refer to Using Amazon CloudWatch alarms. Running these evaluations in the Lambda function allows you to send warnings, for instance, by publishing a warning to an Amazon Simple Notification Service (Amazon SNS) topic.
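The two postprocessing ideas described above, smoothing with a rolling mean and requiring a threshold breach over a duration, can be sketched in plain Python as follows. The function names, the trailing window, and the "consecutive samples" rule are assumptions for illustration; they mimic the spirit of a CloudWatch alarm and are not its actual evaluation logic.

```python
from collections import deque


def rolling_mean(scores, window):
    """Trailing rolling mean; early values average over the samples seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    buffer = deque(maxlen=window)
    smoothed = []
    for score in scores:
        buffer.append(score)
        smoothed.append(sum(buffer) / len(buffer))
    return smoothed


def sustained_breaches(scores, threshold, min_points):
    """Return the indices at which the score has exceeded the threshold
    for min_points consecutive samples (one event per sustained breach)."""
    events = []
    run_length = 0
    for index, score in enumerate(scores):
        run_length = run_length + 1 if score > threshold else 0
        if run_length == min_points:
            events.append(index)
    return events
```

In practice, you would apply rolling_mean to the anomaly scores first and then pass the smoothed series to sustained_breaches; each returned index is a point at which a warning could be sent.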

Clean up

After you have finished using this solution, you should clean up to avoid unnecessary cost:

In SageMaker Canvas, find your model endpoint deployment and delete it.
Log out of SageMaker Canvas to avoid charges for it running idly.

Summary

In this post, we showed how a domain expert can evaluate input data and create an ML model using SageMaker Canvas without the need to write code. Then we showed how to use this model to perform real-time anomaly detection using SageMaker and Lambda through a simple workflow. This combination empowers domain experts to use their knowledge to create powerful ML models without additional training in data science, and enables MLOps experts to use these models and make them available for inference flexibly and efficiently.

A 2-month free tier is available for SageMaker Canvas, and afterwards you only pay for what you use. Start experimenting today and add ML to make the most of your data.

About the author

Helge Aufderheide is an enthusiast of making data usable in the real world with a strong focus on Automation, Analytics and Machine Learning in Industrial Applications, such as Manufacturing and Mobility.

Do large language models understand the world?

February 15, 2024

by Amazon AWS

In addition to its practical implications, recent work on “meaning representations” could shed light on some old philosophical questions.Read More

USC SURE student develops prototype algorithm to help automate spacecraft docking

February 14, 2024

by Amazon AWS

Derek Chibuzor utilized his SURE experience to gain “exposure to an aerospace research project in a professional research environment.”Read More

Enhance Amazon Connect and Lex with generative AI capabilities

February 14, 2024

by Hamza Nadeem Amazon AWS

Effective self-service options are becoming increasingly critical for contact centers, but implementing them well presents unique challenges.

Amazon Lex provides your Amazon Connect contact center with chatbot functionalities such as automatic speech recognition (ASR) and natural language understanding (NLU) capabilities through voice and text channels. The bot takes natural language speech or text input, recognizes the intent behind the input, and fulfills the user’s intent by invoking the appropriate response.

Callers can have diverse accents, pronunciation, and grammar. Combined with background noise, this can make it challenging for speech recognition to accurately understand statements. For example, “I want to track my order” may be misrecognized as “I want to truck my holder.” Failed intents like these frustrate customers who have to repeat themselves, get routed incorrectly, or are escalated to live agents—costing businesses more.

Amazon Bedrock democratizes foundational model (FM) access for developers to effortlessly build and scale generative AI-based applications for the modern contact center. FMs delivered by Amazon Bedrock, such as Amazon Titan and Anthropic Claude, are pretrained on internet-scale datasets that gives them strong NLU capabilities such as sentence classification, question and answer, and enhanced semantic understanding despite speech recognition errors.

In this post, we explore a solution that uses FMs delivered by Amazon Bedrock to enhance intent recognition of Amazon Lex integrated with Amazon Connect, ultimately delivering an improved self-service experience for your customers.

Overview of solution

The solution uses Amazon Connect, Amazon Lex , AWS Lambda, and Amazon Bedrock in the following steps:

An Amazon Connect contact flow integrates with an Amazon Lex bot via the GetCustomerInput block.
When the bot fails to recognize the caller’s intent and defaults to the fallback intent, a Lambda function is triggered.
The Lambda function takes the transcript of the customer utterance and passes it to a foundation model in Amazon Bedrock
Using its advanced natural language capabilities, the model determines the caller’s intent.
The Lambda function then directs the bot to route the call to the correct intent for fulfillment.

By using Amazon Bedrock foundation models, the solution enables the Amazon Lex bot to understand intents despite speech recognition errors. This results in smooth routing and fulfillment, preventing escalations to agents and frustrating repetitions for callers.

The following diagram illustrates the solution architecture and workflow.

In the following sections, we look at the key components of the solution in more detail.

Lambda functions and the LangChain Framework

When the Amazon Lex bot invokes the Lambda function, it sends an event message that contains bot information and the transcription of the utterance from the caller. Using this event message, the Lambda function dynamically retrieves the bot’s configured intents, intent description, and intent utterances and builds a prompt using LangChain, which is an open source machine learning (ML) framework that enables developers to integrate large language models (LLMs), data sources, and applications.

An Amazon Bedrock foundation model is then invoked using the prompt and a response is received with the predicted intent and confidence level. If the confidence level is greater than a set threshold, for example 80%, the function returns the identified intent to Amazon Lex with an action to delegate. If the confidence level is below the threshold, it defaults back to the default FallbackIntent and an action to close it.

In-context learning, prompt engineering, and model invocation

We use in-context learning to be able to use a foundation model to accomplish this task. In-context learning is the ability for LLMs to learn the task using just what’s in the prompt without being pre-trained or fine-tuned for the particular task.

In the prompt, we first provide the instruction detailing what needs to be done. Then, the Lambda function dynamically retrieves and injects the Amazon Lex bot’s configured intents, intent descriptions, and intent utterances into the prompt. Finally, we provide it instructions on how to output its thinking and final result.

The following prompt template was tested on text generation models Anthropic Claude Instant v1.2 and Anthropic Claude v2. We use XML tags to better improve the performance of the model. We also add room for the model to think before identifying the final intent to better improve its reasoning for choosing the right intent. The {intent_block} contains the intent IDs, intent descriptions, and intent utterances. The {input} block contains the transcribed utterance from the caller. Three backticks (“`) are added at the end to help the model output a code block more consistently. A <STOP> sequence is added to stop it from generating further.

"""
Human: You are a call center agent. You try to understand the intent given an utterance from the caller.

The available intents are as follows, the intent of the caller is highly likely to be one of these.
<intents>
{intents_block} </intents>
The output format is:
<thinking>
</thinking>

<output>
{{
     "intent_id": intent_id,
     "confidence": confidence
}}
</output><STOP>

For the given utterance, you try to categorize the intent of the caller to be one of the intents in <intents></intents> tags.
If it does not match any intents or the utterance is blank, respond with FALLBCKINT and confidence of 1.0.
Respond with the intent name and confidence between 0.0 and 1.0.
Put your thinking in <thinking></thinking> tags before deciding on the intent.

Utterance: {input}

Assistant: ```"""

After the model has been invoked, we receive the following response from the foundation model:

<thinking>
The given utterance is asking for checking where their shipment is. It matches the intent order status.
</thinking>

{
    "intent": "ORDERSTATUSID",
    "confidence": 1.0
}
```

Filter available intents based on contact flow session attributes

When using the solution as part of an Amazon Connect contact flow, you can further enhance the ability of the LLM to identify the correct intent by specifying the session attribute available_intents in the “Get customer input” block with a comma-separated list of intents, as shown in the following screenshot. By doing so, the Lambda function will only include these specified intents as part of the prompt to the LLM, reducing the number of intents that the LLM has to reason through. If the available_intents session attribute is not specified, all intents in the Amazon Lex bot will be used by default.

Lambda function response to Amazon Lex

After the LLM has determined the intent, the Lambda function responds in the specific format required by Amazon Lex to process the response.

If a matching intent is found above the confidence threshold, it returns a dialog action type Delegate to instruct Amazon Lex to use the selected intent and subsequently return the completed intent back to Amazon Connect. The response output is as follows:

{
    "sessionState": {
        "dialogAction": {
        "type": "Delegate"
        },
        "intent": {
        "name": intent,
        "state": "InProgress",
        }
    }
}

If the confidence level is below the threshold or an intent was not recognized, a dialog action type Close is returned to instruct Amazon Lex to close the FallbackIntent, and return the control back to Amazon Connect. The response output is as follows:

{
    "sessionState": {
        "dialogAction": {
        "type": "Close"
        },
        "intent": {
        "name": intent,
        "state": "Fulfilled",
        }
    }
}

The complete source code for this sample is available in GitHub.

Prerequisites

Before you get started, make sure you have the following prerequisites:

A basic understanding of the Amazon Connect contact center solution using Amazon Lex and Amazon Bedrock
An AWS account with an AWS Identity and Access Management (IAM) user with permissions to deploy the CloudFormation template
The AWS Command Line Interface (AWS CLI) installed and configured for use
Docker installed and running for building the Lambda container image
Python 3.9 or later, to package Python code for the Lambda function
jq installed

Implement the solution

To implement the solution, complete the following steps:

Clone the repository

git clone https://github.com/aws-samples/amazon-connect-with-amazon-lex-genai-capabilities
cd amazon-connect-with-amazon-lex-genai-capabilities

Run the following command to initialize the environment and create an Amazon Elastic Container Registry (Amazon ECR) repository for our Lambda function’s image. Provide the AWS Region and ECR repository name that you would like to create.
```
bash ./scripts/build.sh region-name repository-name
```
Update the ParameterValue fields in the scripts/parameters.json file:
- ParameterKey ("AmazonECRImageUri") – Enter the repository URL from the previous step.
- ParameterKey ("AmazonConnectName") – Enter a unique name.
- ParameterKey ("AmazonLexBotName") – Enter a unique name.
- ParameterKey ("AmazonLexBotAliasName") – The default is “prodversion”; you can change it if needed.
- ParameterKey ("LoggingLevel") – The default is “INFO”; you can change it if required. Valid values are DEBUG, WARN, and ERROR.
- ParameterKey ("ModelID") – The default is “anthropic.claude-instant-v1”; you can change it if you need to use a different model.
- ParameterKey ("AmazonConnectName") – The default is “0.75”; you can change it if you need to update the confidence score.
Run the command to generate the CloudFormation stack and deploy the resources:
```
bash ./scripts/deploy.sh region cfn-stack-name
```

If you don’t want to build the contact flow from scratch in Amazon Connect, you can import the sample flow provided with this repository filelocation: /contactflowsample/samplecontactflow.json.

Log in to your Amazon Connect instance. The account must be assigned a security profile that includes edit permissions for flows.
On the Amazon Connect console, in the navigation pane, under Routing, choose Contact flows.
Create a new flow of the same type as the one you are importing.
Choose Save and Import flow.
Select the file to import and choose Import.

When the flow is imported into an existing flow, the name of the existing flow is updated, too.

Review and update any resolved or unresolved references as necessary.
To save the imported flow, choose Save. To publish, choose Save and Publish.
After you upload the contact flow, update the following configurations:
- Update the GetCustomerInput blocks with the correct Amazon Lex bot name and version.
- Under Manage Phone Number, update the number with the contact flow or IVR imported earlier.

Verify the configuration

Verify that the Lambda function created with the CloudFormation stack has an IAM role with permissions to retrieve bots and intent information from Amazon Lex (list and read permissions), and appropriate Amazon Bedrock permissions (list and read permissions).

In your Amazon Lex bot, for your configured alias and language, verify that the Lambda function was set up correctly. For the FallBackIntent, confirm that Fulfillmentis set to Active to be able to run the function whenever the FallBackIntent is triggered.

At this point, your Amazon Lex bot will automatically run the Lambda function and the solution should work seamlessly.

Test the solution

Let’s look at a sample intent, description, and utterance configuration in Amazon Lex and see how well the LLM performs with sample inputs that contains typos, grammar mistakes, and even a different language.

The following figure shows screenshots of our example. The left side shows the intent name, its description, and a single-word sample utterance. Without much configuration on Amazon Lex, the LLM is able to predict the correct intent (right side). In this test, we have a simple fulfillment message from the correct intent.

Clean up

To clean up your resources, run the following command to delete the ECR repository and CloudFormation stack:

bash ./scripts/cleanup.sh region repository-name cfn-stack-name

Conclusion

By using Amazon Lex enhanced with LLMs delivered by Amazon Bedrock, you can improve the intent recognition performance of your bots. This provides a seamless self-service experience for a diverse set of customers, bridging the gap between accents and unique speech characteristics, and ultimately enhancing customer satisfaction.

To dive deeper and learn more about generative AI, check out these additional resources:

For more information on how you can experiment with the generative AI-powered self-service solution, see Deploy self-service question answering with the QnABot on AWS solution powered by Amazon Lex with Amazon Kendra and large language models.

About the Authors

Hamza Nadeem is an Amazon Connect Specialist Solutions Architect at AWS, based in Toronto. He works with customers throughout Canada to modernize their Contact Centers and provide solutions to their unique customer engagement challenges and business requirements. In his spare time, Hamza enjoys traveling, soccer and trying new recipes with his wife.

Parag Srivastava is a Solutions Architect at Amazon Web Services (AWS), helping enterprise customers with successful cloud adoption and migration. During his professional career, he has been extensively involved in complex digital transformation projects. He is also passionate about building innovative solutions around geospatial aspects of addresses.

Ross Alas is a Solutions Architect at AWS based in Toronto, Canada. He helps customers innovate with AI/ML and Generative AI solutions that leads to real business outcomes. He has worked with a variety of customers from retail, financial services, technology, pharmaceutical, and others. In his spare time, he loves the outdoors and enjoying nature with his family.

Sangeetha Kamatkar is a Solutions Architect at Amazon Web Services (AWS), helping customers with successful cloud adoption and migration. She works with customers to craft highly scalable, flexible, and resilient cloud architectures that address customer business problems. In her spare time, she listens to music, watch movies and enjoy gardening during summer time.

Skeleton-based pose annotation labeling using Amazon SageMaker Ground Truth

February 14, 2024

by Arthur Putnam Amazon AWS

Pose estimation is a computer vision technique that detects a set of points on objects (such as people or vehicles) within images or videos. Pose estimation has real-world applications in sports, robotics, security, augmented reality, media and entertainment, medical applications, and more. Pose estimation models are trained on images or videos that are annotated with a consistent set of points (coordinates) defined by a rig. To train accurate pose estimation models, you first need to acquire a large dataset of annotated images; many datasets have tens or hundreds of thousands of annotated images and take significant resources to build. Labeling mistakes are important to identify and prevent because model performance for pose estimation models is heavily influenced by labeled data quality and data volume.

In this post, we show how you can use a custom labeling workflow in Amazon SageMaker Ground Truth specifically designed for keypoint labeling. This custom workflow helps streamline the labeling process and minimize labeling errors, thereby reducing the cost of obtaining high-quality pose labels.

Importance of high-quality data and reducing labeling errors

High-quality data is fundamental for training robust and reliable pose estimation models. The accuracy of these models is directly tied to the correctness and precision of the labels assigned to each pose keypoint, which, in turn, depends on the effectiveness of the annotation process. Additionally, having a substantial volume of diverse and well-annotated data ensures that the model can learn a broad range of poses, variations, and scenarios, leading to improved generalization and performance across different real-world applications. The acquisition of these large, annotated datasets involves human annotators who carefully label images with pose information. While labeling points of interest within the image, it’s useful to see the skeletal structure of the object while labeling in order to provide visual guidance to the annotator. This is helpful for identifying labeling errors before they are incorporated into the dataset like left-right swaps or mislabels (such as marking a foot as a shoulder). For example, a labeling error like the left-right swap made in the following example can easily be identified by the crossing of the skeleton rig lines and the mismatching of the colors. These visual cues help labelers recognize mistakes and will result in a cleaner set of labels.

Due to the manual nature of labeling, obtaining large and accurate labeled datasets can be cost-prohibitive and even more so with an inefficient labeling system. Therefore, labeling efficiency and accuracy are critical when designing your labeling workflow. In this post, we demonstrate how to use a custom SageMaker Ground Truth labeling workflow to quickly and accurately annotate images, reducing the burden of developing large datasets for pose estimation workflows.

Overview of solution

This solution provides an online web portal where the labeling workforce can use a web browser to log in, access labeling jobs, and annotate images using the crowd-2d-skeleton user interface (UI), a custom UI designed for keypoint and pose labeling using SageMaker Ground Truth. The annotations or labels created by the labeling workforce are then exported to an Amazon Simple Storage Service (Amazon S3) bucket, where they can be used for downstream processes like training deep learning computer vision models. This solution walks you through how to set up and deploy the necessary components to create a web portal as well as how to create labeling jobs for this labeling workflow.

The following is a diagram of the overall architecture.

This architecture is comprised of several key components, each of which we explain in more detail in the following sections. This architecture provides the labeling workforce with an online web portal hosted by SageMaker Ground Truth. This portal allows each labeler to log in and see their labeling jobs. After they’ve logged in, the labeler can select a labeling job and begin annotating images using the custom UI hosted by Amazon CloudFront. We use AWS Lambda functions for pre-annotation and post-annotation data processing.

The following screenshot is an example of the UI.

The labeler can mark specific keypoints on the image using the UI. The lines between keypoints will be automatically drawn for the user based on a skeleton rig definition that the UI uses. The UI allows many customizations, such as the following:

Custom keypoint names
Configurable keypoint colors
Configurable rig line colors
Configurable skeleton and rig structures

Each of these are targeted features to improve the ease and flexibility of labeling. Specific UI customization details can be found in the GitHub repo and are summarized later in this post. Note that in this post, we use human pose estimation as a baseline task, but you can expand it to labeling object pose with a pre-defined rig for other objects as well, such as animals or vehicles. In the following example, we show how this can be applied to label the points of a box truck.

SageMaker Ground Truth

In this solution, we use SageMaker Ground Truth to provide the labeling workforce with an online portal and a way to manage labeling jobs. This post assumes that you’re familiar with SageMaker Ground Truth. For more information, refer to Amazon SageMaker Ground Truth.

CloudFront distribution

For this solution, the labeling UI requires a custom-built JavaScript component called the crowd-2d-skeleton component. This component can be found on GitHub as part of Amazon’s open source initiatives. The CloudFront distribution will be used to host the crowd-2d-skeleton.js, which is needed by the SageMaker Ground Truth UI. The CloudFront distribution will be assigned an origin access identity, which will allow the CloudFront distribution to access the crowd-2d-skeleton.js residing in the S3 bucket. The S3 bucket will remain private and no other objects in this bucket will be available via the CloudFront distribution due to restrictions we place on the origin access identity through a bucket policy. This is a recommended practice for following the least-privilege principle.

Amazon S3 bucket

We use the S3 bucket to store the SageMaker Ground Truth input and output manifest files, the custom UI template, images for the labeling jobs, and the JavaScript code needed for the custom UI. This bucket will be private and not accessible to the public. The bucket will also have a bucket policy that restricts the CloudFront distribution to only being able to access the JavaScript code needed for the UI. This prevents the CloudFront distribution from hosting any other object in the S3 bucket.

Pre-annotation Lambda function

SageMaker Ground Truth labeling jobs typically use an input manifest file, which is in JSON Lines format. This input manifest file contains metadata for a labeling job, acts as a reference to the data that needs to be labeled, and helps configure how the data should be presented to the annotators. The pre-annotation Lambda function processes items from the input manifest file before the manifest data is input to the custom UI template. This is where any formatting or special modifications to the items can be done before presenting the data to the annotators in the UI. For more information on pre-annotation Lambda functions, see Pre-annotation Lambda.

Post-annotation Lambda function

Similar to the pre-annotation Lambda function, the post-annotation function handles additional data processing you may want to do after all the labelers have finished labeling but before writing the final annotation output results. This processing is done by a Lambda function, which is responsible for formatting the data for the labeling job output results. In this solution, we are simply using it to return the data in our desired output format. For more information on post-annotation Lambda functions, see Post-annotation Lambda.

Post-annotation Lambda function role

We use an AWS Identity and Access Management (IAM) role to give the post-annotation Lambda function access to the S3 bucket. This is needed to read the annotation results and make any modifications before writing out the final results to the output manifest file.

SageMaker Ground Truth role

We use this IAM role to give the SageMaker Ground Truth labeling job the ability to invoke the Lambda functions and to read the images, manifest files, and custom UI template in the S3 bucket.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Familiarity with SageMaker Ground Truth labeling jobs and the workforce portal
Familiarity with the AWS Cloud Development Kit (AWS CDK)
An AWS account with the permissions to deploy the AWS CDK stack
A SageMaker Ground Truth private workforce
Python 3.9+ installed
The AWS CDK installed

For this solution, we use the AWS CDK to deploy the architecture. Then we create a sample labeling job, use the annotation portal to label the images in the labeling job, and examine the labeling results.

Create the AWS CDK stack

After you complete all the prerequisites, you’re ready to deploy the solution.

Set up your resources

Complete the following steps to set up your resources:

Download the example stack from the GitHub repo.
Use the cd command to change into the repository.
Create your Python environment and install required packages (see the repository README.md for more details).
With your Python environment activated, run the following command:
```
cdk synth
```
Run the following command to deploy the AWS CDK:
```
cdk deploy
```
Run the following command to run the post-deployment script:
```
python scripts/post_deployment_script.py
```

Create a labeling job

After you have set up your resources, you’re ready to create a labeling job. For the purposes of this post, we create a labeling job using the example scripts and images provided in the repository.

CD into the scripts directory in the repository.
Download the example images from the internet by running the following code:
```
python scripts/download_example_images.py
```

This script downloads a set of 10 images, which we use in our example labeling job. We review how to use your own custom input data later in this post.

Create a labeling job by running to following code:

python scripts/create_example_labeling_job.py <Labeling Workforce ARN>

This script takes a SageMaker Ground Truth private workforce ARN as an argument, which should be the ARN for a workforce you have in the same account you deployed this architecture into. The script will create the input manifest file for our labeling job, upload it to Amazon S3, and create a SageMaker Ground Truth custom labeling job. We take a deeper dive into the details of this script later in this post.

Label the dataset

After you have launched the example labeling job, it will appear on the SageMaker console as well as the workforce portal.

In the workforce portal, select the labeling job and choose Start working.

You’ll be presented with an image from the example dataset. At this point, you can use the custom crowd-2d-skeleton UI to annotate the images. You can familiarize yourself with the crowd-2d-skeleton UI by referring to User Interface Overview. We use the rig definition from the COCO keypoint detection dataset challenge as the human pose rig. To reiterate, you can customize this without our custom UI component to remove or add points based on your requirements.

When you’re finished annotating an image, choose Submit. This will take you to the next image in the dataset until all images are labeled.

Access the labeling results

When you have finished labeling all the images in the labeling job, SageMaker Ground Truth will invoke the post-annotation Lambda function and produce an output.manifest file containing all of the annotations. This output.manifest will be stored in the S3 bucket. In our case, the location of the output manifest should follow the S3 URI path s3://<bucket name> /labeling_jobs/output/<labeling job name>/manifests/output/output.manifest. The output.manifest file is a JSON Lines file, where each line corresponds to a single image and its annotations from the labeling workforce. Each JSON Lines item is a JSON object with many fields. The field we are interested in is called label-results. The value of this field is an object containing the following fields:

dataset_object_id – The ID or index of the input manifest item
data_object_s3_uri – The image’s Amazon S3 URI
image_file_name – The image’s file name
image_s3_location – The image’s Amazon S3 URL
original_annotations – The original annotations (only set and used if you are using a pre-annotation workflow)
updated_annotations – The annotations for the image
worker_id – The workforce worker who made the annotations
no_changes_needed – Whether the no changes needed check box was selected
was_modified – Whether the annotation data differs from the original input data
total_time_in_seconds – The time it took the workforce worker to annotation the image

With these fields, you can access your annotation results for each image and do calculations like average time to label an image.

Create your own labeling jobs

Now that we have created an example labeling job and you understand the overall process, we walk you through the code responsible for creating the manifest file and launching the labeling job. We focus on the key parts of the script that you may want to modify to launch your own labeling jobs.

We cover snippets of code from the create_example_labeling_job.py script located in the GitHub repository. The script starts by setting up variables that are used later in the script. Some of the variables are hard-coded for simplicity, whereas others, which are stack dependent, will be imported dynamically at runtime by fetching the values created from our AWS CDK stack.

# Setup/get variables values from our CDK stack
s3_upload_prefix = "labeling_jobs"
image_dir = 'scripts/images'
manifest_file_name = "example_manifest.txt"
s3_bucket_name = read_ssm_parameter('/crowd_2d_skeleton_example_stack/bucket_name')
pre_annotation_lambda_arn = read_ssm_parameter('/crowd_2d_skeleton_example_stack/pre_annotation_lambda_arn')
post_annotation_lambda_arn = read_ssm_parameter('/crowd_2d_skeleton_example_stack/post_annotation_lambda_arn')
ground_truth_role_arn = read_ssm_parameter('/crowd_2d_skeleton_example_stack/sagemaker_ground_truth_role')
ui_template_s3_uri = f"s3://{s3_bucket_name}/infrastructure/ground_truth_templates/crowd_2d_skeleton_template.html"
s3_image_upload_prefix = f'{s3_upload_prefix}/images'
s3_manifest_upload_prefix = f'{s3_upload_prefix}/manifests'
s3_output_prefix = f'{s3_upload_prefix}/output'

The first key section in this script is the creation of the manifest file. Recall that the manifest file is a JSON lines file that contains the details for a SageMaker Ground Truth labeling job. Each JSON Lines object represents one item (for example, an image) that needs to be labeled. For this workflow, the object should contain the following fields:

source-ref – The Amazon S3 URI to the image you wish to label.
annotations – A list of annotation objects, which is used for pre-annotating workflows. See the crowd-2d-skeleton documentation for more details on the expected values.

The script creates a manifest line for each image in the image directory using the following section of code:

# For each image in the image directory lets create a manifest line
manifest_items = []
for filename in os.listdir(image_dir):
    if filename.endswith('.jpg') or filename.endswith('.png'):
        img_path = os.path.join(
            image_dir,
            filename
        )
        object_name = os.path.join(
            s3_image_upload_prefix,
            filename
        ).replace("\", "/")

        # upload to s3_bucket
        s3_client.upload_file(img_path, s3_bucket_name, object_name)
f
        # add it to manifest file
        manifest_items.append({
            "source-ref": f's3://{s3_bucket_name}/{object_name}',
            "annotations": [],
        })

If you want to use different images or point to a different image directory, you can modify that section of the code. Additionally, if you’re using a pre-annotation workflow, you can update the annotations array with a JSON string consisting of the array and all its annotation objects. The details of the format of this array are documented in the crowd-2d-skeleton documentation.

With the manifest line items now created, you can create and upload the manifest file to the S3 bucket you created earlier:

# Create Manifest file
manifest_file_contents = "n".join([json.dumps(mi) for mi in manifest_items])
with open(manifest_file_name, "w") as file_handle:
    file_handle.write(manifest_file_contents)

# Upload manifest file
object_name = os.path.join(
    s3_manifest_upload_prefix,
    manifest_file_name
).replace("\\", "/")
s3_client.upload_file(manifest_file_name, s3_bucket_name, object_name)
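
Before starting a labeling job, it can help to check that the manifest is well-formed JSON Lines. The following is a minimal sketch, not part of the original script; the function name validate_manifest_text and the sample bucket and paths are hypothetical, and the checks only cover the two fields described earlier (source-ref and annotations):

```python
import json


def validate_manifest_text(manifest_text):
    """Return a list of (line_number, problem) tuples for a JSON Lines manifest."""
    problems = []
    for line_number, line in enumerate(manifest_text.splitlines(), start=1):
        if not line.strip():
            problems.append((line_number, "blank line"))
            continue
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            problems.append((line_number, "not valid JSON"))
            continue
        if not isinstance(item, dict):
            problems.append((line_number, "not a JSON object"))
            continue
        source_ref = item.get("source-ref", "")
        if not isinstance(source_ref, str) or not source_ref.startswith("s3://"):
            problems.append((line_number, "source-ref is not an S3 URI"))
        if not isinstance(item.get("annotations"), list):
            problems.append((line_number, "annotations is not a list"))
    return problems


# Hypothetical sample: the second line is missing the s3:// scheme.
sample = "\n".join([
    json.dumps({"source-ref": "s3://my-bucket/images/a.jpg", "annotations": []}),
    json.dumps({"source-ref": "images/b.jpg", "annotations": []}),
])
print(validate_manifest_text(sample))  # [(2, 'source-ref is not an S3 URI')]
```

An empty list means every line passed these basic checks; it does not validate the contents of the annotation objects themselves.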

Now that you have created a manifest file containing the images you want to label, you can create a labeling job. You can create the labeling job programmatically using the AWS SDK for Python (Boto3). The code to create a labeling job is as follows:

# Create labeling job
client = boto3.client("sagemaker")
now = int(round(datetime.now().timestamp()))
response = client.create_labeling_job(
    LabelingJobName=f"crowd-2d-skeleton-example-{now}",
    LabelAttributeName="label-results",
    InputConfig={
        "DataSource": {
            "S3DataSource": {"ManifestS3Uri": f's3://{s3_bucket_name}/{object_name}'},
        },
        "DataAttributes": {},
    },
    OutputConfig={
        "S3OutputPath": f"s3://{s3_bucket_name}/{s3_output_prefix}/",
    },
    RoleArn=ground_truth_role_arn,
    HumanTaskConfig={
        "WorkteamArn": workteam_arn,
        "UiConfig": {"UiTemplateS3Uri": ui_template_s3_uri},
        "PreHumanTaskLambdaArn": pre_annotation_lambda_arn,
        "TaskKeywords": ["example"],
        "TaskTitle": f"Crowd 2D Component Example {now}",
        "TaskDescription": "Crowd 2D Component Example",
        "NumberOfHumanWorkersPerDataObject": 1,
        "TaskTimeLimitInSeconds": 28800,
        "TaskAvailabilityLifetimeInSeconds": 2592000,
        "MaxConcurrentTaskCount": 123,
        "AnnotationConsolidationConfig": {
            "AnnotationConsolidationLambdaArn": post_annotation_lambda_arn
        },
    },
)
print(response)

The aspects of this code you may want to modify are LabelingJobName, TaskTitle, and TaskDescription. The LabelingJobName is the unique name of the labeling job that SageMaker will use to reference your job. This is also the name that will appear on the SageMaker console. TaskTitle serves a similar purpose, but doesn’t need to be unique and will be the name of the job that appears in the workforce portal. You may want to make these more specific to what you are labeling or what the labeling job is for. Lastly, we have the TaskDescription field. This field appears in the workforce portal to provide extra context to the labelers as to what the task is, such as instructions and guidance for the task. For more information on these fields as well as the others, refer to the create_labeling_job documentation.
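
After the job is created, you may want to track its progress from the same script. The following is a minimal sketch that polls the Boto3 describe_labeling_job call until the job reaches a terminal status. The helper name wait_for_labeling_job and the polling defaults are assumptions for illustration, not part of the original script:

```python
import time

# Statuses after which the job no longer changes.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_labeling_job(sagemaker_client, job_name, poll_seconds=60, max_polls=60):
    """Poll describe_labeling_job until the job reaches a terminal status."""
    for _ in range(max_polls):
        description = sagemaker_client.describe_labeling_job(
            LabelingJobName=job_name
        )
        status = description["LabelingJobStatus"]
        counters = description.get("LabelCounters", {})
        print(f"{job_name}: {status}, labeled={counters.get('TotalLabeled', 0)}")
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not finish after {max_polls} polls")


# Example usage with the client and name from the preceding code:
# wait_for_labeling_job(client, f"crowd-2d-skeleton-example-{now}")
```

The client is passed in as an argument so the helper can be exercised with a stub object before running it against a real job.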

Make adjustments to the UI

In this section, we go over some of the ways you can customize the UI. The following is a list of the most common potential customizations to the UI in order to adjust it to your modeling task:

You can define which keypoints can be labeled. This includes the name of the keypoint and its color.
You can change the structure of the skeleton (which keypoints are connected).
You can change the line colors for specific lines between specific keypoints.

All of these UI customizations are configurable through arguments passed into the crowd-2d-skeleton component, which is the JavaScript component used in this custom workflow template. In this template, you will find the usage of the crowd-2d-skeleton component. A simplified version is shown in the following code:

<crowd-2d-skeleton
        imgSrc="{{ task.input.image_s3_uri | grant_read_access }}"
        keypointClasses='<keypoint classes>'
        skeletonRig='<skeleton rig definition>'
        skeletonBoundingBox='<skeleton bounding box size>'
        initialValues="{{ task.input.initial_values }}"
>

In the preceding code example, you can see the following attributes on the component: imgSrc, keypointClasses, skeletonRig, skeletonBoundingBox, and initialValues. We describe each attribute’s purpose in the following sections, but customizing the UI is as straightforward as changing the values for these attributes, saving the template, and rerunning the post_deployment_script.py we used previously.

imgSrc attribute

The imgSrc attribute controls which image to show in the UI when labeling. Usually, a different image is used for each manifest line item, so this attribute is often populated dynamically using the built-in Liquid templating language. You can see in the previous code example that the attribute value is set to {{ task.input.image_s3_uri | grant_read_access }}, which is a Liquid template variable that will be replaced with the actual image_s3_uri value when the template is being rendered. The rendering process starts when the user opens an image for annotation. This process grabs a line item from the input manifest file and sends it to the pre-annotation Lambda function as an event.dataObject. The pre-annotation function takes the information it needs from the line item and returns a taskInput dictionary, which is then passed to the Liquid rendering engine, which replaces any Liquid variables in your template. For example, let’s say you have a manifest file with the following line:

{"source-ref": "s3://my-bucket/exmaple.jpg", "annotations": []}

This data would be passed to the pre-annotation function. The following code shows how the function extracts the values from the event object:

import json


def lambda_handler(event, context):
    print("Pre-Annotation Lambda Triggered")
    data_object = event["dataObject"]  # this comes directly from the manifest file
    annotations = data_object["annotations"]

    taskInput = {
        "image_s3_uri": data_object["source-ref"],
        "initial_values": json.dumps(annotations)
    }
    return {"taskInput": taskInput, "humanAnnotationRequired": "true"}

The object returned from the function in this case would look like the following code:

{
  "taskInput": {
    "image_s3_uri": "s3://my-bucket/exmaple.jpg",
    "initial_values": "[]"
  },
  "humanAnnotationRequired": "true"
}
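
You can reproduce this result locally without deploying anything. The following sketch repeats the handler logic and calls it with a hand-built event. It assumes the event only needs the dataObject field that the handler reads; the real event sent by SageMaker Ground Truth carries additional fields that are omitted here:

```python
import json


def lambda_handler(event, context):
    # Same logic as the pre-annotation function shown earlier.
    data_object = event["dataObject"]  # one parsed line of the manifest file
    annotations = data_object["annotations"]
    task_input = {
        "image_s3_uri": data_object["source-ref"],
        "initial_values": json.dumps(annotations),
    }
    return {"taskInput": task_input, "humanAnnotationRequired": "true"}


# Build a minimal test event from the example manifest line.
manifest_line = '{"source-ref": "s3://my-bucket/exmaple.jpg", "annotations": []}'
event = {"dataObject": json.loads(manifest_line)}

result = lambda_handler(event, None)
print(json.dumps(result, indent=2))
```

Note that the annotations list is serialized with json.dumps, so the template receives it as the string "[]" under the initial_values key.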

The returned data from the function is then available to the Liquid template engine, which replaces the template values in the template with the data values returned by the function. The result would be something like the following code:

<crowd-2d-skeleton
        imgSrc="s3://my-bucket/exmaple.jpg" <-- This was “injected” into template
        keypointClasses='<keypoint classes>'
        skeletonRig='<skeleton rig definition>'
        skeletonBoundingBox='<skeleton bounding box size>'
        initialValues="[]"
>

keypointClasses attribute

The keypointClasses attribute defines which keypoints will appear in the UI and be used by the annotators. This attribute takes a JSON string containing a list of objects. Each object represents a keypoint. Each keypoint object should contain the following fields:

id – A unique value to identify that keypoint.
color – The color of the keypoint represented as an HTML hex color.
label – The name or keypoint class.
x – This optional attribute is only needed if you want to use the draw skeleton functionality in the UI. The value for this attribute is the x position of the keypoint relative to the skeleton’s bounding box. This value is usually obtained from the Skeleton Rig Creator tool. If you are doing keypoint annotations and don’t need to draw an entire skeleton at once, you can set this value to 0.
y – This optional attribute is similar to x, but for the vertical dimension.

For more information on the keypointClasses attribute, see the keypointClasses documentation.
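
Because the attribute value is a JSON string, it can be convenient to generate it from Python rather than edit it by hand. The following is a minimal sketch under stated assumptions: the helper name build_keypoint_classes, the integer id values, and the colors are hypothetical, and the checks only enforce what is described above (unique ids and hex colors), with x and y defaulting to 0:

```python
import json
import re

HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")


def build_keypoint_classes(keypoints):
    """Validate keypoint definitions and return them as a JSON string."""
    ids = [kp["id"] for kp in keypoints]
    if len(ids) != len(set(ids)):
        raise ValueError("keypoint ids must be unique")
    for kp in keypoints:
        if not HEX_COLOR.match(kp["color"]):
            raise ValueError(f"invalid hex color for {kp['label']}: {kp['color']}")
    return json.dumps([
        {
            "id": kp["id"],
            "color": kp["color"],
            "label": kp["label"],
            # x and y only matter for the draw skeleton functionality.
            "x": kp.get("x", 0),
            "y": kp.get("y", 0),
        }
        for kp in keypoints
    ])


keypoint_classes_json = build_keypoint_classes([
    {"id": 0, "color": "#FF0000", "label": "left_ankle"},
    {"id": 1, "color": "#00FF00", "label": "left_knee"},
    {"id": 2, "color": "#0000FF", "label": "left_hip"},
])
print(keypoint_classes_json)
```

The output uses double quotes internally, which is why the template wraps the keypointClasses value in single quotes.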

skeletonRig attribute

The skeletonRig attribute controls which keypoints should have lines drawn between them. This attribute takes a JSON string containing a list of keypoint label pairs. Each pair informs the UI which keypoints to draw lines between. For example, '[["left_ankle","left_knee"],["left_knee","left_hip"]]' informs the UI to draw lines between "left_ankle" and "left_knee" and draw lines between "left_knee" and "left_hip". This can be generated by the Skeleton Rig Creator tool.
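
If you prefer to write the rig by hand instead of using the Skeleton Rig Creator tool, a small check can catch pairs that reference a label missing from keypointClasses. The following is a sketch only; the helper name build_skeleton_rig is hypothetical:

```python
import json


def build_skeleton_rig(label_pairs, known_labels):
    """Check each pair against the known keypoint labels and return a JSON string."""
    known = set(known_labels)
    for pair in label_pairs:
        if len(pair) != 2:
            raise ValueError(f"expected a pair of labels, got: {pair}")
        for label in pair:
            if label not in known:
                raise ValueError(f"unknown keypoint label: {label}")
    # Compact separators match the format used in the example above.
    return json.dumps([list(pair) for pair in label_pairs], separators=(",", ":"))


rig = build_skeleton_rig(
    [("left_ankle", "left_knee"), ("left_knee", "left_hip")],
    ["left_ankle", "left_knee", "left_hip"],
)
print(rig)  # [["left_ankle","left_knee"],["left_knee","left_hip"]]
```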

skeletonBoundingBox attribute

The skeletonBoundingBox attribute is optional and only needed if you want to use the draw skeleton functionality in the UI, which is the ability to annotate entire skeletons with a single annotation action. We don’t cover this feature in this post. The value for this attribute is the skeleton’s bounding box dimensions, and we recommend using the Skeleton Rig Creator tool to obtain it. If you are doing keypoint annotations and don’t need to draw an entire skeleton at once, you can set this value to null.

initialValues attribute

The initialValues attribute is used to pre-populate the UI with annotations obtained from another process (such as another labeling job or machine learning model). This is useful when doing adjustment or review jobs. The data for this field is usually populated dynamically, in the same way as described for the imgSrc attribute. More details can be found in the crowd-2d-skeleton documentation.

Clean up

To avoid incurring future charges, you should delete the objects in your S3 bucket and delete your AWS CDK stack. You can delete your S3 objects via the Amazon S3 console or the AWS Command Line Interface (AWS CLI). After you have deleted all of the S3 objects in the bucket, you can destroy the AWS CDK stack by running the following code:

cdk destroy

This will remove the resources you created earlier.
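
If you would rather empty the bucket from Python before running cdk destroy, the following sketch shows one way to do it with the Boto3 list_objects_v2 paginator and delete_objects. The helper name empty_bucket is hypothetical, and the sketch assumes the bucket is not versioned (object versions and delete markers are not handled):

```python
def empty_bucket(s3_client, bucket_name):
    """Delete every object in the bucket, one listing page at a time."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        # Each page holds at most 1,000 keys, the limit for one delete_objects call.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted


# Example usage with the client and bucket name from the earlier script:
# empty_bucket(s3_client, s3_bucket_name)
```

This permanently removes the objects, so double-check the bucket name before running it.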

Considerations

Additional steps may be needed to productionize your workflow. Here are some considerations, depending on your organization’s risk profile:

Adding access and application logging
Adding a web application firewall (WAF)
Adjusting IAM permissions to follow least privilege

Conclusion

In this post, we shared the importance of labeling efficiency and accuracy in building pose estimation datasets. To help with both items, we showed how you can use SageMaker Ground Truth to build custom labeling workflows to support skeleton-based pose labeling tasks, aiming to enhance efficiency and precision during the labeling process. We showed how you can further extend the code and examples to various custom pose estimation labeling requirements.

We encourage you to use this solution for your labeling tasks and to engage with AWS for assistance or inquiries related to custom labeling workflows.

About the Authors

Arthur Putnam is a Full-Stack Data Scientist in AWS Professional Services. Arthur’s expertise is centered around developing and integrating front-end and back-end technologies into AI systems. Outside of work, Arthur enjoys exploring the latest advancements in technology, spending time with his family, and enjoying the outdoors.

Ben Fenker is a Senior Data Scientist in AWS Professional Services and has helped customers build and deploy ML solutions in industries ranging from sports to healthcare to manufacturing. He has a Ph.D. in physics from Texas A&M University and 6 years of industry experience. Ben enjoys baseball, reading, and raising his kids.

Jarvis Lee is a Senior Data Scientist with AWS Professional Services. He has been with AWS for over six years, working with customers on machine learning and computer vision problems. Outside of work, he enjoys riding bicycles.