Amazon AWS – Page 157

Amazon and University of Texas at Austin launch Science Hub

April 11, 2023

by Amazon AWS

The collaboration supports education, community outreach, and the application of academic research to video streaming and robotics.Read More

Amazon releases largest dataset for training “pick and place” robots

April 10, 2023

by Amazon AWS

Dataset of images collected in an industrial setting features more than 190,000 objects, orders of magnitude more than previous datasets.Read More

Inpaint images with Stable Diffusion using Amazon SageMaker JumpStart

April 10, 2023

by Vivek Madan Amazon AWS

In November 2022, we announced that AWS customers can generate images from text with Stable Diffusion models using Amazon SageMaker JumpStart. Today, we are excited to introduce a new feature that enables users to inpaint images with Stable Diffusion models. Inpainting refers to the process of replacing a portion of an image with another image based on a textual prompt. By providing the original image, a mask image that outlines the portion to be replaced, and a textual prompt, the Stable Diffusion model can produce a new image that replaces the masked area with the object, subject, or environment described in the textual prompt.

You can use inpainting for restoring degraded images or creating new images with novel subjects or styles in certain sections. Within the realm of architectural design, Stable Diffusion inpainting can be applied to repair incomplete or damaged areas of building blueprints, providing precise information for construction crews. In the case of clinical MRI imaging, the patient’s head must be restrained, which may lead to subpar results due to the cropping artifact causing data loss or reduced diagnostic accuracy. Image inpainting can effectively help mitigate these suboptimal outcomes.

In this post, we present a comprehensive guide on deploying and running inference using the Stable Diffusion inpainting model in two methods: through JumpStart’s user interface (UI) in Amazon SageMaker Studio, and programmatically through JumpStart APIs available in the SageMaker Python SDK.

Solution overview

The following images are examples of inpainting. The original images are on the left, the mask image is in the center, and the inpainted image generated by the model is on the right. For the first example, the model was provided with the original image, a mask image, and the textual prompt “a white cat, blue eyes, wearing a sweater, lying in park,” as well as the negative prompt “poorly drawn feet.” For the second example, the textual prompt was “A female model gracefully showcases a casual long dress featuring a blend of pink and blue hues,”

Running large models like Stable Diffusion requires custom inference scripts. You have to run end-to-end tests to make sure that the script, the model, and the desired instance work together efficiently. JumpStart simplifies this process by providing ready-to-use scripts that have been robustly tested. You can access these scripts with one click through the Studio UI or with very few lines of code through the JumpStart APIs.

The following sections guide you through deploying the model and running inference using either the Studio UI or the JumpStart APIs.

Note that by using this model, you agree to the CreativeML Open RAIL++-M License.

Access JumpStart through the Studio UI

In this section, we illustrate the deployment of JumpStart models using the Studio UI. The accompanying video demonstrates locating the pre-trained Stable Diffusion inpainting model on JumpStart and deploying it. The model page offers essential details about the model and its usage. To perform inference, we employ the ml.p3.2xlarge instance type, which delivers the required GPU acceleration for low-latency inference at an affordable price. After the SageMaker hosting instance is configured, choose Deploy. The endpoint will be operational and prepared to handle inference requests within approximately 10 minutes.

JumpStart provides a sample notebook that can help accelerate the time it takes to run inference on the newly created endpoint. To access the notebook in Studio, choose Open Notebook in the Use Endpoint from Studio section of the model endpoint page.

Use JumpStart programmatically with the SageMaker SDK

Utilizing the JumpStart UI enables you to deploy a pre-trained model interactively with only a few clicks. Alternatively, you can employ JumpStart models programmatically by using APIs integrated within the SageMaker Python SDK.

In this section, we choose an appropriate pre-trained model in JumpStart, deploy this model to a SageMaker endpoint, and perform inference on the deployed endpoint, all using the SageMaker Python SDK. The following examples contain code snippets. To access the complete code with all the steps included in this demonstration, refer to the Introduction to JumpStart Image editing – Stable Diffusion Inpainting example notebook.

Deploy the pre-trained model

SageMaker utilizes Docker containers for various build and runtime tasks. JumpStart utilizes the SageMaker Deep Learning Containers (DLCs) that are framework-specific. We first fetch any additional packages, as well as scripts to handle training and inference for the selected task. Then the pre-trained model artifacts are separately fetched with model_uris, which provides flexibility to the platform. This allows multiple pre-trained models to be used with a single inference script. The following code illustrates this process:

model_id, model_version = "model-inpainting-stabilityai-stable-diffusion-2-inpainting-fp16", "*"
# Retrieve the inference docker container uri
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)
# Retrieve the inference script uri
deploy_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="inference")

base_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="inference")

Next, we provide those resources to a SageMaker model instance and deploy an endpoint:

# Create the SageMaker model instance
# Create the SageMaker model instance
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=base_model_uri,
    entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# deploy the Model - note that we need to pass the Predictor class when we deploy the model through the Model class,
# in order to run inference through the SageMaker API
base_model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

After the model is deployed, we can obtain real-time predictions from it!

Input

The input is the base image, a mask image, and the prompt describing the subject, object, or environment to be substituted in the masked-out portion. Creating the perfect mask image for in-painting effects involves several best practices. Start with a specific prompt, and don’t hesitate to experiment with various Stable Diffusion settings to achieve desired outcomes. Utilize a mask image that closely resembles the image you aim to inpaint. This approach aids the inpainting algorithm in completing the missing sections of the image, resulting in a more natural appearance. High-quality images generally yield better results, so make sure your base and mask images are of good quality and resemble each other. Additionally, opt for a large and smooth mask image to preserve detail and minimize artifacts.

The endpoint accepts the base image and mask as raw RGB values or a base64 encoded image. The inference handler decodes the image based on content_type:

For content_type = “application/json”, the input payload must be a JSON dictionary with the raw RGB values, textual prompt, and other optional parameters
For content_type = “application/json;jpeg”, the input payload must be a JSON dictionary with the base64 encoded image, a textual prompt, and other optional parameters

Output

The endpoint can generate two types of output: a Base64-encoded RGB image or a JSON dictionary of the generated images. You can specify which output format you want by setting the accept header to "application/json" or "application/json;jpeg" for a JPEG image or base64, respectively.

For accept = “application/json”, the endpoint returns the a JSON dictionary with RGB values for the image
For accept = “application/json;jpeg”, the endpoint returns a JSON dictionary with the JPEG image as bytes encoded with base64.b64 encoding

Note that sending or receiving the payload with the raw RGB values may hit default limits for the input payload and the response size. Therefore, we recommend using the base64 encoded image by setting content_type = “application/json;jpeg” and accept = “application/json;jpeg”.

The following code is an example inference request:

content_type = "application/json;jpeg"

with open(input_img_file_name, "rb") as f:
    input_img_image_bytes = f.read()
with open(input_img_mask_file_name, "rb") as f:
    input_img_mask_image_bytes = f.read()

encoded_input_image = base64.b64encode(bytearray(input_img_image_bytes)).decode()
encoded_mask = base64.b64encode(bytearray(input_img_mask_image_bytes)).decode()


payload = {
    "prompt": "a white cat, blue eyes, wearing a sweater, lying in park",
    "image": encoded_input_image,
    "mask_image": encoded_mask,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "seed": 0,
    "negative_prompt": "poorly drawn feet",
}


accept = "application/json;jpeg"

def query(model_predictor, payload, content_type, accept):
    """Query the model predictor."""

    query_response = model_predictor.predict(
        payload,
        {
            "ContentType": content_type,
            "Accept": accept,
        },
    )
    return query_response

query_response = query(model_predictor, json.dumps(payload).encode("utf-8"), content_type, accept)
generated_images = parse_response(query_response)

Supported parameters

Stable Diffusion inpainting models support many parameters for image generation:

image – The original image.
mask – An image where the blacked-out portion remains unchanged during image generation and the white portion is replaced.
prompt – A prompt to guide the image generation. It can be a string or a list of strings.
num_inference_steps (optional) – The number of denoising steps during image generation. More steps lead to higher quality image. If specified, it must be a positive integer. Note that more inference steps will lead to a longer response time.
guidance_scale (optional) – A higher guidance scale results in an image more closely related to the prompt, at the expense of image quality. If specified, it must be a float. guidance_scale<=1 is ignored.
negative_prompt (optional) – This guides the image generation against this prompt. If specified, it must be a string or a list of strings and used with guidance_scale. If guidance_scale is disabled, this is also disabled. Moreover, if the prompt is a list of strings, then the negative_prompt must also be a list of strings.
seed (optional) – This fixes the randomized state for reproducibility. If specified, it must be an integer. Whenever you use the same prompt with the same seed, the resulting image will always be the same.
batch_size (optional) – The number of images to generate in a single forward pass. If using a smaller instance or generating many images, reduce batch_size to be a small number (1–2). The number of images = number of prompts*num_images_per_prompt.

Limitations and biases

Even though Stable Diffusion has impressive performance in inpainting, it suffers from several limitations and biases. These include but are not limited to:

The model may not generate accurate faces or limbs because the training data doesn’t include sufficient images with these features.
The model was trained on the LAION-5B dataset, which has adult content and may not be fit for product use without further considerations.
The model may not work well with non-English languages because the model was trained on English language text.
The model can’t generate good text within images.
Stable Diffusion inpainting typically works best with images of lower resolutions, such as 256×256 or 512×512 pixels. When working with high-resolution images (768×768 or higher), the method might struggle to maintain the desired level of quality and detail.
Although the use of a seed can help control reproducibility, Stable Diffusion inpainting may still produce varied results with slight alterations to the input or parameters. This might make it challenging to fine-tune the output for specific requirements.
The method might struggle with generating intricate textures and patterns, especially when they span large areas within the image or are essential for maintaining the overall coherence and quality of the inpainted region.

For more information on limitations and bias, refer to the Stable Diffusion Inpainting model card.

Inpainting solution with mask generated via a prompt

CLIPSeq is an advanced deep learning technique that utilizes the power of pre-trained CLIP (Contrastive Language-Image Pretraining) models to generate masks from input images. This approach provides an efficient way to create masks for tasks such as image segmentation, inpainting, and manipulation. CLIPSeq uses CLIP to generate a text description of the input image. The text description is then used to generate a mask that identifies the pixels in the image that are relevant to the text description. The mask can then be used to isolate the relevant parts of the image for further processing.

CLIPSeq has several advantages over other methods for generating masks from input images. First, it’s a more efficient method, because it doesn’t require the image to be processed by a separate image segmentation algorithm. Second, it’s more accurate, because it can generate masks that are more closely aligned with the text description of the image. Third, it’s more versatile, because you can use it to generate masks from a wide variety of images.

However, CLIPSeq also has some disadvantages. First, the technique may have limitations in terms of subject matter, because it relies on pre-trained CLIP models that may not encompass specific domains or areas of expertise. Second, it can be a sensitive method, because it’s susceptible to errors in the text description of the image.

For more information, refer to Virtual fashion styling with generative AI using Amazon SageMaker.

Clean up

After you’re done running the notebook, make sure to delete all resources created in the process to ensure that the billing is stopped. The code to clean up the endpoint is available in the associated notebook.

Conclusion

In this post, we showed how to deploy a pre-trained Stable Diffusion inpainting model using JumpStart. We showed code snippets in this post—the full code with all of the steps in this demo is available in the Introduction to JumpStart – Enhance image quality guided by prompt example notebook. Try out the solution on your own and send us your comments.

To learn more about the model and how it works, see the following resources:

To learn more about JumpStart, check out the following posts:

About the Authors

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from University of Illinois at Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.

Alfred Shen is a Senior AI/ML Specialist at AWS. He has been working in Silicon Valley, holding technical and managerial positions in diverse sectors including healthcare, finance, and high-tech. He is a dedicated applied AI/ML researcher, concentrating on CV, NLP, and multimodality. His work has been showcased in publications such as EMNLP, ICLR, and Public Health.

Deploy large language models on AWS Inferentia2 using large model inference containers

April 10, 2023

by Qingwei Li Amazon AWS

You don’t have to be an expert in machine learning (ML) to appreciate the value of large language models (LLMs). Better search results, image recognition for the visually impaired, creating novel designs from text, and intelligent chatbots are just some examples of how these models are facilitating various applications and tasks.

ML practitioners keep improving the accuracy and capabilities of these models. As a result, these models grow in size and generalize better, such as in the evolution of transformer models. We explained in a previous post how you can use Amazon SageMaker deep learning containers (DLCs) to deploy these kinds of large models using a GPU-based instance.

In this post, we take the same approach but host the model on AWS Inferentia2. We use the AWS Neuron software development kit (SDK) to access the Inferentia device and benefit from its high performance. We then use a large model inference container powered by Deep Java Library (DJLServing) as our model serving solution. We demonstrate how these three layers work together by deploying an OPT-13B model on an Amazon Elastic Compute Cloud (Amazon EC2) inf2.48xlarge instance.

The three pillars

The following image represents the layers of hardware and software working to help you unlock the best price and performance of your large language models. AWS Neuron and tranformer-neuronx are the SDKs used to run deep learning workloads on AWS Inferentia. Lastly, DJLServing is the serving solution that is integrated in the container.

Hardware: Inferentia

AWS Inferentia, specifically designed for inference by AWS, is a high-performance and low-cost ML inference accelerator. In this post, we use AWS Inferentia2 (available via Inf2 instances), the second generation purpose-built ML inference accelerator.

Each EC2 Inf2 instance is powered by up to 12 Inferentia2 devices, and allows you to choose between four instance sizes.

Amazon EC2 Inf2 supports NeuronLink v2, a low-latency and high-bandwidth chip-to-chip interconnect, which enables high performance collective communication operations such as AllReduce and AllGather. This efficiently shards models across AWS Inferentia2 devices (such as via Tensor Parallelism), and therefore optimizes latency and throughput. This is particularly useful for large language models. For benchmark performance figures, refer to AWS Neuron Performance.

At the heart of the Amazon EC2 Inf2 instance are AWS Inferentia2 devices, each containing two NeuronCores-v2. Each NeuronCore-v2 is an independent, heterogenous compute-unit, with four main engines: Tensor, Vector, Scalar, and GPSIMD engines. It includes an on-chip software-managed SRAM memory for maximizing data locality. The following diagram shows the internal workings of the AWS Inferentia2 device architecture.

Neuron and transformers-neuronx

Above the hardware layer are the software layers used to interact with AWS Inferentia. AWS Neuron is the SDK used to run deep learning workloads on AWS Inferentia and AWS Trainium based instances. It enables end-to-end ML development lifecycle to build new models, train and optimize these models, and deploy them for production. AWS Neuron includes a deep learning compiler, runtime, and tools that are natively integrated with popular frameworks like TensorFlow and PyTorch.

transformers-neuronx is an open-source library built by the AWS Neuron team that helps run transformer decoder inference workflows using the AWS Neuron SDK. Currently, it has examples for the GPT2, GPT-J, and OPT model types, and different model sizes that have their forward functions re-implemented in a compiled language for extensive code analysis and optimizations. Customers can implement other model architecture based on the same library. AWS Neuron-optimized transformer decoder classes have been re-implemented in XLA HLO (High Level Operations) using a syntax called PyHLO. The library also implements tensor parallelism to shard the model weights across multiple NeuronCores.

Tensor parallelism is needed because the models are so large, they don’t fit into a single accelerator HBM memory. The support for tensor parallelism by the AWS Neuron runtime in transformers-neuronx makes heavy use of collective operations such as AllReduce. The following are some principles for setting the tensor parallelism degree (number of NeuronCores participating in sharded matrix multiply operations) for AWS Neuron-optimized transformer decoder models:

The number of attention heads needs to be divisible by the tensor parallelism degree
The total data size of model weights and key-value caches needs to be smaller than 16 GB times the tensor parallelism degree
Currently, the Neuron runtime supports tensor parallelism degrees 1, 2, 8, and 32 on Trn1 and supports tensor parallelism degrees 1, 2, 4, 8, and 24 on Inf2

DJLServing

DJLServing is a high-performance model server that added support for AWS Inferentia2 in March 2023. The AWS Model Server team offers a container image that can help LLM/AIGC use cases. DJL is also part of Rubikon support for Neuron that includes the integration between DJLServing and transformers-neuronx. The DJLServing model server and transformers-neuronx library are the core components of the container built to serve the LLMs supported through the transformers library. This container and the subsequent DLCs will be able to load the models on the AWS Inferentia chips on an Amazon EC2 Inf2 host along with the installed AWSInferentia drivers and toolkit. In this post, we explain two ways of running the container.

The first way is to run the container without writing any additional code. You can use the default handler for a seamless user experience and pass in one of the supported model names and any load time configurable parameters. This will compile and serve an LLM on an Inf2 instance. The following code shows an example:

engine=Python
option.entryPoint=djl_python.transformers_neuronx
option.task=text-generation
option.model_id=facebook/opt-1.3b
option.tensor_parallel_degree=2

Alternatively, you can write your own model.py file, but that requires implementing the model loading and inference methods to serve as a bridge between the DJLServing APIs and, in this case, the transformers-neuronx APIs. You can also provide configurable parameters in a serving.properties file to be picked up during model loading. For the full list of configurable parameters, refer to All DJL configuration options.

The following code is a sample model.py file. The serving.properties file is similar to the one shown earlier.

def load_model(properties):
     """
    Load a model based from the framework provided APIs
    :param: properties configurable properties for model loading
            specified in serving.properties
    :return: model and other artifacts required for inference
    """
    batch_size = int(properties.get("batch_size", 2))
    tp_degree = int(properties.get("tensor_parallel_degree", 2))
    amp = properties.get("dtype", "f16")
    
    model_id = "facebook/opt-13b"
    model = OPTForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True)
    ...
    
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = OPTForSampling.from_pretrained(load_path,
                                           batch_size=batch_size,
                                           amp=amp,
                                           tp_degree=tp_degree)
    model.to_neuron()
    return model, tokenizer, batch_size

Let’s see what this all looks like on an Inf2 instance.

Launch the Inferentia hardware

We first need to launch an inf.42xlarge instance to host our OPT-13b model. We use the Deep Learning AMI Neuron PyTorch 1.13.0 (Ubuntu 20.04) 20230226 Amazon Machine Image (AMI) because it already includes the Docker image and necessary drivers for the AWS Neuron runtime.

We increase the storage of the instance to 512 GB to accommodate for large language models.

Install necessary dependencies and create the model

We set up a Jupyter notebook server with our AMI to make it easier to view and manage our directories and files. When we’re in the desired directory, we set subdirectories for logs and models and create a serving.properties file.

We can use the standalone model provided by the DJL Serving container. This means we don’t have to define a model, but we do need to provide a serving.properties file. See the following code:

option.model_id=facebook/opt-1.3b
option.batch_size=2
option.tensor_parallel_degree=2
option.n_positions=256
option.dtype=fp16
option.model_loading_timeout=600
engine=Python
option.entryPoint=djl_python.transformers-neuronx
#option.s3url=s3://djl-llm/opt-1.3b/

#can also specify which device to load on.
#engine=Python ---because the handles are implement in python.

This instructs the DJL model server to use the OPT-13B model. We set the batch size to 2 and dtype=f16 for the model to fit on the neuron device. DJL serving supports dynamic batching and by setting a similar tensor_parallel_degree value, we can increase throughput of inference requests because we distribute inference across multiple NeuronCores. We also set n_positions=256 because this informs the maximum length we expect the model to have.

Our instance has 12 AWS Neuron devices, or 24 NeuronCores, while our OPT-13B model requires 40 attention heads. For example, setting tensor_parallel_degree=8 means every 8 NeuronCores will host one model instance. If you divide the required attention heads (40) by the number of NeuronCores (8), then you get 5 attention heads allocated to each NeuronCore, or 10 on each AWS Neuron device.

You can use the following sample model.py file, which defines the model and creates the handler function. You can edit it to meet your needs, but be sure it can be supported on transformers-neuronx.

cat serving.properties

option.tensor_parallel_degree=2 
option.batch_size=2 
option.dtype=f16 
engine=Python

cat model.py

import torch
import tempfile
import os

from transformers.models.opt import OPTForCausalLM
from transformers import AutoTokenizer
from transformers_neuronx import dtypes
from transformers_neuronx.module import save_pretrained_split
from transformers_neuronx.opt.model import OPTForSampling
from djl_python import Input, Output

model = None

def load_model(properties):
    batch_size = int(properties.get("batch_size", 2))
    tp_degree = int(properties.get("tensor_parallel_degree", 2))
    amp = properties.get("dtype", "f16")
    model_id = "facebook/opt-13b"
    load_path = os.path.join(tempfile.gettempdir(), model_id)
    model = OPTForCausalLM.from_pretrained(model_id,
                                           low_cpu_mem_usage=True)
    dtype = dtypes.to_torch_dtype(amp)
    for block in model.model.decoder.layers:
        block.self_attn.to(dtype)
        block.fc1.to(dtype)
        block.fc2.to(dtype)
    model.lm_head.to(dtype)
    save_pretrained_split(model, load_path)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = OPTForSampling.from_pretrained(load_path,
                                           batch_size=batch_size,
                                           amp=amp,
                                           tp_degree=tp_degree)
    model.to_neuron()
    return model, tokenizer, batch_size

def infer(seq_length, prompt):
    with torch.inference_mode():
        input_ids = torch.as_tensor([tokenizer.encode(text) for text in prompt])
        generated_sequence = model.sample(input_ids,
                                          sequence_length=seq_length)
        outputs = [tokenizer.decode(gen_seq) for gen_seq in generated_sequence]
    return outputs

def handle(inputs: Input):
    global model, tokenizer, batch_size
    if not model:
        model, tokenizer, batch_size = load_model(inputs.get_properties())

    if inputs.is_empty():
        # Model server makes an empty call to warmup the model on startup
        return None

    data = inputs.get_as_json()
    seq_length = data["seq_length"]
    prompt = data["text"]
    outputs = infer(seq_length, prompt)
    result = {"outputs": outputs}
    return Output().add_as_json(result)

mkdir -p models/opt13b logs
mv serving.properties model.py models/opt13b

Run the serving container

The last steps before inference are to pull the Docker image for the DJL serving container and run it on our instance:

docker pull deepjavalibrary/djl-serving:0.21.0-pytorch-inf2

After you pull the container image, run the following command to deploy your model. Make sure you’re in the right directory that contains the logs and models subdirectory because the command will map these to the container’s /opt/directories.

docker run -it --rm --network=host 
           -v `pwd`/models:/opt/ml/model 
           -v `pwd`/logs:/opt/djl/logs 
           -u djl --device /dev/neuron0  --device /dev/neuron10  --device /dev/neuron2  --device /dev/neuron4  --device /dev/neuron6  --device /dev/neuron8 --device /dev/neuron1  --device /dev/neuron11 
           -e MODEL_LOADING_TIMEOUT=7200 
           -e PREDICT_TIMEOUT=360 
           deepjavalibrary/djl-serving:0.21.0-pytorch-inf2 serve

Run inference

Now that we’ve deployed the model, let’s test it out with a simple CURL command to pass some JSON data to our endpoint. Because we set a batch size of 2, we pass along the corresponding number of inputs:

curl -X POST "http://127.0.0.1:8080/predictions/opt13b" 
     -H 'Content-Type: application/json' 
     -d '{"seq_length":2048,
          "text":[
                    "Hello, I am a language model,",
                    "Welcome to Amazon Elastic Compute Cloud,"
                  ]
          }'

The preceding command generates a response in the command line. The model is quite chatty but its response validates our model. We were able to run inference on our LLM thanks to Inferentia!

Clean up

Don’t forget to delete your EC2 instance once you are done to save cost.

Conclusion

In this post, we deployed an Amazon EC2 Inf2 instance to host an LLM and ran inference using a large model inference container. You learned how AWS Inferentia and the AWS Neuron SDK interact to allow you to easily deploy LLMs for inference at an optimal price-to-performance ratio. Stay tuned for updates on more capabilities and new innovations with Inferentia. For more examples about Neuron, see aws-neuron-samples.

About the Authors

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Peter Chung is a Solutions Architect for AWS, and is passionate about helping customers uncover insights from their data. He has been building solutions to help organizations make data-driven decisions in both the public and private sectors. He holds all AWS certifications as well as two GCP certifications. He enjoys coffee, cooking, staying active, and spending time with his family.

Aaqib Ansari is a Software Development Engineer with the Amazon SageMaker Inference team. He focuses on helping SageMaker customers accelerate model inference and deployment. In his spare time, he enjoys hiking, running, photography and sketching.

Qing Lan is a Software Development Engineer in AWS. He has been working on several challenging products in Amazon, including high performance ML inference solutions and high performance logging system. Qing’s team successfully launched the first Billion-parameter model in Amazon Advertising with very low latency required. Qing has in-depth knowledge on the infrastructure optimization and Deep Learning acceleration.

Frank Liu is a Software Engineer for AWS Deep Learning. He focuses on building innovative deep learning tools for software engineers and scientists. In his spare time, he enjoys hiking with friends and family.

Deploy pre-trained models on AWS Wavelength with 5G edge using Amazon SageMaker JumpStart

April 7, 2023

by Robert Belson Amazon AWS

With the advent of high-speed 5G mobile networks, enterprises are more easily positioned than ever with the opportunity to harness the convergence of telecommunications networks and the cloud. As one of the most prominent use cases to date, machine learning (ML) at the edge has allowed enterprises to deploy ML models closer to their end-customers to reduce latency and increase responsiveness of their applications. As an example, smart venue solutions can use near-real-time computer vision for crowd analytics over 5G networks, all while minimizing investment in on-premises hardware networking equipment. Retailers can deliver more frictionless experiences on the go with natural language processing (NLP), real-time recommendation systems, and fraud detection. Even ground and aerial robotics can use ML to unlock safer, more autonomous operations.

To reduce the barrier to entry of ML at the edge, we wanted to demonstrate an example of deploying a pre-trained model from Amazon SageMaker to AWS Wavelength, all in less than 100 lines of code. In this post, we demonstrate how to deploy a SageMaker model to AWS Wavelength to reduce model inference latency for 5G network-based applications.

Solution overview

Across AWS’s rapidly expanding global infrastructure, AWS Wavelength brings the power of cloud compute and storage to the edge of 5G networks, unlocking more performant mobile experiences. With AWS Wavelength, you can extend your virtual private cloud (VPC) to Wavelength Zones corresponding to the telecommunications carrier’s network edge in 29 cities across the globe. The following diagram shows an example of this architecture.

You can opt in to the Wavelength Zones within a given Region via the AWS Management Console or the AWS Command Line Interface (AWS CLI). To learn more about deploying geo-distributed applications on AWS Wavelength, refer to Deploy geo-distributed Amazon EKS clusters on AWS Wavelength.

Building on the fundamentals discussed in this post, we look to ML at the edge as a sample workload with which to deploy to AWS Wavelength. As our sample workload, we deploy a pre-trained model from Amazon SageMaker JumpStart.

SageMaker is a fully managed ML service that allows developers to easily deploy ML models into their AWS environments. Although AWS offers a number of options for model training—from AWS Marketplace models and SageMaker built-in algorithms—there are a number of techniques to deploy open-source ML models.

JumpStart provides access to hundreds of built-in algorithms with pre-trained models that can be seamlessly deployed to SageMaker endpoints. From predictive maintenance and computer vision to autonomous driving and fraud detection, JumpStart supports a variety of popular use cases with one-click deployment on the console.

Because SageMaker is not natively supported in Wavelength Zones, we demonstrate how to extract the model artifacts from the Region and re-deploy to the edge. To do so, you use Amazon Elastic Kubernetes Service (Amazon EKS) clusters and node groups in Wavelength Zones, followed by creating a deployment manifest with the container image generated by JumpStart. The following diagram illustrates this architecture.

Prerequisites

To make this as easy as possible, ensure that your AWS account has Wavelength Zones enabled. Note that this integration is only available in us-east-1 and us-west-2, and you will be using us-east-1 for the duration of the demo.

To opt in to AWS Wavelength, complete the following steps:

On the Amazon VPC console, choose Zones under Settings and choose US East (Verizon) / us-east-1-wl1.
Choose Manage.
Select Opted in.
Choose Update zones.

Create AWS Wavelength infrastructure

Before we convert the local SageMaker model inference endpoint to a Kubernetes deployment, you can create an EKS cluster in a Wavelength Zone. To do so, deploy an Amazon EKS cluster with an AWS Wavelength node group. To learn more, you can visit this guide on the AWS Containers Blog or Verizon’s 5GEdgeTutorials repository for one such example.

Next, using an AWS Cloud9 environment or interactive development environment (IDE) of choice, download the requisite SageMaker packages and Docker Compose, a key dependency of JumpStart.

pip install sagemaker
pip install 'sagemaker[local]' --upgrade
sudo curl -L "https://github.com/docker/compose/releases/download/1.23.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version

Create model artifacts using JumpStart

First, make sure that you have an AWS Identity and Access Management (IAM) execution role for SageMaker. To learn more, visit SageMaker Roles.

Using this example, create a file called train_model.py that uses the SageMaker Software Development Kit (SDK) to retrieve a pre-built model (replace <your-sagemaker-execution-role> with the Amazon Resource Name (ARN) of your SageMaker execution role). In this file, you deploy a model locally using the instance_type attribute in the model.deploy() function, which starts a Docker container within your IDE using all requisite model artifacts you defined:

#train_model.py
from sagemaker import image_uris, model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base
import sagemaker, boto3, json
from sagemaker import get_execution_role

aws_role = "<your-sagemaker-execution-role>"
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

# model_version="*" fetches the latest version of the model.
infer_model_id = "tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2"
infer_model_version= "*"
endpoint_name = name_from_base(f"jumpstart-example-{infer_model_id}")

# Retrieve the inference docker container uri.
deploy_image_uri = image_uris.retrieve(
region=None,
framework=None,
image_scope="inference",
model_id=infer_model_id,
model_version=infer_model_version,
instance_type="local",
)
# Retrieve the inference script uri.
deploy_source_uri = script_uris.retrieve(
model_id=infer_model_id, model_version=infer_model_version, script_scope="inference"
)
# Retrieve the base model uri.
base_model_uri = model_uris.retrieve(
model_id=infer_model_id, model_version=infer_model_version, model_scope="inference"
)
model = Model(
image_uri=deploy_image_uri,
source_dir=deploy_source_uri,
model_data=base_model_uri,
entry_point="inference.py",
role=aws_role,
predictor_cls=Predictor,
name=endpoint_name,
)
print(deploy_image_uri,deploy_source_uri,base_model_uri)
# deploy the Model.
base_model_predictor = model.deploy(
initial_instance_count=1,
instance_type="local",
endpoint_name=endpoint_name,
)

Next, set infer_model_id to the ID of the SageMaker model that you would like to use.

For a complete list, refer to Built-in Algorithms with pre-trained Model Table. In our example, we use the Bidirectional Encoder Representations from Transformers (BERT) model, commonly used for natural language processing.

Run the train_model.py script to retrieve the JumpStart model artifacts and deploy the pre-trained model to your local machine:

python train_model.py

Should this step succeed, your output may resemble the following:

763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.8-cpu
s3://jumpstart-cache-prod-us-east-1/source-directory-tarballs/tensorflow/inference/tc/v2.0.0/sourcedir.tar.gz
s3://jumpstart-cache-prod-us-east-1/tensorflow-infer/v2.0.0/infer-tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2.tar.gz

In the output, you will see three artifacts in order: the base image for TensorFlow inference, the inference script that serves the model, and the artifacts containing the trained model. Although you could create a custom Docker image with these artifacts, another approach is to let SageMaker local mode create the Docker image for you. In the subsequent steps, we extract the container image running locally and deploy to Amazon Elastic Container Registry (Amazon ECR) as well as push the model artifact separately to Amazon Simple Storage Service (Amazon S3).

Convert local mode artifacts to remote Kubernetes deployment

Now that you have confirmed that SageMaker is working locally, let’s extract the deployment manifest from the running container. Complete the following steps:

Identify the location of the SageMaker local mode deployment manifest: To do so, search our root directory for any files named docker-compose.yaml.

docker_manifest=$( find /tmp/tmp* -name "docker-compose.yaml" -printf '%T+ %pn' | sort | tail -n 1 | cut -d' ' -f2-)
echo $docker_manifest

Identify the location of the SageMaker local mode model artifacts: Next, find the underlying volume mounted to the local SageMaker inference container, which will be used in each EKS worker node after we upload the artifact to Amazon s3.

model_local_volume = $(grep -A1 -w "volumes:" $docker_manifest | tail -n 1 | tr -d ' ' | awk -F: '{print $1}' | cut -c 2-) 
# Returns something like: /tmp/tmpcr4bu_a7</p>

Create local copy of running SageMaker inference container: Next, we’ll find the currently running container image running our machine learning inference model and make a copy of the container locally. This will ensure we have our own copy of the container image to pull from Amazon ECR.

# Find container ID of running SageMaker Local container
mkdir sagemaker-container
container_id=$(docker ps --format "{{.ID}} {{.Image}}" | grep "tensorflow" | awk '{print $1}')
# Retrieve the files of the container locally
docker cp $my_container_id:/ sagemaker-container/

Before acting on the model_local_volume, which we’ll push to Amazon S3, push a copy of the running Docker image, now in the sagemaker-container directory, to Amazon Elastic Container Registry. Be sure to replace region, aws_account_id, docker_image_id and my-repository:tag or follow the Amazon ECR user guide. Also, be sure to take note of the final ECR Image URL (aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:tag), which we will use in our EKS deployment.

aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com
docker build .
docker tag <docker-image-id> aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:tag
docker push aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:tag

Now that we have an ECR image corresponding to the inference endpoint, create a new Amazon S3 bucket and copy the SageMaker Local artifacts (model_local_volume) to this bucket. In parallel, create an Identity Access Management (IAM) that provides Amazon EC2 instances access to read objects within the bucket. Be sure to replace <unique-bucket-name> with a globally unique name for your Amazon S3 bucket.

# Create S3 Bucket for model artifacts
aws s3api create-bucket --bucket <unique-bucket-name>
aws s3api put-public-access-block --bucket <unique-bucket-name> --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
# Step 2: Create IAM attachment to Node Group
cat > ec2_iam_policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::sagemaker-wavelength-demo-app/*",
        "arn:aws:s3:::sagemaker-wavelength-demo-app"
      ]
    }
  ]
}

# Create IAM policy
policy_arn=$(aws iam create-policy --policy-name sagemaker-demo-app-s3 --policy-document file://ec2_iam_policy.json --query Policy.Arn)
aws iam attach-role-policy --role-name wavelength-eks-Cluster-wl-workers --policy-arn $policy_arn

# Push model artifacts to S3
cd $model_local_volume
tar -cvf sagemaker_model.tar .
aws s3 cp sagemaker_model.tar s3://

Next, to ensure that each EC2 instance pulls a copy of the model artifact on launch, edit the user data for your EKS worker nodes. In your user data script, ensure that each node retrieves the model artifacts using the the S3 API at launch. Be sure to replace <unique-bucket-name> with a globally unique name for your Amazon S3 bucket. Given that the node’s user data will also include the EKS bootstrap script, the complete user data may look something like this.

#!/bin/bash
mkdir /tmp/model</p><p>cd /tmp/model
aws s3api get-object --bucket sagemaker-wavelength-demo-app --key sagemaker_model.tar  sagemaker_model.tar
tar -xvf sagemaker_model.tar
set -o xtrace
/etc/eks/bootstrap.sh <your-eks-cluster-id>

Now, you can inspect the existing docker manifest it and translate it to Kubernetes-friendly manifest files using Kompose, a well-known conversion tool. Note: if you get a version compatibility error, change the version attribute in line 27 of docker-compose.yml to “2”.

curl -L https://github.com/kubernetes/kompose/releases/download/v1.26.0/kompose-linux-amd64 -o kompose
chmod +x kompose && sudo mv ./kompose /usr/local/bin/compose
cd "$(dirname "$docker_manifest")"
kompose convert

After running Kompose, you’ll see four new files: a Deployment object, Service object, PersistentVolumeClaim object, and NetworkPolicy object. You now have everything you need to begin your foray into Kubernetes at the edge!

Deploy SageMaker model artifacts

Make sure you have kubectl and aws-iam-authenticator downloaded to your AWS Cloud9 IDE. If not, follow the installation guides:

Now, complete the following steps:

Modify the service/algo-1-ow3nv object to switch the service type from ClusterIP to NodePort. In our example, we have selected port 30,007 as our NodePort:

# algo-1-ow3nv-service.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.26.0 (40646f47)
  creationTimestamp: null
  labels:
    io.kompose.service: algo-1-ow3nv
  name: algo-1-ow3nv
spec:
  type: NodePort
  ports:
    - name: "8080"
      port: 8080
      targetPort: 8080
      nodePort: 30007
  selector:
    io.kompose.service: algo-1-ow3nv
status:
  loadBalancer: {}

Next, you must allow the NodePort in the security group for your node. To do so, retrieve the security groupID and allow-list the NodePort:

node_group_sg=$(aws ec2 describe-security-groups --filters Name=group-name,Values='wavelength-eks-Cluster*' --query "SecurityGroups[0].GroupId" --output text)
aws ec2 authorize-security-group-ingress --group-id $node_group_sg --ip-permissions IpProtocol=tcp,FromPort=30007,ToPort=30007,IpRanges='[{CidrIp=0.0.0.0/0}]'

Next, modify the algo-1-ow3nv-deployment.yaml manifest to mount the /tmp/model hostPath directory to the container. Replace <your-ecr-image> with the ECR image you created earlier:

# algo-1-ow3nv-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.26.0 (40646f47)
  creationTimestamp: null
  labels:
    io.kompose.service: algo-1-ow3nv
  name: algo-1-ow3nv
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: algo-1-ow3nv
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        kompose.cmd: kompose convert
        kompose.version: 1.26.0 (40646f47)
      creationTimestamp: null
      labels:
        io.kompose.network/environment-sagemaker-local: "true"
        io.kompose.service: algo-1-ow3nv
    spec:
      containers:
        - args:
            - serve
          env:
            - name: SAGEMAKER_CONTAINER_LOG_LEVEL
              value: "20"
            - name: SAGEMAKER_PROGRAM
              value: inference.py
            - name: SAGEMAKER_REGION
              value: us-east-1
            - name: SAGEMAKER_SUBMIT_DIRECTORY
              value: /opt/ml/model/code
          image: <your-ecr-image>
          name: sagemaker-test-model
          ports:
            - containerPort: 8080
          resources: {}
          stdin: true
          tty: true
          volumeMounts:
            - mountPath: /opt/ml/model
              name: algo-1-ow3nv-claim0
      restartPolicy: Always
      volumes:
        - name: algo-1-ow3nv-claim0
          hostPath:
            path: /tmp/model
status: {}

With the manifest files you created from Kompose, use kubectl to apply the configs to your cluster:

$ kubectl apply -f algo-1-ow3nv-deployment.yaml algo-1-ow3nv-service.yaml
deployment.apps/algo-1-ow3nv created
service/algo-1-ow3nv created

Connect to the 5G edge model

To connect to your model, complete the following steps:

On the Amazon EC2 console, retrieve the carrier IP of the EKS worker node or use the AWS CLI to query the carrier IP address directly:

aws ec2 describe-instances --filters "Name=tag:aws:autoscaling:groupName,Values=eks-EKSNodeGroup*" --query 'Reservations[*].Instances[*].[Placement.AvailabilityZone,NetworkInterfaces[].Association.CarrierIp]' --output text
# Example Output: 155.146.1.12

Now, with the carrier IP address extracted, you can connect to the model directly using the NodePort. Create a file called invoke.py to invoke the BERT model directly by providing a text-based input that will be run against a sentiment-analyzer to determine whether the tone was positive or negative:

import json
endpoint_name="jumpstart-example-tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2"
request_body = "simply stupid , irrelevant and deeply , truly , bottomlessly cynical ".encode("utf-8")
import requests
r2=requests.post(url="http://155.146.1.12:30007/invocations", data=request_body, headers={"Content-Type":"application/x-text","Accept":"application/json;verbose"})
print(r2.text)

Your output should resemble the following:

{"probabilities": [0.998723, 0.0012769578], "labels": [0, 1], "predicted_label": 0}

Clean up

To destroy all application resources created, delete the AWS Wavelength worker nodes, the EKS control plane, and all the resources created within the VPC. Additionally, delete the ECR repo used to host the container image, the S3 buckets used to host the SageMaker model artifacts and the sagemaker-demo-app-s3 IAM policy.

Conclusion

In this post, we demonstrated a novel approach to deploying SageMaker models to the network edge using Amazon EKS and AWS Wavelength. To learn about Amazon EKS best practices on AWS Wavelength, refer to Deploy geo-distributed Amazon EKS clusters on AWS Wavelength. Additionally, to learn more about Jumpstart, visit the Amazon SageMaker JumpStart Developer Guide or the JumpStart Available Model Table.

About the Authors

Robert Belson is a Developer Advocate in the AWS Worldwide Telecom Business Unit, specializing in AWS Edge Computing. He focuses on working with the developer community and large enterprise customers to solve their business challenges using automation, hybrid networking and the edge cloud.

Mohammed Al-Mehdar is a Senior Solutions Architect in the Worldwide Telecom Business Unit at AWS. His main focus is to help enable customers to build and deploy Telco and Enterprise IT workloads on AWS. Prior to joining AWS, Mohammed has been working in the Telco industry for over 13 years and brings a wealth of experience in the areas of LTE Packet Core, 5G, IMS and WebRTC. Mohammed holds a bachelor’s degree in Telecommunications Engineering from Concordia University.

Evan Kravitz is a software engineer at Amazon Web Services, working on SageMaker JumpStart. He enjoys cooking and going on runs in New York City.

Justin St. Arnauld is an Associate Director – Solution Architects at Verizon for the Public Sector with over 15 years of experience in the IT industry. He is a passionate advocate for the power of edge computing and 5G networks and is an expert in developing innovative technology solutions that leverage these technologies. Justin is particularly enthusiastic about the capabilities offered by Amazon Web Services (AWS) in delivering cutting-edge solutions for his clients. In his free time, Justin enjoys keeping up-to-date with the latest technology trends and sharing his knowledge and insights with others in the industry.

Import data from over 40 data sources for no-code machine learning with Amazon SageMaker Canvas

April 6, 2023

by Brandon Nair Amazon AWS

Data is at the heart of machine learning (ML). Including relevant data to comprehensively represent your business problem ensures that you effectively capture trends and relationships so that you can derive the insights needed to drive business decisions. With Amazon SageMaker Canvas, you can now import data from over 40 data sources to be used for no-code ML. Canvas expands access to ML by providing business analysts with a visual interface that allows them to generate accurate ML predictions on their own—without requiring any ML experience or having to write a single line of code. Now, you can import data in-app from popular relational data stores such as Amazon Athena as well as third-party software as a service (SaaS) platforms supported by Amazon AppFlow such as Salesforce, SAP OData, and Google Analytics.

The process of gathering high-quality data for ML can be complex and time-consuming, because the proliferation of SaaS applications and data storage services has created a spread of data across a multitude of systems. For example, you may need to conduct a customer churn analysis using customer data from Salesforce, financial data from SAP, and logistics data from Snowflake. To create a dataset across these sources, you need to log into each application individually, select the desired data, and export it locally, where it can then be aggregated using a different tool. This dataset then needs to be imported into a separate application for ML.

With this launch, Canvas empowers you to capitalize on data stored in disparate sources by supporting in-app data import and aggregation from over 40 data sources. This feature is made possible through new native connectors to Athena and to Amazon AppFlow via the AWS Glue Data Catalog. Amazon AppFlow is a managed service that enables you to securely transfer data from third-party SaaS applications to Amazon Simple Storage Service (Amazon S3) and catalog the data with the Data Catalog with just a few clicks. After your data is transferred, you can simply access the data source within Canvas, where you can view table schemas, join tables within or across data sources, write Athena queries, and preview and import your data. After your data is imported, you can use existing Canvas functionalities such as building an ML model, viewing column impact data, or generating predictions. You can automate the data transfer process in Amazon AppFlow to activate on a schedule to ensure that you always have access to the latest data in Canvas.

Solution overview

The steps outlined in this post provide two examples of how to import data into Canvas for no-code ML. In the first example, we demonstrate how to import data through Athena. In the second example, we show how to import data from a third-party SaaS application via Amazon AppFlow.

Import data from Athena

In this section, we show an example of importing data in Canvas from Athena to conduct a customer segmentation analysis. We create an ML classification model to categorize our customer base into four different classes, with the end goal to use the model to predict which class a new customer will fall into. We follow three major steps: import the data, train a model, and generate predictions. Let’s get started.

Import the data

To import data from Athena, complete the following steps:

On the Canvas console, choose Datasets in the navigation pane, then choose Import.
Expand the Data Source menu and choose Athena.
Choose the correct database and table that you want to import from. You can optionally preview the table by choosing the preview icon.

The following screenshot shows an example of the preview table.

In our example, we segment customers based on the marketing channel through which they have engaged our services. This is specified by the column segmentation, where A is print media, B is mobile, C is in-store promotions, and D is television.

When you’re satisfied that you have the right table, drag the desired table into the Drag and drop datasets to join section.
You can now optionally select or deselect columns, join tables by dragging another table into the Drag and drop datasets to join section, or write SQL queries to specify your data slice. For this post, we use all the data in the table.
To import the data, choose Import data.

Your data is imported into Canvas as a dataset from the specific table in Athena.

Train a model

After your data is imported, it shows up on the Datasets page. At this stage, you can build a model. To do so, complete the following steps:

Select your dataset and choose Create a model.
For Model name, enter your model name (for this post, my_first_model).
Canvas enables you to create models for predictive analysis, image analysis, and text analysis. Because we want to categorize customers, select Predictive analysis for Problem type.
To proceed, choose Create.

On the Build page, you can see statistics about your dataset, such as the percentage of missing values and mean of the data.

For Target column, choose a column (for this post, segmentation).

Canvas offers two types of models that can generate predictions. Quick build prioritizes speed over accuracy, providing a model in 2–15 minutes. Standard build prioritizes accuracy over speed, providing a model in 2–4 hours.

For this post, choose Quick build.
After the model is trained, you can analyze the model accuracy.

The following model categorizes customers correctly 94.67% of the time.

You can optionally also view how each column impacts the categorization. In this example, as a customer ages, the column has less of an influence on the categorization. To generate predictions with your new model, choose Predict.

Generate predictions

On the Predict tab, you can generate both batch predictions and single predictions. Complete the following steps:

For this post, choose Single prediction to understand what customer segmentation will result for a new customer.

For our prediction, we want to understand what segmentation a customer will be if they are 32 years old and a lawyer by profession.

Replace the corresponding values with these inputs.
Choose Update.

The updated prediction is displayed in the prediction window. In this example, a 32-year old lawyer is classified in segment D.

Import data from a third-party SaaS application to AWS

To import data from third-party SaaS applications into Canvas for no-code ML, you must first transfer data from the application to Amazon S3 via Amazon AppFlow. In this example, we transfer manufacturing data from SAP OData.

To transfer your data, complete the following steps:

On the Amazon AppFlow console, choose Create flow.
For Flow name, enter a name.
Choose Next.
For Source name, choose your desired third-party SaaS application (for this post, SAP OData).
Choose Create new connection.
In the Connect to SAP OData pop-up window, fill out the authentication details and choose Connect.
For SAP OData object, choose the object containing your data within SAP OData.
For Destination name, choose Amazon S3.
For Bucket details, specify your S3 bucket details.
Select Catalog your data in the AWS Glue Data Catalog.
For User role, choose the AWS Identity and Access Management (IAM) role that the Canvas user will use to access the data from.
For Flow trigger, select Run on demand.

Alternatively, you can automate the flow transfer by selecting Run flow on schedule.

Choose Next.
Choose how to map the fields and complete the field mapping. For this post, because there is no corresponding destination database to map to, there is no need to specify the mapping.
Choose Next.
Optionally, add filters if necessary to restrict data transferred.
Choose Next.
Review your details and choose Create flow.

When the flow is created, a green ribbon will populate at the top of the page indicating that it is successfully updated.

Choose Run flow.

At this stage, you have successfully transferred your data from SAP OData to Amazon S3.

Now you can import the data from within the Canvas app. To import your data from Canvas, follow the same set of steps as described in the Data import section earlier in this post. For this example, on the Data source drop-down menu on the Data import page, you can see SAP OData listed.

You are now able to use all existing Canvas functionalities, such as cleaning your data, building an ML model, viewing column impact data, and generating predictions.

Clean up

To clean up the resources provisioned, log out of the Canvas application by choosing Log out in the navigation pane.

Conclusion

With Canvas, you can now import data for no-code ML from 47 data sources through native connectors with Athena and Amazon AppFlow via the AWS Glue Data Catalog. This process enables you to directly access and aggregate data across data sources within Canvas after data is transferred via Amazon AppFlow. You can automate the data transfer to activate on a schedule, which means that you don’t have to go through the process again to refresh your data. With this process, you can create new datasets with your latest data without having to leave the Canvas app. This feature is now available in all AWS Regions where Canvas is available. To get started with importing your data, navigate to the Canvas console and follow the steps outlined in this post. To learn more, refer to Connect to data sources.

About the authors

Brandon Nair is a Senior Product Manager for Amazon SageMaker Canvas. His professional interest lies in creating scalable machine learning services and applications. Outside of work he can be found exploring national parks, perfecting his golf swing or planning an adventure trip.

Sanjana Kambalapally is a Software Development Manager for AWS Sagemaker Canvas, which aims at democratizing machine learning by building no code ML applications.

Xin Xu is a software development engineer in the Canvas team, where he works on data preparation, among other aspects in no-code machine learning products. In his spare time, he enjoys jogging, reading and watching movies.

Volkan Unsal is a Sr. Frontend Engineer in the Canvas team, where he builds no-code products to make artificial intelligence accessible to humans. In his spare time, he enjoys running, reading, watching e-sports, and martial arts.

Predicting new and existing product sales in semiconductors using Amazon Forecast

April 6, 2023

by Souad Boutane Amazon AWS

This is a joint post by NXP SEMICONDUCTORS N.V. & AWS Machine Learning Solutions Lab (MLSL)

Machine learning (ML) is being used across a wide range of industries to extract actionable insights from data to streamline processes and improve revenue generation. In this post, we demonstrate how NXP, an industry leader in the semiconductor sector, collaborated with the AWS Machine Learning Solutions Lab (MLSL) to use ML techniques to optimize the allocation of the NXP research and development (R&D) budget to maximize their long-term return on investment (ROI).

NXP directs its R&D efforts largely to the development of new semiconductor solutions where they see significant opportunities for growth. To outpace market growth, NXP invests in research and development to extend or create leading market positions, with an emphasis on fast-growing, sizable market segments. For this engagement, they sought to generate monthly sales forecasts for new and existing products across different material groups and business lines. In this post, we demonstrate how the MLSL and NXP employed Amazon Forecast and other custom models for long-term sales predictions for various NXP products.

“We engaged with the team of scientists and experts at [the] Amazon Machine Learning Solutions Lab to build a solution for predicting new product sales and understand if and which additional features could help inform [the] decision-making process for optimizing R&D spending. Within just a few weeks, the team delivered multiple solutions and analyses across some of our business lines, material groups, and on [an] individual product level. MLSL delivered a sales forecast model, which complements our current way of manual forecasting, and helped us model the product lifecycle with novel machine learning approaches using Amazon Forecast and Amazon SageMaker. While keeping a constant collaborative workstream with our team, MLSL helped us with upskilling our professionals when it comes to scientific excellence and best practices on ML development using AWS infrastructure.”

– Bart Zeeman, Strategist and Analyst at CTO office in NXP Semiconductors.

Goals and use case

The goal of the engagement between NXP and the MLSL team is to predict the overall sales of NXP in various end markets. In general, the NXP team is interested in macro-level sales that include the sales of various business lines (BLs), which contain multiple material groups (MAGs). Furthermore, the NXP team is also interested in predicting the product lifecycle of newly introduced products. The lifecycle of a product is divided into four different phases (Introduction, Growth, Maturity, and Decline). The product lifecycle prediction enables the NXP team to identify the revenue generated by each product to further allocate R&D funding to the products generating the highest amounts of sales or products with the highest potential to maximize the ROI for R&D activity. Additionally, they can predict the long-term sales on a micro level, which gives them a bottom-up look on how their revenue changes over time.

In the following sections, we present the key challenges associated with developing robust and efficient models for long-term sales forecasts. We further describe the intuition behind various modeling techniques employed to achieve the desired accuracy. We then present the evaluation of our final models, where we compare the performance of the proposed models in terms of sales prediction with the market experts at NXP. We also demonstrate the performance of our state-of-the-art point cloud-based product lifecycle prediction algorithm.

Challenges

One of the challenges we faced while using fine-grained or micro-level modeling like product-level models for sale prediction was missing sales data. The missing data is the result of lack of sales during every month. Similarly, for macro-level sales prediction, the length of the historical sales data was limited. Both the missing sales data and the limited length of historical sales data pose significant challenges in terms of model accuracy for long-term sales prediction into 2026. We observed during the exploratory data analysis (EDA) that as we move from micro-level sales (product level) to macro-level sales (BL level), missing values become less significant. However, the maximum length of historical sales data (maximum length of 140 months) still posed significant challenges in terms of model accuracy.

Modeling techniques

After EDA, we focused on forecasting at the BL and MAG levels and at the product level for one of the largest end markets (the automobile end market) for NXP. However, the solutions we developed can be extended to other end markets. Modeling at the BL, MAG, or product level has its own pros and cons in terms of model performance and data availability. The following table summarizes such pros and cons for each level. For macro-level sales prediction, we employed the Amazon Forecast AutoPredictor for our final solution. Similarly, for micro-level sales prediction, we developed a novel point cloud-based approach.

Macro sales prediction (top-down)

To predict the long terms sales values (2026) at the macro level, we tested various methods, including Amazon Forecast, GluonTS, and N-BEATS (implemented in GluonTS and PyTorch). Overall, Forecast outperformed all other methods based on a backtesting approach (described in the Evaluation Metrics section later in this post) for macro-level sales prediction. We also compared the accuracy of AutoPredictor against human predictions.

We also proposed using N-BEATS due to its interpretative properties. N-BEATS is based on a very simple but powerful architecture that uses an ensemble of feedforward networks that employ the residual connections with stacked residual blocks for forecasting. This architecture further encodes the inductive bias in its architecture to make the time series model capable of extracting trend and seasonality (see the following figure). These interpretations were generated using PyTorch Forecasting.

Micro sales prediction (bottom-up)

In this section, we discuss a novel method developed to predict the product lifecycle shown in the following figure while taking into consideration the cold start product. We implemented this method using PyTorch on Amazon SageMaker Studio. First, we introduced a point cloud-based method. This method first converts sales data into a point cloud, where each point represents sales data at a certain age of the product. The point cloud-based neural network model is further trained using this data to learn the parameters of the product lifecycle curve (see the following figure). In this approach, we also incorporated additional features, including product description as a bag of words to tackle the cold start problem for predicting the product lifecycle curve.

Time series as point cloud-based product lifecycle prediction

We developed a novel point cloud-based approach to predict the product lifecycle and micro-level sales predictions. We also incorporated additional features to further improve the model accuracy for the cold start product lifecycle predictions. These features include product fabrication techniques and other related categorical information related to the products. Such additional data can help the model predict sales of a new product even before the product is released on the market (cold start). The following figure demonstrates the point cloud-based approach. The model takes the normalized sales and age of the product (number of months since the product is launched) as input. Based on these inputs, the model learns parameters during the training using gradient descent. During the forecast phase, the parameters along with the features of a cold start product are used for predicting the lifecycle. The large number of missing values in the data at the product level negatively impacts nearly all of the existing time series models. This novel solution is based on the ideas of lifecycle modeling and treating time series data as point clouds to mitigate the missing values.

The following figure demonstrates how our point cloud-based lifecycle method addresses the missing data values and is capable of predicting the product lifecycle with very few training samples. The X-axis represents the age in time, and the Y-axis represents the sales of a product. Orange dots represent the training samples, green dots represent the testing samples, and the blue line demonstrates the predicted lifecycle of a product by the model.

Methodology

To predict macro-level sales, we employed Amazon Forecast among other techniques. Similarly, for micro sales, we developed a state-of-the-art point cloud-based custom model. Forecast outperformed all other methods in terms of model performance. We used Amazon SageMaker notebook instances to create a data processing pipeline that extracted training examples from Amazon Simple Storage Service (Amazon S3). The training data was further used as input for Forecast to train a model and predict long-term sales.

Training a time series model using Amazon Forecast consists of three main steps. In the first step, we imported the historical data into Amazon S3. Second, a predictor was trained using the historical data. Finally, we deployed the trained predictor to generate the forecast. In this section, we provide a detailed explanation along with code snippets of each step.

We started by extracting the latest sales data. This step included uploading the dataset to Amazon S3 in the correct format. Amazon Forecast takes three columns as inputs: timestamp, item_id, and target_value (sales data). The timestamp column contains the time of sales, which could be formatted as hourly, daily, and so on. The item_id column contains the name of the sold items, and the target_value column contains sales values. Next, we used the path of training data located in Amazon S3, defined the time series dataset frequency (H, D, W, M, Y), defined a dataset name, and identified the attributes of the dataset (mapped the respective columns in the dataset and their data types). Next, we called the create_dataset function from the Boto3 API to create a dataset with attributes such as Domain, DatasetType, DatasetName, DatasetFrequency, and Schema. This function returned a JSON object that contained the Amazon Resource Name (ARN). This ARN was subsequently used in the following steps. See the following code:

dataset_path = "PATH_OF_DATASET_IN_S3"
DATASET_FREQUENCY = "M" # Frequency of dataset (H, D, W, M, Y) 
TS_DATASET_NAME = "NAME_OF_THE_DATASET"
TS_SCHEMA = {
   "Attributes":[
      {
         "AttributeName":"item_id",
         "AttributeType":"string"
      },
       {
         "AttributeName":"timestamp",
         "AttributeType":"timestamp"
      },
      {
         "AttributeName":"target_value",
         "AttributeType":"float"
      }
   ]
}

create_dataset_response = forecast.create_dataset(Domain="CUSTOM",
                                                  DatasetType='TARGET_TIME_SERIES',
                                                  DatasetName=TS_DATASET_NAME,
                                                  DataFrequency=DATASET_FREQUENCY,
                                                  Schema=TS_SCHEMA)

ts_dataset_arn = create_dataset_response['DatasetArn']

After the dataset was created, it was imported into Amazon Forecast using the Boto3 create_dataset_import_job function. The create_dataset_import_job function takes the job name (a string value), the ARN of the dataset from the previous step, the location of the training data in Amazon S3 from the previous step, and the time stamp format as arguments. It returns a JSON object containing the import job ARN. See the following code:

TIMESTAMP_FORMAT = "yyyy-MM-dd"
TS_IMPORT_JOB_NAME = "SALES_DATA_IMPORT_JOB_NAME"

ts_dataset_import_job_response = 
    forecast.create_dataset_import_job(DatasetImportJobName=TS_IMPORT_JOB_NAME,
                                       DatasetArn=ts_dataset_arn,
                                       DataSource= {
                                         "S3Config" : {
                                             "Path": ts_s3_path,
                                             "RoleArn": role_arn
                                         } 
                                       },
                                       TimestampFormat=TIMESTAMP_FORMAT,
                                       TimeZone = TIMEZONE)

ts_dataset_import_job_arn = ts_dataset_import_job_response['DatasetImportJobArn']

The imported dataset was then used to create a dataset group using the create_dataset_group function. This function takes the domain (string values defining the domain of the forecast), dataset group name, and the dataset ARN as inputs:

DATASET_GROUP_NAME = "SALES_DATA_GROUP_NAME"
DATASET_ARNS = [ts_dataset_arn]

create_dataset_group_response = 
    forecast.create_dataset_group(Domain="CUSTOM",
                                  DatasetGroupName=DATASET_GROUP_NAME,
                                  DatasetArns=DATASET_ARNS)

dataset_group_arn = create_dataset_group_response['DatasetGroupArn']

Next, we used the dataset group to train forecasting models. Amazon Forecast offers various state-of-the-art models; any of these models can be used for training. We used AutoPredictor as our default model. The main advantage of using AutoPredictor is that it automatically generates the item-level forecast, using the optimal model from an ensemble of six state-of-the-art models based on the input dataset. The Boto3 API provides the create_auto_predictor function for training an auto prediction model. The input parameters of this function are PredictorName, ForecastHorizon, and ForecastFrequency. Users are also responsible for selecting the forecast horizon and frequency. The forecast horizon represents the window size of the future prediction, which can be formatted hours, days, weeks, months, and so on. Similarly, forecast frequency represents the granularity of the forecast values, such as hourly, daily, weekly, monthly, or yearly. We mainly focused on predicting monthly sales of NXP on various BLs. See the following code:

PREDICTOR_NAME = "SALES_PREDICTOR"
FORECAST_HORIZON = 24
FORECAST_FREQUENCY = "M"

create_auto_predictor_response = 
    forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
                                   ForecastHorizon = FORECAST_HORIZON,
                                   ForecastFrequency = FORECAST_FREQUENCY,
                                   DataConfig = {
                                       'DatasetGroupArn': dataset_group_arn
                                    })

predictor_arn = create_auto_predictor_response['PredictorArn']

The trained predictor was then used to generate forecast values. Forecasts were generated using the create_forecast function from the previously trained predictor. This function takes the name of the forecast and the ARN of the predictor as inputs and generates the forecast values for the horizon and frequency defined in the predictor:

FORECAST_NAME = "SALES_FORECAST"

create_forecast_response = 
    forecast.create_forecast(ForecastName=FORECAST_NAME,
                             PredictorArn=predictor_arn)

Amazon Forecast is a fully managed service that automatically generates training and test datasets and provides various accuracy metrics to evaluate the reliability of the model-generated forecast. However, to build consensus on the predicted data and compare the predicted values with human predictions, we divided our historic data into training data and validation data manually. We trained the model using the training data without exposing the model to validation data and generated the prediction for the length of validation data. The validation data was compared with the predicted values to evaluate the model performance. Validation metrics may include mean absolute percent error (MAPE) and weighted absolute percent error (WAPE), among others. We used WAPE as our accuracy metric, as discussed in the next section.

Evaluation metrics

We first verified the model performance using backtesting to validate the prediction of our forecast model for long term sales forecast (2026 sales). We evaluated the model performance using the WAPE. The lower the WAPE value, the better the model. The key advantage of using WAPE over other error metrics like MAPE is that WAPE weighs the individual impact of each item’s sale. Therefore, it accounts for each product’s contribution to the total sale while calculating the overall error. For example, if you make an error of 2% on a product that generates $30 million and an error of 10% in a product that generates $50,000, your MAPE will not tell the entire story. The 2% error is actually costlier than the 10% error, something you can’t tell by using MAPE. Comparatively, WAPE will account for these differences. We also predicted various percentile values for the sales to demonstrate the upper and lower bounds of the model forecast.

Macro-level sales prediction model validation

Next, we validated the model performance in terms of WAPE values. We calculated the WAPE value of a model by splitting the data into test and validation sets. For example, in the 2019 WAPE value, we trained our model using sales data between 2011–2018 and predicted sales values for the next 12 months (2019 sale). Next, we calculated the WAPE value using the following formula:

We repeated the same procedure to calculate the WAPE value for 2020 and 2021. We evaluated the WAPE for all BLs in the auto end market for 2019, 2020, and 2021. Overall, we observed that Amazon Forecast can achieve a 0.33 WAPE value even for the year of 2020 (during the COVID-19 pandemic). In 2019 and 2020, our model achieved less than 0.1 WAPE values, demonstrating high accuracy.

Macro-level sales prediction baseline comparison

We compared the performance of the macro sales prediction models developed using Amazon Forecast to three baseline models in terms of WAPE value for 2019, 2020 and 2021 (see the following figure). Amazon Forecast either significantly outperformed the other baseline models or performed on par for all 3 years. These results further validate the effectiveness of our final model predictions.

Macro-level sales prediction model vs. human predictions

To further validate the confidence of our macro-level model, we next compared the performance of our model with the human-predicted sales values. At the beginning of the fourth quarter every year, market experts at NXP predict the sales value of each BL, taking into consideration global market trends as well as other global indicators that could potentially impact the sales of NXP products. We compare the percent error of the model prediction vs. human prediction to the actual sales values in 2019, 2020, and 2021. We trained three models using data from 2011–2018 and predicted the sales values until 2021. We next calculated the MAPE for the actual sales values. We then used the human-predicted values by the end of 2018 (test the model forecast 1Y ahead to 3Y ahead forecast). We repeated this process to predict the values in 2019 (1Y ahead forecast to 2Y ahead forecast) and 2020 (for 1Y ahead forecast). Overall, the model performed on par with the human predictors or better in some cases. These results demonstrate the effectiveness and reliability of our model.

Micro-level sales prediction and product lifecycle

The following figure depicts how the model behaves using product data while having access to very few observations for each product (namely one or two observations at the input for product lifecycle prediction). The orange dots represent the training data, the green dots represent the testing data, and the blue line represents the model predicted product lifecycle.

The model can be fed more observations for context without the need for re-training as new sales data become available. The following figure demonstrates how the model behaves if it is given more context. Ultimately, more context leads to lower WAPE values.

In addition, we managed to incorporate additional features for each product, including fabrication techniques and other categorical information. In this regard, external features helped reduce the WAPE value in the low-context regime (see the following figure). There are two explanations for this behavior. First, we need to let the data speak for itself in the high-context regimes. The additional features can interfere with this process. Second, we need better features. We used 1,000 dimensional one-hot-encoded features (bag of words). The conjecture is that better feature engineering techniques can help reduce WAPE even further.

Such additional data can help the model predict sales of new products even before the product is released on the market. For example, in the following figure, we plot how much mileage we can get only out of external features.

Conclusion

In this post, we demonstrated how the MLSL and NXP teams worked together to predict macro- and micro-level long-term sales for NXP. The NXP team will now learn how to use these sales predictions in their processes—for example, to use it as input for R&D funding decisions and enhance ROI. We used Amazon Forecast to predict the sales for business lines (macro sales), which we referred to as the top-down approach. We also proposed a novel approach using time series as a point cloud to tackle the challenges of missing values and cold start at the product level (micro level). We referred to this approach as bottom-up, where we predicted the monthly sales of each product. We further incorporated external features of each product to enhance the performance of the model for cold start.

Overall, the models developed during this engagement performed on par compared to human prediction. In some cases, the models performed better than human predictions in the long term. These results demonstrate the effectiveness and reliability of our models.

This solution can be employed for any forecasting problem. For further assistance in terms of designing and developing ML solutions, please free to get in touch with the MLSL team.

About the authors

Souad Boutane is a data scientist at NXP-CTO, where she is transforming various data into meaningful insights to support business decision using advanced tools and techniques.

Ben Fridolin is a data scientist at NXP-CTO, where he coordinates on accelerating AI and cloud adoption. He focuses on machine learning, deep learning and end-to-end ML solutions.

Cornee Geenen is a project lead in the Data Portfolio of NXP supporting the organization in it’s digital transformation towards becoming data centric.

Bart Zeeman is a strategist with a passion for data & analytics at NXP-CTO where he is driving for better data driven decisions for more growth and innovation.

Ahsan Ali is an Applied Scientist at the Amazon Machine Learning Solutions Lab, where he works with customers from different domains to solve their urgent and expensive problems using state-of-the-art AI/ML techniques.

Yifu Hu is an Applied Scientist in the Amazon Machine Learning Solutions lab, where he helps design creative ML solutions to address customers’ business problems in various industries.

Mehdi Noori is an Applied Science Manager at Amazon ML Solutions Lab, where he helps develop ML solutions for large organizations across various industries and leads the Energy vertical. He is passionate about using AI/ML to help customers achieve their Sustainability goals.

Huzefa Rangwala is a Senior Applied Science Manager at AIRE, AWS. He leads a team of scientists and engineers to enable machine learning based discovery of data assets. His research interests are in responsible AI, federated learning and applications of ML in health care and life sciences.

Rustan Leino provides proof that software is bug-free

April 5, 2023

by Amazon AWS

As a senior principal applied scientist at Amazon Web Services, Leino is continuing his career as a leading expert in program verification.Read More

Implement unified text and image search with a CLIP model using Amazon SageMaker and Amazon OpenSearch Service

April 5, 2023

by Kevin Du Amazon AWS

The rise of text and semantic search engines has made ecommerce and retail businesses search easier for its consumers. Search engines powered by unified text and image can provide extra flexibility in search solutions. You can use both text and images as queries. For example, you have a folder of hundreds of family pictures in your laptop. You want to quickly find a picture that was taken when you and your best friend were in front of your old house’s swimming pool. You can use conversational language like “two people stand in front of a swimming pool” as a query to search in a unified text and image search engine. You don’t need to have the right keywords in image titles to perform the query.

Amazon OpenSearch Service now supports the cosine similarity metric for k-NN indexes. Cosine similarity measures the cosine of the angle between two vectors, where a smaller cosine angle denotes a higher similarity between the vectors. With cosine similarity, you can measure the orientation between two vectors, which makes it a good choice for some specific semantic search applications.

Contrastive Language-Image Pre-Training (CLIP) is a neural network trained on a variety of image and text pairs. The CLIP neural network is able to project both images and text into the same latent space, which means that they can be compared using a similarity measure, such as cosine similarity. You can use CLIP to encode your products’ images or description into embeddings, and then store them into an OpenSearch Service k-NN index. Then your customers can query the index to retrieve products that they’re interested in.

You can use CLIP with Amazon SageMaker to perform encoding. Amazon SageMaker Serverless Inference is a purpose-built inference service that makes it easy to deploy and scale machine learning (ML) models. With SageMaker, you can deploy serverless for dev and test, and then move to real-time inference when you go to production. SageMaker serverless helps you save cost by scaling down infrastructure to 0 during idle times. This is perfect for building a POC, where you will have long idle times between development cycles. You can also use Amazon SageMaker batch transform to get inferences from large datasets.

In this post, we demonstrate how to build a search application using CLIP with SageMaker and OpenSearch Service. The code is open source, and it is hosted on GitHub.

Solution overview

OpenSearch Service provides text-matching and embedding k-NN search. We use embedding k-NN search in this solution. You can use both image and text as a query to search items from the inventory. Implementing this unified image and text search application consists of two phases:

k-NN reference index – In this phase, you pass a set of corpus documents or product images through a CLIP model to encode them into embeddings. Text and image embeddings are numerical representations of the corpus or images, respectively. You save those embeddings into a k-NN index in OpenSearch Service. The concept underpinning k-NN is that similar data points exist in close proximity in the embedding space. As an example, the text “a red flower,” the text “rose,” and an image of red rose are similar, so these text and image embeddings are close to each other in the embedding space.
k-NN index query – This is the inference phase of the application. In this phase, you submit a text search query or image search query through the deep learning model (CLIP) to encode as embeddings. Then, you use those embeddings to query the reference k-NN index stored in OpenSearch Service. The k-NN index returns similar embeddings from the embedding space. For example, if you pass the text of “a red flower,” it would return the embeddings of a red rose image as a similar item.

The following figure illustrates the solution architecture.

The workflow steps are as follows:

Create a SageMaker model from a pretrained CLIP model for batch and real-time inference.
Generate embeddings of product images using a SageMaker batch transform job.
Use SageMaker Serverless Inference to encode query image and text into embeddings in real time.
Use Amazon Simple Storage Service (Amazon S3) to store the raw text (product description) and images (product images) and image embedding generated by the SageMaker batch transform jobs.
Use OpenSearch Service as the search engine to store embeddings and find similar embeddings.
Use a query function to orchestrate encoding the query and perform a k-NN search.

We use Amazon SageMaker Studio notebooks (not shown in the diagram) as the integrated development environment (IDE) to develop the solution.

Set up solution resources

To set up the solution, complete the following steps:

Create a SageMaker domain and a user profile. For instructions, refer to Step 5 of Onboard to Amazon SageMaker Domain Using Quick setup.
Create an OpenSearch Service domain. For instructions, see Creating and managing Amazon OpenSearch Service domains.

You can also use an AWS CloudFormation template by following the GitHub instructions to create a domain.

You can connect Studio to Amazon S3 from Amazon Virtual Private Cloud (Amazon VPC) using an interface endpoint in your VPC, instead of connecting over the internet. By using an interface VPC endpoint (interface endpoint), the communication between your VPC and Studio is conducted entirely and securely within the AWS network. Your Studio notebook can connect to OpenSearch Service over a private VPC to ensure secure communication.

OpenSearch Service domains offer encryption of data at rest, which is a security feature that helps prevent unauthorized access to your data. Node-to-node encryption provides an additional layer of security on top of the default features of OpenSearch Service. Amazon S3 automatically applies server-side encryption (SSE-S3) for each new object unless you specify a different encryption option.

In the OpenSearch Service domain, you can attach identity-based policies define who can access a service, which actions they can perform, and if applicable, the resources on which they can perform those actions.

Encode images and text pairs into embeddings

This section discusses how to encode images and text into embeddings. This includes preparing data, creating a SageMaker model, and performing batch transform using the model.

Data overview and preparation

You can use a SageMaker Studio notebook with a Python 3 (Data Science) kernel to run the sample code.

For this post, we use the Amazon Berkeley Objects Dataset. The dataset is a collection of 147,702 product listings with multilingual metadata and 398,212 unique catalogue images. We only use the item images and item names in US English. For demo purposes, we use approximately 1,600 products. For more details about this dataset, refer to the README. The dataset is hosted in a public S3 bucket. There are 16 files that include product description and metadata of Amazon products in the format of listings/metadata/listings_<i>.json.gz. We use the first metadata file in this demo.

You use pandas to load the metadata, then select products that have US English titles from the data frame. Pandas is an open-source data analysis and manipulation tool built on top of the Python programming language. You use an attribute called main_image_id to identify an image. See the following code:

meta = pd.read_json("s3://amazon-berkeley-objects/listings/metadata/listings_0.json.gz", lines=True)
def func_(x):
    us_texts = [item["value"] for item in x if item["language_tag"] == "en_US"]
    return us_texts[0] if us_texts else None
 
meta = meta.assign(item_name_in_en_us=meta.item_name.apply(func_))
meta = meta[~meta.item_name_in_en_us.isna()][["item_id", "item_name_in_en_us", "main_image_id"]]
print(f"#products with US English title: {len(meta)}")
meta.head()

There are 1,639 products in the data frame. Next, link the item names with the corresponding item images. images/metadata/images.csv.gz contains image metadata. This file is a gzip-compressed CSV file with the following columns: image_id, height, width, and path. You can read the metadata file and then merge it with item metadata. See the following code:

image_meta = pd.read_csv("s3://amazon-berkeley-objects/images/metadata/images.csv.gz")
dataset = meta.merge(image_meta, left_on="main_image_id", right_on="image_id")
dataset.head()

You can use the SageMaker Studio notebook Python 3 kernel built-in PIL library to view a sample image from the dataset:

from sagemaker.s3 import S3Downloader as s3down
from pathlib import Path
from PIL import Image
 
def get_image_from_item_id(item_id = "B0896LJNLH", return_image=True):
    s3_data_root = "s3://amazon-berkeley-objects/images/small/"
 
    item_idx = dataset.query(f"item_id == '{item_id}'").index[0]
    s3_path = dataset.iloc[item_idx].path
    local_data_root = f'./data/images'
    local_file_name = Path(s3_path).name
 
    s3down.download(f'{s3_data_root}{s3_path}', local_data_root)
 
    local_image_path = f"{local_data_root}/{local_file_name}"
    if return_image:
        img = Image.open(local_image_path)
        return img, dataset.iloc[item_idx].item_name_in_en_us
    else:
        return local_image_path, dataset.iloc[item_idx].item_name_in_en_us
image, item_name = get_image_from_item_id()
print(item_name)
image

Model preparation

Next, create a SageMaker model from a pretrained CLIP model. The first step is to download the pre-trained model weighting file, put it into a model.tar.gz file, and upload it to an S3 bucket. The path of the pretrained model can be found in the CLIP repo. We use a pretrained ResNet-50 (RN50) model in this demo. See the following code:

%%writefile build_model_tar.sh
#!/bin/bash
 
MODEL_NAME=RN50.pt
MODEL_NAME_URL=https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt
 
BUILD_ROOT=/tmp/model_path
S3_PATH=s3://<your-bucket>/<your-prefix-for-model>/model.tar.gz
 
 
rm -rf $BUILD_ROOT
mkdir $BUILD_ROOT
cd $BUILD_ROOT && curl -o $BUILD_ROOT/$MODEL_NAME $MODEL_NAME_URL
cd $BUILD_ROOT && tar -czvf model.tar.gz .
aws s3 cp $BUILD_ROOT/model.tar.gz  $S3_PATH
!bash build_model_tar.sh

You then need to provide an inference entry point script for the CLIP model. CLIP is implemented using PyTorch, so you use the SageMaker PyTorch framework. PyTorch is an open-source ML framework that accelerates the path from research prototyping to production deployment. For information about deploying a PyTorch model with SageMaker, refer to Deploy PyTorch Models. The inference code accepts two environment variables: MODEL_NAME and ENCODE_TYPE. This helps us switch between different CLIP model easily. We use ENCODE_TYPE to specify if we want to encode an image or a piece of text. Here, you implement the model_fn, input_fn, predict_fn, and output_fn functions to override the default PyTorch inference handler. See the following code:

!mkdir -p code
%%writefile code/clip_inference.py
 
import io
import torch
import clip
from PIL import Image
import json
import logging
import sys
import os
 
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.transforms import ToTensor
 
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(sys.stdout))
 
MODEL_NAME = os.environ.get("MODEL_NAME", "RN50.pt")
# ENCODE_TYPE could be IMAGE or TEXT
ENCODE_TYPE = os.environ.get("ENCODE_TYPE", "TEXT")
 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
# defining model and loading weights to it.
def model_fn(model_dir):
    model, preprocess = clip.load(os.path.join(model_dir, MODEL_NAME), device=device)
    return {"model_obj": model, "preprocess_fn": preprocess}
 
def load_from_bytearray(request_body):
    
    return image
 
# data loading
def input_fn(request_body, request_content_type):
    assert request_content_type in (
        "application/json",
        "application/x-image",
    ), f"{request_content_type} is an unknown type."
    if request_content_type == "application/json":
        data = json.loads(request_body)["inputs"]
    elif request_content_type == "application/x-image":
        image_as_bytes = io.BytesIO(request_body)
        data = Image.open(image_as_bytes)
    return data
 
# inference
def predict_fn(input_object, model):
    model_obj = model["model_obj"]
    # for image preprocessing
    preprocess_fn = model["preprocess_fn"]
    assert ENCODE_TYPE in ("TEXT", "IMAGE"), f"{ENCODE_TYPE} is an unknown encode type."
 
    # preprocessing
    if ENCODE_TYPE == "TEXT":
        input_ = clip.tokenize(input_object).to(device)
    elif ENCODE_TYPE == "IMAGE":
        input_ = preprocess_fn(input_object).unsqueeze(0).to(device)
 
    # inference
    with torch.no_grad():
        if ENCODE_TYPE == "TEXT":
            prediction = model_obj.encode_text(input_)
        elif ENCODE_TYPE == "IMAGE":
            prediction = model_obj.encode_image(input_)
    return prediction
  
# Serialize the prediction result into the desired response content type
def output_fn(predictions, content_type):
    assert content_type == "application/json"
    res = predictions.cpu().numpy().tolist()
return json.dumps(res)

The solution requires additional Python packages during model inference, so you can provide a requirements.txt file to allow SageMaker to install additional packages when hosting models:

%%writefile code/requirements.txt
ftfy
regex
tqdm
git+https://github.com/openai/CLIP.git

You use the PyTorchModel class to create an object to contain the information of the model artifacts’ Amazon S3 location and the inference entry point details. You can use the object to create batch transform jobs or deploy the model to an endpoint for online inference. See the following code:

from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role, Session
 
role = get_execution_role()
shared_params = dict(
    entry_point="clip_inference.py",
    source_dir="code",
    role=role,
    model_data="s3://<your-bucket>/<your-prefix-for-model>/model.tar.gz",
    framework_version="1.9.0",
    py_version="py38",
)
 
clip_image_model = PyTorchModel(
    env={'MODEL_NAME': 'RN50.pt', "ENCODE_TYPE": "IMAGE"},
    name="clip-image-model",
    **shared_params
)
 
clip_text_model = PyTorchModel(
    env={'MODEL_NAME': 'RN50.pt', "ENCODE_TYPE": "TEXT"},
    name="clip-text-model",
    **shared_params
)

Batch transform to encode item images into embeddings

Next, we use the CLIP model to encode item images into embeddings, and use SageMaker batch transform to run batch inference.

Before creating the job, use the following code snippet to copy item images from the Amazon Berkeley Objects Dataset public S3 bucket to your own bucket. The operation takes less than 10 minutes.

from multiprocessing.pool import ThreadPool
import boto3
from tqdm import tqdm
from urllib.parse import urlparse
 
s3_sample_image_root = "s3://<your-bucket>/<your-prefix-for-sample-images>"
s3_data_root = "s3://amazon-berkeley-objects/images/small/"
 
client = boto3.client('s3')
 
def upload_(args):
    client.copy_object(CopySource=args["source"], Bucket=args["target_bucket"], Key=args["target_key"])
 
arugments = []
for idx, record in dataset.iterrows():
    argument = {}
    argument["source"] = (s3_data_root + record.path)[5:]
    argument["target_bucket"] = urlparse(s3_sample_image_root).netloc
    argument["target_key"] = urlparse(s3_sample_image_root).path[1:] + record.path
    arugments.append(argument)
 
with ThreadPool(4) as p:
    r = list(tqdm(p.imap(upload_, arugments), total=len(dataset)))

Next, you perform inference on the item images in a batch manner. The SageMaker batch transform job uses the CLIP model to encode all the images stored in the input Amazon S3 location and uploads output embeddings to an output S3 folder. The job takes around 10 minutes.

batch_input = s3_sample_image_root + "/"
output_path = f"s3://<your-bucket>/inference/output"
 
clip_image_transformer = clip_image_model.transformer(
    instance_count=1,
    instance_type="ml.c5.xlarge",
    strategy="SingleRecord",
    output_path=output_path,
)
 
clip_image_transformer.transform(
    batch_input, 
    data_type="S3Prefix",
    content_type="application/x-image", 
    wait=True,
)

Load embeddings from Amazon S3 to a variable, so you can ingest the data into OpenSearch Service later:

embedding_root_path = "./data/embedding"
s3down.download(output_path, embedding_root_path)
 
embeddings = []
for idx, record in dataset.iterrows():
    embedding_file = f"{embedding_root_path}/{record.path}.out"
    embeddings.append(json.load(open(embedding_file))[0])

Create an ML-powered unified search engine

This section discusses how to create a search engine that that uses k-NN search with embeddings. This includes configuring an OpenSearch Service cluster, ingesting item embedding, and performing free text and image search queries.

Set up the OpenSearch Service domain using k-NN settings

Earlier, you created an OpenSearch cluster. Now you’re going to create an index to store the catalog data and embeddings. You can configure the index settings to enable the k-NN functionality using the following configuration:

index_settings = {
  "settings": {
    "index.knn": True,
    "index.knn.space_type": "cosinesimil"
  },
  "mappings": {
    "properties": {
      "embeddings": {
        "type": "knn_vector",
        "dimension": 1024 #Make sure this is the size of the embeddings you generated, for RN50, it is 1024
      }
    }
  }
}

This example uses the Python Elasticsearch client to communicate with the OpenSearch cluster and create an index to host your data. You can run %pip install elasticsearch in the notebook to install the library. See the following code:

import boto3
import json
from requests_aws4auth import AWS4Auth
from elasticsearch import Elasticsearch, RequestsHttpConnection
 
def get_es_client(host = "<your-opensearch-service-domain-url>",
    port = 443,
    region = "<your-region>",
    index_name = "clip-index"):
 
    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key,
                       credentials.secret_key,
                       region,
                       'es',
                       session_token=credentials.token)
 
    headers = {"Content-Type": "application/json"}
 
    es = Elasticsearch(hosts=[{'host': host, 'port': port}],
                       http_auth=awsauth,
                       use_ssl=True,
                       verify_certs=True,
                       connection_class=RequestsHttpConnection,
                       timeout=60 # for connection timeout errors
    )
    return es
es = get_es_client()
es.indices.create(index=index_name, body=json.dumps(index_settings))

Ingest image embedding data into OpenSearch Service

You now loop through your dataset and ingest items data into the cluster. The data ingestion for this practice should finish within 60 seconds. It also runs a simple query to verify if the data has been ingested into the index successfully. See the following code:

# ingest_data_into_es
 
for idx, record in tqdm(dataset.iterrows(), total=len(dataset)):
    body = record[['item_name_in_en_us']].to_dict()
    body['embeddings'] = embeddings[idx]
    es.index(index=index_name, id=record.item_id, doc_type='_doc', body=body)
 
# Check that data is indeed in ES
res = es.search(
    index=index_name, body={
        "query": {
                "match_all": {}
    }},
    size=2)
assert len(res["hits"]["hits"]) > 0

Perform a real-time query

Now that you have a working OpenSearch Service index that contains embeddings of item images as our inventory, let’s look at how you can generate embedding for queries. You need to create two SageMaker endpoints to handle text and image embeddings, respectively.

You also create two functions to use the endpoints to encode images and texts. For the encode_text function, you add this is before an item name to translate an item name to a sentence for item description. memory_size_in_mb is set as 6 GB to serve the underline Transformer and ResNet models. See the following code:

text_predictor = clip_text_model.deploy(
    instance_type='ml.c5.xlarge',
    initial_instance_count=1,
    serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=6144),
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
    wait=True
)
 
image_predictor = clip_image_model.deploy(
    instance_type='ml.c5.xlarge',
    initial_instance_count=1,
    serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=6144),
    serializer=IdentitySerializer(content_type="application/x-image"),
    deserializer=JSONDeserializer(),
    wait=True
)
 
def encode_image(file_name="./data/images/0e9420c6.jpg"):    
    with open(file_name, "rb") as f:
        payload = f.read()
        payload = bytearray(payload)
    res = image_predictor.predict(payload)
    return res[0]
 
def encode_name(item_name):
    res = text_predictor.predict({"inputs": [f"this is a {item_name}"]})
    return res[0]

You can firstly plot the picture that will be used.

item_image_path, item_name = get_image_from_item_id(item_id = "B0896LJNLH", return_image=False)
feature_vector = encode_image(file_name=item_image_path)
print(feature_vector.shape)
Image.open(item_image_path)

Let’s look at the results of a simple query. After retrieving results from OpenSearch Service, you get the list of item names and images from dataset:

def search_products(embedding, k = 3):
    body = {
        "size": k,
        "_source": {
            "exclude": ["embeddings"],
        },
        "query": {
            "knn": {
                "embeddings": {
                    "vector": embedding,
                    "k": k,
                }
            }
        },
    }        
    res = es.search(index=index_name, body=body)
    images = []
    for hit in res["hits"]["hits"]:
        id_ = hit["_id"]
        image, item_name = get_image_from_item_id(id_)
        image.name_and_score = f'{hit["_score"]}:{item_name}'
        images.append(image)
    return images
 
def display_images(
    images: [PilImage], 
    columns=2, width=20, height=8, max_images=15, 
    label_wrap_length=50, label_font_size=8):
 
    if not images:
        print("No images to display.")
        return 
 
    if len(images) > max_images:
        print(f"Showing {max_images} images of {len(images)}:")
        images=images[0:max_images]
 
    height = max(height, int(len(images)/columns) * height)
    plt.figure(figsize=(width, height))
    for i, image in enumerate(images):
 
        plt.subplot(int(len(images) / columns + 1), columns, i + 1)
        plt.imshow(image)
 
        if hasattr(image, 'name_and_score'):
            plt.title(image.name_and_score, fontsize=label_font_size); 
            
images = search_products(feature_vector)

The first item has a score of 1.0, because the two images are the same. Other items are different types of glasses in the OpenSearch Service index.

You can use text to query the index as well:

feature_vector = encode_name("drinkware glass")
images = search_products(feature_vector)
display_images(images)

You’re now able to get three pictures of water glasses from the index. You can find the images and text within the same latent space with the CLIP encoder. Another example of this is to search for the word “pizza” in the index:

feature_vector = encode_name("pizza")
images = search_products(feature_vector)
display_images(images)

Clean up

With a pay-per-use model, Serverless Inference is a cost-effective option for an infrequent or unpredictable traffic pattern. If you have a strict service-level agreement (SLA), or can’t tolerate cold starts, real-time endpoints are a better choice. Using multi-model or multi-container endpoints provide scalable and cost-effective solutions for deploying large numbers of models. For more information, refer to Amazon SageMaker Pricing.

We suggest deleting the serverless endpoints when they are no longer needed. After finishing this exercise, you can remove the resources with the following steps (you can delete these resources from the AWS Management Console, or using the AWS SDK or SageMaker SDK):

Delete the endpoint you created.
Optionally, delete the registered models.
Optionally, delete the SageMaker execution role.
Optionally, empty and delete the S3 bucket.

Summary

In this post, we demonstrated how to create a k-NN search application using SageMaker and OpenSearch Service k-NN index features. We used a pre-trained CLIP model from its OpenAI implementation.

The OpenSearch Service ingestion implementation of the post is only used for prototyping. If you want to ingest data from Amazon S3 into OpenSearch Service at scale, you can launch an Amazon SageMaker Processing job with the appropriate instance type and instance count. For another scalable embedding ingestion solution, refer to Novartis AG uses Amazon OpenSearch Service K-Nearest Neighbor (KNN) and Amazon SageMaker to power search and recommendation (Part 3/4).

CLIP provides zero-shot capabilities, which makes it possible to adopt a pre-trained model directly without using transfer learning to fine-tune a model. This simplifies the application of the CLIP model. If you have pairs of product images and descriptive text, you can fine-tune the model with your own data using transfer learning to further improve the model performance. For more information, see Learning Transferable Visual Models From Natural Language Supervision and the CLIP GitHub repository.

About the Authors

Kevin Du is a Senior Data Lab Architect at AWS, dedicated to assisting customers in expediting the development of their Machine Learning (ML) products and MLOps platforms. With more than a decade of experience building ML-enabled products for both startups and enterprises, his focus is on helping customers streamline the productionalization of their ML solutions. In his free time, Kevin enjoys cooking and watching basketball.

Ananya Roy is a Senior Data Lab architect specialised in AI and machine learning based out of Sydney Australia . She has been working with diverse range of customers to provide architectural guidance and help them to deliver effective AI/ML solution via data lab engagement. Prior to AWS , she was working as senior data scientist and dealt with large-scale ML models across different industries like Telco, banks and fintech’s. Her experience in AI/ML has allowed her to deliver effective solutions for complex business problems, and she is passionate about leveraging cutting-edge technologies to help teams achieve their goals.

Promote search content using Featured Results for Amazon Kendra

April 5, 2023

by Maran Chandrasekaran Amazon AWS

Amazon Kendra is an intelligent search service powered by machine learning (ML). We are excited to announce the launch of Amazon Kendra Featured Results. This new feature makes specific documents or content appear at the top of the search results page whenever a user issues a certain query. You can use Featured Results to improve the visibility of new documents or to promote certain documents when users enter certain queries.

For example, you can specify that if your users enter the query “new products 2023,” then select the documents titled “What’s new” and “Coming soon” will feature at the top of the search results page. Furthermore, if your users frequently use certain queries, you can specify these queries for Featured Results. For example, if you look at your top queries using Amazon Kendra Analytics and find that specific queries such as “How does kendra semantically rank results?” and “kendra semantic search” are frequently used, then it might be useful for the queries to feature the document titled “Amazon Kendra search 101.”

In this post, we introduce Featured Results and show you how to use them.

Overview of solution

Featured results enables you to create direct mappings from exact queries to documents in your index, allowing you to bypass the usual Amazon Kendra ranking process. Amazon Kendra naturally handles keyword type queries to rank the most useful documents in the search results, avoiding excessive featuring of results based on simple keywords. Featured results are designed for specific queries, rather than queries that are too broad in scope. You can experiment with featuring different documents for different queries, or ensure certain documents get the visibility they deserve.

Prerequisites

To follow along, you should have the following prerequisites:

An active AWS account.
An Amazon Kendra index. For instructions, see Creating an index.

You can skip this step if you have a preexisting index to use for this demo.

Add a sample dataset to your index

Complete the following steps to add sample dataset to your index:

On the Amazon Kendra console, go to your index and choose Data sources.
Choose Add data source.
Under Available data sources, select Sample AWS documentation and choose Add dataset.
Enter a name for your Data source name (such as sample-aws-data) and choose Add data source.

Search without Featured Results

On the Amazon Kendra console, choose Search indexed content. In the query field, start with a query such as “Kendra S3 connectors”.

In search results, “DataSourceConfiguration – Amazon Kendra” is listed as the top search result based on the ranking process. But if you want to promote “Getting started with an Amazon S3 data source (Console) – Amazon Kendra,” you can bypass the Amazon Kendra ranking process to feature this result at the top of the search results page.

Create a Featured Results set

To feature certain results, you must specify an exact match of a full text query, not a partial match of a query using a keyword or phrase contained within a query. For example, if you only specify the query “Kendra” in a featured result set, queries such as “How does Kendra semantically rank results?” will not render the Featured Results. For more information on limits, see Quotas for Amazon Kendra. To create a Featured Results set, complete the following steps:

In the navigation pane, choose Featured results, under Enrichments.
Choose Create set.
Enter a name for your set (such as kendra_connector_feature) and choose Next.
Enter a keyword to find items to feature (kendra s3 connectors).
Select Getting started with an Amazon S3 data source (Console) – Amazon Kendra from the search results.
Choose Next.
Choose Add query.
Enter a query string (such as kendra s3 connectors) and choose Add.
Choose Next.
On the Review and create page, choose Create.

Your Amazon Kendra index is now ready for natural language queries.

Search with Featured Results

On the Amazon Kendra console, choose Search indexed content. In the query field, enter the keyword used in the feature results set kendra s3 connectors.Now, you should see Getting started with an Amazon S3 data source (Console) – Amazon Kendra featured as the top result on the search page

For more information about querying the index, see Querying an Index.

Clean up

To avoid incurring future charges and to clean out unused roles and policies, delete the resources you created:

On the Amazon Kendra index, choose Indexes in the navigation pane.
Select the index you created and on the Actions menu, choose Delete.
To confirm deletion, enter Delete when prompted and choose Delete.

Wait until you get the confirmation message; the process can take up to 15 minutes.

Conclusion

In this post, you learned how to use Amazon Kendra Featured Results to promote content in an enterprise search solution.

There are many additional features that we didn’t cover. For example:

You can enable user-based access control for your Amazon Kendra index, and restrict access to documents based on the access controls you have already configured.
You can map object attributes to Amazon Kendra index attributes, and enable them for faceting, search, and display in the search results.
You can quickly find information from webpages (HTML tables) using Amazon Kendra tabular search.

To learn more about Amazon Kendra, refer Amazon Kendra Developer Guide.

About the Authors

Maran Chandrasekaran is a Senior Solutions Architect at Amazon Web Services, working with our enterprise customers. Outside of work, he loves to travel.

Kartik Mittal is a Software Engineer at Amazon Web Services, working on Amazon Kendra, an enterprise search engine. Outside of work, he enjoys hiking and loves to travel.

Surya Ram is a Software Engineer at Amazon Web Services, working on Amazon Kendra. Outside of work, he enjoys chess, basketball and cricket.

Solution overview

Access JumpStart through the Studio UI

Use JumpStart programmatically with the SageMaker SDK

Deploy the pre-trained model

Input

Output

Supported parameters

Limitations and biases

Inpainting solution with mask generated via a prompt

Clean up

Conclusion

About the Authors

The three pillars

Hardware: Inferentia

Neuron and transformers-neuronx

DJLServing

Launch the Inferentia hardware

Install necessary dependencies and create the model

Run the serving container

Run inference

Clean up

Conclusion

About the Authors

Solution overview

Prerequisites

Create AWS Wavelength infrastructure

Create model artifacts using JumpStart

Convert local mode artifacts to remote Kubernetes deployment

Deploy SageMaker model artifacts

Connect to the 5G edge model

Clean up

Conclusion

About the Authors

Solution overview

Import data from Athena

Import the data

Train a model

Generate predictions

Import data from a third-party SaaS application to AWS

Clean up

Conclusion

About the authors

Goals and use case

Challenges

Modeling techniques

Macro sales prediction (top-down)

Micro sales prediction (bottom-up)

Time series as point cloud-based product lifecycle prediction

Methodology

Evaluation metrics

Macro-level sales prediction model validation

Macro-level sales prediction baseline comparison

Macro-level sales prediction model vs. human predictions

Micro-level sales prediction and product lifecycle

Conclusion

About the authors

Solution overview

Set up solution resources

Encode images and text pairs into embeddings

Data overview and preparation

Model preparation

Batch transform to encode item images into embeddings

Create an ML-powered unified search engine

Set up the OpenSearch Service domain using k-NN settings

Ingest image embedding data into OpenSearch Service

Perform a real-time query

Clean up

Summary

About the Authors

Overview of solution

Prerequisites

Add a sample dataset to your index

Search without Featured Results

Create a Featured Results set

Search with Featured Results

Clean up

Conclusion

About the Authors

Navigation

GenAI Vision Endless Possibilities