Implement unified text and image search with a CLIP model using Amazon SageMaker and Amazon OpenSearch Service

The rise of text and semantic search engines has made it easier for consumers of ecommerce and retail businesses to find what they’re looking for. Search engines powered by unified text and image search provide extra flexibility, because you can use both text and images as queries. For example, suppose you have a folder of hundreds of family pictures on your laptop and want to quickly find the picture taken when you and your best friend were standing in front of your old house’s swimming pool. You can use conversational language like “two people standing in front of a swimming pool” as the query in a unified text and image search engine; you don’t need the right keywords in image titles to perform the query.

Amazon OpenSearch Service now supports the cosine similarity metric for k-NN indexes. Cosine similarity measures the cosine of the angle between two vectors, where a smaller cosine angle denotes a higher similarity between the vectors. With cosine similarity, you can measure the orientation between two vectors, which makes it a good choice for some specific semantic search applications.
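As a quick illustration, cosine similarity can be computed with a few lines of NumPy. This is only a sketch of the metric itself, not tied to any OpenSearch API:

import numpy as np

def cosine_similarity(a, b):
    # 1.0 means the vectors point in the same direction, 0.0 means they are orthogonal
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.707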

Contrastive Language-Image Pre-Training (CLIP) is a neural network trained on a variety of image and text pairs. The CLIP neural network is able to project both images and text into the same latent space, which means that they can be compared using a similarity measure, such as cosine similarity. You can use CLIP to encode your products’ images or description into embeddings, and then store them into an OpenSearch Service k-NN index. Then your customers can query the index to retrieve products that they’re interested in.
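The following is a minimal local sketch of this idea, assuming the open-source clip package from the CLIP GitHub repository and PyTorch are installed; the image file name is hypothetical. It encodes an image and a caption into the shared latent space and compares them with cosine similarity:

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

# Encode one image and one caption into the same latent space
image = preprocess(Image.open("red_rose.jpg")).unsqueeze(0).to(device)  # hypothetical file
text = clip.tokenize(["a red flower"]).to(device)

with torch.no_grad():
    image_embedding = model.encode_image(image)
    text_embedding = model.encode_text(text)

# A higher cosine similarity means the caption describes the image better
similarity = torch.nn.functional.cosine_similarity(image_embedding, text_embedding)
print(similarity.item())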

You can use CLIP with Amazon SageMaker to perform encoding. Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy to deploy and scale machine learning (ML) models. With SageMaker, you can deploy serverless endpoints for dev and test, and then move to real-time inference when you go to production. Serverless Inference helps you save costs by scaling the infrastructure down to 0 during idle times, which makes it ideal for building a proof of concept, where there are long idle periods between development cycles. You can also use Amazon SageMaker batch transform to get inferences from large datasets.

In this post, we demonstrate how to build a search application using CLIP with SageMaker and OpenSearch Service. The code is open source, and it is hosted on GitHub.

Solution overview

OpenSearch Service provides text-matching and embedding k-NN search. We use embedding k-NN search in this solution. You can use both image and text as a query to search items from the inventory. Implementing this unified image and text search application consists of two phases:

  • k-NN reference index – In this phase, you pass a set of corpus documents or product images through a CLIP model to encode them into embeddings. Text and image embeddings are numerical representations of the corpus or images, respectively. You save those embeddings into a k-NN index in OpenSearch Service. The concept underpinning k-NN is that similar data points exist in close proximity in the embedding space. As an example, the text “a red flower,” the text “rose,” and an image of a red rose are similar, so these text and image embeddings are close to each other in the embedding space.
  • k-NN index query – This is the inference phase of the application. In this phase, you submit a text search query or image search query through the deep learning model (CLIP) to encode it as an embedding. Then, you use that embedding to query the reference k-NN index stored in OpenSearch Service. The k-NN index returns similar embeddings from the embedding space. For example, if you pass the text “a red flower,” it would return the embeddings of a red rose image as a similar item.

The following figure illustrates the solution architecture.

Solution Diagram

The workflow steps are as follows:

  1. Create a SageMaker model from a pretrained CLIP model for batch and real-time inference.
  2. Generate embeddings of product images using a SageMaker batch transform job.
  3. Use SageMaker Serverless Inference to encode query image and text into embeddings in real time.
  4. Use Amazon Simple Storage Service (Amazon S3) to store the raw text (product descriptions), the images (product images), and the image embeddings generated by the SageMaker batch transform jobs.
  5. Use OpenSearch Service as the search engine to store embeddings and find similar embeddings.
  6. Use a query function to orchestrate encoding the query and perform a k-NN search.

We use Amazon SageMaker Studio notebooks (not shown in the diagram) as the integrated development environment (IDE) to develop the solution.

Set up solution resources

To set up the solution, complete the following steps:

  1. Create a SageMaker domain and a user profile. For instructions, refer to Step 5 of Onboard to Amazon SageMaker Domain Using Quick setup.
  2. Create an OpenSearch Service domain. For instructions, see Creating and managing Amazon OpenSearch Service domains.

You can also use an AWS CloudFormation template by following the GitHub instructions to create a domain.

You can connect Studio to Amazon S3 from Amazon Virtual Private Cloud (Amazon VPC) using an interface endpoint in your VPC, instead of connecting over the internet. By using an interface VPC endpoint (interface endpoint), the communication between your VPC and Studio is conducted entirely and securely within the AWS network. Your Studio notebook can connect to OpenSearch Service over a private VPC to ensure secure communication.
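The following is a hedged sketch of creating an interface endpoint with boto3; the VPC, subnet, and security group IDs are placeholders, and you should verify the Studio endpoint service name for your Region before using it:

import boto3

ec2 = boto3.client("ec2")
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",                   # placeholder VPC ID
    VpcEndpointType="Interface",
    ServiceName="aws.sagemaker.us-east-1.studio",    # SageMaker Studio endpoint service (verify for your Region)
    SubnetIds=["subnet-0123456789abcdef0"],          # placeholder subnet ID
    SecurityGroupIds=["sg-0123456789abcdef0"],       # placeholder security group ID
    PrivateDnsEnabled=True,
)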

OpenSearch Service domains offer encryption of data at rest, which is a security feature that helps prevent unauthorized access to your data. Node-to-node encryption provides an additional layer of security on top of the default features of OpenSearch Service. Amazon S3 automatically applies server-side encryption (SSE-S3) for each new object unless you specify a different encryption option.
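If you create the domain programmatically instead of through the console or CloudFormation, the following sketch shows how these encryption options might be enabled with boto3; the domain name, engine version, and instance configuration are assumptions to adapt to your environment:

import boto3

opensearch = boto3.client("opensearch")
opensearch.create_domain(
    DomainName="clip-search-demo",                   # assumed domain name
    EngineVersion="OpenSearch_1.3",                  # assumed engine version
    ClusterConfig={"InstanceType": "r6g.large.search", "InstanceCount": 1},
    EBSOptions={"EBSEnabled": True, "VolumeType": "gp3", "VolumeSize": 10},
    EncryptionAtRestOptions={"Enabled": True},       # encryption of data at rest
    NodeToNodeEncryptionOptions={"Enabled": True},   # node-to-node encryption
    DomainEndpointOptions={"EnforceHTTPS": True},
)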

In the OpenSearch Service domain, you can attach identity-based policies that define who can access a service, which actions they can perform, and, if applicable, the resources on which they can perform those actions.

Encode images and text pairs into embeddings

This section discusses how to encode images and text into embeddings. This includes preparing data, creating a SageMaker model, and performing batch transform using the model.

Data overview and preparation

You can use a SageMaker Studio notebook with a Python 3 (Data Science) kernel to run the sample code.

For this post, we use the Amazon Berkeley Objects Dataset. The dataset is a collection of 147,702 product listings with multilingual metadata and 398,212 unique catalogue images. We only use the item images and item names in US English. For demo purposes, we use approximately 1,600 products. For more details about this dataset, refer to the README. The dataset is hosted in a public S3 bucket. There are 16 files that include product description and metadata of Amazon products in the format of listings/metadata/listings_<i>.json.gz. We use the first metadata file in this demo.

You use pandas to load the metadata, then select products that have US English titles from the data frame. Pandas is an open-source data analysis and manipulation tool built on top of the Python programming language. You use an attribute called main_image_id to identify an image. See the following code:

import pandas as pd

meta = pd.read_json("s3://amazon-berkeley-objects/listings/metadata/listings_0.json.gz", lines=True)
def func_(x):
    us_texts = [item["value"] for item in x if item["language_tag"] == "en_US"]
    return us_texts[0] if us_texts else None
 
meta = meta.assign(item_name_in_en_us=meta.item_name.apply(func_))
meta = meta[~meta.item_name_in_en_us.isna()][["item_id", "item_name_in_en_us", "main_image_id"]]
print(f"#products with US English title: {len(meta)}")
meta.head()

There are 1,639 products in the data frame. Next, link the item names with the corresponding item images. images/metadata/images.csv.gz contains image metadata. This file is a gzip-compressed CSV file with the following columns: image_id, height, width, and path. You can read the metadata file and then merge it with item metadata. See the following code:

image_meta = pd.read_csv("s3://amazon-berkeley-objects/images/metadata/images.csv.gz")
dataset = meta.merge(image_meta, left_on="main_image_id", right_on="image_id")
dataset.head()

data sample

You can use the PIL library, which is available in the SageMaker Studio notebook Python 3 kernel, to view a sample image from the dataset:

from sagemaker.s3 import S3Downloader as s3down
from pathlib import Path
from PIL import Image
 
def get_image_from_item_id(item_id = "B0896LJNLH", return_image=True):
    s3_data_root = "s3://amazon-berkeley-objects/images/small/"
 
    item_idx = dataset.query(f"item_id == '{item_id}'").index[0]
    s3_path = dataset.iloc[item_idx].path
    local_data_root = f'./data/images'
    local_file_name = Path(s3_path).name
 
    s3down.download(f'{s3_data_root}{s3_path}', local_data_root)
 
    local_image_path = f"{local_data_root}/{local_file_name}"
    if return_image:
        img = Image.open(local_image_path)
        return img, dataset.iloc[item_idx].item_name_in_en_us
    else:
        return local_image_path, dataset.iloc[item_idx].item_name_in_en_us
image, item_name = get_image_from_item_id()
print(item_name)
image

glass cup and title

Model preparation

Next, create a SageMaker model from a pretrained CLIP model. The first step is to download the pretrained model weights file, package it into a model.tar.gz file, and upload it to an S3 bucket. The path of the pretrained model can be found in the CLIP repo. We use a pretrained ResNet-50 (RN50) model in this demo. See the following code:

%%writefile build_model_tar.sh
#!/bin/bash
 
MODEL_NAME=RN50.pt
MODEL_NAME_URL=https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt
 
BUILD_ROOT=/tmp/model_path
S3_PATH=s3://<your-bucket>/<your-prefix-for-model>/model.tar.gz
 
 
rm -rf $BUILD_ROOT
mkdir $BUILD_ROOT
cd $BUILD_ROOT && curl -o $BUILD_ROOT/$MODEL_NAME $MODEL_NAME_URL
cd $BUILD_ROOT && tar -czvf model.tar.gz .
aws s3 cp $BUILD_ROOT/model.tar.gz  $S3_PATH

Then run the script in a separate notebook cell:

!bash build_model_tar.sh

You then need to provide an inference entry point script for the CLIP model. CLIP is implemented using PyTorch, so you use the SageMaker PyTorch framework. PyTorch is an open-source ML framework that accelerates the path from research prototyping to production deployment. For information about deploying a PyTorch model with SageMaker, refer to Deploy PyTorch Models. The inference code accepts two environment variables: MODEL_NAME and ENCODE_TYPE. This helps us switch between different CLIP models easily. We use ENCODE_TYPE to specify whether we want to encode an image or a piece of text. Here, you implement the model_fn, input_fn, predict_fn, and output_fn functions to override the default PyTorch inference handler. See the following code:

!mkdir -p code

Write the inference script in a separate cell, because the %%writefile magic must be the first line of its cell:

%%writefile code/clip_inference.py
 
import io
import torch
import clip
from PIL import Image
import json
import logging
import sys
import os
 
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.transforms import ToTensor
 
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(sys.stdout))
 
MODEL_NAME = os.environ.get("MODEL_NAME", "RN50.pt")
# ENCODE_TYPE could be IMAGE or TEXT
ENCODE_TYPE = os.environ.get("ENCODE_TYPE", "TEXT")
 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
# defining model and loading weights to it.
def model_fn(model_dir):
    model, preprocess = clip.load(os.path.join(model_dir, MODEL_NAME), device=device)
    return {"model_obj": model, "preprocess_fn": preprocess}
 
def load_from_bytearray(request_body):
    # Convert the raw request bytes into a PIL image
    image = Image.open(io.BytesIO(request_body))
    return image
 
# data loading
def input_fn(request_body, request_content_type):
    assert request_content_type in (
        "application/json",
        "application/x-image",
    ), f"{request_content_type} is an unknown type."
    if request_content_type == "application/json":
        data = json.loads(request_body)["inputs"]
    elif request_content_type == "application/x-image":
        image_as_bytes = io.BytesIO(request_body)
        data = Image.open(image_as_bytes)
    return data
 
# inference
def predict_fn(input_object, model):
    model_obj = model["model_obj"]
    # for image preprocessing
    preprocess_fn = model["preprocess_fn"]
    assert ENCODE_TYPE in ("TEXT", "IMAGE"), f"{ENCODE_TYPE} is an unknown encode type."
 
    # preprocessing
    if ENCODE_TYPE == "TEXT":
        input_ = clip.tokenize(input_object).to(device)
    elif ENCODE_TYPE == "IMAGE":
        input_ = preprocess_fn(input_object).unsqueeze(0).to(device)
 
    # inference
    with torch.no_grad():
        if ENCODE_TYPE == "TEXT":
            prediction = model_obj.encode_text(input_)
        elif ENCODE_TYPE == "IMAGE":
            prediction = model_obj.encode_image(input_)
    return prediction
  
# Serialize the prediction result into the desired response content type
def output_fn(predictions, content_type):
    assert content_type == "application/json"
    res = predictions.cpu().numpy().tolist()
    return json.dumps(res)

The solution requires additional Python packages during model inference, so you can provide a requirements.txt file to allow SageMaker to install additional packages when hosting models:

%%writefile code/requirements.txt
ftfy
regex
tqdm
git+https://github.com/openai/CLIP.git

You use the PyTorchModel class to create an object to contain the information of the model artifacts’ Amazon S3 location and the inference entry point details. You can use the object to create batch transform jobs or deploy the model to an endpoint for online inference. See the following code:

from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role, Session
 
role = get_execution_role()
shared_params = dict(
    entry_point="clip_inference.py",
    source_dir="code",
    role=role,
    model_data="s3://<your-bucket>/<your-prefix-for-model>/model.tar.gz",
    framework_version="1.9.0",
    py_version="py38",
)
 
clip_image_model = PyTorchModel(
    env={'MODEL_NAME': 'RN50.pt', "ENCODE_TYPE": "IMAGE"},
    name="clip-image-model",
    **shared_params
)
 
clip_text_model = PyTorchModel(
    env={'MODEL_NAME': 'RN50.pt', "ENCODE_TYPE": "TEXT"},
    name="clip-text-model",
    **shared_params
)

Batch transform to encode item images into embeddings

Next, we use the CLIP model to encode item images into embeddings, and use SageMaker batch transform to run batch inference.

Before creating the job, use the following code snippet to copy item images from the Amazon Berkeley Objects Dataset public S3 bucket to your own bucket. The operation takes less than 10 minutes.

from multiprocessing.pool import ThreadPool
import boto3
from tqdm import tqdm
from urllib.parse import urlparse
 
s3_sample_image_root = "s3://<your-bucket>/<your-prefix-for-sample-images>"
s3_data_root = "s3://amazon-berkeley-objects/images/small/"
 
client = boto3.client('s3')
 
def upload_(args):
    client.copy_object(CopySource=args["source"], Bucket=args["target_bucket"], Key=args["target_key"])
 
arguments = []
for idx, record in dataset.iterrows():
    argument = {}
    argument["source"] = (s3_data_root + record.path)[5:]
    argument["target_bucket"] = urlparse(s3_sample_image_root).netloc
    argument["target_key"] = urlparse(s3_sample_image_root).path[1:] + record.path
    arguments.append(argument)
 
with ThreadPool(4) as p:
    r = list(tqdm(p.imap(upload_, arguments), total=len(dataset)))

Next, you perform inference on the item images in a batch manner. The SageMaker batch transform job uses the CLIP model to encode all the images stored in the input Amazon S3 location and uploads output embeddings to an output S3 folder. The job takes around 10 minutes.

batch_input = s3_sample_image_root + "/"
output_path = f"s3://<your-bucket>/inference/output"
 
clip_image_transformer = clip_image_model.transformer(
    instance_count=1,
    instance_type="ml.c5.xlarge",
    strategy="SingleRecord",
    output_path=output_path,
)
 
clip_image_transformer.transform(
    batch_input, 
    data_type="S3Prefix",
    content_type="application/x-image", 
    wait=True,
)

Load embeddings from Amazon S3 to a variable, so you can ingest the data into OpenSearch Service later:

import json

embedding_root_path = "./data/embedding"
s3down.download(output_path, embedding_root_path)
 
embeddings = []
for idx, record in dataset.iterrows():
    embedding_file = f"{embedding_root_path}/{record.path}.out"
    embeddings.append(json.load(open(embedding_file))[0])

Create an ML-powered unified search engine

This section discusses how to create a search engine that uses k-NN search with embeddings. This includes configuring an OpenSearch Service cluster, ingesting item embeddings, and performing free text and image search queries.

Set up the OpenSearch Service domain using k-NN settings

Earlier, you created an OpenSearch cluster. Now you’re going to create an index to store the catalog data and embeddings. You can configure the index settings to enable the k-NN functionality using the following configuration:

index_settings = {
  "settings": {
    "index.knn": True,
    "index.knn.space_type": "cosinesimil"
  },
  "mappings": {
    "properties": {
      "embeddings": {
        "type": "knn_vector",
        "dimension": 1024 #Make sure this is the size of the embeddings you generated, for RN50, it is 1024
      }
    }
  }
}

This example uses the Python Elasticsearch client to communicate with the OpenSearch cluster and create an index to host your data. You can run %pip install elasticsearch in the notebook to install the library. See the following code:

import boto3
import json
from requests_aws4auth import AWS4Auth
from elasticsearch import Elasticsearch, RequestsHttpConnection
 
def get_es_client(host = "<your-opensearch-service-domain-url>",
    port = 443,
    region = "<your-region>",
    index_name = "clip-index"):
 
    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key,
                       credentials.secret_key,
                       region,
                       'es',
                       session_token=credentials.token)
 
    headers = {"Content-Type": "application/json"}
 
    es = Elasticsearch(hosts=[{'host': host, 'port': port}],
                       http_auth=awsauth,
                       use_ssl=True,
                       verify_certs=True,
                       connection_class=RequestsHttpConnection,
                       timeout=60 # for connection timeout errors
    )
    return es
index_name = "clip-index"
es = get_es_client()
es.indices.create(index=index_name, body=json.dumps(index_settings))

Ingest image embedding data into OpenSearch Service

You now loop through your dataset and ingest the item data into the cluster. The data ingestion for this dataset should finish within 60 seconds. The code also runs a simple query to verify that the data has been ingested into the index successfully. See the following code:

# ingest_data_into_es
 
for idx, record in tqdm(dataset.iterrows(), total=len(dataset)):
    body = record[['item_name_in_en_us']].to_dict()
    body['embeddings'] = embeddings[idx]
    es.index(index=index_name, id=record.item_id, doc_type='_doc', body=body)
 
# Check that data is indeed in ES
res = es.search(
    index=index_name, body={
        "query": {
                "match_all": {}
    }},
    size=2)
assert len(res["hits"]["hits"]) > 0

Perform a real-time query

Now that you have a working OpenSearch Service index that contains embeddings of item images as our inventory, let’s look at how you can generate embedding for queries. You need to create two SageMaker endpoints to handle text and image embeddings, respectively.

You also create two functions to use the endpoints to encode images and texts. For the encode_text function, you prepend this is a to an item name to turn the item name into a sentence describing the item. memory_size_in_mb is set to 6 GB to serve the underlying Transformer and ResNet models. See the following code:

from sagemaker.serverless import ServerlessInferenceConfig
from sagemaker.serializers import JSONSerializer, IdentitySerializer
from sagemaker.deserializers import JSONDeserializer
 
text_predictor = clip_text_model.deploy(
    instance_type='ml.c5.xlarge',
    initial_instance_count=1,
    serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=6144),
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
    wait=True
)
 
image_predictor = clip_image_model.deploy(
    instance_type='ml.c5.xlarge',
    initial_instance_count=1,
    serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=6144),
    serializer=IdentitySerializer(content_type="application/x-image"),
    deserializer=JSONDeserializer(),
    wait=True
)
 
def encode_image(file_name="./data/images/0e9420c6.jpg"):    
    with open(file_name, "rb") as f:
        payload = f.read()
        payload = bytearray(payload)
    res = image_predictor.predict(payload)
    return res[0]
 
def encode_name(item_name):
    res = text_predictor.predict({"inputs": [f"this is a {item_name}"]})
    return res[0]
 

First, plot the picture that will be used as the query:

item_image_path, item_name = get_image_from_item_id(item_id = "B0896LJNLH", return_image=False)
feature_vector = encode_image(file_name=item_image_path)
print(len(feature_vector))  # embedding dimension, 1024 for RN50
Image.open(item_image_path)

glass cup

Let’s look at the results of a simple query. After retrieving results from OpenSearch Service, you get the list of item names and images from dataset:

import matplotlib.pyplot as plt
from PIL.Image import Image as PilImage
 
def search_products(embedding, k = 3):
    body = {
        "size": k,
        "_source": {
            "exclude": ["embeddings"],
        },
        "query": {
            "knn": {
                "embeddings": {
                    "vector": embedding,
                    "k": k,
                }
            }
        },
    }        
    res = es.search(index=index_name, body=body)
    images = []
    for hit in res["hits"]["hits"]:
        id_ = hit["_id"]
        image, item_name = get_image_from_item_id(id_)
        image.name_and_score = f'{hit["_score"]}:{item_name}'
        images.append(image)
    return images
 
def display_images(
    images: [PilImage], 
    columns=2, width=20, height=8, max_images=15, 
    label_wrap_length=50, label_font_size=8):
 
    if not images:
        print("No images to display.")
        return 
 
    if len(images) > max_images:
        print(f"Showing {max_images} images of {len(images)}:")
        images=images[0:max_images]
 
    height = max(height, int(len(images)/columns) * height)
    plt.figure(figsize=(width, height))
    for i, image in enumerate(images):
 
        plt.subplot(int(len(images) / columns + 1), columns, i + 1)
        plt.imshow(image)
 
        if hasattr(image, 'name_and_score'):
            plt.title(image.name_and_score, fontsize=label_font_size); 
            
images = search_products(feature_vector)
display_images(images)

results

The first item has a score of 1.0, because the two images are the same. Other items are different types of glasses in the OpenSearch Service index.

You can use text to query the index as well:

feature_vector = encode_name("drinkware glass")
images = search_products(feature_vector)
display_images(images)

results

You’re now able to get three pictures of water glasses from the index. You can find the images and text within the same latent space with the CLIP encoder. Another example of this is to search for the word “pizza” in the index:

feature_vector = encode_name("pizza")
images = search_products(feature_vector)
display_images(images)

pizza results

Clean up

With a pay-per-use model, Serverless Inference is a cost-effective option for an infrequent or unpredictable traffic pattern. If you have a strict service-level agreement (SLA), or can’t tolerate cold starts, real-time endpoints are a better choice. Using multi-model or multi-container endpoints provides scalable and cost-effective solutions for deploying large numbers of models. For more information, refer to Amazon SageMaker Pricing.

We suggest deleting the serverless endpoints when they are no longer needed. After finishing this exercise, you can remove the resources with the following steps (you can delete these resources via the AWS Management Console, the AWS SDK, or the SageMaker SDK; a cleanup sketch using the SageMaker SDK follows the list):

  1. Delete the endpoint you created.
  2. Optionally, delete the registered models.
  3. Optionally, delete the SageMaker execution role.
  4. Optionally, empty and delete the S3 bucket.
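
The following is a minimal cleanup sketch using the SageMaker Python SDK objects created earlier in this post:

# Delete the serverless endpoints created by deploy()
text_predictor.delete_endpoint()
image_predictor.delete_endpoint()

# Optionally delete the registered SageMaker models
clip_text_model.delete_model()
clip_image_model.delete_model()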

Summary

In this post, we demonstrated how to create a k-NN search application using SageMaker and OpenSearch Service k-NN index features. We used a pre-trained CLIP model from its OpenAI implementation.

The OpenSearch Service ingestion implementation in this post is intended only for prototyping. If you want to ingest data from Amazon S3 into OpenSearch Service at scale, you can launch an Amazon SageMaker Processing job with the appropriate instance type and instance count. For another scalable embedding ingestion solution, refer to Novartis AG uses Amazon OpenSearch Service K-Nearest Neighbor (KNN) and Amazon SageMaker to power search and recommendation (Part 3/4).

CLIP provides zero-shot capabilities, which makes it possible to adopt a pre-trained model directly without using transfer learning to fine-tune a model. This simplifies the application of the CLIP model. If you have pairs of product images and descriptive text, you can fine-tune the model with your own data using transfer learning to further improve the model performance. For more information, see Learning Transferable Visual Models From Natural Language Supervision and the CLIP GitHub repository.


About the Authors

Kevin Du is a Senior Data Lab Architect at AWS, dedicated to assisting customers in expediting the development of their Machine Learning (ML) products and MLOps platforms. With more than a decade of experience building ML-enabled products for both startups and enterprises, his focus is on helping customers streamline the productionalization of their ML solutions. In his free time, Kevin enjoys cooking and watching basketball.

Ananya Roy is a Senior Data Lab Architect specializing in AI and machine learning, based out of Sydney, Australia. She works with a diverse range of customers to provide architectural guidance and help them deliver effective AI/ML solutions through Data Lab engagements. Prior to AWS, she worked as a senior data scientist on large-scale ML models across industries such as telecom, banking, and fintech. Her experience in AI/ML allows her to deliver effective solutions for complex business problems, and she is passionate about leveraging cutting-edge technologies to help teams achieve their goals.

Read More

Promote search content using Featured Results for Amazon Kendra

Amazon Kendra is an intelligent search service powered by machine learning (ML). We are excited to announce the launch of Amazon Kendra Featured Results. This new feature makes specific documents or content appear at the top of the search results page whenever a user issues a certain query. You can use Featured Results to improve the visibility of new documents or to promote certain documents when users enter certain queries.

For example, you can specify that if your users enter the query “new products 2023,” then the documents titled “What’s new” and “Coming soon” will be featured at the top of the search results page. Furthermore, if your users frequently use certain queries, you can specify these queries for Featured Results. For example, if you look at your top queries using Amazon Kendra Analytics and find that specific queries such as “How does kendra semantically rank results?” and “kendra semantic search” are frequently used, then it might be useful for those queries to feature the document titled “Amazon Kendra search 101.”

In this post, we introduce Featured Results and show you how to use them.

Overview of solution

Featured Results enables you to create direct mappings from exact queries to documents in your index, allowing you to bypass the usual Amazon Kendra ranking process. Amazon Kendra naturally handles keyword-type queries to rank the most useful documents in the search results, avoiding excessive featuring of results based on simple keywords. Featured Results are designed for specific queries, rather than queries that are too broad in scope. You can experiment with featuring different documents for different queries, or ensure certain documents get the visibility they deserve.

Prerequisites

To follow along, you should have the following prerequisites:

You can skip this step if you have a preexisting index to use for this demo.

Add a sample dataset to your index

Complete the following steps to add a sample dataset to your index:

  1. On the Amazon Kendra console, go to your index and choose Data sources.
  2. Choose Add data source.
  3. Under Available data sources, select Sample AWS documentation and choose Add dataset.
  4. Enter a name for your Data source name (such as sample-aws-data) and choose Add data source.

Search without Featured Results

On the Amazon Kendra console, choose Search indexed content. In the query field, start with a query such as “Kendra S3 connectors”.

In search results, “DataSourceConfiguration – Amazon Kendra” is listed as the top search result based on the ranking process. But if you want to promote “Getting started with an Amazon S3 data source (Console) – Amazon Kendra,” you can bypass the Amazon Kendra ranking process to feature this result at the top of the search results page.

Create a Featured Results set

To feature certain results, you must specify an exact match of a full text query, not a partial match of a query using a keyword or phrase contained within a query. For example, if you only specify the query “Kendra” in a featured result set, queries such as “How does Kendra semantically rank results?” will not render the Featured Results. For more information on limits, see Quotas for Amazon Kendra. To create a Featured Results set, complete the following steps:

  1. In the navigation pane, choose Featured results, under Enrichments.
  2. Choose Create set.

  3. Enter a name for your set (such as kendra_connector_feature) and choose Next.
  4. Enter a keyword to find items to feature (kendra s3 connectors).
  5. Select Getting started with an Amazon S3 data source (Console) – Amazon Kendra from the search results.
  6. Choose Next.
  7. Choose Add query.

  8. Enter a query string (such as kendra s3 connectors) and choose Add.
  9. Choose Next.
  10. On the Review and create page, choose Create.
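
If you prefer to automate this step, you can also create a Featured Results set programmatically with the AWS SDK for Python (boto3). The following is a hedged sketch; the index ID and document ID are placeholders, and you should verify the CreateFeaturedResultsSet parameters against the current boto3 documentation:

import boto3

kendra = boto3.client("kendra")
kendra.create_featured_results_set(
    IndexId="<your-index-id>",                     # placeholder index ID
    FeaturedResultsSetName="kendra_connector_feature",
    QueryTexts=["kendra s3 connectors"],           # exact full-text queries to match
    FeaturedDocuments=[{"Id": "<document-id>"}],   # placeholder ID of the document to feature
    Status="ACTIVE",
)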

Your Amazon Kendra index is now ready for natural language queries.

Search with Featured Results

On the Amazon Kendra console, choose Search indexed content. In the query field, enter the query used in the Featured Results set, kendra s3 connectors. Now you should see Getting started with an Amazon S3 data source (Console) – Amazon Kendra featured as the top result on the search results page.

For more information about querying the index, see Querying an Index.

Clean up

To avoid incurring future charges and to clean out unused roles and policies, delete the resources you created:

  1. On the Amazon Kendra index, choose Indexes in the navigation pane.
  2. Select the index you created and on the Actions menu, choose Delete.
  3. To confirm deletion, enter Delete when prompted and choose Delete.

Wait until you get the confirmation message; the process can take up to 15 minutes.

Conclusion

In this post, you learned how to use Amazon Kendra Featured Results to promote content in an enterprise search solution.

There are many additional features that we didn’t cover. For example:

  • You can enable user-based access control for your Amazon Kendra index, and restrict access to documents based on the access controls you have already configured.
  • You can map object attributes to Amazon Kendra index attributes, and enable them for faceting, search, and display in the search results.
  • You can quickly find information from webpages (HTML tables) using Amazon Kendra tabular search.

To learn more about Amazon Kendra, refer to the Amazon Kendra Developer Guide.


About the Authors

Maran Chandrasekaran is a Senior Solutions Architect at Amazon Web Services, working with our enterprise customers. Outside of work, he loves to travel.

Kartik Mittal is a Software Engineer at Amazon Web Services, working on Amazon Kendra, an enterprise search engine. Outside of work, he enjoys hiking and loves to travel.

Surya Ram is a Software Engineer at Amazon Web Services, working on Amazon Kendra. Outside of work, he enjoys chess, basketball and cricket.

Read More

Automatic image cropping with Amazon Rekognition

Digital publishers are continuously looking for ways to streamline and automate their media workflows in order to generate and publish new content as rapidly as they can.

Many publishers have a large library of stock images that they use for their articles. These images can be reused many times for different stories, especially when the publisher has images of celebrities. Quite often, a journalist may need to crop out a desired celebrity from an image to use for their upcoming story. This is a manual, repetitive task that should be automated. Sometimes, an author may want to use an image of a celebrity, but it contains two people and the primary celebrity needs to be cropped from the image. Other times, celebrity images might need to be reformatted for publishing to a variety of platforms like mobile, social media, or digital news. Additionally, an author may need to change the image aspect ratio or put the celebrity in crisp focus.

In this post, we demonstrate how to use Amazon Rekognition to perform image analysis. Amazon Rekognition makes it easy to add this capability to your applications without any machine learning (ML) expertise and comes with various APIs to fulfil use cases such as object detection, content moderation, face detection and analysis, and text and celebrity recognition, which we use in this example.

The celebrity recognition feature in Amazon Rekognition automatically recognizes tens of thousands of well-known personalities in images and videos using ML. Celebrity recognition can detect not just the presence of the given celebrity but also the location within the image.

Overview of solution

In this post, we demonstrate how to pass in a photo, a celebrity name, and an aspect ratio for the output image to generate a cropped image of the given celebrity with their face in the center.

When working with the Amazon Rekognition celebrity detection API, many elements are returned in the response. The following are some key response elements:

  • MatchConfidence – A match confidence score that can be used to control API behavior. We recommend applying a suitable threshold to this score in your application to choose your preferred operating point. For example, by setting a threshold of 99%, you can eliminate false positives but may miss some potential matches.
  • Name, Id, and Urls – The celebrity name, a unique Amazon Rekognition ID, and list of URLs such as the celebrity’s IMDb or Wikipedia link for further information.
  • BoundingBox – Coordinates of the rectangular bounding box location for each recognized celebrity face.
  • KnownGender – Known gender identity for each recognized celebrity.
  • Emotions – Emotion expressed on the celebrity’s face, for example, happy, sad, or angry.
  • Pose – Pose of the celebrity face, using three axes of roll, pitch, and yaw.
  • Smile – Whether the celebrity is smiling or not.

Part of the API response from Amazon Rekognition includes the following code:

{
    "CelebrityFaces":
    [
        {
            "Urls":
            [
                "www.wikidata.org/wiki/Q2536951"
            ],
            "Name": "Werner Vogels",
            "Id": "23iZ1oP",
            "Face":
            {
                "BoundingBox":
                {
                    "Width": 0.10331031680107117,
                    "Height": 0.20054641366004944,
                    "Left": 0.5003396272659302,
                    "Top": 0.07391933351755142
                },
                "Confidence": 99.99765014648438,
...

In this exercise, we demonstrate how to use the bounding box element to identify the location of the face, as shown in the following example image. All of the dimensions are represented as ratios of the overall image size, so the numbers in the response are between 0–1. For example, in the sample API response, the width of the bounding box is 0.1, which implies the face width is 10% of the total width of the image.

Werner Vogels Bounding box
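
For instance, here is a quick sketch of converting those ratios to pixel coordinates; the image dimensions are hypothetical:

# Hypothetical image size in pixels
img_width, img_height = 4000, 2667

# Bounding box ratios similar to the sample response above
box = {"Width": 0.1033, "Height": 0.2005, "Left": 0.5003, "Top": 0.0739}

left_px = img_width * box["Left"]      # distance of the face from the left edge, in pixels
top_px = img_height * box["Top"]       # distance of the face from the top edge, in pixels
width_px = img_width * box["Width"]    # face width is ~10% of the image width
height_px = img_height * box["Height"]
print(left_px, top_px, width_px, height_px)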

With this bounding box, we are now able to use logic to make sure that the face remains within the edges of the new image we create. We can apply some padding around this bounding box to keep the face in the center.

In the following sections, we show how to create the following cropped image output with Werner Vogels in crisp focus.

We launch an Amazon SageMaker notebook, which provides a Python environment where you can run the code to pass an image to Amazon Rekognition and then automatically modify the image with the celebrity in focus.

Werner Vogels cropped

The code performs the following high-level steps:

  1. Make a request to the recognize_celebrities API with the given image and celebrity name.
  2. Filter the response for the bounding box information.
  3. Add some padding to the bounding box such that we capture some of the background.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Upload the sample image

Upload your sample celebrity image to your S3 bucket.

Run the code

To run the code, we use a SageMaker notebook; however, any IDE would also work after installing Python, Pillow, and Boto3. We create a SageMaker notebook as well as the AWS Identity and Access Management (IAM) role with the required permissions. Complete the following steps:

  1. Create the notebook and name it automatic-cropping-celebrity.

The default execution role, which was created along with the SageMaker notebook, has a simple policy that gives the role permissions to interact with Amazon S3.

  2. Update the Resource constraint with the S3 bucket name:
{
    "Version": "2012-10-17",
    "Statement":
    [
        {
            "Effect": "Allow",
            "Action":
            [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket"
            ],
            "Resource":
            [
                "arn:aws:s3::: # your-s3-bucket-name "
            ]
        }
    ]
}
  3. Create another policy to add to the SageMaker notebook IAM role to be able to call the RecognizeCelebrities API:
{
    "Version": "2012-10-17",
    "Statement":
    [
        {
            "Effect": "Allow",
            "Action": "rekognition:RecognizeCelebrities",
            "Resource": "*"
        }
    ]
}

IAM permissions

  1. On the SageMaker console, choose Notebook instances in the navigation pane.
  2. Locate the automatic-cropping-celebrity notebook and choose Open Jupyter.
  3. Choose New and conda_python3 as the kernel for your notebook.

Jupyter notebook

For the following steps, copy the code blocks into your Jupyter notebook and run them by choosing Run.

  1. First, we import helper functions and libraries:
import boto3
from PIL import Image
  2. Set the variables:
bucket = '<YOUR_BUCKET_NAME>'    
file = '<YOUR_FILE_NAME>'
celeb = '<CELEBRITY_NAME>'
aspect_ratio = <ASPECT_RATIO_OF_OUTPUT_IMAGE, e.g. 1 for square>
  3. Create the service clients:
rek = boto3.client('rekognition')
s3 = boto3.client('s3')
  4. Function to recognize the celebrities:
def recognize_celebrity(photo):       

    with open(photo, 'rb') as image:
        response = rek.recognize_celebrities(Image={'Bytes': image.read()})

    image=Image.open(photo)
    file_type=image.format.lower()
    path, ext=image.filename.rsplit(".", 1)
    celeb_faces = response['CelebrityFaces']
        
    print(f'Detected {len(celeb_faces)} faces for {photo}')
    
    return celeb_faces, image, path, file_type
    
  5. Function to get the bounding box of the given celebrity:
def get_bounding_box(celeb_faces, img_width, img_height, celeb):
    bbox = None
    for celebrity in celeb_faces:
        if celebrity['Name'] == celeb:
                    
            box = celebrity['Face']['BoundingBox']    
            left = img_width * box['Left']    
            top = img_height * box['Top']    
            width = img_width * box['Width']    
            height = img_height * box['Height']              
            
            print('Left: ' + '{0:.0f}'.format(left))    
            print('Top: ' + '{0:.0f}'.format(top))    
            print('Face Width: ' + "{0:.0f}".format(width))    
            print('Face Height: ' + "{0:.0f}".format(height))    
                
            #dimensions of the famous face inside the bounding box
            x1=left    
            y1=top    
            x2=left+width    
            y2=top+height
            
            bbox = [x1,y1,x2,y2]
            print(f'Bbox coordinates: {bbox}')
    if bbox is None:
        raise ValueError(f"{celeb} not found in results")
            
    return bbox
  6. Function to add some padding to the bounding box, so we capture some background around the face:
def pad_bbox(bbox, pad_width=0.5, pad_height=0.3):
    x1, y1, x2, y2 = bbox
    width = x2 - x1
    height = y2 - y1
    
    #dimensions of the new image with padding
    x1= max(x1 - (pad_width * width),0)    
    y1= max(y1 - (pad_height * height),0)  
    x2= max(x2 + (pad_width * width),0)
    y2= max(y2 + (pad_height * height),0)                       
            
    #dimensions of the new image with the aspect ratio: 1 is square, 1.5 is 6:4, 0.66 is 4:6
                        
    x1= max(x1-(max((y2-y1)*max(aspect_ratio,1)-(x2-x1),0)/2),0)    
    y1= max(y1-(max((x2-x1)*1/(min((aspect_ratio),1))-(y2-y1),0)/2),0) 
    x2= max(x2+(max((y2-y1)*max((aspect_ratio),1)-(x2-x1),0)/2),0)
    y2= max(y2+(max((x2-x1)*1/(min((aspect_ratio),1))-(y2-y1),0)/2),0)
                        
    print('x1-coordinate after padding: ' + '{0:.0f}'.format(x1))    
    print('y1-coordinate after padding: ' + '{0:.0f}'.format(y1))    
    print('x2-coordinate after padding: ' + "{0:.0f}".format(x2))    
    print('y2-coordinate after padding: ' + "{0:.0f}".format(y2))
    
    return [x1,y1,x2,y2]
  7. Function to save the image to the notebook storage and to Amazon S3:
def save_image(roi, image, path, file_type):
    
    x1, y1, x2, y2 = roi
    
    image = image.crop((x1,y1,x2,y2))
    
    image.save(f'{path}-cropped.{file_type}')
            
    s3.upload_file(f'{path}-cropped.{file_type}', bucket, f'{path}-cropped.{file_type}')            
        
    return image
  8. Use the Python main() function to combine the preceding functions to complete the workflow of saving a new cropped image of our celebrity:
def main():
    # Download S3 image to local 
    s3.download_file(bucket, file, './'+file)
    
    #Load photo and recognize celebrity
    celeb_faces, img, file_name, file_type = recognize_celebrity(file)
    width, height = img.size
    
    #Get bounding box
    bbox = get_bounding_box(celeb_faces, width, height, celeb)
    
    #Get padded bounding box
    padded_bbox = pad_bbox(bbox)
     
    #Save result and display  
    result = save_image(padded_bbox, img, file_name, file_type)
    display(result)
    
    
if __name__ == "__main__":
    main()

When you run this code block, you can see that we found Werner Vogels and created a new image with his face in the center.

Werner Vogels cropped

The image will be saved to the notebook and also uploaded to the S3 bucket.

Jupyter notebook output

You could include this solution in a larger workflow; for example, a publishing company might want to publish this capability as an endpoint to reformat and resize images on the fly when publishing articles of celebrities to multiple platforms.

Cleaning up

To avoid incurring future charges, delete the resources:

  1. On the SageMaker console, select your notebook and on the Actions menu, choose Stop.
  2. After the notebook is stopped, on the Actions menu, choose Delete.
  3. On the IAM console, delete the SageMaker execution role you created.
  4. On the Amazon S3 console, delete the input image and any output files from your S3 bucket.

Conclusion

In this post, we showed how we can use Amazon Rekognition to automate an otherwise manual task of modifying images to support media workflows. This is particularly important within the publishing industry where speed matters in getting fresh content out quickly and to multiple platforms.

For more information about working with media assets, refer to Media intelligence just got smarter with Media2Cloud 3.0


About the Author

Mark Watkins is a Solutions Architect within the Media and Entertainment team. He helps customers create AI/ML solutions that solve their business challenges using AWS. He has been working on several AI/ML projects related to computer vision, natural language processing, personalization, ML at the edge, and more. Away from professional life, he loves spending time with his family and watching his two little ones grow up.

Read More

NVIDIA Takes Inference to New Heights Across MLPerf Tests

MLPerf remains the definitive measurement for AI performance as an independent, third-party benchmark. NVIDIA’s AI platform has consistently shown leadership across both training and inference since the inception of MLPerf, including the MLPerf Inference 3.0 benchmarks released today.

“Three years ago when we introduced A100, the AI world was dominated by computer vision. Generative AI has arrived,” said NVIDIA founder and CEO Jensen Huang.

“This is exactly why we built Hopper, specifically optimized for GPT with the Transformer Engine. Today’s MLPerf 3.0 highlights Hopper delivering 4x more performance than A100.

“The next level of Generative AI requires new AI infrastructure to train large language models with great energy efficiency. Customers are ramping Hopper at scale, building AI infrastructure with tens of thousands of Hopper GPUs connected by NVIDIA NVLink and InfiniBand.

“The industry is working hard on new advances in safe and trustworthy Generative AI. Hopper is enabling this essential work,” he said.

The latest MLPerf results show NVIDIA taking AI inference to new levels of performance and efficiency from the cloud to the edge.

Specifically, NVIDIA H100 Tensor Core GPUs running in DGX H100 systems delivered the highest performance in every test of AI inference, the job of running neural networks in production. Thanks to software optimizations, the GPUs delivered up to 54% performance gains from their debut in September.

In healthcare, H100 GPUs delivered a 31% performance increase since September on 3D-UNet, the MLPerf benchmark for medical imaging.

H100 GPU AI inference performance on MLPerf workloads

Powered by its Transformer Engine, the H100 GPU, based on the Hopper architecture, excelled on BERT, a transformer-based large language model that paved the way for today’s broad use of generative AI.

Generative AI lets users quickly create text, images, 3D models and more. It’s a capability companies from startups to cloud service providers are rapidly adopting to enable new business models and accelerate existing ones.

Hundreds of millions of people are now using generative AI tools like ChatGPT — also a transformer model — expecting instant responses.

At this iPhone moment of AI, performance on inference is vital. Deep learning is now being deployed nearly everywhere, driving an insatiable need for inference performance from factory floors to online recommendation systems.

L4 GPUs Speed Out of the Gate

NVIDIA L4 Tensor Core GPUs made their debut in the MLPerf tests at over 3x the speed of prior-generation T4 GPUs. Packaged in a low-profile form factor, these accelerators are designed to deliver high throughput and low latency in almost any server.

L4 GPUs ran all MLPerf workloads. Thanks to their support for the key FP8 format, their results were particularly stunning on the performance-hungry BERT model.

NVIDIA L4 GPU AI inference performance on MLPerf workloads

In addition to stellar AI performance, L4 GPUs deliver up to 10x faster image decode, up to 3.2x faster video processing and over 4x faster graphics and real-time rendering performance.

Announced two weeks ago at GTC, these accelerators are already available from major systems makers and cloud service providers. L4 GPUs are the latest addition to NVIDIA’s portfolio of AI inference platforms launched at GTC.

Software, Networks Shine in System Test

NVIDIA’s full-stack AI platform showed its leadership in a new MLPerf test.

The so-called network-division benchmark streams data to a remote inference server. It reflects the popular scenario of enterprise users running AI jobs in the cloud with data stored behind corporate firewalls.

On BERT, remote NVIDIA DGX A100 systems delivered up to 96% of their maximum local performance, slowed in part because they needed to wait for CPUs to complete some tasks. On the ResNet-50 test for computer vision, handled solely by GPUs, they hit the full 100%.

Both results are thanks, in large part, to NVIDIA Quantum InfiniBand networking, NVIDIA ConnectX SmartNICs and software such as NVIDIA GPUDirect.

Orin Shows 3.2x Gains at the Edge

Separately, the NVIDIA Jetson AGX Orin system-on-module delivered gains of up to 63% in energy efficiency and 81% in performance compared with its results a year ago. Jetson AGX Orin supplies inference when AI is needed in confined spaces at low power levels, including on systems powered by batteries.

Jetson AGX Orin AI inference performance on MLPerf benchmarks

For applications needing even smaller modules drawing less power, the Jetson Orin NX 16G shined in its debut in the benchmarks. It delivered up to 3.2x the performance of the prior-generation Jetson Xavier NX processor.

A Broad NVIDIA AI Ecosystem

The MLPerf results show NVIDIA AI is backed by the industry’s broadest ecosystem in machine learning.

Ten companies submitted results on the NVIDIA platform in this round. They came from the Microsoft Azure cloud service and system makers including ASUS, Dell Technologies, GIGABYTE, H3C, Lenovo, Nettrix, Supermicro and xFusion.

Their work shows users can get great performance with NVIDIA AI both in the cloud and in servers running in their own data centers.

NVIDIA partners participate in MLPerf because they know it’s a valuable tool for customers evaluating AI platforms and vendors. Results in the latest round demonstrate that the performance they deliver today will grow with the NVIDIA platform.

Users Need Versatile Performance

NVIDIA AI is the only platform to run all MLPerf inference workloads and scenarios in data center and edge computing. Its versatile performance and efficiency make users the real winners.

Real-world applications typically employ many neural networks of different kinds that often need to deliver answers in real time.

For example, an AI application may need to understand a user’s spoken request, classify an image, make a recommendation and then deliver a response as a spoken message in a human-sounding voice. Each step requires a different type of AI model.

The MLPerf benchmarks cover these and other popular AI workloads. That’s why the tests ensure IT decision makers will get performance that’s dependable and flexible to deploy.

Users can rely on MLPerf results to make informed buying decisions, because the tests are transparent and objective. The benchmarks enjoy backing from a broad group that includes Arm, Baidu, Facebook AI, Google, Harvard, Intel, Microsoft, Stanford and the University of Toronto.

Software You Can Use

The software layer of the NVIDIA AI platform, NVIDIA AI Enterprise, ensures users get optimized performance from their infrastructure investments as well as the enterprise-grade support, security and reliability required to run AI in the corporate data center.

All the software used for these tests is available from the MLPerf repository, so anyone can get these world-class results.

Optimizations are continuously folded into containers available on NGC, NVIDIA’s catalog for GPU-accelerated software. The catalog hosts NVIDIA TensorRT, used by every submission in this round to optimize AI inference.

Read this technical blog for a deeper dive into the optimizations fueling NVIDIA’s MLPerf performance and efficiency.

Read More

Automate and implement version control for Amazon Kendra FAQs

Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization.

Amazon Kendra FAQs allow users to upload frequently asked questions with their corresponding answers. This helps to consistently answer common queries among end-users. As of this writing, when you want to update FAQs, you must delete the FAQ and create it again. In this post, we present a simpler, faster approach for updating your Amazon Kendra FAQs (with versioning enabled). Our method eliminates the manual steps of creating and deleting FAQs when you update their contents.

Overview of solution

We use a fully deployable AWS CloudFormation template to create an Amazon Simple Storage Service (Amazon S3) bucket, which becomes the source to store your Amazon Kendra FAQs. Each index-based FAQ is maintained in the folder with a prefix relating to the Amazon Kendra index.

This solution uses an AWS Lambda function that gets triggered by an Amazon S3 event notification. When you upload an FAQ to the S3 folder mapped to a specific Amazon Kendra index, it creates a new version of the FAQ for your index. Older versions of FAQs are deleted only after the new FAQ index version is created, achieving near-zero downtime of index searching.

The following figure shows the workflow of how our method creates and deletes a new version of an Amazon Kendra FAQ.

Architecture for Automated FAQ Update for Amazon Kendra

The workflow steps are as follows:

  1. The user uploads the Amazon Kendra FAQ document to the S3 bucket mapped to the Amazon Kendra index.
  2. The Amazon S3 PutObject event triggers the Lambda function, which reads the event details.
  3. The Lambda function creates a new version of the FAQ for the target index for each uploaded document and deletes the older versions of the FAQ (this logic is sketched in the code after these steps).
  4. The Lambda function then publishes a message to Amazon Simple Notification Service (Amazon SNS), which sends an email to the user notifying them that the FAQ has been successfully updated.
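
The core of step 3 can be sketched with boto3 as follows. This is a simplified illustration, not the exact function from the GitHub repository: the FAQ_ROLE_ARN environment variable and the naming convention are assumptions, and a production version would wait for the new FAQ to become ACTIVE (for example, by polling describe_faq) before deleting older versions:

import os
import time
import boto3
from urllib.parse import unquote_plus

kendra = boto3.client("kendra")

def handler(event, context):
    # Read the uploaded object details from the S3 event notification
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])        # for example, faq-<index-id>/demo.json
    index_id = key.split("/")[0].replace("faq-", "")
    file_name = key.split("/")[-1]
    base_name = file_name.rsplit(".", 1)[0]

    # Derive the FAQ file format from the file name
    if file_name.endswith(".json"):
        file_format = "JSON"
    elif file_name.startswith("header_"):
        file_format = "CSV_WITH_HEADER"
    else:
        file_format = "CSV"

    # Create the new FAQ version first to keep search downtime near zero
    new_faq_name = f"{base_name}-faq-{time.strftime('%Y-%m-%d-%H-%M')}"
    kendra.create_faq(
        IndexId=index_id,
        Name=new_faq_name,
        S3Path={"Bucket": bucket, "Key": key},
        RoleArn=os.environ["FAQ_ROLE_ARN"],            # assumed environment variable
        FileFormat=file_format,
    )

    # Then remove older versions of the same FAQ document
    for faq in kendra.list_faqs(IndexId=index_id)["FaqSummaryItems"]:
        if faq["Name"].startswith(f"{base_name}-faq-") and faq["Name"] != new_faq_name:
            kendra.delete_faq(Id=faq["Id"], IndexId=index_id)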

Prerequisites

Before you begin the walkthrough, you need an AWS account (if you don’t have one, you can sign up for one). You also need to create the files containing the sample FAQs:

  • basic.csv – The following code is the sample FAQ CSV template:
    How many free clinics are in Spokane WA?, 13, https://www.freeclinics.com/
    How many free clinics are there in Mountain View Missouri?, 7, https://www.freeclinics.com/

  • demo.json – The following code is the sample FAQ JSON template:
    {
      "SchemaVersion": 1,
      "FaqDocuments": [
        {
          "Question": "How many free clinics are in Spokane WA?",
          "Answer": "13"
        },
        {
          "Question": "How many free clinics are there in Mountain View Missouri?",
          "Answer": "7",
          "Attributes": {
            "_source_uri": "https://www.freeclinics.com",
            "_category": "Charitable Clinics"
          }
        }
      ]
    }

  • header_demo.csv – The following code is the sample FAQ CSV template with header:
    _question,_answer,_last_updated_at
    How many free clinics are in Spokane WA?, 13, 2012-03-25T12:30:10+01:00
    How many free clinics are there in Mountain View Missouri?, 7, 2012-03-25T12:30:10+01:00

Deploy the solution

The CloudFormation templates that create the resources used by this solution can be found in the GitHub repository. Follow the instructions in the repository to deploy the solution. AWS CloudFormation creates the following resources in your account:

  • An S3 bucket that will be the source for the Amazon Kendra FAQ.
  • An Amazon Kendra index.
  • An AWS Identity and Access Management (IAM) role for the Amazon Kendra FAQ to read (GetObject) from the S3 bucket.
  • A Lambda function that is configured to get triggered by an Amazon S3 event. The function is created outside of an Amazon VPC.

Note that resource creation can take approximately 30 minutes.

After you run the deployment, you’ll receive an email prompting you to confirm the subscription at the approver email address. Choose Confirm subscription.

Amazon SNS subscription Email

You’re redirected to a page confirming your subscription.

SNS Subscription Confirmation

Verify that the Amazon Kendra index is listed on the Amazon Kendra console. In this post, we named the Amazon Kendra index sample-kendra-index.

Amazon Kendra index as seen from the Amazon Kendra console

Upload a sample FAQ document to Amazon S3

In the previous step, you successfully deployed the CloudFormation stack. We use the output of the stack in the following steps:

  1. On the Outputs tab of the CloudFormation stack, note the values for S3Bucket (kendra-faq-<random-stack-id>) and KendraIndex.
    AWS CloudFormation Output
  2. On the Amazon S3 console, navigate to the S3 bucket created from the CloudFormation stack.
  3. Choose Create folder and create a folder called faq-<index-id>. For index-id, use the value you noted for the CloudFormation output KendraIndex. After the folder is created, it becomes the prefix for the sample-kendra-index FAQs.
    Create S3 folder prefixed with faq
  4. Upload the demo.json FAQ document to that folder (a boto3 upload sketch follows these steps).
    Upload the demo.json FAQ document in that folder
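
If you prefer to script these console steps, the following minimal boto3 sketch uploads the FAQ document to the faq-<index-id> prefix, which implicitly creates the folder and triggers the Lambda function. The bucket name and index ID shown are placeholders for the values you noted from the stack outputs.

  import boto3

  s3 = boto3.client("s3")

  # Placeholders: substitute the S3Bucket and KendraIndex values from the stack outputs
  bucket = "kendra-faq-<random-stack-id>"
  index_id = "<index-id>"

  # Uploading to the faq-<index-id>/ prefix triggers the Lambda function
  # through the S3 event notification
  s3.upload_file(
      Filename="demo.json",
      Bucket=bucket,
      Key=f"faq-{index_id}/demo.json",
  )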

Verify that the index FAQ is created

To confirm that the index FAQ is created, complete the following steps:

  1. On the Amazon Kendra console, navigate to the index sample-kendra-index, which was created as part of the deployment.
  2. Navigate to the FAQs page for this index to check if an FAQ is listed.

The FAQ has the naming convention <file-name>-faq-<Date-Time>.

Resulting FAQ created by the automation solution

When the FAQ is successfully created, you will receive another email informing you about it. You may upload new versions of the FAQ after you have received this email.

Receiving email for successful FAQ creation

Note that the automation determines the file format to use when creating the FAQ from the uploaded file's extension; as an exception, a CSV document with a header is identified by the header_ prefix in its file name. The target Amazon Kendra index is identified by the S3 folder name, which has the index ID as its suffix; for example, faq-1f01abb8-341c-4921-ad16-139ee517a845.

Upload additional FAQ documents

Amazon Kendra FAQs support three file formats: CSV, CSV_WITH_HEADER, and JSON. When you upload a CSV file that contains a header row, make sure the file name has the header_ prefix. To upload your FAQ documents, complete the following steps:

  1. Upload the header_demo.csv file to the same folder.
    Upload the header_demo.csv FAQ document in that folder
  2. Verify that the FAQ is created on the Amazon Kendra console.
    Verify that the FAQ is created

FAQ creation is case-sensitive with respect to the name of the FAQ document that you upload. For example, if you upload demo.json and demo.JSON, they are treated as unique objects in Amazon S3, so this creates two FAQs, such as demo-json-faq-22-09-2022-20-09-11 and demo-JSON-faq-22-09-2022-20-09-11.

  1. Upload demo.JSON.
    demo.json and demo.JSON are uploaded to the S3 bucket
  2. Verify that the FAQ for demo.JSON is created on the Amazon Kendra console.
    Case sensitive file names result in 2 new FAQs created

Create a new version of the index FAQ

The solution now runs automatically whenever you upload a new version of an FAQ document to Amazon S3.

To test this, upload an updated version of your demo.json FAQ document to the faq-<index-id> folder. When you navigate to the FAQs for the index, you see an FAQ named <file-name>-faq-<Date-Time>.

The solution creates a new FAQ for each new version of the FAQ document uploaded to Amazon S3. When the new FAQ is active, the solution deletes the older version of the FAQ for the same document.

Verify that only the latest version of the FAQ exists in the index
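
The wait-then-delete behavior can be sketched as follows, assuming boto3 and a simple polling loop; the function name and polling interval are illustrative, and the repository's implementation may differ.

  import time

  import boto3

  kendra = boto3.client("kendra")


  def replace_when_active(index_id: str, new_faq_id: str, old_faq_ids: list) -> None:
      """Delete older FAQ versions only once the new FAQ is ACTIVE."""
      while True:
          status = kendra.describe_faq(Id=new_faq_id, IndexId=index_id)["Status"]
          if status == "ACTIVE":
              break
          if status == "FAILED":
              raise RuntimeError(f"FAQ {new_faq_id} failed to create")
          time.sleep(10)  # poll until the new version is searchable

      # The new version is live, so the old versions can be removed
      # with near-zero downtime for FAQ search
      for faq_id in old_faq_ids:
          kendra.delete_faq(Id=faq_id, IndexId=index_id)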

Create an FAQ with a description

This solution also supports creating an FAQ with a description when files are named in the following manner: <document_name>-desc-<your faq description>.<fileformat>, where <fileformat> is json or csv; for example, demo-desc-hello world.json. Upload this FAQ document to the faq-<index-id> folder.

Upload the file with the description in its name to S3

After you upload the document, the FAQ is created with the description specified in the file name.

FAQ created with description

Use -desc- only when you need to add a description to an FAQ. If you upload a file with the same document_name prefix, the solution deletes the old FAQ created from the document_name.fileformat FAQ document and creates a new FAQ with the description.
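
As an illustration of this naming convention, a small helper to split the description out of the file name might look like the following sketch; the function name and return shape are assumptions, not the repository's code.

  def parse_name_and_description(file_name: str):
      """Split '<document_name>-desc-<description>.<ext>' into (document_name, description)."""
      stem, _, _ext = file_name.rpartition(".")
      if "-desc-" in stem:
          document_name, description = stem.split("-desc-", 1)
      else:
          document_name, description = stem, ""
      return document_name, description


  # The description travels in the file name itself
  print(parse_name_and_description("demo-desc-hello world.json"))
  # ('demo', 'hello world')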

Clean up

To clean up, perform the following actions (a boto3 cleanup sketch follows these steps):

  1. Empty the S3 bucket that was created by the CloudFormation stack to store the FAQ documents. For instructions, refer to Emptying a bucket.
  2. Delete the CloudFormation stack. For instructions, refer to Deleting a stack on the AWS CloudFormation console.
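
If you prefer to script the cleanup, the following boto3 sketch empties the bucket and then deletes the stack; the bucket and stack names are placeholders for your own values.

  import boto3

  # Placeholders: substitute your FAQ bucket name and CloudFormation stack name
  bucket_name = "kendra-faq-<random-stack-id>"
  stack_name = "<your-stack-name>"

  # Empty the FAQ bucket so that stack deletion can succeed
  s3 = boto3.resource("s3")
  bucket = s3.Bucket(bucket_name)
  bucket.objects.all().delete()
  bucket.object_versions.delete()  # only needed if versioning is enabled on the bucket

  # Delete the CloudFormation stack and wait for deletion to finish
  cfn = boto3.client("cloudformation")
  cfn.delete_stack(StackName=stack_name)
  cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)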

Conclusion

In this post, we introduced an automated way to manage your Amazon Kendra FAQs. After implementing this solution, you can create and update FAQs simply by uploading FAQ documents to an S3 bucket. This way, you save time by avoiding repetitive manual changes and troubleshooting inconsistencies caused by unexpected operational incidents. You can also audit Amazon Kendra FAQs across your organization with confidence.

Do you have feedback about this post? Submit your comments in the comments section. You can also post questions on the AWS re:Post forum.


About the Authors

Debojit is a DevOps consultant who specializes in helping customers deliver secure and reliable solutions using AWS services. He concentrates on infrastructure development and building serverless solutions with AWS and DevOps. Apart from work, Debojit enjoys watching movies and spending time with his family.

Glenn is a Cloud Architect at AWS. He utilizes technology to help customers deliver on their desired outcomes in their cloud adoption journey. His current focus is DevOps and developing open-source software.

Shalabh is a Senior Consultant based in London. His main focus is helping companies deliver secure, reliable, and fast solutions using AWS services. He gets very excited about customers innovating with AWS and DevOps. Outside of work, Shalabh is a cricket fan and a passionate singer.
