Schneider Electric leverages Retrieval Augmented LLMs on SageMaker to ensure real-time updates in their ERP systems

This post was co-written with Anthony Medeiros, Manager of Solutions Engineering and Architecture for North America Artificial Intelligence, and Blake Santschi, Business Intelligence Manager, from Schneider Electric. Additional Schneider Electric experts include Jesse Miller, Somik Chowdhury, Shaswat Babhulgaonkar, David Watkins, Mark Carlson and Barbara Sleczkowski. 

Enterprise Resource Planning (ERP) systems are used by companies to manage several business functions such as accounting, sales, or order management in one system. In particular, they are routinely used to store information related to customer accounts. Different organizations within a company might use different ERP systems, and merging them at scale is a complex technical challenge that requires domain-specific knowledge.

Schneider Electric is a leader in digital transformation of energy management and industrial automation. To best serve their customers’ needs, Schneider Electric needs to keep track of the links between related customers’ accounts in their ERP systems. As their customer base grows, new customers are added daily, and their account teams have to manually sort through these new customers and link them to the proper parent entity.

The linking decision is based on the most recent information available publicly on the Internet or in the media, and might be affected by recent acquisitions, market news or divisional re-structuring. An example of account linking would be to identify the relationship between Amazon and its subsidiary, Whole Foods Market [source].

Schneider Electric is deploying large language models for their ability to answer questions across various knowledge-specific domains, but an LLM's knowledge is limited by the date on which the model was trained. They addressed that challenge by using a Retrieval Augmented Generation (RAG) approach with an open source large language model available on Amazon SageMaker JumpStart, which processes large amounts of externally retrieved knowledge to surface corporate or public relationships among ERP records.

In early 2023, when Schneider Electric decided to automate part of its accounts linking process using artificial intelligence (AI), the company partnered with the AWS Machine Learning Solutions Lab (MLSL). With MLSL’s expertise in ML consulting and execution, Schneider Electric was able to develop an AI architecture that would reduce the manual effort in their linking workflows, and deliver faster data access to their downstream analytics teams.

Generative AI

Generative AI and large language models (LLMs) are transforming the way business organizations are able to solve traditionally complex challenges related to natural language processing and understanding. Some of the benefits offered by LLMs include the ability to comprehend large portions of text and answer related questions by producing human-like responses. AWS makes it easy for customers to experiment with and productionize LLM workloads by making many options available via Amazon SageMaker JumpStart, Amazon Bedrock, and Amazon Titan.

External Knowledge Acquisition

LLMs are known for their ability to compress human knowledge and have demonstrated remarkable capabilities in answering questions across various knowledge-specific domains, but their knowledge is limited by the date on which the model was trained. We address that information cutoff by coupling the LLM with a Google Search API to deliver a powerful Retrieval Augmented LLM (RAG) that addresses Schneider Electric’s challenges. The RAG is able to process large amounts of external knowledge pulled from the Google search and exhibit corporate or public relationships among ERP records.

See the following example:

Question: Who is the parent company of One Medical?
Google query: “One Medical parent company” → information → LLM
Answer: One Medical, a subsidiary of Amazon…

The preceding example (taken from the Schneider Electric customer database) concerns an acquisition that happened in February 2023 and thus would not be caught by the LLM alone due to knowledge cutoffs. Augmenting the LLM with Google search guarantees the most up-to-date information.

Flan-T5 model

In this project, we used the Flan-T5-XXL model from the Flan-T5 family of models.

The Flan-T5 models are instruction-tuned and therefore capable of performing various zero-shot NLP tasks. Our downstream task did not require a vast amount of world knowledge, but rather good question-answering performance given a context of texts provided through search results; therefore, the 11B-parameter T5 model performed well.

JumpStart provides convenient deployment of this model family through Amazon SageMaker Studio and the SageMaker SDK. This includes Flan-T5 Small, Flan-T5 Base, Flan-T5 Large, Flan-T5 XL, and Flan-T5 XXL. Furthermore, JumpStart provides a few versions of Flan-T5 XXL at different levels of quantization. We deployed Flan-T5-XXL to an endpoint for inference using Amazon SageMaker Studio JumpStart.
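
For reference, the same deployment can be scripted with the SageMaker Python SDK. This is a minimal sketch; the JumpStart model ID shown is an assumption and should be verified against the JumpStart built-in model table:

from sagemaker.jumpstart.model import JumpStartModel

# Assumed JumpStart model ID for Flan-T5 XXL; verify against the built-in model table
flan_t5_model = JumpStartModel(model_id="huggingface-text2text-flan-t5-xxl")

# Deploys the model to a real-time SageMaker endpoint (instance type defaults can be overridden)
flan_t5_predictor = flan_t5_model.deploy()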

Path to Flan-T5 SageMaker JumpStart

Retrieval Augmented LLM with LangChain

LangChain is a popular and fast-growing framework for developing applications powered by LLMs. It is based on the concept of chains, which are combinations of different components designed to improve the functionality of LLMs for a given task. For instance, it allows us to customize prompts and integrate LLMs with different tools like external search engines or data sources. In our use case, we used the Google Serper component to search the web, and deployed the Flan-T5-XXL model available on Amazon SageMaker Studio JumpStart. LangChain performs the overall orchestration and allows the search result pages to be fed into the Flan-T5-XXL instance.

The Retrieval-Augmented Generation (RAG) consists of two steps:

  1. Retrieval of relevant text chunks from external sources
  2. Augmentation of the prompt given to the LLM with the retrieved chunks as context.

For Schneider Electric’s use case, the RAG proceeds as follows:

  1. The given company name is combined with a question (such as “Who is the parent company of X?”, where X is the given company) and passed to a Google query using the Serper API.
  2. The extracted information is combined with the prompt and original question and passed to the LLM for an answer.

The following diagram illustrates this process.

RAG Workflow

Use the following code to create an endpoint:

# Spin up a FLAN-T5-XXL SageMaker endpoint (arguments elided; a fuller sketch follows)
llm = SagemakerEndpoint(...)
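
The SagemakerEndpoint arguments are elided above. As a rough sketch of what the LangChain wrapper instantiation might look like (the endpoint name, Region, and payload keys are assumptions that depend on how the endpoint was deployed):

import json
from langchain.llms.sagemaker_endpoint import SagemakerEndpoint, LLMContentHandler

class FlanT5ContentHandler(LLMContentHandler):
	content_type = "application/json"
	accepts = "application/json"

	def transform_input(self, prompt, model_kwargs):
		# JumpStart Flan-T5 endpoints typically expect a "text_inputs" key
		return json.dumps({"text_inputs": prompt, **model_kwargs}).encode("utf-8")

	def transform_output(self, output):
		response = json.loads(output.read().decode("utf-8"))
		# The response key may be "generated_text" or "generated_texts" depending on the model version
		return response["generated_texts"][0]

llm = SagemakerEndpoint(
	endpoint_name="flan-t5-xxl-endpoint",  # hypothetical endpoint name
	region_name="us-east-1",  # assumption: use your own Region
	model_kwargs={"temperature": 0.0, "max_length": 200},
	content_handler=FlanT5ContentHandler(),
)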

Instantiate the search tool:

search = GoogleSerperAPIWrapper()
search_tool = Tool(
	name="Search",
	func=search.run,
	description="useful for when you need to ask with search",
	verbose=False)

In the following code, we chain together the retrieval and augmentation components:

my_template = """
Answer the following question using the information. \n
Question : {question}? \n
Information : {search_result} \n
Answer: """
prompt_template = PromptTemplate(
	input_variables=["question", 'search_result'],
	template=my_template)
question_chain = LLMChain(
	llm=llm,
	prompt=prompt_template,
	output_key="answer")

def search_and_reply_company(company):
	# Retrieval
	search_result = search_tool.run(f"{company} parent company")
	# Augmentation
	output = question_chain({
		"question":f"Who is the parent company of {company}?",
		"search_result": search_result})
	return output["answer"]

search_and_reply_company("Whole Foods Market")
"Amazon"

Prompt Engineering

The combination of the context and the question is called the prompt. We noticed that the blanket prompt we used (variations around asking for the parent company) performed well for most public sectors (domains) but didn’t generalize well to education or healthcare since the notion of parent company is not meaningful there. For education, we used “X” while for healthcare we used “Y”.

To enable this domain-specific prompt selection, we also had to identify the domain a given account belongs to. For this, we also used RAG: as a first step, we asked a multiple choice question, “What is the domain of {account}?”, and based on the answer, we inquired about the parent of the account using the relevant prompt as a second step. See the following code:

my_template_options = """
Answer the following question using the information. \n
Question : {question}? \n
Information : {search_result} \n
Options : \n {options} \n
Answer:
"""

prompt_template_options = PromptTemplate(
	input_variables=["question", 'search_result', 'options'],
	template=my_template_options)
question_chain = LLMChain(
	llm=llm,
	prompt=prompt_template_options,
	output_key="answer")
	
my_options = """
- healthcare
- education
- oil and gas
- banking
- pharma
- other domain """

def search_and_reply_domain(company):
	search_result = search_tool.run(f"{company} ")
	output = question_chain({
		"question": f"What is the domain of {company}?",
		"search_result": search_result,
		"options": my_options})
	return output["answer"]

search_and_reply_domain("Exxon Mobil")
"oil and gas"

The sector-specific prompts boosted the overall accuracy from 55% to 71%. Overall, the effort and time invested in developing effective prompts appear to significantly improve the quality of LLM responses.

RAG with tabular data (SEC-10k)

SEC 10-K filings, submitted annually by publicly traded companies, are another reliable source of information on subsidiaries and subdivisions. These filings are available directly on SEC EDGAR or through the CorpWatch API.

We assume the information is given in tabular format. Below is a pseudo-CSV dataset that mimics the original schema of the SEC 10-K dataset. It is possible to merge multiple CSV data sources into a combined pandas DataFrame:

# A pseudo dataset similar by schema to the CorpWatch API dataset
df.head()

index	relation_id		source_cw_id	target_cw_id	parent		subsidiary
  1		90				22569           37				AMAZON		WHOLE FOODS MARKET
873		1467			22569			781				AMAZON		TWITCH
899		1505			22569			821				AMAZON		ZAPPOS
900		1506			22569			821				AMAZON		ONE MEDICAL
901		1507			22569			821				AMAZON		WOOT!

LangChain provides an abstraction layer for pandas through create_pandas_dataframe_agent. There are two key advantages to using LangChain/LLMs for this task:

  1. Once spun up, it allows a downstream consumer to interact with the dataset in natural language rather than code
  2. It is more robust to misspellings and different ways of naming accounts.

We spin up the endpoint as before and create the agent:

# Create pandas dataframe agent
agent = create_pandas_dataframe_agent(llm, df, verbose=True)

In the following code, we query for the parent/subsidiary relationship and the agent translates the query into pandas language:

# Example 1
query = "Who is the parent of WHOLE FOODS MARKET?"
agent.run(query)

#### output
> Entering new AgentExecutor chain...
Thought: I need to find the row with WHOLE FOODS MARKET in the subsidiary column
Action: python_repl_ast
Action Input: df[df['subsidiary'] == 'WHOLE FOODS MARKET']
Observation:
source_cw_id	target_cw_id	parent		subsidiary
22569			37				AMAZON		WHOLE FOODS MARKET
Thought: I now know the final answer
Final Answer: AMAZON
> Finished chain.
# Example 2
query = "Who are the subsidiaries of Amazon?"
agent.run(query)
#### output
> Entering new AgentExecutor chain...
Thought: I need to find the row with source_cw_id of 22569
Action: python_repl_ast
Action Input: df[df['source_cw_id'] == 22569]
...
Thought: I now know the final answer
Final Answer: The subsidiaries of Amazon are Whole Foods Market, Twitch, Zappos, One Medical, Woot!...
> Finished chain.
'The subsidiaries of Amazon are Whole Foods Market, Twitch, Zappos, One Medical, Woot!.'

Conclusion

In this post, we detailed how we used building blocks from LangChain to augment an LLM with search capabilities, in order to uncover relationships between Schneider Electric’s customer accounts. We extended the initial pipeline to a two-step process with domain identification before using a domain specific prompt for higher accuracy.

In addition to the Google Search query, datasets that detail corporate structures, such as the SEC 10-K filings, can be used to further augment the LLM with trustworthy information. The Schneider Electric team will also be able to extend and design their own prompts mimicking the way they classify some public sector accounts, further improving the accuracy of the pipeline. These capabilities will enable Schneider Electric to maintain up-to-date and accurate organizational structures of their customers, and unlock the ability to do analytics on top of this data.


About the Authors

Anthony Medeiros is a Manager of Solutions Engineering and Architecture at Schneider Electric. He specializes in delivering high-value AI/ML initiatives to many business functions within North America. With 17 years of experience at Schneider Electric, he brings a wealth of industry knowledge and technical expertise to the team.

Blake Santschi is a Business Intelligence Manager at Schneider Electric, leading an analytics team focused on supporting the Sales organization through data-driven insights.

Joshua Levy is a Senior Applied Science Manager in the Amazon Machine Learning Solutions Lab, where he helps customers design and build AI/ML solutions to solve key business problems.

Kosta Belz is a Senior Applied Scientist with AWS MLSL, with a focus on Generative AI and document processing. He is passionate about building applications using Knowledge Graphs and NLP. He has around 10 years of experience in building Data & AI solutions to create value for customers and enterprises.

Aude Genevay is an Applied Scientist in the Amazon GenAI Incubator, where she helps customers solve key business problems through ML and AI. She previously was a researcher in theoretical ML and enjoys applying her knowledge to deliver state-of-the-art solutions to customers.

Md Sirajus Salekin is an Applied Scientist at AWS Machine Learning Solution Lab. He helps AWS customers to accelerate their business by building AI/ML solutions. His research interests are multimodal machine learning, generative AI, and ML applications in healthcare.

Zichen Wang, PhD, is a Senior Applied Scientist in AWS. With several years of research experience in developing ML and statistical methods using biological and medical data, he works with customers across various verticals to solve their ML problems.

Anton Gridin is a Principal Solutions Architect supporting Global Industrial Accounts, based out of New York City. He has more than 15 years of experience building secure applications and leading engineering teams.

Read More

Use AWS PrivateLink to set up private access to Amazon Bedrock

Amazon Bedrock is a fully managed service provided by AWS that offers developers access to foundation models (FMs) and the tools to customize them for specific applications. It allows developers to build and scale generative AI applications using FMs through an API, without managing infrastructure. You can choose from various FMs from Amazon and leading AI startups such as AI21 Labs, Anthropic, Cohere, and Stability AI to find the model that’s best suited for your use case. With the Amazon Bedrock serverless experience, you can quickly get started, easily experiment with FMs, privately customize them with your own data, and seamlessly integrate and deploy them into your applications using AWS tools and capabilities.

Customers are building innovative generative AI applications using Amazon Bedrock APIs with their own proprietary data. When accessing Amazon Bedrock APIs, customers are looking for a mechanism to set up a data perimeter without exposing their data to the internet, so they can mitigate potential threat vectors from internet exposure. The Amazon Bedrock VPC endpoint powered by AWS PrivateLink allows you to establish a private connection between the VPC in your account and the Amazon Bedrock service account. It enables VPC instances to communicate with service resources without the need for public IP addresses.

In this post, we demonstrate how to set up private access on your AWS account to access Amazon Bedrock APIs over VPC endpoints powered by PrivateLink to help you build generative AI applications securely with your own data.

Solution overview

You can use generative AI to develop a diverse range of applications, such as text summarization, content moderation, and other capabilities. When building such generative AI applications using FMs or base models, customers want to generate a response without going over the public internet or based on their proprietary data that may reside in their enterprise databases.

In the following diagram, we depict an architecture to set up your infrastructure to read your proprietary data residing in Amazon Relational Database Service (Amazon RDS) and augment the Amazon Bedrock API request with product information when answering product-related queries from your generative AI application. Although we use Amazon RDS in this diagram for illustration purposes, you can test the private access of the Amazon Bedrock APIs end to end using the instructions provided in this post.

The workflow steps are as follows:

  1. AWS Lambda running in your private VPC subnet receives the prompt request from the generative AI application.
  2. Lambda makes a call to the proprietary RDS database, augments the prompt query context (for example, adding product information), and invokes the Amazon Bedrock API with the augmented query request (a minimal sketch of this call follows the list).
  3. The API call is routed to the Amazon Bedrock VPC endpoint that is associated with the VPC endpoint policy with Allow permissions to Amazon Bedrock APIs.
  4. The Amazon Bedrock service API endpoint receives the API request over PrivateLink without traversing the public internet.
  5. You can change the Amazon Bedrock VPC endpoint policy to Deny permissions to validate that Amazon Bedrock APIs calls are denied.
  6. You can also privately access Amazon Bedrock APIs over the VPC endpoint from your corporate network through an AWS Direct Connect gateway.
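
The following is a minimal sketch of the Lambda-side Amazon Bedrock call from step 2. The prompt construction is simplified, and the model ID and request body format assume Anthropic Claude Instant; adjust them for the model you use:

import json
import boto3

# With private DNS enabled, the bedrock-runtime client resolves to the VPC endpoint
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def lambda_handler(event, context):
    # In the real workflow, product context retrieved from Amazon RDS would be appended here
    prompt = f"\n\nHuman: {event['query']}\n\nAssistant:"
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-instant-v1",
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 300}),
    )
    return json.loads(response["body"].read())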

Prerequisites

Before you get started, make sure you have the following prerequisites:

  • An AWS account
  • An AWS Identity and Access Management (IAM) federation role with access to do the following:
    • Create, edit, view, and delete VPC network resources
    • Create, edit, view and delete Lambda functions
    • Create, edit, view and delete IAM roles and policies
    • List foundation models and invoke the Amazon Bedrock foundation model
  • For this post, we use the us-east-1 Region
  • Request foundation model access via the Amazon Bedrock console

Set up the private access infrastructure

In this section, we set up the infrastructure such as VPC, private subnets, security groups, and Lambda function using an AWS CloudFormation template.

Use the following template to create the infrastructure stack Bedrock-GenAI-Stack in your AWS account.

The CloudFormation template creates the following resources on your behalf:

  • A VPC with two private subnets in separate Availability Zones
  • Security groups and routing tables
  • IAM role and policies for use by Lambda, Amazon Bedrock, and Amazon Elastic Compute Cloud (Amazon EC2)

Set up the VPC endpoint for Amazon Bedrock

In this section, we use Amazon Virtual Private Cloud (Amazon VPC) to set up the VPC endpoint for Amazon Bedrock to facilitate private connectivity from your VPC to Amazon Bedrock.

  1. On the Amazon VPC console, under Virtual private cloud in the navigation pane, choose Endpoints.
  2. Choose Create endpoint.
  3. For Name tag, enter bedrock-vpce.
  4. Under Services, search for bedrock-runtime and select com.amazonaws.<region>.bedrock-runtime.
  5. For VPC, specify the VPC Bedrock-GenAI-Project-vpc that you created through the CloudFormation stack in the previous section.
  6. In the Subnets section, select the Availability Zones and choose the corresponding subnet IDs from the drop-down menu.
  7. For Security groups, select the security group with the group name Bedrock-GenAI-Stack-VPCEndpointSecurityGroup- and description Allow TLS for VPC Endpoint.

A security group acts as a virtual firewall for your instance to control inbound and outbound traffic. Note that this VPC endpoint security group only allows traffic originating from the security group attached to your VPC private subnets, adding a layer of protection.

  1. Choose Create endpoint.
  2. In the Policy section, select Custom and enter the following least-privilege policy to ensure only certain actions are allowed on the specified foundation model resource, arn:aws:bedrock:*::foundation-model/anthropic.claude-instant-v1, for a given principal (such as the Lambda function IAM role).
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "bedrock:InvokeModel"
                ],
                "Resource": [
                    "arn:aws:bedrock:*::foundation-model/anthropic.claude-instant-v1"
                ],
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::<accountid>:role/GenAIStack-Bedrock"
                }
            }
        ]
    }

It may take up to 2 minutes until the interface endpoint is created and the status changes to Available. You can refresh the page to check the latest status.
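
If you prefer to check the endpoint state programmatically, a describe call similar to the following works (the Region is an assumption; use your own):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# List Amazon Bedrock runtime VPC endpoints and their current state
response = ec2.describe_vpc_endpoints(
    Filters=[{"Name": "service-name",
              "Values": ["com.amazonaws.us-east-1.bedrock-runtime"]}]
)
for endpoint in response["VpcEndpoints"]:
    print(endpoint["VpcEndpointId"], endpoint["State"])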

Set up the Lambda function over private VPC subnets

Complete the following steps to configure the Lambda function:

  1. On the Lambda console, choose Functions in the navigation pane.
  2. Choose the function gen-ai-lambda-stack-BedrockTestLambdaFunction-XXXXXXXXXXXX.
  3. On the Configuration tab, choose Permissions in the left pane.
  4. Under Execution role, choose the link for the role gen-ai-lambda-stack-BedrockTestLambdaFunctionRole-XXXXXXXXXXXX.

You’re redirected to the IAM console.

  1. In the Permissions policies section, choose Add permissions and choose Create inline policy.
  2. On the JSON tab, modify the policy as follows:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "eniperms",
                "Effect": "Allow",
                "Action": [
                    "ec2:CreateNetworkInterface",
                    "ec2:DescribeNetworkInterfaces",
                    "ec2:DeleteNetworkInterface",
                    "ec2:*VpcEndpoint*"
                ],
                "Resource": "*"
            }
        ]
    }

  3. Choose Next.
  4. For Policy name, enter enivpce-policy.
  5. Choose Create policy.
  6. Add the following inline policy (provide your source VPC endpoints) for restricting Lambda access to Amazon Bedrock APIs only via VPC endpoints:
    {
        "Id": "lambda-bedrock-sourcevpce-access-only",
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
    		   "bedrock:ListFoundationModels",
                    "bedrock:InvokeModel"
                ],
                "Resource": "*",
                "Condition": {
                    "ForAnyValue:StringEquals": {
                        "aws:sourceVpce": [
                            "vpce-<bedrock-runtime-vpce>"
                        ]
                    }
                }
            }
        ]
    } 

  7. On the Lambda function page, on the Configuration tab, choose VPC in the left pane, then choose Edit.
  8. For VPC, choose Bedrock-GenAI-Project-vpc.
  9. For Subnets, choose the private subnets.
  10. For Security groups, choose gen-ai-lambda-stack-SecurityGroup- (the security group for the Amazon Bedrock workload in private subnets).
  11. Choose Save.

Test private access controls

Now you can test the private access controls (Amazon Bedrock APIs over VPC endpoints).

  1. On the Lambda console, choose Functions in the navigation pane.
  2. Choose the function gen-ai-lambda-stack-BedrockTestLambdaFunction-XXXXXXXXXXXX.
  3. On the Code tab, choose Test.

You should see the following response from the Amazon Bedrock API call (Status: Succeeded).

  1. To deny access to Amazon Bedrock APIs over VPC endpoints, navigate to the Amazon VPC console.
  2. Under Virtual private cloud in the navigation pane, choose Endpoints.
  3. Choose your VPC endpoint and navigate to the Policy tab.

Currently, the VPC endpoint policy is set to Allow.

  1. To deny access, choose Edit Policy.
  2. Change Allow to Deny and choose Save.

It may take up to 2 minutes for the VPC endpoint policy to update. The updated policy looks like the following:

{
	"Version": "2012-10-17",
	"Statement": [
		{
		    "Action": [
		        "bedrock:InvokeModel"
		        ],
		    "Resource": [
		        "arn:aws:bedrock:*::foundation-model/anthropic.claude-instant-v1"
		        ],
		    "Effect": "Deny",
		    "Principal": {
                "AWS": "arn:aws:iam::<accountid>:role/GenAIStack-Bedrock"
            }
		}
	]
}
  1. Return to the Lambda function page and on the Code tab, choose Test.

As shown in the following screenshot, the access request to Amazon Bedrock over the VPC endpoint was denied (Status: Failed).

Through this testing process, we demonstrated how traffic from your VPC to the Amazon Bedrock API endpoint is traversing over the PrivateLink connection and not through the internet connection.

Clean up

Follow these steps to avoid incurring future charges:

  1. Clean up the VPC endpoints.
  2. Clean up the VPC.
  3. Delete the CloudFormation stack.

Conclusion

In this post, we demonstrated how to set up and operationalize a private connection between a generative AI workload deployed on your customer VPC and Amazon Bedrock using an interface VPC endpoint powered by PrivateLink. When using the architecture discussed in this post, the traffic between your customer VPC and Amazon Bedrock will not leave the Amazon network, ensuring your data is not exposed to the public internet and thereby helping with your compliance requirements.

As a next step, try the solution out in your account and share your feedback.


About the Authors

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 3-year-old Sheepadoodle!

Ray Khorsandi is an AI/ML specialist at AWS, supporting strategic customers with AI/ML best practices. With an M.Sc. and Ph.D. in Electrical Engineering and Computer Science, he leads enterprises to build secure, scalable AI/ML and big data solutions to optimize their cloud adoption. His passions include computer vision, NLP, generative AI, and MLOps. Ray enjoys playing soccer and spending quality time with family.

Michael Daniels is an AI/ML Specialist at AWS. His expertise lies in building and leading AI/ML and generative AI solutions for complex and challenging business problems, which is enhanced by his Ph.D. from the Univ. of Texas and his M.Sc. in Computer Science specialization in Machine Learning from the Georgia Institute of Technology. He excels in applying cutting-edge cloud technologies to innovate, inspire, and transform industry-leading organizations, while also effectively communicating with stakeholders at any level or scale. In his spare time, you can catch Michael skiing or snowboarding in the mountains.

Read More

Deploy and fine-tune foundation models in Amazon SageMaker JumpStart with two lines of code

We are excited to announce a simplified version of the Amazon SageMaker JumpStart SDK that makes it straightforward to build, train, and deploy foundation models. The code for prediction is also simplified. In this post, we demonstrate how you can use the simplified SageMaker JumpStart SDK to get started with using foundation models in just a couple of lines of code.

For more information about the simplified SageMaker JumpStart SDK for deployment and training, refer to Low-code deployment with the JumpStartModel class and Low-code fine-tuning with the JumpStartEstimator class, respectively.

Solution overview

SageMaker JumpStart provides pre-trained, open-source models for a wide range of problem types to help you get started with machine learning (ML). You can incrementally train and fine-tune these models before deployment. JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for ML with Amazon SageMaker. You can access the pre-trained models, solution templates, and examples through the SageMaker JumpStart landing page in Amazon SageMaker Studio or use the SageMaker Python SDK.

To demonstrate the new features of the SageMaker JumpStart SDK, we show you how to use the pre-trained Flan T5 XL model from Hugging Face for text generation for summarization tasks. We also showcase how, in just a few lines of code, you can fine-tune the Flan T5 XL model for summarization tasks. You can use any other model for text generation like Llama2, Falcon, or Mistral AI.

You can find the notebook for this solution using Flan T5 XL in the GitHub repo.

Deploy and invoke the model

Foundation models hosted on SageMaker JumpStart have model IDs. For the full list of model IDs, refer to Built-in Algorithms with pre-trained Model Table. For this post, we use the model ID of the Flan T5 XL text generation model. We instantiate the model object and deploy it to a SageMaker endpoint by calling its deploy method. See the following code:

from sagemaker.jumpstart.model import JumpStartModel

# Replace with larger model if needed
pretrained_model = JumpStartModel(model_id="huggingface-text2text-flan-t5-base")
pretrained_predictor = pretrained_model.deploy()

Next, we invoke the model to create a summary of the provided text using the Flan T5 XL model. The new SDK interface makes it straightforward for you to invoke the model: you just need to pass the text to the predictor and it returns the response from the model as a Python dictionary.

text = """Summarize this content - Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases. 
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. """
query_response = pretrained_predictor.predict(text)
print(query_response["generated_text"])

The following is the output of the summarization task:

Understand how Amazon Comprehend works. Use Amazon Comprehend to analyze documents.

Fine-tune and deploy the model

The SageMaker JumpStart SDK provides you with a new class, JumpStartEstimator, which simplifies fine-tuning. You can provide the location of fine-tuning data and optionally pass validation datasets as well. After you fine-tune the model, use the deploy method of the Estimator object to deploy the fine-tuned model:

from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id=model_id,
)
estimator.set_hyperparameters(instruction_tuned="True", epoch="3", max_input_length="1024")
estimator.fit({"training": train_data_location})
finetuned_predictor = estimator.deploy()
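
After deployment, the fine-tuned endpoint is invoked the same way as the pre-trained one. The following is a sketch that assumes the predictor objects from the preceding snippets; remember to delete the endpoints when you're done to avoid ongoing charges:

# Invoke the fine-tuned model (same interface as the pre-trained predictor)
response = finetuned_predictor.predict("Summarize this content - ...")
print(response["generated_text"])  # response key may vary by model version

# Clean up both endpoints when finished to stop incurring charges
pretrained_predictor.delete_model()
pretrained_predictor.delete_endpoint()
finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()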

Customize the new classes in the SageMaker SDK

The new SDK makes it straightforward to deploy and fine-tune JumpStart models by defaulting many parameters. You still have the option to override the defaults and customize the deployment and invocation based on your requirements. For example, you can customize input payload format type, instance type, VPC configuration, and more for your environment and use case.

The following code shows how to override the instance type while deploying your model:

finetuned_predictor = estimator.deploy(instance_type='ml.g5.2xlarge')

The SageMaker JumpStart SDK deploy function will automatically select a default content type and serializer for you. If you want to change the format type of the input payload, you can use serializers and content_types objects to introspect the options available to you by passing the model_id of the model you are working with. In the following code, we set the payload input format as JSON by setting JSONSerializer as serializer and application/json as content_type:

from sagemaker import serializers
from sagemaker import content_types

serializer_options = serializers.retrieve_options(model_id=model_id, model_version=model_version)
content_type_options = content_types.retrieve_options(model_id=model_id, model_version=model_version)

pretrained_predictor.serializer = serializers.JSONSerializer()
pretrained_predictor.content_type = 'application/json'

Next, you can invoke the Flan T5 XL model for the summarization task with a payload of the JSON format. In the following code, we also pass inference parameters in the JSON payload for making responses more accurate:

from sagemaker import serializers

input_text= """Summarize this content - Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases.
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. """

parameters = {
    "max_length": 600,
    "num_return_sequences": 1,
    "top_p": 0.01,
    "do_sample": False,
}

payload = {"text_inputs": input_text, **parameters} #JSON Input format

pretrained_predictor.serializer = serializers.JSONSerializer()
query_response = pretrained_predictor.predict(payload)
print(query_response["generated_texts"][0])

If you’re looking for more ways to customize the inputs and other options for hosting and fine-tuning, refer to the documentation for the JumpStartModel and JumpStartEstimator classes.

Conclusion

In this post, we showed you how you can use the simplified SageMaker JumpStart SDK for building, training, and deploying task-based and foundation models in just a few lines of code. We demonstrated the new classes like JumpStartModel and JumpStartEstimator using the Hugging Face Flan T5-XL model as an example. You can use any of the other SageMaker JumpStart foundation models for use cases such as content writing, code generation, question answering, summarization, classification, information retrieval, and more. To see the whole list of models available with SageMaker JumpStart, refer to Built-in Algorithms with pre-trained Model Table. SageMaker JumpStart also supports task-specific models for many popular problem types.

We hope the simplified interface of the SageMaker JumpStart SDK will help you get started quickly and enable you to deliver faster. We look forward to hearing how you use the simplified SageMaker JumpStart SDK to create exciting applications!


About the authors

Evan Kravitz is a software engineer at Amazon Web Services, working on SageMaker JumpStart. He is interested in the confluence of machine learning with cloud computing. Evan received his undergraduate degree from Cornell University and master’s degree from the University of California, Berkeley. In 2021, he presented a paper on adversarial neural networks at the ICLR conference. In his free time, Evan enjoys cooking, traveling, and going on runs in New York City.

Rachna Chadha is a Principal Solution Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.

Jonathan Guinegagne is a Senior Software Engineer with Amazon SageMaker JumpStart at AWS. He got his master’s degree from Columbia University. His interests span machine learning, distributed systems, and cloud computing, as well as democratizing the use of AI. Jonathan is originally from France and now lives in Brooklyn, NY.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Read More

Elevate your marketing solutions with Amazon Personalize and generative AI

Generative artificial intelligence is transforming how enterprises do business. Organizations are using AI to improve data-driven decisions, enhance omnichannel experiences, and drive next-generation product development. Enterprises are using generative AI specifically to power their marketing efforts through emails, push notifications, and other outbound communication channels. Gartner predicts that “by 2025, 30% of outbound marketing messages from large organizations will be synthetically generated.” However, generative AI alone isn’t enough to deliver engaging customer communication. Research shows that the most impactful communication is personalized—showing the right message to the right user at the right time. According to McKinsey, “71% of consumers expect companies to deliver personalized interactions.” Customers can use Amazon Personalize and generative AI to curate concise, personalized content for marketing campaigns, increase ad engagement, and enhance conversational chatbots.

Developers can use Amazon Personalize to build applications powered by the same type of machine learning (ML) technology used by Amazon.com for real-time personalized recommendations. With Amazon Personalize, developers can improve user engagement through personalized product and content recommendations with no ML expertise required. Using recipes (algorithms prepared to support specific use cases) provided by Amazon Personalize, customers can deliver a wide array of personalization, including specific product or content recommendations, personalized ranking, and user segmentation. Additionally, as a fully managed artificial intelligence service, Amazon Personalize accelerates customers’ digital transformations with ML, making it easier to integrate personalized recommendations into existing websites, applications, email marketing systems, and so on.

In this post, we illustrate how you can elevate your marketing campaigns using Amazon Personalize and generative AI with Amazon Bedrock. Together, Amazon Personalize and generative AI help you tailor your marketing to individual consumer preferences.

How exactly do Amazon Personalize and Amazon Bedrock work together to achieve this? Imagine as a marketer that you want to send tailored emails to users recommending movies they would enjoy based on their interactions across your platform. Or perhaps you want to send targeted emails to a segment of users promoting a new shoe they might be interested in. The following use cases use generative AI to enhance two common marketing emails.

Use Case 1: Use generative AI to deliver targeted one-to-one personalized emails

With Amazon Personalize and Amazon Bedrock, you can generate personalized recommendations and create outbound messages with a personal touch tailored to each of your users.

The following diagram illustrates the architecture and workflow for delivering targeted personalized emails powered by generative AI.

First, import your dataset of users’ interactions into Amazon Personalize for training. Amazon Personalize automatically trains a model using the Top Picks for You recipe. As an output, Amazon Personalize provides recommendations that align with the users’ preferences.

You can use the following code to identify recommended items for users:

get_recommendations_response = personalize_runtime.get_recommendations(
                            recommenderArn = workshop_recommender_top_picks_arn,
                            userId = str(user_id),
                            numResults = number_of_movies_to_recommend)

For more information, see the Amazon Personalize API reference.

The provided output of recommendations is ingested by Amazon Bedrock using a prompt, which includes your user preferences, demographics, and Amazon Personalize recommended items.

For example, a marketer who wants to create a personalized email that is charming and fun for a user might use the following prompt:

Create a personalized email which is charming and fun so that the user is engaged. The user has recently watched family-friendly films. These are the recommended items – The Little Mermaid, Encanto, Spider-Man: Into the Spider-Verse.

By invoking one of the foundation models (FMs) provided in Amazon Bedrock, such as Claude 2, with the prompt and sample code that follows, you can create a personalized email for a user:

personalized_email_response = bedrock_client.invoke_model(
                            body = prompt,
                            modelId = identifier_of_the_model)

For more information, see the Amazon Bedrock API reference.
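
The snippet above omits the request body format. Assuming Claude 2's request schema, a fuller call might look like the following (the model ID and body keys are assumptions; the prompt text mirrors the example above):

import json
import boto3

bedrock_client = boto3.client("bedrock-runtime")

prompt_text = ("Create a personalized email which is charming and fun so that the user is engaged. "
               "The user has recently watched family-friendly films. These are the recommended items - "
               "The Little Mermaid, Encanto, Spider-Man: Into the Spider-Verse.")

# Claude models expect the Human/Assistant prompt format
body = json.dumps({
    "prompt": f"\n\nHuman: {prompt_text}\n\nAssistant:",
    "max_tokens_to_sample": 500,
})

personalized_email_response = bedrock_client.invoke_model(
    body=body,
    modelId="anthropic.claude-v2",  # assumption: Claude 2 model ID
    contentType="application/json",
    accept="application/json",
)
email_text = json.loads(personalized_email_response["body"].read())["completion"]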

Amazon Bedrock returns a personalized email for the user:

Subject: Fall in love with this recommended selection for movie night!

Dear <user name>,

Desiring the cozy feel of fall? No problem! Check our top three recommendations for movies that will have you cozy on the couch with your loved ones:

1. The Little Mermaid: This classic Disney movie is all about a mermaid princess named Ariel, who dreams of the human world. Because of her fascination, she makes a deal with the sea witch Ursula and learns a major lesson.

2. Encanto: This Disney movie is about the Madrigals, a Colombian family who lives in a magical house. Each member of the family has a unique gift, except for young Maribel who must help save her family.

3. Spider-Man: Into the Spider-Verse: This animated superhero movie is a must-see action movie. Spider-man, a Brooklyn teen named Miles Morales, teams up with other spider-powered people to save the multiverse.

With lovable characters, catchy tunes, and moving stories, you really can’t go wrong with any of these three. Grab the popcorn because you’re in for a treat!

Use case 2: Use generative AI to elevate one-to-many marketing campaigns

When it comes to one-to-many email marketing, generic content can result in low engagement (that is, low open rates and unsubscribes). One way companies circumvent this outcome is to manually craft variations of outbound messages with compelling subjects. This can lead to inefficient use of time. By integrating Amazon Personalize and Amazon Bedrock into your workflow, you can quickly identify the interested segment of users and create variations of email content with greater relevance and engagement.

The following diagram illustrates the architecture and workflow for elevating marketing campaigns powered by generative AI.

To compose one-to-many emails, first import your dataset of users’ interactions into Amazon Personalize for training. Amazon Personalize trains the model using the user segmentation recipe. With the user segmentation recipe, Amazon Personalize automatically identifies the individual users that demonstrate a propensity for the chosen items as the target audience.

To identify the target audience and retrieve metadata for an item you can use the following sample code:

create_batch_segment_response = personalize.create_batch_segment_job(
        jobName = job_name,
        solutionVersionArn = solution_version_arn,
        numResults = number_of_users_to_recommend,
        jobInput =  {
            "s3DataSource": {
                "path": batch_input_path
            }
        },
        jobOutput = {
            "s3DataDestination": {
            "path": batch_output_path
            }
        }
)

For more information, see the Amazon Personalize API reference.

Amazon Personalize delivers a list of recommended users to target for each item to batch_output_path. You can then feed the user segment into Amazon Bedrock using one of the FMs along with your prompt.

For this use case, you might want to market a newly released sneaker through email. An example prompt might include the following:

For the user segment “sneaker heads”, create a catchy email that promotes the latest sneaker “Ultra Fame II”. Provide users with discount code FAME10 to save 10%.

Similar to the first use case, you’ll use the following code in Amazon Bedrock:

personalized_email_response = bedrock_client.invoke_model(
                                body = prompt,
                                modelId = identifier_of_the_model)

For more information, see the Amazon Bedrock API reference.

Amazon Bedrock returns a personalized email based on the items chosen for each user as shown:

Subject: <<name>>, your ticket to the Hall of Fame awaits

Hey <<name>>,

The wait is over. Check out the new Ultra Fame II! It’s the most innovative and comfortable Ultra Fame shoe yet. Its new design will have you turning heads with every step. Plus, you’ll get a mix of comfort, support, and style that’s just enough to get you into the Hall of Fame.

Don’t wait until it’s too late. Use the code FAME10 to save 10% on your next pair.

To test and determine the email that leads to the highest engagement, you can use Amazon Bedrock to generate a variation of catchy subject lines and content in a fraction of the time it would take to manually produce test content.

Conclusion

By integrating Amazon Personalize and Amazon Bedrock, you can deliver personalized promotional content to the right audience.

Generative AI powered by FMs is changing how businesses build hyper-personalized experiences for consumers. AWS AI services, such as Amazon Personalize and Amazon Bedrock, can help recommend and deliver products, content, and compelling marketing messages personalized to your users. For more information on working with generative AI on AWS, see Announcing New Tools for Building with Generative AI on AWS.


About the Authors

Ba’Carri Johnson is a Sr. Technical Product Manager working with AWS AI/ML on the Amazon Personalize team. With a background in computer science and strategy, she is passionate about product innovation. In her spare time, she enjoys traveling and exploring the great outdoors.

Ragini Prasad is a Software Development Manager with the Amazon Personalize team focused on building AI-powered recommender systems at scale. In her spare time, she enjoys art and travel.

Jingwen Hu is a Sr. Technical Product Manager working with AWS AI/ML on the Amazon Personalize team. In her spare time, she enjoys traveling and exploring local food.

Anna Grüebler is a Specialist Solutions Architect at AWS focusing on artificial intelligence. She has more than 10 years of experience helping customers develop and deploy machine learning applications. Her passion is taking new technologies and putting them in the hands of everyone and solving difficult problems by taking advantage of using AI in the cloud.

Tim Wu Kunpeng is a Sr. AI Specialist Solutions Architect with extensive experience in end-to-end personalization solutions. He is a recognized industry expert in e-commerce and media and entertainment, with expertise in generative AI, data engineering, deep learning, recommendation systems, responsible AI, and public speaking.

Read More

Intelligently search Drupal content using Amazon Kendra

Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra helps you easily aggregate content from a variety of content repositories into a centralized index that lets you quickly search all your enterprise data and find the most accurate answer. Drupal is a content management system used to make many of the websites and applications we use every day. Drupal has a great feature set, like straightforward content authoring, reliable performance, and security. Many organizations use Drupal to store their content. One of the key requirements for many customers using Drupal is the ability to easily and securely find accurate information across all the documents in the data source.

With the Amazon Kendra Drupal connector, you can index Drupal content, filter the types of custom content you want to index, and easily search through Drupal content using Amazon Kendra intelligent search.

This post shows you how to use the Amazon Kendra Drupal connector to configure the connector as a data source for your Amazon Kendra index and search your Drupal documents. Based on the configuration of the Drupal connector, you can synchronize the connector to crawl and index different types of Drupal content such as blogs and wikis. The connector also ingests the access control list (ACL) information for each file. The ACL information is used for user context filtering, where search results for a query are filtered by what a user has authorized access to.

Prerequisites

To try out the Amazon Kendra connector for Drupal using this post as a reference, you need the following:

Configure the data source using the Amazon Kendra connector for Drupal

To add a data source to your Amazon Kendra index using the Drupal connector, you can use an existing index or create a new index. Then complete the following steps. For more information on this topic, refer to the Amazon Kendra Developer Guide.

  1. On the Amazon Kendra console, open your index and choose Data sources in the navigation pane.
  2. Choose Add data source.
  3. Under Drupal, choose Add connector.
  4. In the Specify data source details section, enter a name and description and choose Next.
  5. On the Define access and security section, for Drupal Host URL, enter the Drupal site URL.
  6. To configure the SSL certificates, you can create a self-signed certificate for this setup using the openssl x509 -in mydrupalsite.pem -out drupal.crt command and store the certificate in an Amazon Simple Storage Service (Amazon S3) bucket. For more details on generating a private key and the certificate, refer to Generating Certificates.
  7. Choose Browse S3 and choose the S3 bucket with the SSL certificate.
  8. Under Authentication, you have two options:
    • Use Secrets Manager to create new Drupal authentication credentials. You need a Drupal admin user name and password (additionally, a client ID and client secret for OAuth 2.0 authentication).
    • Use an existing Secrets Manager secret that has the Drupal authentication credentials you want the connector to access (additionally, a client ID and client secret for OAuth 2.0 authentication).
  9. Choose Save and add secret.
  10. For IAM role, choose Create a new role or choose an existing IAM role configured with appropriate IAM policies to access the Secrets Manager secret, Amazon Kendra index, and data source.

Refer to IAM roles for data sources for the required permissions for the IAM role.

  1. Choose Next.
  2. In the Configure sync settings section, select Articles, Basic pages, Basic blocks, Custom content types, and Custom Blocks along with options to crawl comments and attachments as needed.
  3. Optionally, enter the include/exclude patterns for the entity titles.
  4. Provide information about your sync scope (full or delta only) and specify the run schedule.
  5. Choose Next.

  6. In the Set field mappings section, add custom Drupal fields you want to sync and their respective Amazon Kendra field mappings. The required fields are pre-mapped by Amazon Kendra.
  7. Choose Next.
  8. Review the configuration settings and save the data source.
  9. Choose Sync now on the created data source to start data synchronization with the Amazon Kendra Index.

The time required to crawl and sync the contents into Amazon Kendra varies based on the volume of content and the throughput.

You can now search the indexed Drupal content using the search console or a search application. Optionally, you can search with ACL with the following additional steps.

  1. Go to the index page that you created and on the User access control tab, choose Edit settings.
  2. Under Access control settings, select Yes, keep the default values for Username and Groups, choose JSON for Token type, and keep the user-group expansion as None.
  3. On the next page, retain the default values (or change them based on your capacity requirements) and choose Update.

Perform intelligent search with Amazon Kendra

Before you try searching on the Amazon Kendra console or using the API, make sure that the data source sync is complete. To check, view the data sources and verify if the last sync was successful.

  1. To start your search, on the Amazon Kendra console, choose Search indexed content in the navigation pane.

You’re redirected to the Amazon Kendra search console. Now you can search information from the Drupal documents you indexed using Amazon Kendra.

  1. For this post, we search for a document stored in the Drupal data source.
  2. Expand Test query with an access token and choose Apply token.
  3. For Username, enter the email address associated with your Drupal account.
  4. Choose Apply.

Now the user can only see the content they have access to, based on the user name or groups specified. In our example, the Drupal user with the test@amazon.com email doesn’t have access to any documents on Drupal, so none are displayed.
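
To run the same token-filtered search programmatically, you can pass the user context to the Amazon Kendra Query API. The following is a sketch; the index ID, query text, and token value are placeholders:

import boto3

kendra = boto3.client("kendra")

response = kendra.query(
    IndexId="<your-index-id>",  # placeholder
    QueryText="benefits enrollment",  # example query
    UserContext={"Token": "<json-or-jwt-token>"},  # token carrying the user name and groups
)
for result in response["ResultItems"]:
    print(result["Type"], result.get("DocumentTitle", {}).get("Text"))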

Limitations

Note the following limitations when using this solution:

  • The content types (such as article or basic page) that aren’t associated with any view cannot be crawled.
  • If an administrator doesn’t have access to a block, then you can’t crawl the data from the block.
  • The document body for article, basic page, basic block, user-defined content type, and user-defined block type is displayed in HTML format. If the HTML content is not well formed, the HTML tags will appear in the document body and can therefore be seen in the Amazon Kendra search results. The same applies to comments on articles, basic pages, basic blocks, user-defined content types, and user-defined block types.
  • A content type or block type without a description or body will not be injected into the Amazon Kendra index, because there is a validation on the Amazon Kendra SDK side. However, Drupal allows you to create content types without a description or body. Only the comments and attachments of the respective content types or block types (if they exist) will be injected into the Amazon Kendra index.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Drupal, delete that data source. Delete any IAM users created.

Conclusion

With the Amazon Kendra Drupal connector, your organization can search contents stored in a Drupal site securely using intelligent search powered by Amazon Kendra. In this post, we introduced you to the integration, but there are many additional features that we didn’t cover, such as the following:

  • You can map additional fields to Amazon Kendra index attributes and enable them for faceting, search, and display in the search results
  • You can integrate the Drupal data source with the Custom Document Enrichment (CDE) capability in Amazon Kendra to perform additional attribute mapping logic and even custom content transformation during ingestion

To learn more about the possibilities with Drupal, refer to the Amazon Kendra Developer Guide.

For more information on other Amazon Kendra built-in connectors for popular data sources, refer to the Amazon Kendra Connectors page.


About the authors

Channa Basavaraja is a Senior Solutions Architect at AWS with over 2 decades of experience building distributed business solutions. His areas of depth span Machine Learning, app/mobile dev, event-driven architecture, and IoT/edge computing.

Yuanhua Wang is a software engineer at AWS with more than 15 years of experience in the technology industry. His interests are software architecture and build tools on cloud computing.

Read More

Intuitivo achieves higher throughput while saving on AI/ML costs using AWS Inferentia and PyTorch

This is a guest post by Jose Benitez, Founder and Director of AI and Mattias Ponchon, Head of Infrastructure at Intuitivo.

Intuitivo, a pioneer in retail innovation, is revolutionizing shopping with its cloud-based AI and machine learning (AI/ML) transactional processing system. This groundbreaking technology enables us to operate millions of autonomous points of purchase (A-POPs) concurrently, transforming the way customers shop. Our solution outpaces traditional vending machines and alternatives, offering a ten times lower cost, easy setup, and maintenance-free operation. Our innovative new A-POPs deliver enhanced customer experiences at that lower cost because of the performance and cost advantages AWS Inferentia delivers. Inferentia has enabled us to run our You Only Look Once (YOLO) computer vision models five times faster than our previous solution, supporting seamless, real-time shopping experiences for our customers. Additionally, Inferentia has helped us reduce costs by 95 percent compared to our previous solution. In this post, we cover our use case, challenges, and a brief overview of our solution using Inferentia.

The changing retail landscape and need for A-POP

The retail landscape is evolving rapidly, and consumers expect the same easy-to-use and frictionless experiences they are used to when shopping digitally. To effectively bridge the gap between the digital and physical world, and to meet the changing needs and expectations of customers, a transformative approach is required. At Intuitivo, we believe that the future of retail lies in creating highly personalized, AI-powered, and computer vision-driven autonomous points of purchase (A-POP). This technological innovation brings products within arm’s reach of customers. Not only does it put customers’ favorite items at their fingertips, but it also offers them a seamless shopping experience, devoid of long lines or complex transaction processing systems. We’re excited to lead this exciting new era in retail.

With our cutting-edge technology, retailers can quickly and efficiently deploy thousands of A-POPs. Scaling has always been a daunting challenge for retailers, mainly due to the logistic and maintenance complexities associated with expanding traditional vending machines or other solutions. However, our camera-based solution, which eliminates the need for weight sensors, RFID, or other high-cost sensors, requires no maintenance and is significantly cheaper. This enables retailers to efficiently establish thousands of A-POPs, providing customers with an unmatched shopping experience while offering retailers a cost-effective and scalable solution.

Using cloud inference for real-time product identification

While designing a camera-based product recognition and payment system, we faced the decision of whether processing should happen on the edge or in the cloud. After considering several architectures, we designed a system that uploads videos of the transactions to the cloud for processing.

Our end users start a transaction by scanning the A-POP’s QR code, which unlocks the A-POP; customers then grab what they want and go. Preprocessed videos of these transactions are uploaded to the cloud, where our AI-powered transaction pipeline automatically processes them and charges the customer’s account accordingly.

The following diagram shows the architecture of our solution.

Unlocking high-performance and cost-effective inference using AWS Inferentia

As retailers look to scale operations, the cost of A-POPs becomes an important consideration. At the same time, providing a seamless, real-time shopping experience for end users is paramount. Our AI/ML research team focuses on identifying the best computer vision (CV) models for our system, so we were presented with the challenge of how to simultaneously optimize AI/ML operations for performance and cost.

We deploy our models on Amazon EC2 Inf1 instances powered by Inferentia, Amazon’s first ML silicon designed to accelerate deep learning inference workloads. Inferentia has been shown to reduce inference costs significantly. We used the AWS Neuron SDK—a set of software tools used with Inferentia—to compile and optimize our models for deployment on EC2 Inf1 instances.

The code snippet that follows shows how to compile a YOLO model with Neuron. The code works seamlessly with PyTorch: functions such as torch.jit.trace() and torch_neuronx.trace() record the model’s operations on an example input during the forward pass to build a static IR graph.

from ultralytics import YOLO
import torch_neuronx
import torch

batch_size = 1
imgsz = (640, 640)
im = torch.zeros(batch_size, 3, *imgsz).to('cpu')  # mock input

# Compiler options
half = True  # fp16
fp8 = False
dynamic = False  # dynamic batch

f = 'yolov8n.neuronx'  # output model name
neuronx_cc_args = ['--auto-cast', 'none']

if half:
    neuronx_cc_args = ['--auto-cast', 'all', '--auto-cast-type', 'fp16']
elif fp8:
    neuronx_cc_args = ['--auto-cast', 'all', '--auto-cast-type', 'fp8_e4m3']

model = torch.load('yolov8n.pt')['model']
model.eval()
model.float()
model = model.fuse()
neuronx_model = torch_neuronx.trace(
    model,
    example_inputs=im,
    compiler_args=neuronx_cc_args,
)

if dynamic:
    neuronx_model = torch_neuronx.dynamic_batch(neuronx_model)

neuronx_model.save(f)

We migrated our compute-heavy models to Inf1. By using AWS Inferentia, we achieved the throughput and performance to match our business needs. Adopting Inferentia-based Inf1 instances in the MLOps lifecycle was key to achieving remarkable results:

  1. Performance improvement: Our large computer vision models now run five times faster, achieving over 120 frames per second (FPS), allowing for seamless, real-time shopping experiences for our customers. Furthermore, the ability to process at this frame rate not only enhances transaction speed, but also enables us to feed more information into our models. This increase in data input significantly improves the accuracy of product detection within our models, further boosting the overall efficacy of our shopping systems.
  2. Cost savings: We reduced inference costs by 95 percent compared to our previous solution, which significantly improved the architecture design supporting our A-POPs.

Data parallel inference was easy with AWS Neuron SDK

To improve the throughput of our inference workloads and extract maximum performance from Inferentia, we wanted to use all available NeuronCores in the Inferentia accelerator. Achieving this was easy with the built-in tools and APIs from the Neuron SDK: we used the torch.neuron.DataParallel() API. We’re currently using inf1.2xlarge, which has one Inferentia accelerator with four NeuronCores, so we use torch.neuron.DataParallel() to fully utilize the Inferentia hardware and all available NeuronCores. This Python function implements data parallelism at the module level on models created by the PyTorch Neuron API. Data parallelism is a form of parallelization across multiple devices or cores (NeuronCores for Inferentia), referred to as nodes. Each node contains the same model and parameters, but data is distributed across the different nodes. By distributing the data across multiple nodes, data parallelism reduces the total processing time of large batch size inputs compared to sequential processing. Data parallelism works best for models in latency-sensitive applications that have large batch size requirements.
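
The following is a minimal sketch of that pattern, assuming a YOLO model that has already been compiled with torch.neuron.trace() and saved to disk; the file name and batch size are illustrative.

import torch
import torch_neuron  # makes the torch.neuron APIs from the AWS Neuron SDK available

# Hypothetical path to a model previously compiled with torch.neuron.trace()
model_neuron = torch.jit.load("yolo_neuron_compiled.pt")

# Replicate the compiled model across the four NeuronCores of the Inferentia
# accelerator on an inf1.2xlarge; inputs are split along the batch dimension
model_parallel = torch.neuron.DataParallel(model_neuron)

# A batch of preprocessed frames (values and batch size chosen for illustration only)
frames = torch.zeros(8, 3, 640, 640)

with torch.no_grad():
    detections = model_parallel(frames)  # each NeuronCore runs inference on a slice of the batch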

Looking ahead: Accelerating retail transformation with foundation models and scalable deployment

As we venture into the future, the impact of foundation models on the retail industry cannot be overstated. Foundation models can make a significant difference in product labeling. The ability to quickly and accurately identify and categorize different products is crucial in a fast-paced retail environment. With modern transformer-based models, we can deploy a greater diversity of models to serve more of our AI/ML needs with higher accuracy, improving the experience for users without having to waste time and money training models from scratch. By harnessing the power of foundation models, we can accelerate the process of labeling, enabling retailers to scale their A-POP solutions more rapidly and efficiently.

We have begun implementing Segment Anything Model (SAM), a vision transformer foundation model that can segment any object in any image (we will discuss this further in another blog post). SAM allows us to accelerate our labeling process with unparalleled speed. SAM is very efficient, able to process approximately 62 times more images than a human can manually create bounding boxes for in the same timeframe. SAM’s output is used to train a model that detects segmentation masks in transactions, opening up a window of opportunity for processing millions of images exponentially faster. This significantly reduces training time and cost for product planogram models.

Our product and AI/ML research teams are excited to be at the forefront of this transformation. The ongoing partnership with AWS and our use of Inferentia in our infrastructure will ensure that we can deploy these foundation models cost-effectively. As early adopters, we’re working with the new AWS Inferentia2-based instances. Inf2 instances are built for today’s generative AI and large language model (LLM) inference acceleration, delivering higher performance and lower costs. Inf2 will enable us to empower retailers to harness the benefits of AI-driven technologies without breaking the bank, ultimately making the retail landscape more innovative, efficient, and customer-centric.

As we continue to migrate more models to Inferentia and Inferentia2, including transformer-based foundation models, we are confident that our alliance with AWS will enable us to grow and innovate alongside our trusted cloud provider. Together, we will reshape the future of retail, making it smarter, faster, and more attuned to the ever-evolving needs of consumers.

Conclusion

In this post, we highlighted our transformational journey using AWS Inferentia for our AI/ML transactional processing system. This partnership has led to a five times increase in processing speed and a 95 percent reduction in inference costs compared to our previous solution, and it has changed the retail industry’s current approach by facilitating a real-time and seamless shopping experience.

If you’re interested in learning more about how Inferentia can help you save costs while optimizing performance for your inference applications, visit the Amazon EC2 Inf1 instances and Amazon EC2 Inf2 instances product pages. AWS provides various sample codes and getting started resources for Neuron SDK that you can find on the Neuron samples repository.


About the Authors

Matias Ponchon is the Head of Infrastructure at Intuitivo. He specializes in architecting secure and robust applications. His extensive experience in FinTech and blockchain companies, coupled with his strategic mindset, helps him design innovative solutions. He has a deep commitment to excellence, which is why he consistently delivers resilient solutions that push the boundaries of what’s possible.

Jose Benitez is the Founder and Director of AI at Intuitivo, specializing in the development and implementation of computer vision applications. He leads a talented Machine Learning team, nurturing an environment of innovation, creativity, and cutting-edge technology. In 2022, Jose was recognized as an ‘Innovator Under 35’ by MIT Technology Review, a testament to his groundbreaking contributions to the field. This dedication extends beyond accolades and into every project he undertakes, showcasing a relentless commitment to excellence and innovation.

Diwakar Bansal is an AWS Senior Specialist focused on business development and go-to-market for Gen AI and Machine Learning accelerated computing services. Previously, Diwakar has led product definition, global business development, and marketing of technology products for IoT, Edge Computing, and Autonomous Driving focusing on bringing AI and Machine Learning to these domains.

Read More

Empower your business users to extract insights from company documents using Amazon SageMaker Canvas Generative AI

Enterprises seek to harness the potential of Machine Learning (ML) to solve complex problems and improve outcomes. Until recently, building and deploying ML models required deep levels of technical and coding skills, including tuning ML models and maintaining operational pipelines. Since its introduction in 2021, Amazon SageMaker Canvas has enabled business analysts to build, deploy, and use a variety of ML models – including tabular, computer vision, and natural language processing – without writing a line of code. This has accelerated the ability of enterprises to apply ML to use cases such as time-series forecasting, customer churn prediction, sentiment analysis, industrial defect detection, and many others.

As announced on October 5, 2023, SageMaker Canvas expanded its support of models to foundation models (FMs) – large language models used to generate and summarize content. With the October 12, 2023 release, SageMaker Canvas lets users ask questions and get responses that are grounded in their enterprise data. This ensures that results are context-specific, opening up additional use cases where no-code ML can be applied to solve business problems. For example, business teams can now formulate responses consistent with an organization’s specific vocabulary and tenets, and can more quickly query lengthy documents to get responses specific and grounded to the contents of those documents. All of this is performed in a private and secure manner, ensuring that all sensitive data is accessed with proper governance and safeguards.

To get started, a cloud administrator configures and populates Amazon Kendra indexes with enterprise data as data sources for SageMaker Canvas. Canvas users select the index where their documents are, and can ideate, research, and explore knowing that the output will always be backed by their sources of truth. SageMaker Canvas uses state-of-the-art FMs from Amazon Bedrock and Amazon SageMaker JumpStart. Conversations can be started with multiple FMs side by side, comparing the outputs and truly making generative AI accessible to everyone.

In this post, we will review the recently released feature, discuss the architecture, and present a step-by-step guide to enable SageMaker Canvas to query documents from your knowledge base, as shown in the following screen capture.

Solution overview

Foundation models can produce hallucinations – responses that are generic, vague, unrelated, or factually incorrect. Retrieval Augmented Generation (RAG) is a frequently used approach to reduce hallucinations. RAG architectures are used to retrieve data from outside of an FM, which is then used to perform in-context learning to answer the user’s query. This ensures that the FM can use data from a trusted knowledge base and use that knowledge to answer users’ questions, reducing the risk of hallucination.

With RAG, the data external to the FM and used to augment user prompts can come from multiple disparate data sources, such as document repositories, databases, or APIs. The first step is to convert your documents and any user queries into a compatible format to perform relevancy semantic search. To make the formats compatible, a document collection, or knowledge library, and user-submitted queries are converted into numerical representations using embedding models.
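
As a simplified illustration of that relevancy search, the snippet below ranks documents by cosine similarity between embedding vectors. The vectors are hard-coded stand-ins for what an embedding model would produce; in the solution described in this post, Amazon Kendra handles retrieval for you.

import numpy as np

# Stand-in embeddings; in practice these would come from an embedding model
doc_embeddings = {
    "travel_policy.pdf": np.array([0.9, 0.1, 0.0]),
    "expense_report_howto.pdf": np.array([0.2, 0.8, 0.1]),
}
query_embedding = np.array([0.85, 0.15, 0.05])  # embedding of the user's question

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by semantic relevance to the query
ranked = sorted(
    doc_embeddings.items(),
    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
    reverse=True,
)
print(ranked[0][0])  # most relevant document for the query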

With this release, RAG functionality is provided in a no-code and seamless manner. Enterprises can enrich the chat experience in Canvas with Amazon Kendra as the underlying knowledge management system. The following diagram illustrates the solution architecture.

Connecting SageMaker Canvas to Amazon Kendra requires a one-time setup. We describe the setup process in detail in Setting up Canvas to query documents. If you haven’t already set up your SageMaker Domain, refer to Onboard to Amazon SageMaker Domain.

As part of the domain configuration, a cloud administrator can choose one or more Kendra indices that the business analyst can query when interacting with the FM through SageMaker Canvas.

After the Kendra indices are hydrated and configured, business analysts use them with SageMaker Canvas by starting a new chat and selecting the Query Documents toggle. SageMaker Canvas then manages the underlying communication between Amazon Kendra and the FM of choice to perform the following operations (a conceptual sketch of this flow follows the list):

  1. Query the Kendra indices with the question coming from the user.
  2. Retrieve the snippets (and the sources) from Kendra indices.
  3. Engineer the prompt by combining the snippets with the original query so that the foundation model can generate an answer from the retrieved documents.
  4. Provide the generated answer to the user, along with references to the pages/documents that were used to formulate the response.
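
The conceptual sketch below approximates steps 1–3. SageMaker Canvas performs all of this for you behind the scenes; the snippet only illustrates how retrieval and prompt engineering could look with the Amazon Kendra Retrieve API via boto3, using a placeholder index ID.

import boto3

kendra = boto3.client("kendra")
KENDRA_INDEX_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

def retrieve_snippets(question, top_k=3):
    """Steps 1 and 2: query the index and collect the most relevant passages."""
    response = kendra.retrieve(IndexId=KENDRA_INDEX_ID, QueryText=question)
    return [item["Content"] for item in response["ResultItems"][:top_k]]

def build_grounded_prompt(question):
    """Step 3: combine the retrieved snippets with the original query for the FM."""
    context = "\n\n".join(retrieve_snippets(question))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )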

Setting up Canvas to query documents

In this section, we will walk you through the steps to set up Canvas to query documents served through Kendra indexes. You should have the following prerequisites:

  • SageMaker Domain setup – Onboard to Amazon SageMaker Domain
  • Create a Kendra index (or more than one)
  • Set up the Amazon Kendra Amazon S3 connector – follow the Amazon S3 Connector documentation – and upload PDF files and other documents to the Amazon S3 bucket associated with the Kendra index
  • Set up IAM so that Canvas has the appropriate permissions, including those required for calling Amazon Bedrock and/or SageMaker endpoints – follow the Set up Canvas Chat documentation

Now you can update the Domain so that it can access the desired indices. On the SageMaker console, for the given Domain, choose Edit under the Domain Settings tab. Enable the toggle Enable query documents with Amazon Kendra, which can be found in the Canvas Settings step. Once activated, choose one or more Kendra indices that you want to use with Canvas.

That’s all that’s needed to configure the Canvas Query Documents feature. Users can now jump into a chat within Canvas and start using the knowledge bases that have been attached to the Domain through the Kendra indexes. The maintainers of the knowledge base can continue to update the source of truth, and with the syncing capability in Kendra, chat users will automatically be able to use the up-to-date information in a seamless manner.

Using the Query Documents feature for chat

As a SageMaker Canvas user, the Query Documents feature can be accessed from within a chat. To start the chat session, click or search for the “Generate, extract and summarize content” button from the Ready-to-use models tab in SageMaker Canvas.

Once there, you can turn on and off Query Documents with the toggle at the top of the screen. Check out the information prompt to learn more about the feature.

When Query Documents is enabled, you can choose among a list of Kendra indices enabled by the cloud administrator.

You can select an index when starting a new chat. You can then ask a question in the UX with knowledge being automatically sourced from the selected index. Note that after a conversation has started against a specific index, it is not possible to switch to another index.

For the questions asked, the chat will show the answer generated by the FM along with the source documents that contributed to generating the answer. When clicking any of the source documents, Canvas opens a preview of the document, highlighting the excerpt used by the FM.

Conclusion

Conversational AI has immense potential to transform customer and employee experience by providing a human-like assistant with natural and intuitive interactions such as:

  • Performing research on a topic, or searching and browsing the organization’s knowledge base
  • Summarizing volumes of content to rapidly gather insights
  • Searching for entities, sentiments, PII, and other useful data, increasing the business value of unstructured content
  • Generating drafts for documents and business correspondence
  • Creating knowledge articles from disparate internal sources (incidents, chat logs, wikis)

The innovative integration of chat interfaces, knowledge retrieval, and FMs enables enterprises to provide accurate, relevant responses to user questions by using their domain knowledge and sources-of-truth.

By connecting SageMaker Canvas to knowledge bases in Amazon Kendra, organizations can keep their proprietary data within their own environment while still benefiting from the state-of-the-art natural language capabilities of FMs. With the launch of SageMaker Canvas’s Query Documents feature, we are making it easy for any enterprise to use LLMs and their enterprise knowledge as the source of truth to power a secure chat experience. All this functionality is available in a no-code format, allowing businesses to avoid handling repetitive and non-specialized tasks.

To learn more about SageMaker Canvas and how it helps make it easier for everyone to start with Machine Learning, check out the SageMaker Canvas announcement. Learn more about how SageMaker Canvas helps foster collaboration between data scientists and business analysts by reading the Build, Share & Deploy post. Finally, to learn how to create your own Retrieval Augmented Generation workflow, refer to SageMaker JumpStart RAG.

References

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.


About the Authors

Davide Gallitelli is a Senior Specialist Solutions Architect for AI/ML. He is based in Brussels and works closely with customers all around the globe that are looking to adopt Low-Code/No-Code Machine Learning technologies and Generative AI. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.

Bilal Alam is an Enterprise Solutions Architect at AWS with a focus on the Financial Services industry. On most days Bilal is helping customers with building, uplifting and securing their AWS environment to deploy their most critical workloads. He has extensive experience in Telco, networking, and software development. More recently, he has been looking into using AI/ML to solve business problems.

Pashmeen Mistry is a Senior Product Manager at AWS. Outside of work, Pashmeen enjoys adventurous hikes, photography, and spending time with his family.

Dan Sinnreich is a Senior Product Manager at AWS, helping to democratize low-code/no-code machine learning. Previous to AWS, Dan built and commercialized enterprise SaaS platforms and time-series models used by institutional investors to manage risk and construct optimal portfolios. Outside of work, he can be found playing hockey, scuba diving, and reading science fiction.

Read More

Detection and high-frequency monitoring of methane emission point sources using Amazon SageMaker geospatial capabilities

Methane (CH4) is a major anthropogenic greenhouse gas that‘s a by-product of oil and gas extraction, coal mining, large-scale animal farming, and waste disposal, among other sources. The global warming potential of CH4 is 86 times that of CO2 and the Intergovernmental Panel on Climate Change (IPCC) estimates that methane is responsible for 30 percent of observed global warming to date. Rapidly reducing leakage of CH4 into the atmosphere represents a critical component in the fight against climate change. In 2021, the U.N. introduced The Global Methane Pledge at the Climate Change Conference (COP26), with a goal to take “fast action on methane to keep a 1.5C future within reach.” The Pledge has 150 signatories including the U.S. and EU.

Early detection and ongoing monitoring of methane sources is a key component of meaningful action on methane and is therefore becoming a concern for policy makers and organizations alike. Implementing affordable, effective methane detection solutions at scale – such as on-site methane detectors or aircraft-mounted spectrometers – is challenging, as they are often impractical or prohibitively expensive. Remote sensing using satellites, on the other hand, can provide the global-scale, high-frequency, and cost-effective detection functionality that stakeholders desire.

In this blog post, we show you how to use Sentinel-2 satellite imagery hosted on the AWS Registry of Open Data in combination with Amazon SageMaker geospatial capabilities to detect point sources of CH4 emissions and monitor them over time. Drawing on recent findings from the earth observation literature, you will learn how to implement a custom methane detection algorithm and use it to detect and monitor methane leakage from a variety of sites across the globe. This post includes accompanying code on GitHub that provides additional technical detail and helps you get started with your own methane monitoring solution.

Traditionally, running complex geospatial analyses was a difficult, time-consuming, and resource-intensive undertaking. Amazon SageMaker geospatial capabilities make it easier for data scientists and machine learning engineers to build, train, and deploy models using geospatial data. Using SageMaker geospatial capabilities, you can efficiently transform or enrich large-scale geospatial datasets, accelerate model building with pre-trained machine learning (ML) models, and explore model predictions and geospatial data on an interactive map using 3D accelerated graphics and built-in visualization tools.

Remote sensing of methane point sources using multispectral satellite imagery

Satellite-based methane sensing approaches typically rely on the unique transmittance characteristics of CH4. In the visible spectrum, CH4 has transmittance values equal or close to 1, meaning it’s undetectable by the naked eye. Across certain wavelengths, however, methane does absorb light (transmittance <1), a property which can be exploited for detection purposes. For this, the short wavelength infrared (SWIR) spectrum (1500–2500 nm spectral range) is typically chosen, which is where CH4 is most detectable. Hyper- and multispectral satellite missions (that is, those with optical instruments that capture image data within multiple wavelength ranges (bands) across the electromagnetic spectrum) cover these SWIR ranges and therefore represent potential detection instruments. Figure 1 plots the transmittance characteristics of methane in the SWIR spectrum and the SWIR coverage of various candidate multispectral satellite instruments (adapted from this study).

Figure 1 – Transmittance characteristics of methane in the SWIR spectrum and coverage of Sentinel-2 multi-spectral missions

Many multispectral satellite missions are limited either by a low revisit frequency (for example, PRISMA Hyperspectral at approximately 16 days) or by low spatial resolution (for example, Sentinel 5 at 7.5 km x 7.5 km). The cost of accessing data is an additional challenge: some dedicated constellations operate as commercial missions, potentially making CH4 emission insights less readily available to researchers, decision makers, and other concerned parties due to financial constraints. ESA’s Sentinel-2 multispectral mission, which this solution is based on, strikes an appropriate balance between revisit rate (approximately 5 days), spatial resolution (approximately 20 m) and open access (hosted on the AWS Registry of Open Data).

Sentinel-2 has two bands that cover the SWIR spectrum (at a 20 m resolution): band-11 (1610 nm central wavelength) and band-12 (2190 nm central wavelength). Both bands are suitable for methane detection, but band-12 has significantly higher sensitivity to CH4 absorption (see Figure 1). Intuitively, there are two possible approaches to using this SWIR reflectance data for methane detection. First, you could focus on just a single SWIR band (ideally the one that is most sensitive to CH4 absorption) and compute the pixel-by-pixel difference in reflectance across two different satellite passes. Alternatively, you can use data from a single satellite pass for detection by comparing the two adjacent spectral SWIR bands, which have similar surface and aerosol reflectance properties but different methane absorption characteristics.

The detection method we implement in this blog post combines both approaches. We draw on recent findings from the earth observation literature and compute the fractional change in top-of-the-atmosphere (TOA) reflectance Δρ (that is, reflectance measured by Sentinel-2 including contributions from atmospheric aerosols and gases) between two satellite passes and the two SWIR bands: one baseline pass where no methane is present (base) and one monitoring pass where an active methane point source is suspected (monitor). Mathematically, this can be expressed as follows:

\Delta\rho = \frac{c_{monitor}\,\rho_{b12,monitor} - \rho_{b11,monitor}}{\rho_{b11,monitor}} - \frac{c_{base}\,\rho_{b12,base} - \rho_{b11,base}}{\rho_{b11,base}} \quad \text{(1)}

where ρ is the TOA reflectance as measured by Sentinel-2, and c_monitor and c_base are computed by regressing the TOA reflectance values of band-12 against those of band-11 across the entire scene (that is, ρ_b11 = c · ρ_b12). For more details, refer to this study on high-frequency monitoring of anomalous methane point sources with multispectral Sentinel-2 satellite observations.

Implement a methane detection algorithm with SageMaker geospatial capabilities

To implement the methane detection algorithm, we use the SageMaker geospatial notebook within Amazon SageMaker Studio. The geospatial notebook kernel is pre-equipped with essential geospatial libraries such as GDAL, GeoPandas, Shapely, xarray, and Rasterio, enabling direct visualization and processing of geospatial data within the Python notebook environment. See the getting started guide to learn how to start using SageMaker geospatial capabilities.

SageMaker provides a purpose-built API designed to facilitate the retrieval of satellite imagery through a consolidated interface: the SearchRasterDataCollection API call. SearchRasterDataCollection relies on the following input parameters:

  • Arn: The Amazon resource name (ARN) of the queried raster data collection
  • AreaOfInterest: A polygon object (in GeoJSON format) representing the region of interest for the search query
  • TimeRangeFilter: Defines the time range of interest, denoted as {StartTime: <string>, EndTime: <string>}
  • PropertyFilters: Supplementary property filters, such as specifications for maximum acceptable cloud cover, can also be incorporated

This method supports querying from various raster data sources which can be explored by calling ListRasterDataCollections. Our methane detection implementation uses Sentinel-2 satellite imagery, which can be globally referenced using the following ARN: arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8.

This ARN represents Sentinel-2 imagery, which has been processed to Level 2A (surface reflectance, atmospherically corrected). For methane detection purposes, we will use top-of-atmosphere (TOA) reflectance data (Level 1C), which doesn’t include the surface level atmospheric corrections that would make changes in aerosol composition and density (that is, methane leaks) undetectable.
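
If you want to explore which other raster data collections are available before settling on Sentinel-2, a minimal sketch with the boto3 sagemaker-geospatial client looks like the following; the response field names may vary slightly across SDK versions.

import boto3

# The raster data collection ARN used in this post lives in us-west-2
geospatial_client = boto3.client("sagemaker-geospatial", region_name="us-west-2")

# Enumerate the collections that can be queried with SearchRasterDataCollection
collections = geospatial_client.list_raster_data_collections()
for summary in collections["RasterDataCollectionSummaries"]:
    print(summary["Name"], "->", summary["Arn"])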

To identify potential emissions from a specific point source, we need two input parameters: the coordinates of the suspected point source and a designated timestamp for methane emission monitoring. Given that the SearchRasterDataCollection API uses polygons or multi-polygons to define an area of interest (AOI), our approach involves expanding the point coordinates into a bounding box first and then using that polygon to query for Sentinel-2 imagery using SearchRasterDataCollection.

In this example, we monitor a known methane leak originating from an oil field in Northern Africa. This is a standard validation case in the remote sensing literature and is referenced, for example, in this study. A fully executable code base is provided on the amazon-sagemaker-examples GitHub repository. Here, we highlight only selected code sections that represent the key building blocks for implementing a methane detection solution with SageMaker geospatial capabilities. See the repository for additional details.

We start by initializing the coordinates and target monitoring date for the example case.

#coordinates and date for North Africa oil field
#see here for reference: https://doi.org/10.5194/amt-14-2771-2021
point_longitude = 5.9053
point_latitude = 31.6585
target_date = '2019-11-20'
#size of bounding box in each direction around point
distance_offset_meters = 1500

The following code snippet generates a bounding box for the given point coordinates and then performs a search for the available Sentinel-2 imagery based on the bounding box and the specified monitoring date:

import math
from shapely import geometry

def bbox_around_point(lon, lat, distance_offset_meters):
    #Equatorial radius (m) taken from https://nssdc.gsfc.nasa.gov/planetary/factsheet/earthfact.html
    earth_radius_meters = 6378137
    lat_offset = math.degrees(distance_offset_meters / earth_radius_meters)
    lon_offset = math.degrees(distance_offset_meters / (earth_radius_meters * math.cos(math.radians(lat))))
    return geometry.Polygon([
        [lon - lon_offset, lat - lat_offset],
        [lon - lon_offset, lat + lat_offset],
        [lon + lon_offset, lat + lat_offset],
        [lon + lon_offset, lat - lat_offset],
        [lon - lon_offset, lat - lat_offset],
    ])

#generate bounding box and extract polygon coordinates
aoi_geometry = bbox_around_point(point_longitude, point_latitude, distance_offset_meters)
aoi_polygon_coordinates = geometry.mapping(aoi_geometry)['coordinates']

#set search parameters
search_params = {
    "Arn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8", # Sentinel-2 L2 data
    "RasterDataCollectionQuery": {
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates": aoi_polygon_coordinates
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "{}T00:00:00Z".format(as_iso_date(target_date)),
            "EndTime": "{}T23:59:59Z".format(as_iso_date(target_date))
        }
    },
}
#query raster data using SageMaker geospatial capabilities
sentinel2_items = geospatial_client.search_raster_data_collection(**search_params)

The response contains a list of matching Sentinel-2 items and their corresponding metadata. These include Cloud-Optimized GeoTIFFs (COG) for all Sentinel-2 bands, as well as thumbnail images for a quick preview of the visual bands of the image. Naturally, it’s also possible to access the full-resolution satellite image (RGB plot), shown in Figure 2 that follows.

Figure 2 – Satellite image (RGB plot) of AOI
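
Before moving on, you can inspect what the search returned by iterating over the matched scenes and printing their per-band asset links. The field names below reflect our reading of the SearchRasterDataCollection response shape and may vary slightly across SDK versions.

# Inspect the matched Sentinel-2 scenes and their per-band COG links
for item in sentinel2_items.get("Items", []):
    print("Scene:", item["Id"], "acquired:", item["DateTime"])
    for band_name, asset in item.get("Assets", {}).items():
        print(f"  {band_name}: {asset['Href']}")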

As previously detailed, our detection approach relies on fractional changes in top-of-the-atmosphere (TOA) SWIR reflectance. For this to work, the identification of a good baseline is crucial. Finding a good baseline can quickly become a tedious process that involves plenty of trial and error. However, good heuristics can go a long way in automating this search process. A search heuristic that has worked well for cases investigated in the past is as follows: for the past day_offset=n days, retrieve all satellite imagery, remove any clouds and clip the image to the AOI in scope. Then compute the average band-12 reflectance across the AOI. Return the Sentinel tile ID of the image with the highest average reflectance in band-12.

This logic is implemented in the following code excerpt. Its rationale relies on the fact that band-12 is highly sensitive to CH4 absorption (see Figure 1). A greater average reflectance value corresponds to lower absorption from sources such as methane emissions and therefore provides a strong indication of an emission-free baseline scene.

from datetime import datetime, timedelta

def approximate_best_reference_date(lon, lat, date_to_monitor, distance_offset=1500, cloud_mask=True, day_offset=30):
    #initialize AOI and other parameters
    aoi_geometry = bbox_around_point(lon, lat, distance_offset)
    BAND_12_SWIR22 = "B12"
    max_mean_swir = None
    ref_s2_tile_id = None
    ref_target_date = date_to_monitor   
        
    #loop over n=day_offset previous days
    for day_delta in range(-1 * day_offset, 0):
        date_time_obj = datetime.strptime(date_to_monitor, '%Y-%m-%d')
        target_date = (date_time_obj + timedelta(days=day_delta)).strftime('%Y-%m-%d')   
        
        #get Sentinel-2 tiles for current date
        s2_tiles_for_target_date = get_sentinel2_meta_data(target_date, aoi_geometry)
        
        #loop over available tiles for current date
        for s2_tile_meta in s2_tiles_for_target_date:
            s2_tile_id_to_test = s2_tile_meta['Id']
            #retrieve cloud-masked (optional) L1C band 12
            target_band_data = get_s2l1c_band_data_xarray(s2_tile_id_to_test, BAND_12_SWIR22, clip_geometry=aoi_geometry, cloud_mask=cloud_mask)
            #compute mean reflectance of SWIR band
            mean_swir = target_band_data.sum() / target_band_data.count()
            
            #ensure the visible/non-clouded area is adequately large
            visible_area_ratio = target_band_data.count() / (target_band_data.shape[1] * target_band_data.shape[2])
            if visible_area_ratio <= 0.7: #<-- ensure acceptable cloud cover
                continue
            
            #update maximum ref_s2_tile_id and ref_target_date if applicable
            if max_mean_swir is None or mean_swir > max_mean_swir: 
                max_mean_swir = mean_swir
                ref_s2_tile_id = s2_tile_id_to_test
                ref_target_date = target_date

    return (ref_s2_tile_id, ref_target_date)

Using this method allows us to approximate a suitable baseline date and corresponding Sentinel-2 tile ID. Sentinel-2 tile IDs carry information on the mission ID (Sentinel-2A/Sentinel-2B), the unique tile number (such as 32SKA), and the date the image was taken, among other information, and they uniquely identify an observation (that is, a scene). In our example, the approximation process suggests October 6, 2019 (Sentinel-2 tile: S2B_32SKA_20191006_0_L2A) as the most suitable baseline candidate.
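
A small helper like the following can split such a tile ID into its components, assuming the underscore-delimited pattern shown above (mission, grid tile, acquisition date, sequence number, processing level):

def parse_s2_tile_id(tile_id):
    """Split a Sentinel-2 tile ID such as 'S2B_32SKA_20191006_0_L2A' into its parts."""
    mission, grid_tile, acquisition_date, sequence, processing_level = tile_id.split("_")
    return {
        "mission": mission,                    # S2A or S2B
        "grid_tile": grid_tile,                # MGRS tile number, for example 32SKA
        "acquisition_date": acquisition_date,  # YYYYMMDD
        "sequence": sequence,
        "processing_level": processing_level,  # for example L2A
    }

print(parse_s2_tile_id("S2B_32SKA_20191006_0_L2A"))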

Next, we can compute the corrected fractional change in reflectance between the baseline date and the date we’d like to monitor. The correction factors c (see Equation 1 preceding) can be calculated with the following code:

import numpy as np

def compute_correction_factor(tif_y, tif_x):
    #get flattened arrays for regression
    y = np.array(tif_y.values.flatten())
    x = np.array(tif_x.values.flatten())
    np.nan_to_num(y, copy=False)
    np.nan_to_num(x, copy=False)
        
    #fit linear model using least squares regression
    x = x[:,np.newaxis] #reshape
    c, _, _, _ = np.linalg.lstsq(x, y, rcond=None)

    return c[0]

The full implementation of Equation 1 is given in the following code snippet:

def compute_corrected_fractional_reflectance_change(l1_b11_base, l1_b12_base, l1_b11_monitor, l1_b12_monitor):
    
    #get correction factors
    c_monitor = compute_correction_factor(tif_y=l1_b11_monitor, tif_x=l1_b12_monitor)
    c_base = compute_correction_factor(tif_y=l1_b11_base, tif_x=l1_b12_base)
    
    #get corrected fractional reflectance change
    frac_change = ((c_monitor*l1_b12_monitor-l1_b11_monitor)/l1_b11_monitor)-((c_base*l1_b12_base-l1_b11_base)/l1_b11_base)
    return frac_change

Finally, we can wrap the above methods into an end-to-end routine that identifies the AOI for a given longitude and latitude, monitoring date and baseline tile, acquires the required satellite imagery, and performs the fractional reflectance change computation.

def run_full_fractional_reflectance_change_routine(lon, lat, date_monitor, baseline_s2_tile_id, distance_offset=1500, cloud_mask=True):
    
    #get bounding box
    aoi_geometry = bbox_around_point(lon, lat, distance_offset)
    
    #get S2 metadata
    s2_meta_monitor = get_sentinel2_meta_data(date_monitor, aoi_geometry)
    
    #get tile id
    grid_id = baseline_s2_tile_id.split("_")[1]
    s2_tile_id_monitor = list(filter(lambda x: f"_{grid_id}_" in x["Id"], s2_meta_monitor))[0]["Id"]
    
    #retrieve band 11 and 12 of the Sentinel L1C product for the given S2 tiles
    l1_swir16_b11_base = get_s2l1c_band_data_xarray(baseline_s2_tile_id, BAND_11_SWIR16, clip_geometry=aoi_geometry, cloud_mask=cloud_mask)
    l1_swir22_b12_base = get_s2l1c_band_data_xarray(baseline_s2_tile_id, BAND_12_SWIR22, clip_geometry=aoi_geometry, cloud_mask=cloud_mask)
    l1_swir16_b11_monitor = get_s2l1c_band_data_xarray(s2_tile_id_monitor, BAND_11_SWIR16, clip_geometry=aoi_geometry, cloud_mask=cloud_mask)
    l1_swir22_b12_monitor = get_s2l1c_band_data_xarray(s2_tile_id_monitor, BAND_12_SWIR22, clip_geometry=aoi_geometry, cloud_mask=cloud_mask)
    
    #compute corrected fractional reflectance change
    frac_change = compute_corrected_fractional_reflectance_change(
        l1_swir16_b11_base,
        l1_swir22_b12_base,
        l1_swir16_b11_monitor,
        l1_swir22_b12_monitor
    )
    
    return frac_change

Running this method with the parameters we determined earlier yields the fractional change in SWIR TOA reflectance as an xarray.DataArray. We can perform a first visual inspection of the result by running a simple plot() invocation on this data array. Our method reveals the presence of a methane plume at the center of the AOI that was undetectable in the RGB plot seen previously.
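
For example, assuming frac_change holds the data array returned by the routine above, a quick visual check takes only a few lines; the colormap and the squeeze() call (to drop the band dimension) are our additions.

import matplotlib.pyplot as plt

# frac_change is the xarray.DataArray returned by run_full_fractional_reflectance_change_routine()
frac_change.squeeze().plot(cmap="RdBu", robust=True)
plt.title("Fractional change in TOA SWIR reflectance")
plt.show()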

Figure 3 – Fractional reflectance change in TOA reflectance (SWIR spectrum)

As a final step, we extract the identified methane plume and overlay it on a raw RGB satellite image to provide the important geographic context. This is achieved by thresholding, which can be implemented as shown in the following:

def get_plume_mask(change_in_reflectance_tif, threshold_value):
    cr_masked = change_in_reflectance_tif.copy()
    #set values above the threshold to NaN (these pixels are not part of the plume)
    cr_masked[cr_masked > threshold_value] = np.nan
    #mask the NaN values so that only the plume pixels remain
    plume_tif = np.ma.array(cr_masked, mask=np.isnan(cr_masked))

    return plume_tif

For our case, a threshold of -0.02 fractional change in reflectance yields good results, but this can change from scene to scene and you will have to calibrate it for your specific use case. Figure 4 that follows illustrates how the plume overlay is generated by combining the raw satellite image of the AOI with the masked plume into a single composite image that shows the methane plume in its geographic context.
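
One way to produce such a composite, assuming an rgb_image array of the clipped AOI scene that covers the same extent and resolution as the reflectance change array, is sketched below; the variable names, colormap, and transparency are illustrative.

import matplotlib.pyplot as plt

# Hypothetical input: rgb_image is an (H, W, 3) array of the clipped AOI scene
plume = get_plume_mask(frac_change.squeeze().values, threshold_value=-0.02)

fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(rgb_image)                          # raw RGB satellite scene as the backdrop
ax.imshow(plume, cmap="inferno", alpha=0.6)   # semi-transparent methane plume overlay
ax.set_title("Methane plume overlay for AOI")
ax.axis("off")
plt.show()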

Figure 4 – RGB image, fractional reflectance change in TOA reflectance (SWIR spectrum), and methane plume overlay for AOI

Solution validation with real-world methane emission events

As a final step, we evaluate our method for its ability to correctly detect and pinpoint methane leakages from a range of sources and geographies. First, we use a controlled methane release experiment specifically designed for the validation of space-based point-source detection and quantification of onshore methane emissions. In this 2021 experiment, researchers performed several methane releases in Ehrenberg, Arizona over a 19-day period. Running our detection method for one of the Sentinel-2 passes during the time of that experiment produces the following result showing a methane plume:

Figure 5 – Methane plume intensities for Arizona Controlled Release Experiment

The plume generated during the controlled release is clearly identified by our detection method. The same is true for other known real-world leakages (in Figure 6 that follows) from sources such as a landfill in East Asia (left) or an oil and gas facility in North America (right).

Figure 6 – Methane plume intensities for an East Asian landfill (left) and an oil and gas field in North America (right)

In sum, our method can help identify methane emissions both from controlled releases and from various real-world point sources across the globe. This works best for on-shore point sources with limited surrounding vegetation. It does not work for off-shore scenes due to the high absorption (that is, low transmittance) of the SWIR spectrum by water. Given that the proposed detection algorithm relies on variations in methane intensity, our method also requires pre-leakage observations. This can make monitoring of leakages with constant emission rates challenging.

Clean up

To avoid incurring unwanted charges after a methane monitoring job has completed, ensure that you terminate the SageMaker instance and delete any unwanted local files.

Conclusion

By combining SageMaker geospatial capabilities with open geospatial data sources, you can implement your own highly customized remote monitoring solutions at scale. This blog post focused on methane detection, a focal area for governments, NGOs, and other organizations seeking to detect and ultimately avoid harmful methane emissions. You can get started today on your own journey into geospatial analytics by spinning up a notebook with the SageMaker geospatial kernel and implementing your own detection solution. See the GitHub repository to get started building your own satellite-based methane detection solution. Also check out the sagemaker-examples repository for further examples and tutorials on how to use SageMaker geospatial capabilities in other real-world remote sensing applications.


About the authors

Dr. Karsten Schroer is a Solutions Architect at AWS. He supports customers in leveraging data and technology to drive sustainability of their IT infrastructure and build cloud-native data-driven solutions that enable sustainable operations in their respective verticals. Karsten joined AWS following his PhD studies in applied machine learning & operations management. He is truly passionate about technology-enabled solutions to societal challenges and loves to dive deep into the methods and application architectures that underlie these solutions.

Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in geospatial AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions that capitalize on geospatial data. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in complex domains such as autonomous driving.

Read More