Index your web crawled content using the new Web Crawler for Amazon Kendra

Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.

Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to provide you with a fully managed experience and simplify the process of indexing your content from a variety of data sources in the enterprise.

Internal and external websites are one such unstructured data repository. Sites may need to be crawled to create news feeds, analyze language use, or build bots that answer questions based on website content.

We’re excited to announce the new Amazon Kendra Web Crawler, which you can use to index content stored in internal and external websites and search it with Amazon Kendra intelligent search, or to power chatbots. In this post, we show how to index information stored in websites and use Amazon Kendra to search for answers from that content. The ML-powered intelligent search can accurately answer questions from unstructured documents with natural language narrative content, for which keyword search is not very effective.

The Web Crawler offers the following new features:

  • Support for Basic, NTLM/Kerberos, Form, and SAML authentication
  • The ability to specify up to 100 seed URLs and to store the connection configuration in Amazon Simple Storage Service (Amazon S3)
  • Support for a web and internet proxy with the ability to provide proxy credentials
  • Support for crawling dynamic content, such as a website containing JavaScript
  • Field mapping and regex filtering features

Solution overview

With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to index a crawled website using the Amazon Kendra Web Crawler. The solution consists of the following steps:

  1. Choose an authentication mechanism for the website (if required) and store the details in AWS Secrets Manager.
  2. Create an Amazon Kendra index.
  3. Create a Web Crawler data source V2 via the Amazon Kendra console.
  4. Run a sample query to test the solution.
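If you prefer to script these steps rather than use the console, the same flow can be driven with the AWS SDK. The following sketch (using boto3; the index name and role ARN are placeholders you would supply) creates a Developer-edition index:

```python
def index_request(name: str, role_arn: str, edition: str = "DEVELOPER_EDITION") -> dict:
    """Build the CreateIndex request parameters."""
    if edition not in ("DEVELOPER_EDITION", "ENTERPRISE_EDITION"):
        raise ValueError(f"unknown edition: {edition}")
    return {"Name": name, "RoleArn": role_arn, "Edition": edition}

def create_index(name: str, role_arn: str) -> str:
    """Create the index and return its ID; creation can take up to 30 minutes."""
    import boto3  # requires AWS credentials at call time
    kendra = boto3.client("kendra")
    return kendra.create_index(**index_request(name, role_arn))["Id"]
```

CreateIndex is asynchronous, so poll DescribeIndex until the index status is ACTIVE before adding a data source.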

Prerequisites

To try out the Amazon Kendra Web Crawler, you need the following:

Gather authentication details

For protected and secure websites, the following authentication types and standards are supported:

  • Basic
  • NTLM/Kerberos
  • Form authentication
  • SAML

You need the authentication information when you set up the data source.

For Basic or NTLM authentication, you need to provide your Secrets Manager secret, user name, and password.

Form and SAML authentication require additional information, as shown in the following screenshot. Some fields, such as User name button XPath, are optional and depend on whether the site you’re crawling uses a button after entering the user name. You also need to know how to determine the XPath of the user name and password fields and the submit buttons.
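As a sketch of what the secret might contain, the following helper serializes the credential fields as a JSON string before storing them in Secrets Manager. The key names (userName, password, and the XPath fields) are illustrative assumptions; use the exact field names the connector console shows for your authentication type.

```python
import json

# Illustrative required fields per authentication type (assumed key names)
REQUIRED_FIELDS = {
    "basic": {"userName", "password"},
    "form": {"userName", "password", "userNameFieldXpath",
             "passwordFieldXpath", "passwordButtonXpath"},
}

def build_auth_secret(auth_type: str, **fields) -> str:
    """Serialize crawler credentials as the JSON string stored in the secret."""
    missing = REQUIRED_FIELDS[auth_type] - fields.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return json.dumps(fields)

def store_secret(name: str, secret_string: str) -> str:
    """Create the secret and return its ARN (requires AWS credentials)."""
    import boto3
    return boto3.client("secretsmanager").create_secret(
        Name=name, SecretString=secret_string
    )["ARN"]
```

You then reference the secret's ARN when configuring the data source.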


Create an Amazon Kendra index

To create an Amazon Kendra index, complete the following steps:

  1. On the Amazon Kendra console, choose Create an Index.
  2. For Index name, enter a name for the index (for example, Web Crawler).
  3. Enter an optional description.
  4. For Role name, enter an IAM role name.
  5. Configure optional encryption settings and tags.
  6. Choose Next.
  7. In the Configure user access control section, leave the settings at their defaults and choose Next.
  8. For Provisioning editions, select Developer edition and choose Next.
  9. On the review page, choose Create.

This creates and propagates the IAM role and then creates the Amazon Kendra index, which can take up to 30 minutes.


Create an Amazon Kendra Web Crawler data source

Complete the following steps to create your data source:

  1. On the Amazon Kendra console, choose Data sources in the navigation pane.
  2. Locate the WebCrawler connector V2.0 tile and choose Add connector.
  3. For Data source name, enter a name (for example, crawl-fda).
  4. Enter an optional description.
  5. Choose Next.
  6. In the Source section, select Source URL and enter a URL. For this post, we use https://www.fda.gov/ as an example source URL.
  7. In the Authentication section, choose the appropriate authentication based on the site you want to crawl. For this post, we select No authentication because it’s a public site and doesn’t need authentication.
  8. In the Web proxy section, you can specify a Secrets Manager secret (if required).
    1. Choose Create and Add New Secret.
    2. Enter the authentication details that you gathered previously.
    3. Choose Save.
  9. In the IAM role section, choose Create a new role and enter a name (for example, AmazonKendra-Web Crawler-datasource-role).
  10. Choose Next.
  11. In the Sync scope section, configure your sync settings based on the site you are crawling. For this post, we leave all the default settings.
  12. For Sync mode, choose how you want to update your index. For this post, we select Full sync.
  13. For Sync run schedule, choose Run on demand.
  14. Choose Next.
  15. Optionally, you can set field mappings. For this post, we keep the defaults for now.

Mapping fields is a useful exercise in which you can substitute field names with user-friendly values that fit your organization’s vocabulary.

  16. Choose Next.
  17. Choose Add data source.
  18. To sync the data source, choose Sync now on the data source details page.
  19. Wait for the sync to complete.
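The on-demand sync can also be triggered programmatically. A minimal sketch (assuming boto3 and existing index and data source IDs) starts a sync job and polls until it reaches a terminal status:

```python
import time

def run_sync(index_id: str, data_source_id: str, poll_seconds: int = 30) -> str:
    """Start an on-demand sync job and wait for a terminal status."""
    import boto3  # requires AWS credentials at call time
    kendra = boto3.client("kendra")
    execution_id = kendra.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )["ExecutionId"]
    while True:
        history = kendra.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id
        )["History"]
        # Find our job in the sync history and check whether it has finished
        job = next(j for j in history if j["ExecutionId"] == execution_id)
        if job["Status"] in ("SUCCEEDED", "FAILED", "ABORTED", "INCOMPLETE"):
            return job["Status"]
        time.sleep(poll_seconds)
```

Sync duration depends on the size of the site and the crawl scope you configured.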

Example of an authenticated website

If you want to crawl a site that requires authentication, specify the authentication details in the Authentication section described in the previous steps. The following example uses Form authentication.

  1. In the Source section, select Source URL and enter a URL. For this example, we use https://accounts.autodesk.com.
  2. In the Authentication section, select Form authentication.
  3. In the Web proxy section, specify your Secrets Manager secret. This is required for any option other than No authentication.
    1. Choose Create and Add New Secret.
    2. Enter the authentication details that you gathered previously.
    3. Choose Save.


Test the solution

Now that you have ingested the content from the site into your Amazon Kendra index, you can test some queries.

  1. Go to your index and choose Search indexed content.
  2. Enter a sample search query and test your search results (results will vary based on the content of the site you crawled and the query you enter).

Congratulations! You have successfully used Amazon Kendra to surface answers and insights based on the content indexed from the site you crawled.
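The same search is available through the Kendra Query API. A minimal sketch (assuming boto3 and your index ID) returns the top results:

```python
def search(index_id: str, query_text: str, top_n: int = 3) -> list:
    """Query the index and return (type, title, URI) tuples for the top results."""
    import boto3  # requires AWS credentials at call time
    kendra = boto3.client("kendra")
    result = kendra.query(IndexId=index_id, QueryText=query_text)
    return [
        (
            item["Type"],  # e.g., ANSWER or DOCUMENT
            item.get("DocumentTitle", {}).get("Text"),
            item.get("DocumentURI"),
        )
        for item in result["ResultItems"][:top_n]
    ]
```

This is useful when embedding search into an application rather than using the console's search page.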

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra Web Crawler V2, delete that data source.

Conclusion

With the new Amazon Kendra Web Crawler V2, organizations can crawl any website that is public or behind authentication and use it for intelligent search powered by Amazon Kendra.

To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide. For more information on how you can create, modify, or delete metadata and content when ingesting your data, refer to Enriching your documents during ingestion and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.


About the Author

Jiten Dedhia is a Sr. Solutions Architect with over 20 years of experience in the software industry. He has worked with global financial services clients, advising them on modernizing with AWS services.

Gunwant Walbe is a Software Development Engineer at Amazon Web Services. He is an avid learner and keen to adopt new technologies. He develops complex business applications, and Java is his primary language of choice.

New – No-code generative AI capabilities now available in Amazon SageMaker Canvas

Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service that allows business analysts and citizen data scientists to use ready-to-use machine learning (ML) models and build custom ML models to generate accurate predictions without the need to write any code. Ready-to-use models enable you to derive immediate insights from text, image, and document data (such as sentiment analysis, document processing, or object detection in images). Custom models allow you to build predictive models for use cases such as demand forecasting, customer churn, and defect detection in manufacturing.

We are excited to announce that SageMaker Canvas is expanding its support of ready-to-use models to include foundation models (FMs), enabling you to use generative AI to generate and summarize content. You can use natural language with a conversational chat interface to perform tasks such as creating narratives, reports, and blog posts; answering questions; summarizing notes and articles; and explaining concepts, without writing a single line of code. Your data is not used to improve the base models, is not shared with third-party model providers, and stays entirely within your secure AWS environment.

SageMaker Canvas allows you to access a variety of FMs, including Amazon Bedrock models (such as Claude 2 from Anthropic and Jurassic-2 from AI21 Labs) and publicly available Amazon SageMaker JumpStart models (including Falcon-7B-Instruct, Falcon-40B-Instruct, and MPT-7B-Instruct). You can use a single model or up to three models to compare model responses side by side. In SageMaker Canvas, Amazon Bedrock models are always active, allowing you to use them instantly. SageMaker JumpStart models can be started and deployed in your AWS account on demand and are automatically shut down after two hours of inactivity.

Let’s explore how to use the generative AI capabilities of SageMaker Canvas. For this post, we work with a fictitious enterprise customer support use case as an example.

Prerequisites

Complete the following prerequisite steps:

  1. Create an AWS account.
  2. Set up SageMaker Canvas and optionally configure it to use a VPC without internet access.
  3. Set up model access in Amazon Bedrock.
  4. Request service quota increases for g5.12xlarge and g5.2xlarge, if required, in your Region. These instances are required to host the SageMaker JumpStart model endpoints. Other instances may be selected based on availability.
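Quota increases can also be requested through the Service Quotas API. In the following sketch, the quota code passed by the caller is a hypothetical placeholder; look up the real code for your instance type first (for example, via list_service_quotas):

```python
def request_quota_increase(quota_code: str, desired_value: float) -> str:
    """Request a SageMaker quota increase and return the request ID.

    quota_code must be looked up beforehand (e.g., via list_service_quotas);
    any value shown in examples here is an assumption, not a real code.
    """
    import boto3  # requires AWS credentials at call time
    sq = boto3.client("service-quotas")
    resp = sq.request_service_quota_increase(
        ServiceCode="sagemaker", QuotaCode=quota_code, DesiredValue=desired_value
    )
    return resp["RequestedQuota"]["Id"]
```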

Handling customer complaints

Let’s say that you’re a customer support analyst who handles complaints for a bicycle company. When receiving a customer complaint, you can use SageMaker Canvas to analyze the complaint and generate a personalized response to the customer. To do so, complete the following steps:

  1. On the SageMaker console, choose Canvas in the navigation pane.
  2. Choose your domain and user profile and choose Open Canvas to open the SageMaker Canvas application.

SageMaker Canvas is also accessible using single sign-on or other existing identity providers (IdPs) without having to first access the SageMaker console.

  3. Choose Generate, extract and summarize content to open the chat console.
  4. With the Claude 2 model selected, enter your instructions to retrieve the customer sentiment for the provided complaint and press Enter.
  5. You may want to know the specific problems with the bicycle, especially if it’s a long complaint, so ask for the problems with the bicycle. Note that you don’t have to repost the complaint, because SageMaker Canvas stores the context of your chat.

Now that you understand the customer’s problem, you can send them a response, including a link to the company’s feedback form.

  6. In the input window, request a response to the customer complaint.
  7. If you want to generate another response from the FM, choose the refresh icon in the response section.

The original response and all new responses are paginated within the response section. Note that the new response is different from the original response. You can choose the copy icon in the response section to copy the response to an email or document, as required.

  8. You can also modify the model’s response by requesting specific changes. For example, let’s ask the model to add a $50 gift card offer to the email response.

Comparing model responses

You can compare the model responses from multiple models (up to three). Let’s compare two Amazon Bedrock models (Claude 2 and Jurassic-2 Ultra) with a SageMaker JumpStart model (Falcon-7B-Instruct) to evaluate and find the best model for your use case:

  1. Choose New chat to open a chat interface.
  2. On the model drop-down menu, choose Start up another model.
  3. On the Foundation models page, under Amazon SageMaker JumpStart models, choose Falcon-7B-Instruct and in the right pane, choose Start up model.

The model will take around 10 minutes to start.

  4. On the Foundation models page, confirm that the Falcon-7B-Instruct model is active before proceeding to the next step.
  5. Choose New chat to open a chat interface.
  6. Choose Compare to display a drop-down menu for the second model, then choose Compare again to display a drop-down menu for the third model.
  7. Choose the Falcon-7B-Instruct model on the first drop-down menu, Claude 2 on the second, and Jurassic-2 Ultra on the third.
  8. Enter your instructions in the chat input box and press Enter.

You will see responses from all three models.

Clean up

Any SageMaker JumpStart models started from SageMaker Canvas will be automatically shut down after 2 hours of inactivity. If you want to shut down these models sooner to save costs, follow the instructions in this section. Note that Amazon Bedrock models are not deployed in your account, so there is no need to shut these down.

  1. To shut down the Falcon-7B-Instruct SageMaker JumpStart model, you can choose from two methods:
    1. On the results comparison page, choose the Falcon-7B-Instruct model’s options menu (three dots), then choose Shut down model.
    2. Alternatively, choose New chat, and on the model drop-down menu, choose Start up another model. Then, on the Foundation models page, under Amazon SageMaker JumpStart models, choose Falcon-7B-Instruct and in the right pane, choose Shut down model.
  2. Choose Log out in the left pane to log out of the SageMaker Canvas application to stop the consumption of SageMaker Canvas workspace instance hours and release all resources used by the workspace instance.

Conclusion

In this post, you learned how to use SageMaker Canvas to generate text with ready-to-use models from Amazon Bedrock and SageMaker JumpStart. You used the Claude 2 model to analyze the sentiment of a customer complaint, ask questions, and generate a response without a single line of code. You also started a publicly available model and compared responses from three models.

For Amazon Bedrock models, you are charged based on the volume of input tokens and output tokens as per the Amazon Bedrock pricing page. Because SageMaker JumpStart models are deployed on SageMaker instances, you are charged for the duration of usage based on the instance type as per the Amazon SageMaker pricing page.

SageMaker Canvas continues to democratize AI with a no-code visual, interactive workspace that allows business analysts to build ML models that address a wide variety of use cases. Try out the new generative AI capabilities in SageMaker Canvas today! These capabilities are available in all Regions where Amazon Bedrock or SageMaker JumpStart are available.


About the Authors

Anand Iyer has been a Principal Solutions Architect at AWS since 2016. Anand has helped global healthcare, financial services, and telecommunications clients architect and implement enterprise software solutions using AWS and hybrid cloud technologies. He has an MS in Computer Science from Louisiana State University Baton Rouge, and an MBA from USC Marshall School of Business, Los Angeles. He is AWS certified in the areas of Security, Solutions Architecture, and DevOps Engineering.

Gavin Satur is a Principal Solutions Architect at Amazon Web Services. He works with enterprise customers to build strategic, well-architected solutions and is passionate about automation. Outside of work, he enjoys family time, tennis, cooking, and traveling.

Gunjan Jain is an AWS Solutions Architect in SoCal and primarily works with large financial services companies. He helps with cloud adoption, cloud optimization, and adopting best practices for being Well-Architected on the cloud.

Harpreet Dhanoa, a seasoned Senior Solutions Architect at AWS, has a strong background in designing and building scalable distributed systems. He is passionate about machine learning, observability, and analytics. He enjoys helping large-scale customers build their cloud enterprise strategy and transform their business in AWS. In his free time, Harpreet enjoys playing basketball with his two sons and spending time with his family.

Whisper models for automatic speech recognition now available in Amazon SageMaker JumpStart

Today, we’re excited to announce that the OpenAI Whisper foundation model is available for customers using Amazon SageMaker JumpStart. Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680,000 hours of labeled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. SageMaker JumpStart is the machine learning (ML) hub of SageMaker that provides access to foundation models in addition to built-in algorithms and end-to-end solution templates to help you quickly get started with ML.

You can also perform ASR using Amazon Transcribe, a fully managed and continuously trained automatic speech recognition service.

In this post, we show you how to deploy the OpenAI Whisper model and invoke the model to transcribe and translate audio.

The OpenAI Whisper model uses the huggingface-pytorch-inference container. As a SageMaker JumpStart model hub customer, you can use ASR without having to maintain the model script outside of the SageMaker SDK. SageMaker JumpStart models also improve security posture with endpoints that enable network isolation.

Foundation models in SageMaker

SageMaker JumpStart provides access to a range of models from popular model hubs including Hugging Face, PyTorch Hub, and TensorFlow Hub, which you can use within your ML development workflow in SageMaker. Recent advances in ML have given rise to a new class of models known as foundation models, which are typically trained on billions of parameters and can be adapted to a wide category of use cases, such as text summarization, generating digital art, and language translation. Because these models are expensive to train, customers want to use existing pre-trained foundation models and fine-tune them as needed, rather than train these models themselves. SageMaker provides a curated list of models that you can choose from on the SageMaker console.

You can now find foundation models from different model providers within SageMaker JumpStart, enabling you to get started with foundation models quickly. SageMaker JumpStart offers foundation models based on different tasks or model providers, and you can easily review model characteristics and usage terms. You can also try these models using a test UI widget. When you want to use a foundation model at scale, you can do so without leaving SageMaker by using pre-built notebooks from model providers. Because the models are hosted and deployed on AWS, you trust that your data, whether used for evaluating or using the model at scale, won’t be shared with third parties.

OpenAI Whisper foundation models

Whisper is a pre-trained model for ASR and speech translation. Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, and others, from OpenAI. The original code can be found in this GitHub repository.

Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. It was trained on 680,000 hours of labeled speech data annotated using large-scale weak supervision. Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning.

The models were trained on either English-only data or multilingual data. The English-only models were trained on the task of speech recognition. The multilingual models were trained on both speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the same language as the audio. For speech translation, the model predicts transcriptions in a different language from the audio.

Whisper checkpoints come in five configurations of varying model sizes. The smallest four are trained on either English-only or multilingual data. The largest checkpoints are multilingual only. All ten of the pre-trained checkpoints are available on the Hugging Face Hub. The checkpoints are summarized in the following table:

Model name Number of parameters Multilingual
whisper-tiny 39 M Yes
whisper-base 74 M Yes
whisper-small 244 M Yes
whisper-medium 769 M Yes
whisper-large 1550 M Yes
whisper-large-v2 1550 M Yes

Let’s explore how you can use Whisper models in SageMaker JumpStart.

OpenAI Whisper foundation models WER and latency comparison

The word error rate (WER) for different OpenAI Whisper models based on the LibriSpeech test-clean is shown in the following table.  WER is a common metric for the performance of a speech recognition or machine translation system. It measures the difference between the reference text (the ground truth or the correct transcription) and the output of an ASR system in terms of the number of errors, including substitutions, insertions, and deletions that are needed to transform the ASR output into the reference text. These numbers have been taken from the Hugging Face website.

Model WER (percent)
whisper-tiny 7.54
whisper-base 5.08
whisper-small 3.43
whisper-medium 2.9
whisper-large 3
whisper-large-v2 3
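For intuition, WER is the word-level edit distance between the ASR output and the reference, divided by the number of reference words. A self-contained sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance (substitutions, insertions,
    deletions) divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[j] holds the edit distance between ref[:i] and hyp[:j] (one DP row)
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            prev_diag, d[j] = d[j], min(
                d[j] + 1,                                 # deletion
                d[j - 1] + 1,                             # insertion
                prev_diag + (ref[i - 1] != hyp[j - 1]),   # substitution
            )
    return d[-1] / len(ref)
```

Note that published WER figures typically normalize text (casing, punctuation) before scoring, which this sketch omits.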

For this post, we took the following audio file and compared the latency of speech recognition across different Whisper models. Latency is the amount of time from the moment a user sends a request until your application indicates the request is complete. The numbers in the following table represent the average latency for 100 requests using the same audio file, with the model hosted on an ml.g5.2xlarge instance.

Model Average latency(s) Model output
whisper-tiny 0.43 We are living in very exciting times with machine lighting. The speed of ML model development will really actually increase. But you won’t get to that end state that we won in the next coming years. Unless we actually make these models more accessible to everybody.
whisper-base 0.49 We are living in very exciting times with machine learning. The speed of ML model development will really actually increase. But you won’t get to that end state that we won in the next coming years. Unless we actually make these models more accessible to everybody.
whisper-small 0.84 We are living in very exciting times with machine learning. The speed of ML model development will really actually increase. But you won’t get to that end state that we want in the next coming years unless we actually make these models more accessible to everybody.
whisper-medium 1.5 We are living in very exciting times with machine learning. The speed of ML model development will really actually increase. But you won’t get to that end state that we want in the next coming years unless we actually make these models more accessible to everybody.
whisper-large 1.96 We are living in very exciting times with machine learning. The speed of ML model development will really actually increase. But you won’t get to that end state that we want in the next coming years unless we actually make these models more accessible to everybody.
whisper-large-v2 1.98 We are living in very exciting times with machine learning. The speed of ML model development will really actually increase. But you won’t get to that end state that we want in the next coming years unless we actually make these models more accessible to everybody.

Solution walkthrough

You can deploy Whisper models using the Amazon SageMaker console or using an Amazon SageMaker Notebook. In this post, we demonstrate how to deploy the Whisper API using the SageMaker Studio console or a SageMaker Notebook and then use the deployed model for speech recognition and language translation. The code used in this post can be found in this GitHub notebook.

Let’s expand each step in detail.

Deploy Whisper from the console

  1. To get started with SageMaker JumpStart, open the Amazon SageMaker Studio console and go to the launch page of SageMaker JumpStart and select Get Started with JumpStart.
  2. To choose a Whisper model, you can either use the tabs at the top or use the search box at the top right as shown in the following screenshot. For this example, use the search box on the top right and enter Whisper, and then select the appropriate Whisper model from the dropdown menu.
  3. After you select the Whisper model, you can use the console to deploy the model. You can select an instance for deployment or use the default.

Deploy the foundation model from a SageMaker notebook

The steps to first deploy and then use the deployed model to solve different tasks are:

  1. Set up
  2. Select a model
  3. Retrieve artifacts and deploy an endpoint
  4. Use deployed model for ASR
  5. Use deployed model for language translation
  6. Clean up the endpoint

Set up

This notebook was tested on an ml.t3.medium instance in SageMaker Studio with the Python 3 (data science) kernel and in an Amazon SageMaker Notebook instance with the conda_python3 kernel.

%pip install --upgrade sagemaker --quiet

Select a pre-trained model

Set up a SageMaker Session using Boto3, and then select the model ID that you want to deploy.

model_id = "huggingface-asr-whisper-large-v2"

Retrieve artifacts and deploy an endpoint

Using SageMaker, you can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. To host the pre-trained model, create an instance of the JumpStartModel class and deploy it. The following code uses the default instance, ml.g5.2xlarge, for the inference endpoint of a whisper-large-v2 model. You can deploy the model on other instance types by passing instance_type to the JumpStartModel class. Deployment might take a few minutes.

# Deploy the model

from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.serializers import JSONSerializer

my_model = JumpStartModel(model_id=model_id)
predictor = my_model.deploy()

Automatic speech recognition

Next, you read the sample audio file, sample1.wav, from a SageMaker JumpStart public Amazon Simple Storage Service (Amazon S3) location and pass it to the predictor for speech recognition. You can replace this sample file with any other audio file, but make sure the .wav file is sampled at 16 kHz, because that is required by the automatic speech recognition models. The input audio file must be less than 30 seconds long.

import boto3
from sagemaker.jumpstart import utils

# The .wav file must be sampled at 16 kHz (required by the automatic speech
# recognition models), so resample it if necessary. The input audio must be
# less than 30 seconds.
s3_bucket = utils.get_jumpstart_content_bucket(boto3.Session().region_name)
key_prefix = "training-datasets/asr_notebook_data"
input_audio_file_name = "sample1.wav"

s3_client = boto3.client("s3")
s3_client.download_file(s3_bucket, f"{key_prefix}/{input_audio_file_name}", input_audio_file_name)

with open(input_audio_file_name, "rb") as file:
    wav_file_read = file.read()

# If you receive a client error (413), check the payload size; SageMaker
# invoke-endpoint payloads are limited to about 5 MB.
response = predictor.predict(wav_file_read)
print(response["text"])

This model supports many parameters when performing inference. They include:

  • max_length: The model generates text until the output length reaches max_length. If specified, it must be a positive integer.
  • language and task: Specify the output language and task here. The model supports the task of transcription or translation.
  • max_new_tokens: The maximum numbers of tokens to generate.
  • num_return_sequences: The number of output sequences returned. If specified, it must be a positive integer.
  • num_beams: The number of beams used in beam search. If specified, it must be an integer greater than or equal to num_return_sequences.
  • no_repeat_ngram_size: The model ensures that a sequence of words of no_repeat_ngram_size isn’t repeated in the output sequence. If specified, it must be a positive integer greater than 1.
  • temperature: This controls the randomness in the output. Higher temperature results in an output sequence with low-probability words and lower temperature results in an output sequence with high-probability words. If temperature approaches 0, it results in greedy decoding. If specified, it must be a positive float.
  • early_stopping: If True, text generation finishes when all beam hypotheses reach the end-of-sentence token. If specified, it must be Boolean.
  • do_sample: If True, the next word is sampled according to its likelihood. If specified, it must be Boolean.
  • top_k: In each step of text generation, sample from only the top_k most likely words. If specified, it must be a positive integer.
  • top_p: In each step of text generation, sample from the smallest possible set of words with cumulative probability top_p. If specified, it must be a float between 0 and 1.

You can specify any subset of the preceding parameters when invoking an endpoint. Next, we show you an example of how to invoke an endpoint with these arguments.
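As a sketch, these parameters ride alongside the hex-encoded audio in the JSON payload (the same shape the translation example in the next section uses); the specific parameter values below are illustrative:

```python
def build_payload(wav_bytes: bytes, **params) -> dict:
    """Combine hex-encoded audio with any subset of the generation parameters."""
    allowed = {
        "max_length", "language", "task", "max_new_tokens",
        "num_return_sequences", "num_beams", "no_repeat_ngram_size",
        "temperature", "early_stopping", "do_sample", "top_k", "top_p",
    }
    unknown = params.keys() - allowed
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {"audio_input": wav_bytes.hex(), **params}
```

You would then send build_payload(wav_file_read, task="transcribe", max_new_tokens=256) through the predictor with the JSONSerializer, as shown in the next section.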

Language translation

To showcase language translation using Whisper models, use the following audio file in French and translate it to English. The file must be sampled at 16 kHz (as required by the ASR models), so make sure to resample files if required and make sure your samples don’t exceed 30 seconds.

  1. Download sample_french1.wav from the SageMaker JumpStart public S3 location so it can be passed in the payload for translation by the Whisper model.

    input_audio_file_name = "sample_french1.wav"
    
    s3_client.download_file(s3_bucket, f"{key_prefix}/{input_audio_file_name}", input_audio_file_name)

  2. Set the task parameter as translate and language as French to force the Whisper model to perform speech translation.
    from sagemaker.serializers import JSONSerializer
    
    with open(input_audio_file_name, "rb") as file:
        wav_file_read = file.read()
    
    payload = {"audio_input": wav_file_read.hex(), "language": "french", "task": "translate"}
    
    predictor.serializer = JSONSerializer()
    predictor.content_type = "application/json"

  3. Use the predictor to predict the translation. If you receive a client error (error 413), check the size of the payload sent to the endpoint. Payloads for SageMaker invoke endpoint requests are limited to about 5 MB.
    response = predictor.predict(payload)
    print(response["text"])

  4. The text output translated to English from the French audio file follows:
    [' Welcome to JPBSystem. We have more than 150 employees and 90% of sales. We have developed about 15 patents.']

Clean up

After you’ve tested the endpoint, delete the SageMaker inference endpoint and delete the model to avoid incurring charges.

Conclusion

In this post, we showed you how to test and use OpenAI Whisper models to build interesting applications using Amazon SageMaker. Try out the foundation model in SageMaker today and let us know your feedback!

This guidance is for informational purposes only. You should still perform your own independent assessment and take measures to ensure that you comply with your own specific quality control practices and standards, and the local rules, laws, regulations, licenses and terms of use that apply to you, your content, and the third-party model referenced in this guidance. AWS has no control or authority over the third-party model referenced in this guidance and does not make any representations or warranties that the third-party model is secure, virus-free, operational, or compatible with your production environment and standards. AWS does not make any representations, warranties, or guarantees that any information in this guidance will result in a particular outcome or result.


About the authors

Hemant Singh is an Applied Scientist with experience in Amazon SageMaker JumpStart. He got his masters from Courant Institute of Mathematical Sciences and B.Tech from IIT Delhi. He has experience in working on a diverse range of machine learning problems within the domain of natural language processing, computer vision, and time series analysis.

Rachna Chadha is a Principal Solution Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that the ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.


Reinventing a cloud-native federated learning architecture on AWS

Machine learning (ML), especially deep learning, requires a large amount of data for improving model performance. Customers often need to train a model with data from different regions, organizations, or AWS accounts. It is challenging to centralize such data for ML due to privacy requirements, high cost of data transfer, or operational complexity.

Federated learning (FL) is a distributed ML approach that trains ML models on distributed datasets. The goal of FL is to improve the accuracy of ML models by using more data, while preserving the privacy and the locality of distributed datasets. FL increases the amount of data available for training ML models, especially data associated with rare and new events, resulting in a more general ML model. Existing partner open-source FL solutions on AWS include FedML and NVIDIA FLARE. These open-source packages are deployed in the cloud by running in virtual machines, without using the cloud-native services available on AWS.

In this blog, you will learn to build a cloud-native FL architecture on AWS. By using infrastructure as code (IaC) tools on AWS, you can deploy FL architectures with ease. Also, a cloud-native architecture takes full advantage of a variety of AWS services with proven security and operational excellence, thereby simplifying the development of FL.

We first discuss different approaches and challenges of FL. We then demonstrate how to build a cloud-native FL architecture on AWS. The sample code to build this architecture is available on GitHub. We use the AWS Cloud Development Kit (AWS CDK) to deploy the architecture with one-click deployment. The sample code demonstrates a scenario where the server and all clients belong to the same organization (the same AWS account), but their datasets cannot be centralized due to data localization requirements. The sample code supports horizontal and synchronous FL for training neural network models. The ML framework used at FL clients is TensorFlow.

Overview of federated learning

FL typically involves a central FL server and a group of clients. Clients are compute nodes that perform local training. In an FL training round, the central server first sends a common global model to a group of clients. Clients train the global model with local data, then provide local models back to the server. The server aggregates the local models into a new global model, then starts a new training round. There may be tens of training rounds until the global model converges or until the number of training rounds reaches a threshold. Therefore, FL exchanges ML models between the central FL server and clients, without moving training data to a central location.

There are two major categories of FL depending on the client type: cross-device and cross-silo. Cross-device FL trains a common global model by keeping all the training data locally on a large number of devices, such as mobile phones or IoT devices, with limited and unstable network connections. Therefore, the design of cross-device FL needs to consider the frequent joining and dropout of FL clients.

Cross-silo FL trains a global model on datasets distributed at different organizations and geo-distributed data centers. These datasets are prohibited from moving out of organizations and data center regions due to data protection regulations, operational challenges (such as data duplication and synchronization), or high costs. In contrast with cross-device FL, cross-silo FL assumes that organizations or data centers have reliable network connections, powerful computing resources, and addressable datasets.

FL has been applied to various industries, such as finance, healthcare, medicine, and telecommunications, where privacy preservation is critical or data localization is required. FL has been used to train a global model for financial crime detection among multiple financial institutions. The global model outperforms models trained with only local datasets by 20%. In healthcare, FL has been used to predict mortality of hospitalized patients based on electronic health records from multiple hospitals. The global model predicting mortality outperforms local models at all participating hospitals. FL has also been used for brain tumor segmentation. The global models for brain tumor segmentation perform similarly to the model trained by collecting distributed datasets at a central location. In telecommunications, FL can be applied to edge computing, wireless spectrum management, and 5G core networks.

There are many other ways to classify FL:

  • Horizontal or vertical – Depending on the partition of features in distributed datasets, FL can be classified as horizontal or vertical. In horizontal FL, all distributed datasets have the same set of features. In vertical FL, datasets have different groups of features, requiring additional communication patterns to align samples based on overlapped features.
  • Synchronous or asynchronous – Depending on the aggregation strategy at an FL server, FL can be classified as synchronous or asynchronous. A synchronous FL server aggregates local models from a selected set of clients into a global model. An asynchronous FL server immediately updates the global model after a local model is received from a client, thereby reducing the waiting time and improving training efficiency.
  • Hub-and-spoke or peer-to-peer – The typical FL topology is hub-and-spoke, where a central FL server coordinates a set of clients. Another FL topology is peer-to-peer without any centralized FL server, where FL clients aggregate information from neighboring clients to learn a model.

Challenges in FL

You can address the following challenges using algorithms running at FL servers and clients in a common FL architecture:

  • Data heterogeneity – FL clients’ local data can vary (i.e., data heterogeneity) due to particular geographic locations, organizations, or time windows. Data heterogeneity impacts the accuracy of global models, leading to more training iterations and longer training time. Many solutions have been proposed to mitigate the impact of data heterogeneity, such as optimization algorithms, partial data sharing among clients, and domain adaptation.
  • Privacy preservation – Local and global models may leak private information via an adversarial attack. Many privacy preservation approaches have been proposed for FL. A secure aggregation approach can be used to preserve the privacy of local models exchanged between FL servers and clients. Local and global differential privacy approaches bound the privacy loss by adding noise to local or global models, which provides a controlled trade-off between privacy and model accuracy. Depending on the privacy requirements, combinations of different privacy preservation approaches can be used.
  • Federated analytics – Federated analytics provides statistical measurements of distributed datasets without violating privacy requirements. Federated analytics is important not only for data analysis across distributed datasets before training, but also for model monitoring at inference.
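
To make the privacy preservation point concrete, the following is a minimal sketch of global differential privacy applied at the server, assuming the global model can be flattened into a NumPy weight vector. The clipping bound and noise scale are illustrative values, not a calibrated privacy budget:

```python
import numpy as np

def privatize_global_model(weights, clip_norm=1.0, noise_std=0.1, seed=0):
    """Clip the weight vector to a bounded L2 norm, then add Gaussian noise.

    Clipping bounds any single contribution's influence on the model; the
    added noise bounds the privacy loss, trading a controlled amount of
    model accuracy for privacy.
    """
    rng = np.random.default_rng(seed)
    norm = np.linalg.norm(weights)
    clipped = weights * min(1.0, clip_norm / max(norm, 1e-12))
    return clipped + rng.normal(0.0, noise_std, size=weights.shape)

weights = np.array([3.0, 4.0])  # L2 norm 5.0, so clipping rescales it to norm 1.0
private = privatize_global_model(weights)
```

Local differential privacy follows the same pattern, except each client applies the clipping and noise to its local model before uploading it.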

Despite these challenges of FL algorithms, it is critical to build a secure architecture that provides end-to-end FL operations. One important challenge to building such an architecture is to enable the ease of deployment. The architecture must coordinate FL servers and clients for FL model building, training, and deployment, including continuous integration and continuous delivery (CI/CD) among clients, traceability, and authentication and access control for FL servers and clients. These features are similar to centralized ML operations (ML Ops), but are more challenging to implement because more parties are involved. The architecture also needs to be flexible to implement different FL topologies and synchronous or asynchronous aggregation.

Solution overview

We propose a cloud-native FL architecture on AWS, as shown in the following diagram. The architecture includes a central FL server and two FL clients. In reality, the number of FL clients can reach hundreds for cross-silo clients. The FL server must be on the AWS Cloud because it consists of a suite of microservices offered on the cloud. The FL clients can be on AWS or on the customer premises. The FL clients host their own local dataset and have their own IT and ML system for training ML models.

During FL model training, the FL server and a group of clients exchange ML models. That is, the clients download a global ML model from the server, perform local training, and upload local models to the server. The server downloads the local models and aggregates them into a new global model. This model exchange procedure is a single FL training round. The FL training round repeats until the global model reaches a given accuracy or the number of training rounds reaches a threshold.

FL-architecture

Figure 1 – A cloud-native FL architecture for model training between an FL server and FL clients.

Prerequisites

To implement this solution, you need an AWS account to launch the services for a central FL server and the two clients. On-premises FL clients need to install the AWS Command Line Interface (AWS CLI), which allows access to the AWS services at the FL server, including Amazon Simple Queue Service (Amazon SQS), Amazon Simple Storage Service (Amazon S3), and Amazon DynamoDB.

Federated learning steps

In this section, we walk through the proposed architecture in Figure 1. At the FL server, the AWS Step Functions state machine runs a workflow as shown in Figure 2, which executes Steps 0, 1, and 5 from Figure 1. The state machine initiates the AWS services at the server (Step 0) and iterates FL training rounds. For each training round, the state machine sends out an Amazon Simple Notification Service (Amazon SNS) notification to the topic global_model_ready, along with a task token (Step 1). The state machine then pauses and waits for a callback with the task token. There are SQS queues subscribing to the global_model_ready topic. Each SQS queue corresponds to an FL client and queues the notifications sent from the server to the client.

Figure 2 – The workflow at the Step Functions state machine.

Each client keeps pulling messages from its assigned SQS queue. When a global_model_ready notification is received, the client downloads a global model from Amazon S3 (Step 2) and starts local training (Step 3). Local training generates a local model. The client then uploads the local model to Amazon S3 and writes the local model information, along with the received task token, to the DynamoDB table (Step 4).

We implement the FL model registry using Amazon S3 and DynamoDB. We use Amazon S3 to store the global and local models. We use a DynamoDB table to store local model information, because this information can differ between FL algorithms and therefore requires the flexible schema that a DynamoDB table supports.

We also enable a DynamoDB stream to trigger a Lambda function, so that whenever a record is written into the DynamoDB table (when a new local model is received), a Lambda function is triggered to check if required local models are collected (Step 5). If so, the Lambda function runs the aggregation function to aggregate the local models into global models. The resulting global model is written to Amazon S3. The function also sends a callback, along with the task token retrieved from the DynamoDB table, to the Step Functions state machine. The state machine then determines if the FL training should be continued with a new training round or should be stopped based on a condition, for example, the number of training rounds reaching a threshold.

Each FL client uses the following sample code to interact with the FL server. If you want to customize the local training at your FL clients, the localTraining() function can be modified as long as the returned values are local_model_name and local_model_info for uploading to the FL server. You can select any ML framework for training local models at FL clients as long as all clients use the same ML framework.

# Step 2: receive notifications and model file name from its SQS queue
client.receiveNotificationsFromServer(sqs_region, client_queue_name)

# Step 3: download a global model and train locally
local_model_name, local_model_info = client.localTraining(global_model_name, s3_fl_model_registry)

# Step 4: upload the local model and local model info to the FL server
client.uploadToFLServer(s3_fl_model_registry, local_model_name, dynamodb_table_model_info, local_model_info)
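
The helper calls above come from the sample code on GitHub. As a rough sketch of what they involve, the following shows how a client might parse a global_model_ready notification pulled from its SQS queue (Step 2) and assemble the DynamoDB item written in Step 4. The message fields and item attribute names here are illustrative, not the sample code's exact schema:

```python
import json

def parse_global_model_notification(sqs_message_body):
    """Extract the global model name and task token from an SNS notification
    delivered through the client's SQS queue (SNS wraps its payload in a
    'Message' field of the SQS message body)."""
    notification = json.loads(sqs_message_body)
    inner = json.loads(notification["Message"])
    return inner["globalModelName"], inner["taskToken"]

def build_local_model_item(task_name, round_id, local_model_name, task_token, local_model_info):
    """Assemble the DynamoDB item written in Step 4. The aggregation Lambda
    later reads the task token back from this item to call back the
    Step Functions state machine."""
    return {
        "taskName": {"S": task_name},
        "roundId": {"N": str(round_id)},
        "localModelName": {"S": local_model_name},
        "taskToken": {"S": task_token},
        "localModelInfo": {"S": json.dumps(local_model_info)},
    }

# Example with a mock SQS message body of the assumed shape:
body = json.dumps({"Message": json.dumps({"globalModelName": "global_round_3", "taskToken": "token-abc"})})
model_name, token = parse_global_model_notification(body)
item = build_local_model_item("fl-task", 3, "local_client1_round_3", token, {"num_samples": 1000})
```

In the actual client, the parsed model name drives the S3 download of the global model, and the assembled item is written with a DynamoDB put_item call after the local model is uploaded to S3.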

The Lambda function for running the aggregation function at the server has the following sample code. If you want to customize the aggregation algorithm, you need to modify the fedAvg() function and the output.

# Step 5: aggregate local models in the Lambda function
def lambda_handler(event, context):
	# obtain task_name from the event triggered by the DynamoDB Stream
	task_name = event['Records'][0]['dynamodb']['Keys']['taskName']['S']

	# retrieve transactions from the DynamoDB table
	transactions = readFromFLServerTaskTable(os.environ['TASKS_TABLE_NAME'], task_name)

	# read local model info from required clients 
	# token is a call back token from the Step Functions state machine
	local_model_info, round_id, token = receiveUpdatedModelsFromClients(transactions, task_name)

	# fedAvg function aggregates local models into a global model and stores the global model in S3
	global_model_name, avg_train_acc, avg_test_acc, avg_train_loss, avg_test_loss = fedAvg(local_model_info, round_id)

	# output sent to the Step Function state machine
	output = {'taskName': task_name, 'roundId': str(round_id), 'trainAcc': str(avg_train_acc), 'testAcc': str(avg_test_acc), 'trainLoss': str(avg_train_loss), 'testLoss': str(avg_test_loss), 'weightsFile': str(global_model_name)}

	# send call back to the Step Functions state machine to report that the task identified by the token successfully completed
	step_client = boto3.client('stepfunctions')
	out_str = json.dumps(output)
	step_client.send_task_success(taskToken=token, output=out_str)
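
The fedAvg() function above implements federated averaging. A minimal sketch of the idea, assuming each local model is a flat NumPy weight vector and clients are weighted by their number of training samples, looks like the following (the sample code's actual fedAvg() also loads model files from Amazon S3 and averages the reported metrics):

```python
import numpy as np

def fed_avg(local_weights, sample_counts):
    """Aggregate local models into a global model by a sample-weighted
    average, so clients with more training data contribute proportionally
    more to the new global model."""
    total = sum(sample_counts)
    stacked = np.stack(local_weights)
    client_weights = np.array(sample_counts, dtype=float) / total
    return np.average(stacked, axis=0, weights=client_weights)

# Two clients: one trained on 100 samples, one on 300.
global_model = fed_avg(
    [np.array([1.0, 1.0]), np.array([3.0, 3.0])],
    sample_counts=[100, 300],
)  # weighted average: 0.25*1 + 0.75*3 = 2.5 per coordinate
```

Because the aggregation runs as a script inside the Lambda function, swapping in a different algorithm (for example, a median-based robust aggregation) only requires replacing this function and its output.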

This architecture has two innovative designs. First, the FL server uses serverless services, such as Step Functions and Lambda. Therefore, no computing instance is kept running for the FL server, which minimizes the computing cost. Second, FL clients pull messages from their assigned SQS queues and upload or download models and info to or from services at the FL server. This design avoids the FL server directly accessing resources at the clients, which is critical to provide private and flexible IT and ML environments (on premises or on the AWS Cloud) to FL clients.

Advantages of being cloud-native

This architecture is cloud-native and provides end-to-end transparency by using AWS services with proven security and operational excellence. For example, you can have cross-account clients to assume roles to access the resource at the FL server. For on-premises clients, the AWS CLI and AWS SDK for Python (Boto3) at clients automatically provide secure network connections between the FL server and clients. For clients on the AWS Cloud, you can use AWS PrivateLink and AWS services with data encryption in transit and at rest for data protection. You can use Amazon Cognito and AWS Identity and Access Management (IAM) for the authentication and access control of FL servers and clients. For deploying the trained global model, you can use ML Ops capabilities in Amazon SageMaker.

The cloud-native architecture also enables integration with customized ML frameworks and federated learning algorithms and protocols. For example, you can select an ML framework for training local models at FL clients and customize different aggregation algorithms as scripts running in Lambda functions at the server. Also, you can modify the workflows in Step Functions to accommodate different communication protocols between the server and clients.

Another advantage of the cloud-native architecture is the ease of deployment by using IaC tools offered for the cloud. You can use the AWS Cloud Development Kit (AWS CDK) and AWS CloudFormation for one-click deployment.

Conclusion

New privacy laws continue to be implemented worldwide, and technology infrastructures are rapidly expanding across multiple regions and extending to network edges. Federated learning helps cloud customers use distributed datasets to train accurate ML models in a privacy-preserving manner. Federated learning also supports data localization and potentially saves costs, because it does not require large amounts of raw data to be moved or shared.

You can start experimenting and building cloud-native federated learning architectures for your use cases. You can customize the architecture to support various ML frameworks, such as TensorFlow or PyTorch. You can also customize it to support different FL algorithms, including asynchronous federated learning, aggregation algorithms, and differential privacy algorithms. You can enable this architecture with FL Ops functionalities using ML Ops capabilities in Amazon SageMaker.


About the Authors

Qiong (Jo) Zhang, PhD, is a Senior Partner SA at AWS, specializing in AI/ML. Her current areas of interest include federated learning, distributed training, and generative AI.  She holds 30+ patents and has co-authored 100+ journal/conference papers. She is also the recipient of the Best Paper Award at IEEE NetSoft 2016, IEEE ICC 2011, ONDM 2010, and IEEE GLOBECOM 2005.


Parker Newton
is an applied scientist in AWS Cryptography. He received his Ph.D. in cryptography from U.C. Riverside, specializing in lattice-based cryptography and the complexity of computational learning problems. He is currently working at AWS in secure computation and privacy, designing cryptographic protocols to enable customers to securely run workloads in the cloud while preserving the privacy of their data.

Olivia Choudhury, PhD, is a Senior Partner SA at AWS. She helps partners, in the Healthcare and Life Sciences domain, design, develop, and scale state-of-the-art solutions leveraging AWS. She has a background in genomics, healthcare analytics, federated learning, and privacy-preserving machine learning. Outside of work, she plays board games, paints landscapes, and collects manga.

Gang Fu  is a Healthcare Solution Architect at AWS. He holds a PhD in Pharmaceutical Science from the University of Mississippi and has over ten years of technology and biomedical research experience. He is passionate about technology and the impact it can make on healthcare.

Kris is a renowned leader in machine learning and generative AI, with a career spanning Goldman Sachs, consulting for major banks, and successful ventures like Foglight and SiteRock. He founded Indigo Capital Management and co-founded adaptiveARC, focusing on green energy tech. Kris also supports non-profits aiding assault victims and disadvantaged youth.

Bill Horne is a General Manager in AWS Cryptography. He leads the Cryptographic Computing Program, consisting of a team of applied scientists and engineers who are solving customer problems using emerging technologies like secure multiparty computation and homomorphic encryption. Prior to joining AWS in 2020 he was the VP and General Manager of Intertrust Secure Systems and was the Director of Security Research at Hewlett-Packard Enterprise. He is the author of 60 peer reviewed publications in the areas of security and machine learning, and holds 50 granted patents and 58 patents pending.


Mistral 7B foundation models from Mistral AI are now available in Amazon SageMaker JumpStart

Today, we are excited to announce that the Mistral 7B foundation models, developed by Mistral AI, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. With 7 billion parameters, Mistral 7B can be easily customized and quickly deployed. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Mistral 7B model.

What is Mistral 7B

Mistral 7B is a foundation model developed by Mistral AI, supporting English text and code generation abilities. It supports a variety of use cases, such as text summarization, classification, text completion, and code completion. To demonstrate the easy customizability of the model, Mistral AI has also released a Mistral 7B Instruct model for chat use cases, fine-tuned using a variety of publicly available conversation datasets.

Mistral 7B is a transformer model that uses grouped-query attention and sliding-window attention to achieve faster inference (low latency) and handle longer sequences. Grouped-query attention is an architecture that combines multi-query and multi-head attention to achieve output quality close to multi-head attention with speed comparable to multi-query attention. Sliding-window attention uses the stacked layers of a transformer to attend to past tokens beyond the window size, increasing the effective context length. Mistral 7B has an 8,000-token context length, demonstrates low latency and high throughput, and has strong performance when compared to larger model alternatives, providing low memory requirements at a 7B model size. The model is made available under the permissive Apache 2.0 license, for use without restrictions.

What is SageMaker JumpStart

With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing foundation models. ML practitioners can deploy foundation models to dedicated Amazon SageMaker instances within a network isolated environment, and customize models using SageMaker for model training and deployment.

You can now discover and deploy Mistral 7B with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping ensure data security.

Discover models

You can access Mistral 7B foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

In SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions.

From the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources. You can find Mistral 7B in the Foundation Models: Text Generation carousel.

You can also find other model variants by choosing Explore all Text Models or searching for “Mistral.”

You can choose the model card to view details about the model such as license, data used to train, and how to use. You will also find two buttons, Deploy and Open notebook, which will help you use the model (the following screenshot shows the Deploy option).

Deploy models

Deployment starts when you choose Deploy. Alternatively, you can deploy through the example notebook that shows up when you choose Open notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using the notebook, we start by selecting the Mistral 7B model, specified by the model_id. You can deploy any of the selected models on SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-mistral-7b-instruct")
predictor = model.deploy()

This deploys the model on SageMaker with default configurations, including default instance type (ml.g5.2xlarge) and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor:

payload = {"inputs": "<s>[INST] Hello! [/INST]"}
predictor.predict(payload)
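
The endpoint serves the model with Text Generation Inference, which typically returns a list of dictionaries carrying a generated_text field. A small helper like the following (a sketch, assuming that response shape) extracts the completion:

```python
def extract_generated_text(response):
    """Pull the completion out of a TGI-style response, which is a list of
    dicts each with a 'generated_text' field."""
    return response[0]["generated_text"]

# Example with a mock response of the assumed shape:
mock_response = [{"generated_text": "Hello! How can I help you today?"}]
completion = extract_generated_text(mock_response)
```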

Optimizing the deployment configuration

Mistral models use Text Generation Inference (TGI version 1.1) model serving. When deploying models with the TGI deep learning container (DLC), you can configure a variety of launcher arguments via environment variables when deploying your endpoint. To support the 8,000-token context length of Mistral 7B models, SageMaker JumpStart has configured some of these parameters by default: we set MAX_INPUT_LENGTH and MAX_TOTAL_TOKENS to 8191 and 8192, respectively. You can view the full list by inspecting your model object:

print(model.env)

By default, SageMaker JumpStart doesn’t clamp the number of concurrent users below the TGI default value of 128 via the MAX_CONCURRENT_REQUESTS environment variable, because some users have typical workloads with small payload context lengths and want high concurrency. Note that the SageMaker TGI DLC supports multiple concurrent users through rolling batch. When deploying your endpoint for your application, consider whether to clamp MAX_TOTAL_TOKENS or MAX_CONCURRENT_REQUESTS prior to deployment to provide the best performance for your workload:

model.env["MAX_CONCURRENT_REQUESTS"] = "4"

Here, we show how model performance might differ for your typical endpoint workload. In the following tables, you can observe that small-sized queries (128 input words and 128 output tokens) are quite performant under a large number of concurrent users, reaching token throughput on the order of 1,000 tokens per second. However, as the number of input words increases to 512 input words, the endpoint saturates its batching capacity—the number of concurrent requests allowed to be processed simultaneously—resulting in a throughput plateau and significant latency degradations starting around 16 concurrent users. Finally, when querying the endpoint with large input contexts (for example, 6,400 words) simultaneously by multiple concurrent users, this throughput plateau occurs relatively quickly, to the point where your SageMaker account will start encountering 60-second response timeout limits for your overloaded requests.

Throughput (tokens/s), mistral-7b-instruct on ml.g5.2xlarge, 128 output tokens:

                         concurrent users
input words      1     2     4     8     16    32    64    128
128              30    54    89    166   287   499   793   1030
512              29    50    80    140   210   315   383   458
6400             17    25    30    35    -     -     -     -

P50 latency (ms/token), mistral-7b-instruct on ml.g5.2xlarge, 128 output tokens:

                         concurrent users
input words      1     2     4     8     16    32    64    128
128              32    33    34    36    41    46    59    88
512              34    36    39    43    54    71    112   213
6400             57    71    98    154   -     -     -     -

Inference and example prompts

Mistral 7B

You can interact with a base Mistral 7B model like any standard text generation model, where the model processes an input sequence and outputs predicted next words in the sequence. The following is a simple example with multi-shot learning, where the model is provided with several examples and the final example response is generated with contextual knowledge of these previous examples:

> Input
Tweet: "I get sad when my phone battery dies."
Sentiment: Negative
###
Tweet: "My day has been :+1:"
Sentiment: Positive
###
Tweet: "This is the link to the article"
Sentiment: Neutral
###
Tweet: "This new music video was incredibile"
Sentiment:

> Output
 Positive

Mistral 7B instruct

The instruction-tuned version of Mistral accepts formatted instructions where conversation roles must start with a user prompt and alternate between user and assistant. A simple user prompt may look like the following:

<s>[INST] {user_prompt} [/INST]

A multi-turn prompt would look like the following:

<s>[INST] {user_prompt_1} [/INST] {assistant_response_1} </s><s>[INST] {user_prompt_2} [/INST]

This pattern repeats for however many turns are in the conversation.
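
The turn pattern above can be assembled programmatically. The following sketch builds a Mistral 7B Instruct prompt from an alternating list of user and assistant messages (the helper name is ours, not part of any SDK):

```python
def build_mistral_prompt(turns):
    """Build a Mistral 7B Instruct prompt from alternating user/assistant turns.

    Each user turn is wrapped in [INST] ... [/INST]; each completed assistant
    turn follows its instruction and is closed with </s>, after which the
    next <s> opens a new exchange.
    """
    prompt = ""
    for i in range(0, len(turns), 2):
        prompt += f"<s>[INST] {turns[i]} [/INST]"
        if i + 1 < len(turns):
            prompt += f" {turns[i + 1]} </s>"
    return prompt

prompt = build_mistral_prompt(["Hi!", "Hello, how can I help?", "Tell me a joke."])
# -> "<s>[INST] Hi! [/INST] Hello, how can I help? </s><s>[INST] Tell me a joke. [/INST]"
```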

In the following sections, we explore some examples using the Mistral 7B Instruct model.

Knowledge retrieval

The following is an example of knowledge retrieval:

> Input
<s>[INST] Which country has the most natural lakes? Answer with only the country name. [/INST] 

> Output
1. Canada

Large context question answering

To demonstrate how to use this model to support large input context lengths, the following example embeds a passage, titled “Rats” by Robert Sullivan (reference), from the MCAS Grade 10 English Language Arts Reading Comprehension test into the input prompt instruction and asks the model a directed question about the text:

> Input
<s>[INST] A rat is a rodent, the most common mammal in the world. Rattus norvegicus is one of the approximately four hundred different kinds of rodents, and it is known by many names, each of which describes a trait or a perceived trait or sometimes a habitat: the earth rat, the roving rat, the barn rat, the field rat, the migratory rat, the house rat, the sewer rat, the water rat, the wharf rat, the alley rat, the gray rat, the brown rat, and the common rat. The average brown rat is large and stocky; it grows to be approximately sixteen inches long from its nose to its tail—the size of a large adult human male’s foot—and weighs about a pound, though brown rats have been measured by scientists and exterminators at twenty inches and up to two pounds. The brown rat is sometimes confused with the black rat, or Rattus rattus, which is smaller and once inhabited New York City and all of the cities of America but, since Rattus norvegicus pushed it out, is now relegated to a minor role. (The two species still survive alongside each other in some Southern coastal cities and on the West Coast, in places like Los Angeles, for example, where the black rat lives in attics and palm trees.) The black rat is always a very dark gray, almost black, and the brown rat is gray or brown, with a belly that can be light gray, yellow, or even a pure-seeming white. One spring, beneath the Brooklyn Bridge, I saw a red-haired brown rat that had been run over by a car. Both pet rats and laboratory rats are Rattus norvegicus, but they are not wild and therefore, I would emphasize, not the subject of this book. Sometimes pet rats are called fancy rats. But if anyone has picked up this book to learn about fancy rats, then they should put this book down right away; none of the rats mentioned herein are at all fancy.

Rats are nocturnal, and out in the night the brown rat’s eyes are small and black and shiny; when a flashlight shines into them in the dark, the eyes of a rat light up like the eyes of a deer. Though it forages* in darkness, the brown rat has poor eyesight. It makes up for this with, first of all, an excellent sense of smell. . . . They have an excellent sense of taste, detecting the most minute amounts of poison, down to one part per million. A brown rat has strong feet, the two front paws each equipped with four clawlike nails, the rear paws even longer and stronger. It can run and climb with squirrel-like agility. It is an excellent swimmer, surviving in rivers and bays, in sewer streams and toilet bowls.

The brown rat’s teeth are yellow, the front two incisors being especially long and sharp, like buckteeth. When the brown rat bites, its front two teeth spread apart. When it gnaws, a flap of skin plugs the space behind its incisors. Hence, when the rat gnaws on indigestible materials—concrete or steel, for example—the shavings don’t go down the rat’s throat and kill it. Its incisors grow at a rate of five inches per year. Rats always gnaw, and no one is certain why—there are few modern rat studies. It is sometimes erroneously stated that the rat gnaws solely to limit the length of its incisors, which would otherwise grow out of its head, but this is not the case: the incisors wear down naturally. In terms of hardness, the brown rat’s teeth are stronger than aluminum, copper, lead, and iron. They are comparable to steel. With the alligator-like structure of their jaws, rats can exert a biting pressure of up to seven thousand pounds per square inch. Rats, like mice, seem to be attracted to wires—to utility wires, computer wires, wires in vehicles, in addition to gas and water pipes. One rat expert theorizes that wires may be attractive to rats because of their resemblance to vines and the stalks of plants; cables are the vines of the city. By one estimate, 26 percent of all electric-cable breaks and 18 percent of all phone-cable disruptions are caused by rats. According to one study, as many as 25 percent of all fires of unknown origin are rat-caused. Rats chew electrical cables. Sitting in a nest of tattered rags and newspapers, in the floorboards of an old tenement, a rat gnaws the head of a match—the lightning in the city forest.

When it is not gnawing or feeding on trash, the brown rat digs. Anywhere there is dirt in a city, brown rats are likely to be digging—in parks, in flowerbeds, in little dirt-poor backyards. They dig holes to enter buildings and to make nests. Rat nests can be in the floorboards of apartments, in the waste-stuffed corners of subway stations, in sewers, or beneath old furniture in basements. “Cluttered and unkempt alleyways in cities provide ideal rat habitat, especially those alleyways associated with food-serving establishments,” writes Robert Corrigan in Rodent Control, a pest control manual. “Alley rats can forage safely within the shadows created by the alleyway, as well as quickly retreat to the safety of cover in these narrow channels.” Often, rats burrow under concrete sidewalk slabs. Entrance to a typical under-the-sidewalk rat’s nest is gained through a two-inch-wide hole—their skeletons collapse and they can squeeze into a hole as small as three quarters of an inch wide, the average width of their skull. This tunnel then travels about a foot down to where it widens into a nest or den. The den is lined with soft debris, often shredded plastic garbage or shopping bags, but sometimes even grasses or plants; some rat nests have been found stuffed with the gnawed shavings of the wood-based, spring-loaded snap traps that are used in attempts to kill them. The back of the den then narrows into a long tunnel that opens up on another hole back on the street. This second hole is called a bolt hole; it is an emergency exit. A bolt hole is typically covered lightly with dirt or trash—camouflage. Sometimes there are networks of burrows, which can stretch beneath a few concrete squares on a sidewalk, or a number of backyards, or even an entire city block—when Rattus norvegicus first came to Selkirk, England, in 1776, there were so many burrows that people feared the town might sink. Rats can also nest in basements, sewers, manholes, abandoned pipes of any kind, floorboards, or any hole or depression. “Often,” Robert Corrigan writes, “‘city rats’ will live unbeknownst to people right beneath their feet.”

Rats also inhabit subways, as most people in New York City and any city with a subway system are well aware. Every once in a while, there are reports of rats boarding trains, but for the most part rats stay on the tracks—subway workers I have talked to refer to rats as “track rabbits.” People tend to think that the subways are filled with rats, but in fact rats are not everywhere in the system; they live in the subways according to the supply of discarded human food and sewer leaks. Sometimes, rats use the subway purely for nesting purposes; they find ways through the walls of the subway stations leading from the tracks to the restaurants and stores on the street—the vibrations of subway trains tend to create rat-size cracks and holes. Many subway rats tend to live near stations that are themselves near fast-food restaurants. At the various subway stations near Herald Square, for example, people come down from the streets and throw the food that they have not eaten onto the tracks, along with newspapers and soda bottles and, I have noticed, thousands of no-longer-charged AA batteries, waiting to leak acid. The rats eat freely from the waste and sit at the side of the little streams of creamy brown sewery water that flows between the rails. They sip the water the way rats do, either with their front paws or by scooping it up with their incisors.


What is the most likely reason the author states, "cables are the vines of the city"? [/INST] 


> Output
1. The author states that rats are attracted to wires, such as utility wires, computer wires, and vehicle wires, which may be due to their resemblance to vines and the stalks of plants.

Mathematics and reasoning

The Mistral models also report strong mathematics accuracy. Mistral can work through multi-step reasoning such as the following math problem:

> Input
<s>[INST] I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering. [/INST]

> Output
1. First, let's figure out how much the ice cream cost in total. Since each cone was $1.25 and there were 6 kids, the total cost would be:

Total cost = Number of cones * Cost per cone
Total cost = 6 * $1.25
Total cost = $7.50

2. Next, we'll subtract the total cost from the amount paid ($10) to find out how much money was returned:

Money returned = Amount paid - Total cost
Money returned = $10 - $7.50
Money returned = $2.50

So, you got back $2.50 after buying the ice cream for the 6 kids.

Coding

The following is an example of a coding prompt:

> Input
<s>[INST] In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month? [/INST] 


> Output
1. Use the `find` command to search for text files in the current directory (excluding subdirectories) that have been modified in the last month. The `-type f` option specifies that we only want to search for files, not directories. The `-name "*.txt"` option specifies that we only want to search for files with the `.txt` extension. The `-mtime -30` option specifies that we only want to search for files that have been modified in the last 30 days (1 month).

```bash
find . -type f -name "*.txt" -mtime -30
```

This command will list all text files in the current directory (excluding subdirectories) that have been modified in the last month.

Clean up

After you’re done running the notebook, make sure to delete all the resources that you created in the process so your billing is stopped. Use the following code:

```python
predictor.delete_model()
predictor.delete_endpoint()
```

Conclusion

In this post, we showed you how to get started with Mistral 7B in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit Amazon SageMaker JumpStart now to get started.


About the Authors

Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker JumpStart team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Vivek Singh is a product manager with Amazon SageMaker JumpStart. He focuses on enabling customers to onboard SageMaker JumpStart to simplify and accelerate their ML journey to build generative AI applications.

Roy Allela is a Senior AI/ML Specialist Solutions Architect at AWS based in Munich, Germany. Roy helps AWS customers—from small startups to large enterprises—train and deploy large language models efficiently on AWS. Roy is passionate about computational optimization problems and improving the performance of AI workloads.

Use no-code machine learning to derive insights from product reviews using Amazon SageMaker Canvas sentiment analysis and text analysis models

According to Gartner, 85% of software buyers trust online reviews as much as personal recommendations. Customers provide feedback and reviews about products they have purchased through many channels, including review websites, vendor websites, sales calls, social media, and many others. The problem with the increasing volume of customer reviews across multiple channels is that it can be challenging for companies to process and derive meaningful insights from the data using traditional methods. Machine learning (ML) can analyze large volumes of product reviews and identify patterns, sentiments, and topics discussed. With this information, companies can gain a better understanding of customer preferences, pain points, and satisfaction levels. They can also use this information to improve products and services, identify trends, and take strategic actions that drive business growth. However, implementing ML can be a challenge for companies that lack resources such as ML practitioners, data scientists, or artificial intelligence (AI) developers. With the new Amazon SageMaker Canvas features, business analysts can now use ML to derive insights from product reviews.

SageMaker Canvas is designed for the functional needs of business analysts to use no-code ML on AWS for ad hoc analysis of tabular data. SageMaker Canvas is a visual, point-and-click service that allows business analysts to generate accurate ML predictions without writing a single line of code or requiring ML expertise. You can use models to make predictions interactively and for batch scoring on bulk datasets. SageMaker Canvas offers fully managed ready-to-use AI models and custom model solutions. For common ML use cases, you can use a ready-to-use AI model to generate predictions with your data without any model training. For ML use cases specific to your business domain, you can train an ML model with your own data for custom prediction.

In this post, we demonstrate how to use the ready-to-use sentiment analysis model and custom text analysis model to derive insights from product reviews. In this use case, we have a set of synthesized product reviews that we want to analyze for sentiments and categorize the reviews by product type, to make it easy to draw patterns and trends that can help business stakeholders make better informed decisions. First, we describe the steps to determine the sentiment of the reviews using the ready-to-use sentiment analysis model. Then, we walk you through the process to train a text analysis model to categorize the reviews by product type. Next, we explain how to review the trained model for performance. Finally, we explain how to use the trained model to perform predictions.

Sentiment analysis is a ready-to-use natural language processing (NLP) model that analyzes text for sentiments. Sentiment analysis can be run for single-line or batch predictions. The predicted sentiment for each line of text is positive, negative, mixed, or neutral.

Text analysis allows you to classify text into two or more categories using custom models. In this post, we want to classify product reviews based on product type. To train a text analysis custom model, you simply provide a dataset consisting of the text and the associated categories in a CSV file. The dataset requires a minimum of two categories and 125 rows of text per category. After the model is trained, you can review the model’s performance and retrain the model if needed, before using it for predictions.
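Before training, it can be handy to confirm a CSV meets these minimums locally. The following sketch checks both requirements (the column names product_review and product_category are assumptions based on the sample dataset used in this post):

```python
import csv
from collections import Counter

def validate_canvas_text_dataset(path, text_col="product_review", label_col="product_category"):
    """Check a CSV against the SageMaker Canvas text analysis training minimums
    described in this post: at least two categories, and at least 125 rows of
    text per category. Returns a list of problems; an empty list means the
    dataset meets the minimums.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # Count rows per category, ignoring rows with empty text
    counts = Counter(row[label_col] for row in rows if row.get(text_col, "").strip())
    problems = [
        f"category '{cat}' has only {n} rows (need at least 125)"
        for cat, n in counts.items() if n < 125
    ]
    if len(counts) < 2:
        problems.append(f"only {len(counts)} category present (need at least 2)")
    return problems
```

Running this against sample_product_reviews_training.csv (600 reviews across three categories) should return an empty list.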

Prerequisites

Complete the following prerequisites:

  1. Have an AWS account.
  2. Set up SageMaker Canvas.
  3. Download the sample product reviews datasets:
    • sample_product_reviews.csv – Contains 2,000 synthesized product reviews and is used for sentiment analysis and Text Analysis predictions.
    • sample_product_reviews_training.csv – Contains 600 synthesized product reviews and three product categories, and is for text analysis model training.

Sentiment analysis

First, you use sentiment analysis to determine the sentiments of the product reviews by completing the following steps.

  1. On the SageMaker console, click Canvas in the navigation pane, then click Open Canvas to open the SageMaker Canvas application.
  2. Click Ready-to-use models in the navigation pane, then click Sentiment analysis.
  3. Click Batch prediction, then click Create dataset.
  4. Provide a Dataset name and click Create.
  5. Click Select files from your computer to import the sample_product_reviews.csv dataset.
  6. Click Create dataset and review the data. The first column contains the reviews and is used for sentiment analysis. The second column contains the review ID and is used for reference only.
  7. Click Create dataset to complete the data upload process.
  8. In the Select dataset for predictions view, select sample_product_reviews.csv and then click Generate predictions. 
  9. When the batch prediction is complete, click View to view the predictions.

Sentiment Analysis Steps

The Sentiment and Confidence columns provide the sentiment and confidence score, respectively. A confidence score is a statistical value between 0 and 100% that shows the probability that the sentiment is correctly predicted.

  1. Click Download CSV to download the results to your computer.
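Once downloaded, the results can be summarized outside of SageMaker Canvas. The following is a minimal sketch, assuming the results CSV contains the Sentiment column described above:

```python
import csv
from collections import Counter

def summarize_sentiments(path):
    """Count the predicted sentiments in a downloaded Canvas results CSV.
    The 'Sentiment' column name matches the results described in this post.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row["Sentiment"] for row in csv.DictReader(f))
```

This gives a quick breakdown of how many reviews were predicted positive, negative, mixed, or neutral.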

Text analysis

In this section, we go through the steps to perform text analysis with a custom model: importing the data, training the model and then making predictions.

Import the data

First import the training dataset. Complete the following steps:

  1. On the Ready-to-use models page, click Create a custom model.
  2. For Model name, enter a name (for example, Product Reviews Analysis). Click Text analysis, then click Create.
  3. On the Select tab, click Create dataset to import the sample_product_reviews_training.csv dataset.
  4. Provide a Dataset name and click Create.
  5. Click Create dataset and review the data. The training dataset contains a third column with the product category, which is the target column and consists of three product types: books, video, and music.
  6. Click Create dataset to complete the data upload process.
  7. On the Select dataset page, select sample_product_reviews_training.csv and click Select dataset.

Classification Steps

Train the model

Next, you configure the model to begin the training process.

  1. On the Build tab, on the Target column drop-down menu, click product_category as the training target.
  2. Click product_review as the source.
  3. Click Quick build to start the model training.

For more information about the differences between Quick build and Standard build, refer to Build a custom model.

When the model training is complete, you may review the performance of the model before you use it for prediction.

  1. On the Analyze tab, the model’s confidence score will be displayed. A confidence score indicates how certain a model is that its predictions are correct. On the Overview tab, review the performance for each category.
  2. Click Scoring to review the model accuracy insights.
  3. Click Advanced metrics to review the confusion matrix and F1 score.

Make predictions

To make a prediction with your custom model, complete the following steps:

  1. On the Predict tab, click Batch prediction, then click Manual.
  2. Click the same dataset, sample_product_reviews.csv, that you used previously for the sentiment analysis, then click Generate predictions.
  3. When the batch prediction is complete, click View to view the predictions.

For custom model prediction, it takes some time for SageMaker Canvas to deploy the model for initial use. SageMaker Canvas automatically de-provisions the model if idle for 15 minutes to save costs.

The Prediction (Category) and Confidence columns provide the predicted product categories and confidence scores, respectively.

  1. Highlight the completed job, select the three dots, and click Download to download the results to your computer.

Clean up

Click Log out in the navigation pane to log out of the SageMaker Canvas application. This stops the consumption of Canvas session hours and releases all resources.

Conclusion

In this post, we demonstrated how you can use Amazon SageMaker Canvas to derive insights from product reviews without ML expertise. First, you used a ready-to-use sentiment analysis model to determine the sentiments of the product reviews. Next, you used text analysis to train a custom model with the quick build process. Finally, you used the trained model to categorize the product reviews into product categories. All without writing a single line of code. We recommend that you repeat the text analysis process with the standard build process to compare the model results and prediction confidence.


About the Authors

Gavin Satur is a Principal Solutions Architect at Amazon Web Services. He works with enterprise customers to build strategic, well-architected solutions and is passionate about automation. Outside work, he enjoys family time, tennis, cooking and traveling.

Les Chan is a Sr. Solutions Architect at Amazon Web Services, based in Irvine, California. Les is passionate about working with enterprise customers on adopting and implementing technology solutions with the sole focus of driving customer business outcomes. His expertise spans application architecture, DevOps, serverless, and machine learning.

Aaqib Bickiya is a Solutions Architect at Amazon Web Services based in Southern California. He helps enterprise customers in the retail space accelerate projects and implement new technologies. Aaqib’s focus areas include machine learning, serverless, analytics, and communication services.

Prepare your data for Amazon Personalize with Amazon SageMaker Data Wrangler

A recommendation engine is only as good as the data used to prepare it. Transforming raw data into a format that is suitable for a model is key to getting better personalized recommendations for end-users.

In this post, we walk through how to use Amazon SageMaker Data Wrangler to prepare the MovieLens dataset, a collection of user movie ratings maintained by GroupLens Research at the University of Minnesota, and import it into Amazon Personalize. [1]

Solution overview

Amazon Personalize is a managed service whose core value proposition is its ability to learn user preferences from past behavior and quickly adjust those learned preferences to account for changing user behavior in near-real time. To develop this understanding of users, Amazon Personalize needs to train on historical user behavior so that it can find patterns that generalize towards the future. Specifically, the main type of data that Amazon Personalize learns from is what we call an interactions dataset, which is a tabular dataset that consists of at minimum three critical columns, userID, itemID, and timestamp, representing a positive interaction between a user and an item at a specific time. Users of Amazon Personalize need to upload data containing their own customers’ interactions in order for the model to learn these behavioral trends. Although the internal algorithms within Amazon Personalize have been chosen based on Amazon’s experience in the machine learning space, a personalized model doesn’t come pre-loaded with any sort of data and trains models on a customer-by-customer basis.
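To make the expected shape concrete, the following minimal sketch builds an interactions dataset with the three critical columns and writes it as CSV. It uses only the Python standard library; the user IDs, item IDs, and timestamps are made up for illustration:

```python
import csv
import io

# Each row is a positive interaction between a user and an item
# at a specific time (timestamp in Unix epoch seconds).
interactions = [
    {"userID": "u1", "itemID": "movie_10", "timestamp": 1696118400},
    {"userID": "u1", "itemID": "movie_42", "timestamp": 1696204800},
    {"userID": "u2", "itemID": "movie_10", "timestamp": 1696291200},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["userID", "itemID", "timestamp"])
writer.writeheader()
writer.writerows(interactions)
print(buf.getvalue())
```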

The MovieLens dataset explored in this walkthrough isn’t in this format, so to prepare it for Amazon Personalize, we use SageMaker Data Wrangler, a purpose-built data aggregation and preparation tool for machine learning. It has over 300 preconfigured data transformations as well as the ability to bring in custom code to create custom transformations in PySpark, SQL, and a variety of data processing libraries, such as pandas.

Prerequisites

First, we need to have an Amazon SageMaker Studio domain set up. For details on how to set it up, refer to Onboard to Amazon SageMaker Domain using Quick setup.

Also, we need to set up the right permissions using AWS Identity and Access Management (IAM) for Amazon Personalize and Amazon SageMaker service roles so that they can access the needed functionalities.

You can create a new Amazon Personalize dataset group to use in this walkthrough or use an existing one.

Finally, we need to download and unzip the MovieLens dataset and place it in an Amazon Simple Storage Service (Amazon S3) bucket.

Launch SageMaker Data Wrangler from Amazon Personalize

To start with the SageMaker Data Wrangler integration with Amazon Personalize, complete the following steps:

  1. On the Amazon Personalize console, navigate to the Overview page of your dataset group.
  2. Choose Import interaction data, Import user data, or Import item data, depending on the dataset type (for this post, we choose Import interaction data).
  3. For Import method, select Import data using Data Wrangler.
  4. Choose Next.
  5. Specify the SageMaker domain, user profile, and IAM service role that you created earlier as part of the prerequisites.
  6. Choose Next.
  7. Continue through the steps to launch an instance of SageMaker Data Wrangler.

Setting up the environment for the first time can take up to 5 minutes.

Import the raw data into SageMaker Data Wrangler

When using SageMaker Data Wrangler to prepare and import data, we use a data flow. A data flow defines a series of transformations and analyses on data to prepare it to create a machine learning model. Each time we add a step to our flow, SageMaker Data Wrangler takes an action on our data, such as joining it with another dataset or dropping some rows and columns.

To start, let’s import the raw data.

  1. On the data flow page, choose Import data.

With SageMaker Data Wrangler, we can import data from over 50 supported data sources.

  1. For Data sources, choose Amazon S3.

  1. Choose the dataset you uploaded to your S3 bucket.

SageMaker Data Wrangler automatically displays a preview of the data.

  1. Keep the default settings and choose Import.

After the data is imported, SageMaker Data Wrangler automatically validates the datasets and detects the data types for all the columns based on its sampling.

  1. Choose Data flow at the top of the Data types page to view the main data flow before moving to the next step.

One of the main advantages of SageMaker Data Wrangler is the ability to run previews of your transformations on a small subset of data before committing to apply the transformations on the entire dataset. To run the same transformation on multiple partitioned files in Amazon S3, you can use parameters.

Transform the data

To transform data in SageMaker Data Wrangler, you add a Transform step to your data flow. SageMaker Data Wrangler includes over 300 transforms that you can use to prepare your data, including a Map columns for Amazon Personalize transform. You can use the general SageMaker Data Wrangler transforms to fix issues such as outliers, type issues, and missing values, or apply data preprocessing steps.

To use Amazon Personalize, the data you provide in the interactions dataset must match your dataset schema. For our movie recommender engine, the proposed interactions dataset schema includes:

  • user_id (string)
  • item_id (string)
  • event_type (string)
  • timestamp (in Unix epoch time format)

To learn more about Amazon Personalize datasets and schemas, refer to Datasets and schemas.
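For reference, Amazon Personalize dataset schemas are defined in Avro JSON. A minimal sketch of the proposed interactions schema above might look like the following; the field names follow this post's proposed schema, so treat this as illustrative rather than a drop-in schema:

```python
import json

# Avro JSON sketch of the interactions dataset schema proposed above:
# user_id, item_id, and event_type as strings, timestamp as a long
# (Unix epoch time).
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "item_id", "type": "string"},
        {"name": "event_type", "type": "string"},
        {"name": "timestamp", "type": "long"},
    ],
    "version": "1.0",
}
print(json.dumps(interactions_schema, indent=2))
```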

The ratings.csv file, shown in the last step of the previous section, includes movies rated from 1–5. We want to build a movie recommender engine based on that. To do so, we must complete the following steps:

  1. Modify the columns data types.
  2. Create two event types: Click and Watch.
  3. Assign all movies rated 2 and above as Click and movies rated 4 and above as both Click and Watch.
  4. Drop the ratings column.
  5. Map the columns to the Amazon Personalize interactions dataset schema.
  6. Validate that our timestamp is in Unix epoch time.

Note that Step 3 isn't needed to make a personalization model. If we want to use one of the Amazon Personalize streamlined video on demand domain recommenders, such as Top Picks for You, Click and Watch would be required event types. However, if we don't have these, we could instead omit the event type field (or add our own event types, such as the raw user ratings) and use a custom recipe such as User Personalization. Regardless of what type of recommendation engine we use, we need to ensure our dataset contains only representations of positive user intent. So whatever approach you choose, you need to drop all one-star ratings (and possibly two-star ratings as well).
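The rating-to-event rules in Steps 2–4 can be sketched as a small Python function (illustrative only; in this walkthrough the equivalent logic is applied with SageMaker Data Wrangler transform steps):

```python
def rating_to_events(rating):
    """Map a 1-5 star movie rating to Amazon Personalize event types,
    following the rules in this post: a rating of 2 and above yields a
    Click event, and 4 and above yields both Click and Watch. One-star
    ratings yield no events because they don't represent positive intent.
    """
    events = []
    if rating >= 2:
        events.append("Click")
    if rating >= 4:
        events.append("Watch")
    return events
```

For example, a 3-star rating produces only a Click event, while a 5-star rating produces both Click and Watch.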

Now let’s use SageMaker Data Wrangler to perform the preceding steps.

  1. On the Data flow page, choose the first transform, called Data types.

  1. Update the type for each column.
  2. Choose Preview to reflect the changes, then choose Update.

  1. To add a step in the data flow, choose the plus sign next to the step you want to perform the transform on, then choose Add transform.

  1. To create the Click events from the movie ratings, we add a Filter data step to keep the movies rated 2 and above.

  1. Add another custom transform step to add a new column, eventType, with Click as the assigned value. In this case, we write some PySpark code to add a column called eventType whose value will be uniformly Click for all of our two-star through five-star movies:

```python
from pyspark.sql.functions import lit

df = df.withColumn("eventType", lit("Click"))
```

  2. Choose Preview to review your transformation and double-check the results are as intended, then choose Add.

  1. For the Watch events, repeat the previous steps for movies rated 4 and above and assign the Watch value by adding the steps to the Data types step. Our PySpark code for these steps is as follows:

```python
from pyspark.sql.functions import lit

df = df.withColumn("eventType", lit("Watch"))
```

Up to this point, the data flow should look like the following screenshot.

Concatenate datasets

Because we have two datasets for watched and clicked events, let’s walk through how to concatenate these into one interactions dataset.

  1. On the Data flow page, choose the plus sign next to Create Watch Event and choose Concatenate.

  1. Choose the other final step (Create Click Event), and this should automatically converge both sets into a concatenate preview.

  1. Choose Configure to view a preview of the concatenated datasets.
  2. Add a name to the step.
  3. Choose Add to add the step.

The data flow now looks like the following screenshot.

  1. Now, let’s add a Manage columns step to drop the original rating column.

Amazon Personalize has default column names for users, items, and timestamps. These default column names are user_id, item_id, and timestamp.

  1. Let’s add a Transform for Amazon Personalize step to replace the existing column headers with the default headers.
  2. In our case, we also use the event_type field, so let’s map that as well.

With this step, the data transformation activity is complete and the interactions dataset is ready for the next step.

Next, let’s validate our timestamps.

  1. We can do this by adding a Custom transform step. For this post, we choose Python (User-Defined Function).
  2. Choose timestamp as the input column and as the output, create a new column called readable_timestamp.
  3. Choose Python as the mode for the transformation and insert the following code for the Python function:
```python
from datetime import datetime

def custom_func(value: int) -> str:
    # Convert a Unix epoch timestamp (in seconds) to a readable UTC date string
    return datetime.utcfromtimestamp(value).strftime('%Y-%m-%d %H:%M:%S')
```
  1. Choose Preview to review the changes.

In this case, we see dates in the 2000s—because MovieLens started collecting data in 1996, this aligns with what is expected. If we don’t choose Add, this transformation won’t be added to our data flow.

  1. Because this was merely a sanity check, you can navigate back to the data flow by choosing Data flow in the upper left corner.

Finally, we add an analysis step to create a summary report about the dataset. This step performs an analysis to assess the suitability of the dataset for Amazon Personalize.

  1. Choose the plus sign next to the final step on the data flow and choose Add analysis.
  2. For Analysis type, choose Data Quality And Insights Report for Amazon Personalize.
  3. For Dataset type, choose Interactions.
  4. Choose Create.

The MovieLens dataset is quite clean, so the analysis shows no issues. If any issues were identified, you could iterate on the dataset and rerun the analysis until they are addressed.

Note that by default, the analysis runs on a sample of 50,000 rows.

Import the dataset to Amazon Personalize

At this point, our raw data has been transformed and we are ready to import the transformed interactions dataset to Amazon Personalize. SageMaker Data Wrangler gives you the ability to export your data to a location within an S3 bucket. You can specify the location using one of the following methods:

  • Destination node – Where SageMaker Data Wrangler stores the data after it has processed it
  • Export to – Exports the data resulting from a transformation to Amazon S3
  • Export data – For small datasets, you can quickly export the data that you’ve transformed

With the Destination node method, to export your data, you create destination nodes and a SageMaker Data Wrangler job. Creating a SageMaker Data Wrangler job starts a SageMaker Processing job to export your flow. You can choose the destination nodes that you want to export after you’ve created them.

  1. Choose the plus sign next to the node that represents the transformations you want to export.

  1. Choose Export to and then choose Amazon S3 (via Jupyter Notebook).

Note we could have also chosen to export the data to Amazon Personalize via a Jupyter notebook available in SageMaker Data Wrangler.

  1. For Dataset name, enter a name, which will be used as a folder name in the S3 bucket provided as a destination.
  2. You can specify the file type, field delimiter, and compression method.
  3. Optionally, specify the number of partitions and column to partition by.
  4. Choose Add destination.

The data flow should look like the following screenshot.

  1. Create a job to process the data flow and store the data in the destination (S3 bucket) that we configured in the previous step.
  2. Enter a job name, then choose Configure job.

SageMaker Data Wrangler provides the ability to configure the instance type, instance count, and job configuration, and the ability to create a schedule to process the job. For guidance on how to choose an instance count, refer to Create and Use a Data Wrangler Flow.

To monitor the status of the job, navigate to the Dashboard page on the SageMaker console. The Processing section shows the number of completed and created jobs. You can drill down to get more details about the completed job.

When the job is complete, a new file of the transformed data is created in the destination specified.

  1. Return to the Amazon Personalize console and navigate to the dataset group to import another dataset.
  2. Choose Import interaction data.

  1. Select Import data directly into Amazon Personalize datasets to import the transformed dataset directly from Amazon S3, then choose Next.

  1. Define the schema. For this post, our dataset consists of the user_id (string), item_id (string), event_type (string), and timestamp (long) fields.
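The schema defined in this step corresponds to the Avro-format interactions schema that Amazon Personalize expects. A sketch of that schema follows; note that Personalize schema field names are uppercase, and the import maps our exported user_id/item_id/event_type/timestamp columns to them:

```python
import json

# Sketch of an Amazon Personalize interactions schema (Avro format).
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "EVENT_TYPE", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}
print(json.dumps(interactions_schema, indent=2))
```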

At this point, you can create a video on demand domain recommender or a custom solution. To do so, follow the steps in Preparing and importing data.

Conclusion

In this post, we described how to use SageMaker Data Wrangler to prepare a sample dataset for Amazon Personalize. SageMaker Data Wrangler offers over 300 transformations. These transformations and the ability to add custom user transformations can help streamline the process of creating a quality dataset to offer hyper-personalized content to end-users.

Although we only explored how to prepare an interactions dataset in this post, you can use SageMaker Data Wrangler to prepare user and item datasets as well. For more information on the types of data that can be used with Amazon Personalize, refer to Datasets and schemas.

If you’re new to Amazon Personalize or SageMaker Data Wrangler, refer to Get Started with Amazon Personalize or Get Started with SageMaker Data Wrangler, respectively. If you have any questions related to this post, please add them in the comments section.


About the Authors

Maysara Hamdan is a Partner Solutions Architect based in Atlanta, Georgia. Maysara has over 15 years of experience in building and architecting Software Applications and IoT Connected Products in Telecom and Automotive Industries. In AWS, Maysara helps partners in building their cloud practices and growing their businesses. Maysara is passionate about new technologies and is always looking for ways to help partners innovate and grow.

Eric Bolme is a Specialist Solution Architect with AWS based on the East Coast of the United States. He has 8 years of experience building out a variety of deep learning and other AI use cases and focuses on Personalization and Recommendation use cases with AWS.


References

[1] Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872

Personalize your generative AI applications with Amazon SageMaker Feature Store

Large language models (LLMs) are revolutionizing fields like search engines, natural language processing (NLP), healthcare, robotics, and code generation. The applications also extend into retail, where they can enhance customer experiences through dynamic chatbots and AI assistants, and into digital marketing, where they can organize customer feedback and recommend products based on descriptions and purchase behaviors.

The personalization of LLM applications can be achieved by incorporating up-to-date user information, which typically involves integrating several components. One such component is a feature store, a tool that stores, shares, and manages features for machine learning (ML) models. Features are the inputs used during training and inference of ML models. For instance, in an application that recommends movies, features could include previous ratings, preference categories, and demographics. Amazon SageMaker Feature Store is a fully managed repository designed specifically for storing, sharing, and managing ML model features. Another essential component is an orchestration tool suitable for prompt engineering and managing different types of subtasks. Generative AI developers can use frameworks like LangChain, which offers modules for integrating with LLMs and orchestration tools for task management and prompt engineering.

Building on the concept of dynamically fetching up-to-date data to produce personalized content, the use of LLMs has garnered significant attention in recent research for recommender systems. The underlying principle of these approaches involves the construction of prompts that encapsulate the recommendation task, user profiles, item attributes, and user-item interactions. These task-specific prompts are then fed into the LLM, which is tasked with predicting the likelihood of interaction between a particular user and item. As stated in the paper Personalized Recommendation via Prompting Large Language Models, recommendation-driven and engagement-guided prompting components play a crucial role in enabling LLMs to focus on relevant context and align with user preferences.

In this post, we elucidate the simple yet powerful idea of combining user profiles and item attributes to generate personalized content recommendations using LLMs. As demonstrated throughout the post, these models hold immense potential in generating high-quality, context-aware input text, which leads to enhanced recommendations. To illustrate this, we guide you through the process of integrating a feature store (representing user profiles) with an LLM to generate these personalized recommendations.

Solution overview

Let’s imagine a scenario where a movie entertainment company promotes movies to different users via an email campaign. The promotion contains 25 well-known movies, and we want to select the top three recommendations for each user based on their interests and previous rating behaviors.

For example, given a user’s interest in different movie genres like action, romance, and sci-fi, we could have an AI system determine the top three recommended movies for that particular user. In addition, the system might generate personalized messages for each user in a tone tailored to their preferences. We include some examples of personalized messages later in this post.

This AI application would include several components working together, as illustrated in the following diagram:

  1. A user profiling engine takes in a user’s previous behaviors and outputs a user profile reflecting their interests.
  2. A feature store maintains user profile data.
  3. A media metadata store keeps the promotion movie list up to date.
  4. A language model takes the current movie list and user profile data, and outputs the top three recommended movies for each user, written in their preferred tone.
  5. An orchestrating agent coordinates the different components.

In summary, intelligent agents could construct prompts using user- and item-related data and deliver customized natural language responses to users. This would represent a typical content-based recommendation system, which recommends items to users based on their profiles. The user’s profile is stored and maintained in the feature store and revolves around their preferences and tastes. It is commonly derived based on their previous behaviors, such as ratings.

The following diagram illustrates how it works.

The application follows these steps to provide responses to a user’s recommendation:

  1. The user profiling engine takes a user's historical movie ratings as input, outputs their interests, and stores the feature in SageMaker Feature Store. This process can be updated on a schedule.
  2. The agent takes the user ID as input, searches for the user interest, and completes the prompt template following the user’s interests.
  3. The agent takes the promotion item list (movie name, description, genre) from a media metadata store.
  4. The interests prompt template and promotion item list are fed into an LLM for email campaign messages.
  5. The agent sends the personalized email campaign to the end user.

The user profiling engine builds a profile for each user, capturing their preferences and interests. This profile can be represented as a vector with elements mapping to features like movie genres, with values indicating the user’s level of interest. The user profiles in the feature store allow the system to suggest personalized recommendations matching their interests. User profiling is a well-studied domain within recommendation systems. To simplify, you can build a regression algorithm using a user’s previous ratings across different categories to infer their overall preferences. This can be done with algorithms like XGBoost.
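As a deliberately simplified alternative to training a regression model such as XGBoost, a profiling sketch could average a user's ratings per genre. The ratings below are hypothetical:

```python
from collections import defaultdict

# Simplified profiling sketch: average a user's ratings per genre to build
# an interest vector (a trained model like XGBoost would replace this).
ratings = [("Sci-Fi", 5), ("Sci-Fi", 4), ("Romance", 2), ("War", 5), ("War", 4)]
totals, counts = defaultdict(float), defaultdict(int)
for genre, rating in ratings:
    totals[genre] += rating
    counts[genre] += 1
profile = {genre: totals[genre] / counts[genre] for genre in totals}
print(profile)  # {'Sci-Fi': 4.5, 'Romance': 2.0, 'War': 4.5}
```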

Code walkthrough

In this section, we provide examples of the code. The full code walkthrough is available in the GitHub repo.

After obtaining the user interests feature from the user profiling engine, we can store the results in the feature store. SageMaker Feature Store supports batch feature ingestion and online storage for real-time inference. For ingestion, data can be updated in an offline mode, whereas inference needs to happen in milliseconds. SageMaker Feature Store ensures that offline and online datasets remain in sync.

For data ingestion, we use the following code:

from sagemaker.feature_store.feature_group import FeatureGroup

# feature_definitions, sess (a SageMaker session), and data_frame are created
# earlier in the notebook
feature_group_name = 'user-profile-feature-group'
feature_group = FeatureGroup(name=feature_group_name, feature_definitions=feature_definitions, sagemaker_session=sess)

# Ingest data
feature_group.ingest(data_frame=data_frame, max_workers=6, wait=True)

For real-time online storage, we could use the following code to extract the user profile based on the user ID:

feature_record = featurestore_runtime_client.get_record(FeatureGroupName=feature_group_name, RecordIdentifierValueAsString=customer_id)
print(feature_record)

Then we rank the top three interested movie categories to feed the downstream recommendation engine:

User ID: 42
Top3 Categories: ['Animation', 'Thriller', 'Adventure']
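The ranking itself can be sketched in a few lines. The genre scores below are a hypothetical record for user 42; the actual scores come from the feature store:

```python
# Sketch of the ranking step over a hypothetical feature record of genre scores.
record = {"Animation": 0.92, "Thriller": 0.81, "Adventure": 0.77, "Romance": 0.40}
top3 = sorted(record, key=record.get, reverse=True)[:3]
print(top3)  # ['Animation', 'Thriller', 'Adventure']
```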

Our application employs two primary components. The first component retrieves data from a feature store, and the second component acquires a list of movie promotions from the metadata store. The coordination between these components is managed by Chains from LangChain, which represent a sequence of calls to components.

It’s worth mentioning that in complex scenarios, the application may need more than a fixed sequence of calls to LLMs or other tools. Agents, equipped with a suite of tools, use an LLM to determine the sequence of actions to be taken. Whereas Chains encode a hardcoded sequence of actions, agents use the reasoning power of a language model to dictate the order and nature of actions.

The connection between different data sources, including SageMaker Feature Store, is demonstrated in the following code. All the retrieved data is consolidated to construct an extensive prompt, serving as input for the LLM. We dive deep into the specifics of prompt design in the subsequent section. The following is a prompt template definition that interfaces with multiple data sources:

from langchain.prompts import StringPromptTemplate

class FeatureStorePromptTemplate(StringPromptTemplate):
    
    feature_group_name = 'user-profile-feature-group'
    
    def format(self, **kwargs) -> str:
        user_id = kwargs.pop("user_id")
        feature_record = self.fetch_user_preference_from_feature_store(user_id)
        user_preference = self.rank_user_preference(feature_record)
        
        kwargs["promotion_movie_list"] = self.read_promotion_list()
        kwargs["user_preference"] = user_preference
        return prompt.format(**kwargs)
    
    def fetch_user_preference_from_feature_store(self, user_id):
        
        boto_session = boto3.Session()
        featurestore_runtime_client = boto_session.client('sagemaker-featurestore-runtime')
        feature_record = featurestore_runtime_client.get_record(FeatureGroupName=self.feature_group_name, RecordIdentifierValueAsString=str(user_id))
        return feature_record['Record']
    
    # Rank Top_3_Categories for given user's preference
    def rank_user_preference(self, data) -> str:
        # refer to the details in the notebook
        return str(top_categories_df.values.tolist())
        
    # Get promotion movie list from metadata store
    def read_promotion_list(self,) -> str:
        # refer to the details in the notebook
        return output_string

In addition, we use Amazon SageMaker to host our LLM model and expose it as the LangChain SageMaker endpoint. To deploy the LLM, we use Amazon SageMaker JumpStart (for more details, refer to Llama 2 foundation models from Meta are now available in Amazon SageMaker JumpStart). After the model is deployed, we can create the LLM module:

from langchain import PromptTemplate, SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler

class ContentHandler(LLMContentHandler):

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        # refer to the details in the notebook
        ...

    def transform_output(self, output: bytes) -> str:
        # refer to the details in the notebook
        ...

content_handler = ContentHandler()

sm_llm = SagemakerEndpoint(
    endpoint_name = endpoint_name,
    region_name = aws_region,
    model_kwargs = parameters,
    endpoint_kwargs={"CustomAttributes": 'accept_eula=true'},
    content_handler = content_handler,
)

In the context of our application, the agent runs a sequence of steps, called an LLMChain. It integrates a prompt template, model, and guardrails to format the user input, pass it to the model, get a response, and then validate (and, if necessary, rectify) the model output.

from langchain.chains import LLMChain
llmchain = LLMChain(llm=sm_llm, prompt=prompt_template)
email_content = llmchain.run({'user_id': 4})
print(email_content)

In the next section, we walk through the prompt engineering for the LLM to output expected results.

LLM recommendation prompting and results

Following the high-level concept of engagement-guided prompting as described in the research study Personalized Recommendation via Prompting Large Language Models, the fundamental principle of our prompting strategy is to integrate user preferences in creating prompts. These prompts are designed to guide the LLM towards more effectively identifying attributes within the content description that align with user preferences. To elaborate further, our prompt comprises several components:

  • Contextual relevance – The initial part of our prompt template incorporates media metadata such as item name (movie title), description (movie synopsis), and attribute (movie genre). By incorporating this information, the prompt provides the LLM with a broader context and a more comprehensive understanding of the content. This contextual information aids the LLM in better understanding the item through its description and attributes, thereby enhancing its utility in content recommendation scenarios.
  • User preference alignment – By taking into account a user profile that signifies user preferences, potential recommendations are better positioned to identify content characteristics and features that resonate with target users. This alignment augments the utility of the item descriptions because it enhances the efficiency of recommending items that are relevant and in line with user preferences.
  • Enhanced recommendation quality – The engagement-guided prompt uses user preferences to identify relevant promotional items. We can also use user preference to adjust the tone of the LLM for the final output. This can result in an accurate, informative, and personalized experience, thereby improving the overall performance of the content recommendation system.

The following code shows an example prompt template:

prompt_template = """
Our company, "Classic Cinema", frequently promotes movies that we aim to recommend to our customers. This month, we have several popular movies on promotion.

As an AI agent, you are tasked to assist "Classic Cinema" in crafting an email campaign to recommend relevant movies to users. The recommendations should adhere to several guidelines, including contextual relevance, ensuring the recommendations are strictly from our promotional movie list. Additionally, the recommendations should align with user preferences, suggesting items that are relevant and in harmony with the user's preferred categories. You are to provide precisely three top recommended movies. Finally, please draft the email to reflect the tone of the user's preferred categories. The email should not exceed 100 words.

The recommended movies should be sourced from this contextual relevance movie list:
{promotion_movie_list}.

The user has expressed interest in {user_preference}.

Please ensure the recommendations are relevant, and the tone of the email reflects the tastes of those interested in the {user_preference} movie category.

Ensure the letter appeals to those interested in the {user_preference} movie category, and keep the email campaign within a 100-word limit. """

The following is an example result with user preferences of sci-fi, adventure, and war genres:

Subject: Explore the Frontier of Classic Cinema with Our Sci-Fi, Adventure, and War Movies!

Dear [Name],
Are you ready to embark on a journey through time and space, experience heart-pumping action, and witness the bravery of heroes on the battlefield? Look no further! Classic Cinema proudly presents our carefully curated selection of movies, tailored specifically to your interests in Sci-Fi, Adventure, and War.

First up, we have the iconic “Star Wars: Episode V – The Empire Strikes Back” (1980). This classic installment in the Star Wars saga follows Luke Skywalker’s journey to becoming a Jedi Master, while his friends face off against the evil Empire. With its groundbreaking special effects and memorable characters, this movie is a must-see for any fan of the genre.

Next, we recommend “Interstellar” (2014), a thought-provoking and visually stunning film that delves into the mysteries of time and space. As a group of explorers embark on a quest to save humanity, they encounter breathtaking landscapes and uncover the secrets of the universe.

Lastly, we have “Saving Private Ryan” (1998), a gripping and intense war drama that follows a group of soldiers on a dangerous mission to find and rescue a paratrooper whose brothers have been killed in action. This critically acclaimed film is a powerful tribute to the heroes of World War II.
Don’t miss out on these cinematic masterpieces! Watch them now and experience the thrill of adventure, the wonder of sci-fi, and the bravery of war heroes.
Happy viewing, and may the force be with you!

Best regards,
Classic Cinema Team

The following is another example result with a user preference of documentary, musical, and drama:

Subject: Classic Cinema’s Recommendations for Documentary, Musical, and Drama Lovers

Dear [Name],
We hope this email finds you well and that you’re enjoying the variety of movies available on our platform. At Classic Cinema, we take pride in catering to the diverse tastes of our customers, and we’ve selected three exceptional movies that we believe will resonate with your interest in Documentary, Musical, and Drama.


First up, we have “The Shawshank Redemption” (1994), a powerful and uplifting drama that follows the journey of two prisoners as they find hope and redemption in a corrupt and unforgiving prison system. With its gripping storyline, outstanding performances, and timeless themes, this movie is a must-see for anyone who loves a well-crafted drama.


Next, we recommend “The Lord of the Rings: The Fellowship of the Ring” (2001), an epic adventure that combines breathtaking visuals, memorable characters, and a richly detailed world. This movie is a masterclass in storytelling, with a deep sense of history and culture that will transport you to Middle-earth and leave you wanting more.


Lastly, we suggest “The Pianist” (2002), a profound and moving documentary that tells the true story of Władysław Szpilman, a Polish Jewish pianist who struggled to survive the destruction of the Warsaw ghetto during World War II. This film is a powerful reminder of the human spirit’s capacity for resilience and hope, even in the face of unimaginable tragedy.


We hope these recommendations resonate with your interests and provide you with an enjoyable and enriching movie experience. Don’t miss out on these timeless classics – watch them now and discover the magic of Classic Cinema!
Best regards,
The Classic Cinema Team

We carried out tests with both Llama 2 7B-Chat and Llama 2 70B for comparison. Both models performed well, yielding consistent conclusions. By using a prompt template filled with up-to-date data, we found it easier to test arbitrary LLMs, helping us choose the right balance between performance and cost. We also made several shared observations that are worth noting.

Firstly, we can see that the recommendations provided genuinely align with user preferences. The movie recommendations are guided by various components within our application, most notably the user profile stored in the feature store.

Additionally, the tone of the emails corresponds to user preferences. Thanks to the advanced language understanding capabilities of LLM, we can customize the movie descriptions and email content, tailoring them to each individual user.

Furthermore, the final output format can be designed into the prompt. For example, in our case, the salutation "Dear [Name]" needs to be filled by the email service. It's important to note that although we avoid exposing personally identifiable information (PII) within our generative AI application, there is the possibility to reintroduce this information during postprocessing, assuming the right level of permissions is granted.
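Such postprocessing might look like the following sketch, run by the email service outside the generative AI application so that no PII reaches the LLM. The name value is a hypothetical placeholder:

```python
# Hypothetical post-processing step performed by the email service:
# fill the "[Name]" placeholder the LLM was instructed to emit.
email_body = "Dear [Name],\n\nWelcome back to Classic Cinema!"
personalized = email_body.replace("[Name]", "Alex")  # "Alex" is a placeholder value
print(personalized.splitlines()[0])  # Dear Alex,
```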

Clean up

To avoid unnecessary costs, delete the resources you created as part of this solution, including the feature store and LLM inference endpoint deployed with SageMaker JumpStart.

Conclusion

The power of LLMs in generating personalized recommendations is immense and transformative, particularly when coupled with the right tools. By integrating SageMaker Feature Store and LangChain for prompt engineering, developers can construct and manage highly tailored user profiles. This results in high-quality, context-aware inputs that significantly enhance recommendation performance. In our illustrative scenario, we saw how this can be applied to tailor movie recommendations to individual user preferences, resulting in a highly personalized experience.

As the LLM landscape continues to evolve, we anticipate seeing more innovative applications that use these models to deliver even more engaging, personalized experiences. The possibilities are boundless, and we are excited to see what you will create with these tools. With resources such as SageMaker JumpStart and Amazon Bedrock now available to accelerate the development of generative AI applications, we strongly recommend exploring the construction of recommendation solutions using LLMs on AWS.


About the Authors

Yanwei Cui, PhD, is a Senior Machine Learning Specialist Solutions Architect at AWS. He started machine learning research at IRISA (Research Institute of Computer Science and Random Systems), and has several years of experience building AI-powered industrial applications in computer vision, natural language processing, and online user behavior prediction. At AWS, he shares his domain expertise and helps customers unlock business potentials and drive actionable outcomes with machine learning at scale. Outside of work, he enjoys reading and traveling.

Gordon Wang is a Senior AI/ML Specialist TAM at AWS. He supports strategic customers with AI/ML best practices cross many industries. He is passionate about computer vision, NLP, generative AI, and MLOps. In his spare time, he loves running and hiking.

Michelle Hong, PhD, works as Prototyping Solutions Architect at Amazon Web Services, where she helps customers build innovative applications using a variety of AWS components. She demonstrated her expertise in machine learning, particularly in natural language processing, to develop data-driven solutions that optimize business processes and improve customer experiences.

Bin Wang, PhD, is a Senior Analytic Specialist Solutions Architect at AWS, boasting over 12 years of experience in the ML industry, with a particular focus on advertising. He possesses expertise in natural language processing (NLP), recommender systems, diverse ML algorithms, and ML operations. He is deeply passionate about applying ML/DL and big data techniques to solve real-world problems. Outside of his professional life, he enjoys music, reading, and traveling.
