Predicting the delays caused when robots’ paths intersect can improve task assignment and path planning in warehouses.Read More
Enhance Amazon Lex with conversational FAQ features using LLMs
Amazon Lex is a service that allows you to quickly and easily build conversational bots (“chatbots”), virtual agents, and interactive voice response (IVR) systems for applications such as Amazon Connect.
Artificial intelligence (AI) and machine learning (ML) have been a focus for Amazon for over 20 years, and many of the capabilities that customers use with Amazon are driven by ML. Today, large language models (LLMs) are transforming the way developers and enterprises solve historically complex challenges related to natural language understanding (NLU). We announced Amazon Bedrock recently, which democratizes Foundational Model access for developers to easily build and scale generative AI-based applications, using familiar AWS tools and capabilities. One of the challenges enterprises face is to incorporate their business knowledge into LLMs to deliver accurate and relevant responses. When leveraged effectively, enterprise knowledge bases can be used to deliver tailored self-service and assisted-service experiences, by delivering information that helps customers solve problems independently and/or augmenting an agent’s knowledge. Today, a bot developer can improve self-service experiences without utilizing LLMs in a couple of ways. First, by creating intents, sample utterances, and responses, thereby covering all anticipated user questions within an Amazon Lex bot. Second, developers can also integrate bots with search solutions, which can index documents stored across a wide range of repositories and find the most relevant document to answer their customer’s question. These methods are effective, but require developer resources making getting started difficult.
One of the benefits offered by LLMs is the ability to create relevant and compelling conversational self-service experiences. They do so by leveraging enterprise knowledge base(s) and delivering more accurate and contextual responses. This blog post introduces a powerful solution for augmenting Amazon Lex with LLM-based FAQ features using the Retrieval Augmented Generation (RAG). We will review how the RAG approach augments Amazon Lex FAQ responses using your company data sources. In addition, we will also demonstrate Amazon Lex integration with LlamaIndex, which is an open-source data framework that provides knowledge source and format flexibility to the bot developer. As a bot developer gains confidence with using a LlamaIndex to explore LLM integration, they can scale the Amazon Lex capability further. They can also use enterprise search services such as Amazon Kendra, which is natively integrated with Amazon Lex.
In this solution, we showcase the practical application of an Amazon Lex chatbot with LLM-based RAG enhancement. We use the Zappos customer support use case as an example to demonstrate the effectiveness of this solution, which takes the user through an enhanced FAQ experience (with LLM), rather than directing them to fallback (default, without LLM).
Solution overview
RAG combines the strengths of traditional retrieval-based and generative AI based approaches to Q&A systems. This methodology harnesses the power of large language models, such as Amazon Titan or open-source models (for example, Falcon), to perform generative tasks in retrieval systems. It also takes into account the semantic context from stored documents more effectively and efficiently.
RAG starts with an initial retrieval step to retrieve relevant documents from a collection based on the user’s query. It then employs a language model to generate a response by considering both the retrieved documents and the original query. By integrating RAG into Amazon Lex, we can provide accurate and comprehensive answers to user queries, resulting in a more engaging and satisfying user experience.
The RAG approach requires document ingestion so that embeddings can be created to enable LLM-based search. The following diagram shows how the ingestion process creates the embeddings that are then used by the chatbot during fallback to answer the customer’s question.
With this solution architecture, you should choose the most suitable LLM for your use case. It also provides an inference endpoint choice between Amazon Bedrock (in limited preview) and models hosted on Amazon SageMaker JumpStart, offering additional LLM flexibility.
The document is uploaded to an Amazon Simple Storage Service (Amazon S3) bucket. The S3 bucket has an event listener attached that invokes an AWS Lambda function on changes to the bucket. The event listener ingests the new document and places the embeddings in another S3 bucket. The embeddings are then used by the RAG implementation in the Amazon Lex bot during the fallback intent to answer the customer’s question. The next diagram shows the architecture of how an FAQ bot within Lex can be enhanced with LLMs and RAG.
Let’s explore how we can integrate RAG based on LlamaIndex into an Amazon Lex bot. We provide code examples and an AWS Cloud Development Kit (AWS CDK) import to assist you in setting up the integration. You can find the code examples in our GitHub repository. The following sections provide a step-by-step guide to help you set up the environment and deploy the necessary resources.
How RAG works with Amazon Lex
The flow of RAG involves an iterative process where the retriever component retrieves relevant passages, the question and passages help construct the prompt, and the generation component produces a response. This combination of retrieval and generation techniques allows the RAG model to take advantage of the strengths of both approaches, providing accurate and contextually appropriate answers to user questions. The workflow provides the following capabilities:
- Retriever engine – The RAG model begins with a retriever component responsible for retrieving relevant documents from a large corpus. This component typically uses an information retrieval technique like TF-IDF or BM25 to rank and select documents that are likely to contain the answer to a given question. The retriever scans the document corpus and retrieves a set of relevant passages.
- Prompt helper – After the retriever has identified the relevant passages, the RAG model moves to prompt creation. The prompt is a combination of the question and the retrieved passages, serving as additional context for the prompt, which is used as input to the generator component. To create the prompt, the model typically augments the question with the selected passages in a specific format.
- Response generation – The prompt, consisting of the question and relevant passages, is fed into the generation component of the RAG model. The generation component is usually a language model capable of reasoning through the prompt to generate a coherent and relevant response.
- Final response – Finally, the RAG model selects the highest-ranked answer as the output and presents it as the response to the original question. The selected answer can be further postprocessed or formatted as necessary before being returned to the user. In addition, the solution enables the filtering of the generated response if the retrieval results yields a low confidence score, implying that it likely falls outside the distribution (OOD).
LlamaIndex: An open-source data framework for LLM-based applications
In this post, we demonstrate the RAG solution based on LlamaIndex. LlamaIndex is an open-source data framework specifically designed to facilitate LLM-based applications. It offers a robust and scalable solution for managing document collection in different formats. With LlamaIndex, bot developers are empowered to effortlessly integrate LLM-based QA (question answering) capabilities into their applications, eliminating the complexities associated with managing solutions catered to large-scale document collections. Furthermore, this approach proves to be cost-effective for smaller-sized document repositories.
Prerequisites
You should have the following prerequisites:
- An AWS account
- An AWS Identity and Access Management (IAM) user and role permissions to access the following:
- Amazon Lex
- Lambda
- Amazon SageMaker
- An S3 bucket
- The AWS CDK installed
Set up your development environment
The main third-party package requirements are llama_index and sagemaker sdk. Follow the specified commands in our GitHub repository’s README to set up your environment properly.
Deploy the required resources
This step involves creating an Amazon Lex bot, S3 buckets, and a SageMaker endpoint. Additionally, you need to Dockerize the code in the Docker image directory and push the images to Amazon Elastic Container Registry (Amazon ECR) so that it can run in Lambda. Follow the specified commands in our GitHub repository’s README to deploy the services.
During this step, we demonstrate LLM hosting via SageMaker Deep Learning Containers. Adjust the settings according to your computation needs:
- Model – To find a model that meets your requirements, you can explore resources like the Hugging Face model hub. It offers a variety of models such as Falcon 7B or Flan-T5-XXL. Additionally, you can find detailed information about various officially supported model architectures, helping you make an informed decision. For more information about different model types, refer to optimized architectures.
- Model inference endpoint – Define the path of the model (for example, Falcon 7B), choose your instance type (for example, g5.4xlarge), and use quantization (for example, int-8 quantization).Note: This solution provides you the flexibility to choose another model inferencing endpoint. You can also use Amazon Bedrock, which provides access to other LLMs such as Amazon Titan.Note: This solution provides you the flexibility to choose another model inferencing endpoint. You can also use Amazon Bedrock, which provides access to other LLMs such as Amazon Titan.
Set up your document index via LlamaIndex
To set up your document index, first upload your document data. We assume that you have the source of your FAQ content, such as a PDF or text file.
After the document data is uploaded, the LlamaIndex system will automatically initiate the process of creating the document index. This task is performed by a Lambda function, which generates the index and saves it to an S3 bucket.
To enable efficient retrieval of relevant information, configure the document retriever using the LlamaIndex Retriever Query Engine. This engine offers several customization options, such as the following:
- Embedding models – You can choose your embedding model, such as Hugging Face embedding.
- Confidence cutoff – Specify a confidence cutoff threshold to determine the quality of retrieval results. If the confidence score falls below this threshold, you can choose to provide out-of-scope responses, indicating that the query is beyond the scope of the indexed documents.
Test the integration
Define your bot definition with a fallback intent and use the Amazon Lex console to test your FAQ requests. For more details, please refer to GitHub repository. The following screenshot shows an example conversation with the bot.
Tips to boost your bot efficiency
The following tips could potentially further improve the efficiency of your bot:
- Index storage – Store your index in an S3 bucket or a service with vector database capabilities such as Amazon OpenSearch. By utilizing cloud-based storage solutions, you can enhance the accessibility and scalability of your index, leading to faster retrieval times and improved overall performance. Also, Refer to this blog post for an Amazon Lex bot that utilizes an Amazon Kendra search solution.
- Retrieval optimization – Experiment with different sizes of embedding models for the retriever. The choice of embedding model can significantly impact the input requirements of your LLM. Finding the optimal balance between model size and retrieval performance can result in improved efficiency and faster response times.
- Prompt engineering – Experiment with different prompt formats, lengths, and styles to optimize the performance and quality of your bot’s answers.
- LLM model selection – Select the most suitable LLM model for your specific use case. Consider factors such as model size, language capabilities, and compatibility with your application requirements. Choosing the right LLM model ensures optimal performance and efficient utilization of system resources.
Contact center conversations can span from self-service to a live human interaction. For use cases involving human-to-human interactions over Amazon Connect, you can use Wisdom to search and find content across multiple repositories, such as frequently asked questions (FAQs), wikis, articles, and step-by-step instructions for handling different customer issues.
Clean up
To avoid incurring future expenses, proceed with deleting all the resources that were deployed as part of this exercise. We have provided a script to shut down the SageMaker endpoint gracefully. Usage details are in the README. Additionally, to remove all the other resources you can run cdk destroy
in the same directory as the other cdk commands to deprovision all the resources in your stack.
Summary
This post discussed the following steps to enhance Amazon Lex with LLM-based QA features using the RAG strategy and LlamaIndex:
- Install the necessary dependencies, including LlamaIndex libraries
- Set up model hosting via Amazon SageMaker or Amazon Bedrock (in limited preview)
- Configure LlamaIndex by creating an index and populating it with relevant documents
- Integrate RAG into Amazon Lex by modifying the configuration and configuring RAG to use LlamaIndex for document retrieval
- Test the integration by engaging in conversations with the chatbot and observing its retrieval and generation of accurate responses
By following these steps, you can seamlessly incorporate powerful LLM-based QA capabilities and efficient document indexing into your Amazon Lex chatbot, resulting in more accurate, comprehensive, and contextually aware interactions with users. As a follow up, we also invite you to review our next blog post, which explores enhancing the Amazon Lex FAQ experience using URL ingestion and LLMs.
About the authors
Max Henkel-Wallace is a Software Development Engineer at AWS Lex. He enjoys working leveraging technology to maximize customer success. Outside of work he is passionate about cooking, spending time with friends, and backpacking.
Song Feng is a Senior Applied Scientist at AWS AI Labs, specializing in Natural Language Processing and Artificial Intelligence. Her research explores various aspects of these fields including document-grounded dialogue modeling, reasoning for task-oriented dialogues, and interactive text generation using multimodal data.
Saket Saurabh is an engineer with AWS Lex team. He works on improving Lex developer experience to help developers build more human-like chat bots. Outside of work, he enjoys traveling, discovering diverse cuisines, and learn about different cultures.
f
Enhance Amazon Lex with LLMs and improve the FAQ experience using URL ingestion
In today’s digital world, most consumers would rather find answers to their customer service questions on their own rather than taking the time to reach out to businesses and/or service providers. This blog post explores an innovative solution to build a question and answer chatbot in Amazon Lex that uses existing FAQs from your website. This AI-powered tool can provide quick, accurate responses to real-world inquiries, allowing the customer to quickly and easily solve common problems independently.
Single URL ingestion
Many enterprises have a published set of answers for FAQs for their customers available on their website. In this case, we want to offer customers a chatbot that can answer their questions from our published FAQs. In the blog post titled Enhance Amazon Lex with conversational FAQ features using LLMs, we demonstrated how you can use a combination of Amazon Lex and LlamaIndex to build a chatbot powered by your existing knowledge sources, such as PDF or Word documents. To support a simple FAQ, based on a website of FAQs, we need to create an ingestion process that can crawl the website and create embeddings that can be used by LlamaIndex to answer customer questions. In this case, we will build on the bot created in the previous blog post, which queries those embeddings with a user’s utterance and returns the answer from the website FAQs.
The following diagram shows how the ingestion process and the Amazon Lex bot work together for our solution.
In the solution workflow, the website with FAQs is ingested via AWS Lambda. This Lambda function crawls the website and stores the resulting text in an Amazon Simple Storage Service (Amazon S3) bucket. The S3 bucket then triggers a Lambda function that uses LlamaIndex to create embeddings that are stored in Amazon S3. When a question from an end-user arrives, such as “What is your return policy?”, the Amazon Lex bot uses its Lambda function to query the embeddings using a RAG-based approach with LlamaIndex. For more information about this approach and the pre-requisites, refer to the blog post, Enhance Amazon Lex with conversational FAQ features using LLMs.
After the pre-requisites from the aforementioned blog are complete, the first step is to ingest the FAQs into a document repository that can be vectorized and indexed by LlamaIndex. The following code shows how to accomplish this:
In the preceding example, we take a predefined FAQ website URL from Zappos and ingest it using the EZWebLoader
class. With this class, we have navigated to the URL and loaded all the questions that are in the page into an index. We can now ask a question like “Does Zappos have gift cards?” and get the answers directly from our FAQs on the website. The following screenshot shows the Amazon Lex bot test console answering that question from the FAQs.
We were able to achieve this because we had crawled the URL in the first step and created embedddings that LlamaIndex could use to search for the answer to our question. Our bot’s Lambda function shows how this search is run whenever the fallback intent is returned:
This solution works well when a single webpage has all the answers. However, most FAQ sites are not built on a single page. For instance, in our Zappos example, if we ask the question “Do you have a price matching policy?”, then we get a less-than-satisfactory answer, as shown in the following screenshot.
In the preceding interaction, the price-matching policy answer isn’t helpful for our user. This answer is short because the FAQ referenced is a link to a specific page about the price matching policy and our web crawl was only for the single page. Achieving better answers will mean crawling these links as well. The next section shows how to get answers to questions that require two or more levels of page depth.
N-level crawling
When we crawl a web page for FAQ knowledge, the information we want can be contained in linked pages. For example, in our Zappos example, we ask the question “Do you have a price matching policy?” and the answer is “Yes please visit <link> to learn more.” If someone asks “What is your price matching policy?” then we want to give a complete answer with the policy. Achieving this means we have the need to traverse links to get the actual information for our end-user. During the ingestion process, we can use our web loader to find the anchor links to other HTML pages and then traverse them. The following code change to our web crawler allows us to find links in the pages we crawl. It also includes some additional logic to avoid circular crawling and allow a filter by a prefix.
In the preceding code, we introduce the ability to crawl N levels deep, and we give a prefix that allows us to restrict crawling to only things that begin with a certain URL pattern. In our Zappos example, the customer service pages all are rooted from zappos.com/c
, so we include that as a prefix to limit our crawls to a smaller and more relevant subset. The code shows how we can ingest up to two levels deep. Our bot’s Lambda logic remains the same because nothing has changed except the crawler ingests more documents.
We now have all the documents indexed and we can ask a more detailed question. In the following screenshot, our bot provides the correct answer to the question “Do you have a price matching policy?”
We now have a complete answer to our question about price matching. Instead of simply being told “Yes see our policy,” it gives us the details from the second-level crawl.
Clean up
To avoid incurring future expenses, proceed with deleting all the resources that were deployed as part of this exercise. We have provided a script to shut down the Sagemaker endpoint gracefully. Usage details are in the README. Additionally, to remove all the other resources you can run cdk destroy
in the same directory as the other cdk commands to deprovision all the resources in your stack.
Conclusion
The ability to ingest a set of FAQs into a chatbot enables your customers to find the answers to their questions with straightforward, natural language queries. By combining the built-in support in Amazon Lex for fallback handling with a RAG solution such as a LlamaIndex, we can provide a quick path for our customers to get satisfying, curated, and approved answers to FAQs. By applying N-level crawling into our solution, we can allow for answers that could possibly span multiple FAQ links and provide deeper answers to our customer’s queries. By following these steps, you can seamlessly incorporate powerful LLM-based Q and A capabilities and efficient URL ingestion into your Amazon Lex chatbot. This results in more accurate, comprehensive, and contextually aware interactions with users.
About the authors
Max Henkel-Wallace is a Software Development Engineer at AWS Lex. He enjoys working leveraging technology to maximize customer success. Outside of work he is passionate about cooking, spending time with friends, and backpacking.
Song Feng is a Senior Applied Scientist at AWS AI Labs, specializing in Natural Language Processing and Artificial Intelligence. Her research explores various aspects of these fields including document-grounded dialogue modeling, reasoning for task-oriented dialogues, and interactive text generation using multimodal data.
John Baker is a Principal SDE at AWS where he works on Natural Language Processing, Large Language Models and other ML/AI related projects. He has been with Amazon for 9+ years and has worked across AWS, Alexa and Amazon.com. In his spare time, John enjoys skiing and other outdoor activities throughout the Pacific Northwest.
Geopipe uses AI to create a digital twin of Earth
With help from the Alexa Fund, the company is making it easier to virtually reconstruct reality.Read More
Build an email spam detector using Amazon SageMaker
Spam emails, also known as junk mail, are sent to a large number of users at once and often contain scams, phishing content, or cryptic messages. Spam emails are sometimes sent manually by a human, but most often they are sent using a bot. Examples of spam emails include fake ads, chain emails, and impersonation attempts. There is a risk that a particularly well-disguised spam email may land in your inbox, which can be dangerous if clicked on. It’s important to take extra precautions to protect your device and sensitive information.
As technology is improving, the detection of spam emails becomes a challenging task due to its changing nature. Spam is quite different from other types of security threats. It may at first appear like an annoying message and not a threat, but it has an immediate effect. Also spammers often adapt new techniques. Organizations who provide email services want to minimize spam as much as possible to avoid any damage to their end customers.
In this post, we show how straightforward it is to build an email spam detector using Amazon SageMaker. The built-in BlazingText algorithm offers optimized implementations of Word2vec and text classification algorithms. Word2vec is useful for various natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, and machine translation. Text classification is essential for applications like web searches, information retrieval, ranking, and document classification.
Solution overview
This post demonstrates how you can set up email spam detector and filter spam emails using SageMaker. Let’s see how a spam detector typically works, as shown in the following diagram.
Emails are sent through a spam detector. An email is sent to the spam folder if the spam detector detects it as spam. Otherwise, it’s sent to the customer’s inbox.
We walk you through the following steps to set up our spam detector model:
- Download the sample dataset from the GitHub repo.
- Load the data in an Amazon SageMaker Studio notebook.
- Prepare the data for the model.
- Train, deploy, and test the model.
Prerequisites
Before diving into this use case, complete the following prerequisites:
- Set up an AWS account.
- Set up a SageMaker domain.
- Create an Amazon Simple Storage Service (Amazon S3) bucket. For instructions, see Create your first S3 bucket.
Download the dataset
Download the email_dataset.csv from GitHub and upload the file to the S3 bucket.
The BlazingText algorithm expects a single preprocessed text file with space-separated tokens. Each line in the file should contain a single sentence. If you need to train on multiple text files, concatenate them into one file and upload the file in the respective channel.
Load the data in SageMaker Studio
To perform the data load, complete the following steps:
- Download the
spam_detector.ipynb
file from GitHub and upload the file in SageMaker Studio. - In your Studio notebook, open the
spam_detector.ipynb
notebook. - If you are prompted to choose a Kernel, choose the Python 3 (Data Science 3.0) kernel and choose Select. If not, verify that the right kernel has been automatically selected.
- Import the required Python library and set the roles and the S3 buckets. Specify the S3 bucket and prefix where you uploaded email_dataset.csv.
- Run the data load step in the notebook.
- Check if the dataset is balanced or not based on the Category labels.
We can see our dataset is balanced.
Prepare the data
The BlazingText algorithm expects the data in the following format:
Here’s an example:
Check Training and Validation Data Format for the BlazingText Algorithm.
You now run the data preparation step in the notebook.
- First, you need to convert the Category column to an integer. The following cell replaces the SPAM value with 1 and the HAM value with 0.
- The next cell adds the prefix
__label__
to each Category value and tokenizes the Message column.
- The next step is to split the dataset into train and validation datasets and upload the files to the S3 bucket.
Train the model
To train the model, complete the following steps in the notebook:
- Set up the BlazingText estimator and create an estimator instance passing the container image.
- Set the learning mode hyperparameter to supervised.
BlazingText has both unsupervised and supervised learning modes. Our use case is text classification, which is supervised learning.
- Create the train and validation data channels.
- Start training the model.
- Get the accuracy of the train and validation dataset.
Deploy the model
In this step, we deploy the trained model as an endpoint. Choose your preferred instance
Test the model
Let’s provide an example of three email messages that we want to get predictions for:
- Click on below link, provide your details and win this award
- Best summer deal here
- See you in the office on Friday.
Tokenize the email message and specify the payload to use when calling the REST API.
Now we can predict the email classification for each email. Call the predict method of the text classifier, passing the tokenized sentence instances (payload) into the data argument.
Clean up
Finally , you can delete the endpoint to avoid any unexpected cost.
Also, delete the data file from S3 bucket.
Conclusion
In this post, we walked you through the steps to create an email spam detector using the SageMaker BlazingText algorithm. With the BlazingText algorithm, you can scale to large datasets. BlazingText is used for textual analysis and text classification problems, and has both unsupervised and supervised learning modes. You can use the algorithm for use cases like customer sentiment analysis and text classification.
To learn more about the BlazingText algorithm, check out BlazingText algorithm.
About the Author
Dhiraj Thakur is a Solutions Architect with Amazon Web Services. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, migration, and strategy. He is passionate about technology and enjoys building and experimenting in the analytics and AI/ML space.
Llama 2 foundation models from Meta are now available in Amazon SageMaker JumpStart
Today, we are excited to announce that Llama 2 foundation models developed by Meta are available for customers through Amazon SageMaker JumpStart. The Llama 2 family of large language models (LLMs) is a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Fine-tuned LLMs, called Llama-2-chat, are optimized for dialogue use cases. You can easily try out these models and use them with SageMaker JumpStart, which is a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML.
In this post, we walk through how to use Llama 2 models via SageMaker JumpStart.
What is Llama 2
Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. Llama 2 is intended for commercial and research use in English. It comes in a range of parameter sizes—7 billion, 13 billion, and 70 billion—as well as pre-trained and fine-tuned variations. According to Meta, the tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. Llama 2 was pre-trained on 2 trillion tokens of data from publicly available sources. The tuned models are intended for assistant-like chat, whereas pre-trained models can be adapted for a variety of natural language generation tasks. Regardless of which version of the model a developer uses, the responsible use guide from Meta can assist in guiding additional fine-tuning that may be necessary to customize and optimize the models with appropriate safety mitigations.
What is SageMaker JumpStart
With SageMaker JumpStart, ML practitioners can choose from a broad selection of open source foundation models. ML practitioners can deploy foundation models to dedicated Amazon SageMaker instances from a network isolated environment and customize models using SageMaker for model training and deployment.
You can now discover and deploy Llama 2 with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping ensure data security. Llama 2 models are available today in Amazon SageMaker Studio, initially in us-east 1
and us-west 2
regions.
Discover models
You can access the foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.
SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.
Once you’re on the SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions.
From the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources. You can find two flagship Llama 2 models in the Foundation Models: Text Generation carousel. If you don’t see Llama 2 models, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Apps.
You can also find other four model variants by choosing Explore all Text Generation Models or searching for llama
in the search box.
You can choose the model card to view details about the model such as license, data used to train, and how to use. You can also find two buttons, Deploy and Open Notebook, which help you use the model.
When you choose either button, a pop-up will show the end-user license agreement and acceptable use policy for you to acknowledge.
Upon acknowledging, you will proceed to the next step to use the model.
Deploy a model
When you choose Deploy and acknowledge the terms, model deployment will start. Alternatively, you can deploy through the example notebook that shows up by choosing Open Notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using a notebook, we start by selecting an appropriate model, specified by the model_id
. You can deploy any of the selected models on SageMaker with the following code:
This deploys the model on SageMaker with default configurations, including default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor:
Fine-tuned chat models (Llama-2-7b-chat, Llama-2-13b-chat, Llama-2-70b-chat) accept a history of chat between the user and the chat assistant, and generate the subsequent chat. The pre-trained models (Llama-2-7b, Llama-2-13b, Llama-2-70b) requires a string prompt and perform text completion on the provided prompt. See the following code:
Note that by default, accept_eula
is set to false. You need to set accept_eula=true
to invoke the endpoint successfully. By doing so, you accept the user license agreement and acceptable use policy as mentioned earlier. You can also download the license agreement.
Custom_attributes
used to pass EULA are key/value pairs. The key and value are separated by =
and pairs are separated by ;
. If the user passes the same key more than once, the last value is kept and passed to the script handler (i.e., in this case, used for conditional logic). For example, if accept_eula=false; accept_eula=true
is passed to the server, then accept_eula=true
is kept and passed to the script handler.
Inference parameters control the text generation process at the endpoint. The maximum new tokens control refers to the size of the output generated by the model. Note that this is not the same as the number of words because the vocabulary of the model is not the same as the English language vocabulary, and each token may not be an English language word. Temperature controls the randomness in the output. Higher temperature results in more creative and hallucinated outputs. All the inference parameters are optional.
The following table lists all the Llama models available in SageMaker JumpStart along with the model_ids
, default instance types, and the maximum number of total tokens (sum of number of input tokens and number of generated tokens) supported for each of these models.
Model Name | Model ID | Max Total Tokens | Default Instance Type |
Llama-2-7b | meta-textgeneration-llama-2-7b | 4096 | ml.g5.2xlarge |
Llama-2-7b-chat | meta-textgeneration-llama-2-7b-f | 4096 | ml.g5.2xlarge |
Llama-2-13b | meta-textgeneration-llama-2-13b | 4096 | ml.g5.12xlarge |
Llama-2-13b-chat | meta-textgeneration-llama-2-13b-f | 4096 | ml.g5.12xlarge |
Llama-2-70b | meta-textgeneration-llama-2-70b | 4096 | ml.g5.48xlarge |
Llama-2-70b-chat | meta-textgeneration-llama-2-70b-f | 4096 | ml.g5.48xlarge |
Note that SageMaker endpoints have a timeout limit of 60s. Thus, even though the model may be able to generate 4096 tokens, if text generation takes more than 60s, request will fail. For 7B, 13B, and 70B models, we recommend to set max_new_tokens
no greater than 1500, 1000, and 500 respectively, while keeping the total number of tokens less than 4K.
Inference and example prompts for Llama-2-70b
You can use Llama models for text completion for any piece of text. Through text generation, you can perform a variety of tasks, such as answering questions, language translation, sentiment analysis, and many more. Input payload to the endpoint looks like the following code:
The following are some sample example prompts and the text generated by the model. All outputs are generated with inference parameters {"max_new_tokens":256, "top_p":0.9, "temperature":0.6}
.
In the next example, we show how to use Llama models with few-shot in-context learning, where we provide training samples available to the model. Note that we only make inference on the deployed model and during this process, model weights don’t change.
Inference and example prompts for Llama-2-70b-chat
With Llama-2-Chat models, which are optimized for dialogue use cases, the input to the chat model endpoints is the previous history between the chat assistant and the user. You can ask questions contextual to the conversation that has happened so far. You can also provide the system configuration, such as personas that define the chat assistant’s behavior. The input payload to the endpoint looks like the following code:
The following are some sample example prompts and the text generated by the model. All outputs are generated with the inference parameters {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
.
In the following example, the user has had a conversation with the assistant about tourist sites in Paris. Next, the user is inquiring about the first option recommended by the chat assistant.
In the following examples, we set the system’s configuration:
Clean up
After you’re done running the notebook, make sure to delete all resources so that all the resources that you created in the process are deleted and your billing is stopped:
Conclusion
In this post, we showed you how to get started with Llama 2 models in SageMaker Studio. With this, you have access to six Llama 2 foundation models that contain billions of parameters. Because foundation models are pre-trained, they can also help lower training and infrastructure costs and enable customization for your use case. To get started with SageMaker JumpStart, visit the following resources:
- SageMaker JumpStart documentation
- SageMaker JumpStart Foundation Models documentation
- SageMaker JumpStart product detail page
- SageMaker JumpStart model catalog
About the authors
June Won is a product manager with SageMaker JumpStart. He focuses on making foundation models easily discoverable and usable to help customers build generative AI applications. His experience at Amazon also includes mobile shopping application and last mile delivery.
Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from University of Illinois at Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.
Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker JumpStart team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.
Sundar Ranganathan is the Global Head of GenAI/Frameworks GTM Specialists at AWS. He focuses on developing GTM strategy for large language models, GenAI, and large-scale ML workloads across AWS services like Amazon EC2, EKS, EFA, AWS Batch, and Amazon SageMaker. His experience includes leadership roles in product management and product development at NetApp, Micron Technology, Qualcomm, and Mentor Graphics.
Configure cross-account access of Amazon Redshift clusters in Amazon SageMaker Studio using VPC peering
With cloud computing, as compute power and data became more available, machine learning (ML) is now making an impact across every industry and is a core part of every business and industry.
Amazon SageMaker Studio is the first fully integrated ML development environment (IDE) with a web-based visual interface. You can perform all ML development steps and have complete access, control, and visibility into each step required to build, train, and deploy models.
Amazon Redshift is a fully managed, fast, secure, and scalable cloud data warehouse. Organizations often want to use SageMaker Studio to get predictions from data stored in a data warehouse such as Amazon Redshift.
As described in the AWS Well-Architected Framework, separating workloads across accounts enables your organization to set common guardrails while isolating environments. This can be particularly useful for certain security requirements, as well as to simplify cost controls and monitoring between projects and teams. Organizations with a multi-account architecture typically have Amazon Redshift and SageMaker Studio in two separate AWS accounts. Also, Amazon Redshift and SageMaker Studio are typically configured in VPCs with private subnets to improve security and reduce the risk of unauthorized access as a best practice.
Amazon Redshift natively supports cross-account data sharing when RA3 node types are used. If you’re using any other Amazon Redshift node types, such as DS2 or DC2, you can use VPC peering to establish a cross-account connection between Amazon Redshift and SageMaker Studio.
In this post, we walk through step-by-step instructions to establish a cross-account connection to any Amazon Redshift node type (RA3, DC2, DS2) by connecting the Amazon Redshift cluster located in one AWS account to SageMaker Studio in another AWS account in the same Region using VPC peering.
Solution overview
We start with two AWS accounts: a producer account with the Amazon Redshift data warehouse, and a consumer account for Amazon SageMaker ML use cases that has SageMaker Studio set up. The following is a high-level overview of the workflow:
- Set up SageMaker Studio with
VPCOnly
mode in the consumer account. This prevents SageMaker from providing internet access to your studio notebooks. All SageMaker Studio traffic is through the specified VPC and subnets. - Update your SageMaker Studio domain to turn on
SourceIdentity
to propagate the user profile name. - Create an AWS Identity and Access Management (IAM) role in the Amazon Redshift producer account that the SageMaker Studio IAM role will assume to access Amazon Redshift.
- Update the SageMaker IAM execution role in the SageMaker Studio consumer account that SageMaker Studio will use to assume the role in the producer Amazon Redshift account.
- Set up a peering connection between VPCs in the Amazon Redshift producer account and SageMaker Studio consumer account.
- Query Amazon Redshift in SageMaker Studio in the consumer account.
The following diagram illustrates our solution architecture.
Prerequisites
The steps in this post assume that Amazon Redshift is launched in a private subnet in the Amazon Redshift producer account. Launching Amazon Redshift in a private subnet provides an additional layer of security and isolation compared to launching it in a public subnet because the private subnet is not directly accessible from the internet and more secure from external attacks.
To download public libraries, you must create a VPC and a private and public subnet in the SageMaker consumer account. Then launch a NAT gateway in the public subnet and add an internet gateway for SageMaker Studio in the private subnet to access the internet. For instructions on how to establish a connection to a private subnet, refer to How do I set up a NAT gateway for a private subnet in Amazon VPC?
Set up SageMaker Studio with VPCOnly mode in the consumer account
To create SageMaker Studio with VPCOnly
mode, complete the following steps:
- On the SageMaker console, choose Studio in the navigation pane.
- Launch SageMaker Studio, choose Standard setup, and choose Configure.
If you’re already using AWS IAM Identity Center (successor to AWS Single Sign-On) for accessing your AWS accounts, you can use it for authentication. Otherwise, you can use IAM for authentication and use your existing federated roles.
- In the General settings section, select Create a new role.
- In the Create an IAM role section, optionally specify your Amazon Simple Storage Service (Amazon S3) buckets by selecting Any, Specific, or None, then choose Create role.
This creates a SageMaker execution role, such as AmazonSageMaker-ExecutionRole-00000000
.
- Under Network and Storage Section, choose your VPC, subnet (private subnet), and security group that you created as a prerequisite.
- Select VPC Only, then choose Next.
Update your SageMaker Studio domain to turn on SourceIdentity to propagate the user profile name
SageMaker Studio is integrated with AWS CloudTrail to enable administrators to monitor and audit user activity and API calls from SageMaker Studio notebooks. You can configure SageMaker Studio to record the user identity (specifically, the user profile name) to monitor and audit user activity and API calls from SageMaker Studio notebooks in CloudTrail events.
To log specific user activity among several user profiles, we recommended that you turn on SourceIdentity
to propagate the SageMaker Studio domain with the user profile name. This allows you to persist the user information into the session so you can attribute actions to a specific user. This attribute is also persisted over when you chain roles, so you can get fine-grained visibility into their actions in the producer account. As of the time this post was written, you can only configure this using the AWS Command Line Interface (AWS CLI) or any command line tool.
To update this configuration, all apps in the domain must be in the Stopped or Deleted state.
Use the following code to enable the propagation of the user profile name as the SourceIdentity
:
This requires that you add sts:SetSourceIdentity
in the trust relationship for your execution role.
Create an IAM role in the Amazon Redshift producer account that SageMaker Studio must assume to access Amazon Redshift
To create a role that SageMaker will assume to access Amazon Redshift, complete the following steps:
- Open the IAM console in the Amazon Redshift producer account.
- Choose Roles in the navigation pane, then choose Create role.
- On the Select trusted entity page, select Custom trust policy.
- Enter the following custom trust policy into the editor and provide your SageMaker consumer account ID and the SageMaker execution role that you created:
- Choose Next.
- On the Add required permissions page, choose Create policy.
- Add the following sample policy and make necessary edits based on your configuration.
- Save the policy by adding a name, such as
RedshiftROAPIUserAccess
.
The SourceIdentity
attribute is used to tie the identity of the original SageMaker Studio user to the Amazon Redshift database user. The actions by the user in the producer account can then be monitored using CloudTrail and Amazon Redshift database audit logs.
- On the Name, review, and create page, enter a role name, review the settings, and choose Create role.
Update the IAM role in the SageMaker consumer account that SageMaker Studio assumes in the Amazon Redshift producer account
To update the SageMaker execution role for it to assume the role that we just created, complete the following steps:
- Open the IAM console in the SageMaker consumer account.
- Choose Roles in the navigation pane, then choose the SageMaker execution role that we created (
AmazonSageMaker-ExecutionRole-*
). - In the Permissions policy section, on the Add permissions menu, choose Create inline policy.
- In the editor, on the JSON tab, enter the following policy, where <StudioRedshiftRoleARN> is the ARN of the role you created in the Amazon Redshift producer account:
You can get the ARN of the role created in the Amazon Redshift producer account on the IAM console, as shown in the following screenshot.
- Choose Review policy.
- For Name, enter a name for your policy.
- Choose Create policy.
Your permission policies should look similar to the following screenshot.
Set up a peering connection between the VPCs in the Amazon Redshift producer account and SageMaker Studio consumer account
To establish communication between the SageMaker Studio VPC and Amazon Redshift VPC, the two VPCs need to be peered using VPC peering. Complete the following steps to establish a connection:
- In either the Amazon Redshift or SageMaker account, open the Amazon VPC console.
- In the navigation pane, choose Peering connections, then choose Create peering connection.
- For Name, enter a name for your connection.
- Under Select a local VPC to peer with, choose a local VPC.
- Under Select another VPC to peer with, specify another VPC in the same Region and another account.
- Choose Create peering connection.
- Review the VPC peering connection and choose Accept request to activate.
After the VPC peering connection is successfully established, you create routes on both the SageMaker and Amazon Redshift VPCs to complete connectivity between them.
- In the SageMaker account, open the Amazon VPC console.
- Choose Route tables in the navigation pane, then choose the VPC that is associated with SageMaker and edit the routes.
- Add CIDR for the destination Amazon Redshift VPC and the target as the peering connection.
- Additionally, add a NAT gateway.
- Choose Save changes.
- In the Amazon Redshift account, open the Amazon VPC console.
- Choose Route tables in the navigation pane, then choose the VPC that is associated with Amazon Redshift and edit the routes.
- Add CIDR for the destination SageMaker VPC and the target as the peering connection.
- Additionally, add an internet gateway.
- Choose Save changes.
You can connect to SageMaker Studio from your VPC through an interface endpoint in your VPC instead of connecting over the internet. When you use a VPC interface endpoint, communication between your VPC and the SageMaker API or runtime is conducted entirely and securely within the AWS network.
- To create a VPC endpoint, in the SageMaker account, open the VPC console.
- Choose Endpoints in the navigation pane, then choose Create endpoint.
- Specify the SageMaker VPC, the respective subnets and appropriate security groups to allow inbound and outbound NFS traffic for your SageMaker notebooks domain, and choose Create VPC endpoint.
Query Amazon Redshift in SageMaker Studio in the consumer account
After all the networking has been successfully established, follow the steps in this section to connect to the Amazon Redshift cluster in the SageMaker Studio consumer account using the AWS SDK for pandas library:
- In SageMaker Studio, create a new notebook.
- If the AWS SDK for pandas package is not installed you can install it using the following:
This installation is not persistent and will be lost if the KernelGateway App is deleted. Custom packages can be added as part of a Lifecycle Configuration.
- Enter the following code in the first cell and run the code. Replace
RoleArn
andregion_name
values based on your account settings:
- Enter the following code in a new cell and run the code to get the current SageMaker user profile name:
- Enter the following code in a new cell and run the code:
To successfully query Amazon Redshift, your database administrator needs to assign the newly created user with the required read permissions within the Amazon Redshift cluster in the producer account.
- Enter the following code in a new cell, update the query to match your Amazon Redshift table, and run the cell. This should return the records successfully for further data processing and analysis.
You can now start building your data transformations and analysis based on your business requirements.
Clean up
To clean up any resources to avoid incurring recurring costs, delete the SageMaker VPC endpoints, Amazon Redshift cluster, and SageMaker Studio apps, users, and domain. Also delete any S3 buckets and objects you created.
Conclusion
In this post, we showed how to establish a cross-account connection between private Amazon Redshift and SageMaker Studio VPCs in different accounts using VPC peering and access Amazon Redshift data in SageMaker Studio using IAM role chaining, while also logging the user identity when the user accessed Amazon Redshift from SageMaker Studio. With this solution, you eliminate the need to manually move data between accounts to access data. We also walked through how to access the Amazon Redshift cluster using the AWS SDK for pandas library in SageMaker Studio and prepare the data for your ML use cases.
To learn more about Amazon Redshift and SageMaker, refer to the Amazon Redshift Database Developer Guide and Amazon SageMaker Documentation.
About the Authors
Supriya Puragundla is a Senior Solutions Architect at AWS. She helps key customer accounts on their AI and ML journey. She is passionate about data-driven AI and the area of depth in machine learning.
Marc Karp is a Machine Learning Architect with the Amazon SageMaker team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.
Effectively solve distributed training convergence issues with Amazon SageMaker Hyperband Automatic Model Tuning
Recent years have shown amazing growth in deep learning neural networks (DNNs). This growth can be seen in more accurate models and even opening new possibilities with generative AI: large language models (LLMs) that synthesize natural language, text-to-image generators, and more. These increased capabilities of DNNs come with the cost of having massive models that require significant computational resources in order to be trained. Distributed training addresses this problem with two techniques: data parallelism and model parallelism. Data parallelism is used to scale the training process over multiple nodes and workers, and model parallelism splits a model and fits them over the designated infrastructure. Amazon SageMaker distributed training jobs enable you with one click (or one API call) to set up a distributed compute cluster, train a model, save the result to Amazon Simple Storage Service (Amazon S3), and shut down the cluster when complete. Furthermore, SageMaker has continuously innovated in the distributed training space by launching features like heterogeneous clusters and distributed training libraries for data parallelism and model parallelism.
Efficient training on a distributed environment requires adjusting hyperparameters. A common example of good practice when training on multiple GPUs is to multiply batch (or mini-batch) size by the GPU number in order to keep the same batch size per GPU. However, adjusting hyperparameters often impacts model convergence. Therefore, distributed training needs to balance three factors: distribution, hyperparameters, and model accuracy.
In this post, we explore the effect of distributed training on convergence and how to use Amazon SageMaker Automatic Model Tuning to fine-tune model hyperparameters for distributed training using data parallelism.
The source code mentioned in this post can be found on the GitHub repository (an m5.xlarge instance is recommended).
Scale out training from a single to distributed environment
Data parallelism is a way to scale the training process to multiple compute resources and achieve faster training time. With data parallelism, data is partitioned among the compute nodes, and each node computes the gradients based on their partition and updates the model. These updates can be done using one or multiple parameter servers in an asynchronous, one-to-many, or all-to-all fashion. Another way can be to use an AllReduce algorithm. For example, in the ring-allreduce algorithm, each node communicates with only two of its neighboring nodes, thereby reducing the overall data transfers. To learn more about parameter servers and ring-allreduce, see Launching TensorFlow distributed training easily with Horovod or Parameter Servers in Amazon SageMaker. With regards to data partitioning, if there are n compute nodes, then each node should get a subset of the data, approximately 1/n in size.
To demonstrate the effect of scaling out training on model convergence, we run two simple experiments:
- Train an image classification model using a fully connected-layer DNN with ReLU activation functions using MXNet and Gluon frameworks. For training data, we used the MNIST dataset of handwritten digits. We used the source provided in the SageMaker example repository.
- Train a binary classification model using the SageMaker built-in XGBoost algorithm. We used the direct marketing dataset to predict bank customers who are likely to respond with a specific offer. The source code and steps to reproduce the experiment can be found on the GitHub repo.
Each model training ran twice: on a single instance and distributed over multiple instances. For the DNN distributed training, in order to fully utilize the distributed processors, we multiplied the mini-batch size by the number of instances (four). The following table summarizes the setup and results.
Problem type | Image classification | Binary classification | ||
Model | DNN | XGBoost | ||
Instance | ml.c4.xlarge | ml.m5.2xlarge | ||
Data set |
(Labeled images) |
Direct Marketing (tabular, numeric and vectorized categories) |
||
Validation metric | Accuracy | AUC | ||
Epocs/Rounds | 20 | 150 | ||
Number of Instances | 1 | 4 | 1 | 3 |
Distribution type | N/A | Parameter server | N/A | AllReduce |
Training time (minutes) | 8 | 3 | 3 | 1 |
Final Validation score | 0.97 | 0.11 | 0.78 | 0.63 |
For both models, the training time was reduced almost linearly by the distribution factor. However, model convergence suffered a significant drop. This behavior is consistent for the two different models, the different compute instances, the different distribution methods, and different data types. So, why did distributing the training process affect model accuracy?
There are a number of theories that try to explain this effect:
- When tensor updates are big in size, traffic between workers and the parameter server can get congested. Therefore, asynchronous parameter servers will suffer significantly worse convergence due to delays in weights updates [1].
- Increasing batch size can lead to over-fitting and poor generalization, thereby reducing the validation accuracy [2].
- When asynchronously updating model parameters, some DNNs might not be using the most recent updated model weights; therefore, they will be calculating gradients based on weights that are a few iterations behind. This leads to weight staleness [3] and can be caused by a number of reasons.
- Some hyperparameters are model or optimizer specific. For example, the XGBoost official documentation says that the
exact
value for thetree_mode
hyperparameter doesn’t support distributed training because XGBoost employs row splitting data distribution whereas theexact
tree method works on a sorted column format. - Some researchers proposed that configuring a larger mini-batch may lead to gradients with less stochasticity. This can happen when the loss function contains local minima and saddle points and no change is made to step size, to optimization getting stuck in such local minima or saddle point [4].
Optimize for distributed training
Hyperparameter optimization (HPO) is the process of searching and selecting a set of hyperparameters that are optimal for a learning algorithm. SageMaker Automatic Model Tuning (AMT) provides HPO as a managed service by running multiple training jobs on the provided dataset. SageMaker AMT searches the ranges of hyperparameters that you specify and returns the best values, as measured by a metric that you choose. You can use SageMaker AMT with the built-in algorithms or use your custom algorithms and containers.
However, optimizing for distributed training differs from common HPO because instead of launching a single instance per training job, each job actually launches a cluster of instances. This means a greater impact on cost (especially if you consider costly GPU-accelerated instances, which are typical for DNN). In addition to AMT limits, you could possibly hit SageMaker account limits for concurrent number of training instances. Finally, launching clusters can introduce operational overhead due to longer starting time. SageMaker AMT has specific features to address these issues. Hyperband with early stopping ensures that well-performing hyperparameters configurations are fine-tuned and those that underperform are automatically stopped. This enables efficient use of training time and reduces unnecessary costs. Also, SageMaker AMT fully supports the use of Amazon EC2 Spot Instances, which can optimize the cost of training up to 90% over on-demand instances. With regards to long start times, SageMaker AMT automatically reuses training instances within each tuning job, thereby reducing the average startup time of each training job by 20 times. Additionally, you should follow AMT best practices, such as choosing the relevant hyperparameters, their appropriate ranges and scales, and the best number of concurrent training jobs, and setting a random seed to reproduce results.
In the next section, we see these features in action as we configure, run, and analyze an AMT job using the XGBoost example we discussed earlier.
Configure, run, and analyze a tuning job
As mentioned earlier, the source code can be found on the GitHub repo. In Steps 1–5, we download and prepare the data, create the xgb3
estimator (the distributed XGBoost estimator is set to use three instances), run the training jobs, and observe the results. In this section, we describe how to set up the tuning job for that estimator, assuming you already went through Steps 1–5.
A tuning job computes optimal hyperparameters for the training jobs it launches by using a metric to evaluate performance. You can configure your own metric, which SageMaker will parse based on regex you configure and emit to stdout
, or use the metrics of SageMaker built-in algorithms. In this example, we use the built-in XGBoost objective metric, so we don’t need to configure a regex. To optimize for model convergence, we optimize based on the validation AUC metric:
We tune seven hyperparameters:
- num_round – Number of rounds for boosting during the training.
- eta – Step size shrinkage used in updates to prevent overfitting.
- alpha – L1 regularization term on weights.
- min_child_weight – Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than
min_child_weight
, the building process gives up further partitioning. - max_depth – Maximum depth of a tree.
- colsample_bylevel – Subsample ratio of columns for each split, in each level. This subsampling takes place once for every new depth level reached in a tree.
- colsample_bytree – Subsample ratio of columns when constructing each tree. For every tree constructed, the subsampling occurs once.
To learn more about XGBoost hyperparameters, see XGBoost Hyperparameters. The following code shows the seven hyperparameters and their ranges:
Next, we provide the configuration for the Hyperband strategy and the tuner object configuration using the SageMaker SDK. HyperbandStrategyConfig
can use two parameters: max_resource
(optional) for the maximum number of iterations to be used for a training job to achieve the objective, and min_resource
– the minimum number of iterations to be used by a training job before stopping the training. We use HyperbandStrategyConfig
to configure StrategyConfig
, which is later used by the tuning job definition. See the following code:
Now we create a HyperparameterTuner
object, to which we pass the following information:
- The XGBoost estimator, set to run with three instances
- The objective metric name and definition
- Our hyperparameter ranges
- Tuning resource configurations such as number of training jobs to run in total and how many training jobs can be run in parallel
- Hyperband settings (the strategy and configuration we configured in the last step)
- Early stopping (
early_stopping_type
) set toOff
Why do we set early stopping to Off? Training jobs can be stopped early when they are unlikely to improve the objective metric of the hyperparameter tuning job. This can help reduce compute time and avoid overfitting your model. However, Hyperband uses an advanced built-in mechanism to apply early stopping. Therefore, the parameter early_stopping_type
must be set to Off
when using the Hyperband internal early stopping feature. See the following code:
Finally, we start the automatic model tuning job by calling the fit method. If you want to launch the job in an asynchronous fashion, set wait
to False
. See the following code:
You can follow the job progress and summary on the SageMaker console. In the navigation pane, under Training, choose Hyperparameter tuning jobs, then choose the relevant tuning job. The following screenshot shows the tuning job with details on the training jobs’ status and performance.
When the tuning job is complete, we can review the results. In the notebook example, we show how to extract results using the SageMaker SDK. First, we examine how the tuning job increased model convergence. You can attach the HyperparameterTuner
object using the job name and call the describe method. The method returns a dictionary containing tuning job metadata and results.
In the following code, we retrieve the value of the best-performing training job, as measured by our objective metric (validation AUC):
The result is 0.78 in AUC on the validation set. That’s a significant improvement over the initial 0.63!
Next, let’s see how fast our training job ran. For that, we use the HyperparameterTuningJobAnalytics method in the SDK to fetch results about the tuning job, and read into a Pandas data frame for analysis and visualization:
Let’s see the average time a training job took with Hyperband strategy:
The average time took approximately 1 minute. This is consistent with the Hyperband strategy mechanism that stops underperforming training jobs early. In terms of cost, the tuning job charged us for a total of 30 minutes of training time. Without Hyperband early stopping, the total billable training duration was expected to be 90 minutes (30 jobs * 1 minutes per job * 3 instances per job). That is three times better in cost savings! Finally, we see that the tuning job ran 30 training jobs and took a total of 12 minutes. That is almost 50% less of the expected time (30 jobs/4 jobs in parallel * 3 minutes per job).
Conclusion
In this post, we described some observed convergence issues when training models with distributed environments. We saw that SageMaker AMT using Hyperband addressed the main concerns that optimizing data parallel distributed training introduced: convergence (which improved by more than 10%), operational efficiency (the tuning job took 50% less time than a sequential, non-optimized job would have taken) and cost-efficiency (30 vs. the 90 billable minutes of training job time). The following table summarizes our results:
Improvement Metric | No Tuning/Naive Model Tuning Implementation | SageMaker Hyperband Automatic Model Tuning | Measured Improvement |
Model Quality (Measured by validation AUC) |
0.63 | 0.78 | 15% |
Cost (Measured by billable training minutes) |
90 | 30 | 66% |
Operational efficiency (Measured by total running time) |
24 | 12 | 50% |
In order to fine-tune with regards to scaling (cluster size), you can repeat the tuning job with multiple cluster configurations and compare the results to find the optimal hyperparameters that satisfy speed and model accuracy.
We included the steps to achieve this in the last section of the notebook.
References
[1] Lian, Xiangru, et al. “Asynchronous decentralized parallel stochastic gradient descent.” International Conference on Machine Learning. PMLR, 2018. [2] Keskar, Nitish Shirish, et al. “On large-batch training for deep learning: Generalization gap and sharp minima.” arXiv preprint arXiv:1609.04836 (2016). [3] Dai, Wei, et al. “Toward understanding the impact of staleness in distributed machine learning.” arXiv preprint arXiv:1810.03264 (2018). [4] Dauphin, Yann N., et al. “Identifying and attacking the saddle point problem in high-dimensional non-convex optimization.” Advances in neural information processing systems 27 (2014).About the Author
Uri Rosenberg is the AI & ML Specialist Technical Manager for Europe, Middle East, and Africa. Based out of Israel, Uri works to empower enterprise customers to design, build, and operate ML workloads at scale. In his spare time, he enjoys cycling, hiking, and complaining about data preparation.
Bringing code analysis tools to Jupyter notebooks
Based on a survey of thousands of machine learning practitioners, a new CodeGuru extension addresses common problems, such as code cell execution order, incorrect API calls, and security.Read More
Pronunciation detection for Alexa’s new English-learning experience
Data augmentation, novel loss functions, and weakly supervised training enable a state-of-the art model for recognizing mispronunciations.Read More