Novel “cuboid attention” helps transformers handle large-scale multidimensional data, while diffusion models enable probabilistic prediction.
RecSys: Rajeev Rastogi on three recommendation system challenges
In a keynote address, the Amazon International vice president will discuss recommendations in directed graphs, training models whose target labels change, and using prediction uncertainty to improve model performance.
Build a classification pipeline with Amazon Comprehend custom classification (Part I)
“Data locked away in text, audio, social media, and other unstructured sources can be a competitive advantage for firms that figure out how to use it”
Only 18% of organizations in a 2019 survey by Deloitte reported being able to take advantage of unstructured data. The majority of data, between 80% and 90%, is unstructured data. That is a big untapped resource that has the potential to give businesses a competitive edge if they can find out how to use it. It can be difficult to find insights from this data, particularly if efforts are needed to classify, tag, or label it. Amazon Comprehend custom classification can be useful in this situation. Amazon Comprehend is a natural-language processing (NLP) service that uses machine learning to uncover valuable insights and connections in text.
Document categorization or classification has significant benefits across business domains –
- Improved search and retrieval – By categorizing documents into relevant topics or categories, it makes it much easier for users to search and retrieve the documents they need. They can search within specific categories to narrow down results.
- Knowledge management – Categorizing documents in a systematic way helps to organize an organization’s knowledge base. It makes it easier to locate relevant information and see connections between related content.
- Streamlined workflows – Automatic document sorting can help streamline many business processes like processing invoices, customer support, or regulatory compliance. Documents can be automatically routed to the right people or workflows.
- Cost and time savings – Manual document categorization is tedious, time-consuming, and expensive. AI techniques can take over this mundane task and categorize thousands of documents in a short time at a much lower cost.
- Insight generation – Analyzing trends in document categories can provide useful business insights. For example, an increase in customer complaints in a product category could signify some issues that need to be addressed.
- Governance and policy enforcement – Setting up document categorization rules helps to ensure that documents are classified correctly according to an organization’s policies and governance standards. This allows for better monitoring and auditing.
- Personalized experiences – In contexts like website content, document categorization allows for tailored content to be shown to users based on their interests and preferences as determined from their browsing behavior. This can increase user engagement.
The complexity of developing a bespoke classification machine learning model depends on several factors, such as data quality, algorithm, scalability, and domain knowledge. It’s essential to start with a clear problem definition and clean, relevant data, and to work gradually through the different stages of model development. However, businesses can create their own machine learning models using Amazon Comprehend custom classification to automatically classify text documents into categories or tags, to meet business-specific requirements and map to business technology and document categories. Because human tagging or categorization is no longer necessary, this can save businesses a lot of time, money, and labor. We have made this process simple by automating the whole training pipeline.
In the first part of this multi-part blog series, you will learn how to create a scalable training pipeline and prepare training data for Amazon Comprehend custom classification models. We introduce a custom classifier training pipeline that can be deployed in your AWS account with a few clicks. We use the BBC news dataset and train a classifier to identify the class (e.g., politics, sports) that a document belongs to. The pipeline enables your organization to respond rapidly to changes and train new models without having to start from scratch each time. You can easily scale up and train multiple models based on your demand.
Prerequisites
- An active AWS account (Click here to create a new AWS account)
- Access to Amazon Comprehend, Amazon S3, AWS Lambda, AWS Step Functions, Amazon SNS, and AWS CloudFormation
- Training data (semi-structured or text) prepared as described in the following section
- Basic knowledge about Python and Machine Learning in general
Prepare training data
This solution accepts input in either text format (e.g., CSV) or semi-structured format (e.g., PDF).
Text input
Amazon Comprehend custom classification supports two modes: multi-class and multi-label.
In multi-class mode, each document can have one and only one class assigned to it. The training data should be prepared as a two-column CSV file, with each line of the file containing a single class and the text of a document that demonstrates the class.
Example for BBC news dataset:
In multi-label mode, each document has at least one class assigned to it, but can have more. Training data should be a two-column CSV file, with each line of the file containing one or more classes and the text of the training document. More than one class should be indicated by using a delimiter between each class.
No header should be included in the CSV file for either training mode.
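As a minimal sketch of the two layouts described above, the following Python snippet writes a multi-class file and a multi-label file with the standard `csv` module. The sample rows, file names, and the `|` delimiter are illustrative assumptions and are not taken from the BBC news dataset.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; real training data would come from your own corpus.
multi_class_rows = [
    ("sport", "The home side won the cup final after extra time."),
    ("politics", "Parliament debated the new budget proposal on Tuesday."),
]
multi_label_rows = [
    (["business", "tech"], "The chipmaker reported record quarterly revenue."),
    (["politics"], "The minister announced a review of election rules."),
]


def write_multi_class(path, rows):
    """Write one class and one document per line, with no header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for label, text in rows:
            writer.writerow([label, text])


def write_multi_label(path, rows, delimiter="|"):
    """Join the classes of each document with the delimiter; no header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for labels, text in rows:
            writer.writerow([delimiter.join(labels), text])


# Write both files to a temporary directory (an arbitrary choice for this sketch).
out_dir = tempfile.mkdtemp()
multi_class_path = os.path.join(out_dir, "multi_class_train.csv")
multi_label_path = os.path.join(out_dir, "multi_label_train.csv")
write_multi_class(multi_class_path, multi_class_rows)
write_multi_label(multi_label_path, multi_label_rows)
```

Whatever delimiter you choose for multi-label data must match the value you later give for Q06LabelDelimiter when launching the stack.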
Semi-structured input
As of 2023, Amazon Comprehend supports training models using semi-structured documents. The training data for semi-structured input comprises a set of labeled documents, which can be pre-identified documents from a document repository that you already have access to. The following is an example of the annotations CSV file required for training (Sample Data):
The annotations CSV file contains three columns: the first column contains the label for the document, the second column is the document name (i.e., file name), and the last column is the page number of the document that you want to include in the training dataset. In most cases, if the annotations CSV file is located in the same folder as all the other documents, you only need to specify the document name in the second column. However, if the CSV file is in a different location, you need to specify the path to the document in the second column, such as path/to/prefix/document1.pdf.
For details on how to prepare your training data, please refer to here.
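The three-column layout described above can be sketched in Python as follows. The document names, labels, and page counts are hypothetical, and the sketch assumes page numbers start at 1; check the Amazon Comprehend documentation for the exact conventions before relying on it.

```python
import csv
import io


def build_annotations(pages_by_document, label_by_document, prefix=""):
    """Return CSV text with one row per (label, document name, page number)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for document, page_count in pages_by_document.items():
        # Assumption: pages are numbered from 1 to page_count inclusive.
        for page in range(1, page_count + 1):
            writer.writerow([label_by_document[document], prefix + document, page])
    return buffer.getvalue()


# Hypothetical documents and labels, for illustration only.
annotations = build_annotations(
    pages_by_document={"document1.pdf": 2, "document2.pdf": 1},
    label_by_document={"document1.pdf": "invoice", "document2.pdf": "receipt"},
    prefix="path/to/prefix/",
)
print(annotations)
```

Leaving `prefix` empty corresponds to the case where the annotations file sits in the same folder as the documents.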
Solution overview
- The Amazon Comprehend training pipeline starts when training data (a .csv file for text input, or an annotations .csv file for semi-structured input) is uploaded to a dedicated Amazon Simple Storage Service (Amazon S3) bucket.
- An AWS Lambda function is invoked by an Amazon S3 trigger: every time an object is uploaded to the specified Amazon S3 location, the function retrieves the source bucket name and the key name of the uploaded object and passes them to the training step function workflow.
- In the training step function, after receiving the training data bucket name and object key name as input parameters, a custom model training workflow kicks off as a series of Lambda functions, as described:
  - StartComprehendTraining: This AWS Lambda function defines a ComprehendClassifier object depending on the type of input files (i.e., text or semi-structured) and then kicks off an Amazon Comprehend custom classification training task by calling the create_document_classifier application programming interface (API), which returns a training job Amazon Resource Name (ARN). Subsequently, this function checks the status of the training job by invoking the describe_document_classifier API. Finally, it returns the training job ARN and job status as output to the next stage of the training workflow.
  - GetTrainingJobStatus: This AWS Lambda function checks the status of the training job every 15 minutes by calling the describe_document_classifier API, until the training job status changes to Complete or Failed.
  - GenerateMultiClass or GenerateMultiLabel: If you select yes for the performance report when launching the stack, one of these two AWS Lambda functions runs an analysis of your Amazon Comprehend model outputs, generating a per-class performance analysis and saving it to Amazon S3.
    - GenerateMultiClass: This AWS Lambda function is called if your input is MultiClass and you select yes for the performance report.
    - GenerateMultiLabel: This AWS Lambda function is called if your input is MultiLabel and you select yes for the performance report.
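The workflow above can be sketched in Python roughly as follows. This is not the code deployed by the stack: the function names, parameters, and the set of terminal statuses are assumptions made for illustration, and only the text (CSV) input path is covered. The `comprehend` argument is expected to behave like a boto3 Amazon Comprehend client; the status values TRAINED and IN_ERROR are the ones the Amazon Comprehend API reports for completed and failed training.

```python
from urllib.parse import unquote_plus


def parse_s3_event(event):
    """Extract the bucket name and object key from an S3 upload notification."""
    record = event["Records"][0]["s3"]
    # Object keys arrive URL-encoded in S3 event notifications.
    return record["bucket"]["name"], unquote_plus(record["object"]["key"])


def start_training(comprehend, name, role_arn, bucket, key, output_uri,
                   multi_class=True, delimiter="|", language="en"):
    """Start a custom classifier training job and return its ARN."""
    input_config = {"DataFormat": "COMPREHEND_CSV", "S3Uri": f"s3://{bucket}/{key}"}
    if not multi_class:
        # The delimiter only applies to multi-label training data.
        input_config["LabelDelimiter"] = delimiter
    response = comprehend.create_document_classifier(
        DocumentClassifierName=name,
        DataAccessRoleArn=role_arn,
        InputDataConfig=input_config,
        OutputDataConfig={"S3Uri": output_uri},
        LanguageCode=language,
        Mode="MULTI_CLASS" if multi_class else "MULTI_LABEL",
    )
    return response["DocumentClassifierArn"]


def get_training_status(comprehend, classifier_arn):
    """Return the current status string of the classifier."""
    response = comprehend.describe_document_classifier(
        DocumentClassifierArn=classifier_arn
    )
    return response["DocumentClassifierProperties"]["Status"]


def is_finished(status):
    """Return True for statuses after which polling can stop (assumed set)."""
    return status in {"TRAINED", "TRAINED_WITH_WARNING", "IN_ERROR", "STOPPED"}
```

In a real deployment you would pass `boto3.client("comprehend")` as the first argument, and the 15-minute wait between status checks would be handled by the step function rather than inside the Lambda code.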
- Once the training completes successfully, the solution generates the following outputs:
  - Custom Classification Model: A trained model ARN will be available in your account for future inference work.
  - Confusion Matrix [Optional]: A confusion matrix (confusion_matrix.json) will be available in the user-defined output Amazon S3 path, depending on the user selection.
  - Amazon Simple Notification Service notification [Optional]: A notification email about the training job status will be sent to the subscribers, depending on the initial user selection.
Walkthrough
Launching the solution
To deploy your pipeline, complete the following steps:
- Choose Launch Stack button:
- Choose Next
- Specify the pipeline details with the options fitting your use case:
Information for each stack detail:
- Stack name (Required) – The name you specify for this AWS CloudFormation stack. The name must be unique in the Region in which you’re creating it.
- Q01ClassifierInputBucketName (Required) – The Amazon S3 bucket name to store your input data. It should be a globally unique name and AWS CloudFormation stack helps you create the bucket while it’s being launched.
- Q02ClassifierOutputBucketName (Required) – The Amazon S3 bucket name to store outputs from Amazon Comprehend and the pipeline. It should also be a globally unique name.
- Q03InputFormat – A dropdown selection. Choose text (if your training data is in CSV files) or semi-structure (if your training data is semi-structured, e.g., PDF files), based on your data input format.
- Q04Language – A dropdown selection. Choose the language of your documents from the supported list. Please note that currently only English is supported if your input format is semi-structure.
- Q05MultiClass – A dropdown selection, select yes if your input is MultiClass mode. Otherwise, select no.
- Q06LabelDelimiter – Only required if your Q05MultiClass answer is no. This delimiter is used in your training data to separate each class.
- Q07ValidationDataset – A dropdown selection, change the answer to yes if you want to test the performance of trained classifier with your own test data.
- Q08S3ValidationPath – Only required if your Q07ValidationDataset answer is yes.
- Q09PerformanceReport – A dropdown selection, select yes if you want to generate the class-level performance report after model training. The report will be saved in the output bucket you specified in Q02ClassifierOutputBucketName.
- Q10EmailNotification – A dropdown selection. Select yes if you want to receive notification after model is trained.
- Q11EmailID – Enter a valid email address for receiving the performance report notification. Please note that you have to confirm the subscription from your email after the AWS CloudFormation stack is launched, before you can receive a notification when training is completed.
- In the Configure stack options section, add optional tags, permissions, and other advanced settings.
- Choose Next
- Review the stack details and select I acknowledge that AWS CloudFormation might create AWS IAM resources.
- Choose Submit. This initiates pipeline deployment in your AWS account.
- After the stack is deployed successfully, you can start using the pipeline. Create a /training-data folder under your specified Amazon S3 location for input. Note: Amazon S3 automatically applies server-side encryption (SSE-S3) for each new object unless you specify a different encryption option. Refer to Data protection in Amazon S3 for more details on data protection and encryption in Amazon S3.
- Upload your training data to the folder. (If the training data is semi-structured, upload all the PDF files before uploading the .csv file with the label information.)
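One way to respect the upload order for semi-structured data is to sort the files so that .csv files go last, presumably because the annotations upload is what starts the pipeline. The helper names below are hypothetical; `s3_client` is assumed to expose `upload_file(filename, bucket, key)` as the boto3 S3 client does, and the `training-data/` prefix mirrors the folder created in the previous step.

```python
from pathlib import PurePosixPath


def upload_order(file_names):
    """Return file names with documents first and .csv files last."""
    # sorted() is stable, so the original order is kept within each group.
    return sorted(
        file_names,
        key=lambda name: PurePosixPath(name).suffix.lower() == ".csv",
    )


def upload_all(s3_client, bucket, file_names, prefix="training-data/"):
    """Upload local files in that order, keeping only the base file name as key."""
    for name in upload_order(file_names):
        s3_client.upload_file(name, bucket, prefix + PurePosixPath(name).name)
```

For example, `upload_all(boto3.client("s3"), "my-input-bucket", files)` would send every PDF before the annotations file; the bucket name here is a placeholder.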
You’re done! You’ve successfully deployed your pipeline, and you can check the pipeline status in the deployed step function. (You will have a trained model in your Amazon Comprehend custom classification panel.)
If you choose the model and its version on the Amazon Comprehend console, you can see more details about the model you just trained. These include the mode you selected, which corresponds to the option Q05MultiClass, the number of labels, and the number of training and test documents in your training data. You can also check the overall performance there; however, if you want to check detailed performance for each class, refer to the Performance Report generated by the deployed pipeline.
Service quotas
Your AWS account has default quotas for Amazon Comprehend and, if inputs are in semi-structured format, for Amazon Textract. To view service quotas, please refer here for Amazon Comprehend and here for Amazon Textract.
Clean up
To avoid incurring ongoing charges, delete the resources you created as part of this solution when you’re done.
- On the Amazon S3 console, manually delete the contents inside buckets you created for input and output data.
- On the AWS CloudFormation console, choose Stacks in the navigation pane.
- Select the main stack and choose Delete.
This automatically deletes the deployed stack.
- Your trained Amazon Comprehend custom classification model will remain in your account. If you don’t need it anymore, delete the created model on the Amazon Comprehend console.
Conclusion
In this post, we showed you the concept of a scalable training pipeline for Amazon Comprehend custom classification models and provided an automated solution for efficiently training new models. The AWS CloudFormation template provided makes it possible for you to create your own text classification models effortlessly, scaling with demand. The solution adopts the recently announced Euclid feature and accepts inputs in text or semi-structured format.
Now, we encourage you, our readers, to test these tools. You can find more details about training data preparation and understand the custom classifier metrics. Try it out and see firsthand how it can streamline your model training process and enhance efficiency. Please share your feedback with us!
About the Authors
Sandeep Singh is a Senior Data Scientist with AWS Professional Services. He is passionate about helping customers innovate and achieve their business objectives by developing state-of-the-art AI/ML powered solutions. He is currently focused on Generative AI, LLMs, prompt engineering, and scaling Machine Learning across enterprises. He brings recent AI advancements to create value for customers.
Yanyan Zhang is a Senior Data Scientist in the Energy Delivery team with AWS Professional Services. She is passionate about helping customers solve real problems with AI/ML knowledge. Recently, her focus has been on exploring the potential of Generative AI and LLM. Outside of work, she loves traveling, working out and exploring new things.
Wrick Talukdar is a Senior Architect with the Amazon Comprehend Service team. He works with AWS customers to help them adopt machine learning on a large scale. Outside of work, he enjoys reading and photography.
Fine-tune Falcon 7B and other LLMs on Amazon SageMaker with @remote decorator
Today, generative AI models cover a variety of tasks, from text summarization and Q&A to image and video generation. To improve the quality of output, approaches like n-shot learning, prompt engineering, Retrieval Augmented Generation (RAG), and fine-tuning are used. Fine-tuning allows you to adjust these generative AI models to achieve improved performance on your domain-specific tasks.
With Amazon SageMaker, you can now run a SageMaker training job simply by annotating your Python code with the @remote decorator. The SageMaker Python SDK automatically translates your existing workspace environment, and any associated data processing code and datasets, into a SageMaker training job that runs on the training platform. This lets you write code in a more natural, object-oriented way while still using SageMaker capabilities to run training jobs on a remote cluster with minimal changes.
In this post, we showcase how to fine-tune a Falcon-7B foundation model (FM) using the @remote decorator from the SageMaker Python SDK. It also uses Hugging Face’s parameter-efficient fine-tuning (PEFT) library and quantization techniques through bitsandbytes to support fine-tuning. The code presented in this blog can also be used to fine-tune other FMs, such as Llama-2 13b.
The full-precision representation of this model might not fit into memory on a single or even several graphics processing units (GPUs), or might need a bigger instance. Hence, to fine-tune this model without increasing cost, we use the technique known as Quantized LLMs with Low-Rank Adapters (QLoRA). QLoRA is an efficient fine-tuning approach that reduces memory usage of LLMs while maintaining very good performance.
Advantages of using @remote decorator
Before going further, let’s understand how the @remote decorator improves developer productivity while working with SageMaker:
- The @remote decorator triggers a training job directly using native Python code, without the explicit invocation of SageMaker Estimators and SageMaker input channels.
- Low barrier to entry for developers training models on SageMaker.
- No need to switch integrated development environments (IDEs). Continue writing code in your choice of IDE and invoke SageMaker training jobs.
- No need to learn about containers. Continue providing dependencies in a requirements.txt and supply that to the @remote decorator.
Prerequisites
An AWS account is needed with an AWS Identity and Access Management (AWS IAM) role that has permissions to manage resources created as part of the solution. For details, refer to Creating an AWS account.
In this post, we use Amazon SageMaker Studio with the Data Science 3.0 image and an ml.t3.medium fast launch instance. However, you can use any integrated development environment (IDE) of your choice. You just need to set up your AWS Command Line Interface (AWS CLI) credentials correctly. For more information, refer to Configure the AWS CLI.
For fine-tuning Falcon-7B, an ml.g5.12xlarge instance is used in this post. Please ensure sufficient capacity for this instance in your AWS account.
You need to clone this GitHub repository to replicate the solution demonstrated in this post.
Solution overview
- Install prerequisites for fine-tuning the Falcon-7B model
- Set up remote decorator configurations
- Preprocess the dataset containing AWS services FAQs
- Fine-tune Falcon-7B on AWS services FAQs
- Test the fine-tuned model on sample questions related to AWS services
1. Install prerequisites for fine-tuning the Falcon-7B model
Launch the notebook falcon-7b-qlora-remote-decorator_qa.ipynb in SageMaker Studio by selecting the Image as Data Science and Kernel as Python 3. Install all the required libraries mentioned in the requirements.txt. A few of the libraries need to be installed on the notebook instance itself. Perform other operations needed for dataset processing and triggering a SageMaker training job.
2. Set up remote decorator configurations
Create a configuration file where all the configurations related to the Amazon SageMaker training job are specified. This file is read by the @remote decorator while running the training job. It contains settings like dependencies, training image, instance, and the execution role to be used for the training job. For a detailed reference of all the settings supported by the config file, check out Configuring and using defaults with the SageMaker Python SDK.
It’s not mandatory to use the config.yaml file in order to work with the @remote decorator. It is just a cleaner way to supply all configurations to the @remote decorator. It keeps SageMaker and AWS related parameters outside of code, with a one-time effort to set up a config file that is shared across team members. All the configurations could also be supplied directly in the decorator arguments, but that reduces readability and maintainability in the long run. Also, the configuration file can be created by an administrator and shared with all the users in an environment.
3. Preprocess the dataset containing AWS services FAQs
The next step is to load and preprocess the dataset to make it ready for the training job. First, let us have a look at the dataset:
It shows the FAQ for one of the AWS services. In addition to QLoRA, bitsandbytes is used to quantize the frozen LLM to 4-bit precision and attach LoRA adapters to it.
Create a prompt template to convert each FAQ sample to a prompt format:
The next step is to convert the inputs (text) to token IDs. This is done by a Hugging Face Transformers tokenizer.
Now simply use the prompt_template function to convert all the FAQs to prompt format and set up the train and test datasets.
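To make the prompt step concrete, here is a minimal sketch of what a `prompt_template` function could look like. The template wording, the `question` and `answer` field names, and the default end-of-sequence token are assumptions for illustration; in practice the token would typically be read from the tokenizer (for example `tokenizer.eos_token`) and the template in the repository may differ.

```python
# Hypothetical template; the wording and field names are assumptions.
PROMPT = "Question:\n{question}\n---\nAnswer:\n{answer}{eos_token}"


def prompt_template(sample, eos_token="<|endoftext|>"):
    """Return a copy of the sample with an added 'text' field holding the prompt."""
    text = PROMPT.format(
        question=sample["question"].strip(),
        answer=sample["answer"].strip(),
        eos_token=eos_token,
    )
    return {**sample, "text": text}


# Made-up FAQ entry used only to show the resulting format.
faqs = [{"question": "What is Amazon S3? ", "answer": "An object storage service."}]
prompts = [prompt_template(faq) for faq in faqs]
print(prompts[0]["text"])
```

Appending the end-of-sequence token to every sample is a common choice so that the fine-tuned model learns where an answer stops.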
4. Fine-tune Falcon-7B on AWS services FAQs
Now you can prepare the training script, define the training function train_fn, and put the @remote decorator on the function.
The training function does the following:
- Tokenizes and chunks the dataset
- Sets up BitsAndBytesConfig, which specifies that the model should be loaded in 4-bit but converted to bfloat16 during computation
- Loads the model
- Finds target modules and updates the necessary matrices by using the utility method find_all_linear_names
- Creates LoRA configurations that specify the rank of the update matrices (r), the scaling factor (lora_alpha), the modules to apply the LoRA update matrices to (target_modules), the dropout probability for LoRA layers (lora_dropout), task_type, etc.
- Starts the training and evaluation
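To illustrate what the rank `r` and `lora_alpha` settings control, the following pure-Python sketch computes the effective weight W + (lora_alpha / r) · B · A used by standard LoRA and counts the trainable adapter parameters. It is a toy illustration rather than the PEFT implementation, and the 4096 x 4096 layer size is an arbitrary example, not Falcon-7B's actual dimensions.

```python
def lora_trainable_params(d_out, d_in, r):
    """Parameters in the two LoRA factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, lora_alpha, r):
    """Return W + (lora_alpha / r) * B @ A as nested lists."""
    scale = lora_alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Tiny example: a 2 x 2 frozen weight with rank-1 adapters.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]  # shape 2 x 1
a = [[0.5, 0.5]]    # shape 1 x 2
effective = lora_effective_weight(w, b, a, lora_alpha=2, r=1)
print(effective)

# Adapter parameters versus a full update for an illustrative 4096 x 4096 layer.
print(lora_trainable_params(4096, 4096, r=8), "vs", 4096 * 4096)
```

Only the small factors A and B receive gradients, while the frozen weight W stays in its quantized form, which is why the approach reduces memory use so much.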
Then invoke train_fn().
The tuning job runs on the Amazon SageMaker training cluster. Wait for the tuning job to finish.
5. Test the fine-tuned model on sample questions related to AWS services
Now, it’s time to run some tests on the model. First, let us load the model:
Now load a sample question from the training dataset to see the original answer, and then ask the tuned model the same question to compare the answers.
Here is a sample question from the training set and the original answer:
Now, the same question is asked to the tuned Falcon-7B model:
This concludes the implementation of fine tuning Falcon-7B on AWS services FAQ dataset using @remote decorator from Amazon SageMaker Python SDK.
Cleaning up
Complete the following steps to clean up your resources:
- Shut down the Amazon SageMaker Studio instances to avoid incurring additional costs.
- Clean up your Amazon Elastic File System (Amazon EFS) directory by clearing the Hugging Face cache directory:
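A minimal Python sketch of clearing the cache is shown below. The default path `~/.cache/huggingface/hub` is an assumption about where the Hugging Face hub cache lives; it can differ if environment variables such as `HF_HOME` are set, so verify the location before deleting anything.

```python
import shutil
from pathlib import Path


def clear_hf_cache(cache_dir=None):
    """Delete the Hugging Face cache directory if it exists; return True if removed."""
    # Assumed default location; pass cache_dir explicitly if yours differs.
    if cache_dir:
        path = Path(cache_dir)
    else:
        path = Path.home() / ".cache" / "huggingface" / "hub"
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False
```

Calling `clear_hf_cache()` with no arguments removes the assumed default directory; the function is only defined here and never invoked automatically.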
Conclusion
In this post, we showed you how to effectively use the @remote decorator’s capabilities to fine-tune the Falcon-7B model using QLoRA and Hugging Face PEFT with bitsandbytes, without applying significant changes in the training notebook, and how to use Amazon SageMaker capabilities to run training jobs on a remote cluster.
All the code shown as part of this post to fine-tune Falcon-7B is available in the GitHub repository. The repository also contains a notebook showing how to fine-tune Llama-13B.
As a next step, we encourage you to check out the @remote decorator functionality and Python SDK API and use it in your choice of environment and IDE. Additional examples are available in the amazon-sagemaker-examples repository to get you started quickly. You can also check out the following posts:
- Run your local machine learning code as Amazon SageMaker Training jobs with minimal code changes
- Access private repos using the @remote decorator for Amazon SageMaker training workloads
- Interactively fine-tune Falcon-40B and other LLMs on Amazon SageMaker Studio notebooks using QLoRA
About the Authors
Bruno Pistone is an AI/ML Specialist Solutions Architect for AWS based in Milan. He works with large customers, helping them to deeply understand their technical needs and design AI and Machine Learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. His expertise includes end-to-end Machine Learning, Machine Learning Industrialization, and Generative AI. He enjoys spending time with his friends and exploring new places, as well as travelling to new destinations.
Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers from financial industries design and build solutions on generative AI and ML. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.
Simplify access to internal information using Retrieval Augmented Generation and LangChain Agents
This post takes you through the most common challenges that customers face when searching internal documents, and gives you concrete guidance on how AWS services can be used to create a generative AI conversational bot that makes internal information more useful.
Unstructured data accounts for 80% of all the data found within organizations, consisting of repositories of manuals, PDFs, FAQs, emails, and other documents that grow daily. Businesses today rely on continuously growing repositories of internal information, and problems arise when the amount of unstructured data becomes unmanageable. Often, users find themselves reading and checking many different internal sources to find the answers they need.
Internal question and answer forums can help users get highly specific answers but also require longer wait times. In the case of company-specific internal FAQs, long wait times result in lower employee productivity. Question and answer forums are difficult to scale because they rely on manually written answers. With generative AI, there is currently a paradigm shift in how users search and find information. The next logical step is to use generative AI to condense large documents into smaller, bite-sized information for easier user consumption. Instead of spending a long time reading text or waiting for answers, users can generate summaries in real time based on multiple existing repositories of internal information.
Solution overview
The solution allows customers to retrieve curated responses to questions asked about internal documents by using a transformer model to generate answers to questions about data that it has not been trained on, a technique known as zero-shot prompting. By adopting this solution, customers can gain the following benefits:
- Find accurate answers to questions based on existing sources of internal documents
- Reduce the time users spend searching for answers by using Large Language Models (LLMs) to provide near-immediate answers to complex queries using documents with the most updated information
- Search previously answered questions through a centralized dashboard
- Reduce stress caused by spending time manually reading information to look for answers
Retrieval Augmented Generation (RAG)
Retrieval Augmented Generation (RAG) reduces some of the shortcomings of LLM based queries by finding the answers from your knowledge base and using the LLM to summarize the documents into concise responses. Please read this post to learn how to implement the RAG approach with Amazon Kendra. The following risks and limitations are associated with LLM based queries that a RAG approach with Amazon Kendra addresses:
- Hallucinations and traceability – LLMs are trained on large data sets and generate responses based on probabilities. This can lead to inaccurate answers, which are known as hallucinations.
- Multiple data silos – In order to reference data from multiple sources within your response, one needs to set up a connector ecosystem to aggregate the data. Accessing multiple repositories is manual and time-consuming.
- Security – Security and privacy are critical considerations when deploying conversational bots powered by RAG and LLMs. Despite using Amazon Comprehend to filter out personal data that may be provided through user queries, there remains a possibility of unintentionally surfacing personal or sensitive information, depending on the ingested data. This means that controlling access to the chatbot is crucial to prevent unintended access to sensitive information.
- Data relevance – LLMs are trained on data up to a certain date, which means information is often not current. The cost associated with training models on recent data is high. To ensure accurate and up-to-date responses, organizations bear the responsibility of regularly updating and enriching the content of the indexed documents.
- Cost – The cost associated with deploying this solution should be a consideration for businesses. Businesses need to carefully assess their budget and performance requirements when implementing this solution. Running LLMs can require substantial computational resources, which may increase operational costs. These costs can become a limitation for applications that need to operate at a large scale. However, one of the benefits of the AWS Cloud is the flexibility to only pay for what you use. AWS offers a simple, consistent, pay-as-you-go pricing model, so you are charged only for the resources you consume.
Usage of Amazon SageMaker JumpStart
For transformer-based language models, organizations can benefit from using Amazon SageMaker JumpStart, which offers a collection of pre-built machine learning models. Amazon SageMaker JumpStart offers a wide range of text generation and question-answering (Q&A) foundational models that can be easily deployed and utilized. This solution integrates a FLAN T5-XL Amazon SageMaker JumpStart model, but there are different aspects to keep in mind when choosing a foundation model.
Integrating security in our workflow
Following the best practices of the Security Pillar of the Well-Architected Framework, Amazon Cognito is used for authentication. Amazon Cognito User Pools can be integrated with third-party identity providers that support several frameworks used for access control, including Open Authorization (OAuth), OpenID Connect (OIDC), or Security Assertion Markup Language (SAML). Identifying users and their actions allows the solution to maintain traceability. The solution also uses the Amazon Comprehend personally identifiable information (PII) detection feature to automatically identify and redact PII. Redacted PII includes addresses, social security numbers, email addresses, and other sensitive information. This design ensures that any PII provided by the user through the input query is redacted. The PII is not stored, used by Amazon Kendra, or fed to the LLM.
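As a rough sketch of the redaction step, the following Python code replaces each detected PII span with its entity type. It assumes the entity dictionaries carry `Type`, `Score`, `BeginOffset`, and `EndOffset` fields, as returned by the Amazon Comprehend `detect_pii_entities` API; the helper names and the 0.5 score threshold are hypothetical choices and not part of the solution's published code.

```python
def redact_pii(text, entities, min_score=0.5):
    """Replace each detected span with its entity type in square brackets."""
    # Work from the end of the string so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if entity["Score"] >= min_score:
            text = (
                text[: entity["BeginOffset"]]
                + f"[{entity['Type']}]"
                + text[entity["EndOffset"]:]
            )
    return text


def redact_query(comprehend, query, language="en"):
    """Detect PII with a Comprehend-like client and return the redacted query."""
    response = comprehend.detect_pii_entities(Text=query, LanguageCode=language)
    return redact_pii(query, response["Entities"])
```

In practice `comprehend` would be `boto3.client("comprehend")`, and only the redacted string would be passed on to Amazon Kendra and the LLM.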
Solution Walkthrough
The following steps describe the workflow of the Question answering over documents flow:
- Users send a query through a web interface.
- Amazon Cognito is used for authentication, ensuring secure access to the web application.
- The web application front-end is hosted on AWS Amplify.
- Amazon API Gateway hosts a REST API with various endpoints to handle user requests that are authenticated using Amazon Cognito.
- PII redaction with Amazon Comprehend:
- User Query Processing: When a user submits a query or input, it is first passed through Amazon Comprehend. The service analyzes the text and identifies any PII entities present within the query.
- PII Extraction: Amazon Comprehend extracts the detected PII entities from the user query.
- Relevant Information Retrieval with Amazon Kendra:
- Amazon Kendra is used to manage an index of documents that contains the information used to generate answers to the user’s queries.
- The LangChain QA retrieval module is used to build a conversation chain that has relevant information about the user’s queries.
- Integration with Amazon SageMaker JumpStart:
- The AWS Lambda function uses the LangChain library and connects to the Amazon SageMaker JumpStart endpoint with a context-stuffed query. The Amazon SageMaker JumpStart endpoint serves as the interface of the LLM used for inference.
- Storing the response and returning it to the user:
- The response from the LLM is stored in Amazon DynamoDB along with the user’s query, the timestamp, a unique identifier, and other arbitrary identifiers for the item such as question category. Storing the question and answer as discrete items allows the AWS Lambda function to easily recreate a user’s conversation history based on the time when questions were asked.
- Finally, the response is sent back to the user over HTTPS through the Amazon API Gateway REST API integration response.
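The storage step above can be sketched as follows. This is a minimal example that assumes a hypothetical item layout (the attribute names item_id, user_id, timestamp, question, answer, and question_category are illustrative, not taken from the solution). It shows how storing each question and answer as a discrete, timestamped item makes it straightforward to rebuild a user's conversation in order.

```python
import time
import uuid
from typing import Dict, List, Optional


def build_chat_item(user_id: str, question: str, answer: str,
                    category: str = "general",
                    timestamp: Optional[int] = None) -> Dict:
    """Build one question/answer record (attribute names are hypothetical)."""
    return {
        "item_id": str(uuid.uuid4()),  # unique identifier for the item
        "user_id": user_id,
        "timestamp": timestamp if timestamp is not None else int(time.time() * 1000),
        "question": question,
        "answer": answer,
        "question_category": category,
    }


def conversation_history(items: List[Dict], user_id: str) -> List[Dict]:
    """Return one user's items ordered by the time each question was asked."""
    own_items = [item for item in items if item["user_id"] == user_id]
    return sorted(own_items, key=lambda item: item["timestamp"])
```

In a Lambda function, each item could be written with the boto3 DynamoDB Table resource, for example table.put_item(Item=item), where the table name and key schema would depend on your deployment.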
The following steps describe the AWS Lambda functions and their flow through the process:
- Check and redact any PII / Sensitive info
- LangChain QA Retrieval Chain
- Search and retrieve relevant info
- Context Stuffing & Prompt Engineering
- LangChain
- Inference with LLM
- Return response & Save it
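The context stuffing step in the list above can be illustrated with a short sketch. The prompt wording, the function name, and the character budget are assumptions for this example. A real deployment would size the context to the token limit of the chosen model and would typically let LangChain assemble the prompt.

```python
from typing import List

# Illustrative prompt wording; not the template used by the original solution.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say that you do not know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuffed_prompt(question: str, passages: List[str],
                         max_chars: int = 2000) -> str:
    """Place retrieved passages into the prompt until the budget is used up."""
    selected = []
    used = 0
    for passage in passages:  # passages are assumed to be ordered by relevance
        if used + len(passage) > max_chars:
            break
        selected.append(passage)
        used += len(passage)
    context = "\n\n".join(selected)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

The resulting string is what would be sent to the LLM endpoint for inference, so the model answers from the retrieved passages rather than from its general training data alone.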
Use cases
There are many business use cases where customers can use this workflow. The following section explains how the workflow can be used in different industries and verticals.
Employee Assistance
Well-designed corporate training can improve employee satisfaction and reduce the time required for onboarding new employees. As organizations grow and complexity increases, employees find it difficult to understand the many sources of internal documents. Internal documents in this context include company guidelines, policies, and Standard Operating Procedures. For this scenario, an employee has a question about how to proceed with and edit an internal issue ticket. The employee can access and use the generative artificial intelligence (AI) conversational bot to ask about and execute the next steps for a specific ticket.
Specific use case: Automate issue resolution for employees based on corporate guidelines.
The following steps describe the AWS Lambda functions and their flow through the process:
- LangChain agent to identify the intent
- Send notification based on employee request
- Modify ticket status
In this architecture diagram, corporate training videos can be ingested through Amazon Transcribe to collect a log of these video scripts. Additionally, corporate training content stored in various sources (e.g., Confluence, Microsoft SharePoint, Google Drive, and Jira) can be used to create indexes through Amazon Kendra connectors. Read this article to learn more about the collection of native connectors you can use in Amazon Kendra as a source point. The Amazon Kendra crawler is then able to use both the corporate training video scripts and documentation stored in these other sources to assist the conversational bot in answering questions specific to company corporate training guidelines. The LangChain agent verifies permissions, modifies ticket status, and notifies the correct individuals using Amazon Simple Notification Service (Amazon SNS).
Customer Support Teams
Quickly resolving customer queries improves the customer experience and encourages brand loyalty. A loyal customer base helps drive sales, which contributes to the bottom line and increases customer engagement. Customer support teams spend lots of energy referencing many internal documents and customer relationship management software to answer customer queries about products and services. Internal documents in this context can include generic customer support call scripts, playbooks, escalation guidelines, and business information. The generative AI conversational bot helps with cost optimization because it handles queries on behalf of the customer support team.
Specific use case: Handling an oil change request based on service history and customer service plan purchased.
In this architecture diagram, the customer is routed to either the generative AI conversational bot or the Amazon Connect contact center. This decision can be based on the level of support needed or the availability of customer support agents. The LangChain agent identifies the customer’s intent and verifies identity. The LangChain agent also checks the service history and purchased support plan.
The following steps describe the AWS Lambda functions and their flow through the process:
- LangChain agent identifies the intent
- Retrieve Customer Information
- Check customer service history and warranty information
- Book appointment, provide more information, or route to contact center
- Send email confirmation
Amazon Connect is used to collect the voice and chat logs, and Amazon Comprehend is used to remove personally identifiable information (PII) from these logs. The Amazon Kendra crawler is then able to use the redacted voice and chat logs, customer call scripts, and customer service support plan policies to create the index. Based on the customer’s intent, service history, and support plan, the generative AI conversational bot decides whether to book an appointment, provide more information, or route the customer to the contact center for further assistance. For cost optimization, the LangChain agent can also generate answers using fewer tokens and a less expensive large language model for lower priority customer queries.
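The cost optimization idea in the last sentence can be sketched as a simple routing rule. The endpoint names, token limits, and priority labels below are hypothetical placeholders chosen for this example. The source does not specify how priority is determined or which models are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    endpoint_name: str   # hypothetical endpoint name
    max_new_tokens: int  # upper bound on generated tokens


# Placeholder tiers: a larger model for high-priority queries,
# and a smaller, cheaper model with a tighter token budget for the rest.
PREMIUM = ModelChoice("hypothetical-large-llm-endpoint", 512)
ECONOMY = ModelChoice("hypothetical-small-llm-endpoint", 128)


def choose_model(priority: str) -> ModelChoice:
    """Pick the model tier for a query based on its priority label."""
    return PREMIUM if priority.strip().lower() == "high" else ECONOMY
```

A rule like this keeps the routing decision separate from the inference call, so the tiers can be adjusted without touching the rest of the agent.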
Financial Services
Financial services companies rely on timely use of information to stay competitive and comply with financial regulations. Using a generative AI conversational bot, financial analysts and advisors can interact with textual information in a conversational manner and reduce the time and effort it takes to make better informed decisions. Outside of investment and market research, a generative AI conversational bot can also augment human capabilities by handling tasks that would traditionally require more human effort and time. For example, a financial institution specializing in personal loans can increase the rate at which loans are processed while providing better transparency to customers.
Specific use case: Use customer financial history and previous loan applications to make and explain loan decisions.
The following steps describe the AWS Lambda functions and their flow through the process:
- LangChain agent to identify the intent
- Check customer financial and credit score history
- Check internal customer relationship management system
- Check standard loan policies and suggest a decision to the employee qualifying the loan
- Send notification to customer
This architecture incorporates customer financial data stored in a database and data stored in a customer relationship management (CRM) tool. These data points are used to inform a decision based on the company’s internal loan policies. The customer is able to ask clarifying questions to understand what loans they qualify for and the terms of the loans they can accept. If the generative AI conversational bot is unable to approve a loan application, the user can still ask questions about improving credit scores or alternative financing options.
Government
Generative AI conversational bots can greatly benefit government institutions by speeding up communication and decision-making processes and improving efficiency. Generative AI conversational bots can also provide instant access to internal knowledge bases to help government employees quickly retrieve information, policies, and procedures (e.g., eligibility criteria, application processes, and citizen services and support). One solution is an interactive system that allows taxpayers and tax professionals to easily find tax-related details and benefits. It can be used to understand user questions, summarize tax documents, and provide clear answers through interactive conversations.
Users can ask questions such as:
- How does inheritance tax work and what are the tax thresholds?
- Can you explain the concept of income tax?
- What are the tax implications when selling a second property?
Additionally, users can have the convenience of submitting tax forms to a system, which can help verify the correctness of the information provided.
This architecture illustrates how users can upload completed tax forms to the solution and use it for interactive verification and guidance on how to accurately complete the necessary information.
Healthcare
Healthcare businesses have the opportunity to automate the use of large amounts of internal patient information, while also addressing common questions regarding use cases such as treatment options, insurance claims, clinical trials, and pharmaceutical research. Using a generative AI conversational bot enables quick and accurate generation of answers about health information from the provided knowledge base. For example, some healthcare professionals spend a lot of time filling in forms to file insurance claims.
In similar settings, clinical trial administrators and researchers need to find information about treatment options. A generative AI conversational bot can use the pre-built connectors in Amazon Kendra to retrieve the most relevant information from the millions of documents published through ongoing research conducted by pharmaceutical companies and universities.
Specific use case: Reduce the errors and time needed to fill out and send insurance forms.
In this architecture diagram, a healthcare professional is able to use the generative AI conversational bot to figure out what forms need to be filled out for the insurance. The LangChain agent is then able to retrieve the right forms and add the needed information for a patient as well as giving responses for descriptive parts of the forms based on insurance policies and previous forms. The healthcare professional can edit the responses given by the LLM before approving and having the form delivered to the insurance portal.
The following steps describe the AWS Lambda functions and their flow through the process:
- LangChain agent to identify the intent
- Retrieve the patient information needed
- Fill out the insurance form based on the patient information and form guideline
- Submit the form to the insurance portal after user approval
AWS HealthLake is used to securely store the health data including previous insurance forms and patient information, and Amazon Comprehend is used to remove personally identifiable information (PII) from the previous insurance forms. The Amazon Kendra crawler is then able to use the set of insurance forms and guidelines to create the index. Once the generative AI has filled out the forms and the medical professional has reviewed them, the forms can be sent to the insurance portal.
Cost estimate
The cost of deploying the base solution as a proof-of-concept is shown in the following table. Since the base solution is considered a proof-of-concept, Amazon Kendra Developer Edition was used as a low-cost option since the workload would not be in production. Our assumption for Amazon Kendra Developer Edition was 730 active hours for the month.
For Amazon SageMaker, we made an assumption that the customer would be using the ml.g4dn.2xlarge instance for real-time inference, with a single inference endpoint per instance. You can find more information on Amazon SageMaker pricing and available inference instance types here.
Service | Resources Consumed | Cost Estimate Per Month in USD |
AWS Amplify | 150 build minutes; 1 GB of data served; 500,000 requests | 15.71 |
Amazon API Gateway | 1 million REST API calls | 3.5 |
AWS Lambda | 1 million requests; 5 seconds duration per request; 2 GB memory allocated | 160.23 |
Amazon DynamoDB | 1 million reads; 1 million writes; 100 GB storage | 26.38 |
Amazon SageMaker | Real-time inference with ml.g4dn.2xlarge | 676.8 |
Amazon Kendra | Developer Edition with 730 hours/month; 10,000 documents scanned; 5,000 queries/day | 821.25 |
. | . | Total Cost: 1703.87 |
* Amazon Cognito has a free tier of 50,000 Monthly Active Users who use Cognito User Pools or 50 Monthly Active Users who use SAML 2.0 identity providers
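As a quick check of the arithmetic, the following sketch sums the monthly figures from the table and derives the hourly Amazon Kendra rate implied by the 730-hour assumption. The figures are copied from the table. Nothing here queries live pricing, so actual costs may differ by Region and over time.

```python
from decimal import Decimal

# Monthly estimates in USD, copied from the cost table.
MONTHLY_COSTS_USD = {
    "AWS Amplify": Decimal("15.71"),
    "Amazon API Gateway": Decimal("3.5"),
    "AWS Lambda": Decimal("160.23"),
    "Amazon DynamoDB": Decimal("26.38"),
    "Amazon SageMaker": Decimal("676.8"),
    "Amazon Kendra": Decimal("821.25"),
}

# Decimal avoids binary floating-point rounding when adding currency values.
total = sum(MONTHLY_COSTS_USD.values(), Decimal("0"))

# Hourly rate implied by the 730 active hours assumed for Amazon Kendra.
kendra_hourly = MONTHLY_COSTS_USD["Amazon Kendra"] / Decimal("730")

# Share of the total attributable to the two largest line items.
largest_two_share = (
    MONTHLY_COSTS_USD["Amazon Kendra"] + MONTHLY_COSTS_USD["Amazon SageMaker"]
) / total

print(f"Total: {total} USD per month")
print(f"Implied Amazon Kendra rate: {kendra_hourly} USD per hour")
print(f"Kendra + SageMaker share of total: {largest_two_share:.0%}")
```

The sum reproduces the total shown in the table, and the breakdown makes it clear that Amazon Kendra and the SageMaker endpoint dominate the estimate.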
Clean Up
To save costs, delete all the resources you deployed as part of the tutorial. You can delete any SageMaker endpoints you may have created via the SageMaker console. Remember, deleting an Amazon Kendra index doesn’t remove the original documents from your storage.
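If you prefer to script the cleanup, a minimal sketch could look like the following. The function name and the resource identifiers are placeholders; the clients are passed in and would typically be created with boto3.client("sagemaker") and boto3.client("kendra"). Only resources you created for the tutorial should be passed to it.

```python
from typing import List


def delete_demo_resources(sagemaker_client, kendra_client,
                          endpoint_name: str,
                          endpoint_config_name: str,
                          index_id: str) -> List[str]:
    """Delete the inference endpoint, its configuration, and the Kendra index.

    Returns a list of the actions performed, for logging purposes.
    """
    actions = []

    # Real-time endpoints are billed while they exist, so remove them first.
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    actions.append(f"deleted endpoint {endpoint_name}")

    sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    actions.append(f"deleted endpoint config {endpoint_config_name}")

    # Deleting the index does not remove the original documents from storage.
    kendra_client.delete_index(Id=index_id)
    actions.append(f"deleted Kendra index {index_id}")

    return actions
```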
Conclusion
In this post, we showed you how to simplify access to internal information by summarizing from multiple repositories in real-time. After the recent developments of commercially available LLMs, the possibilities of generative AI have become more apparent. In this post, we showcased ways to use AWS services to create a serverless chatbot that uses generative AI to answer questions. This approach incorporates an authentication layer and Amazon Comprehend’s PII detection to filter out any sensitive information provided in the user’s query. Whether it be individuals in healthcare understanding the nuances of filing insurance claims or HR understanding specific company-wide regulations, there are multiple industries and verticals that can benefit from this approach. An Amazon SageMaker JumpStart foundation model is the engine behind the chatbot, while a context stuffing approach using the RAG technique is used to ensure that the responses more accurately reference internal documents.
To learn more about working with generative AI on AWS, refer to Announcing New Tools for Building with Generative AI on AWS. For more in-depth guidance on using the RAG technique with AWS services, refer to Quickly build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models. Since the approach in this blog is LLM agnostic, any LLM can be used for inference. In our next post, we’ll outline ways to implement this solution using Amazon Bedrock and the Amazon Titan LLM.
About the Authors
Abhishek Maligehalli Shivalingaiah is a Senior AI Services Solution Architect at AWS. He is passionate about building applications using Generative AI, Amazon Kendra and NLP. He has around 10 years of experience in building Data & AI solutions to create value for customers and enterprises. He has even built a (personal) chatbot for fun to answer questions about his career and professional journey. Outside of work he enjoys making portraits of family & friends, and loves creating artworks.
Medha Aiyah is an Associate Solutions Architect at AWS, based in Austin, Texas. She recently graduated from the University of Texas at Dallas in December 2022 with her Masters of Science in Computer Science with a specialization in Intelligent Systems focusing on AI/ML. She is interested to learn more about AI/ML and utilizing AWS services to discover solutions customers can benefit from.
Hugo Tse is an Associate Solutions Architect at AWS based in Seattle, Washington. He holds a Master’s degree in Information Technology from Arizona State University and a bachelor’s degree in Economics from the University of Chicago. He is a member of the Information Systems Audit and Control Association (ISACA) and International Information System Security Certification Consortium (ISC)2. He enjoys helping customers benefit from technology.
Ayman Ishimwe is an Associate Solutions Architect at AWS based in Seattle, Washington. He holds a Master’s degree in Software Engineering and IT from Oakland University. He has prior experience in software development, specifically in building microservices for distributed web applications. He is passionate about helping customers build robust and scalable solutions on AWS cloud services following best practices.
Shervin Suresh is an Associate Solutions Architect at AWS based in Austin, Texas. He graduated with a Master’s in Software Engineering with a concentration in Cloud Computing and Virtualization and a Bachelor’s in Computer Engineering from San Jose State University. He is passionate about leveraging technology to help improve the lives of people from all backgrounds.
Visualize an Amazon Comprehend analysis with a word cloud in Amazon QuickSight
Searching for insights in a repository of free-form text documents can be like finding a needle in a haystack. A traditional approach might be to use word counting or other basic analysis to parse documents, but with the power of Amazon AI and machine learning (ML) tools, we can gain a deeper understanding of the content.
Amazon Comprehend is a fully managed service that uses natural language processing (NLP) to extract insights about the content of documents. Amazon Comprehend develops insights by recognizing the entities, key phrases, sentiment, themes, and custom elements in a document. Amazon Comprehend can create new insights based on understanding the document structure and entity relationships. For example, with Amazon Comprehend, you can scan an entire document repository for key phrases.
Amazon Comprehend lets non-ML experts easily do tasks that normally take hours of time. Amazon Comprehend eliminates much of the time needed to clean, build, and train your own model. For building deeper custom models in NLP or any other domain, Amazon SageMaker enables you to build, train, and deploy models in a much more conventional ML workflow if desired.
In this post, we use Amazon Comprehend and other AWS services to analyze and extract new insights from a repository of documents. Then, we use Amazon QuickSight to generate a simple yet powerful word cloud visual to easily spot themes or trends.
Overview of solution
The following diagram illustrates the solution architecture.
To begin, we gather the data to be analyzed and load it into an Amazon Simple Storage Service (Amazon S3) bucket in an AWS account. In this example, we use text formatted files. The data is then analyzed by Amazon Comprehend. Amazon Comprehend creates a JSON formatted output that needs to be transformed and processed into a database format using AWS Glue. We verify the data and extract specific formatted data tables using Amazon Athena for a QuickSight analysis using a word cloud. For more information about visualizations, refer to Visualizing data in Amazon QuickSight.
Prerequisites
For this walkthrough, you should have the following prerequisites:
- An AWS account
- Access to the AWS Management Console
- Basic database table knowledge
- S3 buckets for input and output data
Upload data to an S3 bucket
Upload your data to an S3 bucket. For this post, we use UTF-8 formatted text of the US Constitution as the input file. Then you’re ready to analyze the data and create visualizations.
Analyze data using Amazon Comprehend
There are many types of text-based and image information that can be processed using Amazon Comprehend. In addition to text files, Amazon Comprehend can accept image files, PDF files, and Microsoft Word files as input for one-step classification and entity recognition; these input types are not discussed in this post.
To analyze your data, complete the following steps:
- On the Amazon Comprehend console, choose Analysis jobs in the navigation pane.
- Choose Create analysis job.
- Enter a name for your job.
- For Analysis type, choose Key phrases.
- For Language, choose English.
- For Input data location, specify the folder you created as a prerequisite.
- For Output data location, specify the folder you created as a prerequisite.
- Choose Create an IAM role.
- Enter a suffix for the role name.
- Choose Create job.
The job will run and the status will be displayed on the Analysis jobs page.
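The console steps above have a programmatic counterpart in the Amazon Comprehend start_key_phrases_detection_job operation. The following is a minimal sketch under stated assumptions: the helper names are invented for this example, the S3 URIs and role ARN are placeholders you would supply, and the one-document-per-file input format is an assumption about how the input is organized. The client is passed in and would typically be a boto3 Comprehend client.

```python
from typing import Dict


def build_key_phrases_job_request(job_name: str, input_s3_uri: str,
                                  output_s3_uri: str, role_arn: str,
                                  language_code: str = "en") -> Dict:
    """Assemble the request for an asynchronous key phrases analysis job."""
    return {
        "JobName": job_name,
        "LanguageCode": language_code,
        "InputDataConfig": {
            "S3Uri": input_s3_uri,
            # Assumption: each file in the input folder is one document.
            "InputFormat": "ONE_DOC_PER_FILE",
        },
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        # IAM role that lets Amazon Comprehend read the input and write the output.
        "DataAccessRoleArn": role_arn,
    }


def start_key_phrases_job(comprehend_client, request: Dict) -> str:
    """Submit the job and return its identifier for later status checks."""
    response = comprehend_client.start_key_phrases_detection_job(**request)
    return response["JobId"]
```

The returned job ID can be used to poll the job status until it completes, which mirrors watching the Analysis jobs page in the console.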
Wait for the analysis job to complete. Amazon Comprehend will create a file and place it in the output data folder you provided. The file is in .gz or GZIP format.
This file needs to be downloaded and converted to a non-compressed format. You can download an object from the data folder or S3 bucket using the Amazon S3 console.
- On the Amazon S3 console, select the object and choose Download. If you want to download the object to a specific folder, choose Download on the Actions menu.
- After you download the file to your local computer, open the zipped file and save it as an uncompressed file.
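If you would rather decompress the downloaded file with a script than with a desktop tool, a minimal sketch follows. It assumes the download is a gzip-compressed tar archive (for example, a file named output.tar.gz); if your job produced a plain .gz file, the standard library gzip module would be the appropriate tool instead. The function name is illustrative.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive_path: str, destination: str) -> List[str]:
    """Extract regular files from a .tar.gz archive into `destination`.

    Only the base name of each member is used, so entries cannot be
    written outside the destination folder.
    """
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and special entries
            source = archive.extractfile(member)
            target = dest / Path(member.name).name
            target.write_bytes(source.read())
            extracted.append(str(target))
    return extracted
```

The extracted file is the uncompressed JSON output that is uploaded back to the output folder in the next steps.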
The uncompressed file must be uploaded to the output folder before the AWS Glue crawler can process it. For this example, we upload the uncompressed file into the same output folder that we use in later steps.
- On the Amazon S3 console, navigate to your S3 bucket and choose Upload.
- Choose Add files.
- Choose the uncompressed files from your local computer.
- Choose Upload.
After you upload the file, delete the original zipped file.
- On the Amazon S3 console, open the output folder, select the original zipped file, and choose Delete.
- Confirm the file name to permanently delete the file by entering the file name in the text box.
- Choose Delete objects.
This will leave one file remaining in the output folder: the uncompressed file.
Convert JSON data to table format using AWS Glue
In this step, you prepare the Amazon Comprehend output to be used as input into Athena. The Amazon Comprehend output is in JSON format. You can use AWS Glue to convert JSON into a database structure to ultimately be read by QuickSight.
- On the AWS Glue console, choose Crawlers in the navigation pane.
- Choose Create crawler.
- Enter a name for your crawler.
- Choose Next.
- For Is your data already mapped to Glue tables, select Not yet.
- Add a data source.
- For S3 path, enter the location of the Amazon Comprehend output data folder.
Be sure to add the trailing / to the path name. AWS Glue will search the folder path for all files.
- Select Crawl all sub-folders.
- Choose Add an S3 data source.
- Create a new AWS Identity and Access Management (IAM) role for the crawler.
- Enter a name for the IAM role.
- Choose Update chosen IAM role to be sure the new role is assigned to the crawler.
- Choose Next to enter the output (database) information.
- Choose Add database.
- Enter a database name.
- Choose Next.
- Choose Create crawler.
- Choose Run crawler to run the crawler.
You can monitor the crawler status on the AWS Glue console.
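The same crawler can be defined in code with the AWS Glue create_crawler and start_crawler operations. The sketch below is illustrative: the helper names are invented for this example, and the crawler name, role, database, and path are placeholders. It also normalizes the S3 path so that the trailing slash mentioned above is always present. The client is passed in and would typically be a boto3 Glue client.

```python
from typing import Dict


def normalize_s3_path(path: str) -> str:
    """Ensure the S3 folder path ends with a trailing slash."""
    return path if path.endswith("/") else path + "/"


def build_crawler_request(name: str, role_arn: str,
                          database_name: str, s3_path: str) -> Dict:
    """Assemble the request for a crawler with a single S3 data source."""
    return {
        "Name": name,
        "Role": role_arn,                # IAM role the crawler assumes
        "DatabaseName": database_name,   # database that receives the tables
        "Targets": {"S3Targets": [{"Path": normalize_s3_path(s3_path)}]},
    }


def create_and_run_crawler(glue_client, request: Dict) -> None:
    """Create the crawler, then start it."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])
```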
Use Athena to prepare tables for QuickSight
Athena will extract data from the database tables the AWS Glue crawler created to provide a format that QuickSight will use to create the word cloud.
- On the Athena console, choose Query editor in the navigation pane.
- For Data source, choose AwsDataCatalog.
- For Database, choose the database the crawler created.
To create a table compatible for QuickSight, the data must be unnested from the arrays.
- The first step is to create a temporary database with the relevant Amazon Comprehend data:
- The following statement limits to phrases of at least three words and groups by frequency of the phrases:
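To illustrate what these two steps compute, here is a minimal Python sketch of the same transformation. It is not the Athena SQL. It assumes each line of the Amazon Comprehend output is a JSON object with a KeyPhrases array whose entries carry a Text field; the function names and the lowercase normalization are choices made for this example.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def unnest_key_phrases(json_lines: Iterable[str]) -> List[str]:
    """Flatten the KeyPhrases arrays into one list of phrase texts."""
    phrases = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for phrase in record.get("KeyPhrases", []):
            phrases.append(phrase["Text"])
    return phrases


def phrase_counts(phrases: Iterable[str], min_words: int = 3) -> List[Tuple[str, int]]:
    """Keep phrases with at least `min_words` words and count occurrences."""
    counts = Counter(
        phrase.lower() for phrase in phrases if len(phrase.split()) >= min_words
    )
    # Most frequent phrases first, as (text, count) pairs.
    return counts.most_common()
```

The resulting (text, count) pairs correspond to the two fields the word cloud uses later: the phrase text for grouping and the count for sizing.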
Use QuickSight to visualize output
Finally, you can create the visual output from the analysis.
- On the QuickSight console, choose New analysis.
- Choose New dataset.
- For Create a dataset, choose From new data sources.
- Choose Athena as the data source.
- Enter a name for the data source and choose Create data source.
- Choose Visualize.
Make sure QuickSight has access to the S3 buckets where the Athena tables are stored.
- On the QuickSight console, choose the user profile icon and choose Manage QuickSight.
- Choose Security & permissions.
- Look for the section QuickSight access to AWS services.
By configuring access to AWS services, QuickSight can access the data in those services. Access by users and groups can be controlled through the options.
- Verify Amazon S3 is granted access.
Now you can create the word cloud.
- Choose the word cloud under Visual types.
- Drag text to Group by and count to Size.
Choose the options menu (three dots) in the visualization to access the edit options. For example, you might want to hide the term “other” from the display. You can also edit items such as the title and subtitle for your visual. To download the word cloud as a PDF, choose Download on the QuickSight toolbar.
Clean up
To avoid incurring ongoing charges, delete any unused data and processes or resources provisioned on their respective service console.
Conclusion
Amazon Comprehend uses NLP to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. You can use Amazon Comprehend to create new products based on understanding the structure of documents. For example, with Amazon Comprehend, you can scan an entire document repository for key phrases.
This post described the steps to build a word cloud in QuickSight that visualizes a text content analysis from Amazon Comprehend, using AWS Glue and Athena to prepare the data.
Let’s stay in touch via the comments section!
About the Authors
Kris Gedman is the US East sales leader for Retail & CPG at Amazon Web Services. When not working, he enjoys spending time with his friends and family, especially summers on Cape Cod. Kris is a temporarily retired Ninja Warrior but he loves watching and coaching his two sons for now.
Clark Lefavour is a Solutions Architect leader at Amazon Web Services, supporting enterprise customers in the East region. Clark is based in New England and enjoys spending time architecting recipes in the kitchen.
Amazon SageMaker simplifies the Amazon SageMaker Studio setup for individual users
Today, we are excited to announce the simplified Quick setup experience in Amazon SageMaker. With this new capability, individual users can launch Amazon SageMaker Studio with default presets in minutes.
SageMaker Studio is an integrated development environment (IDE) for machine learning (ML). ML practitioners can perform all ML development steps—from preparing their data to building, training, and deploying ML models—within a single, integrated visual interface. You also get access to a large collection of models and pre-built solutions that you can deploy with a few clicks.
To use SageMaker Studio or other personal apps such as Amazon SageMaker Canvas, or to collaborate in shared spaces, AWS customers need to first set up a SageMaker domain. A SageMaker domain consists of an associated Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations. When a user is onboarded to a SageMaker domain, they are assigned a user profile that they can use to launch their apps. User authentication can be via AWS IAM Identity Center (successor to AWS Single Sign-On) or AWS Identity and Access Management (IAM).
Setting up a SageMaker domain and associated user profiles requires understanding the concepts of IAM roles, domains, authentication, and VPCs, and going through a number of configuration steps. To complete these configuration steps, data scientists and developers typically work with their IT admin teams who provision SageMaker Studio and set up the right guardrails.
Customers told us that the onboarding process can sometimes be time consuming, delaying data scientists and ML teams from getting started with SageMaker Studio. We listened and simplified the onboarding experience!
Introducing the simplified Quick Studio setup
The new Quick Studio setup experience for SageMaker provides a new onboarding and administration experience that makes it easy for individual users to set up and manage SageMaker Studio. Data scientists and ML admins can set up SageMaker Studio in minutes with a single click. SageMaker takes care of provisioning the SageMaker domain with default presets, including setting up the IAM role, IAM authentication, and public internet mode. ML admins can alter SageMaker Studio settings for the created domain and customize the UI further at any time. Let’s take a look at how it works.
Prerequisites
To use the Quick Studio setup, you need the following:
- An AWS account
- An IAM role with permissions to create the resources needed to set up a SageMaker domain
Use the Quick Studio setup option
Let’s discuss a scenario where a new user wants to access SageMaker Studio. The user experience includes the following steps:
- In your AWS account, navigate to the SageMaker console and choose Set up for single user.
SageMaker starts preparing the SageMaker domain. This process typically takes a few minutes. The new domain’s name is prefixed with QuickSetupDomain-.
As soon as the SageMaker domain is ready, a notification appears on the screen stating “The SageMaker Domain is ready” and the user profile under the domain is also created successfully.
- Choose Launch next to the created user profile and choose Studio.
Because it’s the first time SageMaker Studio is getting launched for this user profile, SageMaker creates a new JupyterServer app, which takes a few minutes.
A few minutes later, the Studio IDE loads and you’re presented with the SageMaker Studio Home page.
Components of the Quick Studio setup
When using the Quick Studio setup, SageMaker creates the following resources:
- A new IAM role with the appropriate permissions for using SageMaker Studio, Amazon Simple Storage Service (Amazon S3), and SageMaker Canvas. You can modify the permissions of the created IAM role at any time based on your use case or persona-specific requirements.
- Another IAM role prefixed with AmazonSagemakerCanvasForecastRole-, which enables permissions for the SageMaker Canvas time series forecasting feature.
- A SageMaker Studio domain and a user profile for the domain with unique names. IAM is used as the authentication mode. The IAM role created is used as the default SageMaker execution role for the domain and user profile. You can launch any of the personal apps available, such as SageMaker Studio and SageMaker Canvas, which are enabled by default.
- An EFS volume, which serves as the file system for SageMaker Studio. Apart from Amazon EFS, a new S3 bucket with prefix sagemaker-studio- is created for notebook sharing.
SageMaker Studio also uses the default VPC and its associated subnets. If there is no default VPC, or if the default VPC has no subnets, then it selects one of the existing VPCs that has associated subnets. If there is no VPC, it will prompt the user to create one on the Amazon VPC console. The selected VPC, with all the subnets under it, is used to set up Amazon EFS.
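The VPC selection rule described above can be expressed as a small decision function. This is only a sketch of the described behavior, not SageMaker's actual implementation. The dictionary keys (vpc_id, is_default, subnet_ids) are hypothetical, and returning None when no VPC qualifies stands in for prompting the user to create one.

```python
from typing import Dict, List, Optional


def select_vpc(vpcs: List[Dict]) -> Optional[str]:
    """Pick a VPC following the rule described in the text.

    1. Use the default VPC if it has at least one subnet.
    2. Otherwise use any existing VPC that has subnets.
    3. Otherwise return None (the user would need to create a VPC).
    """
    with_subnets = [vpc for vpc in vpcs if vpc.get("subnet_ids")]
    for vpc in with_subnets:
        if vpc.get("is_default"):
            return vpc["vpc_id"]
    if with_subnets:
        return with_subnets[0]["vpc_id"]
    return None
```

For example, a default VPC without subnets is skipped in favor of a non-default VPC that has subnets, which matches the fallback described in the paragraph.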
Conclusion
Now, a single click is all it takes to get started with SageMaker Studio. The Quick Studio setup for individual users is available in all AWS commercial Regions where SageMaker is currently available.
Try out this new feature on the SageMaker console and let us know what you think. We always look forward to your feedback! You can send it through your usual AWS Support contacts or post it on the AWS Forum for SageMaker.
About the authors
Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers from financial industries design and build solutions on generative AI and ML. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.
Anastasia Tzeveleka is a Machine Learning and AI Specialist Solutions Architect at AWS. She works with customers in EMEA and helps them architect machine learning solutions at scale using AWS services. She has worked on projects in different domains including natural language processing (NLP), MLOps, and low-code/no-code tools.
Unlocking language barriers: Translate application logs with Amazon Translate for seamless support
Application logs are an essential piece of information that provides crucial insights into the inner workings of an application. This includes valuable information such as events, errors, and user interactions that would aid an application developer or an operations support engineer to debug and provide support. However, when these logs are presented in languages other than English, it creates a significant hurdle for developers who can’t read the content, and hinders the support team’s ability to identify and address issues promptly.
In this post, we explore a solution on how you can unlock language barriers using Amazon Translate, a fully managed neural machine translation service for translating text to and from English across a wide range of supported languages. The solution will complement your existing logging workflows by automatically translating all your applications logs in Amazon CloudWatch in real time, which can alleviate the challenges posed by non-English application logs.
Solution overview
This solution shows you how you can use three key services to automate the translation of your application logs in an event-driven manner:
- CloudWatch Logs is used to monitor, store, and access your log files generated from various sources such as AWS services and your applications
- Amazon Translate is used to perform the translation of text to and from English
- AWS Lambda is a compute service that lets you run code to retrieve application logs and translate them using the Amazon Translate SDK
The following diagram illustrates the solution architecture.
The workflow consists of the following steps:
- A custom or third-party application is hosted on an Amazon Elastic Compute Cloud (Amazon EC2) instance and the generated application logs are uploaded to CloudWatch Logs via the CloudWatch Logs agent.
- Each log entry written to CloudWatch Logs triggers the Lambda function subscribed to the CloudWatch log group.
- The function processes the contents of the log entry and uses Amazon Translate SDK translate_text to translate the log content.
- The translated log content is returned to the function.
- The function writes the translated log content back to CloudWatch Logs in a different log group.
The entire process happens automatically in real time, and your developers will be able to access the translated application logs from the CloudWatch log groups with no change in how your existing application writes logs to CloudWatch.
Prerequisites
To follow the instructions in this solution, you need an AWS account with an AWS Identity and Access Management (IAM) user who has permissions for AWS CloudFormation, Amazon Translate, CloudWatch, Lambda, and IAM.
Deploy the solution
To get started, launch the following CloudFormation template to create a Lambda function, two CloudWatch log groups, and an IAM role. Proceed to deploy with the default settings. This template takes about 1 minute to complete.
After the stack is created successfully, you can review the Lambda function by navigating to the Lambda console and locating the function translate-application-logs.
You can observe that there is a CloudWatch Logs trigger added to the function.
You can view the details of the trigger configuration by navigating to the Configuration tab and choosing Triggers in the navigation pane.
You can confirm that the trigger has been configured to subscribe to log events from the log group /applicationlogs. This is the log group that your non-English application logs are written to.
Next, choose Environment variables in the navigation pane.
Two environment variables are provided here:
- source_language – The original language that the application log is in (for example, ja for Japanese)
- target_language – The target language to translate the application log to (for example, en for English)
For a list of supported languages, refer to Supported languages and language codes.
Next, go to the Code tab and review the function logic:
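As a rough illustration of the logic described in the workflow (not the function deployed by the template), a handler might decode the CloudWatch Logs subscription payload, translate each message, and write the result to a second log group. The destination log group and log stream names below are hypothetical, the stream is assumed to exist already, and the clients are passed in as optional arguments only so the sketch can be exercised without AWS access.

```python
import base64
import gzip
import json
import os

# Hypothetical destination names; the template's actual names may differ.
DEST_LOG_GROUP = "/applicationlogs-translated"
DEST_LOG_STREAM = "translated"


def decode_log_events(event):
    """Unpack the base64-encoded, gzip-compressed payload sent by a CloudWatch Logs subscription."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    return payload.get("logEvents", [])


def translate_events(log_events, translate_client, source_language, target_language):
    """Translate each log message, keeping its original timestamp."""
    translated = []
    for log_event in log_events:
        response = translate_client.translate_text(
            Text=log_event["message"],
            SourceLanguageCode=source_language,
            TargetLanguageCode=target_language,
        )
        translated.append(
            {"timestamp": log_event["timestamp"], "message": response["TranslatedText"]}
        )
    return translated


def lambda_handler(event, context, translate_client=None, logs_client=None):
    if translate_client is None or logs_client is None:
        import boto3  # bundled with the Lambda Python runtime

        translate_client = translate_client or boto3.client("translate")
        logs_client = logs_client or boto3.client("logs")

    events = translate_events(
        decode_log_events(event),
        translate_client,
        os.environ.get("source_language", "ja"),
        os.environ.get("target_language", "en"),
    )
    if events:
        # Assumes the destination log group and stream already exist.
        logs_client.put_log_events(
            logGroupName=DEST_LOG_GROUP,
            logStreamName=DEST_LOG_STREAM,
            logEvents=events,
        )
    return len(events)
```

Reading the language codes from the environment variables means the same function can serve a different language pair without a code change.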
Test the solution
Finally, to test the solution, you can create a log message through the CloudWatch console and choose the created log group and log stream.
After you create your log messages, you can see them translated almost immediately in the second log group.
Clean up
To clean up the resources created in this post, delete the CloudFormation stack via the CloudFormation console.
Conclusion
This post addressed the challenge faced by developers and support teams when application logs are presented in languages other than English, making it difficult for them to debug and provide support. The proposed solution uses Amazon Translate to automatically translate non-English logs in CloudWatch, and provides step-by-step guidance on deploying the solution in your environment. Through this implementation, developers can now seamlessly bridge the language barrier, empowering them to address issues swiftly and effectively.
Try out this implementation and let us know your thoughts in the comments.
About the author
Xan Huang is a Senior Solutions Architect with AWS and is based in Singapore. He works with major financial institutions to design and build secure, scalable, and highly available solutions in the cloud. Outside of work, Xan spends most of his free time with his family and documenting his daughter’s growing up journey.
Accelerate client success management through email classification with Hugging Face on Amazon SageMaker
This is a guest post from Scalable Capital, a leading FinTech in Europe that offers digital wealth management and a brokerage platform with a trading flat rate.
As a fast-growing company, Scalable Capital’s goals are to not only build an innovative, robust, and reliable infrastructure, but to also provide the best experiences for our clients, especially when it comes to client services.
Scalable receives hundreds of email inquiries from our clients on a daily basis. By implementing a modern natural language processing (NLP) model, we have made the response process much more efficient and reduced waiting time for clients tremendously. The machine learning (ML) model classifies new incoming customer requests as soon as they arrive and redirects them to predefined queues, which allows our dedicated client success agents to focus on the contents of the emails according to their skills and provide appropriate responses.
In this post, we demonstrate the technical benefits of using Hugging Face transformers deployed with Amazon SageMaker, such as training and experimentation at scale, and increased productivity and cost-efficiency.
Problem statement
Scalable Capital is one of the fastest growing FinTechs in Europe. With the aim to democratize investment, the company provides its clients with easy access to the financial markets. Clients of Scalable can actively participate in the market through the company’s brokerage trading platform, or use Scalable Wealth Management to invest in an intelligent and automated fashion. In 2021, Scalable Capital experienced a tenfold increase of its client base, from tens of thousands to hundreds of thousands.
To provide our clients with a top-class (and consistent) user experience across products and client service, the company was looking for automated solutions to generate efficiencies for a scalable solution while maintaining operational excellence. Scalable Capital’s data science and client service teams identified that one of the largest bottlenecks in servicing our clients was responding to email inquiries. Specifically, the bottleneck was the classification step, in which employees had to read and label request texts on a daily basis. After the emails were routed to their proper queues, the respective specialists quickly engaged and resolved the cases.
To streamline this classification process, the data science team at Scalable built and deployed a multitask NLP model using state-of-the-art transformer architecture, based on the pre-trained distilbert-base-german-cased model published by Hugging Face. distilbert-base-german-cased uses the knowledge distillation method to pretrain a smaller general-purpose language representation model than the original BERT base model. The distilled version achieves comparable performance to the original version, while being smaller and faster. To facilitate our ML lifecycle process, we decided to adopt SageMaker to build, deploy, serve, and monitor our models. In the following section, we introduce our project architecture design.
Solution overview
Scalable Capital’s ML infrastructure consists of two AWS accounts: one as an environment for the development stage and the other one for the production stage.
The following diagram shows the workflow for our email classifier project, which can also be generalized to other data science projects.

Email classification project diagram
The workflow consists of the following components:
- Model experimentation – Data scientists use Amazon SageMaker Studio to carry out the first steps in the data science lifecycle: exploratory data analysis (EDA), data cleaning and preparation, and building prototype models. When the exploratory phase is complete, we turn to VSCode hosted by a SageMaker notebook as our remote development tool to modularize and productionize our code base. To explore different types of models and model configurations, and at the same time to keep track of our experimentations, we use SageMaker Training and SageMaker Experiments.
- Model build – After we decide on a model for our production use case, in this case a multi-task distilbert-base-german-cased model, fine-tuned from the pretrained model from Hugging Face, we commit and push our code to the GitHub develop branch. The GitHub merge event triggers our Jenkins CI pipeline, which in turn starts a SageMaker Pipelines job with test data. This acts as a test to make sure that the code is running as expected. A test endpoint is deployed for testing purposes.
- Model deployment – After making sure that everything is running as expected, data scientists merge the develop branch into the primary branch. This merge event now triggers a SageMaker Pipelines job using production data for training purposes. Afterwards, model artifacts are produced and stored in an output Amazon Simple Storage Service (Amazon S3) bucket, and a new model version is logged in the SageMaker model registry. Data scientists examine the performance of the new model, then approve if it’s in line with expectations. The model approval event is captured by Amazon EventBridge, which then deploys the model to a SageMaker endpoint in the production environment.
- MLOps – Because the SageMaker endpoint is private and can’t be reached by services outside of the VPC, an AWS Lambda function and Amazon API Gateway public endpoint are required to communicate with CRM. Whenever new emails arrive in the CRM inbox, CRM invokes the API Gateway public endpoint, which in turn triggers the Lambda function to invoke the private SageMaker endpoint. The function then relays the classification back to CRM through the API Gateway public endpoint. To monitor the performance of our deployed model, we implement a feedback loop between CRM and the data scientists to keep track of prediction metrics from the model. On a monthly basis, CRM updates the historical data used for experimentation and model training. We use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) as a scheduler for our monthly retrain.
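The relay described in the MLOps step can be sketched as a small Lambda function behind an API Gateway proxy integration. This is only an illustration under stated assumptions: the endpoint name and the `{"emails": [...]}` request shape are hypothetical, and the runtime client is an optional argument so the sketch can be tried without AWS access.

```python
import json

# Hypothetical endpoint name; replace with the deployed SageMaker endpoint.
ENDPOINT_NAME = "email-classifier-prod"


def classify_emails(emails, runtime_client, endpoint_name=ENDPOINT_NAME):
    """Send a batch of emails to the private SageMaker endpoint and parse the JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"emails": emails}),
    )
    return json.loads(response["Body"].read())


def lambda_handler(event, context, runtime_client=None):
    if runtime_client is None:
        import boto3  # bundled with the Lambda Python runtime

        runtime_client = boto3.client("sagemaker-runtime")

    # With a proxy integration, API Gateway delivers the request body as a string.
    emails = json.loads(event["body"])["emails"]
    predictions = classify_emails(emails, runtime_client)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(predictions),
    }
```

Keeping the function inside the VPC lets the SageMaker endpoint stay private while the CRM only ever talks to the public API Gateway endpoint.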
In the following sections, we break down the data preparation, model experimentation, and model deployment steps in more detail.
Data preparation
Scalable Capital uses a CRM tool for managing and storing email data. Relevant email contents consist of subject, body, and the custodian banks. There are three labels to assign to each email: which line of business the email is from, which queue is appropriate, and the specific topic of the email.
Before we start training any NLP models, we ensure that the input data is clean and the labels are assigned according to expectation.
To retrieve clean inquiry contents from Scalable clients, we remove extra text and symbols from the raw email data, such as email signatures, impressums, quotes of previous messages in email chains, CSS symbols, and so on. Otherwise, our future trained models might experience degraded performance.
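A minimal sketch of this kind of cleaning, using only regular expressions from the Python standard library, is shown below. The patterns and signature markers are illustrative assumptions, not the rules used in production; real email data usually needs a longer, tested list.

```python
import re

# Illustrative patterns only; a production rule set would be larger.
QUOTE_HEADER = re.compile(
    r"^(On .+ wrote:|Am .+ schrieb .+:|-{2,}\s*Original Message\s*-{2,})",
    re.IGNORECASE,
)
SIGNATURE_MARKERS = ("--", "Mit freundlichen Grüßen", "Best regards", "Impressum")
CSS_RULE = re.compile(r"[.#]?[\w-]+\s*\{[^}]*\}")
HTML_TAG = re.compile(r"<[^<>\n]+>")


def clean_email(raw_body):
    """Strip CSS and HTML residue, quoted replies, and everything from the signature onward."""
    text = CSS_RULE.sub(" ", raw_body)
    text = HTML_TAG.sub(" ", text)
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Stop at the first signature marker or quoted-message header.
        if QUOTE_HEADER.match(stripped) or stripped.startswith(SIGNATURE_MARKERS):
            break
        # Skip individually quoted lines from earlier messages.
        if stripped.startswith(">"):
            continue
        kept.append(stripped)
    return re.sub(r"\s+", " ", " ".join(kept)).strip()
```

Truncating at the first signature or quote header is a deliberately blunt choice: it can drop text a client writes below a quoted message, which is a trade-off to review against real data.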
Labels for emails evolve over time as Scalable client service teams add new ones and refine or remove existing ones to accommodate business needs. To make sure that labels for training data as well as expected classifications for prediction are up to date, the data science team works in close collaboration with the client service team to ensure the correctness of the labels.
Model experimentation
We start our experiment with the readily available pre-trained distilbert-base-german-cased model published by Hugging Face. Because the pre-trained model is a general-purpose language representation model, we can adapt the architecture to perform specific downstream tasks (such as classification and question answering) by attaching appropriate heads to the neural network. In our use case, the downstream task we are interested in is sequence classification. Without modifying the existing architecture, we decide to fine-tune three separate pre-trained models, one for each of our required categories. With the SageMaker Hugging Face Deep Learning Containers (DLCs) and the SageMaker Experiments API, starting and managing NLP experiments is simple.
The following is a code snippet of train.py:
The following code is the Hugging Face estimator:
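A hedged sketch of how such an estimator could be configured with the SageMaker Python SDK follows. The source directory, instance type, framework versions, and hyperparameter names are assumptions for illustration; check them against the Hugging Face DLC versions available in your Region. The SDK import is deferred so that the argument builder can be inspected without the SDK installed.

```python
def build_estimator_kwargs(role_arn, task, epochs=3):
    """Collect the arguments for a Hugging Face training job (all values are illustrative)."""
    return {
        "entry_point": "train.py",
        "source_dir": "src",  # hypothetical location of train.py
        "role": role_arn,
        "instance_type": "ml.p3.2xlarge",  # assumed GPU instance type
        "instance_count": 1,
        # Assumed framework versions; they must match an available DLC image.
        "transformers_version": "4.26",
        "pytorch_version": "1.13",
        "py_version": "py39",
        # Hypothetical hyperparameter names, passed to train.py as CLI arguments.
        "hyperparameters": {
            "model_name": "distilbert-base-german-cased",
            "task": task,
            "epochs": epochs,
        },
    }


def create_estimator(role_arn, task):
    """Build the estimator; requires the sagemaker package and AWS credentials."""
    from sagemaker.huggingface import HuggingFace

    return HuggingFace(**build_estimator_kwargs(role_arn, task))
```

Calling `fit` on the returned estimator with an S3 training channel would then start the training job.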
To validate the fine-tuned models, we use the F1-score because of the imbalanced nature of our email dataset, and we also compute other metrics such as accuracy, precision, and recall. For the SageMaker Experiments API to register the training job's metrics, we first need to log the metrics to the training job's local console, where they are picked up by Amazon CloudWatch. Then we define the correct regex format to capture the CloudWatch logs. The metric definitions include the name of each metric and the regex for extracting it from the training job logs:
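The mechanism can be sketched as follows: the training script prints each metric in a fixed format, and each metric definition pairs a name with a regex whose capture group extracts the value. The metric names, the log format, and the choice of macro averaging for F1 are assumptions made for this illustration.

```python
import re


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption here)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


def format_metrics(metrics):
    """Render metrics as one console line, for example 'eval_f1=0.7333;'."""
    return " ".join(f"eval_{name}={value:.4f};" for name, value in metrics.items())


# Name/Regex pairs in the shape SageMaker expects for metric definitions.
metric_definitions = [
    {"Name": "eval_f1", "Regex": r"eval_f1=([0-9.]+);"},
    {"Name": "eval_accuracy", "Regex": r"eval_accuracy=([0-9.]+);"},
]

y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
log_line = format_metrics({"f1": macro_f1(y_true, y_pred), "accuracy": accuracy})
print(log_line)
```

Testing each regex against a sample log line before launching a job is a cheap way to catch a pattern that silently captures nothing.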
As part of the training iteration for the classifier model, we use a confusion matrix and classification report to evaluate the result. The following figure shows the confusion matrix for line of business prediction.

Confusion Matrix
The following screenshot shows an example of the classification report for line of business prediction.

Classification Report
As a next iteration of our experiment, we’ll take advantage of multi-task learning to improve our model. Multi-task learning is a form of training where a model learns to solve multiple tasks simultaneously, because the shared information among tasks can improve learning efficiencies. By attaching two more classification heads to the original distilbert architecture, we can carry out multi-task fine-tuning, which attains reasonable metrics for our client service team.
Model deployment
In our use case, the email classifier is to be deployed to an endpoint, to which our CRM pipeline can send a batch of unclassified emails and get back predictions. Because we have other logic (such as input data cleaning and multi-task predictions) in addition to Hugging Face model inference, we need to write a custom inference script that adheres to the SageMaker standard.
The following is a code snippet of inference.py:
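As a rough sketch of such a script, the SageMaker inference handlers `input_fn`, `predict_fn`, and `output_fn` could be arranged as below. The request shape, the label names, and the assumption that the loaded model is a callable returning one label per task are all hypothetical. `model_fn(model_dir)`, which loads the artifacts, is left out because it depends on how the multi-task model was saved.

```python
import json

# Hypothetical names for the three labels described in the data preparation section.
LABEL_NAMES = ("line_of_business", "queue", "topic")


def input_fn(request_body, content_type="application/json"):
    """Parse a batch of emails and join subject and body into one text per email."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    emails = json.loads(request_body)["emails"]
    return [f"{email.get('subject', '')} {email.get('body', '')}".strip() for email in emails]


def predict_fn(texts, model):
    """Run the model on each text; `model` is assumed to return one label per task."""
    return [dict(zip(LABEL_NAMES, model(text))) for text in texts]


def output_fn(predictions, accept="application/json"):
    """Serialize the predictions for the caller."""
    return json.dumps({"predictions": predictions})
```

Input cleaning would slot into `input_fn`, so that the endpoint applies the same preprocessing as training.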
When everything is up and ready, we use SageMaker Pipelines to manage our training pipeline and attach it to our infrastructure to complete our MLOps setup.
To monitor the performance of the deployed model, we build a feedback loop to enable CRM to provide us with the status of classified emails when cases are closed. Based on this information, we make adjustments to improve the deployed model.
Conclusion
In this post, we shared how SageMaker helps the data science team at Scalable manage the lifecycle of a data science project efficiently, using the email classifier project as an example. The lifecycle starts with the initial phase of data analysis and exploration with SageMaker Studio; moves on to model experimentation and deployment with SageMaker training, inference, and Hugging Face DLCs; and completes with a training pipeline with SageMaker Pipelines integrated with other AWS services. Thanks to this infrastructure, we are able to iterate and deploy new models more efficiently, and are therefore able to improve existing processes within Scalable as well as our clients' experiences.
To learn more about Hugging Face and SageMaker, refer to the following resources:
- Use Hugging Face with Amazon SageMaker
- What are AWS Deep Learning Containers?
- Use Version 2.x of the SageMaker Python SDK: Frameworks: Hugging Face
About the Authors
Dr. Sandra Schmid is Head of Data Analytics at Scalable GmbH. She is responsible for data-driven approaches and use cases in the company together with her teams. Her key focus is finding the best combination of machine learning and data science models and business goals in order to gain as much business value and efficiencies out of data as possible.
Huy Dang is a Data Scientist at Scalable GmbH. His responsibilities include data analytics, building and deploying machine learning models, as well as developing and maintaining infrastructure for the data science team. In his spare time, he enjoys reading, hiking, rock climbing, and staying up to date with the latest machine learning developments.
Mia Chang is a ML Specialist Solutions Architect for Amazon Web Services. She works with customers in EMEA and shares best practices for running AI/ML workloads on the cloud with her background in applied mathematics, computer science, and AI/ML. She focuses on NLP-specific workloads, and shares her experience as a conference speaker and a book author. In her free time, she enjoys yoga, board games, and brewing coffee.
Moritz Guertler is an Account Executive in the Digital Native Businesses segment at AWS. He focuses on customers in the FinTech space and supports them in accelerating innovation through secure and scalable cloud infrastructure.
Falcon 180B foundation model from TII is now available via Amazon SageMaker JumpStart
Today, we are excited to announce that the Falcon 180B foundation model developed by Technology Innovation Institute (TII) is available for customers through Amazon SageMaker JumpStart to deploy with one-click for running inference. With a 180-billion-parameter size and trained on a massive 3.5-trillion-token dataset, Falcon 180B is the largest and one of the most performant models with openly accessible weights. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Falcon 180B model via SageMaker JumpStart.
What is Falcon 180B
Falcon 180B is a model released by TII that follows previous releases in the Falcon family. It’s a scaled-up version of Falcon 40B, and it uses multi-query attention for better scalability. It’s an auto-regressive language model that uses an optimized transformer architecture. It was trained on 3.5 trillion tokens of data, primarily consisting of web data from RefinedWeb (approximately 85%). The model has two versions: 180B and 180B-Chat. 180B is a raw, pre-trained model, which should be further fine-tuned for most use cases. 180B-Chat is better suited to taking generic instructions. The Chat model has been fine-tuned on chat and instructions datasets together with several large-scale conversational datasets.
The model is made available under the Falcon-180B TII License and Acceptable Use Policy.
Falcon 180B was trained by TII on Amazon SageMaker, on a cluster of approximately 4K A100 GPUs. It used a custom distributed training codebase named Gigatron, which uses 3D parallelism with ZeRO, and custom, high-performance Triton kernels. The distributed training architecture used Amazon Simple Storage Service (Amazon S3) as the sole unified service for data loading and checkpoint writing and reading, which particularly contributed to the workload reliability and operational simplicity.
What is SageMaker JumpStart
With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing foundation models. ML practitioners can deploy foundation models to dedicated SageMaker instances within a network isolated environment, and customize models using Amazon SageMaker for model training and deployment.
You can now discover and deploy Falcon 180B with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping ensure data security. Falcon 180B is discoverable and can be deployed in Regions where the requisite instances are available. At present, ml.p4de instances are available in US East (N. Virginia) and US West (Oregon).
Discover models
You can access the foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.
SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.
In SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions.
From the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources. You can find Falcon 180B in the Foundation Models: Text Generation carousel.
You can also find other model variants by choosing Explore all Text Generation Models or searching for Falcon.
You can choose the model card to view details about the model such as license, data used to train, and how to use. You will also find two buttons, Deploy and Open Notebook, which will help you use the model (the following screenshot shows the Deploy option).
Deploy models
When you choose Deploy, the model deployment will start. Alternatively, you can deploy through the example notebook that shows up by choosing Open Notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using a notebook, we start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker with the following code:
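A minimal sketch of this deployment with the SageMaker Python SDK is shown below. It assumes the `JumpStartModel` class from `sagemaker.jumpstart.model`; the `model_cls` argument is a hypothetical hook added here only so the function can be tried with a stand-in class and without AWS access.

```python
def deploy_falcon(
    model_id="huggingface-llm-falcon-180b-chat-bf16",
    model_cls=None,
    **deploy_kwargs,
):
    """Deploy a JumpStart model and return the resulting predictor."""
    if model_cls is None:
        # Requires the sagemaker package, AWS credentials, and an execution role.
        from sagemaker.jumpstart.model import JumpStartModel

        model_cls = JumpStartModel

    model = model_cls(model_id=model_id)
    # With no extra arguments, the default instance type and VPC settings are used.
    return model.deploy(**deploy_kwargs)
```

Calling `deploy_falcon()` creates a real endpoint on a large GPU instance, so it incurs charges until the endpoint is deleted.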
This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. To learn more, refer to the API documentation. After it's deployed, you can run inference against the deployed endpoint through a SageMaker predictor. See the following code:
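The call could look roughly like the following sketch. It assumes that the predictor accepts a dictionary with `inputs` and `parameters` keys and that the response is a list whose first item holds a `generated_text` field; confirm both against the example notebook for your model version.

```python
def build_payload(prompt, max_new_tokens=256, temperature=None, stop=None):
    """Assemble the request body; only parameters that are set are included."""
    parameters = {"max_new_tokens": max_new_tokens}
    if temperature is not None:
        parameters["temperature"] = temperature
    if stop:
        parameters["stop"] = list(stop)
    return {"inputs": prompt, "parameters": parameters}


def generate(predictor, prompt, **parameters):
    """Send the prompt to the endpoint and return the generated text."""
    response = predictor.predict(build_payload(prompt, **parameters))
    # Assumed response shape: [{"generated_text": "..."}]
    return response[0]["generated_text"]
```

For example, `generate(predictor, "Building a website can be done in 10 simple steps:", max_new_tokens=768)` would request up to 768 new tokens.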
Inference parameters control the text generation process at the endpoint. The max_new_tokens parameter sets the maximum size of the output generated by the model. Note that this is not the same as the number of words, because the vocabulary of the model is not the same as the English language vocabulary and a token may not be a whole English word. Temperature controls the randomness in the output. A higher temperature produces more varied, creative output and also increases the risk of hallucination. All the inference parameters are optional.
This 180B parameter model is 335GB and requires even more GPU memory to sufficiently perform inference in 16-bit precision. Currently, JumpStart only supports this model on ml.p4de.24xlarge instances. It is possible to deploy an 8-bit quantized model on an ml.p4d.24xlarge instance by providing the env={"HF_MODEL_QUANTIZE": "bitsandbytes"} keyword argument to the JumpStartModel constructor and specifying instance_type="ml.p4d.24xlarge" to the deploy method. However, please note that per-token latency is approximately 5x slower for this quantized configuration.
The following table lists all the Falcon models available in SageMaker JumpStart along with the model IDs, default instance types, maximum number of total tokens (sum of the number of input tokens and number of generated tokens) supported, and the typical response latency per token for each of these models.
Model Name | Model ID | Default Instance Type | Max Total Tokens | Latency per Token* |
Falcon 7B | huggingface-llm-falcon-7b-bf16 | ml.g5.2xlarge | 2048 | 34 ms |
Falcon 7B Instruct | huggingface-llm-falcon-7b-instruct-bf16 | ml.g5.2xlarge | 2048 | 34 ms |
Falcon 40B | huggingface-llm-falcon-40b-bf16 | ml.g5.12xlarge | 2048 | 57 ms |
Falcon 40B Instruct | huggingface-llm-falcon-40b-instruct-bf16 | ml.g5.12xlarge | 2048 | 57 ms |
Falcon 180B | huggingface-llm-falcon-180b-bf16 | ml.p4de.24xlarge | 2048 | 45 ms |
Falcon 180B Chat | huggingface-llm-falcon-180b-chat-bf16 | ml.p4de.24xlarge | 2048 | 45 ms |
*per-token latency is provided for the median response time of the example prompts provided in this blog; this value will vary based on length of input and output sequences.
Inference and example prompts for Falcon 180B
Falcon models can be used for text completion for any piece of text. Through text generation, you can perform a variety of tasks, such as answering questions, language translation, sentiment analysis, and many more. The endpoint accepts the following input payload schema:
You can explore the definition of these client parameters and their default values within the text-generation-inference repository.
The following are some example prompts and the text generated by the model. All outputs here are generated with inference parameters {"max_new_tokens": 768, "stop": ["<|endoftext|>", "###"]}.
Building a website can be done in 10 simple steps:
You may notice this pretrained model generates long text sequences that are not necessarily ideal for dialog use cases. Before we show how the fine-tuned chat model performs for a larger set of dialog-based prompts, the next two examples illustrate how to use Falcon models with few-shot in-context learning, where we make training samples available to the model in the prompt. Note that few-shot learning does not adjust model weights; we only perform inference on the deployed model while providing a few examples within the input context to help guide model output.
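To make the idea concrete, the following sketch assembles a few-shot prompt from labeled examples and leaves the final answer open for the model to complete. The field labels and the use of `###` as a separator are illustrative choices; `###` matches one of the stop sequences listed above, so generation would halt after the first completed answer.

```python
def build_few_shot_prompt(examples, query, input_label="Tweet", output_label="Sentiment"):
    """Join labeled examples and an open-ended query into a single prompt string."""
    blocks = [f"{input_label}: {text}\n{output_label}: {label}" for text, label in examples]
    # The last block has no label, so the model fills it in.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n###\n".join(blocks)


prompt = build_few_shot_prompt(
    [("I love it", "Positive"), ("Too slow", "Negative")],
    "Works fine",
)
print(prompt)
```

The examples only shape the model's output through the context window; nothing is trained or stored between requests.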
Inference and example prompts for Falcon 180B-Chat
With Falcon 180B-Chat models, optimized for dialogue use cases, the input to the chat model endpoints may contain previous history between the chat assistant and the user. You can ask questions contextual to the conversation that has happened so far. You can also provide the system configuration, such as personas, which define the chat assistant's behavior. Input payload to the endpoint is the same as the Falcon 180B model except the inputs string value should use the following format:
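One way to build such an input string is sketched below. The turn prefixes `System:`, `User:`, and `Falcon:` are assumptions for illustration (only `User:` is implied by the stop sequences used in this post), so verify the exact template against the model card before relying on it.

```python
def format_chat_prompt(user_message, history=(), system_prompt=None, assistant_name="Falcon"):
    """Render a system prompt, prior turns, and the new user message as one string.

    The role prefixes are assumed, not confirmed by this post.
    """
    lines = []
    if system_prompt:
        lines.append(f"System: {system_prompt}")
    for user_turn, assistant_turn in history:
        lines.append(f"User: {user_turn}")
        lines.append(f"{assistant_name}: {assistant_turn}")
    lines.append(f"User: {user_message}")
    # End with the assistant prefix so the model writes the next reply.
    lines.append(f"{assistant_name}:")
    return "\n".join(lines)


chat_prompt = format_chat_prompt(
    "What is so great about #1?",
    history=[("What are some sites to see in Paris?", "1. The Eiffel Tower 2. The Louvre")],
    system_prompt="You are a helpful travel assistant.",
)
print(chat_prompt)
```

Stopping generation at the next user prefix keeps the model from writing both sides of the conversation.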
The following are some example prompts and the text generated by the model. All outputs are generated with inference parameters {"max_new_tokens": 256, "stop": ["\nUser:", "<|endoftext|>", " User:", "###"]}.
In the following example, the user has had a conversation with the assistant about tourist sites in Paris. Next, the user is inquiring about the first option recommended by the chat assistant.
Clean up
After you’re done running the notebook, make sure to delete all resources that you created in the process so your billing is stopped. Use the following code:
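A minimal sketch of the cleanup, assuming `predictor` is the object returned by the deployment step, uses the predictor's `delete_model` and `delete_endpoint` methods:

```python
def clean_up(predictor):
    """Delete the model and the endpoint created during deployment."""
    # Remove the SageMaker model first, then the endpoint that serves it.
    predictor.delete_model()
    predictor.delete_endpoint()
```

After running this, it is worth confirming on the SageMaker console that no endpoint remains in service, because a running endpoint continues to accrue charges.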
Conclusion
In this post, we showed you how to get started with Falcon 180B in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.
Resources
- SageMaker JumpStart documentation
- SageMaker JumpStart Foundation Models documentation
- SageMaker JumpStart product detail page
- SageMaker JumpStart model catalog
About the Authors
Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker JumpStart team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.
Olivier Cruchant is a Principal Machine Learning Specialist Solutions Architect at AWS, based in France. Olivier helps AWS customers – from small startups to large enterprises – develop and deploy production-grade machine learning applications. In his spare time, he enjoys reading research papers and exploring the wilderness with friends and family.
Karl Albertsen leads Amazon SageMaker’s foundation model hub, algorithms, and partnerships teams.