Amazon AWS – Page 42

How Tealium built a chatbot evaluation platform with Ragas and Auto-Instruct using AWS generative AI services

December 11, 2024

by Suren Gunturu Amazon AWS

This post was co-written with Varun Kumar from Tealium

Retrieval Augmented Generation (RAG) pipelines are popular for generating domain-specific outputs based on external data that’s fed in as part of the context. However, there are challenges with evaluating and improving such systems. Two open-source libraries, Ragas (a library for RAG evaluation) and Auto-Instruct, used Amazon Bedrock to power a framework that evaluates and improves upon RAG.

In this post, we illustrate the importance of generative AI in the collaboration between Tealium and the AWS Generative AI Innovation Center (GenAIIC) team by automating the following:

Evaluating the retriever and the generated answer of a RAG system based on the Ragas Repository powered by Amazon Bedrock.
Generating improved instructions for each question-and-answer pair using an automatic prompt engineering technique based on the Auto-Instruct Repository. An instruction refers to a general direction or command given to the model to guide generation of a response. These instructions were generated using Anthropic’s Claude on Amazon Bedrock.
Providing a UI for a human-based feedback mechanism that complements an evaluation system powered by Amazon Bedrock.

Amazon Bedrock is a fully managed service that makes popular FMs available through an API, so you can choose from a wide range of foundational models (FMs) to find the model that’s best suited for your use case. Because Amazon Bedrock is serverless, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications without having to manage any infrastructure.

Tealium background and use case

Tealium is a leader in real-time customer data integration and management. They empower organizations to build a complete infrastructure for collecting, managing, and activating customer data across channels and systems. Tealium uses AI capabilities to integrate data and derive customer insights at scale. Their AI vision is to provide their customers with an active system that continuously learns from customer behaviors and optimizes engagement in real time.

Tealium has built a question and answer (QA) bot using a RAG pipeline to help identify common issues and answer questions about using the platform. The bot is expected to act as a virtual assistant to answer common questions, identify and solve issues, monitor platform health, and provide best practice suggestions, all aimed at helping Tealium customers get the most value from their customer data platform.

The primary goal of this solution with Tealium was to evaluate and improve the RAG solution that Tealium uses to power their QA bot. This was achieved by building an:

Evaluation pipeline.
Error correction mechanism to semi-automatically improve upon the metrics generated from evaluation. In this engagement, automatic prompt engineering was the only technique used, but others such as different chunking strategies and using semantic instead of hybrid search can be explored depending on your use case.
A human-in the-loop feedback system allowing the human to approve or disapprove RAG outputs

Amazon Bedrock was vital in powering an evaluation pipeline and error correction mechanism because of its flexibility in choosing a wide range of leading FMs and its ability to customize models for various tasks. This allowed for testing of many types of specialized models on specific data to power such frameworks. The value of Amazon Bedrock in text generation for automatic prompt engineering and text summarization for evaluation helped tremendously in the collaboration with Tealium. Lastly, Amazon Bedrock allowed for more secure generative AI applications, giving Tealium full control over their data while also encrypting it at rest and in transit.

Solution prerequisites

To test the Tealium solution, start with the following:

Get access to an AWS account.
Create a SageMaker domain instance.
Obtain access to the following models on Amazon Bedrock: Anthropic’s Claude Instant, Claude v2, Claude 3 Haiku, and Titan Embeddings G1 – Text. The evaluation using Ragas can be performed using any foundation model (FM) that’s available on Amazon Bedrock. Automatic prompt engineering must use Anthropic’s Claude v2, v2.1, or Claude Instant.
Obtain a golden set of question and answer pairs. Specifically, you need to provide examples of questions that you will ask the RAG bot and their expected ground truths.
Clone automatic prompt engineering and human-in-the-loop repositories. If you want access to a Ragas repository with prompts favorable towards Anthropic Claude models available on Amazon Bedrock, clone and navigate through this repository and this notebook.

The code repositories allow for flexibility of various FMs and customized models with minimal updates, illustrating Amazon Bedrock’s value in this engagement.

Solution overview

The following diagram illustrates a sample solution architecture that includes an evaluation framework, error correction technique (Auto-Instruct and automatic prompt engineering), and human-in-the-loop. As you can see, generative AI is an important part of the evaluation pipeline and the automatic prompt engineering pipeline.

The workflow consists of the following steps:

You first enter a query into the Tealium RAG QA bot. The RAG solution uses FAISS to retrieve an appropriate context for the specified query. Then, it outputs a response.
Ragas takes in this query, context, answer, and a ground truth that you input, and calculates faithfulness, context precision, context recall, answer correctness, answer relevancy, and answer similarity. Ragas can be integrated with Amazon Bedrock (look at the Ragas section of the notebook link). This illustrates integrating Amazon Bedrock in different frameworks.
If any of the metrics are below a certain threshold, the specific question and answer pair is run by the Auto-Instruct library, which generates candidate instructions using Amazon Bedrock. Various FMs can be used for this text generation use case.
The new instructions are appended to the original query to be prepared to be run by the Tealium RAG QA bot.
The QA bot runs an evaluation to determine whether improvements have been made. Steps 3 and 4 can be iterated until all metrics are above a certain threshold. In addition, you can set a maximum number of times steps 3 and 4 are iterated to prevent an infinite loop.
A human-in-the-loop UI is used to allow a subject matter expert (SME) to provide their own evaluation on given model outputs. This can also be used to provide guard rails against a system powered by generative AI.

In the following sections, we discuss how an example question, its context, its answer (RAG output) and ground truth (expected answer) can be evaluated and revised for a more ideal output. The evaluation is done using Ragas, a RAG evaluation library. Then, prompts and instructions are automatically generated based on their relevance to the question and answer. Lastly, you can approve or disapprove the RAG outputs based on the specific instruction generated from the automatic prompt engineering step.

Out-of-scope

Error correction and human-in-the-loop are two important aspects in this post. However, for each component, the following is out-of-scope, but can be improved upon in future iterations of the solution:

Error correction mechanism

Automatic prompt engineering is the only method used to correct the RAG solution. This engagement didn’t go over other techniques to improve the RAG solution; such as using Amazon Bedrock to find optimal chunking strategies, vector stores, models, semantic or hybrid search, and other mechanisms. Further testing needs to be done to evaluate whether FMs from Amazon Bedrock can be a good decision maker for such parameters of a RAG solution.
Based on the technique presented for automatic prompt engineering, there might be opportunities to optimize the cost. This wasn’t analyzed during the engagement. Disclaimer: The technique described in this post might not be the most optimal approach in terms of cost.

Human-in-the-loop

SMEs provide their evaluation of the RAG solution by approving and disapproving FM outputs. This feedback is stored in the user’s file directory. There is an opportunity to improve upon the model based on this feedback, but this isn’t touched upon in this post.

Ragas – Evaluation of RAG pipelines

Ragas is a framework that helps evaluate a RAG pipeline. In general, RAG is a natural language processing technique that uses external data to augment an FM’s context. Therefore, this framework evaluates the ability for the bot to retrieve relevant context as well as output an accurate response to a given question. The collaboration between the AWS GenAIIC and the Tealium team showed the success of Amazon Bedrock integration with Ragas with minimal changes.

The inputs to Ragas include a set of questions, ground truths, answers, and contexts. For each question, an expected answer (ground truth), LLM output (answer), and a list of contexts (retrieved chunks) were inputted. Context recall, precision, answer relevancy, faithfulness, answer similarity, and answer correctness were evaluated using Anthropic’s Claude on Amazon Bedrock (any version). For your reference, here are the metrics that have been successfully calculated using Amazon Bedrock:

Faithfulness – This measures the factual consistency of the generated answer against the given context, so it requires the answer and retrieved context as an input. This is a two-step prompt where the generated answer is first broken down into multiple standalone statements and propositions. Then, the evaluation LLM validates the attribution of the generated statement to the context. If the attribution can’t be validated, it’s assumed that the statement is at risk of hallucination. The answer is scaled to a 0–1 range; the higher the better.
Context precision – This evaluates the relevancy of the context to the answer, or in other words, the retriever’s ability to capture the best context to answer your query. An LLM verifies if the information in the given context is directly relevant to the question with a single “Yes” or “No” response. The context is passed in as a list, so if the list is size one (one chunk), then the metric for context precision is either 0 (representing the context isn’t relevant to the question) or 1 (representing that it is relevant). If the context list is greater than one (or includes multiple chunks), then context precision is between 0–1, representing a specific weighted average precision calculation. This involves the context precision of the first chunk being weighted heavier than the second chunk, which itself is weighted heavier than the third chunk, and onwards, taking into account the ordering of the chunks being outputted as contexts.
Context recall – This measures the alignment between the context and the expected RAG output, the ground truth. Similar to faithfulness, each statement in the ground truth is checked to see if it is attributed to the context (thereby evaluating the context).
Answer similarity – This assesses the semantic similarity between the RAG output (answer) and expected answer (ground truth), with a range between 0–1. A higher score signifies better performance. First, the embeddings of answer and ground truth are created, and then a score between 0–1 is predicted, representing the semantic similarity of the embeddings using a cross encoder Tiny BERT model.
Answer relevance – This focuses on how pertinent the generated RAG output (answer) is to the question. A lower score is assigned to answers that are incomplete or contain redundant information. To calculate this score, the LLM is asked to generate multiple questions from a given answer. Then using an Amazon Titan Embeddings model, embeddings are generated for the generated question and the actual question. The metric therefore is the mean cosine similarity between all the generated questions and the actual question.
Answer correctness – This is the accuracy between the generated answer and the ground truth. This is calculated from the semantic similarity metric between the answer and the ground truth in addition to a factual similarity by looking at the context. A threshold value is used if you want to employ a binary 0 or 1 answer correctness score, otherwise a value between 0–1 is generated.

AutoPrompt – Automatically generate instructions for RAG

Secondly, generative AI services were shown to successfully generate and select instructions for prompting FMs. In a nutshell, instructions are generated by an FM that best map a question and context to the RAG QA bot answer based on a certain style. This process was done using the Auto-Instruct library. The approach harnesses the ability of FMs to produce candidate instructions, which are then ranked using a scoring model to determine the most effective prompts.

First, you need to ask an Anthropic’s Claude model on Amazon Bedrock to generate an instruction for a set of inputs (question and context) that map to an output (answer). The FM is then asked to generate a specific type of instruction, such as a one-paragraph instruction, one-sentence instruction, or step-by-step instruction. Many candidate instructions are then generated. Look at the generate_candidate_prompts() function to see the logic in code.

Then, the resulting candidate instructions are tested against each other using an evaluation FM. To do this, first, each instruction is compared against all other instructions. Then, the evaluation FM is used to evaluate the quality of the prompts for a given task (query plus context to answer pairs). The evaluation logic for a sample pair of candidate instructions is shown in the test_candidate_prompts() function.

This outputs the most ideal prompt generated by the framework. For each question-and-answer pair, the output includes the best instruction, second best instruction, and third best instruction.

For a demonstration of performing automatic prompt engineering (and calling Ragas):

Navigate through the following notebook.
Code snippets for how candidate prompts are generated and evaluated are included in this source file with their associated prompts included in this config file.

You can review the full repository for automatic prompt engineering using FMs from Amazon Bedrock.

Human-in-the-loop evaluation

So far, you have learned about the applications of FMs in their generation of quantitative metrics and prompts. However, depending on the use case, they need to be aligned with human evaluators’ preferences to have ultimate confidence in these systems. This section presents a HITL web UI (Streamlit) demonstration, showing a side-by-side comparison of instructions and question inputs and RAG outputs. This is shown in the following image:

The structure of the UI is:

On the left, select an FM and two instruction templates (as marked by the index number) to test. After you choose Start, you will see the instructions on the main page.
The top text box on the main page is the query.
The text box below that is the first instruction sent to the LLM as chosen by the index number in the first bullet point.
The text box below the first instruction is the second instruction sent to the LLM as chosen by the index number in the first bullet point.
Then comes the model output for Prompt A, which is the output when the first instruction and query is sent to the LLM. This is compared against the model output for Prompt B, which is the output when the second instruction and query is sent to the LLM.
You can give your feedback for the two outputs, as shown in the following image.

After you input your results, they’re saved in a file in your directory. These can be used for further enhancement of the RAG solution.

Follow the instructions in this repository to run your own human-in-the-loop UI.

Chatbot live evaluation metrics

Amazon Bedrock has been used to continuously analyze the bot performance. The following are the latest results using Ragas:

.	Context Utilization	Faithfulness	Answer Relevancy
Count	714	704	714
Mean	0.85014	0.856887	0.7648831
Standard Deviation	0.357184	0.282743	0.304744
Min	0	0	0
25%	1	1	0.786385
50%	1	1	0.879644
75%	1	1	0.923229
Max	1	1	1

The Amazon Bedrock-based chatbot with Amazon Titan embeddings achieved 85% context utilization, 86% faithfulness, and 76% answer relevancy.

Conclusion

Overall, the AWS team was able to use various FMs on Amazon Bedrock using the Ragas library to evaluate Tealium’s RAG QA bot when inputted with a query, RAG response, retrieved context, and expected ground truth. It did this by finding out if:

The RAG response is attributed to the context.
The context is attributed to the query.
The ground truth is attributed to the context.
Whether the RAG response is relevant to the question and similar to the ground truth.

Therefore, it was able to evaluate a RAG solution’s ability to retrieve relevant context and answer the sample question accurately.

In addition, an FM was able to generate multiple instructions from a question-and-answer pair and rank them based on the quality of the responses. After instructions were generated, it was able to slightly improve errors in the LLM response. The human in the loop demonstration provides a side-by-side view of outputs for different prompts and instructions. This was an enhanced thumbs up/thumbs down approach to further improve inputs to the RAG bot based on human feedback.

Some next steps with this solution include the following:

Improving RAG performance using different models or different chunking strategies based on specific metrics
Testing out different strategies to optimize the cost (number of FM calls) to evaluate generated instructions in the automatic prompt engineering phase
Allowing SME feedback in the human evaluation step to automatically improve upon ground truth or instruction templates

The value of Amazon Bedrock was shown throughout the collaboration with Tealium. The flexibility of Amazon Bedrock in choosing a wide range of leading FMs and the ability to customize models for specific tasks allow Tealium to power the solution in specialized ways with minimal updates in the future. The importance of Amazon Bedrock in text generation and success in evaluation were shown in this engagement, providing potential and flexibility for Tealium to build on the solution. Its emphasis on security allows Tealium to be confident in building and delivering more secure applications.

As stated by Matt Gray, VP of Global Partnerships at Tealium,

“In collaboration with the AWS Generative AI Innovation Center, we have developed a sophisticated evaluation framework and an error correction system, utilizing Amazon Bedrock, to elevate the user experience. This initiative has resulted in a streamlined process for assessing the performance of the Tealium QA bot, enhancing its accuracy and reliability through advanced technical metrics and error correction methodologies. Our partnership with AWS and Amazon Bedrock is a testament to our dedication to delivering superior outcomes and continuing to innovate for our mutual clients.”

This is just one of the ways AWS enables builders to deliver generative AI based solutions. You can get started with Amazon Bedrock and see how it can be integrated in example code bases today. If you’re interested in working with the AWS generative AI services, reach out to the GenAIIC.

About the authors

Suren Gunturu is a Data Scientist working in the Generative AI Innovation Center, where he works with various AWS customers to solve high-value business problems. He specializes in building ML pipelines using large language models, primarily through Amazon Bedrock and other AWS Cloud services.

Varun Kumar is a Staff Data Scientist at Tealium, leading its research program to provide high-quality data and AI solutions to its customers. He has extensive experience in training and deploying deep learning and machine learning models at scale. Additionally, he is accelerating Tealium’s adoption of foundation models in its workflow including RAG, agents, fine-tuning, and continued pre-training.

Vidya Sagar Ravipati is a Science Manager at the Generative AI Innovation Center, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption.

EBSCOlearning scales assessment generation for their online learning content with generative AI

December 11, 2024

by Yasin Khatami Amazon AWS

EBSCOlearning offers corporate learning and educational and career development products and services for businesses, educational institutions, and workforce development organizations. As a division of EBSCO Information Services, EBSCOlearning is committed to enhancing professional development and educational skills.

In this post, we illustrate how EBSCOlearning partnered with AWS Generative AI Innovation Center (GenAIIC) to use the power of generative AI in revolutionizing their learning assessment process. We explore the challenges faced in traditional question-answer (QA) generation and the innovative AI-driven solution developed to address them.

In the rapidly evolving landscape of education and professional development, the ability to effectively assess learners’ understanding of content is crucial. EBSCOlearning, a leader in the realm of online learning, recognized this need and embarked on an ambitious journey to transform their assessment creation process using cutting-edge generative AI technology.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI, and is well positioned to address these types of tasks.

The challenge: Scaling quality assessments

EBSCOlearning’s learning paths—comprising videos, book summaries, and articles—form the backbone of a multitude of educational and professional development programs. However, the company faced a significant hurdle: creating high-quality, multiple-choice questions for these learning paths was a time-consuming and resource-intensive process.

Traditionally, subject matter experts (SMEs) would meticulously craft each question set to be relevant, accurate, and to align with learning objectives. Although this approach guaranteed quality, it was slow, expensive, and difficult to scale. As EBSCOlearning’s content library continues to grow, so does the need for a more efficient solution.

Enter AI: A promising solution

Recognizing the potential of AI to address this challenge, EBSCOlearning partnered with the GenAIIC to develop an AI-powered question generation system. The goal was ambitious: to create an automated solution that could produce high-quality, multiple-choice questions at scale, while adhering to strict guidelines on bias, safety, relevance, style, tone, meaningfulness, clarity, and diversity, equity, and inclusion (DEI). The QA pairs had to be grounded in the learning content and test different levels of understanding, such as recall, comprehension, and application of knowledge. Additionally, explanations were needed to justify why an answer was correct or incorrect.

The team faced several key challenges:

Making sure AI-generated questions matched the quality of human-created ones
Developing a system that could handle diverse content types
Implementing robust quality control measures
Creating a scalable solution that could grow with EBSCOlearning’s needs

Crafting the AI solution

The GenAIIC team developed a sophisticated pipeline using the power of large language models (LLMs), specifically Anthropic’s Claude 3.5 Sonnet in Amazon Bedrock. This pipeline is illustrated in the following figure and consists of several key components: QA generation, multifaceted evaluation, and intelligent revision.

QA generation

The process begins with the QA generation component. This module takes in the learning content—which could be a video transcript, book summary, or article—and generates an initial set of multiple-choice questions using in-context learning.

EBSCOlearning experts and GenAIIC scientists worked together to develop a sophisticated prompt engineering approach using Anthropic’s Claude 3.5 Sonnet model in Amazon Bedrock. To align with EBSCOlearning’s high standards, the prompt includes:

Detailed guidelines on what constitutes a high-quality question, covering aspects such as relevance, clarity, difficulty level, and objectivity
Instructions to match the conversational style of the original content
Directives to include diversity and inclusivity in the language and scenarios used
Few-shot examples to enable in-context learning for the AI model

The system aims to generate up to seven questions for each piece of content, each with four answer choices including a correct answer, and detailed explanations for why each answer is correct or incorrect.

Multifaceted evaluation

After the initial set of questions is generated, it undergoes a rigorous evaluation process. This multifaceted approach makes sure that the questions adhere to all quality standards and guidelines. The evaluation process includes three phases: LLM-based guideline evaluation, rule-based checks, and a final evaluation.

LLM-based guideline evaluation

In collaboration with EBSCOlearning, the GenAIIC team manually developed a comprehensive set of evaluation guidelines covering fundamental requirements for multiple-choice questions, such as validity, accuracy, and relevance. Additionally, they incorporated EBSCOlearning’s specific standards for diversity, equity, inclusion, and belonging (DEIB), in addition to style and tone preferences. The AI system evaluates each question according to the established guidelines and generates a structured output that includes detailed reasoning along with a rating on a three-point scale, where 1 indicates invalid, 2 indicates partially valid, and 3 indicates valid. This rating is later used for revising the questions.

This process presented several significant challenges. The primary challenge was making sure that the AI model could effectively assess multiple complex guidelines simultaneously without overlooking any crucial aspects. This was particularly difficult because the evaluation needed to consider so many different factors—all while maintaining consistency across questions.

To overcome the challenge of LLMs potentially overlooking guidelines when presented with them all at one time, the evaluation process was split into smaller manageable tasks by getting the AI model to focus on fewer guidelines at a time or evaluating smaller chunks of questions in parallel. This way, each guideline receives focused attention, resulting in a more accurate and comprehensive evaluation. Additionally, the system was designed with modularity in mind, streamlining the addition or removal of guidelines. Because of this flexibility, the evaluation process can adapt quickly to new requirements or changes in educational standards.

By generating detailed, structured feedback for each question, including numerical ratings, concise summaries, and in-depth reasoning, the system provides invaluable insights for continual improvement. This level of detail allows for a nuanced understanding of how well each question aligns with the established criteria, offering possibilities for targeted enhancements to the question generation process.

Rule-based checks

Some quantitative aspects of question quality proved challenging for the AI to consistently evaluate. For instance, the team noticed that correct answers were often longer than incorrect ones, making them straightforward to identify. To address this, they developed a custom algorithm that analyzes answer lengths and flags potential issues without relying on the LLM’s judgment.

Final evaluation

Beyond evaluating individual questions, the system also assesses the entire set of questions for a given piece of content. This step checks for duplicates, promotes diversity in the types of questions asked, and verifies that the set as a whole provides a comprehensive assessment of the learning material.

Intelligent revision

One key component of the pipeline is the intelligent revision module. This is where the iterative improvement happens. When the evaluation process flags issues with a question—whether it’s a guideline violation or a structural problem—the question is sent back for revision. The AI model is provided with specific feedback on how it can address the specific violation and directs it to fix the issue by revising or replacing the QA.

The power of iteration

The whole pipeline goes through multiple iterations until the question aligns with all of the specified quality standards. If after several attempts a question still doesn’t meet the criteria, it’s flagged for human review. This iterative approach makes sure that the final output isn’t merely a raw AI generation, but a refined product that has gone through multiple checks and improvements.

Throughout the development process, the team maintained a strong focus on iterative tracking of changes. They implemented a unique history tracking system so they could monitor the evolution of each question through multiple rounds of generation, evaluation, and revision. This approach not only provided valuable insights into the AI model’s decision-making process, but also allowed for targeted improvements to the system over time. By closely tracking the AI model’s performance across multiple iterations, we were able to fine-tune our prompts and evaluation criteria, resulting in a significant improvement in output quality.

Scalability and robustness

With EBSCOlearning’s vast content library in mind, the team built scalability into the core of their solution. They implemented multithreading capabilities, allowing the system to process multiple pieces of content simultaneously. They also developed sophisticated retry mechanisms to handle potential API failures or invalid outputs so the system remained reliable even when processing large volumes of content.

The results: A game-changer for learning assessment

By combining these components—intelligent generation, comprehensive evaluation, and adaptive revision—EBSCOlearning and the GenAIIC team created a system that not only automates the process of creating assessment questions but does so with a level of quality that rivals human-created content. This pipeline represents a significant leap forward in the application of AI to educational content creation, promising to revolutionize how learning assessments are developed and delivered.

The impact of this AI-powered solution on EBSCOlearning’s assessment creation process has been nothing short of transformative. Feedback from EBSCOlearning’s subject matter experts has been overwhelmingly positive, with the AI-generated questions meeting or exceeding the quality of manually created ones in many cases.

Key benefits of the new system include:

Dramatically reduced time and cost for assessment creation
Consistent quality across a wide range of subjects and content types
Improved scalability, allowing EBSCOlearning to keep pace with their growing content library
Enhanced learning experiences for end users, with more diverse and engaging assessments

“The Generative AI Innovation Center’s automated solution for generating multiple-choice questions and answers considerably accelerated the timeline for deployment of assessments for our online learning platform. Their approach of leveraging advanced language models and implementing carefully constructed guidelines in collaboration with our subject matter experts and product management team has resulted in assessment material that is accurate, relevant, and of high quality. This solution is saving us considerable time and effort, and will enable us to scale assessments across the wide range of skills development resources on our platform.”

—Michael Laddin, Senior Vice President & General Manager, EBSCOlearning.

Here are two examples of generated QA.

Question 1: What does the Consumer Relevancy model proposed by Crawford and Mathews assert about human values in business transactions?

A. Human values are less important than monetary value.
B. Human values and monetary value are equally important in all transactions.
C. Human values are more important than traditional value propositions.
D. Human values are irrelevant compared to monetary value in business transactions.

Correct answer: C

Answer explanations:

A. This is contrary to what the Consumer Relevancy model asserts. The model emphasizes the importance of human values over traditional value propositions.
B. While this might seem balanced, it doesn’t reflect the emphasis placed on human values in the Consumer Relevancy model. The model asserts that human values are more important.
C. This correctly reflects the assertion of the Consumer Relevancy model as described in the Book Summary. The model emphasizes the importance of human values over traditional value propositions.
D. This is the opposite of what the Consumer Relevancy model asserts. The model emphasizes the importance of human values, not their irrelevance.

Overall: The Book Summary states that the Consumer Relevancy model asserts that human values are more important than traditional value propositions and that businesses must recognize this need for human values as the contemporary currency of commerce.

Question 2: According to Sara N. King, David G. Altman, and Robert J. Lee, what is the primary benefit of leaders having clarity about their own values?

A. It guarantees success in all leadership positions.
B. It helps leaders make more fulfilling career choices.
C. It eliminates all conflicts in the workplace environment.
D. It ensures leaders always make ethically perfect decisions.

Correct answer: B

Answer explanations:

A. The Book Summary does not suggest that clarity about values guarantees success in all leadership positions. This is an overstatement of the benefits.
B. This is the correct answer, as the Book Summary states that clarity about values allows leaders to make more fulfilling career choices and recognize conflicts with their core values.
C. While understanding one’s values can help in managing conflicts, the Book Summary does not claim it eliminates all workplace conflicts. This is an exaggeration of the benefits.
D. The Book Summary does not state that clarity about values ensures leaders always make ethically perfect decisions. This is an unrealistic expectation not mentioned in the content.

Overall: The Book Summary emphasizes that having clarity about one’s own values allows leaders to make more fulfilling career choices and helps them recognize when they are participating in actions that conflict with their core values.

Looking to the future

For EBSCOlearning, this project is the first step towards their goal of scaling assessments across their entire online learning platform. They’re already planning to expand the system’s capabilities, including:

Adapting the solution to handle more complex, technical content
Incorporating additional question types beyond multiple-choice
Exploring ways to personalize assessments based on individual learner profiles

The potential applications of this technology extend far beyond EBSCOlearning’s current use case. From personalized learning paths to adaptive testing, the possibilities for AI in education and professional development are vast and exciting.

Conclusion: A new era of learning assessment

The collaboration between EBSCOlearning and the GenAIIC demonstrates the transformative power of AI when applied thoughtfully to real-world challenges. By combining cutting-edge technology with deep domain expertise, they’ve created a solution that not only solves a pressing business need but also has the potential to enhance learning experiences for millions of people. This solution is slated to produce assessment questions for hundreds and eventually thousands of learning paths in EBSCOlearning’s curriculum.

As we look to the future of education and professional development, it’s clear that AI will play an increasingly important role. The success of this project serves as a compelling example of how AI can be used to create more efficient, effective, and engaging learning experiences.

For businesses and educational institutions alike, the message is clear: embracing AI isn’t just about keeping up with technology trends—it’s about unlocking new possibilities to better serve learners and drive innovation in education. As EBSCOlearning’s journey shows, the future of learning assessment is here, and it’s powered by AI. Consider how such a solution can enrich your own e-learning content and delight your customers with high quality and on-point assessments. To get started, contact your AWS account manager. If you don’t have an AWS account manager, contact sales. Visit Generative AI Innovation Center to learn more about our program.

About the authors

Yasin Khatami is a Senior Applied Scientist at the Generative AI Innovation Center. With more than a decade of experience in artificial intelligence (AI), he implements state-of-the-art AI products for AWS customers to drive innovation, efficiency and value for customer platforms. His expertise is in generative AI, large language models (LLM), multi-agent techniques, and multimodal learning.

Yifu Hu is an Applied Scientist at the Generative AI Innovation Center. He develops machine learning and generative AI solutions for diverse customer challenges across various industries. Yifu specializes in creative problem-solving, with expertise in AI/ML technologies, particularly in applications of large language models and AI agents.

Aude Genevay is a Senior Applied Scientist at the Generative AI Innovation Center, where she helps customers tackle critical business challenges and create value using generative AI. She holds a PhD in theoretical machine learning and enjoys turning cutting-edge research into real-world solutions.

Mike Laddin is Senior Vice President & General Manager of EBSCOlearning, a division of EBSCO Information Services. EBSCOlearning offers highly acclaimed online products and services for companies, educational institutions, and workforce development organizations. Mike oversees a team of professionals focused on unlocking the potential of people and organizations with on-demand upskilling and microlearning solutions. He has over 25 years of experience as both an entrepreneur and software executive in the information services industry. Mike received an MBA from the Lally School of Management at Rensselaer Polytechnic Institute, and outside of work he is an avid boater.

Alyssa Gigliotti is a Content Strategy and Product Operations Manager at EBSCOlearning, where she collaborates with her team to design top-tier microlearning solutions focused on enhancing business and power skills. With a background in English, professional writing, and technical communications from UMass Amherst, Alyssa combines her expertise in language with a strategic approach to educational content. Alyssa’s in-depth knowledge of the product and voice of the customer allows her to actively engage in product development planning to ensure her team continuously meets the needs of users. Outside the professional sphere, she is both a talented artist and a passionate reader, continuously seeking inspiration from creative and literary pursuits.

Pixtral 12B is now available on Amazon SageMaker JumpStart

December 10, 2024

by Preston Tuggle Amazon AWS

Today, we are excited to announce that Pixtral 12B (pixtral-12b-2409), a state-of-the-art vision language model (VLM) from Mistral AI that excels in both text-only and multimodal tasks, is available for customers through Amazon SageMaker JumpStart. You can try this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with one click for running inference.

In this post, we walk through how to discover, deploy, and use the Pixtral 12B model for a variety of real-world vision use cases.

Pixtral 12B overview

Pixtral 12B represents Mistral’s first VLM and demonstrates strong performance across various benchmarks, outperforming other open models and matching larger models, according to Mistral. Pixtral is trained to understand both images and documents, and shows strong abilities in vision tasks such as chart and figure understanding, document question answering, multimodal reasoning, and instruction following, some of which we demonstrate later in this post with examples. Pixtral 12B is able to ingest images at their natural resolution and aspect ratio. Unlike other open source models, Pixtral doesn’t compromise on text benchmark performance, such as instruction following, coding, and math, to excel in multimodal tasks.

Mistral designed a novel architecture for Pixtral 12B to optimize for both speed and performance. The model has two components: a 400-million-parameter vision encoder, which tokenizes images, and a 12-billion-parameter multimodal transformer decoder, which predicts the next text token given a sequence of text and images. The vision encoder was newly trained that natively supports variable image sizes, which allows Pixtral to be used to accurately understand complex diagrams, charts, and documents in high resolution, and provides fast inference speeds on small images like icons, clipart, and equations. This architecture allows Pixtral to process any number of images with arbitrary sizes in its large context window of 128,000 tokens.

License agreements are a critical decision factor when using open-weights models. Similar to other Mistral models, such as Mistral 7B, Mixtral 8x7B, Mixtral 8x22B and Mistral Nemo 12B, Pixtral 12B is released under the commercially permissive Apache 2.0, providing enterprise and startup customers with a high-performing VLM option to build complex multimodal applications.

SageMaker JumpStart overview

SageMaker JumpStart offers access to a broad selection of publicly available foundation models (FMs). These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.

With SageMaker JumpStart, you can deploy models in a secure environment. The models can be provisioned on dedicated SageMaker Inference instances, including AWS Trainium and AWS Inferentia powered instances, and are isolated within your virtual private cloud (VPC). This enforces data security and compliance, because the models operate under your own VPC controls, rather than in a shared public environment. After deploying an FM, you can further customize and fine-tune the model, including SageMaker Inference for deploying models and container logs for improved observability.With SageMaker, you can streamline the entire model deployment process. Note that fine-tuning on Pixtral 12B is not yet available (at the time of writing) on SageMaker JumpStart.

Prerequisites

To try out Pixtral 12B in SageMaker JumpStart, you need the following prerequisites:

An AWS account that will contain all your AWS resources.
An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, refer to Identity and Access Management for Amazon SageMaker.
Access to Amazon SageMaker Studio or a SageMaker notebook instance or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.
Access to accelerated instances (GPUs) for hosting the model.

Discover Pixtral 12B in SageMaker JumpStart

You can access Pixtral 12B through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an IDE that provides a single web-based visual interface where you can access purpose-built tools to perform ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio Classic.

In SageMaker Studio, access SageMaker JumpStart by choosing JumpStart in the navigation pane.
Choose HuggingFace to access the Pixtral 12B model.
Search for the Pixtral 12B model.
You can choose the model card to view details about the model such as license, data used to train, and how to use the model.
Choose Deploy to deploy the model and create an endpoint.

Deploy the model in SageMaker JumpStart

Deployment starts when you choose Deploy. When deployment is complete, an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you use the SDK, you will see example code that you can use in the notebook editor of your choice in SageMaker Studio.

To deploy using the SDK, we start by selecting the Mistral Nemo Base model, specified by the model_id with the value huggingface-vlm-mistral-pixtral-12b-2409. You can deploy your choice of any of the selected models on SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel 

accept_eula = True 

model = JumpStartModel(model_id="huggingface-vlm-mistral-pixtral-12b-2409") 
predictor = model.deploy(accept_eula=accept_eula)

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. The end-user license agreement (EULA) value must be explicitly defined as True in order to accept the EULA. Also, make sure that you have the account-level service limit for using ml.p4d.24xlarge or ml.pde.24xlarge for endpoint usage as one or more instances. To request a service quota increase, refer to AWS service quotas. After you deploy the model, you can run inference against the deployed endpoint through the SageMaker predictor.

Pixtral 12B use cases

In this section, we provide examples of inference on Pixtral 12B with example prompts.

OCR

We use the following image as input for OCR.

We use the following prompt:

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract and transcribe all text visible in the image, preserving its exact formatting, layout, and any special characters. Include line breaks and maintain the original capitalization and punctuation.",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "Pixtral_data/amazon_s1_2.jpg"
                    }
                }
            ]
        }
    ],
    "max_tokens": 2000,
    "temperature": 0.6,
    "top_p": 0.9,
}
print(response)
Approximate date of commencement of proposed sale to the public: AS SOON AS PRACTICABLE AFTER THIS REGISTRATION STATEMENT BECOMES EFFECTIVE. 
If any of the securities being registered on this Form are to be offered on a delayed or continuous basis pursuant to Rule 415 under the Securities Act of 1933, check the following box. 
[] If this Form is filed to register additional securities for an offering pursuant to Rule 462(b) under the Securities Act of 1933, check the following box and list the Securities Act registration statement number of the earlier effective registration statement for the same offering. 
[] If this Form is a post-effective amendment filed pursuant to Rule 462(c) under the Securities Act of 1933, check the following box and list the Securities Act registration statement number of the earlier effective registration statement for the same offering. 
[] If delivery of the prospectus is expected to be made pursuant to Rule 434, please check the following box. 
[] **CALCULATION OF REGISTRATION FEE** 
| TITLE OF EACH CLASS OF SECURITIES TO BE REGISTERED | AMOUNT TO BE REGISTERED(1) | PROPOSED MAXIMUM OFFERING PRICE PER SHARE(2) | PROPOSED MAXIMUM AGGREGATE OFFERING PRICE(2) | AMOUNT OF REGISTRATION FEE | 
|----------------------------------------------------|----------------------------|---------------------------------------------|---------------------------------------------|----------------------------| 
| Common Stock, $0.01 par value per share........... | 2,875,000 shares           | $14.00                                      | $40,250,000                                 | $12,197(3)                 | 

(1) Includes 375,000 shares that the Underwriters have the option to purchase to cover over-allotments, if any. 
(2) Estimated solely for the purpose of calculating the registration fee in accordance with Rule 457(c). 
(3) $11,326 of registration fee has been previously paid. ...

Chart understanding and analysis

For chart understanding and analysis, we use the following image as input.

We use the following prompt:

prompt= """
Analyze the attached image of the chart or graph. Your tasks are to:
Identify the type of chart or graph (e.g., bar chart, line graph, pie chart, etc.).
Extract the key data points, including labels, values, and any relevant scales or units.
Identify and describe the main trends, patterns, or significant observations presented in the chart.
Generate a clear and concise paragraph summarizing the extracted data and insights. The summary should highlight the most important information and provide an overview that would help someone understand the chart without seeing it.
Ensure that your summary is well-structured, accurately reflects the data, and is written in a professional tone.
"""
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "Pixtral_data/amazon_s1_2.jpg"
                    }
                }
            ]
        }
    ],
    "max_tokens": 2000,
    "temperature": 0.6,
    "top_p": 0.9,
}
print(response)
image_path = "Pixtral_data/Amazon_Chart.png"  # Replace with your local image path
response = send_images_to_model(predictor, prompt, image_path)
print(response)

We get the following output:

The image is a bar chart titled "Segment Results – North America," which presents data on net sales and operating income over several quarters from Q2 2023 to Q2 2024. The chart is divided into two sections: one for net sales and the other for operating income.

### Key Data Points:
- Net Sales:
 - Q2 2023: $82,546 million
 - Q3 2023: Approximately $85,000 million
 - Q4 2023: Approximately $90,000 million
 - Q1 2024: Approximately $85,000 million
 - Q2 2024: $90,033 million
 - Year-over-Year (Y/Y) growth: 9%

- Operating Income:
 - Q2 2023: $3,211 million
 - Q3 2023: Approximately $4,000 million
 - Q4 2023: Approximately $7,000 million
 - Q1 2024: Approximately $5,000 million
 - Q2 2024: $5,065 million
 - Year-over-Year (Y/Y) growth: 58%

- Total Trailing Twelve Months (TTM):
 - Net Sales: $369.8 billion
 - Operating Income: $20.8 billion
...
- **Operating Income:** Operating income shows significant growth, particularly in Q4 2023, where it peaks. There is a notable year-over-year increase of 58%.

### Summary:
The bar chart illustrates the segment results for North America, focusing on net sales and operating income from Q2 2023 to Q2 2024. Net sales demonstrate a steady upward trend, culminating in a 9% year-over-year increase, with the highest value recorded in Q2 2024 at $90,033 million. Operating income exhibits more volatility, with a significant peak in Q4 2023, and an overall substantial year-over-year growth of 58%. The total trailing twelve months (TTM) figures indicate robust performance, with net sales reaching $369.8 billion and operating income at $20.8 billion. This data underscores a positive growth trajectory in both net sales and operating income for the North American segment over the observed period.

Image to code

For an image-to-code example, we use the following image as input.

We use the following prompt:

def extract_html(text):
 pattern = r'```htmls*(.*?)s*```'
 match = re.search(pattern, text, re.DOTALL)
 return match.group(1) if match else None
  
prompt = "Create HTML and CSS code for a minimalist and futuristic website to purchase luggage. Use the following image as template to create your own design."
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "Pixtral_data/Amazon_Chart.png"
                    }
                }
            ]
        }
    ],
    "max_tokens": 2000,
    "temperature": 0.6,
    "top_p": 0.9,
}
print('Input Image:nn')
html_code = extract_html(response)
print(html_code)
display(HTML(html_code))
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Luggage Store</title>
    <link rel="stylesheet" href="styles.css">
</head>
<body>
    <header>
        <h1>Luggage Store</h1>
        <nav>
            <ul>
                <li><a href="#">Home</a></li>
                <li><a href="#">Products</a></li>
                <li><a href="#">About</a></li>
                <li><a href="#">Contact</a></li>
            </ul>
        </nav>
    </header>
...
        <p>&copy; 2023 Luggage Store. All rights reserved.</p>
    </footer>
</body>
</html>

Clean up

After you are done, delete the SageMaker endpoints using the following code to avoid incurring unnecessary costs:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Mistral’s newest multi-modal model, Pixtral 12B, in SageMaker JumpStart and deploy the model for inference. We also explored how SageMaker JumpStart empowers data scientists and ML engineers to discover, access, and deploy a wide range of pre-trained FMs for inference, including other Mistral AI models, such as Mistral 7B and Mixtral 8x22B.

For more information about SageMaker JumpStart, refer to Train, deploy, and evaluate pretrained models with SageMaker JumpStart and Getting started with Amazon SageMaker JumpStart to get started.

For more Mistral assets, check out the Mistral-on-AWS repo.

About the Authors

Preston Tuggle is a Sr. Specialist Solutions Architect working on generative AI.

Niithiyn Vijeaswaran is a GenAI Specialist Solutions Architect at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics. Niithiyn works closely with the Generative AI GTM team to enable AWS customers on multiple fronts and accelerate their adoption of generative AI. He’s an avid fan of the Dallas Mavericks and enjoys collecting sneakers.

Shane Rai is a Principal GenAI Specialist with the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AI/ML AWS services, including model offerings from top tier foundation model providers.

Talk to your slide deck using multimodal foundation models on Amazon Bedrock – Part 3

December 10, 2024

by Archana Inapudi Amazon AWS

In this series, we share two approaches to gain insights on multimodal data like text, images, and charts. In Part 1, we presented an “embed first, infer later” solution that uses the Amazon Titan Multimodal Embeddings foundation model (FM) to convert individual slides from a slide deck into embeddings. We stored the embeddings in a vector database and then used the Large Language-and-Vision Assistant (LLaVA 1.5-7b) model to generate text responses to user questions based on the most similar slide retrieved from the vector database. Part 1 uses AWS services including Amazon Bedrock, Amazon SageMaker, and Amazon OpenSearch Serverless.

In Part 2, we demonstrated a different approach: “infer first, embed later.” We used Anthropic’s Claude 3 Sonnet on Amazon Bedrock to generate text descriptions for each slide in the slide deck. These descriptions are then converted into text embeddings using the Amazon Titan Text Embeddings model and stored in a vector database. Then we used Anthropic’s Claude 3 Sonnet to generate answers to user questions based on the most relevant text description retrieved from the vector database.

In this post, we evaluate the results from both approaches using ground truth provided by SlideVQA[1], an open source visual question answering dataset. You can test both approaches and evaluate the results to find the best fit for your datasets. The code for this series is available in the GitHub repo.

Comparison of approaches

SlideVQA is a collection of publicly available slide decks, each composed of multiple slides (in JPG format) and questions based on the information in the slide decks. It allows a system to select a set of evidence images and answer the question. We use SlideVQA as the single source of truth to compare the results. It’s important that you follow the Amazon Bedrock data protection policies when using public datasets.

This post follows the process depicted in the following diagram. For more details about the architecture, refer to the solution overview and design in Parts 1 and 2 of the series.

We selected 100 random questions from SlideVQA to create a sample dataset to test solutions from Part 1 and Part 2.

The responses to the questions in the sample dataset are as concise as possible, as shown in the following example:

"question": "What is the process by which the breaking of hydrogen bonds allows water to change from the liquid phase into the gaseous phase which has reached equilibrium with the liquid surface said to have done?"

"answer": "reached saturation"

The responses from large language models (LLMs) are quite verbose:

According to the information provided in the images, the process by which the breaking of hydrogen bonds allows water to change from the liquid phase into the gaseous phase that has reached equilibrium with the liquid surface is said to have reached saturation.

The key points are:

1. Evaporation involves the breaking of hydrogen bonds that hold water molecules together in the liquid phase, allowing them to transition into the gaseous (vapor) phase.

2. Only the fastest moving water molecules with enough energy can overcome the hydrogen bonding and evaporate into the vapor phase.

3. The evaporation process that has reached equilibrium with the liquid surface, where the vapor pressure is balanced with the evaporation rate, is described as having reached saturation.

So in summary, the breaking of hydrogen bonds provides the mechanism for water molecules to gain enough energy to escape the liquid phase as vapor, and when this vapor has reached equilibrium with the liquid surface, it is said to have reached saturation.We updated the prompts in each approach to provide short responses instead of verbose responses. This helped match the output context length to the ground truth responses in the sample dataset.

The following sections briefly discuss the solutions and dive into the evaluation and pricing for each approach.

Approach 1: Embed first, infer later

Slide decks are converted into PDF images, one per slide, and embedded using the Amazon Titan Multimodal Embeddings model, resulting in a vector embedding of 1,024 dimensions. The embeddings are stored in an OpenSearch Serverless index, which serves as the vector store for our Retrieval Augmented Generation (RAG) solution. The embeddings are ingested using an Amazon OpenSearch Ingestion pipeline.

Each question is converted into embeddings using the Amazon Titan Multimodal Embeddings model, and an OpenSearch vector search is performed using these embeddings. We performed a k-nearest neighbor (k-NN) search to retrieve the most relevant embedding matching the question. The metadata of the response from the OpenSearch index contains a path to the image corresponding to the most relevant slide.

The following prompt is created by combining the question and the image path, and is sent to Anthropic’s Claude 3 Sonnet to respond to the question with a concise answer:

Human: Your role is to provide a precise answer to the question in the <question></question> tags. Search the image provided to answer the question. Retrieve the most accurate answer in as few words as possible. Do not make up an answer. For questions that ask for numbers, follow the instructions below in the <instructions></instructions> tags. Skip the preamble and provide only the exact precise answer.

If the image does not contain the answer to the question below, then respond with two words only - "no answer".

Refer to the question and instructions below:

<question>
{question}
</question>


<instructions>
1. Search for relevant data and numbers in the charts and graphs present in the image.

2. If the image does not provide a direct answer to the user question, just say "no answer". Do not add statements like "The image does not provide..." and "It only mentions...", instead just respond with "no answer".

3. Do not add any tags in your answer.

4. Scan for the direct answer to the user question. If there is more than one direct answer, give everything that seems like a valid answer to the question in your response.

5. Search for the question deeply in the image. If the question asks about any data or statistics, look for it in charts, tables, graphs first, and then in texts. Check the headings in the image.

</instructions>

If the image does not contain the answer, or if image does not directly answer the user question, do not respond with "The image does not provide..." or anything similar. In this case, your response should always be "no answer" and nothing else.

Assistant: Here is my response to the question. I will give a direct and precise answer to the question if I find it and if not, I will say "no answer":

We used Anthropic’s Claude 3 Sonnet instead of LLaVA 1.5-7b as mentioned in the solution for Part 1. The approach remains the same, “embed first, infer later,” but the model that compiles the final response is changed for simplicity and comparability between approaches.

A response for each question in the dataset is recorded in JSON format and compared to the ground truth provided by SlideVQA.

This approach retrieved a response for 78% of the questions on a dataset of 100 questions, achieving a 50% accuracy on the final responses.

Approach 2: Infer first, embed later

Slide decks are converted into PDF images, one per slide, and passed to Anthropic’s Claude 3 Sonnet to generate a text description. The description is sent to the Amazon Titan Text Embeddings model to generate vector embeddings with 1,536 dimensions. The embeddings are ingested into an OpenSearch Serverless index using an OpenSearch Ingestion pipeline.

Each question is converted into embeddings using the Amazon Titan Text Embeddings model and an OpenSearch vector search is performed using these embeddings. We performed a k-NN search to retrieve the most relevant embedding matching the question. The metadata of the response from the OpenSearch index contains the image description corresponding to the most relevant slide.

We create a prompt with the question and image description and pass it to Anthropic’s Claude 3 Sonnet to receive a precise answer. The following is the prompt template:

Human: Your role is to provide a precise answer to the question in the <question></question> tags. Search the summary provided in the <summary></summary> tags to answer the question. Retrieve the most accurate answer in as few words as possible. Do not make up an answer. For questions that ask for numbers, follow the instructions below in the <instructions></instructions> tags. Skip the preamble and provide only the exact precise answer.

If the summary does not contain the answer to the question below, then respond with two words only - "no answer".

Refer to the question, summary, and instructions below:

<question>
{question}
</question>

<summary>
{summary}
</summary>

<instructions>
1. Search for relevant data and numbers in the summary.

2. If the summary does not provide a direct answer to the user question, just say "no answer". Do not add statements like "The summary does not specify..." and "I do not have enough information...", instead just respond with "no answer".

3. Do not add any tags in your answer.

4. Scan for the direct answer to the user question. If there is more than one direct answer, give everything that seems like a valid answer to the question in your response.

</instructions>

If the summary does not contain the answer, or if summary does not directly answer the user question, do not respond with "The summary does not provide..." or anything similar. In this case, your response should always be "no answer" and nothing else.

Assistant: Here is my response to the question. I will give a direct and precise answer to the question if I find it and if not, I will say "no answer":

The response for each question in the dataset is recorded in JSON format for ease of comparison. The response is compared to the ground truth provided by SlideVQA.

With this approach, we received 44% accuracy on final responses with 75% of the questions retrieving a response out of the 100 questions in the sample dataset.

Analysis of results

In our testing, both approaches produced 50% or less matching results to the questions in the sample dataset. The sample dataset contains a random selection of slide decks covering a wide variety of topics, including retail, healthcare, academic, technology, personal, and travel. Therefore, for a generic question like, “What are examples of tools that can be used?” which lacks additional context, the nearest match could retrieve responses from a variety of topics, leading to inaccurate results, especially when all embeddings are being ingested in the same OpenSearch index. The use of techniques such as hybrid search, pre-filtering based on metadata, and reranking can be used to improve the retrieval accuracy.

One of the solutions is to retrieve more results (increase the k value) and reorder them to keep the most relevant ones; this technique is called reranking. We share additional ways to improve the accuracy of the results later in this post.

The final prompts to Anthropic’s Claude 3 Sonnet in our analysis included instructions to provide a concise answer in as few words as possible to be able to compare with the ground truth. Your responses will depend on your prompts to the LLM.

Pricing

Pricing is dependent on the modality, provider, and model used. For more details, refer to Amazon Bedrock pricing. We use the On-Demand and Batch pricing mode in our analysis, which allow you to use FMs on a pay-as-you-go basis without having to make time-based term commitments. For text-generation models, you are charged for every input token processed and every output token generated. For embeddings models, you are charged for every input token processed.

The following tables show the price per question for each approach. We calculated the average number of input and output tokens based on our sample dataset for the us-east-1 AWS Region; pricing may vary based on your datasets and Region used.

You can use the following tables for guidance. Refer to the Amazon Bedrock pricing website for additional information.

Approach 1
		Input Tokens			Output Tokens
Model	Description	Price per 1,000 Tokens / Price per Input Image	Number of Tokens	Price	Price per 1,000 Tokens	Number of Tokens	Price
Amazon Titan Multimodal Embeddings	Slide/image embedding	$0.00006	1	$0.00000006	$0.000	0	$0.00000
Amazon Titan Multimodal Embeddings	Question embedding	$0.00080	20	$0.00001600	$0.000	0	$0.00000
Anthropic’s Claude 3 Sonnet	Final response	$0.00300	700	$0.00210000	$0.015	8	$0.00012
Cost per input/output				$0.00211606			$0.00012
Total cost per question							$0.00224

Approach 2
		Input Tokens			Output Tokens
Model	Description	Price per 1,000 Tokens / Price per Input Image	Number of Tokens	Price	Price per 1,000 Tokens	Number of Tokens	Price
Anthropic’s Claude 3 Sonnet	Slide/image description	$0.00300	4523	$0.01356900	$0.015	350	$0.00525
Amazon Titan Text Embeddings	Slide/image description embedding	$0.00010	350	$0.00003500	$0.000	0	$0.00000
Amazon Titan Text Embeddings	Question embedding	$0.00010	20	$0.00000200	$0.000	0	$0.00000
Anthropic’s Claude 3 Sonnet	Final response	$0.00300	700	$0.00210000	$0.015	8	$0.00012
Cost per input/output				$0.01570600			$0.00537
Total cost per question							$0.02108

Clean up

To avoid incurring charges, delete any resources from Parts 1 and 2 of the solution. You can do this by deleting the stacks using the AWS CloudFormation console.

Conclusion

In Parts 1 and 2 of this series, we explored ways to use the power of multimodal FMs such as Amazon Titan Multimodal Embeddings, Amazon Titan Text Embeddings, and Anthropic’s Claude 3 Sonnet. In this post, we compared the approaches from an accuracy and pricing perspective.

Code for all parts of the series is available in the GitHub repo. We encourage you to deploy both approaches and explore different Anthropic Claude models available on Amazon Bedrock. You can discover new information and uncover new perspectives using your organization’s slide content with either approach. Compare the two approaches to identify a better workflow for your slide decks.

With generative AI rapidly developing, there are several ways to improve the results and approach the problem. We are exploring performing a hybrid search and adding search filters by extracting entities from the question to improve the results. Part 4 in this series will explore these concepts in detail.

Portions of this code are released under the Apache 2.0 License.

Resources

[1] Tanaka, Ryota & Nishida, Kyosuke & Nishida, Kosuke & Hasegawa, Taku & Saito, Itsumi & Saito, Kuniko. (2023). SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images. Proceedings of the AAAI Conference on Artificial Intelligence. 37. 13636-13645. 10.1609/aaai.v37i11.26598.

About the Authors

Archana Inapudi is a Senior Solutions Architect at AWS, supporting a strategic customer. She has over a decade of cross-industry expertise leading strategic technical initiatives. Archana is an aspiring member of the AI/ML technical field community at AWS. Prior to joining AWS, Archana led a migration from traditional siloed data sources to Hadoop at a healthcare company. She is passionate about using technology to accelerate growth, provide value to customers, and achieve business outcomes.

Manju Prasad is a Senior Solutions Architect at Amazon Web Services. She focuses on providing technical guidance in a variety of technical domains, including AI/ML. Prior to joining AWS, she designed and built solutions for companies in the financial services sector and also for a startup. She has worked in all layers of the software stack, ranging from webdev to databases, and has experience in all levels of the software development lifecycle. She is passionate about sharing knowledge and fostering interest in emerging talent.

Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington, D.C.

Antara Raisa is an AI and ML Solutions Architect at Amazon Web Services supporting strategic customers based out of Dallas, Texas. She also has previous experience working with large enterprise partners at AWS, where she worked as a Partner Success Solutions Architect for digital-centered customers.

Automate actions across enterprise applications using Amazon Q Business plugins

December 10, 2024

by Abhishek Maligehalli Shivalingaiah Amazon AWS

Amazon Q Business is a generative AI-powered assistant that enhances employee productivity by solving problems, generating content, and providing insights across enterprise data sources. Beyond searching indexed third-party services, employees need access to dynamic, near real-time data such as stock prices, vacation balances, and location tracking, which is made possible through Amazon Q Business plugins. Furthermore, Amazon Q Business plugins enable employees to take direct actions within multiple enterprise applications—such as upgrading service ticket priorities—through a single Amazon Q Business interface, eliminating the need to switch between different systems and saving valuable time.

In this post, we explore how Amazon Q Business plugins enable seamless integration with enterprise applications through both built-in and custom plugins. We dive into configuring built-in plugins such as Salesforce, creating custom plugins for specific business needs, and real-world use cases showing how plugins can streamline employee workflows across multiple applications

Plugins enable Amazon Q Business users to use natural language to access non-indexed data (for example, available calendar slots, stock prices, and PTO balance) and take actions (for example, book a meeting or submit PTO) using third-party services such as Jira, ServiceNow, Salesforce, Fidelity, Vanguard, ADP, Workday, and Google Calendar. This provides a more straightforward and quicker experience for users, who no longer need to use multiple applications to complete tasks.

Solution overview

The following figure illustrates a sample architecture using Amazon Q Business plugins.

Amazon Q Business can connect to enterprise applications using over 50 connectors and over 10 plugins. Administrators can use connectors to pre-index the content from enterprise sources into Amazon Q Business to be used by end-users, whereas plugins can be configured to retrieve information and perform actions in real time on enterprise applications. There are two types of plugins:

Built-in plugins – These are available by default in Amazon Q Business. Built-in plugins carry out specific actions in an enterprise application. At the time of writing, we support predefined operations on Jira Cloud, ServiceNow, Zendesk Suite, Microsoft Teams, Atlassian Confluence, Smartsheet, Salesforce, Microsoft Exchange, Asana, and Google Calendar.
Custom plugins – These are created by administrators to interact with specific third-party services and the API endpoints. Administrators have flexibility in defining the behavior and actions carried out by custom plugins.

In the following sections, we discuss the capabilities of built-in plugins and custom plugins, with examples to create each type of plugin.

Built-in plugins

Amazon Q Business supports more than 50 actions in applications, including:

PagerDuty Advance, ServiceNow, and Zendesk Suite for ticketing and incident management
Atlassian Confluence, Jira Cloud, and Smartsheet for project management
Salesforce for customer relationship management (CRM)
Microsoft Exchange and Teams for communication
Asana and Google Calendar for productivity

The following table provides a complete list of the Amazon Q actions available for each application.

Category	Application	Actions
Ticketing and incident management	PagerDuty Advance	• Get incidents • Similar incidents • Root cause incident • Find recent changes • Who is on-call • Status update on incident • Customer impact • Update incident
	ServiceNow	• Create incident • Read incident • Update incident • Delete incident • Read change request • Create change request • Update change request • Delete change request
	Zendesk Suite	• Search content • Get ticket • Create ticket • Update ticket
Project management	Atlassian Confluence	• Search pages
	Jira Cloud	• Read issue • Create issue • Search issue • Change issue status • Delete issue • Read sprint • Move issue to sprint • Create sprint • Delete sprint
	Smartsheet	• Search sheets • Read sheet • List reports • Get report
Customer Relationship Management (CRM)	Salesforce	• Get account list • Get case • Create case • Delete case • Update case • Get opportunities • Get specific opportunity • Create opportunity • Update opportunity • Delete opportunity • Fetch specific contact • List contacts
Communication	Microsoft Exchange	• Get events from calendar • Get email
	Microsoft Teams	• Send private message • Send channel message (public or private)
Productivity	Asana	• Create a task • Update a task
	Google Calendar	• Find events • List calendar

Built-in plugin example: Configure the Salesforce built-in plugin with Amazon Q Business

Salesforce is a CRM tool for managing customer interactions. If you’re a Salesforce user, you can activate the Amazon Q Business plugin for Salesforce to allow your users to perform the following actions from within their web experience chat:

Managing cases (create, delete, update, and get)
Retrieving account lists
Handling opportunities (create, update, delete, get, and fetch specific)
Fetching specific contacts

To set up this plugin, you need configuration details from your Salesforce instance to connect Amazon Q Business with Salesforce. For more information, see Prerequisites

After carrying out the prerequisites in Salesforce and capturing configuration details, you need to configure them on the Amazon Q Business console.

To configure the plugin, complete the following steps:

On the Amazon Q Business console, choose Applications in the navigation pane.
Select your application and on the Actions menu, choose Plugins.
Choose Add plugin.
Under Add plugin, provide the following information:
- Choose Salesforce as your plugin.
- For Plugin name, enter a name for your Amazon Q plugin.
- For Domain URL, enter your Salesforce domain URL. For example, https://yourInstance.my.salesforce.com/services/data/v60.0.
Under OAuth 2.0 authentication, for AWS Secrets Manager secret, select Create and add a new secret or Use an existing one. (For this example, we create a new AWS Secrets Manager secrets).
In the Create new AWS Secrets Manager secret pop-up, enter the following information:
1. For Secret name, enter a name for your secret.
2. For Client ID, enter the client ID generated when you created your OAuth 2.0 application in Salesforce.
3. For Client secret, enter the client secret generated when you created your OAuth 2.0 application in Salesforce.
4. For Redirect URL, enter the URL to which the user needs to be redirected after authentication. If your deployed web URL is <q-endpoint>, use <q-endpoint>/oauth/callback. Amazon Q Business will handle OAuth tokens in this URL. This callback URL needs to be allowlisted in your third-party application.
5. Choose Create.
For Access token URL, enter https://login.salesforce.com/services/oauth2/token (Salesforce OAuth applications).
For Authorization URL, enter https://login.salesforce.com/services/oauth2/authorize (Salesforce OAuth applications).
Under Service access, select Create and add a new service role or Use an existing service role. Make sure that your service role has the necessary permissions.
Under Tags, you can add optional tags to track your plugin.
Choose Add.

You have successfully added the Salesforce built-in plugin to be used by users. Example usage of this plugin is shown in the end-to-end use case later in this post.

Custom plugins

If an action isn’t available through built-in plugins, then you can build a custom plugin and add it to your Amazon Q Business plugins. With custom plugins, you can integrate Amazon Q with third-party applications for a variety of different use cases. After a custom plugin is enabled, users can use natural language to query data (such as stock prices or their vacation balance) and take actions (such as submitting vacation time or updating a record).

Creating and using custom plugins requires the following high-level steps:

Configure authentication and network information for the third-party application to interact with Amazon Q Business.
Create or edit an OpenAPI schema outlining the different API operations that you want to enable for your custom plugin. You can configure up to eight API operations per custom plugin.
After the custom plugin is deployed, Amazon Q Business will dynamically determine the appropriate APIs to call to accomplish a user-requested task. To maximize accuracy, review the best practices for configuring OpenAPI schema definitions for custom plugins.

Custom plugin example: Configure the HR Time Off custom plugin with Amazon Q Business.

The HR Time Off custom plugin is designed to help employees manage their time off requests through Amazon Q Business. An employee can use this custom plugin to perform the following actions directly from an Amazon Q business web experience chat:

Check available time off balance
Submit time off requests

The following figure shows the architecture of this plugin.

This integration allows employees to manage their time off requests seamlessly in Amazon Q Business without having to switch between different applications, improving productivity and user experience.

For an AWS CloudFormation template and code samples to deploy an HR Leave Management System application along with the Amazon Q Business plugin, refer to the following GitHub repo.

To configure Amazon Q Business with the API details, complete the following steps:

On the Amazon Q Business console, in the navigation pane, choose Applications.
Select your application from the list of applications.
Choose Enhancements, and then choose Plugins.
Choose Add plugin.
Under Add plugin, choose Custom plugin.
Under Name and description, for Plugin name, enter a name for your Amazon Q plugin. The name can include hyphens (-) but not spaces and can have a maximum of 1,000 alphanumeric characters.
Under API schema, for API schema source, select one of the following options:
- Select Select from Amazon S3 to select an existing API schema from an Amazon Simple Storage Service (Amazon S3) bucket. Your API schema must have an API description, structure, and parameters for your custom plugin. Then, enter the Amazon S3 URL to your API schema.
- Select Define with in-line OpenAPI schema editor to write a custom plugin API schema in the inline OpenAPI schema editor in the Amazon Q console. A sample schema appears that you can edit. Then, you can choose to do the following:
  - Select the format for the schema: JSON or YAML.
  - To import an existing schema from Amazon S3 to edit, choose Import schema, provide the Amazon S3 URL, and choose Import.
  - To restore the schema to the original sample schema, choose Reset and then confirm the message that appears by choosing Reset
Under Authentication, select either Authentication required or No authentication required.
If no authentication is required, there is no further action needed. If authentication is required, choose Create and add a new secret or Use an existing one. (For this post, we create a new secret.)

Your secret must contain:

In the Create an AWS Secrets Manager secret pop-up, provide the following information:
- For Secret name, enter a name for your Secrets Manager secret.
- For Client ID, enter the client ID you copied from your third-party application.
- For Client secret, enter the client secret you copied from your third-party application.
- For OAuth callback URL, enter the URL to which the user needs to be redirected after authentication. If your deployed web URL is <q-endpoint>, use <q-endpoint>/oauth/callback. Amazon Q Business will handle OAuth tokens in this URL. This callback URL needs to be allowlisted in your third-party application.
- Choose Create.
Under Choose a method to authorize Amazon Q Business, select Create and add a new service role or Use an existing service role. Make sure that your service role has the necessary permissions.
The console will generate a Service role name.
Under Tags, you can add optional tags to track your plugin.
Choose Add to add your plugin.

You have successfully added the HR Time Off custom plugin to be used by users. Example usage of this plugin is shown in the end-to-end use case later in this post.

End-to-end use cases using built-in and custom plugins

Sarah, a Customer Success Manager, demonstrates the seamless use of multiple applications through Amazon Q Business. Using Amazon Q Business, she uses a Salesforce built-in plugin to check high-value opportunities and create cases, uses ServiceNow’s built-in plugin for ticket management on email synchronization issues of her laptop, and uses a custom HR plugin to check her PTO balance and submit time off requests.

Overview of the Amazon Q Business setup

To enable Sarah’s seamless experience across multiple applications, an Amazon Q Business administrator needs to implement a comprehensive configuration that combines both built-in and custom plugins. This enterprise-wide setup consists of:

UI integration
1. Implement the Amazon Q Business chat interface
2. Configure user interaction endpoints
Built-in plugin setup
1. Integrate ServiceNow for IT service management and incident handling
2. Configure Salesforce plugin for CRM operations and case handling
Custom plugin implementation
1. Set up the HR Time Off plugin employee leave management and PTO balance inquiries
2. Configure endpoints and authentication mechanisms
Data source integration
1. Configure an Amazon S3 connector for ingesting IT documentation
2. Set up secure access to the enterprise knowledge base

This integrated setup, shown in the following figure, enables employees to interact with multiple enterprise systems through a single, conversational interface, significantly improving workflow efficiency and user experience.

The following screenshot shows all the plugins available for end-user.

In the following sections, we explore the end-to-end user flow for this use case.

Salesforce integration (built-in plugin)

Sarah selects the Salesforce built-in plugin from the Amazon Q Business Chat UI and asks Amazon Q to provide details about high-value opportunities, as shown in the following screenshots.

During the first use of the Salesforce plugin, Amazon Q Business will authenticate the user through Salesforce’s login interface, as shown in the following screenshot. For users who have already authenticated through enterprise single sign-on (SSO) or directly using their Salesforce login, only an API access approval will be requested.

After authentication and API access approval by the user, the plugin returns the results in real time from Salesforce, as shown in the following screenshot.

Later, Sarah creates a new case in Salesforce to follow up with high-value client, as shown in the following screenshot.

A case is created successfully in Salesforce, as shown in the following screenshot.

ServiceNow ticket management integration (enterprise indexed content and built-in plugin)

Sarah encounters an email synchronization issue on her laptop. Sarah searches Amazon Q Business for guidance on troubleshooting the issue. Given that Amazon Q Business has already indexed IT Helpdesk documents from Amazon S3, it returns troubleshooting steps, as shown in the following screenshot.

Sarah couldn’t resolve the issue after following the troubleshooting documentation. She chooses the ServiceNow plugin in the Chat UI and creates a ServiceNow ticket for further analysis, as shown in the following screenshot.

During the first usage of the ServiceNow plugin, Amazon Q Business will authenticate the user through ServiceNow’s login interface, as shown in the following screenshot.

For users who are already authenticated through enterprise SSO or directly using their ServiceNow login, only an API access approval is required, as shown in the following screenshot.

As shown in the following screenshot, an incident is successfully created in ServiceNow.

An incident is created successfully in ServiceNow as show below. This shows the creation capability of built in plugin.

She updates the ticket priority to high for faster resolution as show below. This shows the update capability of built in plugin.

Impact and Urgency of the incident is updated to high in ServiceNow in real-time as shown in below figure. This shows the update capability of built in plugin.

HR system integration (custom plugin)

Sarah needs to plan her upcoming vacation. She uses Amazon Q to check her available PTO balance through the HR custom plugin, as shown in the following screenshot. This demonstrates the real-time secure retrieval capability of custom plugins.

She submits a time off request directly through Amazon Q, as shown in the following screenshots.

Sarah’s experience demonstrates how Amazon Q Business plugins enable seamless real-time interaction across multiple enterprise applications—from managing Salesforce opportunities and ServiceNow tickets to submitting time off requests—all through a single conversational interface, eliminating application switching and improving productivity.

Clean up

To clean up, delete the Amazon Q application you created.

Conclusion

Amazon Q Business actions through plugins represent a significant advancement in streamlining enterprise workflows and enhancing employee productivity. As demonstrated in this post, these advancements can be seen across three key areas:

Unified interface
- Provides employees with a single, conversational interface
- Enables seamless interaction across multiple enterprise applications
- Eliminates the need for constant application switching
Knowledge integration
- Combines enterprise knowledge from Amazon Q Business connectors with actionable plugins
- Enables employees to access documentation and take immediate action
Workflow enhancement
- Simplifies complex tasks through natural language interaction
- Reduces time spent switching between applications
- Improves overall employee productivity

What enterprise workflows in your organization could benefit from streamlined automation through Amazon Q Business plugins? Whether it’s integrating existing enterprise applications through built-in plugins or creating custom plugins for your proprietary systems, Amazon Q Business provides the flexibility to enhance employee productivity across your organization. Try implementing plugins in your Amazon Q Business environment today, and share your feedback and use cases in the comments.

About the Authors

Abhishek Maligehalli Shivalingaiah is a Senior Generative AI Solutions Architect at AWS, specializing in Amazon Q Business. With a deep passion for using agentic AI frameworks to solve complex business challenges, he brings nearly a decade of expertise in developing data and AI solutions that deliver tangible value for enterprises. Beyond his professional endeavors, Abhishek is an artist who finds joy in creating portraits of family and friends, expressing his creativity through various artistic mediums.

Marcel Pividal is a Senior AI Services Solutions Architect in the World-Wide Specialist Organization, bringing over 22 years of expertise in transforming complex business challenges into innovative technological solutions. As a thought leader in generative AI implementation, he specializes in developing secure, compliant AI architectures for enterprise-scale deployments across multiple industries.

Sachi Sharma is a Senior Software Engineer at Amazon Q Business, specializing in generative and agentic AI. Beyond her professional pursuits, Sachi is an avid reader and coffee lover, and enjoys driving, particularly long, scenic drives.

Manjukumar Patil is a Software Engineer at Amazon Q Business with a passion for designing and scaling AI-driven distributed systems. In his free time, he loves hiking and exploring national parks.

James Gung is a Senior Applied Scientist at AWS whose research spans diverse topics related to conversational AI and agentive systems. Outside of work, he enjoys spending time with his family, traveling, playing violin, and bouldering.

Najih is a Senior Software Engineer at AWS Q Business. He is passionate about designing and scaling AI based distributed systems, and excels at bringing innovative solutions to complex challenges. Outside of work, he enjoys lifting and martial arts, particularly MMA.

A quick guide to Amazon's papers at NeurIPS 2024

December 10, 2024

by Amazon AWS

While large language models and other foundation models are well represented, traditional Amazon interests such as bandit problems and new topics such as AI for automated reasoning also get their due.Read More

Amazon opens new AI lab in San Francisco focused on long-term research bets

December 9, 2024

by Amazon AWS

The Amazon AGI SF Lab will focus on developing new foundational capabilities for enabling useful AI agents.Read More

Accelerating ML experimentation with enhanced security: AWS PrivateLink support for Amazon SageMaker with MLflow

December 9, 2024

by Xiaoyu Xing Amazon AWS

With access to a wide range of generative AI foundation models (FM) and the ability to build and train their own machine learning (ML) models in Amazon SageMaker, users want a seamless and secure way to experiment with and select the models that deliver the most value for their business. In the initial stages of an ML project, data scientists collaborate closely, sharing experimental results to address business challenges. However, keeping track of numerous experiments, their parameters, metrics, and results can be difficult, especially when working on complex projects simultaneously. MLflow, a popular open-source tool, helps data scientists organize, track, and analyze ML and generative AI experiments, making it easier to reproduce and compare results.

SageMaker is a comprehensive, fully managed ML service designed to provide data scientists and ML engineers with the tools they need to handle the entire ML workflow. Amazon SageMaker with MLflow is a capability in SageMaker that enables users to create, manage, analyze, and compare their ML experiments seamlessly. It simplifies the often complex and time-consuming tasks involved in setting up and managing an MLflow environment, allowing ML administrators to quickly establish secure and scalable MLflow environments on AWS. See Fully managed MLFlow on Amazon SageMaker for more details.

Enhanced security: AWS VPC and AWS PrivateLink

When working with SageMaker, you can decide the level of internet access to provide to your users. For example, you can give users access permission to download popular packages and customize the development environment. However, this can also introduce potential risks of unauthorized access to your data. To mitigate these risks, you can further restrict which traffic can access the internet by launching your ML environment in an Amazon Virtual Private Cloud (Amazon VPC). With an Amazon VPC, you can control the network access and internet connectivity of your SageMaker environment, or even remove direct internet access to add another layer of security. See Connect to SageMaker through a VPC interface endpoint to understand the implications of running SageMaker within a VPC and the differences when using network isolation.

SageMaker with MLflow now supports AWS PrivateLink, which enables you to transfer critical data from your VPC to MLflow Tracking Servers through a VPC endpoint. This capability enhances the protection of sensitive information by making sure that data sent to the MLflow Tracking Servers is transferred within the AWS network, avoiding exposure to the public internet. This capability is available in all AWS Regions where SageMaker is currently available, excluding China Regions and GovCloud (US) Regions. To learn more, see Connect to an MLflow tracking server through an Interface VPC Endpoint.

In this blogpost, we demonstrate a use case to set up a SageMaker environment in a private VPC (without internet access), while using MLflow capabilities to accelerate ML experimentation.

Solution overview

You can find the reference code for this sample in GitHub. The high-level steps are as follows:

Deploy infrastructure with the AWS Cloud Development Kit (AWS CDK) including:
- A SageMaker environment in a private VPC without internet access.
- AWS CodeArtifact, which provides a private PyPI repository so that SageMaker can use it to download necessary packages.
- VPC endpoints, which enable the SageMaker environment to connect to other AWS services (Amazon Simple Storage Service (Amazon S3), AWS CodeArtifact, Amazon Elastic Container Registry (Amazon ECR), Amazon CloudWatch, SageMaker Managed MLflow, and so on) through AWS PrivateLink without exposing the environment to the public internet.
Run ML experimentation with MLflow using the @remote decorator from the open-source SageMaker Python SDK.

The overall solution architecture is shown in the following figure.

For your reference, this blog post demonstrates a solution to create a VPC with no internet connection using an AWS CloudFormation template.

Prerequisites

You need an AWS account with an AWS Identity and Access Management (IAM) role with permissions to manage resources created as part of the solution. For details, see Creating an AWS account.

Deploy infrastructure with AWS CDK

The first step is to create the infrastructure using this CDK stack. You can follow the deployment instructions from the README.

Let’s first have a closer look at the CDK stack itself.

It defines multiple VPC endpoints, including the MLflow endpoint as shown in the following sample:

vpc.add_interface_endpoint(
    "mlflow-experiments",
    service=ec2.InterfaceVpcEndpointAwsService.SAGEMAKER_EXPERIMENTS,
    private_dns_enabled=True,
    subnets=ec2.SubnetSelection(subnets=subnets),
    security_groups=[studio_security_group]
)

We also try to restrict the SageMaker execution IAM role so that you can use SageMaker MLflow only when you’re in the right VPC.

You can further restrict the VPC endpoint for MLflow by attaching a VPC endpoint policy.

Users outside the VPC can potentially connect to Sagemaker MLflow through the VPC endpoint to MLflow. You can add restrictions so that user access to SageMaker MLflow is only allowed from your VPC.

studio_execution_role.attach_inline_policy(
    iam.Policy(self, "mlflow-policy",
        statements=[
            iam.PolicyStatement(
                effect=iam.Effect.ALLOW,
                actions=["sagemaker-mlflow:*"],
                resources=["*"],
                conditions={"StringEquals": {"aws:SourceVpc": vpc.vpc_id } }
            )
        ]
    )
)

After successful deployment, you should be able to see the new VPC in the AWS Management Console for Amazon VPC without internet access, as shown in the following screenshot.

A CodeArtifact domain and a CodeArtifact repository with external connection to PyPI should also be created, as shown in the following figure, so that SageMaker can use it to download necessary packages without internet access. You can verify the creation of the domain and the repository by going to the CodeArtifact console. Choose “Repositories” under “Artifacts” from the navigation pane and you will see the repository “pip”.

ML experimentation with MLflow

Setup

After the CDK stack creation, a new SageMaker domain with a user profile should also be created. Launch Amazon SageMaker Studio and create a JupyterLab Space. In the JupyterLab Space, choose an instance type of ml.t3.medium, and select an image with SageMaker Distribution 2.1.0.

To check that the SageMaker environment has no internet connection, open the JupyterLab space and check the internet connection by running the curl command in a terminal.

SageMaker with MLflow now supports MLflow version 2.16.2 to accelerate generative AI and ML workflows from experimentation to production. An MLflow 2.16.2 tracking server is created along with the CDK stack.

You can find the MLflow tracking server Amazon Resource Name (ARN) either from the CDK output or from the SageMaker Studio UI by clicking “MLFlow” icon, as shown in the following figure. You can click the “copy” button next to the “mlflow-server” to copy the MLflow tracking server ARN.

As an example dataset to train the model, download the reference dataset from the public UC Irvine ML repository to your local PC, and name it predictive_maintenance_raw_data_header.csv.

Upload the reference dataset from your local PC to your JupyterLab Space as shown in the following figure.

To test your private connectivity to the MLflow tracking server, you can download the sample notebook that has been uploaded automatically during the creation of the stack in a bucket within your AWS account. You can find the an S3 bucket name in the CDK output, as shown in the following figure.

From the JupyterLab app terminal, run the following command:

aws s3 cp --recursive <YOUR-BUCKET-URI> ./

You can now open the private-mlflow.ipynb notebook.

In the first cell, fetch credentials for the CodeArtifact PyPI repository so that SageMaker can use pip from the private AWS CodeArtifact repository. The credentials will expire in 12 hours. Make sure to log on again when they expire.

%%bash
AWS_ACCOUNT=$(aws sts get-caller-identity --output text --query 'Account')
aws codeartifact login --tool pip --repository pip --domain code-artifact-domain --domain-owner ${AWS_ACCOUNT} --region ${AWS_DEFAULT_REGION}

Experimentation

After setup, start the experimentation. The scenario is using the XGBoost algorithm to train a binary classification model. Both the data processing job and model training job use @remote decorator so that the jobs are running in the SageMaker-associated private subnets and security group from your private VPC.

In this case, the @remote decorator looks up the parameter values from the SageMaker configuration file (config.yaml). These parameters are used for data processing and training jobs. We define the SageMaker-associated private subnets and security group in the configuration file. For the full list of supported configurations for the @remote decorator, see Configuration file in the SageMaker Developer Guide.

Note that we specify in PreExecutionCommands the aws codeartifact login command to point SageMaker to the private CodeAritifact repository. This is needed to make sure that the dependencies can be installed at runtime. Alternatively, you can pass a reference to a container in your Amazon ECR through ImageUri, which contains all installed dependencies.

We specify the security group and subnets information in VpcConfig.

config_yaml = f"""
SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      TelemetryOptOut: true
      RemoteFunction:
        # role arn is not required if in SageMaker Notebook instance or SageMaker Studio
        # Uncomment the following line and replace with the right execution role if in a local IDE
        # RoleArn: <replace the role arn here>
        # ImageUri: <replace with your image if you want to avoid installing dependencies at run time>
        S3RootUri: s3://{bucket_prefix}
        InstanceType: ml.m5.xlarge
        Dependencies: ./requirements.txt
        IncludeLocalWorkDir: true
        PreExecutionCommands:
        - "aws codeartifact login --tool pip --repository pip --domain code-artifact-domain --domain-owner {account_id} --region {region}"
        CustomFileFilter:
          IgnoreNamePatterns:
          - "data/*"
          - "models/*"
          - "*.ipynb"
          - "__pycache__"
        VpcConfig:
          SecurityGroupIds: 
          - {security_group_id}
          Subnets: 
          - {private_subnet_id_1}
          - {private_subnet_id_2}
"""

Here’s how you can setup an MLflow experiment similar to this.

from time import gmtime, strftime

# Mlflow (replace these values with your own, if needed)
project_prefix = project_prefix
tracking_server_arn = mlflow_arn
experiment_name = f"{project_prefix}-sm-private-experiment"
run_name=f"run-{strftime('%d-%H-%M-%S', gmtime())}"

Data preprocessing

During the data processing, we use the @remote decorator to link parameters in config.yaml to your preprocess function.

Note that MLflow tracking starts from the mlflow.start_run() API.

The mlflow.autolog() API can automatically log information such as metrics, parameters, and artifacts.

You can use log_input() method to log a dataset to the MLflow artifact store.

@remote(keep_alive_period_in_seconds=3600, job_name_prefix=f"{project_prefix}-sm-private-preprocess")
def preprocess(df, df_source: str, experiment_name: str):
    
    mlflow.set_tracking_uri(tracking_server_arn)
    mlflow.set_experiment(experiment_name)    
    
    with mlflow.start_run(run_name=f"Preprocessing") as run:            
        mlflow.autolog()
        
        columns = ['Type', 'Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]', 'Machine failure']
        cat_columns = ['Type']
        num_columns = ['Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]']
        target_column = 'Machine failure'                    
        df = df[columns]

        mlflow.log_input(
            mlflow.data.from_pandas(df, df_source, targets=target_column),
            context="DataPreprocessing",
        )
        
        ...
        
        model_file_path="/opt/ml/model/sklearn_model.joblib"
        os.makedirs(os.path.dirname(model_file_path), exist_ok=True)
        joblib.dump(featurizer_model, model_file_path)

    return X_train, y_train, X_val, y_val, X_test, y_test, featurizer_model

Run the preprocessing job, then go to the MLflow UI (shown in the following figure) to see the tracked preprocessing job with the input dataset.

X_train, y_train, X_val, y_val, X_test, y_test, featurizer_model = preprocess(df=df, 
                                                                              df_source=input_data_path, 
                                                                              experiment_name=experiment_name)

You can open an MLflow UI from SageMaker Studio as the following figure. Click “Experiments” from the navigation pane and select your experiment.

From the MLflow UI, you can see the processing job that just run.

You can also see security details in the SageMaker Studio console in the corresponding training job as shown in the following figure.

Model training

Similar to the data processing job, you can also use @remote decorator with the training job.

Note that the log_metrics() method sends your defined metrics to the MLflow tracking server.

@remote(keep_alive_period_in_seconds=3600, job_name_prefix=f"{project_prefix}-sm-private-train")
def train(X_train, y_train, X_val, y_val,
          eta=0.1, 
          max_depth=2, 
          gamma=0.0,
          min_child_weight=1,
          verbosity=0,
          objective='binary:logistic',
          eval_metric='auc',
          num_boost_round=5):     
    
    mlflow.set_tracking_uri(tracking_server_arn)
    mlflow.set_experiment(experiment_name)
    
    with mlflow.start_run(run_name=f"Training") as run:               
        mlflow.autolog()
             
        # Creating DMatrix(es)
        dtrain = xgboost.DMatrix(X_train, label=y_train)
        dval = xgboost.DMatrix(X_val, label=y_val)
        watchlist = [(dtrain, "train"), (dval, "validation")]
    
        print('')
        print (f'===Starting training with max_depth {max_depth}===')
        
        param_dist = {
            "max_depth": max_depth,
            "eta": eta,
            "gamma": gamma,
            "min_child_weight": min_child_weight,
            "verbosity": verbosity,
            "objective": objective,
            "eval_metric": eval_metric
        }        
    
        xgb = xgboost.train(
            params=param_dist,
            dtrain=dtrain,
            evals=watchlist,
            num_boost_round=num_boost_round)
    
        predictions = xgb.predict(dval)
    
        print ("Metrics for validation set")
        print('')
        print (pd.crosstab(index=y_val, columns=np.round(predictions),
                           rownames=['Actuals'], colnames=['Predictions'], margins=True))
        
        rounded_predict = np.round(predictions)
    
        val_accuracy = accuracy_score(y_val, rounded_predict)
        val_precision = precision_score(y_val, rounded_predict)
        val_recall = recall_score(y_val, rounded_predict)

        # Log additional metrics, next to the default ones logged automatically
        mlflow.log_metric("Accuracy Model A", val_accuracy * 100.0)
        mlflow.log_metric("Precision Model A", val_precision)
        mlflow.log_metric("Recall Model A", val_recall)
        
        from sklearn.metrics import roc_auc_score
    
        val_auc = roc_auc_score(y_val, predictions)
        
        mlflow.log_metric("Validation AUC A", val_auc)
    
        model_file_path="/opt/ml/model/xgboost_model.bin"
        os.makedirs(os.path.dirname(model_file_path), exist_ok=True)
        xgb.save_model(model_file_path)

    return xgb

Define hyperparameters and run the training job.

eta=0.3
max_depth=10

booster = train(X_train, y_train, X_val, y_val,
              eta=eta, 
              max_depth=max_depth)

In the MLflow UI you can see the tracking metrics as shown in the figure below. Under “Experiments” tab, go to “Training” job of your experiment task. It is under “Overview” tab.

You can also view the metrics as graphs. Under “Model metrics” tab, you can see the model performance metrics that configured as part of the training job log.

With MLflow, you can log your dataset information alongside other key metrics, such as hyperparameters and model evaluation. Find more details in the blogpost LLM experimentation with MLFlow.

Clean up

To clean up, first delete all spaces and applications created within the SageMaker Studio domain. Then destroy the infrastructure created by running the following code.

cdk destroy

Conclusion

SageMaker with MLflow allows ML practitioners to create, manage, analyze, and compare ML experiments on AWS. To enhance security, SageMaker with MLflow now supports AWS PrivateLink. All MLflow Tracking Server versions including 2.16.2 integrate seamlessly with this feature, enabling secure communication between your ML environments and AWS services without exposing data to the public internet.

For an extra layer of security, you can set up SageMaker Studio within your private VPC without Internet access and execute your ML experiments in this environment.

SageMaker with MLflow now supports MLflow 2.16.2. Setting up a fresh installation provides the best experience and full compatibility with the latest features.

About the Authors

Xiaoyu Xing is a Solutions Architect at AWS. She is driven by a profound passion for Artificial Intelligence (AI) and Machine Learning (ML). She strives to bridge the gap between these cutting-edge technologies and a broader audience, empowering individuals from diverse backgrounds to learn and leverage AI and ML with ease. She is helping customers to adopt AI and ML solutions on AWS in a secure and responsible way.

Paolo Di Francesco is a Senior Solutions Architect at Amazon Web Services (AWS). He holds a PhD in Telecommunications Engineering and has experience in software engineering. He is passionate about machine learning and is currently focusing on using his experience to help customers reach their goals on AWS, in particular in discussions around MLOps. Outside of work, he enjoys playing football and reading.

Tomer Shenhar is a Product Manager at AWS. He specializes in responsible AI, driven by a passion to develop ethically sound and transparent AI solutions.

Mistral-NeMo-Instruct-2407 and Mistral-NeMo-Base-2407 are now available on SageMaker JumpStart

December 6, 2024

by Niithiyn Vijeaswaran Amazon AWS

Today, we are excited to announce that Mistral-NeMo-Base-2407 and Mistral-NeMo-Instruct-2407—twelve billion parameter large language models from Mistral AI that excel at text generation—are available for customers through Amazon SageMaker JumpStart. You can try these models with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with one click for running inference. In this post, we walk through how to discover, deploy and use the Mistral-NeMo-Instruct-2407 and Mistral-NeMo-Base-2407 models for a variety of real-world use cases.

Mistral-NeMo-Instruct-2407 and Mistral-NeMo-Base-2407 overview

Mistral NeMo, a powerful 12B parameter model developed through collaboration between Mistral AI and NVIDIA and released under the Apache 2.0 license, is now available on SageMaker JumpStart. This model represents a significant advancement in multilingual AI capabilities and accessibility.

Key features and capabilities

Mistral NeMo features a 128k token context window, enabling processing of extensive long-form content. The model demonstrates strong performance in reasoning, world knowledge, and coding accuracy. Both pre-trained base and instruction-tuned checkpoints are available under the Apache 2.0 license, making it accessible for researchers and enterprises. The model’s quantization-aware training facilitates optimal FP8 inference performance without compromising quality.

Multilingual support

Mistral NeMo is designed for global applications, with strong performance across multiple languages including English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. This multilingual capability, combined with built-in function calling and an extensive context window, helps make advanced AI more accessible across diverse linguistic and cultural landscapes.

Tekken: Advanced tokenization

The model uses Tekken, an innovative tokenizer based on tiktoken. Trained on over 100 languages, Tekken offers improved compression efficiency for natural language text and source code.

SageMaker JumpStart overview

SageMaker JumpStart is a fully managed service that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly, accelerating the development and deployment of ML applications. One of the key components of SageMaker JumpStart is the Model Hub, which offers a vast catalog of pre-trained models, such as DBRX, for a variety of tasks.

You can now discover and deploy both Mistral NeMo models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and machine learning operations (MLOps) controls with Amazon SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your virtual private cloud (VPC) controls, helping to support data security.

Prerequisites

To try out both NeMo models in SageMaker JumpStart, you will need the following prerequisites:

An AWS account that will contain all your AWS resources.
An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, see Identity and Access Management for Amazon SageMaker.
Access to Amazon SageMaker Studio, a SageMaker notebook instance, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.
Access to accelerated instances (GPUs) for hosting the model.
This model requires an ml.g6.12xlarge instance. SageMaker JumpStart provides a simplified way to access and deploy over 100 different open source and third-party foundation models. In order to launch an endpoint to host Mistral NeMo from SageMaker JumpStart, you may need to request a service quota increase to access an ml.g6.12xlarge instance for endpoint usage. You can request service quota increases through the console, AWS Command Line Interface (AWS CLI), or API to allow access to those additional resources.

Discover Mistral NeMo models in SageMaker JumpStart

You can access NeMo models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, see Amazon SageMaker Studio.

In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane.

Then choose HuggingFace.

From the SageMaker JumpStart landing page, you can search for NeMo in the search box. The search results will list Mistral NeMo Instruct and Mistral NeMo Base.

You can choose the model card to view details about the model such as license, data used to train, and how to use the model. You will also find the Deploy button to deploy the model and create an endpoint.

Deploy the model in SageMaker JumpStart

Deployment starts when you choose the Deploy button. After deployment finishes, you will see that an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you select the option to use the SDK, you will see example code that you can use in the notebook editor of your choice in SageMaker Studio.

Deploy the model with the SageMaker Python SDK

To deploy using the SDK, we start by selecting the Mistral NeMo Base model, specified by the model_id with the value huggingface-llm-mistral-nemo-base-2407. You can deploy your choice of the selected models on SageMaker with the following code. Similarly, you can deploy NeMo Instruct using its own model ID.

from sagemaker.jumpstart.model import JumpStartModel 

accept_eula = True 

model = JumpStartModel(model_id="huggingface-llm-mistral-nemo-base-2407") 
predictor = model.deploy(accept_eula=accept_eula)

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. The EULA value must be explicitly defined as True to accept the end-user license agreement (EULA). Also make sure that you have the account-level service limit for using ml.g6.12xlarge for endpoint usage as one or more instances. You can follow the instructions in AWS service quotas to request a service quota increase. After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor:

payload = {
    "messages": [
        {
            "role": "user",
            "content": "Hello"
        }
    ],
    "max_tokens": 1024,
    "temperature": 0.3,
    "top_p": 0.9,
}

response = predictor.predict(payload)['choices'][0]['message']['content'].strip()
print(response)

An important thing to note here is that we’re using the djl-lmi v12 inference container, so we’re following the large model inference chat completions API schema when sending a payload to both Mistral-NeMo-Base-2407 and Mistral-NeMo-Instruct-2407.

Mistral-NeMo-Base-2407

You can interact with the Mistral-NeMo-Base-2407 model like other standard text generation models, where the model processes an input sequence and outputs predicted next words in the sequence. In this section, we provide some example prompts and sample output. Keep in mind that the base model is not instruction fine-tuned.

Text completion

Tasks involving predicting the next token or filling in missing tokens in a sequence:

payload = {
    "messages": [
        {
            "role": "user",
            "content": "The capital of France is ___."
        }
    ],
    "max_tokens": 10,
    "temperature": 0.3,
    "top_p": 0.9,
}

response = predictor.predict(payload)['choices'][0]['message']['content'].strip()
print(response)

The following is the output:

Paris
The capital of France is Paris.

Mistral NeMo Instruct

The Mistral-NeMo-Instruct-2407 model is a quick demonstration that the base model can be fine-tuned to achieve compelling performance. You can follow the steps provided to deploy the model and use the model_id value of huggingface-llm-mistral-nemo-instruct-2407 instead.

The instruction-tuned NeMo model can be tested with the following tasks:

Code generation

Mistral NeMo Instruct demonstrates benchmarked strengths for coding tasks. Mistral states that their Tekken tokenizer for NeMo is approximately 30% more efficient at compressing source code. For example, see the following code:

payload = {
    "messages": [
        {
            "role": "user",
            "content": """Create a Binary Search Tree class with methods for insertion, searching, and in-order traversal."""
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.3,
    "top_p": 0.9,
}
text_response = predictor.predict(payload)['choices'][0]['message']['content'].strip()
tokens = predictor.predict(payload)['usage']
print(text_response)
print(tokens)

The following is the output:

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

class BinarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, key):
        if not self.root:
            self.root = Node(key)
        else:
            self._insert(self.root, key)

    def _insert(self, node, key):
        if key < node.key:
            if node.left:
                self._insert(node.left, key)
            else:
                node.left = Node(key)
        elif key > node.key:
            if node.right:
                self._insert(node.right, key)
            else:
                node.right = Node(key)

    def search(self, key):
        return self._search(self.root, key)

    def _search(self, node, key):
        if not node or node.key == key:
            return node

        if key < node.key:
            return self._search(node.left, key)
        else:
            return self._search(node.right, key)

    def inorder_traversal(self):
        self._inorder_traversal(self.root)
        print()

    def _inorder_traversal(self, node):
        if node:
            self._inorder_traversal(node.left)
            print(node.key, end=" ")
            self._inorder_traversal(node.right)

# Example usage:
bst = BinarySearchTree()
bst.insert(50)
bst.insert(30)
bst.insert(20)
bst.insert(40)
bst.insert(70)
bst.insert(60)
bst.insert(80)

print("In-order traversal:")
bst.inorder_traversal()  # Output: 20 30 40 50 60 70 80

print(f"Search 40: {bst.search(40).key if bst.search(40) else 'Not found'}")
print(f"Search 90: {bst.search(90).key if bst.search(90) else 'Not found'}")
{'prompt_tokens': 22, 'completion_tokens': 433, 'total_tokens': 455}

The model demonstrates strong performance on code generation tasks, with the completion_tokens offering insight into how the tokenizer’s code compression effectively optimizes the representation of programming languages using fewer tokens.

Advanced math and reasoning

The model also reports strengths in mathematic and reasoning accuracy. For example, see the following code:

payload = {
    "messages": [
        {   "role": "system", 
            "content": "You are an expert in mathematics and reasoning. Your role is to provide examples, explanations, and insights related to mathematical concepts, problem-solving techniques, and logical reasoning.",
            "role": "user",
            "content": """Calculating the orbital period of an exoplanet:
             Given: An exoplanet orbits its star at a distance of 2.5 AU (Astronomical Units). The star has a mass of 1.2 solar masses.
             Task: Calculate the orbital period of the exoplanet in Earth years."""
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.3,
    "top_p": 0.9,
}
response = predictor.predict(payload)['choices'][0]['message']['content'].strip()
print(response)

The following is the output:

To calculate the orbital period of an exoplanet, we can use Kepler's Third Law, which states that the square of the orbital period (P) is directly proportional to the cube of the semi-major axis (a) of the orbit and inversely proportional to the mass (M) of the central body. The formula is:

P^2 = (4 * π^2 * a^3) / (G * M)

where:
- P is the orbital period in years,
- a is the semi-major axis in AU (Astronomical Units),
- G is the gravitational constant (6.67430 × 10^-11 m^3 kg^-1 s^-2),
- M is the mass of the star in solar masses.

First, we need to convert the mass of the star from solar masses to kilograms. The mass of the Sun is approximately 1.98847 × 10^30 kg. So, the mass of the star is:

M = 1.2 * 1.98847 × 10^30 kg = 2.386164 × 10^30 kg

Now, we can plug the values into Kepler's Third Law:

P^2 = (4 * π^2 * (2.5 AU)^3) / (G * M)

Since 1 AU is approximately 1.496 × 10^11 meters, the semi-major axis in meters is:

a = 2.5 AU * 1.496 × 10^11 m/AU = 3.74 × 10^12 m

Now, we can calculate P^2:

P^2 = (4 * π^2 * (3.74 × 10^12 m)^3) / (6.67430 × 10^-11 m^3 kg^-1 s^-2 * 2.386164 × 10^30 kg)

P^2 = (4 * π^2 * 5.62 × 10^36 m^3) / (1.589 × 10^20 m^3 kg^-1 s^-2)

P^2 = (4 * π^2 * 5.62 × 10^36 m^3) / (1.589 × 10^20 m^3 kg^-1 s^-2)

P^2 = 4.15 × 10^16 s^2

Now, we take the square root to find the orbital period in seconds:

P = √(4.15 × 10^16 s^2) ≈ 2.04 × 10^8 s

Finally, we convert the orbital period from seconds to Earth years (1 Earth year = 31,557,600 seconds):

P = (2.04 × 10^8 s) / (31,557,600 s/year) ≈ 6.47 years

Therefore, the orbital period of the exoplanet is approximately 6.47 Earth years.

Language translation task

In this task, let’s test Mistral’s new Tekken tokenizer. Mistral states that the tokenizer is two times and three times more efficient at compressing Korean and Arabic, respectively.

Here, we use some text for translation:

text= """
"How can our business leverage Mistral NeMo with our new RAG application?"
"What is our change management strategy once we roll out this new application to the field?
"""

We set our prompt to instruct the model on the translation to Korean and Arabic:

prompt=f"""

text={text}

Translate the following text into these languages:

1. Korean
2. Arabic

Label each language section accordingly""".format(text=text)

We can then set the payload:

payload = {
    "messages": [
        {   "role": "system", 
            "content": "You are an expert in language translation.",
            "role": "user",
            "content": prompt
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.3,
    "top_p": 0.9,
}
#response = predictor.predict(payload)
text_response = predictor.predict(payload)['choices'][0]['message']['content'].strip()
tokens = predictor.predict(payload)['usage']
print(text_response)
print(tokens)

The following is the output:

**1. Korean**

- "우리의 비즈니스가 Mistral NeMo를 어떻게 활용할 수 있을까요?"
- "이 새 애플리케이션을 현장에 롤아웃할 때 우리의 변화 관리 전략은 무엇입니까?"

**2. Arabic**

- "كيف يمكن لعمليتنا الاست من Mistral NeMo مع تطبيق RAG الجديد؟"
- "ما هو استراتيجيتنا في إدارة التغيير بعد تفعيل هذا التطبيق الجديد في الميدان؟"
{'prompt_tokens': 61, 'completion_tokens': 243, 'total_tokens': 304}

The translation results demonstrate how the number of completion_tokens used is significantly reduced, even for tasks that are typically token-intensive, such as translations involving languages like Korean and Arabic. This improvement is made possible by the optimizations provided by the Tekken tokenizer. Such a reduction is particularly valuable for token-heavy applications, including summarization, language generation, and multi-turn conversations. By enhancing token efficiency, the Tekken tokenizer allows for more tasks to be handled within the same resource constraints, making it an invaluable tool for optimizing workflows where token usage directly impacts performance and cost.

Clean up

After you’re done running the notebook, make sure to delete all resources that you created in the process to avoid additional billing. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Mistral NeMo Base and Instruct in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.

For more Mistral resources on AWS, check out the Mistral-on-AWS GitHub repository.

About the authors

Niithiyn Vijeaswaran is a Generative AI Specialist Solutions Architect with the Third-Party Model Science team at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics.

Preston Tuggle is a Sr. Specialist Solutions Architect working on generative AI.

Shane Rai is a Principal Generative AI Specialist with the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AI/ML services provided by AWS, including model offerings from top tier foundation model providers.

Model produces pseudocode for security controls in seconds

December 6, 2024

by Amazon AWS

New tool harnesses large language models to create rules for the configuration of AWS services and the processing of alerts.Read More