Amazon Bedrock Prompt Optimization Drives LLM Applications Innovation for Yuewen Group

Yuewen Group is a global leader in online literature and IP operations. Through its overseas platform WebNovel, it has attracted about 260 million users in over 200 countries and regions, promoting Chinese web literature globally. The company also adapts high-quality web novels into films and animations for international markets, expanding the global influence of Chinese culture.

Today, we are excited to announce the availability of Prompt Optimization on Amazon Bedrock. With this capability, you can now optimize your prompts for several use cases with a single API call or a click of a button on the Amazon Bedrock console. In this blog post, we discuss how Prompt Optimization improves the performance of large language models (LLMs) for intelligent text processing tasks at Yuewen Group.

Evolution from Traditional NLP to LLM in Intelligent Text Processing

Yuewen Group leverages AI for intelligent analysis of extensive web novel texts. Initially relying on proprietary natural language processing (NLP) models, Yuewen Group faced challenges with prolonged development cycles and slow updates. To improve performance and efficiency, Yuewen Group transitioned to Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock.

Claude 3.5 Sonnet offers enhanced natural language understanding and generation capabilities, handling multiple tasks concurrently with improved context comprehension and generalization. Using Amazon Bedrock significantly reduced technical overhead and streamlined the development process.

However, Yuewen Group initially struggled to fully harness the LLM’s potential due to limited experience in prompt engineering. In certain scenarios, the LLM’s performance fell short of traditional NLP models. For example, in the task of “character dialogue attribution”, traditional NLP models achieved around 80% accuracy, while LLMs with unoptimized prompts only reached around 70%. This discrepancy highlighted the need for strategic prompt optimization to enhance the capabilities of LLMs in these specific use cases.

Challenges in Prompt Optimization

Manual prompt optimization can be challenging due to the following reasons:

Difficulty in Evaluation: Assessing the quality of a prompt and its consistency in eliciting desired responses from a language model is inherently complex. Prompt effectiveness is not only determined by the prompt quality, but also by its interaction with the specific language model, depending on its architecture and training data. This interplay requires substantial domain expertise to understand and navigate. In addition, evaluating LLM response quality for open-ended tasks often involves subjective and qualitative judgements, making it challenging to establish objective and quantitative optimization criteria.

Context Dependency: Prompt effectiveness is highly contingent on the specific contexts and use cases. A prompt that works well in one scenario may underperform in another, necessitating extensive customization and fine-tuning for different applications. Therefore, developing a universally applicable prompt optimization method that generalizes well across diverse tasks remains a significant challenge.

Scalability: As LLMs find applications in a growing number of use cases, the number of required prompts and the complexity of the language models continue to rise. This makes manual optimization increasingly time-consuming and labor-intensive. Crafting and iterating prompts for large-scale applications can quickly become impractical and inefficient. Meanwhile, as the number of potential prompt variations increases, the search space for optimal prompts grows exponentially, rendering manual exploration of all combinations infeasible, even for moderately complex prompts.

Given these challenges, automatic prompt optimization technology has garnered significant attention in the AI community. In particular, Bedrock Prompt Optimization offers two main advantages:

  • Efficiency: It saves considerable time and effort by automatically generating high-quality prompts suited for a variety of target LLMs supported on Bedrock, alleviating the need for tedious manual trial and error in model-specific prompt engineering.
  • Performance Enhancement: It notably improves AI performance by creating optimized prompts that enhance the output quality of language models across a wide range of tasks and tools.

These benefits not only streamline the development process, but also lead to more efficient and effective AI applications, positioning auto-prompting as a promising advancement in the field.

Introduction to Bedrock Prompt Optimization

Prompt Optimization on Amazon Bedrock is an AI-driven feature that automatically optimizes under-developed prompts for customers’ specific use cases, enhancing performance across different target LLMs and tasks. Prompt Optimization is seamlessly integrated into the Amazon Bedrock Playground and Prompt Management, so you can easily create, evaluate, store, and use optimized prompts in your AI applications.

On the AWS Management Console for Prompt Management, users input their original prompt. The prompt can be a template with the required variables represented by placeholders (e.g. {{document}} ), or a full prompt with actual texts filled into the placeholders. After selecting a target LLM from the supported list, users can kick off the optimization process with a single click, and the optimized prompt will be generated within seconds. The console then displays the Compare Variants tab, presenting the original and optimized prompts side-by-side for quick comparison. The optimized prompt often includes more explicit instructions on processing the input variables and generating the desired output format. Users can observe the enhancements made by Prompt Optimization to improve the prompt’s performance for their specific task.
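
The same optimization can be invoked programmatically. The following is a minimal sketch that assumes the OptimizePrompt action exposed through the bedrock-agent-runtime client in boto3; the prompt text and target model ID are placeholders, and the exact shape of the streamed response should be verified against the API reference.

import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Prompt template with a placeholder variable, as entered on the console
prompt = "Identify the speaker of each dialogue line in the following passage:\n{{document}}"

response = client.optimize_prompt(
    input={"textPrompt": {"text": prompt}},
    targetModelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
)

# The result is returned as an event stream: analysis events, then the rewritten prompt
for event in response["optimizedPrompt"]:
    if "analyzePromptEvent" in event:
        print("Analysis:", event["analyzePromptEvent"].get("message"))
    elif "optimizedPromptEvent" in event:
        optimized = event["optimizedPromptEvent"]["optimizedPrompt"]["textPrompt"]["text"]
        print("Optimized prompt:", optimized)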

Comprehensive evaluation was conducted on open-source datasets across tasks including classification, summarization, open-book QA/RAG, and agent/function-calling, as well as on complex real-world customer use cases. The results showed substantial improvements from the optimized prompts.

Under the hood, a Prompt Analyzer and a Prompt Rewriter are combined to optimize the original prompt. The Prompt Analyzer is a fine-tuned LLM that decomposes the prompt structure by extracting its key constituent elements, such as the task instruction, input context, and few-shot demonstrations. The extracted prompt components are then channeled to the Prompt Rewriter module, which employs a general LLM-based meta-prompting strategy to further improve the prompt signatures and restructure the prompt layout. As a result, the Prompt Rewriter produces a refined and enhanced version of the initial prompt tailored to the target LLM.

Results of Prompt Optimization

Using Bedrock Prompt Optimization, Yuewen Group achieved significant improvements across various intelligent text analysis tasks, including name extraction and multi-option reasoning use cases. Taking character dialogue attribution as an example, optimized prompts reached 90% accuracy, surpassing traditional NLP models by 10 percentage points in the customer’s experimentation.

Using the power of foundation models, Prompt Optimization produces high-quality results with minimal manual prompt iteration. Most importantly, this feature enabled Yuewen Group to complete prompt engineering processes in a fraction of the time, greatly improving development efficiency.

Prompt Optimization Best Practices

Throughout our experience with Prompt Optimization, we’ve compiled several tips for better user experience:

  1. Use a clear and precise input prompt: Prompt Optimization benefits from clear intent(s) and key expectations in your input prompt. A clear prompt structure also offers a better starting point for Prompt Optimization; for example, separate different prompt sections with new lines.
  2. Use English as the input language: We recommend using English as the input language for Prompt Optimization. Currently, prompts containing a large extent of other languages might not yield the best results.
  3. Avoid overly long input prompts and examples: Excessively long prompts and few-shot examples significantly increase the difficulty of semantic understanding and challenge the output length limit of the rewriter. Another tip is to avoid packing excessive placeholders into the same sentence while stripping the surrounding context for those placeholders from the prompt body. For example, instead of “Answer the {{question}} by reading {{author}}’s {{paragraph}}”, assemble your prompt in a form such as “Paragraph:\n{{paragraph}}\nAuthor:\n{{author}}\nAnswer the following question:\n{{question}}”.
  4. Use it in the early stages of prompt engineering: Prompt Optimization excels at quickly optimizing less-structured prompts (a.k.a. “lazy prompts”) during the early stage of prompt engineering. The improvement is likely to be more significant for such prompts compared to those already carefully curated by experts or prompt engineers.

Conclusion

Prompt Optimization on Amazon Bedrock has proven to be a game-changer for Yuewen Group in their intelligent text processing. By significantly improving the accuracy of tasks like character dialogue attribution and streamlining the prompt engineering process, Prompt Optimization has enabled Yuewen Group to fully harness the power of LLMs. This case study demonstrates the potential of Prompt Optimization to revolutionize LLM applications across industries, offering both time savings and performance improvements. As AI continues to evolve, tools like Prompt Optimization will play a crucial role in helping businesses maximize the benefits of LLMs in their operations.

We encourage you to explore Prompt Optimization to improve the performance of your AI applications. To get started with Prompt Optimization, see the following resources:

  1. Amazon Bedrock Pricing page
  2. Amazon Bedrock user guide
  3. Amazon Bedrock API reference

About the Authors

Rui Wang is a senior solutions architect at AWS with extensive experience in game operations and development. As an enthusiastic Generative AI advocate, he enjoys exploring AI infrastructure and LLM application development. In his spare time, he loves eating hot pot.

Hao Huang is an Applied Scientist at the AWS Generative AI Innovation Center. His expertise lies in generative AI, computer vision, and trustworthy AI. Hao also contributes to the scientific community as a reviewer for leading AI conferences and journals, including CVPR, AAAI, and TMM.

Guang Yang, Ph.D., is a senior applied scientist with the Generative AI Innovation Centre at AWS. He has been with AWS for five years, leading several customer projects in the Greater China Region spanning different industry verticals such as software, manufacturing, retail, AdTech, and finance. He has over 10 years of academic and industry experience in building and deploying ML and GenAI based solutions for business problems.

Zhengyuan Shen is an Applied Scientist at Amazon Bedrock, specializing in foundational models and ML modeling for complex tasks including natural language and structured data understanding. He is passionate about leveraging innovative ML solutions to enhance products or services, thereby simplifying the lives of customers through a seamless blend of science and engineering. Outside work, he enjoys sports and cooking.

Huong Nguyen is a Principal Product Manager at AWS. She is a product leader at Amazon Bedrock, with 18 years of experience building customer-centric and data-driven products. She is passionate about democratizing responsible machine learning and generative AI to enable customer experience and business innovation. Outside of work, she enjoys spending time with family and friends, listening to audiobooks, traveling, and gardening.

Build a location-aware agent using Amazon Bedrock Agents and Foursquare APIs

This post is co-written with Vikram Gundeti and Nate Folkert from Foursquare.

Personalization is key to creating memorable experiences. Whether it’s recommending the perfect movie or suggesting a new restaurant, tailoring suggestions to individual preferences can make all the difference. But when it comes to food and activities, there’s more to consider than just personal taste. Location and weather also play a crucial role in shaping our choices. Imagine planning a day out: on a sunny afternoon, a leisurely picnic in the park might be ideal, but if it’s pouring rain, a cozy indoor café would be much more appealing. The challenge, then, is to create an agent that can seamlessly integrate these factors—location, weather, and personal preferences—to provide truly personalized recommendations.

To tackle this challenge, we can combine Amazon Bedrock Agents and Foursquare APIs. In this post, we demonstrate how you can use a location-aware agent to bring personalized responses to your users.

Amazon Bedrock Agents

Amazon Bedrock is a fully managed service that makes it straightforward to build and scale generative AI applications. It provides access to a variety of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Luma, Meta, Mistral AI, Stability AI, and Amazon, all through a single API. This means you don’t need to manage infrastructure, because it’s serverless and integrates with familiar AWS services for security, privacy, and responsible AI. You can experiment with models, customize them with your data, and build applications without writing complex code.

Amazon Bedrock Agents is a feature within Amazon Bedrock that allows you to create autonomous AI agents. These agents can understand user requests, break them into steps, and complete tasks by connecting to your company’s APIs and data sources. For example, they can automate processes like processing insurance claims or managing inventory, making them efficient for business tasks. They handle prompt engineering, memory, and security automatically, so you can set them up quickly without managing infrastructure.

Foursquare Places APIs

Foursquare’s Places APIs deliver precise location intelligence for applications requiring contextual awareness. Built on top of the open source global Places dataset with 100 million points of interest spanning 1,500 categories, the Places APIs transform geographic coordinates into actionable business context.

The GeoTagging API accurately resolves GPS coordinates to a specific place with a high degree of precision, enabling applications to instantly determine if a user is at a local coffee shop, inside a Macy’s department store, or standing in Central Park. The Place Search & Data APIs transform how applications discover locations by providing nuanced filtering capabilities beyond simple proximity searches. Developers can filter places by specific categories (finding only restaurants, parks, or tourist attractions), apply attribute-based constraints (such as price range or special amenities), consider temporal factors (like current operating status), and balance distance with relevance for truly contextual results. Each place returned comes enriched with contextual attributes, including photos, reviews, quality ratings, and real-time popularity data.

When integrated with Amazon Bedrock Agents, Foursquare’s Places APIs enable the creation of applications that understand the complete context of a user’s location—resulting in experiences that are relevant, timely, and personalized.

Solution overview

To demonstrate the power of adding location to Amazon Bedrock Agents, we created a simple architecture that creates an Amazon Bedrock agent with the Foursquare Places APIs and a Weather API. By combining these capabilities, we can create unique user experiences that are customized to the context of where the user is. The following diagram shows how we architected the agent.

In the solution workflow, the user interacts with the agent through a Streamlit web interface. The web application uses the application logic that invokes the Amazon Bedrock agent in the cloud. The agent knows about the location tools and weather tools even though these are hosted locally inside the application. When the tools are invoked by the agent, a return of control response is given to the application logic, which invokes the tool and provides the response from the tool in a second invocation of the agent. In addition to the tools, the agent has basic instructions on what type of personality it should have and what types of behaviors it should support.
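
To make the return-of-control flow concrete, the following is a minimal sketch of the application-side loop, assuming the standard InvokeAgent API on the bedrock-agent-runtime client. The agent ID, alias ID, and tool dispatch are placeholders, and the exact event field names should be checked against the API reference rather than taken as the solution's implementation.

import json
import uuid
import boto3

client = boto3.client("bedrock-agent-runtime")
session_id = str(uuid.uuid4())

def call_local_tool(function_name, parameters):
    # Placeholder: dispatch to the locally hosted Foursquare or weather tool
    return {"result": f"stubbed output for {function_name}"}

response = client.invoke_agent(
    agentId="AGENT_ID",             # placeholder
    agentAliasId="AGENT_ALIAS_ID",  # placeholder
    sessionId=session_id,
    inputText="Is there a park nearby?",
)

answer = ""
for event in response["completion"]:
    if "chunk" in event:
        answer += event["chunk"]["bytes"].decode("utf-8")
    elif "returnControl" in event:
        # The agent asks the application to run a tool and send back the result
        roc = event["returnControl"]
        fn = roc["invocationInputs"][0]["functionInvocationInput"]
        params = {p["name"]: p["value"] for p in fn.get("parameters", [])}
        tool_output = call_local_tool(fn["function"], params)
        followup = client.invoke_agent(
            agentId="AGENT_ID",
            agentAliasId="AGENT_ALIAS_ID",
            sessionId=session_id,
            sessionState={
                "invocationId": roc["invocationId"],
                "returnControlInvocationResults": [{
                    "functionResult": {
                        "actionGroup": fn["actionGroup"],
                        "function": fn["function"],
                        "responseBody": {"TEXT": {"body": json.dumps(tool_output)}},
                    }
                }],
            },
        )
        for e in followup["completion"]:
            if "chunk" in e:
                answer += e["chunk"]["bytes"].decode("utf-8")

print(answer)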

Let’s explore an example of a brief interaction with the agent where we ask if there is a park nearby and a recommended restaurant near the park for takeout food.

The following screenshot shows the first interaction with an agent, locating a park nearby with the Foursquare APIs invoked by the agent.

In this example, you can see the agent sending intermediate events to the user informing them of the actions taking place (invoking the model, invoking a tool, thinking, and so on).

The following screenshot shows the list of restaurants recommended by the Foursquare APIs near the park.

In this example, the agent invokes the APIs based on the user input, and the Streamlit UI connects the output from Foursquare to a map.

In the following section, we detail how you can build the agent in your account and get started.

Prerequisites

To deploy this solution, you should have an AWS account with the necessary permissions.

You will also need a Foursquare Service API Key to allow your AI agent to access Foursquare API endpoints. If you do not already have one, follow the instructions on Foursquare Documentation – Manage Your Service API Keys to create one. You will need to log in to your Foursquare developer account or create one if you do not have one (creating a basic account is free and includes starter credit for your project). Be sure to copy the Service API key upon creation as you will not be able to see it again. 

Build the agent

The source code for the Foursquare agent is available as open source in the following GitHub repository. Complete the following steps to set up the agent in your local folder from the source:

  1. Clone the repository to a local folder.
  2. Set environment variables for your Foursquare API token:

export FOURSQUARE_SERVICE_TOKEN=<Foursquare API token>

  3. Set environment variables for your AWS credentials:

export AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>

export AWS_SECRET_ACCESS_KEY=<SECRET_ACCESS_KEY>

  4. Install requirements:

pip install -r requirements.txt

  5. Start the Streamlit UI:

streamlit run agent_ui.py

Best practices

When you’re creating an agent, we recommend starting with a test dataset. Think through the possible inputs and what are acceptable outputs. Use these sample conversations to test the agent whenever a change is made. In addition, Amazon Bedrock Agents allows you to configure guardrails to protect against malicious input or types of conversation that you would not want to use for your user experience. We recommend for any production use cases to couple your agent with appropriate guardrails. To learn more, see Amazon Bedrock Guardrails.

Clean up

When you’re done using the solution, remove any resources you created to avoid ongoing charges.

Conclusion

Agents provide a mechanism to automate work on behalf of your customers, whether through a chat interface or other inputs. By combining the automation that agents make possible with the location-aware APIs from Foursquare, you can create powerful UIs and experiences that will delight your customers with new levels of personalization. With Amazon Bedrock Agents, you can build a cloud-centered solution that allows you to use powerful foundation models on Amazon Bedrock to drive these experiences.

Try out the solution for your own use case, and share your feedback in the comments.


About the authors

John Baker is a Principal SDE at AWS, where he works on Amazon Bedrock and specifically Amazon Bedrock Agents. He has been with Amazon for more than 10 years and has worked across AWS, Alexa, and Amazon.com. In his spare time, John enjoys skiing and other outdoor activities throughout the Pacific Northwest.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build generative AI solutions. His focus since early 2023 has been leading solution architecture efforts for the launch of Amazon Bedrock, the flagship generative AI offering from AWS for builders. Mark’s work covers a wide range of use cases, with a primary interest in generative AI, agents, and scaling ML across the enterprise. He has helped companies in insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services. Mark holds six AWS Certifications, including the ML Specialty Certification.

Vikram Gundeti currently serves as the Chief Technology Officer (CTO) of Foursquare, where he leads the technical strategy, decision making, and research for the company’s Geospatial Platform. Before joining Foursquare, Vikram held the position of Principal Engineer at Amazon, where he made his mark as a founding engineer on the Amazon Alexa team.

Nate Folkert is a Senior Staff Engineer at Foursquare, where he’s been since spotting it trending nearby when checking in at a Soho coffee shop 14 years ago. He builds the server API for Swarm and helps out on special projects. Outside of work, he loves exploring the world (with Swarm, ofc, so is it really outside of work?) and is currently obsessed with finding all of the irl filming locations used in Apple TV’s Severance.

Build an automated generative AI solution evaluation pipeline with Amazon Nova

Large language models (LLMs) have become integral to numerous applications across industries, ranging from enhanced customer interactions to automated business processes. Deploying these models in real-world scenarios presents significant challenges, particularly in ensuring accuracy, fairness, relevance, and mitigating hallucinations. Thorough evaluation of the performance and outputs of these models is therefore critical to maintaining trust and safety.

Evaluation plays a central role in the generative AI application lifecycle, much like in traditional machine learning. Robust evaluation methodologies enable informed decision-making regarding the choice of models and prompts. However, evaluating LLMs is a complex and resource-intensive process given the free-form text output of LLMs. Methods such as human evaluation provide valuable insights but are costly and difficult to scale. Consequently, there is a demand for automated evaluation frameworks that are highly scalable and can be integrated into application development, much like unit and integration tests in software development.

In this post, to address the aforementioned challenges, we introduce an automated evaluation framework that is deployable on AWS. The solution can integrate multiple LLMs, use customized evaluation metrics, and enable businesses to continuously monitor model performance. We also provide LLM-as-a-judge evaluation metrics using the newly released Amazon Nova models. These models enable scalable evaluations due to their advanced capabilities and low latency. Additionally, we provide a user-friendly interface to enhance ease of use.

In the following sections, we discuss various approaches to evaluate LLMs. We then present a typical evaluation workflow, followed by our AWS-based solution that facilitates this process.

Evaluation methods

Prior to implementing evaluation processes for generative AI solutions, it’s crucial to establish clear metrics and criteria for assessment and gather an evaluation dataset.

The evaluation dataset should be representative of the actual real-world use case. It should consist of diverse samples and ideally contain ground truth values generated by experts. The size of the dataset will depend on the exact application and the cost of acquiring data; at a minimum, however, the dataset should span relevant and diverse use cases. Developing an evaluation dataset can itself be an iterative task that is progressively enhanced by adding new samples and enriching the dataset with samples where the model performance is lacking. After the evaluation dataset is acquired, evaluation criteria can then be defined.

The evaluation criteria can be broadly divided into three main areas:

  • Latency-based metrics – These include measurements such as response generation time or time to first token. The importance of each metric might vary depending on the specific application.
  • Cost – This refers to the expense associated with response generation.
  • Performance – Performance-based metrics are highly case-dependent. They might include measurements of accuracy, factual consistency of responses, or the ability to generate structured responses.

Generally, there is an inverse relationship between latency, cost, and performance. Depending on the use case, one factor might be more critical than the others. Having metrics for these categories across different models can help you make data-driven decisions to determine the optimum choice for your specific use case.

Although measuring latency and cost can be relatively straightforward, assessing performance requires a deep understanding of the use case and knowing what is crucial for success. Depending on the application, you might be interested in evaluating the factual accuracy of the model’s output (particularly if the output is based on specific facts or reference documents), or you might want to assess whether the model’s responses are consistently polite and helpful, or both.

To support these diverse scenarios, we have incorporated several evaluation metrics in our solution:

  • FMEval – The Foundation Model Evaluation (FMEval) library provided by AWS offers purpose-built evaluation models to provide metrics like toxicity in LLM output, accuracy, and semantic similarity between generated and reference text. This library can be used to evaluate LLMs across several tasks such as open-ended generation, text summarization, question answering, and classification.
  • Ragas – Ragas is an open source framework that provides metrics for evaluation of Retrieval Augmented Generation (RAG) systems (systems that generate answers based on a provided context). Ragas can be used to evaluate the performance of an information retriever (the component that retrieves relevant information from a database) using metrics like context precision and recall. Ragas also provides metrics to evaluate the LLM generation from the provided context using metrics like answer faithfulness to the provided context and answer relevance to the original question.
  • LLMeter – LLMeter is a simple solution for latency and throughput testing of LLMs, such as LLMs provided through Amazon Bedrock and OpenAI. This can be helpful in comparing models on metrics for latency-critical workloads.
  • LLM-as-a-judge metrics – Several challenges arise in defining performance metrics for free-form text generated by LLMs; for example, the same information might be expressed in different ways. It’s also difficult to clearly define metrics for measuring characteristics like politeness. To tackle such evaluations, LLM-as-a-judge metrics have become popular. LLM-as-a-judge evaluations use a judge LLM to score the output of an LLM based on certain predefined criteria. We use the Amazon Nova model as the judge due to its advanced accuracy and performance (a minimal sketch follows this list).
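
The following is a minimal sketch of an LLM-as-a-judge metric using the Amazon Bedrock Converse API with an Amazon Nova model as the judge. The model ID, rubric, and prompt wording are illustrative assumptions, not the exact prompts used in this solution.

import re
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

JUDGE_MODEL_ID = "us.amazon.nova-pro-v1:0"  # assumed cross-Region inference profile ID

def judge_response(question, reference_answer, candidate_answer):
    # Ask the judge model to score the candidate against the reference on a 1-5 scale
    prompt = (
        "You are an impartial judge. Score the candidate answer from 1 to 5 for factual "
        "consistency with the reference answer. Reply with only the number.\n"
        f"Question: {question}\nReference answer: {reference_answer}\n"
        f"Candidate answer: {candidate_answer}"
    )
    response = bedrock.converse(
        modelId=JUDGE_MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0.0, "maxTokens": 10},
    )
    text = response["output"]["message"]["content"][0]["text"]
    match = re.search(r"[1-5]", text)
    return int(match.group()) if match else None

print(judge_response("What is the capital of Australia?", "Canberra", "The capital is Canberra."))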

Evaluation workflow

Now that we know what metrics we care about, how do we go about evaluating our solution? A typical generative AI application development (proof of concept) process can be abstracted as follows:

  1. Builders use a few test examples and try out different prompts to see the performance and get a rough idea of the prompt template and model they want to start with (online evaluation).
  2. Builders test the first prompt template version with a selected LLM against a test dataset with ground truth for a list of evaluation metrics to check the performance (offline evaluation). Based on the evaluation results, they might need to modify the prompt template, fine-tune the model, or implement RAG to add additional context to improve performance.
  3. Builders implement the change and evaluate the updated solution against the dataset to validate improvements on the solution. Then they repeat the previous steps until the performance of the developed solution meets the business requirements.

The two key stages in the evaluation process are:

  • Online evaluation – This involves manually evaluating prompts based on a few examples for qualitative checks
  • Offline evaluation – This involves automated quantitative evaluation on an evaluation dataset

This process can add significant operational complications and effort from the builder team and operations team. To achieve this workflow, you need the following:

  • A side-by-side comparison tool for various LLMs
  • A prompt management service that can be used to save and version control prompts
  • A batch inference service that can invoke your selected LLM on a large number of examples
  • A batch evaluation service that can be used to evaluate the LLM response generated in the previous step

In the next section, we describe how we can create this workflow on AWS.

Solution overview

In this section, we present an automated generative AI evaluation solution that can be used to simplify the evaluation process. The architecture diagram of the solution is shown in the following figure.

This solution provides both online (real-time comparison) and offline (batch evaluation) evaluation options that fulfill different needs during the generative AI solution development lifecycle. Each component in this evaluation infrastructure can be developed using existing open source tools or AWS native services.

The architecture of the automated LLM evaluation pipeline focuses on modularity, flexibility, and scalability. The design philosophy makes sure that different components can be reused or adapted for other generative AI projects. The following is an overview of each component and its role in the solution:

  • UI – The UI provides a straightforward way to interact with the evaluation framework. Users can compare different LLMs with a side-by-side comparison. The UI provides latency, model outputs, and cost for each input query (online evaluation). The UI also helps you store and manage your different prompt templates backed by the Amazon Bedrock prompt management feature. These prompts can be referenced later for batch generation or production use. You can also launch batch generation and evaluation jobs through the UI. The UI service can be run locally in a Docker container or deployed to AWS Fargate.
  • Prompt management – The evaluation solution includes a key component for prompt management. Backed by Amazon Bedrock prompt management, you can save and retrieve your prompts using the UI (a minimal API sketch follows this list).
  • LLM invocation pipeline – Using AWS Step Functions, this workflow automates the process of generating outputs from the LLM for a test dataset. It retrieves inputs from Amazon Simple Storage Service (Amazon S3), processes them, and stores the responses back to Amazon S3. This workflow supports batch processing, making it suitable for large-scale evaluations.
  • LLM evaluation pipeline – This workflow, also managed by Step Functions, evaluates the outputs generated by the LLM. At the time of writing, the solution supports metrics provided by the FMEval library, Ragas library, and custom LLM-as-a-judge metrics. It handles various evaluation methods, including direct metrics computation and LLM-guided evaluation. The results are stored in Amazon S3, ready for analysis.
  • Eval factory – A core service for conducting evaluations, the eval factory supports multiple evaluation techniques, including those that use other LLMs for reference-free scoring. It provides consistency in evaluation results by standardizing outputs into a single metric per evaluation. It can be difficult to find a one-size-fits-all solution when it comes to evaluation, so we provide you the flexibility to use your own script for evaluation. We also provide pre-built scripts and pipelines for some common tasks including classification, summarization, translation, and RAG. Especially for RAG, we have integrated popular open source libraries like Ragas.
  • Postprocessing and results store – After the pipeline results are generated, postprocessing can concatenate the results and potentially display the results in a results store that can provide a graphical view of the results. This part also handles updates to the prompt management system because each prompt template and LLM combination will have recorded evaluation results to help you select the right model and prompt template for the use case. Visualization of the results can be done on the UI or even with an Amazon Athena table if the prompt management system uses Amazon S3 as the data storage. This part can be done by using an AWS Lambda function, which can be triggered by an event sent after the new data has been saved to the Amazon S3 location for the prompt management system.
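
As an illustration of the prompt management component, the following is a minimal sketch that saves and versions a prompt template with the Amazon Bedrock prompt management APIs on the bedrock-agent client. The prompt name, model ID, and template text are placeholders, and the request fields should be verified against the API reference.

import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# Create a prompt with one text variant that uses a {{document}} input variable
created = bedrock_agent.create_prompt(
    name="summarization-prompt",  # placeholder name
    description="Prompt template used for batch generation",
    defaultVariant="v1",
    variants=[{
        "name": "v1",
        "templateType": "TEXT",
        "modelId": "us.amazon.nova-lite-v1:0",  # assumed model ID
        "templateConfiguration": {
            "text": {
                "text": "Summarize the following document:\n{{document}}",
                "inputVariables": [{"name": "document"}],
            }
        },
        "inferenceConfiguration": {"text": {"temperature": 0.0, "maxTokens": 512}},
    }],
)

# Snapshot the draft as an immutable version that batch jobs can reference later
version = bedrock_agent.create_prompt_version(promptIdentifier=created["id"])
print(created["id"], version["version"])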

The evaluation solution can significantly enhance team productivity throughout the development lifecycle by reducing manual intervention and increasing automated processes. As new LLMs emerge, builders can compare the current production LLM with new models to determine if upgrading would improve the system’s performance. This ongoing evaluation process makes sure that the generative AI solution remains optimal and up-to-date.

Prerequisites

For scripts to set up the solution, refer to the GitHub repository. After the backend and the frontend are up and running, you can start the evaluation process.

To start, open the UI in your browser. The UI provides the ability to do both online and offline evaluations.

Online evaluation

To iteratively refine prompts, you can follow these steps:

  1. Choose the options menu (three lines) on the top left side of the page to set the AWS Region.
  2. After you choose the Region, the model lists will be prefilled with the available Amazon Bedrock models in that Region.
  3. You can choose two models for side-by-side comparison.
  4. You can select a prompt already stored in Amazon Bedrock prompt management from the dropdown menu. If selected, this will automatically fill the prompts.
  5. You can also create a new prompt by entering the prompt in the text box. You can select generation configurations (temperature, top P, and so on) on the Generation Configuration tab. The prompt template can also use dynamic variables by entering variables in {{}} (for example, for additional context, add a variable like {{context}}). Then define the value of these variables on the Context tab.
  6. Choose Enter to start generation.
  7. This will invoke the two models and present the output in the text boxes below each model. Additionally, you will be provided with the latency and cost for each model.
  8. To save the prompt to Amazon Bedrock, choose Save.

Offline generation and evaluation

After you have made the model and prompt choice, you can run batch generation and evaluation over a larger dataset.

  1. To run batch generation, choose the model from the dropdown list.
  2. You can provide an Amazon Bedrock knowledge base ID if additional context is required for generation.
  3. You can also provide a prompt template ID. This prompt will be used for generation.
  4. Upload a dataset file. This file will be uploaded to the S3 bucket set in the sidebar. This file should be a pipe (|) separated CSV file. For more details on expected data file format, see the project’s GitHub README file.
  5. Choose Start Generation to start the job. This will trigger a Step Functions workflow that you can track by choosing the link in the pop-up.

Select model for Batch Generation

Invoking batch generation triggers a Step Functions workflow, which is shown in the following figure. The logic follows these steps:

  1. GetPrompts – This step retrieves a CSV file containing prompts from an S3 bucket. The contents of this file become the Step Functions workflow’s payload.
  2. convert_to_json – This step parses the CSV output and converts it into a JSON format. This transformation enables the step function to use the Map state to process the invoke_llm flow concurrently.
  3. Map step – This is an iterative step that processes the JSON payload by invoking the invoke_llm Lambda function concurrently for each item in the payload. A concurrency limit is set, with a default value of 3. You can adjust this limit based on the capacity of your backend LLM service. Within each Map iteration, the invoke_llm Lambda function calls the backend LLM service to generate a response for a single question and its associated context (a minimal handler sketch follows this list).
  4. InvokeSummary – This step combines the output from each iteration of the Map step. It generates a JSON Lines result file containing the outputs, which is then stored in an S3 bucket for evaluation purposes.
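
The following is a minimal sketch of what an invoke_llm-style Lambda handler could look like, calling a Bedrock model through the Converse API for a single question and its context. The event field names and the default model ID are assumptions for illustration and are not taken from the solution's source code.

import boto3

bedrock = boto3.client("bedrock-runtime")

def lambda_handler(event, context):
    # Assumed payload shape produced by the Map state: one question plus optional context
    question = event["question"]
    doc_context = event.get("context", "")
    model_id = event.get("model_id", "us.amazon.nova-lite-v1:0")  # assumed default

    prompt = f"Context:\n{doc_context}\n\nAnswer the following question:\n{question}"
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0.0},
    )
    answer = response["output"]["message"]["content"][0]["text"]

    # The Map state collects these outputs; InvokeSummary later writes them as JSON Lines to S3
    return {"question": question, "answer": answer}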

When the batch generation is complete, you can trigger a batch evaluation pipeline with the selected metrics from the predefined metric list. You can also specify the location of an S3 file that contains already generated LLM outputs to perform batch evaluation.

Select model for Evaluation

Invoking batch evaluation triggers an Evaluate-LLM Step Functions workflow, which is shown in the following figure. The Evaluate-LLM Step Functions workflow is designed to comprehensively assess LLM performance using multiple evaluation frameworks:

  • LLMeter evaluation – Uses the AWS Labs LLMeter framework and focuses on endpoint performance metrics and benchmarking.
  • Ragas framework evaluation – Uses the Ragas framework to measure four critical quality metrics (a minimal usage sketch follows this list):
    • Context precision – A metric that evaluates whether the ground truth relevant items present in the contexts (chunks retrieved from the vector database) are ranked highly. Its value ranges between 0–1, with higher values indicating better performance. The RAG system usually retrieves more than one chunk for a given query, and the chunks are ranked in order. A lower score is assigned when the highly ranked chunks contain more irrelevant information, which indicates poor information retrieval capability.
    • Context recall – A metric that measures the extent to which the retrieved context aligns with the ground truth. Its value ranges between 0–1, with higher values indicating better performance. The ground truth can contain several short and definitive claims. For example, the ground truth “Canberra is the capital city of Australia, and the city is located at the northern end of the Australian Capital Territory” has two claims: “Canberra is the capital city of Australia” and “Canberra city is located at the northern end of the Australian Capital Territory.” Each claim in the ground truth is analyzed to determine whether it can be attributed to the retrieved context or not. A higher value is assigned when more claims in the ground truth are attributable to the retrieved context.
    • Faithfulness – A metric that measures the factual consistency of the generated answer against the given context. Its value ranges between 0–1, with higher values indicating better performance. The answer can also contain several claims. A lower score is assigned to answers that contain a smaller number of claims that can be inferred from the given context.
    • Answer relevancy – A metric that assesses how pertinent the generated answer is to the given prompt. Its value ranges between 0–1, with higher values indicating better performance. A lower score is assigned to answers that are incomplete or contain redundant information.
  • LLM-as-a-judge evaluation – Uses LLM capabilities to compare and score outputs against expected answers, which provides qualitative assessment of response accuracy. The prompts used for the LLM-as-a-judge are for demonstration purposes; to serve your specific use case, provide your own evaluation prompts to make sure the LLM-as-a-judge meets the correct evaluation requirements.
  • FMEval evaluation – Uses the AWS open source FMEval library and analyzes key metrics, including toxicity measurement.
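
As a minimal sketch of how the Ragas metrics above can be computed, the following assumes the Ragas 0.1.x evaluate API and its standard column names; by default Ragas expects a configured judge LLM and embeddings (OpenAI unless you pass your own, for example Bedrock models wired through LangChain), so treat this as illustrative rather than the solution's exact pipeline.

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness

# One evaluation sample: question, retrieved contexts, generated answer, and ground truth
data = {
    "question": ["What is the capital city of Australia?"],
    "contexts": [["Canberra is the capital city of Australia, located in the ACT."]],
    "answer": ["The capital of Australia is Canberra."],
    "ground_truth": ["Canberra is the capital city of Australia."],
}

results = evaluate(
    Dataset.from_dict(data),
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
)
print(results)  # dict-like scores, each in the 0-1 range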

The architecture implements these evaluations as nested Step Functions workflows that execute concurrently, enabling efficient and comprehensive model assessment. This design also makes it straightforward to add new frameworks to the evaluation workflow.

Step Function workflow for Evaluation

Clean up

To delete local deployment for the frontend, run run.sh delete_local. If you need to delete the cloud deployment, run run.sh delete_cloud. For the backend, you can delete the AWS CloudFormation stack, llm-evaluation-stack. For resources that you can’t delete automatically, manually delete them on the AWS Management Console.

Conclusion

In this post, we explored the importance of evaluating LLMs in the context of generative AI applications, highlighting the challenges posed by issues like hallucinations and biases. We introduced a comprehensive solution using AWS services to automate the evaluation process, allowing for continuous monitoring and assessment of LLM performance. By using tools like the FMEval library, Ragas, LLMeter, and Step Functions, the solution provides flexibility and scalability, meeting the evolving needs of LLM consumers.

With this solution, businesses can confidently deploy LLMs, knowing they adhere to the necessary standards for accuracy, fairness, and relevance. We encourage you to explore the GitHub repository and start building your own automated LLM evaluation pipeline on AWS today. This setup can not only streamline your AI workflows but also make sure your models deliver the highest-quality outputs for your specific applications.


About the Authors

Deepak Dalakoti, PhD, is a Deep Learning Architect at the Generative AI Innovation Centre in Sydney, Australia. With expertise in artificial intelligence, he partners with clients to accelerate their GenAI adoption through customized, innovative solutions. Outside the world of AI, he enjoys exploring new activities and experiences, currently focusing on strength training.

Rafa Xu is a passionate Amazon Web Services (AWS) senior cloud architect focused on helping Public Sector customers design, build, and run infrastructure, applications, and services on AWS. With more than 10 years of experience working across multiple information technology disciplines, Rafa has spent the last five years focused on AWS Cloud infrastructure, serverless applications, and automation. More recently, Rafa has expanded his skillset to include Generative AI, Machine Learning, Big Data, and Internet of Things (IoT).

Dr. Melanie Li, PhD, is a Senior Generative AI Specialist Solutions Architect at AWS based in Sydney, Australia, where her focus is on working with customers to build solutions leveraging state-of-the-art AI and machine learning tools. She has been actively involved in multiple Generative AI initiatives across APJ, harnessing the power of Large Language Models (LLMs). Prior to joining AWS, Dr. Li held data science roles in the financial and retail industries.

Sam Edwards is a Solutions Architect at AWS based in Sydney and focused on Media & Entertainment. He is a Subject Matter Expert for Amazon Bedrock and Amazon SageMaker AI services. He is passionate about helping customers solve issues related to machine learning workflows and creating new solutions for them. In his spare time, he likes traveling and enjoying time with family.

Dr. Kai Zhu currently works as a Cloud Support Engineer at AWS, helping customers with issues in AI/ML related services like SageMaker, Bedrock, etc. He is a SageMaker and Bedrock Subject Matter Expert. Experienced in data science and data engineering, he is interested in building generative AI powered projects.

Build a FinOps agent using Amazon Bedrock with multi-agent capability and Amazon Nova as the foundation model

AI agents are revolutionizing how businesses enhance their operational capabilities and enterprise applications. By enabling natural language interactions, these agents provide customers with a streamlined, personalized experience. Amazon Bedrock Agents uses the capabilities of foundation models (FMs), combining them with APIs and data to process user requests, gather information, and execute specific tasks effectively. The introduction of multi-agent collaboration now enables organizations to orchestrate multiple specialized AI agents working together to tackle complex, multi-step challenges that require diverse expertise.

Amazon Bedrock offers a diverse selection of FMs, allowing you to choose the one that best fits your specific use case. Among these offerings, Amazon Nova stands out as AWS’s next-generation FM, delivering breakthrough intelligence and industry-leading performance at exceptional value.

The Amazon Nova family comprises three types of models:

  • Understanding models – Available in Micro, Lite, and Pro variants
  • Content generation models – Featuring Canvas and Reel
  • Speech-to-Speech model – Nova Sonic

These models are specifically optimized for enterprise and business applications, excelling in the following capabilities:

  • Text generation
  • Summarization
  • Complex reasoning tasks
  • Content creation

This makes Amazon Nova ideal for sophisticated use cases like our FinOps solution.

A key advantage of the Amazon Nova model family is its industry-leading price-performance ratio. Compared to other enterprise-grade AI models, Amazon Nova offers comparable or superior capabilities at a more competitive price point. This cost-effectiveness, combined with its versatility and performance, makes Amazon Nova an attractive choice for businesses looking to implement advanced AI solutions.

In this post, we use the multi-agent feature of Amazon Bedrock to demonstrate a powerful and innovative approach to AWS cost management. By using the advanced capabilities of Amazon Nova FMs, we’ve developed a solution that showcases how AI-driven agents can revolutionize the way organizations analyze, optimize, and manage their AWS costs.

Solution overview

Our innovative AWS cost management solution uses the power of AI and multi-agent collaboration to provide comprehensive cost analysis and optimization recommendations. The core of the system is built around three key components:

  • FinOps supervisor agent – Acts as the central coordinator, managing user queries and orchestrating the activities of specialized subordinate agents
  • Cost analysis agent – Uses AWS Cost Explorer to gather and analyze cost data for specified time ranges
  • Cost optimization agent – Uses the AWS Trusted Advisor Cost Optimization Pillar to provide actionable cost-saving recommendations

The solution integrates the multi-agent collaboration capabilities of Amazon Bedrock with Amazon Nova to create an intelligent, interactive, cost management AI assistant. This integration enables seamless communication between specialized agents, each focusing on different aspects of AWS cost management. Key features of the solution include:

  • User authentication through Amazon Cognito with role-based access control
  • Frontend application hosted on AWS Amplify
  • Real-time cost insights and historical analysis
  • Actionable cost optimization recommendations
  • Parallel processing of tasks for improved efficiency

By combining AI-driven analysis with AWS cost management tools, this solution offers finance teams and cloud administrators a powerful, user-friendly interface to gain deep insights into AWS spending patterns and identify cost-saving opportunities.

The architecture displayed in the following diagram uses several AWS services, including AWS Lambda functions, to create a scalable, secure, and efficient system. This approach demonstrates the potential of AI-driven multi-agent systems to assist with cloud financial management and solve a wide range of cloud management challenges.

Solutions Overview - FinOps Amazon Bedrock Multi Agent

In the following sections, we dive deeper into the architecture of our solution, explore the capabilities of each agent, and discuss the potential impact of this approach on AWS cost management strategies.

Prerequisites

You must have the following in place to complete the solution in this post:

Deploy solution resources using AWS CloudFormation

This CloudFormation template is designed to run in the us-east-1 Region. If you deploy in a different Region, you must configure cross-Region inference profiles to have proper functionality and update the CloudFormation template accordingly.

During the CloudFormation template deployment, you will need to specify three required parameters:

  • Stack name
  • FM selection
  • Valid user email address

AWS resource usage will incur costs. The deployment creates the following resources:

  • Amazon Cognito resources:
  • AWS Identity and Access Management (IAM) resources:
    • IAM roles:
      • FinanceUserRestrictedRole
      • DefaultCognitoAuthenticatedRole
    • IAM policies:
      • Finance-BedrockAccess
      • Default-CognitoAccess
    • Lambda functions:
      • TrustedAdvisorListRecommendationResources
      • TrustedAdvisorListRecommendations
      • CostAnalysis
      • ClockandCalendar
      • CostForecast
    • Amazon Bedrock agents:
      • FinOpsSupervisorAgent
      • CostAnalysisAgent with action groups:
        • CostAnalysisActionGroup
        • ClockandCalendarActionGroup
        • CostForecastActionGroup
      • CostOptimizationAgent with action groups:
        • TrustedAdvisorListRecommendationResources
        • TrustedAdvisorListRecommendations

After you deploy the CloudFormation template, copy the following from the Outputs tab on the AWS CloudFormation console to use during the configuration of your application after it’s deployed in Amplify:

  • AWSRegion
  • BedrockAgentAliasId
  • BedrockAgentId
  • BedrockAgentName
  • IdentityPoolId
  • UserPoolClientId
  • UserPoolId

The following screenshot shows you what the Outputs tab will look like.

FinOps CloudFormation Output

Deploy the Amplify application

You need to manually deploy the Amplify application using the frontend code found on GitHub. Complete the following steps:

  1. Download the frontend code AWS-Amplify-Frontend.zip from GitHub.
  2. Use the .zip file to manually deploy the application in Amplify.
  3. Return to the Amplify page and use the domain it automatically generated to access the application.

Amazon Cognito for user authentication

The FinOps application uses Amazon Cognito user pools and identity pools to implement secure, role-based access control for finance team members. User pools handle authentication and group management, and identity pools provide temporary AWS credentials mapped to specific IAM roles. The system makes sure that only verified finance team members can access the application and interact with the Amazon Bedrock API, combining robust security with a seamless user experience.

Amazon Bedrock Agents with multi-agent capability

The Amazon Bedrock multi-agent architecture enables sophisticated FinOps problem-solving through a coordinated system of AI agents, led by a FinOpsSupervisorAgent. The FinOpsSupervisorAgent coordinates with two key subordinate agents: the CostAnalysisAgent, which handles detailed cost analysis queries, and the CostOptimizationAgent, which handles specific cost optimization recommendations. Each agent focuses on their specialized financial tasks while maintaining contextual awareness, with the FinOpsSupervisorAgent managing communication and synthesizing comprehensive responses from both agents. This coordinated approach enables parallel processing of financial queries and delivers more effective answers than a single agent could provide, while maintaining consistency and accuracy throughout the FinOps interaction.

Lambda functions for Amazon Bedrock action groups

As part of this solution, Lambda functions are deployed to support the action groups defined for each subordinate agent.

The CostAnalysisAgent uses three distinct Lambda backed action groups to deliver comprehensive cost management capabilities. The CostAnalysisActionGroup connects with Cost Explorer to extract and analyze detailed historical cost data, providing granular insights into cloud spending patterns and resource utilization. The ClockandCalendarActionGroup maintains temporal precision by providing current date and time functionality, essential for accurate period-based cost analysis and reporting. The CostForecastActionGroup uses the Cost Explorer forecasting function, which analyzes historical cost data and provides future cost projections. This information helps the agent support proactive budget planning and make informed recommendations. These action groups work together seamlessly, enabling the agent to provide historical cost analysis and future spend predictions while maintaining precise temporal context.
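
The following is a minimal sketch of what a Lambda function backing a cost analysis action group could look like. The parameter names (start_date, end_date) are hypothetical, the Cost Explorer call is a standard get_cost_and_usage request, and the return shape follows the function-details format that Amazon Bedrock Agents expects from action group Lambda functions; it is not the solution's actual code.

import json
import boto3

ce = boto3.client("ce")

def lambda_handler(event, context):
    # Parameters passed by the agent, for example start_date and end_date (hypothetical names)
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}

    result = ce.get_cost_and_usage(
        TimePeriod={"Start": params["start_date"], "End": params["end_date"]},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
    # Aggregate unblended cost per service across the returned periods
    costs = {
        group["Keys"][0]: group["Metrics"]["UnblendedCost"]["Amount"]
        for period in result["ResultsByTime"]
        for group in period["Groups"]
    }

    # Response format expected by Amazon Bedrock Agents for function-details action groups
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {
                "responseBody": {"TEXT": {"body": json.dumps(costs)}}
            },
        },
    }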

The CostOptimizationAgent incorporates two Trusted Advisor focused action groups to enhance its recommendation capabilities. The TrustedAdvisorListRecommendationResources action group interfaces with Trusted Advisor to retrieve a comprehensive list of resources that could benefit from optimization, providing a targeted scope for cost-saving efforts. Complementing this, the TrustedAdvisorListRecommendations action group fetches specific recommendations from Trusted Advisor, offering actionable insights on potential cost reductions, performance improvements, and best practices across various AWS services. Together, these action groups empower the agent to deliver data-driven, tailored optimization strategies by using the expertise embedded in Trusted Advisor.

Amplify for frontend

Amplify provides a streamlined solution for deploying and hosting web applications with built-in security and scalability features. The service reduces the complexity of managing infrastructure, allowing developers to concentrate on application development. In our solution, we use the manual deployment capabilities of Amplify to host our frontend application code.

Multi-agent and application walkthrough

To validate the solution before using the Amplify deployed frontend, we can conduct testing directly on the AWS Management Console. By navigating to the FinOpsSupervisorAgent, we can pose a question like “What is my cost for Feb 2025 and what are my current cost savings opportunity?” This query demonstrates the multi-agent orchestration in action. As shown in the following screenshot, the FinOpsSupervisorAgent coordinates with both the CostAnalysisAgent (to retrieve February 2025 cost data) and the CostOptimizationAgent (to identify current cost savings opportunities). This illustrates how the FinOpsSupervisorAgent effectively delegates tasks to specialized agents and synthesizes their responses into a comprehensive answer, showcasing the solution’s integrated approach to FinOps queries.

Amazon Bedrock Agents Console Demo

Navigate to the URL provided after you created the application in Amplify. Upon accessing the application URL, you will be prompted to provide information related to Amazon Cognito and Amazon Bedrock Agents. This information is required to securely authenticate users and allow the frontend to interact with the Amazon Bedrock agent. It enables the application to manage user sessions and make authorized API calls to AWS services on behalf of the user.

You can enter information with the values you collected from the CloudFormation stack outputs. You will be required to enter the following fields, as shown in the following screenshot:

  • User Pool ID
  • User Pool Client ID
  • Identity Pool ID
  • Region
  • Agent Name
  • Agent ID
  • Agent Alias ID
  • Region

AWS Amplify Configuration

You need to sign in with your user name and password. A temporary password was automatically generated during deployment and sent to the email address you provided when launching the CloudFormation template. On your first sign-in attempt, you will be asked to reset your password, as shown in the following video.

Amplify Login

Now you can start asking the same question in the application, for example, “What is my cost for February 2025 and what are my current cost savings opportunity?” In a few seconds, the application will provide detailed results showing service spend for the particular month and savings opportunities. The following video shows this chat.

FinOps Agent Front End Demo 1

You can further dive into the details you got by asking a follow-up question such as “Can you give me the details of the EC2 instances that are underutilized?” and it will return the details for each of the Amazon Elastic Compute Cloud (Amazon EC2) instances that it found underutilized.

Fin Ops Agent Front End Demo 2

The following are a few additional sample queries to demonstrate the capabilities of this tool:

  • What is my top services cost in June 2024?
  • In the past 6 months, how much did I spend on VPC cost?
  • What is my current savings opportunity?

Clean up

If you decide to discontinue using the FinOps application, you can follow these steps to remove it, its associated resources deployed using AWS CloudFormation, and the Amplify deployment:

  1. Delete the CloudFormation stack:
    • On the AWS CloudFormation console, choose Stacks in the navigation pane.
    • Locate the stack you created during the deployment process (you assigned a name to it).
    • Select the stack and choose Delete.
  2. Delete the Amplify application and its resources. For instructions, refer to Clean Up Resources.

Considerations

For optimal visibility across your organization, deploy this solution in your AWS payer account to access cost details for your linked accounts through Cost Explorer.

Trusted Advisor cost optimization visibility is limited to the account where you deploy this solution. To expand its scope, enable Trusted Advisor at the AWS organization level and modify this solution accordingly.

Before deploying to production, enhance security by implementing additional safeguards. You can do this by associating guardrails with your agent in Amazon Bedrock.

Conclusion

The integration of the multi-agent capability of Amazon Bedrock with Amazon Nova demonstrates the transformative potential of AI in AWS cost management. Our FinOps agent solution showcases how specialized AI agents can work together to deliver comprehensive cost analysis, forecasting, and optimization recommendations in a secure and user-friendly environment. This implementation not only addresses immediate cost management challenges, but also adapts to evolving cloud financial operations. As AI technologies advance, this approach sets a foundation for more intelligent and proactive cloud management strategies across various business operations.



About the Authors

Salman Ahmed is a Senior Technical Account Manager in AWS Enterprise Support. He specializes in guiding customers through the design, implementation, and support of AWS solutions. Combining his networking expertise with a drive to explore new technologies, he helps organizations successfully navigate their cloud journey. Outside of work, he enjoys photography, traveling, and watching his favorite sports teams.

Ravi Kumar is a Senior Technical Account Manager in AWS Enterprise Support who helps customers in the travel and hospitality industry to streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience. In his free time, Ravi enjoys creative activities like painting. He also likes playing cricket and traveling to new places.

Sergio Barraza is a Senior Technical Account Manager at AWS, helping customers design and optimize cloud solutions. With more than 25 years in software development, he guides customers through AWS services adoption. Outside work, Sergio is a multi-instrument musician playing guitar, piano, and drums, and he also practices Wing Chun Kung Fu.

Ankush Goyal is an Enterprise Support Lead in AWS Enterprise Support who helps customers streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience.


Stream ingest data from Kafka to Amazon Bedrock Knowledge Bases using custom connectors

Retrieval Augmented Generation (RAG) enhances AI responses by combining the generative AI model’s capabilities with information from external data sources, rather than relying solely on the model’s built-in knowledge. In this post, we showcase the custom data connector capability in Amazon Bedrock Knowledge Bases that makes it straightforward to build RAG workflows with custom input data. Through this capability, Amazon Bedrock Knowledge Bases supports the ingestion of streaming data, which means developers can add, update, or delete data in their knowledge base through direct API calls.

Consider examples such as clickstream data, credit card swipes, Internet of Things (IoT) sensor data, log analysis, and commodity prices—where both current data and historical trends are important to make an informed decision. Previously, to feed such critical data inputs, you had to first stage it in a supported data source and then either initiate or schedule a data sync job. Based on the quality and quantity of the data, the time to complete this process varied. With custom data connectors, you can quickly ingest specific documents from custom data sources without requiring a full sync and ingest streaming data without the need for intermediary storage. By avoiding time-consuming full syncs and storage steps, you gain faster access to data, reduced latency, and improved application performance.

With streaming ingestion using custom connectors, Amazon Bedrock Knowledge Bases processes such streaming data without using an intermediary data source, making it available almost immediately. This feature chunks and converts input data into embeddings using your chosen Amazon Bedrock model and stores everything in the backend vector database. This automation applies to both newly created and existing databases, streamlining your workflow so you can focus on building AI applications without worrying about orchestrating data chunking, embeddings generation, or vector store provisioning and indexing. Additionally, this feature provides the ability to ingest specific documents from custom data sources, all while reducing latency and alleviating operational costs for intermediary storage.

Amazon Bedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and RAG, and build agents that execute tasks using your enterprise systems and data sources.

Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases allows organizations to build fully managed RAG pipelines by augmenting contextual information from private data sources to deliver more relevant, accurate, and customized responses. With Amazon Bedrock Knowledge Bases, you can build applications that are enriched by the context that is received from querying a knowledge base. It enables a faster time to product release by abstracting away the heavy lifting of building pipelines and providing an out-of-the-box RAG solution, reducing the build time for your application.

Amazon Bedrock Knowledge Bases custom connector

Amazon Bedrock Knowledge Bases supports custom connectors and the ingestion of streaming data, which means you can add, update, or delete data in your knowledge base through direct API calls.

Solution overview: Build a generative AI stock price analyzer with RAG

For this post, we implement a RAG architecture with Amazon Bedrock Knowledge Bases using a custom connector and topics built with Amazon Managed Streaming for Apache Kafka (Amazon MSK) for a user who may be interested in understanding stock price trends. Amazon MSK is a streaming data service that manages Apache Kafka infrastructure and operations, making it straightforward to run Apache Kafka applications on Amazon Web Services (AWS). The solution enables real-time analysis of streaming stock price data through vector embeddings and large language models (LLMs).

The following architecture diagram has two components:

Preprocessing streaming data workflow noted in letters on the top of the diagram:

  1. To mimic streaming input, upload a .csv file with stock price data to the MSK topic
  2. Automatically trigger the consumer AWS Lambda function
  3. Ingest the consumed data into a knowledge base
  4. The knowledge base internally transforms the data into a vector index using the embeddings model
  5. The knowledge base stores the vector index in the vector database

Runtime execution during user queries noted in numerals at the bottom of the diagram:

  1. The user queries stock prices
  2. The foundation model uses the knowledge base to search for an answer
  3. The knowledge base returns relevant documents
  4. The user receives a relevant answer

solution overview

Implementation design

The implementation follows these high-level steps:

  1. Data source setup – Configure an MSK topic that streams input stock prices
  2. Amazon Bedrock Knowledge Bases setup – Create a knowledge base in Amazon Bedrock using the quick create a new vector store option, which automatically provisions and sets up the vector store
  3. Data consumption and ingestion – As and when data lands in the MSK topic, trigger a Lambda function that extracts stock indices, prices, and timestamp information and feeds into the custom connector for Amazon Bedrock Knowledge Bases
  4. Test the knowledge base – Evaluate stock price analysis using the knowledge base

Solution walkthrough

To build a generative AI stock analysis tool with Amazon Bedrock Knowledge Bases custom connector, use the instructions in the following sections.

Configure the architecture

To try this architecture, deploy the AWS CloudFormation template from this GitHub repository in your AWS account. This template deploys the following components:

  1. Functional virtual private clouds (VPCs), subnets, security groups and AWS Identity and Access Management (IAM) roles
  2. An MSK cluster hosting Apache Kafka input topic
  3. A Lambda function to consume Apache Kafka topic data
  4. An Amazon SageMaker Studio notebook for granular setup and enablement

Create an Apache Kafka topic

In the precreated MSK cluster, the required brokers are deployed ready for use. The next step is to use a SageMaker Studio terminal instance to connect to the MSK cluster and create the test stream topic. In this step, you follow the detailed instructions that are mentioned at Create a topic in the Amazon MSK cluster. The following are the general steps involved:

  1. Download and install the latest Apache Kafka client
  2. Connect to the MSK cluster broker instance
  3. Create the test stream topic on the broker instance
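
If you prefer to script the topic creation from the SageMaker Studio notebook instead of using the Kafka CLI, a minimal sketch using the kafka-python admin client might look like the following. The BootstrapBrokerString variable and a plaintext listener are assumptions; adjust the client and security settings to match your MSK cluster configuration.

from kafka.admin import KafkaAdminClient, NewTopic

# BootstrapBrokerString is assumed to hold the broker endpoints of the MSK cluster
admin = KafkaAdminClient(bootstrap_servers=BootstrapBrokerString)

# Create the input topic used throughout this walkthrough
admin.create_topics([
    NewTopic(name='streamtopic', num_partitions=1, replication_factor=2)
])
admin.close()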

Create a knowledge base in Amazon Bedrock

To create a knowledge base in Amazon Bedrock, follow these steps:

  1. On the Amazon Bedrock console, in the left navigation page under Builder tools, choose Knowledge Bases.

amazon bedrock knowledge bases console

  2. To initiate knowledge base creation, on the Create dropdown menu, choose Knowledge Base with vector store, as shown in the following screenshot.

amazon bedrock knowledge bases create

  3. In the Provide Knowledge Base details pane, enter BedrockStreamIngestKnowledgeBase as the Knowledge Base name.
  4. Under IAM permissions, choose the default option, Create and use a new service role, and (optional) provide a Service role name, as shown in the following screenshot.

amazon bedrock knowledge bases create details

  5. On the Choose data source pane, select Custom as the data source where your dataset is stored.
  6. Choose Next, as shown in the following screenshot.

amazon bedrock knowledge bases data source details

  7. On the Configure data source pane, enter BedrockStreamIngestKBCustomDS as the Data source name.
  8. Under Parsing strategy, select Amazon Bedrock default parser and for Chunking strategy, choose Default chunking. Choose Next, as shown in the following screenshot.

amazon bedrock knowledge bases parsing strategy

  9. On the Select embeddings model and configure vector store pane, for Embeddings model, choose Titan Text Embeddings v2. For Embeddings type, choose Floating-point vector embeddings. For Vector dimensions, select 1024, as shown in the following screenshot. Make sure you have requested and received access to the chosen FM in Amazon Bedrock. To learn more, refer to Add or remove access to Amazon Bedrock foundation models.

amazon bedrock knowledge bases embedding model

  10. On the Vector database pane, select Quick create a new vector store and choose the new Amazon OpenSearch Serverless option as the vector store.

amazon bedrock knowledge bases vector data store

  11. On the next screen, review your selections. To finalize the setup, choose Create.
  12. Within a few minutes, the console will display your newly created knowledge base.
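
As an alternative to the console steps, the custom data source can also be created programmatically. The following is a minimal sketch using the bedrock-agent client; the knowledge base ID placeholder is assumed to come from the knowledge base you just created.

import boto3

bedrock_agent_client = boto3.client('bedrock-agent')

# Attach a custom data source to the existing knowledge base
response = bedrock_agent_client.create_data_source(
    knowledgeBaseId=<Knowledge Base ID>,
    name='BedrockStreamIngestKBCustomDS',
    dataSourceConfiguration={'type': 'CUSTOM'}
)
print(response['dataSource']['dataSourceId'])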

Configure AWS Lambda Apache Kafka consumer

Now, using API calls, you configure the consumer Lambda function so it gets triggered as soon as the input Apache Kafka topic receives data.

  1. Configure the manually created Amazon Bedrock Knowledge Base ID and its custom Data Source ID as environment variables within the Lambda function. When you use the sample notebook, the referenced function names and IDs are filled in automatically.
import boto3

lambda_client = boto3.client('lambda')

# Set the knowledge base ID and data source ID as Lambda environment variables
response = lambda_client.update_function_configuration(
    FunctionName=<Consumer Lambda Function Name>,
    Environment={
        'Variables': {
            'KBID': <Knowledge Base ID>,
            'DSID': <Data Source ID>
        }
    }
)

  2. When that's complete, configure the Lambda consumer function to listen for events in the source Apache Kafka topic:
# Trigger the consumer Lambda function whenever new records arrive on the topic
response = lambda_client.create_event_source_mapping(
    EventSourceArn=<MSK Cluster's ARN>,
    FunctionName=<Consumer Lambda Function Name>,
    StartingPosition='LATEST',
    Enabled=True,
    Topics=['streamtopic']
)

Review AWS Lambda Apache Kafka consumer

The Apache Kafka consumer Lambda function reads data from the Apache Kafka topic, decodes it, extracts stock price information, and ingests it into the Amazon Bedrock knowledge base using the custom connector.

  1. Extract the knowledge base ID and the data source ID:
import os

kb_id = os.environ['KBID']
ds_id = os.environ['DSID']
  2. Define a Python function to decode input events:
import base64
import json

def decode_payload(event_data):
    # Kafka record values arrive base64-encoded; decode them into a JSON payload
    agg_data_bytes = base64.b64decode(event_data)
    decoded_data = agg_data_bytes.decode(encoding="utf-8")
    event_payload = json.loads(decoded_data)
    return event_payload
  3. Decode and parse the required fields from the input event received from the Apache Kafka topic, and use them to create a payload to be ingested into the knowledge base:
import uuid
from datetime import datetime

records = event['records']['streamtopic-0']
for rec in records:
    # Each record has a separate eventID, etc.
    event_payload = decode_payload(rec['value'])
    ticker = event_payload['ticker']
    price = event_payload['price']
    timestamp = event_payload['timestamp']
    myuuid = uuid.uuid4()
    payload_ts = datetime.utcfromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
    payload_string = "At " + payload_ts + " the price of " + ticker + " is " + str(price) + "."
  4. Ingest the payload into Amazon Bedrock Knowledge Bases using the custom connector:
import boto3

bedrock_agent_client = boto3.client('bedrock-agent')

# Ingest the inline text document into the custom data source
response = bedrock_agent_client.ingest_knowledge_base_documents(
    knowledgeBaseId=kb_id,
    dataSourceId=ds_id,
    documents=[
        {
            'content': {
                'custom': {
                    'customDocumentIdentifier': {
                        'id': str(myuuid)
                    },
                    'inlineContent': {
                        'textContent': {
                            'data': payload_string
                        },
                        'type': 'TEXT'
                    },
                    'sourceType': 'IN_LINE'
                },
                'dataSourceType': 'CUSTOM'
            }
        }
    ]
)

Testing

Now that the required setup is done, you trigger the workflow by ingesting test data into your Apache Kafka topic hosted on the MSK cluster. For best results, repeat this section by changing the .csv input file to show stock price increases or decreases.

  1. Prepare the test data. In my case, I had the following data input as a .csv file with a header.
ticker price
OOOO $44.50
ZVZZT $3,413.23
ZNTRX $22.34
ZNRXX $208.76
NTEST $0.45
ZBZX $36.23
ZEXIT $942.34
ZIEXT $870.23
ZTEST $23.75
ZVV $2,802.86
ZXIET $63.00
ZAZZT $18.86
ZBZZT $998.26
ZCZZT $72.34
ZVZZC $90.32
ZWZZT $698.24
ZXZZT $932.32
  2. Define a Python function to put data to the topic, using the pykafka client to ingest data:
import json

from pykafka import KafkaClient

def put_to_topic(kafka_host, topic_name, ticker, amount, timestamp):
    # Publish one stock price record as a JSON message to the Kafka topic
    client = KafkaClient(hosts=kafka_host)
    topic = client.topics[topic_name]
    payload = {
        'ticker': ticker,
        'price': amount,
        'timestamp': timestamp
    }
    ret_status = True
    data = json.dumps(payload)
    encoded_message = data.encode("utf-8")
    print(f'Sending ticker data: {ticker}...')
    with topic.get_sync_producer() as producer:
        result = producer.produce(encoded_message)
    return ret_status
  3. Read the .csv file and push the records to the topic:
import time
from datetime import datetime

import pandas as pd

df = pd.read_csv('TestData.csv')
start_test_time = time.time()
print(datetime.utcfromtimestamp(start_test_time).strftime('%Y-%m-%d %H:%M:%S'))
df = df.reset_index()
for index, row in df.iterrows():
    put_to_topic(BootstrapBrokerString, KafkaTopic, row['ticker'], row['price'], time.time())
end_test_time = time.time()
print(datetime.utcfromtimestamp(end_test_time).strftime('%Y-%m-%d %H:%M:%S'))

Verification

If the data ingestion and subsequent processing are successful, navigate to the Amazon Bedrock Knowledge Bases data source page to check the uploaded information.

amazon bedrock knowledge bases upload verification
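
You can also verify the ingestion programmatically. The following is a minimal sketch using the ListKnowledgeBaseDocuments API, assuming the same knowledge base ID and data source ID used earlier; the exact shape of the returned document details may differ from what is shown here.

import boto3

bedrock_agent_client = boto3.client('bedrock-agent')

# List the documents ingested into the custom data source
response = bedrock_agent_client.list_knowledge_base_documents(
    knowledgeBaseId=<Knowledge Base ID>,
    dataSourceId=<Data Source ID>,
    maxResults=10
)
for doc in response['documentDetails']:
    print(doc['identifier'], doc['status'])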

Querying the knowledge base

Within the Amazon Bedrock Knowledge Bases console, you can query the ingested data immediately, as shown in the following screenshot.

amazon bedrock knowledge bases test

To do that, select an Amazon Bedrock FM that you have access to. In my case, I chose Amazon Nova Lite 1.0, as shown in the following screenshot.

amazon bedrock knowledge bases choose llm

When it’s completed, the question, “How is ZVZZT trending?”, yields results based on the ingested data. Note how Amazon Bedrock Knowledge Bases shows how it derived the answer, even pointing to the granular data element from its source.

bedrock console knowledge bases results
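
If you prefer to query the knowledge base programmatically rather than through the console, a minimal sketch using the RetrieveAndGenerate API might look like the following. The knowledge base ID and model ARN are placeholders you would substitute with your own values.

import boto3

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

# Ask the same question against the knowledge base and generate an answer
response = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': 'How is ZVZZT trending?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': '<Knowledge Base ID>',
            'modelArn': '<Model ARN, for example the Amazon Nova Lite model you enabled>'
        }
    }
)
print(response['output']['text'])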

Cleanup

To avoid paying for unused resources, delete and clean up the resources you created:

  1. Delete the Amazon Bedrock knowledge base.
  2. Delete the automatically created Amazon OpenSearch Serverless collection.
  3. Delete the automatically created Amazon Elastic File System (Amazon EFS) shares backing the SageMaker Studio environment.
  4. Delete the automatically created security groups associated with the Amazon EFS share. You might need to remove the inbound and outbound rules before they can be deleted.
  5. Delete the automatically created elastic network interfaces attached to the Amazon MSK security group for Lambda traffic.
  6. Delete the automatically created Amazon Bedrock Knowledge Bases execution IAM role.
  7. Stop the kernel instances in Amazon SageMaker Studio.
  8. Delete the CloudFormation stack.

Conclusion

In this post, we showed you how Amazon Bedrock Knowledge Bases supports custom connectors and the ingestion of streaming data, through which developers can add, update, or delete data in their knowledge base through direct API calls. Amazon Bedrock Knowledge Bases offers fully managed, end-to-end RAG workflows to create highly accurate, low-latency, secure, and custom generative AI applications by incorporating contextual information from your company’s data sources. With this capability, you can quickly ingest specific documents from custom data sources without requiring a full sync, and ingest streaming data without the need for intermediary storage.

Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts, and engage with the generative AI builder community at community.aws.


About the Author

Prabhakar Chandrasekaran is a Senior Technical Account Manager with AWS Enterprise Support. Prabhakar enjoys helping customers build cutting-edge AI/ML solutions on the cloud. He also works with enterprise customers providing proactive guidance and operational assistance, helping them improve the value of their solutions when using AWS. Prabhakar holds eight AWS and seven other professional certifications. With over 22 years of professional experience, Prabhakar was a data engineer and a program leader in the financial services space prior to joining AWS.


Add Zoom as a data accessor to your Amazon Q index

For many organizations, vast amounts of enterprise knowledge are scattered across diverse data sources and applications. Organizations across industries seek to use this cross-application enterprise data from within their preferred systems while adhering to their established security and governance standards.

This post demonstrates how Zoom users can access their Amazon Q Business enterprise data directly within their Zoom interface, alleviating the need to switch between applications while maintaining enterprise security boundaries. Organizations can now configure Zoom as a data accessor in Amazon Q Business, enabling seamless integration between their Amazon Q index and Zoom AI Companion. This integration allows users to access their enterprise knowledge in a controlled manner directly within the Zoom platform.

How Amazon Q Business and Zoom AI Companion work together

The Amazon Q Business data accessor is a core component within Amazon Q Business. It manages and controls access to data stored in an enterprise’s internal knowledge repositories on Amazon Q Business from an external independent software vendor (ISV) such as Zoom while maintaining security and data access compliance. This feature allows Zoom to retrieve relevant content, enhancing the Zoom AI Companion’s knowledge. It serves as an intermediary that enforces access control lists (ACLs), defining both data source permissions and user access rights to the existing Amazon Q Business index.

Zoom AI Companion, the foundation of Zoom’s AI-first work platform, enhances human connection by working behind the scenes to boost productivity, improve work quality, and strengthen relationships. This April, Zoom launched the Custom AI Companion add-on, enabling organizations to customize AI agents and skills to help meet their specific needs and drive company-wide efficiency. Through its partnership with Amazon Q Business, customers can now connect their indexed data in Amazon Q index to Zoom AI Companion, providing enhanced knowledge and contextual insights.

As an Amazon Q Business data accessor, Zoom AI Companion can interact with the enterprise Amazon Q index in a managed way, enriching content beyond what’s available in Zoom alone. Enterprise users can retrieve contextual information from their Amazon Q index’s multiple connected data sources directly within Zoom, with results seamlessly presented through Zoom AI Companion. Zoom AI Companion can access Amazon Q index data alongside its native data sources, such as previous call transcripts, to quickly surface relevant information to users. This integration alleviates the need to manually switch between various enterprise systems like Google Drive, Confluence, Salesforce, and more, saving time and reducing workflow disruptions.

For example, while preparing for a Zoom call, users can quickly find answers to questions like “When is customer AnyCustomer’s contract up for renewal, and who signed the last one?” The Amazon Q index processes these queries and delivers results through Zoom AI Companion in real time.

Solution overview

The following diagram is a high-level architecture that explains how enterprises can set up and access Amazon Q Business indexed data from within the Zoom AI Companion application.

In the following sections, we demonstrate how to configure Zoom as a data accessor and get started using Zoom AI Companion.

Prerequisites

To implement this solution, you need an AWS account with appropriate permissions.

Create an Amazon Q Business application

To access indexed data from Amazon Q Business through Zoom AI Companion, organizations must first set up their Amazon Q Business application. The application must be configured with AWS IAM Identity Center to enable the Zoom data accessor functionality. For detailed guidance on creating an Amazon Q Business application, refer to Configure application.

Configure access control with IAM Identity Center

Through IAM Identity Center, Amazon Q Business uses trusted identity propagation to provide proper authentication and fine-grained authorization based on user ID and group-based resources, making sure access to sensitive data is tightly controlled and document ACLs are enforced. The ISV is only permitted to access this index using the assigned data accessor.

If you’re using an identity provider (IdP) such as Okta, CyberArk, or others, you can add the IdP to IAM Identity Center as a trusted token issuer. For additional information, see Configure Amazon Q Business with AWS IAM Identity Center trusted identity propagation.

For more information on IAM Identity Center, refer to IAM Identity Center identity source tutorials.

Add Zoom as a data accessor

After creating an Amazon Q Business application with IAM Identity Center, administrators can configure Zoom as a data accessor through the Amazon Q Business console. Complete the following steps:

  1. On the Amazon Q Business console, choose Data accessors in the navigation pane.
  2. Choose Add data accessor.
  3. Choose Zoom as your data accessor.
  4. For Accessor name, enter a name for your data accessor.
  5. For Data source access, configure your level of access.

You can select specific data sources to be available through the data accessor. This allows you to control which content is surfaced in the ISV environment. You can use Amazon Q Business pre-built connectors to synchronize content from various systems. For more information, refer to Supported connectors.

  6. For User access, specify which users can access the Amazon Q index through the data accessor.

This option enables you to configure granular permissions for data accessor accessibility and manage organizational access controls.

For more information about data access, refer to Accessing a customer’s Amazon Q index as a data accessor using cross-account access.

Administrators can modify data accessor settings at any time after implementation. You can adjust user access permissions, update available data sources, and change the scope of accessibility. To revoke access, complete the following steps:

  1. On the Amazon Q Business console, choose Data accessors in the navigation pane.
  2. Locate the accessor you want to delete and choose Delete.
  3. Confirm the deletion when prompted.

Removing a data accessor from a data source immediately cancels the ISV’s access to your organization’s Amazon Q index.

Configure Amazon Q for Zoom AI Companion

To start using Zoom as a data accessor for your Amazon Q Business index, the following information from your enterprise Amazon Q Business application must be shared with Zoom:

  • Amazon Q Business application ID
  • Amazon Q Business AWS Region
  • Amazon Q Business retriever ID
  • Data accessor application Amazon Resource Name (ARN)
  • IAM Identity Center instance Region

For more information, refer to Accessing a customer’s Amazon Q index as a data accessor using cross-account access.

After you add Zoom as a data accessor, a pop-up window will appear on the Amazon Q Business console. This pop-up contains the required parameters, as shown in the following screenshot.

Navigate to the Zoom App Marketplace to configure Amazon Q in Zoom, and enter the information you collected.

After you submit this information, you’re ready to access Amazon Q index data from Zoom AI Companion.

With AI Companion connected to Amazon Q index, you have the information you need instantly. For example, you could make AI Companion aware of your organization’s IT troubleshooting guides so employees could quickly get help with questions like “How do I fix a broken keyboard?”

Using the SearchRelevantContent API

When an enterprise customer with an Amazon Q index enables a data accessor, it allows authenticated Amazon Q Business users to search and retrieve relevant content in real time while using external ISV platforms (like Zoom). This functionality is achieved through the ISV calling the Amazon Q index SearchRelevantContent API as an external data accessor across accounts. The SearchRelevantContent API is specifically designed to return search results from the Amazon Q index, which can be further enhanced by the ISV’s generative AI stack. By using the Amazon Q index SearchRelevantContent API, Zoom and other ISVs can integrate query results directly into their environment.

The SearchRelevantContent API is an identity-aware API, which means it operates with knowledge of the user’s identity and associated information (such as email and group membership) through the credentials used to call the API. This identity awareness is a prerequisite for using the API. When querying the index, it reconciles document access controls against the authenticated user’s permissions. As a result, users can only retrieve results from content they are authorized to access.

When an ISV calls the SearchRelevantContent API as a data accessor, both sparse and dense searches are applied to the Amazon Q index, combining keyword search and vector embedding proximity. Results are ranked before being returned to the ISV interface.

For example, if you ask in Zoom, “What is Company XYZ’s engagement on the cancer moonshot project?”, Zoom AI Companion triggers a call to the SearchRelevantContent API as a data accessor.

For a more comprehensive code example, see the notebook in Module 2 – Amazon Q cross-app index.

The following is a code snippet in Python showing what that search request might look like:

import boto3

# Amazon Q Business client used by the data accessor (credentials are obtained
# through the cross-account data accessor flow)
qbiz = boto3.client('qbusiness')

search_params = {
    'applicationId': Q_BIZ_APP_ID,
    'contentSource': {
        'retriever': {
            'retrieverId': Q_RETRIEVER_ID
        }
    },
    'queryText': 'What is Company XYZ engagement on the cancer moonshot project?',
    'maxResults': 10
}

search_response = qbiz.search_relevant_content(**search_params)

The search response will contain an array of results with relevant chunks of text, along with source information, document attributes, and confidence scores. The following is a snippet from the SearchRelevantContent API response. This is an example of results you might see from the web crawler data connector used with Amazon Q Business.

[
    {
        "content": "nSeveral initiatives have been launched or will soon launch to address the goals of this next phase, including:nIncluding more people in expanded and modernized cancer clinical trialsnIncreasing the pipeline of new cancer drugsnEnsuring access to current and new standards of cancer carenEnhancing diversity in the cancer research workforce",
        "documentId": "Cancermoonshot",
        "documentTitle": "About The Cancer Moonshot",
        "documentUri": "https://companyxyz/cancermoonshot.html",
        "documentAttributes": [
            {
                "name": "_source_uri",
                "value": {
                    "stringValue": "https://companyxyz.com/cancermoonshot.html"
                }
            }
        ],
        "scoreAttributes": {
            "scoreConfidence": "VERY_HIGH"
        }
    },...]

The SearchRelevantContent API has a rich set of optional parameters that ISVs can choose to use. For example, document attributes can be used as filters. If documents with metadata attributes have been indexed, and one of these attributes contains the author, an ISV can apply a filter that specifies an author name. In the following example, results returned are constrained to only documents that have the specified attribute author name “John Smith.”

search_params = {
    'applicationId': Q_BIZ_APP_ID,
    'contentSource': {
        'retriever': {
            'retrieverId': Q_RETRIEVER_ID
        }
    },
    'queryText': myQuestion,
    'maxResults': 5,
    'attributeFilter': {
        'equalsTo': {
            'name': 'Author',
            'value': {
                'stringValue': 'John Smith'
            }
        }
    }
}

For a more comprehensive reference on what is available in the SearchRelevantContent API request object, refer to search_relevant_content.

Clean up

When you’re done using this solution, clean up the resources you created.

  1. Delete the Zoom data accessor from the Data accessors console. Deleting this data accessor will delete permissions and access to the data accessor for all users.
  2. Delete the Amazon Q Business application that you created as a prerequisite.
    • Navigate to the Amazon Q Business console.
    • Choose Applications on the left menu.
    • Select the application you created.
    • Choose Delete from under Actions to delete the application.

Deleting the Amazon Q Business application will remove the associated index and data source connectors, and prevent incurring additional costs.

Conclusion

An Amazon Q index offers a transformative approach to workplace efficiency. By creating a centralized, secure repository for your organization’s data, you can seamlessly integrate vital information with your everyday productivity tools like Zoom AI Companion.

In this post, we explored how Amazon Q Business enterprise users can add data accessors to integrate with external parties like Zoom AI Companion, allowing users to access their enterprise knowledge in a managed way directly from within those platforms.

Ready to supercharge your workforce’s productivity? Start your Amazon Q Business journey today alongside Zoom. To learn more about Amazon Q Business data accessors, see Enhance enterprise productivity for your LLM solution by becoming an Amazon Q Business data accessor.


About the authors

David Girling is a Senior AI/ML Solutions Architect with over 20 years of experience in designing, leading, and developing enterprise systems. David is part of a specialist team that focuses on helping customers learn, innovate, and utilize these highly capable services with their data for their use cases.

Chinmayee Rane is a Generative AI Specialist Solutions Architect at AWS, with a core focus on generative AI. She helps Independent Software Vendors (ISVs) accelerate the adoption of generative AI by designing scalable and impactful solutions. With a strong background in applied mathematics and machine learning, she specializes in intelligent document processing and AI-driven innovation. Outside of work, she enjoys salsa and bachata dancing.

Sonali Sahu is leading the Generative AI Specialist Solutions Architecture team in AWS. She is an author, thought leader, and passionate technologist. Her core area of focus is AI and ML, and she frequently speaks at AI and ML conferences and meetups around the world. She has both breadth and depth of experience in technology and the technology industry, with industry expertise in healthcare, the financial sector, and insurance.


The future of quality assurance: Shift-left testing with QyrusAI and Amazon Bedrock

This post is co-written with Ameet Deshpande and Vatsal Saglani from Qyrus.

As businesses embrace accelerated development cycles to stay competitive, maintaining rigorous quality standards can pose a significant challenge. Traditional testing methods, which occur late in the development cycle, often result in delays, increased costs, and compromised quality.

Shift-left testing, which emphasizes earlier testing in the development process, aims to address these issues by identifying and resolving problems sooner. However, effectively implementing this approach requires the right tools. By using advanced AI models, QyrusAI improves testing throughout the development cycle—from generating test cases during the requirements phase to uncovering unexpected issues during application exploration.

In this post, we explore how QyrusAI and Amazon Bedrock are revolutionizing shift-left testing, enabling teams to deliver better software faster. Amazon Bedrock is a fully managed service that allows businesses to build and scale generative AI applications using foundation models (FMs) from leading AI providers. It enables seamless integration with AWS services, offering customization, security, and scalability without managing infrastructure.

QyrusAI: Intelligent testing agents powered by Amazon Bedrock

QyrusAI is a suite of AI-driven testing tools that enhances the software testing process across the entire software development lifecycle (SDLC). Using advanced large language models (LLMs) and vision-language models (VLMs) through Amazon Bedrock, QyrusAI provides a suite of capabilities designed to elevate shift-left testing. Let’s dive into each agent and the cutting-edge models that power them.

TestGenerator

TestGenerator generates initial test cases based on requirements using a suite of advanced models:

  • Meta’s Llama 70B – We use this model to generate test cases by analyzing requirements documents and understanding key entities, user actions, and expected behaviors. With its in-context learning capabilities, we use its natural language understanding to infer possible scenarios and edge cases, creating a comprehensive list of test cases that align with the given requirements.
  • Anthropic’s Claude 3.5 Sonnet – We use this model to evaluate the generated test scenarios, acting as a judge to assess if the scenarios are comprehensive and accurate. We also use it to highlight missing scenarios, potential failure points, or edge cases that might not be apparent in the initial phases. Additionally, we use it to rank test cases based on relevance, helping prioritize the most critical tests covering high-risk areas and key functionalities.
  • Cohere’s English Embed – We use this model to embed text from large documents such as requirement specifications, user stories, or functional requirement documents, enabling efficient semantic search and retrieval.
  • Pinecone on AWS Marketplace – Embedded documents are stored in Pinecone to enable fast and efficient retrieval. During test case generation, these embeddings are used as part of a ReAct agent approach—where the LLM thinks, observes, searches for specific or generic requirements in the document, and generates comprehensive test scenarios. 

The following diagram shows how TestGenerator is deployed on AWS using Amazon Elastic Container Service (Amazon ECS) tasks exposed through Application Load Balancer, using Amazon Bedrock, Amazon Simple Storage Service (Amazon S3), and Pinecone for embedding storage and retrieval to generate comprehensive test cases.

VisionNova

VisionNova is QyrusAI’s design test case generator that crafts design-based test cases using Anthropic’s Claude 3.5 Sonnet. The model is used to analyze design documents and generate precise, relevant test cases. This workflow specializes in understanding UX/UI design documents and translating visual elements into testable scenarios.

The following diagram shows how VisionNova is deployed on AWS using ECS tasks exposed through Application Load Balancer, using Anthropic’s Claude 3 and Claude 3.5 Sonnet models on Amazon Bedrock for image understanding, and using Amazon S3 for storing images, to generate design-based test cases for validating UI/UX elements.

UXtract

UXtract is QyrusAI’s agentic workflow that converts Figma prototypes into test scenarios and steps based on the flow of screens in the prototype.

Figma prototype graphs are used to create detailed test cases with step-by-step instructions. The graph is analyzed to understand the different flows and make sure transitions between elements are validated. Anthropic’s Claude 3 Opus is used to process these transitions to identify potential actions and interactions, and Anthropic’s Claude 3.5 Sonnet is used to generate detailed test steps and instructions based on the transitions and higher-level objectives. This layered approach makes sure that UXtract captures both the functional accuracy of each flow and the granularity needed for effective testing.

The following diagram illustrates how UXtract uses ECS tasks, connected through Application Load Balancer, along with Amazon Bedrock models and Amazon S3 storage, to analyze Figma prototypes and create detailed, step-by-step test cases.

API Builder

API Builder creates virtualized APIs for early frontend testing by using various LLMs from Amazon Bedrock. These models interpret API specifications and generate accurate mock responses, facilitating effective testing before full backend implementation.

The following diagram illustrates how API Builder uses ECS tasks, connected through Application Load Balancer, along with Amazon Bedrock models and Amazon S3 storage, to create a virtualized and highly scalable microservice with dynamic data provisions using Amazon Elastic File System (Amazon EFS) on AWS Lambda compute.

QyrusAI offers a range of additional agents that further enhance the testing process:

  • Echo – Echo generates synthetic test data using a blend of Anthropic’s Claude 3 Sonnet, Mixtral 8x7B Instruct, and Meta’s Llama 70B to provide comprehensive testing coverage.
  • Rover and TestPilot – These multi-agent frameworks are designed for exploratory and objective-based testing, respectively. They use a combination of LLMs, VLMs, and embedding models from Amazon Bedrock to uncover and address issues effectively.
  • Healer – Healer tackles common test failures caused by locator issues by analyzing test scripts and their current state with various LLMs and VLMs to suggest accurate fixes.

These agents, powered by Amazon Bedrock, collaborate to deliver a robust, AI-driven shift-left testing strategy throughout the SDLC.

QyrusAI and Amazon Bedrock

At the core of QyrusAI’s integration with Amazon Bedrock is our custom-developed qai package, which builds upon aiobotocore, aioboto3, and boto3. This unified interface enables our AI agents to seamlessly access the diverse array of LLMs, VLMs, and embedding models available on Amazon Bedrock. The qai package is essential to our AI-powered testing ecosystem, offering several key benefits:

  • Consistent access – The package standardizes interactions with various models on Amazon Bedrock, providing uniformity across our suite of testing agents.
  • DRY principle – By centralizing Amazon Bedrock interaction logic, we’ve minimized code duplication and enhanced system maintainability, reducing the likelihood of errors.
  • Seamless updates – As Amazon Bedrock evolves and introduces new models or features, updating the qai package allows us to quickly integrate these advancements without altering each agent individually.
  • Specialized classes – The package includes distinct class objects for different model types (LLMs and VLMs) and families, optimizing interactions based on model requirements.
  • Out-of-the-box features – In addition to standard and streaming completions, the qai package offers built-in support for multiple and parallel function calling, providing a comprehensive set of capabilities.

Function calling and JSON mode were critical requirements for our AI workflows and agents. To maximize compatibility across the diverse array of models available on Amazon Bedrock, we implemented consistent interfaces for these features in our qai package. Because prompts for generating structured data can differ among LLMs and VLMs, specialized classes were created for various models and model families to provide consistent function calling and JSON mode capabilities. This approach provides a unified interface across the agents, streamlining interactions and enhancing overall efficiency.

The following code is a simplified overview of how we use the qai package to interact with LLMs and VLMs on Amazon Bedrock:

from qai import QAILLMs 

llm = QAILLMs() 

# can be taken from env or parameter store
provider = "Claude" 
model = "anthropic.claude-3-sonnet-20240229-v1:0" 

getattr(llm, provider).llm.__function_call__(model, messages, functions, tool_choice=None, max_tokens=2046)

The shift-left testing paradigm

Shift-left testing allows teams to catch issues sooner and reduce risk. Here’s how QyrusAI agents facilitate the shift-left approach:

  • Requirement analysis – TestGenerator AI generates initial test cases directly from the requirements, setting a strong foundation for quality from the start.
  • Design – VisionNova and UXtract convert Figma designs and prototypes into detailed test cases and functional steps.
  • Pre-implementation – This includes the following features:
    • API Builder creates virtualized APIs, enabling early frontend testing before the backend is fully developed.
    • Echo generates synthetic test data, allowing comprehensive testing without real data dependencies.
  • Implementation – Teams use the pre-generated test cases and virtualized APIs during development, providing continuous quality checks.
  • Testing – This includes the following features:
    • Rover, a multi-agent system, autonomously explores the application to uncover unexpected issues.
    • TestPilot conducts objective-based testing, making sure the application meets its intended goals.
  • Maintenance – QyrusAI supports ongoing regression testing with advanced test management, version control, and reporting features, providing long-term software quality.

The following diagram visually represents how QyrusAI agents integrate throughout the SDLC, from requirement analysis to maintenance, enabling a shift-left testing approach that makes sure issues are caught early and quality is maintained continuously.

AI Agent Integration

QyrusAI’s integrated approach makes sure that testing is proactive, continuous, and seamlessly aligned with every phase of the SDLC. With this approach, teams can:

  • Detect potential issues earlier in the process
  • Lower the cost of fixing bugs
  • Enhance overall software quality
  • Accelerate development timelines

This shift-left strategy, powered by QyrusAI and Amazon Bedrock, enables teams to deliver higher-quality software faster and more efficiently.

A typical shift-left testing workflow with QyrusAI

To make this more tangible, let’s walk through how QyrusAI and Amazon Bedrock can help create and refine test cases from a sample requirements document:

  • A user uploads a sample requirements document.
  • TestGenerator, powered by Meta’s Llama 3.1, processes the document and generates a list of high-level test cases.
  • These test cases are refined by Anthropic’s Claude 3.5 Sonnet to enforce coverage of key business rules.
  • VisionNova and UXtract use design documents from tools like Figma to generate step-by-step UI tests, validating key user journeys.
  • API Builder virtualizes APIs, allowing frontend developers to begin testing the UI with mock responses before the backend is ready.

By following these steps, teams can get ahead of potential issues, creating a safety net that improves both the quality and speed of software development.

The impact of AI-driven shift-left testing

Our data—collected from early adopters of QyrusAI—demonstrates the significant benefits of our AI-driven shift-left approach:

  • 80% reduction in defect leakage – Finding and fixing defects earlier results in fewer bugs reaching production
  • 20% reduction in UAT effort – Comprehensive testing early on means a more stable product reaching the user acceptance testing (UAT) phase
  • 36% faster time to market – Early defect detection, reduced rework, and more efficient testing lead to faster delivery

These metrics have been gathered through a combination of internal testing and pilot programs with select customers. The results consistently show that incorporating AI early in the SDLC can lead to a significant reduction in defects, development costs, and time to market.

Conclusion

Shift-left testing, powered by QyrusAI and Amazon Bedrock, is set to revolutionize the software development landscape. By integrating AI-driven testing across the entire SDLC—from requirements analysis to maintenance—QyrusAI helps teams:

  • Detect and fix issues early – Significantly cut development costs by identifying and resolving problems sooner
  • Enhance software quality – Achieve higher quality through thorough, AI-powered testing
  • Speed up development – Accelerate development cycles without sacrificing quality
  • Adapt to changes – Quickly adjust to evolving requirements and application structures

Amazon Bedrock provides the essential foundation with its advanced language and vision models, offering unparalleled flexibility and capability in software testing. This integration, along with seamless connectivity to other AWS services, enhances scalability, security, and cost-effectiveness.

As the software industry advances, the collaboration between QyrusAI and Amazon Bedrock positions teams at the cutting edge of AI-driven quality assurance. By adopting this shift-left, AI-powered approach, organizations can not only keep pace with today’s fast-moving digital world, but also set new benchmarks in software quality and development efficiency.

If you’re looking to revolutionize your software testing processes, we invite you to reach out to our team and learn more about QyrusAI. Let’s work together to build better software, faster.

To see how QyrusAI can enhance your development workflow, get in touch today at support@qyrus.com. Let’s redefine your software quality with AI-driven shift-left testing.


About the Authors

Ameet Deshpande is Head of Engineering at Qyrus and leads innovation in AI-driven, codeless software testing solutions. With expertise in quality engineering, cloud platforms, and SaaS, he blends technical acumen with strategic leadership. Ameet has spearheaded large-scale transformation programs and consulting initiatives for global clients, including top financial institutions. An electronics and communication engineer specializing in embedded systems, he brings a strong technical foundation to his leadership in delivering transformative solutions.

Vatsal Saglani is a Data Science and Generative AI Lead at Qyrus, where he builds generative AI-powered test automation tools and services using multi-agent frameworks, large language models, and vision-language models. With a focus on fine-tuning advanced AI systems, Vatsal accelerates software development by empowering teams to shift testing left, enhancing both efficiency and software quality.

Siddan Korbu is a Customer Delivery Architect with AWS. He works with enterprise customers to help them build AI/ML and generative AI solutions using AWS services.


Automate video insights for contextual advertising using Amazon Bedrock Data Automation

Contextual advertising, a strategy that matches ads with relevant digital content, has transformed digital marketing by delivering personalized experiences to viewers. However, implementing this approach for streaming video-on-demand (VOD) content poses significant challenges, particularly in ad placement and relevance. Traditional methods rely heavily on manual content analysis. For example, a content analyst might spend hours watching a romantic drama, placing an ad break right after a climactic confession scene, but before the resolution. Then, they manually tag the content with metadata such as romance, emotional, or family-friendly to verify appropriate ad matching. Although this manual process helps create a seamless viewer experience and maintains ad relevance, it proves highly impractical at scale.

Recent advancements in generative AI, particularly multimodal foundation models (FMs), demonstrate advanced video understanding capabilities and offer a promising solution to these challenges. We previously explored this potential in the post Media2Cloud on AWS Guidance: Scene and ad-break detection and contextual understanding for advertising using generative AI, where we demonstrated custom workflows using Amazon Titan Multimodal embeddings G1 models and Anthropic’s Claude FMs from Amazon Bedrock. In this post, we’re introducing an even simpler way to build contextual advertising solutions.

Amazon Bedrock Data Automation (BDA) is a new managed feature powered by FMs in Amazon Bedrock. BDA extracts structured outputs from unstructured content—including documents, images, video, and audio—while alleviating the need for complex custom workflows. In this post, we demonstrate how BDA automatically extracts rich video insights such as chapter segments and audio segments, detects text in scenes, and classifies Interactive Advertising Bureau (IAB) taxonomies, and then uses these insights to build a nonlinear ads solution to enhance contextual advertising effectiveness. A sample Jupyter notebook is available in the following GitHub repository.

Solution overview

Nonlinear ads are digital video advertisements that appear simultaneously with the main video content without interrupting playback. These ads are displayed as overlays, graphics, or rich media elements on top of the video player, typically appearing at the bottom of the screen. The following screenshot is an illustration of the final nonlinear ads solution we will implement in this post.

Example of an overlay ad in the lower third of a video player

The following diagram presents an overview of the architecture and its key components.

The workflow is as follows:

  1. Users upload videos to Amazon Simple Storage Service (Amazon S3).
  2. Each new video invokes an AWS Lambda function that triggers BDA for video analysis. An asynchronous job runs to analyze the video.
  3. The analysis output is stored in an output S3 bucket.
  4. The downstream system (AWS Elemental MediaTailor) can consume the chapter segmentation, contextual insights, and metadata (such as IAB taxonomy) to drive better ad decisions in the video.

For simplicity in our notebook example, we provide a dictionary that maps the metadata to a set of local ad inventory files to be displayed with the video segments. This simulates how MediaTailor interacts with content manifest files and requests replacement ads from the Ad Decision Service.
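
For illustration, that mapping might be as simple as a dictionary keyed on the IAB categories returned by BDA; the category names and file paths below are hypothetical.

# Hypothetical mapping from IAB categories detected by BDA to local overlay ad creatives
ad_inventory = {
    'Automotive': ['ads/auto_banner_01.png', 'ads/auto_banner_02.png'],
    'Travel': ['ads/travel_overlay.png'],
    'Food & Drink': ['ads/restaurant_promo.png'],
}

def pick_ad(iab_categories):
    # Return the first creative whose category matches the chapter's IAB labels
    for category in iab_categories:
        if category in ad_inventory:
            return ad_inventory[category][0]
    return 'ads/default_overlay.png'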

Prerequisites

The following prerequisites are needed to run the notebooks and follow along with the examples in this post:

Video analysis using BDA

Thanks to BDA, processing and analyzing videos has become significantly simpler. The workflow consists of three main steps: creating a project, invoking the analysis, and retrieving analysis results. The first step—creating a project—establishes a reusable configuration template for your analysis tasks. Within the project, you define the types of analyses you want to perform and how you want the results structured. To create a project, use the create_data_automation_project API from the BDA boto3 client. This function returns a dataAutomationProjectArn, which you will need to include with each runtime invocation.

{
    'projectArn': 'string',
    'projectStage': 'DEVELOPMENT'|'LIVE',
    'status': 'COMPLETED'|'IN_PROGRESS'|'FAILED'
}

Upon project completion (status: COMPLETED), you can use the invoke_data_automation_async API from the BDA runtime client to start video analysis. This API requires input/output S3 locations and a cross-Region profile ARN in your request. BDA requires cross-Region inference support for all file processing tasks, automatically selecting the optimal AWS Region within your geography to maximize compute resources and model availability. This mandatory feature helps provide optimal performance and customer experience at no additional cost. You can also optionally configure Amazon EventBridge notifications for job tracking (for more details, see Tutorial: Send an email when events happen using Amazon EventBridge). After it’s triggered, the process immediately returns a job ID while continuing processing in the background.

# region and account_id are assumed to be defined earlier in the notebook
default_profile_arn = f"arn:aws:bedrock:{region}:{account_id}:data-automation-profile/us.data-automation-v1"

response = bda_runtime_client.invoke_data_automation_async(
    inputConfiguration={
        's3Uri': f's3://{data_bucket}/{s3_key}'
    },
    outputConfiguration={
        's3Uri': f's3://{data_bucket}/{output_prefix}'
    },
    dataAutomationConfiguration={
        'dataAutomationProjectArn': dataAutomationProjectArn,
        'stage': 'DEVELOPMENT'
    },
    notificationConfiguration={
        'eventBridgeConfiguration': {
            'eventBridgeEnabled': False
        }
    },
    dataAutomationProfileArn=default_profile_arn
)
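
To complete the third step and retrieve the analysis results, you can poll the job status and then read the result files from the output S3 location. The following minimal sketch reuses the bda_runtime_client and the invoke response from the previous snippet, and assumes the get_data_automation_status API with a terminal status of Success; verify the status values and field names in your environment:

import time

invocation_arn = response["invocationArn"]

# Poll the asynchronous job until it reaches a terminal state (a simple
# loop for illustration; EventBridge notifications are a better fit for
# production workloads)
while True:
    status_response = bda_runtime_client.get_data_automation_status(
        invocationArn=invocation_arn
    )
    status = status_response["status"]
    if status not in ("Created", "InProgress"):
        break
    time.sleep(10)

if status == "Success":
    # The response points at the metadata written to the output S3 location
    output_uri = status_response["outputConfiguration"]["s3Uri"]
    print(f"BDA results available at: {output_uri}")
else:
    print(f"BDA job ended with status {status}")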

BDA standard outputs for video

Let’s explore the outputs from BDA for video analysis. Understanding these outputs is essential to knowing what types of insights BDA provides and how to use them to build our contextual advertising solution. The following diagram illustrates the key components of a video; each component defines a granularity level at which BDA analyzes the video content.

The key components are as follows:

  • Frame – A single still image that creates the illusion of motion when displayed in rapid succession with other frames in a video.
  • Shot – A continuous series of frames recorded from the moment the camera starts rolling until it stops.
  • Chapter – A sequence of shots that forms a coherent unit of action or narrative within the video, or a continuous conversation topic. BDA determines chapter boundaries by first classifying the video as either visually heavy (such as movies or episodic content) or audio heavy (such as news or presentations). Based on this classification, it then decides whether to establish boundaries using visual-based shot sequences or audio-based conversation topics.
  • Video – The complete content that enables analysis at the full video level.

Video-level analysis

Now that we’ve defined the video granularity terms, let’s examine the insights BDA provides. At the full video level, BDA generates a comprehensive summary that delivers a concise overview of the video’s key themes and main content. The system also performs speaker identification, attempting to derive speakers’ names from audible cues (for example, “I’m Jane Doe”) or visual cues on the screen whenever possible. To illustrate this capability, we can examine the following full video summary that BDA generated for the short film Meridian:

In a series of mysterious disappearances along a stretch of road above El Matador Beach, three seemingly unconnected men vanished without a trace. The victims – a school teacher, an insurance salesman, and a retiree – shared little in common except for being divorced, with no significant criminal records or ties to criminal organizations…Detective Sullivan investigates the cases, initially dismissing the possibility of suicide due to the absence of bodies. A key breakthrough comes from a credible witness who was walking his dog along the bluffs on the day of the last disappearance. The witness described seeing a man atop a massive rock formation at the shoreline, separated from the mainland. The man appeared to be searching for something or someone when suddenly, unprecedented severe weather struck the area with thunder and lightning….The investigation takes another turn when Captain Foster of the LAPD arrives at the El Matador location, discovering that Detective Sullivan has also gone missing. The case becomes increasingly complex as the connection between the disappearances, the mysterious woman, and the unusual weather phenomena remains unexplained.

Along with the summary, BDA generates a complete audio transcript that includes speaker identification. This transcript captures the spoken content while noting who is speaking throughout the video. The following is an example of a transcript generated by BDA from the Meridian short film:

[spk_0]: So these guys just disappeared.
[spk_1]: Yeah, on that stretch of road right above El Matador. You know it. With the big rock. That’s right, yeah.
[spk_2]: You know, Mickey Cohen used to take his associates out there, get him a bond voyage.

Chapter-level analysis

BDA performs detailed analysis at the chapter level by generating comprehensive chapter summaries. Each chapter summary includes specific start and end timestamps to precisely mark the chapter’s duration. Additionally, when relevant, BDA applies IAB categories to classify the chapter’s content. These IAB categories are part of a standardized classification system created for organizing and mapping publisher content, which serves multiple purposes, including advertising targeting, internet security, and content filtering. The following example demonstrates a typical chapter-level analysis:

[00:00:20;04 – 00:00:23;01] Automotive, Auto Type
The video showcases a vintage urban street scene from the mid-20th century. The focal point is the Florentine Gardens building, an ornate structure with a prominent sign displaying “Florentine GARDENS” and “GRUEN Time”. The building’s facade features decorative elements like columns and arched windows, giving it a grand appearance. Palm trees line the sidewalk in front of the building, adding to the tropical ambiance. Several vintage cars are parked along the street, including a yellow taxi cab and a black sedan. Pedestrians can be seen walking on the sidewalk, contributing to the lively atmosphere. The overall scene captures the essence of a bustling city environment during that era.

For a comprehensive list of supported IAB taxonomy categories, see Videos.

Also at the chapter level, BDA produces detailed audio transcriptions with precise timestamps for each spoken segment. These granular transcriptions are particularly useful for closed captioning and subtitling tasks. The following is an example of a chapter-level transcription:

[26.85 – 29.59] So these guys just disappeared.
[30.93 – 34.27] Yeah, on that stretch of road right above El Matador.
[35.099 – 35.959] You know it.
[36.49 – 39.029] With the big rock. That’s right, yeah.
[40.189 – 44.86] You know, Mickey Cohen used to take his associates out there, get him a bond voyage.

Shot- and frame-level insights

At a more granular level, BDA provides frame-accurate timestamps for shot boundaries. The system also performs text detection and logo detection on individual frames, generating bounding boxes around detected text and logos along with confidence scores for each detection. The following image is an example of text bounding boxes extracted from the Meridian video.

Contextual advertising solution

Let’s apply the insights extracted from BDA to power nonlinear ad solutions. Unlike traditional linear advertising that relies on predetermined time slots, nonlinear advertising enables dynamic ad placement based on content context. At the chapter level, BDA automatically segments videos and provides detailed insights including content summaries, IAB categories, and precise timestamps. These insights serve as intelligent markers for ad placement opportunities, allowing advertisers to target specific chapters that align with their promotional content.

In this example, we prepared a list of ad images and mapped each of them to specific IAB categories. When BDA identifies IAB categories at the chapter level, the system automatically matches and selects the most relevant ad from the list to display as an overlay banner during that chapter. In the following example, when BDA identifies a scene with a car driving on a country road (IAB categories: Automotive, Travel), the system selects an ad showing a suitcase at an airport from the pre-mapped ad inventory and displays it. This automated matching process promotes precise ad placement while maintaining an optimal viewer experience.

Example of an overlay ad in the lower third of a video player
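
As a sketch of this matching step, reusing the ad_inventory mapping introduced earlier, the selection logic can be a simple lookup over the chapter's IAB categories (the chapter field names here are illustrative, not the exact BDA output schema):

def select_overlay_ad(chapter_iab_categories, ad_inventory, default_ad):
    """Return the first ad creative whose IAB category matches the chapter."""
    for category in chapter_iab_categories:
        if category in ad_inventory:
            return ad_inventory[category]
    return default_ad

# Example: a chapter tagged as Travel picks the airport suitcase creative
overlay_ad = select_overlay_ad(["Travel"], ad_inventory, default_ad)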

Clean up

Follow the instructions in the cleanup section of the notebook to delete the projects and resources provisioned to avoid unnecessary charges. Refer to Amazon Bedrock pricing for details regarding BDA cost.

Conclusion

Amazon Bedrock Data Automation, powered by foundation models from Amazon Bedrock, marks a significant advancement in video analysis. BDA minimizes the complex orchestration layers previously required for extracting deep insights from video content, transforming what was once a sophisticated technical challenge into a streamlined, managed solution. This breakthrough empowers media companies to deliver more engaging, personalized advertising experiences while significantly reducing operational overhead. We encourage you to explore the sample Jupyter notebook provided in the GitHub repository to experience BDA firsthand and discover additional BDA use cases across other modalities in the following resources:


About the authors

James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.

Alex Burkleaux is a Senior AI/ML Specialist Solution Architect at AWS. She helps customers use AI Services to build media solutions using Generative AI. Her industry experience includes over-the-top video, database management systems, and reliability engineering.


How Salesforce achieves high-performance model deployment with Amazon SageMaker AI

This post is a joint collaboration between Salesforce and AWS and is being cross-published on both the Salesforce Engineering Blog and the AWS Machine Learning Blog.

The Salesforce AI Model Serving team is working to push the boundaries of natural language processing and AI capabilities for enterprise applications. Their key focus areas include optimizing large language models (LLMs) by integrating cutting-edge solutions, collaborating with leading technology providers, and driving performance enhancements that impact Salesforce’s AI-driven features. The AI Model Serving team supports a wide range of models for both traditional machine learning (ML) and generative AI including LLMs, multi-modal foundation models (FMs), speech recognition, and computer vision-based models. Through innovation and partnerships with leading technology providers, this team enhances performance and capabilities, tackling challenges such as throughput and latency optimization and secure model deployment for real-time AI applications. They accomplish this through evaluation of ML models across multiple environments and extensive performance testing to achieve scalability and reliability for inferencing on AWS.

The team is responsible for the end-to-end process of gathering requirements and performance objectives, hosting, optimizing, and scaling AI models, including LLMs, built by Salesforce’s data science and research teams. This includes optimizing the models to achieve high throughput and low latency and deploying them quickly through automated, self-service processes across multiple AWS Regions.

In this post, we share how the AI Model Serving team achieved high-performance model deployment using Amazon SageMaker AI.

Key challenges

The team faces several challenges in deploying models for Salesforce. One example is balancing latency and throughput while achieving cost-efficiency when scaling these models based on demand. Maintaining performance and scalability while minimizing serving costs is vital across the entire inference lifecycle. Inference optimization is a crucial aspect of this process, because the models and their hosting environments must be fine-tuned to meet price-performance requirements in real-time AI applications. Salesforce’s fast-paced AI innovation requires the team to constantly evaluate new models (proprietary, open source, or third-party) across diverse use cases. They then have to quickly deploy these models to stay in cadence with their product teams’ go-to-market motions. Finally, the models must be hosted securely, and customer data must be protected to abide by Salesforce’s commitment to providing a trusted and secure platform.

Solution overview

To support such a critical function for Salesforce AI, the team developed a hosting framework on AWS to simplify their model lifecycle, allowing them to quickly and securely deploy models at scale while optimizing for cost. The following diagram illustrates the solution workflow.

Managing performance and scalability

Managing scalability in the project involves balancing performance with efficiency and resource management. With SageMaker AI, the team supports distributed inference and multi-model deployments, preventing memory bottlenecks and reducing hardware costs. SageMaker AI provides access to advanced GPUs, supports multi-model deployments, and enables intelligent batching strategies to balance throughput with latency. This flexibility makes sure performance improvements don’t compromise scalability, even in high-demand scenarios. To learn more, see Revolutionizing AI: How Amazon SageMaker Enhances Einstein’s Large Language Model Latency and Throughput.

Accelerating development with SageMaker Deep Learning Containers

SageMaker AI Deep Learning Containers (DLCs) play a crucial role in accelerating model development and deployment. These pre-built containers come with optimized deep learning frameworks and best-practice configurations, providing a head start for AI teams. DLCs provide optimized library versions, preconfigured CUDA settings, and other performance enhancements that improve inference speeds and efficiency. This significantly reduces the setup and configuration overhead, allowing engineers to focus on model optimization rather than infrastructure concerns.

Best practice configurations for deployment in SageMaker AI

A key advantage of using SageMaker AI is the best practice configurations for deployment. SageMaker AI provides default parameters for setting GPU utilization and memory allocation, which simplifies the process of configuring high-performance inference environments. These features make it straightforward to deploy optimized models with minimal manual intervention, providing high availability and low-latency responses.

The team uses the DLC’s rolling-batch capability, which optimizes request batching to maximize throughput while maintaining low latency. SageMaker AI DLCs expose configurations for rolling batch inference with best-practice defaults, simplifying the implementation process. By adjusting parameters such as max_rolling_batch_size and job_queue_size, the team was able to fine-tune performance without extensive custom engineering. This streamlined approach provides optimal GPU utilization while maintaining real-time response requirements.
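
As an illustration only, rolling-batch settings like these typically live in a DJL-Serving serving.properties file. The values below are placeholders rather than recommended defaults, and the appropriate engine and batching backend depend on the model being served:

# Illustrative serving.properties for a DJL-Serving (LMI) container
engine=Python
option.rolling_batch=lmi-dist
option.tensor_parallel_degree=4
option.max_rolling_batch_size=64
job_queue_size=1000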

SageMaker AI provides elastic load balancing, instance scaling, and real-time model monitoring, and provides Salesforce control over scaling and routing strategies to suit their needs. These measures maintain consistent performance across environments while optimizing scalability, performance, and cost-efficiency.

Because the team supports multiple simultaneous deployments across projects, they needed to make sure enhancements in each project didn’t compromise others. To address this, they adopted a modular development approach. The SageMaker AI DLC architecture is designed with modular components such as the engine abstraction layer, model store, and workload manager. This structure allows the team to isolate and optimize individual components on the container, like rolling batch inference for throughput, without disrupting critical functionality such as latency or multi-framework support. It also lets individual project teams work on areas such as performance tuning while others focus in parallel on enabling other functionality such as streaming.

This cross-functional collaboration is complemented by comprehensive testing. The Salesforce AI model team implemented continuous integration (CI) pipelines using a mix of internal and external tools such as Jenkins and Spinnaker to detect any unintended side effects early. Regression testing made sure that optimizations, such as deploying models with TensorRT or vLLM, didn’t negatively impact scalability or user experience. Regular reviews, involving collaboration between the development, foundation model operations (FMOps), and security teams, made sure that optimizations aligned with project-wide objectives.

Configuration management is also part of the CI pipeline. To be precise, configuration is stored in git alongside inference code. Configuration management using simple YAML files enabled rapid experimentation across optimizers and hyperparameters without altering the underlying code. These practices made sure that performance or security improvements were well-coordinated and didn’t introduce trade-offs in other areas.
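
The exact schema Salesforce uses is internal, but as a purely hypothetical illustration, such a YAML file might capture the optimizer and key serving parameters for a single experiment:

# Hypothetical experiment configuration; all keys and values are illustrative
model: example-llm
optimizer: vllm                # e.g., vllm, tensorrt-llm, lmi-dist
tensor_parallel_degree: 4
max_rolling_batch_size: 64
dtype: fp16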

Maintaining security through rapid deployment

Balancing rapid deployment with high standards of trust and security requires embedding security measures throughout the development and deployment lifecycle. Secure-by-design principles are adopted from the outset, making sure that security requirements are integrated into the architecture. Rigorous testing of all models is conducted in development environments alongside performance testing to provide scalable performance and security before production.

To maintain these high standards throughout the development process, the team employs several strategies:

  • Automated continuous integration and delivery (CI/CD) pipelines with built-in checks for vulnerabilities, compliance validation, and model integrity
  • Employing DJL-Serving’s encryption mechanisms for data in transit and at rest
  • Using AWS services like SageMaker AI that provide enterprise-grade security features such as role-based access control (RBAC) and network isolation

Frequent automated testing for both performance and security is employed through small incremental deployments, allowing for early issue identification while minimizing risks. Collaboration with cloud providers and continuous monitoring of deployments maintain compliance with the latest security standards and make sure rapid deployment aligns seamlessly with robust security, trust, and reliability.

Focus on continuous improvement

As Salesforce’s generative AI needs scale, and with the ever-changing model landscape, the team continually works to improve their deployment infrastructure—ongoing research and development efforts are centered on enhancing the performance, scalability, and efficiency of LLM deployments. The team is exploring new optimization techniques with SageMaker, including:

  • Advanced quantization methods (INT-4, AWQ, FP8)
  • Tensor parallelism (splitting tensors across multiple GPUs)
  • More efficient batching using caching strategies within DJL-Serving to boost throughput and reduce latency

The team is also investigating emerging technologies like AWS AI chips (AWS Trainium and AWS Inferentia) and AWS Graviton processors to further improve cost and energy efficiency. Collaboration with open source communities and public cloud providers like AWS makes sure that the latest advancements are incorporated into deployment pipelines while also pushing the boundaries further. Salesforce is collaborating with AWS to add advanced features to DJL, such as additional configuration parameters, environment variables, and more granular metrics for logging, making it even more capable and robust. A key focus is refining multi-framework support and distributed inference capabilities to provide seamless model integration across various environments.

Efforts are also underway to enhance FMOps practices, such as automated testing and deployment pipelines, to expedite production readiness. These initiatives aim to stay at the forefront of AI innovation, delivering cutting-edge solutions that align with business needs and meet customer expectations. They are in close collaboration with the SageMaker team to continue to explore potential features and capabilities to support these areas.

Conclusion

Though exact metrics vary by use case, the Salesforce AI Model Serving team saw substantial improvements in terms of deployment speed and cost-efficiency with their strategy on SageMaker AI. They experienced faster iteration cycles, measured in days or even hours instead of weeks. With SageMaker AI, they reduced their model deployment time by as much as 50%.

To learn more about how SageMaker AI enhances Einstein’s LLM latency and throughput, see Revolutionizing AI: How Amazon SageMaker Enhances Einstein’s Large Language Model Latency and Throughput. For more information on how to get started with SageMaker AI, refer to Guide to getting set up with Amazon SageMaker AI.


About the authors

Sai Guruju is working as a Lead Member of Technical Staff at Salesforce. He has over 7 years of experience in software and ML engineering with a focus on scalable NLP and speech solutions. He completed his Bachelor of Technology in EE from IIT-Delhi, and has published his work at InterSpeech 2021 and AdNLP 2024.

Nitin Surya is working as a Lead Member of Technical Staff at Salesforce. He has over 8 years of experience in software and machine learning engineering. He completed his Bachelor of Technology in CS from VIT University and an MS in CS (with a major in Artificial Intelligence and Machine Learning) from the University of Illinois Chicago. He has three patents pending, and has published and contributed to papers at the CoRL Conference.

Srikanta Prasad is a Senior Manager in Product Management specializing in generative AI solutions, with over 20 years of experience across semiconductors, aerospace, aviation, print media, and software technology. At Salesforce, he leads model hosting and inference initiatives, focusing on LLM inference serving, LLMOps, and scalable AI deployments. Srikanta holds an MBA from the University of North Carolina and an MS from the National University of Singapore.

Rielah De Jesus is a Principal Solutions Architect at AWS who has successfully helped various enterprise customers in the DC, Maryland, and Virginia area move to the cloud. In her current role, she acts as a customer advocate and technical advisor focused on helping organizations like Salesforce achieve success on the AWS platform. She is also a staunch supporter of women in IT and is very passionate about finding ways to creatively use technology and data to solve everyday challenges.


Automate Amazon EKS troubleshooting using an Amazon Bedrock agentic workflow

As organizations scale their Amazon Elastic Kubernetes Service (Amazon EKS) deployments, platform administrators face increasing challenges in efficiently managing multi-tenant clusters. Tasks such as investigating pod failures, addressing resource constraints, and resolving misconfiguration can consume significant time and effort. Instead of spending valuable engineering hours manually parsing logs, tracking metrics, and implementing fixes, teams should focus on driving innovation. Now, with the power of generative AI, you can transform your Kubernetes operations. By implementing intelligent cluster monitoring, pattern analysis, and automated remediation, you can dramatically reduce both mean time to identify (MTTI) and mean time to resolve (MTTR) for common cluster issues.

At AWS re:Invent 2024, we announced the multi-agent collaboration capability for Amazon Bedrock (preview). With multi-agent collaboration, you can build, deploy, and manage multiple AI agents working together on complex multistep tasks that require specialized skills. Because troubleshooting an EKS cluster involves deriving insights from multiple observability signals and applying fixes using a continuous integration and deployment (CI/CD) pipeline, a multi-agent workflow can help an operations team streamline the management of EKS clusters. The workflow manager agent can integrate with individual agents that interface with individual observability signals and a CI/CD workflow to orchestrate and perform tasks based on user prompt.

In this post, we demonstrate how to orchestrate multiple Amazon Bedrock agents to create a sophisticated Amazon EKS troubleshooting system. By enabling collaboration between specialized agents—deriving insights from K8sGPT and performing actions through the ArgoCD framework—you can build a comprehensive automation that identifies, analyzes, and resolves cluster issues with minimal human intervention.

Solution overview

The architecture consists of the following core components:

  • Amazon Bedrock collaborator agent – Orchestrates the workflow and maintains context while routing user prompts to specialized agents, managing multistep operations and agent interactions
  • Amazon Bedrock agent for K8sGPT – Evaluates cluster and pod events through K8sGPT’s Analyze API for security issues, misconfigurations, and performance problems, providing remediation suggestions in natural language
  • Amazon Bedrock agent for ArgoCD – Manages GitOps-based remediation through ArgoCD, handling rollbacks, resource optimization, and configuration updates

The following diagram illustrates the solution architecture.

Architecture Diagram

Prerequisites

You need to have the following prerequisites in place:

Set up the Amazon EKS cluster with K8sGPT and ArgoCD

We start with installing and configuring the K8sGPT operator and ArgoCD controller on the EKS cluster.

The K8sGPT operator will help with enabling AI-powered analysis and troubleshooting of cluster issues. For example, it can automatically detect and suggest fixes for misconfigured deployments, such as identifying and resolving resource constraint problems in pods.

ArgoCD is a declarative GitOps continuous delivery tool for Kubernetes that automates the deployment of applications by keeping the desired application state in sync with what’s defined in a Git repository.

The Amazon Bedrock agent serves as the intelligent decision-maker in our architecture, analyzing cluster issues detected by K8sGPT. After the root cause is identified, the agent orchestrates corrective actions through ArgoCD’s GitOps engine. This powerful integration means that when problems are detected (whether it’s a misconfigured deployment, resource constraints, or scaling issue), the agent can automatically integrate with ArgoCD to provide the necessary fixes. ArgoCD then picks up these changes and synchronizes them with your EKS cluster, creating a truly self-healing infrastructure.

  1. Create the necessary namespaces in Amazon EKS:
    kubectl create ns helm-guestbook
    kubectl create ns k8sgpt-operator-system
  2. Add the k8sgpt Helm repository and install the operator:
    helm repo add k8sgpt https://charts.k8sgpt.ai/
    helm repo update
    helm install k8sgpt-operator k8sgpt/k8sgpt-operator \
      --namespace k8sgpt-operator-system
  3. You can verify the installation by entering the following command:
    kubectl get pods -n k8sgpt-operator-system
    
    NAME                                                          READY   STATUS    RESTARTS  AGE
    release-k8sgpt-operator-controller-manager-5b749ffd7f-7sgnd   2/2     Running   0         1d
    

After the operator is deployed, you can configure a K8sGPT resource. This Custom Resource Definition (CRD) holds the large language model (LLM) configuration that powers the AI-driven analysis and troubleshooting of cluster issues. K8sGPT supports various backends for this analysis. For this post, we use Amazon Bedrock as the backend and Anthropic’s Claude V3 as the LLM.

  1. Create a pod identity association to give the k8sgpt service account in the EKS cluster access to Amazon Bedrock:
    eksctl create podidentityassociation  --cluster PetSite --namespace k8sgpt-operator-system --service-account-name k8sgpt  --role-name k8sgpt-app-eks-pod-identity-role --permission-policy-arns arn:aws:iam::aws:policy/AmazonBedrockFullAccess  --region $AWS_REGION
  2. Configure the K8sGPT CRD:
    cat << EOF > k8sgpt.yaml
    apiVersion: core.k8sgpt.ai/v1alpha1
    kind: K8sGPT
    metadata:
      name: k8sgpt-bedrock
      namespace: k8sgpt-operator-system
    spec:
      ai:
        enabled: true
        model: anthropic.claude-v3
        backend: amazonbedrock
        region: us-east-1
        credentials:
          secretRef:
            name: k8sgpt-secret
            namespace: k8sgpt-operator-system
      noCache: false
      repository: ghcr.io/k8sgpt-ai/k8sgpt
      version: v0.3.48
    EOF
    
    kubectl apply -f k8sgpt.yaml
    
  3. Validate the settings to confirm the k8sgpt-bedrock pod is running successfully:
    kubectl get pods -n k8sgpt-operator-system
    NAME                                                          READY   STATUS    RESTARTS      AGE
    k8sgpt-bedrock-5b655cbb9b-sn897                               1/1     Running   9 (22d ago)   22d
    release-k8sgpt-operator-controller-manager-5b749ffd7f-7sgnd   2/2     Running   3 (10h ago)   22d
    
  4. Now you can configure the ArgoCD controller:
    helm repo add argo https://argoproj.github.io/argo-helm
    helm repo update
    kubectl create namespace argocd
    helm install argocd argo/argo-cd \
      --namespace argocd \
      --create-namespace
  5. Verify the ArgoCD installation:
    kubectl get pods -n argocd
    NAME                                                READY   STATUS    RESTARTS   AGE
    argocd-application-controller-0                     1/1     Running   0          43d
    argocd-applicationset-controller-5c787df94f-7jpvp   1/1     Running   0          43d
    argocd-dex-server-55d5769f46-58dwx                  1/1     Running   0          43d
    argocd-notifications-controller-7ccbd7fb6-9pptz     1/1     Running   0          43d
    argocd-redis-587d59bbc-rndkp                        1/1     Running   0          43d
    argocd-repo-server-76f6c7686b-rhjkg                 1/1     Running   0          43d
    argocd-server-64fcc786c-bd2t8                       1/1     Running   0          43d
  6. Patch the argocd service to have an external load balancer:
    kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'
  7. You can now access the ArgoCD UI with the following load balancer endpoint and the credentials for the admin user:
    kubectl get svc argocd-server -n argocd
    NAME            TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)                      AGE
    argocd-server   LoadBalancer   10.100.168.229   a91a6fd4292ed420d92a1a5c748f43bc-653186012.us-east-1.elb.amazonaws.com   80:32334/TCP,443:32261/TCP   43d
  8. Retrieve the credentials for the ArgoCD UI:
    export argocdpassword=`kubectl -n argocd get secret argocd-initial-admin-secret \
    -o jsonpath="{.data.password}" | base64 -d`
    
    echo ArgoCD admin password - $argocdpassword
  9. Push the credentials to AWS Secrets Manager:
    aws secretsmanager create-secret \
    --name argocdcreds \
    --description "Credentials for argocd" \
    --secret-string "{\"USERNAME\":\"admin\",\"PASSWORD\":\"$argocdpassword\"}"
  10. Configure a sample application in ArgoCD:
    cat << EOF > argocd-application.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: helm-guestbook
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/awsvikram/argocd-example-apps
        targetRevision: HEAD
        path: helm-guestbook
      destination:
        server: https://kubernetes.default.svc
        namespace: helm-guestbook
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
    EOF
  11. Apply the configuration and verify it from the ArgoCD UI by logging in as the admin user:
    kubectl apply -f argocd-application.yaml

    ArgoCD Application

  12. It takes some time for K8sGPT to analyze the newly created pods. To trigger the analysis immediately, restart the pods in the k8sgpt-operator-system namespace by entering the following command:
    kubectl -n k8sgpt-operator-system rollout restart deploy
    
    deployment.apps/k8sgpt-bedrock restarted
    deployment.apps/k8sgpt-operator-controller-manager restarted

Set up the Amazon Bedrock agents for K8sGPT and ArgoCD

We use a CloudFormation stack to deploy the individual agents into the US East (N. Virginia) Region. The template provisions several resources, and costs will be incurred for the AWS resources used.

Use the following parameters for the CloudFormation template:

  • EnvironmentName: The name for the deployment (EKSBlogSetup)
  • ArgoCD_LoadBalancer_URL: The ArgoCD load balancer URL, which you can extract with the following command:
    kubectl  get service argocd-server -n argocd -ojsonpath="{.status.loadBalancer.ingress[0].hostname}"
  • AWSSecretName: The Secrets Manager secret name that was created to store ArgoCD credentials

The stack creates the following AWS Lambda functions:

  • <Stack name>-LambdaK8sGPTAgent-<auto-generated>
  • <Stack name>-RestartRollBackApplicationArgoCD-<auto-generated>
  • <Stack name>-ArgocdIncreaseMemory-<auto-generated>

The stack creates the following Amazon Bedrock agents:

  • ArgoCDAgent, with the following action groups:
    1. argocd-rollback
    2. argocd-restart
    3. argocd-memory-management
  • K8sGPTAgent, with the following action group:
    1. k8s-cluster-operations
  • CollaboratorAgent

The stack produces the following outputs:

  • LambdaK8sGPTAgentRole, the AWS Identity and Access Management (IAM) role Amazon Resource Name (ARN) associated with the Lambda function handling interactions with the K8sGPT agent on the EKS cluster. You will need this role ARN at a later stage of the configuration process.
  • K8sGPTAgentAliasId, the ID of the K8sGPT Amazon Bedrock agent alias
  • ArgoCDAgentAliasId, the ID of the ArgoCD Amazon Bedrock agent alias
  • CollaboratorAgentAliasId, the ID of the collaborator Amazon Bedrock agent alias, which has the ArgoCDAgent and K8sGPTAgent agents associated with it

Assign appropriate permissions to enable K8sGPT Amazon Bedrock agent to access the EKS cluster

To enable the K8sGPT Amazon Bedrock agent to access the EKS cluster, you need to configure the appropriate IAM permissions using Amazon EKS access management APIs. This is a two-step process: first, you create an access entry for the Lambda function’s execution role (which you can find in the CloudFormation template output section), and then you associate the AmazonEKSViewPolicy to grant read-only access to the cluster. This configuration makes sure that the K8sGPT agent has the necessary permissions to monitor and analyze the EKS cluster resources while maintaining the principle of least privilege.

  1. Create an access entry for the Lambda function’s execution role:
    export CFN_STACK_NAME=EKS-Troubleshooter
    export EKS_CLUSTER=PetSite
    
    export K8SGPT_LAMBDA_ROLE=`aws cloudformation describe-stacks --stack-name $CFN_STACK_NAME --query "Stacks[0].Outputs[?OutputKey=='LambdaK8sGPTAgentRole'].OutputValue" --output text`
    
    aws eks create-access-entry \
        --cluster-name $EKS_CLUSTER \
        --principal-arn $K8SGPT_LAMBDA_ROLE
  2. Associate the EKS view policy with the access entry:
    aws eks associate-access-policy \
        --cluster-name $EKS_CLUSTER \
        --principal-arn $K8SGPT_LAMBDA_ROLE \
        --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy \
        --access-scope type=cluster
  3. Verify the Amazon Bedrock agents. The CloudFormation template adds all three required agents. To view the agents, on the Amazon Bedrock console, under Builder tools in the navigation pane, select Agents, as shown in the following screenshot.

Bedrock agents

Perform Amazon EKS troubleshooting using the Amazon Bedrock agentic workflow

Now, test the solution. We explore the following two scenarios:

  1. The agent coordinates with the K8sGPT agent to provide insights into the root cause of a pod failure
  2. The collaborator agent coordinates with the ArgoCD agent to provide a response

Agent coordinates with K8sGPT agent to provide insights into the root cause of a pod failure

In this section, we examine a down alert for a sample application called memory-demo. We’re interested in the root cause of the issue. We use the following prompt: “We got a down alert for the memory-demo app. Help us with the root cause of the issue.”
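
You can submit this prompt from the agent test window on the Amazon Bedrock console, or programmatically with the bedrock-agent-runtime client. The following minimal sketch uses the invoke_agent API; the agent ID and alias ID are placeholders that you would replace with the collaborator agent values from the CloudFormation stack outputs:

import uuid

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.invoke_agent(
    agentId="COLLABORATOR_AGENT_ID",          # placeholder: collaborator agent ID
    agentAliasId="COLLABORATOR_AGENT_ALIAS",  # placeholder: CollaboratorAgentAliasId output
    sessionId=str(uuid.uuid4()),
    inputText="We got a down alert for the memory-demo app. "
              "Help us with the root cause of the issue.",
)

# The response is a stream of events; concatenate the completion chunks
completion = ""
for event in response["completion"]:
    if "chunk" in event:
        completion += event["chunk"]["bytes"].decode("utf-8")
print(completion)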

The agent not only stated the root cause, but went one step further to potentially fix the error, which in this case is increasing memory resources to the application.

K8sgpt agent finding

Collaborator agent coordinates with ArgoCD agent to provide a response

For this scenario, we continue from the previous prompt. We feel the application wasn’t provided enough memory, and it should be increased to permanently fix the issue. We can also tell the application is in an unhealthy state in the ArgoCD UI, as shown in the following screenshot.

ArgoUI

Let’s now proceed to increase the memory, as shown in the following screenshot.

Interacting with agent to increase memory

The agent interacted with the argocd_operations Amazon Bedrock agent and was able to successfully increase the memory. The same can be inferred in the ArgoCD UI.

ArgoUI showing memory increase

Cleanup

If you decide to stop using the solution, complete the following steps:

  1. To delete the associated resources deployed using AWS CloudFormation:
    1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
    2. Locate the stack you created during the deployment process (you assigned a name to it).
    3. Select the stack and choose Delete.
  2. Delete the EKS cluster if you created one specifically for this implementation.

Conclusion

By orchestrating multiple Amazon Bedrock agents, we’ve demonstrated how to build an AI-powered Amazon EKS troubleshooting system that simplifies Kubernetes operations. This integration of K8sGPT analysis and ArgoCD deployment automation showcases the powerful possibilities when combining specialized AI agents with existing DevOps tools. Although this solution represents advancement in automated Kubernetes operations, it’s important to remember that human oversight remains valuable, particularly for complex scenarios and strategic decisions.

As Amazon Bedrock and its agent capabilities continue to evolve, we can expect even more sophisticated orchestration possibilities. You can extend this solution to incorporate additional tools, metrics, and automation workflows to meet your organization’s specific needs.

To learn more about Amazon Bedrock, refer to the following resources:


About the authors

Vikram Venkataraman is a Principal Specialist Solutions Architect at Amazon Web Services (AWS). He helps customers modernize, scale, and adopt best practices for their containerized workloads. With the emergence of Generative AI, Vikram has been actively working with customers to leverage AWS’s AI/ML services to solve complex operational challenges, streamline monitoring workflows, and enhance incident response through intelligent automation.

Puneeth Ranjan Komaragiri is a Principal Technical Account Manager at Amazon Web Services (AWS). He is particularly passionate about monitoring and observability, cloud financial management, and generative AI domains. In his current role, Puneeth enjoys collaborating closely with customers, leveraging his expertise to help them design and architect their cloud workloads for optimal scale and resilience.

Sudheer Sangunni is a Senior Technical Account Manager at AWS Enterprise Support. With his extensive expertise in the AWS Cloud and big data, Sudheer plays a pivotal role in assisting customers with enhancing their monitoring and observability capabilities within AWS offerings.

Vikrant Choudhary is a Senior Technical Account Manager at Amazon Web Services (AWS), specializing in healthcare and life sciences. With over 15 years of experience in cloud solutions and enterprise architecture, he helps businesses accelerate their digital transformation initiatives. In his current role, Vikrant partners with customers to architect and implement innovative solutions, from cloud migrations and application modernization to emerging technologies such as generative AI, driving successful business outcomes through cloud adoption.
