How Nippon India Mutual Fund improved the accuracy of AI assistant responses using advanced RAG methods on Amazon Bedrock

This post is co-written with Abhinav Pandey from Nippon Life India Asset Management Ltd.

Accurate information retrieval through generative AI-powered assistants is a popular use case for enterprises. To reduce hallucination and improve overall accuracy, Retrieval Augmented Generation (RAG) remains the most commonly used method to retrieve reliable and accurate responses that use enterprise data when responding to user queries. RAG is used for use cases such as AI assistants, search, real-time insights, and improving overall content quality by using the relevant data to generate the response, thereby reducing hallucinations.

Amazon Bedrock Knowledge Bases provides a managed RAG experience that can be used for many use cases. Amazon Bedrock Knowledge Bases is a fully managed service that does the heavy lifting of implementing a RAG pattern—including data ingestion, data chunking, data embedding, and query matching. Amazon Bedrock offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock Knowledge Bases, you can create a RAG solution quickly and seamlessly.

However, in a large enterprise scenario with a large number of relevant documents, the final response is generated based on only the top 5 or top 10 results provided by the database. Because RAG uses a similarity match and not an exact match, there is a possibility that the most relevant result won’t be in the top results returned by the database. In such scenarios, the regular RAG pattern might not be highly accurate.

In this post, we examine a solution adopted by Nippon Life India Asset Management Limited that improves the accuracy of the response over a regular (naive) RAG approach by rewriting the user queries and aggregating and reranking the responses. The proposed solution uses enhanced RAG methods such as reranking to improve the overall accuracy.

Limitations and workarounds in naive RAG for a large volume of documents

The following are the primary limitations with naive RAG when using it with a large volume of documents:

  • Accuracy – As the documents grow in number or size, the final list of extracted chunks might miss some relevant sections or documents because of the limited number of documents returned
  • Parsing complex structures – Entities such as nested tables, images, and graphs are not parsed accurately
  • Limited use of relevant data – As the dataset grows, only the configured set of top results are used for context, impacting accuracy
  • Responding to complex questions – Compound questions (such as a question with multiple sub-questions) pose challenges to the RAG solution
  • Retrieving the correct context – Documents such as legal documents or technical manuals have semantically related sections on different pages, impacting the overall accuracy
  • Avoiding hallucinations – Models must generate complete, correct, and grounded responses without hallucinations

To address these challenges, developers usually adopt a combination of the following workarounds:

  • Programmatic parsing – Use another service (such as Amazon Textract) to extract the table content into a markdown (.MD) file.
  • Programmatic split of compound questions – Programmatically split the question or reformulate the question and get the responses and then programmatically aggregate the response.
  • Programmatic chunking – Programmatically create custom chunks of the documents and manage them in a vector store.

Solution overview

In this section, we review the basics of RAG, Amazon Bedrock Knowledge Bases, and advanced RAG methods to address the preceding challenges. The following table details the specific solution components adopted by Nippon to overcome the challenges we discussed in the previous section.

| Naive RAG challenge | How Nippon addressed the challenge |
| --- | --- |
| Lower accuracy due to a high volume of documents | Use Amazon Bedrock Knowledge Bases with advanced RAG methods, including semantic chunking, multi-query RAG, and results reranking |
| Parsing complex document structures such as nested tables and graphs within documents | Use Amazon Textract to parse the documents into markdown files |
| Handling compound questions | Use query reformulation and results reranking to get the relevant results |
| Retrieving the correct context from the documents | Use semantic chunking in Amazon Bedrock Knowledge Bases |
| Generating grounded responses without hallucinations | Use Amazon Bedrock Knowledge Bases RAG evaluation |

In the next few sections, we detail each of the solution components. We start with an overview of RAG and then discuss Amazon Bedrock Knowledge Bases. We then discuss advanced RAG methods adopted by Nippon, including advanced parsing, query reformulation, multi-query RAG, and results reranking.

RAG overview

The RAG pattern for this solution has two primary workflows: a data ingestion workflow and a text generation workflow, as depicted in the following figure. During the ingestion workflow, the solution chunks the content of the source documents, creates embeddings for the created chunks, and stores them in a vector database. In the text generation workflow, the user query is converted to a vector embedding and the query embedding is compared with the embeddings stored in the vector database. The database returns the top results that are closest to the user query. The solution then sends the user query along with the retrieved top results as context to the FM, which provides the final response. Because the responses are based on relevant contextual enterprise data, this solution reduces hallucinations.

Ingestion & Text generation workflows

You can implement this solution using AWS services as shown in the following figure.

RAG implementation using AWS services

The following is a typical RAG-based AI assistant flow. The first series of steps, as numbered in the preceding diagram, augments the user query using a knowledge base:

  1. User queries are served by Amazon Lex through a front end delivered by Amazon CloudFront.
  2. The business logic for the AI assistant can run on Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), or AWS Lambda.
  3. You can manage the conversation history or state in Amazon DynamoDB, Amazon DocumentDB (with MongoDB compatibility), or Amazon MemoryDB.
  4. The user query and the earlier conversation history are sent to the Amazon Bedrock embedding model.
  5. User prompts are compared to earlier prompts cached in MemoryDB. If the user prompt matches a stored prompt, the stored prompt result is retrieved and sent to the user.
  6. If there’s no match for the user prompt in the cache, the vector database (such as Amazon OpenSearch Service, Amazon Aurora, Amazon Kendra, MemoryDB, DocumentDB, or Amazon Neptune) is searched using the query embedding. The relevant knowledge, or the Amazon Simple Storage Service (Amazon S3) path to the relevant image or video, is retrieved.
  7. The relevant knowledge, image, or video is retrieved from the Amazon S3 path.
  8. The multimodal large language model (LLM) in Amazon Bedrock uses the query results to create a final response.
  9. The new answer is stored in the database cache.
  10. The final response is sent to the user.

The workflow to create and maintain the knowledge base is shown in the previous diagram:

  1. The data is ingested from structured and unstructured sources such as a file system, emails, or multi-media content.
  2. The data is ingested into Amazon S3 in batches or in real time using services such as AWS Database Migration Service (AWS DMS), Amazon Kinesis, Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon AppFlow, or AWS Glue.
  3. The data is processed through AWS Glue, Amazon SageMaker processing, or Amazon EMR.
  4. An embeddings LLM from Amazon Bedrock generates the embeddings for the content chunks (a minimal sketch of this step follows the list).
  5. The embeddings are stored in the vector database.
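The following is a minimal, hedged sketch of step 4: generating an embedding for a single content chunk by invoking an Amazon Titan embedding model through the Amazon Bedrock runtime API. The model ID and the chunk text are placeholder assumptions, not details from the Nippon implementation.

import json
import boto3

# Assumes AWS credentials and Region are already configured.
bedrock_runtime = boto3.client("bedrock-runtime")

def embed_chunk(chunk_text: str) -> list:
    """Generate a vector embedding for one content chunk."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",  # placeholder embedding model ID
        body=json.dumps({"inputText": chunk_text}),
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]

vector = embed_chunk("Example content chunk from an ingested fund document ...")
# The returned vector is then written to the configured vector store.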

Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases provides managed RAG, which does the heavy lifting of the core activities in the data ingestion and text generation workflows. Nippon uses Amazon Bedrock Knowledge Bases for implementing the RAG pattern. The following figure shows the fully managed data ingestion workflow provided by Amazon Bedrock Knowledge Bases. You can choose the data source from which data of various formats can be incrementally updated. For content chunking, you can choose from the following strategies: fixed, semantic, or hierarchical. You can also choose the embedding models and vector store.

Data ingestion flow

The following table compares various chunking strategies. You can also author a custom chunking strategy using Lambda.

| Chunking strategy | Details | Advantages |
| --- | --- | --- |
| Fixed chunking | Fixed character count division; recursive character text splitting | Quick and efficient |
| Semantic chunking (used by Nippon) | Meaningful and complete chunks based on semantic similarity | Better retrieval quality |
| Hierarchical chunking | Chunks organized in a structured manner; nodes with summaries and parent-child relationships | Improved retrieval efficiency and reliability |

Nippon uses semantic chunking because the documents have sections with semantic similarity. During the text generation workflow, Amazon Bedrock Knowledge Bases creates embeddings from the user input and performs semantic search in the Amazon Bedrock knowledge base. The retrieved results from the context are augmented with a prompt for the LLM to generate the final response as shown in the following figure.

RAG Flow

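As a hedged illustration of this managed text generation workflow, the following sketch calls the Amazon Bedrock Knowledge Bases RetrieveAndGenerate API; the knowledge base ID, model ARN, and query are placeholder assumptions.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What is the exit load for scheme XYZ?"},  # example user query
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder
        },
    },
)

print(response["output"]["text"])            # generated answer
for citation in response.get("citations", []):
    print(citation)                          # retrieved source chunks used as context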

Advanced RAG methods adopted by Nippon

In this section, we review the key advanced RAG methods used in the Nippon AI assistant solution.

Advanced parsing and chunking

A complex document with tables, images, and graphs poses challenges in RAG because the default fixed chunking often loses the context. For these scenarios, Nippon uses the following approaches:

  • Parse the document with Amazon Textract – Nippon uses Amazon Textract—which excels at understanding complex structures like tables—to extract the details from the document into a markdown file (.MD).
  • Parse the document with an FM with specific instructions – In this approach, Nippon uses an FM with specific parsing instructions to extract the details from the document. Amazon Bedrock Knowledge Bases supports using an FM as a parser, along with custom instructions for parsing the documents.
  • Parse the document using third-party parsers – This approach uses third-party parsers, such as open source document parsers, and then indexes the parsed content.

After the documents are parsed, Amazon Bedrock Knowledge Bases chunks the data. Amazon Bedrock Knowledge Bases provides fixed chunking, semantic chunking, and hierarchical chunking strategies. You can also add custom chunking using Lambda functions for specific documents.
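The following is a hedged sketch of configuring a knowledge base data source with semantic chunking through the Amazon Bedrock Agent API; the knowledge base ID, data source name, bucket ARN, and threshold values are placeholder assumptions that should be tuned for your documents.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

bedrock_agent.create_data_source(
    knowledgeBaseId="KB1234567890",          # placeholder knowledge base ID
    name="fund-documents",                   # placeholder data source name
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::example-fund-documents"},  # placeholder
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "SEMANTIC",
            "semanticChunkingConfiguration": {
                "maxTokens": 300,                     # example value
                "bufferSize": 1,                      # example value
                "breakpointPercentileThreshold": 95,  # example value
            },
        }
    },
)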

Query reformulation

If the user queries are complex, the overall accuracy of the RAG solution decreases because it can be difficult to understand all the nuances of the complex question. In such cases, you can split the complex question into multiple sub-questions for improved accuracy. Nippon used an LLM to split the compound query into multiple sub-queries. The search results for each of the sub-queries are returned simultaneously and are then ranked and combined. The LLM uses the combined data to provide the final response, as shown in the following figure.

Query reformulation

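As an illustrative, hedged sketch (not Nippon's production prompt), a compound question can be split into sub-questions with a single call to the Amazon Bedrock Converse API; the model ID and prompt wording are assumptions.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def split_into_subqueries(compound_question: str) -> list:
    """Ask an LLM to reformulate a compound question into simpler sub-questions."""
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder model ID
        messages=[{
            "role": "user",
            "content": [{"text": (
                "Split the following question into independent sub-questions, "
                "one per line, with no numbering:\n" + compound_question
            )}],
        }],
        inferenceConfig={"temperature": 0.0, "maxTokens": 512},
    )
    text = response["output"]["message"]["content"][0]["text"]
    return [line.strip() for line in text.splitlines() if line.strip()]

sub_queries = split_into_subqueries(
    "What is the fund's expense ratio, and how did its 3-year returns compare to the benchmark?"
)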

Multi-query RAG

In the multi-query RAG approach, you can reformulate the question into different variants (Amazon Bedrock Knowledge Bases provides automatic query reformulation) and execute the queries in parallel. You then summarize the results of those queries and rerank them to get the final response as shown in the following figure.

Multi-query RAG

Nippon takes the initial question and breaks it into n sub-questions using their choice of LLM (Anthropic’s Claude 3 Sonnet on Amazon Bedrock). With this implementation, the quality of the responses has improved, and queries are answered in greater depth. Because this requires additional processing, the response time is slightly longer, which is justified by the improved quality of the response.
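A minimal sketch of this pattern, assuming the query variants have already been generated (for example, with the helper shown earlier): each variant is run against the knowledge base in parallel with the Retrieve API, and the result sets are merged before reranking. The knowledge base ID, queries, and result counts are placeholders.

from concurrent.futures import ThreadPoolExecutor
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")
KNOWLEDGE_BASE_ID = "KB1234567890"  # placeholder

def retrieve_chunks(query: str) -> list:
    """Run one query variant against the knowledge base."""
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        retrievalQuery={"text": query},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 10}},
    )
    return response["retrievalResults"]

query_variants = [
    "What is the expense ratio of scheme XYZ?",                              # placeholder variants
    "How have the 3-year returns of scheme XYZ compared to its benchmark?",
]

with ThreadPoolExecutor(max_workers=len(query_variants)) as executor:
    result_sets = list(executor.map(retrieve_chunks, query_variants))

# Merge and de-duplicate chunks before passing them to a reranker and then to the LLM.
merged = {result["content"]["text"]: result for results in result_sets for result in results}
candidate_chunks = list(merged.values())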

Results reranking

In a typical enterprise scenario where you have a large volume of documents, the ranking of the results might not be correct and accuracy could be reduced. For example, the default results returned from a vector database are ordered by vector similarity, and that order might not be accurate or relevant given the user’s conversation history or the query context. In such scenarios, you can use another FM to look closely at the results and reorder them based on analysis of the context, relevance, and other criteria. After the results are reranked, the top results are far more likely to be relevant to the user context. Each document or search result is assigned a score based on criteria such as semantic similarity, contextual alignment, or domain-specific features, thereby increasing the overall accuracy of the final response, reducing hallucination, and improving user satisfaction. The key reranker strategies are:

  • Cross-encoder reranking – Uses models specifically trained to assess the relevance between query-document pairs
  • Hybrid reranking – Combines multiple signals such as vector similarity, keyword matching, recency, and so on
  • Multi-stage retrieval – Retrieves a larger set using faster methods, then applies more computationally intensive reranking on a subset

Amazon Bedrock provides reranker models that calculate the relevance of the chunks for a given query, assign a score to each chunk, and reorder the results based on the score. By using a reranker model in Amazon Bedrock, you can get highly relevant chunks in the search results, improving the overall response accuracy. Nippon combined query reformulation with results reranking, using the Amazon Bedrock reranker models, to improve the overall accuracy of the RAG solution.
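The following is a hedged sketch of reranking retrieved chunks with an Amazon Bedrock reranker model through the Rerank API; the reranker model ARN and result count are placeholder assumptions, and candidate_texts is assumed to hold the chunk texts from the retrieval step. Check the current API reference for the exact request shape available in your Region.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

def rerank_chunks(query: str, candidate_texts: list, top_n: int = 5) -> list:
    """Score candidate chunks against the query and return the most relevant ones."""
    response = bedrock_agent_runtime.rerank(
        queries=[{"type": "TEXT", "textQuery": {"text": query}}],
        sources=[
            {
                "type": "INLINE",
                "inlineDocumentSource": {"type": "TEXT", "textDocument": {"text": text}},
            }
            for text in candidate_texts
        ],
        rerankingConfiguration={
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "modelConfiguration": {
                    "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.rerank-v1:0"  # placeholder
                },
                "numberOfResults": top_n,
            },
        },
    )
    # Each result carries an index into the sources list plus a relevance score.
    return [candidate_texts[item["index"]] for item in response["results"]]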

Metrics for RAG evaluation

Because RAG solutions have multiple moving parts, you need to evaluate them against key success metrics to make sure that you get relevant and contextual data from the knowledge base. RAG evaluation should validate the generation of complete, correct, and grounded answers without hallucinations. You also need to evaluate the bias, safety, and trust of RAG solutions.

Amazon Bedrock Knowledge Bases provides built-in support for RAG evaluation, including quality metrics such as correctness, completeness, and faithfulness (hallucination detection); responsible AI metrics such as harmfulness, answer refusal, and stereotyping; and compatibility with Amazon Bedrock Guardrails. Nippon used RAG evaluation to compare multiple evaluation jobs using custom datasets.

Nippon is currently evaluating additional techniques, including GraphRAG, metadata filtering, and agentic AI. We have briefly summarized the capabilities being evaluated at Nippon in the following sections.

GraphRAG

For applications that need to establish relationships within hierarchically related data (such as knowledge management, enterprise search, and recommendation systems), Nippon is evaluating a graph database instead of a vector database. GraphRAG applications use rich, interconnected entity relationships to identify dependencies, and they excel at querying multi-dimensional relationships that further enrich the context for FMs. In addition, with graph databases, Nippon can efficiently query and traverse the data, and the flexible schema can accommodate dynamic content. Graph databases help the FMs better understand semantic relationships, uncover hidden patterns, adapt to dynamic knowledge bases, and support better reasoning.

Amazon Bedrock Knowledge Bases supports fully managed GraphRAG, which uses the Amazon Neptune graph database.

Metadata filtering

In some use cases, you might need to filter the documents in a knowledge base based on specific metadata values. For instance, government regulatory bodies release regulatory guidelines frequently, often with the same document names and only minor variations in the regulatory clauses. When your query is about a specific regulation, you want the most recent document to be returned. In such cases, you can rank the documents based on their modified date. Amazon Bedrock Knowledge Bases provides custom metadata filters (such as modified date) to improve the quality of search results.
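As a hedged sketch, a metadata filter can be applied at retrieval time so that only documents whose custom metadata matches the condition are considered; the knowledge base ID, the numeric modified_year attribute, and the cutoff value are assumptions for illustration.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="KB1234567890",  # placeholder
    retrievalQuery={"text": "Latest guideline on scheme categorization"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 10,
            "filter": {
                # Assumed numeric metadata attribute attached to each ingested document.
                "greaterThanOrEquals": {"key": "modified_year", "value": 2024}
            },
        }
    },
)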

Amazon Bedrock Agents

With Amazon Bedrock Agents, you can orchestrate multi-step business processes using tools and information sources. Nippon is currently evaluating the latest FMs in Amazon Bedrock Agents for their AI assistant use case.

Solution flow

The following diagram shows the end-to-end flow of the Nippon AI assistant solution.

End-to-end flow of the Nippon AI assistant solution

The data ingestion workflow consists of the following steps:

  1. The documents from the data source are prepared for ingestion.
  2. Nippon uses custom parsing to extract the relevant details from complex elements like tables, graphs, and images. They use custom chunking to create the chunks.
  3. They use an embedding model from Amazon Bedrock to convert the content chunks to vector embeddings and store them in the vector database.

The content generation workflow is as follows:

  1. The user’s query is converted into embeddings by the Amazon Bedrock embedding model.
  2. Nippon uses multi-query RAG by creating multiple variants of the query and executing them in parallel. The obtained results are reranked using a reranker model for higher accuracy.
  3. The prompt is augmented with the references from the source documents to create citations.
  4. The augmented prompt is sent to Amazon Bedrock.
  5. Amazon Bedrock creates and sends the final response to the user.

Nippon plans to use agentic AI implementation in the future for automating the data retrieval, indexing, and ingestion.

Results

Nippon saw the following improvements after implementing RAG:

  • Accuracy increased by more than 95%
  • Hallucinations were reduced by 90–95%
  • They were able to add source chunks and file links (through file metadata), which improves user confidence in the responses
  • The time needed to generate a report was reduced from 2 days to approximately 10 minutes

Summary

In this post, we discussed RAG and some of the challenges associated with processing large volumes of documents. We explained the advanced RAG methods used in the Nippon AI assistant, including enhanced parsing using Amazon Bedrock Knowledge Bases and third-party models. In addition, we explained query reformulation and multi-query RAG techniques—such as generating multiple queries, reranking results, using GraphRAG, and applying metadata filtering. Finally, we described the end-to-end solution implemented for the Nippon AI assistant. These methods are generally available and are not built by or belong only to Nippon.

Explore Amazon Bedrock Knowledge Bases for RAG use cases by using advanced RAG features such as FM as a parser, query reformulation, reranker models, GraphRAG, and others to implement highly accurate RAG solutions. You can also use Amazon Bedrock Guardrails to build responsible AI solutions by enforcing content and image safeguards and enabling automated reasoning checks.

About the authors

Shailesh Shivakumar is an FSI Sr. Solutions Architect with AWS India. He works with financial enterprises such as banks, NBFCs, and trading enterprises to help them design secure cloud systems and accelerate their cloud journey. He builds demos and proofs of concept to demonstrate the art of the possible on the AWS Cloud. He leads other initiatives such as customer enablement workshops, AWS demos, cost optimization, and solution assessments to make sure AWS customers succeed in their cloud journey. Shailesh is part of Machine Learning TFC at AWS, handling generative AI and machine learning-focused customer scenarios. Security, serverless, containers, and machine learning in the cloud are his key areas of interest.

Abhinav Pandey is a seasoned Data Scientist on the Technology team at Nippon Life India Asset Management Ltd, with over 18 years of industry experience, primarily in BFSI, who is passionate about using generative AI and agentic AI to transform business operations. With a proven track record of applying data for strategic decision-making and business growth, he excels at extracting actionable insights from complex datasets using cutting-edge analytical techniques. A strategic thinker and innovative problem solver, he has developed data-driven strategies that enhance operational efficiency and profitability while working effectively with cross-functional teams to align data initiatives with business objectives. In his leadership role, he has driven data science initiatives, fostered innovation, maintained a results-driven approach, and continuously adapted to evolving technologies to stay ahead of industry trends.

Read More

PyTorch on Kubernetes: Kubeflow Trainer Joins the PyTorch Ecosystem

Kubeflow Trainer Logo

We’re thrilled to announce that the Kubeflow Trainer project has been integrated into the PyTorch ecosystem! This integration ensures that Kubeflow Trainer aligns with PyTorch’s standards and practices, giving developers a reliable, scalable, and community-backed solution to run PyTorch on Kubernetes.

To view the PyTorch Ecosystem, see the PyTorch Landscape. Learn more about how projects can join the PyTorch Ecosystem.

About Kubeflow Trainer

Kubeflow Trainer is a Kubernetes-native project that enables scalable, distributed training of AI models and is purpose-built for fine-tuning large language models (LLMs). It simplifies scaling training workloads across multiple nodes, manages large datasets efficiently, and ensures fault tolerance.

Kubeflow Trainer

The core features include:

  • Simplify Kubernetes complexity: Kubeflow Trainer APIs are designed for two primary user personas: AI practitioners (ML engineers and data scientists who develop AI models using the Kubeflow Python SDK and TrainJob APIs) and platform admins (administrators and DevOps engineers responsible for managing Kubernetes clusters and Kubeflow Trainer runtime APIs). AI practitioners can focus on the application code in PyTorch without worrying about infrastructure details, while platform admins can flexibly schedule workload resources for maximum cluster utilization and cost efficiency. To support these roles, Kubeflow Trainer specifies purpose-built Kubernetes Custom Resource Definitions (CRDs) that streamline model training and infrastructure management.

Kubeflow Trainer user personas

  • Python SDK: A Pythonic interface designed for AI practitioners that abstracts the details of interacting directly with Kubernetes APIs. It enables users to focus on developing PyTorch models without worrying about Kubernetes YAML configurations.
  • Blueprints for LLM fine-tuning on Kubernetes: With built-in trainers, Kubeflow Trainer enables AI practitioners to seamlessly fine-tune their favorite LLMs using the desired configuration for datasets, LoRA parameters, learning rate, and so on. In the first release, it implements recipes to support various fine-tuning strategies, including Supervised Fine-Tuning (SFT), Knowledge Distillation, DPO, PPO, GRPO, and Quantization-Aware Training. The community is working toward adding more built-in trainers powered by LLaMA-Factory, Unsloth, and Hugging Face TRL to enable efficient LLM fine-tuning.
  • Optimized GPU utilization: Kubeflow Trainer maximizes GPU efficiency by streaming large-scale data directly to distributed GPUs using an in-memory distributed data cache powered by Apache Arrow and Apache DataFusion.
  • Advanced scheduling capabilities: Kubeflow Trainer supports gang scheduling through the PodGroupPolicy API, enabling coordinated scheduling of pods across nodes. It also integrates with Kubernetes schedulers such as Kueue, Coscheduling, Volcano, and KAI Scheduler to ensure all required resources are allocated before training jobs start.
  • Accelerate MPI workloads on Kubernetes: Kubeflow Trainer supports MPI-based runtimes such as DeepSpeed and MLX. It handles all necessary orchestration of MPI workloads with SSH-based optimization to boost MPI performance.
  • Improved resilience and fault-tolerance: By leveraging Kubernetes-native APIs like Jobs and JobSets, Kubeflow Trainer improves the reliability and efficiency of AI workloads. With support for the PodFailurePolicy API, users can reduce cost by avoiding unnecessary restarts. Additionally, the SuccessPolicy API allows training jobs to complete early once the target objective is achieved.

Background and Evolution

This project was originally started as a distributed training operator for TensorFlow (e.g. TFJob), and later we merged efforts from other Kubeflow Training Operators (e.g. PyTorchJob, MPIJob) to provide a unified and simplified experience for both users and developers. We are very grateful to all who filed issues or helped resolve them, asked and answered questions, and were part of inspiring discussions. We’d also like to thank everyone who’s contributed to and maintained the original operators.

By joining the PyTorch Ecosystem, we strive to apply best practices of deploying distributed PyTorch applications on Kubernetes and bring first-class PyTorch support in Kubeflow Trainer. 

Integrations with PyTorch Ecosystem

Kubeflow Trainer is deeply integrated with the PyTorch ecosystem, supporting a broad range of tools and libraries—including torch, DeepSpeed, HuggingFace, Horovod, and more.

It empowers PyTorch users to implement advanced distributed training strategies such as Distributed Data Parallel (DDP), Fully Sharded Data Parallel (FSDP & FSDP2), and Tensor Parallelism, enabling efficient large-scale model training on Kubernetes.

Additionally, Kubeflow Trainer supports data parallelism using PyTorch IterableDatasets, streaming data directly from distributed in-memory data cache nodes. This allows scalable training even with massive datasets that exceed local memory capacity.

Quick Start

Follow the steps below to quickly deploy Kubeflow Trainer and run your first training job.

Prerequisites

Install Kubeflow Trainer

Deploy Kubeflow Trainer control plane on your local kind cluster:

$ kind create cluster

$ kubectl apply --server-side -k "https://github.com/kubeflow/trainer.git/manifests/overlays/manager?ref=v2.0.0"


# Ensure that JobSet and Trainer controller manager are running.
$ kubectl get pods -n kubeflow-system

NAME                                                  READY   STATUS    RESTARTS   AGE
jobset-controller-manager-54968bd57b-88dk4            2/2     Running   0          65s
kubeflow-trainer-controller-manager-cc6468559-dblnw   1/1     Running   0          65s


# Deploy the Kubeflow Trainer runtimes.
$ kubectl apply --server-side -k "https://github.com/kubeflow/trainer.git/manifests/overlays/runtimes?ref=v2.0.0"

# Install Kubeflow SDK
$ pip install git+https://github.com/kubeflow/sdk.git@64d74db2b6c9a0854e39450d8d1c0201e1e9b3f7#subdirectory=python

Define PyTorch Training Function

After installing the Kubeflow Trainer, define your PyTorch training function that contains the end-to-end training script:

def train_pytorch():
    import os
    import torch
    import torch.distributed as dist
    from torch.utils.data import DataLoader, DistributedSampler
    from torchvision import datasets, transforms, models

    # [1] Configure CPU/GPU device and distributed backend.
    device, backend = ("cuda", "nccl") if torch.cuda.is_available() else ("cpu", "gloo")
    dist.init_process_group(backend=backend)
    local_rank = int(os.getenv("LOCAL_RANK", 0))
    device = torch.device(f"{device}:{local_rank}")
    
    # [2] Get the pre-defined model.
    model = models.shufflenet_v2_x0_5(num_classes=10)
    model.conv1 = torch.nn.Conv2d(1, 24, kernel_size=3, stride=2, padding=1, bias=False)
    model = torch.nn.parallel.DistributedDataParallel(model.to(device))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
   
    # [3] Get the FashionMNIST dataset and distribute it across all available devices.
    if local_rank == 0: # Download dataset only on local_rank=0 process.
        dataset = datasets.FashionMNIST("./data", train=True, download=True, transform=transforms.Compose([transforms.ToTensor()]))
    dist.barrier()
    dataset = datasets.FashionMNIST("./data", train=True, download=False, transform=transforms.Compose([transforms.ToTensor()]))
    train_loader = DataLoader(dataset, batch_size=100, sampler=DistributedSampler(dataset))

    # [4] Define the PyTorch training loop.
    for epoch in range(3):
        for batch_idx, (inputs, labels) in enumerate(train_loader):
            inputs, labels = inputs.to(device), labels.to(device)
            # Forward and Backward pass
            outputs = model(inputs)
            loss = torch.nn.functional.cross_entropy(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if batch_idx % 10 == 0 and dist.get_rank() == 0:
                print(f"Epoch {epoch} [{batch_idx * len(inputs)}/{len(train_loader.dataset)}] "
                    f"Loss: {loss.item():.4f}"
                )

Run PyTorch on Kubernetes with TrainJob

After defining the training function, use the Kubeflow SDK to create a TrainJob:

from kubeflow.trainer import TrainerClient, CustomTrainer

job_id = TrainerClient().train(
    trainer=CustomTrainer(
        func=train_pytorch,
        num_nodes=2,
        resources_per_node={
            "cpu": 3,
            "memory": "3Gi",
            # "gpu": 2, # Uncomment this line if you have GPUs.
        },
    ),
    runtime=TrainerClient().get_runtime("torch-distributed"),
)

Get the TrainJob Results

After creating the TrainJob, you should be able to list it:

for job in TrainerClient().list_jobs():
    print(f"TrainJob: {job.name}, Status: {job.status}")

TrainJob: q33a18f65635, Status: Created 

It may take a few minutes for the TrainJob to pull the PyTorch image the first time. Once the image is pulled, the TrainJob’s steps should transition to Running status. Each step represents a training node, and the number of devices per step corresponds to the number of devices on that node.

for s in TrainerClient().get_job(name=job_id).steps:
    print(f"Step: {s.name}, Status: {s.status}, Devices: {s.device} x {s.device_count}")

Step: node-0, Status: Running, Devices: cpu x 3
Step: node-1, Status: Running, Devices: cpu x 3 

After steps are running, you can check the TrainJob logs. The dataset of 60,000 samples has been evenly distributed across 6 CPUs, with each device processing 10,000 samples: 60,000 / 6 = 10,000

print(TrainerClient().get_job_logs(name=job_id)["node-0"])

...
Epoch 0 [8000/60000] Loss: 0.4476
Epoch 0 [9000/60000] Loss: 0.4784
Epoch 1 [0/60000] Loss: 0.3909
Epoch 1 [1000/60000] Loss: 0.4888
Epoch 1 [2000/60000] Loss: 0.4100
... 

Congratulations, you created your first distributed training job with PyTorch and Kubeflow Trainer!

What’s next

Kubeflow Trainer has an exciting roadmap that includes the following items:

Call to Action

We are excited to welcome Kubeflow Trainer to the PyTorch ecosystem! Kubeflow Trainer democratizes AI model training on Kubernetes and significantly improves the development experience for AI practitioners. We invite you to explore the following resources to learn more about the project:

We can’t wait to see what you’ll build with Kubeflow Trainer!

Read More

Build a drug discovery research assistant using Strands Agents and Amazon Bedrock

Drug discovery is a complex, time-intensive process that requires researchers to navigate vast amounts of scientific literature, clinical trial data, and molecular databases. Life science customers like Genentech and AstraZeneca are using AI agents and other generative AI tools to increase the speed of scientific discovery. Builders at these organizations are already using the fully managed features of Amazon Bedrock to quickly deploy domain-specific workflows for a variety of use cases, from early drug target identification to healthcare provider engagement.

However, more complex use cases might benefit from using the open source Strands Agents SDK. Strands Agents takes a model-driven approach to develop and run AI agents. It works with most model providers, including custom and internal large language model (LLM) gateways, and agents can be deployed where you would host a Python application.

In this post, we demonstrate how to create a powerful research assistant for drug discovery using Strands Agents and Amazon Bedrock. This AI assistant can search multiple scientific databases simultaneously using the Model Context Protocol (MCP), synthesize its findings, and generate comprehensive reports on drug targets, disease mechanisms, and therapeutic areas. This assistant is available as an example in the open-source healthcare and life sciences agent toolkit for you to use and adapt.

Solution overview

This solution uses Strands Agents to connect high-performing foundation models (FMs) with common life science data sources like arXiv, PubMed, and ChEMBL. It demonstrates how to quickly create MCP servers to query data and view the results in a conversational interface.

Small, focused AI agents that work together can often produce better results than a single, monolithic agent. This solution uses a team of sub-agents, each with their own FM, instructions, and tools. The following flowchart shows how the orchestrator agent (shown in orange) handles user queries and routes them to sub-agents for either information retrieval (green) or planning, synthesis, and report generation (purple).

Research system architecture diagram connecting web, academic, and medical databases through an orchestrator to produce synthesized reports

This post focuses on building with Strands Agents in your local development environment. Refer to the Strands Agents documentation to deploy production agents on AWS Lambda, AWS Fargate, Amazon Elastic Kubernetes Service (Amazon EKS), or Amazon Elastic Compute Cloud (Amazon EC2).

In the following sections, we show how to create the research assistant in Strands Agents by defining an FM, MCP tools, and sub-agents.

Prerequisites

This solution requires Python 3.10+, strands-agents, and several additional Python packages. We strongly recommend using a virtual environment like venv or uv to manage these dependencies.

Complete the following steps to deploy the solution to your local environment:

  1. Clone the code repository from GitHub.
  2. Install the required Python dependencies with pip install -r requirements.txt.
  3. Configure your AWS credentials by setting them as environment variables, adding them to a credentials file, or following another supported process.
  4. Save your Tavily API key to a .env file in the following format: TAVILY_API_KEY="YOUR_API_KEY".

You also need access to the following Amazon Bedrock FMs in your AWS account:

  • Anthropic’s Claude 3.7 Sonnet
  • Anthropic’s Claude 3.5 Sonnet
  • Anthropic’s Claude 3.5 Haiku

Define the foundation model

We start by defining a connection to an FM in Amazon Bedrock using the Strands Agents BedrockModel class. We use Anthropic’s Claude 3.7 Sonnet as the default model. See the following code:

from strands import Agent, tool
from strands.models import BedrockModel
from strands.agent.conversation_manager import SlidingWindowConversationManager
from strands.tools.mcp import MCPClient
from botocore.config import Config  # needed for the Bedrock client timeout and retry settings below
# Model configuration with Strands using Amazon Bedrock's foundation models
def get_model():
    model = BedrockModel(
        boto_client_config=Config(
            read_timeout=900,
            connect_timeout=900,
            retries=dict(max_attempts=3, mode="adaptive"),
        ),
        model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        max_tokens=64000,
        temperature=0.1,
        top_p=0.9,
        additional_request_fields={
            "thinking": {
                "type": "disabled"  # Can be enabled for reasoning mode
            }
        }
    )
    return model

Define MCP tools

MCP provides a standard for how AI applications interact with their external environments. Thousands of MCP servers already exist, including those for life science tools and datasets. This solution provides example MCP servers for:

  • arXiv – Open-access repository of scholarly articles
  • PubMed – Peer-reviewed citations for biomedical literature
  • ChEMBL – Curated database of bioactive molecules with drug-like properties
  • ClinicalTrials.gov – US government database of clinical research studies
  • Tavily Web Search – API to find recent news and other content from the public internet

Strands Agents streamlines the definition of MCP clients for our agent. In this example, you connect to each tool using standard I/O. However, Strands Agents also supports remote MCP servers using streamable HTTP transport. See the following code:

# The stdio helpers below come from the MCP Python SDK (the mcp package).
from mcp import StdioServerParameters
from mcp.client.stdio import stdio_client

# MCP Clients for various scientific databases
tavily_mcp_client = MCPClient(lambda: stdio_client(
    StdioServerParameters(command="python", args=["application/mcp_server_tavily.py"])
))
arxiv_mcp_client = MCPClient(lambda: stdio_client(
    StdioServerParameters(command="python", args=["application/mcp_server_arxiv.py"])
))
pubmed_mcp_client = MCPClient(lambda: stdio_client(
    StdioServerParameters(command="python", args=["application/mcp_server_pubmed.py"])
))
chembl_mcp_client = MCPClient(lambda: stdio_client(
    StdioServerParameters(command="python", args=["application/mcp_server_chembl.py"])
))
clinicaltrials_mcp_client = MCPClient(lambda: stdio_client(
    StdioServerParameters(command="python", args=["application/mcp_server_clinicaltrial.py"])
))

Define specialized sub-agents

The planning agent looks at user questions and creates a plan for which sub-agents and tools to use:

@tool
def planning_agent(query: str) -> str:
    """
    A specialized planning agent that analyzes the research query and determines
    which tools and databases should be used for the investigation.
    """
    planning_system = """
    You are a specialized planning agent for drug discovery research. Your role is to:
    
    1. Analyze research questions to identify target proteins, compounds, or biological mechanisms
    2. Determine which databases would be most relevant (Arxiv, PubMed, ChEMBL, ClinicalTrials.gov)
    3. Generate specific search queries for each relevant database
    4. Create a structured research plan
    """
    model = get_model()
    planner = Agent(
        model=model,
        system_prompt=planning_system,
    )
    # Build the planning prompt from the incoming query (simplified here for brevity).
    planning_prompt = f"Create a research plan for the following question:\n{query}"
    response = planner(planning_prompt)
    return str(response)

Similarly, the synthesis agent integrates findings from multiple sources into a single, comprehensive report:

@tool
def synthesis_agent(research_results: str) -> str:
    """
    Specialized agent for synthesizing research findings into a comprehensive report.
    """
    system_prompt = """
    You are a specialized synthesis agent for drug discovery research. Your role is to:
    
    1. Integrate findings from multiple research databases
    2. Create a comprehensive, coherent scientific report
    3. Highlight key insights, connections, and opportunities
    4. Organize information in a structured format:
       - Executive Summary (300 words)
       - Target Overview
       - Research Landscape
       - Drug Development Status
       - References
    """
    model = get_model()
    synthesis = Agent(
        model=model,
        system_prompt=system_prompt,
    )
    # Build the synthesis prompt from the collected findings (simplified here for brevity).
    synthesis_prompt = f"Synthesize the following research findings into a report:\n{research_results}"
    response = synthesis(synthesis_prompt)
    return str(response)

Define the orchestration agent

We also define an orchestration agent to coordinate the entire research workflow. This agent uses the SlidingWindowConversationManager class from Strands Agents to store the last 10 messages in the conversation. See the following code:

def create_orchestrator_agent(
    history_mode,
    tavily_client=None,
    arxiv_client=None,
    pubmed_client=None,
    chembl_client=None,
    clinicaltrials_client=None,
):
    system = """
    You are an orchestrator agent for drug discovery research. Your role is to coordinate a multi-agent workflow:
    
    1. COORDINATION PHASE:
       - For simple queries: Answer directly WITHOUT using specialized tools
       - For complex research requests: Initiate the multi-agent research workflow
    
    2. PLANNING PHASE:
       - Use the planning_agent to determine which databases to search and with what queries
    
    3. EXECUTION PHASE:
       - Route specialized search tasks to the appropriate research agents
    
    4. SYNTHESIS PHASE:
       - Use the synthesis_agent to integrate findings into a comprehensive report
       - Generate a PDF report when appropriate
    """
    # Aggregate all tools from specialized agents and MCP clients
    tools = [planning_agent, synthesis_agent, generate_pdf_report, file_write]
    # Dynamically load tools from each MCP client
    if tavily_client:
        tools.extend(tavily_client.list_tools_sync())
    # ... (similar for other clients)
    model = get_model()
    conversation_manager = SlidingWindowConversationManager(
        window_size=10,  # Maintains context for the last 10 exchanges
    )
    orchestrator = Agent(
        model=model,
        system_prompt=system,
        tools=tools,
        conversation_manager=conversation_manager
    )
    return orchestrator
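To tie these pieces together, the following is a minimal usage sketch rather than code from the original application: the Strands MCP clients are opened as context managers so their tools can be listed, and the orchestrator is then invoked with a research question. The history_mode value is an assumption for illustration.

# Hypothetical driver showing how the components above could be wired together.
with tavily_mcp_client, arxiv_mcp_client, pubmed_mcp_client, chembl_mcp_client, clinicaltrials_mcp_client:
    orchestrator = create_orchestrator_agent(
        history_mode="sliding_window",  # assumed value for illustration
        tavily_client=tavily_mcp_client,
        arxiv_client=arxiv_mcp_client,
        pubmed_client=pubmed_mcp_client,
        chembl_client=chembl_mcp_client,
        clinicaltrials_client=clinicaltrials_mcp_client,
    )
    answer = orchestrator(
        "Please generate a report for HER2 including recent news, recent research, "
        "related compounds, and ongoing clinical trials."
    )
    print(answer)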

Example use case: Explore recent breast cancer research

To test out the new assistant, launch the chat interface by running streamlit run application/app.py and opening the local URL (typically http://localhost:8501) in your web browser. The following screenshot shows a typical conversation with the research agent. In this example, we ask the assistant, “Please generate a report for HER2 including recent news, recent research, related compounds, and ongoing clinical trials.” The assistant first develops a comprehensive research plan using the various tools at its disposal. It decides to start with a web search for recent news about HER2, as well as scientific articles on PubMed and arXiv. It also looks at HER2-related compounds in ChEMBL and ongoing clinical trials. It synthesizes these results into a single report and generates an output file of its findings, including citations.

Amazon Bedrock-powered Drug Discovery Agent chat showing systematic approach to HER2 research report generation

The following is an excerpt of a generated report:

Comprehensive Scientific Report: HER2 in Breast Cancer Research and Treatment
1. Executive Summary
Human epidermal growth factor receptor 2 (HER2) continues to be a critical target in breast cancer research and treatment development. This report synthesizes recent findings across the HER2 landscape highlighting significant advances in understanding HER2 biology and therapeutic approaches. The emergence of antibody-drug conjugates (ADCs) represents a paradigm shift in HER2-targeted therapy, with trastuzumab deruxtecan (T-DXd, Enhertu) demonstrating remarkable efficacy in both early and advanced disease settings. The DESTINY-Breast11 trial has shown clinically meaningful improvements in pathologic complete response rates when T-DXd is followed by standard therapy in high-risk, early-stage HER2+ breast cancer, potentially establishing a new treatment paradigm.

Notably, you don’t have to define a step-by-step process to accomplish this task. By providing the assistant with a well-documented list of tools, it can decide which to use and in what order.

Clean up

If you followed this example on your local computer, you did not create new resources in your AWS account that need to be cleaned up. If you deployed the research assistant using one of the AWS services mentioned earlier (such as AWS Lambda, AWS Fargate, Amazon EKS, or Amazon EC2), refer to the relevant service documentation for cleanup instructions.

Conclusion

In this post, we showed how Strands Agents streamlines the creation of powerful, domain-specific AI assistants. We encourage you to try this solution with your own research questions and extend it with new scientific tools. The combination of the orchestration capabilities, streaming responses, and flexible configuration of Strands Agents with the powerful language models of Amazon Bedrock creates a new paradigm for AI-assisted research. As the volume of scientific information continues to grow exponentially, frameworks like Strands Agents will become essential tools for drug discovery.

To learn more about building intelligent agents with Strands Agents, refer to Introducing Strands Agents, an Open Source AI Agents SDK, Strands Agents SDK, and the GitHub repository. You can also find more sample agents for healthcare and life sciences built on Amazon Bedrock.

For more information about implementing AI-powered solutions for drug discovery on AWS, visit us at AWS for Life Sciences.


About the authors

Hasun Yu is an AI/ML Specialist Solutions Architect with extensive expertise in designing, developing, and deploying AI/ML solutions for healthcare and life sciences. He supports the adoption of advanced AWS AI/ML services, including generative and agentic AI.

Brian Loyal is a Principal AI/ML Solutions Architect in the Global Healthcare and Life Sciences team at Amazon Web Services. He has more than 20 years’ experience in biotechnology and machine learning and is passionate about using AI to improve human health and well-being.

Read More

Amazon Nova Act SDK (preview): Path to production for browser automation agents

In early 2025, we introduced the Amazon Nova Act SDK as a research preview to help developers build agents that reliably complete tasks in a web browser. Now, we are excited to work with customers to take their agents to production in a limited preview, using new AWS integrations including AWS Identity and Access Management (IAM) for secure credentialing, Amazon Simple Storage Service (Amazon S3) for data storage and policy control, and the new Amazon Bedrock AgentCore Browser Tool for scalable, cloud-based browser execution.

In this post, we walk through what makes the Amazon Nova Act SDK unique, how it works, and how teams across industries are already using it to automate browser-based workflows at scale.

Challenges with traditional automated business workflows

Many day-to-day business operations require a browser, such as submitting time-off requests, processing invoices, accessing vendor portals, or reviewing dashboards. Lack of API coverage often means workflows are done manually: teams copy-paste data across tabs, follow multi-step flows, and click through countless interfaces to get work done.

Traditional rules-based browser automation frameworks often face challenges in dynamic web environments. Teams can spend more time on ongoing maintenance than on building new automations, because changes in page structure (for example, newly added form fields or dropdown options) break brittle selectors. Most importantly, these frameworks are difficult to scale. If one use case is performed on 50 different sites (for example, professional license verification on state websites), teams must build 50 site-specific automations, because rules-based frameworks don’t generalize.

As humans, our ability to perform tasks adapts across different tools and interfaces. For example, once you know how to draft an email in Outlook, you can easily do the same in Gmail—even if you’ve never used it before. Large language models (LLMs), trained on millions of examples of UIs, offer the potential to create a similar type of pattern recognition for AI agents. They’ve brought us this far—powering chat, summarization, coding copilots, and more—by interpreting language, following instructions, and reasoning across domains. Now, we’re entering the next phase of generative AI: one centered on action. Agentic AI builds on the foundation of LLMs to move from understanding to execution. These systems are designed to complete dynamic, multi-step workflows—like filling out complex forms, interacting with evolving UI elements, or performing real-world business tasks at scale. Agentic AI doesn’t replace LLMs—it extends them, unlocking new automation capabilities that bring us closer to real task completion in enterprise environments.

Agentic AI with the Amazon Nova Act SDK

With the Amazon Nova Act SDK, you can build and deploy reliable browser agents powered by the Amazon Nova Act model—purpose-built and fine-tuned for agentic AI. Trained with reinforcement learning and extensive in-domain browser interaction data, it executes step-by-step workflows with precision. With this latest version, we’ve extended those capabilities with AWS integrations so you can take your agents from prototype to production. You can install the SDK with a single command, write agents using a combination of Python and natural language, debug while the workflow runs, and integrate directly into continuous integration and delivery (CI/CD) pipelines. With enterprise-grade security, observability, and infrastructure now available through AWS, the Amazon Nova Act SDK provides a fast, flexible path for teams looking to build agents that act—and deliver—at scale, automating real-world workflows where traditional scripts or general-purpose models aren’t reliable or scalable enough.

The Amazon Nova Act SDK also integrates with the new Amazon Bedrock AgentCore Browser Tool—a fast, secure, cloud-based browser that enables AI agents to interact with websites at scale. It includes enterprise-grade security features, including virtual machine-level isolation and federated identity integration. The tool offers built-in observability through live viewing, AWS CloudTrail logging, and session replay to troubleshoot, maintain quality, and support compliance.

Benefits of the Amazon Nova Act SDK

The Amazon Nova Act SDK is reliable, fast to deploy, and built for secure, large-scale browser automation use cases. In this section, we discuss some of the benefits of the SDK in more detail.

Reliability: Build robust browser automation with high accuracy and repeatability

With the Amazon Nova Act SDK, developers can break down complex workflows into reliable atomic commands (for example, collect all form elements of a webpage and return a string with all required fields of the form). It supports adding detailed instructions to refine those commands when needed (for example, dismiss any popup banners), the ability to call APIs, and the option to switch to direct browser manipulation through Playwright to improve reliability (for example, for entering passwords). Developers can interleave Python code—such as tests, breakpoints, assertions, or thread pools for parallelization—to optimize performance, especially because even the fastest agents are constrained by webpage load times. With this latest version, the Amazon Nova Act SDK is already demonstrating over 90% reliability across early enterprise workflows, including automated quality assurance, complex form handling, and process execution. Improvements to reasoning and recovery help agents adapt to changing UIs and complete complex sequences consistently and accurately.
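The following is a brief, hedged sketch of this atomic, step-by-step style using the nova-act Python SDK; the starting page, instructions, and the Playwright password step are illustrative assumptions, not a workflow from this post.

from nova_act import NovaAct

# Each act() call is a small, verifiable step rather than one long instruction.
with NovaAct(starting_page="https://vendor-portal.example.com") as nova:  # placeholder URL
    nova.act("dismiss any popup banners")
    nova.act("open the invoices page and filter to last month")
    nova.act("click the password field")
    # For sensitive input, drop down to direct browser control through Playwright.
    nova.page.keyboard.type("fetch-this-from-a-secrets-manager")  # placeholder secret handling
    nova.act("submit the form and wait for the confirmation message")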

Speed-to-market: Move from prototype to production in days—not weeks

The Amazon Nova Act SDK is designed to help you build automation quickly, without relying on brittle scripts. You can install the SDK with a single command. You can define agents using Python, natural language, or both. You can debug flows while they run, inspect the DOM, pause between steps, and iterate rapidly. The Amazon Nova Act SDK supports the following features:

  • Git-based CI/CD pipelines
  • Reusable agent modules
  • Playwright fallbacks for sensitive tasks (for example, password input)
  • Threading and parallelization for fast execution

You don’t have to change your infrastructure or rebuild your internal tools. Agents built with Amazon Nova Act fit into existing dev workflows and allow you to move from experimentation to production quickly.

Security: Deploy automations you can trust—powered by AWS

The Amazon Nova Act SDK integrates with IAM for access control, and access to the model is managed just like access from other AWS services. It supports execution on macOS, Linux, Windows, and WSL2. Runtime environments are isolated, and encryption is supported for sensitive inputs and outputs. The Amazon Nova Act SDK was designed to work inside enterprise environments—with the reliability, observability, and security that production systems require.

See it in action: Automating information gathering to help streamline financial decisions

In financial services—especially investment banking, M&A advisory, and strategic research—success often depends on how fast and accurately teams can turn fragmented public data into actionable insight. The following demo shows the Amazon Nova Act SDK in action.

Where the Amazon Nova Act SDK can make an impact

Browser-based workflows are common in today’s businesses, yet many remain manual, repetitive, and prone to error. The Amazon Nova Act SDK helps organizations automate these tasks, freeing up teams to focus on higher-value work, improve accuracy, and reduce operational delays. Its reliability makes it a fit across industries and use cases. In this section, we provide some examples of what our early customers are building.

Automated data entry and form filling

The Amazon Nova Act SDK reduces repetitive manual input across web-based systems—like CRMs, HR tools, and finance platforms—by automating form submissions, uploads, and updates. In healthcare, staff assist members with complex, state-specific benefit applications. Public sector caseworkers also re-enter household data across multiple systems. The Amazon Nova Act SDK handles these dynamic flows reliably—navigating shifting fields, dropdowns, and popups without brittle scripts or custom code.

Rackspace Technology, a leading hybrid and AI solutions provider, is working with Alvee Health to automatically register members for public benefits using the Amazon Nova Act SDK. “Many registration forms for public programs are long and confusing, so members often don’t apply for the help they need,” said Nicole Cook, CEO at Alvee. “With the Amazon Nova Act SDK and harnessing information already in Alvee’s system, we’re not just simplifying paperwork—we’re helping provide timely, accurate access to the resources that support healthier lives. We expect this innovation to increase successful benefit registrations by 30%, and improve overall case load by up to tenfold, allowing healthcare providers to focus more on patient care and less on administration. This is a prime example of how AI can be used to support well-being and improve overall health for communities.”

Customer support augmentation

Customer support teams across retail, travel, and software as a service (SaaS) often move between internal tools and third-party portals to resolve tickets. For example, a retail associate might submit a return on a partner site. A travel agent might log in to an airline dashboard to request compensation. A support rep might reset a license key in a customer admin console. The Amazon Nova Act SDK automates these browser-based tasks, helping agents stay focused on customer conversations while backend steps are executed reliably and at speed.

Navan, a leading travel and expense management platform, uses the Amazon Nova Act SDK to simplify its travel agents’ workflows by automating how they fill out payment forms across a wide range of vendors.

Yuval Refua, SVP of Product, said, “Adding the Amazon Nova Act SDK to our agents’ workflows has helped us reduce repetitive tasks—an essential step in scaling our operations to serve more customers. We tried other computer use tools, and Amazon Nova Act’s reliability and flexibility enabled a single script to work across diverse payment forms from a range of hotel brands. We’re now expanding this automation to cover even more vendors, which we expect will increase our operational capacity and help us meet growing customer demand more efficiently.”

Automating high-stakes administrative workflows

Credential verification, identity checks, and other compliance-heavy tasks often involve navigating hundreds of third-party portals with inconsistent layouts. The Amazon Nova Act SDK makes it possible to automate these workflows with high accuracy, flexibility, and full control—helping teams scale while maintaining precision.

Automation Anywhere, a global leader in Agentic Process Automation (APA), is expanding its automation capabilities through the Amazon Nova Act SDK, starting with professional credential verification—a high-stakes, repetitive task that’s essential for compliance, member safety, and day-to-day operations.

“By deeply integrating the Amazon Nova Act SDK into our Process Reasoning Engine (PRE), we’ve unlocked a major leap forward in computer use for enterprise automation,” said Adi Kuruganti, Chief Product Officer at Automation Anywhere. “Our goal-oriented AI agents don’t just mimic clicks, they reason through UI-based processes in real time, navigating complex websites with human-like expertise. This opens the door to automating previously out-of-reach use cases like healthcare program enrollment testing, where accuracy and scale are essential.”

UX and QA testing across dynamic interfaces

UX and QA testing often involves simulating real user interactions on frequently changing websites—especially in sectors like banking, insurance, and government. With the Amazon Nova Act SDK, teams can write and update tests using natural language or Python, adapting quickly to UI changes without brittle selectors or manual rewrites.
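
The following is a minimal sketch of what such a Python test might look like; the starting URL and the natural-language steps are hypothetical placeholders, and the exact SDK surface may differ in your version of the research preview.

# Minimal sketch of a browser test written with the Nova Act SDK (research preview).
# The URL and the instructions are hypothetical placeholders.
from nova_act import NovaAct

def test_loan_calculator():
    # Each NovaAct session drives an isolated browser starting at the given page
    with NovaAct(starting_page="https://example.com/loan-calculator") as nova:
        # Steps are expressed as natural-language instructions instead of brittle selectors
        nova.act("enter 250000 as the loan amount and 30 years as the term")
        nova.act("click Calculate and wait for the monthly payment to appear")

if __name__ == "__main__":
    test_loan_calculator()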

Tyler Technologies, a leading provider of integrated software and technology services to the public sector, is using the Amazon Nova Act SDK to automate software testing and improve the reliability of its releases. “Amazon Nova Act’s natural-language interface lets us convert our manual test plans into automated suites in minutes—without writing a single line of code, saving us hundreds of hours while expanding test coverage and increasing product quality,” said Franklin Williams, President of Data & Insights at Tyler Technologies. “We’re now looking to expand the use of the Nova Act SDK across our portfolio.”

What’s next for the Amazon Nova Act SDK

We’re working closely with early AWS customers to inform our roadmap. Although today’s focus is on browser-based workflows, the Amazon Nova Act SDK is part of a broader effort to build agents that can operate reliably across diverse environments. We’re continuing to expand the model’s reach beyond the web, applying reinforcement learning to more complex, real-world tasks. We’re also deepening integration across the AWS ecosystem to help developers move faster—from prototyping to deploying secure, scalable agentic systems.

Get started with the Amazon Nova Act SDK

If you’re a technical leader or developer and want to start prototyping with the research preview of the Amazon Nova Act SDK, visit Amazon Nova Act. You will get access to early tooling designed for reliable, step-by-step browser automation—built for real-world workflows, not just demos.

The Amazon AGI Lab is Amazon’s applied research group focused on building useful AI agents that can take real-world actions in digital and physical environments. Their work spans LLMs, reinforcement learning, world modeling, and more. To learn more and keep up with their latest innovations, visit Amazon AGI Labs.

Contact us to express interest in working with us to productionize your agent (limited preview).


About the authors

Lori Knapp is a Principal Product Manager with Amazon Nova. She leads product efforts to define how foundation models can power intelligent agents across diverse real-world use cases. Prior to this role, Lori’s experience spanned scaling adaptive voice experiences at Alexa, product strategy at Microsoft, and behavioral science consulting. Outside of work, she enjoys exploring new cities, hosting dinner parties, and solving crossword puzzles.

Tara Raj is an Engineering Manager at Amazon working on Nova Act. In her current role, she focuses on developer experience, from building nova.amazon.com/act and the Nova Act SDK with the software engineers on her team to driving adoption of Amazon Nova Act with her solutions architect team. Tara has over 10 years of experience in engineering roles bringing products from vision to launch, including Nova multimodal capabilities at Amazon and the Windows Subsystem for Linux and Visual Studio Code at Microsoft. Outside of work you can find her traveling, dancing, and trying new restaurants.

Read More

Optimizing enterprise AI assistants: How Crypto.com uses LLM reasoning and feedback for enhanced efficiency


This post is co-written with Jessie Jiao from Crypto.com. Crypto.com is a crypto exchange and comprehensive trading service serving 140 million users in 90 countries. To improve its service quality, Crypto.com implemented generative AI-powered assistant services on AWS.

Modern AI assistants—artificial intelligence systems designed to interact with users through natural language, answer questions, and even perform tasks—face increasingly complex challenges in production environments. Beyond handling basic FAQs, they must now execute meaningful actions, adhere to company policies, implement content filtering, escalate to human operators when needed, and manage follow-up tasks. These requirements demand sophisticated systems capable of handling diverse scenarios while maintaining consistency and compliance.

To address these challenges, a modular subsystem architecture proves invaluable. This architectural approach divides an AI system into separate, specialized components that can function independently while working together as a cohesive whole. Such design allows for flexible integration of different processing logics, such as intelligent routing between knowledge bases, dynamic prioritization of information sources, and seamless incorporation of business rules and policies. Each subsystem can be independently developed and optimized for specific tasks while maintaining overall system coherence.

As AI assistant systems grow in complexity, with multiple subsystems handling various workloads, prompt engineering emerges as a critical discipline. This art of carefully crafting input text guides language model responses and facilitates consistent behavior across interconnected components. Crafting effective prompts that work across different subsystems while maintaining consistency and accuracy is both critical and time-intensive. This challenge is particularly acute in enterprise environments where precision and reliability are paramount.

In this post, we explore how we used user and system feedback to continuously improve and optimize our instruction prompts. This feedback-driven approach has enabled us to create more effective prompts that adapt to various subsystems while maintaining high performance across different use cases.

Feedback and reasoning: The key to LLM performance improvement

Although large language models (LLMs) have demonstrated remarkable capabilities, they can sometimes struggle with complex or ambiguous inputs. This is where feedback mechanisms become essential. By incorporating feedback loops, LLMs can learn from their mistakes, refine the instruction, and adapt to challenging scenarios.

One powerful approach is critiquing, where LLMs are paired with an external feedback mechanism that provides critiques. For instance, when processing documents, if an LLM generates an incorrect summary, a fact-checking tool can identify inaccuracies and provide feedback. The model can then revise its output, leading to improved accuracy and reliability. This iterative process mirrors human learning, where feedback drives continuous improvement. Consider an example where a customer asks an enterprise AI assistant, “I need to increase my credit limit immediately for an emergency purchase.” The assistant might initially respond with approval steps without verification, but a critique system would flag: “Response bypasses required identity verification protocol and fails to assess qualification criteria per company policy.” With this feedback, the assistant can revise its response to include proper authentication steps, eligibility checking, and alternative options for emergency situations—demonstrating how critiquing facilitates adherence to business rules while maintaining helpful customer service.

Unlike traditional machine learning (ML) processes where feedback serves as a loss function to update model weights, these feedback mechanisms operate differently in inference-time LLM applications. Rather than modifying the underlying model parameters, feedback provides supplementary instructions that dynamically guide the model’s behavior. This approach allows for behavioral adaptation without the computational expense of retraining, effectively creating a flexible instruction layer that shapes model outputs while preserving the core capabilities of the pre-trained model. Such runtime adaptability represents a significant advancement in making LLMs more responsive to specific requirements without architectural modifications.

The effectiveness of feedback mechanisms extends beyond simple error correction, enabling LLMs to develop a nuanced understanding of task requirements. Through iterative feedback cycles, models can learn to interpret ambiguous instructions more effectively, identify implicit context, and adapt their processing strategies accordingly. This capability is particularly valuable in enterprise settings where complex, domain-specific tasks require precise interpretation of instructions. By analyzing feedback patterns over time, LLMs can even anticipate potential misunderstandings and proactively adjust their approach, leading to more efficient and accurate outcomes. In our research implementing this approach for financial services classification tasks, we observed substantial performance improvements—from initial accuracy rates of 60% to eventually achieving 100% through systematic feedback incorporation. Each iteration addressed specific weaknesses identified in previous rounds, demonstrating how structured critique leads to continuous model improvement.

For deeper insights into these mechanisms, we recommend exploring two key research papers: CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing, which demonstrates how LLMs can self-correct with tool-interactive critiquing, and Reflexion: Language Agents with Verbal Reinforcement Learning, which explores language agents with verbal reinforcement learning. The following figure provides a visual representation of this feedback process.

Recent developments in reasoning capabilities have made this feedback process even more powerful. Modern LLMs can now engage in sophisticated analysis of their own outputs, breaking down complex problems into manageable components and systematically evaluating each aspect of their performance. To learn more, see Anthropic’s Claude 3.7 Sonnet hybrid reasoning model is now available in Amazon Bedrock and DeepSeek-R1 now available as a fully managed serverless model in Amazon Bedrock. This self-analysis capability, combined with external feedback, creates a robust framework for continuous improvement.

Consider a scenario where an LLM is tasked with sentiment analysis. Initially, when classifying a mixed review like “The product worked as advertised, but customer service was disappointing,” the model might incorrectly label it as positive. Through error analysis and verification, a critique mechanism (powered by a separate reasoning model) can provide targeted feedback, explaining that negative statements about service quality significantly impact overall sentiment. This feedback doesn’t modify the model’s weights but instead serves as supplementary instruction that enriches the original prompt template, helping the model properly weigh contrasting sentiments within the same text.

Over multiple feedback iterations, the LLM employs reasoning capabilities to incorporate this external feedback and develop more sophisticated classification heuristics. With the critique system continuously verifying outputs and providing constructive guidance, the model learns to identify why certain patterns lead to misclassifications and refines its approach accordingly. When encountering new ambiguous reviews, it can now apply these learned insights to correctly interpret subtle emotional nuances. This demonstrates how reasoning-based feedback effectively modifies the instruction context without requiring parameter adjustments, allowing for continuous improvement through analytical understanding rather than mechanical optimization.
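
To make the mechanics concrete, the following minimal Python sketch shows the general idea of folding accumulated critique feedback into the prompt as supplementary instructions; the variable names and feedback strings are illustrative only, not the system's actual implementation.

# Illustrative sketch: critique feedback is appended to the prompt as extra
# instructions, leaving model weights untouched. Names and strings are placeholders.
base_template = (
    "Classify the sentiment of the review as POSITIVE, NEGATIVE, or MIXED.\n"
    "Review: {review}\n"
)

accumulated_feedback = [
    "Negative statements about service quality significantly impact overall sentiment.",
    "When positive and negative aspects both appear, prefer MIXED over POSITIVE.",
]

def build_prompt(review: str) -> str:
    # The feedback list grows across iterations and is injected as supplementary guidance
    guidance = "\n".join(f"- {item}" for item in accumulated_feedback)
    return base_template.format(review=review) + "\nGuidance from prior critiques:\n" + guidance

print(build_prompt("The product worked as advertised, but customer service was disappointing"))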

In the next section, we explore how these feedback mechanisms and reasoning capability can be operationalized to enhance workflows.

Solution overview

The integration of feedback and reasoning creates a powerful learning loop: feedback identifies areas for improvement, reasoning capabilities analyze the root causes of issues, and the resulting insights drive specific, actionable changes. This systematic approach to improvement makes sure that each iteration brings the model closer to optimal performance, while maintaining transparency and accountability in the development process.

For practical examples and complete implementation code of this process, check out our GitHub repository. This repository includes sample datasets, evaluation frameworks, and ready-to-use templates for each step of the optimization workflow.

Our proposed solution uses two foundation models (FMs) through Amazon Bedrock: Amazon Nova for executing instructional tasks and optimizing the instruction prompt, and Anthropic’s Claude 3.7 or DeepSeek-R1 for error analysis and feedback generation. Amazon Bedrock, a fully managed service, provides access to high-performance FMs from leading AI companies, enabling flexible model selection and testing. You can explore illustration_notebook_optimization_prompt.ipynb for a quick walkthrough of the high-level process for LLM optimization, which demonstrates key concepts and implementation details in an accessible format.
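
As an orientation for readers new to Amazon Bedrock, the following sketch shows how the instructional task can be sent to a model through the Converse API with boto3. The model ID, the ${inquiry} placeholder name, and the inference parameters are assumptions for illustration; substitute the models or inference profiles enabled in your account.

# Sketch of invoking a Bedrock model for the classification task via the Converse API.
# The model ID and the ${inquiry} placeholder name are assumptions for illustration.
import boto3
from string import Template

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def run_task(prompt_template: str, inquiry: str, model_id: str = "amazon.nova-pro-v1:0") -> str:
    prompt = Template(prompt_template).safe_substitute(inquiry=inquiry)
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.0},
    )
    # The reply contains the predicted category plus the model's explanation
    return response["output"]["message"]["content"][0]["text"]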

LLM optimization workflow

The following is the high-level process for LLM optimization:

  1. The process begins with a precise articulation of task requirements and success criteria. This crucial first step involves three key components: defining specific task objectives, crafting a well-structured prompt template with clear instructions, and assembling a comprehensive evaluation dataset with verified ground truth labels. During this phase, we establish quantifiable success metrics and acceptance criteria to measure improvement effectively. The Amazon Nova Pro understanding model is configured to provide both task outputs and detailed explanations for its decisions, enabling transparency in the evaluation process.

For illustration, we started with a simple prompt template to categorize customer inquiries into multiple classes, such as PASSWORD_RESET, ESCALATION, and OUT_OF_SCOPE. This initial template provided only basic category definitions without detailed guidance on edge cases or classification priorities, serving as our baseline for improvement. You can refer to the test case dataset and initial template.

  2. Following the setup, we conduct rigorous testing against ground truth data to evaluate model performance. This evaluation focuses on both successful and failed cases, with particular emphasis on analyzing misclassifications. The model’s generated explanations for each decision serve as valuable insights into its reasoning process. We collect both quantitative performance metrics (accuracy, precision, recall) and qualitative insights into error patterns, creating a comprehensive performance baseline.

During this step, we compare model predictions to ground truth labels and record both quantitative metrics and detailed error cases. For example, when a customer urgently reports unauthorized account changes with “Someone must have accessed my account…I need this fixed immediately”, the model might incorrectly classify it as CARD_DISPUTE instead of the correct ESCALATION category. Each prediction is logged with its success status (true/false), the model’s explanation, and the correct label. This comprehensive analysis creates a structured dataset of both successful classifications and failure cases, providing critical input for the reasoning-based optimization in the next step.
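
A minimal version of this evaluation step might look like the following sketch; the field names are illustrative, and the repository's implementation differs in detail.

# Sketch of the evaluation step: compare predictions to ground truth and keep
# structured records of successes and failures for the critique step.
# classify_fn(inquiry, template) is expected to return (predicted_label, explanation).
def evaluate(test_cases, prompt_template, classify_fn):
    results = []
    for case in test_cases:
        predicted, explanation = classify_fn(case["inquiry"], prompt_template)
        results.append({
            "inquiry": case["inquiry"],
            "predicted": predicted,
            "expected": case["label"],
            "success": predicted == case["label"],
            "explanation": explanation,
        })
    accuracy = sum(r["success"] for r in results) / len(results)
    return accuracy, results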

  3. The key step of our optimization process lies in systematic error analysis using a dedicated reasoning framework. This framework examines the model’s explanations for each error case, identifying root causes and pattern recognition failures. Beyond individual error analysis, we employ pattern recognition to identify systemic issues across multiple cases. The reasoning model, in our case Anthropic’s Claude 3.7, incorporates historical feedback and learning patterns to generate specific, actionable feedback for prompt improvement. This critical step produces structured, detailed recommendations for prompt optimization.

The reasoning model analyzed classification performance through a structured framework that identified error patterns, investigated prompt-specific root causes, considered historical context from previous iterations, and suggested targeted improvements. This methodical approach focused exclusively on enhancing prompt clarity, structure, and precision—avoiding model or data modifications outside the scope of prompt engineering. By systematically addressing ambiguities and refining classification criteria, we achieved progressively better performance with each iteration. See the following code:

critique_prompt_template = """
    Analyze classification performance and provide reasoning for prompt improvements:
    Current Template: ${input_current_template}
    Evaluation Results: ${evaluation_results}
    
    Follow these thinking steps:
    1. Error Pattern Analysis:
    2. Root Cause Investigation:
    3. Historical Context Review:
       • Previous suggestions: ${suggestion_history}
    4. Prompt Improvement Ideas:
     
    Output final suggestions between <suggestion> </suggestion> tags
"""

You can see the detailed implementation in error_analysis_with_reasoning.py.
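
As a rough sketch of how this template might be used, the following fills the ${...} placeholders and sends the result to the reasoning model. The model ID and the parsing of the <suggestion> tags are assumptions for illustration; refer to error_analysis_with_reasoning.py for the actual implementation.

# Sketch: fill the critique template defined above and send it to the reasoning model.
# The model ID and the tag parsing are assumptions for illustration.
import json
import boto3
from string import Template

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def critique(current_template, evaluation_results, suggestion_history,
             model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0"):
    prompt = Template(critique_prompt_template).substitute(
        input_current_template=current_template,
        evaluation_results=json.dumps(evaluation_results, indent=2),
        suggestion_history="\n".join(suggestion_history),
    )
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    text = response["output"]["message"]["content"][0]["text"]
    # Keep only the content between the <suggestion> tags requested by the template
    return text.split("<suggestion>")[1].split("</suggestion>")[0].strip()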

  4. Using the structured feedback from the reasoning framework, we implement targeted modifications to the prompt template. These refinements might include enhancing instruction clarity, adjusting classification parameters, or restructuring the prompt format. Each modification directly addresses specific issues identified in the analysis phase, making sure changes are evidence-based and purposeful. The focus remains on improving the instruction layer rather than modifying the underlying model architecture.

To implement these structured improvements, we developed a systematic prompt rewriting mechanism encoded in our prompt_rewrite.py module. This component transforms analytical feedback into concrete prompt enhancements through a dedicated template-based approach. The rewriting process follows a methodical workflow: it preserves essential components like placeholders, incorporates specific improvements identified in the analysis, and makes sure modifications directly address root causes from the feedback. This systematic rewriting approach guarantees that each iteration builds upon previous learnings rather than making arbitrary changes.

rewrite_prompt_template = """
TASK: Improve the prompt template based on critique feedback.
INPUT:
- Current Template: ${input_current_template}
- Critique Analysis: ${critique_feedbacks}
INSTRUCTIONS:
1. Preserve the current template structure and all placeholders
2. Implement specific improvements identified in the critique
3. Focus on addressing root causes of errors
4. Create a complete, ready-to-use improved template
OUTPUT FORMAT:
- Root cause summary
- Improved template incorporating all recommended changes
The improved template should directly address identified issues while remaining concise and effective.
"""

  5. The optimization process concludes each iteration by testing the refined prompt against the evaluation dataset. We measure performance improvements through comparative analysis of key metrics and conduct quality assessments of new outputs. This phase initiates the next iteration cycle, where successful changes are incorporated into the baseline, and newly identified challenges inform the next round of optimization. This creates a sustainable improvement loop that progressively enhances prompt effectiveness while maintaining detailed documentation of successful strategies.
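
Putting the steps together, the overall loop can be summarized by a sketch like the following; the injected functions echo the sketches above and are placeholders, not the repository's exact API.

# Illustrative end-to-end optimization loop tying the steps together.
# evaluate_fn, critique_fn, and rewrite_fn are placeholders echoing the sketches above.
def optimize_prompt(initial_template, test_cases, evaluate_fn, critique_fn, rewrite_fn,
                    max_iterations=10, target_accuracy=0.95):
    template, suggestion_history = initial_template, []
    for iteration in range(max_iterations):
        accuracy, results = evaluate_fn(test_cases, template)           # test against ground truth
        print(f"iteration {iteration}: accuracy={accuracy:.0%}")
        if accuracy >= target_accuracy:
            break
        errors = [r for r in results if not r["success"]]
        suggestion = critique_fn(template, errors, suggestion_history)  # reasoning-based error analysis
        suggestion_history.append(suggestion)
        template = rewrite_fn(template, suggestion)                     # targeted prompt rewrite
    return template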

Through our iterative refinement process, we transformed a basic prompt into a highly effective instruction set for LLMs. Each iteration strategically addressed specific weaknesses identified through our structured analysis framework. For complete documentation of each iteration’s analysis and improvements, see iteration_log.

What began as a simple prompt evolved into a comprehensive set of instructions incorporating nuanced task boundaries, explicit priority rules for edge cases, hierarchical decision criteria, and precise handling instructions for corner cases. Rather than modify model weights or architecture, our approach used targeted feedback from a critique mechanism to enhance the instruction layer, effectively guiding model behavior without retraining. Each iteration built upon lessons from previous rounds, systematically addressing error patterns revealed through our critique framework. The feedback served as supplementary instructions that enriched the original prompt template, allowing the model to develop increasingly sophisticated processing heuristics over time.

Results

Through these iterative approaches, we benchmarked the solution on the production system. Our comparative analysis between the initial and final prompts revealed several important patterns:

  • Boundary confusion was resolved by adding explicit prioritization rules between overlapping categories
  • Edge case handling improved by incorporating specific examples that defined thresholds for categorization
  • Decision transparency increased through structured reasoning requirements in the output format
  • Classification consistency was enhanced by adding counterexamples to help prevent overcategorization in sensitive areas

Through 10 deliberate iterations and the incorporation of detailed task-specific instructions, we achieved a remarkable 34-percentage-point improvement in task effectiveness, transforming a basic prompt with 60% accuracy into a robust classification system with 94% accuracy on challenging cases. This not only validates our iterative optimization strategy but also demonstrates how systematic prompt refinement can dramatically enhance LLM performance without modifying the underlying model architecture.

Conclusion

The integration of feedback mechanisms into AI assistant systems represents a significant leap forward in conversational AI capabilities. By implementing robust feedback loops, we’ve demonstrated how AI assistants can evolve from static question-answering systems to dynamic, self-improving resources. The modular subsystem architecture, combined with continuous prompt optimization through feedback, enables AI assistants to handle increasingly complex tasks while maintaining compliance and accuracy.

As we’ve shown through practical examples and research insights, feedback-driven systems not only produce better outputs but also allow for more effective and streamlined input instructions over time. This efficiency gain is particularly valuable in enterprise environments where precision and adaptability are crucial, and where model retraining is costly or impractical. Each iteration builds upon lessons from previous rounds, systematically addressing error patterns revealed through our critique framework.

Looking ahead, the continued refinement of feedback mechanisms and prompt engineering techniques will be essential for developing next-generation AI assistant systems. By embracing these approaches, organizations can create AI assistants that not only meet current demands but also adapt to future challenges, delivering increasingly sophisticated and reliable interactions. We invite you to try our proposed feedback-driven prompt optimization approach in your own applications. For those interested in implementing these techniques, Amazon Bedrock provides an ideal landscape for exploring these methods in your specific business contexts, offering a selection of FMs with flexible deployment options.


About the authors

Jessie Jiao is a Senior Software Engineer at crypto.com, where she leverages her extensive experience in designing, building, and implementing enterprise applications with LLM models and AI technologies. She is passionate about harnessing the power of AI to drive business transformation and enhance operational efficiency.

Gary Lo is a Solutions Architect at AWS based in Hong Kong. He is a highly passionate IT professional with over 10 years of experience in designing and implementing critical and complex solutions for distributed systems, web applications, and mobile platforms for startups and enterprise companies. Outside of the office, he enjoys cooking and sharing the latest technology trends and insights on his social media platforms with thousands of followers.

Yanwei Cui, PhD, is a Senior Machine Learning Specialist Solutions Architect at AWS. He started machine learning research at IRISA (Research Institute of Computer Science and Random Systems), and has several years of experience building AI-powered industrial applications in computer vision, natural language processing, and online user behavior prediction. At AWS, he shares his domain expertise and helps customers unlock business potentials and drive actionable outcomes with machine learning at scale. Outside of work, he enjoys reading and traveling.

Michelle Hong, PhD, works as a Prototyping Solutions Architect at Amazon Web Services, where she helps customers build innovative applications using a variety of AWS components. She applies her expertise in machine learning, particularly natural language processing, to develop data-driven solutions that optimize business processes and improve customer experiences.

Read More

Build modern serverless solutions following best practices using Amazon Q Developer CLI and MCP


Building modern serverless applications on AWS requires navigating best practices to manage the integration between multiple services, such as AWS Lambda, Amazon API Gateway, Amazon DynamoDB, and Amazon EventBridge. Security considerations, performance optimization, and comprehensive monitoring add further requirements when building a serverless architecture that adheres to AWS best practices.

Amazon Q Developer CLI with Model Context Protocol (MCP) supports getting started with serverless applications using natural language conversations that produce infrastructure configurations and architectural guidance aligned with AWS best practices. This reduces the need to manually research documentation or piece together solutions from disparate sources.

This post explores how the AWS Serverless MCP server accelerates development throughout the serverless lifecycle: from making architectural decisions with tools like get_iac_guidance and get_lambda_guidance, to streamlining development with get_serverless_templates and sam_init, to deployment with SAM integration, webapp_deployment_help, and configure_domain. We show how this conversational AI approach transforms the entire process, from architecture design through operations, dramatically accelerating AWS serverless projects while adhering to architectural principles.

Solution overview

Amazon Q Developer CLI serves as a powerful terminal-based gateway to the AI capabilities of Amazon Q, bringing conversational intelligence directly to your command line environment. Although you can also apply this approach to conversational AI in the integrated development environment’s (IDE’s) agentic chat, this innovative interface transforms how developers work by enabling natural language interactions with AWS services and development tools without leaving the terminal. By typing queries or requests in plain English, developers can harness the full potential of generative AI with Amazon Q to solve complex problems, generate code snippets, troubleshoot issues, and access AWS best practices—all from the familiar comfort of their preferred terminal application.

However, although Amazon Q CLI provides the conversational interface, it needs a way to connect with external tools and data sources. This is where MCP comes in.

MCP is a groundbreaking open standard created by Anthropic that bridges the gap between AI models and external data sources or tools. This protocol establishes a unified framework allowing large language models (LLMs) to seamlessly interact with diverse information systems and specialized capabilities. When applied to serverless development workflows, this capability transforms how developers build AWS applications.

MCP transforms Amazon Q CLI for serverless development by bridging the gap between traditional approaches and modern AI-assisted workflows. This integration replaces manual documentation research and trial-and-error development with conversational access to AWS expertise, automated code generation, and intelligent configuration assistance—directly from the command line.

Now that we have discussed the conceptual benefits of Amazon Q CLI and MCP, let’s explore how to set up these tools in your development environment.

Prerequisites

You must have the following prerequisites:

Prepare your environment

Before you can start constructing this serverless solution, you must configure your IDE by installing and setting up Amazon Q CLI and the AWS Serverless MCP server. This section provides comprehensive guidance for properly installing and configuring the required components.

Install Amazon Q Developer CLI

You can install Amazon Q Developer CLI independently on your system. To get started, complete the following steps:

  1. Confirm successful installation with this terminal command:

q --version

The system should now show something similar to the following output:

q 1.x.x

  2. Log in to your AWS account:

q login

  3. Select your preferred authentication option:

Set up MCP servers

Before you can start using MCP servers with Amazon Q CLI, you must install several prerequisites and configure your environment. Complete the following steps:

Install uv with the following command:

pip install uv

Install Python 3.10 or newer using the following command:

uv python install 3.10

Add the following code to your MCP client global configuration file (~/.aws/amazonq/mcp.json). Alternatively, you can configure this per project in an .amazonq/mcp.json file in the current folder. The AWS Serverless MCP server uses the default AWS profile by default. Specify a value in AWS_PROFILE if you want to use a different profile. Similarly, adjust the AWS Region and log-level values as needed:

{
    "mcpServers": {
        "awslabs.aws-serverless-mcp": {
            "command": "uvx",
            "args": [
                "awslabs.aws-serverless-mcp-server"
            ],
            "env": {
                "AWS_PROFILE": "your-aws-profile",
                "AWS_REGION": "us-east-1",
                "FASTMCP_LOG_LEVEL": "ERROR"
            }
        }
    }
}

Amazon Q CLI seamlessly detects and integrates with MCP servers configured in the ~/.aws/amazonq/mcp.json file.

The MCP servers support several configuration options to customize your development experience:

  • Auto-approve – Configure which tools can be used without prompting for approval
  • Environment variables – Customize server behavior through environment variables
  • Disabled flag – Temporarily disable specific servers when not needed

The following is an example of how to configure your AWS Serverless MCP server using the preceding options:

{
    "mcpServers": {
        "awslabs.aws-serverless-mcp": {
            "command": "uvx",
            "args": [
                "awslabs.aws-serverless-mcp-server"
            ],
            "env": {
                "AWS_PROFILE": "your-aws-profile",
                "AWS_REGION": "us-east-1",
                "FASTMCP_LOG_LEVEL": "ERROR"
            },
            "disabled": false,
            "autoApprove": [
                "search_documentation",
                "read_documentation",
                "recommend"
            ],
            "trusted": [
                "search_documentation",
                "read_documentation",
                "recommend"
            ]
        }
    }
}

Test your setup

To verify that everything is working correctly, generate a simple serverless function. Start the Amazon Q CLI chat interface and verify the output shows MCP servers being loaded and initialized:

q chat

The initialization message might not match what you see for your use case, especially when your MCP servers take longer to load.

Now that you’ve confirmed your environment is properly configured, you can begin exploring the serverless development workflow. The first step in this journey is making informed architectural decisions that will shape your application.

Architecture and design phase

The architecture and design phase is where your serverless application takes shape. Making the right decisions at this stage can significantly impact your project’s success, scalability, and maintainability. The AWS Serverless MCP server provides powerful tools to guide these critical early decisions.

When starting a new serverless project, choosing the right infrastructure as code (IaC) approach is crucial for long-term success. The AWS Serverless MCP server’s get_iac_guidance tool transforms this decision-making process from hours of research into a simple conversation. This powerful tool provides comprehensive, contextual guidance tailored to your specific needs by analyzing factors such as:

  • Project complexity and scale
  • Team expertise and preferences
  • Required deployment flexibility
  • Local testing requirements
  • Long-term maintenance considerations

Let’s see how this works in practice by asking Amazon Q about the best IaC approach for your serverless application:

What's the best IaC approach for my serverless application?

Because Amazon Q is using get_iac_guidance for the first time, it will ask you to enter t to trust this tool for the entire session, y to allow the execution of this tool only one time, or n to reject the execution of this tool.

The following example is for an ecommerce application.

After approving the tool usage, you will receive a comprehensive, contextual response tailored to your specific architecture needs. For example:

AWS SAM for Your E-Commerce Microservices

Based on your e-commerce microservices architecture, I recommend AWS SAM (Serverless Application Model) as your primary IaC tool for the serverless components of your application. 

Here is why it is particularly well-suited for your needs:

1. Perfect for Your Serverless Components: SAM is specifically designed for the serverless parts of your architecture like the Payment Service (Lambda) and notification systems.

2. Simplified Syntax: SAM provides a simplified way to define Lambda functions, API Gateway, and DynamoDB tables - all key components in your architecture.

3. Local Testing: You can test your Lambda functions locally before deployment, which is crucial for your payment processing service.

4. Integration with Your Architecture: SAM works seamlessly with the API Gateway and Lambda components shown in your diagram.

5. Deployment Independence: Supports your microservices approach where "each microservice can be deployed independently" as mentioned in your documentation.

Example SAM Template for Your Payment Service:

AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31

Resources:
  PaymentFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./payment-service/
      Handler: index.handler
      Runtime: nodejs22.x
      Policies:
        - SQSPollerPolicy:
            QueueName: !GetAtt PaymentQueue.QueueName
      Events:
        SQSEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt PaymentQueue.Arn
            BatchSize: 10

  PaymentQueue:
    Type: AWS::SQS::Queue
    Properties:
      VisibilityTimeout: 300

Development acceleration phase

After you’ve made key architectural decisions, you enter the development phase. This is often the most time-consuming part of the process, requiring significant effort to implement patterns correctly, handle edge cases, and follow best practices. The AWS Serverless MCP server provides tools that dramatically accelerate this phase by using proven patterns and templates.

The get_serverless_templates tool alleviates the need to start from scratch by providing access to production-ready serverless patterns from the AWS Serverless Land GitHub repository. These patterns are tested, follow AWS best practices, and can be customized to meet your specific requirements.

Let’s say you want to understand more about serverless architecture. With this tool, you can discover and implement common serverless architectures with a simple conversation. For example, you can ask Amazon Q the following question:

Show me Node.js API templates for serverless applications

Because Amazon Q is using get_serverless_templates for the first time, it will ask you to enter t to trust this tool for the session, y to trust the tool only for this time, or n to reject the use of this tool.

You will need to trust at least this tool, if not others. After trusting the tool, you should get a response similar to the following:

Here are some Node.js API templates for serverless applications on AWS:

1. API Gateway to SQS Integration

This template creates a REST API Gateway that directly integrates with AWS SQS to read messages:

Key Features:
- Direct service integration between API Gateway and SQS
- No Lambda function required
- Support for query string parameters
- Ideal for retrieving messages via HTTPS without AWS SDK

Use Cases:
- Web applications polling SQS queues
- Mobile apps requiring secure access to SQS
- Third-party integrations

GitHub Link: https://github.com/aws-samples/serverless-patterns/tree/main/APIGateway-SQS-ReceiveMessages

2. Application Load Balancer with Lambda REST API

While this example uses Python, the pattern is applicable to Node.js as well:

Key Features:
- Path-based listener rules on ALB
- Lambda function as the target
- RESTful API implementation

GitHub Link: https://github.com/aws-samples/serverless-patterns/tree/main/alb-lambda-rest-api-sam-py

3. Full-Stack Integration: Amplify + Cognito + API Gateway + Lambda

This template demonstrates a complete serverless stack with authentication:

Key Features:
- AWS Amplify frontend
- Cognito for authentication
- API Gateway with Cognito authorizer
- Lambda backend
- Environment variables shared between frontend and backend

GitHub Link: https://github.com/aws-samples/serverless-patterns/tree/main/amplify_cognito_apigateway_lambda_envvariables

Development and testing phase

The development and testing phase is where your serverless application takes shape through code. This phase requires tools that streamline the creation, building, and testing of your application components. The AWS Serverless MCP server provides a suite of tools that integrate seamlessly with the AWS SAM CLI, enabling a smooth development experience from project initialization to local testing.

Beginning a new serverless project with the right structure and configuration can be challenging. The sam_init tool reduces this hurdle by creating well-structured projects that follow AWS best practices out of the box. This tool provides templates for various serverless application types, helping you start with a solid foundation.

For example, you can ask the following question:

Create a new SAM project for a Python REST API

Because Amazon Q is using sam_init for the first time, it will ask you to enter t to trust this tool for the session, y to trust the tool only for this time, or n to reject the use of this tool.

The sam_init tool is used to initialize a new serverless application using AWS SAM. It creates a complete project structure with the necessary files to develop, test, and deploy serverless applications on AWS.

The tool has the following key capabilities:

  • It creates a new project directory with a standardized structure for serverless applications
  • It generates an AWS SAM template file (template.yaml) that defines your infrastructure as code
  • It sets up the initial code for Lambda functions in your chosen runtime (Python, Node.js, and so on)
  • It configures the project with the appropriate dependency management system

For your ecommerce microservices architecture, this tool would be particularly useful for:

  • Creating individual microservices (like your product catalog service or payment service)
  • Setting up serverless API endpoints through API Gateway
  • Establishing the foundation for event-driven communication between services

The tool supports various application templates and runtimes, helping you quickly bootstrap different types of serverless applications, from simple APIs to more complex event-processing systems.
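
For reference, the scaffolded Python REST API project typically includes a Lambda handler along the lines of the following sketch; the exact files and code depend on the runtime and application template you choose.

# Sketch of the kind of handler a freshly initialized SAM Python API project contains.
# The exact scaffold depends on the template selected during initialization.
import json

def lambda_handler(event, context):
    # API Gateway proxy integration delivers the HTTP request details in `event`
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "hello from the scaffolded function"}),
    }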

Deployment and configuration phase

After you have developed and tested your serverless application locally, the next step is deploying it to the AWS Cloud. This phase often involves complex, multi-step processes to make sure the components are properly deployed, configured, and integrated. The AWS Serverless MCP server provides specialized tools that simplify these processes, enabling smooth deployments with minimal manual intervention.

Streamline full-stack deployments

For this use case, we’re building a full-stack serverless application with a React frontend, Node.js backend APIs, and DynamoDB database. Traditionally, this would require manually configuring multiple AWS services, managing deployment pipelines, and validating proper integration between components.

The webapp_deployment_help tool transforms this complex process into a single, streamlined operation that handles the different aspects of deployment while following AWS best practices.

You can ask Amazon Q the following question:

Deploy my React application to AWS with serverless backend

Because Amazon Q is using webapp_deployment_help for the first time, it will ask you to enter t to trust this tool for the session, y to trust the tool only for this time, or n to reject the use of this tool. For this example, trust this tool for the session.

This tool supports three deployment types:

  • Backend only – Deploys a backend application to Lambda with API Gateway
  • Frontend only – Deploys a frontend application to Amazon Simple Storage Service (Amazon S3) and Amazon CloudFront
  • Full stack – Deploys both backend and frontend components together

It uses AWS Lambda Web Adapter, so you can deploy standard web applications (like Express.js, Flask, and React) to Lambda without needing to modify your code to follow Lambda-specific conventions.

For your ecommerce microservices architecture, this tool is particularly useful because it aligns with several components in your diagram:

  • It can deploy your frontend to CloudFront
  • It can deploy your microservices as Lambda functions or containers
  • It integrates with API Gateway for API management

The tool requires you to build your application first using your framework’s build command (like npm run build for React), then it handles the deployment to AWS, including setting up the necessary infrastructure.

Set up custom domains

Although your application is now deployed and accessible using AWS generated URLs, professional applications typically require branded, custom domain names for a better user experience. However, configuring custom domains traditionally involves multiple complex steps across several AWS services. The configure_domain tool simplifies this process into a single, intuitive conversation.

You can ask Amazon Q the following question:

Configure my custom domain example.com for my serverless application

Because Amazon Q is using configure_domain for the first time, it will ask you to enter t to trust this tool for the session, y to trust the tool only for this time, or n to reject the use of this tool.

The tool returns steps to configure your domain similar to the following:

Steps to Configure a Custom Domain for Your Serverless Application

1. Prerequisites 
- Ensure you own the domain name (example.com) 
- Have a Route 53 hosted zone for your domain 
- Have deployed your serverless application (e.g., your e-commerce-app)

2. Create an SSL Certificate with AWS Certificate Manager (ACM)
# Request a certificate for your domain. Region must be us-east-1 for CloudFront distributions
aws acm request-certificate --domain-name example.com --validation-method DNS --region us-east-1
# Note the certificate ARN from the output

3. Validate the Certificate 
- Go to the ACM console 
- Find your certificate and click on it 
- Create the DNS validation records in Route 53 
- Wait for validation to complete (can take up to 30 minutes)

The tool orchestrates a comprehensive domain setup process that includes the following steps:

  • Creating and validating AWS Certificate Manager (ACM) certificates for secure HTTPS connections
  • Configuring Amazon Route 53 DNS records to point to your application
  • Setting up API Gateway custom domain mappings
  • Associating CloudFront distributions with your domain
  • Establishing proper cache behaviors and security settings

This automation alleviates the need to navigate between multiple AWS console pages and services, reducing the risk of configuration errors while making sure your domain setup follows security best practices. The tool also handles the complexities of certificate validation and DNS propagation, providing clear guidance throughout the process.

With configure_domain, you can transform your application’s user experience from generic AWS endpoints to professional, branded domains in minutes rather than hours.

Having walked through the complete serverless development lifecycle—from architecture decisions to deployment and configuration—we can now appreciate the transformative impact of Amazon Q CLI with MCP on the entire process.

Next steps

To continue your serverless journey with Amazon Q CLI and MCP, we recommend a progressive approach to building your expertise:

  • Begin with a simple project like a basic Lambda function triggered by an API Gateway endpoint or Amazon S3 event to familiarize yourself with the conversational development workflow and gain confidence in the generated solutions.
  • Explore different serverless patterns by experimenting with event-driven architectures using EventBridge for decoupled communication, orchestration workflows with AWS Step Functions, or building comprehensive API-backed applications with API Gateway and Lambda.
  • As your applications mature, integrate the generated code into your continuous integration and deployment pipelines using services like AWS CodePipeline or AWS CodeBuild to automate testing and deployment processes.
  • Enhance your applications with robust observability by implementing monitoring, logging, and tracing using Amazon CloudWatch, AWS X-Ray, and Powertools for AWS Lambda (Python) to gain insights into performance and troubleshoot issues effectively.
  • Share your experiences and suggestions with the community to contribute to the improvement of MCP servers and help shape the future of AI-assisted serverless development.

Conclusion

By using Amazon Q CLI with MCP servers, you can accelerate your serverless application development while adhering to AWS best practices. This approach helps you focus on your business logic while Amazon Q handles the implementation details, resulting in more robust, scalable, and secure applications.

The ability to generate code, configurations, and architecture recommendations in minutes rather than hours creates a comprehensive development experience that makes sure the different components adhere to best practices while providing actionable, specific implementation guidance. The iterative nature of this approach helps teams quickly compare different architectural decisions through simple prompts, while continuously validating their implementations against official AWS documentation.

Having explored the capabilities and benefits of this approach, you’re now ready to apply these concepts in your own projects.

As you continue exploring serverless application development with Amazon Q CLI and MCP, we encourage you to deepen your knowledge with the following resources:

  • The MCP documentation provides valuable insights into how MCP enhances the capabilities of Amazon Q beyond basic interactions.
  • The AWS SAM documentation offers comprehensive guidance for serverless application development patterns and deployment strategies.
  • To learn more about Amazon Q CLI, refer to Using Amazon Q Developer on the command line.
  • When focusing on architectural excellence, the AWS Well-Architected Framework provides foundational principles for implementing security, reliability, performance efficiency, cost optimization, and operational excellence.
  • For serverless-specific architectural guidance that addresses the unique characteristics of this computing model, refer to the specialized Serverless Lens for Well-Architected Framework tailored specifically for serverless applications.
  • To enhance your development workflow, Powertools for AWS Lambda (Python) offers ready-to-use utilities for structured logging, metrics collection, and distributed tracing. Additionally, the AWS MCP servers documentation offers domain-specific assistance tailored to your project requirements.
  • The serverless community provides valuable resources through AWS Serverless Land, a curated collection of serverless resources, patterns, and examples, and Serverless Stack, an open source framework for building production-ready serverless applications with integrated best practices.

Start building your next serverless application with Amazon Q CLI and MCP today.


About the authors

Dunieski Otano is a Solutions Architect at Amazon Web Services based out of Miami, Florida. He works with World Wide Public Sector MNO (Multi-International Organizations) customers. His passion is Security, Machine Learning and Artificial Intelligence, and Serverless. He works with his customers to help them build and deploy highly available, scalable, and secure solutions. Dunieski holds 14 AWS certifications and is an AWS Golden Jacket recipient. In his free time, you will find him spending time with his family and dog, watching a great movie, coding, or flying his drone.

Varun Jasti is a Solutions Architect at Amazon Web Services, working with AWS Partners to design and scale artificial intelligence solutions for public sector use cases to meet compliance standards. With a background in Computer Science, his work covers a broad range of ML use cases primarily focusing on LLM training/inferencing and computer vision. In his spare time, he loves playing tennis and swimming.

Joel Asante, an Austin-based Solutions Architect at Amazon Web Services (AWS), works with GovTech (Government Technology) customers. With a strong background in data science and application development, he brings deep technical expertise to creating secure and scalable cloud architectures for his customers. Joel is passionate about data analytics, machine learning, and robotics, leveraging his development experience to design innovative solutions that meet complex government requirements. He holds 13 AWS certifications and enjoys family time, fitness, and cheering for the Kansas City Chiefs and Los Angeles Lakers in his spare time.

Read More

Build an intelligent eDiscovery solution using Amazon Bedrock Agents


Legal teams spend the bulk of their time manually reviewing documents during eDiscovery. This process involves analyzing electronically stored information across emails, contracts, financial records, and collaboration systems for legal proceedings. This manual approach creates significant bottlenecks: attorneys must identify privileged communications, assess legal risks, extract contractual obligations, and maintain regulatory compliance across thousands of documents per case. The process is not only resource-intensive and time-consuming, but also prone to human error when dealing with large document volumes.

Amazon Bedrock Agents with multi-agent collaboration directly addresses these challenges by helping organizations deploy specialized AI agents that process documents in parallel while maintaining context across complex legal workflows. Instead of sequential manual review, multiple agents work simultaneously—one extracts contract terms while another identifies privileged communications, all coordinated by a central orchestrator. This approach can reduce document review time by 60–70% while maintaining the accuracy and human oversight required for legal proceedings, though actual performance varies based on document complexity and foundation model (FM) selection.

In this post, we demonstrate how to build an intelligent eDiscovery solution using Amazon Bedrock Agents for real-time document analysis. We show how to deploy specialized agents for document classification, contract analysis, email review, and legal document processing, all working together through a multi-agent architecture. We walk through the implementation details, deployment steps, and best practices to create an extensible foundation that organizations can adapt to their specific eDiscovery requirements.

Solution overview

This solution demonstrates an intelligent document analysis system using Amazon Bedrock Agents with multi-agent collaboration functionality. The system uses multiple specialized agents to analyze legal documents, classify content, assess risks, and provide structured insights. The following diagram illustrates the solution architecture.

End-to-end AWS architecture for legal document processing featuring Bedrock AI agents, S3 storage, and multi-user access workflows

The architecture diagram shows three main workflows for eDiscovery document analysis:

  • Real-time document analysis workflow – Attorneys and clients (authenticated users) can upload documents and interact through mobile/web clients and chat. Documents are processed in real time for immediate analysis without persistent storage—uploaded documents are passed directly to the Amazon Bedrock Collaborator Agent endpoint.
  • Case research document analysis workflow – This workflow is specifically for attorneys (authenticated users). It allows document review and analysis through mobile/web clients and chat. It’s focused on the legal research aspects of previously processed documents.
  • Document upload workflow – Law firm clients (authenticated users) can upload documents through mobile/web clients. Documents are transferred by using AWS Transfer Family web apps to an Amazon Simple Storage Service (Amazon S3) bucket for storage.

Although this architecture supports all three workflows, this post focuses specifically on implementing the real-time document analysis workflow for two key reasons: it represents the core functionality that delivers immediate value to legal teams, and it provides the foundational patterns that can be extended to support the other workflows. The real-time processing capability demonstrates the multi-agent coordination that makes this solution transformative for eDiscovery operations.

Real-time document analysis workflow

This workflow processes uploaded documents through coordinated AI agents, typically completing analysis within 1–2 minutes of upload. The system accelerates early case assessment by providing structured insights immediately, compared to traditional manual review that can take hours per document. The implementation coordinates five specialized agents that process different document aspects in parallel, listed in the following table.

Agent Type | Primary Function | Processing Time* | Key Outputs
Collaborator Agent | Central orchestrator and workflow manager | 2–5 seconds | Document routing decisions, consolidated results
Document Classification Agent | Initial document triage and sensitivity detection | 5–10 seconds | Document type, confidence scores, sensitivity flags
Email Analysis Agent | Communication pattern analysis | 10–20 seconds | Participant maps, conversation threads, timelines
Legal Document Analysis Agent | Court filing and legal brief analysis | 15–30 seconds | Case citations, legal arguments, procedural dates
Contract Analysis Agent | Contract terms and risk assessment | 20–40 seconds | Party details, key terms, obligations, risk scores

*Processing times are estimates based on testing with Anthropic’s Claude 3.5 Haiku on Amazon Bedrock and might vary depending on document complexity and size. Actual performance in your environment may differ.
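For reference, the following minimal Python sketch (not part of the deployed solution; the agent ID, alias ID, and Region values are placeholders you would replace with your CloudFormation stack outputs) shows how uploaded document text can be passed to the Collaborator Agent endpoint through the Amazon Bedrock Agents runtime API, which is how the real-time workflow hands documents to the orchestrator.

import uuid
import boto3

# Hypothetical values - replace with the CollabBedrockAgentId and
# CollabBedrockAgentAliasId outputs from your CloudFormation stack.
AGENT_ID = "XXXXXXXXXX"
AGENT_ALIAS_ID = "YYYYYYYYYY"
REGION = "us-east-1"

client = boto3.client("bedrock-agent-runtime", region_name=REGION)

def analyze_document(document_text: str) -> str:
    """Send extracted document text to the Collaborator Agent and
    return the consolidated analysis streamed back by the agent."""
    response = client.invoke_agent(
        agentId=AGENT_ID,
        agentAliasId=AGENT_ALIAS_ID,
        sessionId=str(uuid.uuid4()),  # one session per document in this sketch
        inputText=f"Analyze the following document for eDiscovery:\n\n{document_text}",
    )
    # invoke_agent returns an event stream; concatenate the text chunks.
    chunks = []
    for event in response["completion"]:
        if "chunk" in event:
            chunks.append(event["chunk"]["bytes"].decode("utf-8"))
    return "".join(chunks)

Because invoke_agent streams its response, the client concatenates the returned chunks into the consolidated report.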

Let’s explore an example of processing a sample legal settlement agreement. The workflow consists of the following steps:

  1. The Collaborator Agent identifies the document as requiring both contract and legal analysis.
  2. The Contract Analysis Agent extracts parties, payment terms, and obligations (40 seconds).
  3. The Legal Document Analysis Agent identifies case references and precedents (30 seconds).
  4. The Document Classification Agent flags confidentiality levels (10 seconds).
  5. The Collaborator Agent consolidates findings into a comprehensive report (15 seconds).

Total processing time is approximately 95 seconds for the sample document, compared to 2–4 hours of manual review for similar documents. In the following sections, we walk through deploying the complete eDiscovery solution, including Amazon Bedrock Agents, the Streamlit frontend, and necessary AWS resources.

Prerequisites

Make sure you have the following prerequisites:

  • An AWS account with appropriate permissions for Amazon Bedrock, AWS Identity and Access Management (IAM), and AWS CloudFormation.
  • Amazon Bedrock model access for Anthropic’s Claude 3.5 Haiku v1 in your deployment AWS Region. You can use a different supported model of your choice for this solution. If you use a different model than the default (Anthropic’s Claude 3.5 Haiku v1), you must modify the CloudFormation template to reflect your chosen model’s specifications before deployment. At the time of writing, Anthropic’s Claude 3.5 Haiku is available in US East (N. Virginia), US East (Ohio), and US West (Oregon). For current model availability, see Model support by AWS Region.
  • The AWS Command Line Interface (AWS CLI) installed and configured with appropriate credentials.
  • Python 3.8+ installed.
  • Terminal or command prompt access.

Deploy the AWS infrastructure

You can deploy the following CloudFormation template, which creates the five Amazon Bedrock agents, an inference profile, and supporting IAM resources. (Costs will be incurred for the AWS resources used.) Complete the following steps:

  1. Launch the CloudFormation stack.

You will be redirected to the AWS CloudFormation console. In the stack parameters, the template URL will be prepopulated.

  2. For EnvironmentName, enter a name for your deployment (default: LegalBlogSetup).
  3. Review and create the stack.

After successful deployment, note the following values from the CloudFormation stack’s Outputs tab:

  • CollabBedrockAgentId
  • CollabBedrockAgentAliasId

AWS CloudFormation stack outputs showing Bedrock agent configuration details

Configure AWS credentials

Test whether your AWS credentials are working:

aws sts get-caller-identity

If you need to configure credentials, use the following command:

aws configure
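
Optionally, you can verify these prerequisites programmatically. The following boto3 sketch (a convenience check under assumptions, not part of the solution) prints the identity your credentials resolve to and the Anthropic models offered in the chosen Region; model access itself still needs to be enabled on the Amazon Bedrock console.

import boto3

REGION = "us-east-1"  # assumption: replace with your deployment Region

# Confirm which identity the configured credentials resolve to.
identity = boto3.client("sts", region_name=REGION).get_caller_identity()
print("Account:", identity["Account"], "ARN:", identity["Arn"])

# List Anthropic foundation models offered in this Region; Claude 3.5 Haiku
# should appear (model access must still be granted on the Bedrock console).
bedrock = boto3.client("bedrock", region_name=REGION)
models = bedrock.list_foundation_models(byProvider="Anthropic")
for summary in models["modelSummaries"]:
    print(summary["modelId"])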

Set up the local environment

Complete the following steps to set up your local environment:

  1. Create a new directory for your project:
mkdir bedrock-document-analyzer
cd bedrock-document-analyzer
  2. Create and activate a Python virtual environment:
python -m venv venv
#On macOS/Linux:
source venv/bin/activate
#On Windows:
venv\Scripts\activate
  3. Download the Streamlit application:
curl -O https://aws-blogs-artifacts-public.s3.us-east-1.amazonaws.com/ML-18253/eDiscovery-LegalBlog-UI.py
  4. Install dependencies:
pip install streamlit boto3 PyPDF2 python-docx

Configure and run the application

Complete the following steps:

  1. Run the downloaded Streamlit frontend UI file eDiscovery-LegalBlog-UI.py by executing the following command in your terminal or command prompt:
streamlit run eDiscovery-LegalBlog-UI.py

This command will start the Streamlit server and automatically open the application in your default web browser.

  2. Under Agent configuration, provide the following values:
    1. For AWS_REGION, enter your Region.
    2. For AGENT_ID, enter the Amazon Bedrock Collaborator Agent ID.
    3. For AGENT_ALIAS_ID, enter the Amazon Bedrock Collaborator Agent Alias ID.
  3. Choose Save Configuration.

Streamlit-powered configuration interface for Amazon Bedrock Agent setup with region selection and implementation guidance

Now you can upload documents (TXT, PDF, and DOCX) to analyze and interact with.
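
For context on what happens to those uploads, the following sketch (an approximation, not the downloaded application code) shows one way a Streamlit app can convert TXT, PDF, and DOCX files into plain text with the installed PyPDF2 and python-docx packages before sending the text to the Collaborator Agent.

from io import BytesIO

from PyPDF2 import PdfReader
from docx import Document

def extract_text(file_name: str, file_bytes: bytes) -> str:
    """Return plain text from a TXT, PDF, or DOCX upload."""
    if file_name.lower().endswith(".pdf"):
        reader = PdfReader(BytesIO(file_bytes))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if file_name.lower().endswith(".docx"):
        doc = Document(BytesIO(file_bytes))
        return "\n".join(paragraph.text for paragraph in doc.paragraphs)
    # Fall back to treating the upload as UTF-8 text.
    return file_bytes.decode("utf-8", errors="replace")

In Streamlit, each uploaded file object exposes a name and raw bytes, so the extracted text can be passed straight to the agent invocation shown earlier in this post.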

Test the solution

The following is a demonstration of testing the application.

Implementation considerations

Although Amazon Bedrock Agents significantly streamlines eDiscovery workflows, organizations should consider several key factors when implementing AI-powered document analysis solutions. Consider the following legal industry requirements for compliance and governance:

  • Attorney-client privilege protection – AI systems must maintain confidentiality boundaries and must not expose privileged communications during processing
  • Cross-jurisdictional compliance – GDPR, CCPA, and industry-specific regulations vary by region and case type
  • Audit trail requirements – Legal proceedings demand comprehensive processing documentation for all AI-assisted decisions
  • Professional responsibility – Lawyers remain accountable for AI outputs and must demonstrate competency in deployed tools

You might encounter technical implementation challenges, such as document processing complexity:

  • Variable document quality – Scanned PDFs, handwritten annotations, and corrupted files require preprocessing strategies
  • Format diversity – Legal documents span emails, contracts, court filings, and multimedia content requiring different processing approaches
  • Scale management – Large cases involving over 100,000 documents require careful resource planning and concurrent processing optimization

The system integration also has specific requirements:

  • Legacy system compatibility – Most law firms use established case management systems that need seamless integration
  • Authentication workflows – Multi-role access (attorneys, paralegals, clients) with different permission levels
  • AI confidence thresholds – Determining when human review is required based on processing confidence scores

Additionally, consider your human/AI collaboration framework. The most successful eDiscovery implementations maintain human oversight at critical decision points. Although Amazon Bedrock Agents excels at automating routine tasks like document classification and metadata extraction, legal professionals remain essential for the following factors:

  • Complex legal interpretations requiring contextual understanding
  • Privilege determinations that impact case strategy
  • Quality control of AI-generated insights
  • Strategic analysis of document relationships and case implications

This collaborative approach optimizes the eDiscovery process—AI handles time-consuming data processing while legal professionals focus on high-stakes decisions requiring human judgment and expertise. For your implementation strategy, consider a phased deployment approach. Organizations should implement staged rollouts to minimize risk while building confidence:

  • Pilot programs using lower-risk document categories (routine correspondence, standard contracts)
  • Controlled expansion with specialized agents and broader user base
  • Full deployment enabling complete multi-agent collaboration organization-wide

Lastly, consider the following success planning best practices:

  • Establish clear governance frameworks for model updates and version control
  • Create standardized testing protocols for new agent deployments
  • Develop escalation procedures for edge cases requiring human intervention
  • Implement parallel processing during validation periods to maintain accuracy

By addressing these considerations upfront, legal teams can facilitate smoother implementation and maximize the benefits of AI-powered document analysis while maintaining the accuracy and oversight required for legal proceedings.

Clean up

If you decide to discontinue using the solution, complete the following steps to remove it and its associated resources deployed using AWS CloudFormation:

  1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  2. Locate the stack you created during the deployment process (you assigned a name to it).
  3. Select the stack and choose Delete.

Results

Amazon Bedrock Agents transforms eDiscovery from time-intensive manual processes into efficient AI-powered operations, delivering measurable operational improvements across business services organizations. With a multi-agent architecture, organizations can process documents in 1–2 minutes compared to 2–4 hours of manual review for similar documents, achieving a 60–70% reduction in review time while maintaining accuracy and compliance requirements.

A representative implementation from the financial services sector demonstrates this transformative potential: a major institution transformed their compliance review process from a 448-page manual workflow requiring over 10,000 hours to an automated system that reduced external audit times from 1,000 to 300–400 hours and internal audits from 800 to 320–400 hours. The institution now conducts 30–40 internal reviews annually with existing staff while achieving greater accuracy and consistency across assessments.

These results demonstrate the potential across implementations: organizations implementing this solution can progress from initial efficiency gains in pilot phases to a 60–70% reduction in review time at full deployment. Beyond time savings, the solution delivers strategic advantages, including resource optimization that helps legal professionals focus on high-value analysis rather than routine document processing, improved compliance posture through systematic identification of privileged communications, and future-ready infrastructure that adapts to evolving legal technology requirements.

Conclusion

The combination of Amazon Bedrock multi-agent collaboration, real-time processing capabilities, and the extensible architecture provided in this post offers legal teams immediate operational benefits while positioning them for future AI advancements—creating the powerful synergy of AI efficiency and human expertise that defines modern legal practice.

To learn more about Amazon Bedrock, refer to the following resources:


About the authors

Puneeth Ranjan Komaragiri is a Principal Technical Account Manager at AWS. He is particularly passionate about monitoring and observability, cloud financial management, and generative AI domains. In his current role, Puneeth enjoys collaborating closely with customers, using his expertise to help them design and architect their cloud workloads for optimal scale and resilience.

Pramod Krishna is a Senior Solutions Architect at AWS. He works as a trusted advisor for customers, helping customers innovate and build well-architected applications in AWS Cloud. Outside of work, Krishna enjoys reading, music, and traveling.

Sean Gifts is a Senior Technical Account Manager at AWS. He is excited about helping customers with application modernization, specifically event-driven architectures that use serverless frameworks. Sean enjoys helping customers improve their architecture with simple, scalable solutions. Outside of work, he enjoys exercising, trying new foods, and traveling.

Read More

How PerformLine uses prompt engineering on Amazon Bedrock to detect compliance violations 

How PerformLine uses prompt engineering on Amazon Bedrock to detect compliance violations 

This post is co-written with Bogdan Arsenie and Nick Mattei from PerformLine.

PerformLine operates within the marketing compliance industry, a specialized subset of the broader compliance software market, which includes various compliance solutions like anti-money laundering (AML), know your customer (KYC), and others. Specifically, marketing compliance refers to adhering to regulations and guidelines set by government agencies that make sure a company’s marketing, advertising, and sales content and communications are truthful, accurate, and not misleading for consumers. PerformLine is the leading service providing comprehensive compliance oversight across marketing, sales, and partner channels. As pioneers of the marketing compliance industry, PerformLine has conducted over 1.1 billion compliance observations over the past 10+ years, automating the entire compliance process—from pre-publication review of materials to continuous monitoring of consumer-facing channels such as websites, emails, and social media. Trusted by consumer finance brands and global organizations, PerformLine uses AI-driven solutions to protect brands and their consumers, transforming compliance efforts into a competitive advantage.

“Discover. Monitor. Act. This isn’t just our tagline—it’s the foundation of our innovation at PerformLine,” says PerformLine’s CTO Bogdan Arsenie. PerformLine’s engineering team brings these principles to life by developing AI-powered technology solutions. In this post, PerformLine and AWS explore how PerformLine used Amazon Bedrock to accelerate compliance processes, generate actionable insights, and provide contextual data—delivering the speed and accuracy essential for large-scale oversight.

The problem

One of PerformLine’s enterprise customers needed a more efficient process for running compliance checks on newly launched product pages, particularly those that integrate multiple products within the same visual and textual framework. These complex pages often feature overlapping content that can apply to one product, several products, or even all of them at once, necessitating a context-aware interpretation that mirrors how a typical consumer would view and interact with the content.

By adopting AWS and the architecture discussed in this post, PerformLine can retrieve and analyze these intricate pages through AI-driven processing, generating detailed insights and contextual data that capture the nuanced interplay between various product elements. After the relevant information is extracted and structured, it’s fed directly into their rules engine, enabling robust compliance checks. This accomplishes a seamless flow, from data ingestion to rules-based analysis. It not only preserves the depth of each product’s presentation but also delivers the speed and accuracy critical to large-scale oversight.

Monitoring millions of webpages daily for compliance demands a system that can intelligently parse, extract, and analyze content at scale—much like the approach PerformLine has developed for their enterprise customers. In this dynamic landscape, the ever-evolving nature of web content challenges traditional static parsing, requiring a context-aware and adaptive solution. This architecture not only processes bulk data offline but also delivers near real-time performance for one-time requests, dynamically scaling to manage the diverse complexity of each page. By using AI-powered inference, PerformLine provides comprehensive coverage of every product and marketing element across the web, while striking a careful balance between accuracy, performance, and cost.

Solution overview

With this flexible, adaptable solution, PerformLine can tackle even the most challenging webpages, providing comprehensive coverage when extracting and analyzing web content with multiple products. At the same time, by combining consistency with the adaptability of foundation models (FMs), PerformLine can maintain reliable performance across the diverse range of products and websites their customers monitor. This dual focus on agility and operational consistency makes sure their customers benefit from robust compliance checks and data integrity, without sacrificing the speed or scale needed to remain competitive.

PerformLine’s upstream ingestion pipeline efficiently collects millions of web pages and their associated metadata in a batch process. Downstream, assets are submitted to PerformLine’s rules engine and compliance review processes. It was imperative that this solution not disrupt those processes or introduce cascading changes.

PerformLine decided to use generative AI and Amazon Bedrock to address their core challenges. Amazon Bedrock offers a broad selection of models, including Amazon Nova, and is continuously expanding its feature set for using FMs at scale. This provides a reliable foundation to build a highly available and efficient content processing system.

PerformLine’s solution incorporates several key components, described in this section.

PerformLine implemented a scalable, serverless event-driven architecture (shown in the following diagram) that seamlessly integrates with their existing system, requiring less than a day to develop and deploy. This made it possible to focus on prompt optimization, evaluation, and cost management rather than infrastructure overhead. This architecture allows PerformLine to dynamically parse, extract, and analyze web content with high reliability, flexibility, and cost-efficiency.

Detailed AWS data processing pipeline showing 9-step workflow from RDS through multiple services to final results processing

The system implements multiple queue types (Incoming, DLQ, Results) and includes error handling mechanisms. Data flows through various AWS services, including:

  • Amazon RDS for initial data storage
  • Amazon MQ (RabbitMQ) for message handling
  • Amazon S3 for asset storage
  • Amazon EventBridge for event management
  • Amazon SQS for queue management
  • AWS Lambda for serverless processing
  • Amazon DynamoDB for NoSQL data storage

PerformLine’s process consists of several steps, including processing (Step 1), event trigger and storage (Steps 2–6), structured output and storage (Step 7), and downstream processing and compliance checks (Steps 8–9):

  1. Millions of pages are processed by an upstream extract, transform, and load (ETL) process from PerformLine’s core systems running on the AWS Cloud.
  2. When a page is retrieved, it triggers an event in the compliance check system.
  3. Amazon S3 stores the data from each page, organized according to its metadata.
  4. EventBridge uses event-driven processing to route Amazon S3 events to Amazon SQS.
  5. Amazon SQS queues messages for processing and enables messages to be retried on failure.
  6. A Lambda function consumes SQS messages and scales dynamically to handle even unpredictable workloads:
    1. This function uses Amazon Bedrock to perform extraction and generative AI analysis of the content referenced by each message. Amazon Bedrock offers the flexibility to choose the right model for the job. For PerformLine’s use case, Amazon Nova Pro was best suited for complex requests that require a powerful model while still offering a high performance-to-cost ratio, and Anthropic’s Claude Haiku model allows for optimized quick calls, where a fast response is paramount for additional processing if needed. Amazon Bedrock features, including Amazon Bedrock Prompt Management and inference profiles, are used to increase input variability without affecting output and to reduce the complexity of using FMs through Amazon Bedrock.
    2. The function stores customer-defined product schemas in Amazon DynamoDB, enabling dynamic large language model (LLM) targeting and schema-driven output generation.
  7. Amazon S3 stores the extracted data, which is formatted as structured JSON adhering to the target schema.
  8. EventBridge forwards Amazon S3 events to Amazon SQS, making extracted data available for downstream processing.
  9. Compliance checks and business rules, running on other PerformLine systems, are applied to validate and enforce regulatory requirements.
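
To make Steps 6 and 7 concrete, the following minimal sketch of a Lambda handler (the bucket, table, schema key, prompt, and model ID are hypothetical, not PerformLine's production code) reads an SQS message, loads the page content and the customer-defined schema, calls the Amazon Bedrock Converse API, and writes the structured JSON result back to Amazon S3.

import json
import os

import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
bedrock = boto3.client("bedrock-runtime")

# Hypothetical resource names supplied through environment variables.
SCHEMA_TABLE = os.environ.get("SCHEMA_TABLE", "product-schemas")
RESULTS_BUCKET = os.environ.get("RESULTS_BUCKET", "extracted-results")
MODEL_ID = os.environ.get("MODEL_ID", "amazon.nova-pro-v1:0")  # placeholder model ID

def handler(event, context):
    for record in event["Records"]:  # SQS batch
        message = json.loads(record["body"])
        page = s3.get_object(Bucket=message["bucket"], Key=message["key"])
        content = page["Body"].read().decode("utf-8")

        # Customer-defined output schema stored in DynamoDB (Step 6b);
        # the key shape here is an assumption for illustration.
        schema = dynamodb.Table(SCHEMA_TABLE).get_item(
            Key={"customerId": message["customerId"]}
        )["Item"]["schema"]

        response = bedrock.converse(
            modelId=MODEL_ID,
            messages=[{
                "role": "user",
                "content": [{"text": f"Extract products from this page as JSON "
                                     f"matching this schema:\n{schema}\n\nPAGE:\n{content}"}],
            }],
            inferenceConfig={"maxTokens": 2048, "temperature": 0},
        )
        extracted = response["output"]["message"]["content"][0]["text"]

        # Step 7: structured JSON written back to S3 for downstream checks.
        s3.put_object(Bucket=RESULTS_BUCKET, Key=message["key"] + ".json",
                      Body=extracted.encode("utf-8"))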

Cost optimizations

The solution offers several cost optimizations, including change data capture (CDC) on the web and strategic multi-pass inference. After a page’s content has been analyzed and formatted, it’s written back to a partition that includes a metadata hash of the asset. This enables upstream processes to determine whether a page has already been processed and if its content has changed. The key benefits of this approach include:

  • Alleviating redundant processing of the same pages, contributing to a 15% reduction in PerformLine’s human evaluation workload. This frees time for human evaluators and allows them to focus on critical pages rather than every page.
  • Avoiding reprocessing unchanged pages, dynamically reducing PerformLine’s analysts’ workload by over 50% in addition to deduplication gains.
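
A minimal sketch of the change-detection idea, assuming a hypothetical S3 key layout: hashing the page content and embedding that hash in the result partition lets upstream processes skip pages whose content they have already processed.

import hashlib

def result_key(page_url: str, page_content: str) -> str:
    """Build an S3 key partitioned by a content hash, so an unchanged page
    maps to a key that already exists and can be skipped upstream."""
    content_hash = hashlib.sha256(page_content.encode("utf-8")).hexdigest()
    url_hash = hashlib.sha256(page_url.encode("utf-8")).hexdigest()[:16]
    return f"extracted/{url_hash}/{content_hash}.json"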

LLM inference costs can escalate at scale, but context and carefully structured prompts are critical for accuracy. To optimize costs while maintaining precision, PerformLine implemented a multi-pass approach using Amazon Bedrock:

  • Initial filtering with Amazon Nova Micro – This lightweight model efficiently identifies relevant products with minimal cost.
  • Targeted extraction with Amazon Nova Lite – Identified products are batched into smaller groups and passed to Amazon Nova Lite for deeper analysis. This keeps PerformLine within token limits while improving extraction accuracy.
  • Increased accuracy through context-aware processing – By first identifying the target content and then processing it in smaller batches, PerformLine significantly improved accuracy while minimizing token consumption.
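
The following sketch illustrates the multi-pass pattern under stated assumptions (the model IDs, prompts, and batch size are placeholders rather than PerformLine's production values): a low-cost first pass identifies candidate products, and smaller batches are then sent to a stronger lightweight model for targeted extraction.

import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def converse_text(model_id: str, prompt: str) -> str:
    """Single-turn helper around the Amazon Bedrock Converse API."""
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0},
    )
    return response["output"]["message"]["content"][0]["text"]

def extract_products(page_text: str, batch_size: int = 5) -> list:
    # Pass 1: cheap filtering - ask for a JSON array of product names on the page.
    names = json.loads(converse_text(
        "amazon.nova-micro-v1:0",  # placeholder model ID
        "Return only a JSON array of product names mentioned in this page:\n" + page_text,
    ))

    # Pass 2: targeted extraction on small batches to stay within token limits.
    results = []
    for i in range(0, len(names), batch_size):
        batch = names[i:i + batch_size]
        results.append(converse_text(
            "amazon.nova-lite-v1:0",  # placeholder model ID
            f"For these products {batch}, extract terms and disclosures as JSON "
            f"from this page:\n{page_text}",
        ))
    return results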

Use of Amazon Bedrock

During initial testing, PerformLine quickly realized the need for a more scalable approach to prompt management. Manually tracking multiple prompt versions and templates became inefficient as PerformLine iterated and collaborated.

Amazon Bedrock Prompt Management provided a centralized solution, enabling them to version, manage, and seamlessly deploy prompts to production. After the prompts are deployed, they can be dynamically referenced in AWS Lambda, allowing for flexible configuration. Additionally, by using Amazon Bedrock application inference profiles, PerformLine can dynamically adjust the models the Lambda function invokes, track cost per invocation, and attribute costs to specific application instances by setting up cost tags.

To streamline model interactions, PerformLine chose the Amazon Bedrock Converse API, which provides a developer-friendly, standardized interface for model invocation. When combined with inference endpoints and prompt management, a Lambda function using the Amazon Bedrock Converse API becomes highly configurable—PerformLine developers can rapidly test new models and prompts, evaluate results, and iterate without needing to rebuild or redeploy. The simplification of prompt management and ability to deploy various models through Amazon Bedrock is shown in the following diagram.

Diagram: model configuration flow showing the inference setup (model ID, inference profile, content, and inference settings), prompt management with version control (draft and published versions, publish IDs, and model specifications), and separate PROD and DEV environments with environment-specific parameter stores and invocation IDs.
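
As a rough sketch of how these pieces fit together (the prompt identifier, template variable, inference profile ARN, and response parsing are assumptions, not PerformLine's configuration), a Lambda function can fetch a managed prompt at runtime, fill in its variables, and invoke it through the Converse API against an application inference profile:

import boto3

bedrock_agent = boto3.client("bedrock-agent")        # prompt management APIs
bedrock_runtime = boto3.client("bedrock-runtime")    # Converse API

PROMPT_ID = "PROMPT123"  # placeholder managed-prompt identifier
PROFILE_ARN = "arn:aws:bedrock:us-east-1:111122223333:application-inference-profile/example"

def run_managed_prompt(page_text: str) -> str:
    # Fetch the managed prompt and pull the text template from its first variant.
    prompt = bedrock_agent.get_prompt(promptIdentifier=PROMPT_ID)
    template = prompt["variants"][0]["templateConfiguration"]["text"]["text"]
    # Assumes the template declares a {{page_text}} input variable.
    filled = template.replace("{{page_text}}", page_text)

    # Invoking through an application inference profile lets cost be tracked
    # per application and the underlying model swapped without code changes.
    response = bedrock_runtime.converse(
        modelId=PROFILE_ARN,
        messages=[{"role": "user", "content": [{"text": filled}]}],
    )
    return response["output"]["message"]["content"][0]["text"]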

Future plans and enhancements

PerformLine is excited to dive into additional Amazon Bedrock features, including prompt caching and Amazon Bedrock Flows.

With prompt caching, users can checkpoint prompt tokens, effectively caching context for reuse in subsequent API calls. Prompt caching on Amazon Bedrock offers up to 85% latency improvements and 90% cost reduction in comparison to calls without prompt caching. PerformLine sees prompt caching as a feature that will become the standard moving forward. They have a number of use cases for their data, and having the ability to apply further analysis on the same content at a lower cost creates new opportunities for feature expansion and development.
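
As a rough illustration of that pattern (the model ID is a placeholder, and caching behavior depends on the model's prompt caching support), a cache point can mark the large, reused page content so that follow-up questions over the same page reuse the cached prefix:

import boto3

bedrock = boto3.client("bedrock-runtime")

def analyze_with_cache(page_text: str, question: str) -> str:
    """Ask a question about a page while caching the shared page context."""
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-haiku-20241022-v1:0",  # placeholder; must support prompt caching
        messages=[{
            "role": "user",
            "content": [
                {"text": f"Page content:\n{page_text}"},
                {"cachePoint": {"type": "default"}},  # checkpoint the reusable context
                {"text": question},
            ],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]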

Amazon Bedrock Flows is a visual workflow builder that enables users to orchestrate multi-step generative AI tasks by connecting FMs and APIs without extensive coding. Amazon Bedrock Flows is a next step in simplifying PerformLine’s orchestration of knowledge bases, prompt caching, and even Amazon Bedrock agents in the future. Creating flows can help reduce time to feature deployment and maintenance.

Summary

PerformLine has implemented a highly scalable, serverless, AI-driven architecture that enhances efficiency, cost-effectiveness, and compliance in the web content processing pipeline. By using Amazon Bedrock, EventBridge, Amazon SQS, Lambda, and DynamoDB, they have built a solution that can dynamically scale, optimize AI inference costs, and reduce redundant processing—all while maintaining operational flexibility and compliance integrity. Based on their current volume and workflow, PerformLine is projected to process between 1.5 million and 2 million pages daily, from which they expect to extract approximately 400,000 to 500,000 products. Additionally, PerformLine anticipates applying rules to each asset, resulting in about 500,000 rule observations that will require review each day.

Throughout the design process, PerformLine made sure their solution remains as simple as possible while still delivering operational flexibility and integrity. This approach minimizes complexity, enhances maintainability, and accelerates deployment, empowering them to adapt quickly to evolving business needs without unnecessary overhead.

By using a serverless AI-driven architecture built on Amazon Bedrock, PerformLine helps their customers tackle even the most complex, multi-product webpages with unparalleled accuracy and efficiency. This holistic approach interprets visual and textual elements as a typical consumer would, verifying that every product variant is accurately assessed for compliance. The resulting insights are then fed directly into a rules engine, enabling rapid, data-driven decisions. For PerformLine’s customers, this means less redundant processing, lower operational costs, and a dramatically simplified compliance workflow, all without compromising on speed or accuracy. By reducing the overhead of large-scale data analysis and streamlining compliance checks, PerformLine’s solution ultimately frees teams to focus on driving innovation and delivering value.


About the authors

Bogdan Arsenie is the Chief Technology Officer at PerformLine, with over two decades of experience leading technological innovation across digital advertising, big data, mobile gaming, and social engagement. Bogdan began programming at age 13, customizing bulletin board software to fund his passion for Star Trek memorabilia. He served as PerformLine’s founding CTO from 2007–2009, pioneering their initial compliance platform. Later, as CTO at the Rumie Initiative, he helped scale a global education initiative recognized by Google’s Impact Challenge.

Nick Mattei is a Senior Software Engineer at PerformLine. He is focused on solutions architecture and distributed application development in AWS. Outside of work, Nick is an avid cyclist and skier, always looking for the next great climb or powder day.

Shervin Suresh is a Generative AI Solutions Architect at AWS. He supports generative AI adoption both internally at AWS and externally with fast-growing startup customers. He is passionate about using technology to help improve the lives of people in all aspects. Outside of work, Shervin loves to cook, build LEGO, and collaborate with people on things they are passionate about.

Medha Aiyah is a Solutions Architect at AWS. She graduated from the University of Texas at Dallas with an MS in Computer Science, with a focus on AI/ML. She supports ISV customers in a wide variety of industries, by empowering customers to use AWS optimally to achieve their business goals. She is especially interested in guiding customers on ways to implement AI/ML solutions and use generative AI. Outside of work, Medha enjoys hiking, traveling, and dancing.

Michael Zhang is a generalist Solutions Architect at AWS working with small to medium businesses. He has been with Amazon for over 3 years and uses his background in computer science and machine learning to support customers on AWS. In his free time, Michael loves to hike and explore other cultures.

Read More

How Global Calibration Strengthens Multiaccuracy

Multiaccuracy and multicalibration are multigroup fairness notions for prediction that have found numerous applications in learning and computational complexity. They can be achieved from a single learning primitive: weak agnostic learning. Here we investigate the power of multiaccuracy as a learning primitive, both with and without the additional assumption of calibration. We find that multiaccuracy in itself is rather weak, but that the addition of global calibration (this notion is called calibrated multiaccuracy) boosts its power substantially, enough to recover implications that were…Apple Machine Learning Research

MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains

Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks to evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing task completion but failing to dissect the underlying skills that drive these outcomes. This lack of granularity makes it difficult to deeply discern where failures stem from. Additionally, setting up these environments requires considerable effort, and issues of unreliability and reproducibility sometimes arise, especially in interactive tasks. To…Apple Machine Learning Research