Amazon Bedrock Agents observability using Arize AI

This post is cowritten with John Gilhuly from Arize AI.

With Amazon Bedrock Agents, you can build and configure autonomous agents in your application. An agent helps your end-users complete actions based on organization data and user input. Agents orchestrate interactions between foundation models (FMs), data sources, software applications, and user conversations. In addition, agents automatically call APIs to take actions and invoke knowledge bases to supplement information for these actions. By integrating agents, you can accelerate your development effort to deliver generative AI applications. With agents, you can automate tasks for your customers and answer questions for them. For example, you can create an agent that helps customers process insurance claims or make travel reservations. You don’t have to provision capacity, manage infrastructure, or write custom code. Amazon Bedrock manages prompt engineering, memory, monitoring, encryption, user permissions, and API invocation.

AI agents represent a fundamental shift in how applications make decisions and interact with users. Unlike traditional software systems that follow predetermined paths, AI agents employ complex reasoning that often operates as a “black box.” Monitoring AI agents presents unique challenges for organizations seeking to maintain reliability, efficiency, and optimal performance in their AI implementations.

Today, we’re excited to announce a new integration between Arize AI and Amazon Bedrock Agents that addresses one of the most significant challenges in AI development: observability. Agent observability is a crucial aspect of AI operations that provides deep insights into how your Amazon Bedrock agents perform, interact, and execute tasks. It involves tracking and analyzing hierarchical traces of agent activities, from high-level user requests down to individual API calls and tool invocations. These traces form a structured tree of events, helping developers understand the complete journey of user interactions through the agent’s decision-making process. Key metrics that demand attention include response latency, token usage, runtime exceptions, and function calls. As organizations scale their AI implementations from proof of concept to production, understanding and monitoring AI agent behavior becomes increasingly critical.

The integration between Arize AI and Amazon Bedrock Agents provides developers with comprehensive observability tools for tracing, evaluating, and monitoring AI agent applications. This solution delivers three primary benefits:

  • Comprehensive traceability – Gain visibility into every step of your agent’s execution path, from initial user query through knowledge retrieval and action execution
  • Systematic evaluation framework – Apply consistent evaluation methodologies to measure and understand agent performance
  • Data-driven optimization – Run structured experiments to compare different agent configurations and identify optimal settings

The Arize AI service is available in two versions:

  • Arize AX – An enterprise solution offering advanced monitoring capabilities
  • Arize Phoenix – An open source service making tracing and evaluation accessible to developers

In this post, we demonstrate the Arize Phoenix system for tracing and evaluation. Phoenix can run on your local machine, a Jupyter notebook, a containerized deployment, or in the cloud. We explore how this integration works, its key features, and how you can implement it in your Amazon Bedrock Agents applications to enhance observability and maintain production-grade reliability.

Solution overview

Large language model (LLM) tracing records the paths taken by requests as they propagate through multiple steps or components of an LLM application. It improves the visibility of your application or system’s health and makes it possible to debug behavior that is difficult to reproduce locally. For example, when a user interacts with an LLM application, tracing can capture the sequence of operations, such as document retrieval, embedding generation, language model invocation, and response generation, to provide a detailed timeline of the request’s execution.

For an application to emit traces for analysis, it must be instrumented. Your application can be instrumented manually or automatically. Arize Phoenix offers a set of plugins (instrumentors) that you can add to your application’s startup process to perform automatic instrumentation. These plugins collect traces for your application and export them (using an exporter) for collection and visualization. The Phoenix server is a collector and UI that helps you troubleshoot your application in real time. When you run Phoenix (for example, by calling px.launch_app() or running the Phoenix container), it starts receiving traces from any application that exports traces to it. For Phoenix, the instrumentors are managed through a single repository called OpenInference. OpenInference provides a set of instrumentations for popular machine learning (ML) SDKs and frameworks in a variety of languages. It is a set of conventions and plugins that complements OpenTelemetry and the OpenTelemetry Protocol (OTLP) to enable tracing of AI applications. Phoenix currently supports OTLP over HTTP.
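As a minimal sketch of this local workflow (the project name is illustrative), you can launch the Phoenix collector and register an OTLP exporter against it directly in a notebook:

import phoenix as px
from phoenix.otel import register

# Start the local Phoenix server (collector + UI); the dashboard URL is printed on launch.
px.launch_app()

# Register an OpenTelemetry tracer provider that exports OTLP-over-HTTP traces
# to the local Phoenix collector.
tracer_provider = register(project_name="local-tracing-demo")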

For AWS, Boto3 provides Python bindings to AWS services, including Amazon Bedrock, which provides access to a number of FMs. You can instrument calls to these models using OpenInference, enabling OpenTelemetry-aligned observability of applications built using these models. You can also capture traces on invocations of Amazon Bedrock agents using OpenInference and view them in Phoenix. The following high-level architecture diagram shows an LLM application created using Amazon Bedrock Agents, which has been instrumented to send traces to the Phoenix server.

In the following sections, we demonstrate how, by installing the openinference-instrumentation-bedrock library, you can automatically instrument interactions with Amazon Bedrock or Amazon Bedrock agents for observability, evaluation, and troubleshooting purposes in Phoenix.
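If you prefer to attach the instrumentor explicitly rather than relying on the auto_instrument convenience used later in this post, a minimal sketch (the project name is illustrative, and it assumes openinference-instrumentation-bedrock is already installed) looks like this:

from openinference.instrumentation.bedrock import BedrockInstrumentor
from phoenix.otel import register

# Register a Phoenix-aware tracer provider, then attach the Bedrock instrumentor to it
# so that Amazon Bedrock model and agent invocations emit OpenInference spans.
tracer_provider = register(project_name="bedrock-manual-instrumentation")
BedrockInstrumentor().instrument(tracer_provider=tracer_provider)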

Prerequisites

To follow this tutorial, you must have the following:

You can also clone the GitHub repo locally to run the Jupyter notebook yourself:

git clone https://github.com/awslabs/amazon-bedrock-agent-samples.git

Install required dependencies

Begin by installing the necessary libraries:

%pip install -r requirements.txt --quiet

Next, import the required modules:

import time
import boto3
import logging
import os
import nest_asyncio
from phoenix.otel import register
from openinference.instrumentation import using_metadata

nest_asyncio.apply()

The arize-phoenix-otel package provides a lightweight wrapper around OpenTelemetry primitives with Phoenix-aware defaults. These defaults read the environment variables you set in the next steps to configure Phoenix, such as:

  • PHOENIX_COLLECTOR_ENDPOINT
  • PHOENIX_PROJECT_NAME
  • PHOENIX_CLIENT_HEADERS
  • PHOENIX_API_KEY

Configure the Phoenix environment

Set up the Phoenix Cloud environment for this tutorial. Phoenix can also be self-hosted on AWS instead.

os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com“
if not os.environ.get("PHOENIX_CLIENT_HEADERS"):
os.environ["PHOENIX_CLIENT_HEADERS"] = "api_key=" + input("Enter your Phoenix API key: ")

Connect your notebook to Phoenix with auto-instrumentation enabled:

project_name = "Amazon Bedrock Agent Example"
tracer_provider = register(project_name=project_name, auto_instrument=True)

The auto_instrument parameter automatically locates the openinference-instrumentation-bedrock library and instruments Amazon Bedrock and Amazon Bedrock Agent calls without requiring additional configuration. Configure metadata for the span:

metadata = { "agent" : "bedrock-agent", 
            "env" : "development"
Metadata is used to filter search values in the dashboard
       }

Set up an Amazon Bedrock session and agent

Before using Amazon Bedrock, make sure that your AWS credentials are configured correctly. You can set them up using the AWS Command Line Interface (AWS CLI) or by setting environment variables:

session = boto3.Session()
REGION = session.region_name
bedrock_agent_runtime = session.client(service_name="bedrock-agent-runtime",region_name=REGION)
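If you choose to set environment variables rather than configuring credentials with the AWS CLI, a minimal sketch (placeholder values only; avoid hardcoding real credentials in notebooks) looks like the following:

import os

# Placeholder values for illustration only; in practice, export these in your shell
# or rely on an IAM role instead of hardcoding credentials.
os.environ["AWS_ACCESS_KEY_ID"] = "<your-access-key-id>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<your-secret-access-key>"
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"  # example Region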

We assume you’ve already created an Amazon Bedrock agent. To configure the agent, use the following code:

agent_id = "XXXXXYYYYY" # ← Configure your Bedrock Agent ID
agent_alias_id = "Z0ZZZZZZ0Z" # ← Optionally set a different Alias ID if you have one

Before proceeding to the next step, validate that the agent invocation works correctly. The content of the response is not important; we are simply testing the API call.

print(f"Trying to invoke alias {agent_alias_id} of agent {agent_id}...")
agent_resp = bedrock_agent_runtime.invoke_agent(
    agentAliasId=agent_alias_id,
    agentId=agent_id,
    inputText="Hello!",
    sessionId="dummy-session",
)
if "completion" in agent_resp:
    print("✅ Got response")
else:
    raise ValueError(f"No 'completion' in agent response:\n{agent_resp}")

Run your agent with tracing enabled

Create a function to run your agent and capture its output:

@using_metadata(metadata)
def run(input_text):
    session_id = f"default-session1_{int(time.time())}"

    attributes = dict(
        inputText=input_text,
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,
        enableTrace=True,
    )
    response = bedrock_agent_runtime.invoke_agent(**attributes)

    # Stream the response
    for _, event in enumerate(response["completion"]):
        if "chunk" in event:
            print(event)
            chunk_data = event["chunk"]
            if "bytes" in chunk_data:
                output_text = chunk_data["bytes"].decode("utf8")
                print(output_text)
        elif "trace" in event:
            print(event["trace"])

Test your agent with a few sample queries:

run ("What are the total leaves for Employee 1?")
run ("If Employee 1 takes 4 vacation days off, What are the total leaves left for Employee 1?")

You should replace these queries with the queries that your application is built for. After executing these commands, you should see your agent’s responses in the notebook output. The Phoenix instrumentation is automatically capturing detailed traces of these interactions, including knowledge base lookups, orchestration steps, and tool calls.

View captured traces in Phoenix

Navigate to your Phoenix dashboard to view the captured traces. You will see a comprehensive visualization of each agent invocation, including:

  • The full conversation context
  • Knowledge base queries and results
  • Tool or action group calls and responses
  • Agent reasoning and decision-making steps

Phoenix’s tracing and span analysis capabilities are useful during the prototyping and debugging stages. By instrumenting application code with Phoenix, teams gain detailed insights into the execution flow, making it straightforward to identify and resolve issues. Developers can drill down into specific spans, analyze performance metrics, and access relevant logs and metadata to streamline debugging efforts. With Phoenix’s tracing capabilities, you can monitor the following:

  • Application latency – Identify latency bottlenecks and address slow invocations of LLMs, retrievers, and other components within your application, enabling you to optimize performance and responsiveness.
  • Token usage – Gain a detailed breakdown of token usage for your LLM calls, so you can identify and optimize the most expensive LLM invocations.
  • Runtime exceptions – Capture and inspect critical runtime exceptions, such as rate-limiting events, that can help you proactively address and mitigate potential issues.
  • Retrieved documents – Inspect the documents retrieved during a retriever call, including the score and order in which they were returned, to provide insight into the retrieval process.
  • Embeddings – Examine the embedding text used for retrieval and the underlying embedding model, so you can validate and refine your embedding strategies.
  • LLM parameters – Inspect the parameters used when calling an LLM, such as temperature and system prompts, to facilitate optimal configuration and debugging.
  • Prompt templates – Understand the prompt templates used during the prompting step and the variables that were applied, so you can fine-tune and improve your prompting strategies.
  • Tool descriptions – View the descriptions and function signatures of the tools your LLM has been given access to, in order to better understand and control your LLM’s capabilities.
  • LLM function calls – For LLMs with function call capabilities (such as Anthropic’s Claude, Amazon Nova, or Meta’s Llama), you can inspect the function selection and function messages in the input to the LLM. This can further help you debug and optimize your application.

The following screenshot shows the Phoenix dashboard for the Amazon Bedrock agent, displaying latency, token usage, and the total number of traces.

You can choose one of the traces to drill down to the level of the entire orchestration.

Evaluate the agent in Phoenix

Evaluating any AI application is a challenge, and evaluating an agent is even more difficult. Agents present a unique set of evaluation pitfalls to navigate. For example, agents can take inefficient paths and still get to the right solution. How do you know if they took an optimal path? Additionally, bad responses upstream can lead to strange responses downstream. How do you pinpoint where a problem originated? A common evaluation metric for agents is their function calling accuracy, in other words, how well they do at choosing the right tool for the job. An agent is characterized by what it knows about the world, the set of actions it can perform, and the pathway it took to get there. To evaluate an agent, you must evaluate each of these components. Phoenix supports this with built-in LLM evaluations, code-based experiment testing, and evaluation templates for each step.

You can evaluate individual skills and responses using standard LLM evaluation strategies, such as retrieval evaluation, classification with LLM judges, hallucination detection, or Q&A correctness. In this post, we demonstrate evaluation of agent function calling. You can use the Agent Function Call eval to determine how well a model selects a tool to use, extracts the right parameters from the user query, and generates the tool call code. Now that you’ve traced your agent in the previous step, the next step is to add evaluations to measure its performance. Complete the following steps:

  1. Up until now, you have just used the lighter-weight Phoenix OTEL tracing library. To run evals, you must install the full library:

!pip install -q arize-phoenix

  2. Import the necessary evaluation components:

import re
import json
import phoenix as px
from phoenix.evals import (
    TOOL_CALLING_PROMPT_RAILS_MAP,
    TOOL_CALLING_PROMPT_TEMPLATE,
    BedrockModel,
    llm_classify,
)
from phoenix.trace import SpanEvaluations
from phoenix.trace.dsl import SpanQuery

The following is our agent function calling prompt template:

TOOL_CALLING_PROMPT_TEMPLATE = """

You are an evaluation assistant evaluating questions and tool calls to
determine whether the tool called would answer the question. The tool
calls have been generated by a separate agent, and chosen from the list of
tools provided below. It is your job to decide whether that agent chose
the right tool to call.

    [BEGIN DATA]
    ************
    [Question]: {question}
    ************
    [Tool Called]: {tool_call}
    [END DATA]

Your response must be single word, either "correct" or "incorrect",
and should not contain any text or characters aside from that word.
"incorrect" means that the chosen tool would not answer the question,
the tool includes information that is not presented in the question,
or that the tool signature includes parameter values that don't match
the formats specified in the tool signatures below.

"correct" means the correct tool call was chosen, the correct parameters
were extracted from the question, the tool call generated is runnable and correct,
and that no outside information not present in the question was used
in the generated question.

    [Tool Definitions]: {tool_definitions}
"""
  3. Because we are only evaluating the inputs, outputs, and function call columns, let’s extract those into a simpler-to-use dataframe. Phoenix provides a method to query your span data and directly export only the values you care about:

query = (
    SpanQuery()
    .where(
        # Filter for the `LLM` span kind.
        # The filter condition is a string containing a valid Python boolean expression.
        "span_kind == 'LLM' and 'evaluation' not in input.value"
    )
    .select(
        question="input.value",
        outputs="output.value",
    )
)
trace_df = px.Client().query_spans(query, project_name=project_name)
  4. The next step is to prepare these traces into a dataframe with columns for input, tool call, and tool definitions. Parse the JSON input and output data to create these columns:

def extract_tool_calls(output_value):
    tool_calls = []
    try:
        # Look for tool calls within <function_calls> tags
        if "<function_calls>" in output_value:
            # Find all tool_name tags
            tool_name_pattern = r"<tool_name>(.*?)</tool_name>"
            tool_names = re.findall(tool_name_pattern, output_value)

            # Add each found tool name to the list
            for tool_name in tool_names:
                if tool_name:
                    tool_calls.append(tool_name)
    except Exception as e:
        print(f"Error extracting tool calls: {e}")

    return tool_calls
  5. Apply the function to each row of the outputs column in trace_df:

trace_df["tool_call"] = trace_df["outputs"].apply(
    lambda x: extract_tool_calls(x) if isinstance(x, str) else []
)

# Display the tool calls found
print("Tool calls found in traces:", trace_df["tool_call"].sum())
  6. Add tool definitions for evaluation:

trace_df["tool_definitions"] = (
    "phoenix-traces retrieves the latest trace information from Phoenix, "
    "phoenix-experiments retrieves the latest experiment information from Phoenix, "
    "phoenix-datasets retrieves the latest dataset information from Phoenix"
)

Now with your dataframe prepared, you can use Phoenix’s built-in LLM-as-a-Judge template for tool calling to evaluate your application. The following method takes in the dataframe of traces to evaluate, our built-in evaluation prompt, the eval model to use, and a rails object to snap responses from our model to a set of binary classification responses. We also instruct our model to provide explanations for its responses.

  7. Run the tool calling evaluation:
rails = list(TOOL_CALLING_PROMPT_RAILS_MAP.values())

eval_model = BedrockModel(session=session, model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0")

response_classifications = llm_classify(
    data=trace_df,
    template=TOOL_CALLING_PROMPT_TEMPLATE,
    model=eval_model,
    rails=rails,
    provide_explanation=True,
)
response_classifications["score"] = response_classifications.apply(
    lambda x: 1 if x["label"] == "correct" else 0, axis=1
)

We use the following parameters:

  • data – A dataframe of cases to evaluate, passed to llm_classify. The dataframe must have columns that match the default template.
  • question – The query made to the model. If you exported spans from Phoenix to evaluate, this is the llm.input_messages column in your exported data.
  • tool_call – Information on the tool called and the parameters included. If you exported spans from Phoenix to evaluate, this is the llm.function_call column in your exported data.
  8. Finally, log the evaluation results to Phoenix:

px.Client().log_evaluations(
    SpanEvaluations(eval_name="Tool Calling Eval", dataframe=response_classifications),
)

After running these commands, you will see your evaluation results on the Phoenix dashboard, providing insights into how effectively your agent is using its available tools.

The following screenshot shows how the tool calling evaluation attribute shows up when you run the evaluation.

When you expand an individual trace, you can observe that the tool calling evaluation adds a score of 1 if the label is correct, which means the agent chose the right tool and responded correctly.

Conclusion

As AI agents become increasingly prevalent in enterprise applications, effective observability is crucial for facilitating their reliability, performance, and continuous improvement. The integration of Arize AI with Amazon Bedrock Agents provides developers with the tools they need to build, monitor, and enhance AI agent applications with confidence. We’re excited to see how this integration will empower developers and organizations to push the boundaries of what’s possible with AI agents.

Stay tuned for more updates and enhancements to this integration in the coming months. To learn more about Amazon Bedrock Agents and the Arize AI integration, refer to the Phoenix documentation and Integrating Arize AI and Amazon Bedrock Agents: A Comprehensive Guide to Tracing, Evaluation, and Monitoring.


About the Authors

Ishan Singh is a Sr. Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

John Gilhuly is the Head of Developer Relations at Arize AI, focused on AI agent observability and evaluation tooling. He holds an MBA from Stanford and a B.S. in C.S. from Duke. Prior to joining Arize, John led GTM activities at Slingshot AI, and served as a venture fellow at Omega Venture Partners. In his pre-AI life, John built out and ran technical go-to-market teams at Branch Metrics.

Richa Gupta is a Sr. Solutions Architect at Amazon Web Services. She is passionate about architecting end-to-end solutions for customers. Her specialization is machine learning and how it can be used to build new solutions that lead to operational excellence and drive business revenue. Prior to joining AWS, she worked in the capacity of a Software Engineer and Solutions Architect, building solutions for large telecom operators. Outside of work, she likes to explore new places and loves adventurous activities.

Aris Tsakpinis is a Specialist Solutions Architect for Generative AI, focusing on open weight models on Amazon Bedrock and the broader generative AI open source landscape. Alongside his professional role, he is pursuing a PhD in Machine Learning Engineering at the University of Regensburg, where his research focuses on applied natural language processing in scientific domains.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Mani Khanuja is a Principal Generative AI Specialist SA and author of the book Applied Machine Learning and High-Performance Computing on AWS. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Musarath Rahamathullah is an AI/ML and GenAI Solutions Architect at Amazon Web Services, focusing on media and entertainment customers. She holds a Master’s degree in Analytics with a specialization in Machine Learning. She is passionate about using AI solutions in the AWS Cloud to address customer challenges and democratize technology. Her professional background includes a role as a Research Assistant at the prestigious Indian Institute of Technology, Chennai. Beyond her professional endeavors, she is interested in interior architecture, focusing on creating beautiful spaces to live.


Finalist teams advance in the Amazon Nova AI Challenge: Trusted AI Track

Top eight university teams move on to head-to-head finals focused on AI security for code generation.


Since November 2024, ten top university teams from around the world have competed in the inaugural Amazon Nova AI Challenge: Trusted AI Track, focused on strengthening security in AI coding assistants and developing new automated methods to red-team and test them. After months of intense competition, eight teams have advanced to the finals, demonstrating outstanding innovation in securing AI-powered code generation. Finals will take place June 26-27, with judges convening in Santa Clara, California, while teams participate remotely in a tournament-style competition designed to push the boundaries of secure, AI-assisted software development.

In each tournament, attacking and defending teams are matched up against each other, and in each match the attacker engages in a limited number of conversations with the defender to try to solicit malicious code, vulnerable code, or assistance with malicious cyberactivity. In addition to their success in defending, defending models are also evaluated for their utility in supporting coding tasks. Attacking systems are evaluated on attack success and the diversity of their attacks. In the finals event, in addition to the finals tournament, model defenses and attacks will also be evaluated by expert human red teamers.

The finalists

The finalists were selected based on tournament performance, research papers, and presentations of their innovations.

“Since November, all of the teams have been developing increasingly innovative ways to make AI-assisted coding more secure and reliable,” said Michael Johnston, lead of the Amazon Nova AI Challenge. “The quality of work has been exceptional, making the finalist selection highly competitive.”

Defender teams are working to build robust security features into code-generating models, while the attacker teams are developing sophisticated techniques to test these models and identify potential vulnerabilities. Together, they’re helping to shape the future of secure AI development.

The finals format

In the finals, teams will compete remotely in an offline tournament evaluated by a panel of judges from Amazon’s artificial general intelligence (AGI) team, Amazon Security, AWS Responsible AI, and AWS Q for Developers. The finals will test the teams’ solutions against real-world scenarios in a controlled competition environment.

“The challenge sits at the intersection of AI capability and security, two critical areas for the responsible advancement of generative AI,” said Dr. Gang Wang, one of the faculty advisors for the UIUC team. “Our students have worked tirelessly to develop new approaches that enhance security without compromising the user experience.”

After the finals, all teams will gather in Seattle on July 22-24 for the Amazon Nova AI Challenge Summit, where winners will be announced and teams will present and share their research findings.

Advancing the field

The challenge brings together Amazon’s AI expertise with top academic talent to accelerate secure innovation in generative AI. Research produced through the competition aims to contribute to the development of safer, more reliable AI systems for all.

“What makes this challenge especially valuable is that it combines technical innovation with real-world application,” said Dr. Xiangyu Zhang of Purdue University. “Our students aren’t just competing; they’re helping to solve real-world AI security challenges.”

The Amazon Nova AI Challenge is part of Amazon’s broader commitment to responsible AI development and academic collaboration. For more information and updates, visit amazon.science/nova-ai-challenge.

The finalist teams are grouped into defender teams (model developers) and attacker teams (security testers).



HPE and NVIDIA Debut AI Factory Stack to Power Next Industrial Shift

To speed up AI adoption across industries, HPE and NVIDIA today launched new AI factory offerings at HPE Discover in Las Vegas.

The new lineup includes everything from modular AI factory infrastructure and HPE’s AI-ready RTX PRO Servers (HPE ProLiant Compute DL380a Gen12), to the next generation of HPE’s turnkey AI platform, HPE Private Cloud AI. The goal: give enterprises a framework to build and scale generative, agentic and industrial AI.

The NVIDIA AI Computing by HPE portfolio is now among the broadest in the market.

The portfolio combines NVIDIA Blackwell accelerated computing, NVIDIA Spectrum-X Ethernet and NVIDIA BlueField-3 networking technologies, NVIDIA AI Enterprise software, and HPE’s full portfolio of servers, storage, services and software. This now includes HPE OpsRamp Software, a validated observability solution for the NVIDIA Enterprise AI Factory, and HPE Morpheus Enterprise Software for orchestration. The result is a pre-integrated, modular infrastructure stack to help teams get AI into production faster.

This includes the next-generation HPE Private Cloud AI, co-engineered with NVIDIA and validated as part of the NVIDIA Enterprise AI Factory framework. This full-stack, turnkey AI factory solution will offer HPE ProLiant Compute DL380a Gen12 servers with the new NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs.

These new NVIDIA RTX PRO Servers from HPE provide a universal data center platform for a wide range of enterprise AI and industrial AI use cases, and are now available to order from HPE. HPE Private Cloud AI includes the latest NVIDIA AI Blueprints, including the NVIDIA AI-Q Blueprint for AI agent creation and workflows.

HPE also announced a new NVIDIA HGX B300 system, the HPE Compute XD690, built with NVIDIA Blackwell Ultra GPUs. It’s the latest entry in the NVIDIA AI Computing by HPE lineup and is expected to ship in October.

In Japan, KDDI is working with HPE to build NVIDIA AI infrastructure to accelerate global adoption.

The HPE-built KDDI system will be based on the NVIDIA GB200 NVL72 platform, built on the NVIDIA Grace Blackwell architecture, at the KDDI Osaka Sakai Data Center.

To accelerate AI for financial services, HPE will co-test agentic AI workflows built on Accenture’s AI Refinery with NVIDIA, running on HPE Private Cloud AI. Initial use cases include sourcing, procurement and risk analysis.

HPE said it’s adding 26 new partners to its “Unleash AI” ecosystem to support more NVIDIA AI use cases. The company now offers more than 70 packaged AI workloads, from fraud detection and video analytics to sovereign AI and cybersecurity.

Security and governance were a focus, too. HPE Private Cloud AI supports air-gapped management, multi-tenancy and post-quantum cryptography. HPE’s try-before-you-buy program lets customers test the system in Equinix data centers before purchase. HPE also introduced new programs, including AI Acceleration Workshops with NVIDIA, to help scale AI deployments.

  • Watch the keynote: HPE CEO Antonio Neri announced the news from the Las Vegas Sphere on Tuesday at 9 a.m. PT. Register for the livestream and replay.
  • Explore more: Learn how NVIDIA and HPE build AI factories for every industry. Visit the partner page.


How SkillShow automates youth sports video processing using Amazon Transcribe

This post is co-written with Tom Koerick from SkillShow.

The youth sports market was valued at $37.5 billion globally in 2022 and is projected to grow by 9.2% each year through 2030. Approximately 60 million young athletes participate in this market worldwide. SkillShow, a leader in youth sports video production, films over 300 events yearly in the youth sports industry, creating content for over 20,000 young athletes annually. This post describes how SkillShow used Amazon Transcribe and other Amazon Web Services (AWS) machine learning (ML) services to automate their video processing workflow, reducing editing time and costs while scaling their operations.

Challenge

In response to the surge in youth sports video production, manual video editing processes are becoming increasingly unsustainable. Since 2001, SkillShow has been at the forefront of sports video production, providing comprehensive video services for individuals, teams, and event organizers. They specialize in filming, editing, and distributing content that helps athletes showcase their skills to recruiters, build their personal brand on social media, and support their development training. As a trusted partner to major sports organizations including the Perfect Game, 3Step Sports, USA Baseball, MLB Network, Under Armour, Elite11 football combines and more, SkillShow has filmed hundreds of thousands of athletes and thousands of regional and national events across different sports and age groups.

Despite their market leadership, SkillShow faced significant operational challenges. With only seven full-time employees managing their expanding operation, they had to outsource to over 1,100 contractors annually. This reliance on outsourced editing not only increased operational costs but also resulted in a lengthy 3-week turnaround time per event, making it difficult to keep pace with the growing demand for youth sports content.

Managing approximately 230 TB of video data per year created significant operational challenges. This massive volume of data meant lengthy upload and download times for editors, expensive storage costs, and complex data management requirements. Each event’s raw footage needed to be securely stored, backed up, and made accessible to multiple editors, straining both technical resources and IT infrastructure. These challenges led to SkillShow halting new events mid-2023, limiting their growth potential in a rapidly expanding market. The need for an efficient, scalable solution became critical to maintaining SkillShow’s position and meeting the growing demand for youth sports content, particularly in the post-COVID era where recruiting videos have become essential for leagues and athletes alike.

Solution overview

To address these challenges, SkillShow partnered with AWS to develop an automated video processing pipeline. The team initially explored several approaches to automate player identification.

Facial recognition proved challenging due to varying video quality, inconsistent lighting conditions, and frequent player movement during games. Additionally, players often wore equipment such as helmets or protective gear that obscured their faces, making reliable identification difficult.

Text-based detection of jersey numbers and colors seemed promising at first, but presented its own set of challenges. Jersey numbers were frequently obscured by player movement, weather conditions could affect visibility, and varying camera angles made consistent detection unreliable.

Ultimately, the team settled on an audio logging and automated clip generation solution, which proved superior for several reasons:

  • More reliable player identification, because announcers consistently call out player numbers and team colors
  • Better performance in varying environmental conditions, because audio quality remains relatively consistent even in challenging weather or lighting
  • Reduced processing complexity and computational requirements compared to video-based analysis
  • More cost-effective due to lower computational demands and higher accuracy rates
  • Ability to capture additional context from announcer commentary, such as play descriptions and game situations

This solution uses several key AWS services:

  • Amazon Simple Storage Service (Amazon S3):
    • Used for storing the input and output video files
    • Provides scalable and durable storage to handle SkillShow’s large video data volume of 230 TB per year
    • Allows for straightforward access and integration with other AWS services in the processing pipeline
  • AWS Lambda:
    • Serverless compute service used to power the automated processing workflows
    • Triggers the various functions that orchestrate the video processing, such as transcription and clip generation
    • Enables event-driven, scalable, and cost-effective processing without the need to manage underlying infrastructure
  • Amazon Transcribe:
    • Automatic speech recognition (ASR) service used to convert the video audio into text transcripts
    • Provides the foundation for analyzing the video content and identifying player details
    • Allows for accurate speech-to-text conversion, even in noisy sports environments

The following diagram illustrates the solution architecture.

SkillShow AWS architecture diagram: Amazon S3, AWS Lambda, and Amazon Transcribe in the audio processing workflow

The architectural flow is as follows:

  1. The authorized user uploads a .csv file containing roster information (such as jersey color, number, player name, and school) and the video footage of players.
  2. A Lambda function is triggered by the upload of the video.
  3. The auto-transcript Lambda function uses Amazon Transcribe to generate a timestamped transcript of what is said in the input video.
  4. The transcript is uploaded to the output S3 bucket under transcripts/ for further use.
  5. The authorized user can invoke the auto-clipper Lambda function with an AWS Command Line Interface (AWS CLI) command.
  6. The function parses the transcript against player information from the roster.
  7. When identifying players, the function clips videos based on a specified keyword (in SkillShow’s case, it was “Next”) and uploads them to the output S3 bucket under segments/.

By using this suite of AWS services, SkillShow was able to build a scalable, cost-effective, and highly automated video processing solution that addressed their key operational challenges. The cloud-based architecture provides the flexibility and scalability required to handle their growing data volumes and evolving business needs.
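To make the keyword-based clipping step concrete, the following is a hypothetical sketch (function and field names are illustrative, not SkillShow’s production code) that walks the timestamped Amazon Transcribe output and starts a new segment each time the keyword is spoken:

import json

def segment_boundaries(transcript_json: str, keyword: str = "next"):
    """Return (start, end) pairs in seconds for each clip, split on the keyword."""
    results = json.loads(transcript_json)["results"]
    # Keep only spoken words (punctuation items carry no timestamps).
    items = [i for i in results["items"] if i["type"] == "pronunciation"]

    boundaries = []
    if not items:
        return boundaries

    clip_start = float(items[0]["start_time"])
    for item in items:
        word = item["alternatives"][0]["content"].lower()
        if word == keyword:
            # Close the current clip just before the keyword and start the next one after it.
            boundaries.append((clip_start, float(item["start_time"])))
            clip_start = float(item["end_time"])
    boundaries.append((clip_start, float(items[-1]["end_time"])))
    return boundaries

Each boundary pair could then be matched against the roster information from the transcript and used to cut and name the corresponding clip.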

Example processing workflow

Let’s explore an example processing workflow. As shown in the following screenshots, we first upload a player roster .csv and video file to the input bucket.

Amazon S3 management console showing two files in skillshow-input-videos bucket with metadata and actions

The auto-transcribe function processes the audio.

Amazon S3 management console displaying transcripts folder contents, including JSON output and temp file

The auto-clipper function segments the video based on player information.

AWS Lambda console displaying test event configuration with S3 bucket and file path parameters

Final clips are uploaded to the output bucket into one of two folders: a folder prefixed with the input video name, or Unnamed/ if the transcription was unclear or the player name was missing within the segment.

Amazon S3 management interface showing two empty folders within skillshow-output-videos/segments path

Named videos can be viewed in the first folder where SkillShow’s current naming convention (jersey color_number_event video name) is followed for editors to download on demand.

S3 bucket interface showing four timestamped MP4 video segments with metadata and storage details

Unnamed videos can be seen in a similar naming convention, only missing the unique player name. Now, the editors only have to review files in this folder and manually rename the file instead of having to do this for entire event videos.

Amazon S3 interface showing segments/Unnamed folder containing unnamed MP4 file with creation date and storage details

Results and benefits

After implementing this AWS-powered solution, SkillShow transformed their video processing operations. The automated pipeline reduced video production time from 3 weeks to 24 hours per event, enabling faster delivery to athletes and scouts. A recent event in Chicago showcased the system’s effectiveness: the automated pipeline processed 69 clips, accurately cutting and naming 64 of them, a 93% success rate. This high accuracy demonstrates the solution’s ability to handle real-world scenarios effectively. The system also proved adaptable, quickly addressing initial challenges such as color naming inconsistencies.

The Northwest Indoor event further illustrated the system’s scalability and versatility. Here, the automated process handled a larger volume of approximately 270 clips, maintaining an estimated accuracy rate of over 90%. Notably, this event included batting practice footage, highlighting the solution’s adaptability to various types of sports activities.

With this streamlined workflow, SkillShow has expanded its capacity to process multiple events simultaneously, significantly enhancing its ability to serve youth sports leagues. The standardized output format and improved player identification accuracy have enhanced the viewing experience for athletes, coaches, and scouts alike. Although the time savings varies depending on specific event conditions and filming techniques, the system has demonstrated its potential to substantially reduce manual editing work. SkillShow continues to refine the process, carefully balancing automation with quality control to provide optimal results across diverse event types. These improvements positioned SkillShow to meet the growing demand for youth sports video content while maintaining consistent quality across all events.

Conclusion

This solution demonstrates how AWS ML services can transform resource-intensive video processing workflows into efficient, automated systems. By combining the scalable storage of Amazon S3, serverless computing with Lambda, and the speech recognition capabilities of Amazon Transcribe, organizations can dramatically reduce processing times and operational costs. As a leader in automated sports video production, SkillShow has pioneered this approach for youth sports while demonstrating its adaptability to various content types, from educational videos to corporate training. They’re already exploring additional artificial intelligence and machine learning (AI/ML) capabilities for automated highlight generation, real-time processing for live events, and deeper integration with sports leagues and organizations.

For organizations looking to further enhance their video processing capabilities, Amazon Bedrock Data Automation offers additional possibilities. Amazon Bedrock Data Automation can streamline the generation of valuable insights from unstructured, multimodal content such as documents, images, audio, and videos. This fully managed capability could potentially be integrated into workflows similar to SkillShow’s, offering features such as automated video summaries, content moderation, and custom extraction of relevant information from video content. Furthermore, Amazon Bedrock Data Automation can generate custom insights from audio, including summaries and sentiment analysis, providing even deeper understanding of spoken content in sports videos.

SkillShow’s success highlights the broader potential of cloud-based video processing. As demand for video content continues to grow across industries, organizations can use AWS ML services to automate their workflows, reduce manual effort, and focus on delivering value to their customers rather than managing complex editing operations.

Are you interested in implementing similar automated video processing workflows for your organization? Contact SkillShow to learn how their pipeline built with AWS services can transform your content production process.


About the Authors

Ragib Ahsan is a Partner Solutions Architect at Amazon Web Services (AWS), where he helps organizations build and implement AI/ML solutions. Specializing in computer vision, he works with AWS partners to create practical applications using cloud technologies. Ahsan is particularly passionate about serverless architecture and its role in making solutions more accessible and efficient.

Tom Koerick is the owner and CEO of SkillShow, a sports media network company that has been filming youth sporting events nationwide since 2001. A former professional baseball player turned entrepreneur, Tom develops video solutions for event organizers and families in the youth sports industry. His focus includes college recruiting, social media sharing, and B2B services that provide added value and revenue generation opportunities in youth sports.


NewDay builds a generative AI-based customer service Agent Assist with over 90% accuracy

This post is co-written with Sergio Zavota and Amy Perring from NewDay.

NewDay has a clear and defining purpose: to help people move forward with credit. NewDay provides around 4 million customers access to credit responsibly and delivers exceptional customer experiences, powered by their in-house technology system. NewDay’s contact center handles 2.5 million calls annually, so having the right technology to empower their customer service agents to have effective conversations with customers is paramount to deliver great customer experience.

The role of the contact center is complex, and with nearly 200 knowledge articles in Customer Services alone, there are times when an agent needs to search these articles for the right answer to a customer question. This led to a hackathon problem statement for NewDay in early 2024: how can they harness the power of generative AI to improve speed to resolution, improving both the customer and agent experience?

The hackathon event led to the creation of NewAssist—a real-time generative AI assistant designed to empower customer service agents with speech-to-text capabilities. Built on Amazon Bedrock, NewAssist would deliver rapid, context-aware support during live interactions with customers.

In this post, we share how NewDay turned their hackathon idea into a successful generative AI-based solution, along with the lessons they learned during this journey.

Inception and early challenges

NewAssist won the hackathon event by showcasing the potential generative AI could deliver on speed of call resolution. However, despite a positive start, the team faced significant hurdles:

  • Managing costs and competing priorities – Amid large strategic initiatives and limited resources, the team remained focused and proactive, even as securing executive buy-in proved challenging
  • Lack of infrastructure – The existing legacy systems were not conducive to rapid experimentation
  • Unproven technology – The NewAssist team needed to prove the investment would truly add value back to the business

Realizing that a fully fledged voice assistant was too ambitious given these challenges, the team made a strategic pivot. They scaled back to a chatbot solution, concentrating on standing up a proof of concept to validate that their existing knowledge management solution would work effectively with generative AI technology. The NewDay contact center team’s goal is to use one source of truth for its future generative AI solutions, so this task was crucial in setting the right foundation for a solid long-term strategy. With an agile, step-by-step approach, a small cross-functional team of three experts set out to build the proof of concept with a target of 80% accuracy. A golden dataset of over 100 questions and correct answers was created, and the generative AI application was tested against this dataset to evaluate the accuracy of its responses.

Solution overview

NewAssist’s technical design and implementation were executed by following these principles:

  • Embrace a culture of experimentation – A small cross-functional team of three people was formed. The team followed the Improvement Kata methodology to implement rapid Build-Measure-Learn cycles. In just 10 weeks and over 8 experiment loops, the team honed the solution. Early iterations saw accuracy below 60%, but through rigorous testing and smart data strategies, they boosted performance to over 80%, a 33% improvement in just a few weeks.
  • Adopt a serverless infrastructure – Amazon Bedrock, AWS Fargate, AWS Lambda, Amazon API Gateway, and Amazon OpenSearch Serverless formed the backbone of the application. This approach not only reduced costs (with running costs kept under $400 per month), but also made sure the system could scale in response to real-time demand. In addition, it allowed the developer on the team to focus only on activities that would validate the results of the experiments, without worrying about managing infrastructure.

NewAssist is implemented as a Retrieval Augmented Generation (RAG) solution. The following diagram shows the high-level solution architecture.

NewAssist Architecture with Cognito, Lambda and Amazon Bedrock

The high-level architecture is made up of five components:

  • User interface – A simple AI assistant UI is built using the Streamlit framework. Users can log in, ask questions, give feedback to answers in the form of thumbs up and thumbs down, and optionally provide a comment to explain the reason for the bad feedback. The UI is hosted using Fargate and authentication is implemented through Amazon Cognito with Microsoft Entra ID integration to provide single sign-on (SSO) capabilities to customer service agents.
  • Knowledge base processing – This component mostly drove the 40% increase in accuracy. Here, articles are retrieved by using APIs from the third-party knowledge base and chunked with a defined chunking strategy. The chunks are processed to convert to vector embeddings and finally stored in the vector database implemented using OpenSearch Serverless.
  • Suggestion generation – Questions from the UI are forwarded to the suggestion generation component, which retrieves the most relevant chunks and passes them to the large language model (LLM) to generate suggestions based on that context (a minimal sketch of this step follows the list). Anthropic’s Claude 3 Haiku was the preferred LLM and was accessed through Amazon Bedrock. Anthropic’s Claude 3 Haiku is still used at the time of writing, even though more performant models have been released, for two reasons: first, it’s the most cost-effective model accessible through Amazon Bedrock that provides satisfying results; second, NewDay has a response time requirement of a maximum of 5 seconds, which Anthropic’s Claude 3 Haiku satisfies. To achieve the required accuracy, NewDay experimented with different chunking strategies and retrieval configurations while keeping costs down with Anthropic’s Claude 3 Haiku.
  • Observability – Questions and answers, along with feedback, are logged into Snowflake. A dashboard built on top of it shows different metrics, such as accuracy. Every week, business experts review the answers with bad feedback, and AI engineers translate them into experiments that, if successful, increase the solution’s performance. Additionally, Amazon CloudWatch captures logs of the requests processed by the AWS services described in the architecture.
  • Offline evaluation – When a new version of NewAssist is created during the experimentation cycles, it is first evaluated in pre-production against an evaluation dataset. If the version’s accuracy surpasses a specified threshold, then it can be deployed in production.
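The following is a minimal sketch of the suggestion generation step referenced above. The retrieve_chunks helper, prompt wording, Region, and inference settings are assumptions for illustration, not NewDay’s implementation; only the use of Claude 3 Haiku on Amazon Bedrock is taken from the description above.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="eu-west-2")  # assumed Region
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def retrieve_chunks(question: str, k: int = 4) -> list:
    """Hypothetical placeholder: embed the question and run a vector search against
    the OpenSearch Serverless index, returning the top-k knowledge article chunks."""
    raise NotImplementedError

def suggest(question: str) -> str:
    # Build a grounded prompt from the retrieved chunks and ask the LLM for a suggestion.
    context = "\n\n".join(retrieve_chunks(question))
    prompt = (
        "Answer the customer service agent's question using only the knowledge "
        f"article excerpts below.\n\nExcerpts:\n{context}\n\nQuestion: {question}"
    )
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.0},
    )
    return response["output"]["message"]["content"][0]["text"]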

Understand your data and invest in a robust data processing solution

The one experiment that had the biggest impact on the accuracy of NewAssist, increasing it by 20%, was replacing the general-purpose data parser for knowledge articles with a custom-built version. This new parser was designed specifically to understand the structure and meaning of NewDay’s data, and by using this data, the LLM could generate more accurate outputs. Initially, the workflow that implements the data processing logic consisted of the following steps:

  1. Manually extract the articles from the data source and save them in PDF.
  2. Use PyPDF to parse the articles.

With this approach, the solution was performing at around 60% accuracy. The simple reason was that the logic didn’t take into account the type of data being processed, producing below-average results. Things changed when NewDay started studying their data.

At NewDay, knowledge articles for agents are created by a team of experts in the contact center area. They create articles using a specific methodology and store them in a third-party content management system. This system allows the creation of articles through widgets, for example, lists, banners, and tables. In addition, the system provides APIs that can be used to retrieve articles. The articles are returned in the form of a JSON object, where each object contains a widget. There is a limited number of widgets available, and each one has a specific JSON schema.

Given this discovery, the team studied each widget schema and created bespoke parsing logic that extracts the relevant content and formats it in a polished way. It took longer than simply parsing with PyPDF, but the results were positive. Just by focusing on the data, and without touching the AI component, the solution’s accuracy increased from 60% to 73%. This demonstrated that data quality plays a key role in developing an effective generative AI application.
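As an illustration of this bespoke parsing idea, the widget names and schemas below are invented for the example (NewDay’s actual widget set and logic differ): each widget type in the article JSON is mapped to a plain-text rendering instead of going through a PDF export.

def render_widget(widget: dict) -> str:
    # Map each known widget type to a plain-text rendering; skip anything unknown.
    kind = widget.get("type")
    if kind == "text":
        return widget.get("content", "")
    if kind == "list":
        return "\n".join(f"- {item}" for item in widget.get("items", []))
    if kind == "table":
        return "\n".join(" | ".join(row) for row in widget.get("rows", []))
    if kind == "banner":
        return f"NOTE: {widget.get('content', '')}"
    return ""

def render_article(widgets: list) -> str:
    # An article arrives as a list of widget objects from the CMS API.
    return "\n\n".join(filter(None, (render_widget(w) for w in widgets)))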

Understand how your users use the solution

With the 80% accuracy milestone, the team proved that the proof of concept could work, so they obtained approval to expand experimentation to 10 customer service agents after just 3 months. NewDay selected 10 experienced agents because they needed to identify where the solution gave an incorrect response. As soon as NewAssist was handed over to customer service agents, something unexpected happened. Agents used NewAssist differently from what the NewDay technical team expected: they used various acronyms in their questions to NewAssist. As an example, consider the following questions:

  • How do I set a direct debit for a customer?
  • How do I set a dd for a cst?

Here, direct debit is abbreviated with “dd” and customer with “cst.” Unless this information is provided in the context, the LLM will struggle to provide the right answer. As a result, NewAssist’s accuracy dropped to 70% when agents started using it. The solution NewDay adopted was to statically inject the acronyms and abbreviations into the LLM prompt so it could better understand the question. Slowly, the accuracy recovered to over 80%. This is a simple example that demonstrates how important it is to put a product in the hands of the final users to validate your assumptions.

Another positive finding was that agents would use NewAssist to understand how to explain a process to a customer. It’s difficult to translate technical content into a format that non-technical people understand. Agents started to ask NewAssist questions like “How do I explain to a customer how to unlock their account?” with the outcome of producing a great answer they could simply read to customers.

Scaling up for greater impact

By expanding the experimentation to 10 agents, NewDay was able to test many different scenarios. Negative responses were reviewed and root cause analysis was conducted. The NewAssist team identified several gaps in the knowledge base, which they solved with new and improved content. They made enhancements to the solution by training it on acronyms and internal language. Additionally, they provided training and feedback to the pilot team on how to effectively use the solution. By doing this, the NewAssist team improved the accuracy to over 90% and gained approval from NewDay’s executive team to productionize the solution. NewDay is currently rolling out the solution to over 150 agents, with plans to expand its scope to all departments within Customer Operations (such as Fraud and Collections). Early results indicate a substantial reduction in the time it takes to retrieve an answer to queries raised by agents: previously, it would take them on average 90 seconds to retrieve an answer; the solution now retrieves one in 4 seconds.

Learnings to build a production-ready generative AI application

NewDay acquired the following insights by deploying a production-ready generative AI application:

  • Embrace a culture of experimentation – This includes the following strategies:
    • Adopt an agile, iterative approach to rapidly test hypotheses and improve the solution
    • Implement methodologies like the Improvement Kata and Build-Measure-Learn cycles to achieve significant gains in short time frames
    • Start small with a focused proof of concept and gradually scale to validate effectiveness before full deployment
  • Focus on data quality – Invest time in understanding and properly processing your data, because this can yield substantial improvements
  • Understand how your users interact with the product – This includes the following steps:
    • Conduct real-world testing with actual users to uncover unexpected usage patterns and behaviors
    • Be prepared to adapt your solution based on user insights, such as accommodating internal jargon or abbreviations
    • Look for unforeseen use cases that might emerge during user testing, because these can provide valuable directions for feature development
    • Balance AI capabilities with human expertise, recognizing the importance of oversight and training to facilitate optimal use of the technology

Looking ahead

NewAssist’s journey is far from over. With a robust feedback mechanism and the right level of oversight, the team will continue to deliver optimizations that further improve the accuracy of the output. Future iterations will explore deeper integrations with AWS AI services, further refining the balance between human and machine intelligence in customer interactions. By adopting AWS serverless solutions and an agile, data-driven approach, NewDay turned a hackathon idea into a powerful tool that has optimized customer service. The success of NewAssist is a testament to the innovation possible when creativity meets robust cloud infrastructure, setting the stage for the next wave of advancements in contact center technology.

Conclusion

NewAssist’s journey demonstrates the power of AWS in enabling rapid experimentation and deployment of RAG solutions. For organizations looking to enhance customer service, streamline operations, or unlock new insights from data, AWS provides the tools and infrastructure to drive innovation, in addition to numerous other opportunities:

  • Accelerate RAG experiments – Services like Amazon Bedrock, Lambda, and other AWS serverless offerings enable quick building and iteration of ideas
  • Scale with confidence – AWS serverless offerings provide effective cost management while making sure solutions can grow with demand
  • Focus on data quality – If data quality isn’t good enough at the source, you can implement data processing, cleansing, and extraction techniques to improve the accuracy of responses
  • Streamline deployment – Fargate and API Gateway simplify the process of moving from proof of concept to production-ready applications
  • Optimize for performance – Cross-Region inference and other AWS features help meet strict latency requirements while balancing cost considerations

To learn more about how AWS can help you in your generative AI journey, visit Transform your business with generative AI.


About the authors

Kaushal Goyal is a Solutions Architect at AWS, working with Enterprise Financial Services in the UK and Ireland region. With a strong background in banking technology, Kaushal previously led digital transformation initiatives at major banks. At AWS, Kaushal helps financial institutions modernize legacy systems and implement cloud-native solutions. As a Generative AI enthusiast and Container Specialist, Kaushal focuses on bringing innovative AI solutions to enterprise customers and shares these learnings through blogs and public speaking.

Sergio Zavota is an AI Architect at NewDay, specializing in MLOps and Generative AI. Sergio designs scalable platforms to productionize machine learning workloads and enable Generative AI at scale at NewDay. Sergio shares his expertise at industry conferences and workshops, focusing on how to productionize AI solutions and align AI with organizational goals.

Amy Perring is a Senior Optimisation Manager at NewDay, based in London. She specialises in building a deep understanding of contact drivers through customer and agent feedback. This helps identify optimisation opportunities to improve overall efficiency and experience, through the introduction or improvement of products and processes.

Mayur Udernani leads the AWS Generative AI & ML business with commercial enterprises in the UK and Ireland. In his role, Mayur spends the majority of his time with customers and partners to help create impactful solutions that solve the most pressing needs of a customer or a wider industry using AWS Cloud, Generative AI, and ML services. Mayur lives in the London area. He has an MBA from the Indian Institute of Management and a bachelor’s degree in Computer Engineering from Mumbai University.

Read More

NVIDIA and Partners Highlight Next-Generation Robotics, Automation and AI Technologies at Automatica

NVIDIA and Partners Highlight Next-Generation Robotics, Automation and AI Technologies at Automatica

From the heart of Germany’s automotive sector to manufacturing hubs across France and Italy, Europe is embracing industrial AI and advanced AI-powered robotics to address labor shortages, boost productivity and fuel sustainable economic growth.

Robotics companies are developing humanoid robots and collaborative systems that integrate AI into real-world manufacturing applications. Supported by a $200 billion investment initiative and coordinated efforts from the European Commission, Europe is positioning itself at the forefront of the next wave of industrial automation, powered by AI.

This momentum is on full display at Automatica — Europe’s premier conference on advancements in robotics, machine vision and intelligent manufacturing — taking place this week in Munich, Germany.

NVIDIA and its ecosystem of partners and customers are showcasing next-generation robots, automation and AI technologies designed to accelerate the continent’s leadership in smart manufacturing and logistics.

NVIDIA Technologies Boost Robotics Development 

Central to advancing robotics development is Europe’s first industrial AI cloud, announced at NVIDIA GTC Paris at VivaTech earlier this month. The Germany-based AI factory, featuring 10,000 NVIDIA GPUs, provides European manufacturers with secure, sovereign and centralized AI infrastructure for industrial workloads. It will support applications ranging from design and engineering to factory digital twins and robotics.

To help accelerate humanoid development, NVIDIA released NVIDIA Isaac GR00T N1.5 — an open foundation model for humanoid robot reasoning and skills. This update enhances the model’s adaptability and ability to follow instructions, significantly improving its performance in material handling and manufacturing tasks.

To help post-train GR00T N1.5, NVIDIA has also released the Isaac GR00T-Dreams blueprint — a reference workflow for generating vast amounts of synthetic trajectory data from a small number of human demonstrations — enabling robots to generalize across behaviors and adapt to new environments with minimal human demonstration data.

In addition, early developer previews of NVIDIA Isaac Sim 5.0 and Isaac Lab 2.2 — open-source robot simulation and learning frameworks optimized for NVIDIA RTX PRO 6000 workstations — are now available on GitHub.

Image courtesy of Wandelbots.

Robotics Leaders Tap NVIDIA Simulation Technology to Develop and Deploy Humanoids and More 

Robotics developers and solutions providers across the globe are integrating NVIDIA’s three computers to train, simulate and deploy robots.

NEURA Robotics, a German robotics company and pioneer for cognitive robots, unveiled the third generation of its humanoid, 4NE1, designed to assist humans in domestic and professional environments through advanced cognitive capabilities and humanlike interaction. 4NE1 is powered by GR00T N1 and was trained in Isaac Sim and Isaac Lab before real-world deployment.

NEURA Robotics is also presenting Neuraverse, a digital twin and interconnected ecosystem for robot training, skills and applications, fully compatible with NVIDIA Omniverse technologies.

Delta Electronics, a global leader in power management and smart green solutions, is debuting two next-generation collaborative robots: D-Bot Mar and D-Bot 2 in 1 — both trained using Omniverse and Isaac Sim technologies and libraries. These cobots are engineered to transform intralogistics and optimize production flows.

Wandelbots, the creator of the Wandelbots NOVA software platform for industrial robotics, is partnering with SoftServe, a global IT consulting and digital services provider, to scale simulation-first automation using NVIDIA Isaac Sim, enabling virtual validation and real-world deployment with maximum impact.

Cyngn, a pioneer in autonomous mobile robotics, is integrating its DriveMod technology into Isaac Sim to enable large-scale, high-fidelity virtual testing of advanced autonomous operation. Purpose-built for industrial applications, DriveMod is already deployed on vehicles such as the Motrec MT-160 Tugger and BYD Forklift, delivering sophisticated automation to material handling operations.

Doosan Robotics, a company specializing in AI robotic solutions, will showcase its “sim to real” solution, using NVIDIA Isaac Sim and cuRobo. Doosan will be showcasing how to seamlessly transfer tasks from simulation to real robots across a wide range of applications — from manufacturing to service industries.

Franka Robotics has integrated Isaac GR00T N1.5 into a dual-arm Franka Research 3 (FR3) robot for robotic control. The integration of GR00T N1.5 allows the system to interpret visual input, understand task context and autonomously perform complex manipulation — without the need for task-specific programming or hardcoded logic.

Image courtesy of Franka Robotics.

Hexagon, the global leader in measurement technologies, launched its new humanoid, dubbed AEON. With its unique locomotion system and multimodal sensor fusion, and powered by NVIDIA’s three-computer solution, AEON is engineered to perform a wide range of industrial applications, from manipulation and asset inspection to reality capture and operator support.

Intrinsic, a software and AI robotics company, is integrating Intrinsic Flowstate with Omniverse and OpenUSD for advanced visualization and digital twins that can be used in many industrial use cases. The company is also using NVIDIA foundation models to enhance robot capabilities like grasp planning through AI and simulation technologies.

SCHUNK, a global leader in gripping systems and automation technology, is showcasing its innovative grasping kit powered by the NVIDIA Jetson AGX Orin module. The kit intelligently detects objects and calculates optimal grasping points. SCHUNK is also demonstrating seamless simulation-to-reality transfer using IGS Virtuous software — built on Omniverse technologies — to control a real robot through simulation in a pick-and-place scenario.

Universal Robots is showcasing UR15, its fastest cobot yet. Powered by the UR AI Accelerator — developed with NVIDIA and running on Jetson AGX Orin using CUDA-accelerated Isaac libraries — UR15 helps set a new standard for industrial automation.

Vention, a full-stack software and hardware automation company, launched its Machine Motion AI, built on CUDA-accelerated Isaac libraries and powered by Jetson. Vention is also expanding its lineup of robotic offerings by adding the FR3 robot from Franka Robotics to its ecosystem, enhancing its solutions for academic and research applications.

Image courtesy of Vention.

Learn more about the latest robotics advancements by joining NVIDIA at Automatica, running through Friday, June 27. 

Read More

No-code data preparation for time series forecasting using Amazon SageMaker Canvas

No-code data preparation for time series forecasting using Amazon SageMaker Canvas

Time series forecasting helps businesses predict future trends based on historical data patterns, whether it’s for sales projections, inventory management, or demand forecasting. Traditional approaches require extensive knowledge of statistical and data science methods to process raw time series data.

Amazon SageMaker Canvas offers no-code solutions that simplify data wrangling, making time series forecasting accessible to all users regardless of their technical background. In this post, we explore how SageMaker Canvas and SageMaker Data Wrangler provide no-code data preparation techniques that empower users of all backgrounds to prepare data and build time series forecasting models in a single interface with confidence.

Solution overview

Using SageMaker Data Wrangler for data preparation lets you modify data for predictive analytics without programming knowledge. In this solution, we demonstrate the steps associated with this process. The solution includes the following:

  • Data import from various sources
  • Automated no-code algorithmic recommendations for data preparation
  • Step-by-step processes for preparation and analysis
  • Visual interfaces for data visualization and analysis
  • Export capabilities after data preparation
  • Built-in security and compliance features

In this post, we focus on data preparation for time series forecasting using SageMaker Canvas.

Walkthrough

The following is a walkthrough of the solution for data preparation using Amazon SageMaker Canvas. For the walkthrough, you use the consumer electronics synthetic dataset found in this SageMaker Canvas Immersion Day lab, which we encourage you to try. This consumer electronics-related time series (RTS) dataset primarily contains historical price data that corresponds to sales transactions over time. This dataset is designed to complement target time series (TTS) data to improve prediction accuracy in forecasting models, particularly for consumer electronics sales, where price changes can significantly impact buying behavior. The dataset can be used for demand forecasting, price optimization, and market analysis in the consumer electronics sector.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Solution walkthrough

In the following walkthrough, we explain how you can take a dataset, prepare the data without writing code using Data Wrangler, and then train a time series forecasting model using SageMaker Canvas.

Sign in to the AWS Management Console, go to Amazon SageMaker AI, and then choose Canvas. On the Get started page, select the Import and prepare option. You will see options to import your dataset into SageMaker Data Wrangler. Select Tabular Data, because we will be using this data for time series forecasting. The following import sources are available:

  1. Local upload
  2. Canvas Datasets
  3. Amazon S3
  4. Amazon Redshift
  5. Amazon Athena
  6. Databricks
  7. MySQL
  8. PostgreSQL
  9. SQL Server
  10. RDS

For this demo, select Local upload. When you use this option, the data is stored on an Amazon Elastic File System (Amazon EFS) storage volume in the SageMaker Studio environment, and that storage is tied to the SageMaker Studio instance. For more permanent, long-term data storage, Amazon Simple Storage Service (Amazon S3) is the recommended option when working with SageMaker Data Wrangler.

Select the consumer_electronics.csv file from the prerequisites. After selecting the file to import, you can use the Import settings panel to set your desired configurations. For the purposes of this demo, leave the options at their default values.

Import tabular data screen with sampling methods and sampling size

After the import is complete, use the Data flow options to modify the newly imported data. For future data forecasting, you may need to clean up the data so the service can properly understand the values and disregard any errors. SageMaker Canvas has various offerings to accomplish this, including Chat for data prep for natural language data modifications and Add Transform. Chat for data prep may be best for users who prefer natural language processing (NLP) interactions and may not be familiar with technical data transformations. Add Transform is best for data professionals who know which transformations they want to apply to their data.

For time series forecasting using Amazon SageMaker Canvas, data must be prepared in a certain way for the service to properly forecast and understand the data. To make a time series forecast using SageMaker Canvas, the SageMaker Canvas documentation lists the following requirements:

  • A timestamp column with all values having the datetime type.
  • A target column that has the values that you’re using to forecast future values.
  • An item ID column that contains unique identifiers for each item in your dataset, such as SKU numbers.

The datetime values in the timestamp column must use one of the following formats (if your data uses a different format, see the conversion sketch after this list):

  • YYYY-MM-DD HH:MM:SS
  • YYYY-MM-DDTHH:MM:SSZ
  • YYYY-MM-DD
  • MM/DD/YY
  • MM/DD/YY HH:MM
  • MM/DD/YYYY
  • YYYY/MM/DD HH:MM:SS
  • YYYY/MM/DD
  • DD/MM/YYYY
  • DD/MM/YY
  • DD-MM-YY
  • DD-MM-YYYY
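
If your source data uses a different timestamp format, you can normalize it before importing. A minimal pandas sketch, assuming the walkthrough’s dataset and a timestamp column named ts:

    import pandas as pd

    # Normalize the timestamp column to the YYYY-MM-DD HH:MM:SS format accepted by Canvas.
    df = pd.read_csv("consumer_electronics.csv")
    df["ts"] = pd.to_datetime(df["ts"]).dt.strftime("%Y-%m-%d %H:%M:%S")
    df.to_csv("consumer_electronics_normalized.csv", index=False)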

You can make forecasts for the following intervals:

  • 1 min
  • 5 min
  • 15 min
  • 30 min
  • 1 hour
  • 1 day
  • 1 week
  • 1 month
  • 1 year

For this example, remove the $ in the data by using the Chat for data prep option. Give the chat a prompt such as “Can you get rid of the $ in my data,” and it will generate code to accommodate your request and modify the data, giving you a no-code way to prepare the data for modeling and predictive analysis. Choose Add to Steps to accept this code and apply the changes to the data.

Chat for data prep options
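
For reference, the generated code is roughly equivalent to the following pandas step; the column name price is an assumption about the sample dataset.

    import pandas as pd

    df = pd.read_csv("consumer_electronics.csv")
    # Strip the currency symbol and cast the column to a numeric type.
    df["price"] = df["price"].astype(str).str.replace("$", "", regex=False).astype(float)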

You can also convert values to the float data type and check for missing data in your uploaded CSV file using either the Chat for data prep or Add Transform options. To drop missing values using Add Transform (a rough pandas equivalent is sketched after these steps):

  1. Select Add Transform from the interface
  2. Choose Handle Missing from the transform options
  3. Select Drop missing from the available operations
  4. Choose the columns you want to check for missing values
  5. Select Preview to verify the changes
  6. Choose Add to confirm and apply the transformation

SageMaker Data Wrangler interface displaying consumer electronics data, column distributions, and options to handle missing values across all columns
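
Outside of the Canvas UI, the same cleanup can be expressed in a single pandas call; the column names below are illustrative, not the exact dataset schema.

    import pandas as pd

    df = pd.read_csv("consumer_electronics.csv")
    # Drop rows where the selected columns contain missing values (illustrative column names).
    df = df.dropna(subset=["ts", "item_id", "price"])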

For time series forecasting, inferring missing values and resampling the dataset to a certain frequency (hourly, daily, or weekly) are also important. In SageMaker Data Wrangler, you can change the frequency of the data by choosing Add Transform, selecting Time Series, selecting Resample from the Transform dropdown, and then selecting the Timestamp dropdown (ts in this example). Then you can select advanced options, such as Frequency unit, and select the desired frequency from the list.

SageMaker Data Wrangler interface featuring consumer electronics data, column-wise visualizations, and time series resampling configuration

SageMaker Data Wrangler offers several methods to handle missing values in time-series data through its Handle missing transform. You can choose from options such as forward fill or backward fill, which are particularly useful for maintaining the temporal structure of the data. These operations can be applied by using natural language commands in Chat for data prep, allowing flexible and efficient handling of missing values in time-series forecasting preparation.
Data preprocessing interface displaying retail demand dataset with visualization, statistics, and imputation configuration
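
A rough pandas equivalent of resampling to a daily frequency and forward filling the resulting gaps, assuming the ts timestamp column and a numeric price column (per-item grouping is omitted for brevity):

    import pandas as pd

    df = pd.read_csv("consumer_electronics.csv", parse_dates=["ts"])
    # Resample to a daily frequency and forward fill the gaps introduced by resampling.
    daily = (
        df.set_index("ts")["price"]
          .resample("D")
          .mean()
          .ffill()
          .reset_index()
    )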

To create the data flow, choose Create model. Then choose Run Validation, which checks the data to make sure the processes were done correctly. After this step of data transformation, you can access additional options by selecting the purple plus sign. The options include Get data insights, Chat for data prep, Combine data, Create model, and Export.

Data Wrangler interface displaying validated data flow from local upload to drop missing step, with additional data preparation options

The prepared data can then be connected to SageMaker AI for time series forecasting strategies, in this case, to predict the future demand based on the historical data that has been prepared for machine learning.

When using SageMaker, it is also important to consider data storage and security. For the local import feature, data is stored on Amazon EFS volumes and encrypted by default. For more permanent storage, Amazon S3 is recommended. S3 offers security features such as server-side encryption (SSE-S3, SSE-KMS, or SSE-C), fine-grained access controls through AWS Identity and Access Management (IAM) roles and bucket policies, and the ability to use VPC endpoints for added network security. To help ensure data security in either case, it’s important to implement proper access controls, use encryption for data at rest and in transit, regularly audit access logs, and follow the principle of least privilege when assigning permissions.
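
For example, if you copy the prepared file to Amazon S3 yourself, you can request server-side encryption on upload. A minimal boto3 sketch; the bucket name, object key, and KMS key alias are placeholders:

    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(
        Filename="consumer_electronics_normalized.csv",
        Bucket="my-forecasting-data-bucket",          # placeholder bucket name
        Key="canvas/consumer_electronics.csv",
        ExtraArgs={
            "ServerSideEncryption": "aws:kms",
            "SSEKMSKeyId": "alias/my-data-key",       # placeholder KMS key alias
        },
    )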

In this next step, you learn how to train a model using SageMaker Canvas. From the previous step, select the purple plus sign, select Create model, and then select Export to create the model. After selecting a column to predict (select price for this example), you go to the Build screen, which offers options such as Quick build and Standard build. Based on the column chosen, the model will predict future values from the data being used.

SageMaker Canvas Version 1 model configuration interface for 3+ category price prediction with 20k sample dataset analysis

Clean up

To avoid incurring future charges, delete the SageMaker Data Wrangler data flow and, if used for storage, the S3 buckets.

  1. In the SageMaker console, navigate to Canvas
  2. Select Import and prepare
  3. Find your data flow in the list
  4. Click the three dots (⋮) menu next to your flow
  5. Select Delete to remove the data flow
    SageMaker Data Wrangler dashboard with recent data flow, last update time, and options to manage flows and create models

If you used S3 for storage:

  1. Open the Amazon S3 console
  2. Navigate to your bucket
  3. Select the bucket used for this project
  4. Choose Delete
  5. Type the bucket name to confirm deletion
  6. Select Delete bucket

Conclusion

In this post, we showed you how Amazon SageMaker Data Wrangler offers a no-code solution for time series data preparation, traditionally a task requiring technical expertise. By using the intuitive interface of the Data Wrangler console and natural language-powered tools, even users who don’t have a technical background can effectively prepare their data for future forecasting needs. This democratization of data preparation not only saves time and resources but also empowers a wider range of professionals to engage in data-driven decision-making.


About the author

Muni T. Bondu is a Solutions Architect at Amazon Web Services (AWS), based in Austin, Texas. She holds a Bachelor of Science in Computer Science, with concentrations in Artificial Intelligence and Human-Computer Interaction, from the Georgia Institute of Technology.

Read More

Build an agentic multimodal AI assistant with Amazon Nova and Amazon Bedrock Data Automation

Build an agentic multimodal AI assistant with Amazon Nova and Amazon Bedrock Data Automation

Modern enterprises are rich in data that spans multiple modalities—from text documents and PDFs to presentation slides, images, audio recordings, and more. Imagine asking an AI assistant about your company’s quarterly earnings call: the assistant should not only read the transcript but also “see” the charts in the presentation slides and “hear” the CEO’s remarks. Gartner predicts that by 2027, 40% of generative AI solutions will be multimodal (text, image, audio, video), up from only 1% in 2023. This shift underlines how vital multimodal understanding is becoming for business applications. Achieving this requires a multimodal generative AI assistant—one that can understand and combine text, visuals, and other data types. It also requires an agentic architecture so the AI assistant can actively retrieve information, plan tasks, and make decisions on tool calling, rather than just responding passively to prompts.

In this post, we explore a solution that does exactly that—using Amazon Nova Pro, a multimodal large language model (LLM) from AWS, as the central orchestrator, along with powerful new Amazon Bedrock features like Amazon Bedrock Data Automation for processing multimodal data. We demonstrate how agentic workflow patterns such as Retrieval Augmented Generation (RAG), multi-tool orchestration, and conditional routing with LangGraph enable end-to-end solutions that artificial intelligence and machine learning (AI/ML) developers and enterprise architects can adopt and extend. We walk through an example of a financial management AI assistant that can provide quantitative research and grounded financial advice by analyzing both the earnings call (audio) and the presentation slides (images), along with relevant financial data feeds. We also highlight how you can apply this pattern in industries like finance, healthcare, and manufacturing.

Overview of the agentic workflow

The core of the agentic pattern consists of the following stages:

  • Reason – The agent (often an LLM) examines the user’s request and the current context or state. It decides what the next step should be—whether that’s providing a direct answer or invoking a tool or sub-task to get more information.
  • Act – The agent executes that step. This could mean calling a tool or function, such as a search query, a database lookup, or a document analysis using Amazon Bedrock Data Automation.
  • Observe – The agent observes the result of the action. For instance, it reads the retrieved text or data that came back from the tool.
  • Loop – With new information in hand, the agent reasons again, deciding if the task is complete or if another step is needed. This loop continues until the agent determines it can produce a final answer for the user.

This iterative decision-making enables the agent to handle complex requests that are impossible to fulfill with a single prompt. However, implementing agentic systems can be challenging. They introduce more complexity in the control flow, and naive agents can be inefficient (making too many tool calls or looping unnecessarily) or hard to manage as they scale. This is where structured frameworks like LangGraph come in. LangGraph makes it possible to define a directed graph (or state machine) of potential actions with well-defined nodes (actions like “Report Writer” or “Query Knowledge Base”) and edges (allowable transitions). Although the agent’s internal reasoning still decides which path to take, LangGraph makes sure the process remains manageable and transparent. This controlled flexibility means the assistant has enough autonomy to handle diverse tasks while making sure the overall workflow is stable and predictable.
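
To make this concrete, here is a minimal LangGraph sketch of such a graph. The node bodies are placeholders (in the real solution they would call Amazon Nova, Amazon Bedrock Knowledge Bases, and other tools), and the routing rule stands in for the LLM’s own decision:

    from typing import TypedDict
    from langgraph.graph import StateGraph, END

    class AgentState(TypedDict):
        question: str
        context: str
        answer: str

    def query_knowledge_base(state: AgentState) -> dict:
        # Placeholder retrieval step (for example, an Amazon Bedrock Knowledge Bases query)
        return {"context": f"retrieved context for: {state['question']}"}

    def report_writer(state: AgentState) -> dict:
        # Placeholder generation step (for example, an Amazon Nova call with question + context)
        return {"answer": f"Answer grounded in: {state['context']}"}

    def route(state: AgentState) -> str:
        # In a real agent the LLM's reasoning picks the next node; a simple rule stands in here
        return "query_knowledge_base" if not state["context"] else "report_writer"

    graph = StateGraph(AgentState)
    graph.add_node("router", lambda state: {})  # reasoning step: decide what to do next
    graph.add_node("query_knowledge_base", query_knowledge_base)
    graph.add_node("report_writer", report_writer)
    graph.set_entry_point("router")
    graph.add_conditional_edges("router", route, {
        "query_knowledge_base": "query_knowledge_base",
        "report_writer": "report_writer",
    })
    graph.add_edge("query_knowledge_base", "report_writer")
    graph.add_edge("report_writer", END)
    app = graph.compile()

    print(app.invoke({"question": "Summarize the key risks in the Q3 earnings call",
                      "context": "", "answer": ""}))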

Solution overview

This solution is a financial management AI assistant designed to help analysts query portfolios, analyze companies, and generate reports. At its core is Amazon Nova, an LLM that acts as the intelligent orchestrator for inference. Amazon Nova processes text, images, or documents (like earnings call slides), and dynamically decides which tools to use to fulfill requests. Amazon Nova is optimized for enterprise tasks and supports function calling, so the model can plan actions and call tools in a structured way. With a large context window (up to 300,000 tokens in Amazon Nova Lite and Amazon Nova Pro), it can manage long documents or conversation history when reasoning.

The workflow consists of the following key components:

  • Knowledge base retrieval – Both the earnings call audio file and PowerPoint file are processed by Amazon Bedrock Data Automation, a managed service that extracts text, transcribes audio and video, and prepares data for analysis. If the user uploads a PowerPoint file, the system converts each slide into an image (PNG) for efficient search and analysis, a technique inspired by generative AI applications like Manus. Amazon Bedrock Data Automation is effectively a multimodal AI pipeline out of the box. In our architecture, Amazon Bedrock Data Automation acts as a bridge between raw data and the agentic workflow. Then Amazon Bedrock Knowledge Bases converts these chunks extracted from Amazon Bedrock Data Automation into vector embeddings using Amazon Titan Text Embeddings V2, and stores these vectors in an Amazon OpenSearch Serverless database.
  • Router agent – When a user asks a question—for example, “Summarize the key risks in this Q3 earnings report”—Amazon Nova first determines whether the task requires retrieving data, processing a file, or generating a response. It maintains memory of the dialogue, interprets the user’s request, and plans which actions to take to fulfill it. The “Memory & Planning” module in the solution diagram indicates that the router agent can use conversation history and chain-of-thought (CoT) prompting to determine next steps. Crucially, the router agent determines if the query can be answered with internal company data or if it requires external information and tools.
  • Multimodal RAG agent – For queries related to audio and video information, Amazon Bedrock Data Automation uses a unified API call to extract insights from such multimedia data and stores the extracted insights in Amazon Bedrock Knowledge Bases. Amazon Nova uses Amazon Bedrock Knowledge Bases to retrieve factual answers using semantic search (a minimal retrieval sketch follows this list). This makes sure responses are grounded in real data, minimizing hallucination. If Amazon Nova generates an answer, a secondary hallucination check cross-references the response against trusted sources to catch unsupported claims.
  • Hallucination check (quality gate) – To further verify reliability, the workflow can include a postprocessing step using a different foundation model (FM) outside of the Amazon Nova family, such as Anthropic’s Claude, Mistral, or Meta’s Llama, to grade the answer’s faithfulness. For example, after Amazon Nova generates a response, a hallucination detector model or function can compare the answer against the retrieved sources or known facts. If a potential hallucination is detected (the answer isn’t supported by the reference data), the agent can choose to do additional retrieval, adjust the answer, or escalate to a human.
  • Multi-tool collaboration – Multi-tool collaboration allows the AI to not only find information but also take actions before formulating a final answer. The supervisor agent might spawn or coordinate multiple tool-specific agents (for example, a web search agent to do a general web search, a stock search agent to get market data, or other specialized agents for company financial metrics or industry news). Each agent performs a focused task (one might call an API or perform a query on the internet) and returns findings to the supervisor agent. Amazon Nova Pro features strong reasoning ability that allows the supervisor agent to merge these findings. This multi-agent approach follows the principle of dividing complex tasks among specialist agents, improving efficiency and reliability for complex queries.
  • Report creation agent – Another notable aspect in the architecture is the use of Amazon Nova Canvas for output generation. Amazon Nova Canvas is a specialized image-generation model in the Amazon Nova family, but in this context, we use the concept of a “canvas” more figuratively to mean a structured template or format for generated content. For instance, we could define a template for an “investor report” that the assistant fills out: Section 1: Key Highlights (bullet points), Section 2: Financial Summary (table of figures), Section 3: Notable Quotes, and so on. The agent can guide Amazon Nova to populate such a template by providing it with a system prompt containing the desired format (this is similar to few-shot prompting, where the layout is given). The result is that the assistant not only answers ad-hoc questions, but can also produce comprehensive generated reports that look as if a human analyst prepared them, combining text, image, and references to visuals.
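
As a concrete reference for the knowledge base retrieval step mentioned above, the following is a minimal boto3 sketch using the Amazon Bedrock Agents runtime Retrieve API; the knowledge base ID and query text are placeholders:

    import boto3

    agent_runtime = boto3.client("bedrock-agent-runtime")

    response = agent_runtime.retrieve(
        knowledgeBaseId="KB12345678",  # placeholder knowledge base ID
        retrievalQuery={"text": "Summarize the key risks in this Q3 earnings report"},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
    )

    # Collect the retrieved chunks to ground the model's answer.
    chunks = [result["content"]["text"] for result in response["retrievalResults"]]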

These components are orchestrated in an agentic workflow. Instead of a fixed script, the solution uses a dynamic decision graph (implemented with the open source LangGraph library in the notebook solution) to route between steps. The result is an assistant that feels less like a chatbot and more like a collaborative analyst—one that can parse an earnings call audio recording, critique a slide deck, or draft an investor memo with minimal human intervention.

The following diagram shows the high-level architecture of the agentic AI workflow. Amazon Nova orchestrates various tools—including Amazon Bedrock Data Automation for document and image processing and a knowledge base for retrieval—to fulfill complex user requests. For brevity, we don’t list all the code here; the GitHub repo includes a full working example. Developers can run that to see the agent in action and extend it with their own data.

Example of the multi-tool collaboration workflow

To demonstrate the multi-tool collaboration agent workflow, we explore an example of how a question-answer interaction might flow through our deployed system for multi-tool collaboration:

  • User prompt – In the chat UI, the end-user asks a question, such as “What is XXX’s stock performance this year, and how does it compare to its rideshare‑industry peers?”
  • Agent initial response – The agent (Amazon Nova FM orchestrator) receives the question and responds with:
    Received your question. Routing to the reasoning engine…

  • Planning and tool selection – The agent determines that it needs the following:
    • The ticker symbol for the company (XXX)
    • Real‑time stock price and YTD changes
    • Key financial metrics (revenue, net income, price-earnings ratio)
    • Industry benchmarks (peer YTD performance, average revenue growth)
  • Planning execution using tool calls – The agent calls tools to perform the following actions:
    • Look up ticker symbol:
      Agent → WebSearchTool.lookupTicker("XXX Inc")
      WebSearchTool → Agent: returns "XXX"

    • Fetch real‑time stock performance using the retrieved ticker symbol:
      Agent → StockAnalysisTool.getPerformance(
       symbol="XXX",
       period="YTD"
       )
      StockAnalysisTool → Agent:
       {
       currentPrice: 
       ytdChange: 
       52wkRange: 
       volume: 
       }

    • Retrieve company financial metrics using the retrieved ticker symbol:
      Agent → CompanyFinancialAnalysisTool.getMetrics("XXX")
      CompanyFinancialAnalysisTool → Agent:
       {
       revenueQ4_2024: xxx B,
       netIncomeQ4_2024: xxx M,
       peRatio: xxx
       }

    • Gather industry benchmark data using the retrieved ticker symbol:
      Agent → IndustryAnalysisTool.comparePeers(
       symbol="XXX",
       sector="Rideshare"
       )
      IndustryAnalysisTool → Agent:
       {
       avgPeerYTD:
       avgRevenueGrowth: 
       }

    • Validation loop – The agent runs a validation loop:
      Agent: validate()
       ↳ Are all four data points present?
        • Ticker ✓
        • Stock performance ✓
        • Financial metrics ✓
        • Industry benchmark ✓
       ↳ All set—no retry needed.

If anything is missing or a tool encountered an error, the FM orchestrator triggers the error handler (up to three retries), then resumes the plan at the failed step.

  • Synthesis and final answer – The agent uses Amazon Nova Pro to synthesize the data points and generate final answers based on these data points.

The following figure shows a flow diagram of this multi-tool collaboration agent.
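
To illustrate how such tools can be exposed to the model, the following is a minimal sketch using the Amazon Bedrock Converse API with a tool definition; the tool name, input schema, and model ID are illustrative rather than the exact ones used in the sample notebook:

    import boto3

    bedrock = boto3.client("bedrock-runtime")

    # Illustrative tool definition for the stock-performance lookup in the flow above.
    tool_config = {
        "tools": [{
            "toolSpec": {
                "name": "getStockPerformance",
                "description": "Get year-to-date stock performance for a ticker symbol",
                "inputSchema": {"json": {
                    "type": "object",
                    "properties": {
                        "symbol": {"type": "string"},
                        "period": {"type": "string"},
                    },
                    "required": ["symbol"],
                }},
            }
        }]
    }

    response = bedrock.converse(
        modelId="amazon.nova-pro-v1:0",  # illustrative model ID
        messages=[{"role": "user",
                   "content": [{"text": "How has XXX performed this year versus its peers?"}]}],
        toolConfig=tool_config,
    )

    # If the model decides a tool call is needed, it returns a toolUse block.
    if response["stopReason"] == "tool_use":
        for block in response["output"]["message"]["content"]:
            if "toolUse" in block:
                print(block["toolUse"]["name"], block["toolUse"]["input"])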

Benefits of using Amazon Bedrock for scalable generative AI agent workflows

This solution is built on Amazon Bedrock because AWS provides an integrated ecosystem for building such sophisticated solutions at scale:

  • Amazon Bedrock delivers top-tier FMs like Amazon Nova, with managed infrastructure—no need for provisioning GPU servers or handling scaling complexities.
  • Amazon Bedrock Data Automation offers an out-of-the-box solution to process documents, images, audio, and video into actionable data. Amazon Bedrock Data Automation can convert presentation slides to images, convert audio to text, perform OCR, and generate textual summaries or captions that are then indexed in an Amazon Bedrock knowledge base.
  • Amazon Bedrock Knowledge Bases can store embeddings from unstructured data and support retrieval operations using similarity search.
  • In addition to LangGraph (as shown in this solution), you can also use Amazon Bedrock Agents to develop agentic workflows. Amazon Bedrock Agents simplifies the configuration of tool flows and action groups, so you can declaratively manage your agentic workflows.
  • Applications developed with open source frameworks like LangGraph (an extension of LangChain) can also run and scale on AWS infrastructure such as Amazon Elastic Compute Cloud (Amazon EC2) or Amazon SageMaker instances, so you can define directed graphs for agent orchestration, making it effortless to manage multi-step reasoning and tool chaining.

You don’t need to assemble a dozen disparate systems; AWS provides an integrated network for generative AI workflows.

Considerations and customizations

The architecture demonstrates exceptional flexibility through its modular design principles. At its core, the system uses Amazon Nova FMs, which can be selected based on task complexity. Amazon Nova Micro handles straightforward tasks like classification with minimal latency. Amazon Nova Lite manages moderately complex operations with balanced performance, and Amazon Nova Pro excels at sophisticated tasks requiring advanced reasoning or generating comprehensive responses.

The modular nature of the solution (Amazon Nova, tools, knowledge base, and Amazon Bedrock Data Automation) means each piece can be swapped or adjusted without overhauling the whole system. Solution architects can use this reference architecture as a foundation, implementing customizations as needed. You can seamlessly integrate new capabilities through AWS Lambda functions for specialized operations, and the LangGraph orchestration enables dynamic model selection and sophisticated routing logic. This architectural approach makes sure the system can evolve organically while maintaining operational efficiency and cost-effectiveness.

Bringing it to production requires thoughtful design, but AWS offers scalability, security, and reliability. For instance, you can secure the knowledge base content with encryption and access control, integrate the agent with AWS Identity and Access Management (IAM) to make sure it only performs allowed actions (for example, if an agent can access sensitive financial data, verify that it checks user permissions), and monitor the costs (you can track Amazon Bedrock pricing and tool usage; you might use Provisioned Throughput for consistent high-volume usage). Additionally, with AWS, you can scale from an experiment in a notebook to a full production deployment when you’re ready, using the same building blocks (integrated with proper AWS infrastructure like Amazon API Gateway or Lambda, if deploying as a service).

Vertical industries that can benefit from this solution

The architecture we described is quite general. Let’s briefly look at how this multimodal agentic workflow can drive value in different industries:

  • Financial services – In the financial sector, the solution integrates multimedia RAG to unify earnings call transcripts, presentation slides (converted to searchable images), and real-time market feeds into a single analytical framework. Multi-agent collaboration enables Amazon Nova to orchestrate tools like Amazon Bedrock Data Automation for slide text extraction, semantic search for regulatory filings, and live data APIs for trend detection. This allows the system to generate actionable insights—such as identifying portfolio risks or recommending sector rebalancing—while automating content creation for investor reports or trade approvals (with human oversight). By mimicking an analyst’s ability to cross-reference data types, the AI assistant transforms fragmented inputs into cohesive strategies.
  • Healthcare – Healthcare workflows use multimedia RAG to process clinical notes, lab PDFs, and X-rays, grounding responses in peer-reviewed literature and patient audio interviews. Multi-agent collaboration excels in scenarios like triage: Amazon Nova interprets symptom descriptions, Amazon Bedrock Data Automation extracts text from scanned documents, and integrated APIs check for drug interactions, all while validating outputs against trusted sources. Content creation ranges from succinct patient summaries (“Severe pneumonia, treated with levofloxacin”) to evidence-based answers for complex queries, such as summarizing diabetes guidelines. The architecture’s strict hallucination checks and source citations support reliability, which is critical for maintaining trust in medical decision-making.
  • Manufacturing – Industrial teams use multimedia RAG to index equipment manuals, sensor logs, worker audio conversation, and schematic diagrams, enabling rapid troubleshooting. Multi-agent collaboration allows Amazon Nova to correlate sensor anomalies with manual excerpts, and Amazon Bedrock Data Automation highlights faulty parts in technical drawings. The system generates repair guides (for example, “Replace valve Part 4 in schematic”) or contextualizes historical maintenance data, bridging the gap between veteran expertise and new technicians. By unifying text, images, and time series data into actionable content, the assistant reduces downtime and preserves institutional knowledge—proving that even in hardware-centric fields, AI-driven insights can drive efficiency.

These examples highlight a common pattern: the synergy of data automation, powerful multimodal models, and agentic orchestration leads to solutions that closely mimic a human expert’s assistance. The financial AI assistant cross-checks figures and explanations like an analyst would, the clinical AI assistant correlates images and notes like a diligent doctor, and the industrial AI assistant recalls diagrams and logs like a veteran engineer. All of this is made possible by the underlying architecture we’ve built.

Conclusion

The era of siloed AI models that only handle one type of input is drawing to a close. As we’ve discussed, combining multimodal AI with an agentic workflow unlocks a new level of capability for enterprise applications. In this post, we demonstrated how to construct such a workflow using AWS services: we used Amazon Nova as the core AI orchestrator with its multimodal, agent-friendly capabilities, Amazon Bedrock Data Automation to automate the ingestion and indexing of complex data (documents, slides, audio) into Amazon Bedrock Knowledge Bases, and an agentic workflow graph for reasoning and conditional routing (using LangChain or LangGraph) to orchestrate multi-step reasoning and tool usage. The end result is an AI assistant that operates much like a diligent analyst: researching, cross-checking multiple sources, and delivering insights—but at machine speed and scale.

The solution demonstrates that building a sophisticated agentic AI system is no longer an academic dream—it’s practical and achievable with today’s AWS technologies. By using Amazon Nova as a powerful multimodal LLM and Amazon Bedrock Data Automation for multimodal data processing, along with frameworks for tool orchestration like LangGraph (or Amazon Bedrock Agents), developers get a head start. Many challenges (like OCR, document parsing, or conversational orchestration) are handled by these managed services or libraries, so you can focus on the business logic and domain-specific needs.

The solution presented in the BDA_nova_agentic sample notebook is a great starting point to experiment with these ideas. We encourage you to try it out, extend it, and tailor it to your organization’s needs. We’re excited to see what you will build—the techniques discussed here represent only a small portion of what’s possible when you combine modalities and intelligent agents.


About the authors

Julia Hu is a Sr. AI/ML Solutions Architect at Amazon Web Services, currently focused on the Amazon Bedrock team. Her core expertise lies in agentic AI, where she explores the capabilities of foundation models and AI agents to drive productivity in Generative AI applications. With a background in Generative AI, Applied Data Science, and IoT architecture, she partners with customers—from startups to large enterprises—to design and deploy impactful AI solutions.

Rui Cardoso is a Partner Solutions Architect at Amazon Web Services (AWS), focusing on AI/ML and IoT. He works with AWS Partners and supports them in developing solutions on AWS. When not working, he enjoys cycling, hiking, and learning new things.

Jessie-Lee Fry is a Product and Go-to-Market (GTM) Strategy executive specializing in Generative AI and Machine Learning, with over 15 years of global leadership experience in Strategy, Product, Customer Success, Business Development, Business Transformation, and Strategic Partnerships. Jessie has defined and delivered a broad range of products and cross-industry go-to-market strategies driving business growth, while navigating market complexities and C-Suite customer groups. In her current role, Jessie and her team focus on helping AWS customers adopt Amazon Bedrock at scale with enterprise use cases and adoption frameworks, meeting customers where they are in their Generative AI journey.

Read More