Long-running execution flows now supported in Amazon Bedrock Flows in public preview

Today, we announce the public preview of long-running execution (asynchronous) flow support within Amazon Bedrock Flows. With Amazon Bedrock Flows, you can link foundation models (FMs), Amazon Bedrock Prompt Management, Amazon Bedrock Agents, Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, and other AWS services together to build and scale predefined generative AI workflows.

As customers across industries build increasingly sophisticated applications, they’ve shared feedback about needing to process larger datasets and run complex workflows that take longer than a few minutes to complete. Many customers told us they want to transform entire books, process massive documents, and orchestrate multi-step AI workflows without worrying about runtime limits, highlighting the need for a solution that can handle long-running background tasks. To address those concerns, Amazon Bedrock Flows introduces a new feature in public preview that extends workflow execution time from 5 minutes (synchronous) to 24 hours (asynchronous).

With long-running execution flows, you can chain together multiple prompts, AI services, and Amazon Bedrock components into complex workflows that run asynchronously for up to 24 hours. The new capability includes built-in execution tracing through the AWS Management Console and the Amazon Bedrock API for observability. These enhancements significantly streamline workflow development and management in Amazon Bedrock Flows, helping you focus on building and deploying your generative AI applications.

By decoupling workflow execution from the user’s immediate interaction, you can now build applications that handle large payloads that take longer than 5 minutes to process, perform resource-intensive tasks, apply multiple rules for decision-making, and run flows in the background while integrating with multiple systems, all while providing your users a seamless and responsive experience.

Solution overview

Organizations using Amazon Bedrock Flows can now use long-running execution capabilities to design and deploy workflows that build more scalable and efficient generative AI applications. This feature offers the following benefits:

  • Long-running workflows – You can run long-running workflows (up to 24 hours) as background tasks and decouple workflow execution from the user’s immediate interaction.
  • Large payload – The feature enables large payload processing and resource-intensive tasks that can run for up to 24 hours instead of the 5-minute synchronous limit.
  • Complex use cases – It can manage the execution of complex, multi-step decision-making generative AI workflows that integrate multiple external systems.
  • Builder-friendly – You can create and manage long-running execution flows through both the Amazon Bedrock API and Amazon Bedrock console.
  • Observability – You can enjoy a seamless user experience with the ability to check flow execution status and retrieve results accordingly. The feature also provides traces so you can view the inputs and outputs from each node.

Dentsu, a leading advertising agency and creative powerhouse, needs to handle complex, multi-step generative AI use cases that require longer execution time. One use case is their Easy Reading application, which converts books with many chapters and illustrations into easily readable formats to enable people with intellectual disabilities to access literature. With Amazon Bedrock long-running execution flows, now Dentsu can:

  • Process large inputs and perform complex resource-intensive tasks within the workflow. Prior to long-running execution flows, input size was limited due to the 5-minute execution limit of flows.
  • Integrate multiple external systems and services into the generative AI workflow.
  • Support both quick, near real-time workflows and longer-running, more complex workflows.

“Amazon Bedrock has been amazing to work with and demonstrate value to our clients,” says Victoria Aiello, Innovation Director, Dentsu Creative Brazil. “Using traces and flows, we are able to show how processing happens behind the scenes of the work AI is performing, giving us better visibility and accuracy on what’s to be produced. For the Easy Reading use case, long-running execution flows will allow for processing of the entire book in one go, taking advantage of the 24-hour flow execution time instead of writing custom code to manage multiple sections of the book separately. This saves us time when producing new books or even integrating with different models; we can test different results according to the needs or content of each book.”

Let’s explore how the new long-running execution flow capability in Amazon Bedrock Flows enables Dentsu to build a more efficient and long-running book processing generative AI application. The following diagram illustrates the end-to-end flow of Dentsu’s book processing application. The process begins when a client uploads a book to Amazon Simple Storage Service (Amazon S3), triggering a flow that processes multiple chapters, where each chapter undergoes accessibility transformations and formatting according to specific user requirements. The transformed chapters are then collected, combined with a table of contents, and stored back in Amazon S3 as a final accessible document. This long-running execution (asynchronous) flow can handle large books efficiently, processing them within the 24-hour execution window while providing status updates and traceability throughout the transformation process.

Detailed AWS workflow diagram showing three-step book conversion process: upload, asynchronous processing, and status monitoring

In the following sections, we demonstrate how to create a long-running execution flow in Amazon Bedrock Flows using Dentsu’s real-world use case of books transformation.

Prerequisites

Before implementing the new capabilities, make sure you have the following:

After these components are in place, you can implement Amazon Bedrock long-running execution flow capabilities in your generative AI use case.

Create a long-running execution flow

Complete the following steps to create your long-running execution flow:

  1. On the Amazon Bedrock console, in the navigation pane under Builder tools, choose Flows.
  2. Choose Create a flow.
  3. Provide a name for your new flow, for example, easy-read-long-running-flow.

For detailed instructions on creating a flow, see Amazon Bedrock Flows is now generally available with enhanced safety and traceability. Amazon Bedrock provides different node types to build your prompt flow.

The following screenshot shows the high-level flow of Dentsu’s book conversion generative AI-powered application. The workflow demonstrates a sequential process from input handling through content transformation to final storage and delivery.

AWS Bedrock Flow Builder interface displaying easy-read-long-running-flow with connected components for document processing and storage

The following table outlines the core components and nodes within the preceding workflow, designed for document processing and accessibility transformation.

| Node | Purpose |
| --- | --- |
| Flow Input | Entry point accepting an array of S3 prefixes (chapters) and an accessibility profile |
| Iterator | Processes each chapter (prefix) individually |
| S3 Retrieval | Downloads chapter content from the specified Amazon S3 location |
| Easifier | Applies accessibility transformation rules to chapter content |
| HTML Formatter | Formats transformed content with appropriate HTML structure |
| Collector | Assembles transformed chapters while maintaining order |
| Lambda Function | Combines chapters into a single document with a table of contents |
| S3 Storage | Stores the final transformed document in Amazon S3 |
| Flow Output | Returns the Amazon S3 location of the transformed book with metadata |

Test the book processing flow

We are now ready to test the flow through the Amazon Bedrock console or API. We use a fictional book called “Beyond Earth: Humanity’s Journey to the Stars.” This book tells the story of humanity’s greatest adventure beyond our home planet, tracing our journey from the first satellites and moonwalks to space stations and robotic explorers that continue to unveil the mysteries of our solar system.

  1. On the Amazon Bedrock console, choose Flows in the navigation pane.
  2. Choose the flow (easy-read-long-running-flow) and choose Create execution.

The flow must be in the Prepared state before creating an execution.

The Execution tab shows the previous executions for the selected flow.

AWS Bedrock Flow details page showing flow configuration and execution status

  3. Provide the following input (this example uses the dyslexia accessibility profile):

{
  "chapterPrefixes": [
    "books/beyond-earth/chapter_1.txt",
    "books/beyond-earth/chapter_2.txt",
    "books/beyond-earth/chapter_3.txt"
  ],
  "metadata": {
    "accessibilityProfile": "dyslexia",
    "bookId": "beyond-earth-002",
    "bookTitle": "Beyond Earth: Humanity's Journey to the Stars"
  }
}

These are the different chapters of our book that need to be transformed.

  4. Choose Create.

AWS Bedrock Flow execution setup modal with name, alias selection, and JSON configuration for book processing

Amazon Bedrock Flows initiates the long-running (asynchronous) execution of our workflow. The dashboard displays the executions of our flow with their respective statuses (Running, Succeeded, Failed, TimedOut, Aborted). When an execution is marked as Succeeded, the results become available in our designated S3 bucket.

AWS Bedrock Flow dashboard displaying flow details and active execution status for easy-read implementation

Choosing an execution takes you to the summary page containing its details. The Overview section displays start and end times, plus the execution Amazon Resource Name (ARN)—a unique identifier that’s essential for troubleshooting specific executions later.

AWS execution interface with status, summary details, and workflow diagram of connected services

When you select a node in the flow builder, its configuration details appear. For instance, choosing the Easifier node reveals the prompt used, the selected model (here it’s Amazon Nova Lite), and additional configuration parameters. This is essential information for understanding how that specific component is set up.

AWS Bedrock interface with LLM settings, prompt configuration, and service workflow visualization

The system also provides access to execution traces, offering detailed insights into each processing step, tracking real-time performance metrics, and highlighting issues that occurred during the flow’s execution. Traces can be enabled through the API and sent to Amazon CloudWatch Logs. In the API, set the enableTrace field to true in an InvokeFlow request. Each flowOutputEvent in the response is then returned alongside a flowTraceEvent.
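
As a minimal sketch, tracing can be enabled on a streaming InvokeFlow call along the following lines; the flow ID, alias ID, and input node name are placeholders to replace with your own values.

import boto3

# Invoke a flow synchronously with tracing enabled, then print trace and
# output events as they arrive on the response stream.
client = boto3.client("bedrock-agent-runtime")

response = client.invoke_flow(
    flowIdentifier="FLOW_ID",        # placeholder: your flow ID
    flowAliasIdentifier="ALIAS_ID",  # placeholder: your flow alias ID
    enableTrace=True,                # return flowTraceEvent alongside outputs
    inputs=[{
        "nodeName": "FlowInputNode",
        "nodeOutputName": "document",
        "content": {"document": {"chapterPrefixes": ["books/beyond-earth/chapter_1.txt"]}},
    }],
)

for event in response["responseStream"]:
    if "flowTraceEvent" in event:
        print("trace:", event["flowTraceEvent"])
    elif "flowOutputEvent" in event:
        print("output:", event["flowOutputEvent"])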

AWS flow execution trace showing processing steps for book chapter conversion

We have now successfully created and executed a long-running execution flow. You can also use Amazon Bedrock APIs to programmatically start, stop, list, and get flow executions. For more details on how to configure flows with enhanced safety and traceability, refer to Amazon Bedrock Flows is now generally available with enhanced safety and traceability.
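
The following is a minimal sketch of that lifecycle against the preview API, starting an execution and then polling its status; the flow and alias IDs are placeholders, and parameter names should be verified against the current SDK documentation.

import time

import boto3

client = boto3.client("bedrock-agent-runtime")

# Start a long-running (asynchronous) execution of the prepared flow.
start = client.start_flow_execution(
    flowIdentifier="FLOW_ID",        # placeholder: your flow ID
    flowAliasIdentifier="ALIAS_ID",  # placeholder: your flow alias ID
    inputs=[{
        "nodeName": "FlowInputNode",
        "nodeOutputName": "document",
        "content": {"document": {"chapterPrefixes": ["books/beyond-earth/chapter_1.txt"]}},
    }],
)
execution_arn = start["executionArn"]

# Poll until the execution leaves the Running state (it can run up to 24 hours).
status = "Running"
while status == "Running":
    time.sleep(60)
    status = client.get_flow_execution(
        executionIdentifier=execution_arn,
        flowIdentifier="FLOW_ID",
        flowAliasIdentifier="ALIAS_ID",
    )["status"]
print("Final status:", status)  # Succeeded, Failed, TimedOut, or Aborted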

Conclusion

The integration of long-running execution flows in Amazon Bedrock Flows represents a significant advancement in generative AI development. With these capabilities, you can create more efficient AI-powered solutions to automate long-running operations, addressing critical challenges in the rapidly evolving field of AI application development.

Long-running execution flow support in Amazon Bedrock Flows is now available in public preview in AWS Regions where Amazon Bedrock Flows is available, except for the AWS GovCloud (US) Regions. To get started, open the Amazon Bedrock console or APIs to begin building flows with long-running execution flow with Amazon Bedrock Flows. To learn more, see Create your first flow in Amazon Bedrock and Track each step in your flow by viewing its trace in Amazon Bedrock.

We’re excited to see the innovative applications you will build with these new capabilities. As always, we welcome your feedback through AWS re:Post for Amazon Bedrock or your usual AWS contacts. Join the generative AI builder community at community.aws to share your experiences and learn from others.


About the authors

Shubhankar Sumar is a Senior Solutions Architect at AWS, where he specializes in architecting generative AI-powered solutions for enterprise software and SaaS companies across the UK. With a strong background in software engineering, Shubhankar excels at designing secure, scalable, and cost-effective multi-tenant systems on the cloud. His expertise lies in seamlessly integrating cutting-edge generative AI capabilities into existing SaaS applications, helping customers stay at the forefront of technological innovation.

Amit Lulla is a Principal Solutions Architect at AWS, where he architects enterprise-scale generative AI and machine learning solutions for software companies. With over 15 years in software development and architecture, he’s passionate about turning complex AI challenges into bespoke solutions that deliver real business value. When he’s not architecting cutting-edge systems or mentoring fellow architects, you’ll find Amit on the squash court, practicing yoga, or planning his next travel adventure. He also maintains a daily meditation practice, which he credits for keeping him centered in the fast-paced world of AI innovation.

Huong Nguyen is a Principal Product Manager at AWS. She is leading Amazon Bedrock Flows, with 18 years of experience building customer-centric and data-driven products. She is passionate about democratizing responsible machine learning and generative AI to enable customer experience and business innovation. Outside of work, she enjoys spending time with family and friends, listening to audiobooks, traveling, and gardening.

Christian Kamwangala is an AI/ML and Generative AI Specialist Solutions Architect at AWS, based in Paris, France. He partners with enterprise customers to architect, optimize, and deploy production-grade AI solutions leveraging AWS’s comprehensive machine learning stack. Christian specializes in inference optimization techniques that balance performance, cost, and latency requirements for large-scale deployments. In his spare time, Christian enjoys exploring nature and spending time with family and friends.

Jeremy Bartosiewicz is a Senior Solutions Architect at AWS, with over 15 years of experience working in technology in multiple roles. Coming from a consulting background, Jeremy enjoys working on a multitude of projects that help organizations grow using cloud solutions. He helps support large enterprise customers at AWS and is part of the Advertising and Machine Learning TFCs.

Read More

Fraud detection empowered by federated learning with the Flower framework on Amazon SageMaker AI

Fraud detection remains a significant challenge in the financial industry, requiring advanced machine learning (ML) techniques to detect fraudulent patterns while maintaining compliance with strict privacy regulations. Traditional ML models often rely on centralized data aggregation, which raises concerns about data security and regulatory constraints.

Fraud cost businesses over $485.6 billion in 2023 alone, according to Nasdaq’s Global Financial Crime Report, with financial institutions under pressure to keep up with evolving threats. Traditional fraud models often rely on isolated data, leading to overfitting and poor real-world performance. Data privacy laws like GDPR and CCPA further limit collaboration. With federated learning using Amazon SageMaker AI, organizations can jointly train models without sharing raw data, boosting accuracy while maintaining compliance.

In this post, we explore how SageMaker and federated learning help financial institutions build scalable, privacy-first fraud detection systems.

Federated learning with the Flower framework on SageMaker AI

With federated learning, multiple institutions can train a shared model while keeping their data decentralized, addressing privacy and security concerns in fraud detection. A key advantage of this approach is that it mitigates the risk of overfitting by learning from a wider distribution of fraud patterns across various datasets. It allows financial institutions to collaborate while maintaining strict data privacy, making sure that no single entity has access to another’s raw data. This not only improves fraud detection accuracy but also adheres to industry regulations and compliance requirements.

Popular frameworks for federated learning include Flower, PySyft, TensorFlow Federated (TFF), and FedML. Among these, the Flower framework stands out for being framework-agnostic, a key advantage that allows it to seamlessly integrate with a wide range of tools such as PyTorch, TensorFlow, Hugging Face, scikit-learn, and more.

Although SageMaker is powerful for centralized ML workflows, Flower is purpose-built for decentralized model training, enabling secure collaboration across data silos without exposing raw data. When deployed on SageMaker, Flower takes advantage of the cloud system’s scalability and automation while enabling flexible, privacy-preserving federated learning workflows. This combination improves time to production, reduces engineering complexity, and supports strict data governance, making it highly suitable for cross-institutional or regulated environments.
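
To make the division of labor concrete, the following is a minimal sketch of a Flower client using the classic NumPyClient interface; the model is a stand-in, and in the SageMaker setup each institution would run a client like this inside its own training job while a central server aggregates updates.

import flwr as fl
import numpy as np

class FraudClient(fl.client.NumPyClient):
    """Runs inside one institution; only model parameters leave the silo."""

    def __init__(self):
        # Stand-in for real model parameters (e.g., logistic regression weights).
        self.weights = np.zeros(10)

    def get_parameters(self, config):
        return [self.weights]

    def fit(self, parameters, config):
        self.weights = parameters[0]
        # Train locally on the institution's private fraud data here.
        return [self.weights], 100, {}  # updated params, example count, metrics

    def evaluate(self, parameters, config):
        # Score the aggregated model on the local validation split here.
        return 0.5, 100, {"auc": 0.90}  # loss, example count, metrics

# Server side (for example, one SageMaker job): FedAvg aggregation by default.
# fl.server.start_server(config=fl.server.ServerConfig(num_rounds=3))

# Client side (one process per institution):
# fl.client.start_numpy_client(server_address="SERVER_HOST:8080", client=FraudClient())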

Generating synthetic data with Synthetic Data Vault

To strengthen fraud detection while preserving data privacy, organizations can use the Synthetic Data Vault (SDV), a Python library that generates realistic synthetic datasets reflecting real-world patterns. Teams can use SDV to simulate diverse fraud scenarios without exposing sensitive information, helping federated learning models generalize better and detect subtle, evolving fraud tactics. It also helps address data imbalance by amplifying underrepresented fraud cases, improving model accuracy and robustness.

Beyond data generation, SDV captures complex statistical relationships and accelerates model development by reducing dependence on expensive, hard-to-obtain labeled data. In our approach, synthetic data is used primarily as a validation dataset, supporting privacy and consistency across environments, and training datasets can be real or synthetic depending on audit and compliance requirements. This flexibility supports privacy-by-design principles while maintaining adaptability in regulated environments.
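
As an illustration, the following sketch uses the SDV single-table API (1.x) to fit a synthesizer on a toy transactions table and sample synthetic rows; the column names are invented for the example.

import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

# Toy stand-in for real transaction data.
real = pd.DataFrame({
    "amount": [120.0, 3400.0, 15.5, 980.0, 62.0, 5100.0],
    "merchant_category": ["retail", "travel", "food", "travel", "food", "travel"],
    "is_fraud": [0, 1, 0, 1, 0, 1],
})

# Learn the table's statistical structure, then sample synthetic rows.
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(real)

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.fit(real)

# Sampling more rows than the source lets you amplify rare fraud cases.
synthetic = synthesizer.sample(num_rows=1000)
print(synthetic["is_fraud"].mean())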

A fair evaluation approach for federated learning models

A critical aspect of federated learning is facilitating a fair and unbiased evaluation of trained models. To achieve this, organizations must adopt a structured dataset strategy. As illustrated in the following figure, Dataset A and Dataset B are used as separate training datasets, with each participating institution contributing distinct datasets that capture different fraud patterns. Instead of evaluating the model using only one dataset, a combined dataset of A and B is used for evaluation. This makes sure that the model is tested on a more comprehensive distribution of real-world fraud cases, helping reduce bias and improve fairness in assessment.

By adopting this evaluation method, organizations can validate the model’s ability to generalize across different data distributions. This approach makes sure fraud detection models aren’t overly reliant on a single institution’s data, improving robustness against evolving fraud tactics. Standard evaluation metrics such as precision, recall, F1-score, and AUC-ROC are used to measure model performance. In the insurance sector, particular attention is given to false negatives—cases where fraudulent claims are missed—because these directly translate to financial losses. Minimizing false negatives is critical to protect against undetected fraud, while also making sure the model performs consistently and fairly across diverse datasets in a federated learning environment.
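
The combined evaluation can be expressed in a few lines of scikit-learn; the arrays below are toy stand-ins for held-out labels and model scores from Datasets A and B.

import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Stand-ins for held-out labels and fraud probabilities from each institution.
y_true_a, y_score_a = np.array([0, 1, 0, 1]), np.array([0.2, 0.8, 0.4, 0.9])
y_true_b, y_score_b = np.array([0, 0, 1, 1]), np.array([0.1, 0.3, 0.7, 0.6])

# Evaluate on the combined distribution of Datasets A and B.
y_true = np.concatenate([y_true_a, y_true_b])
y_score = np.concatenate([y_score_a, y_score_b])
y_pred = (y_score >= 0.5).astype(int)

print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))
print("auc-roc:", roc_auc_score(y_true, y_score))

# False negatives (missed fraud) are the costliest errors in insurance.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("false negatives:", fn)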

Solution overview

The following diagram illustrates how we implemented this approach across two AWS accounts using SageMaker AI and cross-account virtual private cloud (VPC) peering.

Flower supports a wide range of ML frameworks, including PyTorch, TensorFlow, Hugging Face, JAX, Pandas, fast.ai, PyTorch Lightning, MXNet, scikit-learn, and XGBoost. When deploying federated learning on SageMaker, Flower enables a distributed setup where multiple institutions can collaboratively train models while keeping data private. Each participant trains a local model on its own dataset and shares only model updates—not raw data—with a central server. SageMaker orchestrates the complete training, validation, and evaluation process securely and efficiently. The final model remains consistent with the original framework, making it deployable to a SageMaker endpoint using its supported framework container.

To facilitate a smooth and scalable implementation, SageMaker AI provides built-in features for model orchestration, hyperparameter tuning, and automated monitoring. Institutions can continuously improve their models based on the latest fraud patterns without requiring manual updates. Additionally, integrating SageMaker AI with AWS services such as AWS Identity and Access Management (IAM) enhances security and compliance.

For more information, refer to the Flower Federated Learning Workshop, which provides detailed guidance on setting up and running federated learning workloads effectively. By integrating federated learning, synthetic data generation, and structured evaluation strategies, you can develop robust fraud detection systems that are both scalable and privacy-preserving.

Results and key takeaways

The implementation of federated learning for fraud detection has demonstrated significant improvements in model performance and fraud detection accuracy. By training on diverse datasets, the model captures a broader range of fraud patterns, helping reduce bias and overfitting. The incorporation of SDV-generated datasets facilitates a well-rounded training process, improving generalization to real-world fraud scenarios. The federated learning framework on SageMaker enables organizations to scale their fraud detection models while maintaining compliance with data privacy regulations.

Through this approach, organizations have observed a reduction in false positives, helping fraud analysts focus on high-risk transactions more effectively. The ability to train models on a wider range of fraud patterns across multiple institutions has led to a more comprehensive and accurate fraud detection system. Future optimizations might include refining synthetic data techniques and expanding federated learning participation to further enhance fraud detection capabilities.

Conclusion

The Flower framework provides a scalable, privacy-preserving approach to fraud detection by using federated learning on SageMaker AI. By combining decentralized training, synthetic data generation, and fair evaluation strategies, financial institutions can enhance model accuracy while maintaining compliance with regulations. Shin Kong Financial Holding and Shin Kong Life successfully adopted this approach, as highlighted in their official blog post. This methodology sets a new standard for financial security applications, paving the way for broader adoption of federated learning.

Although using Flower on SageMaker for federated learning offers strong privacy and scalability benefits, there are some limitations to consider. Technically, managing heterogeneity across clients (such as different data schemas, compute capacities, or model architectures) can be complex. From a use case perspective, federated learning might not be ideal for scenarios requiring real-time inference or highly synchronous updates, and it depends on stable connectivity across participating nodes. To address these challenges, organizations are exploring the use of high-quality synthetic datasets that preserve data distributions while protecting privacy, improving model generalization and robustness. Next steps include experimenting with these datasets, using the Flower Federated Learning Workshop for hands-on guidance, reviewing the system architecture for deeper understanding, and engaging with the AWS account team to tailor and scale your federated learning solution.


About the Authors

Ray Wang is a Senior Solutions Architect at AWS. With 12 years of experience in backend development and consulting, Ray is dedicated to building modern solutions in the cloud, especially in NoSQL, big data, machine learning, and generative AI. As a hungry go-getter, he passed all 12 AWS certifications to increase the breadth and depth of his technical knowledge. He loves to read and watch sci-fi movies in his spare time.

Kanwaljit Khurmi is a Principal Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them improve the value of their solutions when using AWS. Kanwaljit specializes in helping customers with containerized and machine learning applications.

James Chan is a Solutions Architect at AWS specializing in the Financial Services Industry (FSI). With extensive experience in the financial services, fintech, and manufacturing sectors, James helps FSI customers at AWS innovate and build scalable cloud solutions and financial system architectures. James specializes in AWS containers, network architecture, and generative AI solutions that combine cloud-native technologies with strict financial compliance requirements.

Mike Xu is an Associate Solutions Architect specializing in AI/ML at Amazon Web Services. He works with customers to design machine learning solutions using services like Amazon SageMaker and Amazon Bedrock. With a background in computer engineering and a passion for generative AI, Mike focuses on helping organizations accelerate their AI/ML journey in the cloud. Outside of work, he enjoys producing electronic music and exploring emerging tech.

Read More

Building intelligent AI voice agents with Pipecat and Amazon Bedrock – Part 2

Voice AI is changing the way we use technology, allowing for more natural and intuitive conversations. Meanwhile, advanced AI agents can now understand complex questions and act autonomously on our behalf.

In Part 1 of this series, you learned how you can use the combination of Amazon Bedrock and Pipecat, an open source framework for voice and multimodal conversational AI agents, to build applications with human-like conversational AI. You learned about common use cases of voice agents and the cascaded models approach, where you orchestrate several components to build your voice AI agent.

In this post (Part 2), you explore how to use the Amazon Nova Sonic speech-to-speech foundation model and the benefits of a unified model.

Architecture: Using Amazon Nova Sonic speech-to-speech

Amazon Nova Sonic is a speech-to-speech foundation model that delivers real-time, human-like voice conversations with industry-leading price performance and low latency. While the cascaded models approach outlined in Part 1 is flexible and modular, it requires orchestration of automatic speech recognition (ASR), natural language understanding (NLU), and text-to-speech (TTS) models. For conversational use cases, this might introduce latency and result in loss of tone and prosody. Amazon Nova Sonic combines these components into a unified model that processes audio in real time with a single forward pass, reducing latency while streamlining development.

By unifying these capabilities, the model can dynamically adjust voice responses based on the acoustic characteristics and conversational context of the input, creating more fluid and contextually appropriate dialogue. The system recognizes conversational subtleties such as natural pauses, hesitations, and turn-taking cues, allowing it to respond at appropriate moments and seamlessly manage interruptions during conversation. Amazon Nova Sonic also supports tool use and agentic RAG with Amazon Bedrock Knowledge Bases, enabling your voice agents to retrieve information. Refer to the following figure to understand the end-to-end flow.

End-to-end architecture diagram of voice-enabled AI agent orchestrated by Pipecat, featuring real-time processing and AWS services

The choice between the two approaches depends on your use case. While the capabilities of Amazon Nova Sonic are state-of-the-art, the cascaded models approach outlined in Part 1 might be suitable if you require additional flexibility or modularity for advanced use cases.

AWS collaboration with Pipecat

To achieve a seamless integration, AWS collaborated with the Pipecat team to support Amazon Nova Sonic in version v0.0.67, making it straightforward to integrate state-of-the-art speech capabilities into your applications.
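
As a rough sketch, wiring Amazon Nova Sonic into a Pipecat pipeline looks like the following; the import path and constructor parameters reflect Pipecat v0.0.67 and should be verified against the documentation for your version.

import os

from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService

# One service replaces the separate ASR, LLM, and TTS stages of the
# cascaded approach from Part 1.
llm = AWSNovaSonicLLMService(
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    region=os.getenv("AWS_REGION", "us-east-1"),
    voice_id="matthew",  # one of the supported Nova Sonic voices
)

# The service then slots into the pipeline in place of the cascaded stages,
# for example: Pipeline([transport.input(), llm, transport.output()])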

Kwindla Hultman Kramer, Chief Executive Officer at Daily.co and Creator of Pipecat, shares his perspective on this collaboration:

“Amazon’s new Nova Sonic speech-to-speech model is a leap forward for real-time voice AI. The bidirectional streaming API, natural-sounding voices, and robust tool-calling capabilities open up exciting new possibilities for developers. Integrating Nova Sonic with Pipecat means we can build conversational agents that not only understand and respond in real time, but can also take meaningful actions, like scheduling appointments or fetching information, directly through natural conversation. This is the kind of technology that truly transforms how people interact with software, making voice interfaces faster, more human, and genuinely useful in everyday workflows.”

“Looking forward, we’re thrilled to collaborate with AWS on a roadmap that helps customers reimagine their contact centers with integration to Amazon Connect and harness the power of multi-agent workflows through the Strands agentic framework. Together, we’re enabling organizations to deliver more intelligent, efficient, and personalized customer experiences—whether it’s through real-time contact center transformation or orchestrating sophisticated agentic workflows across industries.”

Getting started with Amazon Nova Sonic and Pipecat

To guide your implementation, we provide a comprehensive code example that demonstrates the basic functionality. This example shows how to build a complete voice AI agent with Amazon Nova Sonic and Pipecat.

Prerequisites

Before using the provided code examples with Amazon Nova Sonic, make sure that you have the following:

Implementation steps

After you complete the prerequisites, you can start setting up your sample voice agent:

  1. Clone the repository:
git clone https://github.com/aws-samples/build-intelligent-ai-voice-agents-with-pipecat-and-amazon-bedrock
cd build-intelligent-ai-voice-agents-with-pipecat-and-amazon-bedrock/part-2

  2. Set up a virtual environment:
cd server
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

  3. Create a .env file with your credentials:
DAILY_API_KEY=your_daily_api_key
AWS_ACCESS_KEY_ID=your_aws_access_key_id
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key
AWS_REGION=your_aws_region
  4. Start the server:
python server.py
  5. Connect using a browser at http://localhost:7860 and grant microphone access.
  6. Start the conversation with your AI voice agent.

Customize your voice AI agent

To customize your voice AI agent, start by:

  1. Modifying bot.py to change conversation logic.
  2. Adjusting model selection in bot.py for your latency and quality needs.

To learn more, see the README of our code sample on GitHub.

Clean up

The preceding instructions are for setting up the application in your local environment. The local application uses AWS services and Daily through IAM and API credentials. For security and to avoid unanticipated costs, when you’re finished, delete these credentials so that they can no longer be accessed.

Amazon Nova Sonic and Pipecat in action

The first demo showcases a scenario for an intelligent healthcare assistant. It was presented at the keynote of AWS Summit Sydney 2025 by Rada Stanic, Chief Technologist, and Melanie Li, Senior Specialist Solutions Architect – Generative AI.

The second demo showcases a simple fun facts voice agent running in a local environment using SmallWebRTCTransport. As the user speaks, the voice agent provides real-time transcription, as displayed in the terminal.

Enhancing agentic capabilities with Strands Agents

A practical way to boost agentic capability and understanding is to implement a general tool call that delegates tool selection to an external agent such as a Strands Agent. The delegated Strands Agent can then reason or think about your complex query, perform multi-step tasks with tool calls, and return a summarized response.

To illustrate, let’s review a simple example. If the user asks a question like: “What is the weather like near the Seattle Aquarium?”, the voice agent can delegate to a Strands agent through a general tool call such as handle_query.

The Strands agent will handle the query and think about the task, for example:

<thinking>I need to get the weather information for the Seattle Aquarium. To do this, I need the latitude and longitude of the Seattle Aquarium. I will first use the 'search_places' tool to find the coordinates of the Seattle Aquarium.</thinking> 

The Strands Agent will then execute the search_places tool call, a subsequent get_weather tool call, and return a response back to the parent agent as part of the handle_query tool call. This is also known as the agent as tools pattern.
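
A sketch of this pattern with the Strands Agents SDK might look like the following; search_places and get_weather are hypothetical tools with hardcoded results, shown only to illustrate the delegation.

from strands import Agent, tool

@tool
def search_places(query: str) -> str:
    """Look up a place and return its coordinates."""
    # A real implementation would call a geocoding service.
    return "Seattle Aquarium: 47.6076, -122.3430"

@tool
def get_weather(latitude: float, longitude: float) -> str:
    """Return the current weather for the given coordinates."""
    # A real implementation would call a weather service.
    return "12C, light rain"

# The delegated agent reasons over multi-step queries with its own tools.
delegate = Agent(tools=[search_places, get_weather])

@tool
def handle_query(query: str) -> str:
    """General tool the parent voice agent calls to delegate complex queries."""
    return str(delegate(query))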

To learn more, see the example in our hands-on workshop.

Conclusion

Building intelligent AI voice agents is more accessible than ever through the combination of open source frameworks such as Pipecat, and powerful foundation models on Amazon Bedrock.

In this series, you learned about two common approaches for building AI voice agents. In Part 1, you learned about the cascaded models approach, diving into each component of a conversational AI system. In Part 2, you learned how using Amazon Nova Sonic, a speech-to-speech foundation model, can simplify implementation and unify these components into a single model architecture. Looking ahead, stay tuned for exciting developments in multimodal foundation models, including the upcoming Nova any-to-any models—these innovations will continually improve your voice AI applications.

Resources

To learn more about voice AI agents, see the following resources:

To get started with your own voice AI project, contact your AWS account team to explore an engagement with AWS Generative AI Innovation Center (GAIIC).


About the Authors

Adithya Suresh is a Deep Learning Architect at AWS Generative AI Innovation Center based in Sydney, where he collaborates directly with enterprise customers to design and scale transformational generative AI solutions for complex business challenges. He leverages AWS generative AI services to build bespoke AI systems that drive measurable business value across diverse industries.

Daniel Wirjo is a Solutions Architect at AWS, with focus across AI and SaaS startups. As a former startup CTO, he enjoys collaborating with founders and engineering leaders to drive growth and innovation on AWS. Outside of work, Daniel enjoys taking walks with a coffee in hand, appreciating nature, and learning new ideas.

Karan Singh is a Generative AI Specialist at AWS, where he works with top-tier third-party foundation model and agentic frameworks providers to develop and execute joint go-to-market strategies, enabling customers to effectively deploy and scale solutions to solve enterprise generative AI challenges.

Melanie Li, PhD is a Senior Generative AI Specialist Solutions Architect at AWS based in Sydney, Australia, where her focus is on working with customers to build solutions leveraging state-of-the-art AI and machine learning tools. She has been actively involved in multiple Generative AI initiatives across APJ, harnessing the power of Large Language Models (LLMs). Prior to joining AWS, Dr. Li held data science roles in the financial and retail industries.

Osman Ipek is a seasoned Solutions Architect on Amazon’s Artificial General Intelligence team, specializing in Amazon Nova foundation models. With over 12 years of experience in software and machine learning, he has driven innovative Alexa product experiences reaching millions of users. His expertise spans voice AI, natural language processing, large language models and MLOps, with a passion for leveraging AI to create breakthrough products.

Xuefeng Liu leads a science team at the AWS Generative AI Innovation Center in the Asia Pacific regions. His team partners with AWS customers on generative AI projects, with the goal of accelerating customers’ adoption of generative AI.

Read More

Uphold ethical standards in fashion using multimodal toxicity detection with Amazon Bedrock Guardrails

The global fashion industry is estimated to be valued at $1.84 trillion in 2025, accounting for approximately 1.63% of the world’s GDP (Statista, 2025). With such massive amounts of generated capital, so too comes the enormous potential for toxic content and misuse.

In the fashion industry, teams innovate quickly, often using AI. Sharing content, whether through videos, designs, or otherwise, can lead to content moderation challenges. There remains a risk (through intentional or unintentional actions) of inappropriate, offensive, or toxic content being produced and shared. This can lead to violations of company policy and irreparable brand reputation damage. Implementing guardrails while using AI to innovate faster within this industry can provide long-lasting benefits.

In this post, we cover the use of the multimodal toxicity detection feature of Amazon Bedrock Guardrails to guard against toxic content. Whether you’re an enterprise giant in the fashion industry or an up-and-coming brand, you can use this solution to screen potentially harmful content before it impacts your brand’s reputation and ethical standards. For the purposes of this post, ethical standards refer to toxic, disrespectful, or harmful content and images that could be created by fashion designers.

Brand reputation represents a priceless currency that transcends trends, with companies competing not just for sales but for consumer trust and loyalty. As technology evolves, the need for effective reputation management strategies should include using AI in responsible ways. In this growing age of innovation, as the fashion industry evolves and creatives innovate faster, brands that strategically manage their reputation while adapting to changing consumer preferences and global trends will distinguish themselves from the rest of the industry. Take the first step toward responsible AI within your creative practices with Amazon Bedrock Guardrails.

Solution overview

To incorporate multimodal toxicity detection guardrails in an image generating workflow with Amazon Bedrock, you can use the following AWS services:

The following diagram illustrates the solution architecture.

Prerequisites

For this solution, you must have the following:

The following IAM policy grants specific permissions for a Lambda function to interact with Amazon CloudWatch Logs, access objects in an S3 bucket, and apply Amazon Bedrock guardrails, enabling the function to log its activities, read from Amazon S3, and use Amazon Bedrock content filtering capabilities. Before using this policy, update the placeholders with your resource-specific values:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CloudWatchLogsAccess",
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:<REGION>:<ACCOUNT-ID>:*"
        },
        {
            "Sid": "CloudWatchLogsStreamAccess",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:<REGION>:<ACCOUNT-ID>:log-group:/aws/lambda/<FUNCTION-NAME>:*"
            ]
        },
        {
            "Sid": "S3ReadAccess",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<BUCKET-NAME>/*"
        },
        {
            "Sid": "BedrockGuardrailsAccess",
            "Effect": "Allow",
            "Action": "bedrock:ApplyGuardrail",
            "Resource": "arn:aws:bedrock:<REGION>:<ACCOUNT-ID>:guardrail/<GUARDRAIL-ID>"
        }
    ]
}

The following steps walk you through how to incorporate multimodal toxicity detection guardrails in an image generation workflow with Amazon Bedrock.

Create a multimodal guardrail in Amazon Bedrock

The foundation of our moderation system is a guardrail in Amazon Bedrock configured specifically for image content. To create a multimodality toxicity detection guardrail, complete the following steps:

  1. On the Amazon Bedrock console, choose Guardrails under Safeguards in the navigation pane.
  2. Choose Create guardrail.
  3. Enter a name and optional description, and create your guardrail.

Configure content filters for multiple modalities

Next, you configure the content filters. Complete the following steps:

  1. On the Configure content filters page, choose Image under Filter for prompts. This allows the guardrail to process visual content alongside text.
  2. Configure the categories for Hate, Insults, Sexual, and Violence to filter both text and image content. The Misconduct and Prompt threat categories are available for text content filtering only.
  3. Create your filters.

By setting up these filters, you create a comprehensive safeguard that can detect potentially harmful content across multiple modalities, enhancing the safety and reliability of your AI applications.
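
If you prefer to define the guardrail in code, the following is a hedged sketch using the CreateGuardrail API; field names should be verified against the current documentation, and the guardrail name and messages are examples.

import boto3

bedrock = boto3.client("bedrock")

# Apply the four image-capable categories to both text and image content.
response = bedrock.create_guardrail(
    name="fashion-image-moderation",
    description="Multimodal toxicity detection for fashion content",
    contentPolicyConfig={
        "filtersConfig": [
            {
                "type": filter_type,
                "inputStrength": "HIGH",
                "outputStrength": "HIGH",
                "inputModalities": ["TEXT", "IMAGE"],
                "outputModalities": ["TEXT", "IMAGE"],
            }
            for filter_type in ["HATE", "INSULTS", "SEXUAL", "VIOLENCE"]
        ]
    },
    blockedInputMessaging="Sorry, the model cannot answer this question.",
    blockedOutputsMessaging="Sorry, the model cannot answer this question.",
)
print(response["guardrailId"])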

Create an S3 bucket

You need a place for users (or other processes) to upload the images that require moderation. To create an S3 bucket, complete the following steps:

  1. On the Amazon S3 console, choose Buckets in the navigation pane.
  2. Choose Create bucket.
  3. Enter a unique name and choose the AWS Region where you want to host the bucket.
  4. For this basic setup, standard settings are usually sufficient.
  5. Create your bucket.

This bucket is where our workflow begins—new images landing here will trigger the next step.

Create a Lambda function

We use a Lambda function, a serverless compute service, written in Python. This function is invoked when a new image arrives in the S3 bucket. The function will send the image to our guardrail in Amazon Bedrock for analysis. Complete the following steps to create your function:

  1. On the Lambda console, choose Functions in the navigation pane.
  2. Choose Create function.
  3. Enter a name and choose a recent Python runtime.
  4. Grant the correct permissions using the IAM execution role. The function needs permission to read the newly uploaded object from your S3 bucket (s3:GetObject) and permission to interact with Amazon Bedrock Guardrails using the bedrock:ApplyGuardrail action for your specific guardrail.
  5. Create your function.

Let’s explore the Python code that powers this function. We use the AWS SDK for Python (Boto3) to interact with Amazon S3 and Amazon Bedrock. The code first identifies the uploaded image from the S3 event trigger. It then checks if the image format is supported (JPEG or PNG) and verifies that the size doesn’t exceed the guardrail limit of 4 MB.

The key step involves preparing the image data for the ApplyGuardrail API call. We package the raw image bytes along with its format into a structure that Amazon Bedrock understands. We use the ApplyGuardrail API; this is efficient because we can check the image against our configured policies without needing to invoke a full foundation model.

Finally, the function calls ApplyGuardrail, passing the image content, the guardrail ID, and the version you noted earlier. It then interprets the response from Amazon Bedrock, logging whether the content was blocked (GUARDRAIL_INTERVENED) or passed (NONE), along with the specific harmful categories detected if it was blocked.

The following is Python code you can use as a starting point (remember to replace the placeholders):

import boto3
import json
import os
import traceback

s3_client = boto3.client('s3')
# Use 'bedrock-runtime' for ApplyGuardrail and InvokeModel
bedrock_runtime_client = boto3.client('bedrock-runtime')

GUARDRAIL_ID = '<YOUR_GUARDRAIL_ID>' 
GUARDRAIL_VERSION = '<SPECIFIC_VERSION>' #e.g, '1'

# Supported image formats by the Guardrail feature
SUPPORTED_FORMATS = {'jpg': 'jpeg', 'jpeg': 'jpeg', 'png': 'png'}

def lambda_handler(event, context):
    # Get bucket name and object key
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    print(f"Processing s3://{bucket}/{key}")

    # Extract file extension and check if supported
    try:
        file_ext = os.path.splitext(key)[1].lower().lstrip('.')
        image_format = SUPPORTED_FORMATS.get(file_ext)
        if not image_format:
            print(f"Unsupported image format: {file_ext}. Skipping.")
            return {'statusCode': 400, 'body': 'Unsupported image format'}
    except Exception as e:
         print(f"Error determining file format for {key}: {e}")
         return {'statusCode': 500, 'body': 'Error determining file format'}


    try:
        # Get image bytes from S3
        response = s3_client.get_object(Bucket=bucket, Key=key)
        image_bytes = response['Body'].read()

        # Basic size check (Guardrail limit is 4MB)
        if len(image_bytes) > 4 * 1024 * 1024:
             print(f"Image size exceeds 4MB limit for {key}. Skipping.")
             return {'statusCode': 400, 'body': 'Image size exceeds 4MB limit'}

        # Prepare content list for the ApplyGuardrail API
        content_to_assess = [
            {
                "image": {
                    "format": image_format, # 'jpeg' or 'png' 
                    "source": {
                        "bytes": image_bytes # Pass raw bytes 
                    }
                }
            }
        ]

        # Call ApplyGuardrail API 
        print(f"Calling ApplyGuardrail for {key} (Format: {image_format})")
        guardrail_response = bedrock_runtime_client.apply_guardrail(
            guardrailIdentifier=GUARDRAIL_ID,
            guardrailVersion=GUARDRAIL_VERSION,
            source='INPUT', # Assess as user input
            content=content_to_assess
        )

        # Process response
        print("Guardrail Assessment Response:", json.dumps(guardrail_response))

        action = guardrail_response.get('action')
        assessments = guardrail_response.get('assessments', [])
        outputs = guardrail_response.get('outputs', []) # Relevant if masking occurs

        print(f"Guardrail Action for {key}: {action}")

        if action == 'GUARDRAIL_INTERVENED':
            print(f"Content BLOCKED. Assessments: {json.dumps(assessments)}")
            # Add specific handling for blocked content
        elif action == 'NONE':
            print("Content PASSED.")
            # Add handling for passed content
        else:
            # Handle any other or unexpected actions
            print(f"Guardrail took action: {action}. Outputs: {json.dumps(outputs)}")


        return {
            'statusCode': 200,
            'body': json.dumps(f'Successfully processed {key}. Guardrail action: {action}')
        }

    except bedrock_runtime_client.exceptions.ValidationException as ve:
        print(f"Validation Error calling ApplyGuardrail for {key}: {ve}")
        # You might get this for exceeding size/dimension limits or other issues
        return {'statusCode': 400, 'body': f'Validation Error: {ve}'}
    except Exception as e:
        print(f"Error processing image {key}: {e}")
        # Log the full error for debugging
        traceback.print_exc()
        return {'statusCode': 500, 'body': f'Internal server error processing {key}'}

Check the function’s default execution timeout (found under Configuration, General configuration) to verify it has enough time to download the image and wait for the Amazon Bedrock API response, perhaps setting it to 30 seconds.
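
If you prefer to set the timeout in code, a short boto3 call (the function name is a placeholder) might look like this:

import boto3

# Give the function time to download the image and await the guardrail response.
boto3.client("lambda").update_function_configuration(
    FunctionName="<FUNCTION-NAME>",
    Timeout=30,  # seconds
)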

Create an Amazon S3 trigger for the Lambda function

With the S3 bucket ready and the function coded, you must now connect them. This is done by setting up an Amazon S3 trigger on the Lambda function:

  1. On the function’s configuration page, choose Add trigger.
  2. Choose S3 as the source.
  3. Point it to the S3 bucket you created earlier.
  4. Configure the trigger to activate on All object create events. This makes sure that whenever a new file is successfully uploaded to the S3 bucket, your Lambda function is automatically invoked. (See the sketch after this list for an equivalent setup in code.)
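
For reference, the equivalent trigger can be configured with boto3 as sketched below; the bucket name and function ARN are placeholders. Note that the console’s Add trigger flow also grants Amazon S3 permission to invoke the function, which you would otherwise need to add yourself (for example, with the Lambda AddPermission API).

import boto3

# Invoke the Lambda function on every object created in the bucket.
boto3.client("s3").put_bucket_notification_configuration(
    Bucket="<BUCKET-NAME>",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:<REGION>:<ACCOUNT-ID>:function:<FUNCTION-NAME>",
            "Events": ["s3:ObjectCreated:*"],
        }]
    },
)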

Test your moderation pipeline

It’s time to see your automated workflow in action! Upload a few test images (JPEG or PNG, under 4 MB) to your designated S3 bucket. Include images that are clearly safe and others that might trigger the harmful content filters you configured in your guardrail. On the CloudWatch console, find the log group associated with your Lambda function. Examining the latest log streams will show you the function’s execution details. You should see messages confirming which file was processed, the call to ApplyGuardrail, and the final guardrail action (NONE or GUARDRAIL_INTERVENED). If an image was blocked, the logs should also show the specific assessment details, indicating which harmful category was detected.

By following these steps, you have established a robust, serverless pipeline for automatically moderating image content using the power of Amazon Bedrock Guardrails. This proactive approach helps maintain safer online environments and aligns with responsible AI practices. The following is an example ApplyGuardrail response for an image blocked under the Hate category:

{
    "ResponseMetadata": {
        "RequestId": "fa025ab0-905f-457d-ae19-416537e2c69f",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "content-type": "application/json",
            "content-length": "1008",
            "connection": "keep-alive",
        },
        "RetryAttempts": 0
    },
    "usage": {
        "topicPolicyUnits": 0,
        "contentPolicyUnits": 0,
        "wordPolicyUnits": 0,
        "sensitiveInformationPolicyUnits": 0,
        "sensitiveInformationPolicyFreeUnits": 0,
        "contextualGroundingPolicyUnits": 0
    },
    "action": "GUARDRAIL_INTERVENED",
    "outputs": [
        {
            "text": "Sorry, the model cannot answer this question."
        }
    ],
    "assessments": [
        {
            "contentPolicy": {
                "filters": [
                    {
                        "type": "HATE",
                        "confidence": "MEDIUM",
                        "filterStrength": "HIGH",
                        "action": "BLOCKED"
                    }
                ]
            },
            "invocationMetrics": {
                "guardrailProcessingLatency": 918,
                "usage": {
                    "topicPolicyUnits": 0,
                    "contentPolicyUnits": 0,
                    "wordPolicyUnits": 0,
                    "sensitiveInformationPolicyUnits": 0,
                    "sensitiveInformationPolicyFreeUnits": 0,
                    "contextualGroundingPolicyUnits": 0
                },
                "guardrailCoverage": {
                    "images": {
                        "guarded": 1,
                        "total": 1
                    }
                }
            }
        }
    ],
    "guardrailCoverage": {
        "images": {
            "guarded": 1,
            "total": 1
        }
    }
}

Clean up

When you’re ready to remove the moderation pipeline you built, you must clean up the resources you created to avoid unnecessary charges. Complete the following steps:

  1. On the Amazon S3 console, remove the event notification configuration in the bucket that triggers the Lambda function.
  2. Delete the bucket.
  3. On the Lambda console, delete the moderation function you created.
  4. On the IAM console, remove the execution role you created for the Lambda function.
  5. If you created a guardrail specifically for this project and don’t need it for other purposes, remove it using the Amazon Bedrock console.

With these cleanup steps complete, you have successfully removed the components of your image moderation pipeline. You can recreate this solution in the future by following the steps outlined in this post—this highlights the ease of cloud-based, serverless architectures.

Conclusion

In the fashion industry, protecting your brand’s reputation while maintaining creative innovation is paramount. By implementing Amazon Bedrock Guardrails multimodal toxicity detection, fashion brands can automatically screen content for potentially harmful material before it impacts their reputation or violates their ethical standards. As the fashion industry continues to evolve digitally, implementing robust content moderation systems isn’t just about risk management—it’s about building trust with your customers and maintaining brand integrity. Whether you’re an established fashion house or an emerging brand, this solution offers an efficient way to uphold your content standards. The solution we outlined in this post provides a scalable, serverless architecture that accomplishes the following:

  • Automatically processes new image uploads
  • Uses advanced AI capabilities through Amazon Bedrock
  • Provides immediate feedback on content acceptability
  • Requires minimal maintenance after it’s deployed

If you’re interested in further insights on Amazon Bedrock Guardrails and its practical use, refer to the video Amazon Bedrock Guardrails: Make Your AI Safe and Ethical, and the post Amazon Bedrock Guardrails image content filters provide industry-leading safeguards, helping customers block up to 88% of harmful multimodal content: Generally available today.


About the Authors

Jordan Jones is a Solutions Architect at AWS within the Cloud Sales Center organization. He uses cloud technologies to solve complex problems, bringing defense industry experience and expertise in various operating systems, cybersecurity, and cloud architecture. He enjoys mentoring aspiring professionals and speaking on various career panels. Outside of work, he volunteers within the community and can be found watching Golden State Warriors games, solving Sudoku puzzles, or exploring new cultures through world travel.

Jean Jacques Mikem is a Solutions Architect at AWS with a passion for designing secure and scalable technology solutions. He uses his expertise in cybersecurity and technological hardware to architect robust systems that meet complex business needs. With a strong foundation in security principles and computing infrastructure, he excels at creating solutions that bridge business requirements with technical implementation.

Read More

A Gaming GPU Helps Crack the Code on a Thousand-Year Cultural Conversation

Ceramics — the humble mix of earth, fire and artistry — have been part of a global conversation for millennia.

From Tang Dynasty trade routes to Renaissance palaces, from museum vitrines to high-stakes auction floors, they’ve carried culture across borders, evolving into status symbols, commodities and pieces of contested history. Their value has been shaped by aesthetics and economics, empire and, now, technology.

This figure visualizes 20 representative Chinese ceramic craftsmanship styles across seven historical periods, ranging from the Tang Dynasty (618–907 AD) to the Modern era (1913–2025). These styles, including kiln-specific categories and decorative techniques, were selected for their historical significance and visual distinctiveness for the AI’s training dataset. Courtesy of Yanfeng Hu, Siqi Wu, Zhuoran Ma and Si Cheng.

In a lab at Universiti Putra Malaysia, that legacy meets silicon. Researchers there, alongside colleagues at UNSW Sydney, have built an AI system that can classify Chinese ceramics and predict their value with uncanny precision. The tool uses deep learning to analyze decorative motifs, shapes and kiln-specific craftsmanship. It predicts price categories based on real auction data from institutions like Sotheby’s and Christie’s, achieving test accuracy as high as 99%.

Beyond form, the AI also analyzes the intricate decorative patterns found on Chinese ceramics, which are organized into six major categories: plant patterns, animal motifs, landscapes, human figures, crackled glaze patterns and geometric designs. The system annotates images at the category level based on the most visually dominant pattern types. Courtesy of Yanfeng Hu, Siqi Wu, Zhuoran Ma and Si Cheng.

It’s all powered by an NVIDIA GeForce RTX 3090, a consumer-grade GPU beloved by gamers, explains Siqi Wu, one of the researchers behind the project. Not a data center, not specialized industrial hardware, just the same chip pushing frame rates for gamers enjoying Cyberpunk 2077 and Alan Wake 2 across the world.

The motivation is as old as the trade routes those ceramics once traveled: access, but in this case, access to expertise rather than material goods.

The AI system employs a typological classification system for ceramic vessel shapes, based on modular morphological parts like the bottle neck, handle, shoulder, spout, body and base. This approach allows for detailed analysis and classification of shapes such as bottles, jars, plates, bowls, cups, pots and washbasins. Courtesy of Yanfeng Hu, Siqi Wu, Zhuoran Ma and Si Cheng.

“Artifact pricing and dating still heavily rely on expert judgment,” Wu said. That expertise remains elusive for younger collectors, smaller institutions and digital archive projects. Wu’s team aims to change that by making cultural appraisal more objective, scalable and accessible to a wider audience.

It doesn’t stop at classification. The system pairs its YOLOv11-based detection model with an algorithm that learned market value directly from years of real-world auction results. In one test, the AI assessed a Ming Dynasty artifact at roughly 30% below its final hammer price. It’s a reminder that even in an industry steeped in tradition, algorithms can offer new perspectives.
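
As a rough, hypothetical illustration of such a two-stage pipeline (not the researchers’ actual code), the sketch below pairs an Ultralytics YOLO detector with a generic price-band classifier; the weights file, feature scheme and training data are stand-ins:

    import numpy as np
    from ultralytics import YOLO
    from sklearn.ensemble import GradientBoostingClassifier

    # Stage 1: detect motifs and vessel parts (a stand-in for the team's
    # YOLOv11 weights trained on annotated ceramics).
    detector = YOLO("yolo11n.pt")

    def features_from_image(path):
        """Collapse detections into a fixed-length class histogram."""
        result = detector(path)[0]
        histogram = np.zeros(len(detector.names))
        for cls in result.boxes.cls.int().tolist():
            histogram[cls] += 1
        return histogram

    # Stage 2: map detection features to auction price bands.
    # Placeholder training data; real labels would come from auction records.
    train_paths = ["ming_vase.jpg", "qing_bowl.jpg"]
    train_bands = ["high", "low"]

    price_clf = GradientBoostingClassifier()
    X = np.stack([features_from_image(p) for p in train_paths])
    price_clf.fit(X, train_bands)

The design choice worth noting is the decoupling: the detector only describes what is on the vessel, while a separate model learns what the market pays for those descriptions.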

Those perspectives don’t just quantify heritage; they extend the conversation. The team is already exploring AI for other forms of cultural visual heritage, from Cantonese opera costumes to historical murals.

For now, a graphics card built for gaming is parsing centuries of craftsmanship and entering one of the world’s oldest and most global debates: what makes something valuable?

Read More

Indonesia on Track to Achieve Sovereign AI Goals With NVIDIA, Cisco and IOH

As one of the world’s largest emerging markets, Indonesia is making strides toward its “Golden 2045 Vision” — an initiative tapping digital technologies and bringing together government, enterprises, startups and higher education to enhance productivity, efficiency and innovation across industries.

Building out the nation’s AI infrastructure is a crucial part of this plan.

That’s why Indonesian telecommunications leader Indosat Ooredoo Hutchison, aka Indosat or IOH, has partnered with Cisco and NVIDIA to support the establishment of Indonesia’s AI Center of Excellence (CoE). Led by the Ministry of Communications and Digital Affairs, known as Komdigi, the CoE aims to advance secure technologies, cultivate local talent and foster innovation through collaboration with startups.

Indosat Ooredoo Hutchison President Director and CEO Vikram Sinha, Cisco Chair and CEO Chuck Robbins and NVIDIA Senior Vice President of Telecom Ronnie Vasishta today detailed the purpose and potential of the CoE during a fireside chat at Indonesia AI Day, a conference focused on how artificial intelligence can fuel the nation’s digital independence and economic growth.

As part of the CoE, a new NVIDIA AI Technology Center will offer research support, NVIDIA Inception program benefits for eligible startups, and NVIDIA Deep Learning Institute training and certification to upskill local talent.

“At Indosat, we believe AI must be a force for inclusion — not just in access, but in opportunity,” said Vikram Sinha, President Director and CEO of IOH.

“With the support of global partners, we’re accelerating Indonesia’s path to economic growth by ensuring Indonesians are not just users of AI, but creators and innovators,” Sinha added.

“The AI era demands fundamental architectural shifts and a workforce with digital skills to thrive,” Robbins said. “Together with Indosat, NVIDIA and Komdigi, Cisco will securely power the AI Center of Excellence — enabling innovation and skills development, and accelerating Indonesia’s growth.”

“Democratizing AI is more important than ever,” Vasishta added. “Through the new NVIDIA AI Technology Center, we’re helping Indonesia build a sustainable AI ecosystem that can serve as a model for nations looking to harness AI for innovation and economic growth.”

Making AI More Accessible

The Indonesia AI CoE will comprise an AI factory that features full-stack NVIDIA AI infrastructure — including NVIDIA Blackwell GPUs, NVIDIA Cloud Partner reference architectures and NVIDIA AI Enterprise software — as well as an intelligent security system powered by Cisco.

Called the Sovereign Security Operations Center Cloud Platform, the Cisco-powered system combines AI-based threat detection, localized data control and managed security services for the AI factory.

Building on the sovereign AI initiatives Indonesia’s technology leaders announced with NVIDIA last year, the CoE will bolster the nation’s AI strategy through four core pillars:

  • Sovereign Infrastructure – Establishing AI infrastructure for secure, scalable, high-performance AI workloads tailored to Indonesia’s digital ambitions.
  • Secure AI Workloads – Using Cisco’s intelligent infrastructure to connect and safeguard the nation’s digital assets and intellectual property.
  • AI for All – Giving hundreds of millions of Indonesians access to AI by 2027, breaking down geographical barriers and empowering developers across the nation.
  • Talent and Development Ecosystem – Aiming to equip 1 million people with digital skills in networking, security and AI by 2027.

Some 28 independent software vendors and startups are already using IOH’s NVIDIA-powered AI infrastructure to develop cutting-edge technologies that can speed and ease workflows across higher education and research, food security, bureaucratic reform, smart cities and mobility, and healthcare.

With Indosat’s coverage across the archipelago, the company can reach hundreds of millions of Bahasa Indonesia speakers with its large language model (LLM)-powered applications.

For example, using Indosat’s Sahabat-AI collection of Bahasa Indonesia LLMs, the Indonesian government and Hippocratic AI are collaborating to develop an AI agent system that provides preventive outreach capabilities, such as helping women subscribers over the age of 50 schedule a mammogram. Screening like this can help detect breast cancer early and combat other health complications across the population.

Separately, Sahabat-AI also enables Indosat’s AI chatbot to answer queries in the Indonesian language for various citizen and resident services. A person could ask about processes for updating their national identification card, as well as about tax rates, payment procedures, deductions and more.

In addition, a government-led forum is developing trustworthy AI frameworks tailored to Indonesian values for the safe, responsible development of artificial intelligence and related policies.

Looking forward, Indosat and NVIDIA plan to deploy AI-RAN technologies that can reach even broader audiences using AI over wireless networks.

Learn more about NVIDIA-powered AI infrastructure for telcos.

Read More

Apple Machine Learning Research at ICML 2025

Apple researchers are advancing AI and ML through fundamental research, and to support the broader research community and help accelerate progress in this field, we share much of this research through publications and engagement at conferences. Next week, the International Conference on Machine Learning (ICML) will be held in Vancouver, Canada, and Apple is proud to once again participate in this important event for the research community and to be an industry sponsor.
At the main conference and associated workshops, Apple researchers will present new research across a number of topics in AI…Apple Machine Learning Research

Point-3D LLM: Studying the Impact of Token Structure for 3D Scene Understanding With Large Language Models

Effectively representing 3D scenes for Multimodal Large Language Models (MLLMs) is crucial yet challenging. Existing approaches commonly rely only on 2D image features and use varied tokenization approaches. This work presents a rigorous study of 3D token structures, systematically comparing video-based and point-based representations while maintaining consistent model backbones and parameters. We propose a novel approach that enriches visual tokens by incorporating 3D point cloud features from a Sonata pretrained Point Transformer V3 encoder. Our experiments demonstrate that merging explicit…Apple Machine Learning Research

QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache

Large Language Models (LLMs) are increasingly being deployed on edge devices for long-context settings, creating a growing need for fast and efficient long-context inference. In these scenarios, the Key-Value (KV) cache is the primary bottleneck in terms of both GPU memory and latency, as the full KV cache must be loaded for each decoding step. While speculative decoding is a widely accepted technique to accelerate autoregressive decoding, existing methods often struggle to achieve significant speedups due to inefficient KV cache optimization strategies and suffer from low acceptance rates. To…Apple Machine Learning Research
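
For readers unfamiliar with the mechanism the abstract refers to, here is a toy greedy sketch of plain speculative decoding, not Apple’s QuantSpec method (whose twist, per the abstract, is that the same model serves as its own draft via a hierarchically quantized KV cache). It assumes Hugging Face-style models whose outputs expose .logits, and batch size 1:

    import torch

    @torch.no_grad()
    def speculative_step(draft_model, target_model, ids, k=4):
        """Draft k tokens cheaply, then keep the longest prefix the
        target model agrees with (greedy toy version)."""
        start = ids.shape[1]

        # 1) Autoregressively draft k tokens with the cheap model.
        draft = ids
        for _ in range(k):
            next_tok = draft_model(draft).logits[:, -1:].argmax(-1)
            draft = torch.cat([draft, next_tok], dim=-1)

        # 2) Verify all k drafted positions in one target-model pass.
        target_pred = target_model(draft).logits[:, start - 1:-1].argmax(-1)

        # 3) Accept drafted tokens up to the first disagreement, then take
        #    the target's own token there. The "acceptance rate" the
        #    abstract mentions is how many of these k tokens survive.
        agree = (target_pred == draft[:, start:]).long().cumprod(dim=-1)
        n_accept = int(agree.sum().item())
        return torch.cat([ids, target_pred[:, : n_accept + 1]], dim=-1)

The single verification pass in step 2 is where the KV cache dominates cost, which is why quantizing it is an attractive lever.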

Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?

This paper was accepted at the Workshop on Reliable and Responsible Foundation Models (RRFMs) Workshop at ICML 2025.
Uncertainty quantification plays a pivotal role when bringing large language models (LLMs) to end-users. Its primary goal is for an LLM to indicate when it is unsure about an answer it gives. While this has previously been conveyed with numerical certainty scores, we propose to use the rich output space of LLMs, the space of all possible strings, to give a string that describes the uncertainty. In particular, we seek a string that describes the distribution of LLM answers…Apple Machine Learning Research