Rewriting SymCrypt in Rust to modernize Microsoft’s cryptographic library 

Outdated coding practices and memory-unsafe languages like C are putting software, including cryptographic libraries, at risk. Fortunately, memory-safe languages like Rust, along with formal verification tools, are now mature enough to be used at scale, helping prevent issues like crashes, data corruption, flawed implementation, and side-channel attacks.

To address these vulnerabilities and improve memory safety, we’re rewriting SymCrypt—Microsoft’s open-source cryptographic library—in Rust. We’re also incorporating formal verification methods. SymCrypt is used in Windows, Azure Linux, Xbox, and other platforms.

Currently, SymCrypt is primarily written in cross-platform C, with limited use of hardware-specific optimizations through intrinsics (compiler-provided low-level functions) and assembly language (direct processor instructions). It provides a wide range of algorithms, including AES-GCM, SHA, ECDSA, and the more recent post-quantum algorithms ML-KEM and ML-DSA. 

Formal verification will confirm that implementations behave as intended and don’t deviate from algorithm specifications, which is critical for preventing attacks. We’ll also analyze compiled code to detect side-channel leaks caused by timing or hardware-level behavior.

Proving Rust program properties with Aeneas

Program verification is the process of proving that a piece of code will always satisfy a given property, no matter the input. Rust’s type system profoundly improves the prospects for program verification by providing strong ownership guarantees, by construction, using a discipline known as “aliasing xor mutability”.

For example, reasoning about C code often requires proving that two non-const pointers are live and non-overlapping, a property that can depend on external client code. In contrast, Rust’s type system guarantees this property for any two mutably borrowed references.

As a result, new tools have emerged specifically for verifying Rust code. We chose Aeneas because it helps provide a clean separation between code and proofs.

Developed by Microsoft Azure Research in partnership with Inria, the French National Institute for Research in Digital Science and Technology, Aeneas connects to proof assistants like Lean, allowing us to draw on a large body of mathematical proofs—especially valuable given the mathematical nature of cryptographic algorithms—and benefit from Lean’s active user community.

Compiling Rust to C supports backward compatibility  

We recognize that switching to Rust isn’t feasible for all use cases, so we’ll continue to support, extend, and certify C-based APIs as long as users need them. Users won’t see any changes, as Rust runs underneath the existing C APIs.

Some users compile our C code directly and may rely on specific toolchains or compiler features that complicate the adoption of Rust code. To address this, we will use Eurydice, a Rust-to-C compiler developed by Microsoft Azure Research, to replace handwritten C code with C generated from formally verified Rust. Eurydice compiles directly from Rust’s MIR intermediate language, and the resulting C code will be checked into the SymCrypt repository alongside the original Rust source code.

As more users adopt Rust, we’ll continue supporting this compilation path for those who build SymCrypt from source code but aren’t ready to use the Rust compiler. In the long term, we hope to transition users to either use precompiled SymCrypt binaries (via C or Rust APIs), or compile from source code in Rust, at which point the Rust-to-C compilation path will no longer be needed.

Timing analysis with Revizor 

Even software that has been verified for functional correctness can remain vulnerable to low-level security threats, such as side channels caused by timing leaks or speculative execution. These threats operate at the hardware level and can leak private information, such as memory load addresses, branch targets, or division operands, even when the source code is provably correct. 

To address this, we’re extending Revizor (opens in new tab), a tool developed by Microsoft Azure Research, to more effectively analyze SymCrypt binaries. Revizor models microarchitectural leakage and uses fuzzing techniques to systematically uncover instructions that may expose private information through known hardware-level effects.  

Earlier cryptographic libraries relied on constant-time programming, which avoids branches and memory accesses that depend on secret data. However, recent research has shown that this alone is insufficient with today’s CPUs, where every new optimization may open a new side channel.

By analyzing binary code for specific compilers and platforms, our extended Revizor tool enables deeper scrutiny of vulnerabilities that aren’t visible in the source code.

Verified Rust implementations begin with ML-KEM

This long-term effort is in alignment with the Microsoft Secure Future Initiative and brings together experts across Microsoft, building on decades of Microsoft Research investment in program verification and security tooling.

A preliminary version of ML-KEM in Rust is now available on the preview feature/verifiedcrypto branch of the SymCrypt repository. We encourage users to try the Rust build and share feedback. Looking ahead, we plan to support direct use of the same cryptographic library in Rust without requiring C bindings.

Over the coming months, we plan to rewrite, verify, and ship several algorithms in Rust as part of SymCrypt. As our investment in Rust deepens, we expect to gain new insights into how to best leverage the language for high-assurance cryptographic implementations with low-level optimizations. 

As performance is key to scalability and sustainability, we’re holding new implementations to a high bar, using our benchmarking tools to ensure they match or exceed the performance of the existing implementations.

Looking forward 

This is a pivotal moment for high-assurance software. Microsoft’s investment in Rust and formal verification presents a rare opportunity to advance one of our key libraries. We’re excited to scale this work and ultimately deliver an industrial-grade, Rust-based, FIPS-certified cryptographic library.


Read More

Cisco and NVIDIA Advance Security for Enterprise AI Factories

Cisco and NVIDIA are helping set a new standard for secure, scalable and high-performance enterprise AI.

Announced today at the Cisco Live conference in San Diego, the Cisco AI Defense and Hypershield security solutions tap into NVIDIA AI to deliver comprehensive visibility, validation and runtime protection across entire AI workflows. This builds on the Cisco Secure AI Factory with NVIDIA unveiled at the NVIDIA GTC conference in March.

As AI moves to the center of every industry, enterprises need more than just speed — they need trust. From data ingestion and model training to deployment and inference, Cisco Secure AI Factory with NVIDIA can provide continuous monitoring and protection of AI workloads.

Cisco AI Defense and Hypershield integrate with NVIDIA AI for high-performance, scalable and more trustworthy AI responses for running agentic and generative AI workloads. The NVIDIA Enterprise AI Factory validated design now includes Cisco AI Defense and Hypershield to safeguard every stage of the AI lifecycle — which is key to helping enterprises confidently deploy AI at scale.

Open models post-trained with NVIDIA NeMo and safeguarded with NVIDIA Blueprints can now be validated and secured using AI Defense. Cisco security, privacy and safety models run as NVIDIA NIM microservices to optimize inference performance for production AI. Cisco AI Defense provides runtime visibility and monitoring of AI applications and agents deployed on the NVIDIA AI platform.

Cisco Hypershield will soon work seamlessly with NVIDIA BlueField DPUs and the NVIDIA DOCA Argus framework, bringing pervasive, distributed security and real-time threat detection to every node of the AI infrastructure.

Whether their workloads are running in a data center or at the edge, organizations can benefit from this real-time threat detection at the network, server and application layers. Together, NVIDIA and Cisco are enabling enterprises to maintain zero-trust security across distributed AI environments, no matter where data and workloads reside.

Maximizing AI Networking Performance

AI workloads are data hungry and latency sensitive. To meet these demands, Cisco and NVIDIA have enhanced AI networking with:

  • Cisco Intelligent Packet Flow: Dynamically steers traffic using real-time telemetry and congestion awareness, optimizing performance across AI fabrics.
  • NVIDIA Spectrum-X: This AI-optimized Ethernet platform delivers high-throughput, low-latency connectivity with advanced routing and congestion control.
  • End-to-End Visibility: Unified monitoring across networks, GPUs and distributed AI jobs means issues are detected proactively — before they impact performance or security.

Expanded AI PODs for Flexible, Scalable AI

To support the evolving needs of enterprise AI, Cisco is expanding its AI PODs — modular, validated building blocks for diverse AI workloads, including training, fine-tuning and inference. This flexibility lets organizations scale AI initiatives efficiently and securely, whether deploying a handful of models or running massive, distributed AI factories.

The new NVIDIA RTX PRO 6000 Blackwell Server GPU is now available for order with Cisco UCS C845A M8 servers, providing exceptional performance for next-generation AI applications.

Learn more about the Cisco Secure AI Factory with NVIDIA, a blueprint for organizations to confidently scale AI, accelerate innovation and protect their most valuable assets.

Read More

Clear Skies Ahead: New NVIDIA Earth-2 Generative AI Foundation Model Simulates Global Climate at Kilometer-Scale Resolution

With a more detailed simulation of the Earth’s climate, scientists and researchers can better predict and mitigate the effects of climate change.

NVIDIA’s bringing more clarity to this work with cBottle — short for Climate in a Bottle — the world’s first generative AI foundation model designed to simulate global climate at kilometer resolution.

Part of the NVIDIA Earth-2 platform, the model can generate realistic atmospheric states that can be conditioned on inputs like the time of day, day of the year and sea surface temperatures. This offers a new way to understand and anticipate Earth’s most complex natural systems.

The Earth-2 platform features a software stack and tools that combine the power of AI, GPU acceleration, physical simulations and computer graphics. This helps enable the creation of interactive digital twins for simulating and visualizing weather, as well as delivering climate predictions at planetary scale. With cBottle, these predictions can be made thousands of times faster and with more energy efficiency than traditional numerical models, without compromising accuracy.

Leading scientific research institutions — including the Max-Planck-Institute for Meteorology (MPI-M) and Allen Institute for AI (Ai2) — are exploring cBottle to compress, distill and turn Earth observation data and ultra-high-resolution climate simulations into a queryable and interactive generative AI system.

cBottle was field-tested at the World Climate Research Programme Global KM-Scale Hackathon. The event was organized across eight countries and 10 climate simulation centers with the goal of advancing the analysis and development of high-resolution Earth-system models and broadening access to high-resolution, high-fidelity climate data.

Revolutionizing Climate Modeling With AI

Climate informatics is traditionally time-, labor- and compute-intensive, requiring sophisticated analysis of data stores spanning tens of petabytes.

cBottle, incorporating NVIDIA GPU acceleration and the highly optimized NVIDIA Earth-2 stack, uses advanced AI to compress massive amounts of climate simulation data. It’s capable of reducing petabytes of data by up to 3,000x for an individual weather sample — translating to a 3,000,000x data size reduction for a collection of 1,000 samples.

cBottle was trained on high-resolution physical climate simulations, as well as measurement-constrained estimates of observed atmospheric states over the past 50 years.

The model can fill in missing or corrupted climate data, correct biased climate models, super-resolve low-resolution climate data and synthesize information based on patterns and previous observations. cBottle’s extreme data efficiency enables training on just four weeks of kilometer-scale climate simulations.

Global Collaboration for Planetary-Scale Impact

Leading climate institutions are using NVIDIA Earth-2 to advance climate simulation.

MPI-M has tapped Earth-2 to pioneer kilometer-scale climate modeling, using its ICON Earth system model. Harnessing NVIDIA GPU acceleration and performance optimizations, MPI-M researchers led a team that performed the first ever kilometer-scale simulations of the full Earth system, simulating and visualizing Earth’s climate with remarkable detail.

“In the face of a rapidly changing climate, the latest progress with Earth-2 represents a transformative leap in our ability to understand, predict and adapt to the world around us,” said Bjorn Stevens, director of the Max Planck Institute for Meteorology. “By harnessing NVIDIA’s advanced AI and accelerated computing, we’re building a digital twin of the planet — marking a new era where climate science becomes accessible and actionable for all, enabling informed decisions that safeguard our collective future.”

Ai2 and NVIDIA are collaborating to accelerate and enhance climate modeling using the Earth-2 AI stack and GPUs, focusing on making climate simulations faster, more energy efficient and more accessible at high resolutions. This is critical for scientific research and practical applications in weather prediction and climate resilience.

“Planning for climate change challenges societies worldwide,” said Christopher Bretherton, senior director of climate modeling at Ai2. “cBottle is an elegant use of generative AI and an exciting new resource for efficiently simulating local extreme weather, such as flooding rains or hot dry winds that spread wildfire.”

Using cBottle in NVIDIA Earth-2, developers can build climate digital twins to interactively explore and visualize kilometer-scale climate data, as well as predict possible scenarios at low latency and with high throughput.

The cBottle foundation model is available for early access. Climate AI researchers interested in retraining the model can access the cBottle codebase on GitHub and the preprint on arXiv.

Watch the NVIDIA GTC Paris at VivaTech keynote from NVIDIA founder and CEO Jensen Huang, as well as the special address on NVIDIA CUDA-X libraries, to learn more.

See notice regarding software product information.

Read More

The Blue Lion Supercomputer Will Run on NVIDIA Vera Rubin — Here’s Why That Matters

Germany’s Leibniz Supercomputing Centre, LRZ, is gaining a new supercomputer that delivers roughly 30x more computing power compared with SuperMUC-NG, the current LRZ high-performance computer. It’s called Blue Lion. And it will run on the NVIDIA Vera Rubin architecture.

That’s new. Until now, LRZ — part of the Gauss Centre for Supercomputing, Germany’s leading HPC institution — had only said its next system would use “next-generation” NVIDIA accelerators and processors.

We’re confirming it: That next generation is Vera Rubin, NVIDIA’s upcoming platform for AI and accelerated science.

If the name sounds familiar, it should. Last month, Lawrence Berkeley National Lab unveiled Doudna, its next flagship system that will also be powered by Vera Rubin.

Two continents. Two systems. Same architecture.

What Is Vera Rubin?

Vera Rubin is a superchip. It combines:

  • Rubin GPU — the successor to NVIDIA Blackwell
  • Vera CPU — NVIDIA’s first custom CPU, built to work in lockstep with the GPU

Together, they form a platform built to collapse simulation, data and AI into a single, high-bandwidth, low-latency engine for science. It combines shared memory, coherent compute and in-network acceleration — and is launching in the second half of 2026.

Blue Lion is built to meet it.

About the System

HPE is building Blue Lion. It will use next-generation HPE Cray technology and NVIDIA GPUs in a system with powerful storage and interconnect, cooled by HPE’s 100% fanless direct liquid-cooling architecture, which delivers warm water through pipes to efficiently cool the supercomputer.

It’s built for researchers working on climate, turbulence, physics and machine learning, with workflows that blend classic simulation and modern AI. Jobs can scale across the entire system. Heat from the racks will be reused to warm nearby buildings.

And it’s not just local. Blue Lion will support collaborative research projects across Europe.

About the Doudna Supercomputer

Meanwhile, in Berkeley, California, Doudna, the U.S. Department of Energy’s next supercomputer, will also run Vera Rubin. Built by Dell Technologies, the supercomputer is named for Nobel laureate and CRISPR pioneer Jennifer Doudna and will serve over 11,000 researchers when it launches next year.

Doudna will be wired for real-time workflows and optimized for science results per joule of energy. Data streams in from telescopes, genome sequencers and fusion experiments and lands directly in the system via NVIDIA Quantum-X800 InfiniBand networking. Processing starts instantly. Feedback loops are live.

It’s designed to advance fusion energy, materials discovery and biology faster. Compared with its predecessor, it’s expected to deliver 10x more application performance, using just 2-3x the power. That’s 3-5x better performance per watt.

Why This All Matters

Blue Lion and Doudna aren’t just big machines. They’re signals of what comes next: a shift in how high-performance systems are designed, used and connected.

AI is no longer an add-on. Simulation isn’t a silo. Data isn’t parked — it moves. Science is becoming a real-time discipline. The systems that power it need to keep up.

Vera Rubin is built for that.

Read More

Apple Machine Learning Research at CVPR 2025

Apple researchers are advancing AI and ML through fundamental research, and to support the broader research community and help accelerate progress in this field, we share much of our research through publications and engagement at conferences. This week, the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) will take place in Nashville, Tennessee. Apple is proud to once again participate in this important event for the community and to be an industry sponsor.
At the main conference and associated workshops, Apple researchers will present new research across a number of…Apple Machine Learning Research

Building intelligent AI voice agents with Pipecat and Amazon Bedrock – Part 1

Voice AI is transforming how we interact with technology, making conversational interactions more natural and intuitive than ever before. At the same time, AI agents are becoming increasingly sophisticated, capable of understanding complex queries and taking autonomous actions on our behalf. As these trends converge, you see the emergence of intelligent AI voice agents that can engage in human-like dialogue while performing a wide range of tasks.

In this series of posts, you will learn how to build intelligent AI voice agents using Pipecat, an open-source framework for voice and multimodal conversational AI agents, with foundation models on Amazon Bedrock. The series includes high-level reference architectures, best practices, and code samples to guide your implementation.

Approaches for building AI voice agents

There are two common approaches for building conversational AI agents:

  • Using cascaded models: In this post (Part 1), you will learn about the cascaded models approach, diving into the individual components of a conversational AI agent. With this approach, voice input passes through a series of architecture components before a voice response is sent back to the user. This approach is also sometimes referred to as pipeline or component model voice architecture.
  • Using speech-to-speech foundation models in a single architecture: In Part 2, you will learn how Amazon Nova Sonic, a state-of-the-art, unified speech-to-speech foundation model can enable real-time, human-like voice conversations by combining speech understanding and generation in a single architecture.

Common use cases

AI voice agents can handle multiple use cases, including but not limited to:

  • Customer Support: AI voice agents can handle customer inquiries 24/7, providing instant responses and routing complex issues to human agents when necessary.
  • Outbound Calling: AI agents can conduct personalized outreach campaigns, scheduling appointments or following up on leads with natural conversation.
  • Virtual Assistants: Voice AI can power personal assistants that help users manage tasks and answer questions.

Architecture: Using cascaded models to build an AI voice agent

To build an agentic voice AI application with the cascaded models approach, you need to orchestrate multiple architecture components involving multiple machine learning and foundation models.

Figure 1: Architecture overview of a Voice AI Agent using Pipecat

These components include:

WebRTC Transport: Enables real-time audio streaming between client devices and the application server.

Voice Activity Detection (VAD): Detects speech using Silero VAD with configurable speech start and speech end times, and noise suppression capabilities to remove background noise and enhance audio quality.

Automatic Speech Recognition (ASR): Uses Amazon Transcribe for accurate, real-time speech-to-text conversion.

Natural Language Understanding (NLU): Interprets user intent using latency-optimized inference on Amazon Bedrock with models like Amazon Nova Pro, optionally enabling prompt caching to optimize for speed and cost efficiency in Retrieval Augmented Generation (RAG) use cases.

Tools Execution and API Integration: Executes actions or retrieves information for RAG by integrating backend services and data sources via Pipecat Flows and leveraging the tool use capabilities of foundation models.

Natural Language Generation (NLG): Generates coherent responses using Amazon Nova Pro on Bedrock, offering the right balance of quality and latency.

Text-to-Speech (TTS): Converts text responses back into lifelike speech using Amazon Polly with generative voices.

Orchestration Framework: Pipecat orchestrates these components, offering a modular Python-based framework for real-time, multimodal AI agent applications.

Best practices for building effective AI voice agents

Developing responsive AI voice agents requires focus on latency and efficiency. While best practices continue to emerge, consider the following implementation strategies to achieve natural, human-like interactions:

Minimize conversation latency: Use latency-optimized inference for foundation models (FMs) like Amazon Nova Pro to maintain natural conversation flow.

Select efficient foundation models: Prioritize smaller, faster foundation models (FMs) that can deliver quick responses while maintaining quality.

Implement prompt caching: Utilize prompt caching to optimize for both speed and cost efficiency, especially in complex scenarios requiring knowledge retrieval.

Deploy text-to-speech (TTS) fillers: Use natural filler phrases (such as “Let me look that up for you”) before intensive operations to maintain user engagement while the system makes tool calls or long-running calls to your foundation models.

Build a robust audio input pipeline: Integrate components like noise suppression to support clear audio quality and better speech recognition results.

Start simple and iterate: Begin with basic conversational flows before progressing to complex agentic systems that can handle multiple use cases.

Consider Region availability: Latency-optimized inference and prompt caching may only be available in certain AWS Regions. Evaluate the trade-off between using these advanced capabilities and selecting a Region that is geographically closer to your end users.
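
To make the latency guidance above concrete, the following TypeScript sketch shows one way to call a foundation model on Amazon Bedrock through the Converse API with latency-optimized inference requested. It is a minimal illustration rather than part of the sample application (which is Python-based); the model ID, Region, and performanceConfig setting are assumptions to verify against current Amazon Bedrock documentation and Region availability.

import {
  BedrockRuntimeClient,
  ConverseCommand,
} from '@aws-sdk/client-bedrock-runtime'

// Client for the Region hosting your models (illustrative choice).
const client = new BedrockRuntimeClient({ region: 'us-east-1' })

export async function generateReply(userTranscript: string): Promise<string> {
  const command = new ConverseCommand({
    modelId: 'amazon.nova-pro-v1:0', // assumed model ID; confirm in your account
    system: [{ text: 'You are a concise, friendly voice assistant.' }],
    messages: [{ role: 'user', content: [{ text: userTranscript }] }],
    inferenceConfig: { maxTokens: 256, temperature: 0.3 },
    // Assumption: latency-optimized inference is requested per call like this where
    // the model and Region support it; prompt caching can be layered on for RAG-heavy prompts.
    performanceConfig: { latency: 'optimized' },
  })

  const response = await client.send(command)
  const blocks = response.output?.message?.content ?? []
  return blocks.map((block) => block.text ?? '').join('')
}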

Example implementation: Build your own AI voice agent in minutes

This post provides a sample application on GitHub that demonstrates the concepts discussed. It uses Pipecat and its accompanying state management framework, Pipecat Flows, with Amazon Bedrock, along with Web Real-Time Communication (WebRTC) capabilities from Daily, to create a working voice agent you can try in minutes.

Prerequisites

To set up the sample application, you should have the following prerequisites:

  • Python 3.10+
  • An AWS account with appropriate Identity and Access Management (IAM) permissions for Amazon Bedrock, Amazon Transcribe, and Amazon Polly
  • Access to foundation models on Amazon Bedrock
  • Access to an API key for Daily
  • Modern web browser (such as Google Chrome or Mozilla Firefox) with WebRTC support

Implementation Steps

After you complete the prerequisites, you can start setting up your sample voice agent:

  1. Clone the repository:
    git clone https://github.com/aws-samples/build-intelligent-ai-voice-agents-with-pipecat-and-amazon-bedrock 
    cd build-intelligent-ai-voice-agents-with-pipecat-and-amazon-bedrock/part-1 
  2. Set up the environment:
    cd server
    python3 -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
    pip install -r requirements.txt
  3. Configure API keys in .env:
    DAILY_API_KEY=your_daily_api_key
    AWS_ACCESS_KEY_ID=your_aws_access_key_id
    AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key
    AWS_REGION=your_aws_region
  4. Start the server:
    python server.py
  5. Connect via browser at http://localhost:7860 and grant microphone access
  6. Start the conversation with your AI voice agent

Customizing your voice AI agent

To customize, you can start by:

  • Modifying flow.py to change conversation logic
  • Adjusting model selection in bot.py for your latency and quality needs

To learn more, see the documentation for Pipecat Flows and review the README of our code sample on GitHub.

Cleanup

The instructions above are for setting up the application in your local environment. The local application will leverage AWS services and Daily through AWS IAM and API credentials. For security and to avoid unanticipated costs, when you are finished, delete these credentials to make sure that they can no longer be accessed.

Accelerating voice AI implementations

To accelerate AI voice agent implementations, AWS Generative AI Innovation Center (GAIIC) partners with customers to identify high-value use cases and develop proof-of-concept (PoC) solutions that can quickly move to production.

Customer Testimonial: InDebted

InDebted, a global fintech transforming the consumer debt industry, collaborates with AWS to develop their voice AI prototype.

“We believe AI-powered voice agents represent a pivotal opportunity to enhance the human touch in financial services customer engagement. By integrating AI-enabled voice technology into our operations, our goals are to provide customers with faster, more intuitive access to support that adapts to their needs, as well as improving the quality of their experience and the performance of our contact centre operations”

says Mike Zhou, Chief Data Officer at InDebted.

By collaborating with AWS and leveraging Amazon Bedrock, organizations like InDebted can create secure, adaptive voice AI experiences that meet regulatory standards while delivering real, human-centric impact in even the most challenging financial conversations.

Conclusion

Building intelligent AI voice agents is now more accessible than ever through the combination of open-source frameworks such as Pipecat, and powerful foundation models with latency optimized inference and prompt caching on Amazon Bedrock.

In this post, you learned about two common approaches to building AI voice agents, delving into the cascaded models approach and its key components. These essential components work together to create an intelligent system that can understand, process, and respond to human speech naturally. By leveraging these rapid advancements in generative AI, you can create sophisticated, responsive voice agents that deliver real value to your users and customers.

To get started with your own voice AI project, try our code sample on Github or contact your AWS account team to explore an engagement with AWS Generative AI Innovation Center (GAIIC).

You can also learn about building AI voice agents using a unified speech-to-speech foundation model, Amazon Nova Sonic, in Part 2.


About the Authors

Adithya Suresh serves as a Deep Learning Architect at the AWS Generative AI Innovation Center, where he partners with technology and business teams to build innovative generative AI solutions that address real-world challenges.

Daniel Wirjo is a Solutions Architect at AWS, focused on FinTech and SaaS startups. As a former startup CTO, he enjoys collaborating with founders and engineering leaders to drive growth and innovation on AWS. Outside of work, Daniel enjoys taking walks with a coffee in hand, appreciating nature, and learning new ideas.

Karan Singh is a Generative AI Specialist at AWS, where he works with top-tier third-party foundation model and agentic frameworks providers to develop and execute joint go-to-market strategies, enabling customers to effectively deploy and scale solutions to solve enterprise generative AI challenges.

Xuefeng Liu leads a science team at the AWS Generative AI Innovation Center in the Asia Pacific regions. His team partners with AWS customers on generative AI projects, with the goal of accelerating customers’ adoption of generative AI.

Read More

Stream multi-channel audio to Amazon Transcribe using the Web Audio API

Multi-channel transcription streaming is a feature of Amazon Transcribe that can be used in many cases with a web browser. Creating this stream source has its challenges, but with the JavaScript Web Audio API, you can connect and combine different audio sources like videos, audio files, or hardware like microphones to obtain transcripts.

In this post, we guide you through how to use two microphones as audio sources, merge them into a single dual-channel audio stream, perform the required encoding, and stream it to Amazon Transcribe. Source code for a Vue.js application is provided; it requires two microphones connected to your browser. However, the versatility of this approach extends far beyond this use case—you can adapt it to accommodate a wide range of devices and audio sources.

With this approach, you can get transcripts for two sources in a single Amazon Transcribe session, offering cost savings and other benefits compared to using a separate session for each source.

Challenges when using two microphones

For our use case, using a single-channel stream for two microphones and enabling Amazon Transcribe speaker labels to identify the speakers might be enough, but there are a few considerations:

  • Speaker labels are randomly assigned at session start, meaning you will have to map the results in your application after the stream has started
  • Speakers with similar voice tones can be mislabeled, which can be hard to distinguish even for a human
  • Voice overlapping can occur when two speakers talk at the same time with one audio source

By using two audio sources with microphones, you can address these concerns by making sure each transcription is from a fixed input source. By assigning a device to a speaker, our application knows in advance which transcript to use. However, you might still encounter voice overlapping if two nearby microphones are picking up multiple voices. This can be mitigated by using directional microphones, volume management, and Amazon Transcribe word-level confidence scores.
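
As a sketch of that last mitigation, the snippet below filters the word-level items returned by Amazon Transcribe streaming and keeps only words above a confidence threshold, so faint cross-talk picked up by the neighboring microphone is less likely to surface in that channel's transcript. The threshold value and helper name are illustrative assumptions.

import type { Result } from '@aws-sdk/client-transcribe-streaming'

// Words below this confidence are treated as likely cross-talk (tune as needed).
const CONFIDENCE_THRESHOLD = 0.5

export const highConfidenceText = (result: Result): string => {
  const items = result.Alternatives?.[0]?.Items ?? []
  return items
    .filter(
      (item) =>
        item.Type === 'punctuation' ||
        (item.Confidence ?? 0) >= CONFIDENCE_THRESHOLD,
    )
    .map((item) => item.Content ?? '')
    .join(' ')
    .replace(/\s+([.,!?])/g, '$1') // reattach punctuation to the preceding word
}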

Solution overview

The following diagram illustrates the solution workflow.

Application diagram for two microphones

We use two audio inputs with the Web Audio API. With this API, we can merge the two inputs, Mic A and Mic B, into a single audio data source, with the left channel representing Mic A and the right channel representing Mic B.

Then, we convert this audio source to PCM (Pulse-Code Modulation) audio. PCM is a common format for audio processing, and it’s one of the formats required by Amazon Transcribe for the audio input. Finally, we stream the PCM audio to Amazon Transcribe for transcription.

Prerequisites

You should have the following prerequisites in place: an AWS account and credentials for an IAM identity whose policy allows starting streaming transcriptions, for example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DemoWebAudioAmazonTranscribe",
      "Effect": "Allow",
      "Action": "transcribe:StartStreamTranscriptionWebSocket",
      "Resource": "*"
    }
  ]
}

Start the application

Complete the following steps to launch the application:

  1. Go to the root directory where you downloaded the code.
  2. Create a .env file to set up your AWS access keys from the env.sample file.
  3. Install packages by running bun install (if you’re using npm, run npm install).
  4. Start the web server by running bun dev (if you’re using npm, run npm run dev).
  5. Open your browser at http://localhost:5173/.

    Application running on http://localhost:5173 with two connected microphones

Code walkthrough

In this section, we examine the important code pieces for the implementation:

  1. The first step is to list the connected microphones by using the browser API navigator.mediaDevices.enumerateDevices():
const devices = await navigator.mediaDevices.enumerateDevices()
return devices.filter((d) => d.kind === 'audioinput')
  2. Next, you need to obtain the MediaStream object for each of the connected microphones. This can be done using the navigator.mediaDevices.getUserMedia() API, which enables access to the user’s media devices (such as cameras and microphones). You can then retrieve a MediaStream object that represents the audio or video data from those devices:
const streams = []
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    deviceId: device.deviceId,
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  },
})

if (stream) streams.push(stream)
  3. To combine the audio from the multiple microphones, you need to create an AudioContext interface for audio processing. Within this AudioContext, you can use ChannelMergerNode to merge the audio streams from the different microphones. The connect(destination, src_idx, ch_idx) method arguments are:
    • destination – The destination, in our case mergerNode.
    • src_idx – The source channel index, in our case both 0 (because each microphone is a single-channel audio stream).
    • ch_idx – The channel index for the destination, in our case 0 and 1 respectively, to create a stereo output.
// instance of audioContext
const audioContext = new AudioContext({
       sampleRate: SAMPLE_RATE,
})
// this is used to process the microphone stream data
const audioWorkletNode = new AudioWorkletNode(audioContext, 'recording-processor', {...})
// microphone A
const audioSourceA = audioContext.createMediaStreamSource(mediaStreams[0]);
// microphone B
const audioSourceB = audioContext.createMediaStreamSource(mediaStreams[1]);
// audio node for two inputs
const mergerNode = audioContext.createChannelMerger(2);
// connect the audio sources to the mergerNode destination.  
audioSourceA.connect(mergerNode, 0, 0);
audioSourceB.connect(mergerNode, 0, 1);
// connect our mergerNode to the AudioWorkletNode
mergerNode.connect(audioWorkletNode);
  4. The microphone data is processed in an AudioWorklet that emits data messages after a defined number of recording frames. These messages contain the audio data encoded in PCM format to send to Amazon Transcribe. Using the p-event library, you can asynchronously iterate over the events from the Worklet. A more in-depth description of this Worklet is provided in the next section of this post.
import { pEventIterator } from 'p-event'
...

// Register the worklet
try {
  await audioContext.audioWorklet.addModule('./worklets/recording-processor.js')
} catch (e) {
  console.error('Failed to load audio worklet')
}

//  An async iterator 
const audioDataIterator = pEventIterator<'message', MessageEvent<AudioWorkletMessageDataType>>(
  audioWorkletNode.port,
  'message',
)
...

// AsyncIterableIterator: Every time the worklet emits an event with the message `SHARE_RECORDING_BUFFER`, this iterator will return the AudioEvent object that we need.
const getAudioStream = async function* (
  audioDataIterator: AsyncIterableIterator<MessageEvent<AudioWorkletMessageDataType>>,
) {
  for await (const chunk of audioDataIterator) {
    if (chunk.data.message === 'SHARE_RECORDING_BUFFER') {
      const { audioData } = chunk.data
      yield {
        AudioEvent: {
          AudioChunk: audioData,
        },
      }
    }
  }
}
  5. To start streaming the data to Amazon Transcribe, you can use the iterator created above and set NumberOfChannels: 2 and EnableChannelIdentification: true to enable dual-channel transcription. For more information, refer to the AWS SDK StartStreamTranscriptionCommand documentation.
import {
  LanguageCode,
  MediaEncoding,
  StartStreamTranscriptionCommand,
} from '@aws-sdk/client-transcribe-streaming'

const command = new StartStreamTranscriptionCommand({
    LanguageCode: LanguageCode.EN_US,
    MediaEncoding: MediaEncoding.PCM,
    MediaSampleRateHertz: SAMPLE_RATE,
    NumberOfChannels: 2,
    EnableChannelIdentification: true,
    ShowSpeakerLabel: true,
    AudioStream: getAudioStream(audioIterator),
  })
  6. After you send the request, a WebSocket connection is created to exchange audio stream data and Amazon Transcribe results:
const data = await client.send(command)
for await (const event of data.TranscriptResultStream) {
    for (const result of event.TranscriptEvent.Transcript.Results || []) {
        callback({ ...result })
    }
}

The result object will include a ChannelId property that you can use to identify your microphone source, such as ch_0 and ch_1, respectively.
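
Building on the loop above, a small routing step (illustrative, with assumed labels) can map each ChannelId back to the microphone it represents before the transcript is displayed or stored:

// Map channel IDs to the microphones configured earlier (labels are assumptions).
const channelLabels: Record<string, string> = {
  ch_0: 'Microphone A (left channel)',
  ch_1: 'Microphone B (right channel)',
}

type ChannelResult = {
  ChannelId?: string
  IsPartial?: boolean
  Alternatives?: { Transcript?: string }[]
}

const handleResult = (result: ChannelResult) => {
  const source = channelLabels[result.ChannelId ?? ''] ?? 'Unknown source'
  const text = result.Alternatives?.[0]?.Transcript ?? ''
  // Only surface finalized segments; partial results keep changing as audio streams in.
  if (!result.IsPartial && text) {
    console.log(`${source}: ${text}`)
  }
}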

Deep dive: Audio Worklet

Audio Worklets can execute in a separate thread to provide very low-latency audio processing. The implementation and demo source code can be found in the public/worklets/recording-processor.js file.

For our case, we use the Worklet to perform two main tasks:

  1. Process the mergerNode audio in an iterable way. This node includes both of our audio channels and is the input to our Worklet.
  2. Encode the data bytes of the mergerNode output into PCM signed 16-bit little-endian audio format. We do this for each iteration or when required to emit a message payload to our application.

The general code structure to implement this is as follows:

class RecordingProcessor extends AudioWorkletProcessor {
  constructor(options) {
    super()
  }
  process(inputs, outputs) {...}
}

registerProcessor('recording-processor', RecordingProcessor)

You can pass custom options to this Worklet instance using the processorOptions attribute. In our demo, we set maxFrameCount: (SAMPLE_RATE * 4) / 10 (roughly 0.4 seconds of audio at 16 kHz) as a guide for determining when to emit a new message payload. An example message looks like this:

this.port.postMessage({
  message: 'SHARE_RECORDING_BUFFER',
  buffer: this._recordingBuffer,
  recordingLength: this.recordedFrames,
  audioData: new Uint8Array(pcmEncodeArray(this._recordingBuffer)), // PCM encoded audio format
})
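
For orientation, a process() method inside the RecordingProcessor class above might look roughly like the following sketch. It assumes the constructor stored maxFrameCount from processorOptions and preallocated this._recordingBuffer as two Float32Arrays of at least maxFrameCount + 128 samples; the actual recording-processor.js in the demo may differ in detail.

process(inputs) {
  const input = inputs[0] // output of mergerNode: [left, right] Float32Arrays
  if (!input || input.length < 2) return true

  // Append this render quantum (typically 128 frames) to each channel's buffer.
  for (let channel = 0; channel < 2; channel++) {
    this._recordingBuffer[channel].set(input[channel], this.recordedFrames)
  }
  this.recordedFrames += input[0].length

  // Once enough frames have accumulated, emit a message like the one shown above.
  if (this.recordedFrames >= this.maxFrameCount) {
    const recorded = this._recordingBuffer.map((ch) => ch.subarray(0, this.recordedFrames))
    this.port.postMessage({
      message: 'SHARE_RECORDING_BUFFER',
      buffer: this._recordingBuffer,
      recordingLength: this.recordedFrames,
      audioData: new Uint8Array(pcmEncodeArray(recorded)),
    })
    this.recordedFrames = 0
  }
  return true // keep the processor alive
}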

PCM encoding for two channels

One of the most important sections is how to encode to PCM for two channels. Following the AWS documentation in the Amazon Transcribe API Reference, the AudioChunk size is defined by: Duration (s) * Sample Rate (Hz) * Number of Channels * 2. For two channels, 1 second at 16000Hz is: 1 * 16000 * 2 * 2 = 64000 bytes. Our encoding function should then look like this:

// Notice that input is an array, where each element is a channel with Float32 values between -1.0 and 1.0 from the AudioWorkletProcessor.
const pcmEncodeArray = (input: Float32Array[]) => {
  const numChannels = input.length
  const numSamples = input[0].length
  const bufferLength = numChannels * numSamples * 2 // 2 bytes per sample per channel
  const buffer = new ArrayBuffer(bufferLength)
  const view = new DataView(buffer)

  let index = 0

  for (let i = 0; i < numSamples; i++) {
    // Encode for each channel
    for (let channel = 0; channel < numChannels; channel++) {
      const s = Math.max(-1, Math.min(1, input[channel][i]))
      // Convert the 32 bit float to 16 bit PCM audio waveform samples.
      // Max value: 32767 (0x7FFF), Min value: -32768 (-0x8000) 
      view.setInt16(index, s < 0 ? s * 0x8000 : s * 0x7fff, true)
      index += 2
    }
  }
  return buffer
}
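
As a quick, illustrative sanity check of the byte-size formula above: encoding 128 frames from two channels should produce 128 * 2 channels * 2 bytes = 512 bytes of interleaved 16-bit PCM.

const left = new Float32Array(128).fill(0.25)
const right = new Float32Array(128).fill(-0.25)
const pcm = pcmEncodeArray([left, right]) // returns an ArrayBuffer
console.log(pcm.byteLength) // 512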

For more information about how the audio data blocks are handled, see AudioWorkletProcessor: process() method. For more information on PCM format encoding, see Multimedia Programming Interface and Data Specifications 1.0.

Conclusion

In this post, we explored the implementation details of a web application that uses the browser’s Web Audio API and Amazon Transcribe streaming to enable real-time dual-channel transcription. By using the combination of AudioContext, ChannelMergerNode, and AudioWorklet, we were able to seamlessly process and encode the audio data from two microphones before sending it to Amazon Transcribe for transcription. The use of the AudioWorklet in particular allowed us to achieve low-latency audio processing, providing a smooth and responsive user experience.

You can build upon this demo to create more advanced real-time transcription applications that cater to a wide range of use cases, from meeting recordings to voice-controlled interfaces.

Try out the solution for yourself, and leave your feedback in the comments.


About the Author

Jorge Lanzarotti is a Sr. Prototyping SA at Amazon Web Services (AWS) based in Tokyo, Japan. He helps customers in the public sector by creating innovative solutions to challenging problems.

Read More

How Kepler democratized AI access and enhanced client services with Amazon Q Business

This is a guest post co-authored by Evan Miller, Noah Kershaw, and Valerie Renda of Kepler Group

At Kepler, a global full-service digital marketing agency serving Fortune 500 brands, we understand the delicate balance between creative marketing strategies and data-driven precision. Our company name draws inspiration from the visionary astronomer Johannes Kepler, reflecting our commitment to bringing clarity to complex challenges and illuminating the path forward for our clients.

In this post, we share how implementing Amazon Q Business transformed our operations by democratizing AI access across our organization while maintaining stringent security standards, resulting in an average savings of 2.7 hours per week per employee in manual work and improved client service delivery.

The challenge: Balancing innovation with security

As a digital marketing agency working with Fortune 500 clients, we faced increasing pressure to use AI capabilities while making sure that we maintain the highest levels of data security. Our previous solution lacked essential features, which led team members to consider more generic solutions. Specifically, the original implementation was missing critical capabilities such as chat history functionality, preventing users from accessing or referencing their prior conversations. This absence of conversation context meant users had to repeatedly provide background information in each interaction. Additionally, the solution had no file upload capabilities, limiting users to text-only interactions. These limitations resulted in a basic AI experience where users often had to compromise by rewriting prompts, manually maintaining context, and working around the inability to process different file formats. The restricted functionality ultimately pushed teams to explore alternative solutions that could better meet their comprehensive needs.

Being an International Organization for Standardization (ISO) 27001-certified organization, we needed an enterprise-grade solution that would meet our strict security requirements without compromising on functionality. Our ISO 27001 certification mandates rigorous security controls, which meant that public AI tools weren’t suitable for our needs. We required a solution that could be implemented within our secure environment while maintaining full compliance with our stringent security protocols.

Why we chose Amazon Q Business

Our decision to implement Amazon Q Business was driven by three key factors that aligned perfectly with our needs. First, because our Kepler Intelligence Platform (Kip) infrastructure already resided on Amazon Web Services (AWS), the integration process was seamless. Our Amazon Q Business implementation uses three core connectors (Amazon Simple Storage Service (Amazon S3), Google Drive, and Amazon Athena), though our wider data ecosystem includes 35–45 different platform integrations, primarily flowing through Amazon S3. Second, the commitment from Amazon Q Business to not use our data for model training satisfied our essential security requirements. Finally, the Amazon Q Business apps functionality enabled us to develop no-code solutions for everyday challenges, democratizing access to efficient workflows without requiring additional software developers.

Implementation journey

We began our Amazon Q Business implementation journey in early 2025 with a focused pilot group of 10 participants, expanding to 100 users in February and March, with plans for a full deployment reaching 500+ employees. During this period, we organized an AI-focused hackathon that catalyzed organic adoption and sparked creative solutions. The implementation was unique in how we integrated Amazon Q Business into our existing Kepler Intelligence Platform, rebranding it as Kip AI to maintain consistency with our internal systems.

Kip AI demonstrates how we’ve comprehensively integrated AI capabilities with our existing data infrastructure. We use multiple data sources, including Amazon S3 for our storage needs, Amazon QuickSight for our business intelligence requirements, and Google Drive for team collaboration. At the heart of our system is our custom extract, transform, and load (ETL) pipeline (Kip SSoT), which we’ve designed to feed data into QuickSight for AI-enabled analytics. We’ve configured Amazon Q Business to seamlessly connect with these data sources, allowing our team members to access insights through both a web interface and browser extension. The following figure shows the architecture of Kip AI.

This integrated approach helps ensure that our employees can securely access AI capabilities while meeting the data governance and security requirements crucial for our clients. Access to the platform is secured through AWS Identity and Access Management (IAM), connected to our single sign-on provider, ensuring that only authorized personnel can use the system. This careful approach to security and access management has been essential in maintaining our clients’ trust while rolling out AI capabilities across our organization.

Transformative use cases and results

The implementation of Amazon Q Business has revolutionized several key areas of our operations. Our request for information (RFI) response process, which traditionally consumed significant time and resources, has been streamlined dramatically. Teams now report saving over 10 hours per RFI response, allowing us to pursue more business opportunities efficiently.

Client communications have also seen substantial improvements. The platform helps us draft clear, consistent, and timely communications, from routine emails to comprehensive status reports and presentations. This enhancement in communication quality has strengthened our client relationships and improved service delivery.

Perhaps most significantly, we’ve achieved remarkable efficiency gains across the organization. Our employees report saving an average of 2.7 hours per week in manual work, with user satisfaction rates exceeding 87%. The platform has enabled us to standardize our approach to insight generation, ensuring consistent, high-quality service delivery across all client accounts.

Looking ahead

As we expand Amazon Q Business access to all Kepler employees (over 500) in the coming months, we’re maintaining a thoughtful approach to deployment. We recognize that some clients have specific requirements regarding AI usage, and we’re carefully balancing innovation with client preferences. This strategic approach includes working to update client contracts and helping clients become more comfortable with AI integration while respecting their current guidelines.

Conclusion

Our experience with Amazon Q Business demonstrates how enterprise-grade AI can be successfully implemented while maintaining strict security standards and respecting client preferences. The platform has not only improved our operational efficiency but has also enhanced our ability to deliver consistent, high-quality service to our clients. What’s particularly impressive is the platform’s rapid deployment capabilities—we were able to implement the solution within weeks, without any coding requirements, and eliminate ongoing model maintenance and data source management expenses. As we continue to expand our use of Amazon Q Business, we’re excited about the potential for further innovation and efficiency gains in our digital marketing services.


About the authors

Evan Miller, Global Head of Product and Data Science, is a strategic product leader who joined Kepler in 2013. In his current role, he owns the end-to-end product strategy for the Kepler Intelligence Platform (Kip). Under his leadership, Kip has garnered industry recognition, winning awards for Best Performance Management Solution and Best Commerce Technology, while driving significant business impact through innovative features like automated Machine Learning analytics and Marketing Mix Modeling technology.

Noah Kershaw leads the product team at Kepler Group, a global digital marketing agency that helps brands connect with their audiences through data-driven strategies. With a passion for innovation, Noah has been at the forefront of integrating AI solutions to enhance client services and streamline operations. His collaborative approach and enthusiasm for leveraging technology have been key in bringing Kepler’s “Future in Focus” vision to life, helping Kepler and its clients navigate the modern era of marketing with clarity and precision.

Valerie Renda, Director of Data Strategy & Analytics, has a specialized focus on data strategy, analytics, and marketing systems strategy within digital marketing, a field she’s worked in for over eight years. At Kepler, she has made significant contributions to various clients’ data management and martech strategies. She has been instrumental in leading data infrastructure projects, including customer data platform implementations, business intelligence visualization implementations, server-side tracking, martech consolidation, tag migrations, and more. She has also led the development of workflow tools to automate data processes and streamline ad operations to improve internal organizational processes.

Al Destefano is a Sr. Generative AI Specialist on the Amazon Q GTM team based in New York City. At AWS, he uses technical knowledge and business experience to communicate the tangible enterprise benefits of using managed generative AI services from AWS.

Sunanda Patel is a Senior Account Manager with over 15 years of expertise in management consulting and IT sectors, with a focus on business development and people management. Throughout her career, Sunanda has successfully managed diverse client relationships, ranging from non-profit to corporate and large multinational enterprises. Sunanda joined AWS in 2022 as an Account Manager for the Manhattan Commercial sector and now works with strategic commercial accounts, helping them grow in their cloud journey to achieve complex business goals.

Kumar Karra is a Sr. Solutions Architect at AWS supporting SMBs. He is an experienced engineer with deep experience in the software development lifecycle. Kumar looks to solve challenging problems by applying technical, leadership, and business skills. He holds a Master’s Degree in Computer Science and Machine Learning from Georgia Institute of Technology and is based in New York (US).

Read More