No-code data preparation for time series forecasting using Amazon SageMaker Canvas

Time series forecasting helps businesses predict future trends based on historical data patterns, whether it’s for sales projections, inventory management, or demand forecasting. Traditional approaches require extensive knowledge of statistical and data science methods to process raw time series data.

Amazon SageMaker Canvas offers no-code solutions that simplify data wrangling, making time series forecasting accessible to all users regardless of their technical background. In this post, we explore how SageMaker Canvas and SageMaker Data Wrangler provide no-code data preparation techniques that empower users of all backgrounds to prepare data and build time series forecasting models in a single interface with confidence.

Solution overview

Using SageMaker Data Wrangler for data preparation lets you modify data for predictive analytics without programming knowledge. In this solution, we demonstrate the steps in this process. The solution includes the following:

  • Data import from various sources
  • Automated no-code algorithmic recommendations for data preparation
  • Step-by-step processes for preparation and analysis
  • Visual interfaces for data visualization and analysis
  • Export capabilities after data preparation
  • Built-in security and compliance features

In this post, we focus on data preparation for time series forecasting using SageMaker Canvas.

Walkthrough

The following is a walkthrough of the solution for data preparation using Amazon SageMaker Canvas. For the walkthrough, you use the consumer electronics synthetic dataset found in this SageMaker Canvas Immersion Day lab, which we encourage you to try. This consumer electronics related time series (RTS) dataset primarily contains historical price data that corresponds to sales transactions over time. This dataset is designed to complement target time series (TTS) data to improve prediction accuracy in forecasting models, particularly for consumer electronics sales, where price changes can significantly impact buying behavior. The dataset can be used for demand forecasting, price optimization, and market analysis in the consumer electronics sector.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Solution walkthrough

In this walkthrough, you import a dataset, prepare the data with no code using Data Wrangler, and then train a time series forecasting model using SageMaker Canvas.

Sign in to the AWS Management Console, go to Amazon SageMaker AI, and then choose Canvas. On the Get started page, choose Import and prepare. Because this walkthrough uses tabular data for time series forecasting, select Tabular Data. You will then see the following options for importing your dataset into SageMaker Data Wrangler:

  1. Local upload
  2. Canvas Datasets
  3. Amazon S3
  4. Amazon Redshift
  5. Amazon Athena
  6. Databricks
  7. MySQL
  8. PostgreSQL
  9. SQL Server
  10. RDS

For this demo, select Local upload. With this option, the data is stored on an Amazon Elastic File System (Amazon EFS) storage volume in the SageMaker Studio environment and is tied to that SageMaker Studio instance. For more permanent storage and long-term data management when working with SageMaker Data Wrangler, Amazon Simple Storage Service (Amazon S3) is recommended.

Select the consumer_electronics.csv file from the prerequisites. After selecting the file to import, you can use the Import settings panel to set your desired configurations. For the purpose of this demo, leave the options at their default values.

Import tabular data screen with sampling methods and sampling size

After the import is complete, use the Data flow options to modify the newly imported data. For forecasting, you may need to clean up the data so that the service can correctly interpret the values and disregard any errors. SageMaker Canvas offers several ways to accomplish this, including Chat for data prep, which applies data modifications through natural language, and Add Transform. Chat for data prep may be best for users who prefer natural language processing (NLP) interactions and may not be familiar with technical data transformations. Add Transform is best for data professionals who know which transformations they want to apply to their data.

For time series forecasting using Amazon SageMaker Canvas, data must be prepared in a certain way for the service to properly forecast and understand the data. To make a time series forecast using SageMaker Canvas, the SageMaker Canvas documentation lists the following requirements:

  • A timestamp column with all values having the datetime type.
  • A target column that has the values that you’re using to forecast future values.
  • An item ID column that contains unique identifiers for each item in your dataset, such as SKU numbers.

The datetime values in the timestamp column must use one of the following formats:

  • YYYY-MM-DD HH:MM:SS
  • YYYY-MM-DDTHH:MM:SSZ
  • YYYY-MM-DD
  • MM/DD/YY
  • MM/DD/YY HH:MM
  • MM/DD/YYYY
  • YYYY/MM/DD HH:MM:SS
  • YYYY/MM/DD
  • DD/MM/YYYY
  • DD/MM/YY
  • DD-MM-YY
  • DD-MM-YYYY

You can make forecasts for the following intervals:

  • 1 min
  • 5 min
  • 15 min
  • 30 min
  • 1 hour
  • 1 day
  • 1 week
  • 1 month
  • 1 year

For this example, remove the $ in the data by using the Chat for data prep option. Give the chat a prompt such as "Can you get rid of the $ in my data", and it will generate code that fulfills your request and modifies the data, giving you a no-code way to prepare the data for modeling and predictive analysis. Choose Add to Steps to accept this code and apply the changes to the data.
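Behind the scenes, Chat for data prep generates transformation code on your behalf. Purely for illustration, the generated step might be roughly equivalent to the following pandas sketch (the file path is illustrative; price and ts are the price and timestamp columns used later in this walkthrough):

import pandas as pd

# Load the locally uploaded dataset (path is illustrative)
df = pd.read_csv("consumer_electronics.csv")

# Strip the "$" symbol and cast the price column to a numeric (float) type
df["price"] = (
    df["price"].astype(str).str.replace("$", "", regex=False).astype(float)
)

# Parse the timestamp column so it uses a supported datetime format
df["ts"] = pd.to_datetime(df["ts"], errors="coerce")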

Chat for data prep options

You can also convert values to the float data type and check for missing data in your uploaded CSV file using either Chat for data prep or Add Transform. To drop missing values using Add Transform:

  1. Select Add Transform from the interface
  2. Choose Handle Missing from the transform options
  3. Select Drop missing from the available operations
  4. Choose the columns you want to check for missing values
  5. Select Preview to verify the changes
  6. Choose Add to confirm and apply the transformation

SageMaker Data Wrangler interface displaying consumer electronics data, column distributions, and options to handle missing values across all columns

For time series forecasting, inferring missing values and resampling the dataset to a certain frequency (hourly, daily, or weekly) are also important. In SageMaker Data Wrangler, you can change the frequency of the data by choosing Add Transform, selecting Time Series, selecting Resample from the Transform dropdown, and then choosing the timestamp column from the Timestamp dropdown (ts in this example). You can then set advanced options, for example, choosing Frequency unit and selecting the desired frequency from the list.

SageMaker Data Wrangler interface featuring consumer electronics data, column-wise visualizations, and time series resampling configuration

SageMaker Data Wrangler offers several methods to handle missing values in time-series data through its Handle missing transform. You can choose from options such as forward fill or backward fill, which are particularly useful for maintaining the temporal structure of the data. These operations can be applied by using natural language commands in Chat for data prep, allowing flexible and efficient handling of missing values in time-series forecasting preparation.
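As a rough pandas equivalent of the Resample and forward-fill transforms described above (continuing from the earlier sketch's df; the item_id column name is an assumption about the dataset's item identifier):

# Resample each item's series to a daily frequency and forward fill the gaps.
# df is the DataFrame from the previous sketch.
daily = (
    df.set_index("ts")
      .groupby("item_id")["price"]
      .resample("D")
      .mean()              # aggregate multiple transactions within each day
      .groupby(level=0)
      .ffill()             # forward fill missing days within each item
      .reset_index()
)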
Data preprocessing interface displaying retail demand dataset with visualization, statistics, and imputation configuration

To create the data flow, choose Create model. Then, choose Run Validation, which checks the data to make sure the processes were done correctly. After this step of data transformation, you can access additional options by selecting the purple plus sign. The options include Get data insights, Chat for data prep, Combine data, Create model, and Export.

Data Wrangler interface displaying validated data flow from local upload to drop missing step, with additional data preparation options

The prepared data can then be connected to SageMaker AI for time series forecasting strategies, in this case, to predict the future demand based on the historical data that has been prepared for machine learning.

When using SageMaker, it is also important to consider data storage and security. For the local import feature, data is stored on Amazon EFS volumes and encrypted by default. For more permanent storage, Amazon S3 is recommended. S3 offers security features such as server-side encryption (SSE-S3, SSE-KMS, or SSE-C), fine-grained access controls through AWS Identity and Access Management (IAM) roles and bucket policies, and the ability to use VPC endpoints for added network security. To help ensure data security in either case, it’s important to implement proper access controls, use encryption for data at rest and in transit, regularly audit access logs, and follow the principle of least privilege when assigning permissions.
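If you do use Amazon S3, the following hedged boto3 sketch shows two of the controls mentioned above, default server-side encryption and blocking public access, applied to a placeholder bucket name:

import boto3

s3 = boto3.client("s3")
bucket = "my-data-wrangler-bucket"  # placeholder bucket name

# Turn on default server-side encryption (SSE-S3) for objects written to the bucket
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
    },
)

# Block all public access to the bucket
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)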

In this next step, you learn how to train a model using SageMaker Canvas. From the previous step, select the purple plus sign, choose Create Model, and then choose Export to create a model. After selecting a column to predict (price in this example), you go to the Build screen, which offers options such as Quick build and Standard build. Based on the chosen column, the model predicts future values from the prepared data.

SageMaker Canvas Version 1 model configuration interface for 3+ category price prediction with 20k sample dataset analysis

Clean up

To avoid incurring future charges, delete the SageMaker Data Wrangler data flow and, if you used Amazon S3 for storage, the S3 bucket.

  1. In the SageMaker console, navigate to Canvas
  2. Select Import and prepare
  3. Find your data flow in the list
  4. Click the three dots (⋮) menu next to your flow
  5. Select Delete to remove the data flow
    SageMaker Data Wrangler dashboard with recent data flow, last update time, and options to manage flows and create models

If you used S3 for storage:

  1. Open the Amazon S3 console
  2. Navigate to your bucket
  3. Select the bucket used for this project
  4. Choose Delete
  5. Type the bucket name to confirm deletion
  6. Select Delete bucket

Conclusion

In this post, we showed you how Amazon SageMaker Data Wrangler offers a no-code solution for time series data preparation, traditionally a task requiring technical expertise. By using the intuitive interface of the Data Wrangler console and natural language-powered tools, even users who don’t have a technical background can effectively prepare their data for future forecasting needs. This democratization of data preparation not only saves time and resources but also empowers a wider range of professionals to engage in data-driven decision-making.


About the author

Muni T. Bondu is a Solutions Architect at Amazon Web Services (AWS), based in Austin, Texas. She holds a Bachelor of Science in Computer Science, with concentrations in Artificial Intelligence and Human-Computer Interaction, from the Georgia Institute of Technology.

Read More

Build an agentic multimodal AI assistant with Amazon Nova and Amazon Bedrock Data Automation

Modern enterprises are rich in data that spans multiple modalities—from text documents and PDFs to presentation slides, images, audio recordings, and more. Imagine asking an AI assistant about your company’s quarterly earnings call: the assistant should not only read the transcript but also “see” the charts in the presentation slides and “hear” the CEO’s remarks. Gartner predicts that by 2027, 40% of generative AI solutions will be multimodal (text, image, audio, video), up from only 1% in 2023. This shift underlines how vital multimodal understanding is becoming for business applications. Achieving this requires a multimodal generative AI assistant—one that can understand and combine text, visuals, and other data types. It also requires an agentic architecture so the AI assistant can actively retrieve information, plan tasks, and make decisions on tool calling, rather than just responding passively to prompts.

In this post, we explore a solution that does exactly that—using Amazon Nova Pro, a multimodal large language model (LLM) from AWS, as the central orchestrator, along with powerful new Amazon Bedrock features like Amazon Bedrock Data Automation for processing multimodal data. We demonstrate how agentic workflow patterns such as Retrieval Augmented Generation (RAG), multi-tool orchestration, and conditional routing with LangGraph enable end-to-end solutions that artificial intelligence and machine learning (AI/ML) developers and enterprise architects can adopt and extend. We walk through an example of a financial management AI assistant that can provide quantitative research and grounded financial advice by analyzing both the earnings call (audio) and the presentation slides (images), along with relevant financial data feeds. We also highlight how you can apply this pattern in industries like finance, healthcare, and manufacturing.

Overview of the agentic workflow

The core of the agentic pattern consists of the following stages:

  • Reason – The agent (often an LLM) examines the user’s request and the current context or state. It decides what the next step should be—whether that’s providing a direct answer or invoking a tool or sub-task to get more information.
  • Act – The agent executes that step. This could mean calling a tool or function, such as a search query, a database lookup, or a document analysis using Amazon Bedrock Data Automation.
  • Observe – The agent observes the result of the action. For instance, it reads the retrieved text or data that came back from the tool.
  • Loop – With new information in hand, the agent reasons again, deciding if the task is complete or if another step is needed. This loop continues until the agent determines it can produce a final answer for the user.

This iterative decision-making enables the agent to handle complex requests that are impossible to fulfill with a single prompt. However, implementing agentic systems can be challenging. They introduce more complexity in the control flow, and naive agents can be inefficient (making too many tool calls or looping unnecessarily) or hard to manage as they scale. This is where structured frameworks like LangGraph come in. LangGraph makes it possible to define a directed graph (or state machine) of potential actions with well-defined nodes (actions like “Report Writer” or “Query Knowledge Base”) and edges (allowable transitions). Although the agent’s internal reasoning still decides which path to take, LangGraph makes sure the process remains manageable and transparent. This controlled flexibility means the assistant has enough autonomy to handle diverse tasks while making sure the overall workflow is stable and predictable.
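To make this concrete, the following is a minimal LangGraph sketch of such a graph. The node names, state fields, and routing logic are illustrative placeholders, not the exact graph used in this solution:

from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    context: str
    answer: str

def router(state: AgentState) -> AgentState:
    # Reason: decide what is needed next (placeholder logic)
    return state

def query_knowledge_base(state: AgentState) -> AgentState:
    # Act/Observe: retrieve supporting documents (placeholder)
    state["context"] = "retrieved chunks go here"
    return state

def report_writer(state: AgentState) -> AgentState:
    state["answer"] = f"Answer grounded in: {state['context']}"
    return state

def needs_retrieval(state: AgentState) -> str:
    # Conditional edge: loop through retrieval until context is available
    return "retrieve" if not state.get("context") else "write"

graph = StateGraph(AgentState)
graph.add_node("router", router)
graph.add_node("retrieve", query_knowledge_base)
graph.add_node("write", report_writer)
graph.set_entry_point("router")
graph.add_conditional_edges("router", needs_retrieval, {"retrieve": "retrieve", "write": "write"})
graph.add_edge("retrieve", "router")
graph.add_edge("write", END)

app = graph.compile()
result = app.invoke({"question": "Summarize the key risks", "context": "", "answer": ""})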

Solution overview

This solution is a financial management AI assistant designed to help analysts query portfolios, analyze companies, and generate reports. At its core is Amazon Nova, an LLM that acts as the intelligent orchestrator for inference. Amazon Nova processes text, images, or documents (like earnings call slides), and dynamically decides which tools to use to fulfill requests. Amazon Nova is optimized for enterprise tasks and supports function calling, so the model can plan actions and call tools in a structured way. With a large context window (up to 300,000 tokens in Amazon Nova Lite and Amazon Nova Pro), it can manage long documents or conversation history when reasoning.

The workflow consists of the following key components:

  • Knowledge base retrieval – Both the earnings call audio file and PowerPoint file are processed by Amazon Bedrock Data Automation, a managed service that extracts text, transcribes audio and video, and prepares data for analysis. If the user uploads a PowerPoint file, the system converts each slide into an image (PNG) for efficient search and analysis, a technique inspired by generative AI applications like Manus. Amazon Bedrock Data Automation is effectively a multimodal AI pipeline out of the box. In our architecture, Amazon Bedrock Data Automation acts as a bridge between raw data and the agentic workflow. Then Amazon Bedrock Knowledge Bases converts these chunks extracted from Amazon Bedrock Data Automation into vector embeddings using Amazon Titan Text Embeddings V2, and stores these vectors in an Amazon OpenSearch Serverless database.
  • Router agent – When a user asks a question—for example, “Summarize the key risks in this Q3 earnings report”—Amazon Nova first determines whether the task requires retrieving data, processing a file, or generating a response. It maintains memory of the dialogue, interprets the user’s request, and plans which actions to take to fulfill it. The “Memory & Planning” module in the solution diagram indicates that the router agent can use conversation history and chain-of-thought (CoT) prompting to determine next steps. Crucially, the router agent determines if the query can be answered with internal company data or if it requires external information and tools.
  • Multimodal RAG agent – For queries related to audio and video information, Amazon Bedrock Data Automation uses a unified API call to extract insights from such multimedia data and stores the extracted insights in Amazon Bedrock Knowledge Bases. Amazon Nova uses Amazon Bedrock Knowledge Bases to retrieve factual answers using semantic search (a minimal retrieval sketch follows this list). This makes sure responses are grounded in real data, minimizing hallucination. If Amazon Nova generates an answer, a secondary hallucination check cross-references the response against trusted sources to catch unsupported claims.
  • Hallucination check (quality gate) – To further verify reliability, the workflow can include a postprocessing step using a different foundation model (FM) outside of the Amazon Nova family, such as Anthropic’s Claude, Mistral, or Meta’s Llama, to grade the answer’s faithfulness. For example, after Amazon Nova generates a response, a hallucination detector model or function can compare the answer against the retrieved sources or known facts. If a potential hallucination is detected (the answer isn’t supported by the reference data), the agent can choose to do additional retrieval, adjust the answer, or escalate to a human.
  • Multi-tool collaboration – Multi-tool collaboration allows the AI to not only find information but also take actions before formulating a final answer. The supervisor agent might spawn or coordinate multiple tool-specific agents (for example, a web search agent to do a general web search, a stock search agent to get market data, or other specialized agents for company financial metrics or industry news). Each agent performs a focused task (one might call an API or perform a query on the internet) and returns findings to the supervisor agent. Amazon Nova Pro features strong reasoning ability that allows the supervisor agent to merge these findings. This multi-agent approach follows the principle of dividing complex tasks among specialist agents, improving efficiency and reliability for complex queries.
  • Report creation agent – Another notable aspect in the architecture is the use of Amazon Nova Canvas for output generation. Amazon Nova Canvas is a specialized image-generation model in the Amazon Nova family, but in this context, we use the concept of a “canvas” more figuratively to mean a structured template or format for generated content. For instance, we could define a template for an “investor report” that the assistant fills out: Section 1: Key Highlights (bullet points), Section 2: Financial Summary (table of figures), Section 3: Notable Quotes, and so on. The agent can guide Amazon Nova to populate such a template by providing it with a system prompt containing the desired format (this is similar to few-shot prompting, where the layout is given). The result is that the assistant not only answers ad-hoc questions, but can also produce comprehensive generated reports that look as if a human analyst prepared them, combining text, images, and references to visuals.

These components are orchestrated in an agentic workflow. Instead of a fixed script, the solution uses a dynamic decision graph (implemented with the open source LangGraph library in the notebook solution) to route between steps. The result is an assistant that feels less like a chatbot and more like a collaborative analyst—one that can parse an earnings call audio recording, critique a slide deck, or draft an investor memo with minimal human intervention.

The following diagram shows the high-level architecture of the agentic AI workflow. Amazon Nova orchestrates various tools—including Amazon Bedrock Data Automation for document and image processing and a knowledge base for retrieval—to fulfill complex user requests. For brevity, we don’t list all the code here; the GitHub repo includes a full working example. Developers can run that to see the agent in action and extend it with their own data.

Example of the multi-tool collaboration workflow

To demonstrate the multi-tool collaboration agent workflow, we explore an example of how a question-answer interaction might flow through our deployed system for multi-tool collaboration:

  • User prompt – In the chat UI, the end-user asks a question, such as “What is XXX’s stock performance this year, and how does it compare to its rideshare‑industry peers?”
  • Agent initial response – The agent (Amazon Nova FM orchestrator) receives the question and responds with:
    Received your question. Routing to the reasoning engine…

  • Planning and tool selection – The agent determines that it needs the following:
    • The ticker symbol for the company (XXX)
    • Real‑time stock price and YTD changes
    • Key financial metrics (revenue, net income, price-earnings ratio)
    • Industry benchmarks (peer YTD performance, average revenue growth)
  • Planning execution using tool calls – The agent calls tools to perform the following actions:
    • Look up ticker symbol:
      Agent → WebSearchTool.lookupTicker("XXX Inc")
      WebSearchTool → Agent: returns "XXX"

    • Fetch real‑time stock performance using the retrieved ticker symbol:
      Agent → StockAnalysisTool.getPerformance(
       symbol="XXX",
       period="YTD"
       )
      StockAnalysisTool → Agent:
       {
       currentPrice: 
       ytdChange: 
       52wkRange: 
       volume: 
       }

    • Retrieve company financial metrics using the retrieved ticker symbol:
      Agent → CompanyFinancialAnalysisTool.getMetrics("XXX")
      CompanyFinancialAnalysisTool → Agent:
       {
       revenueQ4_2024: xxx B,
       netIncomeQ4_2024: xxx M,
       peRatio: xxx
       }

    • Gather industry benchmark data using the retrieved ticker symbol:
      Agent → IndustryAnalysisTool.comparePeers(
       symbol="XXX",
       sector="Rideshare"
       )
      IndustryAnalysisTool → Agent:
       {
       avgPeerYTD:
       avgRevenueGrowth: 
       }

    • Validation loop – The agent runs a validation loop:
      Agent: validate()
       ↳ Are all four data points present?
       • Ticker ✓
       • Stock performance ✓
       • Financial metrics ✓
       • Industry benchmark ✓
       ↳ All set—no retry needed.

If anything is missing or a tool encountered an error, the FM orchestrator triggers the error handler (up to three retries), then resumes the plan at the failed step.

  • Synthesis and final answer – The agent uses Amazon Nova Pro to synthesize the data points and generate final answers based on these data points.

The following figure shows a flow diagram of this multi-tool collaboration agent.
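Tool use with Amazon Nova is typically expressed through the Amazon Bedrock Converse API. The following hedged sketch shows how a stock performance tool like the one in the walkthrough above might be declared; the tool name, input schema, and model ID are illustrative and not the exact definitions from the sample notebook:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

tool_config = {
    "tools": [
        {
            "toolSpec": {
                "name": "get_stock_performance",  # placeholder tool name
                "description": "Return YTD stock performance for a ticker symbol",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "symbol": {"type": "string"},
                            "period": {"type": "string"},
                        },
                        "required": ["symbol"],
                    }
                },
            }
        }
    ]
}

response = bedrock_runtime.converse(
    modelId="amazon.nova-pro-v1:0",  # example Amazon Nova Pro model ID
    messages=[{"role": "user", "content": [{"text": "How has XXX performed this year?"}]}],
    toolConfig=tool_config,
)

# If the model decides to call the tool, the response contains a toolUse block
print(response["output"]["message"]["content"])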

Benefits of using Amazon Bedrock for scalable generative AI agent workflows

This solution is built on Amazon Bedrock because AWS provides an integrated ecosystem for building such sophisticated solutions at scale:

  • Amazon Bedrock delivers top-tier FMs like Amazon Nova, with managed infrastructure—no need for provisioning GPU servers or handling scaling complexities.
  • Amazon Bedrock Data Automation offers an out-of-the-box solution to process documents, images, audio, and video into actionable data. Amazon Bedrock Data Automation can convert presentation slides to images, convert audio to text, perform OCR, and generate textual summaries or captions that are then indexed in Amazon Bedrock Knowledge Bases.
  • Amazon Bedrock Knowledge Bases can store embeddings from unstructured data and support retrieval operations using similarity search.
  • In addition to LangGraph (as shown in this solution), you can also use Amazon Bedrock Agents to develop agentic workflows. Amazon Bedrock Agents simplifies the configuration of tool flows and action groups, so you can declaratively manage your agentic workflows.
  • Applications developed with open source frameworks like LangGraph (an extension of LangChain) can also run and scale on AWS infrastructure such as Amazon Elastic Compute Cloud (Amazon EC2) or Amazon SageMaker instances, so you can define directed graphs for agent orchestration and manage multi-step reasoning and tool chaining with ease.

You don’t need to assemble a dozen disparate systems; AWS provides an integrated network for generative AI workflows.

Considerations and customizations

The architecture demonstrates exceptional flexibility through its modular design principles. At its core, the system uses Amazon Nova FMs, which can be selected based on task complexity. Amazon Nova Micro handles straightforward tasks like classification with minimal latency. Amazon Nova Lite manages moderately complex operations with balanced performance, and Amazon Nova Pro excels at sophisticated tasks requiring advanced reasoning or generating comprehensive responses.

The modular nature of the solution (Amazon Nova, tools, knowledge base, and Amazon Bedrock Data Automation) means each piece can be swapped or adjusted without overhauling the whole system. Solution architects can use this reference architecture as a foundation, implementing customizations as needed. You can seamlessly integrate new capabilities through AWS Lambda functions for specialized operations, and the LangGraph orchestration enables dynamic model selection and sophisticated routing logic. This architectural approach makes sure the system can evolve organically while maintaining operational efficiency and cost-effectiveness.

Bringing it to production requires thoughtful design, but AWS offers scalability, security, and reliability. For instance, you can secure the knowledge base content with encryption and access control, integrate the agent with AWS Identity and Access Management (IAM) to make sure it only performs allowed actions (for example, if an agent can access sensitive financial data, verify it checks user permissions), and monitor the costs (you can track Amazon Bedrock pricing and tool usage; you might use Provisioned Throughput for consistent high-volume usage). Additionally, with AWS, you can scale from an experiment in a notebook to a full production deployment when you’re ready, using the same building blocks (integrated with proper AWS infrastructure like Amazon API Gateway or Lambda, if deploying as a service).

Vertical industries that can benefit from this solution

The architecture we described is quite general. Let’s briefly look at how this multimodal agentic workflow can drive value in different industries:

  • Financial services – In the financial sector, the solution integrates multimedia RAG to unify earnings call transcripts, presentation slides (converted to searchable images), and real-time market feeds into a single analytical framework. Multi-agent collaboration enables Amazon Nova to orchestrate tools like Amazon Bedrock Data Automation for slide text extraction, semantic search for regulatory filings, and live data APIs for trend detection. This allows the system to generate actionable insights—such as identifying portfolio risks or recommending sector rebalancing—while automating content creation for investor reports or trade approvals (with human oversight). By mimicking an analyst’s ability to cross-reference data types, the AI assistant transforms fragmented inputs into cohesive strategies.
  • Healthcare – Healthcare workflows use multimedia RAG to process clinical notes, lab PDFs, and X-rays, grounding responses in peer-reviewed literature and patient audio interview. Multi-agent collaboration excels in scenarios like triage: Amazon Nova interprets symptom descriptions, Amazon Bedrock Data Automation extracts text from scanned documents, and integrated APIs check for drug interactions, all while validating outputs against trusted sources. Content creation ranges from succinct patient summaries (“Severe pneumonia, treated with levofloxacin”) to evidence-based answers for complex queries, such as summarizing diabetes guidelines. The architecture’s strict hallucination checks and source citations support reliability, which is critical for maintaining trust in medical decision-making.
  • Manufacturing – Industrial teams use multimedia RAG to index equipment manuals, sensor logs, worker audio conversation, and schematic diagrams, enabling rapid troubleshooting. Multi-agent collaboration allows Amazon Nova to correlate sensor anomalies with manual excerpts, and Amazon Bedrock Data Automation highlights faulty parts in technical drawings. The system generates repair guides (for example, “Replace valve Part 4 in schematic”) or contextualizes historical maintenance data, bridging the gap between veteran expertise and new technicians. By unifying text, images, and time series data into actionable content, the assistant reduces downtime and preserves institutional knowledge—proving that even in hardware-centric fields, AI-driven insights can drive efficiency.

These examples highlight a common pattern: the synergy of data automation, powerful multimodal models, and agentic orchestration leads to solutions that closely mimic a human expert’s assistance. The financial AI assistant cross-checks figures and explanations like an analyst would, the clinical AI assistant correlates images and notes like a diligent doctor, and the industrial AI assistant recalls diagrams and logs like a veteran engineer. All of this is made possible by the underlying architecture we’ve built.

Conclusion

The era of siloed AI models that only handle one type of input is drawing to a close. As we’ve discussed, combining multimodal AI with an agentic workflow unlocks a new level of capability for enterprise applications. In this post, we demonstrated how to construct such a workflow using AWS services: we used Amazon Nova as the core AI orchestrator with its multimodal, agent-friendly capabilities, Amazon Bedrock Data Automation to automate the ingestion and indexing of complex data (documents, slides, audio) into Amazon Bedrock Knowledge Bases, and the concept of an agentic workflow graph for reasoning and conditional routing (using LangChain or LangGraph) to orchestrate multi-step reasoning and tool usage. The end result is an AI assistant that operates much like a diligent analyst: researching, cross-checking multiple sources, and delivering insights—but at machine speed and scale.

The solution demonstrates that building a sophisticated agentic AI system is no longer an academic dream—it’s practical and achievable with today’s AWS technologies. By using Amazon Nova as a powerful multimodal LLM and Amazon Bedrock Data Automation for multimodal data processing, along with frameworks for tool orchestration like LangGraph (or Amazon Bedrock Agents), developers get a head start. Many challenges (like OCR, document parsing, or conversational orchestration) are handled by these managed services or libraries, so you can focus on the business logic and domain-specific needs.

The solution presented in the BDA_nova_agentic sample notebook is a great starting point to experiment with these ideas. We encourage you to try it out, extend it, and tailor it to your organization’s needs. We’re excited to see what you will build—the techniques discussed here represent only a small portion of what’s possible when you combine modalities and intelligent agents.


About the authors

Julia Hu Julia Hu is a Sr. AI/ML Solutions Architect at Amazon Web Services, currently focused on the Amazon Bedrock team. Her core expertise lies in agentic AI, where she explores the capabilities of foundation models and AI agents to drive productivity in Generative AI applications. With a background in Generative AI, Applied Data Science, and IoT architecture, she partners with customers—from startups to large enterprises—to design and deploy impactful AI solutions.

Rui Cardoso is a partner solutions architect at Amazon Web Services (AWS). He is focusing on AI/ML and IoT. He works with AWS Partners and support them in developing solutions in AWS. When not working, he enjoys cycling, hiking and learning new things.

Jessie-Lee Fry is a Product and Go-to-Market (GTM) Strategy executive specializing in Generative AI and Machine Learning, with over 15 years of global leadership experience in Strategy, Product, Customer success, Business Development, Business Transformation and Strategic Partnerships. Jessie has defined and delivered a broad range of products and cross-industry go-to-market strategies, driving business growth while maneuvering market complexities and C-Suite customer groups. In her current role, Jessie and her team focus on helping AWS customers adopt Amazon Bedrock at scale with enterprise use cases and adoption frameworks, meeting customers where they are in their Generative AI journey.

Read More

Build a scalable AI video generator using Amazon SageMaker AI and CogVideoX

In recent years, the rapid advancement of artificial intelligence and machine learning (AI/ML) technologies has revolutionized various aspects of digital content creation. One particularly exciting development is the emergence of video generation capabilities, which offer unprecedented opportunities for companies across diverse industries. This technology allows for the creation of short video clips that can be seamlessly combined to produce longer, more complex videos. The potential applications of this innovation are vast and far-reaching, promising to transform how businesses communicate, market, and engage with their audiences.

Video generation technology presents a myriad of use cases for companies looking to enhance their visual content strategies. For instance, ecommerce businesses can use this technology to create dynamic product demonstrations, showcasing items from multiple angles and in various contexts without the need for extensive physical photoshoots. In the realm of education and training, organizations can generate instructional videos tailored to specific learning objectives, quickly updating content as needed without re-filming entire sequences. Marketing teams can craft personalized video advertisements at scale, targeting different demographics with customized messaging and visuals. Furthermore, the entertainment industry stands to benefit greatly, with the ability to rapidly prototype scenes, visualize concepts, and even assist in the creation of animated content.

The flexibility offered by combining these generated clips into longer videos opens up even more possibilities. Companies can create modular content that can be quickly rearranged and repurposed for different displays, audiences, or campaigns. This adaptability not only saves time and resources, but also allows for more agile and responsive content strategies. As we delve deeper into the potential of video generation technology, it becomes clear that its value extends far beyond mere convenience, offering a transformative tool that can drive innovation, efficiency, and engagement across the corporate landscape.

In this post, we explore how to implement a robust AWS-based solution for video generation that uses the CogVideoX model and Amazon SageMaker AI.

Solution overview

Our architecture delivers a highly scalable and secure video generation solution using AWS managed services. The data management layer implements three purpose-specific Amazon Simple Storage Service (Amazon S3) buckets—for input videos, processed outputs, and access logging—each configured with appropriate encryption and lifecycle policies to support data security throughout its lifecycle.

For compute resources, we use AWS Fargate for Amazon Elastic Container Service (Amazon ECS) to host the Streamlit web application, providing serverless container management with automatic scaling capabilities. Traffic is efficiently distributed through an Application Load Balancer. The AI processing pipeline uses SageMaker AI processing jobs to handle video generation tasks, decoupling intensive computation from the web interface for cost optimization and enhanced maintainability. User prompts are refined through Amazon Bedrock, which feeds into the CogVideoX-5b model for high-quality video generation, creating an end-to-end solution that balances performance, security, and cost-efficiency.

The following diagram illustrates the solution architecture.

Solution Architecture
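As a sketch of how the decoupled video generation step could be launched, the following uses the SageMaker Python SDK to start a processing job on the ml.g5.4xlarge instance type referenced in the prerequisites. The container image URI, IAM role, script name, and bucket are placeholders, not the exact resources created by the AWS CDK stack:

from sagemaker.processing import ScriptProcessor, ProcessingOutput

# Placeholder values: the container image, role, script, and bucket are assumptions
processor = ScriptProcessor(
    image_uri="<account>.dkr.ecr.us-east-1.amazonaws.com/cogvideox-inference:latest",
    command=["python3"],
    role="arn:aws:iam::<account>:role/SageMakerExecutionRole",
    instance_type="ml.g5.4xlarge",
    instance_count=1,
)

processor.run(
    code="generate_video.py",  # hypothetical script that loads CogVideoX and writes the MP4
    arguments=["--prompt", "A bee on a flower."],
    outputs=[
        ProcessingOutput(
            source="/opt/ml/processing/output",
            destination="s3://<output-bucket>/videos/",
        )
    ],
)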

CogVideoX model

CogVideoX is an open source, state-of-the-art text-to-video generation model capable of producing 10-second continuous videos at 16 frames per second with a resolution of 768×1360 pixels. The model effectively translates text prompts into coherent video narratives, addressing common limitations in previous video generation systems.

The model uses three key innovations:

  • A 3D Variational Autoencoder (VAE) that compresses videos along both spatial and temporal dimensions, improving compression efficiency and video quality
  • An expert transformer with adaptive LayerNorm that enhances text-to-video alignment through deeper fusion between modalities
  • Progressive training and multi-resolution frame pack techniques that enable the creation of longer, coherent videos with significant motion elements

CogVideoX also benefits from an effective text-to-video data processing pipeline with various preprocessing strategies and a specialized video captioning method, contributing to higher generation quality and better semantic alignment. The model’s weights are publicly available, making it accessible for implementation in various business applications, such as product demonstrations and marketing content. The following diagram shows the architecture of the model.

Model Architecture
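Because the model weights are publicly available, you can also load CogVideoX directly with the Hugging Face diffusers library outside of this solution. The following minimal sketch uses illustrative generation parameters; see the model card for recommended settings:

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the public CogVideoX-5b weights from Hugging Face
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A bee on a flower."
# Frame count, guidance scale, and fps here are illustrative defaults
frames = pipe(prompt=prompt, num_frames=49, guidance_scale=6.0).frames[0]
export_to_video(frames, "output.mp4", fps=8)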

Prompt enhancement

To improve the quality of video generation, the solution provides an option to enhance user-provided prompts. This is done by instructing a large language model (LLM), in this case Anthropic’s Claude, to take a user’s initial prompt and expand upon it with additional details, creating a more comprehensive description for video creation. The prompt consists of three parts:

  • Role section – Defines the AI’s purpose in enhancing prompts for video generation
  • Task section – Specifies the instructions needed to be performed with the original prompt
  • Prompt section – Where the user’s original input is inserted

By adding more descriptive elements to the original prompt, this system aims to provide richer, more detailed instructions to video generation models, potentially resulting in more accurate and visually appealing video outputs. We use the following prompt template for this solution:

"""
<Role>
Your role is to enhance the user prompt that is given to you by 
providing additional details to the prompt. The end goal is to
convert the user prompt into a short video clip, so it is necessary 
to provide as much information as you can.
</Role>
<Task>
You must add details to the user prompt in order to enhance it for
 video generation. You must provide a 1 paragraph response. No 
more and no less. Only include the enhanced prompt in your response. 
Do not include anything else.
</Task>
<Prompt>
{prompt}
</Prompt>
"""

Prerequisites

Before you deploy the solution, make sure you have the following prerequisites:

  • The AWS CDK Toolkit – Install the AWS CDK Toolkit globally using npm:
    npm install -g aws-cdk
    This provides the core functionality for deploying infrastructure as code to AWS.
  • Docker Desktop – This is required for local development and testing. It makes sure container images can be built and tested locally before deployment.
  • The AWS CLI – The AWS Command Line Interface (AWS CLI) must be installed and configured with appropriate credentials. This requires an AWS account with necessary permissions. Configure the AWS CLI using aws configure with your access key and secret.
  • Python Environment – You must have Python 3.11+ installed on your system. We recommend using a virtual environment for isolation. This is required for both the AWS CDK infrastructure and Streamlit application.
  • Active AWS account – You will need to raise a service quota request for SageMaker ml.g5.4xlarge instances for processing jobs.

Deploy the solution

This solution has been tested in the us-east-1 AWS Region. Complete the following steps to deploy:

  1. Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate
  2. Install infrastructure dependencies:
cd infrastructure
pip install -r requirements.txt
  3. Bootstrap the AWS CDK (if not already done in your AWS account):
cdk bootstrap
  4. Deploy the infrastructure:
cdk deploy -c allowed_ips='["'$(curl -s ifconfig.me)'/32"]'

To access the Streamlit UI, choose the link for StreamlitURL in the AWS CDK output logs after deployment is successful. The following screenshot shows the Streamlit UI accessible through the URL.

User interface screenshot

Basic video generation

Complete the following steps to generate a video:

  1. Input your natural language prompt into the text box at the top of the page.
  2. Copy this prompt to the text box at the bottom.
  3. Choose Generate Video to create a video using this basic prompt.

The following is the output from the simple prompt “A bee on a flower.”

Enhanced video generation

For higher-quality results, complete the following steps:

  1. Enter your initial prompt in the top text box.
  2. Choose Enhance Prompt to send your prompt to Amazon Bedrock.
  3. Wait for Amazon Bedrock to expand your prompt into a more descriptive version.
  4. Review the enhanced prompt that appears in the lower text box.
  5. Edit the prompt further if desired.
  6. Choose Generate Video to initiate the processing job with CogVideoX.

When processing is complete, your video will appear on the page with a download option. The following is an example of an enhanced prompt and output:

"""
A vibrant yellow and black honeybee gracefully lands on a large, 
blooming sunflower in a lush garden on a warm summer day. The 
bee's fuzzy body and delicate wings are clearly visible as it 
moves methodically across the flower's golden petals, collecting 
pollen. Sunlight filters through the petals, creating a soft, 
warm glow around the scene. The bee's legs are coated in pollen 
as it works diligently, its antennae twitching occasionally. In 
the background, other colorful flowers sway gently in a light 
breeze, while the soft buzzing of nearby bees can be heard
"""

Add an image to your prompt

If you want to include an image with your text prompt, complete the following steps:

  1. Complete the text prompt and optional enhancement steps.
  2. Choose Include an Image.
  3. Upload the photo you want to use.
  4. With both text and image now prepared, choose Generate Video to start the processing job.

The following is an example of the previous enhanced prompt with an included image.

To view more samples, check out the CogVideoX gallery.

Clean up

To avoid incurring ongoing charges, clean up the resources you created as part of this post:

cdk destroy

Considerations

Although our current architecture serves as an effective proof of concept, several enhancements are recommended for a production environment. Considerations include implementing Amazon API Gateway with AWS Lambda-backed REST endpoints for an improved interface and authentication, introducing a queue-based architecture using Amazon Simple Queue Service (Amazon SQS) for better job management and reliability, and enhancing error handling and monitoring capabilities.

Conclusion

Video generation technology has emerged as a transformative force in digital content creation, as demonstrated by our comprehensive AWS-based solution using the CogVideoX model. By combining powerful AWS services like Fargate, SageMaker, and Amazon Bedrock with an innovative prompt enhancement system, we’ve created a scalable and secure pipeline capable of producing high-quality video clips. The architecture’s ability to handle both text-to-video and image-to-video generation, coupled with its user-friendly Streamlit interface, makes it an invaluable tool for businesses across sectors—from ecommerce product demonstrations to personalized marketing campaigns. As showcased in our sample videos, the technology delivers impressive results that open new avenues for creative expression and efficient content production at scale. This solution represents not just a technological advancement, but a glimpse into the future of visual storytelling and digital communication.

To learn more about CogVideoX, refer to CogVideoX on Hugging Face. Try out the solution for yourself, and share your feedback in the comments.


About the Authors

Nick Biso is a Machine Learning Engineer at AWS Professional Services. He solves complex organizational and technical challenges using data science and engineering. In addition, he builds and deploys AI/ML models on the AWS Cloud. His passion extends to his proclivity for travel and diverse cultural experiences.

Natasha Tchir is a Cloud Consultant at the Generative AI Innovation Center, specializing in machine learning. With a strong background in ML, she now focuses on the development of generative AI proof-of-concept solutions, driving innovation and applied research within the GenAIIC.

Katherine Feng is a Cloud Consultant at AWS Professional Services within the Data and ML team. She has extensive experience building full-stack applications for AI/ML use cases and LLM-driven solutions.

Jinzhao Feng is a Machine Learning Engineer at AWS Professional Services. He focuses on architecting and implementing large-scale generative AI and classic ML pipeline solutions. He is specialized in FMOps, LLMOps, and distributed training.

Read More

Building trust in AI: The AWS approach to the EU AI Act

As AI adoption accelerates and reshapes our future, organizations are adapting to evolving regulatory frameworks. In our report commissioned from Strand Partners, Unlocking Europe’s AI Potential in the Digital Decade 2025, 68% of European businesses surveyed underlined that they struggle to understand their responsibilities under the EU AI Act. European businesses also highlighted that an estimated 40% of their IT spend goes towards compliance-related costs, and those uncertain about regulations plan to invest 28% less in AI over the next year. More clarity around regulation and compliance is critical to meet the competitiveness targets set out by the European Commission.

The EU AI Act

The European Union’s Artificial Intelligence Act (EU AI Act) establishes comprehensive regulations for the development, deployment, use, and provision of AI within the EU. It brings a risk-based regulatory framework with the overarching goal of protecting fundamental rights and safety. The EU AI Act entered into force on August 1, 2024, and will apply in phases, with most requirements becoming applicable over the next 14 months. The first group of obligations on prohibited AI practices and AI literacy became enforceable on February 1, 2025, with the remaining obligations to follow gradually.

AWS customers across industries use our AI services for a myriad of purposes, such as to provide better customer service, optimize their businesses, or create new experiences for their customers. We are actively evaluating how our services can best support customers to meet their compliance obligations, while maintaining AWS’s own compliance with the applicable provisions of the EU AI Act. As the European Commission continues to publish compliance guidance, such as the Guidelines of Prohibited AI Practices and the Guidelines on AI System Definition, we will continue to provide updates to our customers through our AWS Blog posts and other AWS channels.

The AWS approach to the EU AI Act

AWS has long been committed to AI solutions that are safe and respect fundamental rights. We take a people-centric approach that prioritizes education, science, and our customers’ needs to integrate responsible AI across the end-to-end AI lifecycle. As a leader in AI technology, AWS prioritizes trust in our AI offerings and supports the EU AI Act’s goal of promoting trustworthy AI products and services. We do this in several ways:

The EU AI Act requires all AI systems to meet certain requirements for fairness, transparency, accountability, and fundamental rights protection. Taking a risk-based approach, the EU AI Act establishes different categories of AI systems with corresponding requirements, and it brings obligations for all actors across the AI supply chain, including providers, deployers, distributors, users, and importers. AI systems deemed to pose unacceptable risks are prohibited. High-risk AI systems are allowed, but they are subject to stricter requirements for documentation, data governance, human oversight, and risk management procedures. In addition, certain AI systems (for example, those intended to interact directly with natural persons) are considered low risk and subject to transparency requirements. Apart from the requirements for AI systems, the EU AI Act also brings a separate set of obligations for providers of general-purpose AI (GPAI) models, depending on whether they pose systemic risks or not. The EU AI Act may apply to activities both inside and outside the EU. Therefore, even if your organization is not established in the EU, you may still be required to comply with the EU AI Act. We encourage all AWS customers to conduct a thorough assessment of their AI activities to determine whether they are subject to the EU AI Act and their specific obligations, regardless of their location.

Prohibited use cases

Beginning February 1, 2025, the EU AI Act has prohibited certain AI practices deemed to present unacceptable risks to fundamental rights. These prohibitions, a full list of which is available under Article 5 of the EU AI Act, generally focus on manipulative or exploitative practices that can be harmful or abusive and the evaluation or classification of individuals based on social behavior, personal traits, or biometric data.

AWS is committed to making sure our AI services meet applicable regulatory requirements, including those of the EU AI Act. Although AWS services support a wide range of customer use case categories, none are designed or intended for practices prohibited under the EU AI Act, and we maintain this commitment through our policies, including the AWS Acceptable Use Policy, Responsible AI Policy, and Responsible Use of AI Guide.

Compliance with the EU AI Act is a shared journey: the regulation sets out responsibilities for both developers (providers) and deployers of AI systems. Although AWS provides the building blocks for compliant solutions, AWS customers remain responsible for assessing how their use of AWS services falls under the EU AI Act, implementing appropriate controls for their AI applications, and making sure their specific use cases comply with the EU AI Act’s restrictions. We encourage AWS customers to carefully review the list of prohibited practices under the EU AI Act when building AI solutions using AWS services and to review the European Commission’s recently published guidelines on prohibited practices.

Moving forward with the EU AI Act

As the regulatory landscape continues to evolve, customers should stay informed about the EU AI Act and assess how it applies to their organization’s use of AI. AWS remains engaged with EU institutions and relevant authorities across EU member states on the enforcement of the EU AI Act. We participate in industry dialogues and contribute our knowledge and experience to support balanced outcomes that safeguard against risks of this technology, particularly where AI use cases have the potential to affect individuals’ health and safety or fundamental rights, while enabling continued AI innovation in ways that will benefit all. We will continue to update our customers through our AWS ML Blog posts and other AWS channels as new guidance emerges and additional portions of the EU AI Act take effect.

If you have questions about compliance with the EU AI Act, or if you require additional information on AWS AI governance tools and resources, please contact your account representative or request to be contacted.

If you’d like to join our community of innovators and learn about upcoming events and gain expert insights, practical guidance, and connections that help you navigate the regulatory landscape, please express interest by registering.

Read More

Update on the AWS DeepRacer Student Portal

The AWS DeepRacer Student Portal will no longer be available starting September 15, 2025. This change comes as part of the broader transition of AWS DeepRacer from a service to an AWS Solution, representing an evolution in how we deliver AI & ML education. Since its launch, the AWS DeepRacer Student Portal has helped thousands of learners begin their AI & ML journey through hands-on reinforcement learning experiences. The portal has served as a foundational stepping stone for many who have gone on to pursue career development in AI through the AWS AI & ML Scholars program, which has been re-launched with a generative AI focused curriculum.

Starting July 14, 2025, the AWS DeepRacer Student Portal will enter a maintenance phase where new registrations will be disabled. Until September 15, 2025, existing users will retain full access to their content and training materials, with updates limited to critical security fixes, after which the portal will no longer be available. Going forward, AWS DeepRacer will be available as a solution in the AWS Solutions Library, providing educational institutions and organizations with greater capabilities to build and customize their own DeepRacer learning experiences.

As part of our commitment to advancing AI & ML education, we recently launched the enhanced AWS AI & ML Scholars program on May 28, 2025. This new program embraces the latest developments in generative AI, featuring hands-on experience with AWS PartyRock and Amazon Q. The curriculum focuses on practical applications of AI technologies and emerging skills, reflecting the evolving needs of the technology industry and preparing students for careers in AI. To learn more about the new AI & ML Scholars program and continue your learning journey, visit awsaimlscholars.com. In addition, users can also explore AI learning content and build in-demand cloud skills using AWS Skill Builder.

We’re grateful to the entire AWS DeepRacer Student community for their enthusiasm and engagement, and we look forward to supporting the next chapter of your AI & ML learning journey.


About the author

Jayadev Kalla is a Product Manager with the AWS Social Responsibility and Impact team, focusing on AI & ML education. His goal is to expand access to AI education through hands-on learning experiences. Outside of work, Jayadev is a sports enthusiast and loves to cook.

Read More

Accelerate foundation model training and inference with Amazon SageMaker HyperPod and Amazon SageMaker Studio

Modern generative AI model providers require unprecedented computational scale, with pre-training often involving thousands of accelerators running continuously for days, and sometimes months. Foundation Models (FMs) demand distributed training clusters — coordinated groups of accelerated compute instances, using frameworks like PyTorch — to parallelize workloads across hundreds of accelerators (like AWS Trainium and AWS Inferentia chips or NVIDIA GPUs).

Orchestrators like SLURM and Kubernetes manage these complex workloads, scheduling jobs across nodes, managing cluster resources, and processing requests. Paired with AWS infrastructure like Amazon Elastic Compute Cloud (Amazon EC2) accelerated computing instances, Elastic Fabric Adapter (EFA), and distributed file systems like Amazon Elastic File System (Amazon EFS) and Amazon FSx, these ultra clusters can run large-scale machine learning (ML) training and inference, handling parallelism, gradient synchronization and collective communications, and even routing and load balancing. However, at scale, even robust orchestrators face challenges around cluster resilience. Distributed training workloads specifically run synchronously, because each training step requires participating instances to complete their calculations before proceeding to the next step. This means that if a single instance fails, the entire job fails. The likelihood of these failures increases with the size of the cluster.

Although resilience and infrastructure reliability can be a challenge, developer experience remains equally pivotal. Traditional ML workflows create silos, where data and research scientists prototype on local Jupyter notebooks or Visual Studio Code instances, lacking access to cluster-scale storage, and engineers manage production jobs through separate SLURM or Kubernetes (kubectl or helm, for example) interfaces. This fragmentation has consequences, including mismatches between notebook and production environments, lack of local access to cluster storage, and most importantly, sub-optimal use of ultra clusters.

In this post, we explore these challenges. In particular, we propose a solution to enhance the data scientist experience on Amazon SageMaker HyperPod—a resilient ultra cluster solution.

Amazon SageMaker HyperPod

SageMaker HyperPod is a compute environment purpose built for large-scale frontier model training. You can build resilient clusters for ML workloads and develop state-of-the-art frontier models. SageMaker HyperPod runs health monitoring agents in the background for each instance. When it detects a hardware failure, SageMaker HyperPod automatically repairs or replaces the faulty instance and resumes training from the last saved checkpoint. This automation alleviates the need for manual intervention, which means you can train in distributed settings for weeks or months with minimal disruption.

To learn more about the resilience and Total Cost of Ownership (TCO) benefits of SageMaker HyperPod, check out Reduce ML training costs with Amazon SageMaker HyperPod. As of writing this post, SageMaker HyperPod supports both SLURM and Amazon Elastic Kubernetes Service (Amazon EKS) as orchestrators.

To deploy a SageMaker HyperPod cluster, refer to the SageMaker HyperPod workshops (SLURM, Amazon EKS). To learn more about what’s being deployed, check out the architecture diagrams later in this post. You can choose to use either of the two orchestrators based on your preference.

Amazon SageMaker Studio

Amazon SageMaker Studio is a fully integrated development environment (IDE) designed to streamline the end-to-end ML lifecycle. It provides a unified, web-based interface where data scientists and developers can perform ML tasks, including data preparation, model building, training, tuning, evaluation, deployment, and monitoring.

By centralizing these capabilities, SageMaker Studio alleviates the need to switch between multiple tools, significantly enhancing productivity and collaboration. SageMaker Studio supports a variety of IDEs, such as JupyterLab notebooks, Code Editor (based on Code-OSS, Visual Studio Code Open Source), and RStudio, offering flexibility for diverse development preferences. SageMaker Studio supports private and shared spaces, so teams can collaborate effectively while optimizing resource allocation. Shared spaces allow multiple users to access the same compute resources across profiles, and private spaces provide dedicated environments for individual users. This flexibility empowers data scientists and developers to seamlessly scale their compute resources and enhance collaboration within SageMaker Studio. Additionally, it integrates with advanced tooling like managed MLflow and Partner AI Apps to streamline experiment tracking and accelerate AI-driven innovation.

Distributed file systems: Amazon FSx

Amazon FSx for Lustre is a fully managed file storage service designed to provide high-performance, scalable, and cost-effective storage for compute-intensive workloads. Powered by the Lustre architecture, it’s optimized for applications requiring access to fast storage, such as ML, high-performance computing, video processing, financial modeling, and big data analytics.

FSx for Lustre delivers sub-millisecond latencies, scaling up to 1 GBps per TiB of throughput, and millions of IOPS. This makes it ideal for workloads demanding rapid data access and processing. The service integrates with Amazon Simple Storage Service (Amazon S3), enabling seamless access to S3 objects as files and facilitating fast data transfers between Amazon FSx and Amazon S3. Updates in S3 buckets are automatically reflected in FSx file systems and vice versa. For more information on this integration, check out Exporting files using HSM commands and Linking your file system to an Amazon S3 bucket.
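If you want to script this linkage, the AWS SDK exposes it through the FSx CreateDataRepositoryAssociation API. The following is a minimal sketch using boto3; the file system ID, file system path, and S3 prefix are placeholders, and the import/export event settings are assumptions you should adapt to your own setup.

# Hypothetical example: link an FSx for Lustre file system to an S3 prefix
# with a data repository association. IDs and paths are placeholders.
import boto3

fsx = boto3.client("fsx")

response = fsx.create_data_repository_association(
    FileSystemId="fs-0123456789abcdef0",                      # your FSx for Lustre file system
    FileSystemPath="/training-data",                          # path exposed inside the file system
    DataRepositoryPath="s3://my-training-bucket/datasets/",   # linked S3 prefix
    BatchImportMetaDataOnCreate=True,                         # import existing S3 object metadata
    S3={
        "AutoImportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]},
        "AutoExportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]},
    },
)
print(response["Association"]["AssociationId"])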

Theory behind mounting an FSx for Lustre file system to SageMaker Studio spaces

You can use FSx for Lustre as a shared high-performance file system to connect SageMaker Studio domains with SageMaker HyperPod clusters, streamlining ML workflows for data scientists and researchers. By using FSx for Lustre as a shared volume, you can build and refine your training or fine-tuning code using IDEs like JupyterLab and Code Editor in SageMaker Studio, prepare datasets, and save your work directly in the FSx for Lustre volume. This same volume is mounted by SageMaker HyperPod during the execution of training workloads, enabling direct access to prepared data and code without the need for repetitive data transfers or custom image creation. Data scientists can iteratively make changes, prepare data, and submit training workloads directly from SageMaker Studio, providing consistency across development and execution environments while enhancing productivity. This integration alleviates the overhead of moving data between environments and provides a seamless workflow for large-scale ML projects requiring high throughput and low-latency storage. You can configure FSx for Lustre volumes to provide file system access to SageMaker Studio user profiles in two distinct ways, each tailored to different collaboration and data management needs.

Option 1: Shared file system partition across every user profile

Infrastructure administrators can set up a single FSx for Lustre file system partition shared across user profiles within a SageMaker Studio domain, as illustrated in the following diagram.

Figure 1: An FSx for Lustre file system partition shared across multiple user profiles within a single SageMaker Studio Domain

  • Shared project directories – Teams working on large-scale projects can collaborate seamlessly by accessing a shared partition. This makes it possible for multiple users to work on the same files, datasets, and FMs without duplicating resources.
  • Simplified file management – You don’t need to manage private storage; instead, you can rely on the shared directory for your file-related needs, reducing complexity.
  • Improved data governance and security – The shared FSx for Lustre partition is centrally managed by the infrastructure admin, enabling robust access controls and data policies to maintain security and integrity of shared resources.

Option 2: Dedicated file system partition for each user profile

Alternatively, administrators can configure dedicated FSx for Lustre file system partitions for each individual user profile in SageMaker Studio, as illustrated in the following diagram.

Figure 2: An FSx for Lustre file system with a dedicated partition per user

This setup provides personalized storage and facilitates data isolation. Key benefits include:

  • Individual data storage and analysis – Each user gets a private partition to store personal datasets, models, and files. This facilitates independent work on projects with clear segregation by user profile.
  • Centralized data management – Administrators retain centralized control over the FSx for Lustre file system, facilitating secure backups and direct access while maintaining data security for users.
  • Cross-instance file sharing – You can access your private files across multiple SageMaker Studio spaces and IDEs, because the FSx for Lustre partition provides persistent storage at the user profile level.

Solution overview

The following diagram illustrates the architecture of SageMaker HyperPod with SLURM integration.

Figure 3: Architecture Diagram for SageMaker HyperPod with Slurm as the orchestrator

The following diagram illustrates the architecture of SageMaker HyperPod with Amazon EKS integration.

Figure 4: Architecture Diagram for SageMaker HyperPod with EKS as the orchestrator

These diagrams illustrate what you would provision as part of this solution. In addition to the SageMaker HyperPod cluster you already have, you provision a SageMaker Studio domain, and attach the cluster’s FSx for Lustre file system to the SageMaker Studio domain. Depending on whether or not you choose a SharedFSx, you can either attach the file system to be mounted with a single partition shared across user profiles (that you configure) within your SageMaker domain, or attach it to be mounted with multiple partitions for multiple isolated users. To learn more about this distinction, refer to the section earlier in this post discussing the theory behind mounting an FSx for Lustre file system to SageMaker Studio spaces.

In the following sections, we present a walkthrough of this integration by demonstrating on a SageMaker HyperPod with Amazon EKS cluster how you can:

  1. Attach a SageMaker Studio domain.
  2. Use that domain to fine-tune the DeepSeek-R1-Distill-Qwen-14B using the FreedomIntelligence/medical-o1-reasoning-SFT dataset.

Prerequisites

This post assumes that you have a SageMaker HyperPod cluster.

Deploy resources using AWS CloudFormation

As part of this integration, we provide an AWS CloudFormation stack template (SLURM, Amazon EKS). Before deploying the stack, make sure you have a SageMaker HyperPod cluster set up.

In the stack for SageMaker HyperPod with SLURM, you create the following resources:

  • A SageMaker Studio domain.
  • Lifecycle configurations for installing necessary packages for the SageMaker Studio IDE, including SLURM. Lifecycle configurations will be created for both JupyterLab and Code Editor. We set it up so that your Code Editor or JupyterLab instance will essentially be configured as a login node for your SageMaker HyperPod cluster.
  • An AWS Lambda function that:
    • Associates the created security-group-for-inbound-nfs security group to the SageMaker Studio domain.
    • Associates the security-group-for-inbound-nfs security group to the FSx for Lustre ENIs.
    • Optional:
      • If SharedFSx is set to True, the created partition is shared in the FSx for Lustre volume and associated to the SageMaker Studio domain.
      • If SharedFSx is set to False, a Lambda function creates the partition /{user_profile_name} and associates it to the SageMaker Studio user profile.

In the stack for SageMaker HyperPod with Amazon EKS, you create the following resources:

  • A SageMaker Studio domain.
  • Lifecycle configurations for installing necessary packages for SageMaker Studio IDE, such as kubectl and jq. Lifecycle configurations will be created for both JupyterLab and Code Editor.
  • A Lambda function that:
    • Associates the created security-group-for-inbound-nfs security group to the SageMaker Studio domain.
    • Associates the security-group-for-inbound-nfs security group to the FSx for Lustre ENIs.
    • Optional:
      • If SharedFSx is set to True, the created partition is shared in the FSx for Lustre volume and associated to the SageMaker Studio domain.
      • If SharedFSx is set to False, a Lambda function creates the partition /{user_profile_name} and associates it to the SageMaker Studio user profile.

The main difference between the two implementations lies in the lifecycle configurations for the JupyterLab or Code Editor servers, because you interact with the cluster differently depending on the orchestrator (kubectl or helm for Amazon EKS, and ssm or ssh for SLURM). In addition to mounting your cluster’s FSx for Lustre file system, for SageMaker HyperPod with Amazon EKS, the lifecycle scripts configure your JupyterLab or Code Editor server to run the common Kubernetes command line interfaces, including kubectl, eksctl, and helm. They also preconfigure your Kubernetes context, so your cluster is ready to use as soon as your JupyterLab or Code Editor instance is up.

You can find the lifecycle configuration for SageMaker HyperPod with Amazon EKS in the deployed CloudFormation stack template. SLURM works a bit differently. We designed the lifecycle configuration so that your JupyterLab or Code Editor instance serves as a login node for your SageMaker HyperPod with SLURM cluster. Login nodes allow you to log in to the cluster, submit jobs, and view and manipulate data without running on the critical slurmctld scheduler node. This also makes it possible to run monitoring servers such as Aim, TensorBoard, Grafana, or Prometheus. The lifecycle configuration therefore automatically installs and configures SLURM so that you can interface with your cluster from your JupyterLab or Code Editor instance. You can find the script used to configure SLURM on these instances on GitHub.

Both these configurations use the same logic to mount the file systems. The instructions found in Adding a custom file system to a domain were achieved in a custom resource (Lambda function) defined in the CloudFormation stack template.
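To make the file system reachable from Studio spaces, the custom resource must also allow NFS traffic between the Studio environment and the FSx for Lustre elastic network interfaces (ENIs). The following is a minimal sketch of that one task, assuming the FSx ENI descriptions reference the file system ID; the file system ID and security group ID are placeholders, and this is not the exact Lambda code in the provided stack.

# Hypothetical sketch of one task the custom-resource Lambda performs:
# attaching the inbound NFS security group to the FSx for Lustre ENIs.
import boto3

ec2 = boto3.client("ec2")

def attach_sg_to_fsx_enis(file_system_id: str, security_group_id: str) -> None:
    # Assumption: the FSx for Lustre ENI descriptions contain the file system ID
    enis = ec2.describe_network_interfaces(
        Filters=[{"Name": "description", "Values": [f"*{file_system_id}*"]}]
    )["NetworkInterfaces"]

    for eni in enis:
        current_groups = [g["GroupId"] for g in eni["Groups"]]
        if security_group_id not in current_groups:
            # Groups replaces the full list, so keep the existing groups
            ec2.modify_network_interface_attribute(
                NetworkInterfaceId=eni["NetworkInterfaceId"],
                Groups=current_groups + [security_group_id],
            )

attach_sg_to_fsx_enis("fs-0123456789abcdef0", "sg-0abc123def4567890")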

For more details on deploying these provided stacks, check out the respective workshop pages for SageMaker HyperPod with SLURM and SageMaker HyperPod with Amazon EKS.

Data science journey on SageMaker HyperPod with SageMaker Studio

As a data scientist, after you set up the SageMaker HyperPod and SageMaker Studio integration, you can log in to the SageMaker Studio environment through your user profile.

Figure 5: You can log in to your SageMaker Studio environment through your created user profile.

In SageMaker Studio, you can select your preferred IDE to start prototyping your fine-tuning workload, and create the MLFlow tracking server to track training and system metrics during the execution of the workload.

Figure 6: Select your preferred IDE to connect to your HyperPod cluster

The SageMaker HyperPod clusters page provides information about the available clusters and details on the nodes.

Figures 7,8: You can also see information about your SageMaker HyperPod cluster on SageMaker Studio

For this post, we selected Code Editor as our preferred IDE. The automation provided by this solution preconfigured the FSx for Lustre file system and the lifecycle configuration to install the necessary modules for submitting workloads on the cluster by using the hyperpod-cli or kubectl. For the instance type, you can choose a wide range of available instances. In our case, we opted for the default ml.t3.medium.

Figure 9: CodeEditor configuration

The development environment already presents the partition mounted as a file system, where you can start prototyping your code for data preparation and model fine-tuning. For the purpose of this example, we fine-tune DeepSeek-R1-Distill-Qwen-14B using the FreedomIntelligence/medical-o1-reasoning-SFT dataset.

Figure 10: Your cluster’s files are accessible directly in your Code Editor space because the FSx for Lustre file system is mounted to it, so you can develop locally and deploy onto your ultra cluster.

The repository is organized as follows:

  • download_model.py – The script to download the open source model directly into the FSx for Lustre volume. This provides faster and more consistent execution of the training workload on SageMaker HyperPod.
  • scripts/dataprep.py – The script to download and prepare the dataset for the fine-tuning workload (see the sketch after this list). In the script, we format the dataset by using the prompt style defined for the DeepSeek R1 models and save the dataset in the FSx for Lustre volume, which avoids copying assets from other data repositories and speeds up the training workload.
  • scripts/train.py – The script containing the fine-tuning logic, using open source modules like Hugging Face transformers and optimization and distribution techniques using FSDP and QLoRA.
  • scripts/evaluation.py – The script to run ROUGE evaluation on the fine-tuned model.
  • pod-finetuning.yaml – The manifest file containing the definition of the container used to execute the fine-tuning workload on the SageMaker HyperPod cluster.
  • pod-evaluation.yaml – The manifest file containing the definition of the container used to execute the evaluation workload on the SageMaker HyperPod cluster.
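The following is a hedged sketch of what a script like scripts/dataprep.py could look like; the dataset configuration name ("en"), the column names, the prompt template, and the train/test split are assumptions for illustration rather than the exact implementation in the repository.

# Hypothetical sketch of scripts/dataprep.py: download the dataset, apply a
# DeepSeek R1 style prompt template, and save the splits to the FSx for Lustre volume.
from datasets import load_dataset

PROMPT_TEMPLATE = (
    "Below is a question. Write a response that includes your reasoning.\n"
    "### Question:\n{question}\n\n"
    "### Response:\n<think>\n{reasoning}\n</think>\n{answer}"
)

def format_example(example):
    # Column names are assumed from the public dataset card
    example["text"] = PROMPT_TEMPLATE.format(
        question=example["Question"],
        reasoning=example["Complex_CoT"],
        answer=example["Response"],
    )
    return example

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en")["train"]
dataset = dataset.map(format_example)
splits = dataset.train_test_split(test_size=0.1, seed=42)

# Save to the mounted FSx for Lustre partition so the training pods read it directly
splits["train"].save_to_disk("/data/Data-Scientist/deepseek-r1-distill-qwen-14b/data/train/")
splits["test"].save_to_disk("/data/Data-Scientist/deepseek-r1-distill-qwen-14b/data/test/")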

After downloading the model and preparing the dataset for the fine-tuning, you can start prototyping the fine-tuning script directly in the IDE.

Figure 11: You can start developing locally!

Updates made to the script are automatically reflected in the container during workload execution. When you’re ready, you can define the manifest file for the execution of the workload on SageMaker HyperPod. In the following code, we highlight the key components of the manifest. For a complete example of a Kubernetes manifest file, refer to the awsome-distributed-training GitHub repository.

...

apiVersion: "kubeflow.org/v1"
kind: PyTorchJob
metadata:
  name: deepseek-r1-qwen-14b-fine-tuning
spec:
  ...
  pytorchReplicaSpecs:
    Worker:
      replicas: 8
      restartPolicy: OnFailure
      template:
        metadata:
          labels:
            app: deepseek-r1-distill-qwen-14b-fine-tuning
        spec:
          volumes:
            - name: shmem
              hostPath: 
                path: /dev/shm
            - name: local
              hostPath:
                path: /mnt/k8s-disks/0
            - name: fsx-volume
              persistentVolumeClaim:
                claimName: fsx-claim
          serviceAccountName: eks-hyperpod-sa
          containers:
            - name: pytorch
              image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.6.0-gpu-py312-cu126-ubuntu22.04-ec2
              imagePullPolicy: Always
              resources:
                requests:
                  nvidia.com/gpu: 1
                  vpc.amazonaws.com/efa: 1
                limits:
                  nvidia.com/gpu: 1
                  vpc.amazonaws.com/efa: 1
              ...
              command:
                - /bin/bash
                - -c
                - |
                  pip install -r /data/Data-Scientist/deepseek-r1-distill-qwen-14b/requirements.txt && \
                  torchrun \
                  --nnodes=8 \
                  --nproc_per_node=1 \
                  /data/Data-Scientist/deepseek-r1-distill-qwen-14b/scripts/train.py \
                  --config /data/Data-Scientist/deepseek-r1-distill-qwen-14b/args-fine-tuning.yaml
              volumeMounts:
                - name: shmem
                  mountPath: /dev/shm
                - name: local
                  mountPath: /local
                - name: fsx-volume
                  mountPath: /data

The key components are as follows:

  • replicas: 8 – This specifies that eight worker pods will be created for this PyTorchJob. This is particularly important for distributed training because it determines the scale of your training job. Having eight replicas means your PyTorch training will be distributed across eight separate pods, allowing for parallel processing and faster training times.
  • Persistent volume configuration – This includes the following:
    • name: fsx-volume – Defines a named volume that will be used for storage.
    • persistentVolumeClaim – Indicates this is using Kubernetes’s persistent storage mechanism.
    • claimName: fsx-claim – References a pre-created PersistentVolumeClaim, pointing to an FSx for Lustre file system used in the SageMaker Studio environment.
  • Container image – The image field points to the container used for the workload; in this example, an Amazon ECR hosted PyTorch training image (pytorch-training:2.6.0-gpu-py312-cu126-ubuntu22.04-ec2) that ships with the GPU libraries needed for distributed training.
  • Training command – The highlighted command shows the execution instructions for the training workload:
    • pip install -r /data/Data-Scientist/deepseek-r1-distill-qwen-14b/requirements.txt – Installs dependencies at runtime, to customize the container with packages and modules required for the fine-tuning workload.
    • torchrun … /data/Data-Scientist/deepseek-r1-distill-qwen-14b/scripts/train.py – Launches the actual training script from the shared FSx for Lustre file system, in the partition created for the SageMaker Studio user profile Data-Scientist.
    • --config /data/Data-Scientist/deepseek-r1-distill-qwen-14b/args-fine-tuning.yaml – The argument provided to the training script, which contains the definition of the training parameters and additional variables used during the execution of the workload.

The args-fine-tuning.yaml file contains the definition of the training parameters to provide to the script. In addition, the training script is set up to save training and system metrics to the managed MLflow server in SageMaker Studio if the MLflow tracking server Amazon Resource Name (ARN) and experiment name are provided:

# Location in the FSx for Lustre file system where the base model was saved
model_id: "/data/Data-Scientist/deepseek-r1-distill-qwen-14b/DeepSeek-R1-Distill-Qwen-14B"
mlflow_uri: "${MLFLOW_ARN}"
mlflow_experiment_name: "deepseek-r1-distill-llama-8b-agent"
# sagemaker specific parameters
# File system path where the workload will store the model 
output_dir: "/data/Data-Scientist/deepseek-r1-distill-qwen-14b/model/"
# File system path where the workload can access the dataset train dataset
train_dataset_path: "/data/Data-Scientist/deepseek-r1-distill-qwen-14b/data/train/"
# File system path where the workload can access the dataset test dataset
test_dataset_path: "/data/Data-Scientist/deepseek-r1-distill-qwen-14b/data/test/"
# training parameters
lora_r: 8
lora_alpha: 16
lora_dropout: 0.1                 
learning_rate: 2e-4                    # learning rate scheduler
num_train_epochs: 1                    # number of training epochs
per_device_train_batch_size: 2         # batch size per device during training
per_device_eval_batch_size: 2          # batch size for evaluation
gradient_accumulation_steps: 2         # number of steps before performing a backward/update pass
gradient_checkpointing: true           # use gradient checkpointing
bf16: true                             # use bfloat16 precision
tf32: false                            # use tf32 precision
fsdp: "full_shard auto_wrap offload"
fsdp_config: 
    backward_prefetch: "backward_pre"
    cpu_ram_efficient_loading: true
    offload_params: true
    forward_prefetch: false
    use_orig_params: true
merge_weights: true

The parameters model_id, output_dir, train_dataset_path, and test_dataset_path follow the same logic described for the manifest file and refer to the location where the FSx for Lustre volume is mounted in the container, under the partition Data-Scientist created for the SageMaker Studio user profile.
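The following is an illustrative sketch of how a training script can consume such an arguments file and enable managed MLflow tracking when the tracking server ARN and experiment name are present. It is not the exact scripts/train.py implementation, and the load_args helper and autologging call are assumptions.

# Illustrative sketch (not the exact train.py): load the YAML arguments and
# enable managed MLflow tracking when the tracking server ARN is available.
import os
import yaml
import mlflow  # requires the mlflow and sagemaker-mlflow packages

def load_args(config_path: str) -> dict:
    with open(config_path) as f:
        return yaml.safe_load(f)

args = load_args("/data/Data-Scientist/deepseek-r1-distill-qwen-14b/args-fine-tuning.yaml")

# ${MLFLOW_ARN} in the YAML is expected to be resolved from the environment
mlflow_uri = os.path.expandvars(args.get("mlflow_uri", ""))
if mlflow_uri.startswith("arn:") and args.get("mlflow_experiment_name"):
    mlflow.set_tracking_uri(mlflow_uri)                    # managed MLflow tracking server ARN
    mlflow.set_experiment(args["mlflow_experiment_name"])
    mlflow.autolog()                                       # capture training metrics automatically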

When you have finished the development of the fine-tuning script and defined the training parameters for the workload, you can deploy the workload with the following commands:

$ kubectl apply -f pod-finetuning.yaml
service/etcd unchanged
deployment.apps/etcd unchanged
pytorchjob.kubeflow.org/deepseek-r1-qwen-14b-fine-tuning created
$ kubectl get pods
NAME                                        READY   STATUS    RESTARTS   AGE
deepseek-r1-qwen-14b-fine-tuning-worker-0   1/1     Running   0          2m7s
deepseek-r1-qwen-14b-fine-tuning-worker-1   1/1     Running   0          2m7s
deepseek-r1-qwen-14b-fine-tuning-worker-2   1/1     Running   0          2m7s
deepseek-r1-qwen-14b-fine-tuning-worker-3   1/1     Running   0          2m7s
deepseek-r1-qwen-14b-fine-tuning-worker-4   1/1     Running   0          2m7s
deepseek-r1-qwen-14b-fine-tuning-worker-5   1/1     Running   0          2m7s
deepseek-r1-qwen-14b-fine-tuning-worker-6   1/1     Running   0          2m7s
deepseek-r1-qwen-14b-fine-tuning-worker-7   1/1     Running   0          2m7s
...

You can explore the logs of the workload execution directly from the SageMaker Studio IDE.

Figure 12: View the logs of the submitted training run directly in your CodeEditor terminal

You can track training and system metrics from the managed MLflow server in SageMaker Studio.

Figure 13: SageMaker Studio directly integrates with a managed MLFlow server. You can use it to track training and system metrics directly from your Studio Domain

In the SageMaker HyperPod cluster sections, you can explore cluster metrics thanks to the integration of SageMaker Studio with SageMaker HyperPod observability.

Figure 14: You can view additional cluster level/infrastructure metrics in the “Compute” -> “SageMaker HyperPod clusters” section, including GPU utilization.

At the conclusion of the fine-tuning workload, you can use the same cluster to run a batch evaluation workload by deploying the pod-evaluation.yaml manifest. This evaluates the fine-tuned model using ROUGE metrics (ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-L-Sum), which measure the similarity between machine-generated text and human-written reference text.

The evaluation script uses the same SageMaker HyperPod cluster and compares results with the previously downloaded base model.
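As an illustration of the metric computation, the following sketch uses the Hugging Face evaluate library to produce ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-L-Sum scores. The example predictions and references are placeholders; how the fine-tuned and base model generations are produced is not shown here.

# Minimal sketch of the ROUGE computation an evaluation script could run.
import evaluate

rouge = evaluate.load("rouge")  # requires the evaluate and rouge_score packages

predictions = [
    "The patient likely has iron deficiency anemia and should start supplementation.",
]
references = [
    "Iron deficiency anemia is the most likely diagnosis; oral iron supplementation is recommended.",
]

scores = rouge.compute(predictions=predictions, references=references)
# Returns rouge1, rouge2, rougeL, and rougeLsum values between 0 and 1
print({name: round(value, 4) for name, value in scores.items()})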

Clean up

To clean up your resources to avoid incurring more charges, follow these steps:

  1. Delete unused SageMaker Studio resources.
  2. Optionally, delete the SageMaker Studio domain.
  3. If you created a SageMaker HyperPod cluster, delete the cluster to stop incurring costs.
  4. If you created the networking stack from the SageMaker HyperPod workshop, delete the stack as well to clean up the virtual private cloud (VPC) resources and the FSx for Lustre volume.

Conclusion

In this post, we discussed how SageMaker HyperPod and SageMaker Studio can improve and speed up the development experience of data scientists by combining the IDEs and tooling of SageMaker Studio with the scalability and resiliency of SageMaker HyperPod with Amazon EKS. The solution also simplifies setup for administrators of the centralized cluster by relying on the governance and security capabilities offered by AWS services.

We recommend starting your journey by exploring the workshops Amazon EKS Support in Amazon SageMaker HyperPod and Amazon SageMaker HyperPod, and prototyping your customized large language model by using the resources available in the awsome-distributed-training GitHub repository.

A special thanks to our colleagues Nisha Nadkarni (Sr. WW Specialist SA GenAI), Anoop Saha (Sr. Specialist WW Foundation Models), and Mair Hasco (Sr. WW GenAI/ML Specialist) in the AWS ML Frameworks team, for their support in the publication of this post.


About the authors

Bruno Pistone is a Senior Generative AI and ML Specialist Solutions Architect for AWS based in Milan. He works with large customers, helping them to deeply understand their technical needs and design AI and machine learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. His expertise includes end-to-end machine learning, machine learning industrialization, and generative AI. He enjoys spending time with his friends, exploring new places, and traveling to new destinations.

Aman Shanbhag is a Specialist Solutions Architect on the ML Frameworks team at Amazon Web Services (AWS), where he helps customers and partners with deploying ML training and inference solutions at scale. Before joining AWS, Aman graduated from Rice University with degrees in computer science, mathematics, and entrepreneurship.

Read More

Meeting summarization and action item extraction with Amazon Nova

Meeting summarization and action item extraction with Amazon Nova

Meetings play a crucial role in decision-making, project coordination, and collaboration, and remote meetings are common across many organizations. However, capturing and structuring key takeaways from these conversations is often inefficient and inconsistent. Manually summarizing meetings or extracting action items requires significant effort and is prone to omissions or misinterpretations.

Large language models (LLMs) offer a more robust solution by transforming unstructured meeting transcripts into structured summaries and action items. This capability is especially useful for project management, customer support and sales calls, legal and compliance, and enterprise knowledge management.

In this post, we present a benchmark of different understanding models from the Amazon Nova family available on Amazon Bedrock, to provide insights on how you can choose the best model for a meeting summarization task.

LLMs to generate meeting insights

Modern LLMs are highly effective for summarization and action item extraction due to their ability to understand context, infer topic relationships, and generate structured outputs. In these use cases, prompt engineering provides a more efficient and scalable approach compared to traditional model fine-tuning or customization. Rather than modifying the underlying model architecture or training on large labeled datasets, prompt engineering uses carefully crafted input queries to guide the model’s behavior, directly influencing the output format and content. This method allows for rapid, domain-specific customization without the need for resource-intensive retraining processes. For tasks such as meeting summarization and action item extraction, prompt engineering enables precise control over the generated outputs, making sure they meet specific business requirements. It allows for the flexible adjustment of prompts to suit evolving use cases, making it an ideal solution for dynamic environments where model behaviors need to be quickly reoriented without the overhead of model fine-tuning.

Amazon Nova models and Amazon Bedrock

Amazon Nova models, unveiled at AWS re:Invent in December 2024, are built to deliver frontier intelligence at industry-leading price performance. They’re among the fastest and most cost-effective models in their respective intelligence tiers, and are optimized to power enterprise generative AI applications in a reliable, secure, and cost-effective manner.

The understanding model family has four tiers of models: Nova Micro (text-only, ultra-efficient for edge use), Nova Lite (multimodal, balanced for versatility), Nova Pro (multimodal, balance of speed and intelligence, ideal for most enterprise needs) and Nova Premier (multimodal, the most capable Nova model for complex tasks and teacher for model distillation). Amazon Nova models can be used for a variety of tasks, from summarization to structured text generation. With Amazon Bedrock Model Distillation, customers can also bring the intelligence of Nova Premier to a faster and more cost-effective model such as Nova Pro or Nova Lite for their use case or domain. This can be achieved through the Amazon Bedrock console and APIs such as the Converse API and Invoke API.

Solution overview

This post demonstrates how to use Amazon Nova understanding models, available through Amazon Bedrock, for automated insight extraction using prompt engineering. We focus on two key outputs:

  • Meeting summarization – A high-level abstractive summary that distills key discussion points, decisions made, and critical updates from the meeting transcript
  • Action items – A structured list of actionable tasks derived from the meeting conversation that apply to the entire team or project

The following diagram illustrates the solution workflow.

Meeting summary and action item summarization pipeline

Prerequisites

To follow along with this post, familiarity with calling LLMs using Amazon Bedrock is expected. For detailed steps on using Amazon Bedrock for text summarization tasks, refer to Build an AI text summarizer app with Amazon Bedrock. For additional information about calling LLMs, refer to the Invoke API and Using the Converse API reference documentation.

Solution components

We developed the two core features of the solution—meeting summarization and action item extraction—by using popular models available through Amazon Bedrock. In the following sections, we look at the prompts that were used for these key tasks.

For the meeting summarization task, we used persona assignment, prompted the LLM to generate the summary in <summary> tags to reduce redundant opening and closing sentences, and applied a one-shot approach, giving the LLM one example so that it consistently follows the right format for summary generation. As part of the system prompt, we give clear and concise rules emphasizing the correct tone, style, length, and faithfulness towards the provided transcript.
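To make this concrete, the following is a minimal sketch of such a call using the Amazon Bedrock Converse API. The model ID, system prompt wording, and one-shot example are illustrative stand-ins, not the exact prompts used in this benchmark.

# Illustrative summarization call through the Amazon Bedrock Converse API.
import boto3

bedrock = boto3.client("bedrock-runtime")

system = [{
    "text": (
        "You are an experienced meeting analyst. Summarize the provided transcript "
        "faithfully, in a neutral tone, in under 200 words. Wrap the summary in "
        "<summary></summary> tags and do not add opening or closing remarks."
    )
}]

one_shot_transcript = "Example transcript: ..."
one_shot_summary = "<summary>Example summary of the transcript.</summary>"
transcript = "Actual meeting transcript goes here."

messages = [
    {"role": "user", "content": [{"text": one_shot_transcript}]},
    {"role": "assistant", "content": [{"text": one_shot_summary}]},
    {"role": "user", "content": [{"text": transcript}]},
]

response = bedrock.converse(
    modelId="amazon.nova-pro-v1:0",  # illustrative; you may need an inference profile ID such as us.amazon.nova-pro-v1:0
    system=system,
    messages=messages,
    inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])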

For the action item extraction task, we gave specific instructions on generating action items in the prompts and used chain-of-thought to improve the quality of the generated action items. In the assistant message, the prefix <action_items> tag is provided as a prefilling to nudge the model generation in the right direction and to avoid redundant opening and closing sentences.
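The following sketch shows how that prefilling can be expressed with the Converse API, where the final assistant message seeds the response with the <action_items> tag; again, the prompt text and model ID are assumptions rather than the exact benchmark prompts.

# Illustrative response prefilling with the Converse API: generation continues
# from the prefilled assistant text.
import boto3

bedrock = boto3.client("bedrock-runtime")
transcript = "Actual meeting transcript goes here."

messages = [
    {
        "role": "user",
        "content": [{
            "text": (
                "Think step by step about the decisions and follow-ups in this transcript, "
                "then list the team-level action items.\n\nTranscript:\n" + transcript
            )
        }],
    },
    # Prefilled assistant turn: the model continues directly from this tag
    {"role": "assistant", "content": [{"text": "<action_items>"}]},
]

response = bedrock.converse(
    modelId="amazon.nova-pro-v1:0",  # illustrative model ID
    messages=messages,
    inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
)
action_items = "<action_items>" + response["output"]["message"]["content"][0]["text"]
print(action_items)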

Different model families respond to the same prompts differently, and it’s important to follow the prompting guide defined for the particular model. For more information on best practices for Amazon Nova prompting, refer to Prompting best practices for Amazon Nova understanding models.

Dataset

To evaluate the solution, we used samples from the public QMSum dataset. The QMSum dataset is a benchmark for meeting summarization, featuring English-language transcripts from academic, business, and governance discussions with manually annotated summaries. It evaluates LLMs on generating structured, coherent summaries from complex, multi-speaker conversations, making it a valuable resource for abstractive summarization and discourse understanding. For testing, we used 30 randomly sampled meetings from the QMSum dataset. Each meeting contained 2–5 topic-wise transcripts, with approximately 8,600 tokens per transcript on average.

Evaluation framework

Achieving high-quality outputs from LLMs in meeting summarization and action item extraction can be a challenging task. Traditional evaluation metrics such as ROUGE, BLEU, and METEOR focus on surface-level similarity between generated text and reference summaries, but they often fail to capture nuances such as factual correctness, coherence, and actionability. Human evaluation is the gold standard but is expensive, time-consuming, and not scalable. To address these challenges, you can use LLM-as-a-judge, where another LLM is used to systematically assess the quality of generated outputs based on well-defined criteria. This approach offers a scalable and cost-effective way to automate evaluation while maintaining high accuracy. In this example, we used Anthropic’s Claude 3.5 Sonnet v1 as the judge model because we found it to be most aligned with human judgment. We used the LLM judge to score the generated responses on three main metrics: faithfulness, summarization, and question answering (QA).

The faithfulness score measures the faithfulness of a generated summary by measuring the portion of the parsed statements in a summary that are supported by given context (for example, a meeting transcript) with respect to the total number of statements.
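Conceptually, the computation reduces to the following sketch, where the statement parsing and the judge call are assumed helper functions rather than the benchmark's exact implementation.

# Minimal sketch of the faithfulness computation described above.
from typing import List

def judge_supports(statement: str, context: str) -> bool:
    """Placeholder for an LLM-as-a-judge call (for example, Anthropic's Claude 3.5
    Sonnet via Amazon Bedrock) that returns True if the statement is supported
    by the context."""
    raise NotImplementedError

def faithfulness_score(statements: List[str], transcript: str) -> float:
    # Portion of parsed summary statements supported by the meeting transcript
    if not statements:
        return 0.0
    supported = sum(judge_supports(s, transcript) for s in statements)
    return supported / len(statements)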

The summarization score is the combination of the QA score and the conciseness score with the same weight (0.5). The QA score measures the coverage of a generated summary of a meeting transcript. It first generates a list of question and answer pairs from a meeting transcript and measures the portion of the questions that are answered correctly when the summary is used as context instead of the meeting transcript. The QA score is complementary to the faithfulness score because the faithfulness score doesn’t measure the coverage of a generated summary. We used the QA score only for generated summaries, because the action items aren’t supposed to cover all aspects of a meeting transcript. The conciseness score measures the ratio of the length of a generated summary to the length of the total meeting transcript.

We used a modified version of the faithfulness score and the summarization score that had much lower latency than the original implementation.

Results

Our evaluation of Amazon Nova models across meeting summarization and action item extraction tasks revealed clear performance-latency patterns. For summarization, Nova Premier achieved the highest faithfulness score (1.0) with a processing time of 5.34s, while Nova Pro delivered 0.94 faithfulness in 2.9s. The smaller Nova Lite and Nova Micro models provided faithfulness scores of 0.86 and 0.83 respectively, with faster processing times of 2.13s and 1.52s. In action item extraction, Nova Premier again led in faithfulness (0.83) with 4.94s processing time, followed by Nova Pro (0.8 faithfulness, 2.03s). Interestingly, Nova Micro (0.7 faithfulness, 1.43s) outperformed Nova Lite (0.63 faithfulness, 1.53s) in this particular task despite its smaller size. These measurements provide valuable insights into the performance-speed characteristics across the Amazon Nova model family for text-processing applications. The following graphs show these results. The following screenshot shows a sample output for our summarization task, including the LLM-generated meeting summary and a list of action items.

results on meeting summary

faithfulness score on action item summarization

example of meeting and action item summarization

Conclusion

In this post, we showed how you can use prompting to generate meeting insights such as meeting summaries and action items using Amazon Nova models available through Amazon Bedrock. For large-scale AI-driven meeting summarization, optimizing latency, cost, and accuracy is essential. The Amazon Nova family of understanding models (Nova Micro, Nova Lite, Nova Pro, and Nova Premier) offers a practical alternative to high-end models, significantly improving inference speed while reducing operational costs. These factors make Amazon Nova an attractive choice for enterprises handling large volumes of meeting data at scale.

For more information on Amazon Bedrock and the latest Amazon Nova models, refer to the Amazon Bedrock User Guide and Amazon Nova User Guide, respectively. The AWS Generative AI Innovation Center has a group of AWS science and strategy experts with comprehensive expertise spanning the generative AI journey, helping customers prioritize use cases, build a roadmap, and move solutions into production. Check out the Generative AI Innovation Center for our latest work and customer success stories.


About the Authors

Baishali Chaudhury is an Applied Scientist at the Generative AI Innovation Center at AWS, where she focuses on advancing generative AI solutions for real-world applications. She has a strong background in computer vision, machine learning, and AI for healthcare. Baishali holds a PhD in Computer Science from the University of South Florida and completed a postdoc at Moffitt Cancer Centre.

Sungmin Hong is a Senior Applied Scientist at the AWS Generative AI Innovation Center, where he helps expedite a variety of use cases for AWS customers. Before joining Amazon, Sungmin was a postdoctoral research fellow at Harvard Medical School. He holds a Ph.D. in Computer Science from New York University. Outside of work, he prides himself on keeping his indoor plants alive for 3+ years.

Mengdie (Flora) Wang is a Data Scientist at AWS Generative AI Innovation Center, where she works with customers to architect and implement scalable Generative AI solutions that address their unique business challenges. She specializes in model customization techniques and agent-based AI systems, helping organizations harness the full potential of generative AI technology. Prior to AWS, Flora earned her Master’s degree in Computer Science from the University of Minnesota, where she developed her expertise in machine learning and artificial intelligence.

Anila Joshi has more than a decade of experience building AI solutions. As an AWSI Geo Leader at the AWS Generative AI Innovation Center, Anila pioneers innovative applications of AI that push the boundaries of possibility, and she accelerates the adoption of AWS services by helping customers ideate, identify, and implement secure generative AI solutions.

Read More

Building a custom text-to-SQL agent using Amazon Bedrock and Converse API

Building a custom text-to-SQL agent using Amazon Bedrock and Converse API

Developing robust text-to-SQL capabilities is a critical challenge in the field of natural language processing (NLP) and database management, particularly when dealing with complex queries and database structures. In this post, we introduce a straightforward but powerful text-to-SQL solution, with accompanying code, using a custom agent implementation along with Amazon Bedrock and the Converse API.

The ability to translate natural language queries into SQL statements is a game-changer for businesses and organizations because users can now interact with databases in a more intuitive and accessible manner. However, the complexity of database schemas, relationships between tables, and the nuances of natural language can often lead to inaccurate or incomplete SQL queries. This not only compromises the integrity of the data but also hinders the overall user experience. Through a straightforward yet powerful architecture, the agent can understand your query, develop a plan of execution, create SQL statements, self-correct if there is a SQL error, and learn from its execution to improve in the future. Over time, the agent can develop a cohesive understanding of what to do and what not to do to efficiently answer queries from users.

Solution overview

The solution is composed of an AWS Lambda function that contains the agent logic, Amazon DynamoDB for long-term memory retention, Anthropic’s Claude Sonnet in Amazon Bedrock called through the Converse API, AWS Secrets Manager to retrieve database connection details and credentials, and Amazon Relational Database Service (Amazon RDS) hosting an example Postgres database called HR Database. The Lambda function is connected to a virtual private cloud (VPC) and communicates with DynamoDB, Amazon Bedrock, and Secrets Manager through AWS PrivateLink VPC endpoints, so the Lambda function can reach the RDS database while keeping traffic private on the AWS network.

In the demo, you can interact with the agent through the Lambda function. You can provide it a natural language query, such as “How many employees are there in each department in each region?” or “What is the employee mix by gender in each region?” The following is the solution architecture.

Solution Diagram

A custom agent built using Converse API

The Converse API is provided by Amazon Bedrock so you can create conversational applications. It enables powerful features such as tool use: the ability for a large language model (LLM) to choose from a list of tools, such as running SQL queries against a database, and decide which tool to use depending on the context of the conversation. Using the Converse API also means you can maintain a series of messages between User and Assistant roles to carry out a chat with an LLM such as Anthropic’s Claude 3.5 Sonnet. In this post, a custom agent called ConverseSQLAgent was created specifically for long-running agent executions and to follow a plan of execution.

The Agent loop: Agent planning, self-correction, and long-term learning

The agent contains several key features: planning and carry-over, execution and tool use, SQLAlchemy and self-correction, and reflection and long-term learning using memory.

Planning and carry-over

The first step that the agent takes is to create a plan of execution to perform the text-to-SQL task. It first thinks through what the user is asking and develops a plan on how it will fulfill the request of the user. This behavior is controlled using a system prompt, which defines how the agent should behave. After the agent thinks through what it should do, it outputs the plan.

One of the challenges with long-running agent execution is that the agent sometimes forgets the plan it was supposed to execute as the context grows longer while it carries out its steps. One of the primary ways to deal with this is by “carrying over” the initial plan: injecting it back into a section of the system prompt. The system prompt is part of every Converse API call, and this improves the ability of the agent to follow its plan. Because the agent may revise its plan as it progresses through the execution, the plan in the system prompt is updated as new plans emerge. Refer to the following figure on how the carry-over works.

Process Flow

Execution and tool use

After the plan has been created, the agent will execute its plan one step at a time. It might decide to call on one or more tools it has access to. With Converse API, you can pass in a toolConfig that contains the toolSpec for each tool it has access to. The toolSpec defines what the tool is, a description of the tool, and the parameters that the tool requires. When the LLM decides to use a tool, it outputs a tool use block as part of its response. The application, in this case the Lambda code, needs to identify that tool use block, execute the corresponding tool, append the tool result response to the message list, and call the Converse API again. As shown at (a) in the following figure, you can add tools for the LLM to choose from by adding in a toolConfig along with toolSpecs. Part (b) shows that in the implementation of ConverseSQLAgent, tool groups contain a collection of tools, and each tool contains the toolSpec and the callable function. The tool groups are added to the agent, which in turn adds it to the Converse API call. Tool group instructions are additional instructions on how to use the tool group that get injected into the system prompt. Although you can add descriptions to each individual tool, having tool group–wide instructions enable more effective usage of the group.

Converse API processing Tools
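The following is a simplified sketch of this loop; the tool name, input schema, model ID, and run_sql_query helper are illustrative and not the exact ConverseSQLAgent code, which additionally organizes tools into tool groups.

# Illustrative toolConfig and tool-use loop with the Converse API.
import boto3

bedrock = boto3.client("bedrock-runtime")

def run_sql_query(query: str) -> str:
    """Placeholder for the SQL execution tool (see the SQLAlchemy sketch later in this post)."""
    raise NotImplementedError

tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "InvokeSQLQuery",
            "description": "Execute a SQL query against the HR database and return the rows.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            }},
        }
    }]
}

messages = [{"role": "user", "content": [{"text": "How many employees are there in each department?"}]}]

while True:
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model ID
        messages=messages,
        toolConfig=tool_config,
    )
    assistant_message = response["output"]["message"]
    messages.append(assistant_message)
    if response["stopReason"] != "tool_use":
        break
    # Execute every tool the model requested and return the results
    tool_results = []
    for block in assistant_message["content"]:
        if "toolUse" in block:
            tool_use = block["toolUse"]
            result = run_sql_query(tool_use["input"]["query"])
            tool_results.append({"toolResult": {
                "toolUseId": tool_use["toolUseId"],
                "content": [{"text": str(result)}],
            }})
    messages.append({"role": "user", "content": tool_results})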

SQLAlchemy and self-correction

The SQL tool group (these tools are part of the demo code provided), as shown in the preceding figure, is implemented using SQLAlchemy, which is a Python SQL toolkit you can use to interface with different databases without having to worry about database-specific SQL syntax. You can connect to Postgres, MySQL, and more without having to change your code every time.

In this post, there is an InvokeSQLQuery tool that allows the agent to execute arbitrary SQL statements. Although almost all database-specific tasks, such as looking up schemas and tables, can be accomplished through InvokeSQLQuery, it’s better to provide SQLAlchemy implementations for specific tasks, such as GetDatabaseSchemas, which gets every schema in the database, greatly reducing the time it takes for the agent to generate the correct query. Think of it as giving the agent a shortcut to the information it needs. The agent can make errors when querying the database through the InvokeSQLQuery tool. The InvokeSQLQuery tool responds with the error it encountered, and the agent can perform self-correction to fix the query. This flow is shown in the following diagram.

Agent Flow
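A minimal sketch of such a tool is shown below, assuming a PostgreSQL connection URL; the demo retrieves the real connection details from AWS Secrets Manager, and the error text is returned to the agent rather than raised so it can self-correct.

# Simplified sketch of an InvokeSQLQuery tool built on SQLAlchemy.
from sqlalchemy import create_engine, text
from sqlalchemy.exc import SQLAlchemyError

# Placeholder connection URL; real credentials come from Secrets Manager
engine = create_engine("postgresql+psycopg2://user:password@hr-db-host:5432/hr")

def invoke_sql_query(query: str) -> str:
    try:
        with engine.connect() as conn:
            rows = conn.execute(text(query)).fetchall()
        return "\n".join(str(row) for row in rows) or "Query returned no rows."
    except SQLAlchemyError as exc:
        # Surface the database error to the agent so it can revise the query
        return f"SQL error: {exc}"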

Reflection and long-term learning using memory

Although self-correction is an important feature of the agent, the agent must be able to learn from its mistakes to avoid repeating them in the future. Otherwise, the agent will continue to make the same mistakes, greatly reducing effectiveness and efficiency. The agent maintains a hierarchical memory structure, as shown in the following figure. The agent decides how to structure its memory; the following is an example of how it might do so.

Memory Management

The agent can reflect on its execution, learn best practices and error avoidance, and save these lessons into long-term memory. Long-term memory is implemented through a hierarchical memory structure with Amazon DynamoDB. The agent maintains a main memory that has pointers to other memories it has. Each memory is represented as a record in a DynamoDB table. As the agent learns through its execution and encounters errors, it can update its main memory and create new memories by maintaining an index of memories in the main memory. It can then tap into this memory in the future to avoid errors and even improve the efficiency of queries by caching facts.
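The following sketch illustrates the idea with boto3; the table name, key, and attribute layout are assumptions, not the exact schema used by ConverseSQLAgent.

# Hypothetical sketch of hierarchical memory storage in DynamoDB.
import boto3

table = boto3.resource("dynamodb").Table("agent-memory")

def save_memory(memory_id: str, content: str) -> None:
    table.put_item(Item={"memory_id": memory_id, "content": content})

def load_memory(memory_id: str) -> dict:
    return table.get_item(Key={"memory_id": memory_id}).get("Item", {})

# The main memory keeps an index of the other memories the agent has created
save_memory("main", "index: [sql_best_practices, schema_facts]")
save_memory("sql_best_practices", "Always qualify column names with table aliases.")
save_memory("schema_facts", "The employees table joins to departments on department_id.")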

Prerequisites

Before you get started, make sure you have the following prerequisites:

Deploy the solution

The full code and instructions are available in GitHub in the Readme file.

  1. Clone the code to your working environment:

git clone https://github.com/aws-samples/aws-field-samples.git

  2. Move to the ConverseSqlAgent folder
  3. Follow the steps in the Readme file in the GitHub repo

Cleanup

To dispose of the stack afterwards, invoke the following command:

cdk destroy

Conclusion

The development of robust text-to-SQL capabilities is a critical challenge in natural language processing and database management. Although current approaches have made progress, there remains room for improvement, particularly with complex queries and database structures. The introduction of the ConverseSQLAgent, a custom agent implementation using Amazon Bedrock and Converse API, presents a promising solution to this problem. The agent’s architecture, featuring planning and carry-over, execution and tool use, self-correction through SQLAlchemy, and reflection-based long-term learning, demonstrates its ability to understand natural language queries, develop and execute SQL plans, and continually improve its capabilities. As businesses seek more intuitive ways to access and manage data, solutions such as the ConverseSQLAgent hold the potential to bridge the gap between natural language and structured database queries, unlocking new levels of productivity and data-driven decision-making. To dive deeper and learn more about generative AI, check out these additional resources:


About the authors

Pavan Kumar is a Solutions Architect at Amazon Web Services (AWS), helping customers design robust, scalable solutions on the cloud across multiple industries. With a background in enterprise architecture and software development, Pavan has contributed to creating solutions to handle API security, API management, microservices, and geospatial information system use cases for his customers. He is passionate about learning new technologies and solving, automating, and simplifying customer problems using these solutions.

Abdullah Siddiqui is a Partner Sales Solutions Architect at Amazon Web Services (AWS) based out of Toronto. He helps AWS Partners and customers build solutions using AWS services and specializes in resilience and migrations. In his spare time, he enjoys spending time with his family and traveling.

Parag Srivastava is a Solutions Architect at Amazon Web Services (AWS), helping enterprise customers with successful cloud adoption and migration. During his professional career, he has been extensively involved in complex digital transformation projects. He is also passionate about building innovative solutions around geospatial aspects of addresses.

Read More

Accelerate threat modeling with generative AI

Accelerate threat modeling with generative AI

In this post, we explore how generative AI can revolutionize threat modeling practices by automating vulnerability identification, generating comprehensive attack scenarios, and providing contextual mitigation strategies. Unlike previous automation attempts that struggled with the creative and contextual aspects of threat analysis, generative AI overcomes these limitations through its ability to understand complex system relationships, reason about novel attack vectors, and adapt to unique architectural patterns. Where traditional automation tools relied on rigid rule sets and predefined templates, AI models can now interpret nuanced system designs, infer security implications across components, and generate threat scenarios that human analysts might overlook, making effective automated threat modeling a practical reality.

Threat modeling and why it matters

Threat modeling is a structured approach to identifying, quantifying, and addressing security risks associated with an application or system. It involves analyzing the architecture from an attacker’s perspective to discover potential vulnerabilities, determine their impact, and implement appropriate mitigations. Effective threat modeling examines data flows, trust boundaries, and potential attack vectors to create a comprehensive security strategy tailored to the specific system.

In a shift-left approach to security, threat modeling serves as a critical early intervention. By implementing threat modeling during the design phase—before a single line of code is written—organizations can identify and address potential vulnerabilities at their inception point. The following diagram illustrates this workflow.

Threat modeling in shift-left

This proactive strategy significantly reduces the accumulation of security debt and transforms security from a bottleneck into an enabler of innovation. When security considerations are integrated from the beginning, teams can implement appropriate controls throughout the development lifecycle, resulting in more resilient systems built from the ground up.

Despite these clear benefits, threat modeling remains underutilized in the software development industry. This limited adoption stems from several significant challenges inherent to traditional threat modeling approaches:

  • Time requirements – The process takes 1–8 days to complete, with multiple iterations needed for full coverage. This conflicts with tight development timelines in modern software environments.
  • Inconsistent assessment – Threat modeling suffers from subjectivity. Security experts often vary in their threat identification and risk level assignments, creating inconsistencies across projects and teams.
  • Scaling limitations – Manual threat modeling can’t effectively address modern system complexity. The growth of microservices, cloud deployments, and system dependencies outpaces security teams’ capacity to identify vulnerabilities.

How generative AI can help

Generative AI has revolutionized threat modeling by automating traditionally complex analytical tasks that required human judgment, reasoning, and expertise. Generative AI brings powerful capabilities to threat modeling, combining natural language processing with visual analysis to simultaneously evaluate system architectures, diagrams, and documentation. Drawing from extensive security databases like MITRE ATT&CK and OWASP, these models can quickly identify potential vulnerabilities across complex systems. This dual capability of processing both text and visuals while referencing comprehensive security frameworks enables faster, more thorough threat assessments than traditional manual methods.

Our solution, Threat Designer, uses enterprise-grade foundation models (FMs) available in Amazon Bedrock to transform threat modeling. Using the advanced multimodal capabilities of Anthropic’s Claude Sonnet 3.7, we create comprehensive threat assessments at scale. You can also use other available models from the model catalog or use your own fine-tuned model, giving you maximum flexibility to use pre-trained expertise or custom-tailored capabilities specific to your security domain and organizational requirements. This adaptability makes sure your threat modeling solution delivers precise insights aligned with your unique security posture.
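As an illustration, a multimodal request of this kind can be expressed with the Converse API as in the following sketch; the model ID, diagram file, and prompt are placeholders rather than the exact Threat Designer prompts.

# Illustrative multimodal Converse API call: an architecture diagram plus a
# threat-modeling instruction sent to a Bedrock model.
import boto3

bedrock = boto3.client("bedrock-runtime")

with open("architecture-diagram.png", "rb") as f:
    diagram_bytes = f.read()

response = bedrock.converse(
    modelId="anthropic.claude-3-7-sonnet-20250219-v1:0",  # illustrative; may require a us. inference profile prefix
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": diagram_bytes}}},
            {"text": (
                "Analyze this architecture diagram. Identify assets, trust boundaries, "
                "and data flows, then list potential threats with suggested mitigations."
            )},
        ],
    }],
    inferenceConfig={"maxTokens": 4096},
)
print(response["output"]["message"]["content"][0]["text"])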

Solution overview

Threat Designer is a user-friendly web application that makes advanced threat modeling accessible to development and security teams. Threat Designer uses large language models (LLMs) to streamline the threat modeling process and identify vulnerabilities with minimal human effort.

Key features include:

  • Architecture diagram analysis – Users can submit system architecture diagrams, which the application processes using multimodal AI capabilities to understand system components and relationships
  • Interactive threat catalog – The system generates a comprehensive catalog of potential threats that users can explore, filter, and refine through an intuitive interface
  • Iterative refinement – With the replay functionality, teams can rerun the threat modeling process with design improvements or modifications, and see how changes impact the system’s security posture
  • Standardized exports – Results can be exported in PDF or DOCX formats, facilitating integration with existing security documentation and compliance processes
  • Serverless architecture – The solution runs on a cloud-based serverless infrastructure, alleviating the need for dedicated servers and providing automatic scaling based on demand

The following diagram illustrates the Threat Designer architecture.

Architecture diagram

The solution is built on a serverless stack, using AWS managed services for automatic scaling, high availability, and cost-efficiency. It comprises the following core components:

  • Frontend – AWS Amplify hosts a ReactJS application built with the Cloudscape design system, providing the UI
  • Authentication – Amazon Cognito manages the user pool, handling authentication flows and securing access to application resources
  • API layer – Amazon API Gateway serves as the communication hub, providing proxy integration between frontend and backend services with request routing and authorization
  • Data storage – We use the following services for storage:
    • Two Amazon DynamoDB tables:
      • The agent execution state table maintains processing state
      • The threat catalog table stores identified threats and vulnerabilities
    • An Amazon Simple Storage Service (Amazon S3) architecture bucket stores system diagrams and artifacts
  • Generative AI – Amazon Bedrock provides the FM for threat modeling, analyzing architecture diagrams and identifying potential vulnerabilities (a minimal sketch of this call follows the list)
  • Backend service – An AWS Lambda function contains the REST interface business logic, built using Powertools for AWS Lambda (Python)
  • Agent service – Hosted on a Lambda function, the agent service works asynchronously to manage threat analysis workflows, processing diagrams and maintaining execution state in DynamoDB
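To make the Generative AI component concrete, the following is a minimal sketch of how a backend might send an architecture diagram and an accompanying description to an Anthropic Claude model through the Amazon Bedrock Converse API. The model ID, Region, prompt, and file name are placeholder assumptions; this is illustrative rather than the exact code used by Threat Designer.

```python
import boto3

# Bedrock Runtime client; the Region is an assumption (use the Region where the model is enabled).
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder model ID for Anthropic's Claude 3.7 Sonnet; confirm the exact model ID or
# inference profile available in your account.
MODEL_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"

def analyze_diagram(image_path: str, description: str) -> str:
    """Send an architecture diagram plus a text description to the model."""
    with open(image_path, "rb") as f:
        image_bytes = f.read()

    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[
            {
                "role": "user",
                "content": [
                    {"image": {"format": "png", "source": {"bytes": image_bytes}}},
                    {"text": f"Identify assets, data flows, and trust boundaries. Context: {description}"},
                ],
            }
        ],
        inferenceConfig={"maxTokens": 2048, "temperature": 0.2},
    )
    # The Converse API returns the assistant message as a list of content blocks.
    return response["output"]["message"]["content"][0]["text"]

if __name__ == "__main__":
    print(analyze_diagram("architecture.png", "Serverless web app with API Gateway and Lambda"))
```

In the architecture above, calls of this kind would run inside the agent service Lambda function, with results persisted to the DynamoDB tables described earlier.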

Agent service workflow

The agent service is built on LangGraph by LangChain, which lets us orchestrate complex workflows through a graph-based structure. This approach incorporates two key design patterns:

  • Separation of concerns – The threat modeling process is decomposed into discrete, specialized steps that can be executed independently and iteratively. Each node in the graph represents a specific function, such as image processing, asset identification, data flow analysis, or threat enumeration.
  • Structured output – Each component in the workflow produces standardized, well-defined outputs that serve as inputs to subsequent steps, providing consistency and facilitating downstream integration (a minimal sketch of this pattern follows the list).
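To illustrate the structured output pattern, the following sketch binds a hypothetical Pydantic threat schema to a Bedrock-hosted chat model with LangChain’s with_structured_output, so a node returns a predictable object instead of free-form text. The class names, fields, and model ID are assumptions, not the actual Threat Designer schema.

```python
from typing import List
from pydantic import BaseModel, Field
from langchain_aws import ChatBedrockConverse

# Hypothetical schema for a single identified threat; the real catalog schema may differ.
class Threat(BaseModel):
    name: str = Field(description="Short name of the threat")
    affected_component: str = Field(description="Component or data flow at risk")
    severity: str = Field(description="Low, Medium, or High")
    mitigation: str = Field(description="Suggested mitigation")

class ThreatList(BaseModel):
    threats: List[Threat]

# Placeholder model ID; substitute the model enabled in your account.
llm = ChatBedrockConverse(model="us.anthropic.claude-3-7-sonnet-20250219-v1:0")

# The bound model parses its answer into ThreatList instead of returning raw text.
structured_llm = llm.with_structured_output(ThreatList)

catalog = structured_llm.invoke(
    "List threats for a public API Gateway endpoint backed by a Lambda function and a DynamoDB table."
)
for t in catalog.threats:
    print(f"[{t.severity}] {t.name} -> {t.mitigation}")
```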

The agent workflow follows a directed graph where processing begins at the Start node and proceeds through several specialized stages, as illustrated in the following diagram.

Agent anatomy

The workflow includes the following nodes (a simplified wiring sketch follows the list):

  • Image processing – The Image processing node processes the architecture diagram image and converts it into the appropriate format for the LLM to consume
  • Assets – This information, along with textual descriptions, feeds into the Assets node, which identifies and catalogs system components
  • Flows – The workflow then progresses to the Flows node, mapping data movements and trust boundaries between components
  • Threats – Lastly, the Threats node uses this information to identify potential vulnerabilities and attack vectors
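Assuming illustrative node and state names (the actual implementation may differ), a simplified LangGraph wiring of these four stages could look like the following sketch. The LLM calls inside each node are omitted to keep the focus on the graph structure.

```python
from typing import List, TypedDict
from langgraph.graph import StateGraph, START

# Hypothetical shared state passed between nodes; field names are illustrative only.
class AgentState(TypedDict, total=False):
    image_b64: str               # encoded architecture diagram
    assets: List[str]            # components identified in the diagram
    flows: List[str]             # data flows and trust boundaries
    threats: List[dict]          # accumulated threat catalog
    iteration: int               # refinement passes completed
    mode: str                    # "manual" or "auto"
    requested_iterations: int    # used in manual mode
    gaps_found: bool             # set by the gap analysis step in auto mode

def image_processing(state: AgentState) -> dict:
    # Normalize the uploaded diagram into a format the multimodal model accepts.
    return {"image_b64": state["image_b64"]}

def assets(state: AgentState) -> dict:
    # Call the LLM to catalog system components (model call omitted in this sketch).
    return {"assets": ["API Gateway", "Lambda", "DynamoDB"]}

def flows(state: AgentState) -> dict:
    # Map data movements and trust boundaries between the identified assets.
    return {"flows": ["client -> API Gateway -> Lambda -> DynamoDB"]}

def threats(state: AgentState) -> dict:
    # Enumerate threats for the assets and flows, appending to the catalog.
    new_threats = state.get("threats", []) + [{"name": "Credential theft via exposed endpoint"}]
    return {"threats": new_threats, "iteration": state.get("iteration", 0) + 1}

graph = StateGraph(AgentState)
graph.add_node("image_processing", image_processing)
graph.add_node("assets", assets)
graph.add_node("flows", flows)
graph.add_node("threats", threats)

graph.add_edge(START, "image_processing")
graph.add_edge("image_processing", "assets")
graph.add_edge("assets", "flows")
graph.add_edge("flows", "threats")
# The edge leaving "threats" is conditional; it is wired in the next sketch.
```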

A critical innovation in our agent architecture is the adaptive iteration mechanism implemented through conditional edges in the graph. This feature addresses one of the fundamental challenges in LLM-based threat modeling: controlling the comprehensiveness and depth of the analysis.

The conditional edge after the Threats node enables two powerful operational modes (a minimal routing sketch follows the list):

  • User-controlled iteration – In this mode, the user specifies the number of iterations the agent should perform. With each pass through the loop, the agent enriches the threat catalog by analyzing edge cases that might have been overlooked in previous iterations. This approach gives security professionals direct control over the thoroughness of the analysis.
  • Autonomous gap analysis – In fully agentic mode, a specialized gap analysis component evaluates the current threat catalog. This component identifies potential blind spots or underdeveloped areas in the threat model and triggers additional iterations until it determines the threat catalog is sufficiently comprehensive. The agent essentially performs its own quality assurance, continuously refining its output until it meets predefined completeness criteria.
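Continuing the simplified sketch above (names and thresholds remain assumptions), the conditional edge can be expressed as a routing function that either loops back to the Threats node for another enrichment pass or ends the run, depending on the selected mode.

```python
from langgraph.graph import END

MAX_ITERATIONS = 15  # assumed upper bound matching the manual mode limit in the UI

def should_continue(state: AgentState) -> str:
    """Route back to the Threats node for another pass, or finish the run."""
    if state.get("mode") == "manual":
        # User-controlled iteration: loop until the requested number of passes completes.
        done = state.get("iteration", 0) >= min(state.get("requested_iterations", 1), MAX_ITERATIONS)
    else:
        # Autonomous gap analysis: a separate evaluation step (omitted here) sets gaps_found.
        done = not state.get("gaps_found", False)
    return "finish" if done else "iterate"

# Replace the straight edge out of "threats" with a conditional edge.
graph.add_conditional_edges(
    "threats",
    should_continue,
    {"iterate": "threats", "finish": END},  # map routing labels to graph targets
)

app = graph.compile()

# Example invocation with user-controlled iteration (inputs are placeholders).
result = app.invoke({"image_b64": "<base64 diagram>", "mode": "manual", "requested_iterations": 2})
print(len(result["threats"]), "threats identified")
```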

Prerequisites

Before you deploy Threat Designer, make sure you have the required prerequisites in place. For more information, refer to the GitHub repo.

Get started with Threat Designer

To start using Threat Designer, follow the step-by-step deployment instructions from the project’s README available in GitHub. After you deploy the solution, you’re ready to create your first threat model. Log in and complete the following steps:

  1. Choose Submit threat model to initiate a new threat model.
  2. Complete the submission form with your system details:
    • Required fields: Provide a title and architecture diagram image.
    • Recommended fields: Provide a solution description and assumptions (these significantly improve the quality of the threat model).
  3. Configure analysis parameters:
    • Choose your iteration mode:
      1. Auto (default): The agent intelligently determines when the threat catalog is comprehensive.
      2. Manual: Specify up to 15 iterations for more control.
    • Configure your reasoning boost to specify how much time the model spends on analysis (available when using Anthropic’s Claude 3.7 Sonnet).
  4. Choose Start threat modeling to launch the analysis.

Wizard

You can monitor progress through the intuitive interface, which displays each execution step in real time. The complete analysis typically takes 5–15 minutes, depending on system complexity and selected parameters.

Workflow

When the analysis is complete, you will have access to a comprehensive threat model that you can explore, refine, and export.

Threat modeling results

Clean up

To avoid incurring future charges, delete the solution by running the ./destroy.sh script. Refer to the README for more details.

Conclusion

In this post, we demonstrated how generative AI transforms threat modeling from an exclusive, expert-driven process into an accessible security practice for all development teams. By using FMs through our Threat Designer solution, we’ve democratized sophisticated security analysis, enabling organizations to identify vulnerabilities earlier and more consistently. This AI-powered approach removes the traditional barriers of time, expertise, and scalability, making shift-left security a practical reality rather than just an aspiration—ultimately building more resilient systems without sacrificing development velocity.

Deploy Threat Designer following the README instructions, upload your architecture diagram, and quickly receive AI-generated security insights. This streamlined approach helps you integrate proactive security measures into your development process without compromising speed or innovation—making comprehensive threat modeling accessible to teams of different sizes.


About the Authors

Edvin Hallvaxhiu is a senior security architect at Amazon Web Services, specialized in cybersecurity and automation. He helps customers design secure, compliant cloud solutions.

Sindi Cali is a consultant with AWS Professional Services. She supports customers in building data-driven applications in AWS.

Aditi Gupta is a Senior Global Engagement Manager at AWS ProServe. She specializes in delivering impactful Big Data and AI/ML solutions that enable AWS customers to maximize their business value through data utilization.

Rahul Shaurya is a Principal Data Architect at Amazon Web Services. He works closely with customers building data platforms and analytical applications on AWS.


How Anomalo solves unstructured data quality issues to deliver trusted assets for AI with AWS

How Anomalo solves unstructured data quality issues to deliver trusted assets for AI with AWS

This post is co-written with Vicky Andonova and Jonathan Karon from Anomalo.

Generative AI has rapidly evolved from a novelty to a powerful driver of innovation. From summarizing complex legal documents to powering advanced chat-based assistants, AI capabilities are expanding at an increasing pace. While large language models (LLMs) continue to push new boundaries, quality data remains the deciding factor in achieving real-world impact.

A year ago, it seemed that the primary differentiator in generative AI applications would be who could afford to build or use the biggest model. But with recent breakthroughs in base model training costs (such as DeepSeek-R1) and continual price-performance improvements, powerful models are becoming a commodity. Success in generative AI is becoming less about building the right model and more about finding the right use case. As a result, the competitive edge is shifting toward data access and data quality.

In this environment, enterprises are poised to excel. They have a hidden goldmine of decades of unstructured text—everything from call transcripts and scanned reports to support tickets and social media logs. The challenge is how to use that data. Transforming unstructured files, maintaining compliance, and mitigating data quality issues all become critical hurdles when an organization moves from AI pilots to production deployments.

In this post, we explore how you can use Anomalo with Amazon Web Services (AWS) artificial intelligence and machine learning (AI/ML) services to profile, validate, and cleanse unstructured data collections to transform your data lake into a trusted source for production-ready AI initiatives, as shown in the following figure.

Overall architecture

The challenge: Analyzing unstructured enterprise documents at scale

Despite the widespread adoption of AI, many enterprise AI projects fail due to poor data quality and inadequate controls. Gartner predicts that 30% of generative AI projects will be abandoned in 2025. Even the most data-driven organizations have focused primarily on using structured data, leaving unstructured content underutilized and unmonitored in data lakes or file systems. Yet, over 80% of enterprise data is unstructured (according to MIT Sloan School research), spanning everything from legal contracts and financial filings to social media posts.

For chief information officers (CIOs), chief technical officers (CTOs), and chief information security officers (CISOs), unstructured data represents both risk and opportunity. Before you can use unstructured content in generative AI applications, you must address the following critical hurdles:

  • Extraction – Optical character recognition (OCR), parsing, and metadata generation can be unreliable if not automated and validated. In addition, if extraction is inconsistent or incomplete, it can result in malformed data.
  • Compliance and security – Handling personally identifiable information (PII) or proprietary intellectual property (IP) demands rigorous governance, especially with the EU AI Act, Colorado AI Act, General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA), and similar regulations. Sensitive information can be difficult to identify in unstructured text, leading to inadvertent mishandling of that information.
  • Data quality – Incomplete, deprecated, duplicative, off-topic, or poorly written data can pollute your generative AI models and Retrieval Augmented Generation (RAG) context, yielding hallucinated, out-of-date, inappropriate, or misleading outputs. Making sure that your data is high-quality helps mitigate these risks.
  • Scalability and cost – Training or fine-tuning models on noisy data increases compute costs by unnecessarily growing the training dataset (training compute costs tend to grow linearly with dataset size), and processing and storing low-quality data in a vector database for RAG wastes processing and storage capacity.

In short, generative AI initiatives often falter—not because the underlying model is insufficient, but because the existing data pipeline isn’t designed to process unstructured data and still meet high-volume, high-quality ingestion and compliance requirements. Many companies are in the early stages of addressing these hurdles and are facing these problems in their existing processes:

  • Manual and time-consuming – The analysis of vast collections of unstructured documents relies on manual review by employees, creating time-consuming processes that delay projects.
  • Error-prone – Human review is susceptible to mistakes and inconsistencies, leading to inadvertent exclusion of critical data and inclusion of incorrect data.
  • Resource-intensive – The manual document review process requires significant staff time that could be better spent on higher-value business activities. Budgets can’t support the level of staffing needed to vet enterprise document collections.

Although existing document analysis processes provide valuable insights, they aren’t efficient or accurate enough to meet modern business needs for timely decision-making. Organizations need a solution that can process large volumes of unstructured data and help maintain compliance with regulations while protecting sensitive information.

The solution: An enterprise-grade approach to unstructured data quality

Anomalo uses a highly secure, scalable stack provided by AWS to detect, isolate, and address data quality problems in unstructured data in minutes instead of weeks. This helps your data teams deliver high-value AI applications faster and with less risk. The architecture of Anomalo’s solution is shown in the following figure.

Solution Diagram

  1. Automated ingestion and metadata extraction – Anomalo automates OCR and text parsing for PDF files, PowerPoint presentations, and Word documents stored in Amazon Simple Storage Service (Amazon S3) using auto scaling Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon Elastic Container Registry (Amazon ECR).
  2. Continuous data observability – Anomalo inspects each batch of extracted data, detecting anomalies such as truncated text, empty fields, and duplicates before the data reaches your models. In the process, it monitors the health of your unstructured pipeline, flagging surges in faulty documents or unusual data drift (for example, new file formats, an unexpected number of additions or deletions, or changes in document size). With this information reviewed and reported by Anomalo, your engineers can spend less time manually combing through logs and more time optimizing AI features, while CISOs gain visibility into data-related risks.
  3. Governance and compliance – Built-in issue detection and policy enforcement help mask or remove PII and abusive language. If a batch of scanned documents includes personal addresses or proprietary designs, it can be flagged for legal or security review—minimizing regulatory and reputational risk. You can use Anomalo to define custom issues and metadata to be extracted from documents to solve a broad range of governance and business needs.
  4. Scalable AI on AWS – Anomalo uses Amazon Bedrock to give enterprises a choice of flexible, scalable LLMs for analyzing document quality (an illustrative sketch of such a call follows this list). Anomalo’s modern architecture can be deployed as software as a service (SaaS) or through an Amazon Virtual Private Cloud (Amazon VPC) connection to meet your security and operational needs.
  5. Trustworthy data for AI business applications – The validated data layer provided by Anomalo and AWS Glue helps make sure that only clean, approved content flows into your application.
  6. Supports your generative AI architecture – Whether you use fine-tuning or continued pre-training on an LLM to create a subject matter expert, store content in a vector database for RAG, or experiment with other generative AI architectures, by making sure that your data is clean and validated, you improve application output, preserve brand trust, and mitigate business risks.
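As a purely illustrative example of the kind of LLM-based document check described in step 4 (this is not Anomalo’s implementation; the model ID, prompt, keys, and thresholds are assumptions), a document-quality classification call through the Amazon Bedrock Converse API might look like the following.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Placeholder model ID; any Amazon Bedrock chat model with reliable JSON output could be used.
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

PROMPT = (
    "You are a data quality reviewer. For the document below, answer with bare JSON containing the keys "
    "'truncated' (bool), 'duplicate_suspect' (bool), 'contains_pii' (bool), and 'quality_score' (0-10).\n\n"
    "Document:\n{doc}"
)

def check_document(doc_text: str) -> dict:
    """Ask the model to flag common quality issues in one extracted document."""
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": PROMPT.format(doc=doc_text[:8000])}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0},
    )
    answer = response["output"]["message"]["content"][0]["text"]
    # Assumes the model returns bare JSON as instructed; production code would validate this.
    return json.loads(answer)

result = check_document("Sample extracted OCR text from a scanned report ...")
if result["contains_pii"] or result["quality_score"] < 5:
    print("Route document for review before it enters the RAG index.")
```

In practice, a managed solution such as Anomalo layers validation rules, monitoring, and governance on top of checks like this, so teams don’t have to build and maintain them for every pipeline.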

Impact

Using Anomalo and AWS AI/ML services for unstructured data provides these benefits:

  • Reduced operational burden – Anomalo’s off-the-shelf rules and evaluation engine save months of development time and ongoing maintenance, freeing time for designing new features instead of developing data quality rules.
  • Optimized costs – Training LLMs and ML models on low-quality data wastes precious GPU capacity, while vectorizing and storing that data for RAG increases overall operational costs, and both degrade application performance. Early data filtering cuts these hidden expenses.
  • Faster time to insights – Anomalo automatically classifies and labels unstructured text, giving data scientists rich data to spin up new generative prototypes or dashboards without time-consuming labeling prework.
  • Strengthened compliance and security – Identifying PII and adhering to data retention rules is built into the pipeline, supporting security policies and reducing the preparation needed for external audits.
  • Durable value – The generative AI landscape continues to evolve rapidly. Although LLM and application architecture investments may depreciate quickly, trustworthy, curated data is a sure bet that won’t be wasted.

Conclusion

Generative AI has the potential to deliver massive value: Gartner estimates a 15–20% revenue increase, 15% cost savings, and a 22% productivity improvement. To achieve these results, your applications must be built on a foundation of trusted, complete, and timely data. By delivering a user-friendly, enterprise-scale solution for structured and unstructured data quality monitoring, Anomalo helps you deliver more AI projects to production faster while meeting both your user and governance requirements.

Interested in learning more? Check out Anomalo’s unstructured data quality solution and request a demo or contact us for an in-depth discussion on how to begin or scale your generative AI journey.


About the authors

Vicky Andonova is the GM of Generative AI at Anomalo, the company reinventing enterprise data quality. As a founding team member, Vicky has spent the past six years pioneering Anomalo’s machine learning initiatives, transforming advanced AI models into actionable insights that empower enterprises to trust their data. Currently, she leads a team that not only brings innovative generative AI products to market but is also building a first-in-class data quality monitoring solution specifically designed for unstructured data. Previously, at Instacart, Vicky built the company’s experimentation platform and led company-wide initiatives to improve grocery delivery quality. She holds a BE from Columbia University.

Jonathan Karon leads Partner Innovation at Anomalo. He works closely with companies across the data ecosystem to integrate data quality monitoring in key tools and workflows, helping enterprises achieve high-functioning data practices and leverage novel technologies faster. Prior to Anomalo, Jonathan created Mobile App Observability, Data Intelligence, and DevSecOps products at New Relic, and was Head of Product at a generative AI sales and customer success startup. He holds a BA in Cognitive Science from Hampshire College and has worked with AI and data exploration technology throughout his career.

Mahesh Biradar is a Senior Solutions Architect at AWS with a history in the IT and services industry. He helps SMBs in the US meet their business goals with cloud technology. He holds a Bachelor of Engineering from VJTI and is based in New York City (US).

Emad Tawfik is a seasoned Senior Solutions Architect at Amazon Web Services, boasting more than a decade of experience. His specialization lies in the realm of Storage and Cloud solutions, where he excels in crafting cost-effective and scalable architectures for customers.
