Choosing the right approach for generative AI-powered structured data retrieval

Organizations want direct answers to their business questions without the complexity of writing SQL queries or navigating through business intelligence (BI) dashboards to extract data from structured data stores. Examples of structured data include tables, databases, and data warehouses that conform to a predefined schema. Large language model (LLM)-powered natural language query systems transform how we interact with data, so you can ask questions like “Which region has the highest revenue?” and receive immediate, insightful responses. Implementing these capabilities requires careful consideration of your specific needs—whether you need to integrate knowledge from other systems (for example, unstructured sources like documents), serve internal or external users, handle the analytical complexity of questions, or customize responses for business appropriateness, among other factors.

In this post, we discuss LLM-powered structured data query patterns in AWS. We provide a decision framework to help you select the best pattern for your specific use case.

Business challenge: Making structured data accessible

Organizations have vast amounts of structured data but struggle to make it effectively accessible to non-technical users for several reasons:

  • Business users lack the technical knowledge (like SQL) needed to query data
  • Employees rely on BI teams or data scientists for analysis, limiting self-service capabilities
  • Gaining insights often involves time delays that impact decision-making
  • Predefined dashboards constrain spontaneous exploration of data
  • Users might not know what questions are possible or where relevant data resides

Solution overview

An effective solution should provide the following:

  • A conversational interface that allows employees to query structured data sources without technical expertise
  • The ability to ask questions in everyday language and receive accurate, trustworthy answers
  • Automatic generation of visualizations and explanations to clearly communicate insights
  • Integration of information from different data sources (both structured and unstructured) presented in a unified manner
  • Ease of integration with existing investments and rapid deployment capabilities
  • Access restriction based on identities, roles, and permissions

In the following sections, we explore five patterns that can address these needs, highlighting the architecture, ideal use cases, benefits, considerations, and implementation resources for each approach.

Pattern 1: Direct conversational interface using an enterprise assistant

This pattern uses Amazon Q Business, a generative AI-powered assistant, to provide a chat interface on data sources with native connectors. When users ask questions in natural language, Amazon Q Business connects to the data source, interprets the question, and retrieves relevant information without requiring intermediate services. The following diagram illustrates this workflow.

This approach is ideal for internal enterprise assistants that need to answer business users’ questions from both structured and unstructured data sources in a unified experience. For example, HR personnel can ask “What’s our parental leave policy and how many employees used it last quarter?” and receive answers drawn from both leave policy documentation and employee databases in a single interaction. With this pattern, you can benefit from the following:

  • Simplified connectivity through the extensive Amazon Q Business library of built-in connectors
  • Streamlined implementation with a single service to configure and manage
  • Unified search experience for accessing both structured and unstructured information
  • Built-in understanding of and respect for existing identities, roles, and permissions

You can define the scope of data to be pulled in the form of a SQL query. Amazon Q Business pre-indexes database content based on defined SQL queries and uses this index when responding to user questions. Similarly, you can define the sync mode and schedule to determine how often you want to update your index. Amazon Q Business does the heavy lifting of indexing the data using a Retrieval Augmented Generation (RAG) approach and uses an LLM to generate well-written answers. For more details on how to set up Amazon Q Business with an Amazon Aurora PostgreSQL-Compatible Edition connector, see Discover insights from your Amazon Aurora PostgreSQL database using the Amazon Q Business connector. You can also refer to the complete list of supported data source connectors.
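
As a rough sketch, the boto3 `qbusiness` client can create such a scoped data source. The `create_data_source` call is real, but the application ID, index ID, role ARN, and especially the connector configuration document below are placeholders; each connector documents its own configuration schema, so consult the Aurora PostgreSQL connector documentation for the exact field names:

```python
import boto3

qbusiness = boto3.client("qbusiness")

response = qbusiness.create_data_source(
    applicationId="app-id",      # your Amazon Q Business application (placeholder)
    indexId="index-id",          # the index that will hold the content (placeholder)
    displayName="aurora-sales-data",
    roleArn="arn:aws:iam::123456789012:role/QBusinessDataSourceRole",
    # Sync daily at 12:00 UTC; pick a cadence that matches your data freshness needs
    syncSchedule="cron(0 12 * * ? *)",
    configuration={
        # Hypothetical connector document: the real schema is connector-specific.
        # The key idea is that a SQL query defines the scope of indexed content.
        "type": "AURORAPOSTGRESQL",
        "syncMode": "FULL_CRAWL",
        "query": "SELECT id, region, revenue FROM sales",
    },
)
print(response["dataSourceId"])
```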

Pattern 2: Enhancing BI tool with natural language querying capabilities

This pattern uses Amazon Q in QuickSight to process natural language queries against datasets that have been previously configured in Amazon QuickSight. Users can ask questions in everyday language within the QuickSight interface and get visualized answers without writing SQL. This approach works with QuickSight (Enterprise or Q edition) and supports various data sources, including Amazon Relational Database Service (Amazon RDS), Amazon Redshift, Amazon Athena, and others. The architecture is depicted in the following diagram.

This pattern is well-suited for internal BI and analytics use cases. Business analysts, executives, and other employees can ask ad-hoc questions to get immediate visualized insights in the form of dashboards. For example, executives can ask questions like “What were our top 5 regions by revenue last quarter?” and immediately see responsive charts, reducing dependency on analytics teams. The benefits of this pattern are as follows:

  • It enables natural language queries that produce rich visualizations and charts
  • No coding or machine learning (ML) experience is needed—the heavy lifting like natural language interpretation and SQL generation is managed by Amazon Q in QuickSight
  • It integrates seamlessly within the familiar QuickSight dashboard environment

Existing QuickSight users might find this the most straightforward way to take advantage of generative AI benefits. You can optimize this pattern for higher-quality results by configuring topics with curated fields, synonyms, and expected question phrasings. This pattern pulls data only from the specific data source configured in QuickSight and produces a dashboard as its output. For more details, check out QuickSight DemoCentral to view a demo in QuickSight, see the generative BI learning dashboard, and view guided instructions to create dashboards with Amazon Q. Also refer to the list of supported data sources.
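
A minimal sketch of such a topic, created with the boto3 `quicksight` client, follows; the account ID, dataset ARN, and column names are hypothetical, and only a small subset of the topic schema is shown:

```python
import boto3

quicksight = boto3.client("quicksight")

quicksight.create_topic(
    AwsAccountId="123456789012",
    TopicId="revenue-topic",
    Topic={
        "Name": "Regional revenue",
        "Description": "Ad-hoc Q&A over the curated sales dataset",
        "DataSets": [
            {
                "DatasetArn": "arn:aws:quicksight:us-east-1:123456789012:dataset/sales",
                "DatasetName": "sales",
                "Columns": [
                    {
                        "ColumnName": "rev_amt",
                        "ColumnFriendlyName": "revenue",
                        # Synonyms let users phrase questions in everyday language
                        "ColumnSynonyms": ["sales", "income", "turnover"],
                    }
                ],
            }
        ],
    },
)
```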

Pattern 3: Combining BI visualization with conversational AI for a seamless experience

This pattern merges BI visualization capabilities with conversational AI to create a seamless knowledge experience. By integrating Amazon Q in QuickSight with Amazon Q Business (with the QuickSight plugin enabled), organizations can provide users with a unified conversational interface that draws on both unstructured and structured data. The following diagram illustrates the architecture.

This is ideal for enterprises that want an internal AI assistant to answer a variety of questions—whether it’s a metric from a database or knowledge from a document. For example, executives can ask “What was our Q4 revenue growth?” and see visualized results from data warehouses such as Amazon Redshift through QuickSight, then immediately follow up with “What is our company vacation policy?” to access HR documentation—all within the same conversation flow. This pattern offers the following benefits:

  • It unifies answers from structured data (databases and warehouses) and unstructured data (documents, wikis, emails) in a single application
  • It delivers rich visualizations alongside conversational responses in a seamless experience with real-time analysis in chat
  • There is no duplication of work—if your BI team has already built datasets and topics in QuickSight for analytics, you can reuse them in Amazon Q Business
  • It maintains conversational context when switching between data and document-based inquiries

For more details, see Query structured data from Amazon Q Business using Amazon QuickSight integration and Amazon Q Business now provides insights from your databases and data warehouses (preview).
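
A sketch of this conversation flow using the Amazon Q Business ChatSync API follows; the application ID is a placeholder, and the QuickSight plugin is assumed to be configured already:

```python
import boto3

qbusiness = boto3.client("qbusiness")

# One conversation, two very different questions: with the QuickSight plugin
# enabled, structured metrics and document answers share the same session.
first = qbusiness.chat_sync(
    applicationId="app-id",                      # placeholder application ID
    userMessage="What was our Q4 revenue growth?",
)
follow_up = qbusiness.chat_sync(
    applicationId="app-id",
    conversationId=first["conversationId"],      # keep the conversational context
    parentMessageId=first["systemMessageId"],
    userMessage="What is our company vacation policy?",
)
print(follow_up["systemMessage"])
```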

Another variation of this pattern is recommended for BI users who want to expose unified data through rich visuals in QuickSight, as illustrated in the following diagram.

Structured data retrieval using hybrid approach option 2

For more details, see Integrate unstructured data into Amazon QuickSight using Amazon Q Business.

Pattern 4: Building knowledge bases from structured data using managed text-to-SQL

This pattern uses Amazon Bedrock Knowledge Bases to enable structured data retrieval. The service provides a fully managed text-to-SQL module that alleviates common challenges in developing natural language query applications for structured data. This implementation uses Amazon Bedrock (Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases) along with your choice of data warehouse such as Amazon Redshift or Amazon SageMaker Lakehouse. The following diagram illustrates the workflow.

Structured data retrieval using Amazon Bedrock Knowledge Bases

For example, a seller can use this capability embedded into an ecommerce application to ask a complex query like “Give me top 5 products whose sales increased by 50% last year as compared to previous year? Also group the results by product category.” The system automatically generates the appropriate SQL, executes it against the data sources, and delivers results or a summarized narrative. This pattern features the following benefits:

  • It provides fully managed text-to-SQL capabilities without requiring model training
  • It enables direct querying of data from the source without data movement
  • It supports complex analytical queries on warehouse data
  • It offers flexibility in foundation model (FM) selection through Amazon Bedrock
  • Its API connectivity, personalization options, and context-aware chat features make it well suited for customer-facing applications

Choose this pattern when you need a flexible, developer-oriented solution. This approach works well for applications (internal or external) where you control the UI design. Default outputs are primarily text or structured data. However, executing arbitrary SQL queries can be a security risk for text-to-SQL applications. It is recommended that you take precautions as needed, such as using restricted roles, read-only databases, and sandboxing. For more information on how to build this pattern, see Empower financial analytics by creating structured knowledge bases using Amazon Bedrock and Amazon Redshift. For a list of supported structured data stores, refer to Create a knowledge base by connecting to a structured data store.
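
A minimal sketch of querying such a knowledge base with the Bedrock RetrieveAndGenerate API follows; the knowledge base ID and model ARN are placeholders, and the knowledge base is assumed to be connected to a structured store such as Amazon Redshift:

```python
import boto3

runtime = boto3.client("bedrock-agent-runtime")

# The managed text-to-SQL module generates and runs the SQL behind the scenes;
# the caller only supplies the natural language question.
response = runtime.retrieve_and_generate(
    input={"text": (
        "Give me the top 5 products whose sales increased by 50% last year "
        "compared to the previous year, grouped by product category."
    )},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123456",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0",
        },
    },
)
print(response["output"]["text"])
```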

Pattern 5: Custom text-to-SQL implementation with flexible model selection

This pattern represents a build-your-own solution using FMs to convert natural language to SQL, execute queries on data warehouses, and return results. Choose Amazon Bedrock when you want to quickly integrate this capability without deep ML expertise—it offers a fully managed service with ready-to-use FMs through a unified API, handling infrastructure needs with pay-as-you-go pricing. Alternatively, select Amazon SageMaker AI when you require extensive model customization to meet specialized needs—it provides complete ML lifecycle tools for data scientists and ML engineers to build, train, and deploy custom models with greater control. For more information, refer to our Amazon Bedrock or Amazon SageMaker AI decision guide. The following diagram illustrates the architecture.

Structured data retrieval using Amazon Bedrock or Amazon SageMaker AI

Use this pattern if your use case requires specific open-weight models, or you want to fine-tune models on your domain-specific data. For example, if you need highly accurate results for your query, then you can use this pattern to fine-tune models on specific schema structures, while maintaining the flexibility to integrate with existing workflows and multi-cloud environments. This pattern offers the following benefits:

  • It provides maximum customization in model selection, fine-tuning, and system design
  • It supports complex logic across multiple data sources
  • It offers complete control over security and deployment in your virtual private cloud (VPC)
  • It enables flexible interface implementation (Slack bots, custom web UIs, notebook plugins)
  • You can implement it for external user-facing solutions

For more information on steps to build this pattern, see Build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources.
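
As one possible shape for this pattern, the following sketch uses the Amazon Bedrock Converse API to generate SQL from a question and a schema, with a simple SELECT-only guardrail in the spirit of the precautions noted in Pattern 4. The schema, model ID, and guardrail policy are illustrative, not prescriptive:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

SCHEMA = "CREATE TABLE sales (id INT, region VARCHAR, revenue DECIMAL, sold_at DATE);"

def text_to_sql(question: str) -> str:
    """Ask an FM to translate a question into a single read-only SQL query."""
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # any Bedrock FM
        system=[{"text": (
            "You translate questions into SQL for the schema below. "
            "Return only one SELECT statement, nothing else.\n" + SCHEMA
        )}],
        messages=[{"role": "user", "content": [{"text": question}]}],
        inferenceConfig={"temperature": 0},
    )
    sql = response["output"]["message"]["content"][0]["text"].strip()
    # Guardrail: never run arbitrary statements; allow SELECT only, and execute
    # it with a restricted, read-only database role.
    if not sql.lower().startswith("select"):
        raise ValueError(f"Refusing non-SELECT statement: {sql!r}")
    return sql

print(text_to_sql("Which region has the highest revenue?"))
```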

Pattern comparison: Making the right choice

To make effective decisions, let’s compare these patterns across key criteria.

Data workload suitability

Different out-of-the-box patterns handle transactional (operational) and analytical (historical or aggregated) data with varying degrees of effectiveness. Patterns 1 and 3, which use Amazon Q Business, work with indexed data and are optimized for lookup-style queries against previously indexed content rather than real-time transactional database queries. Pattern 2, which uses Amazon Q in QuickSight, produces visual output from transactional data for ad-hoc analysis. Pattern 4, which uses Amazon Bedrock structured data retrieval, is specifically designed for analytical systems and data warehouses, excelling at complex queries on large datasets. Pattern 5 is a self-managed text-to-SQL option that can be built to support both the transactional and analytical needs of users.

Target audience

Architectures highlighted in Patterns 1, 2, and 3 (using Amazon Q Business, Amazon Q in QuickSight, or a combination) are best suited for internal enterprise use. However, you can use Amazon QuickSight Embedded to embed data visuals, dashboards, and natural language queries into both internal and customer-facing applications. Amazon Q Business serves as an enterprise AI assistant for organizational knowledge, with subscription-based pricing tiers designed for internal employees. Pattern 4 (using Amazon Bedrock) can be used to build both internal and customer-facing applications. This is because, unlike the subscription-based model of Amazon Q Business, Amazon Bedrock provides API-driven services that avoid per-user costs and identity management overhead for external customer scenarios. This makes it well-suited for customer-facing experiences where you need to serve potentially thousands of external users. The custom LLM solutions in Pattern 5 can similarly be tailored to external application requirements.

Interface and output format

Different patterns deliver answers through different interaction models:

  • Conversational experiences – Patterns 1 and 3 (using Amazon Q Business) provide chat-based interfaces. Pattern 4 (using Amazon Bedrock Knowledge Bases for structured data retrieval) naturally supports AI assistant integration, and Pattern 5 (a custom text-to-SQL solution) can be designed for a variety of interaction models.
  • Visualization-focused output – Pattern 2 (using Amazon Q in QuickSight) specializes in generating on-the-fly visualizations such as charts and tables in response to user questions.
  • API integration – For embedding capabilities into existing applications, Patterns 4 and 5 offer the most flexible API-based integration options.

The following figure is a comparison matrix of AWS structured data query patterns.

Pattern comparison matrix

Conclusion

Among these patterns, your optimal choice depends on the following key factors:

  • Data location and characteristics – Is your data in operational databases, already in a data warehouse, or distributed across various sources?
  • User profile and interaction model – Are you supporting internal or external users? Do they prefer conversational or visualization-focused interfaces?
  • Available resources and expertise – Do you have ML specialists available, or do you need a fully managed solution?
  • Accuracy and governance requirements – Do you need strictly controlled semantics and curation, or is broader query flexibility acceptable with monitoring?

By understanding these patterns and their trade-offs, you can architect solutions that align with your business objectives.


About the authors

Akshara Shah is a Senior Solutions Architect at Amazon Web Services. She helps commercial customers build cloud-based generative AI services to meet their business needs. She has been designing, developing, and implementing solutions that leverage AI and ML technologies for more than 10 years. Outside of work, she loves painting, exercising and spending time with family.

Sanghwa Na is a Generative AI Specialist Solutions Architect at Amazon Web Services. Based in San Francisco, he works with customers to design and build generative AI solutions using large language models and foundation models on AWS. He focuses on helping organizations adopt AI technologies that drive real business value.

Revolutionizing drug data analysis using Amazon Bedrock multimodal RAG capabilities

In the pharmaceutical industry, biotechnology and healthcare companies face an unprecedented challenge in efficiently managing and analyzing vast amounts of drug-related data from diverse sources. Traditional data analysis methods prove inadequate for processing complex medical documentation that includes a mix of text, images, graphs, and tables. Amazon Bedrock offers features like multimodal retrieval, advanced chunking capabilities, and citations to help organizations get high-accuracy responses.

Pharmaceutical and healthcare organizations process a vast number of complex document formats and unstructured data that pose analytical challenges. Clinical study documents and related research papers typically present an intricate blend of technical text, detailed tables, and sophisticated statistical graphs, making automated data extraction particularly challenging. These documents pose additional challenges through non-standardized formatting and varied data presentation styles across research institutions. This post showcases a solution to extract data-driven insights from complex research documents through a sample application that delivers high-accuracy responses. It analyzes clinical trial data, patient outcomes, molecular diagrams, and safety reports from the research documents, which can help pharmaceutical companies accelerate their research process. The solution provides citations from the source documents, reducing hallucinations and enhancing the accuracy of the responses.

Solution overview

The sample application uses Amazon Bedrock to create an intelligent AI assistant that analyzes and summarizes research documents containing text, graphs, and unstructured data. Amazon Bedrock is a fully managed service that offers a choice of industry-leading foundation models (FMs) along with a broad set of capabilities to build generative AI applications, simplifying development with security, privacy, and responsible AI.

To equip FMs with up-to-date and proprietary information, organizations use Retrieval Augmented Generation (RAG), a technique that fetches data from company data sources and enriches the prompt to provide relevant and accurate responses.

Amazon Bedrock Knowledge Bases is a fully managed RAG capability within Amazon Bedrock with in-built session context management and source attribution that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources and manage data flows.

Amazon Bedrock Knowledge Bases introduces powerful document parsing capabilities, including Amazon Bedrock Data Automation powered parsing and FM parsing, revolutionizing how we handle complex documents. Amazon Bedrock Data Automation is a fully managed service that processes multimodal data effectively, without the need to provide additional prompting. The FM option parses multimodal data using an FM. This parser provides the option to customize the default prompt used for data extraction. This advanced feature goes beyond basic text extraction by intelligently breaking down documents into distinct components, including text, tables, images, and metadata, while preserving document structure and context. When working with supported formats like PDF, specialized FMs interpret and extract tabular data, charts, and complex document layouts. Additionally, the service provides advanced chunking strategies like semantic chunking, which intelligently divides text into meaningful segments based on semantic similarity calculated by the embedding model. Unlike traditional syntactic chunking methods, this approach preserves the context and meaning of the content, improving the quality and relevance of information retrieval.
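
As a rough sketch of how these options are wired together, the boto3 `bedrock-agent` client can attach FM parsing and semantic chunking to a knowledge base data source. The IDs, ARNs, bucket name, prompt text, and threshold values below are placeholders:

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

bedrock_agent.create_data_source(
    knowledgeBaseId="KB123456",            # placeholder
    name="clinical-research-docs",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::knowledge-base-docs"},
    },
    vectorIngestionConfiguration={
        # FM parsing: a foundation model interprets tables, charts, and layout
        "parsingConfiguration": {
            "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
            "bedrockFoundationModelConfiguration": {
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
                # The default extraction prompt can be customized here
                "parsingPrompt": {"parsingPromptText": "Transcribe text, tables, and figures faithfully."},
            },
        },
        # Semantic chunking: split on meaning rather than fixed token windows
        "chunkingConfiguration": {
            "chunkingStrategy": "SEMANTIC",
            "semanticChunkingConfiguration": {
                "maxTokens": 300,
                "bufferSize": 1,
                "breakpointPercentileThreshold": 95,
            },
        },
    },
)
```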

The solution architecture implements these capabilities through a seamless workflow that begins with administrators securely uploading knowledge base documents to an Amazon Simple Storage Service (Amazon S3) bucket. These documents are then ingested into Amazon Bedrock Knowledge Bases, where a large language model (LLM) processes and parses the ingested data. The solution employs semantic chunking to store document embeddings efficiently in Amazon OpenSearch Service for optimized retrieval.

The solution features a user-friendly interface built with Streamlit, providing an intuitive chat experience for end-users. When users interact with the Streamlit application, it triggers AWS Lambda functions that handle the requests, retrieving relevant context from the knowledge base and generating appropriate responses. The architecture is secured through AWS Identity and Access Management (IAM), maintaining proper access control throughout the workflow. Amazon Bedrock uses AWS Key Management Service (AWS KMS) to encrypt resources related to your knowledge bases. By default, Amazon Bedrock encrypts this data using an AWS managed key. Optionally, you can encrypt the model artifacts using a customer managed key. This end-to-end solution provides efficient document processing, context-aware information retrieval, and secure user interactions, delivering accurate and comprehensive responses through a seamless chat interface.
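
A minimal sketch of the Lambda behind this flow might look as follows, assuming an API Gateway-style event and placeholder knowledge base ID and model ARN; the sessionId round-trip uses the built-in session context management mentioned earlier:

```python
import json
import boto3

runtime = boto3.client("bedrock-agent-runtime")

def handler(event, context):
    """Sketch of the request-handling Lambda (event shape and IDs assumed)."""
    body = json.loads(event["body"])
    kwargs = {}
    if "sessionId" in body:
        kwargs["sessionId"] = body["sessionId"]  # continue an existing chat session
    response = runtime.retrieve_and_generate(
        input={"text": body["question"]},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": "KB123456",   # placeholder
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            },
        },
        **kwargs,
    )
    # Surface source attribution alongside the answer to reduce hallucinations
    citations = [
        ref["location"]
        for c in response.get("citations", [])
        for ref in c.get("retrievedReferences", [])
    ]
    return {
        "statusCode": 200,
        "body": json.dumps({
            "answer": response["output"]["text"],
            "citations": citations,
            "sessionId": response["sessionId"],
        }),
    }
```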

The following diagram illustrates the solution architecture.

Architecture diagram

This solution uses the following additional services and features:

  • The Anthropic Claude 3 family offers Opus, Sonnet, and Haiku models that accept text and image inputs and generate text output. They provide a broad selection of capability, accuracy, speed, and cost operating points. These models understand complex research documents that include charts, graphs, tables, diagrams, and reports.
  • AWS Lambda is a serverless computing service that empowers you to run code cost-effectively without provisioning or managing servers.
  • Amazon S3 is a highly scalable, durable, and secure object storage service.
  • Amazon OpenSearch Service is a fully managed search and analytics engine for efficient document retrieval. The OpenSearch Service vector database capabilities enable semantic search, RAG with LLMs, recommendation engines, and rich media search.
  • Streamlit is an open source framework for building and sharing interactive, web-based data applications in pure Python.

Prerequisites

The following prerequisites are needed to proceed with this solution. For this post, we use the us-east-1 AWS Region. For details on available Regions, see Amazon Bedrock endpoints and quotas.

Deploy the solution

Refer to the GitHub repository for the deployment steps listed under the deployment guide section. We use an AWS CloudFormation template to deploy solution resources, including S3 buckets to store the source data and knowledge base data.

Test the sample application

Imagine you are a member of an R&D department for a biotechnology firm, and your job requires you to derive insights from drug- and vaccine-related information from diverse sources like research studies, drug specifications, and industry papers. You are performing research on cancer vaccines and want to gain insights based on cancer research publications. You can upload the documents given in the references section to the S3 bucket and sync the knowledge base. Let’s explore example interactions that demonstrate the application’s capabilities. The responses generated by the AI assistant are based on the documents uploaded to the S3 bucket connected with the knowledge base. Due to the non-deterministic nature of machine learning (ML), your responses might be slightly different from the ones presented in this post.
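
For example, uploading a paper and syncing the knowledge base could look like the following sketch; the file name, bucket name, and IDs are placeholders:

```python
import boto3

s3 = boto3.client("s3")
bedrock_agent = boto3.client("bedrock-agent")

# Upload an open-access paper, then start an ingestion job so the document is
# parsed, chunked, and indexed into the vector store.
s3.upload_file(
    "mrna_cancer_vaccine_review.pdf",                 # local file (placeholder)
    "knowledge-base-docs",                            # KB source bucket (placeholder)
    "papers/mrna_cancer_vaccine_review.pdf",
)

job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB123456",   # placeholder
    dataSourceId="DS123456",      # placeholder
)
print(job["ingestionJob"]["status"])
```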

Understanding historical context

We use the following query: “Create a timeline of major developments in mRNA vaccine technology for cancer treatment based on the information provided in the historical background sections.”

The assistant analyzes multiple documents and presents a chronological progression of mRNA vaccine development, including key milestones based on the chunks of information retrieved from the OpenSearch Service vector database.

The following screenshot shows the AI assistant’s response.

RAG Chatbot Assistant

Complex data analysis

We use the following query: “Synthesize the information from the text, figures, and tables to provide a comprehensive overview of the current state and future prospects of therapeutic cancer vaccines.”

The AI assistant is able to provide insights from complex data types, which is enabled by FM parsing, while ingesting the data to OpenSearch Service. It is also able to provide images in the source attribution using the multimodal data capabilities of Amazon Bedrock Knowledge Bases.

The following screenshot shows the AI assistant’s response.

RAG Response 02

The following screenshot shows the visuals provided in the citations when the mouse hovers over the question mark icon.

RAG Response 03

Comparative analysis

We use the following query: “Compare the efficacy and safety profiles of MAGE-A3 and NY-ESO-1 based vaccines as described in the text and any relevant tables or figures.”

The AI assistant used the semantically similar chunks returned from the OpenSearch Service vector database and added this context to the user’s question, which enabled the FM to provide a relevant answer.

The following screenshot shows the AI assistant’s response.

RAG Response 04

Technical deep dive

We use the following query: “Summarize the potential advantages of mRNA vaccines over DNA vaccines for targeting tumor angiogenesis, as described in the review.”

With the semantic chunking feature of the knowledge base, the AI assistant was able to get the relevant context from the OpenSearch Service database with higher accuracy.

The following screenshot shows the AI assistant’s response.

RAG Response 05

The following screenshot shows the diagram that was used for the answer as one of the citations.

RAG Response 06

The sample application demonstrates the following:

  • Accurate interpretation of complex scientific diagrams
  • Precise extraction of data from tables and graphs
  • Context-aware responses that maintain scientific accuracy
  • Source attribution for provided information
  • Ability to synthesize information across multiple documents

This application can help you quickly analyze vast amounts of complex scientific literature, extracting meaningful insights from diverse data types while maintaining accuracy and providing proper attribution to source materials. This is enabled by the advanced features of the knowledge bases: FM parsing, which aids in interpreting complex scientific diagrams and extracting data from tables and graphs; semantic chunking, which aids in delivering high-accuracy, context-aware responses; and multimodal data capabilities, which aid in providing relevant images as source attribution.

These are some of the many new features added to Amazon Bedrock, empowering you to generate high-accuracy results depending on your use case. To learn more, see New Amazon Bedrock capabilities enhance data processing and retrieval.

Production readiness

The proposed solution accelerates time to value in the project development process. Solutions built on the AWS Cloud benefit from inherent scalability while maintaining robust security and privacy controls.

The security and privacy framework includes fine-grained user access controls using IAM for both OpenSearch Service and Amazon Bedrock services. In addition, Amazon Bedrock enhances security by providing encryption at rest and in transit, and private networking options using virtual private cloud (VPC) endpoints. Data protection is achieved using KMS keys, and API calls and usage are tracked through Amazon CloudWatch logs and metrics. For specific compliance validation for Amazon Bedrock, see Compliance validation for Amazon Bedrock.

For additional details on moving RAG applications to production, refer to From concept to reality: Navigating the Journey of RAG from proof of concept to production.

Clean up

Complete the following steps to clean up your resources.

  1. Empty the SourceS3Bucket and KnowledgeBaseS3BucketName buckets.
  2. Delete the main CloudFormation stack.
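
A scripted equivalent, assuming boto3 and substituting the actual bucket names from the CloudFormation outputs and your own stack name, might look like this:

```python
import boto3

# Empty both buckets first (a stack cannot delete non-empty buckets), then
# delete the stack. Replace the names below with your deployment's values.
s3 = boto3.resource("s3")
for bucket_name in ["source-bucket-name", "knowledge-base-bucket-name"]:
    s3.Bucket(bucket_name).object_versions.delete()  # objects and versions

boto3.client("cloudformation").delete_stack(StackName="multimodal-rag-stack")
```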

Conclusion

This post demonstrated powerful multimodal document analysis (text, graphs, images) using the advanced parsing and chunking features of Amazon Bedrock Knowledge Bases. By combining the powerful capabilities of Amazon Bedrock FMs, OpenSearch Service, and intelligent chunking strategies through Amazon Bedrock Knowledge Bases, organizations can transform their complex research documents into searchable, actionable insights. The integration of semantic chunking makes sure that document context and relationships are preserved, and the user-friendly Streamlit interface makes the system accessible to end-users through an intuitive chat experience. This solution not only streamlines the process of analyzing research documents, but also demonstrates the practical application of AI/ML technologies in enhancing knowledge discovery and information retrieval. As organizations continue to grapple with increasing volumes of complex documents, this scalable and intelligent system provides a robust framework for extracting maximum value from their document repositories.

Although our demonstration focused on the healthcare industry, the versatility of this technology extends beyond a single industry. RAG on Amazon Bedrock has proven its value across diverse sectors. Notable adopters include global brands like Adidas in retail, Empolis in information management, Fractal Analytics in AI solutions, Georgia Pacific in manufacturing, and Nasdaq in financial services. These examples illustrate the broad applicability and transformative potential of RAG technology across various business domains, highlighting its ability to drive innovation and efficiency in multiple industries.

Refer to the GitHub repo for the agentic RAG application, including samples and components for building agentic RAG solutions. Be on the lookout for additional features and samples in the repository in the coming months.

To learn more about Amazon Bedrock Knowledge Bases, check out the RAG workshop using Amazon Bedrock. Get started with Amazon Bedrock Knowledge Bases, and let us know your thoughts in the comments section.

References

The following are sample research documents available under open access, distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/):


About the authors

Vivek Mittal is a Solution Architect at Amazon Web Services, where he helps organizations architect and implement cutting-edge cloud solutions. With a deep passion for Generative AI, Machine Learning, and Serverless technologies, he specializes in helping customers harness these innovations to drive business transformation. He finds particular satisfaction in collaborating with customers to turn their ambitious technological visions into reality.

Shamika Ariyawansa, serving as a Senior AI/ML Solutions Architect in the Global Healthcare and Life Sciences division at Amazon Web Services (AWS), has a keen focus on Generative AI. He assists customers in integrating Generative AI into their projects, emphasizing the importance of explainability within their AI-driven initiatives. Beyond his professional commitments, Shamika passionately pursues skiing and off-roading adventures.

Shaik Abdulla is a Sr. Solutions Architect who specializes in architecting enterprise-scale cloud solutions with a focus on Analytics, Generative AI, and emerging technologies. His technical expertise is validated by his achievement of all 12 AWS certifications and the prestigious Golden Jacket recognition. He has a passion for architecting and implementing innovative cloud solutions that drive business transformation. He speaks at major industry events like AWS re:Invent and regional AWS Summits, where he shares insights on cloud architecture and emerging technologies.

How AI Factories Can Help Relieve Grid Stress

In many parts of the world, including major technology hubs in the U.S., there’s a yearslong wait for AI factories to come online, pending the buildout of new energy infrastructure to power them.

Emerald AI, a startup based in Washington, D.C., is developing an AI solution that could enable the next generation of data centers to come online sooner by tapping existing energy resources in a more flexible and strategic way.

“Traditionally, the power grid has treated data centers as inflexible — energy system operators assume that a 500-megawatt AI factory will always require access to that full amount of power,” said Varun Sivaram, founder and CEO of Emerald AI. “But in moments of need, when demands on the grid peak and supply is short, the workloads that drive AI factory energy use can now be flexible.”

That flexibility is enabled by the startup’s Emerald Conductor platform, an AI-powered system that acts as a smart mediator between the grid and a data center. In a recent field test in Phoenix, Arizona, the company and its partners demonstrated that its software can reduce the power consumption of AI workloads running on a cluster of 256 NVIDIA GPUs by 25% over three hours during a grid stress event while preserving compute service quality.

Emerald AI achieved this by orchestrating a host of different workloads that AI factories run. Some jobs can be paused or slowed, like the training or fine-tuning of a large language model for academic research. Others, like inference queries for an AI service used by thousands or even millions of people, can’t be rescheduled, but could be redirected to another data center where the local power grid is less stressed.

Emerald Conductor coordinates these AI workloads across a network of data centers to meet power grid demands, ensuring full performance of time-sensitive workloads while dynamically reducing the throughput of flexible workloads within acceptable limits.
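
The coordination idea can be pictured with a toy sketch, which is in no way Emerald AI's actual implementation: given a grid operator's requested reduction, throttle flexible jobs until the target is met while leaving non-preemptible inference untouched. The job names, the 50% per-job throttle assumption, and the data structures are all invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    power_kw: float
    flexible: bool  # e.g., research training runs vs. user-facing inference

def shed_power(jobs: list[Job], target_reduction_kw: float) -> list[str]:
    """Toy illustration: throttle the largest flexible jobs until the grid
    operator's requested reduction is met (hypothetical logic)."""
    shed, throttled = 0.0, []
    for job in sorted(jobs, key=lambda j: -j.power_kw):
        if shed >= target_reduction_kw:
            break
        if job.flexible:
            shed += job.power_kw * 0.5  # assume each flexible job can shed half its draw
            throttled.append(job.name)
    return throttled

jobs = [
    Job("llm-finetune", 120.0, True),
    Job("chat-inference", 200.0, False),   # never throttled
    Job("batch-embeddings", 80.0, True),
]
print(shed_power(jobs, target_reduction_kw=90.0))
# -> ['llm-finetune', 'batch-embeddings']
```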

Beyond helping AI factories come online using existing power systems, this ability to modulate power usage could help cities avoid rolling blackouts, protect communities from rising utility rates and make it easier for the grid to integrate clean energy.

“Renewable energy, which is intermittent and variable, is easier to add to a grid if that grid has lots of shock absorbers that can shift with changes in power supply,” said Ayse Coskun, Emerald AI’s chief scientist and a professor at Boston University. “Data centers can become some of those shock absorbers.”

A member of the NVIDIA Inception program for startups and an NVentures portfolio company, Emerald AI today announced more than $24 million in seed funding. Its Phoenix demonstration, part of EPRI’s DCFlex data center flexibility initiative, was executed in collaboration with NVIDIA, Oracle Cloud Infrastructure (OCI) and the regional power utility Salt River Project (SRP).

“The Phoenix technology trial validates the vast potential of an essential element in data center flexibility,” said Anuja Ratnayake, who leads EPRI’s DCFlex Consortium.

EPRI is also leading the Open Power AI Consortium, a group of energy companies, researchers and technology companies — including NVIDIA — working on AI applications for the energy sector.

Using the Grid to Its Full Potential

Electric grid capacity is typically underused except during peak events like hot summer days or cold winter storms, when there’s a high power demand for cooling and heating. That means, in many cases, there’s room on the existing grid for new data centers, as long as they can temporarily dial down energy usage during periods of peak demand.

A recent Duke University study estimates that if new AI data centers could flex their electricity consumption by just 25% for two hours at a time, less than 200 hours a year, they could unlock 100 gigawatts of new capacity to connect data centers — equivalent to over $2 trillion in data center investment.


Putting AI Factory Flexibility to the Test

Emerald AI’s recent trial was conducted in the Oracle Cloud Phoenix Region on NVIDIA GPUs spread across a multi-rack cluster managed through Databricks MosaicML.

“Rapid delivery of high-performance compute to AI customers is critical but is constrained by grid power availability,” said Pradeep Vincent, chief technical architect and senior vice president of Oracle Cloud Infrastructure, which supplied cluster power telemetry for the trial. “Compute infrastructure that is responsive to real-time grid conditions while meeting the performance demands unlocks a new model for scaling AI — faster, greener and more grid-aware.”

Jonathan Frankle, chief AI scientist at Databricks, guided the trial’s selection of AI workloads and their flexibility thresholds.

“There’s a certain level of latent flexibility in how AI workloads are typically run,” Frankle said. “Often, a small percentage of jobs are truly non-preemptible, whereas many jobs such as training, batch inference or fine-tuning have different priority levels depending on the user.”

Because Arizona is among the top states for data center growth, SRP set challenging flexibility targets for the AI compute cluster — a 25% power consumption reduction compared with baseline load — in an effort to demonstrate how new data centers can provide meaningful relief to Phoenix’s power grid constraints.

“This test was an opportunity to completely reimagine AI data centers as helpful resources to help us operate the power grid more effectively and reliably,” said David Rousseau, president of SRP.

On May 3, a hot day in Phoenix with high air-conditioning demand, SRP’s system experienced peak demand at 6 p.m. During the test, the data center cluster reduced consumption gradually with a 15-minute ramp down, maintained the 25% power reduction over three hours, then ramped back up without exceeding its original baseline consumption.

AI factory users can label their workloads to guide Emerald’s software on which jobs can be slowed, paused or rescheduled — or, Emerald’s AI agents can make these predictions automatically.

(Left panel): AI GPU cluster power consumption during SRP grid peak demand on May 3, 2025; (Right panel): Performance of AI jobs by flexibility tier. Flex 1 allows up to 10% average throughput reduction, Flex 2 up to 25% and Flex 3 up to 50% over a six-hour period. Figure courtesy of Emerald AI.

Orchestration decisions were guided by the Emerald Simulator, which accurately models system behavior to optimize trade-offs between energy usage and AI performance. Historical grid demand from data provider Amperon confirmed that the AI cluster performed correctly during the grid’s peak period.

Comparison of Emerald Simulator prediction of AI GPU cluster power with real-world measured power consumption. Figure courtesy of Emerald AI.

Forging an Energy-Resilient Future

The International Energy Agency projects that electricity demand from data centers globally could more than double by 2030. In light of the anticipated demand on the grid, the state of Texas passed a law that requires data centers to ramp down consumption or disconnect from the grid at utilities’ requests during load shed events.

“In such situations, if data centers are able to dynamically reduce their energy consumption, they might be able to avoid getting kicked off the power supply entirely,” Sivaram said.

Looking ahead, Emerald AI is expanding its technology trials in Arizona and beyond — and it plans to continue working with NVIDIA to test its technology on AI factories.

“We can make data centers controllable while assuring acceptable AI performance,” Sivaram said. “AI factories can flex when the grid is tight — and sprint when users need them to.”

Learn more about NVIDIA Inception and explore AI platforms designed for power and utilities.

EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video

Imitation learning for manipulation has a well-known data scarcity problem. Unlike natural language and 2D computer vision, there is no Internet-scale corpus of data for dexterous manipulation. One appealing option is egocentric human video, a passively scalable data source. However, existing large-scale datasets such as Ego4D do not have native hand pose annotations and do not focus on object manipulation. To this end, we use Apple Vision Pro to collect EgoDex: the largest and most diverse dataset of dexterous human manipulation to date. EgoDex has 829 hours of egocentric video with paired 3D…

Apple Machine Learning Research

Imitation learning for manipulation has a well-known data scarcity problem. Unlike natural language and 2D computer vision, there is no Internet-scale corpus of data for dexterous manipulation. One appealing option is egocentric human video, a passively scalable data source. However, existing large-scale datasets such as Ego4D do not have native hand pose annotations and do not focus on object manipulation. To this end, we use Apple Vision Pro to collect EgoDex: the largest and most diverse dataset of dexterous human manipulation to date. EgoDex has 829 hours of egocentric video with paired 3D…Apple Machine Learning Research