Using AI to expand global access to reliable flood forecasts

Using AI to expand global access to reliable flood forecasts

Floods are the most common natural disaster, and are responsible for roughly $50 billion in annual financial damages worldwide. The rate of flood-related disasters has more than doubled since the year 2000 partly due to climate change. Nearly 1.5 billion people, making up 19% of the world’s population, are exposed to substantial risks from severe flood events. Upgrading early warning systems to make accurate and timely information accessible to these populations can save thousands of lives per year.

Driven by the potential impact of reliable flood forecasting on people’s lives globally, we started our flood forecasting effort in 2017. Through this multi-year journey, we advanced research over the years hand-in-hand with building a real-time operational flood forecasting system that provides alerts on Google Search, Maps, Android notifications and through the Flood Hub. However, in order to scale globally, especially in places where accurate local data is not available, more research advances were required.

In “Global prediction of extreme floods in ungauged watersheds”, published in Nature, we demonstrate how machine learning (ML) technologies can significantly improve global-scale flood forecasting relative to the current state-of-the-art for countries where flood-related data is scarce. With these AI-based technologies we extended the reliability of currently-available global nowcasts, on average, from zero to five days, and improved forecasts across regions in Africa and Asia to be similar to what are currently available in Europe. The evaluation of the models was conducted in collaboration with the European Center for Medium Range Weather Forecasting (ECMWF).

These technologies also enable Flood Hub to provide real-time river forecasts up to seven days in advance, covering river reaches across over 80 countries. This information can be used by people, communities, governments and international organizations to take anticipatory action to help protect vulnerable populations.

Flood forecasting at Google

The ML models that power the FloodHub tool are the product of many years of research, conducted in collaboration with several partners, including academics, governments, international organizations, and NGOs.

In 2018, we launched a pilot early warning system in the Ganges-Brahmaputra river basin in India, with the hypothesis that ML could help address the challenging problem of reliable flood forecasting at scale. The pilot was further expanded the following year via the combination of an inundation model, real-time water level measurements, the creation of an elevation map and hydrologic modeling.

In collaboration with academics, and, in particular, with the JKU Institute for Machine Learning we explored ML-based hydrologic models, showing that LSTM-based models could produce more accurate simulations than traditional conceptual and physics-based hydrology models. This research led to flood forecasting improvements that enabled the expansion of our forecasting coverage to include all of India and Bangladesh. We also worked with researchers at Yale University to test technological interventions that increase the reach and impact of flood warnings.

Our hydrological models predict river floods by processing publicly available weather data like precipitation and physical watershed information. Such models must be calibrated to long data records from streamflow gauging stations in individual rivers. A low percentage of global river watersheds (basins) have streamflow gauges, which are expensive but necessary to supply relevant data, and it’s challenging for hydrological simulation and forecasting to provide predictions in basins that lack this infrastructure. Lower gross domestic product (GDP) is correlated with increased vulnerability to flood risks, and there is an inverse correlation between national GDP and the amount of publicly available data in a country. ML helps to address this problem by allowing a single model to be trained on all available river data and to be applied to ungauged basins where no data are available. In this way, models can be trained globally, and can make predictions for any river location.

There is an inverse (log-log) correlation between the amount of publicly available streamflow data in a country and national GDP. Streamflow data from the Global Runoff Data Center.

Our academic collaborations led to ML research that developed methods to estimate uncertainty in river forecasts and showed how ML river forecast models synthesize information from multiple data sources. They demonstrated that these models can simulate extreme events reliably, even when those events are not part of the training data. In an effort to contribute to open science, in 2023 we open-sourced a community-driven dataset for large-sample hydrology in Nature Scientific Data.

The river forecast model

Most hydrology models used by national and international agencies for flood forecasting and river modeling are state-space models, which depend only on daily inputs (e.g., precipitation, temperature, etc.) and the current state of the system (e.g., soil moisture, snowpack, etc.). LSTMs are a variant of state-space models and work by defining a neural network that represents a single time step, where input data (such as current weather conditions) are processed to produce updated state information and output values (streamflow) for that time step. LSTMs are applied sequentially to make time-series predictions, and in this sense, behave similarly to how scientists typically conceptualize hydrologic systems. Empirically, we have found that LSTMs perform well on the task of river forecasting.

A diagram of the LSTM, which is a neural network that operates sequentially in time. An accessible primer can be found here.

Our river forecast model uses two LSTMs applied sequentially: (1) a “hindcast” LSTM ingests historical weather data (dynamic hindcast features) up to the present time (or rather, the issue time of a forecast), and (2) a “forecast” LSTM ingests states from the hindcast LSTM along with forecasted weather data (dynamic forecast features) to make future predictions. One year of historical weather data are input into the hindcast LSTM, and seven days of forecasted weather data are input into the forecast LSTM. Static features include geographical and geophysical characteristics of watersheds that are input into both the hindcast and forecast LSTMs and allow the model to learn different hydrological behaviors and responses in various types of watersheds.

Output from the forecast LSTM is fed into a “head” layer that uses mixture density networks to produce a probabilistic forecast (i.e., predicted parameters of a probability distribution over streamflow). Specifically, the model predicts the parameters of a mixture of heavy-tailed probability density functions, called asymmetric Laplacian distributions, at each forecast time step. The result is a mixture density function, called a Countable Mixture of Asymmetric Laplacians (CMAL) distribution, which represents a probabilistic prediction of the volumetric flow rate in a particular river at a particular time.

LSTM-based river forecast model architecture. Two LSTMs are applied in sequence, one ingesting historical weather data and one ingesting forecasted weather data. The model outputs are the parameters of a probability distribution over streamflow at each forecasted timestep.

Input and training data

The model uses three types of publicly available data inputs, mostly from governmental sources:

  1. Static watershed attributes representing geographical and geophysical variables: From the HydroATLAS project, including data like long-term climate indexes (precipitation, temperature, snow fractions), land cover, and anthropogenic attributes (e.g., a nighttime lights index as a proxy for human development).
  2. Historical meteorological time-series data: Used to spin up the model for one year prior to the issue time of a forecast. The data comes from NASA IMERG, NOAA CPC Global Unified Gauge-Based Analysis of Daily Precipitation, and the ECMWF ERA5-land reanalysis. Variables include daily total precipitation, air temperature, solar and thermal radiation, snowfall, and surface pressure.
  3. Forecasted meteorological time series over a seven-day forecast horizon: Used as input for the forecast LSTM. These data are the same meteorological variables listed above, and come from the ECMWF HRES atmospheric model.

Training data are daily streamflow values from the Global Runoff Data Center over the time period 1980 – 2023. A single streamflow forecast model is trained using data from 5,680 diverse watershed streamflow gauges (shown below) to improve accuracy.

Location of 5,680 streamflow gauges that supply training data for the river forecast model from the Global Runoff Data Center.

Improving on the current state-of-the-art

We compared our river forecast model with GloFAS version 4, the current state-of-the-art global flood forecasting system. These experiments showed that ML can provide accurate warnings earlier and over larger and more impactful events.

The figure below shows the distribution of F1 scores when predicting different severity events at river locations around the world, with plus or minus 1 day accuracy. F1 scores are an average of precision and recall and event severity is measured by return period. For example, a 2-year return period event is a volume of streamflow that is expected to be exceeded on average once every two years. Our model achieves reliability scores at up to 4-day or 5-day lead times that are similar to or better, on average, than the reliability of GloFAS nowcasts (0-day lead time).

Distributions of F1 scores over 2-year return period events in 2,092 watersheds globally during the time period 2014-2023 from GloFAS (blue) and our model (orange) at different lead times. On average, our model is statistically as accurate as GloFAS nowcasts (0–day lead time) up to 5 days in advance over 2-year (shown) and 1-year, 5-year, and 10-year events (not shown).

Additionally (not shown), our model achieves accuracies over larger and rarer extreme events, with precision and recall scores over 5-year return period events that are similar to or better than GloFAS accuracies over 1-year return period events. See the paper for more information.

Looking into the future

The flood forecasting initiative is part of our Adaptation and Resilience efforts and reflects Google’s commitment to address climate change while helping global communities become more resilient. We believe that AI and ML will continue to play a critical role in helping advance science and research towards climate action.

We actively collaborate with several international aid organizations (e.g., the Centre for Humanitarian Data and the Red Cross) to provide actionable flood forecasts. Additionally, in an ongoing collaboration with the World Meteorological Organization (WMO) to support early warning systems for climate hazards, we are conducting a study to help understand how AI can help address real-world challenges faced by national flood forecasting agencies.

While the work presented here demonstrates a significant step forward in flood forecasting, future work is needed to further expand flood forecasting coverage to more locations globally and other types of flood-related events and disasters, including flash floods and urban floods. We are looking forward to continuing collaborations with our partners in the academic and expert communities, local governments and the industry to reach these goals.

Read More

Research Focus: Week of March 18, 2024

Research Focus: Week of March 18, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

Research Focus March 20, 2024

Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning

Large language models (LLMs) have shown impressive capabilities, yet they still struggle with math reasoning. In a recent paper: Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning, researchers from Microsoft propose CoT-Influx, a novel approach that pushes the boundary of few-shot chain-of-Thought (CoT) learning to improve LLM mathematical reasoning.

Given that adding more concise CoT examples in the prompt can improve LLM reasoning performance, CoT-Influx employs a coarse-to-fine pruner to maximize the input of effective and concise CoT examples. The pruner first selects as many crucial CoT examples as possible and then prunes unimportant tokens to fit the context window. A math reasoning dataset with diverse difficulty levels and reasoning steps is used to train the pruner, along with a math-specialized reinforcement learning approach. As a result, by enabling more CoT examples with double the context window size in tokens, CoT-Influx significantly outperforms various prompting baselines across various LLMs (LLaMA2-7B, 13B, 70B) and 5 math datasets, achieving up to 4.55% absolute improvements. Remarkably, without any fine-tuning, LLaMA2-70B with CoT-Influx surpasses GPT-3.5 and a wide range of larger LLMs (PaLM, Minerva 540B, etc.) on the GSM8K. CoT-Influx serves as a plug-and-play module for LLMs and is compatible with most existing reasoning prompting techniques, such as self-consistency and self-verification.

Microsoft Research Podcast

Collaborators: Holoportation™ communication technology with Spencer Fowers and Kwame Darko

Spencer Fowers and Kwame Darko break down how the technology behind Holoportation and the telecommunication device being built around it brings patients and doctors together when being in the same room isn’t an easy option and discuss the potential impact of the work.


From User Surveys to Telemetry-Driven Agents: Exploring the Potential of Personalized Productivity Solutions

Organizations and individuals continuously strive to enhance their efficiency, improve time management, and optimize their work processes. Rapid advancements in AI, natural language processing, and machine learning technologies create new opportunities to develop tools that boost productivity. 

In a recent paper: From User Surveys to Telemetry-Driven Agents: Exploring the Potential of Personalized Productivity Solutions, researchers from Microsoft present a comprehensive, user-centric approach to understand preferences in AI-based productivity agents and develop personalized solutions. The research began with a survey of 363 participants, seeking to reveal users’ specific needs and preferences for productivity agents such as relevant productivity challenges of information workers, preferred communication style and approach towards solving problems, and privacy expectations. With the survey insights, the researchers then developed a GPT-4 powered personalized productivity agent that uses telemetry data gathered from information workers via Viva Insights to provide tailored assistance. The agent’s performance was compared with alternative productivity-assistive tools, such as the traditional dashboard and AI-enabled summaries, in a study involving 40 participants. The findings highlight the importance of user-centric design, adaptability, and the balance between personalization and privacy in AI-assisted productivity tools. The insights distilled from this study could support future research to further enhance productivity solutions, ultimately leading to optimized efficiency and user experiences for information workers.


LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

The size of the context window of a large language model (LLM) determines the amount of text that can be entered for processing to generate responses. The window size is specifically measured by anumber of tokens—larger windows are more desirable. However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to around 128k tokens. 

In a recent paper: LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, researchers from Microsoft introduce a new method that extends the context window of pre-trained LLMs to an impressive 2.048 million tokens, without requiring direct fine-tuning on texts with extremely long lengths, which are scarce, while maintaining performance at the level of the original short context window. Extensive experiments on LLaMA2 and Mistral across various tasks demonstrate the effectiveness of this method. Models extended via LongRoPE retain the original architecture with minor modifications to the positional embedding and can reuse most pre-existing optimizations.


Exploring Interaction Patterns for Debugging: Enhancing Conversational Capabilities of AI-assistants

Conversational interactions with large language models (LLMs) enable programmers to obtain natural language explanations for various software development tasks. However, LLMs often leap to action without sufficient context, giving rise to implicit assumptions and inaccurate responses. Conversations between developers and LLMs are primarily structured as question-answer pairs, where the developer is responsible for asking the right questions and sustaining conversations across multiple turns.  

In a recent paper: Exploring Interaction Patterns for Debugging: Enhancing Conversational Capabilities of AI-assistants, researchers from Microsoft draw inspiration from interaction patterns and conversation analysis to design Robin, an enhanced conversational AI-assistant for debugging. Robin works with the developer collaboratively, creating hypotheses about the bug’s root cause, testing them using IDE debugging features such as breakpoints and watches, and then proposing fixes. A user study with 12 industry professionals shows that equipping the LLM-driven debugging assistant to (1) leverage the insert expansion interaction pattern; (2) facilitate turn-taking; and (3) utilize debugging workflows, leads to lowered conversation barriers, effective fault localization, and 5x improvement in bug resolution rates.


Ironies of Generative AI: Understanding and mitigating productivity loss in human-AI interactions

Generative AI (GenAI) systems, which can produce new content based on input like code, images, speech, video, and more, offer opportunities to increase user productivity in many tasks, such as programming and writing. However, while they boost productivity in some studies, many others show that users are working ineffectively with GenAI systems and actually losing productivity. These ‘ironies of automation’ have been observed for over three decades in human factors research on automation in aviation, automated driving, and intelligence.  

In a recent paper: Ironies of Generative AI: Understanding and mitigating productivity loss in human-AI interactions, researchers from Microsoft draw on this extensive research alongside recent GenAI user studies to outline four key reasons for why productivity loss can occur with GenAI systems: 1) a shift in users’ roles from production to evaluation; 2) unhelpful restructuring of workflows; 3) interruptions; and 4) a tendency for automation to make easy tasks easier and hard tasks harder. We then suggest how human factors research can also inform GenAI system design to mitigate productivity loss by using approaches such as continuous feedback, system personalization, ecological interface design, task stabilization, and clear task allocation. Grounding developments in GenAI system usability in decades of research aims to ensure that the design of human-AI interactions in this rapidly moving field learns from history instead of repeating it. 

The post Research Focus: Week of March 18, 2024 appeared first on Microsoft Research.

Read More

AI Decoded From GTC: The Latest Developer Tools and Apps Accelerating AI on PC and Workstation

AI Decoded From GTC: The Latest Developer Tools and Apps Accelerating AI on PC and Workstation

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and which showcases new hardware, software, tools and accelerations for RTX PC users.

NVIDIA’s RTX AI platform includes tools and software development kits that help Windows developers create cutting-edge generative AI features to deliver the best performance on AI PCs and workstations.

At GTC — NVIDIA’s annual technology conference — a dream team of industry luminaries, developers and researchers have come together to learn from one another, fueling what’s next in AI and accelerated computing.

This special edition of AI Decoded from GTC spotlights the best AI tools currently available and looks at what’s ahead for the 100 million RTX PC and workstation users and developers.

Chat with RTX, the tech demo and developer reference project that quickly and easily allows users to connect a powerful LLM to their own data, showcased new capabilities and new models in the GTC exhibit hall.

The winners of the Gen AI on RTX PCs contest were announced Monday. OutlookLLM, Rocket League BotChat and CLARA were highlighted in one of the AI Decoded talks in the generative AI theater and each are accelerated by NVIDIA TensorRT-LLM. Two other AI Decoded talks included using generative AI in content creation and a deep dive on Chat with RTX.

Developer frameworks and interfaces with TensorRT-LLM integration continue to grow as Jan.ai, Langchain, LlamaIndex and Oobabooga will all soon be accelerated — helping to grow the already more than 500 AI applications for RTX PCs and workstations.

NVIDIA NIM microservices are coming to RTX PCs and workstations. They provide pre-built containers, with industry standard APIs, enabling developers to accelerate deployment on RTX PCs and workstations. NVIDIA AI Workbench, an easy-to-use developer toolkit to manage AI model customization and optimization workflows, is now generally available for RTX developers.

These ecosystem integrations and tools will accelerate development of new Windows apps and features. And today’s contest winners are an inspiring glimpse into what that content will look like.

Hear More, See More, Chat More

Chat with RTX, or ChatRTX for short, uses retrieval-augmented generation, NVIDIA TensorRT-LLM software and NVIDIA RTX acceleration to bring local generative AI capabilities to RTX-powered Windows systems. Users can quickly and easily connect local files as a dataset to an open large language model like Mistral or Llama 2, enabling queries for quick, contextually relevant answers.

Moving beyond text, ChatRTX will soon add support for voice, images and new models.

Users will be able to talk to ChatRTX with Whisper — an automatic speech recognition system that uses AI to process spoken language. When the feature becomes available, ChatRTX will be able to “understand” spoken language, and provide text responses.

A future update will also add support for photos. By integrating OpenAI’s CLIP — Contrastive Language-Image Pre-training — users will be able to search by words, terms or phrases to find photos in their private library.

In addition to Google’s Gemma, ChatGLM will get support in a future update.

Developers can start with the latest version of the developer reference project on GitHub.

Generative AI for the Win

The NVIDIA Generative AI on NVIDIA RTX developer contest prompted developers to build a Windows app or plug-in.

“I found that playing against bots that react to game events with in-game messages in near real time adds a new level of entertainment to the game, and I’m excited to share my approach to incorporating AI into gaming as a participant in this developer contest. The target audience for my project is anyone who plays Rocket League with RTX hardware.” — Brian Caffey, Rocket League BotChat developer

Submissions were judged on three criteria, including a short demo video posted to social media, relative impact and ease of use of the project, and how effectively NVIDIA’s technology stack was used in the project. Each of the three winners received a pass to GTC, including a spot in the NVIDIA Deep Learning Institute GenAI/LLM courses, and a GeForce RTX 4090 GPU to power future development work.

OutlookLLM gives Outlook users generative AI features — such as email composition — securely and privately in their email client on RTX PCs and workstations. It uses a local LLM served via TensorRT-LLM.

Rocket League BotChat, for the popular Rocket League game, is a plug-in that allows bots to send contextual in-game chat messages based on a log of game events, such as scoring a goal or making a save. Designed to be used only in offline games against bot players, the plug-in is configurable in many ways via its settings menu.

CLARA (short for Command Line Assistant with RTX Acceleration) is designed to enhance the command line interface of PowerShell by translating plain English instructions into actionable commands. The extension runs locally, quickly and keeps users in their PowerShell context. Once it’s enabled, users type their English instructions and press the tab button to invoke CLARA. Installation is straightforward, and there are options for both script-based and manual setup.

From the Generative AI Theater

GTC attendees can attend three AI Decoded talks on Wednesday, March 20 at the generative AI theater. These 15-minute sessions will guide the audience through ChatRTX and how developers can productize their own personalized chatbot; how each of the three contest winners’ showed some of the possibilities for generative AI apps on RTX systems; and a celebration of artists, the tools and methods they use powered by NVIDIA technology.

In the creator session, Lee Fraser, senior developer relations manager for generative AI media and entertainment at NVIDIA, will explore why generative AI has become so popular. He’ll show off new workflows and how creators can rapidly explore ideas. Artists to be featured include Steve Talkowski, Sophia Crespo, Lim Wenhui, Erik Paynter, Vanessa Rosa and Refik Anadol.

Anadol also has an installation at the show that combines data visualization and imagery based on that data.

Ecosystem of Acceleration

Top creative app developers, like Blackmagic Design and Topaz Labs have integrated RTX AI acceleration in their software. TensorRT doubles the speed of AI effects like rotoscoping, denoising, super-resolution and video stabilization in the DaVinci Resolve and Topaz apps.

“Blackmagic Design and NVIDIA’s ongoing collaborations to run AI models on RTX AI PCs will produce a new wave of groundbreaking features that give users the power to create captivating and immersive content, faster.” — Rohit Gupta, director of software development at Blackmagic Design

TensorRT-LLM is being integrated with popular developer frameworks and ecosystems such as LangChain, LlamaIndex, Oobabooga and Jan.AI. Developers and enthusiasts can easily access the performance benefits of TensorRT-LLM through top LLM frameworks to build and deploy generative AI apps to both local and cloud GPUs.

Enthusiasts can also try out their favorite LLMs — accelerated with TensorRT-LLM on RTX systems — through the Oobabooga and Jan.AI chat interfaces.

AI That’s NIMble, AI That’s Quick

Developers and tinkerers can tap into NIM microservices. These pre-built AI “containers,” with industry-standard APIs, provide an optimized solution that helps to reduce deployment times from weeks to minutes. They can be used with more than two dozen popular models from NVIDIA, Getty Images, Google, Meta, Microsoft, Shutterstock and more.

NVIDIA AI Workbench is now generally available, helping developers quickly create, test and customize pretrained generative AI models and LLMs on RTX GPUs. It offers streamlined access to popular repositories like Hugging Face, GitHub and NVIDIA NGC, along with a simplified user interface that enables developers to easily reproduce, collaborate on and migrate projects.

Projects can be easily scaled up when additional performance is needed — whether to the data center, a public cloud or NVIDIA DGX Cloud — and then brought back to local RTX systems on a PC or workstation for inference and light customization. AI Workbench is a free download and provides example projects to help developers get started quickly.

These tools, and many others announced and shown at GTC, are helping developers drive innovative AI solutions.

From the Blackwell platform’s arrival, to a digital twin for Earth’s climate, it’s been a GTC to remember. For RTX PC and workstation users and developers, it was also a glimpse into what’s next for generative AI.

See notice regarding software product information.

Read More

ScreenAI: A visual language model for UI and visually-situated language understanding

ScreenAI: A visual language model for UI and visually-situated language understanding

Screen user interfaces (UIs) and infographics, such as charts, diagrams and tables, play important roles in human communication and human-machine interaction as they facilitate rich and interactive user experiences. UIs and infographics share similar design principles and visual language (e.g., icons and layouts), that offer an opportunity to build a single model that can understand, reason, and interact with these interfaces. However, because of their complexity and varied presentation formats, infographics and UIs present a unique modeling challenge.

To that end, we introduce “ScreenAI: A Vision-Language Model for UI and Infographics Understanding”. ScreenAI improves upon the PaLI architecture with the flexible patching strategy from pix2struct. We train ScreenAI on a unique mixture of datasets and tasks, including a novel Screen Annotation task that requires the model to identify UI element information (i.e., type, location and description) on a screen. These text annotations provide large language models (LLMs) with screen descriptions, enabling them to automatically generate question-answering (QA), UI navigation, and summarization training datasets at scale. At only 5B parameters, ScreenAI achieves state-of-the-art results on UI- and infographic-based tasks (WebSRC and MoTIF), and best-in-class performance on Chart QA, DocVQA, and InfographicVQA compared to models of similar size. We are also releasing three new datasets: Screen Annotation to evaluate the layout understanding capability of the model, as well as ScreenQA Short and Complex ScreenQA for a more comprehensive evaluation of its QA capability.

ScreenAI

ScreenAI’s architecture is based on PaLI, composed of a multimodal encoder block and an autoregressive decoder. The PaLI encoder uses a vision transformer (ViT) that creates image embeddings and a multimodal encoder that takes the concatenation of the image and text embeddings as input. This flexible architecture allows ScreenAI to solve vision tasks that can be recast as text+image-to-text problems.

On top of the PaLI architecture, we employ a flexible patching strategy introduced in pix2struct. Instead of using a fixed-grid pattern, the grid dimensions are selected such that they preserve the native aspect ratio of the input image. This enables ScreenAI to work well across images of various aspect ratios.

The ScreenAI model is trained in two stages: a pre-training stage followed by a fine-tuning stage. First, self-supervised learning is applied to automatically generate data labels, which are then used to train ViT and the language model. ViT is frozen during the fine-tuning stage, where most data used is manually labeled by human raters.

ScreenAI model architecture.

Data generation

To create a pre-training dataset for ScreenAI, we first compile an extensive collection of screenshots from various devices, including desktops, mobile, and tablets. This is achieved by using publicly accessible web pages and following the programmatic exploration approach used for the RICO dataset for mobile apps. We then apply a layout annotator, based on the DETR model, that identifies and labels a wide range of UI elements (e.g., image, pictogram, button, text) and their spatial relationships. Pictograms undergo further analysis using an icon classifier capable of distinguishing 77 different icon types. This detailed classification is essential for interpreting the subtle information conveyed through icons. For icons that are not covered by the classifier, and for infographics and images, we use the PaLI image captioning model to generate descriptive captions that provide contextual information. We also apply an optical character recognition (OCR) engine to extract and annotate textual content on screen. We combine the OCR text with the previous annotations to create a detailed description of each screen.

A mobile app screenshot with generated annotations that include UI elements and their descriptions, e.g., TEXT elements also contain the text content from OCR, IMAGE elements contain image captions, LIST_ITEMs contain all their child elements.

LLM-based data generation

We enhance the pre-training data’s diversity using PaLM 2 to generate input-output pairs in a two-step process. First, screen annotations are generated using the technique outlined above, then we craft a prompt around this schema for the LLM to create synthetic data. This process requires prompt engineering and iterative refinement to find an effective prompt. We assess the generated data’s quality through human validation against a quality threshold.

You only speak JSON. Do not write text that isn’t JSON.
You are given the following mobile screenshot, described in words. Can you generate 5 questions regarding the content of the screenshot as well as the corresponding short answers to them? 

The answer should be as short as possible, containing only the necessary information. Your answer should be structured as follows:
questions: [
{{question: the question,
    answer: the answer
}},
 ...
]

{THE SCREEN SCHEMA}

A sample prompt for QA data generation.

By combining the natural language capabilities of LLMs with a structured schema, we simulate a wide range of user interactions and scenarios to generate synthetic, realistic tasks. In particular, we generate three categories of tasks:

  • Question answering: The model is asked to answer questions regarding the content of the screenshots, e.g., “When does the restaurant open?”
  • Screen navigation: The model is asked to convert a natural language utterance into an executable action on a screen, e.g., “Click the search button.”
  • Screen summarization: The model is asked to summarize the screen content in one or two sentences.
Block diagram of our workflow for generating data for QA, summarization and navigation tasks using existing ScreenAI models and LLMs. Each task uses a custom prompt to emphasize desired aspects, like questions related to counting, involving reasoning, etc.

LLM-generated data. Examples for screen QA, navigation and summarization. For navigation, the action bounding box is displayed in red on the screenshot.

Experiments and results

As previously mentioned, ScreenAI is trained in two stages: pre-training and fine-tuning. Pre-training data labels are obtained using self-supervised learning and fine-tuning data labels comes from human raters.

We fine-tune ScreenAI using public QA, summarization, and navigation datasets and a variety of tasks related to UIs. For QA, we use well established benchmarks in the multimodal and document understanding field, such as ChartQA, DocVQA, Multi page DocVQA, InfographicVQA, OCR VQA, Web SRC and ScreenQA. For navigation, datasets used include Referring Expressions, MoTIF, Mug, and Android in the Wild. Finally, we use Screen2Words for screen summarization and Widget Captioning for describing specific UI elements. Along with the fine-tuning datasets, we evaluate the fine-tuned ScreenAI model using three novel benchmarks:

  1. Screen Annotation: Enables the evaluation model layout annotations and spatial understanding capabilities.
  2. ScreenQA Short: A variation of ScreenQA, where its ground truth answers have been shortened to contain only the relevant information that better aligns with other QA tasks.
  3. Complex ScreenQA: Complements ScreenQA Short with more difficult questions (counting, arithmetic, comparison, and non-answerable questions) and contains screens with various aspect ratios.

The fine-tuned ScreenAI model achieves state-of-the-art results on various UI and infographic-based tasks (WebSRC and MoTIF) and best-in-class performance on Chart QA, DocVQA, and InfographicVQA compared to models of similar size. ScreenAI achieves competitive performance on Screen2Words and OCR-VQA. Additionally, we report results on the new benchmark datasets introduced to serve as a baseline for further research.

Comparing model performance of ScreenAI with state-of-the-art (SOTA) models of similar size.

Next, we examine ScreenAI’s scaling capabilities and observe that across all tasks, increasing the model size improves performances and the improvements have not saturated at the largest size.

Model performance increases with size, and the performance has not saturated even at the largest size of 5B params.

Conclusion

We introduce the ScreenAI model along with a unified representation that enables us to develop self-supervised learning tasks leveraging data from all these domains. We also illustrate the impact of data generation using LLMs and investigate improving model performance on specific aspects with modifying the training mixture. We apply all of these techniques to build multi-task trained models that perform competitively with state-of-the-art approaches on a number of public benchmarks. However, we also note that our approach still lags behind large models and further research is needed to bridge this gap.

Acknowledgements

This project is the result of joint work with Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Carbune, Jason Lin, Jindong Chen and Abhanshu Sharma. We thank Fangyu Liu, Xi Chen, Efi Kokiopoulou, Jesse Berent, Gabriel Barcik, Lukas Zilka, Oriana Riva, Gang Li,Yang Li, Radu Soricut, and Tania Bedrax-Weiss for their insightful feedback and discussions, along with Rahul Aralikatte, Hao Cheng and Daniel Kim for their support in data preparation. We also thank Jay Yagnik, Blaise Aguera y Arcas, Ewa Dominowska, David Petrou, and Matt Sharifi for their leadership, vision and support. We are very grateful toTom Small for helping us create the animation in this post.

Read More

Secure by Design: NVIDIA AIOps Partner Ecosystem Blends AI for Businesses

Secure by Design: NVIDIA AIOps Partner Ecosystem Blends AI for Businesses

In today’s complex business environments, IT teams face a constant flow of challenges, from simple issues like employee account lockouts to critical security threats. These situations demand both quick fixes and strategic defenses, making the job of maintaining smooth and secure operations ever tougher.

That’s where AIOps comes in, blending artificial intelligence with IT operations to not only automate routine tasks, but also enhance security measures. This efficient approach allows teams to quickly deal with minor issues and, more importantly, to identify and respond to security threats faster and with greater accuracy than before.

By using machine learning, AIOps becomes a crucial tool in not just streamlining operations but also in strengthening security across the board. It’s proving to be a game-changer for businesses looking to integrate advanced AI into their teams, helping them stay a step ahead of potential security risks.

According to IDC, the IT operations management software market is expected to grow at a rate of 10.3% annually, reaching a projected revenue of $28.4 billion by 2027. This growth underscores the increasing reliance on AIOps for operational efficiency and as a critical component of modern cybersecurity strategies.

As the rapid growth of machine learning operations continues to transform the era of generative AI, a broad ecosystem of NVIDIA partners are offering AIOps solutions that leverage NVIDIA AI to improve IT operations.

NVIDIA is helping a broad ecosystem of AIOps partners with accelerated compute and AI software. This includes NVIDIA AI Enterprise, a cloud-native stack that can run anywhere and provides a basis for AIOps through software like NVIDIA NIM for accelerated inference of AI modes, NVIDIA Morpheus for AI-based cybersecurity and NVIDIA NeMo for custom generative AI. This software facilitates GenAI-based chatbot, summarization and search functionality.

AIOps providers using NVIDIA AI include:

  • Dynatrace Davis hypermodal AI advances AIOps by integrating causal, predictive and generative AI techniques with the addition of Davis CoPilot. This combination enhances observability and security across IT, development, security and business operations by offering precise and actionable, AI-driven answers and automation.

  • Elastic offers Elasticsearch Relevance Engine (ESRE) for semantic and vector search, which integrates with popular LLMs like GPT-4 to power AI Assistants in their Observability and Security solutions. The Observability AI Assistant is a next-generation AI Ops capability that helps IT teams understand complex systems, monitor health and automate remediation of operational issues.
  • New Relic is advancing AIOps by leveraging its machine learning, generative AI assistant frameworks and longstanding expertise in observability. Its machine learning and advanced logic helps IT teams reduce alerting noise, improve mean time to detect and mean time to repair, automate root cause analysis and generate retrospectives. Its GenAI assistant, New Relic AI, accelerates issue resolution by allowing users to identify, explain and resolve errors without switching contexts, and suggests and applies code fixes directly in a developer’s integrated development environment. It also extends incident visibility and prevention to non-technical teams by automatically producing high-level system health reports, analyzing and summarizing dashboards and answering plain-language questions about a user’s applications, infrastructure and services. New Relic also provides full-stack observability for AI-powered applications benefitting from NVIDIA GPUs.
  • PagerDuty has introduced a new feature in PagerDuty Copilot, integrating a generative AI assistant within Slack to offer insights from incident start to resolution, streamlining the incident lifecycle and reducing manual task loads for IT teams.
  • ServiceNow’s commitment to creating a proactive IT operations encompasses automating insights for rapid incident response, optimizing service management and detecting anomalies. Now, in collaboration with NVIDIA, it is pushing into generative AI to further innovate technology service and operations.
  • Splunk’s technology platform applies artificial intelligence and machine learning to automate the processes of identifying, diagnosing and resolving operational issues and threats, thereby enhancing IT efficiency and security posture. Splunk IT Service Intelligence serves as Splunk’s primary AIOps offering, providing embedded AI-driven incident prediction, detection and resolution all from one place.

Cloud service providers including Amazon Web Services (AWS), Google Cloud and Microsoft Azure enable organizations to automate and optimize their IT operations, leveraging the scale and flexibility of cloud resources.

  • AWS offers a suite of services conducive to AIOps, including Amazon CloudWatch for monitoring and observability; AWS CloudTrail for tracking user activity and API usage; Amazon SageMaker for creating repeatable and responsible machine learning workflows; and AWS Lambda for serverless computing, allowing for the automation of response actions based on triggers.
  • Google Cloud supports AIOps through services like Google Cloud Operations, which provides monitoring, logging and diagnostics across applications on the cloud and on-premises. Google Cloud’s AI and machine learning products include Vertex AI for model training and prediction and BigQuery for fast SQL queries using the processing power of Google’s infrastructure.
  • Microsoft Azure facilitates AIOps with Azure Monitor for comprehensive monitoring of applications, services and infrastructure. Azure Monitor’s built-in AIOps capabilities help predict capacity usage, enable autoscaling, identify application performance issues and detect anomalous behaviors in virtual machines, containers and other resources. Microsoft Azure Machine Learning (AzureML) offers a cloud-based MLOps environment for training, deploying and managing machine learning models responsibly, securely and at scale.

Platforms specializing in MLOps primarily focus on streamlining the lifecycle of machine learning models, from development to deployment and monitoring. While the core mission centers on making machine learning more accessible, efficient and scalable, their technologies and methodologies indirectly support AIOps by enhancing AI capabilities within IT operations: 

  • Anyscale’s platform, based on Ray, allows for the easy scaling of AI and machine learning applications, including those used in AIOps for tasks like anomaly detection and automated remediation. By facilitating distributed computing, Anyscale helps AIOps systems process large volumes of operational data more efficiently, enabling real-time analytics and decision-making.
  • Dataiku can be used to create models that predict IT system failures or optimize resource allocation, with features that allow IT teams to quickly deploy and iterate on these models in production environments.
  • Dataloop’s platform delivers full data lifecycle management and a flexible way to plug in AI models for an end-to-end workflow, allowing users to develop AI applications using their data.
  • DataRobot is a full AI lifecycle platform that enables IT operations teams to rapidly build, deploy and govern AI solutions, improving operational efficiency and performance.
  • Domino Data Lab’s platform lets enterprises and their data scientists build, deploy and manage AI on a unified, end-to-end platform. Data, tools, compute, models and projects across all environments are centrally managed so teams can collaborate, monitor production models and standardize best practices for governed AI innovation. This approach is vital for AIOps as it balances the self-service needed by data science teams with complete reproducibility, granular cost tracking and proactive governance for IT operational needs.
  • Weights & Biases provides tools for experiment tracking, model optimization, and collaboration, crucial for developing and fine-tuning AI models used in AIOps. By offering detailed insights into model performance and facilitating collaboration across teams, Weights & Biases helps ensure that AI models deployed for IT operations are both effective and transparent.

Learn more about NVIDIA’s partner ecosystem and their work at NVIDIA GTC.

Read More

Climate Pioneers: 3 Startups Harnessing NVIDIA’s AI and Earth-2 Platforms

Climate Pioneers: 3 Startups Harnessing NVIDIA’s AI and Earth-2 Platforms

To help mitigate climate change — one of humanity’s greatest challenges — researchers are turning to AI and sustainable computing to accelerate and operationalize their work.

At this week’s NVIDIA GTC global AI conference, startups, enterprises and scientists are highlighting their environmental sustainability initiatives and latest climate innovations. Many are using NVIDIA Earth-2, a full-stack, open platform for accelerating climate and weather simulation and predictions.

Earth-2 comprises GPU-accelerated numerical weather and climate prediction models, including ICON and IFS; state-of-the-art AI-driven weather models, such as FourCastNet, GraphCast and Deep Learning Weather Prediction, offered through the NVIDIA Modulus framework; and large-scale, interactive, high-resolution data visualization and simulation enabled by the NVIDIA Omniverse platform. These capabilities are also available via cloud APIs, or application programming interfaces.

Various members of NVIDIA Inception — a free, global program for cutting-edge startups — are pioneering climate AI advancements with Earth-2. It’s critical work, as extreme-weather events are expected to take a million lives and cost $1.7 trillion per year by 2050.

Tomorrow.io Powers Weather Predictions of Tomorrow

Boston-based Tomorrow.io provides actionable, weather-related insights to countries, businesses and individuals by applying advanced AI and machine learning models to a proprietary global dataset collected from satellites, radar and other sensors. Its weather intelligence and climate adaptation platform delivers high-resolution, accurate weather forecasts across time zones for both short- and long-term projections.

The startup is using Earth-2 to study the potential impacts of its suite of satellites on global model forecasts. By conducting observing-system simulation experiments, or OSSEs, with Earth-2 AI forecast models, Tomorrow.io can identify the optimal configurations of satellites and other instruments to improve weather-forecasting conditions. The work ultimately aims to offer users precision and simplicity, helping them easily understand complex weather situations and make the right operational decisions at the right time.

Learn more about Tomorrow.io’s work with Earth-2 by joining the GTC session, “Global Strategies: Startups, Venture Capital, and Climate Change Solutions,” taking place today, March 19, at 3 p.m. PT, at the San Jose Convention Center and online.

ClimaSens Advances Flood-Risk Management With AI

ClimaSens, based in Melbourne, Australia, and New York, fuses historical, real-time and future climate and weather information using advanced AI models. FloodSens, its upcoming flood risk analysis model, informs clients about the probability of flooding from rainfall, offering high-resolution assessments of flash flooding, riverine flooding and all types of flooding in between.

FloodSens, now in beta, was developed using Earth-2 APIs and the FourCastNet model for high-fidelity, physically accurate representations of future weather conditions, as well as an ensemble of other models for assessing the probabilities of low-likelihood, high-impact flooding events. Through this work, the startup aims to enable a more resilient, sustainable future for communities worldwide.

North.io Garners Ocean Insights With AI and Accelerated Modeling

Based in Kiel, Germany, north.io is helping to map the Earth’s largest carbon sink: oceans. Only about 25% of the ocean floor — a critical source of the world’s renewable energy and food security — has been mapped so far.

North.io is collecting and analyzing massive amounts of data from autonomous underwater vehicles (AUVs) and making it accessible, shareable, visualizable and understandable for users across the globe through its TrueOcean platform.

Using Earth-2 APIs, north.io is developing AI weather forecasts for intelligent operational planning, system management and risk assessment for its AUVs. The combination of high-precision weather modeling and the use of autonomous systems drastically reduces human safety risks in rough, offshore environments.

Learn more about the latest AI, high performance computing and sustainable computing advancements for climate research at GTC, running through Thursday, March 21.

Read More

Unlock the potential of generative AI in industrial operations

Unlock the potential of generative AI in industrial operations

In the evolving landscape of manufacturing, the transformative power of AI and machine learning (ML) is evident, driving a digital revolution that streamlines operations and boosts productivity. However, this progress introduces unique challenges for enterprises navigating data-driven solutions. Industrial facilities grapple with vast volumes of unstructured data, sourced from sensors, telemetry systems, and equipment dispersed across production lines. Real-time data is critical for applications like predictive maintenance and anomaly detection, yet developing custom ML models for each industrial use case with such time series data demands considerable time and resources from data scientists, hindering widespread adoption.

Generative AI using large pre-trained foundation models (FMs) such as Claude can rapidly generate a variety of content from conversational text to computer code based on simple text prompts, known as zero-shot prompting. This eliminates the need for data scientists to manually develop specific ML models for each use case, and therefore democratizes AI access, benefitting even small manufacturers. Workers gain productivity through AI-generated insights, engineers can proactively detect anomalies, supply chain managers optimize inventories, and plant leadership makes informed, data-driven decisions.

Nevertheless, standalone FMs face limitations in handling complex industrial data with context size constraints (typically less than 200,000 tokens), which poses challenges. To address this, you can use the FM’s ability to generate code in response to natural language queries (NLQs). Agents like PandasAI come into play, running this code on high-resolution time series data and handling errors using FMs. PandasAI is a Python library that adds generative AI capabilities to pandas, the popular data analysis and manipulation tool.

However, complex NLQs, such as time series data processing, multi-level aggregation, and pivot or joint table operations, may yield inconsistent Python script accuracy with a zero-shot prompt.

To enhance code generation accuracy, we propose dynamically constructing multi-shot prompts for NLQs. Multi-shot prompting provides additional context to the FM by showing it several examples of desired outputs for similar prompts, boosting accuracy and consistency. In this post, multi-shot prompts are retrieved from an embedding containing successful Python code run on a similar data type (for example, high-resolution time series data from Internet of Things devices). The dynamically constructed multi-shot prompt provides the most relevant context to the FM, and boosts the FM’s capability in advanced math calculation, time series data processing, and data acronym understanding. This improved response facilitates enterprise workers and operational teams in engaging with data, deriving insights without requiring extensive data science skills.

Beyond time series data analysis, FMs prove valuable in various industrial applications. Maintenance teams assess asset health, capture images for Amazon Rekognition-based functionality summaries, and anomaly root cause analysis using intelligent searches with Retrieval Augmented Generation (RAG). To simplify these workflows, AWS has introduced Amazon Bedrock, enabling you to build and scale generative AI applications with state-of-the-art pre-trained FMs like Claude v2. With Knowledge Bases for Amazon Bedrock, you can simplify the RAG development process to provide more accurate anomaly root cause analysis for plant workers. Our post showcases an intelligent assistant for industrial use cases powered by Amazon Bedrock, addressing NLQ challenges, generating part summaries from images, and enhancing FM responses for equipment diagnosis through the RAG approach.

Solution overview

The following diagram illustrates the solution architecture.

The workflow includes three distinct use cases:

Use case 1: NLQ with time series data

The workflow for NLQ with time series data consists of the following steps:

  1. We use a condition monitoring system with ML capabilities for anomaly detection, such as Amazon Monitron, to monitor industrial equipment health. Amazon Monitron is able to detect potential equipment failures from the equipment’s vibration and temperature measurements.
  2. We collect time series data by processing Amazon Monitron data through Amazon Kinesis Data Streams and Amazon Data Firehose, converting it into a tabular CSV format and saving it in an Amazon Simple Storage Service (Amazon S3) bucket.
  3. The end-user can start chatting with their time series data in Amazon S3 by sending a natural language query to the Streamlit app.
  4. The Streamlit app forwards user queries to the Amazon Bedrock Titan text embedding model to embed this query, and performs a similarity search within an Amazon OpenSearch Service index, which contains prior NLQs and example codes.
  5. After the similarity search, the top similar examples, including NLQ questions, data schema, and Python codes, are inserted in a custom prompt.
  6. PandasAI sends this custom prompt to the Amazon Bedrock Claude v2 model.
  7. The app uses the PandasAI agent to interact with the Amazon Bedrock Claude v2 model, generating Python code for Amazon Monitron data analysis and NLQ responses.
  8. After the Amazon Bedrock Claude v2 model returns the Python code, PandasAI runs the Python query on the Amazon Monitron data uploaded from the app, collecting code outputs and addressing any necessary retries for failed runs.
  9. The Streamlit app collects the response via PandasAI, and provides the output to users. If the output is satisfactory, the user can mark it as helpful, saving the NLQ and Claude-generated Python code in OpenSearch Service.

Use case 2: Summary generation of malfunctioning parts

Our summary generation use case consists of the following steps:

  1. After the user knows which industrial asset shows anomalous behavior, they can upload images of the malfunctioning part to identify if there is something physically wrong with this part according to its technical specification and operation condition.
  2. The user can use the Amazon Recognition DetectText API to extract text data from these images.
  3. The extracted text data is included in the prompt for the Amazon Bedrock Claude v2 model, enabling the model to generate a 200-word summary of the malfunctioning part. The user can use this information to perform further inspection of the part.

Use case 3: Root cause diagnosis

Our root cause diagnosis use case consists of the following steps:

  1. The user obtains enterprise data in various document formats (PDF, TXT, and so on) related with malfunctioning assets, and uploads them to an S3 bucket.
  2. A knowledge base of these files is generated in Amazon Bedrock with a Titan text embeddings model and a default OpenSearch Service vector store.
  3. The user poses questions related to the root cause diagnosis for malfunctioning equipment. Answers are generated through the Amazon Bedrock knowledge base with a RAG approach.

Prerequisites

To follow along with this post, you should meet the following prerequisites:

Deploy the solution infrastructure

To set up your solution resources, complete the following steps:

  1. Deploy the AWS CloudFormation template opensearchsagemaker.yml, which creates an OpenSearch Service collection and index, Amazon SageMaker notebook instance, and S3 bucket. You can name this AWS CloudFormation stack as: genai-sagemaker.
  2. Open the SageMaker notebook instance in JupyterLab. You will find the following GitHub repo already downloaded on this instance: unlocking-the-potential-of-generative-ai-in-industrial-operations.
  3. Run the notebook from the following directory in this repository: unlocking-the-potential-of-generative-ai-in-industrial-operations/SagemakerNotebook/nlq-vector-rag-embedding.ipynb. This notebook will load the OpenSearch Service index using the SageMaker notebook to store key-value pairs from the existing 23 NLQ examples.
  4. Upload documents from the data folder assetpartdoc in the GitHub repository to the S3 bucket listed in the CloudFormation stack outputs.

Next, you create the knowledge base for the documents in Amazon S3.

  1. On the Amazon Bedrock console, choose Knowledge base in the navigation pane.
  2. Choose Create knowledge base.
  3. For Knowledge base name, enter a name.
  4. For Runtime role, select Create and use a new service role.
  5. For Data source name, enter the name of your data source.
  6. For S3 URI, enter the S3 path of the bucket where you uploaded the root cause documents.
  7. Choose Next.
    The Titan embeddings model is automatically selected.
  8. Select Quick create a new vector store.
  9. Review your settings and create the knowledge base by choosing Create knowledge base.
  10. After the knowledge base is successfully created, choose Sync to sync the S3 bucket with the knowledge base.
  11. After you set up the knowledge base, you can test the RAG approach for root cause diagnosis by asking questions like “My actuator travels slow, what might be the issue?”

The next step is to deploy the app with the required library packages on either your PC or an EC2 instance (Ubuntu Server 22.04 LTS).

  1. Set up your AWS credentials with the AWS CLI on your local PC. For simplicity, you can use the same admin role you used to deploy the CloudFormation stack. If you’re using Amazon EC2, attach a suitable IAM role to the instance.
  2. Clone GitHub repo:
    git clone https://github.com/aws-samples/unlocking-the-potential-of-generative-ai-in-industrial-operations

  3. Change the directory to unlocking-the-potential-of-generative-ai-in-industrial-operations/src and run the setup.sh script in this folder to install the required packages, including LangChain and PandasAI:
    cd unlocking-the-potential-of-generative-ai-in-industrial-operations/src
    chmod +x ./setup.sh
    ./setup.sh   
  4. Run the Streamlit app with the following command:
    source monitron-genai/bin/activate
    python3 -m streamlit run app_bedrock.py <REPLACE WITH YOUR BEDROCK KNOWLEDGEBASE ARN>
    

Provide the OpenSearch Service collection ARN you created in Amazon Bedrock from the previous step.

Chat with your asset health assistant

After you complete the end-to-end deployment, you can access the app via localhost on port 8501, which opens a browser window with the web interface. If you deployed the app on an EC2 instance, allow port 8501 access via the security group inbound rule. You can navigate to different tabs for various use cases.

Explore use case 1

To explore the first use case, choose Data Insight and Chart. Begin by uploading your time series data. If you don’t have an existing time series data file to use, you can upload the following sample CSV file with anonymous Amazon Monitron project data. If you already have an Amazon Monitron project, refer to Generate actionable insights for predictive maintenance management with Amazon Monitron and Amazon Kinesis to stream your Amazon Monitron data to Amazon S3 and use your data with this application.

When the upload is complete, enter a query to initiate a conversation with your data. The left sidebar offers a range of example questions for your convenience. The following screenshots illustrate the response and Python code generated by the FM when inputting a question such as “Tell me the unique number of sensors for each site shown as Warning or Alarm respectively?” (a hard-level question) or “For sensors shown temperature signal as NOT Healthy, can you calculate the time duration in days for each sensor shown abnormal vibration signal?” (a challenge-level question). The app will answer your question, and will also show the Python script of data analysis it performed to generate such results.

If you’re satisfied with the answer, you can mark it as Helpful, saving the NLQ and Claude-generated Python code to an OpenSearch Service index.

Explore use case 2

To explore the second use case, choose the Captured Image Summary tab in the Streamlit app. You can upload an image of your industrial asset, and the application will generate a 200-word summary of its technical specification and operation condition based on the image information. The following screenshot shows the summary generated from an image of a belt motor drive. To test this feature, if you lack a suitable image, you can use the following example image.

Hydraulic elevator motor label” by Clarence Risher is licensed under CC BY-SA 2.0.

Explore use case 3

To explore the third use case, choose the Root cause diagnosis tab. Input a query related to your broken industrial asset, such as, “My actuator travels slow, what might be the issue?” As depicted in the following screenshot, the application delivers a response with the source document excerpt used to generate the answer.

Use case 1: Design details

In this section, we discuss the design details of the application workflow for the first use case.

Custom prompt building

The user’s natural language query comes with different difficult levels: easy, hard, and challenge.

Straightforward questions may include the following requests:

  • Select unique values
  • Count total numbers
  • Sort values

For these questions, PandasAI can directly interact with the FM to generate Python scripts for processing.

Hard questions require basic aggregation operation or time series analysis, such as the following:

  • Select value first and group results hierarchically
  • Perform statistics after initial record selection
  • Timestamp count (for example, min and max)

For hard questions, a prompt template with detailed step-by-step instructions assists FMs in providing accurate responses.

Challenge-level questions need advanced math calculation and time series processing, such as the following:

  • Calculate anomaly duration for each sensor
  • Calculate anomaly sensors for site on a monthly basis
  • Compare sensor readings under normal operation and abnormal conditions

For these questions, you can use multi-shots in a custom prompt to enhance response accuracy. Such multi-shots show examples of advanced time series processing and math calculation, and will provide context for the FM to perform relevant inference on similar analysis. Dynamically inserting the most relevant examples from an NLQ question bank into the prompt can be a challenge. One solution is to construct embeddings from existing NLQ question samples and save these embeddings in a vector store like OpenSearch Service. When a question is sent to the Streamlit app, the question will be vectorized by BedrockEmbeddings. The top N most-relevant embeddings to that question are retrieved using opensearch_vector_search.similarity_search and inserted into the prompt template as a multi-shot prompt.

The following diagram illustrates this workflow.

The embedding layer is constructed using three key tools:

  • Embeddings model – We use Amazon Titan Embeddings available through Amazon Bedrock (amazon.titan-embed-text-v1) to generate numerical representations of textual documents.
  • Vector store – For our vector store, we use OpenSearch Service via the LangChain framework, streamlining the storage of embeddings generated from NLQ examples in this notebook.
  • Index – The OpenSearch Service index plays a pivotal role in comparing input embeddings to document embeddings and facilitating the retrieval of relevant documents. Because the Python example codes were saved as a JSON file, they were indexed in OpenSearch Service as vectors via an OpenSearchVevtorSearch.fromtexts API call.

Continuous collection of human-audited examples via Streamlit

At the outset of app development, we began with only 23 saved examples in the OpenSearch Service index as embeddings. As the app goes live in the field, users start inputting their NLQs via the app. However, due to the limited examples available in the template, some NLQs may not find similar prompts. To continuously enrich these embeddings and offer more relevant user prompts, you can use the Streamlit app for gathering human-audited examples.

Within the app, the following function serves this purpose. When end-users find the output helpful and select Helpful, the application follows these steps:

  1. Use the callback method from PandasAI to collect the Python script.
  2. Reformat the Python script, input question, and CSV metadata into a string.
  3. Check whether this NLQ example already exists in the current OpenSearch Service index using opensearch_vector_search.similarity_search_with_score.
  4. If there’s no similar example, this NLQ is added to the OpenSearch Service index using opensearch_vector_search.add_texts.

In the event that a user selects Not Helpful, no action is taken. This iterative process makes sure that the system continually improves by incorporating user-contributed examples.

def addtext_opensearch(input_question, generated_chat_code, df_column_metadata, opensearch_vector_search,similarity_threshold,kexamples, indexname):
    #######build the input_question and generated code the same format as existing opensearch index##########
    reconstructed_json = {}
    reconstructed_json["question"]=input_question
    reconstructed_json["python_code"]=str(generated_chat_code)
    reconstructed_json["column_info"]=df_column_metadata
    json_str = ''
    for key,value in reconstructed_json.items():
        json_str += key + ':' + value
    reconstructed_raw_text =[]
    reconstructed_raw_text.append(json_str)
    
    results = opensearch_vector_search.similarity_search_with_score(str(reconstructed_raw_text[0]), k=kexamples)  # our search query  # return 3 most relevant docs
    if (dumpd(results[0][1])<similarity_threshold):    ###No similar embedding exist, then add text to embedding
        response = opensearch_vector_search.add_texts(texts=reconstructed_raw_text, engine="faiss", index_name=indexname)
    else:
        response = "A similar embedding is already exist, no action."
    
    return response

By incorporating human auditing, the quantity of examples in OpenSearch Service available for prompt embedding grows as the app gains usage. This expanded embedding dataset results in enhanced search accuracy over time. Specifically, for challenging NLQs, the FM’s response accuracy reaches approximately 90% when dynamically inserting similar examples to construct custom prompts for each NLQ question. This represents a notable 28% increase compared to scenarios without multi-shot prompts.

Use case 2: Design details

On the Streamlit app’s Captured Image Summary tab, you can directly upload an image file. This initiates the Amazon Rekognition API (detect_text API), extracting text from the image label detailing machine specifications. Subsequently, the extracted text data is sent to the Amazon Bedrock Claude model as the context of a prompt, resulting in a 200-word summary.

From a user experience perspective, enabling streaming functionality for a text summarization task is paramount, allowing users to read the FM-generated summary in smaller chunks rather than waiting for the entire output. Amazon Bedrock facilitates streaming via its API (bedrock_runtime.invoke_model_with_response_stream).

Use case 3: Design details

In this scenario, we’ve developed a chatbot application focused on root cause analysis, employing the RAG approach. This chatbot draws from multiple documents related to bearing equipment to facilitate root cause analysis. This RAG-based root cause analysis chatbot uses knowledge bases for generating vector text representations, or embeddings. Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources or manage data flows and RAG implementation details.

When you’re satisfied with the knowledge base response from Amazon Bedrock, you can integrate the root cause response from the knowledge base to the Streamlit app.

Clean up

To save costs, delete the resources you created in this post:

  1. Delete the knowledge base from Amazon Bedrock.
  2. Delete the OpenSearch Service index.
  3. Delete the genai-sagemaker CloudFormation stack.
  4. Stop the EC2 instance if you used an EC2 instance to run the Streamlit app.

Conclusion

Generative AI applications have already transformed various business processes, enhancing worker productivity and skill sets. However, the limitations of FMs in handling time series data analysis have hindered their full utilization by industrial clients. This constraint has impeded the application of generative AI to the predominant data type processed daily.

In this post, we introduced a generative AI Application solution designed to alleviate this challenge for industrial users. This application uses an open source agent, PandasAI, to strengthen an FM’s time series analysis capability. Rather than sending time series data directly to FMs, the app employs PandasAI to generate Python code for the analysis of unstructured time series data. To enhance the accuracy of Python code generation, a custom prompt generation workflow with human auditing has been implemented.

Empowered with insights into their asset health, industrial workers can fully harness the potential of generative AI across various use cases, including root cause diagnosis and part replacement planning. With Knowledge Bases for Amazon Bedrock, the RAG solution is straightforward for developers to build and manage.

The trajectory of enterprise data management and operations is unmistakably moving towards deeper integration with generative AI for comprehensive insights into operational health. This shift, spearheaded by Amazon Bedrock, is significantly amplified by the growing robustness and potential of LLMs like Amazon Bedrock Claude 3 to further elevate solutions. To learn more, visit consult the Amazon Bedrock documentation, and get hands-on with the Amazon Bedrock workshop.


About the authors

Julia Hu is a Sr. AI/ML Solutions Architect at Amazon Web Services. She is specialized in Generative AI, Applied Data Science and IoT architecture. Currently she is part of the Amazon Q team, and an active member/mentor in Machine Learning Technical Field Community. She works with customers, ranging from start-ups to enterprises, to develop AWSome generative AI solutions. She is particularly passionate about leveraging Large Language Models for advanced data analytics and exploring practical applications that address real-world challenges.

Sudeesh Sasidharan is a Senior Solutions Architect at AWS, within the Energy team. Sudeesh loves experimenting with new technologies and building innovative solutions that solve complex business challenges. When he is not designing solutions or tinkering with the latest technologies, you can find him on the tennis court working on his backhand.

Neil Desai is a technology executive with over 20 years of experience in artificial intelligence (AI), data science, software engineering, and enterprise architecture. At AWS, he leads a team of Worldwide AI services specialist solutions architects who help customers build innovative Generative AI-powered solutions, share best practices with customers, and drive product roadmap. In his previous roles at Vestas, Honeywell, and Quest Diagnostics, Neil has held leadership roles in developing and launching innovative products and services that have helped companies improve their operations, reduce costs, and increase revenue. He is passionate about using technology to solve real-world problems and is a strategic thinker with a proven track record of success.

Read More