Empower your business users to extract insights from company documents using Amazon SageMaker Canvas Generative AI

Enterprises seek to harness the potential of Machine Learning (ML) to solve complex problems and improve outcomes. Until recently, building and deploying ML models required deep levels of technical and coding skills, including tuning ML models and maintaining operational pipelines. Since its introduction in 2021, Amazon SageMaker Canvas has enabled business analysts to build, deploy, and use a variety of ML models – including tabular, computer vision, and natural language processing – without writing a line of code. This has accelerated the ability of enterprises to apply ML to use cases such as time-series forecasting, customer churn prediction, sentiment analysis, industrial defect detection, and many others.

As announced on October 5, 2023, SageMaker Canvas expanded its support of models to foundation models (FMs) – large language models used to generate and summarize content. With the October 12, 2023 release, SageMaker Canvas lets users ask questions and get responses that are grounded in their enterprise data. This ensures that results are context-specific, opening up additional use cases where no-code ML can be applied to solve business problems. For example, business teams can now formulate responses consistent with an organization's specific vocabulary and tenets, and can more quickly query lengthy documents to get responses specific and grounded to the contents of those documents. All of this is performed in a private and secure manner, ensuring that sensitive data is accessed with proper governance and safeguards.

To get started, a cloud administrator configures and populates Amazon Kendra indexes with enterprise data as data sources for SageMaker Canvas. Canvas users select the index where their documents reside, and can ideate, research, and explore knowing that the output will always be backed by their sources of truth. SageMaker Canvas uses state-of-the-art FMs from Amazon Bedrock and Amazon SageMaker JumpStart. Conversations can be started with multiple FMs side by side, comparing the outputs and truly making generative AI accessible to everyone.

In this post, we will review the recently released feature, discuss the architecture, and present a step-by-step guide to enable SageMaker Canvas to query documents from your knowledge base, as shown in the following screen capture.

Solution overview

Foundation models can produce hallucinations – responses that are generic, vague, unrelated, or factually incorrect. Retrieval Augmented Generation (RAG) is a frequently used approach to reduce hallucinations. RAG architectures are used to retrieve data from outside of an FM, which is then used to perform in-context learning to answer the user’s query. This ensures that the FM can use data from a trusted knowledge base and use that knowledge to answer users’ questions, reducing the risk of hallucination.

With RAG, the data external to the FM and used to augment user prompts can come from multiple disparate data sources, such as document repositories, databases, or APIs. The first step is to convert your documents and any user queries into a compatible format so that a semantic relevancy search can be performed. To make the formats compatible, a document collection (knowledge library) and user-submitted queries are converted into numerical representations using embedding models.
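To make this concrete, the following is a minimal, self-contained sketch of embedding-based retrieval. It uses a toy hashing-based embedding function purely as a stand-in for a real embedding model, to illustrate how documents and a query are mapped into the same vector space and ranked by cosine similarity.

```python
import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    # Stand-in for a real embedding model: hash each token into a fixed-size vector.
    vec = np.zeros(dim)
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

documents = [
    "Employees accrue 20 days of paid vacation per year.",
    "Expense reports must be submitted within 30 days.",
    "The security team rotates on-call duties weekly.",
]
query = "How many vacation days do employees get?"

doc_vectors = np.stack([toy_embed(d) for d in documents])
query_vector = toy_embed(query)

# Cosine similarity; the vectors are already L2-normalized, so a dot product suffices.
scores = doc_vectors @ query_vector
best = int(np.argmax(scores))
print(f"Most relevant document: {documents[best]} (score={scores[best]:.2f})")
```

In a production RAG system, the toy embedding function would be replaced by a purpose-built embedding model and the similarity search by a managed retriever such as Amazon Kendra.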

With this release, RAG functionality is provided in a no-code and seamless manner. Enterprises can enrich the chat experience in Canvas with Amazon Kendra as the underlying knowledge management system. The following diagram illustrates the solution architecture.

Connecting SageMaker Canvas to Amazon Kendra requires a one-time setup. We describe the setup process in detail in Setting up Canvas to query documents. If you haven't already set up your SageMaker Domain, refer to Onboard to Amazon SageMaker Domain.

As part of the domain configuration, a cloud administrator can choose one or more Kendra indices that the business analyst can query when interacting with the FM through SageMaker Canvas.

After the Kendra indices are hydrated and configured, business analysts use them with SageMaker Canvas by starting a new chat and selecting the Query Documents toggle. SageMaker Canvas then manages the underlying communication between Amazon Kendra and the FM of choice to perform the following operations (a simplified sketch follows the list):

  1. Query the Kendra indices with the question coming from the user.
  2. Retrieve the snippets (and the sources) from Kendra indices.
  3. Engineer a prompt that combines the snippets with the original query so that the foundation model can generate an answer from the retrieved documents.
  4. Provide the generated answer to the user, along with references to the pages/documents that were used to formulate the response.
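The following is a simplified, hypothetical sketch of such a retrieval-augmented flow implemented directly against the Amazon Kendra Retrieve API and Amazon Bedrock. It is not the internal Canvas implementation; the index ID, model ID, and request body schema are assumptions you would adapt to your environment.

```python
import json
import boto3

kendra = boto3.client("kendra")
bedrock = boto3.client("bedrock-runtime")

INDEX_ID = "REPLACE_WITH_KENDRA_INDEX_ID"   # assumption: your Kendra index ID
MODEL_ID = "anthropic.claude-v2"            # assumption: a Bedrock text model you have access to

def answer_from_documents(question: str) -> str:
    # 1. Query the Kendra index with the user's question.
    retrieved = kendra.retrieve(IndexId=INDEX_ID, QueryText=question)

    # 2. Collect the top snippets (and their sources) from the results.
    snippets = [
        f"Source: {item.get('DocumentTitle', 'unknown')}\n{item['Content']}"
        for item in retrieved["ResultItems"][:3]
    ]

    # 3. Engineer a prompt that combines the snippets with the original question.
    prompt = (
        "Answer the question using only the context below.\n\n"
        + "\n\n".join(snippets)
        + f"\n\nQuestion: {question}"
    )

    # 4. Ask the foundation model to generate a grounded answer. The body format
    #    below follows the Anthropic text-completion schema on Bedrock and may
    #    differ for other models.
    body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 512,
    })
    response = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    return json.loads(response["body"].read())["completion"]

print(answer_from_documents("What is our parental leave policy?"))
```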

Setting up Canvas to query documents

In this section, we will walk you through the steps to set up Canvas to query documents served through Kendra indexes. You should have the following prerequisites:

  • Set up a SageMaker Domain – refer to Onboard to Amazon SageMaker Domain
  • Create one or more Kendra indexes
  • Set up the Kendra Amazon S3 connector – follow the Amazon S3 Connector documentation – and upload PDF files and other documents to the Amazon S3 bucket associated with the Kendra index
  • Set up IAM so that Canvas has the appropriate permissions, including those required for calling Amazon Bedrock and/or SageMaker endpoints – follow the Set up Canvas Chat documentation

Now you can update the Domain so that it can access the desired indexes. On the SageMaker console, for the given Domain, select Edit under the Domain Settings tab. At the Canvas Settings step, enable the toggle Enable query documents with Amazon Kendra. Once activated, choose one or more Kendra indexes that you want to use with Canvas.
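If you prefer to automate this step, the domain update can also be scripted with boto3. The following is a minimal sketch under the assumption that the SageMaker UpdateDomain API exposes the CanvasAppSettings.KendraSettings field in your region; the domain ID is a placeholder, and selecting specific indexes may still need to be completed in the console as described above.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Assumption: replace with your own SageMaker Domain ID.
DOMAIN_ID = "d-xxxxxxxxxxxx"

sagemaker.update_domain(
    DomainId=DOMAIN_ID,
    DefaultUserSettings={
        "CanvasAppSettings": {
            # Enables the "query documents with Amazon Kendra" capability for Canvas users.
            # Verify this setting against the current SageMaker API reference.
            "KendraSettings": {"Status": "ENABLED"},
        }
    },
)
```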

That's all that's needed to configure the Canvas Query Documents feature. Users can now jump into a chat within Canvas and start using the knowledge bases that have been attached to the Domain through the Kendra indexes. The maintainers of the knowledge base can continue to update the source of truth, and with the syncing capability in Kendra, chat users will automatically be able to use the up-to-date information in a seamless manner.

Using the Query Documents feature for chat

As a SageMaker Canvas user, you can access the Query Documents feature from within a chat. To start the chat session, click or search for the Generate, extract and summarize content button on the Ready-to-use models tab in SageMaker Canvas.

Once there, you can turn Query Documents on and off with the toggle at the top of the screen. Check out the information prompt to learn more about the feature.

When Query Documents is enabled, you can choose among a list of Kendra indices enabled by the cloud administrator.

You can select an index when starting a new chat. You can then ask a question in the UX with knowledge being automatically sourced from the selected index. Note that after a conversation has started against a specific index, it is not possible to switch to another index.

For the questions asked, the chat will show the answer generated by the FM along with the source documents that contributed to generating the answer. When clicking any of the source documents, Canvas opens a preview of the document, highlighting the excerpt used by the FM.

Conclusion

Conversational AI has immense potential to transform customer and employee experience by providing a human-like assistant with natural and intuitive interactions such as:

  • Performing research on a topic, or searching and browsing the organization's knowledge base
  • Summarizing volumes of content to rapidly gather insights
  • Searching for entities, sentiment, PII, and other useful data, and increasing the business value of unstructured content
  • Generating drafts for documents and business correspondence
  • Creating knowledge articles from disparate internal sources (incidents, chat logs, wikis)

The innovative integration of chat interfaces, knowledge retrieval, and FMs enables enterprises to provide accurate, relevant responses to user questions by using their domain knowledge and sources-of-truth.

By connecting SageMaker Canvas to knowledge bases in Amazon Kendra, organizations can keep their proprietary data within their own environment while still benefiting from the state-of-the-art natural language capabilities of FMs. With the launch of SageMaker Canvas's Query Documents feature, we are making it easy for any enterprise to use LLMs with their enterprise knowledge as the source of truth to power a secure chat experience. All this functionality is available in a no-code format, allowing businesses to avoid repetitive and non-specialized tasks.

To learn more about SageMaker Canvas and how it makes it easier for everyone to get started with machine learning, check out the SageMaker Canvas announcement. Learn more about how SageMaker Canvas helps foster collaboration between data scientists and business analysts by reading the Build, Share & Deploy post. Finally, to learn how to create your own Retrieval Augmented Generation workflow, refer to SageMaker JumpStart RAG.


About the Authors

Davide Gallitelli is a Senior Specialist Solutions Architect for AI/ML. He is based in Brussels and works closely with customers all around the globe that are looking to adopt Low-Code/No-Code Machine Learning technologies, and Generative AI. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.

Bilal Alam is an Enterprise Solutions Architect at AWS with a focus on the Financial Services industry. On most days Bilal is helping customers with building, uplifting and securing their AWS environment to deploy their most critical workloads. He has extensive experience in Telco, networking, and software development. More recently, he has been looking into using AI/ML to solve business problems.

Pashmeen Mistry is a Senior Product Manager at AWS. Outside of work, Pashmeen enjoys adventurous hikes, photography, and spending time with his family.

Dan Sinnreich is a Senior Product Manager at AWS, helping to democratize low-code/no-code machine learning. Previous to AWS, Dan built and commercialized enterprise SaaS platforms and time-series models used by institutional investors to manage risk and construct optimal portfolios. Outside of work, he can be found playing hockey, scuba diving, and reading science fiction.

Read More

Turning the Tide on Coral Reef Decline: CUREE Robot Dives Deep With Deep Learning

Researchers are taking deep learning for a deep dive, literally.

The Woods Hole Oceanographic Institution (WHOI) Autonomous Robotics and Perception Laboratory (WARPLab) and MIT are developing a robot for studying coral reefs and their ecosystems.

The WARPLab autonomous underwater vehicle (AUV), enabled by an NVIDIA Jetson Orin NX module, is an effort from the world’s largest private ocean research institution to turn the tide on reef declines.

Some 25% of coral reefs worldwide have vanished in the past three decades, and most of the remaining reefs are heading for extinction, according to the WHOI Reef Solutions Initiative.

The AUV, dubbed CUREE (Curious Underwater Robot for Ecosystem Exploration), gathers visual, audio, and other environmental data alongside divers to help understand the human impact on reefs and the sea life around them. The robot runs an expanding collection of NVIDIA Jetson-enabled edge AI to build 3D models of reefs and to track creatures and plant life. It also runs models to navigate and collect data autonomously.

WHOI, whose submarine first explored the Titanic in 1986, is developing its CUREE robot for data gathering to scale the effort and aid in mitigation strategies. The oceanic research organization is also exploring the use of simulation and digital twins to better replicate reef conditions and investigate solutions like NVIDIA Omniverse, a development platform for building and connecting 3D tools and applications.

Creating a digital twin of Earth in Omniverse, NVIDIA is developing the world’s most powerful AI supercomputer for predicting climate change, called Earth-2.

Underwater AI: DeepSeeColor Model

Anyone who's gone snorkeling knows that seeing underwater isn't as clear as seeing on land. Water attenuates the visible spectrum of sunlight with distance, muting some colors more than others. At the same time, particles in the water create a hazy view, known as backscatter.

A team from WARPLab recently published a research paper on undersea vision correction that helps mitigate these problems and supports the work of CUREE. The paper describes a model, called DeepSeeColor, that uses a sequence of two convolutional neural networks to reduce backscatter and correct colors in real time on the NVIDIA Jetson Orin NX while undersea.
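As a rough illustration of this two-stage idea (not the actual DeepSeeColor architecture or weights), the sketch below chains two small convolutional networks: one that estimates and subtracts backscatter, and one that applies a learned per-pixel color correction.

```python
import torch
import torch.nn as nn

class BackscatterNet(nn.Module):
    """Estimates the hazy backscatter component and subtracts it from the raw image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, kernel_size=3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return torch.clamp(x - self.net(x), 0.0, 1.0)

class ColorCorrectionNet(nn.Module):
    """Compensates wavelength-dependent attenuation with a learned, spatially varying gain."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, kernel_size=3, padding=1),
        )

    def forward(self, x):
        gain = torch.exp(self.net(x))   # positive per-channel gain
        return torch.clamp(x * gain, 0.0, 1.0)

pipeline = nn.Sequential(BackscatterNet(), ColorCorrectionNet())
raw = torch.rand(1, 3, 256, 256)        # a synthetic "underwater" RGB frame
corrected = pipeline(raw)
print(corrected.shape)                  # torch.Size([1, 3, 256, 256])
```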

“NVIDIA GPUs are involved in a large portion of our pipeline because, basically, when the images come in, we use DeepSeeColor to color correct them, and then we can do the fish detection and transmit that to a scientist up at the surface on a boat,” said Stewart Jamieson, a robotics Ph.D. candidate at MIT and AI developer at WARPLab.

Eyes and Ears: Fish and Reef Detection

CUREE packs four forward-facing cameras, four hydrophones for underwater audio capture, depth sensors and inertial measurement unit sensors. GPS doesn’t work underwater, so it is only used to initialize the robot’s starting position while on the surface.

Using a combination of cameras and hydrophones along with AI models running on the Jetson Orin NX enables CUREE to collect data for producing 3D models of reefs and undersea terrains.

To use the hydrophones for audio data collection, CUREE needs to drift with its motor off so that there’s no interference with the audio.

“It can build a spatial soundscape map of the reef, using sounds produced by different animals,” said Yogesh Girdhar, an associate scientist at WHOI, who leads WARPLab. “We currently (in post-processing) detect where all the chatter associated with bioactivity hotspots is,” he added, referring to all the noises of sea life.

The team has been training detection models for both audio and video input to track creatures. But a big noise interference with detecting clear audio samples has come from one creature in particular.

“The problem is that, underwater, the snapping shrimps are loud,” said Girdhar. On land, this classic dilemma of how to separate sounds from background noises is known as the cocktail party problem. “If only we could figure out an algorithm to remove the effects of sounds of snapping shrimps from audio, but at the moment we don’t have a good solution,” said Girdhar.

Despite the scarcity of underwater datasets, pioneering fish detection and tracking is going well, said Levi Cai, a Ph.D. candidate in the MIT-WHOI joint program. He said they're taking a semi-supervised approach to the marine animal tracking problem. The tracking is initialized using targets detected by a fish detection neural network trained on open-source fish detection datasets and fine-tuned with transfer learning on images gathered by CUREE.

“We manually drive the vehicle until we see an animal that we want to track, and then we click on it and have the semi-supervised tracker take over from there,” said Cai.

Jetson Orin Energy Efficiency Drives CUREE

Energy efficiency is critical for small AUVs like CUREE. The compute requirements for data collection consume roughly 25% of the available energy resources, with driving the robots taking the remainder.

CUREE typically operates for as long as two hours on a charge, depending on the reef mission and the observation requirements, said Girdhar, who goes on the dive missions in St. John in the U.S. Virgin Islands.

To enhance energy efficiency, the team is looking into AI for managing the sensors so that computing resources automatically stay awake while making observations and sleep when not in use.

“Our robot is small, so the amount of energy spent on GPU computing actually matters — with Jetson Orin NX our power issues are gone, and it’s made our system much more robust,” said Girdhar.

Exploring Isaac Sim to Make Improvements 

The WARPLab team is experimenting with NVIDIA Isaac Sim, a scalable robotics simulation application and synthetic data generation tool powered by Omniverse, to accelerate development of autonomy and observation for CUREE.

The goal is to run simple simulations in Isaac Sim to capture the core essence of the problem and then finish the training in the real world undersea, said Girdhar.

“In a coral reef environment, we cannot depend on sonars — we need to get up really close,” he said. “Our goal is to observe different ecosystems and processes happening.”

Understanding Ecosystems and Creating Mitigation Strategies

The WARPLab team intends to make the CUREE platform available for others to understand the impact humans are having on undersea environments and to help create mitigation strategies.

The researchers plan to learn from patterns that emerge from the data. CUREE provides an almost fully autonomous data collection scientist that can communicate findings to human researchers, said Jamieson. “A scientist gets way more out of this than if the task had to be done manually, driving it around staring at a screen all day,” he said.

Girdhar said that ecosystems like coral reefs can be modeled with a network, with different nodes corresponding to different types of species and habitat types. Within that, he said, there are all these different interactions happening, and the researchers seek to understand this network to learn about the relationship between various animals and their habitats.

The hope is that there’s enough data collected using CUREE AUVs to gain a comprehensive understanding of ecosystems and how they might progress over time and be affected by harbors, pesticide runoff, carbon emissions and dive tourism, he said.

“We can then better design and deploy interventions and determine, for example, if we planted new corals how they would change the reef over time,” said Girdhar.

Learn more about NVIDIA Jetson Orin NX, Omniverse and Earth-2.

 

Read More

Project Silica: Sustainable cloud archival storage in glass

This research paper was presented at the 29th ACM Symposium on Operating Systems Principles (SOSP 2023), the premier forum for the theory and practice of computer systems software.


Data growth demands a sustainable archival solution

For millennia, data has woven itself into every facet of our lives, from business and academia to personal spheres. Our production of data is staggering, encompassing personal photos, medical records, financial data, scientific insights, and more. By 2025, it’s estimated that we will generate a massive 175 zettabytes of data annually. Amidst this deluge, a substantial portion is vital for preserving our collective heritage and personal histories.  

Presently, magnetic technologies like tape and hard disk drives provide the most economical storage, but they come with limitations. Magnetic media lacks the longevity and durability essential for enduring archival storage, requiring data to be periodically migrated to new media—for hard disk drives, this is every five years, for magnetic tape, it’s around ten. Moreover, ensuring data longevity on magnetic media requires regular “scrubbing,” a process involving reading data to identify corruption and fixing any errors. This leads to substantial energy consumption. We need a sustainable solution, one that ensures the preservation of our digital heritage without imposing an ongoing environmental and financial burden.

Project Silica: Sustainable and durable cloud archival storage

Our paper, “Project Silica: Towards Sustainable Cloud Archival Storage in Glass,” presented at SOSP 2023, describes Project Silica, a cloud-based storage system underpinned by quartz glass. This type of glass is a durable, chemically inert, and resilient low-cost medium, impervious to electromagnetic interference. Because data written in glass can last for thousands of years, quartz glass is ideal for archival storage, offering a sustainable solution and eliminating the need for periodic data refreshes.

Writing, reading, and decoding data

Ultrafast femtosecond lasers enable the writing process. Data is written inside a square glass platter similar in size to a DVD through voxels, permanent modifications to the physical structure of the glass made using femtosecond-scale laser pulses. Voxels encode multiple bits of data and are written in 2D layers across the XY plane. Hundreds of these layers are then stacked in the Z axis. To achieve high write throughput, we rapidly scan the laser pulses across the length of the media using a scanner similar to that used in barcode readers. 

To read data, we employ polarization microscopy to image the platter. The read drive scans sectors in a single swift Z-pattern, and the resulting images are processed for decoding. Different read drive options offer varying throughput, balancing cost and performance.

Data decoding relies on ML models that analyze images captured by the read drive, accurately converting signals from analog to digital. The glass library design includes independent read, write, and storage racks. Platters are stored in power-free storage racks and moved by free-roaming shuttles, ensuring minimal resource consumption for passive storage, as shown in Video 1. A one-way system between write racks and the rest of the library ensures that a written platter cannot be over-written under any circumstances, enforcing data integrity.

Video 1. The Silica library prototype demonstrates the flexible and scalable design of the system and its ability to sustainably service archival workloads. 

Azure workload analysis informs Silica’s design

To build an optimal storage system around the core Silica technology, we extensively studied cloud archival data workloads from Azure Storage. Surprisingly, we discovered that small read requests dominate the read workload, yet a small percentage of requests constitute the majority of read bytes, creating a skewed distribution, as illustrated in Figure 1.

Figure 1. The distribution of read request sizes. Most requests are for small files but account for a small percentage of the total bytes read (for example, 58% of operations are for files smaller than 4 MB yet make up less than 1.2% of bytes read, while 85% of bytes read come from files larger than 256 MB, which account for less than 2% of requests).
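The kind of analysis behind Figure 1 can be reproduced on any request log with a few lines of pandas. The sketch below uses synthetic data and a hypothetical column name (request_bytes), since the Azure trace itself is not public.

```python
import numpy as np
import pandas as pd

# Synthetic read-request log; in practice this would come from a storage trace.
rng = np.random.default_rng(0)
requests = pd.DataFrame({
    "request_bytes": rng.lognormal(mean=13, sigma=3, size=100_000).astype(int)
})

# Bucket requests by size and compare the share of operations vs. the share of bytes.
buckets = [0, 4 * 2**20, 64 * 2**20, 256 * 2**20, np.inf]
labels = ["<4MB", "4-64MB", "64-256MB", ">256MB"]
requests["bucket"] = pd.cut(requests["request_bytes"], bins=buckets, labels=labels)

summary = requests.groupby("bucket", observed=True).agg(
    ops=("request_bytes", "size"),
    bytes=("request_bytes", "sum"),
)
summary["ops_pct"] = 100 * summary["ops"] / summary["ops"].sum()
summary["bytes_pct"] = 100 * summary["bytes"] / summary["bytes"].sum()
print(summary[["ops_pct", "bytes_pct"]].round(1))
```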

This implies that minimizing the latency of mechanical movement in the library is crucial for optimal performance. Silica glass, a random-seeking storage medium, can suitably meet these requirements as it eliminates the necessity for spooling, unlike magnetic tape. Figure 2 illustrates substantial differences in read demand across various datacenters. These results suggest that we need a flexible library design that can scale resources for each datacenter’s workload. Studying these archival workloads has been instrumental in helping us establish the core design principles for the Silica storage system.

Figure 2. Tail over median read load for different datacenters (log scale). The data shows significant variation across and within datacenters, with up to seven orders of magnitude difference within a single datacenter.



Project Silica’s versatile storage system

We designed and evaluated a comprehensive storage system that manages error correction, data layout, request scheduling, and shuttle traffic management. Our design effectively manages IOPS-intensive tasks, meeting the expected service level objective (SLO) of an archival storage tier, approximately 15 hours. Interestingly, even in volume-intensive scenarios where a large number of bytes are read, our system efficiently handles requests using read drives with low throughput. In both cases, throughput demands are significantly below those of traditional tape drives. This is shown in Figure 3. The paper provides an extensive description of this system, and the video above shows our prototype library’s capabilities. 

Figure 3. Volume and IOPS workloads represent different extremes in the spectrum of read workloads. Our design services both workloads well within the expected SLO for an archival storage tier of about 15 hours, even with 30 MB/s read drives; completion time improves with read drive throughput but plateaus past 60 MB/s.

Diverse applications for sustainably archiving humanity’s data

Project Silica holds promise in numerous sectors, such as healthcare, scientific research, and finance, where secure and durable archival storage of sensitive data is crucial. Research institutions could benefit from Silica’s ability to store vast datasets generated from experiments and simulations, ensuring the integrity and accessibility of research findings over time. Similarly, healthcare organizations could securely archive patient records, medical imaging data, and research outcomes for long-term reference and analysis. 

As the volume of globally generated data grows, traditional storage solutions will continue to face challenges in terms of scalability, energy-efficiency, and long-term durability. Moreover, as technologies like AI and advanced analytics progress, the need for reliable and accessible archival data will continue to intensify. Project Silica is well-positioned to play a pivotal role in supporting these technologies by providing a stable, secure, and sustainable repository for the vast amounts of data we create and rely on.

The post Project Silica: Sustainable cloud archival storage in glass appeared first on Microsoft Research.

Read More

Spoken question answering and speech continuation using a spectrogram-powered LLM

The goal of natural language processing (NLP) is to develop computational models that can understand and generate natural language. By capturing the statistical patterns and structures of text-based natural language, language models can predict and generate coherent and meaningful sequences of words. Enabled by the increasing use of the highly successful Transformer model architecture and with training on large amounts of text (with proportionate compute and model size), large language models (LLMs) have demonstrated remarkable success in NLP tasks.

However, modeling spoken human language remains a challenging frontier. Spoken dialog systems have conventionally been built as a cascade of automatic speech recognition (ASR), natural language understanding (NLU), response generation, and text-to-speech (TTS) systems. Yet to date there have been few capable end-to-end systems for modeling spoken language: that is, single models that can take speech input and generate its continuation as speech output.

Today we present a new approach for spoken language modeling, called Spectron, published in “Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM.” Spectron is the first spoken language model that is trained end-to-end to directly process spectrograms as both input and output, instead of learning discrete speech representations. Using only a pre-trained text language model, it can be fine-tuned to generate high-quality, semantically accurate spoken language. Furthermore, the proposed model improves upon direct initialization in retaining the knowledge of the original LLM as demonstrated through spoken question answering datasets.

We show that a pre-trained speech encoder and a language model decoder enable end-to-end training and state-of-the-art performance without sacrificing representational fidelity. Key to this is a novel end-to-end training objective that implicitly supervises speech recognition, text continuation, and conditional speech synthesis in a joint manner. A new spectrogram regression loss also supervises the model to match the higher-order derivatives of the spectrogram in the time and frequency domain. These derivatives express information aggregated from multiple frames at once. Thus, they express rich, longer-range information about the shape of the signal. Our overall scheme is summarized in the following figure:

The Spectron model connects the encoder of a speech recognition model with a pre-trained Transformer-based decoder language model. During training, speech utterances are split into a prompt and its continuation. Then the full transcript (prompt and continuation) is reconstructed along with the continuation’s speech features. At inference, only a prompt is provided; the prompt’s transcription, text continuation, and speech continuations are all generated by the model.

Spectron architecture

The architecture is initialized with a pre-trained speech encoder and a pre-trained decoder language model. The encoder is prompted with a speech utterance as input, which it encodes into continuous linguistic features. These features feed into the decoder as a prefix, and the whole encoder-decoder is optimized to jointly minimize a cross-entropy loss (for speech recognition and transcript continuation) and a novel reconstruction loss (for speech continuation). During inference, one provides a spoken speech prompt, which is encoded and then decoded to give both text and speech continuations.

Speech encoder

The speech encoder is a 600M-parameter conformer encoder pre-trained on large-scale data (12M hours). It takes the spectrogram of the source speech as input, generating a hidden representation that incorporates both linguistic and acoustic information. The input spectrogram is first subsampled using a convolutional layer and then processed by a series of conformer blocks. Each conformer block consists of a feed-forward layer, a self-attention layer, a convolution layer, and a second feed-forward layer. The outputs are passed through a projection layer to match the hidden representations to the embedding dimension of the language model.

Language model

We use a 350M- or 1B-parameter decoder language model (for the continuation and question-answering tasks, respectively) trained in the manner of PaLM 2. The model receives the encoded features of the prompt as a prefix. Note that this is the only connection between the speech encoder and the LM decoder; i.e., there is no cross-attention between the encoder and the decoder. Unlike most spoken language models, during training the decoder is teacher-forced to predict the text transcription, text continuation, and speech embeddings. To convert the speech embeddings to and from spectrograms, we introduce lightweight pre- and post-network modules.

By having the same architecture decode the intermediate text and the spectrograms, we gain two benefits. First, the pre-training of the LM in the text domain allows continuation of the prompt in the text domain before synthesizing the speech. Secondly, the predicted text serves as intermediate reasoning, enhancing the quality of the synthesized speech, analogous to improvements in text-based language models when using intermediate scratchpads or chain-of-thought (CoT) reasoning.

Acoustic projection layers

To enable the language model decoder to model spectrogram frames, we employ a multi-layer perceptron “pre-net” to project the ground truth spectrogram speech continuations to the language model dimension. This pre-net compresses the spectrogram input into a lower dimension, creating a bottleneck that aids the decoding process. This bottleneck mechanism prevents the model from repetitively generating the same prediction in the decoding process. To project the LM output from the language model dimension to the spectrogram dimension, the model employs a “post-net”, which is also a multi-layer perceptron. Both pre- and post-networks are two-layer multi-layer perceptrons.

Training objective

The training methodology of Spectron uses two distinct loss functions: (i) cross-entropy loss, employed for both speech recognition and transcript continuation, and (ii) regression loss, employed for speech continuation. During training, all parameters are updated (speech encoder, projection layer, LM, pre-net, and post-net).
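The sketch below illustrates how such a joint objective can be assembled: a cross-entropy term over text tokens plus a regression term that also matches first-order differences of the spectrogram along the time and frequency axes. It is an illustrative approximation of the losses described above, not the exact formulation from the paper.

```python
import torch
import torch.nn.functional as F

def spectrogram_regression_loss(pred, target):
    """L1 loss on the spectrogram and on its first-order time/frequency differences.

    pred, target: tensors of shape (batch, frames, freq_bins)
    """
    loss = F.l1_loss(pred, target)
    # Match higher-order structure: differences along time (frames) ...
    loss = loss + F.l1_loss(pred[:, 1:] - pred[:, :-1], target[:, 1:] - target[:, :-1])
    # ... and along frequency bins.
    loss = loss + F.l1_loss(pred[:, :, 1:] - pred[:, :, :-1],
                            target[:, :, 1:] - target[:, :, :-1])
    return loss

def joint_loss(text_logits, text_targets, spec_pred, spec_target, alpha=1.0):
    """Cross-entropy for recognition/continuation text plus spectrogram regression."""
    ce = F.cross_entropy(text_logits.transpose(1, 2), text_targets)  # (B, V, T) vs (B, T)
    return ce + alpha * spectrogram_regression_loss(spec_pred, spec_target)

# Example with random tensors: batch of 2, 10 text tokens (vocab 100),
# 50 spectrogram frames of 80 bins each.
logits = torch.randn(2, 10, 100)
targets = torch.randint(0, 100, (2, 10))
spec_pred, spec_target = torch.randn(2, 50, 80), torch.randn(2, 50, 80)
print(joint_loss(logits, targets, spec_pred, spec_target).item())
```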

Audio samples

Audio examples of speech continuation and spoken question answering generated by Spectron accompany the original post.
Performance

To empirically evaluate the performance of the proposed approach, we conducted experiments on the Libri-Light dataset. Libri-Light is a 60k hour English dataset consisting of unlabelled speech readings from LibriVox audiobooks. We utilized a frozen neural vocoder called WaveFit to convert the predicted spectrograms into raw audio. We experiment with two tasks, speech continuation and spoken question answering (QA). Speech continuation quality is tested on the LibriSpeech test set. Spoken QA is tested on the Spoken WebQuestions datasets and a new test set named LLama questions, which we created. For all experiments, we use a 3 second audio prompt as input. We compare our method against existing spoken language models: AudioLM, GSLM, TWIST and SpeechGPT. For the speech continuation task, we use the 350M parameter version of LM and the 1B version for the spoken QA task.

For the speech continuation task, we evaluate our method using three metrics. The first is log-perplexity, which uses an LM to evaluate the cohesion and semantic quality of the generated speech. The second is mean opinion score (MOS), which measures how natural the speech sounds to human evaluators. The third, speaker similarity, uses a speaker encoder to measure how similar the speaker in the output is to the speaker in the input. Performance in all 3 metrics can be seen in the following graphs.

Log-perplexity for completions of LibriSpeech utterances given a 3-second prompt. Lower is better.
Speaker similarity between the prompt speech and the generated speech using the speaker encoder. Higher is better.
MOS given by human raters on speech naturalness. Raters score the naturalness of a speech utterance on a subjective mean opinion score (MOS) scale from 0 to 5. Higher is better.

As can be seen in the first graph, our method significantly outperforms GSLM and TWIST on the log-perplexity metric, and does slightly better than state-of-the-art methods AudioLM and SpeechGPT. In terms of MOS, Spectron exceeds the performance of all the other methods except for AudioLM. In terms of speaker similarity, our method outperforms all other methods.

To evaluate the ability of the models to perform question answering, we use two spoken question answering datasets. The first is the LLama Questions dataset, which uses general knowledge questions in different domains generated using the LLama2 70B LLM. The second dataset is the WebQuestions dataset which is a general question answering dataset. For evaluation we use only questions that fit into the 3 second prompt length. To compute accuracy, answers are transcribed and compared to the ground truth answers in text form.

Accuracy for Question Answering on the LLama Questions and Spoken WebQuestions datasets. Accuracy is computed using the ASR transcripts of spoken answers.

First, we observe that all methods have more difficulty answering questions from the Spoken WebQuestions dataset than from the LLama questions dataset. Second, we observe that methods centered around spoken language modeling such as GSLM, AudioLM and TWIST have a completion-centric behavior rather than direct question answering which hindered their ability to perform QA. On the LLama questions dataset our method outperforms all other methods, while SpeechGPT is very close in performance. On the Spoken WebQuestions dataset, our method outperforms all other methods except for SpeechGPT, which does marginally better.

Acknowledgements

The direct contributors to this work include Eliya Nachmani, Alon Levkovitch, Julian Salazar, Chulayutsh Asawaroengchai, Soroosh Mariooryad, RJ Skerry-Ryan and Michelle Tadmor Ramanovich. We also thank Heiga Zhen, Yifan Ding, Yu Zhang, Yuma Koizumi, Neil Zeghidour, Christian Frank, Marco Tagliasacchi, Nadav Bar, Benny Schlesinger and Blaise Aguera-Arcas.

Read More

The Sky’s the Limit: ‘Cities: Skylines II’ Streams This Week on GeForce NOW

The cloud is full of treats this GFN Thursday with Cities: Skylines II now streaming, leading 15 newly supported games this week. The game’s publisher, Paradox Interactive, is offering GeForce NOW one-month Priority memberships for those who pick up the game first, so make sure to grab one before they’re gone.

Among the newly supported additions to the GeForce NOW library are more games from the PC Game Pass catalog, including Ghostwire Tokyo, State of Decay and the Dishonored series. Members can also look forward to Alan Wake 2 — streaming soon.

Cloud City

Cities: Skylines II on GeForce NOW
If you build it, they will come.

Members can build the metropolis of their dreams this week in Cities: Skylines II, the sequel to Paradox Interactive’s award-winning city sim. Raise a city from the ground up and transform it into a thriving urban landscape. Get creative to build on an unprecedented scale while managing a deep simulation and a living economy.

The game’s AI and intricate economics mean every choice ripples through the fabric of a player’s city, so they’ll have to stay sharp — strategizing, solving problems and reacting to challenges. Build sky-high and sprawl across the map like never before. New dynamic map features affect how the city expands amid rising pollution, changing weather and seasonal challenges.

Paradox is offering one-month GeForce NOW Priority memberships to the first 100,000 people who purchase the game, so budding city planners can optimize their gameplay across nearly any device. Visit Cities Skylines II for more info.

Newly Risen in the Cloud

Settle in for a spooky night with the newest PC Game Pass additions to the cloud: State of Decay and the Dishonored series.

State of Decay 2: Juggernaut Edition on GeForce NOW
“The right choice is the one that keeps us alive.”

Drop into a post-apocalyptic world and fend off zombies in State of Decay 2: Juggernaut Edition from Undead Labs and Xbox Game Studios. Band together with a small group of survivors and rebuild a corner of civilization in this dynamic, open-world sandbox. Fortify home base, perform daring raids for food and supplies and rescue other survivors who may have unique talents to contribute. Head online with friends for an up to four-player online co-op mode and visit their communities to help defend them and bring back rewards. No two players’ experiences will be the same.

Dishonor on you, dishonor on your cow, “Dishonored” in the cloud.

Get supernatural with the Dishonored series, which comprises first-person action games set in a steampunk Lovecraftian world. In Dishonored, follow the story of Corvo Attano — a former bodyguard turned assassin driven by revenge after being framed for the murder of the Empress of Dunwall. Choose stealth or violence with Dishonored’s flexible combat system and Corvo’s supernatural abilities.

The Definitive Edition includes the original Dishonored game with updated graphics, the “Void Walker’s Arsenal” add-on pack, plus expansion packs for more missions: “The Knife of Dunwall,” “The Brigmore Witches” and “Dunwall City Trials.”

Follow up with the sequel, Dishonored 2, set 15 years after Dishonored. Members can play as Corvo or his daughter, Emily, who seeks to reclaim her rightful place as the Empress of Dunwall. Dishonored: Death of the Outsider is the latest in the series, following the story of former assassin Billie Lurk on her mission to discover the origins of a mysterious entity called The Outsider.

It’s Getting Dark in Here

Maybe you should be afraid of the dark after all.

Alan Wake 2, the long-awaited sequel to Remedy Entertainment’s survival-horror classic, is coming soon to the cloud.

What begins as a small-town murder investigation rapidly spirals into a nightmare journey. Uncover the source of a supernatural darkness in this psychological horror story filled with suspense and unexpected twists. Play as FBI agent Saga Anderson and Alan Wake, a horror writer long trapped in the Dark Place, to see events unfold from different perspectives.

Ultimate members will soon be able to uncover mysteries with the power of a GeForce RTX 4080 server in the cloud. Survive the surreal world of Alan Wake 2 at up to 4K resolution and 120 frames per second, with path-traced graphics accelerated and enhanced by NVIDIA DLSS 3.5 and NVIDIA Reflex technology.

Trick or Treat: Give Me All New Games to Beat

Ghostwire Tokyo on GeForce NOW
I ain’t afraid of no ghost.

It’s time for a bewitching new list of games in the cloud. Ghostwire Tokyo from Bethesda is an action-adventure game set in a modern-day Tokyo mysteriously depopulated by a paranormal phenomenon. Team with a spectral entity to fight the supernatural forces that have taken over the city, including ghosts, yokai and other creatures from Japanese folklore.

Jump into the action now with 15 new games this week:

Make sure to check out the question of the week. Share your answer on Twitter or in the comments below.

Read More

Frontier risk and preparedness

To support the safety of highly capable AI systems, we are developing our approach to catastrophic risk preparedness, including building a Preparedness team and launching a challenge. (OpenAI Blog)

Looking back at wildfire research in 2023

Looking back at wildfire research in 2023


Wildfires are becoming larger and affecting more and more communities around the world, often resulting in large-scale devastation. Just this year, communities have experienced catastrophic wildfires in Greece, Maui, and Canada to name a few. While the underlying causes leading to such an increase are complex — including changing climate patterns, forest management practices, land use development policies and many more — it is clear that the advancement of technologies can help to address the new challenges.

At Google Research, we’ve been investing in a number of climate adaptation efforts, including the application of machine learning (ML) to aid in wildfire prevention and provide information to people during these events. For example, to help map fire boundaries, our wildfire boundary tracker uses ML models and satellite imagery to map large fires in near real-time with updates every 15 minutes. To advance our various research efforts, we are partnering with wildfire experts and government agencies around the world.

Today we are excited to share more about our ongoing collaboration with the US Forest Service (USFS) to advance fire modeling tools and fire spread prediction algorithms. Starting from the newly developed USFS wildfire behavior model, we use ML to significantly reduce computation times, thus enabling the model to be employed in near real time. This new model is also capable of incorporating localized fuel characteristics, such as fuel type and distribution, in its predictions. Finally, we describe an early version of our new high-fidelity 3D fire spread model.

Current state of the art in wildfire modeling

Today’s most widely used state-of-the-art fire behavior models for fire operations and training are based on the Rothermel fire model, developed by Rothermel et al. at the US Forest Service Fire Lab in the 1970s. This model considers many key factors that affect fire spread, such as the influence of wind, the slope of the terrain, the moisture level, and the fuel load (e.g., the density of combustible materials in the forest), and it provided a good balance between computational feasibility and accuracy at the time. The Rothermel model has gained widespread use throughout the fire management community across the world.

Various operational tools that employ the Rothermel model, such as BEHAVE, FARSITE, FSPro, and FlamMap, have been developed and improved over the years. These tools and the underlying model are used mainly in three important ways: (1) for training firefighters and fire managers to develop their insights and intuitions on fire behavior, (2) for fire behavior analysts to predict the development of a fire during a fire operation and to generate guidance for situation awareness and resource allocation planning, and (3) for analyzing forest management options intended to mitigate fire hazards across large landscapes.  These models are the foundation of fire operation safety and efficiency today.

However, there are limitations to these state-of-the-art models, mostly associated with the simplification of the underlying physical processes (which was necessary when these models were created). By simplifying the physics to produce steady-state predictions, the required inputs for fuel sources and weather became practical but also more abstract compared to measurable quantities. As a result, these models are typically “adjusted” and “tweaked” by experienced fire behavior analysts so they work more accurately in certain situations and compensate for uncertainties and unknowable environmental characteristics. Yet these expert adjustments mean that many of the calculations are not repeatable.

To overcome these limitations, USFS researchers have been working on a new model to drastically improve the physical fidelity of fire behavior prediction. This effort represents the first major shift in fire modeling in the past 50 years. While the new model continues to improve in capturing fire behavior, the computational cost and inference time makes it impractical to be deployed in the field or for applications with near real-time requirements. In a realistic scenario, to make this model useful and practical in training and operations, a speed up of at least 1000x would be needed.

Machine learning acceleration

In partnership with the USFS, we have undertaken a program to apply ML to decrease computation times for complex fire models. Researchers knew that many complex inputs and features could be characterized using a deep neural network, and if successful, the trained model would lower the computational cost and latency of evaluating new scenarios. Deep learning is a branch of machine learning that uses neural networks with multiple hidden layers of nodes that do not directly correspond to actual observations. The model’s hidden layers allow a rich representation of extremely complex systems — an ideal technique for modeling wildfire spread.

We used the USFS physics-based, numerical prediction models to generate many simulations of wildfire behavior and then used these simulated examples to train the deep learning model on the inputs and features to best capture the system behavior accurately. We found that the deep learning model can perform at a much lower computational cost compared to the original and is able to address behaviors resulting from fine-scale processes. In some cases, computation time for capturing the fine-scale features described above and providing a fire spread estimate was 100,000 times faster than running the physics-based numerical models.
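A minimal sketch of this surrogate-modeling idea is shown below: a small convolutional network maps gridded inputs (for example fuel load, wind components, slope, and moisture) to a predicted burned-area mask and is trained against simulator outputs. The channel choices and shapes are illustrative assumptions, not the actual USFS/Google model.

```python
import torch
import torch.nn as nn

class FireSpreadSurrogate(nn.Module):
    """Maps per-cell environmental rasters to a probability of burning."""
    def __init__(self, in_channels=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),   # logits for "burned within the prediction horizon"
        )

    def forward(self, x):
        return self.net(x)

# Synthetic stand-in for simulator-generated training data:
# channels = [fuel load, wind u, wind v, slope, fuel moisture]; target = burned mask.
inputs = torch.rand(8, 5, 64, 64)
targets = (torch.rand(8, 1, 64, 64) > 0.7).float()

model = FireSpreadSurrogate()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(5):                      # a few illustrative training steps
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")
```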

This project has continued to make great progress since the first report at ICFFR in December 2022. The joint Google–USFS presentation at ICFFR 2022 and the USFS Fire Lab’s project page provide a glimpse into the ongoing work in this direction. Our team has expanded the dataset used for training by an order of magnitude, from 40M up to 550M training examples. Additionally, we have delivered a prototype ML model that our USFS Fire Lab partner is integrating into a training app currently being developed for release in 2024.

Google researchers visiting the USFS Fire Lab in Missoula, MT, stopping by Big Knife Fire Operation Command Center.

Fine-grained fuel representation

Besides training, another key use-case of the new model is for operational fire prediction. To fully leverage the advantages of the new model’s capability to capture the detailed fire behavior changes from small-scale differences in fuel structures, high resolution fuel mapping and representation are needed. To this end, we are currently working on the integration of high resolution satellite imagery and geo information into ML models to allow fuel specific mapping at-scale. Some of the preliminary results will be presented at the upcoming 10th International Fire Ecology and Management Congress in November 2023.

Future work

Beyond the collaboration on the new fire spread model, there are many important and challenging problems that can help fire management and safety. Many such problems require even more accurate fire models that fully consider 3D flow interactions and fluid dynamics, thermodynamics and combustion physics. Such detailed calculations usually require high-performance computers (HPCs) or supercomputers.

These models can be used for research and longer-term planning purposes to develop insights on extreme fire development scenarios, build ML classification models, or establish a meaningful “danger index” using the simulated results. These high-fidelity simulations can also be used to supplement physical experiments that are used in expanding the operational models mentioned above.

In this direction, Google Research has also developed a high-fidelity, large-scale 3D fire simulator that can be run on Google TPUs. In the near future, we plan to further leverage this new capability to augment the experiments, generate data to build insights on the development of extreme fires, and use the data to design a fire-danger classifier and fire-danger index protocol.

An example of 3D high-fidelity simulation. This is a controlled burn field experiment (FireFlux II) simulated using Google’s high fidelity fire simulator.

Acknowledgements

We thank Mark Finney, Jason Forthofer, William Chatham and Issac Grenfell from US Forest Service Missoula Fire Science Laboratory and our colleagues John Burge, Lily Hu, Qing Wang, Cenk Gazen, Matthias Ihme, Vivian Yang, Fei Sha and John Anderson for core contributions and useful discussions. We also thank Tyler Russell for his assistance with program management and coordination.

Read More

Detection and high-frequency monitoring of methane emission point sources using Amazon SageMaker geospatial capabilities

Methane (CH4) is a major anthropogenic greenhouse gas that's a by-product of oil and gas extraction, coal mining, large-scale animal farming, and waste disposal, among other sources. The global warming potential of CH4 is 86 times that of CO2 and the Intergovernmental Panel on Climate Change (IPCC) estimates that methane is responsible for 30 percent of observed global warming to date. Rapidly reducing leakage of CH4 into the atmosphere represents a critical component in the fight against climate change. In 2021, the U.N. introduced The Global Methane Pledge at the Climate Change Conference (COP26), with a goal to take “fast action on methane to keep a 1.5C future within reach.” The Pledge has 150 signatories including the U.S. and EU.

Early detection and ongoing monitoring of methane sources is a key component of meaningful action on methane and is therefore becoming a concern for policy makers and organizations alike. Implementing affordable, effective methane detection solutions at scale – such as on-site methane detectors or aircraft-mounted spectrometers – is challenging, as they are often impractical or prohibitively expensive. Remote sensing using satellites, on the other hand, can provide the global-scale, high-frequency, and cost-effective detection functionality that stakeholders desire.

In this blog post, we show you how you can use Sentinel-2 satellite imagery hosted on the AWS Registry of Open Data in combination with Amazon SageMaker geospatial capabilities to detect point sources of CH4 emissions and monitor them over time. Drawing on recent findings from the earth observation literature, you will learn how you can implement a custom methane detection algorithm and use it to detect and monitor methane leakage from a variety of sites across the globe. This post includes accompanying code on GitHub that provides additional technical detail and helps you to get started with your own methane monitoring solution.

Traditionally, running complex geospatial analyses was a difficult, time-consuming, and resource-intensive undertaking. Amazon SageMaker geospatial capabilities make it easier for data scientists and machine learning engineers to build, train, and deploy models using geospatial data. Using SageMaker geospatial capabilities, you can efficiently transform or enrich large-scale geospatial datasets, accelerate model building with pre-trained machine learning (ML) models, and explore model predictions and geospatial data on an interactive map using 3D accelerated graphics and built-in visualization tools.

Remote sensing of methane point sources using multispectral satellite imagery

Satellite-based methane sensing approaches typically rely on the unique transmittance characteristics of CH4. In the visible spectrum, CH4 has transmittance values equal or close to 1, meaning it’s undetectable by the naked eye. Across certain wavelengths, however, methane does absorb light (transmittance <1), a property which can be exploited for detection purposes. For this, the short wavelength infrared (SWIR) spectrum (1500–2500 nm spectral range) is typically chosen, which is where CH4 is most detectable. Hyper- and multispectral satellite missions (that is, those with optical instruments that capture image data within multiple wavelength ranges (bands) across the electromagnetic spectrum) cover these SWIR ranges and therefore represent potential detection instruments. Figure 1 plots the transmittance characteristics of methane in the SWIR spectrum and the SWIR coverage of various candidate multispectral satellite instruments (adapted from this study).

Figure 1 – Transmittance characteristics of methane in the SWIR spectrum and coverage of Sentinel-2 multi-spectral missions

Many multispectral satellite missions are limited either by a low revisit frequency (for example, PRISMA Hyperspectral at approximately 16 days) or by low spatial resolution (for example, Sentinel-5 at 7.5 km x 7.5 km). The cost of accessing data is an additional challenge: some dedicated constellations operate as commercial missions, potentially making CH4 emission insights less readily available to researchers, decision makers, and other concerned parties due to financial constraints. ESA’s Sentinel-2 multispectral mission, which this solution is based on, strikes an appropriate balance between revisit rate (approximately 5 days), spatial resolution (approximately 20 m), and open access (hosted on the AWS Registry of Open Data).

Sentinel-2 has two bands that cover the SWIR spectrum (at a 20 m resolution): band-11 (1610 nm central wavelength) and band-12 (2190 nm central wavelength). Both bands are suitable for methane detection, although band-12 has significantly higher sensitivity to CH4 absorption (see Figure 1). Intuitively, there are two possible approaches to using this SWIR reflectance data for methane detection. First, you could focus on a single SWIR band (ideally the one most sensitive to CH4 absorption) and compute the pixel-by-pixel difference in reflectance across two different satellite passes. Alternatively, you could use data from a single satellite pass by comparing the two adjacent SWIR bands, which have similar surface and aerosol reflectance properties but different methane absorption characteristics.

The detection method we implement in this blog post combines both approaches. We draw on recent findings from the earth observation literature and compute the fractional change in top-of-the-atmosphere (TOA) reflectance Δρ (that is, reflectance measured by Sentinel-2, including contributions from atmospheric aerosols and gases) between two satellite passes and across the two SWIR bands: one baseline pass where no methane is present (base) and one monitoring pass where an active methane point source is suspected (monitor). Mathematically, this can be expressed as follows:

Δρ = (c_monitor · ρ_b12,monitor − ρ_b11,monitor) / ρ_b11,monitor − (c_base · ρ_b12,base − ρ_b11,base) / ρ_b11,base     Equation (1)

where ρ is the TOA reflectance as measured by Sentinel-2, and c_monitor and c_base are computed by regressing TOA reflectance values of band-11 on those of band-12 across the entire scene (that is, fitting ρ_b11 ≈ c · ρ_b12). For more details, refer to this study on high-frequency monitoring of anomalous methane point sources with multispectral Sentinel-2 satellite observations.

Implement a methane detection algorithm with SageMaker geospatial capabilities

To implement the methane detection algorithm, we use the SageMaker geospatial notebook within Amazon SageMaker Studio. The geospatial notebook kernel is pre-equipped with essential geospatial libraries such as GDAL, GeoPandas, Shapely, xarray, and Rasterio, enabling direct visualization and processing of geospatial data within the Python notebook environment. See the getting started guide to learn how to start using SageMaker geospatial capabilities.
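Before retrieving any imagery, we need a client for the SageMaker geospatial service. The following is a minimal sketch, assuming you run it from a SageMaker Studio notebook with the geospatial image; it creates the boto3 client that the remaining code sections refer to as geospatial_client.

import boto3

#create the client for SageMaker geospatial capabilities
#us-west-2 matches the Region of the public raster data collection used later in this post
geospatial_client = boto3.client("sagemaker-geospatial", region_name="us-west-2")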

SageMaker geospatial capabilities provide a purpose-built API, SearchRasterDataCollection, to retrieve satellite imagery through a consolidated interface. SearchRasterDataCollection relies on the following input parameters:

  • Arn: The Amazon resource name (ARN) of the queried raster data collection
  • AreaOfInterest: A polygon object (in GeoJSON format) representing the region of interest for the search query
  • TimeRangeFilter: Defines the time range of interest, denoted as {StartTime: <string>, EndTime: <string>}
  • PropertyFilters: Optional supplementary property filters, such as a maximum acceptable cloud cover

This method supports querying various raster data sources, which can be explored by calling ListRasterDataCollections. Our methane detection implementation uses Sentinel-2 satellite imagery, which can be globally referenced using the following ARN: arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8.

This ARN represents Sentinel-2 imagery that has been processed to Level 2A (surface reflectance, atmospherically corrected). For methane detection purposes, however, we use top-of-atmosphere (TOA) reflectance data (Level 1C), because the atmospheric corrections applied at Level 2A would remove the absorption signal of atmospheric constituents such as methane and make leaks undetectable.
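If you want to confirm this ARN or explore the other raster data collections available to you, a short sketch using the ListRasterDataCollections call looks like the following. It assumes the geospatial_client created earlier; the RasterDataCollectionSummaries, Name, and Arn fields follow the documented response shape, so adjust them if your SDK version differs.

#list the raster data collections available to SageMaker geospatial capabilities
response = geospatial_client.list_raster_data_collections()
for collection in response["RasterDataCollectionSummaries"]:
    print(collection["Name"], collection["Arn"])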

To identify potential emissions from a specific point source, we need two input parameters: the coordinates of the suspected point source and a designated timestamp for methane emission monitoring. Given that the SearchRasterDataCollection API uses polygons or multi-polygons to define an area of interest (AOI), our approach involves expanding the point coordinates into a bounding box first and then using that polygon to query for Sentinel-2 imagery with SearchRasterDataCollection.

In this example, we monitor a known methane leak originating from an oil field in Northern Africa. This is a standard validation case in the remote sensing literature and is referenced, for example, in this study. A fully executable code base is provided on the amazon-sagemaker-examples GitHub repository. Here, we highlight only selected code sections that represent the key building blocks for implementing a methane detection solution with SageMaker geospatial capabilities. See the repository for additional details.

We start by initializing the coordinates and target monitoring date for the example case.

#coordinates and date for North Africa oil field
#see here for reference: https://doi.org/10.5194/amt-14-2771-2021
point_longitude = 5.9053
point_latitude = 31.6585
target_date = '2019-11-20'
#half-width of the bounding box in each direction around the point (meters)
distance_offset_meters = 1500

The following code snippet generates a bounding box for the given point coordinates and then performs a search for the available Sentinel-2 imagery based on the bounding box and the specified monitoring date:

import math
from shapely import geometry

def bbox_around_point(lon, lat, distance_offset_meters):
    #Equatorial radius (m) taken from https://nssdc.gsfc.nasa.gov/planetary/factsheet/earthfact.html
    earth_radius_meters = 6378137
    lat_offset = math.degrees(distance_offset_meters / earth_radius_meters)
    lon_offset = math.degrees(distance_offset_meters / (earth_radius_meters * math.cos(math.radians(lat))))
    return geometry.Polygon([
        [lon - lon_offset, lat - lat_offset],
        [lon - lon_offset, lat + lat_offset],
        [lon + lon_offset, lat + lat_offset],
        [lon + lon_offset, lat - lat_offset],
        [lon - lon_offset, lat - lat_offset],
    ])

#generate bounding box and extract polygon coordinates
aoi_geometry = bbox_around_point(point_longitude, point_latitude, distance_offset_meters)
aoi_polygon_coordinates = geometry.mapping(aoi_geometry)['coordinates']

#set search parameters
search_params = {
    "Arn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8", # Sentinel-2 L2 data
    "RasterDataCollectionQuery": {
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates": aoi_polygon_coordinates
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "{}T00:00:00Z".format(as_iso_date(target_date)),
            "EndTime": "{}T23:59:59Z".format(as_iso_date(target_date))
        }
    },
}
#query raster data using SageMaker geospatial capabilities
#(geospatial_client is the boto3 "sagemaker-geospatial" client created earlier)
sentinel2_items = geospatial_client.search_raster_data_collection(**search_params)

The response contains a list of matching Sentinel-2 items and their corresponding metadata. These include Cloud-Optimized GeoTIFFs (COGs) for all Sentinel-2 bands, as well as thumbnail images for a quick preview of the visual bands. Naturally, it’s also possible to access the full-resolution satellite image (RGB plot), shown in Figure 2 that follows.
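As a quick sanity check, you can loop over the returned items and print their IDs, acquisition times, and available assets. This is a sketch that assumes the documented response shape of SearchRasterDataCollection (an Items list whose entries carry Id, DateTime, and Assets fields); adjust the field names if your SDK version differs.

#inspect the Sentinel-2 scenes returned by the search
for item in sentinel2_items.get("Items", []):
    print("scene id:", item["Id"])
    print("acquisition time:", item["DateTime"])
    print("available assets:", list(item.get("Assets", {}).keys()))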

Figure 2 – Satellite image (RGB plot) of AOI

As previously detailed, our detection approach relies on fractional changes in top-of-the-atmosphere (TOA) SWIR reflectance. For this to work, identifying a good baseline scene is crucial, which can quickly become a tedious process of trial and error. However, good heuristics can go a long way toward automating this search. A heuristic that has worked well for cases investigated in the past is as follows: for each of the past day_offset=n days, retrieve all satellite imagery, remove any clouds, and clip the images to the AOI in scope. Then compute the average band-12 reflectance across the AOI and return the Sentinel-2 tile ID of the image with the highest average band-12 reflectance.

This logic is implemented in the following code excerpt. Its rationale relies on the fact that band-12 is highly sensitive to CH4 absorption (see Figure 1): a greater average reflectance value corresponds to lower absorption from sources such as methane emissions and therefore provides a strong indication of an emission-free baseline scene.

from datetime import datetime, timedelta

#note: get_sentinel2_meta_data and get_s2l1c_band_data_xarray are helper functions
#provided in the accompanying GitHub repository
def approximate_best_reference_date(lon, lat, date_to_monitor, distance_offset=1500, cloud_mask=True, day_offset=30):
    
    #initialize AOI and other parameters
    aoi_geometry = bbox_around_point(lon, lat, distance_offset)
    BAND_12_SWIR22 = "B12"
    max_mean_swir = None
    ref_s2_tile_id = None
    ref_target_date = date_to_monitor   
        
    #loop over n=day_offset previous days
    for day_delta in range(-1 * day_offset, 0):
        date_time_obj = datetime.strptime(date_to_monitor, '%Y-%m-%d')
        target_date = (date_time_obj + timedelta(days=day_delta)).strftime('%Y-%m-%d')   
        
        #get Sentinel-2 tiles for current date
        s2_tiles_for_target_date = get_sentinel2_meta_data(target_date, aoi_geometry)
        
        #loop over available tiles for current date
        for s2_tile_meta in s2_tiles_for_target_date:
            s2_tile_id_to_test = s2_tile_meta['Id']
            #retrieve cloud-masked (optional) L1C band 12
            target_band_data = get_s2l1c_band_data_xarray(s2_tile_id_to_test, BAND_12_SWIR22, clip_geometry=aoi_geometry, cloud_mask=cloud_mask)
            #compute mean reflectance of SWIR band
            mean_swir = target_band_data.sum() / target_band_data.count()
            
            #ensure the visible/non-clouded area is adequately large
            visible_area_ratio = target_band_data.count() / (target_band_data.shape[1] * target_band_data.shape[2])
            if visible_area_ratio <= 0.7: #<-- ensure acceptable cloud cover
                continue
            
            #update maximum ref_s2_tile_id and ref_target_date if applicable
            if max_mean_swir is None or mean_swir > max_mean_swir: 
                max_mean_swir = mean_swir
                ref_s2_tile_id = s2_tile_id_to_test
                ref_target_date = target_date

    return (ref_s2_tile_id, ref_target_date)

Using this method allows us to approximate a suitable baseline date and the corresponding Sentinel-2 tile ID. Sentinel-2 tile IDs carry information on the mission ID (Sentinel-2A/Sentinel-2B), the unique tile number (such as 32SKA), and the date the image was taken, among other information, and they uniquely identify an observation (that is, a scene). In our example, the approximation process suggests October 6, 2019 (Sentinel-2 tile S2B_32SKA_20191006_0_L2A) as the most suitable baseline candidate.
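As a usage sketch, the following call applies the heuristic to the example coordinates and monitoring date defined earlier. The expected output for this example is the baseline identified above, although results can vary with the day_offset window and cloud conditions.

#approximate an emission-free baseline scene for the North Africa example
ref_s2_tile_id, ref_target_date = approximate_best_reference_date(
    lon=point_longitude,
    lat=point_latitude,
    date_to_monitor=target_date,            #'2019-11-20'
    distance_offset=distance_offset_meters,
    day_offset=30,
)
print(ref_s2_tile_id, ref_target_date)      #expected here: S2B_32SKA_20191006_0_L2A, 2019-10-06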

Next, we can compute the corrected fractional change in reflectance between the baseline date and the date we’d like to monitor. The correction factors c (see Equation 1 preceding) can be calculated with the following code:

import numpy as np

def compute_correction_factor(tif_y, tif_x):
    
    #get flattened arrays for regression
    y = np.array(tif_y.values.flatten())
    x = np.array(tif_x.values.flatten())
    np.nan_to_num(y, copy=False)
    np.nan_to_num(x, copy=False)
        
    #fit linear model using least squares regression
    x = x[:,np.newaxis] #reshape
    c, _, _, _ = np.linalg.lstsq(x, y, rcond=None)

    return c[0]
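Because the model is fit without an intercept, the least squares solution has a simple closed form, c = Σ(x·y) / Σ(x²). The following self-contained sketch uses synthetic reflectance values (purely illustrative, not taken from the scenes above) to show that np.linalg.lstsq on a single-column design matrix reproduces that closed form:

import numpy as np

#synthetic reflectance values, purely illustrative
rng = np.random.default_rng(42)
x = rng.uniform(0.1, 0.5, size=1000)            #stand-in for band-12 TOA reflectance
y = 0.8 * x + rng.normal(0, 0.01, size=1000)    #stand-in for band-11 TOA reflectance

c_lstsq, _, _, _ = np.linalg.lstsq(x[:, np.newaxis], y, rcond=None)
c_closed_form = (x * y).sum() / (x * x).sum()
print(c_lstsq[0], c_closed_form)                #both approximately 0.8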

The full implementation of Equation 1 is given in the following code snippet:

def compute_corrected_fractional_reflectance_change(l1_b11_base, l1_b12_base, l1_b11_monitor, l1_b12_monitor):
    
    #get correction factors
    c_monitor = compute_correction_factor(tif_y=l1_b11_monitor, tif_x=l1_b12_monitor)
    c_base = compute_correction_factor(tif_y=l1_b11_base, tif_x=l1_b12_base)
    
    #get corrected fractional reflectance change
    frac_change = ((c_monitor*l1_b12_monitor-l1_b11_monitor)/l1_b11_monitor)-((c_base*l1_b12_base-l1_b11_base)/l1_b11_base)
    return frac_change

Finally, we can wrap the preceding methods into an end-to-end routine that derives the AOI for a given longitude and latitude, takes a monitoring date and baseline tile as inputs, acquires the required satellite imagery, and performs the fractional reflectance change computation.

#Sentinel-2 SWIR band identifiers (assumed to follow the same naming convention as BAND_12_SWIR22 = "B12" above)
BAND_11_SWIR16 = "B11"
BAND_12_SWIR22 = "B12"

def run_full_fractional_reflectance_change_routine(lon, lat, date_monitor, baseline_s2_tile_id, distance_offset=1500, cloud_mask=True):
    
    #get bounding box
    aoi_geometry = bbox_around_point(lon, lat, distance_offset)
    
    #get S2 metadata
    s2_meta_monitor = get_sentinel2_meta_data(date_monitor, aoi_geometry)
    
    #get tile id
    grid_id = baseline_s2_tile_id.split("_")[1]
    s2_tile_id_monitor = list(filter(lambda x: f"_{grid_id}_" in x["Id"], s2_meta_monitor))[0]["Id"]
    
    #retrieve band 11 and 12 of the Sentinel L1C product for the given S2 tiles
    l1_swir16_b11_base = get_s2l1c_band_data_xarray(baseline_s2_tile_id, BAND_11_SWIR16, clip_geometry=aoi_geometry, cloud_mask=cloud_mask)
    l1_swir22_b12_base = get_s2l1c_band_data_xarray(baseline_s2_tile_id, BAND_12_SWIR22, clip_geometry=aoi_geometry, cloud_mask=cloud_mask)
    l1_swir16_b11_monitor = get_s2l1c_band_data_xarray(s2_tile_id_monitor, BAND_11_SWIR16, clip_geometry=aoi_geometry, cloud_mask=cloud_mask)
    l1_swir22_b12_monitor = get_s2l1c_band_data_xarray(s2_tile_id_monitor, BAND_12_SWIR22, clip_geometry=aoi_geometry, cloud_mask=cloud_mask)
    
    #compute corrected fractional reflectance change
    frac_change = compute_corrected_fractional_reflectance_change(
        l1_swir16_b11_base,
        l1_swir22_b12_base,
        l1_swir16_b11_monitor,
        l1_swir22_b12_monitor
    )
    
    return frac_change

Running this method with the parameters we determined earlier yields the fractional change in SWIR TOA reflectance as an xarray.DataArray. We can perform a first visual inspection of the result by running a simple plot() invocation on this data array. Our method reveals the presence of a methane plume at the center of the AOI that was undetectable in the RGB plot seen previously.
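A minimal sketch of this step, assuming the point coordinates, monitoring date, and baseline tile ID determined earlier in this post, looks as follows:

#run the end-to-end routine for the example case and plot the result
frac_change = run_full_fractional_reflectance_change_routine(
    lon=point_longitude,
    lat=point_latitude,
    date_monitor=target_date,                #'2019-11-20'
    baseline_s2_tile_id=ref_s2_tile_id,      #for example, 'S2B_32SKA_20191006_0_L2A'
)
frac_change.plot()                           #quick visual inspection of the xarray.DataArray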

Figure 3 – Fractional reflectance change in TOA reflectance (SWIR spectrum)

As a final step, we extract the identified methane plume and overlay it on a raw RGB satellite image to provide the important geographic context. This is achieved by thresholding, which can be implemented as shown in the following:

def get_plume_mask(change_in_reflectance_tif, threshold_value):
    cr_masked = change_in_reflectance_tif.copy()
    #set values above the (negative) threshold to nan, keeping only likely plume pixels
    cr_masked[cr_masked > threshold_value] = np.nan 
    #mask the nan values so that only the plume remains
    plume_tif = np.ma.array(cr_masked, mask=np.isnan(cr_masked)) 
    
    return plume_tif

For our case, a threshold of -0.02 fractional change in reflectance yields good results, but this can change from scene to scene, and you will have to calibrate the threshold for your specific use case. Figure 4 that follows illustrates how the plume overlay is generated by combining the raw satellite image of the AOI with the masked plume into a single composite image that shows the methane plume in its geographic context.
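A short usage sketch for this step, assuming the frac_change array computed earlier and the threshold that worked well for this scene:

#extract the plume pixels using the scene-specific threshold
plume_mask = get_plume_mask(frac_change, threshold_value=-0.02)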

Figure 4 – RGB image, fractional reflectance change in TOA reflectance (SWIR spectrum), and methane plume overlay for AOI

Solution validation with real-world methane emission events

As a final step, we evaluate our method for its ability to correctly detect and pinpoint methane leakages from a range of sources and geographies. First, we use a controlled methane release experiment specifically designed for the validation of space-based point-source detection and quantification of onshore methane emissions. In this 2021 experiment, researchers performed several methane releases in Ehrenberg, Arizona, over a 19-day period. Running our detection method for one of the Sentinel-2 passes during that experiment produces the following result showing a methane plume:

Figure 5 – Methane plume intensities for Arizona Controlled Release Experiment

The plume generated during the controlled release is clearly identified by our detection method. The same is true for other known real-world leakages (in Figure 6 that follows) from sources such as a landfill in East Asia (left) or an oil and gas facility in North America (right).

Figure 6 – Methane plume intensities for an East Asian landfill (left) and an oil and gas field in North America (right)

In sum, our method can help identify methane emissions both from controlled releases and from various real-world point sources across the globe. It works best for onshore point sources with limited surrounding vegetation. It does not work for offshore scenes due to the high absorption (that is, low transmittance) of the SWIR spectrum by water. Given that the proposed detection algorithm relies on variations in methane intensity, our method also requires pre-leakage observations, which can make monitoring of leakages with constant emission rates challenging.

Clean up

To avoid incurring unwanted charges after a methane monitoring job has completed, ensure that you terminate the SageMaker instance and delete any unwanted local files.

Conclusion

By combining SageMaker geospatial capabilities with open geospatial data sources, you can implement your own highly customized remote monitoring solutions at scale. This blog post focused on methane detection, a priority for governments, NGOs, and other organizations seeking to detect and ultimately avoid harmful methane emissions. You can get started on your own geospatial analytics journey today by spinning up a notebook with the SageMaker geospatial kernel and implementing your own detection solution. See the GitHub repository to start building your own satellite-based methane detection solution, and check out the sagemaker-examples repository for further examples and tutorials on how to use SageMaker geospatial capabilities in other real-world remote sensing applications.


About the authors

Dr. Karsten Schroer is a Solutions Architect at AWS. He supports customers in leveraging data and technology to drive sustainability of their IT infrastructure and build cloud-native data-driven solutions that enable sustainable operations in their respective verticals. Karsten joined AWS following his PhD studies in applied machine learning & operations management. He is truly passionate about technology-enabled solutions to societal challenges and loves to dive deep into the methods and application architectures that underlie these solutions.

Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in geospatial AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions that capitalize on geospatial data. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in complex domains such as autonomous driving.
