Deep Learning Digs Deep: AI Unveils New Large-Scale Images in Peruvian Desert

Researchers at Yamagata University in Japan have harnessed AI to uncover four previously unseen geoglyphs — images on the ground, some as wide as 1,200 feet, made using the land’s elements — in Nazca, a seven-hour drive south of Lima, Peru.

The geoglyphs — a humanoid, a pair of legs, a fish and a bird — were revealed using a deep learning model, making the discovery process significantly faster than traditional archaeological methods.

The team’s deep learning model training was executed on an IBM Power Systems server with an NVIDIA GPU.

Using open-source deep learning software, the researchers analyzed high-resolution aerial photographs as part of a study that began in November 2019.

Published this month in the Journal of Archaeological Science, the study confirms the deep learning model’s findings through onsite surveys and highlights the potential of AI in accelerating archaeological discoveries.

The deep learning techniques that are the hallmark of modern AI are used in a variety of archaeological efforts, whether analyzing ancient scrolls discovered across the Mediterranean or categorizing pottery sherds from the American Southwest.

The Nazca lines, a series of ancient geoglyphs that date from 500 B.C. to 500 A.D. — most likely from 100 B.C. to 300 A.D. — were created by removing darker stones on the desert floor to reveal lighter-colored sand beneath.

The drawings — depicting animals, plants, geometric shapes and more — are thought to have had religious or astronomical significance to the Nazca people who created them.

The discovery of these new geoglyphs indicates the possibility of more undiscovered sites in the area.

And it underscores how technologies like deep learning can enhance archaeological exploration, providing a more efficient approach to uncovering hidden sites.

Read the full paper.

Featured image courtesy of Wikimedia Commons.

Read More

SoundStorm: Efficient parallel audio generation

The recent progress in generative AI unlocked the possibility of creating new content in several different domains, including text, vision and audio. These models often rely on the fact that raw data is first converted to a compressed format as a sequence of tokens. In the case of audio, neural audio codecs (e.g., SoundStream or EnCodec) can efficiently compress waveforms to a compact representation, which can be inverted to reconstruct an approximation of the original audio signal. Such a representation consists of a sequence of discrete audio tokens, capturing the local properties of sounds (e.g., phonemes) and their temporal structure (e.g., prosody). By representing audio as a sequence of discrete tokens, audio generation can be performed with Transformer-based sequence-to-sequence models — this has unlocked rapid progress in speech continuation (e.g., with AudioLM), text-to-speech (e.g., with SPEAR-TTS), and general audio and music generation (e.g., AudioGen and MusicLM). Many generative audio models, including AudioLM, rely on auto-regressive decoding, which produces tokens one by one. While this method achieves high acoustic quality, inference (i.e., calculating an output) can be slow, especially when decoding long sequences.

To address this issue, in “SoundStorm: Efficient Parallel Audio Generation”, we propose a new method for efficient and high-quality audio generation. SoundStorm addresses the problem of generating long audio token sequences by relying on two novel elements: 1) an architecture adapted to the specific nature of audio tokens as produced by the SoundStream neural codec, and 2) a decoding scheme inspired by MaskGIT, a recently proposed method for image generation, which is tailored to operate on audio tokens. Compared to the autoregressive decoding approach of AudioLM, SoundStorm is able to generate tokens in parallel, thus decreasing the inference time by 100x for long sequences, and produces audio of the same quality and with higher consistency in voice and acoustic conditions. Moreover, we show that SoundStorm, coupled with the text-to-semantic modeling stage of SPEAR-TTS, can synthesize high-quality, natural dialogues, allowing one to control the spoken content (via transcripts), speaker voices (via short voice prompts) and speaker turns (via transcript annotations), as demonstrated by the examples below:

Input: Text (transcript used to drive the audio generation in bold)

Something really funny happened to me this morning. | Oh wow, what? | Well, uh I woke up as usual. | Uhhuh | Went downstairs to have uh breakfast. | Yeah | Started eating. Then uh 10 minutes later I realized it was the middle of the night. | Oh no way, that’s so funny!

I didn’t sleep well last night. | Oh, no. What happened? | I don’t know. I I just couldn’t seem to uh to fall asleep somehow, I kept tossing and turning all night. | That’s too bad. Maybe you should uh try going to bed earlier tonight or uh maybe you could try reading a book. | Yeah, thanks for the suggestions, I hope you’re right. | No problem. I I hope you get a good night’s sleep

Input: Audio prompt (audio examples not reproduced here)

Output: Audio prompt + generated audio (audio examples not reproduced here)

SoundStorm design

In our previous work on AudioLM, we showed that audio generation can be decomposed into two steps: 1) semantic modeling, which generates semantic tokens from either previous semantic tokens or a conditioning signal (e.g., a transcript as in SPEAR-TTS, or a text prompt as in MusicLM), and 2) acoustic modeling, which generates acoustic tokens from semantic tokens. With SoundStorm we specifically address this second, acoustic modeling step, replacing slower autoregressive decoding with faster parallel decoding.

SoundStorm relies on a bidirectional attention-based Conformer, a model architecture that combines a Transformer with convolutions to capture both local and global structure of a sequence of tokens. Specifically, the model is trained to predict audio tokens produced by SoundStream given a sequence of semantic tokens generated by AudioLM as input. When doing this, it is important to take into account the fact that, at each time step t, SoundStream uses up to Q tokens to represent the audio using a method known as residual vector quantization (RVQ), as illustrated below on the right. The key intuition is that the quality of the reconstructed audio progressively increases as the number of generated tokens at each step goes from 1 to Q.
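To make the RVQ structure concrete, here is a toy sketch of residual vector quantization for a single frame; the random codebooks and the function below are purely illustrative assumptions, not the actual SoundStream codec:

import numpy as np

def rvq_encode(frame, codebooks):
    """frame: (d,) feature vector; codebooks: list of Q arrays, each (codebook_size, d)."""
    residual, tokens = frame.copy(), []
    for codebook in codebooks:  # levels q = 1 .. Q
        idx = int(np.argmin(np.linalg.norm(codebook - residual, axis=1)))
        tokens.append(idx)
        residual = residual - codebook[idx]  # the next level quantizes what is left over
    return tokens  # reconstruction = sum of the chosen codebook entries

# Q = 4 levels, 256 entries each, 8-dimensional frames (toy numbers)
codebooks = [np.random.randn(256, 8) for _ in range(4)]
print(rvq_encode(np.random.randn(8), codebooks))

Using more of the Q tokens adds back successively finer residuals, which is why reconstruction quality improves from level 1 to level Q.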

At inference time, given the semantic tokens as input conditioning signal, SoundStorm starts with all audio tokens masked out, and fills in the masked tokens over multiple iterations, starting from the coarse tokens at RVQ level q = 1 and proceeding level-by-level with finer tokens until reaching level q = Q.

There are two crucial aspects of SoundStorm that enable fast generation: 1) tokens are predicted in parallel during a single iteration within an RVQ level, and 2) the model architecture is designed in such a way that the complexity is only mildly affected by the number of levels Q. To support this inference scheme, during training a carefully designed masking scheme is used to mimic the iterative process used at inference.
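As a rough illustration of this scheme, the sketch below fills a T × Q grid of audio tokens level by level, with several confidence-based iterations at the coarse level and a single greedy pass at the finer ones; the model interface, masking schedule, and shapes are assumptions for illustration, not the actual SoundStorm implementation:

import torch

MASK = -1  # sentinel id for masked positions

def decode(model, semantic_tokens, T, Q, coarse_steps=16):
    """Fill a (T, Q) grid of SoundStream-style tokens, conditioned on semantic tokens."""
    tokens = torch.full((T, Q), MASK, dtype=torch.long)
    for q in range(Q):  # proceed level by level, coarse to fine
        for _ in range(coarse_steps if q == 0 else 1):
            masked = tokens[:, q] == MASK
            if not masked.any():
                break
            # One forward pass predicts every masked token at this level in parallel.
            logits = model(semantic_tokens, tokens, level=q)  # (T, vocab_size)
            conf, pred = logits.softmax(dim=-1).max(dim=-1)
            # Keep the most confident half of the masked predictions, re-mask the rest.
            n_keep = max(1, int(masked.sum().item() * 0.5))
            scores = torch.where(masked, conf, torch.full_like(conf, -1.0))
            keep = torch.topk(scores, n_keep).indices
            tokens[keep, q] = pred[keep]
        still_masked = tokens[:, q] == MASK
        if still_masked.any():  # fill any leftovers greedily before moving to the next level
            logits = model(semantic_tokens, tokens, level=q)
            tokens[still_masked, q] = logits.argmax(dim=-1)[still_masked]
    return tokens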

SoundStorm model architecture. T denotes the number of time steps and Q the number of RVQ levels used by SoundStream. The semantic tokens used as conditioning are time-aligned with the SoundStream frames.

Measuring SoundStorm performance

We demonstrate that SoundStorm matches the quality of AudioLM’s acoustic generator, replacing both AudioLM’s stage two (coarse acoustic model) and stage three (fine acoustic model). Furthermore, SoundStorm produces audio 100x faster than AudioLM’s hierarchical autoregressive acoustic generator (top half below) with matching quality and improved consistency in terms of speaker identity and acoustic conditions (bottom half below).

Runtimes of SoundStream decoding, SoundStorm and different stages of AudioLM on a TPU-v4.
Acoustic consistency between the prompt and the generated audio. The shaded area represents the inter-quartile range.

Safety and risk mitigation

We acknowledge that the audio samples produced by the model may be influenced by the unfair biases present in the training data, for instance in terms of represented accents and voice characteristics. In our generated samples, we demonstrate that we can reliably and responsibly control speaker characteristics via prompting, with the goal of avoiding unfair biases. A thorough analysis of any training data and its limitations is an area of future work in line with our responsible AI Principles.

In turn, the ability to mimic a voice can have numerous malicious applications, including bypassing biometric identification and using the model for the purpose of impersonation. Thus, it is crucial to put in place safeguards against potential misuse: to this end, we have verified that the audio generated by SoundStorm remains detectable by a dedicated classifier, the same classifier described in our original AudioLM paper. Hence, as a component of a larger system, we believe that SoundStorm would be unlikely to introduce additional risks to those discussed in our earlier papers on AudioLM and SPEAR-TTS. At the same time, relaxing the memory and computational requirements of AudioLM would make research in the domain of audio generation more accessible to a wider community. In the future, we plan to explore other approaches for detecting synthesized speech, e.g., with the help of audio watermarking, so that any potential product usage of this technology strictly follows our responsible AI Principles.

Conclusion

We have introduced SoundStorm, a model that can efficiently synthesize high-quality audio from discrete conditioning tokens. When compared to the acoustic generator of AudioLM, SoundStorm is two orders of magnitude faster and achieves higher temporal consistency when generating long audio samples. By combining a text-to-semantic token model similar to SPEAR-TTS with SoundStorm, we can scale text-to-speech synthesis to longer contexts and generate natural dialogues with multiple speaker turns, controlling both the voices of the speakers and the generated content. SoundStorm is not limited to generating speech. For example, MusicLM uses SoundStorm to synthesize longer outputs efficiently (as seen at I/O).

Acknowledgments

The work described here was authored by Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour and Marco Tagliasacchi. We are grateful for all discussions and feedback on this work that we received from our colleagues at Google.

Read More

How Light & Wonder built a predictive maintenance solution for gaming machines on AWS

This post is co-written with Aruna Abeyakoon and Denisse Colin from Light and Wonder (L&W).

Headquartered in Las Vegas, Light & Wonder, Inc. is the leading cross-platform global game company that provides gambling products and services. Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to stream telemetry and machine health data from roughly half a million electronic gaming machines distributed across its casino customer base globally once LnW Connect reaches its full potential. Over 500 machine events are monitored in near-real time to give a full picture of machine conditions and their operating environments. Utilizing data streamed through LnW Connect, L&W aims to create a better gaming experience for their end users as well as bring more value to their casino customers.

Light & Wonder teamed up with the Amazon ML Solutions Lab to use events data streamed from LnW Connect to enable machine learning (ML)-powered predictive maintenance for slot machines. Predictive maintenance is a common ML use case for businesses with physical equipment or machinery assets. With predictive maintenance, L&W can get advanced warning of machine breakdowns and proactively dispatch a service team to inspect the issue. This will reduce machine downtime and avoid significant revenue loss for casinos. With no remote diagnostic system in place, issue resolution by the Light & Wonder service team on the casino floor can be costly and inefficient, while severely degrading the customer gaming experience.

The nature of the project is highly exploratory—this is the first attempt at predictive maintenance in the gaming industry. The Amazon ML Solutions Lab and L&W team embarked on an end-to-end journey from formulating the ML problem and defining the evaluation metrics, to delivering a high-quality solution. The final ML model combines CNN and Transformer, which are the state-of-the-art neural network architectures for modeling sequential machine log data. The post presents a detailed description of this journey, and we hope you will enjoy it as much as we do!

In this post, we discuss the following:

  • How we formulated the predictive maintenance problem as an ML problem with a set of appropriate metrics for evaluation
  • How we prepared data for training and testing
  • Data preprocessing and feature engineering techniques we employed to obtain performant models
  • Performing a hyperparameter tuning step with Amazon SageMaker Automatic Model Tuning
  • Comparisons between the baseline model and the final CNN+Transformer model
  • Additional techniques we used to improve model performance, such as ensembling

Background

In this section, we discuss the issues that necessitated this solution.

Dataset

Slot machine environments are highly regulated and are deployed in an air-gapped environment. In LnW Connect, an encryption process was designed to provide a secure and reliable mechanism for the data to be brought into an AWS data lake for predictive modeling. The aggregated files are encrypted, and the decryption key is only available in AWS Key Management Service (AWS KMS). A cellular-based private network into AWS was set up, through which the files were uploaded into Amazon Simple Storage Service (Amazon S3).

LnW Connect streams a wide range of machine events, such as start of game, end of game, and more. The system collects over 500 different types of events. As shown in the following table, each event is recorded along with a timestamp of when it happened and the ID of the machine recording the event. LnW Connect also records when a machine enters a non-playable state, and it will be marked as a machine failure or breakdown if it doesn’t recover to a playable state within a sufficiently short time span.

Machine ID Event Type ID Timestamp
0 E1 2022-01-01 00:17:24
0 E3 2022-01-01 00:17:29
1000 E4 2022-01-01 00:17:33
114 E234 2022-01-01 00:17:34
222 E100 2022-01-01 00:17:37

In addition to dynamic machine events, static metadata about each machine is also available. This includes information such as machine unique identifier, cabinet type, location, operating system, software version, game theme, and more, as shown in the following table. (All the names in the table are anonymized to protect customer information.)

Machine ID Cabinet Type OS Location Game Theme
276 A OS_Ver0 AA Resort & Casino StormMaiden
167 B OS_Ver1 BB Casino, Resort & Spa UHMLIndia
13 C OS_Ver0 CC Casino & Hotel TerrificTiger
307 D OS_Ver0 DD Casino Resort NeptunesRealm
70 E OS_Ver0 EE Resort & Casino RLPMealTicket

Problem definition

We treat the predictive maintenance problem for slot machines as a binary classification problem. The ML model takes in the historical sequence of machine events and other metadata and predicts whether a machine will encounter a failure in a 6-hour future time window. If a machine will break down within 6 hours, it is deemed a high-priority machine for maintenance. Otherwise, it is low priority. The following figure gives examples of low-priority (top) and high-priority (bottom) samples. We use a fixed-length look-back time window to collect historical machine event data for prediction. Experiments show that longer look-back time windows improve model performance significantly (more details later in this post).

low priority and high priority examples
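A minimal sketch of the labeling logic just described, assuming a hypothetical pandas layout with one row per (machine, prediction time) sample and one row per recorded failure; the column names are illustrative:

import pandas as pd

def label_samples(samples, failures, horizon=pd.Timedelta(hours=6)):
    """samples: columns machine_id, prediction_time; failures: columns machine_id, failure_time."""
    labels = []
    for _, row in samples.iterrows():
        machine_failures = failures.loc[failures["machine_id"] == row["machine_id"], "failure_time"]
        # High priority (1) if the machine fails within the 6-hour window after the prediction time.
        in_window = machine_failures.between(row["prediction_time"], row["prediction_time"] + horizon)
        labels.append(int(in_window.any()))
    return pd.Series(labels, index=samples.index, name="high_priority")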

Modeling challenges

We faced a couple of challenges solving this problem:

  • We have a huge amount of event log data, around 50 million events a month (from approximately 1,000 game samples). Careful optimization is needed in the data extraction and preprocessing stage.
  • Event sequence modeling was challenging due to the extremely uneven distribution of events over time. A 3-hour window can contain anywhere from tens to thousands of events.
  • Machines are in a good state most of the time, and high-priority maintenance cases are rare, which introduced a class imbalance issue.
  • New machines are added continuously to the system, so we had to make sure our model can handle prediction on new machines that have never been seen in training.

Data preprocessing and feature engineering

In this section, we discuss our methods for data preparation and feature engineering.

Feature engineering

Slot machine feeds are streams of unequally spaced time series events; for example, the number of events in a 3-hour window can range from tens to thousands. To handle this imbalance, we used event frequencies instead of the raw sequence data. A straightforward approach is aggregating the event frequency for the entire look-back window and feeding it into the model. However, when using this representation, the temporal information is lost, and the order of events is not preserved. We instead used temporal binning by dividing the time window into N equal sub-windows and calculating the event frequencies in each. The final features of a time window are the concatenation of all its sub-window features. Increasing the number of bins preserves more temporal information. The following figure illustrates temporal binning on a sample window.

temporal binning on a sample window

First, the sample time window is split into two equal sub-windows (bins); we used only two bins here for simplicity for illustration. Then, the counts of the events E1, E2, E3, and E4 are calculated in each bin. Lastly, they are concatenated and used as features.
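A minimal sketch of this binning step, with illustrative column names and event IDs, might look as follows:

import numpy as np
import pandas as pd

def bin_events(events, window_start, window_end, event_types, n_bins):
    """events: rows of (event_type, timestamp) for one machine within the look-back window."""
    edges = pd.date_range(window_start, window_end, periods=n_bins + 1)
    features = []
    for i in range(n_bins):
        in_bin = events[(events["timestamp"] >= edges[i]) & (events["timestamp"] < edges[i + 1])]
        counts = in_bin["event_type"].value_counts()
        features.append([counts.get(e, 0) for e in event_types])  # frequency of each event type in this bin
    return np.concatenate(features)  # length = n_bins * len(event_types)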

Along with the event frequency-based features, we used machine-specific features like software version, cabinet type, game theme, and game version. Additionally, we added features related to the timestamps to capture the seasonality, such as hour of the day and day of the week.

Data preparation

To extract data efficiently for training and testing, we utilize Amazon Athena and the AWS Glue Data Catalog. The events data is stored in Amazon S3 in Parquet format and partitioned according to day/month/hour. This facilitates efficient extraction of data samples within a specified time window. We use data from all machines in the latest month for testing and the rest of the data for training, which helps avoid potential data leakage.
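As a rough sketch, a window of events could be pulled with an Athena query through boto3; the database, table, and partition column names below are hypothetical placeholders rather than the actual Glue Data Catalog entries used in the project:

import boto3

athena = boto3.client("athena")

query = """
SELECT machine_id, event_type_id, event_timestamp
FROM machine_events
WHERE month = '01' AND day = '15' AND hour BETWEEN '00' AND '11'
"""

response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "lnw_connect"},  # hypothetical database name
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution until the query succeeds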

ML methodology and model training

In this section, we discuss our baseline model with AutoGluon and how we built a customized neural network with SageMaker automatic model tuning.

Building a baseline model with AutoGluon

With any ML use case, it’s important to establish a baseline model to be used for comparison and iteration. We used AutoGluon to explore several classic ML algorithms. AutoGluon is an easy-to-use AutoML tool that provides automatic data processing, hyperparameter tuning, and model ensembling. The best baseline was achieved with a weighted ensemble of gradient boosted decision tree models. The ease of use of AutoGluon helped us in the discovery stage to navigate quickly and efficiently through a wide range of possible data and ML modeling directions.

Building and tuning a customized neural network model with SageMaker automatic model tuning

After experimenting with different neural network architectures, we built a customized deep learning model for predictive maintenance. Our model surpassed the AutoGluon baseline model by 121% in recall at 80% precision. The final model ingests historical machine event sequence data, time features such as hour of the day, and static machine metadata. We utilize SageMaker automatic model tuning jobs to search for the best hyperparameters and model architectures.

The following figure shows the model architecture. We first normalize the binned event sequence data by average frequencies of each event in the training set to remove the overwhelming effect of high-frequency events (start of game, end of game, and so on). The embeddings for individual events are learnable, while the temporal feature embeddings (day of the week, hour of the day) are extracted using the package GluonTS. Then we concatenate the event sequence data with the temporal feature embeddings as the input to the model. The model consists of the following layers:

  • Convolutional layers (CNN) – Each CNN layer consists of two 1-dimensional convolutional operations with residual connections. The output of each CNN layer has the same sequence length as the input to allow for easy stacking with other modules. The total number of CNN layers is a tunable hyperparameter.
  • Transformer encoder layers (TRANS) – The output of the CNN layers is fed together with the positional encoding to a multi-head self-attention structure. We use TRANS to directly capture temporal dependencies instead of using recurrent neural networks. Here, binning of the raw sequence data (reducing length from thousands to hundreds) helps alleviate the GPU memory bottlenecks, while keeping the chronological information to a tunable extent (the number of bins is a tunable hyperparameter).
  • Aggregation layers (AGG) – The final layer combines the metadata information (game theme type, cabinet type, locations) to produce the priority level probability prediction. It consists of several pooling layers and fully connected layers for incremental dimension reduction. The multi-hot embeddings of metadata are also learnable, and don’t go through CNN and TRANS layers because they don’t contain sequential information.

customized neural network model architecture

We use the cross-entropy loss with class weights as tunable hyperparameters to adjust for the class imbalance issue. In addition, the numbers of CNN and TRANS layers are crucial hyperparameters that can take the value 0, meaning a given layer type may not always be present in the model architecture. This way, we have a unified framework where the model architectures are searched along with other usual hyperparameters.
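To make the stack concrete, here is a minimal PyTorch sketch of a CNN + TRANS + AGG model along these lines; the dimensions, layer counts, input projection, and metadata handling are illustrative assumptions (the learnable event embeddings, GluonTS temporal embeddings, and positional encoding described above are omitted for brevity), not the production model:

import torch
import torch.nn as nn

class CnnTransformerClassifier(nn.Module):
    def __init__(self, num_event_types, dim_model=160, num_cnn_layers=2,
                 num_transformer_layers=1, num_heads=4, metadata_dim=32):
        super().__init__()
        self.input_proj = nn.Linear(num_event_types, dim_model)
        # CNN: 1-dimensional convolutions with residual connections; sequence length is preserved.
        self.cnn_layers = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(dim_model, dim_model, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv1d(dim_model, dim_model, kernel_size=3, padding=1),
            )
            for _ in range(num_cnn_layers)
        ])
        # TRANS: multi-head self-attention over the binned sequence (0 layers means it is skipped).
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim_model, nhead=num_heads, batch_first=True)
        self.transformer = (nn.TransformerEncoder(encoder_layer, num_transformer_layers)
                            if num_transformer_layers > 0 else nn.Identity())
        # AGG: pool over time bins, combine with static metadata, and predict the priority logit.
        self.head = nn.Sequential(nn.Linear(dim_model + metadata_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, binned_events, metadata_embedding):
        # binned_events: (batch, num_bins, num_event_types); metadata_embedding: (batch, metadata_dim)
        x = self.input_proj(binned_events)
        for layer in self.cnn_layers:
            x = x + layer(x.transpose(1, 2)).transpose(1, 2)  # residual connection
        x = self.transformer(x)
        x = x.mean(dim=1)  # pooling over the time bins
        x = torch.cat([x, metadata_embedding], dim=-1)
        return self.head(x)  # logit for the high-priority class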

We utilize SageMaker automatic model tuning, also known as hyperparameter optimization (HPO), to efficiently explore model variations and the large search space of all hyperparameters. Automatic model tuning receives the customized algorithm, training data, and hyperparameter search space configurations, and searches for best hyperparameters using different strategies such as Bayesian, Hyperband, and more with multiple GPU instances in parallel. After evaluating on a hold-out validation set, we obtained the best model architecture with two layers of CNN, one layer of TRANS with four heads, and an AGG layer.

We used the following hyperparameter ranges to search for the best model architecture:

from sagemaker.tuner import CategoricalParameter, ContinuousParameter, IntegerParameter

hyperparameter_ranges = {
    # Learning rate
    "learning_rate": ContinuousParameter(5e-4, 1e-3, scaling_type="Logarithmic"),
    # Class weights
    "loss_weight": ContinuousParameter(0.1, 0.9),
    # Number of input bins
    "num_bins": CategoricalParameter([10, 40, 60, 120, 240]),
    # Dropout rate
    "dropout_rate": CategoricalParameter([0.1, 0.2, 0.3, 0.4, 0.5]),
    # Model embedding dimension
    "dim_model": CategoricalParameter([160, 320, 480, 640]),
    # Number of CNN layers
    "num_cnn_layers": IntegerParameter(0, 10),
    # CNN kernel size
    "cnn_kernel": CategoricalParameter([3, 5, 7, 9]),
    # Number of transformer layers
    "num_transformer_layers": IntegerParameter(0, 4),
    # Number of transformer attention heads
    "num_heads": CategoricalParameter([4, 8]),
    # Number of RNN layers (optional)
    "num_rnn_layers": IntegerParameter(0, 10),
    # RNN input dimension size
    "dim_rnn": CategoricalParameter([128, 256]),
}
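For context, these ranges could be passed to a SageMaker tuning job roughly as follows; the estimator configuration, objective metric name, and metric regex are hypothetical placeholders:

from sagemaker.pytorch import PyTorch
from sagemaker.tuner import HyperparameterTuner

estimator = PyTorch(
    entry_point="train.py",  # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    framework_version="1.12",
    py_version="py38",
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:arhp",  # hypothetical metric emitted by the training script
    hyperparameter_ranges=hyperparameter_ranges,  # the ranges defined above
    metric_definitions=[{"Name": "validation:arhp", "Regex": "validation arhp=([0-9\\.]+)"}],
    objective_type="Maximize",
    strategy="Bayesian",
    max_jobs=50,
    max_parallel_jobs=5,
)
tuner.fit({"train": "s3://example-bucket/train/", "validation": "s3://example-bucket/validation/"})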

To further improve model accuracy and reduce model variance, we trained the model with multiple independent random weight initializations and aggregated the results, using the mean probability as the final prediction. There is a trade-off between more computing resources and better model performance, and we observed that an ensemble of 5–10 models is reasonable for the current use case (results shown later in this post).
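A minimal sketch of this ensembling step, where train_model and the data arrays are hypothetical placeholders exposing an sklearn-style predict_proba interface:

import numpy as np

def ensemble_predict(train_model, X_train, y_train, X_test, n_models=5):
    probs = []
    for seed in range(n_models):
        model = train_model(X_train, y_train, random_seed=seed)  # independent random initialization
        probs.append(model.predict_proba(X_test)[:, 1])  # probability of the high-priority class
    return np.mean(probs, axis=0)  # mean probability is the final prediction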

Model performance results

In this section, we present the model performance evaluation metrics and results.

Evaluation metrics

Precision is very important for this predictive maintenance use case. Low precision means reporting more false maintenance calls, which drives costs up through unnecessary maintenance. Because average precision (AP) doesn’t fully align with the high precision objective, we introduced a new metric named average recall at high precisions (ARHP). ARHP is equal to the average of recalls at 60%, 70%, and 80% precision points. We also used precision at top K% (K=1, 10), AUPR, and AUROC as additional metrics.
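One way to compute ARHP from model scores is sketched below with scikit-learn, taking, for each precision point, the best recall achievable at or above that precision:

import numpy as np
from sklearn.metrics import precision_recall_curve

def average_recall_at_high_precision(y_true, y_score, precision_points=(0.6, 0.7, 0.8)):
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    recalls = []
    for p in precision_points:
        feasible = recall[precision >= p]  # recalls at operating points meeting the precision target
        recalls.append(feasible.max() if feasible.size else 0.0)
    return float(np.mean(recalls))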

Results

The following table summarizes the results using the baseline and the customized neural network models, with 7/1/2022 as the train/test split point. Experiments show that increasing the window length and sample data size both improve the model performance, because they contain more historical information to help with the prediction. Regardless of the data settings, the neural network model outperforms AutoGluon in all metrics. For example, recall at the fixed 80% precision is increased by 121%, which enables you to quickly identify more malfunctioning machines when using the neural network model.

Model Window length/Data size AUROC AUPR ARHP Recall@Prec0.6 Recall@Prec0.7 Recall@Prec0.8 Prec@top1% Prec@top10%
AutoGluon baseline 12H/500k 66.5 36.1 9.5 12.7 9.3 6.5 85 42
Neural Network 12H/500k 74.7 46.5 18.5 25 18.1 12.3 89 55
AutoGluon baseline 48H/1mm 70.2 44.9 18.8 26.5 18.4 11.5 92 55
Neural Network 48H/1mm 75.2 53.1 32.4 39.3 32.6 25.4 94 65

The following figures illustrate the effect of using ensembles to boost the neural network model performance. All the evaluation metrics shown on the x-axis are improved, with a higher mean (more accurate) and lower variance (more stable). Each box plot summarizes 12 repeated experiments, ranging from no ensemble to an ensemble of 10 models (x-axis). Similar trends persist in all metrics besides the Prec@top1% and Recall@Prec80% shown.

After factoring in the computational cost, we observe that using 5–10 models in ensembles is suitable for Light & Wonder datasets.

Conclusion

Our collaboration has resulted in the creation of a groundbreaking predictive maintenance solution for the gaming industry, as well as a reusable framework that could be utilized in a variety of predictive maintenance scenarios. The adoption of AWS technologies such as SageMaker automatic model tuning enables Light & Wonder to navigate new opportunities using near-real-time data streams. Light & Wonder is starting the deployment on AWS.

If you would like help accelerating the use of ML in your products and services, please contact the Amazon ML Solutions Lab program.


About the authors

Aruna Abeyakoon is the Senior Director of Data Science & Analytics at Light & Wonder’s Land-based Gaming Division. Aruna leads the industry-first Light & Wonder Connect initiative and supports both casino partners and internal stakeholders with consumer behavior and product insights to make better games, optimize product offerings, manage assets, and support health monitoring and predictive maintenance.

Denisse Colin is a Senior Data Science Manager at Light & Wonder, a leading cross-platform global game company. She is a member of the Gaming Data & Analytics team helping develop innovative solutions to improve product performance and customers’ experiences through Light & Wonder Connect.

Tesfagabir Meharizghi is a Data Scientist at the Amazon ML Solutions Lab where he helps AWS customers across various industries such as gaming, healthcare and life sciences, manufacturing, automotive, and sports and media, accelerate their use of machine learning and AWS cloud services to solve their business challenges.

Mohamad Aljazaery is an applied scientist at Amazon ML Solutions Lab. He helps AWS customers identify and build ML solutions to address their business challenges in areas such as logistics, personalization and recommendations, computer vision, fraud prevention, forecasting and supply chain optimization.

Yawei Wang is an Applied Scientist at the Amazon ML Solutions Lab. He helps AWS business partners identify and build ML solutions to address their organization’s business challenges in real-world scenarios.

Yun Zhou is an Applied Scientist at the Amazon ML Solutions Lab, where he helps with research and development to ensure the success of AWS customers. He works on pioneering solutions for various industries using statistical modeling and machine learning techniques. His interest includes generative models and sequential data modeling.

Panpan Xu is an Applied Science Manager with the Amazon ML Solutions Lab at AWS. She works on research and development of machine learning algorithms for high-impact customer applications in a variety of industrial verticals to accelerate their AI and cloud adoption. Her research interests include model interpretability, causal analysis, human-in-the-loop AI, and interactive data visualization.

Raj Salvaji leads Solutions Architecture in the Hospitality segment at AWS. He works with hospitality customers, providing strategic guidance and technical expertise to create solutions to complex business challenges. He draws on 25 years of experience in multiple engineering roles across the hospitality, finance, and automotive industries.

Shane Rai is a Principal ML Strategist with the Amazon ML Solutions Lab at AWS. He works with customers across a diverse spectrum of industries to solve their most pressing and innovative business needs using AWS’s breadth of cloud-based AI/ML services.

Read More

Scientists Improve Delirium Detection Using AI and Rapid-Response EEGs

Detecting delirium isn’t easy, but it can have a big payoff: speeding essential care to patients, leading to quicker and surer recovery.

Improved detection also reduces the need for long-term skilled care, enhancing the quality of life for patients while decreasing a major financial burden. In the U.S., caring for those suffering from delirium costs up to $64,000 a year per patient, according to the National Institutes of Health.

In a paper published last month in Nature, researchers describe how they used a deep learning model called Vision Transformer, accelerated by NVIDIA GPUs, alongside a rapid-response electroencephalogram, or EEG, device to detect delirium in critically ill older adults.

The paper, called “Supervised deep learning with vision transformer predicts delirium using limited lead EEG,” is authored by Malissa Mulkey of the University of South Carolina, Huyunting Huang of Purdue University, Thomas Albanese and Sunghan Kim of East Carolina University, and Baijian Yang of Purdue.

Their innovative approach achieved a testing accuracy rate of 97%, promising a potential breakthrough in forecasting delirium. And by harnessing AI and EEGs, the researchers could objectively evaluate prevention and treatment methods, leading to better care.

This impressive result is due in part to the accelerated performance of NVIDIA GPUs, enabling the researchers to accomplish their tasks in half the time compared to CPUs.

Delirium affects up to 80% of critically ill patients. Yet conventional clinical detection methods identify fewer than 40% of cases — representing a significant gap in patient care. Presently, screening ICU patients involves a subjective bedside assessment.

The introduction of handheld EEG devices could make screening more accurate and affordable, but the lack of skilled technicians and neurologists poses a challenge.

The use of AI, however, can eliminate the need for a neurologist to interpret findings and allow for the detection of changes associated with delirium roughly two days before symptom onset, when patients are more receptive to treatment. It also makes it possible to use EEGs with minimal training.

The researchers applied an AI model called ViT, a vision transformer that adapts the transformer architecture originally created for natural language processing, accelerated by NVIDIA GPUs, to EEG data — offering a fresh approach to data interpretation.

The use of a handheld rapid-response EEG device, which doesn’t require large EEG machines or specialized technicians, was another noteworthy study finding.

This practical tool, combined with advanced AI models for interpreting the data they collect, could streamline delirium screenings in critical care units.

The research presents a promising method for delirium detection that could shorten hospital stays, increase discharge rates, decrease mortality rates and reduce the financial burden associated with delirium.

By integrating the power of NVIDIA GPUs with innovative deep learning models and practical medical devices, this study underlines the transformative potential of technology in enhancing patient care.

As AI grows and develops, medical professionals are increasingly likely to rely on it to forecast conditions like dementia and intervene early, revolutionizing the future of critical care.

Read the full paper.

Read More

A Golden Age: ‘Age of Empires III’ Joins GeForce NOW

Conquer the lands in Microsoft’s award-winning Age of Empires III: Definitive Edition. It leads 10 new games supported today on GeForce NOW.

At Your Command

Age of Empires III on GeForce NOW
Stream battles all from the cloud.

Age of Empires III: Definitive Edition is a remaster of one of the most beloved real-time strategy franchises featuring improved visuals, enhanced gameplay, cross-platform multiplayer and more. Command mighty civilizations from across Europe and the Americas or jump to the battlefields of Asia. Members can experience two new game modes: Historical Battles and The Art of War Challenge Missions. Two new nations also join this edition — Sweden and the Inca — each with advantages for conquering the New World.

Build an empire today and stream across devices in glorious 4K resolution with an Ultimate membership.

Conquer Your Games List

Conqueror's Blade on GeForce NOW
Master the art of siege tactics in “Conqueror’s Blade” this week.

The GeForce NOW library is always expanding. Take a look at the 10 newly supported games this week.

  • Aliens: Dark Descent (New release on Steam, June 20)
  • Trepang2 (New release on Steam, June 21)
  • Forever Skies (New release on Steam, June 22)
  • Age of Empires III: Definitive Edition (Steam)
  • A.V.A Global (Steam)
  • Bloons TD 6 (Steam)
  • Conqueror’s Blade (Steam)
  • Layers of Fear (Steam)
  • Park Beyond (Steam)
  • Tom Clancy’s Rainbow Six Extraction (Steam)

Before diving into the weekend, let us know your answer to our question of the week on Twitter or in the comments below. Happy streaming!

Read More

Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi

black and white photos of Microsoft Principal Researcher Dr. Bichlien Nguyen and Dr. David Kwabi, Assistant Professor of Mechanical Engineering at the University of Michigan, next to the Microsoft Research Podcast

Episode 141 | June 22, 2023

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a new Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with.

In this episode, Microsoft Principal Researcher Dr. Bichlien Nguyen and Dr. David Kwabi, Assistant Professor of Mechanical Engineering at the University of Michigan, join host Dr. Gretchen Huizinga to talk about how their respective research interests—and those of their larger teams—are converging to develop renewable energy storage systems. They specifically explore their work in flow batteries and how machine learning can help more effectively search the vast organic chemistry space to identify compounds with properties just right for storing waterpower and other renewables for a not rainy day. The bonus? These new compounds may just help advance carbon capture, too.

Transcript

[MUSIC PLAYS UNDER DIALOGUE]

DAVID KWABI: I’m a mechanical engineer who sort of likes to hang out with chemists.

BICHLIEN NGUYEN: I’m an organic chemist by training, and I dabble in machine learning. Bryan’s a computational chemist who dabbles in flow cell work. Anne is, uh, a purely synthetic chemist who dabbles in almost all of our aspects.

KWABI: There’s really interesting synergies that show up just because there’s people, you know, coming from very different backgrounds.

NGUYEN: Because we have overlap, we, we have lower, I’m going to call it an activation barrier, in terms of the language we speak.

[MUSIC]

GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.

[MUSIC ENDS]


Today I’m talking to Dr. Bichlien Nguyen, a Principal Researcher at Microsoft Research, and Dr. David Kwabi, an Assistant Professor of Mechanical Engineering at the University of Michigan. Bichlien and David are collaborating on a fascinating project under the umbrella of the Microsoft Climate Research Initiative that brings organic chemistry and machine learning together to discover new forms of renewable energy storage. Before we unpack the “computational design and characterization of organic electrolytes for flow batteries and carbon capture,” let’s meet our collaborators.

Bichlien, I’ll start with you. Give us a bit more detail on what you do at Microsoft Research and the broader scope and mission of the Microsoft Climate Research Initiative.

BICHLIEN NGUYEN: Thanks so much, Gretchen, for the introduction. Um, so I guess I’ll start with my background. I have a background in organic electrochemistry, so it’s quite fitting, and as a researcher at Microsoft, really, it’s my job, uh, to come up with the newest technologies and keep abreast of what is happening around me so that I can actually, uh, fuse those different technology streams together and create something that’s, um, really valuable. And so as part of that, uh, the Microsoft Climate Research Initiative was really where a group of researchers came together and said, “How can we use the resources, the computational resources, and expertise at Microsoft to enable new technologies that will allow us to get to carbon negative by the year 2050? How can we do that?” And that, you know, as part of that, um, I just want to throw out that the Microsoft Climate Research Initiative really is focusing on three pillars, right. The three pillars are being carbon accounting, because if you don’t know how much carbon is in the atmosphere, you can’t really, uh, do much to remedy it, right, if you don’t know what’s there. The other one is climate resilience. So how do people get affected by climate change, and how do we overcome that, and how can we help that with technology? And then the third is materials engineering, where, that’s where I sit in the Microsoft Climate Research Initiative, and that’s more of how do we either develop technologies that, um, are used to capture and store carbon, uh, or are used to enable the green energy transition?

HUIZINGA: So do you find yourself spread across those three? You say the last one is really where your focus is, but do you dip your toe in the other areas, as well?

NGUYEN: I love dipping my toe in all the areas because I think they’re all important, right? They’re all important. We have to really understand what the environmental impacts of all the materials, for example, that we’re making are. I mean, it’s so … so carbon accounting is really important. Environmental accounting is very important. Um, and then people are the ones that form the core, right? Why are we, do … why do we do what we do? It’s because we want to make sure that we can enable people and solve their problems.

HUIZINGA: Yeah, when you talk about carbon accounting and why you’re doing it, it makes me think about when you have to go on a diet and the doctor says, “You have to get really honest about what you’re eating. Don’t, don’t fake it.” [LAUGHS] So, David, you’re a professor at the University of Michigan, and you run the eponymous Kwabi Lab there. Tell us about your work in general. What are your research interests? Who do you work with, and what excites you most about what you do?

DAVID KWABI: Yeah, happy to! Thank you for the introduction and, and for having me on, on here today. So, um, uh, so as you said, I run the Kwabi Lab here at the University of Michigan, and the sort of headline in terms of what we’re interested in doing is that we like to design and study batteries that can store lots of renewable electricity, uh, on the grid, so, so that’s kind of our mission. Um, that’s not quite all of what we do, but it’s, uh, it’s how I like to describe it. Um, and the motivation, of course, is … comes back to what Bichlien just mentioned, is this need for us to transition from carbon-intensive, uh, ways of producing energy to renewables. And the thing about renewables is that they’re intermittent, so solar and wind aren’t there all the time. You need to find a way to store all that energy and store it cheaply for us to really make, make a dent, um, in carbon emissions from energy production. So we work on, on building systems or energy storage systems that can meet that goal, that can accomplish that task.

HUIZINGA: Both of you talked about having larger teams that support the work you’re doing or collaborate with you two as collaborators. Do you want to talk about the size and scope of those teams or, um, you know, this collaboration across collaboration?

KWABI: Yeah, so I could start with that. So my group, like you said, we’re in the mechanical engineering department, so we really are, um, we call ourselves electrochemical engineers, and electrochemistry is the science of batteries, but it’s a science of lots of other things besides that, but the interesting thing about energy storage systems or batteries in general is that you need to build and put these systems together, but they’re made of lots of different materials. And so what we like to do in my group is build and put together these systems and then essentially figure out how they perform, right. Uh, try to explore performance limits as a function of different chemistries and system configurations and so on. But the hope then is that this can inform materials chemists and computationalists in terms of what materials we want to make next, sort of, so, so there’s a lot of need for collaboration and interdisciplinary knowledge to, to make progress here.

HUIZINGA: Yeah. Bichlien, how about you in terms of the umbrella that you’re under at Microsoft Research?

NGUYEN: There are so many different disciplines, um, within Microsoft Research, but also with the team that we’re working, you know, with David. So we have actually two other collaborators from two different, I guess, departments. There’s the chemical engineering department, which Bryan Goldsmith is part of, and Anne McNeil, I believe, is part of the chemistry department. And you know, for this particular project, on flow battery electrolytes for energy storage, um, we do need a multidisciplinary team, right? We, we need to go from the atomistic, you know, simulation level all the way to the full system level. And I think that’s, that’s where, you know, that’s important.

HUIZINGA: Now that we’re on the topic of this collaboration, let’s talk about how it came about. I like to call this “how I met your mother.” Um, what was the initial felt need for the project, and who called who and said, “Let’s do some research on renewable climate-friendly energy solutions?” Bichlien, why don’t you go ahead and take the lead on this?

NGUYEN: Yeah. Um, so I’m pretty sure what happened—and David, correct me if I’m wrong—

[LAUGHTER]

HUIZINGA: Pretty sure … !

NGUYEN: I’m pretty sure, but not 100 percent sure—is that, um, while we were formulating how to … uh, what, what topics we wanted to target for the Microsoft Climate Research Initiative, we began talking to many different universities as a way to learn from them, to see what areas of interest and what areas they think are, uh, really important for the future. And one of those universities was the University of Michigan, and I believe David was one of few PIs on that initial Teams meeting. And David gave, I believe … David, was it like a 10-minute presentation? Very quick, right? Um, but it sparked this moment of, “Wow, I think we could accelerate this.”

HUIZINGA: David, how do you remember it?

KWABI: Yeah, I think I remember it. [LAUGHS] This is so almost like a, like a marriage. Like, how did you guys meet? Um, and then, and then the stories have to align in some way or …

HUIZINGA: Yeah, who liked who first?

KWABI: Yeah, exactly. Um, but yeah, I think, I think that’s what I recall. So basically, I’m part of … here at the university, I’m part of this group called the Global CO2 Initiative, uh, which is basically, uh, an institute here at the university that convenes research related to CO2 capture, CO2 utilization, um, and I believe the Microsoft team set up a meeting with the Global CO2 Initiative, which I joined, uh, in my capacity as a member, and I gave a little 10-minute talk, which apparently was interesting enough for a second look, so, um, that, that’s how the collaboration started. There was a follow-up meeting after that, and here we are today.

HUIZINGA: Well, it sounds like you’re compelling, so let’s get into it, David. Now would be a good time to talk about, uh, more detail on this research. I won’t call it Flow Batteries for Dummies, but assume we’re not experts. So what are flow batteries, what problems do they solve, and how are you going after your big research goals methodologically?

KWABI: OK, so one way to think about flow batteries is to, to think first about pumped hydro storage is, is how I like to introduce it. So, so a flow battery is just like a battery, the, the sort of battery that you have in your cell phone or laptop computer or electric vehicle, but it’s a … it has a very different architecture. Um, and to explain that architecture, I like to talk about pumped hydro. So pumped hydro is I think a technology many of us probably appreciate or know about. You have two reservoirs that, that hold water—so upper and lower reservoirs—and when you have extra electricity, or, or excess electricity, you can pump water up a mountain, if you like, from one reservoir to another. And when you need that electricity, water just flows down, spins some turbines, and produces electricity. You’re turning gravitational potential energy into electrical energy, or electricity, is the idea. And the nice thing about pumped hydro, um, is that if you want to store more energy, you just need a bigger reservoir. So you just need more water essentially, um, in, in the two reservoirs to get to longer and longer durations of storage, um, and so then this is nice because more and more water is actually … is cheap. So the, the marginal cost of your … every stored, um, unit of energy is quite low. This isn’t really the case for the source of batteries we have in our cell phones and laptop computers. So if we’re talking about grid storage, you want something like this, something that decouples energy and power, so we have essentially a low cost per unit of electricity. So, so flow batteries essentially mimic pumped hydro because instead of turning gravitational potential energy into electricity, you’re actually changing or turning chemical potential energy, if you like, into electricity. So essentially what you’re doing is just storing energy in, um, in the form of electrons that are sort of attached to molecules. So you have an electron at a really high energy that is like flowing onto another molecule that has a low energy. That’s essentially the transformation that you’re trying to do in a, in a flow battery. But that’s the analogy I like to, I like to draw. It’s sort of a high- and low-reservation, uh, reservoirs you have. High and low chemical, uh, potential energy.

HUIZINGA: So what do these do better than the other batteries that you mentioned that we’re already using for energy storage?

KWABI: So the other batteries don’t have this decoupling. So in the flow battery, you have the, the energy being stored in these tanks, and the larger the tank, the more the energy. If you want to turn that energy into, uh, chemical energy into electricity, you, you, you run it through a reactor. So the reactor can stay the same size, but the tank gets bigger and bigger and you store more energy. In a laptop battery, you don’t have that. If you want more energy, you just want more battery, and that, the cost of that is the same. So there isn’t this big cost advantage, um, that comes from decoupling the energy capacity from the power capacity.

NGUYEN: David, would you, would you also say that, um, with, you know, redox organic flow batteries, you also kind of decouple the source, right, of the, the material, the battery material, so it’s no longer, for example, a rare earth metal or precious metal.

KWABI: Absolutely. So that’s, that’s then the thing. So when you … so you’ve got … you know, imagine these large systems, these giant tanks with, with molecules that store electricity. The question then is what molecules do you choose? Because if it’s like really expensive, then your electricity is also very expensive. Um, the particular type of battery we’re looking at uses organic molecules to store that electricity, and the hope is that these organic molecules can be made very cheaply from very abundant materials, and in the end, that means that this then translates to a really low cost of electricity.

HUIZINGA: Bichlien, I’m glad you brought that up because that is a great comparison in terms of the rare earth stuff, especially lithium mining right now is a huge cost, or tax, on the environment, and the more electric we have, the more lithium we need, so there’s other solutions that you guys are, are digging around for. Were you going to say something else, Bichlien?

NGUYEN: I was just going to say, I mean, another reason why, um, we thought David’s work was so interesting is because, you know, we’re looking at, um, energy, um, storage for renewables, and so to get to this green energy economy, we’ll need a ton of renewables, and then we’ll need a ton of ways to store the energy because renewables are, you know, they’re intermittent. I mean, sometimes the rain rains all the time, [LAUGHS] and sometimes it doesn’t. It’s really dry. Um, I don’t know why I say rain, but I assume … I probably …

HUIZINGA: Because you’re in Seattle, that’s why!

NGUYEN: That’s true. But like the sun shines; it doesn’t shine. Um, yeah, the wind blows, and sometimes it doesn’t.

HUIZINGA: Or doesn’t. Yeah. … Well, let’s talk about molecules, David, um, and getting a little bit more granular here, or maybe I should say atomic. You’re specifically looking for aqueous-soluble redox-active organic molecules, and you’ve noted that they’re really hard to find, um, these molecules that meet all the performance requirements for real-world applications. In other words, you have to swipe left a lot before you get to a good [LAUGHS] match, continuing with the marriage analogy. … So what are the properties necessary that you’re looking for, and why are they so hard to find?

KWABI: So the “aqueous soluble” just means soluble in water. You want the molecule to be able to dissolve into water at really high concentrations. So that’s one, um, property. You need it to last a really long time because the hope is that these flow battery installations are going to be there for decades. You need it to store electrons at the right energy. So, uh, I mentioned you have two tanks: one tank will store electrons at high energy; the other at low energy. So you need those energy levels to be set just right in a sense, so you want a high-voltage battery, essentially. You also want the molecule to be set that it doesn’t leak from one tank to the other through the reactor that’s in the middle of the two tanks, right. Otherwise, you’re essentially losing material, which is not, uh, desirable. And you want the molecule to be cheap. So that, that’s really important, obviously, because if we’re going to do this at, um, really large scale, and we want this to be low cost, that we want something that’s abundant and cheap. And finding a molecule that satisfies all of these requirements at the same time is really difficult. Um, you can find molecules that satisfy three or four or two, but finding something that hits all the, all the criteria is really hard, as is finding a good partner. [LAUGHS]

HUIZINGA: Well, and even in, in other areas, you hear the phrase cheap, fast, good—pick two, right? So, um, yeah, finding them is hard, but have you identified some or one, or I mean where are you on this search?

KWABI: Right now, the state-of-the-art charge-storing molecule, if you like, is based on a rare earth … rare element called vanadium. So the most developed flow batteries now use vanadium, um, to store electricity. But, uh, vanadium is pretty rare in the Earth’s crusts. It’s unclear if we start to scale this technology, um, to levels that would really make an impact on climate, it’s not clear there’s enough vanadium, uh, to, to do the job. So it fulfills a bunch of, a bunch of the criteria that I just mentioned, but not, not the cheap one, which is pretty important. We’re hoping, you know, with this project, that with organic molecules, we can find examples or particular compounds that really can fulfill all of these requirements, and, um, we’re excited because organic chemistry gives us, uh … there’s a wide design space with organic molecules, and you’re starting from abundant elements, and, you know, the hope is that we can really get to something that, that, you know, we can swipe left on … is it swipe left or right? I’m not sure.

NGUYEN: I have no idea.

HUIZINGA: Swipe left means …

[LAUGHTER]

HUIZINGA: I looked it up. I, I’ve been married for a really long time, so I did look it up, and it is left if you’re not interested and right if you are, apparently on Tinder, but, uh, not to beat that horse …

KWABI: You want to swipe right eventually.

HUIZINGA: Yes. Which leads me to, uh, Bichlien. What does machine learning have to do with natural sciences like organic chemistry? Why does computation play a role here, particularly generative models, for climate change science?

NGUYEN: Yeah so what, you know, the past decade or two, um, in computer science and machine learning have taught us is that ML is really good at pattern recognition, right, being able to, um, take complex datasets and pull out the most type … you know, relevant, uh, trends and information, and, and it’s good at classifying … you know, used as a classification tool. Uh, and what we know about nature is that nature is full of patterns, right. We see repeating patterns all the time in nature, at many different scales. And when we think, for example, of all of the combinations of carbons, carbon organic molecules, that you could make, you see around 10^60; it’s estimated to be 10^60. Um, and those are all connected somehow, you know, in this large, you know, space, this large, um, distribution. And we want to, for example, as David mentioned, we want to check the boxes on all these properties. So what is really powerful, we believe, is that generative models will allow us to sample this, this organic chemistry space and allow us to condition the outputs of these models on these properties that David wants to checkmark. And so in a way, it’s allowing us to do more efficient searching. And I like to think about it as like you’re trying to find a needle, right, in the ocean, and the ocean’s pretty vast; needles are really small. And instead of having the size of the ocean as your search space, maybe you have the size of a bathtub and so that we can narrow down the search space and then be able to test and validate some of the, the molecules that come out.

HUIZINGA: So do these models then eliminate a lot of the options, making the pool smaller? Is that how that works to make it a bathtub instead of an ocean?

NGUYEN: I, I wouldn’t say eliminates, but it definitely tells you where you should be … it helps focus where you’re searching.

HUIZINGA: Right, right, right. Well, David, you and I talked briefly and exchanged some email on “The Elements” song by Tom Lehrer, and it’s, it’s a guy who basically sings the entire periodic chart of the elements, really fast, to the piano. But at the end, he mentions the fact that there’s a lot that haven’t been discovered. There’s, there’s blanks in the chart. And so I wonder if, you know, this, this search for molecules, um, it just feels like is there just so much more out there to be discovered?

KWABI: I don’t know if there’s more elements to be discovered, per se, but certainly there’s ways of combining them in ways that produce …

HUIZINGA: Aaahhhhh …

KWABI: … new compounds or compounds with properties that, that we’re looking for, for example, in this project. So, um, that’s, I think, one of the things that’s really exciting about, uh, about this particular endeavor we’re, we’re, we’re engaged in. So, um, one of the ways that people have traditionally thought about finding new molecules for flow batteries is, you know, you go into the lab or you go online and order a chemical that you think is going to be promising [LAUGHS] … some people I know have done this, uh, myself included … but you, you order a chemical that you think is promising, you throw it in the flow battery, and then you figure out if it works or not, right. And if it doesn’t work, you move on to the next compound, or you, um …

NGUYEN: You tweak it!

KWABI: … if it does work, you publish it. Yeah, exactly—you tweak it, for example. Um, but one of the, one of the questions that we get to ask in this project is, well, rather than think about starting from a molecule and then deciding or figuring out whether it works, we, we actually start from the criteria that we’re looking for and then figure out if we can intelligently design, um, a molecule based on the criteria. Um, so it’s, it’s, uh, I think a more promising way of going about discovering new molecules. And, as Bichlien’s already alluded to, with organic chemistry, the possibilities are endless. We’ve seen this already in, like, the pharmaceutical industry for example, um, and lots of other industries where people think about, uh, this combinatorial problem of, how do I get the right structure, the right compound, that solves the problem of, you know, killing this virus or whatever it is. We’re hoping to do something similar for, uh, for flow batteries.

HUIZINGA: Yeah, in fact, as I mentioned at the very beginning of the show, you titled your proposal “The computational design and characterization of organic electrolytes for flow batteries,” so it’s kind of combining all of that together. David, sometimes research has surprising secondary uses. You start out looking for one thing and it turns out to be useful for something else. Talk about the dual purposes of your work, particularly how flow batteries both store energy and work as a sort of carbon capture version of the Ghostbusters’ Ecto-Containment Unit. [LAUGHS]

KWABI: Sure. Yeah, so this is where I sort of confess and say I wasn’t completely up front in the beginning when I said all we do is energy storage, but, um, another, um, application we’re very interested in is carbon capture in my group. And with regard to flow batteries, it turns out that you, you actually can take the same architecture that you use for a flow battery and actually use it to, to capture carbon, CO2 in particular. So the way this would work is, um, it turns out that some of the molecules that we’ve been talking about, some of the organic molecules, when you push an electron onto them—so you’re storing energy and you push an electron onto them—it turns out that some of these molecules also absorb hydrogen ions from water so those two processes sort of happen together. You push an electron onto the molecule, and then it picks up a hydrogen ion from water. Um, and if you remember anything about something from your chemistry classes in high school, that changes the pH of water. If you remove protons, uh, from water, that makes water more basic, or more alkaline. And alkaline electrolytes or alkaline water actually absorbs or reacts with CO2 to make bicarbonate. So that’s a chemical reaction that can serve as a mode, or a mechanism, for removing CO2 from, from the environment, so it could be air, or it could be, uh, flue gas or, you know, exhaust gas from a power plant. So imagine you, you run this process, you push the electron onto the molecule, you change the pH of the solution, you remove CO2 … that can then … you can actually concentrate that CO2 and then run the opposite reaction. So you pull the electron off the molecule; that then dumps protons back into solution, and then you can release all this pure CO2 all of a sudden. So, so now what you can do is take a flow battery that stores energy, but also, uh, use it to separate CO2, separate and concentrate CO2 from a, from a gaseous source. So this is, um, some work that we’ve been pursuing sort of in parallel with our work on energy storage, and the hope is that we can find molecules that, in principle, maybe could do both—could do the energy storage and also help with, uh, with CO2 separation.

HUIZINGA: Bichlien, is that part of the story that was attractive to Microsoft in terms of both storage for energy and getting rid of CO2 in the environment?

NGUYEN: Yeah, absolutely. Absolutely. Of course, the properties of, of, you know, both CO2 capture and the energy storage components are sometimes somewhat, uh—David, correct me if I’m wrong—kind of divergent. It’s, it’s hard to optimize for one and have the other one optimized, too. So it’s really a balance of, of things, and we’re targeting, just right now, for this project, our joint project, the, the energy storage aspect.

HUIZINGA: Yeah. On that note, and either one of you can take this, what do you do with it? I mean, when I, when I used the Ghostbusters’ Ecto-Containment Unit, I was being direct. I mean, you got to put it somewhere once you capture it, whether it’s storing it for good use or getting rid of it for bad use. So how are you managing that?

KWABI: Great question, so … Bichlien, were you going to … were you going to go?

NGUYEN: Oh, I mean, yeah, I was going to say that there are many ways, um, for I’ll call it CO2 abatement, um, once you have it. Um, there are people who are interested in storing it underground, so, uh, mineralizing it in basalt formations, rock formations. There are folks, um, like me, who are interested in, you know, developing new catalysts so that we can convert CO2 to different renewable feedstocks that can be used in different materials like different plastics, different, um, you know, essentially new fuels, things of that nature. And then there’s, you know, commercial applications for pure streams of CO2, as well. Uh, yeah, so I, I would say there’s a variety of things you can do with CO2.

HUIZINGA: What’s happening now? I mean, where does that generally … David we, I, I want to say, we talked about this issue, um, when we met before on some of the downsides of what’s current.

KWABI: Yeah, so currently, um, so there’s, as Bichlien has mentioned, there’s a number of things you could do with it. But right now, of all the sort of large projects that have been set up, large pilot plants for CO2 capture that have been set up, I think the main one is enhanced oil recovery, which is a little bit controversial, um, because what you’re doing with the CO2 there is you’re pumping it underground into an oil field that has become sort of less productive over time. And the goal there is to try to coax a little bit more oil, um, out of this field. So, so you pump the CO2 underground, it mixes in with the oil, and then you … that, that sort of comes back up to the surface, and you separate the CO2 from the oil, and you can, you can go off and, um, use the oil for whatever you use it for. So, so the economically attractive thing there is, there’s, uh, there’s, there’s going to be some sort of payoff. There’s a reason, a commercial incentive, for separating the CO2, uh, but of course the problem is you’re removing oil from the … you’re, you’re extracting more oil that’s going to end up with … in more CO2 emissions. So, um, there are, in principle, many potential options, but there aren’t very many that have both the sort of commercial … uh, where there’s sort of a commercial impact and there’s also sort of the scale to take care of the, you know, the gigatons of CO2 that we’re going to have to draw down, basically, so … .

NGUYEN: Yeah. And I, I think, I mean, you know, to David’s point, that’s true—that, that is what’s happening, you know, today because it provides value, right? The issue, I think, with CO2 capture and storage is that while there’s global utility, there’s no monetary value to it right now. Um, and so it makes it a challenge in terms of being able to industrialize, you know, industries to take care of the CO2. But I, I, I think, you know, as part of the MCRI initiative, you know, we’re very interested in both the carbon capture and the utilization aspect, um, and utilization would mean utilizing the CO2 in productive ways for long-term storage, so think about maybe using CO2, um, converting it electrochemically, for example, into, uh, different monomers. Those monomers maybe could be used in new plastics for long-term storage. Uh, maybe those are recyclable plastics. Maybe they’re plastics that are easily biodegradable. But, you know, one of the issues with using, or manufacturing, is there’s always going to be energy associated with manufacturing. And so that’s why we care a lot about renewables and, and the green energy transition. And, and that’s why, uh, you know, we’re collaborating with David and his team as, as part of that. It’s really full circle. We have to really think about it on a systems level, and the collaboration with David is, is one part of that system.

HUIZINGA: Well, that leads beautifully, and that’s probably an odd choice of words for this question, but it seems like “solving for X” in climate change is a no-lose proposition. It’s a good thing to do. But I always ask what could possibly go wrong, and in this case, I’m thinking about other solutions, some of which you’ve already mentioned, that had seemed environmentally friendly at first, but turned out to have unforeseen environmental impacts of their own. So even as you’re exploring new solutions for renewable energy sources, how are you making sure you’re not harming, or at least mitigating harm to, the environment while you’re trying to save it?

KWABI: That’s a great question. So it’s, it’s something that I think isn’t traditionally, at least in my field, isn’t traditionally sort of part of the “solve for X” when people are thinking about coming up with a new technology or new way of storing renewable electricity. So, you know, in our particular case, one of the things that’s really exciting about the project we’re working on is we’re looking at molecules that are already fairly ubiquitous, so, so they’re already being used in the food and textile industry, for example, derivatives of the molecules we’re using. So, you know, not thinking about the materials you’re using and the synthetic routes that are necessary to produce them is sort of a pitfall that one can easily get into if you don’t start thinking about this question at the very beginning, right? You might come up with a technology that’s, um, appealing and that works really well, performance-wise, but might not be very recyclable or might have some difficulties in terms of extraction and so on and so forth. So lithium-ion batteries, for example, come to mind. I think you were alluding to this earlier, that, you know, they’re a great technology for electric vehicles, but mining cobalt, extracting cobalt, comes with a lot of, um, just negative impacts in terms of child labor and so on in the Congo, et cetera. So how, how do we, you know, think about, you know, materials that don’t … that sort of avoid this? And I’ll, I’ll just highlight one of our team members … so Anne McNeil, who’s in the chemistry department here, thinks quite a lot about this, and that’s appropriate because she’s sort of the synthetic chemist on the team. She’s the one who’s thinking a lot about, you know, given we have this molecule we want to make, what’s the most eco-friendly, sustainable route to making that molecule with materials that don’t require, you know, pillaging and polluting the earth to do it in a sense, right. And also materials … and also making it in a way that, you know, at end of life, it can be potentially recycled, right.

HUIZINGA: Right.

KWABI: So thinking about sustainable routes to making these molecules and potential sort of ways of recycling them are things that, um, we’re, we’re trying, in some sense, to take into consideration. And by we, I mean Anne, specifically, is thinking quite seriously about …

NGUYEN: David … David, can I put words in your mouth?

KWABI: But, yeah. … Yeah, sure, go ahead.

NGUYEN: Um, you’re, you’re thinking of sustainability as being a first design principle for …

KWABI: Yes! I would take those words! Exactly.

NGUYEN: OK. [LAUGHS] Yeah, I mean, that’s really important. I, I agree and second what David said.

HUIZINGA: Bichlien, when we talked earlier, the term co-optimization came up, and I want to dig in here a little bit because whenever there’s a collaboration, each discipline can learn something from the other, but you can also learn about your own in the process. So what are some of the benefits you’ve experienced working across the sciences here for this project? Um, could you provide any specific insights or learnings from this project?

NGUYEN: I mean, I, I think maybe a naive … something that maybe seems naive is that we definitely have to work together in all three disciplines because what we’re also learning from David and Bryan is that there are different experimental and computational timelines that sometimes don’t agree, and sometimes do agree, and we really have to, uh, you know, work together in order to create a unified, I’m not going to call it a roadmap, but a unified research plan that works for everyone. For example, um, it takes much longer to run an experiment to synthesize a molecule … I, I think it takes much longer to synthesize a molecule than, for example, to run a, uh, flow cell, um, experiment. And then on the computational side, you could probably run it, you know, at night, on a weekend, you know, have it done relatively soon, generate molecules. And one of the things that we’re, you know, understanding is that the human feedback and the computational feedback, um, take a lot of balancing to make sure that we’re on the same track.

HUIZINGA: What do you think, David?

KWABI: Yeah, I think that’s definitely accurate, um, figuring out how we can work together in a way that sort of acknowledges these timelines is really important. And I think … I’m a big believer in the fact that people from somewhat different backgrounds working together, the diversity of background, actually helps to bring about, you know, really great innovative solutions to things. And there’s various ways that this has sort of shown up in our, in our own work, I think, and in our, in our discussions. Like, you know, we’re currently working on a particular sort of molecular structure for, uh, for a compound that we think will be promising at storing electricity, and the way we, we came up with it is that my, my group, you know, we ran a flow cell and we saw some data that seemed to suggest that the molecule was decomposing in a certain way, and then Anne’s group, or one of Anne’s students, proposed a mechanism for what might be happening. And then Jake, who works with Bichlien, also … and then thought about, “Well, what, what about this other structure?” So that sort of … and then that’s now informing some of the calculations that are going on, uh, with Bryan. So there’s really interesting synergies that show up just because there’s people working from, you know, coming from very different backgrounds. Like I’m a mechanical engineer who sort of likes to hang out with chemists and, um, there’s actual chemists and then there’s, you know …

NGUYEN: But, David, I think …

KWABI: … the people who do computation, and so on …

NGUYEN: I think you’re absolutely right here in terms of the overlap, too, right? Because in a, in a way, um, I’m an organic chemist by training, and I dabble in machine learning. You’re a mechanical engineer who dabbles in chemistry. Uh, Bryan’s a computational chemist who dabbles in flow cell work. Uh, Anne is, uh, you know, a purely synthetic chemist who dabbles in, you know, almost all of our aspects. Because we have overlap, we have lower, I’m going to call it an activation barrier, [LAUGHS] in terms of the language we speak. I think that is something that, you know, we have to speak the same language, um, so that we can understand each other. And sometimes that can be really challenging, but oftentimes, it’s, it’s not.

HUIZINGA: Yeah, David, all successful research projects begin in the mind and make their way to the market. Um, where does this research sit on that spectrum from lab to life, and how fast is it moving as far as you’re concerned?

KWABI: Do you mean the research, uh, in general or this project?

HUIZINGA: This project, specifically.

KWABI: OK, so I’d say we’re, we’re still quite early at this stage. So there’s a system of classification called Technology Readiness Level, and I would say we’re probably on the low end of the scale, I don’t know, maybe like a 1 or 2.

NGUYEN: We just started six months ago!

KWABI: We just started six months ago! So …

[LAUGHTER]

HUIZINGA: OK, that’s early. Wait, how many levels are there? If there’s 1 or 2, what’s the high end?

KWABI: I think we go up to an 8 or so, an 8 or a 9. Um, so, so we’re quite early; we just started. But the, the nice thing about this field is that things can move really quickly. So in a year or two, who knows where we’ll be? Maybe a 4 or a 5, but things are still early. There’s a lot of fundamental research right now that’s happening …

HUIZINGA: Which is so cool.

KWABI: Proof of concept. Which is necessary, I think, before you can get to the, the point where you’re, um, you’re spinning out a company or, or moving up to larger scales.

HUIZINGA: Right. Which lives very comfortably in the academic world. Bichlien, Microsoft Research is sort of a third space where they allow for some horizon on that scale in terms of how long it’s going to take this to be something that could be financially viable for Microsoft. Is that just not a factor right now? It’s just like, let’s go, let’s solve this problem because this is super-important?

NGUYEN: I guess I’ll say that it takes roughly 20 years or so to get a proof of concept into market at an industrial scale. So, I’m … what I’m hoping that with this collaboration, and with others, is that we can shorten the time for discovery so that we understand the fundamentals and we have a good baseline of what we think can be achieved so that we can go to, for example, a pilot scale, like a test scale, outside of the laboratory, not full industrial scale, but just a pilot scale much faster than we would if we had to hand iterate every single molecule.

HUIZINGA: So the generative models play a huge role in that shortening of the time frame …

NGUYEN: Yes, yes. That’s what we …

KWABI: Yeah, I think …

NGUYEN: Go ahead, David.

KWABI: Yeah. I think the idea of having a platform … so, so rather than, you know, you found this wonderful, precious molecule that you’re going to make a lot of, um … you know, having a platform that can generate molecules, right, I think is, you know, proving that this actually works gives you a lot more shots on goal, basically. And I think that, you know, if we’re able to show that, in the next year or two, that there’s, there’s a proof of concept that this can go forward, then um, then, in principle, we have many more chemistries to work with and play with, than the …

NGUYEN: Yeah, and, um, we might also be able to, you know, with, with this platform, discover molecules that have that dual purpose, right, of both energy storage and carbon capture.

HUIZINGA: Well, as we wrap up, I’d love to know in your fantastical ideal preferred future, what does your work look like … now, I’m going to say five to 10 years, but, Bichlien, you just said 20 years, [LAUGHS] so maybe I’m on the short end of it here. In the “future,” um, how have you changed the landscape of eco-friendly, cost-effective energy solutions?

KWABI: That’s a, that’s a big question. I, I tend to think in more two–, three–year timelines sometimes. [LAUGHS] But I think in, in, in, you know, in like five, 10 years, if this research leads to a company that’s sort of thriving and demonstrating that flow batteries can really make an impact in terms of low-cost energy storage, that would have been a great place to land. I mean that and the demonstration that you, you know, with artificial intelligence, you can create this platform that can, uh, custom design molecules that fulfill these criteria. I think that would be, um, that would be a fantastic outcome.

HUIZINGA: Bichlien, what about you?

NGUYEN: So I think in one to two years, but I also think about the 10-to-20-year timeline, and what I’m hoping for is, again, to demonstrate the value of AI in order to enable a carbon negative economy so that we can all benefit from it. It sounds very … a polished answer, but I, I really think there are going to be accelerations in this space that’s enabled by these new technologies that are coming out.

HUIZINGA: Hmm.

NGUYEN: And I hope so! We have to save the planet!

KWABI: There’s a lot more to AI than ChatGPT and, [LAUGHS] you know, language models and so on, I think …

HUIZINGA: That’s a perfect way to close the show. So … Bichlien Nguyen and David Kwabi, thank you so much for coming on. It’s been delightful—and informative!

NGUYEN: Thanks, Gretchen.

KWABI: Thank you very much.

The post Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi appeared first on Microsoft Research.

Read More

Optimized PyTorch 2.0 Inference with AWS Graviton processors

Optimized PyTorch 2.0 Inference with AWS Graviton processors

New generations of CPUs offer significant performance improvement in machine learning (ML) inference due to specialized built-in instructions. Combined with their flexibility, high speed of development, and low operating cost, these general-purpose processors offer an alternative ML inference solution to other existing hardware solutions.

AWS, Arm, Meta, and others helped optimize the performance of PyTorch 2.0 inference for Arm-based processors. As a result, we are delighted to announce that Arm-based AWS Graviton instance inference performance for PyTorch 2.0 is up to 3.5 times the speed for ResNet-50 compared to the previous PyTorch release, and up to 1.4 times the speed for BERT, making Graviton-based instances the fastest compute-optimized instances on AWS for these models (see the following graph).

Image 1: Relative speed improvement achieved by upgrading from PyTorch version 1.13 to 2.0 (higher is better). The performance is measured on c7g.4xlarge instances.

As shown in the next graph, we measured up to 50% cost savings for PyTorch inference with Graviton3-based c7g instances across Torch Hub ResNet-50 and multiple Hugging Face models compared to comparable x86-based compute-optimized Amazon EC2 instances. For that graph, we first measured the cost per million inferences for the five instance types. Then, we normalized the cost-per-million-inferences results to a c5.4xlarge instance, which is the baseline measure of “1” on the Y-axis of the chart.

Image 2: Relative cost of PyTorch inference running on different AWS instances (lower is better).
Source: AWS ML Blog on Graviton PyTorch2.0 inference performance.
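As a rough illustration of that normalization (the prices and throughputs below are made-up placeholders, not measurements from this post), the calculation amounts to converting an hourly instance price and a measured throughput into a cost per million inferences and dividing by the c5.4xlarge baseline:

def cost_per_million_inferences(price_per_hour, inferences_per_second):
    # Hours needed to serve one million inferences, times the hourly price
    hours_needed = 1e6 / (inferences_per_second * 3600.0)
    return price_per_hour * hours_needed

# Placeholder inputs purely for illustration (not values from this post)
baseline = cost_per_million_inferences(price_per_hour=1.00, inferences_per_second=100.0)  # c5.4xlarge
c7g = cost_per_million_inferences(price_per_hour=0.85, inferences_per_second=180.0)       # c7g.4xlarge

print(f"normalized c5.4xlarge cost: {baseline / baseline:.2f}")  # 1.00 by definition
print(f"normalized c7g.4xlarge cost: {c7g / baseline:.2f}")      # below 1.0 means cheaper than baseline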

Similar to the preceding inference cost comparison graph, the following graph shows the model p90 latency for the same five instance types. We normalized the latency results to the c5.4xlarge instance, which is the baseline measure of “1” on the Y-axis of the chart. The c7g.4xlarge (AWS Graviton3) model inference latency is up to 50% better than the latencies measured on c5.4xlarge, c6i.4xlarge, and c6a.4xlarge.

Image 3: Relative latency (p90) of PyTorch inference running on different AWS instances (lower is better).
Source: AWS ML Blog on Graviton PyTorch2.0 inference performance.

Optimization details

PyTorch supports Compute Library for the Arm® Architecture (ACL) GEMM kernels via the oneDNN backend (previously called “MKL-DNN”) for AArch64 platforms. The optimizations are primarily for PyTorch ATen CPU BLAS, ACL kernels for fp32 and bfloat16, and oneDNN primitive caching. There are no frontend API changes, so no changes are required at the application level to get these optimizations working on Graviton3-based instances.
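Because the optimizations sit below the frontend API, a quick sanity check (a minimal sketch, assuming the PyTorch 2.0 wheel from the install steps below) is simply to confirm that the build exposes the oneDNN (MKL-DNN) backend and to inspect the build configuration:

import torch

print(torch.__version__)                     # expect a 2.0.x build
print(torch.backends.mkldnn.is_available())  # True when the oneDNN backend is compiled in
print(torch.__config__.show())               # full build configuration string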

PyTorch level optimizations

We extended the ATen CPU BLAS interface to accelerate more operators and tensor configurations via the oneDNN backend for the aarch64 platform. The following diagram highlights (in orange) the optimized components that improved the PyTorch inference performance on the aarch64 platform.

Image 4: PyTorch software stack highlighting (in orange) the components optimized for inference performance improvement on AArch64 platform

ACL kernels and BFloat16 FPmath mode

The ACL library provides Neon- and SVE-optimized GEMM kernels for both fp32 and bfloat16 formats. These kernels improve the SIMD hardware utilization and reduce the end-to-end inference latencies. The bfloat16 support in Graviton3 allows efficient deployment of models trained using bfloat16, fp32 and Automatic Mixed Precision (AMP). Standard fp32 models use the bfloat16 kernels via the oneDNN FPmath mode, without model quantization. This provides up to two times faster performance compared to fp32 model inference without bfloat16 FPmath support. For more details on ACL GEMM kernel support, refer to the Arm Compute Library GitHub repository.
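As a minimal sketch of the two paths described above (the model and input are illustrative, and the FPmath-mode path assumes the DNNL_DEFAULT_FPMATH_MODE=BF16 environment variable from the install steps below is set):

import torch
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
x = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    # Path 1: unmodified fp32 inference; with DNNL_DEFAULT_FPMATH_MODE=BF16 exported,
    # oneDNN transparently dispatches the fp32 GEMMs to bfloat16 ACL kernels.
    out_fp32 = model(x)

    # Path 2: explicit mixed precision on CPU via autocast, e.g., for models
    # trained or deployed with AMP.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        out_bf16 = model(x)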

Primitive Caching

The following call sequence diagram shows how ACL operators are integrated into the oneDNN backend. As shown in the diagram, ACL objects are handled as oneDNN resources instead of as primitive objects. This is because the ACL objects are stateful and mutable. Since the ACL objects are handled as resource objects, they are not cacheable with the default primitive caching feature supported in oneDNN. We implemented primitive caching at the ideep operator level for the “convolution”, “matmul” and “inner product” operators to avoid redundant GEMM kernel initialization and tensor allocation overhead.

Image 5: Call sequence diagram showing how the Compute Library for the Arm® Architecture (ACL) GEMM kernels are integrated into oneDNN backend
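A rough way to see why this matters (a sketch of our own, not code from the original work): with primitive caching, only the first call for a given operator and tensor shape pays the kernel initialization and allocation cost, which is also why the benchmarks below run warm-up iterations before measuring.

import time
import torch

conv = torch.nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3).eval()
x = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    for i in range(5):
        start = time.perf_counter()
        conv(x)
        print(f"iteration {i}: {(time.perf_counter() - start) * 1e3:.2f} ms")
# The first iteration is typically much slower than the later ones, which reuse
# the cached kernels and tensor allocations.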

How to take advantage of the optimizations

Install the PyTorch 2.0 wheel from the official repo and set environment variables to enable the additional optimizations.

# Install Python
sudo apt-get update
sudo apt-get install -y python3 python3-pip

# Upgrade pip3 to the latest version
python3 -m pip install --upgrade pip

# Install PyTorch and extensions
python3 -m pip install torch
python3 -m pip install torchvision torchaudio torchtext

# Turn on Graviton3 optimization
export DNNL_DEFAULT_FPMATH_MODE=BF16
export LRU_CACHE_CAPACITY=1024

Running an inference

You can use PyTorch torchbench to measure the CPU inference performance improvements, or to compare different instance types.

# Pre-requisite:
# pip install the PyTorch 2.0 wheels and set the above-mentioned environment variables

# Clone PyTorch benchmark repo
git clone https://github.com/pytorch/benchmark.git

# Setup ResNet-50 benchmark
cd benchmark
python3 install.py resnet50

# Install the dependent wheels
python3 -m pip install numba

# Run ResNet-50 inference in jit mode. On successful completion of the inference runs,
# the script prints the inference latency and accuracy results
python3 run.py resnet50 -d cpu -m jit -t eval --use_cosine_similarity

Performance Analysis

Now, we will analyze the inference performance of ResNet-50 on a Graviton3-based c7g instance using the PyTorch profiler. We run the code below with PyTorch 1.13 and PyTorch 2.0, and run the inference for a few iterations as a warm-up before measuring the performance.

# Turn on the Graviton3 optimizations by exporting the same environment variables
# as in the install step in your shell before launching Python:
#   export DNNL_DEFAULT_FPMATH_MODE=BF16
#   export LRU_CACHE_CAPACITY=1024

import torch
from torchvision import models
sample_input = [torch.rand(1, 3, 224, 224)]
eager_model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model = torch.jit.script(eager_model, example_inputs=[sample_input, ])

model = model.eval()
model = torch.jit.optimize_for_inference(model)

with torch.no_grad():
    # warmup runs
    for i in range(10):
        model(*sample_input)
    prof = torch.profiler.profile(
      on_trace_ready=torch.profiler.tensorboard_trace_handler('./logs'), record_shapes=True, with_stack=True)
    # profile after warmup
    prof.start()
    model(*sample_input)
    prof.stop()

We use TensorBoard to view the profiler results and analyze model performance.

Install the PyTorch Profiler TensorBoard plugin as follows:

pip install torch_tb_profiler

Launch TensorBoard using:

tensorboard --logdir=./logs

Open the following URL in the browser to view the profiler output. The profiler supports ‘Overview’, ‘Operator’, ‘Trace’ and ‘Module’ views to get insight into the inference execution.

http://localhost:6006/#pytorch_profiler

The following diagram is the profiler ‘Trace’ view which shows the call stack along with the execution time of each function. In the profiler, we selected the forward() function to get the overall inference time. As shown in the diagram, the inference time for the ResNet-50 model on Graviton3-based c7g instance is around 3 times faster in PyTorch 2.0 compared to PyTorch 1.13.

Image 6: Profiler Trace view: Forward pass wall duration on PyTorch 1.13 and PyTorch 2.0

The next diagram is the ‘Operator’ view which shows the list of PyTorch operators and their execution time. Similar to the preceding Trace view, the Operator view shows that the operator host duration for the ResNet-50 model on Graviton3-based c7g instance is around 3 times faster in PyTorch 2.0 compared to PyTorch 1.13.

Image 7: Profiler Operator view: Forward operator Host duration on PyTorch 1.13 and PyTorch 2.0

Benchmarking Hugging Face models

You can use the Amazon SageMaker Inference Recommender utility to automate performance benchmarking across different instances. With Inference Recommender, you can find the real-time inference endpoint that delivers the best performance at the lowest cost for a given ML model. We collected the preceding data using the Inference Recommender notebooks by deploying the models on production endpoints. For more details on Inference Recommender, refer to the amazon-sagemaker-examples GitHub repo. We benchmarked the following models for this post: ResNet50 image classification, DistilBERT sentiment analysis, RoBERTa fill mask, and RoBERTa sentiment analysis.
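Outside of SageMaker, a much simpler local spot check is possible (a sketch under our own assumptions, not the Inference Recommender methodology): for example, timing the DistilBERT sentiment-analysis pipeline from Hugging Face directly on an instance (requires pip install transformers).

import time
from transformers import pipeline

# Default DistilBERT SST-2 checkpoint for the sentiment-analysis task, run on CPU
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=-1,
)

text = "Benchmarking this model on a Graviton3-based instance."

# Warm up so that cached kernels are in place, then measure average latency
for _ in range(5):
    classifier(text)

n = 100
start = time.perf_counter()
for _ in range(n):
    classifier(text)
print(f"average latency: {(time.perf_counter() - start) / n * 1e3:.2f} ms")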

Conclusion

For PyTorch 2.0, the Graviton3-based c7g instance is the most cost-effective compute-optimized Amazon EC2 instance for inference. These instances are available on SageMaker and Amazon EC2. The AWS Graviton Technical Guide provides the list of optimized libraries and best practices that will help you achieve cost benefits with Graviton instances across different workloads.

If you find use cases where similar performance gains are not observed on Graviton, please open an issue on the aws-graviton-getting-started GitHub repo to let us know about it. We will continue to add more performance improvements to make AWS Graviton-based instances the most cost-effective and efficient general-purpose platform for inference using PyTorch.

Acknowledgments

We would like to thank Ali Saidi (Sr. Principal Engineer) and Csaba Csoma (Sr. Manager, Software Development) from AWS, Ashok Bhat (Sr. Product Manager), Nathan Sircombe (Sr. Engineering Manager) and Milos Puzovic (Principal Software Engineer) from Arm for their support during the Graviton PyTorch inference optimization work. We would also like to thank Geeta Chauhan (Engineering Leader, Applied AI) from Meta for her guidance on this blog.

About the authors

Sunita Nadampalli is an ML Engineer and Software Development Manager at AWS.

Ankith Gunapal is an AI Partner Engineer at Meta (PyTorch).

Read More