Microsoft AI – Page 23

Research Focus: Week of September 11, 2023

September 13, 2023

by Alyssa Hughes Microsoft AI

Microsoft Research Focus 24 | Week of September 11, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

PolySem: Efficient Polyglot Analytics on Semantic Data

Data scientists and data engineers spend a large portion of their time trying to understand, clean and transform their data before they can even start performing meaningful analysis. Most database vendors provide business intelligence (BI) tools as an efficient and user-friendly platform for customers to perform data cleaning, preparation and linking tasks to obtain actionable semantic data. However, customers are increasingly interested in querying semantic data through various modalities including SQL, imperative programming languages such as Python, and natural language queries. Today, customers are limited to using either the visual interfaces provided by these tools or languages that are specific to the particular tool.

In a new paper: PolySem: Efficient Polyglot Analytics on Semantic Data, researchers from Microsoft propose techniques to enable the execution of user queries expressed in different modalities on semantic datasets without having to export data out of the BI system. Their techniques include automatic translation of user queries into a language-agnostic representation of data processing operations, and subsequently into the specific query language that is amenable to execution on the BI engine. Evaluation results on BI and decision support benchmarks suggest significant improvements in query performance compared to other popular data processing engines.

Read the paper

NEW RESOURCE

Generative retrieval for conversational question answering

The growth of conversational agents, including voice assistants and chatbots, has led to a shift towards dialogue-based interfaces for information-seeking activities. This has spurred the development of conversational question answering (QA) systems. Effective passage retrieval, which excludes irrelevant data from scanned documents, is crucial but challenging for such systems due to the ambiguity of questions. Current methods rely on the dual-encoder architecture to embed contextualized vectors of questions in conversations. However, this architecture is limited in the embedding bottleneck and the dot-product operation.

To alleviate these limitations, researchers from Microsoft propose generative retrieval for conversational QA (GCoQA). GCoQA assigns distinctive identifiers for passages and retrieves passages by generating their identifiers token-by-token via the encoder–decoder architecture. In this generative way, GCoQA eliminates the need for a vector-style index and could attend to crucial tokens of the conversation context at every decoding step. Experiments on three public datasets containing about twenty million passages show GCoQA achieves relative improvements of +13.6% in passage retrieval and +42.9% in document retrieval. GCoQA also reduces memory usage and improves inference speed.

Read the paper

View code & data

NEW RESOURCE

BatteryML: An open-source tool for machine learning on battery degradation

In recent years, lithium-ion batteries have become the cornerstone of energy storage solutions, owing to their high energy density, long cycle life, and relatively low self-discharge. They have found widespread applications across various industries, including electric vehicles, consumer electronics, and renewable energy systems. Despite these advantages, lithium-ion batteries face challenges related to capacity degradation and performance optimization, which have become critical areas of focus in battery research.

Capacity degradation is a complex process influenced by various factors such as temperature, charge-discharge rate, and state of charge. Understanding and mitigating these factors is crucial for enhancing the performance and longevity of lithium-ion batteries. This has led to the development of advanced battery management systems and the application of machine learning techniques to improve prediction accuracy and optimize battery performance.

To address these challenges, researchers from Microsoft have released BatteryML (opens in new tab), a comprehensive open-source tool designed specifically for machine learning researchers, battery scientists, and materials researchers with an interest in battery performance prediction and analysis. BatteryML aims to address the challenges of capacity degradation by leveraging machine learning methods to improve various aspects of battery performance, such as capacity fade modeling, state of health prediction, and state of charge estimation.

Download code

The post Research Focus: Week of September 11, 2023 appeared first on Microsoft Research.

Abstracts: September 13, 2023

September 13, 2023

by Brenda Potts Microsoft AI

Episode 148 | September 13, 2023

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In the inaugural episode of the series, Dr. Ava Amini and Dr. Kevin K. Yang, both Senior Researchers with Microsoft Health Futures, join host Dr. Gretchen Huizinga to discuss “Protein generation with evolutionary diffusion: Sequence is all you need.” The paper introduces EvoDiff, a suite of models that leverages evolutionary-scale protein data to help design novel proteins more efficiently. Improved protein engineering has the potential to help create new vaccines to prevent disease and new ways to recycle plastics.

View the preprint paper

Transcript

[MUSIC PLAYS]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract!—of their new and noteworthy papers.

[MUSIC FADES]

Today, I’m talking to Dr. Ava Amini and Dr. Kevin Yang, both senior researchers at Microsoft Health Futures. Ava and Kevin are co-authors of a paper titled “Protein generation with evolutionary diffusion: Sequence is all you need,” and a preprint of the paper is available now on bioRxiv. Ava and Kevin, thanks for joining us on Abstracts!

KEVIN YANG: Thanks for having us.

AVA AMINI: Thank you so much.

HUIZINGA: So, Kevin, in just a couple sentences, tell us what problem this research addresses and why people should care.

YANG: Yeah, so proteins are this really big, important family of biomolecules, and they’re responsible for a lot of cellular processes. For example, hemoglobin carries oxygen in your blood, and insulin regulates your blood sugar levels. And people are interested in generating new proteins to do things that people care about—not necessarily in our bodies, but we’re interested in proteins as industrial enzymes so for catalysis and to make new chemicals or for therapeutics to make new drugs. And as a step towards this goal, we train a suite of models that we call EvoDiff that learns to generate realistic but novel proteins. So proteins do a lot of useful things in nature, but we can really expand their repertoire to do things that people care about but that nature may not really care about. One really good historical example of this is that most of our modern laundry detergents contain enzymes that break down things that stain your clothes. And these enzymes were based on natural proteins, but natural proteins don’t work under high heat. They don’t work in detergent. So somebody engineered those to work in the conditions of our washing machine. And they work really well nowadays. Looking forward, we look at some of the challenges facing our world, such as sustainability. So some really big things people are working on now are things like enzymes that can break down plastic and help us recycle plastic or enzymes that can perform photosynthesis more efficiently. And then on the other side, there’s therapeutics, and an obvious example there is vaccine design. So designing vaccines quickly and safely for new diseases as they emerge.

HUIZINGA: Ava, how does your approach build on or differ from what’s been done previously in this field?

AMINI: Yeah, so we call our approach EvoDiff, and EvoDiff has two components. The first, Evo, refers to evolutionary, and the second, Diff, refers to this notion of diffusion. And the two things that make our approach cool and powerful is the fact that we are leveraging data about proteins that is at an evolutionary scale in terms of the size and the diversity of the datasets about natural proteins that we use. And specifically, we use that data to build a type of AI model that is called a diffusion model. Now, for a little backstory on this, a few years ago, we in the AI community learned that we can do really well in generating brand-new images by taking natural images, adding small amounts of noise to them, corrupting them, and then training an AI model called a diffusion model to remove that noise. And so what we’ve done in this paper is that we have constructed and trained these diffusion models to do the same kind of process on protein data at evolutionary scale.

HUIZINGA: Kevin, back to you, let’s go a little deeper on methodology. How did you do this?

YANG: Yeah, so we really wanted to do this in a protein sequence space. So in protein biology, you have sequences of amino acids. So that’s a series of amino acid monomers that form a chain, and then that chain folds oftentimes into a 3D structure. And function is usually mediated by that 3D structure. Unfortunately, it’s difficult and can be slow and expensive to obtain experimental structures for all these proteins. And so previous diffusion models of proteins have really focused on generating a three-dimensional structure. And then you can use some other method to find a sequence that will fold to that structure. But what we really wanted to do was generate proteins directly as sequences because it’s much easier to get sequences than it is to get structure. So there’s many, many more sequences out there than there are structures. And we know that deep learning methods scale really well as you increase the size and quality of the datasets they’re trained on. And so we … and by we, it’s me and Ava but also Nitya Thakkar, who was an undergraduate intern last summer with me and Ava, and then Sarah Alamdari, our data scientist, who also did a lot of the hands-on programming for this. And then we also got a lot of help from Rianne van den Berg, who is at AI4Science, and then Alex Lu and Nicolò Fusi, also here in New England. So we went and got these large, diverse, evolutionary datasets of protein sequences, and then we used a deep learning framework called PyTorch to train these diffusion models. And then we do a lot of computational experiments to see whether they do the things we want them to do, which Ava, I think, will talk about next.

HUIZINGA: Right. Right. So, Ava, yes, what were your major findings?

AMINI: Yeah, the first question we really asked was, can our method, EvoDiff, generate proteins that are new, that are realistic, and that are diverse, meaning they’re not similar to proteins that exist in nature but still are realistic? And so what we found was that indeed, we can do this, and we can do this really well. In fact, the generated proteins from our method show a better coverage of the whole landscape of structural features, functional features, and features in sequence space that exist amongst proteins in nature. And so that was our first really exciting result, that we could generate proteins that were really of high quality using our method. The second thing we asked was, OK, now if we give some context to the model, a little bit of information, can we guide the generation to fulfill particular properties that we want to see in that protein? And so specifically here, we experimented with two types of experiments where first, we can give a part of the protein to the model, let’s say, a part of the protein that binds to another protein. And we hold that part constant and ask the model to generate the sequence around that. And we see that we can do really well on this task, as well. And why that’s important is because it means we can now design new proteins that meet some criteria that we, the users, want the protein to have. For example, the ability to bind to something else. And finally, the last really exciting result was … one point that we’ve talked about is why we want to do this generation in sequence space rather than structure—because structure is difficult, it’s expensive, and there are particular types of proteins that don’t actually end up folding into a final 3D structure. They’re what we call disordered. And these types of disordered proteins have really, really important roles in biology and in disease. And so what we show is that because we do our generation and design in protein sequence space, we can actually generate these types of disordered proteins that are completely inaccessible to methods that rely on using information about the protein’s 3D shape.

HUIZINGA: So, Kevin, building on Ava’s description there of the structure and sequence space, how is your work significant in terms of real-world impact?

YANG: Right, so there’s a lot of interest in designing or generating new proteins that do useful things as therapeutics or as industrial catalysts and for a lot of other things, as well. And what our work really does is it gives us a method that can reliably generate high-quality proteins directly in sequence space. And this is good because now we can leverage evolutionary-scale data to do this on any downstream protein engineering problem without relying on a structure-based design or structure-based data. And we’re hoping that this opens up a lot of possibilities for protein engineering, protein design, and we’re really excited about some new experimental work that we—and we hope others—will use to build on this method.

HUIZINGA: Are you guys the first to move into the evolutionary scale in this? Is that a differentiator for your work?

YANG: So there have been a few other preprints or papers that talk about applying diffusion to protein sequences. The difference here is that, yes, like I said, we’re the first ones to do this at evolutionary scale. So people will also train these models on small sets of related protein sequences. For example, you might go look for an enzyme family and find all the sequences in nature of that family and train a model to generate new examples of that enzyme. But what we’re doing is we’re looking at data that’s from all different species and all different functional classes of proteins and giving us a model that is hopefully universal or as close to universal as we can get for protein sequence space.

HUIZINGA: Wow. Ava, if there was one thing you want listeners to take away from this work, what would it be?

AMINI: If there’s one thing to take away, I think it would be this idea that we can and should do protein generation over sequence because of the generality we’re able to achieve, the scale that we’re able to achieve, and the modularity and that our diffusion framework gives us the ability to do that and also to control how we design these proteins to meet specific functional goals.

HUIZINGA: So, Kevin, to kind of wrap it up, I wonder if you could address what unanswered questions still remain, or unsolved problems in this area, and what’s next on your research agenda.

YANG: So there’s kind of two directions we want to see here. One is, we want to test better ideas for conditioner models. And what I mean there is we want to feed in text or a desired chemical reaction or some other function directly and have it generate those things that will then go work in the lab. And that’s a really big step up from just generating sequences that work and are novel. And two is, in biology and in protein engineering, models are really good, but what really matters is, do things work in the lab? So we are actually looking to do some of our own experiments to see if the proteins we generate from EvoDiff work as desired in the lab.

[MUSIC PLAYS]

HUIZINGA: Ava Amini and Kevin Yang, thanks so much for joining us today, and to our listeners, thanks for tuning in. If you’re interested in learning more about the paper, you can find a link at aka.ms/abstracts or you can find a preprint of the paper on bioRxiv. See you next time on Abstracts!

The post Abstracts: September 13, 2023 appeared first on Microsoft Research.

FP2: Fully In-Place Functional Programming provides memory reuse for pure functional programs

September 12, 2023

by Brenda Potts Microsoft AI

This research paper was presented at the 28^th ACM SIGPLAN International Conference on Functional Programming (opens in new tab) (ICFP), a premier forum for discussing design, implementations, principles, and uses of functional programming.

FP2: Fully In-Place Functional Programming; ICFP 2023

Functional programming languages offer a host of advantages, such as ensuring memory safety (opens in new tab) and eliminating arbitrary side effects. This enables systematic analysis and compositional program construction, facilitating development of scalable and complex software systems. However, a drawback of functional programming is its tendency to liberally allocate new memory. We believe this characteristic has impeded widespread adoption in performance-critical domains. How can we overcome this limitation and harness the benefits of functional programming while maintaining efficient memory usage?

To illustrate the issue, let’s examine the well-known functional program to reverse a list in linear time using an accumulating parameter:

FP2: Fully In-Place Functional Programming - reverse list code in Koka

The reversal function is written in Koka (opens in new tab), a functional language developed at Microsoft that implements the techniques described in this blog post. Here, a list is either empty (as Nil) or non-empty as a Cons(head,tail) node, and contains the first element as the head and the rest of the list as the tail.

In most functional languages, reversing a list this way allocates a fresh result list in the heap, where a list of integers from 1 to 10 is reversed, as shown in Figure 1.

FP2: Fully In-Place Functional Programming; Fig 1- This illustration shows two single-linked lists. The first single-linked list contains the numbers 6 to 10 and is pointed to by — Figure 1: The list [1..5] has already been reversed into acc, but we still must reverse the list [6..10].

As the list xs is non-empty, we add its first element to our accumulating acc parameter before recursing on the rest of the list xx. As shown in Figure 2, this step allocates a new Cons cell but also leaves the Cons cell of xs to be garbage collected. This is rather wasteful.

FP2: Fully In-Place Functional Programming; Fig 3- This illustration depicts two single-linked lists. The first single-linked list contains the numbers 7 to 10 and is pointed to by — Figure 2: The lists after one step of recursion. The top `Cons` cell on the left has become garbage, while the top `Cons` cell on the right is freshly allocated.

Fully in-place functional programming avoids allocation

Recent developments have made it possible to avoid such allocations. In particular, by using a compiler-guided reference counting algorithm called Perceus, we can reuse objects in place whenever the objects are uniquely referenced at runtime. With such reuse, the reverse function can reverse a unique input list xs in-place without allocating any fresh Cons nodes, essentially switching the tail pointers of xs in-place. However, the dynamic nature of this form of reuse makes it hard to predict its application at runtime.

In our paper, “FP²: Fully in-Place Functional Programming (opens in new tab),” which we’re presenting at ICFP 2023 (opens in new tab), we describe the new fip keyword. It statically checks that programs like the accumulating reverse function can execute in-place, that is, using constant stack space without needing any heap allocation as long as the arguments are unique.

Tree traversals and zippers

In fact, many familiar functions and algorithms satisfy our fully in-place criteria. For example, consider a binary tree with all the values at the leaves:

FP2: Fully In-Place Functional Programming - binary tree code in Koka

Now, suppose that we want to navigate through this tree, moving up and down in search of a particular element. You might add parent pointers, but in a functional language, there is an alternative solution originally proposed by Gérard Huet known as the zipper (opens in new tab):

FP2: Fully In-Place Functional Programming - Zipper code in Koka

Download

Koka

The zipper stores subtrees along the path from the current node up to the root node. We can define operations on pairs consisting of this type of zipper and the current tree, enabling seamless movement through the tree. For example, the following function uses the zipper to move the focus to the left subtree:

FP2: Fully In-Place Functional Programming - focus on left subtree code in Koka

Here, we move to the left subtree of the current node (if it exists) and extend the zipper data type accordingly. In his 1997, Huet already observed that such zipper operations could be implemented in place:

Efficient destructive algorithms on binary trees may be programmed with these completely applicative primitives, which all use constant time, since they all reduce to local pointer manipulation.

In Koka, we can now make Huet’s intuition precise, where the fip keyword guarantees that left is in place. On closer examination, this might be surprising. While the list reversal example reused a Cons node, here it seems like we may need to garbage collect a Bin constructor and allocate a new BinL constructor. Nonetheless, because both constructors have two fields, the previous Bin memory location can still be reused (only updating the constructor tag). Our paper provides the analysis details that enable this, rooted in the concept of “reuse credits.”

Now, suppose we want to update all the values stored in a tree. Using a zipper, we can do this fully in place. While traversing, the zipper stores input tree fragments in order, using BinL for unvisited and BinR for visited subtrees. Reusing the zipper nodes allows in-order tree mapping without heap or stack usage. The tree map function starts by descending to the leftmost leaf, accumulating unvisited subtrees in BinL. Once we hit the leftmost leaf, we apply the argument function f and work our way back up, recursively processing any unvisited subtrees, as shown in Figure 3.

FP2: Fully In-Place Functional Programming - unvisited subtrees code in Koka

The mutually tail-recursive app and down functions are fully in place. Each matched Bin pairs with BinL, and each BinL with BinR, ultimately leading to BinR pairing with Bin. The definition of tmap may seem somewhat complex, but it is much simpler than its iterative imperative counterpart that uses direct pointer reversal.

FP2: Fully In-Place Functional Programming; Fig 3- An illustration of a binary search tree, where the search path has been pointer-reversed. There are five nodes in total: three leaf nodes and two internal nodes. The first leaf node is the left child of the root and has already been visited. The root node is marked as — Figure 3: The program after visiting the leaf containing f(2) on the given tree. The pointers in the zipper are reversed.

Perspectives and further reading

Koka’s new fip keyword ensures that certain functions do not allocate and only use constant stack space, offering efficient and secure code execution akin to static linear types or Rust’s borrow checker. This introduces a new paradigm for writing programs that are purely functional but can still execute in place. We consider this new technique to be a significant milestone on the path toward using high-level functional programming to develop robust software that delivers both competitive and predictable performance.

To learn about fully in-place functional programming and the Koka language, start at the Koka homepage (opens in new tab). Koka implements a variety of innovative language features, including algebraic effect handlers and first-class constructor contexts. We encourage readers to continue exploring and experimenting with fully in-place programming. For example, try implementing skew binary heaps (opens in new tab) in Koka. Can you demonstrate fully in-place heap union?

The post FP2: Fully In-Place Functional Programming provides memory reuse for pure functional programs appeared first on Microsoft Research.

Understanding social biases through the text-to-image generation lens

September 8, 2023

by Alyssa Hughes Microsoft AI

This research paper was presented at the Sixth AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) (opens in new tab), a premier forum for discussion on the societal and ethical aspects of artificial intelligence.

The rise of text-to-image (T2I) generation has ushered in a new era of innovation, offering a broad spectrum of possibilities for creators, designers, and the everyday users of productivity software. This technology can transform descriptive text into remarkably realistic visual content, empowering users to enrich their work with vivid illustrative elements. However, beneath this innovation lies a notable concern—the potential inclusion of harmful societal biases.

These T2I models create images from the extensive web data on which they had been trained, and this data often lacks representation of different demographic groups and cultures and can even harbor harmful content. When these societal biases seep into AI-generated content, they perpetuate and amplify pre-existing societal problems, reinforcing them and creating a disconcerting cycle that undermines previous and current mitigation efforts.

Representation of gender, race, and age across occupations and personality traits

To tackle this problem, it is essential to rigorously evaluate these models across a variety of demographic factors and scenarios. In our paper, “Social Biases through the Text-to-Image Generation Lens (opens in new tab),” presented at AIES 2023 (opens in new tab), we conduct a thorough analysis for studying and quantifying common societal biases reflected in generated images. We focus on the portrayal of occupations, personality traits, and everyday situations across representations of gender, age, race, and geographical location.

For example, consider images that reinforce societal biases for the roles of CEO and housekeeper. These professions have been extensively studied as examples of stereotypical gender biases—where predominantly men are CEOs and women are housekeepers. For all such cases, we observed three different perspectives:

Real-world distribution: Relies on labor statistics, presenting distribution across various dimensions, such as gender, race, and age.

Search engine results: Captures the distribution evident in search engine outcomes, reflecting contemporary portrayals.

Image generation results: Emphasizes the distribution observed in image generation outputs.

We tested two T2I generators, DALLE-v2 and Stable Diffusion and compared them with 2022 data from the U.S. Bureau of Labor Statistics and results for a Google image search conducted in 2020, examining how women are represented across five different occupations. Notably, the analysis of generation models revealed a significant setback in representational fairness compared with data sourced from the U.S. Bureau of Labor Statistics (BLS) and a web image search (GIS) based on geographically referenced information. Notably, images generated by DALLE-v2 provide minimal representation of women in the professions of CEO and computer programmer. Conversely, in images generated by Stable Diffusion, women are consistently represented in the roles of nurses and housekeepers 100% of the time. Figure 1 illustrates our findings, and Figure 2 shows examples of images generated to show different occupations.

A chart showing gender representation in percentage for DALLE-v2, Stable Diffusion, Google Image Search 2020, and BLS data. — Figure 1. Gender representation for DALLE-v2, Stable Diffusion, Google Image Search 2020, and BLS data.

Examples of generations for the professions of “computer programmer” and “housekeeper” using the DALL-E v2 and Stable Diffusion models. — Figure 2. A sample of the first four images generated for the professions of “computer programmer” and “housekeeper” using the DALL-E v2 and Stable Diffusion models. Notably, one gender is conspicuously absent across a distribution of 500 generated images.

Even when using basic prompts like “person” without including an occupation, we observed that models can underrepresent certain demographic groups across age, race, and gender. When we analyzed DALLE-v2 and Stable Diffusion, both offered a limited representation of races other than white across a set of 500 generated images. Furthermore, the DALLE-v2 outputs revealed a remarkable lack of age diversity, with over 80% of the images depicting either adults who appeared to be between the ages 18 and 40 or children. This is illustrated in Figure 3.

Three charts showing gender, race, and age distribution as interpreted by human annotators for DALL-E v2 and Stable Diffusion models. — Figure 3. Gender, race, and age distribution as interpreted by human annotators and automated face processing within the context of image generation for the prompt “person.”

Our study also examines biases of similar representations across positive and negative personality traits, revealing the subtleties of how these traits are depicted. While individuals of nonwhite races appear linked with positive attributes such as vigor, ambition, striving, and independence, they are also associated with negative traits like detachment, hardheartedness, and conceitedness.

Representation of geographical locations in everyday scenarios

Another aspect of bias that we studied pertains to the representation of diverse geographical locations in how models interpret everyday scenarios. We did this using such prompts as “a photo of a birthday party” or “a photo of a library.” Although it is difficult to discern the precise location of a generated photo, distinctions in these representations can still be measured between using a general prompt and a prompt that specifies a location, for example, “a photo of a birthday party in Colombia.” In the paper, we describe this experiment for the two most populous countries in each inhabited continent, considering everyday scenarios centering around events, places, food, institutions, community, and clothing. When models were given a general prompt, overall results indicated that images generated for countries like Nigeria, Papua New Guinea, and Ethiopia had the greatest difference between the prompt and the image, while images generated for Germany, the US, and Russia were the closest aligned to the general prompt.

Subtle effects of using expanded prompts

Many bias mitigation techniques rely on expanding the prompt to enrich and diversify the images that models generate. To tackle bias in AI-generated images, we applied prompt engineering (opens in new tab) to increase the likelihood that the image will reflect what’s specified in the prompt. We used prompt expansion, a type of prompt engineering, to add further descriptors to the initial general prompts and guide the model towards unbiased content. An example of prompt expansion would be “a portrait of a female doctor” instead of “a portrait of a doctor.” Our experiments proved that prompt expansion is predominantly effective in creating more specified content in AI-generated images. However, there are also unintended outcomes, particularly in terms of decreased diversity and image quality, as shown in Figure 4.

Examples of generation output from DALL-E v2 for two prompts: “a portrait of an announcer” and “a portrait of a female announcer.” — Figure 4. Expanded prompts using descriptors like “female” can indeed yield more diverse depictions, but often at the cost of image variety and quality.

Safeguarding against bias in T2I models

As T2I generation models become increasingly integrated into our digital ecosystems, it is paramount that we remain vigilant to the biases they may inadvertently perpetuate. This research underscores the profound importance of continually evaluating and refining these models. We hope that the outcomes and methodology presented in this study provide valuable insights for evaluating and building new generative models. We would like to emphasize the importance of fostering responsible development and ensuring representational fairness in this process.

The post Understanding social biases through the text-to-image generation lens appeared first on Microsoft Research.

Incorporating chemists’ insight with AI models for single-step retrosynthesis prediction

September 7, 2023

by Brenda Potts Microsoft AI

Retrosynthesis analysis is a critical task in organic chemistry and central to many important industries. It primarily involves decomposing a target molecule into commercially available molecules step by step. Since synthesis strategies can be quite diverse and strategic, retrosynthesis planning with expert knowledge has long been considered an “art.”

Recently, machine learning-based approaches have achieved promising results on this task, particularly in single-step retrosynthesis prediction. In retrosynthesis, a molecule can be represented as either a 2D graph or a 1D SMILES (simplified molecular-input line-entry system) sequence. SMILES is a notation system used to represent chemical structures using plain text, which consists of a sequence of characters to describe the arrangement of atoms, bonds, and rings within a molecule. SMILES can be considered a traversal on the corresponding molecular graph, as shown in Figure 1.

Given the representations of molecules, most machine learning-based approaches employ encoder-decoder frameworks, where the encoder part encodes the molecular (the target product) sequence or graph as high dimensional vectors, and the decoder takes the output from the encoder and generates the output sequence (the predicted reactant) token-by-token autoregressively.

Casting retrosynthesis analysis as a sequence decoding problem enables the use of deep neural architectures that are well-developed in machine translation or graph neural networks. While AI has made significant strides in predicting reactants, it’s crucial to acknowledge the expertise of human chemists. In real-world route scouting tasks, synthetic chemists rely on their professional experience and abstract understanding of underlying mechanisms. They often start with molecular substructures or fragments that are chemically similar to target molecules, providing clues for a series of chemical reactions that may yield the target product.

Our paper, Single-step retrosynthesis prediction by leveraging commonly preserved substructures (opens in new tab), proposes a novel approach that leverages commonly preserved substructures in organic synthesis. This approach incorporates chemists’ insight in retrosynthesis, bringing the AI model closer to the way human experts think.

Substructure extraction and modeling

In the context of organic chemistry, “substructures” refer to molecular fragments or smaller building blocks that are chemically similar or preserved within target molecules. These substructures serve as essential components for understanding the assembly of complex molecules and play a significant role in retrosynthesis analysis.

Based on this concept, our framework consists of three main modules:

Reaction Retrieval: This module retrieves similar reactions, given a product molecule as a query. It uses a learnable cross-lingual memory retriever to align reactants and products in high-dimensional vector space.
Substructure Extraction: We extract the common substructures from the product molecule and the top cross-aligned candidates, based on molecular fingerprints. These substructures provide a reaction-level, fragment-to-fragment mapping between reactants and products.
Substructure-level Sequence-to-Sequence Learning: We convert the original token-level sequence to a substructure-level sequence. The new input sequence includes the SMILES strings of the substructures followed by the SMILES strings of other fragments with virtual number labels. The output sequences are the fragments with virtual numbers. The virtual numbers are used to indicate the bond breaking/connecting site.

Unlike most existing work, our model only needs to predict the fragments connected to the substructure, thereby simplifying the prediction task, with the substructure part remaining unchanged.

In the example shown in Figure 2, the substructure “COC(=O)Cc1cc2ccc(F)cc2[2cH]c1C.C[1SH](=O)=O” remains unchanged, and the model only needs to predict that the fragment “[2BH]2OC(C)(C)C(C)(C)O2.[1cH]1ccc(Br)nc1”. The substructure SMILES and the predicted fragment SMILES are then combined to form a complete reactants SMILES.

Retrosynthesis prediction

We analyzed our method using the USPTO full dataset (opens in new tab) and compared it to other notable works in the field. In almost every scenario, our method achieved comparable or better top-1 accuracy compared to previously tested methods. On the subset of data where substructures were successfully extracted, model performance significantly improved compared to the overall result.

The improvement in our method can be attributed to two main factors:

Our method managed to successfully extract substructures from 82.2% of all products on the USPTO full test dataset, demonstrating the general applicability of this approach.
We only needed to generate fragments connected to virtually labeled atoms in the substructures, which shortened the string representations of molecules and significantly lowered the number of atoms to be predicted.

A key aspect of our method for one-step retrosynthesis is the extraction of product-specific substructures. By doing so, we can better capture subtle structural changes from reactants to products that are unique to each reaction. Take phthalimide, a common heterocyclic substructure, as an example. We analyzed four exemplary reactions where the reactants contain phthalimide (see Figure 3). The extracted substructures vary among different reaction types, demonstrating the product-specific nature of the substructures.

In reaction (a) and reaction (b), phthalimide is not considered part of the substructure because it incorporates the reaction. However, in reaction (c) and reaction (d), the substructures are different, yet they both contain phthalimide. These results show that substructures are indeed product-specific, which aligns with our expectations.

Incorporating human insights into decision-making

In addition, leveraging commonly preserved substructures offers another benefit: providing users with valuable insights for decision-making in retrosynthesis planning. When compared to existing methods, our approach can help human experts assess potential pathways and eliminate infeasible reactions using their chemistry knowledge.

For each input product molecule, we extract multiple substructures from retrieved reactions, (see details in our paper) and for some cases, not all substructures are correct. As such, we can group predictions by substructures. As shown in Figure 4, the predicted groups of reactants and reactions offer valuable information to experts. For instance, they can refine predictions by comparing reactions associated with retrieved candidates, making our predictions more explainable and trustworthy compared to existing “black-box” models.

We hope that our work will spark interest in this fast-growing and highly interdisciplinary area of retrosynthesis prediction and other related topics. By pushing the boundaries of what’s possible in chemistry and machine learning, we can continue to make strides in understanding complex chemical reactions and designing more efficient retrosynthetic strategies.

Read the paper

The post Incorporating chemists’ insight with AI models for single-step retrosynthesis prediction appeared first on Microsoft Research.

Rethinking trust in direct messages in the AI era

September 5, 2023

by Brenda Potts Microsoft AI

Rethinking trust in direct messages in the AI era - blog hero showing a flowchart diagram

This blog post is a part of a series exploring our research in privacy, security, and cryptography. For the previous post, see https://www.microsoft.com/en-us/research/blog/research-trends-in-privacy-security-and-cryptography. While AI has the potential to massively increase productivity, this power can be used equally well for malicious purposes, for example, to automate the creation of sophisticated scam messages. In this post, we explore threats AI can pose for online communication ecosystems and outline a high-level approach to mitigating these threats.

Communication in the age of AI

Concerns regarding the influence of AI on the integrity of online communication are increasingly shared by policymakers, AI researchers, business leaders, and other individuals. These concerns are well-founded, as benign AI chatbots can be easily repurposed to impersonate people, help spread misinformation, and sway both public opinion and personal beliefs. So-called “spear phishing” attacks, which are personalized to the target, have proved devastatingly effective. This is particularly true if victims are not using multifactor authentication, meaning an attacker who steals their login credentials with a phishing mail could access authentic services with those credentials. This opportunity has not been missed by organized cybercrime; AI-powered tools marketed to scammers and fraudsters are already emerging. This is disturbing, because democratic systems, business integrity, and interpersonal relationships all hinge on credible and effective communication—a process that has notably migrated to the digital sphere.

As we enter a world where people increasingly interact with artificial agents, it is critical to acknowledge that these challenges from generative AI are not merely hypothetical. In the context of our product offerings at Microsoft, they materialize as genuine threats that we are actively addressing. We are beginning to witness the impact of AI in generating highly specific types of text (emails, reports, scripts, code) in a personalized, automated, and scalable manner. In the workplace, AI-powered tools are expected to bring about a huge increase in productivity, allowing people to focus on the more creative parts of their work rather than tedious, repetitive details. In addition, AI-powered tools can improve productivity and communication for people with disabilities or among people who do not speak the same language.

In this blog post, we focus on the challenge of establishing trust and accountability in direct communication (between two people), such as email, direct messages on social media platforms, SMS, and even phone calls. In all these scenarios, messaging commonly takes place between individuals who share little or no prior context or connection, yet those messages may carry information of high importance. Some examples include emails discussing job prospects, new connections from mutual friends, and unsolicited but important phone calls. The communication may be initiated on behalf of an organization or an individual, but in either case we encounter the same problem: if the message proves to be misleading, malicious, or otherwise inappropriate, holding anyone accountable for it is impractical, may require difficult and slow legal procedures, and does not extend across different communication platforms.

As the scale of these activities increases, there is also a growing need for a flexible cross-platform accountability mechanism that allows both the message sender and receiver to explicitly declare the nature of their communication. Concretely, the sender should be able to declare accountability for their message and the receiver should be able to hold the sender accountable if the message is inappropriate.

Elements of accountability

The problems outlined above are not exactly new, but recent advances in AI have made them more urgent. Over the past several years, the tech community, alongside media organizations and others, have investigated ways to distinguish whether text or images are created by AI; for example, C2PA is a type of watermarking technology, and one possible solution among others. With AI-powered tools increasingly being used in the workplace, Microsoft believes that it will take a combination of approaches to provide the highest value and most transparency to users.

Focusing on accountability is one such approach. We can start by listing some properties we expect of any workable solution:

People and organizations need to be able to declare accountability for the messages they send.
Receivers need to be able to hold the senders accountable if the message is inappropriate or malicious, to protect future potential victims.
There must exist an incentive for the sender to declare accountability.
The mechanism should only solve the accountability problem and nothing else. It must not have unintended side effects, such as a loss of privacy for honest participants.
Receivers should not be required to register with any service.
The accountability mechanism must be compatible with the plurality of methods people use to communicate today.

One way to build an accountability mechanism is to use a reputation system that verifies real-world identities, connecting our digital interactions to a tangible and ultimately accountable organization or human identity. Online reputation has now become an asset that organizations and individuals have a vested interest in preserving. It creates an incentive for honest and trustworthy behavior, which ultimately contributes to a safer and more reliable digital environment for everyone.

Reputation system for online accountability

Consider what an online communication user experience could be like with an integrated reputation system. In this solution, a message sender could declare their accountability by binding their message to their account in the reputation system in the form of a cryptographic reputation tag. Conversely, the receiver uses the tag to verify the sender’s reputation and can use it to report the sender if the message is inappropriate, reducing the sender’s reputation. It is the sender’s responsibility to judge whether the receiver will perceive the message as inappropriate.

Messages with an attached reputation tag are called reputed messages, whereas those without an associated reputation are called generic messages. Reputed messages would typically make the most sense in one-to-one communication that the sender intends for a particular recipient, or one-to-many communication to a few recipients. For example, a proposal to discuss a business deal, a wedding invitation email, a payment reminder SMS from a company’s billing department, or a work email discussing a joint project might be sent as reputed messages. Generic messages would typically not be intended for a particular receiver. For example, emails sent to a mailing list (many receivers) or non-personalized advertisements (large scale) should be sent as generic.

The different components and workflows of our accountability mechanism are depicted, at a high level, in Figure 1.

system diagram — Figure 1: An accountability mechanism design, showing both the account creation and message sending/reporting workflows.

Taking a concrete example, think of a situation where you receive an email from your bank asking you to verify the security settings for your account. You know that phishing emails often target such scenarios, so your first reaction is to ignore the message. However, in this case your email client has noted the valid reputation tag and automatically moved the email to a reputed messages folder. It shows the sender’s reputation, high, next to the message. Instead of deleting the unsolicited and slightly suspicious email, you decide to check whether the link in the email truly leads you to your bank’s website. You are now convinced this is a legitimate message and proceed with the recommendations to review your security settings.

As another example, suppose you work in your company’s billing department. You find something wrong with a customer’s billing information and decide to send them an email to get more information. Since this is an important matter, you hope to maximize the chance of them seeing your message by attaching the billing department’s reputation tag to it. The customer sees the email go in the reputed messages folder, notices the sender’s high reputation, and responds to it with appropriate urgency.

As a third example, imagine that you receive an unsolicited phone call from someone who claims to be your distant relative and wants to discuss a family reunion they are organizing. They ask you questions about your family, making you slightly uneasy. Right before calling you, they sent you a reputation tag via SMS encoding their reputation and the context of their call. You verify that the tag is valid, but that their reputation is medium. You decide to end the call and report them using the tag they shared, as you felt that their call asking for such sensitive information was inappropriate.

These examples highlight that this single system can be used across many different modes of communication, from emails to social media messages to phone calls, fostering trust and safety across the entire landscape of direct communication methods in use today.

Call to action

In this blog post we have attempted to outline a solution to an already existing problem that is exacerbated by modern AI. Capturing the core of this problem is not easy, and many of the previously proposed solutions have unintended consequences that make them unworkable. For example, we explained why approaches that attempt to limit the use of AI are unlikely to succeed.

The solutions are not easy either. The messaging ecosystem is vastly complex and any solution requiring fundamental changes to that are unlikely to be acceptable. Usability is a key concern as well: if the system is only designed to communicate risk, we may want to avoid inadvertently communicating safety, much like the presence of padlock symbols as a sign of HTTPS have caused confusion and underestimation of risk for web browser users (opens in new tab).

Is there a comprehensive identity framework that would connect real-world identities to digital identities? This connection to a unique real-world identity is crucial, as otherwise anyone could simply create as many distinct reputation accounts as they need for any nefarious purpose.

For organizations, the situation is easier, because countries and states tend to hold public records that establish their existence and “identity.” For individuals, platforms like Reddit, TripAdvisor, and Stack Overflow have built reputation systems for their internal use, but without a foundational layer that confirms unique human identities these cannot be used to solve our problem, just as Facebook’s “real name” policy and X Premium (formerly Twitter Blue) have been insufficient to prevent the creation and use of fake accounts. Still, this is not an impossible problem to solve: LinkedIn is already partnering with CLEAR (opens in new tab) to bind government ID verification to a verification marker in user profiles, and with Microsoft Entra Verified ID (opens in new tab) to verify employment status. Worldcoin (opens in new tab) is building a cryptocurrency with each wallet being linked to a unique real-world person through biometrics, and Apple recently announced Optic ID (opens in new tab) for biometric authentication through their Vision Pro headset.

Whenever we talk about identities—especially real-world identities—we need to talk about privacy. People use different digital identities and communication methods in different communities, and these identities need to be kept separate. Trusting a reputation system with such sensitive information requires careful consideration. Our preliminary research suggests that techniques from modern cryptography can be used to provide strong security and privacy guarantees so that the reputation system learns or reveals nothing unnecessary and cannot be used in unintended ways.

What about the governance of the reputation system? In an extreme case, a single centralized party hosts the system while providing cryptographic transparency guarantees of correct operation. In another extreme, we should explore whether a purely decentralized implementation can be feasible. There are also options between these two extremes; for example, multiple smaller reputation systems hosted by different companies and organizations.

These open questions present an opportunity and a responsibility for the research community. At Microsoft Research, we are diligently working on aspects of this problem in partnership with our research on privacy-preserving verifiable information and identity, secure hardware, transparency systems, and media provenance. We invite the rest of the research community to join in by either following the path we outlined here or suggesting better alternatives. This is the start of a broad exploration that calls for a profound commitment and contribution from all of us.

The post Rethinking trust in direct messages in the AI era appeared first on Microsoft Research.

AI Frontiers: AI in India and beyond with Sriram Rajamani

August 31, 2023

by Brenda Potts Microsoft AI

AI Frontiers with Sriram Rajamani; black and white photo of Sriram Rajamani, Managing Director of Microsoft Research India, next to the Microsoft Research Podcast

Episode 146 | August 31, 2023

Powerful large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come.

In this Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these models—and the models that will come next—mean for our approach to creating, understanding, and deploying AI, its applications in areas such as healthcare and education, and its potential to benefit humanity.

This episode features Sriram Rajamani, Distinguished Scientist and Managing Director of Microsoft Research India. Rajamani talks about how the lab’s work is being influenced by today’s rapidly advancing AI. One example? The development of a conversational agent in India capable of providing information about governmental agricultural programs in farmers’ natural language, particularly significant in a country with more than 30 languages, including 22 government-recognized languages. It’s an application Microsoft CEO Satya Nadella described as the “mic drop moment” of his trip to the lab early this year.

Learn more:

AI4Bhārat (opens in new tab)
Organization homepage
MEGA: Multilingual Evaluation of Generative AI
Publication, May 2023
AI and Microsoft Research
Learn more about the breadth of AI research at Microsoft

Transcript

[MUSIC PLAYS]

ASHLEY LLORENS: I’m Ashley Llorens with Microsoft Research. I’ve spent the last 20 years working in AI and machine learning, but I’ve never felt more fortunate to work in the field than at this moment. The development of increasingly powerful large-scale AI models like GPT-4 is accelerating the advancement of AI. These models and the systems they power are exhibiting surprising new abilities like reasoning, problem-solving, and translation across languages and domains. In this podcast series, I’m sharing conversations with fellow researchers about the latest developments in large AI models, the work we’re doing to understand their capabilities and limitations, and ultimately how innovations like these can have the greatest benefit for humanity. Welcome to AI Frontiers.

Today, I’ll speak with Sriram Rajamani, Managing Director of Microsoft Research India. For nearly 20 years, this lab has focused on interdisciplinary research, blending theory and practice and computer science with social science. Our researchers in India have made many contributions to advance AI in areas like causal reasoning, but the latest wave of powerful AI models has made a profound impact on all the lab’s work, including their approach to creating technologies for underserved communities.

[MUSIC FADES]

All right, so, Sriram, let’s dive right in. I think it’s fairly obvious for me to say at this point that ChatGPT—and generative AI more broadly—is a worldwide phenomenon. But what’s so striking to me about this is the way that so many people around the world can pick up the technology and use it in their context, in their own way. I was on a panel discussion a few weeks ago where I saw a comedian discover in real time that GPT-4 could write jokes that are actually funny. And shortly after that, I spoke to a student who was using ChatGPT to write an application to obtain a grazing permit for cattle. You know, the work of your lab is situated in its own unique societal context. So, what I really want to know and start with here today is, like, what’s the buzz been like for you in your part of the world around this new wave of AI?

SRIRAM RAJAMANI: Yeah. First of all, Ashley, you know, thank you for having this conversation with me. You’re absolutely right that our lab is situated in a very unique context on how this technology is going to play out in, you know, this part of the world, certainly. And you might remember, Ashley, a sort of a mic drop moment that happened for Satya [Nadella] when he visited India earlier this year, in January. So one of our researchers, Pratyush Kumar—he’s also co-founder of our partner organization called AI4Bhārat—he works also with the government on a project called Bhashini, which the government endeavors to bring conversational AI to the many Indian languages that are spoken in India. And what Pratyush did was he connected some of the AI4Bhārat translation models, language translation models, together with one of the GPT models to build a bot for a farmer to engage and ask questions about the government’s agricultural programs so the farmer could speak in their own language—you know, it could be Hindi—and what the AI4Bhārat models would do is to convert the Hindi speech into text and then translate it into English. And then he taught, you know, either fine-tuned or integrated with augmented generation … I don’t … I’m not … I don’t quite remember which one … it was one of those … where he made a GPT model customized to understand the agricultural program of the government. And he chained it together with this speech recognition and translation model. And the farmer could just now talk to the system, the AI system, in Hindi and ask, you know, are they eligible for their benefits and many details. And the, and the model had a sensible conversation with him, and Satya was just really amazed by that, and he calls … he called that as the mic drop moment of his trip in India, which I think is indicative of the speed at which this disruption is impacting very positively the various parts of the world, including the Indian subcontinent.

LLORENS: You referenced the many Indian languages written and spoken. Can you just bring, bring that to life for us? How many, how many languages are we talking about?

RAJAMANI: So, I think there are at least, you know, 30 or 40, you know, main, mainstream languages. I mean, the government recognizes 22. We call them as IN22. But I would think that there are about 30-plus languages that are spoken very, very broadly, each of them with, you know, several tens of millions, hundreds of millions of speakers. And then there is a long tail of maybe a hundred more languages which are spoken by people with … in, in smaller population counts. The real … they’re also very low-resource languages like Gondi and Idu Mishmi, which are just spoken by maybe just only a million speakers or even under a million speakers who probably … those languages probably don’t have enough data resources. So, India is an amazing testbed because of this huge diversity and distribution of languages in terms of the number of speakers, the amount of available data, and, and many of these tail languages have unique, you know, sociocultural nuances. So I think in that sense, there’s a really good testbed for, you know, how conversational AI can inclusively impact the entire world.

LLORENS: And, and what’s the … you mentioned tail languages. And so maybe we mean they’re low-resource languages like you also mentioned. What’s the gap like between what languages AI is accessible in today versus the full extent of all those languages that you just described, even just for, you know, for the Indian subcontinent?

RAJAMANI: So what is … what we’re seeing is that with IN22, the top languages, if you look at successive versions of the GPT models, for example, the performance is definitely improving. So if you just go from, you know, GPT-2 to GPT-3 to 3.5 to 4, right, you can sort of see that these models are increasingly getting capable. But still there is a gap between what these models are able to do and what custom models are able to do, particularly if you go towards languages in which there’s not enough training data. So, so people in our lab, you know, are doing very systematic work in this area. There is a benchmarking work that my colleagues are doing called MEGA, where there is systematic benchmark being done on various tasks on a matrix that consists of, you know, tasks on one axis and languages on another axis to just systematically, empirically study, you know, what these models are able to do. And also, we are able to build models to predict how much more data is needed in each of these languages in order for the performance to be comparable to, say, languages like English. What is the … what is the gap, and how much data is needed? The other thing is that it turns out that these models, they, they learn also from related languages. So if you want to improve the performance of a language, it turns out there are other languages in the world and in India that have similar characteristics, you know, syntactic and semantic characteristics, to the language that you’re thinking about. So we can also sort of recommend, you know, what distribution of data we should collect so that all the languages improve. So that’s the kind of work that we’re doing.

LLORENS: Yeah, it’s one of the most fascinating parts of all of this—how diversity in the training dataset improves, you know, across the board, like even the addition of code, for example, in addition to language, and now we’re even seeing even other modalities. And, you know, the, the wave of AI and the unprecedented capabilities we’re seeing has significant implications for just about all of computing research. In fact, those of us in and around the field are undergoing now a process that I call, you know, reimagining computing research. And, you know, that’s a somewhat artful way to put it. But beyond the technical journey, there’s an emotional journey happening across the research community and many other communities, as well. So what has that journey been like for you and the folks at the India lab?

RAJAMANI: Yeah, that’s a good question, Ashley. You know, our work in the lab spans four areas. You know, we do work in theory and algorithms. We do work in AI and machine learning. We do systems work, and we also have an area called “Technology and Empowerment.” It’s about making sure that technology benefits people. And so far, our conversation has been about the last area. But all these four areas have been affected in a big way using this disruption. Maybe, maybe I’ll just say a few more things about the empowerment area first and then move on to the other ones. If you look at our work in the empowerment area, Ashley, right, this lab has had a track record of doing work that makes technology inclusive not just from an academic perspective, but by also deploying the work via spun-off startups, many startups, that have taken projects in the lab and scaled them to the community. Examples are Digital Green, which is an agricultural extension; 99DOTS, which is a tuberculosis medication adherence system. Karya is a, is a platform for dignified digital labor to enable underprivileged users, rural users, to contribute data and get paid for it. You know, HAMS is a system that we have built to improve road safety. You know, we’ve built a system called BlendNet that enables rural connectivity. And almost all of these, we have spun them off into startups that are … that have been funded by, you know, venture capitalists, impact investors, and we have a vibrant community of these partners that are taking the work from the lab and deploying them in the community. So the second thing that is actually happening in this area is that, as you may have heard, India is playing a pivotal role in digital public infrastructure. Advances like the Aadhaar biometric authentication system; UPI, which is a payment system—they are pervasively deployed in India, and they reach, you know, several hundreds of millions of people. And in the case of Aadhaar, more than a billion people and so on. And the world is taking note. India is now head of the G20, and many countries now want to be inspired by India and build such a digital public infrastructure in their own countries, right. And so, so, so what you saw is the mic drop moment, right? That … it actually has been coming for a long time. There has been a lot of groundwork that has been laid by our lab, by our partners, you know, such as AI4Bhārat, the people that work on digital public goods to get the technical infrastructure and our know-how to a stage where we can really build technology that benefits people, right. So, so going forward, in addition to these two major advancements, which is the building of the partner and alumni ecosystem, the digital public good infrastructure, I think AI is going to be a third and extremely important pillar that is going to enable citizen-scale digital services to reach people who may only have spoken literacy and who might speak in their own native languages and the public services can be accessible to them.

LLORENS: So you mentioned AI4Bhārat, and I’d love for you to say a bit more about that organization and how researchers are coming together with collaborators across sectors to make some of these technology ideas real.

RAJAMANI: Yeah. So AI4Bhārat is a center in IIT Madras, which is an academic institution. It has multiple stakeholders, not just Microsoft Research, but our search technology center in India also collaborates with them. Nandan Nilekani is a prominent technologist and philanthropist. He’s behind a lot of India’s digital public infrastructure. He also, you know, funds that center significantly through his philanthropic efforts. And there are a lot of academics that have come together. And what the center does is data collection. I talked about the diversity of, you know, Indian languages. They collect various kinds of data. They also look at various applications. Like in the judicial system, in the Indian judicial system, they are thinking about, you know, how to transcribe, you know, judgments, enabling various kinds of technological applications in that context, and really actually thinking about how these kinds of AI advances can help right on top of digital public goods. So that’s actually the context in which they are working on.

LLORENS: Digital public goods. Can you, can you describe that? What, what do we mean in this context by digital public good?

RAJAMANI: So what we mean is if you look at Indian digital public infrastructure, right, that is, as I mentioned, that is Aadhaar, which is the identity system that is now enrolled more than 1.3 billion Indians. There is actually a payment infrastructure called UPI. There are new things that are coming up, like something that’s, that’s called Beckn. There’s something called ONDC that is poised to revolutionize how e-commerce is done. So these are all, you know, sort of protocols that through private-public partnership, right, government together with think tanks have developed, that are now deployed in a big way in India. And they are now pervasively impacting education, health, and agriculture. And every area of public life is now being impacted by these digital public infrastructures. And there is a huge potential for AI and AI-enabled systems to ride on top of this digital public infrastructure to really reach people.

LLORENS: You know, you talked about some of the, you know, the infrastructure considerations, and so what are the challenges in bringing, you know, digital technologies to, you know, to, to the Indian context? And, and you mentioned the G20 and other countries that are following the patterns. What are, what are some of the common challenges there?

RAJAMANI: So, I mean, there are many, many challenges. One of them is lack of access. You know, though India has made huge strides in lifting people out of poverty, people out there don’t have the same access to technology that you and I have. Another challenge is awareness. People just don’t know, you know, how technology can help them, right. You know, people hearing this podcast know about, you know, LinkedIn to get jobs. They know about, you know, Netflix or other streaming services to get entertainment. But there are many people out there that don’t even know that these things exist, right. So awareness is another issue. Affordability is another issue. So … many of the projects that I mentioned, what they do is actually they start not with the technology; they start with the users and their context and this situation, and what they’re trying to do and then map back. And technology is just really one of the pieces that these systems, that all of these systems that I mentioned, right … technology is just only one component. There’s a sociotechnical piece that deals with exactly these kinds of access and awareness and these kinds of issues.

LLORENS: And we’re, we’re kind of taking a walk right now through the work of the lab. And there are some other areas that you, you want to get into, but I want to come back to this … maybe this is a good segue into the emotional journey part of the question I asked a few minutes ago. As you get into some of the, you know, the deep technical work of the lab, what were some of the first impressions of the new technologies, and what were, what were some of the first things that, you know, you and your colleagues there and our colleagues, you know, felt, you know, in observing these new capabilities?

RAJAMANI: So I, I think Peter [Lee] mentioned this very eloquently as stages of grief. And me and my colleagues, I think, went through the same thing. I mean, the … there was … we went from, you know, disbelief, saying, “Oh, wow, this is just amazing. I can’t believe this is happening” to sort of understanding what this technology can do and, over time, understanding what its limitations are and what the opportunities are as a scientist and technologist and engineering organization to really push this forward and make use of it. So that’s, I think, the stages that we went through. Maybe I can be a little bit more specific. As I mentioned, the three other areas we work on are theory in algorithms, in machine learning, and in systems. And I can sort of see … say how my colleagues are evolving, you know, their own technical and research agendas in the, in the light of the disruption. If you take our work in theory, this lab has had a track record of, you know, cracking longstanding open problems. For example, problems like the Kadison-Singer conjecture that was open for many years, many decades, was actually solved by people from the lab. Our lab has incredible experts in arithmetic and circuit complexity. They came so close to resolving the VP versus VNP conjecture, which is the arithmetic analog of the P versus NP problem. So we have incredible people working on, working on theoretical computer science, and a lot of them are now shifting their attention to understanding these large language models, right. Instead of understanding just arithmetic circuits, you know, people like Neeraj Kayal and Ankit Garg are now thinking about mathematically what does it take to understand transformers, how do we understand … how might we evolve these models or training data so that these models improve even further in performance in their capabilities and so on. So that’s actually a journey that the theory people are going through, you know, bringing their brainpower to bear on understanding these models foundationally. Because as you know, currently our understanding of these foundation models is largely empirical. We don’t have a deep scientific understanding of them. So that’s the opportunity that the, that the theoreticians see in this space. If you look at our machine learning work, you know, that actually is going through a huge disruption. I remember now one of the things that we do in this lab is work on causal ML … Amit Sharma, together with Emre Kiciman and other colleagues working on causal machine learning. And I heard a very wonderful podcast that you hosted them some time ago. Maybe you can say a little bit about what, what you heard from them, and then I can pick up back and then connect that with the rest of the lab.

LLORENS: Sure. Well, it’s … you know, I think the, the common knowledge … there’s, there’s so many, there’s so many things about machine learning over the last few decades that have become kind of common knowledge and conventional wisdom. And one of those things is that, you know, correlation is not causation and that, you know, you know, learned models don’t, you know, generally don’t do causal reasoning. And so we, you know, we’ve had very specialized tools created to do the kind of causal reasoning that Amit and Emre do. And it was interesting. I asked them some of the same questions I’m asking you now, you know, about the journey and the initial skepticism. But it has been really interesting to see how they’re moving forward. They recently published a position paper on arXiv where they conducted some pretty compelling experiments, in some cases, showing something like, you know, causal reasoning, you know, being, being exhibited, or at least I’ll say convincing performance on causal reasoning tasks.

RAJAMANI: Yeah, absolutely.

LLORENS: Yeah, go ahead.

RAJAMANI: Yeah, yeah, yeah, absolutely. So, so, so, you know, I would say that their journey was that initially they realized that … of course, they build specialized causal reasoning tools like DoWhy, which they’ve been building for many years. And one of the things they realized was that, “Oh, some of the things that DoWhy can do with sophisticated causal reasoning these large language models were just able to do out of the box.” And that was sort of stunning for them, right. And so the question then becomes, you know, does specific vertical research in causal reasoning is even needed, right. So that’s actually the shock and the awe and the emotional journey that these people went through. But actually, after the initial shock faded, they realized that there is actually [a] “better together” story that is emerging in the sense that, you know, once you understand the details, what they realized was that natural language contains a lot of causal information. Like if you just look at the literature, the literature has many things like, you know, A causes B and if there is, if there is, you know, hot weather, then ice cream sales go up. You know, this information is present in the literature. So if you look at tools like DoWhy, what they do is that in order to provide causal machine learning, they need assumptions from the user on what the causal model is. They need assumptions about what the causal graph is, what is the user’s assumptions about which variables depend on which variables, right? And then … and, and, and what they’ve realized is that models like GPT-4 can now provide this information. Previously, only humans were able to provide this information. And … but in addition to that, right, tools like DoWhy are still needed to confirm or refute these assumptions, statistically, using data. So this division of labor between getting assumptions from either a human or from a large language model and then using the mathematics of DoWhy to confirm or refute the assumptions now is emerging as a real advance in the way we do causal reasoning, right? So I think that’s actually what I heard in your podcast, and that’s indicative of actually what the rest of my colleagues are going through. You know, moving from first thinking about, “Oh, GPT-4 is like a threat, you know, in the sense that it really obviates my research area” to understand, “Oh, no, no. It’s really a friend. It, it really helps me do, you know, some of the things that required primarily human intervention. And if I combine GPT or these large language models together with, you know, domain specific research, we can actually go after bigger problems that we didn’t even dare going after before.”

LLORENS: Mmm. Let me, let me ask you … I’m going to, I’m going to pivot here in a moment, but did you … have you covered, you know, the areas of research in the lab that you wanted to walk through?

RAJAMANI: Yeah, yeah, there’s, there’s more. You know, thank you for reminding me. Even in the machine learning area, there is another work direction that we have called extreme classification, which is about building very, very … classifiers with a large number of labels, you know, hundreds of millions and billions of labels. And, you know, these people are also benefiting from large language encoders. You know, they have come up with clever ways of taking these language encoders that are built using self-supervised learning together with supervised signals from things like, you know, clicks and logs from search engines and so on to improve performance of classifiers. Another work that we’ve been doing is called DiskANN, or approximate nearest neighbor search. As you know, Ashley, in this era of deep learning, retrieval works by converting everything in the world, you know, be it a document, be it an image, you know, be it an audio or video file, everything into an embedding, and relevance … relevant retrieval is done by nearest neighbor search in a geometric space. And our lab has been doing … I mean, we have probably the most scalable vector index that has been built. And, and, and these people are positively impacted by these large language models because, you know, as you know, retrieval augmented generation is one of the most common design patterns in making these large language models work for applications. And so their work is becoming increasingly relevant, and they are being placed huge demands on, you know, pushing the scale and the functionality of the nearest neighbor retrieval API to do things like, oh, can I actually add predicates, can I add streaming queries, and so on. So they are just getting stretched with more demand, you know, for their work. You know, if you look at our systems work, which is the last area that I want to cover, you know, we have, we have been doing work on using GPUs and managing GPU resources for training as well as inference. And this area is also going through a lot of disruption. And prior to these large language models, these people were looking at relatively smaller models, you know, maybe not, you know, hundreds of billions to trillions of parameters. But, but, you know, maybe hundreds of millions and so on. And they invented several techniques to share a GPU cluster among training jobs. So the disruption that they had was all these models are so large that nobody is actually sharing clusters for them. But it turned out that some of the techniques that they invented to deal with, you know, migration of jobs and so on are now used for failure recovery in very, very large models. So it turns out that, you know, at the beginning it seems like, “Oh, my work is not relevant anymore,” but once you get into the details, you find that there are actually still many important problems. And the insights you have from solving problems for smaller models can now carry over to the larger ones. And one other area I would say is the area of, you know, programming. You know, I myself work in this area. We have been doing … combining machine learning together with program analysis to build a new generation of programing tools. And the disruption that I personally faced was that the custom models that I was building were no longer relevant; they’re, they’re not even needed. So that was a disruption. But actually, what me and my colleagues went through was that, “OK, that is true, but we can now go after problems that we didn’t dare to go before.” Like, for example, you know, we can now see that, you know, copilot and so on let you give recommendations in the context of the particular file that you are editing. But can we now edit an entire repository which might contain, you know, millions of files with hundreds of millions of code? Can I just say, let’s take, for example, the whole of the Xbox code base or the Windows code base, and in the whole code base, I want to do this refactoring, or I want to, you know, migrate this package from … migrate this code base from using now this serialization package to that serialization package. Can we just do that, right? I think we wouldn’t even dare going after such a problem two years ago. But now with large language models, we are thinking, can we do that? And large language models cannot do this right now because, you know, whatever context size you have, you can’t have 100-million-line code as a context to a large language model. And so this requires, you know, combining program analysis with these techniques. That’s as an example. And actually, furthermore, there are, you know, many things that we are doing that are not quite affected by large language models. You know, for example, Ashley, you know about the HyWay project, where we’re thinking about technology to make hybrid work work better. And, you know, we are doing work on using GPUs and accelerators for, you know, database systems and so on. And we do networking work. We do a low-earth orbit satellite work for connectivity and so on. And those we are doubling down, you know, though, they have nothing to do with large language models because those are problems that are important. So, I think, you know, to summarize, I would say that, you know, most of us have gone through a journey from, you know, shock and awe to sort of somewhat of an insecurity, saying is my work even relevant, to sort of understanding, oh, these things are really aides for us. These are not threats for us. These are really aides, and we can use them to solve problems that we didn’t even dream of before. That’s the journey I think my colleagues have gone through.

LLORENS: I want to, I want to step into two of the concepts that you just laid out, maybe just to get into some of the intuitions as to what problem is being solved and how generative AI is sort of changing the way that those, those problems are solved. So the first one is extreme classification. I think, you know, a flagship use of generative AI and foundation models is, is Bing chat. And so I think this idea of, of internet search as a, as a, you know, as a, a home for, for these new technologies is, is in the popular imagination now. And I know that extreme classification seeks to solve some challenges related to search and information retrieval. But what is the challenge problem there? What, you know … how is extreme classification addressing that, and how is that, you know, being done differently now?

RAJAMANI: So as I mentioned, where my colleagues have already made a lot of progress is in combining language encoders with extreme classifiers to do retrieval. So there are these models called NLR. Like, for example, there’s a tooling NLR model, which is a large language model which does representation, right. It actually represents, you know, keywords, keyword phrases, documents, and so on in the encodings, you know, based on, you know, self-supervised learning. But it is a very important problem to combine the knowledge that these large language models have, you know, from understanding a text. We have to combine that with supervised signals that we have from click logs. Because we have search engine click logs, we know, you know, for example, when somebody searches for this information and we show these results, what users click on. That’s supervised signals, and we have that in huge amounts. And what our researchers have done is they have figured out how to combine these encoders together with the supervised signals from click logs in order to improve both the quality and cost of retrieval, right. And, Ashley, as you said, retrieval is an extremely important part of experiences like Bing chat and retrieval augmented generation is what prevents hallucination and grounds these large language models with appropriate information retrieved and presented so that the, the relevant results are grounded without hallucination, right. Now, the new challenge that this team is now facing is, OK, that’s so far so good as far as retrieval is concerned, right? But can we do similar things with generation, right? Can we now combine these NLG models, which are these generative models, together with supervised signals, so that even generation can actually be guided in this manner, improved in both performance, as well as accuracy. And that is an example of a challenging problem that the team is going after.

LLORENS: Now let’s do the same thing with programming, and maybe I’m going to engage you on a slightly higher level of abstraction than the deep work you’re doing. And then, we can, we can, we can get back down into the work. But one of the things … one of, one of the, one of the popular ideas about these new foundation models is that you can … effectively through interacting with them, you’re sort of programming them in natural language. How does that concept sit with you as someone who, you know, is an expert in programming languages? What do you, what do you think, what do you think when someone says, you know, sort of programming the, you know, the system in natural language?

RAJAMANI: Yeah, so I, I find it fascinating and, you know, for one, you know, can we … an important topic in programming language research has been always that can we get end users or, you know, people who are nonprogrammers to program. I think that has been a longstanding open problem. And if you look at the programming language community, right, the programming language community has been able to solve it only in, in narrow domains. You know, for example, Excel has Flash Fill, where, through examples, you know, people can program Excel macros and so on. But those are not as general as these kinds of, you know, LLM-based models, right. And, and it is for the whole community, not just me, right. It was stunning when users can just describe in natural language what program they want to write and these models emit in a Python or Java or C# code. But there is a gap between that capability and having programmers just program in natural language, right. Like, you know, the obvious one is … and I can sort of say, you know, write me Python code to do this or that, and it can generate Python code, and I could run it. And if that works, then that’s a happy path. But if it doesn’t work, what am I supposed to do if I don’t know Python? What am I supposed to do, right? I still have to now break that abstraction boundary of natural language and go down into Python and debug Python. So one of the opportunities that I see is then can we build representations that are also in natural language, but that sort of describe, you know, what the application the user is trying to build and enable nonprogrammers—could be lawyers, could be accountants, could be doctors—to engage with a system purely in natural language and the system should talk back to you, saying, “Oh, so far this is what I’ve understood. This is the kind of program that I am writing,” without the user having to break that natural language abstraction boundary and going and having to go and understand Python, right? I think this is a huge opportunity in programming languages to see whether … can we build, like, for example, right, Ashley, right, I’m a programmer, and one of the things I love about programming is that I can write code. I can run it, see what it produces, and if I don’t like the results, I can go change the code and rerun it. And that’s sort of the, you know, coding, evaluating … we call it the REPL loop, right. So that’s, that’s what a programmer faces, right. Can we now provide that to natural language programmers? And since … and I want to say, “Here’s a program I want to write,” and now I want to say, “Well, I want to run this program with this input.” And if it doesn’t work, I want to say, “Oh, this is something I don’t like. I want to change this code this way,” right. So can I now provide that kind of experience to natural language programming? I think that’s a huge opportunity if you managed to pull that off.

LLORENS: And now let’s, let’s maybe return to some of the more societally oriented, you know, topics that, that you were talking about at the top of the episode in the context of, of, of programming. Because being able to program in natural language, I think, really changes, you know, who can use the technologies, who can develop technologies, what a program … what a software development team can actually be, and who, who that, who that kind of a team can consist of. So can you paint a picture? You know, what, what, what kind of opportunities for, for, you know, software development does this open up when you can sort of program in natural languages, assuming we can make the AI compatible with your language, whatever that happens to be?

RAJAMANI: Yeah, I think there are a lot of opportunities, and maybe I’ll, I’ll, I’ll describe a few things that we’re already doing. My, my colleagues are working on a project called VeLLM, which is now a copilot assistant for societal-scale applications. And one application they are going after is education. So, you know, India, like many other countries, has made a lot of educational resources available to teachers in government schools and so on so that if a teacher wants to make a lesson plan, you know, there is enough information available for them to search, find out many videos that their colleagues have created from different parts of the country, and put them together to create a lesson plan for their class, right. But that is a very laborious process. I mean, you have information overload when you deal with it. So my colleagues are thinking about, can we now think about, in some sense, the teacher as a programmer and have the teacher talk to the VeLLM system saying, “Hey, and here is my lesson plan. Here is what I’m trying to put together in terms of what I want to teach. And I now want the AI system to collect the relevant resources that are relevant to my lesson plan and get them in my language, the language that my students speak. You know, how do I do that,” right? And all of the things that I mentioned, right, you have to now index all of the existing information using vector indices. You have to now [use] retrieval augmented generation to get the correct thing. You have to now deal with the trunk and tail languages because this teacher might be speaking in, in, in a language that is not English, right. And, and, and, and the teacher might get a response that they don’t like, right. And how do they now … but they are not a programmer, right? How are they going to deal with it, right? So that’s actually an example. If we, if we pull this off, right, and a teacher in rural India is able to access this information in their own language and create a lesson plan which contains the best resources throughout the country, right, we would have really achieved something.

LLORENS: Yeah, you know, it’s a, it’s a hugely compelling vision. And I’m really looking forward to seeing where you and, you know, our colleagues in Microsoft Research India Lab and MSR [Microsoft Research] more broadly, you know, take all these different directions.

[MUSIC PLAYS] So I really appreciate you spending this time with me today.

RAJAMANI: Thank you, Ashley. And I was very happy that I could share the work that my colleagues are doing here and, and bringing this to your audience. Thank you so much.

The post AI Frontiers: AI in India and beyond with Sriram Rajamani appeared first on Microsoft Research.

Building a “heavy metal quartet” of AI compilers

August 30, 2023

by Alyssa Hughes Microsoft AI

By MSR Editor

Compilation is an important process in program development, in which a program called a compiler translates source code written in a programming language into machine code executable on computer hardware. As AI technology and large-scale AI models become increasingly prevalent across the digital world, their unique characteristics are posing new challenges for compilers.

As AI models have evolved from early versions like recurrent neural networks (RNN) and convolutional neural networks (CNN) to more recent iterations like Transformer, their fundamental architecture is also constantly evolving. Meanwhile, the underlying hardware accelerators, such as graphics processing units (GPUs) and neural processing units (NPUs), are iterating rapidly as well, with some designs disrupting previous architectures. Therefore, an AI compiler plays a critical role in helping new AI models run efficiently on new hardware.

In response, researchers from Microsoft Research, in collaboration with academic colleagues, conducted a series of research and released the “heavy-metal quartet” of AI compilers: Rammer, Roller, Welder, and Grinder^[1]. This quartet provides systematic and innovative solutions for current mainstream AI models and hardware compilation.

The left diagram shows the unified compiler abstraction with a tile-based intermediate representation (IR) as the core. The right diagram shows the four core AI compilation technologies. — Figure 1: The four core AI compilation technologies based on unified tile abstraction

AI compilation “Rammer” improves hardware parallel utilization

Deep neural networks (DNNs) are widely adopted in image classification, natural language processing, and many other intelligence tasks. Because of their importance, many computing devices such as CPUs, GPUs, and specially designed DNN accelerators are being used to perform DNN computations. One key variable for DNN computation efficiency is scheduling, which determines the order in which computational tasks are performed on hardware. Conventional AI compilers typically treat DNN computation as a data flow graph where each node represents a DNN operator. These operators are implemented as opaque library functions and are scheduled to run on the accelerator separately. At the same time, this process also relies on another layer of schedulers, usually implemented in hardware, to take advantage of the parallelism available in operators. This two-level approach incurs significant scheduling overhead and often does not fully utilize hardware resources.

To address this issue, researchers proposed a new DNN compiler, Rammer, which can optimize the execution of DNN workloads on massive-parallel units of accelerators. Rammer imagines the scheduling space for AI compilation as a two-dimensional plane, where computational tasks are “bricks” that can be divided into different shapes and sizes. The purpose of scheduling in Rammer is to arrange these bricks tightly—as if building a wall—on the computational units of the two-dimensional plane. The arrangement should not leave any gaps, which would hurt hardware utilization and thus reduce execution speed. Rammer works like a compactor in this two-dimensional space: when a DNN program is translated into bricks, Rammer can place them on different computing units of the accelerator to compact them.

A schematic diagram illustrating Rammer’s technical framework. The input to Rammer is a data-flow graph where a node is an rOperator. Then, Rammer introduces rTask-aware DFG compiler to manage the inter and intra-operator scheduling in one place. The rTask-aware DFG compiler will generate a static execution plan for runtime execution. Rammer abstracts a hardware accelerator as a virtualized parallel device (vDevice), which includes multiple virtualized execution units (vEUs). The vDevice provides the scheduling and synchronization capabilities at the rTask level so that the rProgram can be mapped to the corresponding vEUs at compile time. The vEUs, together with the vDevice will be mapped to the hardware at runtime. — Figure 2: Rammer’s technical framework

In other words, Rammer generates an efficient static spatiotemporal schedule for DNNs ahead of time (during compilation), minimizing runtime scheduling overhead. Meanwhile, through new hardware-independent abstractions for computing tasks and hardware accelerators, Rammer exposes a larger scheduling space and provides a novel way to implement cooperative intra- and inter-operator scheduling. This allows Rammer to find more efficient schedules, thereby greatly improving hardware utilization.

Researchers evaluated Rammer on multiple devices, including NVIDIA GPUs, AMD GPUs, and Graphcore intelligence processing units (IPUs). Experiments have shown that Rammer significantly outperforms state-of-the-art compilers, such as XLA and TVM, on NVIDIA and AMD GPUs, achieving a speedup of up to 20.1 times. And compared to TensorRT, NVIDIA’s proprietary DNN inference library, Rammer achieves a speedup of up to 3.1 times.

AI compilation “Roller” improves compilation efficiency

An accelerator is equipped with parallel computing units and multiple layers of memory hierarchy. The data needs to be passed upwards layer by layer from the bottom memory layer before computation. At each layer, the data is divided into smaller bricks. Eventually, these smaller bricks are handed over to the top-level processor for computation. The challenge lies in how to partition the data and fill the memory space with large bricks, so as to better utilize available memory and improve efficiency. The current approach involves using machine learning to identify better strategies for partitioning these bricks. However, this typically requires thousands of search steps, each of which is evaluated on the accelerator, in order to find a satisfactory solution. As a result, the process can take days or even weeks to compile a full AI model.

Given the computational logic and the specification of each memory layer, which present a holistic view on the software and hardware information, it is possible to formulate the best strategy for partitioning the bricks, as well as the best brick sizes. This enables faster compilation with good computation efficiency. And it is the key idea behind Roller. Like a road roller, the system lays down high-dimensional tensor data onto two-dimensional memory like tiling a floor, finding the optimal tile sizes given the memory characteristics. At the same time, it encapsulates the tensor shape that aligns with the hardware characteristics of the underlying accelerator, achieving efficient compilation by limiting the choices for shapes.

A schematic diagram illustrating Roller’s technical framework. Roller takes an operator described as a tensor expression. Roller extracts the tensor shapes from the tensor expression and leverage hardware specifications to construct rTiles. Based on rTiles, Roller proposes a scale-up-then-scale-out recursive construction algorithm to generate efficient tensor programs (named rProgram) that describes the data processing pipeline. When generating rProgram, the construction algorithm identifies good rTile configurations by evaluating the performance of a constructed rProgram through a micro-performance model. It is built on top a device described through a hardware abstraction layer exposing only rTile-related interfaces: Load, Compute, and Store. The constructed rProgram is finally realized through a code generator to emit the final kernel code corresponding to the specific device. — Figure 3: Roller’s technical framework

Evaluations on six mainstream DNN models and 119 popular DNN operators demonstrated that Roller can generate highly optimized kernels in seconds, especially for large and expensive custom operators. Roller achieves a three-orders-of-magnitude improvement in compilation time compared to existing compilers. The performance of the kernels generated by Roller is comparable to that of state-of-the-art tensor compilers, including DNN libraries, with some operators performing even better. Roller has also been used in customizing DNN kernels internally, which has demonstrated its real improvement in development agility.

AI compilation “Welder” optimizes memory access and improves computing efficiency

With the growing demand for processing higher fidelity data and the use of faster computing cores in newer hardware accelerators, modern DNN models are becoming increasingly memory intensive. A disparity between underutilized computing cores and saturated memory bandwidth has been observed in various popular DNN models.

For example, profiling on a state-of-the-art DNN benchmark shows that the memory bandwidth utilization can be as high as 96.7% while the average utilization of computing cores is only 51.6%. Even more seriously, the continuous evolution of hardware and DNN models continues to increase this gap. Modern AI models tend to process high-fidelity data, such as larger images, longer sentences, and higher-resolution graphics. Such data demands higher memory bandwidth during computation. Additionally, the introduction of more efficient specialized computing cores (such as NVIDIA Tensor Cores or AMD Matrix Cores) further increases memory pressure.

To address this issue, the researchers proposed the Welder deep learning compiler, which holistically optimizes the memory access efficiency of the end-to-end DNN model. Represented as a data flow graph, the end-to-end DNN computation involves multiple stages, where the input data is divided into blocks that flow through different operators. These blocks are transferred to processor cores for computation and then transferred back to memory. This results in significant overhead due to data movement across memory layers. Since it includes multiple stages, the entire process can be envisioned as a scenario where “workers” are moving bricks upwards layer by layer. The first worker takes the bricks up, processes them, and then puts them back in their original location. The second worker takes them up again, sculpts them, and then once again puts them back. The process continues with the third worker, the fourth worker, and so on, repeatedly moving the bricks. However, this leads to significant overhead. Would it be possible for the first worker to finish a part of the subtask and then directly hand it over to the next worker at the top level? These tasks can then be “welded” together to achieve a pipelined operation with higher efficiency. Welder plays the role of such a welding tool. By connecting (welding) different operators, data blocks are processed in the manner of an assembly line, greatly reducing memory access traffic at lower-level memory layers. With AI models imposing increasingly high requirements for memory efficiency in recent years, Welder helps to significantly improve computational efficiency.

A schematic diagram illustrating Welder’s technical framework. Welder takes a full DNN model as input and converts it into a data-flow graph of tile-based computing tasks, which is called tile-graph. Then, a two-step scheduling algorithm, i.e., graph connecting and sub-graph scheduling, is proposed to recursively decide an efficient tile-graph execution plan for multiple memory layers, known as a hierarchical tile-graph. Finally, this plan is then mapped to an executable code for a specific hardware accelerator using four abstracted computing interfaces defined in the hardware layer. — Figure 4: Welder’s technical framework

Evaluations on 10 mainstream DNN models, (including classic and the latest AI model structures for various tasks, such as vision, natural language processing, 3D graphics, etc.), demonstrated that Welder significantly exceeds the performance of existing mainstream frameworks and compilers on both NVIDIA and AMD GPUs. For example, it outperforms PyTorch, ONNXRuntime, and Ansor by up to 21.4 times, 8.7 times, and 2.8 times, respectively. Welder’s automatic optimization surpasses even TensorRT and Faster Transformer (a hand-crafted library), achieving speedups of up to 3.0 times and 1.7 times, respectively. Furthermore, when running these models on hardware with faster computing cores such as TensorCore, performance is improved even more, underscoring the significance of memory optimization for future AI accelerators.

AI compilation “Grinder” allows efficient control flow execution on accelerators

In AI computation, the movement of data blocks sometimes requires more complex control logic, i.e., control flow code. For example, a program could iteratively traverse each word in a sentence or dynamically determine which part of a program to execute based on input. Currently, most AI compilers focus on addressing data flow execution efficiency and do not provide efficient support for control flow. As a result, models with more complex control flow cannot effectively utilize accelerator performance. The researchers realized that control flow and data flow can be segmented and reorganized in order to execute more efficiently. Their solution is Grinder, which acts like a portable grinding and cutting machine. After cutting the data flow into parallel computing blocks of different sizes, it then integrates (grinds) control flow into data flow, so that control flow can also be executed efficiently on the accelerator.

A schematic diagram illustrating Grinder’s technical framework. The example loop structure is scheduled as a uProgram mapped on the 3-level accelerator. The uProgram consists of 4 loop-uTasks for 4 L1-Units resepectively and each loop-uTask is mapped to a L1-Unit for execution. Both the data flow operators and the loop are scheduled into the loop-uTasks. — Figure 5: Grinder’s technical framework

Grinder can jointly optimize the execution of control flow and data flow on hardware accelerators and unify the representation of AI models, including both control flow and data flow, through uTask, a new abstraction. This allows Grinder to expose the overall scheduling space for rescheduling control flow to lower levels of hardware parallelism. Grinder uses a heuristic strategy to find an effective scheduling scheme and can automatically move control flow into device kernels, thereby achieving optimizations across control flow boundaries. Experiments have shown that Grinder can achieve up to an 8.2x speedup on control flow-intensive DNN models, making it the fastest among DNN frameworks and compilers for control flow.

These four AI compilers, based on a common compiler abstraction and unified intermediate representation (IR), solve multiple fundamental problems in current AI compilers, including parallelism, compilation efficiency, memory, and control flow. Together they constitute a comprehensive set of solutions for compilation. and have played an important role in the customization and optimization of new AI models within Microsoft Research.

Jilong Xue, Principal Researcher at MSR Asia, summed up the project this way:

“On one hand, AI compilers must perform extreme optimizations like operator fusion and kernel specialization tailored for hardware resources. On the other hand, they must also provide systematic compilation support for new, large-scale hardware architectures, such as AI chips featuring on-chip network interconnection (NoC) or hybrid memory architectures, and even guiding hardware design using white-box compilation technologies. The AI compilers we developed have demonstrated a substantial improvement in AI compilation efficiency, thereby facilitating the training and deployment of AI models. At the same time, the evolution of large-scale models also presents opportunities for the next generation AI compiler. In the future, these large-scale models themselves may inherently assist in achieving optimization and compilation.”

The following researchers have contributed to this project:

(In alphabetical order) Wei Cui, Yuxiao Guo, Wenxiang Hu, Lingxiao Ma, Youshan Miao, Ziming Miao, Yuqing Xia, Jilong Xue, Fan Yang, Mao Yang, Lidong Zhou

[1] Grinder is the research project name. However, this system is referred to as Cocktailer in the paper.

The post Building a “heavy metal quartet” of AI compilers appeared first on Microsoft Research.

Research Focus: Week of August 28, 2023

August 30, 2023

by Alyssa Hughes Microsoft AI

Microsoft Research Focus 23 | Week of August 28, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

An illusion of predictability in scientific results: Even experts confuse inferential uncertainty and outcome variability

In many fields, practitioners focus on inference (precisely estimating an unknown quantity, such as a population average) instead of prediction (forecasting individual outcomes). In a newly published article, researchers from Microsoft demonstrate that this focus on inference over prediction can mislead readers into thinking that the results of scientific studies are more definitive than they actually are.

Through a series of randomized experiments, the researchers demonstrate that this confusion arises for one of the most basic ways of presenting statistical findings and affects even experts whose jobs involve producing and interpreting such results, including medical professionals, data scientists, and tenure-track faculty. In contrast, the paper shows that communicating both inferential and predictive information side by side provides a simple and effective alternative, leading to calibrated interpretations of scientific results.

This article was published in the Proceedings of the National Academy of Sciences (PNAS).

Read the article

Data and code

NEW RESEARCH

FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations

Contrastive learning is a powerful method for unsupervised graph representation learning. It is typically deployed on homophilic tasks, where task labels strongly correlate with the graph’s structure. However, these representations struggle when dealing with heterophilic tasks, where edges tend to connect nodes with different labels.

Several papers have tackled the problem of heterophily by leveraging information from both low and high frequency components. Yet these methods operate in semi-supervised settings, and the extension of these ideas in unsupervised learning still needs to be explored.

In a new paper: FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations, researchers from Microsoft propose using filter banks for learning representations that can cater to both heterophilic and homophilic tasks. They address the related computational and storage burdens by sharing the encoder across these various filter views, and by learning a low-dimensional representation which is projected to high dimensions using Random Fourier Features. FiGURe achieves a gain of up to 4.4%, compared to the state-of-the-art unsupervised models, across all datasets in consideration, both homophilic and heterophilic.

Read the paper

View the code

AWARD

Kathleen Sullivan named to Insider’s 30 under 40 in healthcare list

Microsoft Research congratulates Kathleen Sullivan (opens in new tab) for being named to Insider’s list of 30 under 40 forging a new future in healthcare (opens in new tab). After a competitive nomination and interview, Kathleen was selected for this inspiring list of “entrepreneurs, scientists, doctors, and business leaders who are transforming the healthcare industry.”

As senior director of strategy and operations within the health and life sciences division of Microsoft Research, Sullivan helps steer the company’s investments in AI. She helped engineer a Microsoft collaboration with Nuance Technologies–a precursor to Microsoft’s acquisition of Nuance in 2021. In 2018, Sullivan helped secure Microsoft’s partnership with Adaptive Biotechnologies to map the human immune system (opens in new tab).

Read the Insider article (opens in new tab)
(subscription required)

The post Research Focus: Week of August 28, 2023 appeared first on Microsoft Research.

Using AI for tiered cloud platform operation

August 29, 2023

by Alyssa Hughes Microsoft AI

Tiered AIOps. Incidents not resolved by a tier get escalated to the next one. As upper tiers solve these incidents, this knowledge propagates to the previous tiers to improve their coverage with new models and labeled data.

Read part 1

Read part 2

Read part 3

Cloud Intelligence/AIOps blog series, part 4

In the previous posts in this series, we introduced our research vision for Cloud Intelligence/AIOps (part 1) and how advanced AI can help design, build, and manage large-scale cloud platforms effectively and efficiently; we looked at solutions that are making many aspects of cloud operations more autonomous and proactive (part 2); and we discussed an important aspect of cloud management: RL-based tuning of application configuration parameters (part 3). In this post, we focus on the broader challenges of autonomously managing the entire cloud platform.

In an ideal world, almost all operations of a large-scale cloud platform would be autonomous, and the platform would always be at, or converging to, the operators’ desired state. However, this is not possible for a variety of reasons. Cloud applications and infrastructure are incredibly complex, and they change too much, too fast. For the foreseeable future, there will continue to be problems that are novel and/or too complex for automated solutions, no matter how intelligent, to address. These may arise due to complex cascading or unlikely simultaneous failures, unexpected interactions between components, challenging (or malicious) changes in workloads such as the rapid increase in traffic due to the COVID pandemic, or even external factors such as the need to reduce power usage in a particular region.

At the same time, rapid advances in machine learning and AI are enabling an increase in the automation of several aspects of cloud operations. Our second post in this series listed a number of these, including detection or problematic deployments, fault localization, log parsing, diagnosis of failures, prediction of capacity, and optimized container reallocation.

Figure 1: Stages of evolution toward Tiered AIOps

To reconcile these two realities, we introduce the concept of Tiered AIOps. The idea is to separate systems and issues into tiers of different levels of automation and human intervention. This separation comes in stages (Figure 1). The first stage has only two tiers: one where AI progressively automates routine operations and can mitigate and solve simple incidents without a human in the loop, and a second tier where expert human operators manage the long tail of incidents and scenarios that the AI systems cannot handle. As the AI in the first tier becomes more powerful, the same number of experts can manage larger and more complex cloud systems. However, this is not enough.

New AI tools enable a final, even more scalable stage, where human expertise can also be separated into two tiers. In this stage, the middle tier involves situations and problems that the AI in the first level cannot handle, but which can be solved by non-expert, generalist human operators. AI in this second tier helps these operators manage the platform by lowering the level of expertise needed to respond to incidents. For example, the AI could automatically localize the source of an incident, recommend mitigation actions, and provide risk estimates and explanations to help operators reason about the best mitigating action to take. Finally, the last tier relies on expert engineers for complex and high-impact incidents that automated systems and generalists are unable to solve. In other words, we have the following tiers (Figure 2):

Tier 1: Fully autonomous platform operation. Automates what can be learned or predicted. Includes intelligent and proactive systems to prevent failures and resolution of incidents that follow patterns of past incidents.
Tier 2: Infrastructure for non-expert operators to manage systems and incidents. Receives context from events and incidents that are not handled in the first tier. AI systems provide context, summaries, and mitigation recommendations to generalist operators.
Tier 3: Infrastructure for experts to manage systems and incidents that are novel or highly complex. Receives context from events and incidents not handled in the first two tiers. Can enable experts to interact and manage a remote installation.

There are two types of AI systems involved: first, those that enable increasing levels of automation in the first and second tiers, and; second, the AI systems (different types of co-pilots) that assist operators. It is the latter type that enables the division between the second and third tiers, and also reduces the risk of imperfect or incomplete systems in the first tier. This separation between the top two tiers is also crucial for the operation of air-gapped clouds and makes it more feasible to deploy new datacenters in locations where there might not be the same level of expert operators.

The key idea in the Tiered AIOps concept is to simultaneously expand automation and increase the number of incidents that can be handled by the first tier, while recognizing that all three tiers are critical. The research agenda is to build systems and models to support automation and incident response in all three tiers.

Escalating incidents. Each tier must have safeguards to (automatically or not) escalate an issue to the next tier. For example, when the first tier detects that there is insufficient data, or that the confidence (risk) in a prediction is lower (higher) than a threshold, it should escalate, with the right context, to the next tier.

Migrating learnings. On the other hand, over time and with gained experience (which can be encoded in troubleshooting guides, new monitors, AI models, or better training data), repeated incidents and operations migrate toward the lower tiers, allowing operators to allocate costly expertise to highly complex and impactful incidents and decisions.

Figure 3: Performance and power of the SmartOverclock agent, showing near peak performance at significantly less power.

We will now discuss some work on extending the first tier with on-node learning, how to use new AI systems (large language models or simply LLMs) to assist operators in mitigating incidents and to move toward enabling the second tier, and, finally, how the third tier enables air-gapped clouds.

Tier 1: On-node learning

Managing a cloud platform requires control loops and agents with many different granularities and time scales. Some agents need to be located on individual nodes, either because of latency requirements of the decisions they make, or because they depend on telemetry that is too fine-grained and large to leave the node. Examples of these agents include configuration (credentials, firewalls, operating system updates), services like virtual machine (VM) creation, monitoring and logging, watchdogs, resource controls (e.g., power, memory, or CPU allocation), and access daemons.

Any agent that can use data about current workload characteristics or system state to guide dynamic adjustment of their behavior can potentially take advantage of machine learning (ML). However, current ML solutions such as Resource Central (SOSP’17) require data and decisions to run in a dedicated service outside of the server nodes. The problem is that for some agents this is not feasible, as they either have to make fast decisions or require data that cannot leave the node.

In SOL: Safe On-Node Learning in Cloud Platforms (ASPLOS’22), we proposed a framework that allows local agents to use modern ML techniques in a safe, robust, and effective way. We identified three classes of local agents that can benefit from ML. First, agents that assign resources (CPU, memory, power) benefit from near real-time workload information. Making these decisions quickly and with fine-grained telemetry enables better assignments with smaller impact to customer quality of service (QoS). Second, monitoring and logging agents, which must run on each node, can benefit from online learning algorithms, such as multi-armed bandits to smartly decide which telemetry data to sample and at which frequency, while staying within a sampling budget. Lastly, watchdogs, which monitor for metrics that indicate failures, can benefit from learning algorithms to detect problems and take mitigating actions sooner, as well as detect and diagnose more complex problems that simpler systems would not detect.

SOL makes it easy to integrate protections against invalid data, inaccurate or drifting AI models, and delayed predictions, and to add safeguards in the actions the models can take, through a simple interface. As examples, we developed agents to do CPU overclocking, CPU harvesting, and memory page hotness classification. In our experiments (Figure 3), the overclocking agent, for example, achieved near-peak normalized performance for different workloads, at nearly half of the power draw, while responding well to many failure conditions in the monitoring itself. See our paper for more details.

Publication

SOL: Safe On-Node Learning in Cloud Platforms

Tier 2: Incident similarity and mitigation with LLMs

As an example of how AI systems can enable the second tier, we are exploring how LLMs can help in mitigating and finding the root cause of incidents in cloud operations. When an incident happens in a cloud system, either generated by automated alarms or by customer-reported issues, a team of one or more on-call engineers must quickly find ways to mitigate the incident (resolving the symptoms), and then find the cause of the incident for a permanent fix and to avoid the incident in the future.

There are many steps involved in this process, and they are highly variable. There is also context that relates to the incident, which grows as both automated systems and on-call engineers perform tests, look at logs, and go through a cycle of forming, testing, and validating hypotheses. We are investigating using LLMs to help with several of these steps, including automatically generating summaries of the cumulative status of an incident, finding similar incidents in the database of past incidents, and proposing mitigation steps based on these similar incidents. There is also an ever-growing library of internal troubleshooting guides (TSGs) created by engineers, together with internal and external documentation on the systems involved. We are using LLMs to extract and summarize information from these combined sources in a way that is relevant to the on-call engineer.

We are also using LLMs to find the root cause of incidents. In a recent paper published in ISCE (2023), the Microsoft 365 Systems Innovation research group demonstrated the usefulness of LLMs in determining the root cause of incidents from the title and summary of the incident. In a survey conducted as part of the work, more than 70% of the on-call engineers gave a rating of 3 out of 5 or better on the usefulness of the recommendations in a real-time incident resolution setting.

There is still enormous untapped potential in using these methods, along with some interesting challenges. In aggregate, these efforts are a great step toward the foundation for the second tier in our vision. They can assist on-call engineers, enable junior engineers to be much more effective in handling more incidents, reduce the time to mitigation, and, finally, give room for the most expert engineers to work on the third tier, focusing on complex, atypical, and novel incidents.

Tier 3: Air-gapped clouds

We now turn to an example where the separation between the second and third tiers could enable significantly simplified operations. Air-gapped datacenters, characterized by their isolated nature and restricted access, provide a secure environment for managing sensitive data while prioritizing privacy. In such datacenters, direct access is limited and highly controlled, being operated locally by authorized employees, ensuring that data is handled with utmost care and confidentiality. However, this level of isolation also presents unique challenges when it comes to managing the day-to-day operations and addressing potential issues, as Microsoft’s expert operators do not have physical or direct access to the infrastructure.

In such an environment, future tiered AIOps could improve operations, while maintaining the strict data and communication isolation requirements. The first tier would play a critical role by significantly reducing the occurrence of incidents through the implementation of automated operations. However, the second and third tiers would be equally vital. The second tier would empower local operators on-site to address most issues that the first tier cannot. Even with AI assistance, there would be instances requiring additional expertise beyond that which is available locally. Unfortunately, the experts in the third tier may not even have access to remote desktops, or to the results of queries or commands. LLMs would serve a crucial role here, as they could become an ideal intermediary between tiers 2 and 3, sharing high-level descriptions of problems without sending sensitive information.

Figure 4: LLM-intermediated communication between remote experts (Tier 3) and generalist operators (Tier 2) to solve problems in an air-gapped datacenter.

In an interactive session (Figure 4), an LLM with access to the air-gapped datacenter systems could summarize and sanitize the problem description in natural language (①). A remote expert in Tier 3 would then formulate hypotheses and send high-level instructions in natural language for more investigation or for mitigation (②). The LLM could use the high-level instructions to form a specialized plan. For example, it could query devices with a knowledge of the datacenter topology that the expert does not have; interpret, summarize, and sanitize the results (with or without the help of the generalist, on-site operators) (③); and send the interpretation of the results back to the experts, again in natural language (④). Depending on the problem, this cycle could repeat until the problem is solved (⑤). Crucially, while the operators at the air-gapped cloud would be in the loop, they wouldn’t need deep expertise in all systems to perform the required actions and interpret the results.

Conclusion

Cloud platform operators have seen massive, continuous growth in scale. To remain competitive and viable, we must decouple the scaling of human support operations from this growth. AI offers great hope in increasing automation of platform management, but because of constant change in the systems, environment, and demands, there will likely always be decisions and incidents requiring expert human input. In this post, we described our vision of Tiered AIOps as the way to enable and achieve this decoupling and maximize the effectiveness of both AI tools and human expertise.

The post Using AI for tiered cloud platform operation appeared first on Microsoft Research.

NEW RESEARCH

PolySem: Efficient Polyglot Analytics on Semantic Data

AI Frontiers: The future of causal reasoning with Emre Kiciman and Amit Sharma

NEW RESOURCE

Generative retrieval for conversational question answering

NEW RESOURCE

BatteryML: An open-source tool for machine learning on battery degradation

Episode 148 | September 13, 2023

Subscribe to the Microsoft Research Podcast:

Transcript

Fully in-place functional programming avoids allocation

Collaborators: Holoportation communication technology with Spencer Fowers and Kwame Darko

Tree traversals and zippers

Perspectives and further reading

Representation of gender, race, and age across occupations and personality traits

Collaborators: Holoportation communication technology with Spencer Fowers and Kwame Darko

Representation of geographical locations in everyday scenarios

Subtle effects of using expanded prompts

Safeguarding against bias in T2I models

Substructure extraction and modeling

AI Explainer: Foundation models ​and the next era of AI

Retrosynthesis prediction

Incorporating human insights into decision-making

Communication in the age of AI

Elements of accountability

Reputation system for online accountability

AI Frontiers: Models and Systems with Ece Kamar

Call to action

Episode 146 | August 31, 2023

Learn more:

Subscribe to the Microsoft Research Podcast:

Transcript

AI Frontiers: The future of causal reasoning with Emre Kiciman and Amit Sharma

AI compilation “Rammer” improves hardware parallel utilization

AI compilation “Roller” improves compilation efficiency

AI compilation “Welder” optimizes memory access and improves computing efficiency

AI compilation “Grinder” allows efficient control flow execution on accelerators

NEW RESEARCH

An illusion of predictability in scientific results: Even experts confuse inferential uncertainty and outcome variability

AI and Microsoft Research

NEW RESEARCH

FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations

AWARD

Kathleen Sullivan named to Insider’s 30 under 40 in healthcare list

Cloud Intelligence/AIOps blog series, part 4

Microsoft Research Newsletter

Tier 1: On-node learning

Tier 2: Incident similarity and mitigation with LLMs

Tier 3: Air-gapped clouds

Conclusion

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.

AI Explainer: Foundation models and the next era of AI