GraphRAG: Unlocking LLM discovery on narrative private data


Perhaps the greatest challenge – and opportunity – of LLMs is extending their powerful capabilities to solve problems beyond the data on which they have been trained, and to achieve comparable results with data the LLM has never seen.  This opens new possibilities in data investigation, such as identifying themes and semantic concepts in a dataset, with context and grounding.  In this post, we introduce GraphRAG, created by Microsoft Research, as a significant advance in enhancing the capability of LLMs.

Retrieval-Augmented Generation (RAG) is a technique that searches for information relevant to a user query and provides the results as reference material for the AI-generated answer. RAG is an important part of most LLM-based tools, and the majority of RAG approaches use vector similarity as the search technique. GraphRAG instead uses LLM-generated knowledge graphs to provide substantial improvements in question-and-answer performance when conducting document analysis of complex information.  This builds upon our recent research, which points to the power of prompt augmentation when performing discovery on private datasets. Here, we define a private dataset as data that the LLM was not trained on and has never seen before, such as an enterprise’s proprietary research, business documents, or communications. Baseline RAG[1] was created to help solve this problem, but we observe situations where baseline RAG performs very poorly. For example:

  • Baseline RAG struggles to connect the dots.  This happens when answering a question requires traversing disparate pieces of information through their shared attributes in order to provide new synthesized insights.
  • Baseline RAG performs poorly when being asked to holistically understand summarized semantic concepts over large data collections or even singular large documents.

To address this, the tech community is working to develop methods that extend and enhance RAG (e.g., LlamaIndex).  Microsoft Research’s new approach, GraphRAG, uses the LLM to create a knowledge graph based on the private dataset.  This graph is then used alongside graph machine learning to perform prompt augmentation at query time.  GraphRAG shows substantial improvement in answering the two classes of questions described above, demonstrating intelligence or mastery that outperforms other approaches previously applied to private datasets.

Applying RAG to private datasets

To demonstrate the effectiveness of GraphRAG, let’s start with an investigation using the Violent Incident Information from News Articles (VIINA) dataset.  This dataset was chosen due to its complexity and the presence of differing opinions and partial information.  It is a messy real-world test case that was recent enough not to be included in the LLM base model’s training.

For this research, we use thousands of news articles from both Russian and Ukrainian news sources for the month of June 2023, translated into English, to create a private dataset on which we will perform our LLM-based retrieval.  The dataset is far too large to fit into an LLM context window, thus demanding a RAG approach.

We start with an exploratory query, which we pose to both a baseline RAG system and to our new approach, GraphRAG:

Query: “What is Novorossiya?”

Baseline RAG:

The term “Novorossiya” is not mentioned in the provided context. However, to provide some information, Novorossiya, or New Russia, is a historical term referring to a region in the south of Ukraine that was conquered by the Russian Empire in the 18th century. In recent years, the term has been revived by Russian-backed separatists in Eastern Ukraine to refer to the territories they claim, but it is not officially recognized as a separate entity by international bodies.

GraphRAG:

Novorossiya, also known as New Russia, is a term with historical and contemporary significance. Historically, it referred to a region in the southern part of the Russian Empire that is now part of Ukraine. In the contemporary context, it has been used by pro-Russian separatists to refer to certain areas of Ukraine [Data: Entities (6494, 912, 6493)].

The term Novorossiya has been used in the context of a political movement in Ukraine, particularly by Russian nationalists. This movement has been associated with efforts to destabilize Ukraine and create a so-called “Novorossiya” [Data: Entities (912); Relationships (15211, 15206)]. The term has also been used to denote a separatist state in Ukraine, known as the Federal Republic of Novorossiya (FRN) [Data: Entities (7690)].

In these results, we can see both systems perform well – highlighting a class of query on which baseline RAG performs well.  Let’s try a query that requires connecting the dots:

Query: “What has Novorossiya done?”

Baseline RAG:

The text does not provide specific information on what Novorossiya has done.

GraphRAG:

Novorossiya, a political movement in Ukraine, has been involved in a series of destructive activities, particularly targeting various entities in Ukraine [Entities (6494, 912)]. The movement has been linked to plans to destroy properties of several Ukrainian entities, including Rosen, the Odessa Canning Factory, the Odessa Regional Radio Television Transmission Center, and the National Television Company of Ukraine [Relationships (15207, 15208, 15209, 15210)]. In addition to these, Novorossiya has also been associated with plans to destroy properties of Roshen, the Odessa Conservatory Factory, the Odesa Regional Radio Television Transmission Center, and the National Telecommunications Company of Ukraine [Relationships (8329, 8330, 8331, 8332)]. Furthermore, Novorossiya has been implicated in plans to blow up ATMs of PrivatBank, the largest state bank in Ukraine [Relationships (8335, 8335)].

The Office of the General Prosecutor in Ukraine has reported on the creation of Novorossiya, indicating the government’s awareness and potential concern over the activities of this movement [Relationships (8264, 8264)]. The destructive activities of Novorossiya have had significant impacts on the targeted entities and have likely contributed to the ongoing tensions and conflicts in the region.

In summary, Novorossiya has been involved in a series of destructive activities targeting various entities in Ukraine. These activities have had significant impacts on the targeted entities and have likely contributed to the ongoing tensions and conflicts in the region. The Ukrainian government, through the Office of the General Prosecutor, has acknowledged the existence and activities of Novorossiya, indicating a level of concern over the movement’s actions.

Baseline RAG fails to answer this question.  Looking at the source documents inserted into the context window (Figure 1), none of the text segments discuss Novorossiya, resulting in this failure.

Figure 1: Baseline RAG retrieved context – a table of ten text chunks retrieved from the VIINA dataset; each mentions a news event in Ukraine or Russia, and none includes the term “Novorossiya.”

In comparison, the GraphRAG approach discovered an entity in the query, Novorossiya.  This allows the LLM to ground itself in the graph and results in a superior answer that contains provenance through links to the original supporting text.  For example, Figure 2 below shows the exact content the LLM used for the LLM-generated statement, “Novorossiya has been implicated in plans to blow up ATMs.” We see the snippet from the raw source documents (after English translation) that the LLM used to support the assertion that a specific bank was a target for Novorossiya via the relationship that exists between the two entities in the graph. 

Figure 2: GraphRAG provenance – the VIINA source text (source “interfaxua,” June 8, 2023) used to ground the connection between Novorossiya and PrivatBank, with the strings “Novorossiya” and “criminal blew up buildings of military commissariats, ATMs” highlighted.

By using the LLM-generated knowledge graph, GraphRAG vastly improves the “retrieval” portion of RAG, populating the context window with higher relevance content, resulting in better answers and capturing evidence provenance. 

Being able to trust and verify LLM-generated results is always important.  We care that the results are factually correct, coherent, and accurately represent content found in the source material. GraphRAG provides the provenance, or source grounding information, as it generates each response.  It demonstrates that an answer is grounded in the dataset.  Having the cited source for each assertion readily available also enables a human user to quickly and accurately audit the LLM’s output directly against the original source material.   

However, this isn’t all that’s possible using GraphRAG. 

Whole dataset reasoning 

Baseline RAG struggles with queries that require aggregation of information across the dataset to compose an answer. Queries such as “What are the top 5 themes in the data?” perform terribly because baseline RAG relies on a vector search of semantically similar text content within the dataset. There is nothing in the query to direct it to the correct information. 
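To see why, consider what a minimal baseline-RAG retriever does. The sketch below is illustrative only: the embedding model, chunk store, and helper names are placeholders, not the configuration used in this comparison.

import numpy as np

def top_k_chunks(query_vec, chunk_vecs, chunks, k=10):
    # Cosine similarity between the query embedding and every chunk embedding.
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    best = np.argsort(-sims)[:k]
    return [chunks[i] for i in best]

# The retrieved chunks are pasted into the LLM context window as-is. Nothing in
# a query like "What are the top 5 themes in the data?" points this similarity
# search at the right material, which is why such queries perform poorly.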

However, with GraphRAG we can answer such questions, because the structure of the LLM-generated knowledge graph tells us about the structure (and thus themes) of the dataset as a whole.  This allows the private dataset to be organized into meaningful semantic clusters that are pre-summarized.  The LLM uses these clusters to summarize these themes when responding to a user query. 
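A minimal sketch of this query-time use of pre-summarized clusters is shown below. The llm() helper, the prompts, and the report format are illustrative assumptions, not GraphRAG’s actual implementation.

def answer_from_community_reports(question, community_reports, llm):
    # Map step: ask what each pre-summarized semantic cluster contributes.
    partial_answers = [
        llm(f"Community report:\n{report}\n\nWhat does this report contribute to "
            f"answering: {question}?")
        for report in community_reports
    ]
    # Reduce step: combine the partial answers into one response, so the model
    # can reason over themes of the whole dataset without seeing every document.
    return llm(
        "Combine the following partial answers into a single response, keeping "
        "report references for provenance:\n\n" + "\n\n".join(partial_answers) +
        f"\n\nQuestion: {question}"
    )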

We illustrate whole-dataset reasoning abilities by posing the following question to the two systems: 

Query: “What are the top 5 themes in the data?”

Baseline RAG:

1. Improving the quality of life in cities and urban development. 
2. The meeting between Vladimir Putin and the President of the United Arab Emirates, Mohammed bin Zayed Al Nahyan. 
3. The National Rating of the Investment Climate in the Regions of Russia. 
4. The state of the Russian economy and the growth of domestic brands. 
5. The discussion on the meaning of life and the role of science in understanding it. 

GraphRAG:

The data presents a wide range of themes, but the top five most prevalent themes can be identified as follows:

1. Conflict and Military Activity: A significant portion of the data revolves around the ongoing conflict in Ukraine, with numerous reports detailing military activities, occupations, and attacks by various entities. This theme includes the actions of the Ukrainian military, Russian forces, and various other military units and entities [Data: Reports (513, 241, 151, 493, 343, +more)]. 

2. Political and Governmental Entities: Many reports focus on political figures and governmental entities, highlighting their roles, relationships, and actions in the context of the current situation. This includes entities like the Kremlin, the President of Ukraine, the US State Department, and various regional administrations [Data: Reports (98, 493, 265, 129, 196, +more)]. 

3. Infrastructure and Environmental Concerns: Several reports highlight potential environmental impacts and infrastructure threats. These include potential environmental damage due to industrial activities, flooding crises, and threats to infrastructure due to military conflicts [Data: Reports (308, 154, 201, 348, 216, +more)]. 

4. Community Analysis and Threat Assessment: Several reports provide detailed analyses of specific communities, often centered around a particular location or entity. These analyses typically discuss the key entities within the community, their relationships, and any associated threats. Many reports also include assessments of potential threats associated with the entities or situations they discuss [Data: Reports (323, 596, 518, 583, 179, +more)]. 

5. Health, Humanitarian Concerns, and Emergency Situations: Some reports focus on health and humanitarian issues, such as the Shereshevsky-Turner Syndrome community and the potential risk of intestinal diseases in the Odessa region. The data also covers emergency situations, including natural disasters, accidents, and attacks [Data: Reports (204, 149, 372, 154, 532, +more)]. 
These themes provide a comprehensive overview of the current situation, highlighting the complexity and interconnectedness of the issues at hand. 

Looking at the results from baseline RAG, we see that none of the listed themes has much to do with the war between the two countries.  As anticipated, the vector search retrieved irrelevant text, which was inserted into the LLM’s context window.  Results that were included were likely keying on the word “theme,” resulting in a less than useful assessment of what is going on in the dataset. 

Observing the results from GraphRAG, we can clearly see that the results are far more aligned with what is going on in the dataset as a whole.  The answer provides the five main themes as well as supporting details that are observed in the dataset.  The referenced reports are pre-generated by the LLM for each semantic cluster in GraphRAG and, in turn, provide provenance back to original source material.

Creating LLM-generated knowledge graphs

We note the basic flow that underpins GraphRAG, which builds upon our prior research and repositories using graph machine learning (a schematic sketch of this flow follows the list):

  • The LLM processes the entire private dataset, creating references to all entities and relationships within the source data, which are then used to create an LLM-generated knowledge graph. 
  • This graph is then used to create a bottom-up clustering that organizes the data hierarchically into semantic clusters (indicated by using color in Figure 3 below).  This partitioning allows for pre-summarization of semantic concepts and themes, which aids in holistic understanding of the dataset. 
  • At query time, both of these structures are used to provide materials for the LLM context window when answering a question. 
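The schematic below sketches that flow. It is a simplification under stated assumptions: the llm() helper, the extraction prompt, and the detect_communities() function (e.g., a Leiden/Louvain-style algorithm) are placeholders, not the GraphRAG implementation.

import networkx as nx

def build_index(chunks, llm, detect_communities):
    graph = nx.Graph()
    # 1. LLM pass over the private dataset: extract entities and relationships.
    for chunk in chunks:
        triples = llm(f"List (entity, relation, entity) triples found in:\n{chunk}")
        for source, relation, target in triples:
            graph.add_edge(source, target, relation=relation, source_text=chunk)
    # 2. Bottom-up clustering organizes the graph into semantic communities,
    #    each of which is pre-summarized into a report.
    communities = detect_communities(graph)
    reports = {
        cid: llm(f"Summarize this group of related entities and relationships:\n{nodes}")
        for cid, nodes in communities.items()
    }
    # 3. Both structures are kept for query time: the graph for entity-centred
    #    questions, the community reports for whole-dataset questions.
    return graph, reports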

An example visualization of the graph is shown in Figure 3.  Each circle is an entity (e.g., a person, place, or organization), with the entity size representing the number of relationships that entity has, and the color representing groupings of similar entities.  The color partitioning is a bottom-up clustering method built on top of the graph structure, which enables us to answer questions at varying levels of abstraction.

Figure 3: LLM-generated knowledge graph built from a private dataset using GPT-4 Turbo – each circle is an entity, sized by its number of relationships and colored by semantic community.

Result metrics

The illustrative examples above are representative of GraphRAG’s consistent improvement across multiple datasets in different subject domains.  We assess this improvement by performing an evaluation using an LLM grader to determine a pairwise winner between GraphRAG and baseline RAG.  We use a set of qualitative metrics, including comprehensiveness (completeness within the framing of the implied context of the question), human enfranchisement (provision of supporting source material or other contextual information), and diversity (provision of differing viewpoints or angles on the question posed). Initial results show that GraphRAG consistently outperforms baseline RAG on these metrics.  
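A simplified version of such a pairwise LLM grader is sketched below; the llm_judge() helper and the prompt wording are illustrative assumptions, not our evaluation harness.

METRICS = ["comprehensiveness", "human enfranchisement", "diversity"]

def pairwise_winner(question, answer_a, answer_b, metric, llm_judge):
    # Ask an LLM grader which answer is better on a single qualitative metric.
    verdict = llm_judge(
        f"Question: {question}\n\nAnswer A: {answer_a}\n\nAnswer B: {answer_b}\n\n"
        f"Which answer is better on {metric}? Reply with exactly 'A' or 'B'."
    )
    return verdict.strip()

def win_rate(eval_set, metric, llm_judge):
    # Fraction of questions on which system A (e.g., GraphRAG) beats system B (baseline RAG).
    wins = sum(1 for question, a, b in eval_set
               if pairwise_winner(question, a, b, metric, llm_judge) == "A")
    return wins / len(eval_set)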

In addition to relative comparisons, we also use SelfCheckGPT to perform an absolute measurement of faithfulness to help ensure factual, coherent results grounded in the source material. Results show that GraphRAG achieves a similar level of faithfulness to baseline RAG. We are currently developing an evaluation framework to measure performance on the class of problems above.  This will include more robust mechanisms for generating question-answer test sets as well as additional metrics, such as accuracy and context relevance.

Next steps

By combining LLM-generated knowledge graphs and graph machine learning, GraphRAG enables us to answer important classes of questions that we cannot attempt with baseline RAG alone.  We have seen promising results after applying this technology to a variety of scenarios, including social media, news articles, workplace productivity, and chemistry.  Looking forward, we plan to work closely with customers on a variety of new domains as we continue to apply this technology while working on metrics and robust evaluation. We look forward to sharing more as our research continues.


[1] As baseline RAG in this comparison we use LangChain’s Q&A, a well-known representative example of this class of RAG tools in widespread use today.


AI Controller Interface: Generative AI with a lightweight, LLM-integrated VM


The emergence of large language models (LLMs) has revolutionized the way people create text and interact with computing. However, these models are limited in ensuring the accuracy of the content they generate and enforcing strict compliance with specific formats, such as JSON and other computer programming languages. Additionally, LLMs that process information from multiple sources face notable challenges in preserving confidentiality and security. In sectors like healthcare, finance, and science, where information confidentiality and reliability are critical, the success of LLMs relies heavily on meeting strict privacy and accuracy standards. Current strategies to address these issues, such as constrained decoding and agent-based approaches, pose practical challenges, including significant performance costs or the need for direct model integration, which is difficult.

The AI Controller Interface and program

To make these approaches more feasible, we created the AI Controller Interface (AICI). The AICI goes beyond the standard “text-in/text-out” API for cloud-based tools with a “prompt-as-program” interface. It’s designed to allow user-level code to integrate with LLM output generation seamlessly in the cloud. It also provides support for existing security frameworks, application-specific functionalities, fast experimentation, and various strategies for improving accuracy, privacy, and adherence to specific formats. By providing granular-level access to the generative AI infrastructure, AICI allows for customized control over LLM processing, whether it’s run locally or in the cloud.

A lightweight virtual machine (VM), the AI Controller, sits atop this interface. AICI conceals the specific implementation of the LLM processing engine and provides the mechanisms that let developers and researchers work with the LLM quickly and efficiently, making it easier to develop and experiment. With features that allow for adjustments in decision-making processes, efficient memory use, handling multiple requests at once, and coordinating tasks simultaneously, users can finely tune the output, controlling it step by step.

An individual user, tenant, or platform can develop the AI Controller program using a customizable interface designed for specific applications or prompt-completion tasks. The AICI is designed for the AI Controller to run on the CPU in parallel with model processing on the GPU, enabling advanced control over LLM behavior without impacting its performance. Additionally, multiple AI Controllers can run simultaneously. Figure 1 illustrates the AI Controller architecture.

Figure 1. Applications send instructions to an AI Controller (e.g., DeclCtrl, PyCtrl, JSCtrl, or a custom controller), which provides a high-level API. The AI Controller sits above the AI Controller Interface, which is integrated directly with an LLM serving engine such as rLLM or llama.cpp, allowing the Controller to execute efficiently in the cloud in parallel with model inference.

AI Controllers are implemented as WebAssembly VMs, most easily written as Rust programs. However, they can also be written in any language that can be compiled into or interpreted as WebAssembly. We have already developed several sample AI Controllers, available as open source. These controllers provide built-in tools for controlled text creation, allowing for on-the-fly changes to initial instructions and the resulting text. They also enable efficient management of tasks that involve multiple stages or batch processing.

High-level execution flow

Let’s take an example to illustrate how the AI Controller impacts the output of LLMs. Suppose a user requests the completion of a task, such as solving a mathematical equation, with the expectation of receiving a numeric answer. The following program ensures that the LLM’s response is numeric. The process unfolds as follows:

1. Setup. The user or platform owner first sets up the AICI-enabled LLM engine and then deploys the provided AI Controller, DeclCtrl, to the cloud via a REST API.

2. Request. The user initiates LLM inference with a REST request specifying the AI Controller (DeclCtrl), and a JSON-formatted declarative program, such as the following example. 

{"steps": [
    {"Fixed":{"text":"Please tell me what is 122.3*140.4?"}},
    {"Gen": {"rx":" ^(([1-9][0-9]*)|(([0-9]*).([0-9]*)))$"}}
]}

Once the server receives this request, it creates an instance of the requested DeclCtrl AI Controller and passes the declarative program into it. The AI Controller parses its input, initializes its internal state, and LLM inference begins.
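For illustration, such a request might be sent as in the sketch below. The endpoint path, port, and payload field names are hypothetical; consult the AICI repository for the actual REST interface.

import requests

program = {
    "steps": [
        {"Fixed": {"text": "Please tell me what is 122.3*140.4?"}},
        {"Gen": {"rx": " ^(([1-9][0-9]*)|(([0-9]*).([0-9]*)))$"}},
    ]
}

# Hypothetical endpoint and payload shape, for illustration only.
resp = requests.post(
    "http://localhost:4242/v1/run",
    json={"controller": "declctrl", "controller_arg": program},
)
print(resp.json())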

3. Token generation. The server generates tokens sequentially, with the AICI making calls to the DeclCtrl AI Controller before, during, and after each token generation.

  • pre_process() is called before token generation. At this point, the AI Controller may stop generating (e.g., if it is complete), fork parallel generations, suspend, or continue.
  • mid_process() is called during token generation and is the main entry point for computation in the AI Controller. During this call, the AI Controller can return logit biases to constrain generation, backtrack in the generation, or fast forward through a set of fixed or zero-entropy tokens. The mid_process() function runs in parallel with model inference on the GPU and its computation (e.g., of logit biases) is incorporated into the model’s token sampling on the GPU.
  • post_process() is called once the model has generated the next token. Here, the AI Controller may, for example, perform simple bookkeeping, updating its state based on the sampled token.

During these calls, the DeclCtrl AI Controller executes the necessary logic to ensure that the LLM generation conforms to the declarative program provided by the user. This ensures the LLM response is a numeric solution to the math problem. 

4. Response. Once DeclCtrl completes its program, it assembles the results, which might include intermediate outputs, debug information, and computed variables. These can be returned as a final response or streamed to show progress. Finally, the AI Controller is deallocated.

Figure 2. AI Controllers incorporate custom logic during the token-by-token decoding, working in parallel with the LLM to support fast, flexible, and secure controlled generation: the AI Controller is uploaded to the LLM service if necessary (step 0), an LLM request is sent to the server (step 1), the AI Controller is called before, during, and after each token generation (step 2, repeated for every generated token), and the resulting text is returned (step 3).
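To make the callback structure concrete, here is a minimal, Python-flavored sketch of a controller that constrains output to a numeric format. The class shape mirrors the pre_process()/mid_process()/post_process() calls described above, but the names and return values are illustrative; the actual aici_abi and PyCtrl interfaces may differ.

class NumericAnswerController:
    """Illustrative controller: allows only tokens consistent with a numeric answer."""

    def __init__(self, vocab, allowed_tokens):
        self.vocab = vocab                    # all token ids in the model's vocabulary
        self.allowed = set(allowed_tokens)    # tokens that keep the numeric constraint satisfiable
        self.done = False

    def pre_process(self):
        # Before token generation: stop, fork, suspend, or continue.
        return "stop" if self.done else "continue"

    def mid_process(self):
        # During token generation, running on the CPU in parallel with GPU inference.
        # Returning logit biases constrains sampling; disallowed tokens are pushed
        # to negative infinity so they cannot be selected.
        return {tok: float("-inf") for tok in self.vocab if tok not in self.allowed}

    def post_process(self, sampled_token):
        # After the token is sampled: simple bookkeeping, updating internal state.
        if sampled_token == "<eos>":
            self.done = True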

Use cases

Efficient constrained decoding

For Rust-based AI Controllers, we’ve developed an efficient way to check and enforce formatting rules (constraints) during text creation within the aici_abi library. This method involves using a special kind of search tree (called a trie) and checks based on patterns (regular expressions) or rules (context-free grammar) to ensure each piece of text follows specified constraints. This efficiency ensures rapid compliance-checking, enabling the program to seamlessly integrate with the GPU’s process without affecting performance.
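The sketch below shows the underlying idea in Python, assuming the third-party regex package for partial matching. The actual aici_abi implementation is in Rust and walks a precomputed token trie rather than scanning the vocabulary, which is what makes it fast enough to overlap with GPU work.

import regex  # third-party package; supports partial (prefix) matching

def allowed_next_tokens(prefix, vocab, pattern):
    # Keep only the tokens whose addition can still lead to a full match of the
    # target pattern (e.g., a numeric format or a grammar approximation).
    compiled = regex.compile(pattern)
    allowed = set()
    for tok_id, tok_text in vocab.items():
        candidate = prefix + tok_text
        # partial=True accepts strings that could still grow into a full match.
        if compiled.fullmatch(candidate, partial=True):
            allowed.add(tok_id)
    return allowed

# In an AI Controller, this set would be turned into logit biases in mid_process().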

While AI Controllers currently support mandatory formatting requirements, such as assigning negative infinity values to disallow invalid tokens, we anticipate that future versions will support more flexible guidance.

Information flow constraints

Furthermore, the AI Controller VM gives users the power to control the timing and manner by which prompts, background data, and intermediate text creations affect subsequent outputs. This is achieved through backtracking, editing, and prompt processing.

This functionality can be useful in a number of scenarios. For example, it allows users to selectively influence one part of a structured chain-of-thought process but not another. It can also be applied to preprocessing background data to remove irrelevant or potentially sensitive details before starting an LLM analysis. Currently, achieving this level of control requires multiple independent calls to LLMs.

Looking ahead

Our work with AICI has led to a successful implementation on a reference LLM serving engine (rLLM) and integrations with LLaMa.cpp. Currently, we’re working to provide a small set of standard AI Controllers for popular libraries like Guidance. In the near future, we plan to work with a variety of LLM infrastructures, and we’re excited to use the open-source ecosystem of LLM serving engines to integrate the AICI, providing portability for AI Controllers across environments.

Resources

Code, detailed descriptions of the AICI, and tutorials are available on GitHub. We encourage developers and researchers to create and share their own custom AI Controllers.


Research Focus: Week of February 5, 2024


Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.


Microsoft Research Forum series kicks off with focus on the promise and challenges of AI

With a look back at the incredible changes of 2023 and a look ahead at the tangible advances to come, the inaugural Microsoft Research Forum explored bold new ideas and important issues in the era of general AI. Leaders from Microsoft Research, including the AI Frontiers team and the AI4Science lab, discussed the importance of openness and collaboration to enable successful and responsible AI research.

Peter Lee, CVP, Microsoft Research and Incubations, led off the discussion, followed by a panel exploring some of the biggest potential AI breakthroughs, along with challenges to overcome. These include:

  • Building AI systems that become helpers in the physical world 
  • Uncovering the building blocks of human reasoning 
  • Making AI technology smaller and less costly, to improve performance and availability  
  • Helping AI learn from people that use it, rather than simply answering questions 

In the “lightning round,” Microsoft researchers explored current work to improve pretrained large language models, understand and evaluate foundation models, facilitate breakthroughs in molecular science, augment human decision making, and improve visual storytelling.

To learn more, check out this recap and browse the on-demand session replays. Be sure to register for upcoming episodes.


The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

Transformer-based large language models (LLMs) have become a fixture in machine learning. Correspondingly, significant resources are allocated towards research to further advance this technology, typically resulting in models of increasing size that are trained on increasing amounts of data.

In a recent paper, The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction, researchers from Microsoft demonstrate a surprising result: that it is possible to significantly improve LLM performance by selectively removing higher-order components of their constituent weight matrices. As covered in a Microsoft Research Forum lightning talk, this simple intervention—LAyer-SElective Rank reduction (LASER)—can be done on a model after training has been completed, and requires no additional parameters or data. In extensive experiments, the researchers demonstrate the generality of this finding across language models and datasets. They provide in-depth analyses offering insights into both when LASER is effective and the mechanism by which it operates.
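The core operation is a truncated SVD of individual weight matrices, as in the minimal sketch below. LASER’s contribution lies in selecting which layers and matrices to reduce and by how much, which is not shown here.

import torch

def rank_reduce(weight, keep_fraction):
    # Keep only the largest singular values, discarding the higher-order
    # (smaller-singular-value) components of the weight matrix.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    k = max(1, int(keep_fraction * S.numel()))
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

# Applied post-training to a chosen layer, with no extra parameters or data:
# layer.weight.data = rank_reduce(layer.weight.data, keep_fraction=0.1)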


Cache-Efficient Top-k Aggregation over High Cardinality Large Datasets

Business intelligence tools make it easy to analyze large amounts of data. In these tools, top-k aggregation queries are used to summarize and identify important groups of data. These queries are usually processed by computing exact aggregates for all groups and then selecting the groups with the top-k aggregate values. However, this can be inefficient for high-cardinality large datasets, where intermediate results may not fit within the local cache of multi-core processors, leading to excessive data movement.

Researchers from Microsoft, in their recent paper: Cache-Efficient Top-k Aggregation over High Cardinality Large Datasets, introduce a new cache-conscious algorithm to address this. The algorithm efficiently computes exact top-k aggregates without fully processing all groups. Aggregation over large datasets requires multiple passes of data partitioning and repartitioning, thereby presenting a significant opportunity to reduce partitioning overhead for top-k aggregation queries. The algorithm leverages skewness in the data distribution to select a small set of candidate groups for early aggregation. This helps eliminate many non-candidate group partitions through efficient partitioning techniques and coarse-grained statistics, without computing exact aggregates for them. Empirical evaluation using both real-world and synthetic datasets demonstrates that the algorithm achieves a median speed-up of over 3x for monotonic aggregation functions and 1.4x for non-monotonic functions, compared to existing cache-conscious aggregation methods, across standard k value ranges (1 to 100).
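For reference, the conventional approach described above looks like the sketch below: every group gets an exact aggregate before the top k are selected, which is what becomes cache-unfriendly at high cardinality. The paper’s candidate-selection algorithm, which avoids this full pass over all groups, is not reproduced here.

from collections import defaultdict
import heapq

def naive_topk_sum(rows, key, value, k):
    # Exact aggregate for every group, even though only k groups are needed.
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += value(row)
    # With many distinct groups, `totals` overflows the processor cache, causing
    # the excessive data movement the cache-conscious algorithm is designed to avoid.
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])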


Six Microsoft researchers named 2023 ACM Fellows

The Association for Computing Machinery’s (ACM) annual fellows award recognizes people who make transformative contributions to computing science and technology. For 2023, the global organization named six researchers from Microsoft among its 68 award recipients.

Jianfeng Gao – VP and Distinguished Scientist
For contributions to machine learning for web search, natural language processing, and conversational systems

Sumit Gulwani – Partner Research Manager
For contributions to AI-assisted programming for developers, data scientists, end users, and students

Nicole Immorlica – Senior Principal Researcher
For contributions to economics and computation including market design, auctions, and social networks

Stefan Saroiu – Senior Principal Researcher
For contributions to memory security and trusted computing

Manik Varma – VP and Distinguished Scientist
For contributions to machine learning and its applications

Xing Xie – Senior Principal Research Manager
For contributions to spatial data mining and recommendation systems


What’s Your Story: Ivan Tashev


In the Microsoft Research Podcast series What’s Your Story, Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. A systems expert whose 10 years with Microsoft spans research and product, Gehrke talks to members of the company’s research community about what motivates their work and how they got where they are today.

Partner Software Architect Ivan Tashev’s expertise in audio signal processing has contributed to the design and study of audio components for Microsoft products such as Kinect, Teams, and HoloLens. In this episode, Tashev discusses how a first-place finish in the Mathematical Olympiad fueled a lifelong passion for shooting film; how a company event showcasing cutting-edge projects precipitated his move from product back to research; and how laser focus on things within his control has helped him find success in 25-plus years with Microsoft.


Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

IVAN TASHEV: To succeed in Microsoft, you have to be laser focused on what you are doing. This is the thing you can change. Focus on the problems you have to solve, do your job, and be very good at it. Those are the most important rules I have used in my career in Microsoft.

[TEASER ENDS]

JOHANNES GEHRKE: Microsoft Research works at the cutting edge. But how much do we know about the people behind the science and technology that we create? This is What’s Your Story, and I’m Johannes Gehrke. In my 10 years with Microsoft, across product and research, I’ve been continuously excited and inspired by the people I work with, and I’m curious about how they became the talented and passionate people they are today. So I sat down with some of them. Now, I’m sharing their stories with you. In this podcast series, you’ll hear from them about how they grew up, the critical choices that shaped their lives, and their advice to others looking to carve a similar path.

[MUSIC FADES]

In this episode, I’m talking with Partner Software Architect Ivan Tashev in the anechoic chamber in Building 99 on our Redmond, Washington, campus. Constructed of concrete, rubber, and sound-absorbing panels, making it impervious to outside noise, this chamber has played a significant role in Ivan’s 25 years with Microsoft.

He’s put his expertise in audio processing to work in the space, helping to design and study the audio components of such products as Kinect, Teams, and HoloLens. Here’s my conversation with Ivan, beginning with his childhood in Bulgaria, where he was raised by two history teachers.


IVAN TASHEV: So I’m born in a city called Yambol in Bulgaria, my origin country. The city [was] created 2,000 years B.C. and now sits on the two shores of the river called Tundzha. It always has been an important transportation and agricultural center in the entire region, and I grew up there in a family of two lecturers. My parents were teaching history. And they loved to travel. So everywhere I go, I had two excellent tourist guides with me: “This in this place happened at this and this and this in this year.”

GEHRKE: Were there quizzes afterwards? [LAUGHTER]

TASHEV: But it happened that I was more fond to engineering, technology, math, all of the devices. It just … mechanical things just fascinated me. When I read in a book about the parachutes, I decided that I will have to try this and jump into it from the second floor of a building with an umbrella to see how much it will slow me down. It didn’t.

GEHRKE: And how … did you get hurt?

TASHEV: Oh, I ended with a twisted ankle for quite a while.

GEHRKE: Oh, OK, good. Nothing more … worse. [LAUGHTER] So you were always hands on, that’s what you’re telling me, right? Always the experimenter?

TASHEV: Yep. So I was doing a lot of this stuff, but also I was very strong in math. It happened that I had good teachers in math, and going to those competitions of mathematical Olympiads was something I started since fifth grade. Pretty much every year, they were well organized on school, city, regional level, and I remember how in my sixth grade, I won the first place of a regional Olympiad, and the prize was an 8mm movie camera. That, I would say, changed my life. This is my hobby since then. I have been holding this, a movie camera of several generations, everywhere I go and travel. In Moscow, in Kyiv, in Venice. Everywhere my parents were traveling, I was shooting 8mm films, and I continue this till today. Today, I have much better equipment but also very powerful computers to do the processing. I produce three to five Blu-ray Discs pretty much every year. Performances of the choir or the dancing groups in the Bulgarian Cultural and Heritage Center of Seattle mostly.

GEHRKE: Wow, that’s fascinating. And was that hobby somehow connected to your, you know, entry into, you know, science and then actually doing a PhD and then actually going and, you know, going into audio, audio processing?

TASHEV: The mathematical high school I attended in my … in the city where I’m born was one of the fifth … one of the five strongest in the country, which means first, math every day, two days, twice; physics every day. Around ninth grade, at the end, we finished the entire high school curriculum and started to study differentials and integrals, something which is more towards the university math courses. But this means that I had no problems entering any of the, of the universities with mathematical exams. I didn’t even have to do that because I qualified in one year, my 11th grade, to become member of the Bulgarian national teams in … for the International Math Olympia and for International Physics Olympia. And they actually coincided, so I had to choose one, and I chose physics. And since then, I’m actually saying that math is the language of physics; physics is the language of engineering. And that kind of showed the tendency … so literally, I was 11th grade and I could literally point and choose any of the universities, and I decided to go and study electronic engineering in the Technical University of Sofia.

GEHRKE: And then how did you end up in the US?

TASHEV: So that’s another interesting story. I defended my … graduated from the university, defended my PhD thesis. It was something amazing.

GEHRKE: What was it on, actually?

TASHEV: It was a control system for a telescope. But not just for observation of celestial objects but for tracking and ranging the distance to a satellite. It’s literally one measurement. You shoot with the laser; it goes to the satellite, which is 60 centimeters in diameter; it returns back; and you measure the time with accuracy of 100 picoseconds. And this was part of studying how the Earth rotates, how the satellites move. The data … there were around 44 stations like this in the entire Earth, and the data were public and used by NASA for finalizing the models for the satellites, which later all became GPS; used by Russians to finalize the models for their GLONASS system; used by people who studied the precession and the rotation of the Earth. A lot of interesting PhD theses came from the data from the results of this device, including tides. For example, I found that Balkan Peninsula moves up and down 2 meters every day because of the tides. So the Earth is liquid inside, and there are tides under us in the same way as with the oceans.

GEHRKE: Oh, wow, super interesting. I actually just wanted to come back … so just to get the right kind of comparison for the, for the unit, and so picoseconds, right? Because I know what a nanosecond is because …

TASHEV: Nanoseconds is 10 to the minus ninth; picoseconds is 10 to the minus 12th.

GEHRKE: OK. Good, good. Just to put that in perspective.

TASHEV: Thank you, Johannes. To, to be exact. So this was the, the accuracy. The light goes 30 centimeters for that time. For one nanosecond. And we needed to go way shorter than that. But why this project was so fascinating for me … can you imagine this is 1988—people having Apple II or compatible computers playing with the joystick a very famous game when you have the crosshair in the space and you shoot with laser the satellites.

GEHRKE: [LAUGHS] Absolutely.

TASHEV: And I was sitting behind the ocular and moving a joystick and shooting at real satellites. [LAUGHS]

GEHRKE: Not with the goal to destroy them, of course.

TASHEV: No. The energy of the laser was one joule. You can put your hand in front. But very short and one nanosecond. So it can go and enter and you have the resolution to measure the distance.

GEHRKE: Super, super exciting.

TASHEV: And after that, I became assistant professor in the Technical University of Sofia. How I came to Microsoft is a consequence of that. So I was teaching data and signal processing, and the changes in Europe already started. Think about 1996. And then a friend of mine came back from a scientific institution from the former Eastern Germany, and he basically shared how much money West Germany has poured into the East German economy to change it, to bring it up to the standards, and that … it was, I think, 900 billion Deutsche Marks.

GEHRKE: But this was after the … 

TASHEV: After the changes. After, after basically the East and West Germany united. And then this was in the first nine years of the changes. And then we looked at each other in the eyes and said, wait a minute. If you model this as a first-order system, this is the time constant, and the process will finish after two times more of the time constant, and they will need another 900 billion Marks. You cannot imagine how exact became that prediction when East Germany will be on equal economically to the West Germany. But then we looked at each other’s eyes and said, what about Bulgaria? We don’t have West Bulgaria. And then this started to make me think that most probably there will be technical universal software, but in this economical crisis, there will be no money for research, none for development, for building skills, for going to conferences. And then pretty much around the same time, somebody said, hey, you know, Microsoft is coming here to hire. And I sent my résumé knowing that, OK, I’m an assistant professor. I can program. But that actually happened that I can program quite well, implementing all of those control systems for the telescope, etc., etc., and literally …

GEHRKE: And so there was a programming testing as part of the interview?

TASHEV: Oh, the interview questions were three or four people, one hour, asking programming questions. The opening was for a software engineer.

GEHRKE: Like on a whiteboard?

TASHEV: Like on a whiteboard. And then I got an email saying that, Ivan, we liked your performance. We want to bring you to Redmond for further interviews. I flew here in 1997. After the interviews, I returned to my hotel, and the offer was waiting for me on the reception.

GEHRKE: Wow, that’s fast.

TASHEV: So this is how we decided to move here in Redmond, and I started and went through two full shipping cycles of programs.

GEHRKE: So you didn’t start out in MSR (Microsoft Research), right?

TASHEV: Nope.

GEHRKE: Where were you first?

TASHEV: So, actually, I was lucky enough both products were version 1.0. One of them was COM+. This is the transactional server and the COM technology, which is the backbone of Windows.

GEHRKE: Was the component model being used at that point in time?

TASHEV: Component object model. Basically, creating an object, calling … getting the interface, and calling the methods there. And my experience with low-level programming on assembly language and microprocessor actually became here very handy. We shipped this as a part of Windows 2000. And the second product was the Microsoft Application Center 2000, which was, OK, cluster management system.

GEHRKE: But both of them had nothing to do with signal processing, right?

TASHEV: Nope. Except there were some load balancing in Application Center. But they had nothing to do with signal processing; just pure programming skills.

GEHRKE: Right.

TASHEV: And then in the year of 2000, there was the first TechFest, and I went to see it and said, wait a minute. There are PhDs in this company and they’re doing this amazing research? My place is here.

GEHRKE: And TechFest, maybe … do you want to explain briefly what TechFest is?

TASHEV: TechFest is an annual event when researchers from Microsoft Research go and show and demonstrate technologies they have created.

GEHRKE: So it used to be, like, in the Microsoft Conference Center.

TASHEV: It used to be in the Microsoft Conference Center …

GEHRKE: Like, a really big two-day event …

TASHEV: … and basically visited by 6, 7,000 Microsoft employees. And usually, Microsoft Research, all of the branches were showing around 150ish demos, and it was amazing. And that was the first such event. Pretty much …

GEHRKE: Oh, the very first time?

TASHEV: The very first TechFest. And pretty much not only me, but the rest of Microsoft Corporation learning that we do have a research organization. In short, in three months, I started in Microsoft Research.

GEHRKE: How did you get a job here then? How did that happen?

TASHEV: So … seriously, visiting TechFest made me to think seriously that I should return back to research, and I opened the career website with potential openings, and there were two suitable for me. One of them was in Rico Malvar’s Signal Processing Group …

GEHRKE: Oh, OK, yeah …

TASHEV: … and the other was in Communication, Collaboration, and Multimedia Group led by Anoop Gupta. So I sent my résumé to both of them. Anoop replied in 15 minutes; next week, I was on informational with him. When Rico replied, I already had an offer from Anoop to join the team. [LAUGHS]

GEHRKE: Got it. And that’s, that’s where your focus on communication came from then?

TASHEV: Yes. So our first project was RingCam.

GEHRKE: OK.

TASHEV: So it’s a 360-camera, eight-element microphone array in the base, and the purpose was to record the meetings, to do a, a meeting diarization, to have a 360 view, but also, based on the signal processing and face detection, to have a speaker view, separate camera for the whiteboard, diarization based on who is speaking based on the direction from the microphone array. Honestly, even today when you read our 2002 paper … Ross Cutler was creator of the 360 camera; I was doing the microphone array. Even today when you read our 2002 paper, you say, wow, that was something super exciting and super advanced.

GEHRKE: And that then you brought it all the way to shipping, right, and it became a Microsoft product?

TASHEV: So yes. At some point, it was actually monitored personally by Bill Gates, and at some point …

GEHRKE: So he was PMing it, basically, or …? [LAUGHS]

TASHEV: He basically was …

GEHRKE: He was just aware of it.

TASHEV: I personally stole the distributed meeting system in Bill Gates’ conference room.

GEHRKE: Wow.

TASHEV: We do have basically 360 images with Bill Gates attending a meeting. But anyway, it was believed that this is something important, and a product team was formed to make it a product. Ross Cutler left Microsoft Research and became architect of that team, and this is what became Microsoft RoundTable device. It was licensed to Polycom, and for many years was sold as Polycom [CX5000].

GEHRKE: Yeah, actually, I remember when I was in many meetings, they used to have exactly that device in the middle, and the nice thing was that even if somebody was remote, right, you could see all the people around the table and you got this, sort of, really nice view of who was next to whom and not sort of the transactional windows that you have right now in Teams. That’s a really interesting view.

TASHEV: So, as you can see, very exciting start. [LAUGHS] But then Anoop went and became Bill Gates’ technical assistant, and the signal processing people from his team were merged with Rico Malvar’s signal processing team, and this is how I continued to work on microphone arrays and the speech enhancement, and this is what I do till today.

GEHRKE: And you mentioned, like, amazing products from Microsoft like Kinect and so on, right. And so you were involved in the, like, audio processing layer of all of those, and they were actually then … part of it was designed here in this room?

TASHEV: Yep.

GEHRKE: So tell me a little bit more about how that happened.

TASHEV: You know, at the time, I was fascinated by a problem which was considered theoretically impossible: multichannel acoustic echo cancellation. There was a paper written in 1998 by the inventor of the acoustic echo cancellation from Bell Labs stating that stereo acoustic echo cancellation is not possible.

GEHRKE: And he proved it, or what does it mean? He just …

TASHEV: It’s very simple. You have two unknowns—the two impulse responses from the left and the right loudspeaker—and one equation; that’s the microphone signal. What I did was to circumvent this. When you start Kinect, you’ll hear some melodic signals, and this is the calibration. At least you know the relation between the two unknowns, and now you have one unknown, which is basically discovered using an adaptive filter, the classic acoustic echo cancellation. So technically, Kinect became the first device ever shipped with surround sound acoustic echo cancellation, the first device ever that could recognize human speech from 4 1/2 meters while the loudspeakers are blasting. And gamers are listening to very loud levels of their loudspeakers.

GEHRKE: So maybe just tell the audience a little bit, what does it mean to do acoustic echo cancellation? What is it actually good for, and what does it do?

TASHEV: So in general, speech enhancement is removing unwanted noises and sounds from the desired signal. Some of them we don’t know anything about, which is the surrounding noise. For some of them, we have a pretty good understanding. This is the sound from our own loudspeakers. So you send the signal to the loudspeakers and then try to estimate on the fly how much of it is captured by the microphone and subtract this estimation, and this is called acoustic echo cancellation. This is part of every single speakerphone. This is one of the oldest applications of the adaptive filtering.

GEHRKE: So would the right way to think about this is that noise cancellation is cancelling unwanted noise from the outside?

TASHEV: Unknown noises …

GEHRKE: … whereas acoustic echo cancellation is cancelling the own noise that actually comes …

TASHEV: … which we know about.

GEHRKE: Right, OK.

TASHEV: And that was an amazing work, but … it also started actually in TechFest. I designed this surround sound echo cancellation, and my target was … at the time, we had Windows Media Center. It was a device designed to stay in the media room and controlling all of those loudspeakers. And I made sure to bring all of the VPs of Windows and Windows Media Center, and then I noticed that I started repeatedly to see some faces which I didn’t invite—I didn’t know—but they came over and over and over. And after the meeting, after TechFest, a person called me and said, “Look, we are working on a thing which your technology fits very well,” and this is how I started to work for Kinect. And in the process of the work, I had to go and talk with industrial designers because of the design of the microphones, with electrical designers because of the circuitry and the requirements for identical microphone channels, and with the software team, which had to implement my algorithms, and this … actually, at some point, I had an office in their building and was literally embedded working with them day and night, especially at the end of the shipping cycle, of the shipping cycle when the device had to go out.

GEHRKE: And this was not a time when you could go, like, in the device and, you know, update software on the device or anything. The device would go out as is, right?

TASHEV: Actually, this was one of the first devices like that.

GEHRKE: Oh, it could?

TASHEV: Yup.

GEHRKE: Wow, I didn’t know that.

TASHEV: Already Kinects were manufactured. They are boxed; they are already distributed to the, to the stores. But there was a deadline when we had to provide the image when you connected Kinect to your Xbox and it has to be uploaded.

GEHRKE: But, no, I get that. But then once it was actually connected to the Xbox, you could still update the firmware on the …

TASHEV: Yes, yes.

GEHRKE: Oh, wow. That’s, that’s really cool. OK.

TASHEV: But it also has a deadline. So that was amazing. Literally left us, all of us, breathless. There are plenty of serious technological challenges to overcome. A lot of firsts as a technology is, basically, was brought to this device to make sure … and this is the audio. And next to us were the video people and the gaming people and the designers, and everybody was excited to be working like hell so we can basically bring this to the customers.

GEHRKE: Wow, that’s super exciting. I mean even just being involved in … I think that’s one of the really big things that is so much fun here at Microsoft, right, that you can get whatever you do in the hands of, you know, millions—if not hundreds of millions—of people, right. Coming, coming back to, you know, your work now in, in audio signal processing, and that whole field is also being revolutionized like many other fields right now with AI, right.

TASHEV: Absolutely.

GEHRKE: Photography, one of the other fields that you’re very passionate about, is also being revolutionized with AI, of course.

TASHEV: Also revolutionized.

GEHRKE: You know, in, in terms of changes that you’ve made in your career, how do you deal with such changes, and what were … you know, this is something where you have been an expert in a certain class of algorithms, and now suddenly it says there’s this completely new technology coming along, and we need to shift. How are you dealing with this? How did you deal with this, personally?

TASHEV: Let me put it in …

GEHRKE: In some sense, you’re becoming a little bit of a dinosaur in a little bit while …

TASHEV: Oh, not at all.

GEHRKE: That’s what I’m saying.

TASHEV: I wouldn’t be in research! [LAUGHS]

GEHRKE: Exactly. How did you overcome that?

TASHEV: So, first, each one of us was working and trying to produce better and better technology, and at the time, the signal processing, speech enhancement, most of the audio processing was based on statistical signal processing. You build statistical models, distributions, hidden Markov models, and get …

GEHRKE: Like speech recognition.

TASHEV: … certain improvements. Yep. And all of us started to sense that this set of tools we had was starting to saturate. And it was simple. We used the simple models we could derive. Let’s say speech is a Gaussian distribution; noise is a Gaussian distribution. You derive the suppression rule. But this simplifies the reality. If you apply a more precise model of the speech signal distribution, then you cannot easily derive the suppression rule, for example, in the case of noise suppression. And it was literally hanging in the air that we had to find a way, a way to learn from data. And I have several papers, actually from before neural networks started to appear, saying let’s get a big dataset and learn this suppression rule from the data.
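
For context, the kind of closed-form suppression rule being described here is the classic Wiener gain, which follows from assuming both speech and noise are Gaussian and depends only on a per-frequency SNR estimate. The snippet below is a minimal illustration of that rule, not the algorithm that shipped; the power values are made up.

```python
import numpy as np

def wiener_suppression_gain(noisy_power: np.ndarray, noise_power: np.ndarray) -> np.ndarray:
    """Per-frequency suppression gain under Gaussian speech and noise assumptions:
    estimate the a priori SNR from the noisy and noise power spectra, then apply
    the Wiener rule gain = snr / (snr + 1)."""
    snr = np.maximum(noisy_power / np.maximum(noise_power, 1e-12) - 1.0, 0.0)
    return snr / (snr + 1.0)

# Illustrative use on a single short-time spectrum frame (values are made up):
noisy_frame_power = np.array([4.0, 1.2, 0.9, 6.5])
noise_estimate_power = np.ones(4)
enhanced_power = wiener_suppression_gain(noisy_frame_power, noise_estimate_power) * noisy_frame_power
```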

GEHRKE: So a more data-driven approach already.

TASHEV: A data-driven approach. I have several papers from that time, and by the way, they were not very well accepted by my audio processing community. All of them are published in bordering conferences, not in the core conferences. I got those papers rejected. But then neural networks appeared. Not that they were something new. We had neural networks in the ’80s, and they didn’t work well. The new … the miracle was that now we had an algorithm which allowed us to train them. Literally, the year after Geoff Hinton’s work on deep learning was published, several things happened. First, my colleagues in the speech research group started to do neural network–based speech recognition, and I, in my audio group, started to do neural network–based speech enhancement. This was in 2013 or 2014. We had a neural network–based speech enhancement algorithm surpassing the existing statistical signal processing algorithm literally instantly. It was big. It was heavy. But better.

GEHRKE: When did the first of these ship? What … can you tell any interesting ship stories about this?

TASHEV: The first neural network–based speech enhancement algorithm was shipped in 2020 in Teams.

GEHRKE: OK, OK.

TASHEV: We had to work with that team for quite a while. Actually, it took us four years of working with Teams to find … you see, here in an industrial research lab we have a little bit different perspective. It’s not just to make it work; it’s not just to make it a technology. That technology has to be shippable. It has to meet a lot of other requirements and limitations in memory and in CPU and in reliability. It’s one thing to publish a paper with very cool results on your limited dataset and a completely different thing to throw this algorithm into the wild, where it can face everything. And this is why it took us around four years to ship the first prototype in Teams.

GEHRKE: That, that makes sense. And I think a lot of the infrastructure was also not there at that point in time early on, right, in terms of, you know, how do you upload a model to the client, even in terms of all the model profiling, you know, neural architecture search, quantization, and other tooling that now exists where you can take a model …

TASHEV: That’s correct.

GEHRKE: … and squeeze it on the right kind of computation for the …

TASHEV: That’s correct. And …

GEHRKE: So you did all of that manually, I guess, at that point in time.

TASHEV: Initially, yes. But new architectures arrived. The cloud. Wow, it was a savior. You can press a button; you can get a hundred or a thousand machines. You can run multiple architectures in parallel. You can really select the optimal one from every single standpoint. Actually, what we did is we ended up with a set of speech enhancement algorithms. Given a computing power budget, we can tell you what the best architecture for it is, or if you want to hit a certain level of improvement, I can tell you how much CPU you will need for that.

GEHRKE: Got it.

TASHEV: But that tradeoff is also something very typical for an industrial research lab and not very well understood in academia.

GEHRKE: Makes sense. Let me, let me switch gears one last time, namely, I mean, you have made quite a few changes in your career, you know, throughout, right. You started as an assistant professor and then became, sort of, a core developer, then, you know, were a member of a signal processing group and now you’re, sort of, driving a lot of the audio processing research for the company. How do you deal with this change, and do you have any advice for our listeners on how to, you know, keep your career going, especially as the rate of change seems to be accelerating all the time?

TASHEV: So for 25 years in Microsoft Corporation, I have learned several rules I follow. The first is dealing with ambiguity. It is not just change in the technology but changes in the … of the teams and organizations, etc., etc. Simply put, there are things you cannot change. There are things you cannot hide. Just accept them and go on. And here comes the second rule. To succeed in Microsoft, you have to be laser focused on what you are doing. This is the thing you can change. Focus on the problems you have to solve, do your job, and be very good at it. This is the most important … those are the two most important rules I have used in my career in Microsoft.

GEHRKE: OK, super, super interesting, Ivan. Thank you very much for this amazing conversation.

TASHEV: Thank you for the invitation, Johannes.

GEHRKE: To learn more about Ivan’s work or to see photos of Ivan pursuing his passion for shooting film and video, visit aka.ms/ResearcherStories (opens in new tab).


Microsoft Research Forum: New series explores bold ideas in technology research in the era of AI

Microsoft Research Forum (opens in new tab) is a new series of conversations that explore recent advances, bold new ideas, and important discussions within the global research community. Leading Microsoft researchers will share insights into their work, followed by live online discussions with audience participants.

This post provides an overview of the inaugural Microsoft Research Forum conversation, with a summary of each presentation. Full details, including the copilot experience (opens in new tab) and replays of each session (opens in new tab), are available on demand. Register now (opens in new tab) to attend upcoming Research Forum events.

Keynote: Research in the era of AI

Peter Lee, CVP, Microsoft Research & Incubations

2023 was an incredible year for AI research, with rapid change and the emerging sparks of artificial general intelligence.  Generative AI now influences everything in research, and research has never mattered more to innovating technology that will benefit society. And while there is plenty of reason for optimism, we must also be clear-eyed about risks and limitations—another direction where research can play an important role.

In this environment, openness and collaboration are essential, not just to advance the research, but to ensure technology is developed with a commitment to safety and ethical use. Microsoft continues to invest in its commitment to responsible AI (RAI), which is deeply integrated not only into every engineering group across the company, but also across functions like finance, security, and legal teams. Additional progress will require close collaboration with the broader research community.

Some of the most promising and tangible advances are coming in medicine and materials science. Examples include work by Microsoft AI4Science, a Microsoft Research lab, which is working with the Global Health Drug Discovery Institute to accelerate discovery of new treatments for infectious diseases.

Panel discussion: AI Frontiers

Ashley Llorens, VP and Distinguished Scientist, Microsoft
Ece Kamar, Managing Director, Microsoft Research AI Frontiers
Sébastien Bubeck, VP, Microsoft GenAI
Ahmed Awadallah, Senior Principal Research Manager, Microsoft Research AI Frontiers

The panelists explored their aspirations for AI in the near future, as well as the challenges to overcome. Examples include:

  • Going beyond language to build AI systems that become helpers in the physical world. AI can do more than just answer questions; it can better understand our goals and intentions and create a difference in people’s lives.
  • Beyond trying to get AI to mimic the human mind, can AI actually illuminate how the human mind works and uncover the building blocks of reasoning?
  • Making AI technology smaller would help reduce the cost and increase the performance of current AI systems. How can we divide problems into smaller pieces to solve? And how can we lower the requirements of big data, large neural networks, and massive computing resources?
  • Can we create a virtuous feedback loop, where AI learns from people that use it, rather than simply delivering answers from a static base of information?

The panelists also explored the rapid pace of technology development. Historical timelines of three to five years are now condensed into mere weeks. In this environment, collaboration is essential to quickly develop ideas and scale up experimentation across organizations. This also amplifies existing concerns about optimizing for safety and alleviating bias in language models.

Lightning Talks

Improving reasoning in language models with LASER: Layer-Selective Rank Reduction

Dipendra Misra, Senior Researcher, Microsoft Research NYC and AI Frontiers

Large language models (LLMs) have revolutionized machine learning. As researchers continue to advance this technology, one approach involves performing an intervention in the models and observing how that affects their performance. This talk presents LASER, a new method of intervention that can increase LLMs’ accuracy while reducing their memory footprint.

Evaluation and understanding of foundation models

Besmira Nushi, Principal Researcher, Microsoft Research AI Frontiers

Model evaluation and understanding serve as guides to AI innovation. But evaluation is hard, and new generative tasks pose new challenges in evaluation and understanding. This talk explores efforts to measure, inform, and accelerate model improvement, which help the scientific community understand and study new forms and levels of intelligence.

Generative AI meets structural biology: Equilibrium distribution prediction

Shuxin Zheng, Principal Researcher, Microsoft Research AI4Science

Distributional Graphormer (DIG) is a deep learning framework for predicting protein structures with greater accuracy, a fundamental challenge in molecular science. Using generative AI to solve the problem of predicting equilibrium distribution, DIG opens exciting new possibilities. By learning about different states and behaviors of molecules, scientists can make breakthroughs in developing new drugs, creating advanced materials, and understanding biological processes.

Augmenting human cognition and decision making with AI

Jake Hofman, Senior Principal Researcher, Microsoft Research NYC

How can AI help people make better decisions, be more productive, and improve themselves in a sustainable way? Some technology can help in the short term without providing lasting solutions. For example, relying on a spell checker may not improve one’s ability to spell correctly. This talk explores choices in the design and use of AI tools to help with decision making and the importance of rigorous measurement and experimentation to maximize the benefits and minimize the risks.

Kahani: Visual storytelling through culturally nuanced images

Sameer Segal, Principal Research Software Development Engineer, Microsoft Research India

Image generation models can produce visually stunning images from natural language descriptions, but they often lack cultural awareness and nuance. These models may rely on stereotypes and fail to understand local words, shortcomings that require heavy fixes such as modifying or significantly fine-tuning the model. Image generation can also require sophisticated prompting, beyond the abilities of many laypeople.

This talk looks at Kahani, a Microsoft Research project focused on developing a visual storytelling prototype that allows people to create visually striking and culturally nuanced images just by describing them in their local languages. Kahani leverages state-of-the-art techniques like inpainting and models like Segment Anything and GPT-4V(ision) to generate feedback for the candidate images.

Closing remarks and announcements

Ashley Llorens, VP and Distinguished Scientist, Microsoft

The acceleration of AI underscores the importance of engagement across disciplines, organizations, and geographies. This session introduced the first cohort of fellows for Microsoft Research’s AI & Society Fellows (opens in new tab) program, which aims to foster deep interdisciplinary collaboration that maximizes the value of AI for people and society. The session also provided an update on the Accelerate Foundation Models Research (opens in new tab) (AFMR) program, which issues grants that make leading models, hosted through Microsoft Azure, accessible to academic research teams. To date, AFMR grants are supporting nearly 200 projects across 80 research institutions around the world. These projects include work in AI model innovation and evaluation, responsible AI, health, AI for scientific discovery, and more. 


Announcing recipients of the AFMR Minority Serving Institutions grant

Today, as part of the Accelerate Foundation Models Research (AFMR) initiative, Microsoft is delighted to announce the 10 inaugural grant recipients through the AFMR Minority Serving Institutions grant program.

This pilot focuses on supporting historically Black colleges and universities (HBCUs) and Hispanic-serving institutions (HSIs), providing them with access to the state-of-the-art tools necessary to conduct meaningful and impactful research on AI. In addition to a grant award, recipients are provided with credits they can use to access leading-edge models hosted by Microsoft Azure through Azure AI Studio (opens in new tab).

AFMR is a collaborative effort with the academic research community and part of the Microsoft pledge to support the President’s Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. This program is dedicated to exploring foundation models with the aim of achieving three main objectives: aligning AI with human values and goals to enhance safety, responsibility, and transparency; improving human interactions through research at the intersection of technology and society, fostering trust and creativity; and accelerating scientific discovery in natural sciences through innovative knowledge and data approaches.

Operating as a global network and resource platform, AFMR fosters interdisciplinary collaboration among researchers from various fields, addressing significant technical and societal challenges.

AFMR Minority Serving Institutions grant recipients:

Creativity and Design

  • Cesar Torres, AI-Enhanced Bricolage: Augmenting Creative Decision Making in Creative Practices, The University of Texas at Arlington
  • Nikita Soni, Exploring Interaction Design Space for Child-AI Visual Storytelling Creativity Support Tools, University of Illinois, Chicago

Cognition and Societal Benefits

  • Muhammed Idris, Advancing Culturally Congruent Cancer Communication with Foundation Models, Morehouse School of Medicine
  • Hajar Homayouni, Federated Privacy-Preserving Multimodal Generator for Synthetic Medical Data Generation, San Diego State University
  • Junzhou Huang, Developing Foundation Models for Survival Prediction from Pathological Image and Biomedical Text, The University of Texas at Arlington
  • Pedram Rooshenas, LLM-Powered Teaching Assistant for Computer Science Courses, University of Illinois, Chicago

Benchmarks, Evaluation and Measurement

  • Kinnis Gosha, Evaluation of Hybrid AI Systems for Workforce Performance Evaluation, Morehouse College

Model Advancement

  • Davi Valerio de Queiroz Rodrigues, Advancing Foundation Models Towards Physical AI: Bridging the Gap Between Natural Language and Wireless Sensing, The University of Texas at El Paso

Multimodal and Crossmodal Learning

  • Amr Magdy, Visual Knowledge Distillation on the Edge: An Application on Enhancing Self-Inference Cameras for Near-Real-Time Operations, University of California, Riverside

Responsible AI

  • Xiang (Susie) Zhao, Accelerating Environmental Justice Analysis using Foundational Models for Intelligent Disaster Recovery and City Planning, Alabama A&M University

By driving deeper collaboration across disciplines, institutions, and sectors, Microsoft aims to unlock the full potential of AI across a greater breadth of research pursuits, application domains, and societal contexts.


Abstracts: January 25, 2024

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Senior Researchers Jordan Ash and Dipendra Misra join host Gretchen Huizinga to discuss “The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction,” which was accepted to the 2024 International Conference on Learning Representations (ICLR). Layer-Selective Rank Reduction, or LASER, is an intervention for targeted parameter reduction in transformer-based models. The work shows that removing certain parameters not only maintains model performance, as some existing parameter-reduction methods do, but can actually improve it—no additional training necessary.
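
As a rough illustration of what such an intervention looks like in code, the sketch below rank-reduces a single weight matrix via truncated SVD using PyTorch. The matrix shape and the fraction of singular values kept are placeholders for illustration, not settings reported in the paper.

```python
import torch

def low_rank_approximate(weight: torch.Tensor, keep_fraction: float) -> torch.Tensor:
    """Return the rank-reduced SVD reconstruction of a 2-D weight matrix,
    keeping only the top `keep_fraction` of its singular values."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    k = max(1, int(keep_fraction * S.numel()))
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

# Illustrative use on a stand-in weight matrix; in practice this would replace
# one specific MLP or attention projection inside a pretrained transformer.
W = torch.randn(4096, 1024)
W_reduced = low_rank_approximate(W, keep_fraction=0.05)
```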

To learn more about the paper and related topics, register for Microsoft Research Forum (opens in new tab), a series of panel discussions and lightning talks around science and technology research in the era of general AI.

Transcript

[MUSIC PLAYS]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

Today, I’m talking to Dr. Dipendra Misra and Dr. Jordan Ash, both senior researchers at Microsoft Research. Drs. Misra and Ash are coauthors of a paper called “The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction,” also known as LASER. This paper has been accepted at the International Conference on Learning Representations, or ICLR, in Vienna this year, and you can read a preprint of it now on arXiv. Dipendra, Jordan, thanks for joining us on Abstracts!


JORDAN ASH: Thanks for having us.

DIPENDRA MISRA: Yeah, thanks for having us, Gretchen.

HUIZINGA: Dipendra, let’s start with a general overview of this paper. In a few sentences, describe the issue or problem your work addresses and, perhaps more importantly, why we should care about it.

MISRA: Thanks, Gretchen. So as we know, large language models, also known as LLMs, have revolutionized both business and research in artificial intelligence. They are everywhere, being used to solve a wide range of problems. So in our paper, we introduce an intervention which can be applied to any existing pretrained large language models, and our main purpose for introducing this is to see how it affects the performance of the LLMs and whether we can gain insight into how an LLM stores information in its parameters and how it uses that information to generate a response. And what our intervention does is that it performs a low-rank approximation of the parameters of the LLM. And the surprising discovery that our paper makes is that if you do this intervention correctly, then we can get significant improvement on various tasks for different LLMs.

HUIZINGA: So that’s the first part of the question. Tell me why I should care about it!

MISRA: So if you are a person who uses LLMs for solving any tasks, then you do care about performance on a given task. So, for example, you could be using LLMs to generate an email, right, from a given description. Or you could be using an LLM to do question answering. And by applying our intervention, we can gain accuracy on the task that we care about.

HUIZINGA: Well, let’s stick with you, Dipendra, for a minute and talk about the field writ large. Almost all research owes a debt to some other research that went before. So tell us a bit about the related work in this field and how your work builds on or adds to it.

MISRA: So the work that is most closely related to our LASER paper is this growing body of work on understanding how knowledge is stored and edited inside a large language model. So these works don’t apply the intervention that we do, but they were certainly inspirational for us for arriving at the intervention that we introduced. Another line of work which is very related is, like, adding a small number of parameters to improve the performance of the LLM on a given task. The most relevant work in this space is the LoRA paper, also known as the “Low-Rank Adaptation of Large Language Models,” which came from Microsoft. And what LoRA does, it adds a small number of additional parameters to an LLM and then fine-tunes it on a given task. And what our intervention, called LASER, does is that it removes parameters instead of adding it. And another line of work which is also related is the work on model compression. So there are people who focus on breaking down the size of the models as much as possible while still retaining the performance, more or less, compared to the base model. And so these people are also focused on removing parameters, but they are coming at a different angle of, like, trying to reduce the memory footprint, while what we were doing is that we are less focused on the memory footprint—that’s more like a side effect of it—and more like if I were to fiddle with this parameter of the LLM, then how does it affect the performance? And what can we learn by looking at the comparison? Like, OK, so if I remove this parameter, I see the performance drop; then it means that these parameters are storing something about this type of task on which the performance is dropping.

HUIZINGA: So I’ll ask you one more question, Dipendra, before I pull Jordan into the conversation, and that would be about your methodology. How would you describe your approach to this project, and how did you conduct the research?

MISRA: So we started by analyzing the intervention LASER on a particular LLM called GPT-J and evaluating its performance on the question-answering dataset CounterFact. So our idea was, like, before trying this thing on [a] bunch of things, let’s just understand this in one setting deeply and, kind of, build insights that we can then evaluate in other settings. And the reason we chose this setup was that the GPT-J large language model has its training data publicly available. It’s called the Pile dataset. And that allows us to do analysis with the training data. For example, is the performance dropping on data points which are rarer or more frequent in the training data? And this is important because training data analysis is frequently omitted in existing LLM literature, and that’s something we wanted to do. And the second reason is that the CounterFact question-answering dataset is both related to the prior work in this space, so there was a reason for choosing it, but also it has paraphrases of the same question. For example, it might ask, like, “Who is the president of the United States of America?” But it will also have paraphrases like “The president of the United States of America is …” or “The head of the government of the United States of America is …” And so it will have different variations of the same question. And then you can see if the LLM is able to get all of them right, or is it not robust to variations of the same question? And so we did analysis on this GPT-J and CounterFact dataset. And Jordan will talk more about what the results were. And so based on this rigorous analysis, we developed some insights as to what the intervention is doing. And then we evaluated these insights on other settings. So then we tried, like, two other different large language models and evaluated it on, like, multiple different datasets. And then we saw that the insights actually hold more broadly. And finally, we also evaluated this in a non-text related task, right. Because the intervention could, in principle, be applied to any neural network. So we went after this reinforcement learning model, which solves a puzzle called Sokoban. And we also saw that if you apply this intervention correctly, then you can get some performance improvement. So it’s not related to just large language models, although that was our main motivation.

HUIZINGA: Well, Jordan, let’s get your take on the last few questions here. As I’ve said before, the most interesting section of a research paper for me is the part where it says, “and what we found was …” So as a result of this research, what did you find? Were there outcomes that you expected, or were there any surprises?

ASH: I would say this paper is full of surprises. So as Dipendra was mentioning earlier, the LASER intervention removes information from a model. It doesn’t add information to a model. And up until now, there’s been a lot of work on pruning model parameters for a variety of reasons. But generally, these papers show that as parameters are removed from the model, performance just does not degrade. You can, overall, keep performance roughly the same even with a fairly drastic reduction of model parameters. And those reductions are typically done across layers of the model. What we’re showing here is surprising because we’re showing if we do a very targeted intervention, maybe at only one layer of the model, we could actually get a big boost in performance rather than just, you know, keep it the same or something like this.

HUIZINGA: Hmm. So with those results in mind, Jordan, I’m curious about practical applications. How would you say this research makes an impact in real-world situations? I know that Dipendra alluded to that earlier, but where is this most useful and who benefits most?

ASH: I think the short sales pitch for this technique is that you could potentially improve the performance of a language model with no additional training at all just by applying this intervention, which again just removes information from the model, so you don’t need to have any extra data on hand to refine the model or to add new information into it. The real-world situations we’re seeing a boost right now in LASER is for, like, question answering or reasoning-type tasks where there is, there’s, like, a concrete answer that corresponds to what you’re asking the LLM rather than just a, sort of, like, broad-purpose generative task.

HUIZINGA: So typically speaking, when you’re dealing with LLMs, part of the issue is prompt engineering. And it’s like my responsibility to be able to put the right words in it so I’ll get the best answer from the model, right? Are you saying that this helps me not have to be that good on the prompt-engineer end versus what the model can interpret and do?

ASH: I think prompt engineering still has a place in, sort of, eking out a good answer from a language model, but given a fixed prompt, this intervention seems to offer an improved accuracy over not intervening at all and applying the same prompt.

HUIZINGA: So, Jordan, I often think of an abstract as a sort of appetizer for a research paper. But let’s distill it even further. If there was one thing—sort of an amuse-bouche, if you will—that you want our listeners to take away from this work, what would it be?

ASH: For me, I like this idea of how, you know, typically if you want to get a model to perform better, you would take that model off the shelf and you would refine it on data related to the task at hand. And that might take the form of refining all of the parameters or doing some low-rank LoRA-type thing that Dipendra alluded to earlier. Here, we counterintuitively show that sometimes just carefully removing information from the model can have a positive effect, as well. And this is great news because refining a model requires a lot of new target domain data to be available, but removing information from the model doesn’t necessarily have that same constraint.

HUIZINGA: Well, finally, let’s talk a little bit about the future, Jordan, and I’ll have you close the show for us. What unanswered questions or ongoing research challenges do you see here, and what’s next maybe on your research agenda?

ASH: Yeah, I think there’s a lot of exciting future work for this project. I think for one, as a practical matter, there’s this question of just what’s the best way to find the best LASER intervention? LASER targets a specific layer of the model, and then it finds the extent by which it should be rank-reduced. That search procedure is, kind of, expensive. Right now, we’re doing it in a, sort of, exhaustive way. But also, it seems to be beneficial to apply LASER at multiple layers of the model. And that makes the search procedure, sort of, combinatorially explode. So finding out the best way to compose these interventions, I think, is an important area of future research. And then just, sort of, less on the practical side, I think there are all these questions related to just, why does this work at all? Like, why is it helpful to remove information from the model? And, you know, I think there are some rough ideas we have about this. For example, when you’re training a model on lots and lots of data, you know, it’s not all created equally. Some of it might be noisy or low quality, and some of it might be high quality. And maybe it’s better to remove those samples at training time to get a better model. So I guess there’s this question of, is pruning the model using a LASER-type intervention roughly equivalent to pruning the training data in a way to make it more favorable for eliciting a high-quality model? And again, like Dipendra alluded to earlier, this LoRA procedure, which does something that very much complements LASER and is often used to add information to a model, is it possible that LoRA is actually not just adding information but also removing information from the model? And perhaps that’s one reason why LASER seems to be so effective.
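
To make the exhaustive search Ash describes concrete, here is a schematic sketch of a single-intervention sweep over candidate layers and rank fractions. The callables `apply_intervention` and `evaluate_accuracy`, and the candidate fractions, are assumptions standing in for however a particular model and benchmark are wired up; this is not the authors' code.

```python
import copy

def search_best_laser(model, layer_names, apply_intervention, evaluate_accuracy,
                      keep_fractions=(0.5, 0.25, 0.1, 0.05, 0.01)):
    """Exhaustively try one (layer, rank-fraction) intervention at a time and
    keep whichever single intervention scores highest on a validation metric."""
    best_score, best_choice = evaluate_accuracy(model), None  # baseline: no edit
    for name in layer_names:                  # e.g., one MLP weight matrix per layer
        for rho in keep_fractions:            # fraction of singular values to keep
            trial = copy.deepcopy(model)      # fine for a sketch; wasteful for a real LLM
            apply_intervention(trial, name, rho)
            score = evaluate_accuracy(trial)
            if score > best_score:
                best_score, best_choice = score, (name, rho)
    return best_choice, best_score
```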

HUIZINGA: So lots of questions.

ASH: I would say so, yeah!

HUIZINGA: Well, Dipendra Misra, Jordan Ash, thanks for joining us today. And to our listeners, thanks for tuning in.

[MUSIC PLAYS]

Again, you can find a link to this paper at aka.ms/abstracts (opens in new tab) or on arXiv (opens in new tab). And I’ll also add that Dipendra will be speaking about this work at the upcoming Microsoft Research Forum, and you can register for this series of events at researchforum.microsoft.com (opens in new tab). See you next time on Abstracts!

[MUSIC FADES]


Research Focus: Week of January 22, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.


Register for Microsoft Research Forum

Join Microsoft Research Forum (opens in new tab) for a continuous exchange of ideas about science and technology research in the era of general AI. This series, which begins on January 30, will explore recent research advances, bold new ideas, and important discussions with the global research community. Register now to receive access to all episodes in this quarterly series and be part of the conversation.


Improving Text Embeddings with Large Language Models

Text embeddings are vector representations of natural language that encode semantic information. They are widely used in various natural language processing tasks, such as information retrieval, question answering, semantic textual similarity, bitext mining, item recommendation, etc.

In a recent paper: Improving Text Embeddings with Large Language Models (opens in new tab), researchers from Microsoft introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps. Unlike existing methods, this new method does not require building complex training pipelines or manually collected datasets that are often constrained by task diversity and language coverage. The researchers leverage proprietary large language models (LLMs) to generate diverse synthetic data for hundreds of thousands of text embedding tasks across nearly 100 languages. They then fine-tune open-source decoder-only LLMs on the synthetic data using standard contrastive loss. Experiments demonstrate that this method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data. Furthermore, when fine-tuned with a mixture of synthetic and labeled data, the model sets new state-of-the-art results on the BEIR (opens in new tab) and MTEB (opens in new tab) benchmarks.
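
For readers who want a concrete picture of the fine-tuning step, the sketch below shows a standard in-batch contrastive (InfoNCE-style) objective of the kind commonly used to train text embedding models. The embedding dimension, temperature, and random tensors standing in for encoder outputs are illustrative assumptions; this is not the paper's training code.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb: torch.Tensor, passage_emb: torch.Tensor,
                              temperature: float = 0.02) -> torch.Tensor:
    """InfoNCE-style loss: the i-th query should match the i-th passage, and
    every other passage in the batch serves as a negative."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature                      # scaled cosine similarities
    labels = torch.arange(q.size(0), device=q.device)   # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for encoder outputs:
queries, passages = torch.randn(8, 768), torch.randn(8, 768)
loss = in_batch_contrastive_loss(queries, passages)
```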


DevEx in Action: A study of its tangible impacts

For many professional software developers, the development lifecycle is riddled with friction and red tape, and successful delivery of code to production is a frustratingly infrequent event. Even worse, the problems are often compounded by a lack of management engagement, delaying and frustrating top engineers.

Developer experience (DevEx) is garnering increased attention at many organizations as leaders seek to optimize software delivery against a backdrop of fiscal tightening and transformational technologies such as AI. Developers and technical leaders generally understand that good DevEx leads to better products, more effective software delivery, and developer happiness. Yet, at many organizations, proposed initiatives and investments to improve DevEx struggle to get buy-in, as business stakeholders question the value proposition of improvements.

In a recent paper: DevEx in Action: A study of its tangible impacts (opens in new tab), researchers from Microsoft, GitHub, and DX (opens in new tab) examine this problem and present empirical evidence of how improvements in DevEx influence outcomes like productivity, code quality, and innovation.



MetaOpt: Examining, explaining, and improving heuristic performance

Heuristic algorithms, often referred to as heuristics, are tools used to approximate optimal algorithms to make faster and more efficient decisions. These are particularly useful in operational scenarios in production systems, such as determining which server a virtual machine should be assigned to or deciding whether data should be removed from a cache in a content delivery network.

However, cloud operators, who are responsible for designing and managing systems in production, often struggle to evaluate when and where their heuristics may underperform. This challenge can lead to over-provisioning the network and the inefficient use of available resources. Such practices can be costly and may result in the inability to meet customer demand.

To address this, we developed MetaOpt, a heuristic analyzer designed to enable operators to examine, explain, and improve heuristics’ performance before deploying them in critical, high-stakes environments. MetaOpt is unique because it not only compares algorithm performance but also provides insights into the underlying reasons for performance disparities between algorithms. It empowers operators and researchers to conduct “what-if” analyses, strategize how to combine heuristics in production, and understand why certain heuristics perform better in specific areas of the input space—the range of possible inputs that the heuristic may encounter.

We demonstrate MetaOpt’s capability for heuristic analysis by studying heuristics from three domains: traffic engineering, vector bin packing, and packet scheduling. MetaOpt identifies large performance gaps, enables us to prove properties about these heuristics, and guides us in improving them. Table 1 summarizes the results.

Table 1. MetaOpt enabled us to find the performance gap between heuristics from traffic engineering, vector bin packing, and packet scheduling. It also helped us prove various properties about the heuristics. Finally, it helped us modify the heuristics to improve their performance. DP refers to a heuristic Microsoft has deployed in its wide area network for traffic engineering.

Currently, MetaOpt helps Azure operators analyze heuristics in production and serves as a “helper for theorem proving.” For example, we used MetaOpt to establish a tighter bound for the “first fit decreasing” heuristic in vector bin packing, a challenge for theoreticians for over three decades. As a result, we don’t need to over-provision resources in a cloud environment, ensuring we always have sufficient servers to meet customer demand.
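
For readers unfamiliar with the heuristic referenced above, here is a minimal sketch of classic first fit decreasing for one-dimensional bin packing; the vector variant analyzed by MetaOpt generalizes the capacity check to multiple dimensions. This is an illustrative implementation, not code from MetaOpt.

```python
def first_fit_decreasing(items, bin_capacity):
    """Classic FFD: sort items largest-first, place each into the first open bin
    with enough remaining capacity, and open a new bin only when none fits."""
    bins = []                                   # remaining capacity of each open bin
    for size in sorted(items, reverse=True):
        for i, remaining in enumerate(bins):
            if size <= remaining:
                bins[i] = remaining - size
                break
        else:
            bins.append(bin_capacity - size)    # open a new bin
    return len(bins)

print(first_fit_decreasing([0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1], 1.0))
```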

MetaOpt framework

To use MetaOpt, users input the heuristic they want analyzed and either the optimal algorithm or another heuristic. MetaOpt efficiently translates these inputs into a solver format. It then finds performance gaps and the inputs that cause them. Recognizing that not all users are versed in optimization theory, we designed a higher-level abstraction for MetaOpt. This feature enables users to input their heuristics using a few simple building blocks and constrain the input space to what is relevant in practice. MetaOpt can then analyze decisions made by the heuristic that led to underperformance or identify input properties that caused the heuristic to make suboptimal choices. We illustrate the MetaOpt workflow in Figure 1.

Figure 1. The four steps in the MetaOpt workflow. (1) Users encode the heuristic; (2) MetaOpt automatically rewrites it to obtain a single-level optimization; (3) it divides the problem into smaller, more manageable segments for scalability; (4) it employs existing solvers to find the highest performance gap.

Rooted in game theory concepts

MetaOpt is based on Stackelberg games, a well-known class of leader-follower games in game theory. Here, the leader determines inputs for one or more followers, who must then optimize their outcomes based on these inputs. In the MetaOpt framework, the leader’s goal is to maximize the performance disparity between two algorithms (the followers) by deciding their inputs. The followers, representing the algorithms being compared, choose internal variables to optimize their outcomes. This, in turn, affects the leader’s results. We show this in Figure 2.

Figure 2. The high-level Stackelberg formulation of MetaOpt.
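
To make the leader-follower idea concrete, here is a toy sketch: a brute-force optimum for tiny bin-packing instances, a known input where the `first_fit_decreasing` function sketched earlier opens more bins than necessary, and a "leader" that does random search over inputs to maximize the heuristic-versus-optimal gap. MetaOpt itself encodes the leader and followers as a single optimization problem handed to a solver, which this illustration does not attempt; the function names and search strategy here are assumptions for illustration only.

```python
import itertools
import random

def optimal_bins(items, capacity):
    """Brute-force optimum for tiny instances: try k = 1, 2, ... bins and check
    whether some assignment of the items to k bins respects the capacity."""
    for k in range(1, len(items) + 1):
        for assignment in itertools.product(range(k), repeat=len(items)):
            loads = [0.0] * k
            for item, b in zip(items, assignment):
                loads[b] += item
            if all(load <= capacity + 1e-9 for load in loads):
                return k
    return len(items)

# A known adversarial input for FFD (the classic 5,5,4,4,3,3,3,3 instance,
# scaled to capacity 1.0): FFD opens 4 bins where 3 suffice.
adversarial = [0.5, 0.5, 0.4, 0.4, 0.3, 0.3, 0.3, 0.3]
print(first_fit_decreasing(adversarial, 1.0), optimal_bins(adversarial, 1.0))  # 4 3

def leader_random_search(num_items=6, trials=200, capacity=1.0, seed=0):
    """The 'leader': search over inputs to maximize the heuristic-vs-optimal gap.
    Random search often finds no gap at all, which is exactly why MetaOpt casts
    this as a single optimization problem handed to a solver instead."""
    rng = random.Random(seed)
    best_gap, best_items = 0, None
    for _ in range(trials):
        items = [round(rng.uniform(0.05, 0.95), 2) for _ in range(num_items)]
        gap = first_fit_decreasing(items, capacity) - optimal_bins(items, capacity)
        if gap > best_gap:
            best_gap, best_items = gap, items
    return best_gap, best_items
```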

Looking ahead

MetaOpt marks a significant advance in the field of scalable, user-friendly analytical tools. It enables users to examine, understand, and explain differences in performance across competing algorithms. It also helps them improve those algorithms before deploying them in critical environments.

We began developing MetaOpt in early 2022 to address a specific need for heuristic analysis in our network’s traffic engineering solution. Since then, our focus has been on enhancing MetaOpt’s accessibility for users without a background in optimization theory. Currently, we are improving MetaOpt’s scalability and usability, and we are expanding the range of heuristics it supports. We plan to release it as an open-source tool at the USENIX Symposium on Networked Systems Design and Implementation (opens in new tab) (NSDI) conference, scheduled for April 16–18, 2024.

We believe that MetaOpt can significantly boost productivity for those studying or designing heuristics by serving as a risk-analysis engine and a tool for explainable AI and active learning. In the near future, we aim to publish papers on new MetaOpt applications and share our language for describing heuristics.

For more details, visit the MetaOpt webpage, and review our publications page for the latest developments.


GHDDI and Microsoft Research use AI technology to achieve significant progress in discovering new drugs to treat global infectious diseases

The Global Health Drug Discovery Institute (GHDDI) (opens in new tab) and Microsoft Research recently achieved significant progress in accelerating drug discovery for the treatment of global infectious diseases. Working in close collaboration, the joint team successfully used generative AI and foundation models to design several small molecule inhibitors for essential target proteins of Mycobacterium tuberculosis and coronaviruses. These new inhibitors show outstanding bioactivities, comparable to or surpassing the best-known lead compounds.

This breakthrough is a testament to the team’s combined efforts in generative AI, molecular physicochemical modeling, and iterative feedback loops between scientists and AI technologies. Normally, the discovery and in vitro confirmation of such molecules could take up to several years, but with the acceleration of AI, the joint team achieved these new results in just five months. This research also shows the tremendous potential of AI for helping scientists discover or create the building blocks needed to develop effective treatments for infectious diseases that continue to threaten the health and lives of people around the world.

Since 2019, for example, there have been more than 772 million confirmed cases of COVID-19 worldwide and nearly 7 million deaths from the virus, according to the World Health Organization (WHO), the Centers for Disease Control, and various other sources. Although vaccines have reduced the incidence and deadliness of the disease, the coronavirus continues to mutate and evolve, making it a serious ongoing threat to global health. Meanwhile, the WHO reports that tuberculosis continues to be a leading cause of death among infectious diseases, second only to COVID-19 in 2022, when 10.6 million people worldwide fell ill with TB and the disease killed 1.3 million (the most recent figures currently available).

Laying the foundation for new infectious disease treatments

Microsoft Research has rich experience in developing and pre-training large AI models specialized for proteins and molecules, demonstrated in both property prediction and molecular generation. Based on those experiences, Microsoft Research developed and maintains ownership of an AI model for molecule generation tailored for specific protein targets. The generated compounds were virtually screened and further optimized by data scientists and medicinal chemists from GHDDI, followed by compound synthesis and wet-lab experiments to quantify bioactivities. The experimental results were then fed back to the research team at Microsoft for AI model improvement and new compound generation.

This integrated AI-expert-experiment pipeline enabled the successful generation of novel compounds for protein targets in Mycobacterium tuberculosis and the coronavirus SARS-CoV-2. In less than five months, the joint team designed several chemical compounds that are effective in inhibiting these pathogens’ essential target proteins, accelerating the structure-based drug discovery process.

Figure 1. Two potential inhibitor compounds (generated by our method) for ClpP of tuberculosis.
Dose response curves of the compounds generated for coronavirus, with GRL0617 as the reference compound, demonstrating enhanced bioactivity. The most recent progress is that the joint team has effectively optimized the IC50 to 0.18 µM, which is approximately an eight-fold improvement compared to GRL0617.

One distinguishing feature of AI-generated molecules is their novel scaffold structures, which are important because they create the potential for these molecules to be developed into a new class of drug candidates. These novel structures offer the possibility of more effective treatments, and also help to address the escalating challenge of antimicrobial resistance (AMR), a major hurdle in treating infectious diseases like tuberculosis and COVID-19.

“In the current landscape of scientific research, we encounter unparalleled challenges but also have unprecedented opportunities,” said Dr. Sheng Ding, institute director of GHDDI. “Innovation stands as the central catalyst for scientific advancement and a crucial element in addressing global health challenges. I’m excited about our collaboration with Microsoft Research and gratified with the progress we’ve jointly achieved. Without a doubt, our combined efforts will enhance R&D efficiency and expedite the process of drug discovery.”

“This represents a collaboration that transcends disciplines and boundaries,” he noted. “Our combined strengths will advance pharmaceutical research, paving new avenues in scientific exploration. Going forward, we anticipate deploying such cutting-edge technologies in uncharted realms of life sciences. This will enable us to offer more comprehensive, profound, and practical solutions for global health issues.”


Using AI to improve global health

Embracing the principle of open innovation, the collaboration between GHDDI and Microsoft Research is dedicated to harnessing AI technology to expedite drug discovery. The goal is to contribute to global health equity through the development of lifesaving medications and the prompt delivery of safer and more effective drug solutions that are accessible to everyone.  The collaboration focuses on infectious diseases that pose a threat to global health, including but not limited to tuberculosis, viral infections, and malaria. Both parties are committed to a deep integration of generative AI, foundational models, high-throughput virtual screening, and expert knowledge to tackle these challenges.

“Successful AI-driven drug discovery necessitates a tight-knit collaboration between AI specialists and medicinal experts,” said Dr. Tie-Yan Liu, distinguished scientist at Microsoft Research AI4Science. “In recent years, our globally recognized team at Microsoft Research has been deeply engaged in interdisciplinary research between AI and natural science. To complement this, GHDDI experts bring to the table a wealth of industry experience and profound domain knowledge. Their experimental facilities not only allow for testing but also help provide invaluable feedback for training AI models. Because of our close collaboration, we look forward to producing groundbreaking research outcomes with the potential to redefine the future of healthcare through AI technology innovation.”

Accelerating drug discovery

Commenting on the research into Mycobacterium tuberculosis and coronaviruses, Dr. Rumin Zhang, chief scientific officer at GHDDI, noted that the collaborative team’s application of AI technology considerably shortened the traditionally lengthy drug discovery process. The team was able to design and validate highly effective small molecule inhibitors for the pathogens in just five months.

“This is an exceptional accomplishment that underscores the immense potential of AI in efficient de novo drug design. It also vividly illustrates the team’s exceptional innovative capacity and professional prowess,” he said. “We are excited about this innovative R&D strategy leading to more groundbreaking advancements in a broader spectrum of future drug discovery projects.”

“This work is all about pushing the boundaries of AI technology for application in new drug R&D,” said Dr. Tao Qin, senior principal researcher at Microsoft Research AI4Science. “We aim to leverage AI innovations to enhance human health, tackle worldwide health issues, and ensure the advantages of AI technology are accessible to all.”

“We plan to intensify and broaden our collaboration, further advancing the use of AI technology in the realm of life sciences,” said Dr. Jinjiang Guo, head of the Data Science Department at GHDDI. “This will yield novel insights that will enrich researchers’ understanding of mechanisms underlying diseases and life, thus paving the way for the development of innovative treatment strategies and providing more effective solutions for diseases that have long affected human health. We are highly optimistic about the potential of this collaboration and are confident that it will have a substantial impact on the future of the healthcare field.”

Next steps

In the next phase, Microsoft Research and GHDDI will collaborate to optimize the discovered hit compounds, enhance ADMET properties, progress toward preclinical studies, and initiate a broader range of drug-discovery projects.
