Microsoft AI – Page 9

Abstracts: November 5, 2024

November 5, 2024

by Alyssa Hughes (2ADAPTIVE LLC dba 2A Consulting) Microsoft AI

Outlined illustrations of Chris Hawblitzel and Jay Lorch for the Microsoft Research Podcast, Abstracts series.

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements. 

In this episode, Microsoft senior principal researchers Chris Hawblitzel and Jay Lorch join host Amber Tingle to discuss “Verus: A Practical Foundation for Systems Verification,” which received the Distinguished Artifact Award at this year’s Symposium on Operating Systems Principles, or SOSP. In their research, Hawblitzel, Lorch, and their coauthors leverage advances in programming languages and formal verification with two aims. The first aim is to help make software verification more accessible for systems developers so they can demonstrate their code will behave as intended. The second aim is to provide the research community with sound groundwork to tackle the application of formal verification to large, complex systems.

Read the paper

Transcript

[MUSIC]

AMBER TINGLE: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Amber Tingle. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

Our guests today are Chris Hawblitzel and Jay Lorch. They are both senior principal researchers at Microsoft and two of the coauthors on a paper called “Verus: A Practical Foundation for Systems Verification.” This work received the Distinguished Artifact Award at the 30th Symposium on Operating Systems Principles, also known as SOSP, which is happening right now in Austin, Texas. Chris and Jay, thank you for joining us today for Abstracts and congratulations!

JAY LORCH: Thank you for having us.

CHRIS HAWBLITZEL: Glad to be here.

TINGLE: Chris, let’s start with an overview. What problem does this research address, and why is Verus something that the broader research community should know about?

HAWBLITZEL: So what we’re trying to address is a very simple problem where we’re trying to help developers write software that doesn’t have bugs in it. And we’re trying to provide a tool with Verus that will help developers show that their code actually behaves the way it’s supposed to; it obeys some sort of specification for what the program is supposed to do.

TINGLE: How does this publication build on or differ from other research in this field, including your previous Verus-related work?

HAWBLITZEL: So formal verification is a process where you write down what it is that you want your program to do in mathematical terms. So if you’re writing an algorithm to sort a list, for example, you might say that the output of this algorithm should be a new list that is a rearrangement of the elements of the old list, but now this rearrangement should be in sorted order. So you can write that down using standard mathematics. And now given that mathematical specification, the challenge is to prove that your piece of software written in a particular language, like Java or C# or Rust, actually generates an output that meets that mathematical specification. So this idea of using verification to prove that your software obeys some sort of specification, this has been around for a long time, so, you know, even Alan Turing talked about ways of doing this many, many decades ago. The challenge has always been that it’s really hard to develop these proofs for any large piece of software. It simply takes a long time for a human being to write down a proof of correctness of their software. And so what we’re trying to do is to build on earlier work in verification and recent developments in programming languages to try to make this as easy as possible and to try to make it as accessible to ordinary software developers as possible. So we’ve been using existing tools. There are automated theorem provers—one of them from Microsoft Research called Z3—where you give it a mathematical formula and ask it to prove that the formula is valid. We’re building on that. And we’re also taking a lot of inspiration from tools developed at Microsoft Research and elsewhere, like Dafny and F* and so on, that we’ve used in the past for our previous verification projects. And we’re trying to take ideas from those and make them accessible to developers who are using common programming languages. In this case, the Rust programming language is what we’re focusing on.

TINGLE: Jay, could you describe your methodology for us and maybe share a bit about how you and your coauthors tested the robustness of Verus.

LORCH: So the question we really want to answer is, is Verus suitable for systems programming? So that means a variety of things. Is it amenable to a variety of kinds of software that you want to build as part of a system? Is it usable by developers? Can they produce compact proofs? And can they get timely feedback about those proofs? Can the verifier tell you quickly that your proof is correct or, if it’s wrong, that it’s wrong and guide you to fix it? So the main two methodological techniques we used were millibenchmarks and full systems. So the millibenchmarks are small pieces of programs that have been verified by other tools in the past, and we built them in Verus and compared to what other tools would do to find whether we could improve usability. And we found generally that we could verify the same things but with more compact proofs and proofs that would give much snappier feedback. The difference between one second and 10 seconds might not seem a lot, but when you’re writing code and working with the verifier, it’s much nicer to get immediate feedback about what is wrong with your proof so you can say, oh, what about this? And it can say, oh, well, I still see a problem there. And you could say, OK, let me fix that. As opposed to waiting 10, 20 seconds between each such query to the verifier. So the millibenchmarks helped us evaluate that. And the macrobenchmarks, the building entire systems, we built a couple of distributed systems that had been verified before—a key value store and a node replication system—to show that you could do them more effectively and with less verification time. We also built some new systems, a verified OS page table, a memory allocator, and a persistent memory append-only log.

TINGLE: Chris, the paper mentions that successfully verifying system software has required—you actually use the word heroic to describe the developer effort. Thinking of those heroes in the developer community and perhaps others, what real-world impact do you expect Verus to have? What kind of gains are we talking about here?

HAWBLITZEL: Yeah, so I think, you know, traditionally verification or this formal software verification that we’re doing has been considered a little bit of a pie-in-the-sky research agenda. Something that people have applied to small research problems but has not necessarily had a real-world impact before. And so I think it’s just, you know, recently, in the last 10 or 15 years, that we started to see a change in this and started to see verified software actually deployed in practice. So on one of our previous projects, we worked on verifying the cryptographic primitives that people use when, say, they browse the web or something and their data is encrypted. So in these cryptographic primitives, there’s a very clear specification for exactly what bytes you’re supposed to produce when you encrypt some data. And the challenge is just writing software that actually performs those operations and does so efficiently. So in one of our previous projects that we worked on called HACL* and EverCrypt, we verified some of the most commonly used and efficient cryptographic primitives for things like encryption and hashing and so on. And these are things that are actually used on a day-to-day basis. So we, kind of, took from that experience that the tools that we’re building are getting ready for prime time here. We can actually verify software that is security critical, reliability critical, and is in use. So some of the things that Jay just mentioned, like verifying, you know, persistent memory storage systems and so on, those are the things that we’re looking at next for software that would really benefit from reliability and where we can formally prove that your data that’s written to disk is read correctly back from disk and not lost during a crash, for example. So that’s the kind of software that we’re looking to verify to try to have a real-world impact.

LORCH: The way I see the real-world impact, is it going to enable Microsoft to deal with a couple of challenges that are severe and increasing in scale? So the first challenge is attackers, and the second challenge is the vast scale at which we operate. There’s a lot of hackers out there with a lot of resources that are trying to get through our defenses, and every bug that we have offers them purchase, and techniques like this, that can get rid of bugs, allow us to deal with that increasing attacker capability. The other challenge we have is scale. We have billions of customers. We have vast amounts of data and compute power. And when you have a bug that you’ve thoroughly tested but then you run it on millions of computers over decades, those rare bugs eventually crop up. So they become a problem, and traditional testing has a lot of difficulty finding those. And this technology, which enables us to reason about the infinite possibilities in a finite amount of time and observe all possible ways that the system can go wrong and make sure that it can deal with them, that enables us to deal with the vast scale that Microsoft operates on today.

HAWBLITZEL: Yeah, and I think this is an important point that differentiates us from testing. Traditionally, you find a bug when you see that bug happen in running software. With formal verification, we’re catching the bugs before you run the software at all. We’re trying to prove that on all possible inputs, on all possible executions of the software, these bugs will not happen, and it’s much cheaper to fix bugs before you’ve deployed the software that has bugs, before attackers have tried to exploit those bugs.

TINGLE: So, Jay, ideally, what would you like our listeners and your fellow SOSP conference attendees to tell their colleagues about Verus? What’s the key takeaway here?

LORCH: I think the key takeaway is that it is possible now to build software without bugs, to build systems code that is going to obey its specification on all possible inputs always. We have that technology. And this is possible now because a lot of technology has advanced to the point where we can use it. So for one thing, there’s advances in programming languages. People are moving from C to Rust. They’ve discovered that you can get the high performance that you want for systems code without having to sacrifice the ability to reason about ownership and lifetimes, concurrency. The other thing that we build on is advances in computer-aided theorem proving. So we can really make compact and quick-to-verify mathematical descriptions of all possible behaviors of a program and get fast answers that allow us to rapidly turn around proof challenges from developers.

TINGLE: Well, finally, Chris, what are some of the open questions or future opportunities for formal software verification research, and what might you and your collaborators tackle next? I heard a few of the things earlier.

HAWBLITZEL: Yes, I think despite, you know, the effort that we and many other researchers have put into trying to make these tools more accessible, trying to make them easier to use, there still is a lot of work to prove a piece of software correct, even with advanced state-of-the-art tools. And so we’re still going to keep trying to push to make that easier. Trying to figure out how to automate the process better. There’s a lot of interest right now in artificial intelligence for trying to help with this, especially if you think about artificial intelligence actually writing software. You ask it to write a piece of software to do a particular task, and it generates some C code or some Rust code or some Java code, and then you hope that that’s correct because it could have generated any sort of code that performs the right thing or does total nonsense. So it would be really great going forward if when we ask AI to develop software, we also expect it to create a proof that the software is correct and does what the user asked for. We’ve started working on some projects, and we found that the AI is not quite there yet for realistic code. It can do small examples this way. But I think this is still a very large challenge going forward that could have a large payoff in the future if we can get AI to develop software and prove that the software is correct.

LORCH: Yeah, I see there’s a lot of synergy between—potential synergy—between AI and verification. Artificial intelligence can solve one of the key challenges of verification, namely making it easy for developers to write that code. And verification can solve one of the key challenges of AI, which is hallucinations, synthesizing code that is not correct, and Verus can verify that that code actually is correct.

TINGLE: Well, Chris Hawblitzel and Jay Lorch, thank you so much for joining us today on the Microsoft Research Podcast to discuss your work on Verus.

[MUSIC]

HAWBLITZEL: Thanks for having us.

LORCH: Thank you.

TINGLE: And to our listeners, we appreciate you, too. If you’d like to learn more about Verus, you’ll find a link to the paper at aka.ms/abstracts or you can read it on the SOSP website. Thanks for tuning in. I’m Amber Tingle, and we hope you’ll join us again for Abstracts.

[MUSIC FADES]

The post Abstracts: November 5, 2024 appeared first on Microsoft Research.

Abstracts: November 4, 2024

November 4, 2024

by Alyssa Hughes (2ADAPTIVE LLC dba 2A Consulting) Microsoft AI

Outlined illustrations of Shan Lu and Bogdan Stoica for the Microsoft Research Podcast.

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Senior Principal Research Manager Shan Lu and Bogdan Stoica, a PhD candidate at the University of Chicago, join host Gretchen Huizinga to discuss “If At First You Don’t Succeed, Try, Try, Again … ? Insights and LLM-informed Tooling for Detecting Retry Bugs in Software Systems.” In the paper, which was accepted at this year’s Symposium on Operating Systems Principles, or SOSP, Lu, Stoica, and their coauthors examine typical retry issues and present techniques that leverage traditional program analysis and large language models to help detect them.

Read the paper

Transcript

[MUSIC]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

Today I’m talking to Dr. Shan Lu, a senior principal research manager at Microsoft Research, and Bogdan Stoica, also known as Bo, a doctoral candidate in computer science at the University of Chicago. Shan and Bogdan are coauthors of a paper called “If at First You Don’t Succeed, Try, Try, Again …? Insights and LLM-informed Tooling for Detecting Retry Bugs in Software Systems.” And this paper was presented at this year’s Symposium on Operating Systems Principles, or SOSP. Shan and Bo, thanks for joining us on Abstracts today!

SHAN LU: Thank you.

BOGDAN STOICA: Thanks for having us.

HUIZINGA: Shan, let’s kick things off with you. Give us a brief overview of your paper. What problem or issue does it address, and why should we care about it?

LU: Yeah, so basically from the title, we are looking at retry bugs in software systems. So what retry means is that people may not realize for big software like the ones that run in Microsoft, all kinds of unexpected failures—software failure, hardware failure—may happen. So just to make our software system robust, there’s often a retry mechanism built in. So if something unexpected happens, a task, a request, a job will be re-executed. And what this paper talks about is, it’s actually very difficult to implement this retry mechanism correctly. So in this paper, we do a study to understand what are typical retry problems and we offer a solution to detecting these problems.

HUIZINGA: Bo, this clearly isn’t a new problem. What research does your paper build on, and how does your research challenge or add to it?

STOICA: Right, so retry is a well-known mechanism and is widely used. And retry bugs, in particular, have been identified in other papers as root causes for all sorts of failures but never have been studied as a standalone class of bugs. And what I mean by that, nobody looked into, why is it so difficult to implement retry? What are the symptoms that occur when you don’t implement retry correctly? What are the causes of why developers struggle to implement retry correctly? We built on a few key bug-finding ideas that have been looked at by other papers but never in this context. We use fault injection. We repurpose existing unit tests to trigger this type of bugs as opposed to asking developers to write specialized tests to trigger retry bugs. So we’re, kind of, making the developer’s job easier in a sense. And in this pipeline, we also rely on large language models to augment the program and the code analysis that goes behind the fault injection and the reutilization of existing tests.

HUIZINGA: Have large language models not been utilized much in this arena?

LU: I want to say that, you know, actually this work was started about two years ago. And at that time, large language model was really in its infancy and people just started exploring what large language model can help us in terms of improving software reliability. And our group, and together with, you know, actually same set of authors from Microsoft Research, we actually did some of the first things in a workshop paper just to see what kind of things that we were able to do before like, you know, finding bugs can now be replicated by using large language model.

HUIZINGA: OK …

LU: But at that time, we were not very happy because, you know, just use large language model to do something people were able to do using traditional program analysis, I mean, it seems cool, right, but does not add new functionality. So I would say what is new, at least when we started this project, is we were really thinking, hey, are there anything, right, are there some program analysis, are there some bug finding that we were not able to do using traditional program analysis but actually can be enabled by large language model.

HUIZINGA: Gotcha …

LU: And so that was at, you know, what I feel like was novel at least, you know, when we worked on this. But of course, you know, large language model is a field that is moving so fast. People are, you know, finding new ways to using it every day. So yeah.

HUIZINGA: Right. Well, in your paper, you say that retry functionality is commonly undertested and thus prone to problems slipping into production. Why would it be undertested if it’s such a problem?

STOICA: So testing retry is difficult because what you need is to simulate the systemwide conditions that lead to retry. That often means simulating external transient errors that might happen on the system that runs your application. And to do this during testing and capture this in a small unit test is difficult.

LU: I think, actually, Bogdan said this very well. It’s like, why do we need a retry? It’s, like, when unexpected failure happen, right. And this is, like, something like Bogdan mentioned, like external transient error such as my network card suddenly does not work, right. And this may occur, you know, only for, say, one second and then it goes back on. But this one second may cause some job to fail and need retry. So during normal testing, these kind of unexpected things rarely, rarely happen, if at all, and it’s also difficult to simulate. That’s why it’s just not well tested.

HUIZINGA: Well, Shan, let’s talk about methodology. Talk a bit about how you tackled this work and why you chose the approach you did for this particular problem.

LU: Yeah, so I think this work includes two parts. One is a systematic study. We study several big open-source systems to see whether there are retry-related problems in this real system. Of course there are. And then we did a very systematic categorization to understand the common characteristics. And the second part is about, you know, detecting. And in terms of method, we have used, particularly in the detecting part, we actually used a hybrid of techniques of traditional static program analysis. We used this large language model-enabled program analysis. In this case, imagine we just asked a large language model saying, hey, tell us, are there any retry implemented in this code? If there is, where it is, right. And then we also use, as Bogdan mentioned, we repurposed unit test to help us to execute, you know, the part of code that large language model tell us there may be a retry. And addition to that, we also used fault injection, which means we simulate those transient, external, environmental failures such as network failures that very rarely would occur by itself.

HUIZINGA: Well, Bo, I love the part in every paper where the researchers say, “And what we found was …” So tell us, what did you find?

STOICA: Well, we found that implementing retry is difficult and complex! Not only find new bugs because, yes, that was kind of the end goal of the paper but also try to understand why these bugs are happening. As Shan mentioned, we started this project with a bug study. We looked at retry bugs across eight to 10 applications that are widely popular, widely used, and that the community is actively contributing to them. And the experiences of both users and developers, if we can condense that—what do you think about retries?—is that, yeah, they’re frustrated because it’s a simple mechanism, but there’s so many pitfalls that you have to be aware of. So I think that’s the biggest takeaway. Another takeaway is that when I was thinking about bug-finding tools, I was having this somewhat myopic view of, you know, you instrument at the program statement level, you figure out relationships between different lines of code and anti-patterns, and then you build your tools to find those anti-patterns. Well, with retry, this kind of gets thrown out the window because retry is a mechanism. It’s not just one line of code. It is multiple lines of code that span multiple functions, multiple methods, and multiple files. And you need to think about retry holistically to find these issues. And that’s one of the reasons we used large language models, because traditional static analysis or traditional program analysis cannot capture this. And, you know, large language models turns out to be actually great at this task, and we try to harness the, I would say, fuzzy code comprehension capabilities of large language models to help us find retry bugs.

HUIZINGA: Well, Shan, research findings are important, but real-world impact is the ultimate goal here. So who will this research help most and why?

LU: Yeah, that’s a great question. I would consider several groups of people. One is hopefully, you know, people who actually build, design real systems will find our study interesting. I hope it will resonate with them about those difficulties in implementing retry because we studied a set of systems and there was a little bit of comparison about how different retry mechanisms are actually used in different systems. And you can actually see that, you know, this different mechanism, you know, they have pros and cons, and we have a little bit of, you know, suggestion about what might be good practice. That’s the first group. The second group is, our tool actually did find, I would say, a relatively large number of retry problems in the latest version of every system we tried, and we find these problems, right, by repurposing existing unit tests. So I hope our tool will be used, you know, in the field by, you know, being maybe integrated with future unit testing so that our future system will become more robust. And I guess the third type of, you know, audience I feel like may benefit by reading our work, knowing our work: the people who are thinking about how to use large language model. And as I mentioned, I think a takeaway is large language model can repeat, can replace some of things we were able to do using traditional program analysis and it can do more, right, for those fuzzy code comprehension–related things. Because for traditional program analysis, we need to precisely describe what I want. Like, oh, I need a loop. I need a WRITE statement, right. For large language model, it’s imprecise by nature, and that imprecision sometimes actually match with the type of things we’re looking for.

HUIZINGA: Interesting. Well, both of you have just, sort of, addressed nuggets of this research. And so the question that I normally ask now is, if there’s one thing you want our listeners to take away from the work, what would it be? So let’s give it a try and say, OK, in a sentence or less, if I’m reading this paper and it matters to me, what’s my big takeaway? What is my big “aha” that this research helps me with?

STOICA: So the biggest takeaway of this paper is not to be afraid to integrate large language models in your bug-finding or testing pipelines. And I’m saying this knowing full well how imprecise large language models can be. But as long as you can trust but verify, as long as you have a way of checking what these models are outputting, you can effectively insert them into your testing framework. And I think this paper is showing one use case and bring us closer to, you know, having it integrated more ubiquitously.

HUIZINGA: Well, Shan, let’s finish up with ongoing research challenges and open questions in this field. I think you’ve both alluded to the difficulties that you face. Tell us what’s up next on your research agenda in this field.

LU: Yeah, so for me, personally, I mean, I learned a lot from this project and particularly this idea of leveraging large language model but also as a way to validate its result. I’m actually working on how to leverage large language model to verify the correctness of code, code that may be generated by large language model itself. So it’s not exactly, you know, a follow-up of this work, but I would say at idea, you know, philosophical level, it is something that is along this line of, you know, leverage large language model, leverage its creativity, leverage its … sometimes, you know … leverage its imprecision but has a way, you know, to control it, to verify it. That’s what I’m working on now.

HUIZINGA: Yeah … Bo, you’re finishing up your doctorate. What’s next on your agenda?

STOICA: So we’re thinking of, as Shan mentioned, exploring what large language models can do in this bug-finding/testing arena further and harvesting their imprecision. I think there are a lot of great problems that traditional code analysis has tried to tackle, but it was difficult. So in that regard, we’re looking at performance issues and how large language models can help identify and diagnose those issues because my PhD was mostly focused, up until this point, on correctness. And I think performance inefficiencies are such a wider field and with a lot of exciting problems. And they do have this inherent imprecision and fuzziness to them that also large language models have, so I hope that combining the two imprecisions maybe gives us something a little bit more precise.

HUIZINGA: Well, this is important research and very, very interesting.

[MUSIC]

Shan Lu, Bogdan Stoica, thanks for joining us today. And to our listeners, thanks for tuning in. If you’re interested in learning more about this paper, you can find a link at aka.ms/abstracts. And you can also find it on the SOSP website. See you next time on Abstracts!

[MUSIC FADES]

The post Abstracts: November 4, 2024 appeared first on Microsoft Research.

AI-powered microgrids facilitate energy resilience and equity in regional communities

November 1, 2024

by Alyssa Hughes (2ADAPTIVE LLC dba 2A Consulting) Microsoft AI

Three icons that represent (left to right) ecology and environment, economics, and technology for emerging markets.

The rise of affordable small-scale renewable energy, like rooftop solar panels, is reshaping energy systems around the world. This shift away from fossil fuel-powered grids creates new opportunities for energy distribution that prioritize decentralized energy ownership and community empowerment. Despite this progress, centralized energy systems still dominate, often failing to provide vulnerable communities with reliable, affordable renewable energy. In response, Microsoft researchers are collaborating with local communities to explore how AI can enable community-scale energy solutions focused on energy availability and equity as well as decarbonization.

AI-powered microgrids support resilient communities

Microgrids, small and localized energy systems, hold promise as a solution to the challenges of centralized energy systems. These microgrids can operate independently from the larger grid, providing participants with resilience and control. Figure 1 shows how these systems integrate renewable energy sources and storage to efficiently manage local energy needs.

Figure 1: The image shows a microgrid system with interconnected assets, including rooftop solar panels, battery storage locations, electric vehicle chargers, wind turbines, and large solar farms, all supporting a small community and tied to the central power grid. — Figure 1. An example of the decentralized nature of a microgrid power system

AI improves energy reliability by integrating data about energy consumption, market prices, and weather forecasts, necessary when using wind and solar power, which rely on weather conditions. Advanced forecasting predicts renewable energy availability, while AI-driven analytics determine when to generate, store, or sell electricity. This increases efficiency and stabilizes the grid by balancing supply and demand.

When powered by AI, microgrids can also contribute to energy equity. In many rural parts of the US, flat-rate billing models are still common, often leading to unfair pricing. AI-enabled microgrids provide an alternative by allowing communities to pay only for the energy they use. By analyzing consumption patterns, AI can ensure optimized distribution that promotes equitable pricing and access. These systems also improve resilience during crises, enabling communities to manage energy distribution more effectively and reduce reliance on centralized utilities. AI allows microgrids to predict energy demands, identify system vulnerabilities, and recover quickly during outages.

Evaluating AI’s impact on microgrid efficiency and equity

To explore AI’s potential in improving efficiency and equity in energy management, a team of Microsoft researchers collaborated with community organizations on simulations and a case study. They built a tabletop simulator to test whether AI could effectively determine when to generate, store, or sell electricity based on real-time data. The AI model was optimized for resilience and efficiency, using reinforcement learning to control grid and battery processes, enabling microgrids adapt to changing energy conditions and market dynamics.

This simulation used a theoretical model with external data to show how an AI-driven microgrid could autonomously buy and sell energy based on strategic design parameters. By controlling when the battery is charged and discharged based on energy production and consumption patterns, the model maximized efficiency and maintained local power availability. Figure 2 shows the AI-controlled grid’s optimal decisions using open-source data from the California Independent System Operator (CAISO), serving as a proof of concept (PoC) for AI-driven microgrids operating under real-world conditions.

Figure 2 (A): Graph depicting peak and off-peak net power bought or sold over one week using simulations of the AI controller on historical CAISO data. The graph shows a direct correlation that when solar is available then more power is bought than sold, whereas, during nighttime the controller relies on stored energy in battery to power consumption, making fewer transactions

Figure 2 (B) The graph shows battery levels on a simulated AI controller for the historical CAISO data. During peak hours, the battery discharges as reserves are sold, while solar power supplies the load. At night, the battery conserves power, minimizing purchases and optimizing reserves for daytime selling. — Figure 2. (A) Peak and off-peak net power bought or sold over one week using simulations of the AI controller on historical CAISO data. (B) Peak and off-peak battery levels over one week using simulations of the AI controller on historical CAISO data.

Case study: AI-powered microgrid for community energy transition

Microsoft researchers, in partnership with community-based organizations Remix: The Soul of Innovation (opens in new tab), Maverick IQ (opens in new tab) and Ayika Solutions (opens in new tab), are designing and implementing an AI-powered microgrid system in West Atlanta. Working closely with the Vicars Community Center (VCC) resilience hub (opens in new tab), they aim to address challenges faced by the community due to rapid development. West Atlanta, like many Atlanta neighborhoods, faces rising housing prices and energy costs that disproportionately affect long-time residents. Communities relying on centralized grids are more vulnerable to outages, with slow recovery times, highlighting systemic inequalities in energy distribution.

The VCC resilience hub is tackling these issues by helping to establish a solar microgrid for the West Atlanta Watershed Alliance (opens in new tab) (WAWA) community farm and surrounding neighborhoods. Microsoft researchers and collaborators are integrating AI into the microgrid to achieve energy savings, improve resilience, and create local job opportunities. Figure 3 shows the VCC resilience hub and WAWA community farm powered by the microgrid, highlighting key infrastructure for installing distributed energy resources (DERs).

Figure 3 (A) and 3 (B) shows pictures of the VCC resilience hub, with solar panels and batteries for energy storage

Figure 3 (C) and 3 (D) shows pictures of the community farm, and volunteers at WAWA, a key center to support the future of community agriculture to be supported by the microgrid — Figure 3. A and B show the VCC resilience hub, with solar panels (left) and batteries for energy storage (right) – photographs by Erica Holloman-Hill. C and D show the WAWA community farm and community members holding freshly harvested crops.

Project phases

Co-innovation design

Microsoft researchers, architects, and community partners held a participatory design session with state and utility representatives to define the project’s mission and key metrics. The CDC’s Social Vulnerability Index informed the site selection, supporting the project’s diversity, equity, and inclusion goals.

Renewables and microgrid siting

A renewable siting survey conducted by community partners identified the VCC as a key resilience hub for solar panel and battery installation.

To deliver these benefits, the site first needed upgrades. Older homes required energy-efficiency improvements, such as electrical upgrades and better insulation, before they could be integrated into the microgrid. As a PoC, the team collaborated with community partners to modernize an older home with inefficient energy consumption. Sensors were installed to track energy usage and environmental conditions (Figure 4).

Figure 4: A graph showing estimated cost of electricity per day based on a legacy household in West Atlanta through kilowatt-hour usage between July 29, 2024 and August 13, 2023. Data validates the family’s experience about high energy bills, inefficient heating and cooling, and high humidity in the basement. — Figure 4. Estimated daily electricity costs based on a home’s kilowatt-hour usage between July 29 and August 13, 2023. The data confirms the residents’ experience of high energy bills, inefficient heating and cooling, and high humidity in the basement. Used by permission from Erica Holloman-Hill.

Students from Morehouse College (opens in new tab) used this data to create a digital twin of the home, which provided actionable insights (Figure 5). The analysis confirmed issues like high radon levels and energy drains from outdated appliances. Guided by these findings, the team upgraded the house into a “smart home” where AI monitors energy and environmental conditions, enabling it to join the microgrid and making it eligible for LEED certification (opens in new tab).

Figure 5: 2 Figures showing snapshots of digital twin created for Dr. Erica Holloman-Hill’s home, provided by courtesy of Dr. Erica L Holloman-Hill, owner of Ayika Solutions Inc. The first figure shows the sensor readings of pollutants and weather in various parts of the home. The second figure shows the measurements in detail for the basement. The detailed environmental data—including climatic conditions, appliance-level energy usage, and pollutant levels—provide actionable insights for identifying targeted areas for grid modernization. — Figure 5. Smart electrification: Snapshots of digital twin created for the PoC home. Panel A shows the digital twin for the entire home. Panel B shows detailed views for the first floor and basement, respectively. The detailed environmental data—including climatic conditions, appliance-level energy usage, and pollutant levels—provide actionable insights for identifying targeted areas for grid modernization. Used by permission from Erica Holloman-Hill.

Microgrid simulation phase

To prepare the AI-powered microgrid, Microsoft researchers built a simplified tabletop prototype simulating the setup using real data from the design and siting phases. This prototype demonstrated the control mechanism’s ability to manage DERs—solar panels, batteries, and appliances—and the interface between the microgrid and the larger grid. Figure 6 shows the tabletop model during prototyping.

Figure 7 illustrates the results of this simulation, showing power bought and sold and the battery charge-discharge profile. The AI controller made optimal buying and selling decisions, promoting efficiency and reliability.

Figure 6 (A): Graph depicting peak and off-peak net power bought or sold over one week using simulations of the AI controller on data generated during runs of tabletop microgrid model. The graph shows a direct correlation that when solar is available then more power is bought than sold, whereas, during night time the controller relies on stored energy in battery to power consumption, making fewer transactions.

Figure 6 (B) The graph shows battery levels on a simulated microgrid controller powered by AI. During peak hours, the battery discharges as reserves are sold, while solar power supplies the load. At night, the battery conserves power, minimizing purchases and optimizing reserves for daytime selling. — Figure 7. (A) Peak and off-peak net power bought or sold over one week using AI-controller simulations. (B) Corresponding battery levels.

Erica Holloman-Hill, director of WAWA, CEO of Ayika Solutions and owner of the PoC home, reflected: “This study helped me understand how our home’s outdated condition affects our quality of life. Upgrading homes like mine could make a significant difference. Thanks to partnerships like this one, controlling and sharing the electricity the community generates is within reach, highlighting the potential of AI-supported technologies like microgrids for communities like ours.”

Building on the simulation’s success, the VCC resilience hub and local organizations are continuing to install solar panels to power the microgrid. AI will play a key role in siting and controlling the system as it expands. Efforts are also underway to establish sustainable financing models and assess homes for modernization to enable broader participation in the microgrid.

AI: A path to equity and resilience

The transition to decentralized microgrids offers new opportunities for energy efficiency, with AI playing a critical role in managing these systems. Yet additional efforts are needed for communities to fully realize these benefits. Residents of aging homes are burdened with outdated wiring, inefficient appliances, and poor insulation—factors that drive up energy costs. Their dependence on centralized grids offers little relief, underscoring the need for community-focused energy solutions.

The West Atlanta project illustrates AI’s potential to create resilient, equitable, community-driven energy systems, paving the way for a more inclusive and sustainable future. Microsoft researchers are continuing to collaborate with local organizations to promote smarter energy management.

For additional details, please review the project report.

Acknowledgements

I would like to thank all the collaborators on these projects: West Atlanta microgrid: Erica L. Holloman-Hill, John Jordan Jr, Markese Bryant. I also want to thank Karin Strauss for reviewing and providing feedback on this blog post; Andalib Samandari, the intern who supported this project; Vaishnavi Ranganathan for helping to brainstorm throughout the project; AI & Society Fellows program for supporting projects in this domain; and Microsoft’s Datacenter Community Affairs team, Jon McKenley and Kelly Lanier Arnold for supporting the project in West Atlanta.

The post AI-powered microgrids facilitate energy resilience and equity in regional communities appeared first on Microsoft Research.

Introducing DRIFT Search: Combining global and local search methods to improve quality and efficiency

October 31, 2024

by Brenda Potts Microsoft AI

Three icons that represent local and global search and GraphRAG. These icons sit on a blue to pink gradient.

GraphRAG is a technique that uses large language models (LLMs) to create knowledge graphs and summaries from unstructured text documents and leverages them to improve retrieval-augmented generation (RAG) operations on private datasets. It offers comprehensive global overviews of large, private troves of unstructured text documents while also enabling exploration of detailed, localized information. By using LLMs to create comprehensive knowledge graphs that connect and describe entities and relationships contained in those documents, GraphRAG leverages semantic structuring of the data to generate responses to a wide variety of complex user queries. Uncharted (opens in new tab), one of Microsoft’s research collaborators, has recently been expanding the frontiers of this technology by developing a new approach to processing local queries: DRIFT search (Dynamic Reasoning and Inference with Flexible Traversal). This approach builds upon Microsoft’s GraphRAG technique, combining characteristics of both global and local search to generate detailed responses in a method that balances computational costs with quality outcomes.

How GraphRAG works

GraphRAG has two primary components, an indexing engine and a query engine.

The indexing engine breaks down documents into smaller chunks, converting them into a knowledge graph with entities and relationships. It then identifies communities within the graph and generates summaries—or “community reports”—that represent the global data structure.

The query engine utilizes LLMs to build graph indexes over unstructured text and query them in two primary modes:

Global search handles queries that span the entire dataset. This mode synthesizes information from diverse underlying sources to answer questions that require a broad understanding of the whole corpus. For example, in a dataset about tech company research efforts, a global query could be: “What trends in AI research have emerged over the past five years across multiple organizations?” While effective for connecting scattered information, global search can be resource intensive.
Local search optimizes for targeted queries, drawing from a smaller subset of documents that closely match the user’s input. This mode works best when the answer lies within a small number of text units. E.g. a query asking: “What new features and integrations did Microsoft’s Cosmos DB team release on October 4th?”

The creation of these summaries often involves a human in the loop (HITL), as user input shapes how information is summarized (e.g., what kinds of entities and relationships are extracted). To index documents using GraphRAG, a clear description of the intended user persona (as defined in the indexing phase) is needed, as it influences how nodes, edges, and community reports are structured.

Introducing DRIFT Search

DRIFT Search introduces a new approach to local search queries by including community information in the search process. This greatly expands the breadth of the query’s starting point and leads to retrieval and usage of a far higher variety of facts in the final answer. This addition expands the GraphRAG query engine by providing a more comprehensive option for local search, which uses community insights to refine a query into detailed follow-up questions. These follow-ups allow DRIFT to handle queries that may not fully align with the original extraction templates defined by the user at index time.

Answer details	Drift (DS_Default)	Local (LS)
Supply Chain	Traced back to cinnamon in Ecuador and Sri Lanka [Redacted Brand] and [Redacted Brand] Brands Impacted Products sold at [Redacted Brand] and [Redacted Brand]	Plants in Ecuador
Contamination Levels	2000 times higher than FDA max	Blood lead levels ranging from 4 to 29 micrograms per deciliter
Actions	Recalls and health advisories Investigating plant in Ecuador Issued warnings to retailers	Recalls and health advisories

Table 1: An example of summarized responses from two search techniques (DRIFT and Local Search) on a dataset of AP News articles to the query: “Describe what actions are being taken by the U.S. Food and Drug Administration and the Centers for Disease Control and Prevention to address the lead contamination in apple cinnamon fruit puree and applesauce pouches in the United States during November 2023”. As shown in the table, DRIFT search was able to surface details not immediately available with the two other approaches.

DRIFT Search: A step-by-step process

Primer: When a user submits a query, DRIFT compares it to the top K most semantically relevant community reports. This generates an initial answer along with several follow-up questions, which act as a lighter version of global search. To do this, we expand the query using Hypothetical Document Embeddings (HyDE), to increase sensitivity (recall), embed the query, look up the query against all community reports, select the top K and then use the top K to try to answer the query. The aim is to leverage high-level abstractions to guide further exploration.
Follow-Up: With the primer in place, DRIFT executes each follow-up using a local search variant. This yields additional intermediate answers and follow-up questions, creating a loop of refinement that continues until the search engine meets its termination criteria, which is currently configured for two iterations (further research will investigate reward functions to guide terminations). This phase represents a globally informed query refinement. Using global data structures, DRIFT navigates toward specific, relevant information within the knowledge graph even when the initial query diverges from the indexing persona. This follow-up process enables DRIFT to adjust its approach based on emerging information.
Output Hierarchy: The final output is a hierarchy of questions and answers ranked on their relevance to the original query. This hierarchical structure can be customized to fit specific user needs. During benchmark testing, a naive map-reduce approach aggregated all intermediate answers, with each answer weighted equally.

An image that shows a hierarchical tree with each node represented as a pie chart of weighting. — Figure 1. An entire DRIFT search hierarchy highlighting the three core phases of the DRIFT search process. **A (Primer):** DRIFT compares the user’s query with the top K most semantically relevant community reports, generating a broad initial answer and follow-up questions to steer further exploration. **B (Follow-Up):** DRIFT uses local search to refine queries, producing additional intermediate answers and follow-up questions that enhance specificity, guiding the engine towards context-rich information. A glyph on each node in the diagram shows the confidence the algorithm has to continue the query expansion step. **C (Output Hierarchy):** The final output is a hierarchical structure of questions and answers ranked by relevance, reflecting a balanced mix of global insights and local refinements, making the results adaptable and comprehensive.

Why DRIFT search is effective

DRIFT search excels by dynamically combining global insights with local refinement, enabling navigation from high-level summaries down to original text chunks within the knowledge graph. This layered approach ensures that detailed, context-rich information is preserved even when the initial query diverges from the persona used during indexing. By decomposing broad questions into fine-grained follow-ups, DRIFT captures granular details and adjusts based on the emerging context, making it adaptable to diverse query types. This makes it particularly effective when handling queries that require both breadth and depth without losing specific details.

Benchmarking DRIFT search

As shown, we tested the effectiveness of DRIFT search by performing a comparative analysis across a variety of use cases against GraphRAG local search and a highly tuned variant of semantic search methods. The analysis evaluated each method’s performance based on key metrics such as:

Comprehensiveness: Does the response answer all aspects of the question?
Diversity of responses: Does the response provide different perspectives and insights on the question?

In our results, DRIFT search provided significantly better results on both comprehensiveness and diversity in the metrics. We set up an experiment where we ingested 5K+ news articles from the Associated Press and ingested those articles using GraphRAG. After ingestion, we generated 50 “local” questions on this dataset and used both DRIFT and Local Search to generate answers for each of these questions. These “local” questions were questions that target specific details in the dataset that could be attributed to a small number of text units containing the answer. These answers were then used with an LLM judge to score for comprehensiveness and diversity.

On comprehensiveness, DRIFT search outperformed Local Search 78% of the time.
On diversity, DRIFT search outperformed Local Search 81% of the time.

Availability

DRIFT search is available now on the GraphRAG GitHub (opens in new tab).

Future research directions

A future version of DRIFT will incorporate an improved version of Global Search that will allow it to more directly address questions currently serviced best by global search. The hope is to then move towards a single query interface that can service questions of both local and global varieties. This work will further evolve DRIFT’s termination logic, potentially through a reward model that balances novel information with redundancy. Additionally, executing follow-up queries using either global or local search modes could improve efficiency. Some queries require broader data access, which can be achieved by leveraging a query router and a lite-global search variant that uses fewer community reports, tokens, and overall resources.

DRIFT search is the first of several major optimizations to GraphRAG that are being explored. It shows how a global index can even benefit local queries. In our future work, we plan to explore more approaches to bring greater efficiency to the system by leveraging the knowledge graph that GraphRAG creates.

The post Introducing DRIFT Search: Combining global and local search methods to improve quality and efficiency appeared first on Microsoft Research.

Intern Insights: Vaishnavi Ranganathan with Angela Busheska

October 24, 2024

by Alyssa Hughes (2ADAPTIVE LLC dba 2A Consulting) Microsoft AI

Outline illustrations of Angela Busheska, an undergraduate engineering student at Lafayette College and Vaishnavi Ranganathan, a Senior Researcher at Microsoft.

Every year, interns from academic institutions around the world apply and grow their knowledge as members of the research community at Microsoft. In this Microsoft Research Podcast series, these students join their internship supervisors to share their experience working alongside some of the leading researchers in their respective fields.

In this episode, Angela Busheska, an undergraduate engineering student at Lafayette College, talks to Senior Researcher Vaishnavi Ranganathan about her work on TerraTrace, a platform that brings together statistics and large language models to track land use over time for agricultural and forestry applications. Busheska discusses the personal loss that drew her to climate activism, the chain of events that led to a memorable face-to-face meeting with Microsoft’s chief sustainability officer, and her advice for going after the internship you want and making the experience count.

Angela Busheska standing to the left of the Microsoft sign on the Microsoft campus in Redmond, Washington. — Angela Busheska, pictured on the Microsoft campus in Redmond, Washington, was a part of the Microsoft Research Undergraduate Research Intern Program. During her time in the internship program, she helped develop a platform for tracking land use across time for agricultural and forestry applications.

Angela Busheska and Melanie Nakagawa standing in front of a fence — During her internship, Busheska met with Microsoft Chief Sustainability Officer Melanie Nakagawa at the Bloomberg Green Festival in Seattle and spoke with the Microsoft executive about her sustainability work.

Learn more:

TerraTrace (opens in new tab)
GitHub repo
Project FarmVibes
Project homepage
Project FoodVibes
Project homepage

Transcript

[TEASER]

[MUSIC PLAYS UNDER DIALOGUE]

ANGELA BUSHESKA: Being in New York while working with people from Seattle and Brazil allowed me to have a broad range of people that I had a chance to meet. So, like, the New York office, it is very specifically focused on economics and social aspects that I’m not an expert on, but having a lunch with these people every single day, I had a chance to learn a lot. Being in Seattle and meeting with interns and researchers in Seattle, I had a chance to learn how other projects came to life. With our meetings, I realized how FarmVibes and FoodVibes were once just, like, a small idea and now are these huge projects. So having a chance to understand the history of all the things around me was a great way to see how I am able to build something for the future.

[TEASER ENDS] [MUSIC FADES]

VAISHNAVI RANGANATHAN: Hey, everyone, welcome to Intern Insights, a Microsoft Research Podcast featuring some of the brilliant students who are contributing to research and advances at Microsoft as a part of the renowned internship program at Microsoft Research. I’m Dr. Vaishnavi Ranganathan, a senior researcher at Microsoft Research with a passion for leveraging sensing and wireless technology to help address global challenges in health, environment, and sustainability. Today, I’m speaking with my intern, Angela Busheska, about her work this summer and her experience as a Microsoft Research intern. Angela, welcome to the podcast. So I’ve had the pleasure of getting to know you this summer not only in the capacity as a student and researcher but also as a person. So to start with, why don’t you introduce yourself. Tell the audience a little bit about yourself—where you’re from, your academic background, what excites you in research, and most importantly, outside studies, what do you enjoy doing?

ANGELA BUSHESKA: Yeah, absolutely. Thank you so much for having me. Super excited to be here and super excited to have spent three months—time flies!— as a researcher along with the project. So my name is Angela. I am originally from North Macedonia, a very small country next to Greece in the Balkans. And I spent my first, like, 16 years thinking that I would become a mathematician, really focused on math Olympiads. And then I moved to the capital city of my country, which is Skopje, North Macedonia, and that year, in 2019, we had the greatest pollution. We were, like, on the top of every single list in terms of pollution, but honestly, I didn’t care. I was like, I’m going to do, like, math Olympiads, win every medal. I moved in with my aunt, who was living very close to the city center, and she had a lot of cardiovascular difficulties. And then this air pollution clogged her blood, and she passed away. And that was the moment when I saw like, I love math, but this air pollution took my aunt, so I have to do something. So I made the hard decision that junior year going into senior year to cut my participation in math Olympiads after seven great years there and focus on the climate. I really just started to understand where is the pollution coming from in my area. I started to realize that fast fashion had a huge influence, and then that’s why I started, like, getting into this field, getting into research, started a climate tech nonprofit called EnRoute, where our mission is to debunk the fast fashion supply chain and let people but also corporations know what is behind their clothes. So we, kind of, help both sides. We help people with their shopping choices, but we help a lot of corporations and policy to understand either what’s behind their supply chain and change it or how to better communicate their supply chain because there’s a lot of greenwashing that is happening in this space. So it really started as a rag-tag group of teenagers, and this is my fifth year working at it. So I really had a passion for both understanding the research side of it and the nonprofit, like, getting people to learn more about it. So I think that marked really my presence. And then coming into the US in 2021, I started to study both electrical engineering and computer science just because I love this intersection of building and doing stuff in practice and a way to apply my math knowledge into real life.

RANGANATHAN: Wow, so early in life you have such amazing goals. That’s impressive. So, Angela, we connected after Madeleine Daepp, a researcher who was involved in the intern selection process, introduced your work and your passion for sustainability to me. I saw your passion for this area and, you know, your enthusiasm in the first conversation—we were already working on ideas, right! I knew I had to work with you. But for you, there are many other internship opportunities. I would love to know what excited you the most about Microsoft Research.

BUSHESKA: Absolutely. Yeah. So as I said, like, I started getting into this area of sustainability by myself. I really didn’t have a structured way to learn. So when I heard that there is an opportunity from people who have, like, previously worked in this area, are doctors in this area, to learn along them, that was definitely one. But also throughout my life, I had like entrepreneurship stints. I was selling lemonade when I was [LAUGHS] 6 years old; started a nonprofit. And this was a great opportunity for me to start a zero-to-one project while being an intern, which is, I would say, a once-in-a-lifetime chance. We had a chance to define the problem, see solution, try things. I was not part of just one project or building one feature. I had a chance to drive it along you and the team. So that was, I think, the main reason why.

RANGANATHAN: I’m glad we were able to attract your attention here. [LAUGHS] Before we get into specifics on, you know, exactly what we did this summer, I’d like for us to talk a bit about your internship setup, right. I think it allowed for some really unique experiences, and, you know, I would love to have you share your internship highlights. I think there is one in particular that I’m thinking about, which I really hope you’ll share.

BUSHESKA: Absolutely. I would say this was an internship like no other experience I’ve had in the past. So just for context, I was interning in New York while part of the team being in Brazil, part of the team—you and the other researchers—being in Redmond, a couple of other researchers, like, being technically in New York. So I had a chance to interact with all of these different time zones. Also for an additional context, I was a part of undergrad research internship group where they brought us to Seattle from New York—and also there are other interns from Boston—to spend like a week and learn more about Microsoft, Microsoft Research, the leadership. So that was an incredible opportunity. Also, I had a great opportunity prior to that to come to the team meeting to see farms in real life, interact with farmers, interact with the team in person, which was also another great opportunity. And I’ll come to the one that you’re referring [to] here [LAUGHS]. So the second time when I was, like, visiting Seattle, I remember a week prior, it was Friday, we were finishing up some stuff. You actually pointed out that I should definitely message some of the leadership people to interact with them and hear their insights. And it was EST time zone, somewhere around 8 when I left the office, and I shoot an email to the chief sustainability officer, Melanie Nakagawa. I was like, “Hey, my name is Angela. I’m a research intern. I have worked in, like, fast fashion in the past. Now I’m working with Vaishnavi and the team on this food project. I’d really love to meet you next week. [LAUGHS] I’m coming for like three days.” I sent that email. I was like, there’s no way I’m getting response from this. And then on Sunday, I traveled to Seattle. And I remember during the first day, assistant of the chief sustainability officer says, like, there is this Bloomberg Green Festival that is happening in Seattle on Thursday, and we would love for you to come and meet Melanie. It was, like, I don’t know. It was like real life or not. I really couldn’t … it took a lot of time to process. And after that, one thing additional that was a barrier was how to get into this festival because this festival had a ticket of like $300, $400. And I think that by that time everything was sold out. So I was like, I would love to, but I cannot just get into the Bloomberg Green Festival without a badge. So then I realized there is this sustainability community at Microsoft that is like 8,000 employees or something, and they got badges previously for the festival, and everyone who is working there can just get the basic badge just to get into. So I emailed a couple of people from there; they set [me up] with a badge. Huge props also to the undergraduate research team who extended my stay for a day because I was supposed to leave for New York the other day. So a couple of miracles happened there. I had a chance to go to the Bloomberg Green Festival, had a chance to meet with Melanie and her assistant, Tyler, who was working also in this area of sustainability in her presence on, like, these events. It really goes to say how an undergrad intern can, like, meet with leadership and learn from the leadership to the place and the opportunities we had during this internship. Thank you so much Melanie and Tyler for taking out the time at the Bloomberg Green Festival. I know it was a super packed day for both of you, but thank you so much for taking the time.

RANGANATHAN: Thank you for sharing that. I want to point out something. It’s amazing how the leadership makes time and they really value every person’s work, and that’s an internship which is accounted for now, right, and you know your work is meaningful to the company. I think Microsoft is so big that people don’t realize there’s this community of 8,000 people in sustainability often, so it’s amazing. It blows my mind every single time, and kudos to you for following up on my late-night comment. [LAUGHS] Um, Angela, so this internship falls under larger efforts within Microsoft Research to establish a sustainable agriculture and food supply chain, a project that we know as FoodVibes. In your work this summer, you focus specifically on enabling technology to meet the new European Union’s regulation around deforestation-free products. Could you tell us, in your words, about these regulations and how you think the work can aid in their implementation?

BUSHESKA: Absolutely, yeah. So the European Union’s regulation on deforestation-free products, also known as EUDR, is a legislation that will prevent food linked to deforestation to enter European Union borders[1]. But how will the border officials know if something is coming from a deforested area or not? On another side, it’s also the farmers from this area who don’t know what exactly is a deforested area, especially the ones who moved after maybe an area has been deforested. So there are a lot of questions around there, and I think that getting into and understanding the problem was a huge part to continue after that in building the solution.

RANGANATHAN: Yeah, and I believe that having a large team here really helped because we all often had discussions around many of these areas and we really valued your inputs and how you participated in these discussions, right. So through your internship, we’ve seen that there are a variety of tools, you know, and vegetation metrics at our disposal to help determine what exists on a plot of land because that’s what we want to identify. There’s, like, satellite imagery. We get the normalized difference vegetation index, and then there’s the USDA’s cropland data layer, which exists purely for the US. But, you know, we are getting an incomplete picture here, at least as it pertains to this particular use case. How do you think TerraTrace brings these pieces together, and what are some of the specific challenges or motivations of the core of your work?

BUSHESKA: Absolutely. Yeah, I think, like, a couple of the first weeks, we tried these great machine learning models, and one thing we realized is that when we have, like, orchards, and when we had pine trees, and then when we had, like, regular forest, all of them are classified the same, which was a really great light saying, like, yeah, these are great. These are models that are trained by millions of data. They still don’t work as we want to. And also, I think that in an era where everyone is, like, training big great models, we kind of decided to take another route and say like, models are great. What if we go step by step and maybe we don’t need a full model round. Maybe we can go step by step to a certain point to try to understand deeply. And I think I’m really grateful that we took this direction because we had a chance to understand at a granular level what is actually happening. Here we had, like, we were lucky at the beginning to work with farms in Washington. We had, like, an actual farmer who was in the team who we have, like, pictures and we had all the data. So that was a great ground truth for us to understand, OK, this is how the vegetation looks on a farm. This is how the vegetation looks on a forest across years. So we had these snapshots and then from then on, we just moved along to say, well, if we know about this, what’s happening in Washington, how we can scale now to other places in the US. We looked deeply into California just because it has a lot of, like, agricultural diversity. And I think that this step-by-step level brought us to create TerraTrace. TerraTrace is a platform where one can enter, like, the coordinates of a farm and see what has been happening on a given piece of land across time. Now, TerraTrace combines a couple of different things. It combines mathematical results, combines LLMs, and combines just, like, the basic information of risk. And I think this is good just because it’s not a heavy computational platform that we have. And if we need to use it on, like, a couple of years from now on a farm, we technically can. If we need to use it, like, in a legal office, we can. Just because it doesn’t depend a lot on data and it’s, like, easily portable, which is another plus, and differentiator compared to what exists outside.

RANGANATHAN: Do you want to share a little bit about the signature curves that you identified?

BUSHESKA: Absolutely. So when I just, like, realized … there is a metric called NDVI (Normalized Difference Vegetation Index), which is basically measuring vegetation. It is a time series of vegetation. What we did with NDVI is we tried to, like, measure it across time on different places. And I think in coffee, it had the greatest impact. Just because there was a coffee in Vietnam, coffee in Honduras, completely two different places. And then when we analyzed and saw the shapes, we were like, wow, they are completely two different places; however, they still follow the same trend and that was a great calling to say, like, this is a greater metric. Maybe if, like, vision computer models failed, this is something that is, like, very rudimentary. You wouldn’t expect that just a very simple index would calculate that, but we saw it working for a lot of different places, and I’m particularly grateful for the places that were not together at the same place.

RANGANATHAN: Yeah, and I think, like, looking at this, like, temporal, over-time picture is the unique aspect of what you built and that was what unlocked it for us, right. So you hinted at this. In developing the TerraTrace platform, you ended up combining large language models, or LLMs, and statistics. What did you like about this approach? What did each bring to the table? How did it contribute to your ultimate goal? Could you share a bit about?

BUSHESKA: As I said, like, at the beginning, I’m a huge math person, and I truly believe that math has more power than we credit today for. So one thing that I started building, when I started building, I was just, how can we get as much information as possible without LLMs and then give it to the LLMs? That was kind of the approach that I started. So we have this, as you mentioned previously, crop data layer that was the base truth. So I could always go back and compare to see if it has, like, if we are doing the same thing. Then we had the signature curves that we could compare to base signature curves to understand, OK, it’s coffee or it’s a farm. And now when we were having a farm, we had a problem because corn looked the same as wheat. Multiple crops look the same. So I needed something more to differentiate what exactly is there. So by math, I managed to understand like the growth rate, the fallout rate, what is the percentage when the curve is up, how much percentage is down so every insight that we could get by just basic statistics. And after that, all this combined knowledge was given to the LLM to, kind of, just confirm that the math is right and give us, OK, greater insight into, well, your math is good, and after that, based on your math and everything the LLM knows, well, there is this probability that it’s corn and after that we came to the crop data layer to say, yep, it’s corn. Referring to corn just because this was our example demo. Also, LLMs are changing very rapidly today, so like GPT-4, GPT-4 Turbo, expecting a GPT-5, so constant development needs to be done, and I think that if something is, like, changed and doesn’t work in the platform, one thing that will always work is the statistics. So having something to always refer to was a very interesting process.

RANGANATHAN: Yeah, I really appreciate, you know, how we had the focus on, hey, let’s do what we can with math first because, after all, this is a sustainability project, right?

BUSHESKA: Exactly.

RANGANATHAN: So LLMs were great at summarizing this, taking all the data and, like, giving you the outputs. I think that was a very interesting approach, too. So there are several existing pieces of work we’ve seen in this area. We’ve relied on a lot of literature in this specific application, right, that ended up havingauthors who were affiliated with Microsoft. Could you share a little bit about those works that we drew on and how we leveraged the Microsoft connections?

BUSHESKA: Absolutely. One thing that was something I was not expecting was reading the papers and then hopping on a call with the authors. So one thing that I would credit is first we understood … there was this, like, GeoLLM group. They actually had a presentation in our weekly meeting for the group. So it was, like, me just staying in the presentation and trying to learn. And the other day, they send out like, this is a research paper; take a look at it. And after, like, we realized that the majority of the authors were in the same building as you actually, we had a great chance to, like, meet, like, with them and understand how …

RANGANATHAN: This is SatCLIP, right?

BUSHESKA: … SatCLIP (opens in new tab) and GeoLLM (opens in new tab), both of them. Trying to understand, like, how they have built it, what they have built, why, how they are planning to continue, how I am able to use it. And, actually, because when you’re reading a paper and you’re reading GitHub repo, it’s one thing. [LAUGHTER] When you’re speaking with the author and seeing, like, this is why we’ve built it, here’s the limitations, be careful how you’re using it, it’s completely another thing. So I am really grateful for this situation. But also, not only, like, inside of Microsoft, we also had a chance to work with a lot of other folks. For example, we read a lot of papers from research groups in universities. And because our interns came from these universities, it was, like, great to understand from them also what they have built.

RANGANATHAN: Nice, and the other interns whom you worked with, which we’ll come to. So for this specific work, we were motivated by the use case. What were some of your findings, and also how are you envisioning this work as a foundation for other supply-chain or even, you know, broader sustainability scenarios and even beyond those fields that you can think of?

BUSHESKA: Yeah, so one thing that we, as a conclusion, came out from here is that signature curves are not just by look. When we tested that with a lot of different other signature curves, we realized, OK, this is true because in most of the cases, they worked. Obviously we had some failures, like citrus signature curves didn’t work because of other reasons that happened on the farm. But for most of them, we got same results on what’s happened with the signature curves with what happens actually on the farm, which is a great way to further this exploration here. Another thing that we realize from this project is now the group previously has worked on trying to understand supply chain, like having the tracking part and having this code to track along the journey, but we were missing the part on what happened on the land. And now that we have this additional way to understand what happened on the land, I think it is a full system of starting this is what happened on the land, this is what happened across the road, and this is where it is now, so that not only EUDR, but even, like, customers in the future will know where their food is coming from, which is that total part of sustainability that we want to get to. Because I think that at this point, we know the challenges, we know the climate change challenges that we are having, and the more information we have, the better decisions we and policymakers would be able to make.

RANGANATHAN: And I think we realized this and you can chime in, Angela, but I feel that the traceability is a vehicle for data. What you build is a means to take all that data and make it meaningful to people, right. Like what does NDVI mean? I have no idea as a layperson, so you can actually get that information out. There were also a few other use cases that you gleaned out of your data input, right. Like wildfire was one of them. Would you like to share a little bit about that?

BUSHESKA: Yes, so we were looking specifically about California, specifically about 2020, which was a season of very, very bad wildfires. And we could see by measuring this vegetation how, like, wildfires affected these regions. Like, you could see how vegetation was dropping very quickly from a great number of one to just, like, flat zero just because everything was there burned. So seeing firsthand on these curves and after that linking up to a lot of other background wildfire data was a good check-in. Like, you could see that it’s not a person deforestation; it’s a wildfire deforestation. And also it is really helpful for risk estimation as we go further along. We could see, like, this past year climate change effects are there. It’s not just like this fancy term in the future. It is there. So I think that having a chance to see in the past and having these models, it’s also giving us a preview for the future. Because ultimately we want this greater food yield so that more and more people can enjoy healthy food. So having a chance to predict the risk would help us to save more food and, yeah, care better about our planet and our people.

RANGANATHAN: I think there’s a lot of potential. The more information you have, the more applications in sustainability you can build. This is merely, like, a footstool to launch it off, right. You’ve said one of the things you’ll miss most from your internship is the density of smart people in the building and in the company. Over the course of your internship, what did you learn? How did your actual experience compare to, like, expectations you had for it?

BUSHESKA: I think it exceeded every single expectation that I had because I would say when you mentioned the density of smart people, it was really incredible to have—and I think that this … maybe my very interesting connection of being in New York while working with people from Seattle and Brazil allowed me to have a broad range of people that I had, like, had a chance to meet. So, like, the New York office, it is very specifically focused on economics and social aspects that I’m not an expert on, but having a lunch with these people every single day, I had a chance to learn a lot about how, like, human data is provisioned. They were building a lot of things for, like, Microsoft’s new projects and new products that are, like, the Copilot and the computers, so I had a chance to hear about those perspectives. Being in Seattle and meeting with interns and researchers in Seattle, I had a chance to learn not only about the farms perspective but also how other projects came to life. With our meetings, I realized how FarmVibes and FoodVibes were once just, like, a small idea and now are these huge projects. So having a chance to understand the history of all the things around me was a great way to see how I am able to build something for the future. And also maybe another thing is the collaboration that happens. As I said, I was, like, surrounded by a lot of socioeconomic people—that I am not really an expert in—but they were really great with providing me advice from an area I would never think of. So when I was speaking, like, from farms to someone who is in economics, they would raise another point on how this can hurt or help economy or to look at it from another side, which is all great perspectives to have when you’re creating a research from scratch. That’s why I really, like, referred to entrepreneurship previously because I feel like even though was an intern, I really had a chance to shape the project in a way that I would get all these insights from people and then obviously decide how to go further.

RANGANATHAN: Yeah, and I want to add summer with interns is the highest energy level at MSR (Microsoft Research) and the best time of my year at Microsoft. So having successfully completed your internship, what advice do you have for our audience, you know, when it comes to applying for an internship like this in industry and then getting the most out of that experience?

BUSHESKA: I would say one of the things that really maybe helped me through the process is, like, working obviously, in an area that I was really passionate about. But another thing, and especially this is coming specifically for the marginalized communities in tech—although I would say MSR is doing a great job in having this balance. But I’m coming from a liberal arts college where we have, like, two girls that are studying electrical engineering—so having this disbalance can in some ways, like, I don’t know, put you, “I’m not good enough” and all this imposter syndrome. So maybe one thing is definitely, like, reach out to people. And something that I really learned heavily from you is reach out. There is nothing you can lose. Most of the time, it is just … getting a no is maybe the hardest answer that you will get. So having a chance to ask is always a good idea. Obviously, don’t go to, like, every single executive to ask questions, even though it paid out—it paid out [LAUGHTER]—and just, like, ask for help from the people around you. Ask to learn. I realized, like, researchers really want to speak about what they have worked on previously. So whenever you get a chance to ask them about their past project, they’re super eager to tell you about like their grad days and how they have become what they are now. So that was also great. And in terms of applying, I think I would … when I applied for this, I was like, there’s no chance I will be able to get in. But for me, as an introvert turned into an extrovert—fun fact, I was a big introvert in the past who turned into an extrovert just because of my work—it’s like if you don’t apply, there’s a 100 percent chance that you’re not going to get in. So, like, when I interviewed, after my first interview, I was like, yeah, there is, like, no chance. Because, like, I presented the projects, but I was like, there are so many other students who I know that are working in these areas and then it happened and then I’m here recording this podcast. I remember it was Thanksgiving break when I applied for this internship. If I was like, no—if I self-rejected myself previously, yeah, this would never have happened. So I would say for all interns out there, just, like, don’t self-reject yourself. Let them reject you. [LAUGHTER] It’s my really advice.

RANGANATHAN: And, yeah, have more confidence. I’m glad you applied and you joined us because this was a highlight of the summer for us, this was, right. So your work with Microsoft Research is one of your several projects. You’ve been involved with many other things, right. What else do you have going on? How does it contribute to the impact you’re looking to have at this point in your life?

BUSHESKA: Yeah. So one thing that I realized in the past four years is that I started … so my, like, journey was very, I would say, interesting and not predictable just because when I, like, graduated from high school, I got trapped by all the COVID pandemic. I couldn’t fly into the US just because of all the visa regulations. It was like, OK, a gap year at home, which if you asked any of my friends before, I was never the person who would take a gap year. So staying that year really, like, without a school or official job was the chance for me to just look at and start my nonprofit, EnRoute. From then on, I worked with policy. We worked at the UN. We even went to Hollywood in March to speak about, like, fast fashion. So I realized there is no one way to make an impact. So I was really glad to join research, to join policy, and speak wherever I can. I think now I’m getting back into my senior year of college. Time flies. And something that I am, like, looking definitely into is how to continue my research in this area, where I most probably would love to be through grad school. Having a chance also to push further this project just because I think that speaking out and sharing research is one of the, like, greatest things we can do, like referring to all the other times when I was speaking on conferences. It will be magical for someone to respond with positive and negative feedback and maybe even can create a collaboration for us to move forward. So that is definitely something that I’m looking forward to. So definitely this area is something that I got in by mistake. I got in really just by … because I had that great time in my senior year to spend more time research, but it definitely something that marked my life at this point. And I think that, like, whatever route I pursue will be to fight climate change in any way or another.

RANGANATHAN: I want to point out we are also planning to open source this and you hope to work with people, you know, to get it in the hands of people who need it and could benefit from it most, right.

BUSHESKA: Absolutely.

RANGANATHAN: So that’s a great direction. I think every year our undergraduate researchers set a new bar in my life expectations. And I thank you for that. This year is another bump up. So, yeah, you are a senior, Angela. What’s next for you, and what’s next for your research and work?

BUSHESKA: Yeah, so this will be a fall really spent with a lot of applications because of, like, graduate school. Definitely something that I’m looking forward, especially graduate school in this area. So we kind of spent a lot of time living in an era of GPT wrappers, and I think when there is a product out there that is really based on research, it can make a tremendous impact. As I said, I am really a person who loves to experiment and do things. So something out there is definitely either, like, having a research that is spun out of this or having perhaps a startup down the line that is based in this area and helping millions of people around the world is definitely something I’m looking forward. Because I saw the impact when you’re helping other people, and I think that is the best thing that one person can do. So I was really excited to do it with all of you this summer and obviously continuing on to open source this project obviously for feedback but also for real use. And if we can help with something, that will be amazing. Hopefully getting these out to conferences around the world. And, yeah, see where it can take us.

RANGANATHAN: Yeah, I have one last question for you. You had some interesting intern mentors, myself included, and then Bruno Silva, who sits in Brazil, Peder Olsen, and Solon Barocas. Do you have anything to say to all of them? [LAUGHS]

BUSHESKA: Yeah, I would say a big thank-you for keeping up with my questions. As all the listeners can probably see, I’m a person who really speaks a lot, even more than needed sometimes. So I would say I had a great balance because I was in a time zone that was between Brazil and Redmond. So I would spend a lot of my mornings, especially in the beginning—huge shoutout to Bruno for dealing with all of my technical questions [LAUGHS]. It was my first time setting up a lot of research clusters. We should not forget also the people at Microsoft who are dealing with all of the GPUs and stuff because I think that managing all of these researchers for all the people out there is a hard job. So having a chance to communicate and also understand. Even though I study electrical engineering, I never had a chance to deeply understand how they’re used in real life. Speaking also with people from all different walks of life because Peder, my other mentor, was someone who has been really experienced in this area, so he would reference projects from I don’t know how many years, and it was, like, so fun to see the history, to see his previous experiences come to life. You don’t every single day get to have like three or four mentors from all different areas and aspects. So I’m really grateful that I had a chance to talk and also get called when things were not right so we can make them better and more correct for the people that will use this project.

RANGANATHAN: I’ll tell them you said thank you. So, Angela, I think we are coming to a close here. Thank you so much for taking the time to share your experience with us. It’s been a pleasure to work with you, and I really look forward to seeing what the future holds for you and what amazing things you’re going to achieve. So thank you. Closing out.

[MUSIC]

BUSHESKA: Thank you so much. Really appreciate to have this opportunity. And for everyone that is, like, working in this area, for every future intern that is doubting themselves, feel free to find me out there and, like, ask. Always happy to help people, always happy to chat sustainability with people. So thank you so much.

[MUSIC FADES]

[1] (opens in new tab) For more information, see “Regulation on Deforestation-free products” on the European Commission website (opens in new tab).

The post Intern Insights: Vaishnavi Ranganathan with Angela Busheska appeared first on Microsoft Research.

Research Focus: Week of October 7, 2024

October 9, 2024

by Brenda Potts Microsoft AI

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

Securely Training Decision Trees Efficiently

In a recent paper: Securely Training Decision Trees Efficiently that will appear at ACM CCS 2024, researchers from Microsoft significantly reduce the communication complexity of secure decision tree training. Decision trees are an important class of supervised learning algorithms. In this approach, a classification or regression tree is built based on a set of features or attributes present in the training dataset. As with many learning algorithms, the accuracy of decision trees can be greatly improved with larger volumes of data. However, this can be a challenge, since data may come from multiple independent sources and require attention to data privacy concerns. In this case, the use of a privacy-enhancing technology, such as secure multi-party computation (MPC), can help protect the underlying training data. 

When the number of elements in the dataset is 𝑁, the number of attributes is 𝑚 and the height of the tree to be built is ℎ, the researchers construct a protocol with communication complexity O(𝑚𝑁 log 𝑁 + ℎ𝑚𝑁 + ℎ𝑁 log 𝑁 ), thereby achieving an improvement of ≈ min(ℎ, 𝑚, log 𝑁 ) over the previous state of the art. The essential feature is an improved protocol to regroup sorted private elements further into additional groups (according to a flag vector) while maintaining their relative ordering. Implementing this protocol in the MP-SPDZ framework shows that it requires 10× lesser communication and is 9× faster than existing approaches.

Read the paper

NEW RESEARCH

Multi-label audio classification with a noisy zero-shot teacher

Improving the real-world accuracy of audio content detection (ACD) is an important problem for streaming platforms, operating systems and playback devices. It’s similar to audio tagging, i.e., labeling sounds present in a given audio segment of several seconds length or longer. However, ACD may consist of a small number of higher-level labels or super-classes, e.g. speech, music, traffic, machines, animals, etc., where each label can include a multitude of specific sounds.

In a recent paper: Multi-label audio classification with a noisy zero-shot teacher, researchers from Microsoft propose a novel training scheme using self-label correction and data augmentation methods to deal with noisy labels and improve real-world accuracy on a polyphonic audio content detection task. The augmentation method reduces label noise by mixing multiple audio clips and joining their labels, while being compatible with multiple active labels. The researchers show that performance can be improved by a self-label correction method using the same pretrained model. They also show that it is feasible to use a strong zero-shot model such as CLAP to generate labels for unlabeled data and improve the results using the proposed training and label enhancement methods. The resulting model performs similar to CLAP while providing an efficient mobile device friendly architecture which can be quickly adapted to unlabeled sound classes.

Read the paper

NEW RESEARCH

Tabularis Revilio: Converting Text to Tables

Tables are commonly used to store and present data. These tables are often moved as free-form text when copied from documents and applications without proper tabular support like PDF documents, web pages, or images. Users are dependent on manual effort or programming abilities to parse this free-form text back into structured tables.

In a recent paper: Tabularis Revilio: Converting Text to Tables, researchers from Microsoft present a novel neurosymbolic system for reconstructing tables when their column boundaries have been lost. Revilio addresses this task by detecting headers, generating an initial table sketch using a large language model (LLM), and using that sketch as a guiding representation during an enumerate-and-test strategy that evaluates syntactic and semantic table structures. Revilio was evaluated on a diverse set of datasets, demonstrating significant improvements over existing table parsing methods. Revilio outperforms traditional techniques in both accuracy and scalability, handling large tables with over 100,000 rows. The researchers’ experiments using publicly available datasets show an increase in reconstruction accuracy by 5.8–11.3% over both neural and symbolic baseline state-of-the-art systems.

Read the paper

NEW RESEARCH

Confidential Container Groups: Implementing Confidential Computing on Azure Container Instances

Container-based technologies empower cloud tenants to develop highly portable software and deploy services in the cloud at a rapid pace. Cloud privacy, meanwhile, is important as a large number of container deployments operate on privacy-sensitive data, but challenging due to the increasing frequency and sophistication of attacks. State-of-the-art confidential container-based designs leverage process-based trusted execution environments (TEEs), but face security and compatibility issues that limit their practical deployment.

In a recent article in Communications of the ACM: Confidential Container Groups: Implementing Confidential Computing on Azure Container Instances (opens in new tab), researchers from Microsoft with external colleagues present the Parma architecture, which provides lift-and-shift deployment of unmodified containers while providing strong security protection against a powerful attacker who controls the untrusted host and hypervisor. Parma leverages VM-level isolation to execute a container group within a unique VM-based TEE. Besides container integrity and user data confidentiality and integrity, Parma also offers container attestation and execution integrity based on an attested execution policy. This policy, which is specified by the customer, delimits the actions that the cloud service provider is allowed to take on their behalf when managing the container group.

The result is that customers receive the security protections of TEEs for their container workloads with minimal costs to perfromance. To learn more, check out Confidential Containers on Azure Container Instances (opens in new tab), which is based on Microsoft’s Parma architecture.

Read the paper

NEW VIDEO SERIES

AI for Business Transformation with Peter Lee and Vijay Mital

Generative AI is changing how businesses operate and how stakeholders talk to each other. The building blocks for large scale AI transformation are now in place, but we are only beginning to imagine how it will unfold. Learn what Microsoft research leaders discovered from some early AI innovation in healthcare, and how businesses can prepare for what’s ahead.

In this new three-part video series, Microsoft Research President Peter Lee and Corporate Vice President Vijay Mital discuss how Microsoft is helping businesses navigate this transformation, along with the critical role of data and how emerging multimodal AI models could turbocharge business innovation.

Watch the video

The post Research Focus: Week of October 7, 2024 appeared first on Microsoft Research.

Data Formulator: Exploring how AI can help analysts create rich data visualizations

October 1, 2024

by Alyssa Hughes Microsoft AI

white outline icons (representing AI and human computer interaction) on a blue to purple to pink gradient background.

Transforming raw data into meaningful visuals, such as charts, is key to uncovering hidden trends and valuable insights, but even with advances in AI-powered tools, this process remains complex. Integrating AI into the iterative nature of the data visualization process is particularly challenging, as data analysts often struggle to describe complicated tasks in a single text prompt while lacking the direct control of traditional tools. This highlights the need for smarter, more intuitive solutions that combine AI’s precision with the flexibility of hands-on methods.

Publication

Data Formulator 2: Iteratively Creating Rich Visualizations with AI

To address this, we’re excited to release Data Formulator as an open-source research project. This update builds on last year’s release by combining user interface (UI) interactions for designing charts with natural language input for refining details. Unlike the previous version, which required users to choose between two methods, this unified approach allows them to iteratively solve complex tasks and with less effort.

Download

Data Formulator

Transform data and create rich visualizations iteratively with AI.

Figure 1: This figure shows the user interface of Data Formulator. There are four callouts in the figure highlighting key components of the user interface. The first call out describes “1. Concept Encoding Shelf: specify charts with field encodings and NL instructions”. The second callout describes “2. (Local) Data Threads: backtrack and revise inputs”. The third describes “3. Data Threads: navigate data derivation history”. The fourth callout contains “4. Data View: inspect original and derived data”. The user interface contains a visualization in the center that shows renewable percentage. — Figure 1. Data Formulator’s UI

Creating and refining charts with the Concept Encoding Shelf and data threads

With Data Formulator, data analysts can now create charts from scratch or select from existing designs through data threads. The UI features a pane called the “Concept Encoding Shelf,” where users can build their chart by dragging various data fields into it and defining them or by creating new ones. A large language model (LLM) on the backend processes this input, generating the necessary code to produce the visual and updating the data threads for future use. This process is illustrated in Figure 2.

Figure 2: This figure shows the user experience workflow in Data Formulator. On the left it shows Data Threads, and the user clicks a line chart that visualizes the renewable percentage of 20 countries and expands it in the main panel. In the middle it shows “Concept Encoding Shelf”, and the user provides an instruction “Show only top 5 CO2 emission countries”. On the right it shows the result produced from running the user instruction with AI: the result is a table with three columns “Year” “Entity” “Renewable Percentage” and int contains only top 5 CO2 countries’ values; a line chart that only contains these five countries trends is also generated. The line chart is added to data threads. — Figure 2. To create a new chart, users can select a previously created chart from the data threads and then use a combination of UI elements and language to describe their intent.

Data threads enable users to review and modify charts they created previously. This iterative process streamlines the editing and refinement process, as the LLM adapts past code to new contexts. Without this feature, users would need to provide more detailed prompts to recreate designs from scratch. This iterative mechanism also allows users to continue updating their charts until they’re satisfied.

Figure 3: This figure illustrates how Data Formulator’s data threads work. On the left side, it shows two data threads, one is the derivation process of electricity produced from each energy source from each country from 2000 to 2020, the other is the thread showing that the user derives the renewable percentage of each country per year followed by a line chart that shows the rankings of these countries. The figure illustrates that each of the plots is backed by a python data transformation code to derive data appropriate to the user instruction. On the right it shows actions users can take in local data threads: (a) the user can click and rerun a previous instruction, (b) the user can provide a new instruction to follow up, (c) the user can click the previous card and revise instruction and rerun. — Figure 3: Data Formulator’s data threads support complex navigation, quick editing, and the rerunning of previous instructions.

Data Formulator’s framework

Data Formulator’s architecture separates data transformation from chart configuration, improving both the user experience and AI performance. Upon receiving user specifications, the system follows a three-step process: (1) it generates a Vega-Lite script, which defines how data is visualized; (2) it instructs the AI to handle data transformation; and (3) it creates the chart using the converted data, as illustrated in Figure 4.

Figure 4: This figure shows data formulator architecture. The left side shows user’s chart specification with Year on x-axis, rank on y-axis, Entity on color with instruction “rank by renewable percentage”. In the first step, Data Formulator generates a Vega-Lite line chart template with field names. In step 2, Data Formulator compiles a prompt containing “system prompt”, “Context (data fields + sample data + dialog history)” and “Goal (user instruction + expected fields)”, and AI takes this prompt to generate a python code to transform the data. In step 3, Data Formulator combines the data and the Vega-Lite spec to create a line chart that shows ranking of the countries from 2000 to 2020. — Figure 4: Behind the scenes, Data Formulator compiles a Vega-Lite script from the Concept Encoding Shelf (1), prompts the LLM to generate the necessary code for preparation (2), and, upon creating new data, creates the chart (3).

Implications and looking forward

Refining how users interact with AI-powered tools is essential for improving how they communicate their requirements, paving the way for more efficient and effective collaboration. By integrating UI elements and natural language input, we designed Data Formulator to let users to define their visualization needs with precision, leading to better results and reducing the need for multiple clarifications.

While Data Formulator addresses some challenges in data transformation and visualization authoring, others remain. For example, how can AI assist in cleaning unstructured data without losing critical information? And how can it help users define clear data analysis goals when starting with ambiguous or undefined objectives? We’re actively investigating these research questions and invite you to contribute by building on the Data Formulator codebase (opens in new tab).

Learn more about our research efforts on human-AI interaction by exploring how we design dynamic UI widgets (opens in new tab) for visualization editing. You can also view a demo of the Data Formulator project on GitHub Codespace (opens in new tab).

Acknowledgements

We’d like to thank Bongshin Lee, John Thompson, and Gonzalo Ramos for their feedback and contributions to this project.

The post Data Formulator: Exploring how AI can help analysts create rich data visualizations appeared first on Microsoft Research.

Stress-testing biomedical vision models with RadEdit: A synthetic data approach for robust model deployment

September 30, 2024

by Alyssa Hughes (2ADAPTIVE LLC dba 2A Consulting) Microsoft AI

This paper has been accepted at the 18th European Conference on Computer Vision (ECCV 2024) (opens in new tab), the premier gathering on computer vision and machine learning.

On the left is a simple drawing of the lungs. The drawing shows the borders of the left and right lung as well as the trachea and the left and right main stem bronchi. The text under the drawing reads: Original image. To the right of the drawing are the 3 additional inputs of RadEdit. They are arranged vertically. On top there is an example editing prompt. It reads

Biomedical vision models are computational tools that analyze medical images, like X-rays, MRIs, and CT scans, and are used to predict medical conditions and outcomes. These models assist medical practitioners in disease diagnosis, treatment planning, disease monitoring, and risk assessment. However, datasets used to train these models can be small and not representative of real-world conditions, which often leads to these models performing worse in actual medical settings. To avoid misdiagnoses and other errors, these models must be rigorously tested and adjusted to perform reliably across different conditions.

To mitigate the dataset challenge of not having enough diverse data and to improve the testing of biomedical vision models, we developed “RadEdit: Stress-testing biomedical vision models via diffusion image editing,” presented at ECCV 2024. Aligned with the Microsoft Responsible AI principles of reliability and safety, RadEdit helps researchers identify when and how models might fail before they are deployed in a medical setting. RadEdit uses generative image editing to simulate different dataset shifts (e.g., a shift in the patients’ demographics), helping researchers to identify weaknesses in the model. By employing text-to-image diffusion models trained on a wide array of chest X-ray datasets, RadEdit can generate synthetic yet realistic X-rays.

RadEdit’s approach involves using multiple image masks (binary images representing designated regions of a reference image), as illustrated in Figure 1, to limit changes to specific areas of the image, therefore preserving their integrity. It generates synthetic datasets free from spurious correlations and artifacts, addressing shortcomings in existing editing techniques. Traditional editing techniques often overlook biases within the generative model, leading to synthetic data that perpetuate these biases. Alternatively, these other editing techniques restrict edits to the point of unrealistic outputs.

How RadEdit works

RadEdit improves biomedical image editing using three key inputs, as illustrated in Figure 1:

Text prompt: Defines the desired modifications. For example, a disease can be added with a description like “Consolidation”
Edit mask: A binary mask indicating the main area to be modified, such as the “right lung”
Keep mask: A binary mask outlining parts of the original image to be preserved, like the “left lung”

RadEdit depends on a diffusion model for image editing, where the image is first converted to a latent noise representation by inverting the diffusion generative process. The noise representation is then iteratively denoised over multiple time steps. During each step, RadEdit:

Uses the text prompt to conditionally generate pixels within the edit mask with classifier-free guidance.
Generates the remaining pixels based on the original image and edited area.
Replicates the content of the original image within the “keep” mask, ensuring that this area remains unaltered.

Finally, a quality check ensures that the edited image is faithful to the editing prompt. RadEdit uses Microsoft’s BioViL-T to compute an image-text alignment score that we can then use to filter out low-quality and unfaithful edits.

Simulating dataset shifts

A key feature of RadEdit is its ability to simulate dataset shifts with precise spatial control for comprehensive model performance evaluation. This includes differences in image acquisition, the appearance of underlying pathologies, and population characteristics.

Particularly notable is RadEdit’s ability to simulate image variations from different sources (e.g., different hospitals), helping researchers identify potential biases in models trained solely on data from one source. For example, in a COVID-19 study, if all positive cases in a dataset come from a single hospital and all negative cases come from a different hospital, a model trained on detecting COVID-19 might over-rely on hospital-specific indicators from the X-ray images. Among others, we considered the laterality markers in the corners of an X-ray (e.g., a highly visible letter “L” on the left side of the X-ray) as well as the amount of black space on the image edges to be hospital-specific indicators. To test if a model relies too much on differences in image acquisition, we created synthetic data using RadEdit, where we removed COVID-19 features while retaining hospital-specific indicators. After creating the synthetic dataset with the COVID-19 features no longer present, we can test if the COVID-10 detection model still predicts COVID-19. This would indicate that the model is biased with respect to hospital-specific indicators.

RadEdit can also remove specific diseases, like pneumothorax (collapsed lung), from an image while keeping treatment features like chest drains. This helps researchers understand how models detect and understand “visual shortcuts.” Because RadEdit maintains the size and location of the main anatomical structures (like lungs, ribs, and heart), it can also be used to stress-test segmentation models. For example, RadEdit can add rare abnormalities or medical devices to lung images to test how well segmentation models handle new variations, ensuring they generalize accurately across different populations. Figure 2 illustrates these three examples of stress-testing scenarios.

All drawings of lungs are the same as in Figure 1. The drawing shows the borders of the left and right lung as well as the trachea and the left and right main stem bronchi. In the first row on the left there are two drawings of a lung. The first drawing of a lung labelled — Figure 2: Stress-testing models by simulating dataset shifts via image editing.

Stress-testing multimodal models

We have used RadEdit to stress-test image classification and segmentation models, and we see potential for future applications in complex multimodal tasks like generating radiology reports. RadEdit can help identify limitations in multimodal large language models (MLLMs) like Microsoft’s MAIRA-1 and MAIRA-2, especially when dealing with rare conditions or unusual combinations of findings not well-represented in the training data. These MLLMs take one or more radiological images and relevant clinical information as input to produce detailed text reports.

RadEdit can generate synthetic image-report pairs for challenging scenarios. For example, manually editing a report to describe a rare combination of findings and then using RadEdit to edit the corresponding image, creates a valuable test case for the MLLM. This approach allows us to stress-test MLLM with diverse synthetic data, identifying weaknesses or biases and ensuring the model is more robust in real-world scenarios. This is a crucial step for using these models safely and effectively in clinical settings.

Implications and looking forward

RadEdit offers significant advantages for the biomedical research community. It helps identify biases and blind spots before deployment, helping to ensure that biomedical vision models perform reliably in clinical settings. By simulating dataset shifts, RadEdit reduces the need to collect additional evaluation data, saving time and resources.

RadEdit is applicable to a wide range of settings and can be used to stress-test state-of-the-art foundation models like Microsoft’s Rad-DINO (opens in new tab) and BiomedParse (opens in new tab). By integrating RadEdit into their research workflow, researchers can validate that their biomedical vision models are not only state-of-the-art but also more prepared for the complexities of real-world deployment. In the future, we envision RadEdit being applied to more complex multimodal tasks, such as generating radiology reports.

The code for RadEdit as well as the weights of the diffusion model we used can be found under https://huggingface.co/microsoft/radedit (opens in new tab).

Acknowledgments

We would like to thank our paper coauthors: Fernando Pérez-García, Sam Bond-Taylor, Pedro P. Sanchez, Boris van Breugel, Harshita Sharma, Valentina Salvatelli, Maria T. A. Wetscherek, Hannah Richardson, Matthew P. Lungren, Aditya Nori, and Ozan Oktay, as well as all our collaborators across Microsoft Cloud for Healthcare and Microsoft Health Futures.

RadEdit is intended for research purposes only and not for any commercial or clinical use.

The post Stress-testing biomedical vision models with RadEdit: A synthetic data approach for robust model deployment appeared first on Microsoft Research.

Microsoft Research Forum Episode 4: The future of multimodal models, a new “small” language model, and other AI updates

September 26, 2024

by Brenda Potts Microsoft AI

Microsoft Research Forum is a continuous exchange of ideas about science and technology research in the era of general AI. In the latest episode (opens in new tab), researchers discussed the latest multimodal AI models, advanced benchmarks for AI evaluation and model self-improvement, and an entirely new kind of computer for AI inference and hard optimization. Researchers at Microsoft are working to explore breakthrough technology that can help advance everything from weather prediction to materials design.

Below is a brief recap of the event, including select quotes from the presentations. Register to join future Research Forum episodes and view previous sessions. Transcripts and additional resources can be found in the Research Forum briefing book.

Keynote

Phi-3-Vision: A highly capable and “small” language vision model (opens in new tab)

Jianfeng Gao introduced Phi-3-Vision, an advanced and economical open-source multimodal model. As a member of the Phi-3 model family, Phi-3-Vision enhances language models by integrating multisensory skills, seamlessly combining language and vision capabilities.

“Phi-3-Vision is the first multimodal model in the Phi small model family. It matches and sometimes exceeds some of the capabilities of much larger models … at a much lower cost. And to help everyone build more affordable and accessible AI systems, we have released the model weights into the open-source community.”

— Jianfeng Gao, Distinguished Scientist and Vice President, Microsoft Research Redmond

Panel Discussion

Beyond language: The future of multimodal models in healthcare, gaming, and AI (opens in new tab)

This discussion examined the transformative potential and core challenges of multimodal models across various domains, including precision health, game intelligence, and foundation models. Microsoft researchers John Langford, Hoifung Poon, Katja Hofmann, and Jianwei Yang shared their thoughts on future directions, bridging gaps, and fostering synergies within the field. 

“One of the really cutting-edge treatments for cancer these days is immunotherapy. That works by mobilizing the immune system to fight the cancer. And then one of the blockbuster drugs is a KEYTRUDA, that really can work miracles for some of the late- stage cancers … Unfortunately, only 20 to 30 percent of the patients actually respond. So that’s … a marquee example of what are the growth opportunity in precision health.”
_—_{Hoifung Poon}_{, General Manager, Microsoft Research Health Futures}

“We experience the world through vision, touch, and all our other senses before we start to make sense of any of the language that is spoken around us. So, it’s really, really interesting to think through the implications of that, and potentially, as we start to understand more about the different modalities that we can model and the different ways in which we combine them.”
_—_{Katja Hofmann}_{, Senior Principal Researcher, Microsoft Research}

“To really have a capable multimodal model, we need to encode different information from different modalities, for example, from vision, from language, from even audio, speech, etc. We need to develop a very capable encoder for each of these domains and then … tokenize each of these raw data.”
_—_{Jianwei Yang}_{, Principal Researcher, Microsoft Research Redmond}

Lightning Talks

Analog optical computing for sustainable AI and beyond (opens in new tab)

Research Forum | Episode 4 Talk 1 | Francesca Parmigiani and Jiaqi Chu

This talk presented a new kind of computer—an analog optical computer—that has the potential to accelerate AI inference and hard optimization workloads by 100x, leveraging hardware-software co-design to improve the efficiency and sustainability of real-world applications. 

“Most likely, you or your loved ones have been inside an MRI scan — not really a great place to be in. Imagine if you can reduce that amount of time from 20 to 40 minutes to less than five minutes.”
_{— Francesca Parmigiani, Principal Researcher, Microsoft Research Cambridge}

“I’m really excited to share that we have just completed the second generation of [this] computer. It is much smaller in physical size, and this is a world first in that exactly the same computer is simultaneously solving hard optimization problems and accelerating machine learning inference. Looking ahead, we estimate that at scale, this computer can achieve around 450 tera operations per second per watt, which is a 100-times improvement as compared to state-of-the-art GPUs.”
_—_{Jiaqi Chu}_{, Principal Researcher, Microsoft Research Cambridge}

Direct Nash Optimization: Teaching language models to self-improve with general preferences (opens in new tab)

Research Forum | Episode 4 Talk 2 | Corby Rosset

This talk explored teaching language models to self-improve using AI preference feedback, challenging the model to play against itself and a powerful teacher until it arrives at a Nash equilibrium, resulting in state-of-the-art win rates against GPT-4 Turbo on benchmarks such as AlpacaEval and MT-Bench. 

“The traditional way to fine-tune an LLM for post-training … basically tells the model to emulate good behaviors, but it does not target or correct any mistakes or bad behaviors that it makes explicitly. … Self-improving post-training explicitly identifies and tries to correct bad behaviors or mistakes that the model makes.”
_{— Corby Rosset, Senior Researcher, Microsoft Research AI Frontiers}

Project Aurora: The first large-scale foundation model of the atmosphere (opens in new tab)

Research Forum | Episode 4 Talk 3 | Megan Stanley

This talk presented Aurora, a cutting-edge foundation model that offers a new approach to weather forecasting that could transform our ability to predict and mitigate the impacts of extreme events, air pollution, and the changing climate.

“If we look at Aurora’s ability to predict pollutants such as nitrogen dioxide that are strongly related to emissions from human activity, we can see that the model has learned to make these predictions with no emissions data provided. It’s learned the implicit patterns that cause the gas concentrations, which is very impressive.”
_{— Megan Stanley, Senior Researcher, Microsoft Research AI for Science}

A generative model of biology for in-silico experimentation and discovery (opens in new tab)

Research Forum | Episode 4 Talk 4 | Kevin Yang

This talk explored how deep learning enables generation of novel and useful biomolecules, allowing researchers and practitioners to better understand biology. This includes EvoDiff, a general-purpose diffusion framework that combines evolutionary-scale data with the distinct conditioning capabilities of diffusion models to generate new proteins, given a protein sequence.

“Often, protein engineers want proteins that perform a similar function to a natural protein, or they want to produce a protein that performs the same function but has other desirable properties, such as stability. By conditioning EvoDiff with a family of related sequences, we can generate new proteins that are very different in sequence space to the natural proteins but are predicted to fold into similar three-dimensional structures. These may be good starting points for finding new functions or for discovering versions of a protein with desirable properties.”
_{— Kevin Yang, Senior Researcher, Microsoft Research New England}

Fostering appropriate reliance on AI (opens in new tab)

Research Forum | Episode 4 Talk 5 | Mihaela Vorvoreanu

Since AI systems are probabilistic, they can make mistakes. One of the main challenges in human-AI interaction is to avoid overreliance on AI and empower people to determine when to accept or not accept an AI system’s recommendation. This talk explores Microsoft’s work in this area.

“This is where I think it is our responsibility as people working in UX disciplines—as people researching UX and human-computer interaction—to really, really step up to the front and see how it is our moment to shine and to address this problem.”
_{— Mihaela Vorvoreanu, Director UX Research and Responsible AI Education, Microsoft AI Ethics and Effects in Engineering and Research (Aether)}

The post Microsoft Research Forum Episode 4: The future of multimodal models, a new “small” language model, and other AI updates appeared first on Microsoft Research.

Research Focus: Week of September 23, 2024

September 25, 2024

by Alyssa Hughes (2ADAPTIVE LLC dba 2A Consulting) Microsoft AI

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons

Time-series forecasting is a technique used to predict future values based on previously observed data points over time. It has extensive applications for traffic flow, renewable energy, retail, finance, and climate, among other uses. For these applications, it is crucial to provide forecasts across different prediction horizons, addressing both short- and long-term planning needs. Many decision-making processes also require not only point forecasts to quantify planning efficiency but also robust distributional estimations to manage uncertainty effectively.

Delivering precise point and distributional forecasts across a spectrum of prediction horizons is a significant challenge. Prior research on developing deep learning models for time-series forecasting has often concentrated on isolated aspects, such as long-term point forecasting or short-term probabilistic estimations. This may result in skewed methodological choices and hinder the adaptability of these models to uncharted scenarios. While there is a rising trend in developing universal forecasting models, a thorough understanding of their advantages and drawbacks is still lacking.

In a recent paper: ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons, researchers from Microsoft and external collaborators present a platform to evaluate these fundamental forecasting needs and to conduct a rigorous comparative analysis of related recent studies. They examine the latest models for universal time-series forecasting and discover that their analyses of methodological strengths and weaknesses are also applicable to these universal models. They then outline the limitations inherent in current research and underscore several avenues for future exploration.

Read the paper

NEW RESEARCH

SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval

Information retrieval (IR) involves identifying and retrieving recorded data that is relevant to an information need. Large-scale test collections play a crucial role in IR research. However, existing IR research studies are commonly developed on small-scale datasets that rely on human assessors for relevance judgments – a time-intensive and expensive process. Recent studies have shown the strong capability of large language models (LLMs) in producing reliable relevance judgments with human accuracy but at a greatly reduced cost.

In a recent paper: SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval (opens in new tab), researchers from Microsoft and external colleagues address the missing large-scale ad-hoc document retrieval dataset. They extend the TREC Deep Learning Track (opens in new tab) test collection via additional language model synthetic labels to enable researchers to test and evaluate their search systems at a large scale. Such a test collection includes more than 1,900 test queries from previous tracks. The researchers compare system evaluation with past human labels and show that their synthetically created large-scale test collection can lead to highly correlated system rankings.

Read the paper

NEW RESEARCH

Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Scheduling

LLMs are used for a wide variety of tasks and scenarios, such as chat, question answering, code generation, summarization and reasoning. These tasks exhibit variations in their input and output characteristics. Requests for different tasks with distinct input and output characteristics are often served concurrently at a single model instance, which can lead to spikes in end-to-end latency, time to generate the first token, and time between tokens (in the case of a streaming request). Understanding the interplay between requests of different characteristics is important for optimizing the end-to-end performance during LLM inference.

In a recent preprint, Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Scheduling, researchers from Microsoft propose a heuristic-guided reinforcement learning-based intelligent router for data-driven and workload-aware scheduling. This router leverages a trainable response-length predictor, and a novel formulation for estimating the impact of mixing different workloads to schedule queries across LLM instances and achieve over 11% lower end-to-end latency than existing approaches.

Read the paper

INTERNSHIP OPPORTUNITY

Apply now: Microsoft Research Undergrad Internship Program – Summer 2025

The Microsoft Research Undergrad Internship Program offers 12-week internships in Redmond, Washington; New York City; or Cambridge, Massachusetts, for rising college juniors and seniors who are passionate about technology and champion diversity and inclusion.

Come work alongside world-class researchers on state-of-the-art projects. Participants will collaborate with an extended network of visiting faculty, postdoctoral researchers, data and applied scientists, engineers, designers, and doctoral students to make important contributions to new and ongoing research. On-the-job learning will be augmented with mentoring, community building, and networking opportunities. Candidates from groups currently underrepresented in engineering and computer science are strongly encouraged to apply.

Applications (opens in new tab) will be accepted until October 21, 2024. Apply now!

Apply now

The post Research Focus: Week of September 23, 2024 appeared first on Microsoft Research.

Subscribe to the Microsoft Research Podcast:

Transcript

Subscribe to the Microsoft Research Podcast:

Transcript

AI-powered microgrids support resilient communities

Evaluating AI’s impact on microgrid efficiency and equity

Case study: AI-powered microgrid for community energy transition

Project phases

Co-innovation design

Renewables and microgrid siting

Microgrid simulation phase

AI: A path to equity and resilience

Acknowledgements

How GraphRAG works

Introducing DRIFT Search

MedFuzz: Exploring the robustness of LLMs on medical challenge problems

DRIFT Search: A step-by-step process

Why DRIFT search is effective

Benchmarking DRIFT search

Availability

Future research directions

Learn more:

Subscribe to the Microsoft Research Podcast:

Transcript

NEW RESEARCH

Securely Training Decision Trees Efficiently

NEW RESEARCH

Multi-label audio classification with a noisy zero-shot teacher

NEW RESEARCH

Tabularis Revilio: Converting Text to Tables

Microsoft Research Forum Episode 4

NEW RESEARCH

Confidential Container Groups: Implementing Confidential Computing on Azure Container Instances

NEW VIDEO SERIES

AI for Business Transformation with Peter Lee and Vijay Mital

Creating and refining charts with the Concept Encoding Shelf and data threads

Data Formulator’s framework

Implications and looking forward

Acknowledgements

AI Explainer: Foundation models ​and the next era of AI

How RadEdit works

Simulating dataset shifts

Stress-testing multimodal models

Implications and looking forward

Acknowledgments

Keynote

Panel Discussion

Lightning Talks

NEW RESEARCH

ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons

NEW RESEARCH

SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval

Research Focus: Week of September 9, 2024

NEW RESEARCH

Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Scheduling

INTERNSHIP OPPORTUNITY

Apply now: Microsoft Research Undergrad Internship Program – Summer 2025

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.

AI Explainer: Foundation models and the next era of AI