Lifelong model editing in large language models: Balancing low-cost targeted edits and catastrophic forgetting

Large language models (LLMs) are profoundly useful for a vast array of difficult tasks. But they sometimes make unpredictable mistakes or perpetuate biased language. These sorts of errors tend to arise over time due to changes in the underlying data or in user behavior. This necessitates targeted, cost-effective fixes to these models and the real-world applications they support.

Repeated pretraining or finetuning might be used to achieve these fixes. However, these solutions are often too computationally expensive. For example, LLaMA 1 was trained for 21 days on 2,048 A100 GPUs, costing over $2.4 million. Finetuning LLMs requires more GPU capacity than many research labs can access consistently and affordably. Plus, it remains largely unknown which data should even be added to or removed from a data corpus to correct specific behaviors without impacting unrelated inputs.

To keep LLMs up to date without expensive training, model editing has recently been proposed as a paradigm for making targeted updates to big models. Most model editors update a model once, injecting a batch of corrections. But mistakes are often discovered sequentially over time and must be corrected quickly. In other words, lifelong model editing, in which a stream of mistakes is encountered and must be addressed immediately, is essential once models are deployed. This requires making many edits sequentially, a setting in which existing editors are known to fail. Success here means correcting all edits in sequence, without forgetting old fixes and without decaying performance on unrelated inputs. But what exactly is an edit? In Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors, three types of edits are considered:

  1. Updating factual knowledge. Let’s say we have a pre-trained question-answering model: We pass questions in, and the model returns answers. But as the world changes, these answers become outdated. For example, the answer to “Who is the president of the U.S.?” should change after an election. Therefore, an edit is a tuple – or an ordered sequence of values – containing a question (e.g., “Who is the president of the U.S.?”) and the correct answer (e.g., “Biden”) for the question.
  2. Keeping up with flipping labels. Ground truth in classification tasks can change over time. For example, when U.S. courts use new language to describe existing topics, a document’s correct label can change. In such a case, a model trained on the old labels must be corrected. Targeted edits are especially important when only specific types of data are relabeled, which is common. In this case, an edit is a paired input (e.g., court document) and a new label (e.g., topic).
  3. Mitigating fabrication and incoherence in LLMs. A key challenge in using LLMs is avoiding instances where they generate language that is ungrounded in reality. But this might happen more in some models than others. Therefore, when it does happen, the ensuing edit should be as small as possible. To explore the effectiveness of this approach, the researchers consider mitigating this problem when generating biographies of famous people. Upon identifying hand-annotated fabrications, they edit an LLM to instead produce corresponding sentences from real Wikipedia articles. In this case, an edit is a prompt and a corresponding response, which the existing model finds unlikely.
Figure 1. Overview of lifelong model editing with GRACE. In the example shown, the model’s outdated answer to “What was the latest pandemic?” (Swine Flu) is corrected to “COVID”: the language model stays frozen, and GRACE retrieves the appropriate value (a new embedding) from a codebook of trainable embeddings. Models make important errors that must be corrected, so GRACE makes edits by learning, caching, and selectively retrieving new transformations between layers. Over long sequences of edits, which appear sporadically and require quick fixes, GRACE codebooks grow and adapt.

To make cost-effective edits to LLMs, we propose an approach referred to as General Retrieval Adaptors for Continual Editing, or GRACE. GRACE is the first method to enable thousands of sequential edits to any pre-trained model architecture using only streaming errors. The approach is simple and effective: to edit a model so that it outputs a chosen label for an input, pick a layer in the model and pick an embedding at that layer to represent the input. For example, the embedding that the fourth layer computes for the final token of an input sentence can be used. This embedding is cached, and a new embedding is learned such that, if the new embedding is substituted for the old one, the model produces the desired response. The original embedding is referred to as a key and the learned embedding as a value; learning the value is straightforward via gradient descent. The key and value are then stored in a codebook, which acts as a dictionary. When a new input is passed to the model, its embedding at the chosen layer, referred to as a query, is compared to the existing keys. If the query matches a key, the corresponding value is looked up and the edit is applied. As edits stream in, they are simply added to the codebook, allowing many edits to be applied sequentially.
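To make the mechanism concrete, here is a minimal sketch of a GRACE-style codebook adaptor in PyTorch-flavored Python. The class and function names, the Euclidean distance, and the optimizer settings are illustrative assumptions for this post, not the released implementation.

```python
import torch

class GraceAdaptor:
    """Codebook of (key, value, epsilon) entries wrapped around one layer (illustrative sketch)."""

    def __init__(self, init_epsilon: float = 1.0):
        self.keys = []       # cached input embeddings (keys)
        self.values = []     # learned replacement embeddings (values)
        self.epsilons = []   # per-key deferral radii
        self.init_epsilon = init_epsilon

    def __call__(self, h: torch.Tensor) -> torch.Tensor:
        """Pass a hidden state through; substitute a cached value on a codebook hit."""
        if not self.keys:
            return h
        dists = torch.stack([torch.norm(h - k) for k in self.keys])
        i = int(torch.argmin(dists))
        if float(dists[i]) <= self.epsilons[i]:
            return self.values[i]   # query falls inside a key's ball: apply the edit
        return h                    # otherwise leave the frozen model's computation untouched


def learn_value(key: torch.Tensor, edit_loss, steps: int = 100, lr: float = 1.0) -> torch.Tensor:
    """Gradient-descend a value so the frozen model produces the desired output.

    `edit_loss(value)` is assumed to run the rest of the frozen network with `value`
    substituted at the edited layer and return the loss on the target response.
    """
    value = key.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([value], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        edit_loss(value).backward()
        optimizer.step()
    return value.detach()
```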

Table 1. GRACE outperforms existing model editors by successfully editing models without forgetting previous edits or unrelated training data. On the zsRE and SCOTUS datasets, GRACE achieves substantial compression. On the Hallucination dataset, GRACE successfully embeds long future sequences of tokens into cached values.

But isn’t this just memorization? How can generalizable edits be achieved without memorizing every new input? Instead of always adding new keys, every new key is paired with an influence radius: a ball of radius ε surrounding the key. If any query lands inside this ε-ball, the key’s corresponding value is retrieved and the edit is applied. Thus, inputs that are similar to any cached edit will also be updated. Occasionally, a new key’s ε-ball may conflict with an existing key’s. If the conflicting keys have different values, their ε-balls are set to just barely touch. If they have the same value, the existing key’s ε is increased to include the new input. Tuning ε helps achieve small codebooks that generalize and can successfully make thousands of edits in a row.
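Continuing the illustrative sketch above, this conflict-resolution rule might look roughly like the following (again an assumption-laden toy, not the authors' code):

```python
def add_edit(adaptor: GraceAdaptor, key: torch.Tensor, value: torch.Tensor) -> None:
    """Add a (key, value) pair to the codebook, resolving epsilon-ball conflicts."""
    epsilon = adaptor.init_epsilon
    for i, (k, v, e) in enumerate(zip(adaptor.keys, adaptor.values, adaptor.epsilons)):
        d = float(torch.norm(key - k))
        if d <= e + epsilon:                       # the two balls would overlap
            if torch.allclose(v, value):
                adaptor.epsilons[i] = max(e, d)    # same value: grow the ball to absorb the new input
                return                             # no new key needed
            adaptor.epsilons[i] = d / 2            # different values: shrink both radii
            epsilon = d / 2                        # so the balls just barely touch
    adaptor.keys.append(key.detach())
    adaptor.values.append(value.detach())
    adaptor.epsilons.append(epsilon)
```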

To compare GRACE’s ability to make generalizable edits against existing methods, two bidirectional models (T5 and BERT) and one autoregressive model (GPT2-XL) were used. For question answering (QA), T5 was used along with a QA dataset that includes questions targeted for relation extraction. Twenty rephrased versions of each question were extracted; 10 were used during editing and the other 10 served as unseen holdouts. The proposed approach outperformed existing methods when correcting 1,000 edits sequentially, as shown in Table 1, and it used only 137 keys to make the edits, demonstrating the method’s efficiency. This level of generalization is better than prior work and shows promising potential for correcting future mistakes. The proposed approach also successfully edits a BERT model that was trained on U.S. Supreme Court documents from before 1992 and tested on documents from after 1992, for which the label distribution shifted. An experiment was also conducted using GRACE with an autoregressive model, GPT2-XL, to edit mistakes related to fabrication; the results were promising, even for long sequences of edits. For example, when asked to generate a biography of Brian Hughes, GRACE successfully encouraged GPT2-XL to respond: “Brian Hughes (born 1955) is a Canadian guitarist whose work draws from both the smooth jazz and world music genres,” which exactly matches the requested biography using only one cached value. Another interesting observation was that GRACE edits were robust to the choice of edited layer, though later layers were harder to edit. Further, a clear balance was observed between memorization and generalization when choosing ε, as shown in Figure 2. Finally, a key feature of GRACE is that the codebook is detached from the pre-trained model, leaving its weights untouched. This makes it possible to undo any edit at any time, and the behavior of the edits can be inspected without high computational costs.

A figure containing eight subfigures displayed as two rows and four columns. Each row corresponds to a value of epsilon, the hyperparameter in our proposed method that controls generalization: the first row shows an epsilon of 0.1 and the second an epsilon of 3.0. Each column shows a line graph for a different metric, tracking how that metric changes over 3,000 sequential edits to a T5 QA model on the zsRE dataset. Each plot contains four lines, one per edited T5 block, comparing edits made to blocks 0, 2, 4, and 6. The first column shows the TRR metric, which measures model accuracy on its original test data after editing. For an epsilon of 0.1, TRR remains at 0.72 throughout, with no difference between blocks. For an epsilon of 3.0, TRR remains at 0.72 only for Block 6 and is lowest for Block 0, dropping below 0.7 by the end of editing. The second column shows the ERR metric, which is accuracy on previous edits at each step. For an epsilon of 0.1, Blocks 2, 4, and 6 remain high at nearly 1.0; for an epsilon of 3.0, Block 6 remains high while the other blocks drop to around 0.9. The third column shows performance on unseen holdout edits, which are rephrasings of seen edits. After each edit, all holdout edits are run through the edited model and accuracy on the whole set is recorded, so in both plots performance increases over time as the edits gradually cover more rephrasings of the holdout set; this measures GRACE’s generalization. For an epsilon of 0.1, Block 6 generalizes slightly better than the other blocks; for an epsilon of 3.0, Block 6 underperforms the other blocks significantly, Block 0 is slightly better, and Blocks 2 and 4 are much better. The final column reports the number of keys GRACE uses to make all 3,000 edits. Block 6 simply memorizes every edit: its number of keys grows linearly, reaching 3,000 keys after 3,000 edits. For Blocks 0, 2, and 4, the key count saturates, with edits made using far fewer keys: about 2,000 keys when epsilon is 0.1, while for an epsilon of 3.0 Block 0 uses about 1,000 keys and Blocks 2 and 4 use around 800. This demonstrates how the choice of block and epsilon drives the trade-off between memorization and generalization. Overall, generalizable edits appear to happen in interior model layers rather than the first or last layers, and for slightly larger choices of epsilon.
Figure 2. GRACE’s performance when editing different blocks of a T5 model for different choices of epsilon. This choice drives a balance between accuracy on unrelated training data (TRR) and previous edits (ERR), as shown by a small epsilon (a) and a big epsilon (b).

Summary

GRACE presents a different perspective on model editing, in which representations are modified directly and transformations are cached sequentially. Thousands of edits can be made in sequence while maintaining only a small codebook throughout editing. This narrows the gap to the deployment needs of real-world applications, where edits are discovered over time and must be addressed cost-effectively. By correcting behaviors efficiently and extending sequential editing to other model properties, like fairness and privacy, this work can potentially enable a new class of solutions for adapting LLMs to meet user needs over long deployment lifetimes.

The post Lifelong model editing in large language models: Balancing low-cost targeted edits and catastrophic forgetting appeared first on Microsoft Research.

Abstracts: November 20, 2023

Microsoft Research Podcast: Abstracts, November 20, 2023

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements. 

In this episode, Shrey Jain, a Technical Project Manager at Microsoft Research, and Dr. Zoë Hitzig, a junior fellow at the Harvard Society of Fellows, discuss their work on contextual confidence, which presents a framework to understand and more meaningfully address the increasingly sophisticated challenges generative AI poses to communication.

Transcript

[MUSIC PLAYS] 

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.  

[MUSIC FADES] 

Today I’m talking to Shrey Jain, an applied scientist at Microsoft Research, and Dr. Zoë Hitzig, a junior fellow at the Harvard Society of Fellows. Shrey and Zoë are coauthors of a paper called Contextual Confidence and Generative AI, and you can read a preprint of this paper now on arXiv. Shrey Jain, Zoë Hitzig. Thanks for joining us on Abstracts.

SHREY JAIN: Thank you.


ZOË HITZIG: Great to be here. 

HUIZINGA: Shrey, let’s start out with you. What problem does this research address, what made you care about it, and why should we care about it, too? 

JAIN: Yeah, so right now, there’s a lot of discussion as towards what the impacts of generative AI is on communication, and there’s been a lot of different terms being thrown around amongst AI policy researchers or news organizations, such as disinformation, misinformation, copyright, fair use, social engineering, deception, persuasion, and it makes it really hard to understand the precise new problem that this new technology, generative AI, brings towards our understanding of how we communicate with one another. And so what we wanted to do in this research is try to present a framework to sort of guide both policymakers, AI labs, and other people working in this field to have a better understanding of the challenges that generative AI presents and accordingly be able to meet those challenges with a set of strategies that are precise to the way we understand it and also try to uncover new strategies that might remain hidden in other frameworks that are traditionally being used to address these challenges. 

HUIZINGA: So expand on that a little bit in terms of, you know, what made you care about it? What was the prompt—no pun intended—for generative AI that got you concerned about this? And what kinds of things ought we to be thinking about in terms of why we should care about it, too? 

JAIN: Yeah, there’s a lot of different areas under which generative AI presents new challenges to our ability to communicate, one of which was literally the ability to communicate with close family members. I think we’ve seen a lot of these deception attacks kind of happening on the elderly, who have been susceptible to these attacks pre-generative AI in the past, and only thought that that might become more concerning. I no longer live in a city where my family lives, and so the only way to communicate with them is through a digital form now, and if we don’t have confidence in that interaction, I’m scared of the repercussions that has more broadly. And, you know, being at Microsoft Research, having worked on initiatives related to election integrity, was also starting to think through the impacts that this could have at a much wider scale. And so that’s kind of what prompted us to start thinking through how we can meet that challenge and try to make a contribution to mitigate that risk. 

HUIZINGA: Zoë, almost all research builds on existing foundations, so what body of work does your research draw from, and how does this paper add to the literature?

HITZIG: I’d say this research paper draws on a few different strands of literature. First, there has been a lot of social theorizing and philosophizing about what exactly constitutes privacy, for example, in the digital age. And in particular, there’s a theory of privacy that we find very compelling and we draw a lot from in the paper, which is a theory called contextual integrity, which was put forward by Helen Nissenbaum, a researcher at Cornell Tech. And what contextual integrity says is that rather than viewing privacy as a problem that’s fundamentally about control over one’s personal information or a problem about secrecy, contextual integrity says that an information flow is private when it respects the norms that have been laid down by the sender and the receiver. And so there’s a violation of privacy, according to Nissenbaum’s theory, when there’s a violation of contextual integrity. So we really take this idea from Nissenbaum and extend it to think about situations that, first of all, didn’t come up before because they’re unusual and generative AI poses new kinds of challenges. But second of all, we extend Nissenbaum’s theory into thinking not just about privacy but also authenticity. So what is authenticity? Well, in some sense, we say it’s a violation of a norm of truthfulness. What we really add to this theorizing on privacy is that we offer a perspective that shows that privacy questions and questions about authenticity or authentication can’t really be separated. And so on the theory side, we are extending the work of media scholars and internet scholars like Helen Nissenbaum but also like danah boyd and Nancy Baym, who are Microsoft Researchers, as well, to say, look, privacy and authenticity online can no longer be separated. We have to see them as two sides of the same coin. They’re both fundamentally about contextual confidence, the confidence we have in our ability to identify the context of a communication and to protect the context of that communication. So that’s sort of the theory side. And then, of course, our other big contribution is all the practical stuff that takes up the bulk of the paper. 

HUIZINGA: Right. Shrey, let’s talk about methodology for a minute. And this is a unique paper in terms of methodology. How would you describe your research approach for this work, and where does it fit on the spectrum of methodology for research? 

JAIN: Yeah, this paper is definitely a bit different from the conventional empirical research that might be done in the space. But it’s more of a policy or, I guess, framework paper where we try to provide both, as Zoë just commented on, the theory for contextual confidence but then also try to illustrate how we might apply contextual confidence as a framework to the existing challenges that generative AI presents. And so in order to make this framework and the theory that we present useful, we wanted to try to understand both what are the set of challenges that fall into these categories of identifying context and protecting context. So, specifically, how does generative AI threaten our ability to identify and protect? And trying to take a bird’s eye view in understanding those challenges. And then also kind of doing what might look similar to like a literature review but different in a way that we collect all of the different strategies that are typically talked about in the conversation but then in using contextual confidence as a framework realizing that new strategies that aren’t as well discussed in the conversation might be useful to meet these different challenges. And so from a methodology perspective, it’s almost like we’re applying the theory to uncover new … both new strategies that might be useful in this moment and then finding ways to give concrete examples of us applying that framework to existing technological questions that both people in the industry, as well as in policy, are thinking through when it comes to these questions about generative AI.

HUIZINGA: Zoë, for me, the most interesting part of research papers is that little part that comes after the phrase “and what we found was …” So, um, how would you describe what your takeaways were here, and how did you present them in the paper? 

HITZIG: That’s a great question. That’s also my favorite question to ask myself when I’ve completed a project. I think the biggest thing that I learned through writing this paper and collaborating with Shrey was really, for the first time, I forced myself to interrogate the foundations of effective communication and to understand what it is that we rely on when, you know, we pass a stranger on the street and look at them in a certain way and somehow know what it means. Or what we rely on to understand, you know, how our partner is feeling when they speak to us over coffee in the morning. I was really forced to step back and think about the foundations of effective communication. And in doing so, what we realized was that an ability to both identify and protect context is what allows us to communicate effectively. And in some sense, this very basic fact made me see how sort of shockingly robust our communication systems have been in the past and yet at the same time how fragile they could be in the face of this alarming new technology that has the power to fundamentally upset these two foundational processes of identifying and protecting context in communication. I would also say, on the question of what we found, you know, my first answer was about these sort of fundamental insights that had never occurred to me before about what makes communication effective and how it’s threatened. But also, I was able to understand and sort of make sense of so many of the strategies and tools that are in discussion today. And, for example, I was able to see, in a totally new light, the importance of, for example, something as simple as having some form of digital identification or the simplicity of, you know, what makes a good password and what can we do to strengthen passwords in the future. So there was this strong theoretical insight, but also that theoretical insight was enormously powerful in helping us organize the very concrete discussions around particular tools and technologies. 

HUIZINGA: Hmm. It’s a beautiful segue into the question I have for Shrey, which is talking about the real-world impact of this work. You know, coming down to the practical side from the theoretical, who does this work help and how? 

JAIN: Yeah, I want to also add a disclaimer in that, in this podcast, we kind of present generative AI almost as this like villain to communication. [LAUGHTER] I think that there’s also a possibility that generative AI improves communication, and I want to make sure that we acknowledge the optimism that we do see here. I think part of the real-world impact is that we want to mitigate the cost that generative AI brings to communications without hurting the utility at the same time. When applying contextual confidence in contrast to, say, views of traditional privacy, which may view privacy in terms of secrecy or information integrity, we hopefully will find a way in ensuring that the utility of these models is not significantly lost. And so in terms of the real-world impact, I think when it comes to both policies that are being set right now, norms around how we interact with these models, or any startup founder or person who’s deploying these tools, when they think about the reviews that they’re doing from a privacy point of view or a compliance point of view, we hope that contextual confidence can guide, as a framework, a way that protects users of these tools along with not hindering model capabilities in that form. 

HUIZINGA: Zoë, if there was one takeaway that you want our listeners to get from this work on contextual confidence, what would it be?

HITZIG: What I hope that readers will take away is, on the one hand, the key conceptual insight of the paper, which is that in today’s digital communication and in the face of generative AI, privacy questions and authenticity questions cannot be separated. And in addition, I hope that we’ve communicated the full force of that insight and shown how this framework can be useful in evaluating the deployment of new tools and new technologies. 

HUIZINGA: Finally, Shrey, what outstanding questions or challenges remain here, and how do you hope to help answer them? 

JAIN: In the paper, we have presented a theoretical understanding of contextual confidence and present various different strategies that might be able to help meet the challenges that generative AI presents to our ability to both identify and protect context, but we don’t know how those strategies themselves may or may not undermine the goals that we’re presenting because we haven’t done empirical research to know how a given strategy might work across different types of people. In fact, the strategies could undermine the initial goals that we intend. A verification stamp for some might enhance credibility, but for those who may not trust the institution verifying, it may actually reduce credibility. And I think there’s a lot of empirical research both on the tool development, usability, and then back to guiding the theoretical framework that we present that we want to continue to refine and work on as this framework hopefully becomes more widely used. 

HUIZINGA: Well, Shrey Jain, Zoë Hitzig, thank you for joining us today, and to our listeners, thanks for tuning in.  

[MUSIC PLAYS] 

If you’re interested in learning more about contextual confidence and generative AI, you can find a link to the preprint of this paper at aka.ms/abstracts, or you can read it on arXiv. See you next time on Abstracts.

[MUSIC FADES]

The post Abstracts: November 20, 2023 appeared first on Microsoft Research.

What’s Your Story: Desney Tan

In this new Microsoft Research Podcast series What’s Your Story, Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. A systems expert whose 10 years with Microsoft spans research and product, Gehrke talks to members of the company’s research community about what motivates their work and how they got where they are today.

Across his time at Microsoft, Desney Tan, Managing Director of Microsoft Research Redmond, has had the experience of shepherding research ideas into products multiple times, and much like the trajectory of research, his life journey has been far from linear. In this episode, Tan shares how he moved to the United States from Singapore as a teenager, how his self-described “brashness” as a Microsoft intern helped shift the course of his career, and how human impact has been a guiding force in his work.

photos of Desney Tan throughout his life

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

DESNEY TAN: Early in the career, I always looked at successful people and it always felt like they had a goal, and it was a very nice straight line to get there, and they did all the right things, and I don’t know anyone today that I deem to be successful that had a straight-line path and did all the right things.

[TEASER ENDS]

JOHANNES GEHRKE: Microsoft Research works at the cutting edge. But how much do we know about the people behind the science and technology that we create? This is What’s Your Story, and I’m Johannes Gehrke. In my 10 years with Microsoft, across product and research, I’ve been continuously excited and inspired by the people I work with, and I’m curious about how they became the talented and passionate people they are today. So I sat down with some of them. Now, I’m sharing their stories with you. In this podcast series, you’ll hear from them about how they grew up, the critical choices that shaped their lives, and their advice to others looking to carve a similar path.

[MUSIC ENDS]

In this episode, I’m talking with Desney Tan, a longtime Microsoft executive whose experience with the company spans computational neuroscience, human-computer interaction, and health and the life sciences. His research contributions have impacted a wide range of Microsoft products. Desney was previously Vice President and Managing Director of Microsoft Health Futures and is now Managing Director of Microsoft Research Redmond.

Much like the trajectory of research, Desney’s life journey has been far from linear. He left Singapore to attend school in the United States as a teenager, then worked in autonomous navigation for NASA and in VR for Disney before landing here at Microsoft. Here’s my conversation with Desney, beginning with his childhood.


DESNEY TAN: Born and raised in Singapore. Dad was an architect. Mom did everything, um, to run the family. When I turned 13, Mom and Dad came to me and they said, “Hey, would you like to try something new?” I said sure. You know, I had no idea what they, they were thinking. Two weeks later, they sent me to the US to study. Um, looking back, sometimes I flippantly claim I was just eating too much at home and so they had to send me away. [LAUGHTER] But actually, it was, you know, I think it was prescient on their part. They sort of looked at my path. They looked at the education system. They looked at the way I learned and the way I created and the way I, I acted, and they somewhat realized, I think, very early on that the US was a great … would, would be a great place for me to sort of flourish and, and sort of experiment and explore and, and grow.

GEHRKE: And so how did it work? You just went by yourself?

TAN: So I had an aunt and an uncle in Louisiana. Spent a couple of years in high school there. Um, sort of … fun, fun side story. They looked at … the high school looked at my math curriculum in Singapore, and they said, “Oh, he’s at least a year ahead.” So they skipped me a year ahead. And then through some weird miscalculations on their part, they actually ended up skipping me nearly two years ahead.

GEHRKE: Oh, wow.

TAN: And by the time we realized, I had already integrated into school, the courses were just fine, and so I ended up skipping a lot of years.

GEHRKE: So you ended up graduating then high school what …

TAN: Pretty early. I was 15.

GEHRKE: 15 …

TAN: Graduated from high school. Got to college. Had no idea what I wanted to do. What 15-year-old does? Um, ended up in liberal arts college, so University of Notre Dame. So, so I don’t know how Mom let me do this, but, you know, I got all my acceptance letters together. I said I don’t know anything about college. I don’t know where I want to go. I don’t know what I want to do. I’m going to toss all the letters up in the air, and the one that lands on top is the school I’m going to.

GEHRKE: And that’s what you did?

TAN: Yeah, that’s exactly what I did. Um, divine intervention, let’s call it. Notre Dame landed on top. You know, switched majors a bunch of times. I started off in aerospace, did chemical engineering, civil engineering. I was on the steps of becoming a priest until they sent me away. They said, “Hey, if it’s not a mission and a calling, go away and come back later.” And ended up with a computer engineering degree. You know, I had great mentors, you know, who looked out for me. I had a couple of guardian angels out there, you know, guided me along, and that, you know, that was just a wonderful breadth of education. Went back to the military for a couple of years. Uh, served there for a couple of years. Did a bunch of growing up.

GEHRKE: That, that’s quite a change, right, from being like in college and then going back to the military.

TAN: Yeah, yeah, it was a mandatory service in Singapore, and so I went back. Had a ton of fun. Learned a bunch of stuff about the world, about myself. I claim the military is one of the few organizations in the world that takes an 18-year-old and teaches them leadership, um, and teaches them about themselves and teaches them about how to push themselves and where the boundaries are. And so fairly accidentally, I, I got to benefit from all of that. At the end of that, I realized my computer engineering degree was, you know … I realized two things. One, my computer engineering degree was a little outdated by the time I got out of the military, and two, that I didn’t love being told what to do. [LAUGHS]

GEHRKE: [LAUGHS] OK.

TAN: So I came back. Uh, did grad school. I was at Carnegie Mellon. Ended up getting hooked up with a wonderful professor, Randy Pausch.

GEHRKE: “The Last Lecture,” right?

TAN: Who gave “The Last Lecture” in his last days. You know, learned a ton from him not only about academics and scholarship, but also about life and, um, and leadership.

GEHRKE: And so he was at the intersection of graphics and HCI, right, if I remember correctly?

TAN: That’s correct, yeah.

GEHRKE: So what is your PhD in?

TAN: My PhD was actually looking at, um, distributed displays in virtual reality. So how, how the human brain absorbs information and uses the world around us to be able to, um, interact with digital data and analog data.

GEHRKE: Early on in a really important field already.

TAN: Yeah, no, it was great. Spent a couple of years with NASA in the Jet Propulsion Lab doing autonomous navigation. This was the early days of, um, you know, AI and, and planning.

GEHRKE: So those aerospace engineering classes, were they actually useful?

TAN: They, you know, all the classes I took ended up coming back to be useful in a number of ways. And actually, um, you know, the diversity of viewpoints and the diversity of perspectives is something that sat very deeply in me. So anyways, you know, spent some time at NASA. Um, spent some time at Disney with the Imagineers building virtual reality theme parks. This was the late ’90s, early 2000s. So Disney at the time had all the destination theme parks: Disneyland, Disney World, places you would fly for a week and, and spend a week at. Their goal was really to build a theme park in a box that they could drop down into the urban centers, and the only way to get a theme park into a building was digital experiences. And so this was the very early days of VR. We were using, you know, million-dollar military-grade headsets. They were, you know, 18, 19 pounds.

GEHRKE: Wow.

TAN: Disney was one of the companies—and, you know, it’s sat with me for a very long time—that designs experiences for every single person on earth, right. So these headsets had to work on your 2-year-old. They had to work on your 102-year-old. They had to work on, you know, a person who spoke English, who read, who didn’t read, who didn’t speak English. You know, tall, short, large, small, all of it. And they did a wonderful job finding the core of what it means to be human and designing compelling experiences for all of us, um, and that was a ton of fun. We ended up deploying these facilities called DisneyQuest. There was one in Chicago; one in Orlando. They just closed them down a couple of years ago because actually all the VR rights have now migrated into the theme parks themselves.

GEHRKE: And it was actually a VR experience? You would go and sit …

TAN: It was a VR experience. They dropped them down. They had basically buildings. There were, you know, floors full of classic and new-age arcade games. And then there were VR experiences that you could run around in and, um, interact with.

GEHRKE: Interesting. I’ve never, I mean, I lived in Madison for four years, but I’ve never heard of that Quest experience. It seems to be a fun way to experience Disney … by not going to any of the, the theme parks.

TAN: It was super fun. Um, yeah, we, I personally got to work on a couple of rides. There was Pirates of the Caribbean.

GEHRKE: Oh, wow.

TAN: So you put on … a family would put on headsets and kind of run around, shooting pirates and what have you. And then the Aladdin ride was I thought one of the better rides.

GEHRKE: Oh, wow, yeah …

TAN: Where you sit on a magic carpet as you can imagine.

GEHRKE: Oh yeah. That sounds fun.

TAN: It was perfectly scripted for it. Um, anyways, ended up at Microsoft largely because entertainment technology while a lot of fun and while I learned a ton was, uh, strangely unsatisfying, and there was something in me and, you know, that was seeking human impact at scale in a much deeper and much more direct way. And so I thought I’d be here for three or four years largely to learn about the tech industry and how, you know, large pieces of software were deployed before going off and doing the impact work. And I’ve now been here for nearly 20 years.

GEHRKE: And where do you start? Did you start out right away at Microsoft Research, or were you first in a product group?

TAN: My career here has been a cycle of starting in Microsoft Research, incubating, failing, trying again. Failing again. You know, at some point, screaming “Eureka!” [LAUGHTER] and then doing my tours of duty through the product groups, commercializing … productizing, commercializing, you know, seeing it to at least robustness and sustainability if not impact and then coming back and doing it again. Um, and the thing that’s kept me here for so long is every time I’ve completed one of those cycles and thought I was done here, um, the company or the world in some cases would throw, you know, a bigger, thornier, juicier thing in front of me, and Microsoft has always been extremely encouraging, um, and supportive of, you know, taking on those challenges and really innovating and opening up all new, whole new opportunities.

GEHRKE: I mean this whole cycle that you’re talking about, right, of sort of starting out small at MSR (Microsoft Research), you know, having sort of the seed of an, of an idea and then growing it to a bigger project and at some point in time transitioning, transitioning it into, into the product group and actually really making it a business. So tell me about … you said you have done this, you know, a few times and, you know, once you were even highly successful. I’d love to learn more about this because I think it’s so inspiring for everybody to learn more about this.

TAN: Yeah. No, it’s been magical. I have to say before going into any of these stories that none of these paths were architected. As, as you well know, they never are. So actually my, my first experience was as an intern here, and, you know, I was a sort of brash, perhaps rash, intern. I was working on virtual reality, and in the evenings, I would meet with folks around the company to learn more, and I met with a team that was building out multi-monitor functionality in Windows NT. Prior to Windows NT, Windows computers had one and only one monitor, and they started to build the functionality to build multiple. As the brash grad student, you know, I, I had different thoughts about how this should be implemented and, you know, couldn’t convince anyone of it. And so in the evenings, I ended up starting just to build it. At the end of the internship, in addition to all the stuff I was doing, I said, “Hey, by the way, I’ve built this thing. You know, take it or leave it. Here you go.” And it ended up being the thing that was implemented in NT for a variety of reasons. That really got me hooked. Prior to that, I had imagined myself an academic, going back and, you know, being a professor somewhere in academia. And as soon as I saw, you know, the thing I did and that, you know, Microsoft actually polished up and made good in the real world…

GEHRKE: And shipping in millions and millions of desktops, right?

TAN: That’s right. There was no getting away from that.

GEHRKE: OK, right.

TAN: When I first got here, MSR had actually hired me thinking I’d work on virtual reality. And I got here and I said, hey, VR … I’ve just done a ton of VR. VR is probably 15 or 20 years out from being democratized and consumerized. I’m going to do something for a couple of years, and then I’ll come back to this. Um, so I got into computational neuroscience, looking at, um, sensors that scanned or sensed the brain and trying to figure out mental state of people. I had the imagination that this would be useful both for direct interaction but also for understanding human behavior and human actions a little bit better. We won’t go into that work, but, um, what happened with the productization of that was I went … this was at the time when Bill Gates was actually pushing very hard on tablet PCs and the stylus and the pen as an interesting input modality. The realization we had was, hey, we’ve got spatial temporal signal coming off the brain we’re trying to make sense of; the tablet guys had spatial temporal signal coming off a pen they were trying to make sense of in handwriting recognition. And so we went over and we said, hey, what interesting technological assets do you have that we can steal and use on the brain. Turns out they were more convincing than us. And, and so they said, hey, actually you’re right. The problems do look similar. What do you have that you could bring over? And so if you look at the handwriting recognition system even that stands today, it’s a big mess of a neural network, um, largely because that came out of interpreting neural signal that got transferred into the handwriting recognizer.

GEHRKE: I see.

TAN: And so I ended up spending two, maybe 2 1/2 years, working not only on the core recognition engine itself but also the entire interface that ran around the tablet PC and, you know, the tablet input panel.

GEHRKE: But that’s sort of an interesting realization, right. You came because you thought you would land Technology X for Application Y, but actually you land it for a very different application.

TAN: That’s right. And, and each cycle has had a little bit of that surprise and that serendipity, which we’ve now built into the way we do research. And, um, you sort of head down a path because it moves you forward as quickly as possible. But you keep your eyes peeled for the serendipitous detours and the, the discovery that comes out of that. Um, and I think that’s what makes Microsoft Research as an organization, um, so compelling and, and so productive, right, as … we, we do run very fast, but we have the freedoms and, you know, the flexibility really to take these windy paths and to take these detours and, and to go flip over, you know, rocks, some of which end up being, you know, dead ends.

GEHRKE: Right.

TAN: Others of which end up being extremely productive.

GEHRKE: Right. And so if you think about, let’s say, a junior person in the lab, right. They’re sort of looking at you and your career and saying, “Wow, what steps should I take to, you know, become as successful as Desney?” What, what advice would you give them, right? Because it seems like you have always had sort of MSR as sort of your rock, right. But then you jumped over the river a few times, but then came back and jumped over again. Came back.

TAN: First off, I, I don’t know that Desney has been so successful so much as, you know, the people around Desney have been extremely successful and Desney’s gotten to ride the wave. But, yeah, no, I mean every, everyone’s … you know, as I look around the table and the board, you know, everyone has a slightly different journey, and everyone has slightly different work styles and mindsets and personalities and risk tolerance and what have you. Um, so the first thing really is, is not to try to fully emulate anyone else. I always claim we’re, we’re kind of like machine learning models, right. We, we should be taking input data, positive and negative, and building our models of ourselves and our models of the world and really operating within that context. I think having a North Star, whether it’s implicit or explicit, has been extremely useful for myself and the people around me.

GEHRKE: By North Star, you mean like a philosophical North Star or technical North Star or North Star in what you want to be? What, what do you mean?

TAN: Yes, yes, yes. All of it.

GEHRKE: So tell me more about your personal North Star.

TAN: For, for, for us … for myself, it’s really been about human impact, right. Everything we do is centered on human impact. We do research because it’s part … it’s, it’s one of the steps towards achieving human impact. We productize because it’s one of the steps towards human impact. Our jobs are not ever done until we hit the point of human impact, and then they’re not quite done because there’s always more to be had. Um, so I think having that, you know, perhaps a value system, um, at least, you know, sort of grounds you really nicely and, and creates, I think, or can create a courage and a bravery to pursue, which I think is important. You know, different people do this differently, but I have been very lucky in my career to be surrounded by people that have been way, way, way better than myself, um, and, and extremely generous of their passions and their skills and their expertise and their time. You know, ask it and just about any successful person by whatever definition and I think they’ll tell you the same thing, that it’s the people around. And then being tolerant, maybe even seeking of, this windy path. You know, when I was early in the career, I always looked at successful people, or people I deemed to be successful, and it always felt like they had a goal, and it was a very nice straight line to get there, and they did all the right things and, and took all the right steps, and, um, and I don’t know anyone today that I deem to be successful that had a straight-line path and did all the right things.

GEHRKE: Yeah, and it’s often these setbacks in, you know, one’s career that actually give you often some of the best learnings because either of some things that you’ve sort of done structurally wrong or some things that, you know, you really need more experience and, and, you know, that setback gave you that experience. So, so one other question around this is also just around change, right. Because especially right now, we’re living in this time where maybe the rate of change especially in AI is kind of unprecedented. I mean, benchmarks are falling in like a quarter of the time than they would have thought to be lasting. You know, we all have played with ChatGPT. Just extrapolate that out a few more months, if not years, right. OpenAI is here talking about AGI. So how do you think about change for yourself and evolution and learning, and do you have any, any routines? How, how do you keep up with everything that’s going on?

TAN: Yeah, it’s, uh … good question. I guess the overarching philosophy, the approach that I’ve taken with my career, is that everything’s constantly in change. You know, the rate of change may vary, and the type of change and the, the mode of change might vary, but everything’s constantly changing, and so our jobs at any given point are to understand the context in the world, in the organization, with the people around you, and really be doing the best that you can at any given moment. And as that context changes, you kind of have to dynamically morph with it. I subscribe pretty fully to the Lean Startup model. So, you know, formulate hypotheses … and this is the research process really, right. Formulate hypotheses, test them as quickly as you can, learn from that, and then do it again, and rinse and repeat. And then … and, you know, you could sort of plot your path and steer your path through based on that. Um, and so we operate very much on that. As, as the world changes, we change. As, you know, the org changes, we change. And there’s a certain robustness that comes along with that. It’s not all roses, and obviously change is and uncertainty is, is a difficult context to operate in.

GEHRKE: And super interesting because it also speaks to some of the things that one should, um, sort of look out for when doing research, right. If you’re saying, well, I have these hypotheses and I want to quickly test them, right, if I’m in a field or if I work with data that I, you know, cannot really use, where the testing of an hypothesis will take months if not years to bring out, this might not be the best research direction. So how should I think about sort of research, the choice of research problems …

TAN: It’s a good question, yeah.

GEHRKE: … sort of with this, with this change in mind, right?

TAN: Yeah, yeah. Um, I don’t know. I, I’m, I guess … again, I’m brash on this. There are, there are very few problems and spaces that can’t be navigated, um, and so things that seem impossible at first glance are often navigable, you know, with a little bit or maybe sometimes a lot of creativity. Um, you know, if our jobs are to take Microsoft and the rest of the world to places that Microsoft and the rest of the world might not get itself to—hopefully positive places—then we’re going to have to do things in a way that is probably unnatural for Microsoft and the rest of the world, um, to get there. And the company and the organization, MSR, has been extremely supportive of that level of creativity.

GEHRKE: Can you give an example of that for …?

TAN: We had Cortana, which is our speech recognition and conversational engine. We didn’t really have a platform to deploy that on. At the same time, we saw a bunch of physicians, clinicians, struggling with burnout because they were seeing patients for less than half the time. They were spending more of their time sitting in front of the computer, documenting stuff, than they were seeing patients and treating patients. We said, hey, what if you put the two together? What if you sat in the room, listened to the doctor and the patient, and started to automatically generate the documentation? And in fact, if you did that, you could structure the data, which leads for better downstream analytics. Um, and if you did that, you could start to put machine learning and AI and smarts into the system, as well. That project, which was called EmpowerMD, led eventually—after a bunch of missteps and a bunch of learnings and a bunch of creativity—to a very deep partnership with Nuance, um, and creation of Dragon Ambient eXperience and the eventual acquisition thereof of that company. And, um, it’s just a wonderful product line. It’s, you know, kind of a neat way to think about data and intelligence and human augmentation and integration into otherwise messy, noisy human processes. Um, but yeah, you know, I think with enough creativity, um, you know, we’ve, we’ve bumped into very, very few brick walls.

GEHRKE: And what I love about the story is that it’s not about a specific technology choice, but it’s more about a really important problem, right.

TAN: That’s right. Yeah. If your problem is right and if your conviction is right about the value of the solution …

GEHRKE: Yeah.

TAN: …you build teams around it. You build processes around it. You’re creative in the way you execute. And, um, I’d say more times than not, we end up getting there.

GEHRKE: Yeah, well, I love that insight because it’s often much more valuable to solve an important problem than to land some deep technology on a problem that very few people care about …

TAN: I think that’s right.

GEHRKE: …and it seems like that’s what you have done here.

TAN: Yeah.

GEHRKE: Well, it was really great and inspiring to hear from you, Desney. Thanks so much for the conversation.

[OUTRO MUSIC]

TAN: Yeah, thanks for having me, Johannes.

GEHRKE: To learn more about Desney’s work or to see photos of Desney during his winding journey to Microsoft, visit aka.ms/ResearcherStories.

The post What’s Your Story: Desney Tan appeared first on Microsoft Research.

Research Focus: Week of November 8, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations

Generating both plausible and accurate full body avatar motion is essential for creating high quality immersive experiences in mixed reality scenarios. Head-mounted devices (HMDs) typically only provide a few input signals, such as head and hands 6-DoF—or the six degrees of freedom of movement by a rigid body in a three-dimensional space. Recent approaches have achieved impressive performance in generating full body motion given only head and hands signal. However, all known existing approaches rely on full hand visibility. While this is the case when using motion controllers, for example, a considerable proportion of mixed reality experiences do not involve motion controllers and instead rely on egocentric hand tracking. This introduces the challenge of partial hand visibility, owing to the restricted field of view of the HMD.

In a recent paper: HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations, researchers from Microsoft propose HMD-NeMo, the first unified approach that addresses plausible and accurate full body motion generation even when the hands may be only partially visible. HMD-NeMo is a lightweight neural network that predicts full body motion in an online and real-time fashion. At the heart of HMD-NeMo is a spatio-temporal encoder with novel temporally adaptable mask tokens that encourage plausible motion in the absence of hand observations. The researchers perform extensive analysis of the impact of different components in HMD-NeMo and, through their evaluation, introduce a new state-of-the-art on AMASS, a large database of human motion unifying different optical marker-based motion capture datasets.
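As a reading aid, the sketch below illustrates the mask-token idea in generic PyTorch. The layer sizes, the GRU used as a stand-in temporal encoder, and the input dimensions are placeholder assumptions; HMD-NeMo's actual spatio-temporal encoder and decoding heads differ.

```python
import torch
import torch.nn as nn

class MaskedHandEncoder(nn.Module):
    """Toy encoder: hand features are replaced by a learned mask token when the hand is not visible."""

    def __init__(self, in_dim: int = 9, d_model: int = 128):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model)               # per-frame signal embedding
        self.mask_token = nn.Parameter(torch.zeros(d_model))  # learned placeholder for missing hands
        self.temporal = nn.GRU(3 * d_model, d_model, batch_first=True)

    def forward(self, head, left, right, left_visible, right_visible):
        # head/left/right: (B, T, in_dim); *_visible: (B, T) boolean visibility flags
        h = self.embed(head)
        l = torch.where(left_visible.unsqueeze(-1), self.embed(left), self.mask_token)
        r = torch.where(right_visible.unsqueeze(-1), self.embed(right), self.mask_token)
        features, _ = self.temporal(torch.cat([h, l, r], dim=-1))
        return features  # a downstream head would decode full-body joint motion from these features
```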


NEW ARTICLE

Will Code Remain a Relevant User Interface for End-User Programming with Generative AI Models?

The research field of end-user programming has largely been concerned with helping non-experts learn to code well enough to achieve their own tasks. Generative AI stands to obviate this entirely by allowing users to generate code from naturalistic language prompts.

In a recent essay: Will Code Remain a Relevant User Interface for End-User Programming with Generative AI Models?, researchers from Microsoft explore the relevance of “traditional” programming languages for non-expert end-user programmers in a world with generative AI. They posit the “generative shift hypothesis”: that generative AI will create qualitative and quantitative expansions in the traditional scope of end-user programming. They outline some reasons that traditional programming languages may still be relevant and useful for end-user programmers, and speculate whether each of these reasons might endure or disappear with further improvements and innovations in generative AI. And finally, they articulate a set of implications for end-user programming research, including the possibility of needing to revisit many well-established core concepts, such as Ko’s learning barriers and Blackwell’s attention investment model.


NEW RESEARCH

LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup

On-device deep neural network (DNN) inference, widely used in mobile devices such as smartphones and smartwatches, offers unparalleled intelligent services, but also stresses the limited hardware resources on those devices.

In a recent paper: LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup, researchers at Microsoft propose a system that reduces latency, memory, disk, and power consumption for more efficient DNN inference. LUT-NN learns the typical features for each operator, known as centroids, and precomputes the results for these centroids to store in lookup tables. During inference, the results for the centroids closest to the inputs can be read directly from the table and used as the approximate outputs, avoiding most of the computation.

LUT-NN integrates two major novel techniques: (1) differentiable centroid learning through backpropagation, which adapts three levels of approximation to minimize the accuracy impact of using centroids; (2) table lookup inference execution, which comprehensively considers different levels of parallelism, memory access reduction, and dedicated hardware units for optimal performance.
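
As a rough illustration of table-lookup inference, the sketch below replaces a linear layer's multiply-accumulate work with a nearest-centroid search and a precomputed lookup table. The centroids here are fixed at random rather than learned through backpropagation, and the sub-vector partitioning is simplified, so this is only a toy approximation of the approach described above.

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out, n_sub, n_centroids = 64, 32, 8, 16
    sub_dim = d_in // n_sub

    W = rng.standard_normal((d_in, d_out))
    # Illustrative centroids per sub-space; LUT-NN learns these with backpropagation.
    centroids = rng.standard_normal((n_sub, n_centroids, sub_dim))

    # Precompute the lookup table: partial dot products of every centroid
    # with the corresponding slice of the weight matrix.
    table = np.einsum("skd,sdo->sko",
                      centroids,
                      W.reshape(n_sub, sub_dim, d_out))  # (n_sub, n_centroids, d_out)

    def lut_linear(x: np.ndarray) -> np.ndarray:
        """Approximate x @ W by looking up precomputed centroid results."""
        x_sub = x.reshape(n_sub, sub_dim)
        # Index of the closest centroid in each sub-space.
        idx = np.argmin(
            np.linalg.norm(centroids - x_sub[:, None, :], axis=-1), axis=-1)
        # Sum the precomputed partial results instead of multiplying.
        return table[np.arange(n_sub), idx].sum(axis=0)

    x = rng.standard_normal(d_in)
    err = np.linalg.norm(lut_linear(x) - x @ W)  # approximation error that training would minimize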


Toward developing faster algorithms for minimizing submodular functions

This research paper was presented at the 64th IEEE Symposium on Foundations of Computer Science (FOCS) 2023 (opens in new tab), a premier forum for the latest research in theoretical computer science.

FOCS 2023 paper: Toward developing faster algorithms for minimizing submodular functions

Submodular functions are versatile mathematical tools, finding diverse applications in real-world scenarios and guiding solutions across complex domains. From dissecting the intricate networks of graphs to deciphering the complexities of economic landscapes through utility functions, and even navigating the enigmatic world of random variables via entropy functions, they offer valuable insights into challenging problems. Their wide-ranging applicability has made them pivotal tools for modeling and optimization in various theoretical computer science domains, including operations research and game theory. In recent years, submodular functions have gained prominence in solving optimization problems within machine learning (ML) applications. These tasks encompass vital areas such as feature selection and clustering, as illustrated in Figure 1. Additionally, submodular functions are instrumental in applications like sensor placement and graphical models. For further exploration, comprehensive resources are available in Bilmes’ insightful survey (opens in new tab) and Bach’s standard textbook (opens in new tab) on this subject.

Figure 1. Application of submodular function optimization to feature selection, on the left, and clustering on the right.

Algorithm design for submodular function minimization

In a joint paper with researchers from Stanford University, “Sparse Submodular Function Minimization (opens in new tab),” presented at FOCS 2023 (opens in new tab), we investigate the problem of minimizing a submodular function in the standard model. Here, we assume that the submodular function can be accessed through an evaluation oracle that returns the value \( f(S) \) in response to a query with a set \( S \). This is the most classical and well-studied model for algorithm design for minimizing submodular functions.

Before we discuss our study, it’s important to bear in mind that a submodular function \( f \) is defined on subsets of a finite set of elements \( V \) and satisfies a diminishing marginal returns property. That is, for any two subsets \( S \subseteq T \) and any element \( e \in V \setminus T \), the marginal value of \( e \) when added to the smaller set, \( f(S \cup \{e\}) - f(S) \), is at least the marginal value of \( e \) when added to the bigger set, \( f(T \cup \{e\}) - f(T) \).
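
For a quick, concrete check of this property, the snippet below evaluates a coverage function, a textbook example of a submodular function, and verifies the diminishing-returns inequality on a pair of nested sets. It simply illustrates the definition and is not part of the paper's algorithms.

    # Coverage function: f(S) = size of the union of the sets indexed by S.
    sets = {
        "a": {1, 2, 3},
        "b": {3, 4},
        "c": {4, 5, 6},
        "d": {1, 6},
    }

    def f(S):
        covered = set()
        for name in S:
            covered |= sets[name]
        return len(covered)

    S = {"a"}        # smaller set
    T = {"a", "b"}   # bigger set, S ⊆ T
    e = "c"          # element not in T

    gain_small = f(S | {e}) - f(S)  # marginal value of e added to the smaller set (3 here)
    gain_big = f(T | {e}) - f(T)    # marginal value of e added to the bigger set (2 here)
    assert gain_small >= gain_big   # diminishing marginal returns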

In the 1980s, foundational work (opens in new tab) revealed that submodular functions could be minimized in polynomial time, marking a significant breakthrough. Since then, researchers have made substantial progress in the quest for faster algorithms for submodular function minimization (SFM). Despite these efforts, fundamental questions persist, such as determining the minimum number of queries required to minimize any given submodular function—a concept referred to as the problem’s query complexity.

Currently, the most advanced algorithm needs to make \( \widetilde{O}(n^2) \) queries for any given submodular function, while the best lower bound is only \( \widetilde{\Omega}(n) \), where \( n \) is the size of the ground set on which the submodular function is defined. This disparity results in a substantial gap, leaving an \( n \)-fold difference between the existing upper and lower bounds.

Given this considerable difference, a natural question arises: What additional structural assumptions could potentially pave the way for faster algorithms in submodular function minimization (SFM)? One prevalent assumption is sparsity, which posits that the size of the set minimizing the submodular function is small. This holds particular relevance in diverse applications, including signal processing, feature selection, and compressed sensing. In these scenarios, solutions are expected to exhibit sparse non-zero entries, making it important to understand how algorithmic complexity depends on sparsity, as it provides insights into the intricate combinatorial and geometric structures of the problems.

Interestingly, existing algorithmic techniques developed over the past four decades for SFM do not yield improved runtimes even when the solution is sparse. Therefore, it is imperative to develop innovative techniques that can drive advancements in sparse SFM and bridge the existing gap between upper and lower bounds.

Parallel algorithms for submodular function minimization

Exploring beyond SFM’s query complexity, recent research has shed light on the importance of sparse SFM, particularly in understanding the inherent adaptivity of parallel algorithms (known as parallel complexity) designed to solve the problem. Research has shown that any parallel algorithm for SFM requires a minimum adaptivity that is a polynomial in the size of the ground set.

Our results improve both parallel and sequential algorithms for SFM. For example, consider a scenario where the minimizer of the given submodular function is \( \widetilde{O}(1) \)-sparse. In this context, our parallel algorithm runs in a nearly constant number of rounds, while our sequential algorithm makes a nearly linear number of queries. This achievement stands in stark contrast with the previous best parallel upper bound of \( \widetilde{O}(n) \) and the best query complexity upper bound of \( \widetilde{O}(n^2) \).

Fast first-order methods for exact submodular function minimization

Current fast algorithms for SFM rely on cutting-plane methods, a standard class of convex optimization techniques applied to the Lovász extension—a natural continuous extension of the given submodular function. However, restricting the optimization domain to sparse solutions doesn’t significantly expedite cutting-plane methods beyond a logarithmic factor. To address this, we shifted our approach and employed first-order methods, including stochastic mirror descent, to minimize the Lovász extension. These methods, non-Euclidean generalizations of stochastic gradient descent, are more attuned to problem geometry. Unlike cutting-plane methods, whose runtime depends only polylogarithmically on the additive error to the optimal solution, first-order methods converge at a rate that is polynomial in the inverse of that error.

This rate of convergence indicates that first-order methods are better suited for approximate submodular function minimization, while our goal is to solve it exactly. Using the sparsity assumption, we developed a new algorithmic framework for SFM based on a new concept of duality. We used this framework to demonstrate how first-order methods, with substantially reduced accuracy requirements, can be applied to solve SFM exactly.
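
To give a flavor of how first-order methods enter the picture, the sketch below runs plain projected subgradient descent on the Lovász extension of a small submodular function, using the classical fact that sorting the coordinates of a point yields a subgradient from the marginal gains along the sorted chain. It illustrates the standard first-order approach discussed above, not the new duality-based framework introduced in the paper, and the example function, step size, and iteration count are arbitrary.

    import numpy as np

    def lovasz_subgradient(f, x):
        """A subgradient of the Lovász extension of f at a point x in [0,1]^n."""
        order = np.argsort(-x)     # sort coordinates in decreasing order
        g = np.zeros_like(x)
        S, prev = [], f([])
        for i in order:
            S.append(int(i))
            cur = f(S)
            g[i] = cur - prev      # marginal gain along the sorted chain
            prev = cur
        return g

    def minimize_sfm(f, n, steps=500, lr=0.05):
        """Toy projected subgradient descent; rounds the fractional point by thresholding."""
        x = np.full(n, 0.5)
        for _ in range(steps):
            g = lovasz_subgradient(f, x)
            x = np.clip(x - lr * g, 0.0, 1.0)  # project back onto the box [0,1]^n
        return [i for i in range(n) if x[i] > 0.5]

    # Example: a modular (hence submodular) function minimized by the set {2}.
    def f(S):
        S = set(S)
        return len(S) - 2 * (2 in S)

    print(minimize_sfm(f, n=4))  # expected output: [2]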

Toward faster algorithms for SFM and its applications

These techniques not only promise advancements for sparse SFM but also provide a foundation for tackling other fundamental problems in SFM theory. Our algorithms for sparse SFM serve as valuable starting points for designing improved algorithms for related problems. They offer potential insights into developing polynomial-time algorithms for SFM with lower query and parallel complexity, opening avenues for future research.

Traditionally, research on submodular function minimization has focused on the global properties of the problem over the past four decades. Sparse SFM, in contrast, enables us to explore local and more refined structures of submodular functions. Our work introduces new algorithmic tools that better use these structural properties, a vital aspect for applications in ML and operations research, because these areas often have special structures. Beyond advancing sparse SFM, our paradigm paves the way for the development of enhanced algorithms for SFM and its diverse applications.


Teachers in India help Microsoft Research design AI tool for creating great classroom content


Teachers are the backbone of any educational system. They are not just educators; they are indispensable navigators, mentors, and leaders. Teachers around the world face many challenges, which vary from country to country or even within a city or town. But some challenges are universal, including time management, classroom organization, and creating effective lesson plans.

Advances in AI present new opportunities to enhance teachers’ abilities and empower students to learn more effectively. That’s the goal of a new project from Microsoft Research, which uses generative AI to help teachers quickly develop personalized learning experiences, design assignments, create hands-on activities, and more, while giving them back hours of time that they spend on daily planning today.

Shiksha copilot is a research project and an interdisciplinary collaboration between Microsoft Research India and teams across Microsoft. Shiksha (Sanskrit: शिक्षा, IAST and ISO: śikṣā) is a Sanskrit word meaning “instruction, lesson, learning, study of skill”. The project aims to improve learning outcomes and empower teachers to create comprehensive, age-appropriate lesson plans combining the best available online resources, including textbooks, videos, classroom activities, and student assessment tools. To help curate these resources, the project team built a copilot—an AI-powered digital assistant—centered around teachers’ specific needs, which were identified right at the start through multiple interviews and workshops.

Working with Sikshana Foundation (opens in new tab), a local non-governmental organization focused on improving public education, the researchers are piloting this program at several public schools in and around Bengaluru, India, to build and improve the underlying tools. This post gives an overview of the project, including interviews with three teachers who have used Shiksha copilot in their own classrooms.

A road map for teachers

A lesson plan is like a road map charting what students need to learn and how to efficiently cover the material during class time. It includes three key components:​

  • Objectives for student learning, based on grade level and subject​  
  • Teaching and learning tactics, including tutorials and activities to help students understand the topic
  • Strategies to assess student understanding, both in class and through homework 

Parimala H V teaches science in grades 6-8 at Government Higher Primary School, Santhe Beedhi in Bengaluru. She teaches in the local language, Kannada, and in English. For each class she teaches, she spends an hour or more each day scanning textbooks and printed materials to put together an effective lesson plan. She also searches the internet for ideas, but sifting through the growing body of online content could take just as long. Often she would work till midnight planning the next day’s activities, which left her feeling tired and stressed.

“Lesson planning can be a struggle, but it’s very important,” Parimala said. “If the planning goes well, everything goes well.”

With Shiksha copilot, Parimala was able to develop a complete lesson plan in 60 to 90 seconds, instead of 60 to 90 minutes. The simple interface asks basic questions about the curriculum, language of delivery, grade level, and subject. It then compiles engaging learning materials to achieve the teacher’s classroom objectives. Parimala finds better ideas and hands-on activities using Shiksha copilot than through other online tools. She feels well rested and better prepared for her day, which also makes her happier in the classroom. And with the time she saves, she can focus more on coaching her students and improving her teaching practices.

Ms. Parimala standing in front of a school

“I was thrilled to have the opportunity to use Shiksha copilot,” Parimala said. “It could be very useful for new teachers just learning their profession. I think it could revolutionize the way teachers teach.” 

Parimala H.V., Teacher, Government Higher Primary School, Santhee Beedhi

At Parimala’s school and others in the Bengaluru area, teachers face some significant challenges. Classrooms can have up to 70 students of varying abilities. Teachers often need to prepare lessons and give instruction in both English and Kannada. As the Covid pandemic brought about remote learning on a large scale, technology began to rapidly change how teachers and students interact. Most students now have computers or smartphones, expanding teachers’ options. But it also makes it harder to keep students focused on a traditional classroom blackboard.

“These children are addicted to their mobile phones and social media. If I use the ‘chalk and talk’ method in class, they may get bored,” said Gireesh K S, who relies heavily on his blackboard to teach math and physics at Government High School, Jalige. Gireesh has used web search tools to find digital resources like interactive PowerPoint slides that will hold his students’ attention longer. With Shiksha copilot, he can zero in more quickly on videos or classroom activities that help him connect better with all 40+ students in his class.

“Here lies the teacher’s job. The teacher has to select whichever activity, whichever video, or whichever questions to use,” Gireesh said. “There are so many questions and videos (to choose from), but as a teacher for my class, I know my students. So, I have to select the suitable ones.”

Other learning platforms were less flexible and less dynamic, returning static content options that were not always useful for a diverse group of learners. Shiksha copilot, on the other hand, does a much better job of customizing and adapting its recommendations based on teacher input, Gireesh said.

“Shiksha copilot is very easy to use when compared to other AI we have tried, because it is mapped with our own syllabus and our own curriculum.”

Gireesh K S, Teacher, Government High School, Jalige

Mr. Gireesh K S posing for the camera

Behind the technology

Designing and building Shiksha copilot requires several technological innovations. Educational content is largely multimodal, including text, images, tables, videos, charts, and interactive materials, so developing engaging learning experiences calls for generative AI models with unified multimodal capabilities. These experiences are also most impactful when delivered in native languages, which requires improving the multilingual capabilities of generative AI models.

Shiksha copilot includes a range of powerful features that address those challenges and enhance the educational experience. It’s grounded in specific curricula and learning objectives, to ensure that all generated content aligns with desired educational outcomes, according to Akshay Nambi (opens in new tab), principal researcher at Microsoft Research. “This grounding is enabled by ingesting relevant data with the help of state-of-the-art optical character recognition (OCR), computer vision (CV) and generative AI models. It was also important to use natural language and support voice-based interactions while including options for English and Kannada speakers,” Nambi said. 

Shiksha copilot supports connectivity to both public and private resource content, enabling educators to tap into a vast array of materials and tailor them to their unique teaching requirements. Shiksha copilot can be accessed through different modalities, such as WhatsApp, Telegram, and web applications, enabling seamless integration with teachers’ current workflows.

To help create content more quickly and efficiently, the system leverages semantic caching with LLMs. Storing and reusing previously processed educational content reduces the computational resources required to deliver a scalable and affordable copilot experience. Throughout development, the project team followed established protocols regarding safety, reliability, and trustworthiness.

“Extensive prompt designing, testing and rigorous responsible AI procedures, including content filtering and moderation, red team assessments and jailbreaking simulations, have been deployed to maximize safety and reliability. These measures are in place so that Shiksha copilot consistently produces factual and trustworthy content,” said Tanuja Ganu, principal research SDE manager at Microsoft Research.
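
Returning to the semantic-caching idea mentioned above, the snippet below sketches the general pattern: before calling the language model, the prompt is embedded and compared against embeddings of previously answered prompts, and a sufficiently similar hit is reused. The embed_text and generate_lesson_content helpers and the similarity threshold are stand-ins for illustration only; they are not part of the actual Shiksha copilot implementation.

    import numpy as np

    def embed_text(text: str) -> np.ndarray:
        # Toy stand-in for a real sentence-embedding model: a normalized character histogram.
        vec = np.zeros(256)
        for i, ch in enumerate(text.lower()):
            vec[(ord(ch) * (i + 1)) % 256] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    def generate_lesson_content(prompt: str) -> str:
        # Stand-in for the real LLM pipeline call.
        return f"[generated lesson plan for: {prompt}]"

    cache = []  # list of (embedding, response) pairs

    def cached_generate(prompt: str, threshold: float = 0.92) -> str:
        query = embed_text(prompt)
        for emb, response in cache:
            if float(np.dot(query, emb)) >= threshold:  # cosine similarity on unit vectors
                return response                         # reuse previously generated content
        response = generate_lesson_content(prompt)      # cache miss: call the LLM
        cache.append((query, response))
        return response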

Convincing the skeptics

Before the initial workshop, some teachers expressed skepticism about using AI for lesson planning. Students already have multiple digital learning tools, but for Mahalakshmi A, who teaches science in grades 4-8 at the rural Government Higher Primary School, Basavana Halli, outside Bengaluru, the value for teachers was less clear. During a two-hour initial workshop session, however, Mahalakshmi found she could easily create multiple lesson plans with Shiksha copilot that would work well in her classroom.

Ms. Mahalakshmi standing in front of a classroom

“I felt very happy because it’s a totally different concept. Before now, I could see that technology could work for the students. But this is the first time that it felt like the teachers also had a tool for themselves.”

Mahalakshmi A., Teacher, Government Higher Primary School, Basavana Halli

Mahalakshmi could also see how the content assembled using Shiksha copilot would make her class more interesting for her students, which is an important goal. “Instead of giving them the same problems, the same experiments, and the same videos, we make learning interesting. And then they learn what we call shashwatha kalike, or permanent learning. With Shiksha copilot, we can make that permanent learning happen in our classroom,” she added.

Next steps

The initial pilot program for Shiksha copilot is underway at more than 10 schools in and around Bengaluru. The goal is to let the teachers experience how Shiksha copilot can best be used in their daily workflows to improve learning experiences and collect feedback. The early response has been highly positive, with teachers expressing great satisfaction in both the quality of the content generated and the time savings. To build on this successful pilot, researchers are gearing up to scale Shiksha copilot in schools across the state of Karnataka and beyond, in collaboration with Sikshana Foundation.

This copilot is being developed as part of Project VeLLM (Universal Empowerment with Large Language Models) at Microsoft Research India. VeLLM’s goal is to make inclusive and accessible copilots available to everyone by building a platform for developing population-scale copilots. Inclusive copilots must address various real-world challenges, such as a multilingual user base, varied skillsets, limited devices and connectivity, domain-specific understanding, guardrails, and safety principles. Shiksha is the first copilot developed using the VeLLM platform. The VeLLM team is working with collaborators across diverse domains, such as agriculture and healthcare, to develop tailored domain-specific copilot experiences utilizing the platform and addressing associated research problems. 

To learn more about the project or collaboration opportunities, email the team at shikshacopilot@microsoft.com

The Shiksha copilot team and collaborators (from left to right): Meena Elapulli (Microsoft Research), Ishaan Watts (Microsoft Research), Kavyansh Chourasia (Microsoft Research), Gireesh K.S. (GHPS, Tumkur), Srujana V S (Microsoft Research), Tanuja Ganu (Microsoft Research), Mahalakshmi A (GHPS, Basavana Halli), Parimala H.V. (GHPS, Santhe Beedi), Ravi R (GHPS, Gowdahalli), Maruthi K.R. (GHPS, Anedoddi), Smitha Venkatesh (Sikshana Foundation), Akshay Nambi (Microsoft Research), Somnath Kumar (Microsoft Research), Yash Gadhia (Microsoft Research), Sanchit Gupta (Microsoft Research)


Data Formulator: A concept-driven, AI-powered approach to data visualization

This research paper was presented at the IEEE Visualization Conference (opens in new tab) (VIS 2023), the premier forum for advances in visualization and visual analytics.


Effective data visualization plays a crucial role in data analysis. It enables data analysts and others to explore complex datasets, comprehend patterns, and convey meaningful insights to various stakeholders. Today, there are numerous tools for creating visual representations of data. However, these tools only work with tidy data, meaning that data points must be organized according to the specific categories required by the tool’s visualization format. This poses significant challenges for data analysts, requiring the use of additional tools to transform raw data into a compatible format before it is entered into one of these visualization tools.

For instance, consider a dataset displaying 2020 temperatures in Seattle and Atlanta. If an analyst aims to create a scatter plot comparing the temperatures of these two US cities on the x/y-axes, data transformation is essential. The visualization tool mandates separate columns for Seattle and Atlanta temperatures to map to the scatter plot’s axes. Consequently, the analyst must pivot the input table to generate these columns. Moreover, if the analyst intends to compare which city experiences warmer days or create a smoothed line chart illustrating Seattle’s 7-day moving average temperature, further computations on the transformed data are necessary. Fields like “Warmer” and “Seattle 7-day Moving Avg” need to be calculated to facilitate the visualization, as depicted in Figure 1. This intricate process highlights the complexity and expertise currently needed to prepare raw data for effective visualization.

Figure 1. A data analyst wants to compare 2020 temperatures in Seattle and Atlanta using visualizations like scatter plots and histograms. However, the original dataset lacks necessary columns (“Seattle Temp,” “Atlanta Temp,” “Warmer,” and “Seattle Temp Moving Average”) for these visualizations. Data transformation is needed to include these fields.

This hurdle is particularly daunting because it necessitates a certain level of programming expertise or familiarity with additional data processing tools. It highlights the complexities of data visualization and underscores the need for an easier and more seamless process for data analysts, enabling them to create impactful visualizations regardless of their technical background.
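
To make the kind of transformation described above concrete, here is a small pandas sketch of the reshaping and derived columns an analyst would otherwise produce by hand, assuming a tidy input table with Date, City, and Temperature columns and a few made-up values.

    import pandas as pd

    # Tidy input: one row per (Date, City) pair.
    df = pd.DataFrame({
        "Date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
        "City": ["Seattle", "Atlanta", "Seattle", "Atlanta"],
        "Temperature": [51, 45, 45, 47],
    })

    # Pivot so each city's temperature becomes its own column for the scatter plot.
    wide = df.pivot(index="Date", columns="City", values="Temperature").reset_index()
    wide = wide.rename(columns={"Seattle": "Seattle Temp", "Atlanta": "Atlanta Temp"})

    # Derived field: which city was warmer on each day.
    wide["Warmer"] = wide.apply(
        lambda r: "Seattle" if r["Seattle Temp"] > r["Atlanta Temp"]
        else "Atlanta" if r["Atlanta Temp"] > r["Seattle Temp"] else "Same",
        axis=1)

    # Derived field: Seattle's 7-day moving average (this toy table only has 2 days).
    wide["Seattle 7-day Moving Avg"] = (
        wide["Seattle Temp"].rolling(window=7, min_periods=1).mean())

    print(wide)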

Against the backdrop of rapid advancements in large language models (LLMs) and programming-by-example techniques, researchers have made significant strides in breaking down these barriers. In this context, we share our paper, “Data Formulator: AI-powered Concept-driven Visualization Authoring (opens in new tab),” presented at VIS 2023 (opens in new tab) and winner of the Best Paper Honorable Mention (opens in new tab) award. Data Formulator is an AI-powered visualization authoring tool developed through a collaboration between researchers studying AI and those studying human-computer interaction (HCI). The result is a new visualization paradigm that separates high-level visualization intents from low-level data transformation steps. The process begins with data analysts articulating their visualization ideas as data concepts. These concepts refer to specific data categories, or fields, that analysts want to visualize, even though they are not present in the raw input data. This way, they effectively convey their visualization intent with the AI agent, which, in turn, assists them in implementing their visualization.

Defining data concepts and creating visualizations

The way Data Formulator operates is straightforward. The analyst defines the specific data concepts they plan to visualize, either through natural language queries or by providing categories or example entries for the concept. Once these concepts are defined, they are linked to appropriate visual representations, as illustrated in Figure 2.

Figure 2. The Data Formulator user interface. Data Formulator has four panels: (1) the Concept Shelf, for defining new data concepts to be visualized, (2) the Chart Builder, for specifying the visualization type, (3) the Table View, for analysts to inspect data automatically generated by Data Formulator, and (4) the Visualization Panel, for presenting final visualizations.

If the analyst defines concepts through examples, Data Formulator engages a program synthesizer, which generates a specialized data reshaping program, transforming the provided data to bring out the required data fields. Conversely, when an analyst introduces a new concept using natural language queries, Data Formulator calls on LLMs to generate code, which facilitates the creation of a new data category based on the provided description. In both cases, Data Formulator compiles the transformed data into a structured table and creates corresponding visualizations.

We recognize that analyst specifications can be ambiguous, so we designed Data Formulator to generate multiple visualization options to help them identify what they want. The tool also provides analysts with the AI-generated transformation program and the transformed data for inspection. This transparency helps analysts refine their intent for future iterations.

Continuing our Seattle/Atlanta temperatures example, the following two figures show how analysts can use Data Formulator to create visualizations without reformatting the raw data in an external tool. The analyst provides example entries in the form of temperature values to create the new data concepts “Seattle Temp” and “Atlanta Temp,” shown in Figure 3. The analyst then uses a natural language query to create the new concept “Warmer” and instructs Data Formulator to format the data so that it can be visualized, shown in Figure 4.

Figure 3. The analyst creates new data concepts “Atlanta Temp” and “Seattle Temp” using examples. The AI agent solves a programming-by-example problem to create the new concepts for visualization.
Figure 4. The analyst creates a new data concept “Warmer” using a natural language description. Data Formulator calls LLMs to generate a transformation program to derive the new concept.

Looking ahead: Analyst-AI collaboration in data analysis

AI-powered data analysis tools have the potential to significantly streamline the entire data analysis process by consolidating various tasks into a single tool. Beyond just visualization, this concept-driven technique can be applied to data cleaning, data integration, visual data exploration, and visual storytelling. Our vision is for an AI system to take high-level instruction from the user and automatically recommend the necessary steps across the entire data analysis pipeline, enabling collaboration between the user and the AI agent to achieve their data visualization goals.

Inevitably, data analysts will need to tackle more complex tasks beyond the scope mentioned here. For this reason, it’s crucial to consider how to design AI-powered tools that effectively convey to the analyst results that are uncertain, ambiguous, or incorrect. This ensures that the analyst can trust the tool and collaborate effectively with the AI to accomplish their objectives.


Project Silica: Sustainable cloud archival storage in glass

This research paper was presented at the 29th ACM Symposium on Operating Systems Principles (opens in new tab) (SOSP 2023), the premier forum for the theory and practice of computer systems software.

SOSP 2023
Project Silica: Towards Sustainable Cloud Archival Storage in Glass

Data growth demands a sustainable archival solution

For millennia, data has woven itself into every facet of our lives, from business and academia to personal spheres. Our production of data is staggering, encompassing personal photos, medical records, financial data, scientific insights, and more. By 2025, it’s estimated that we will generate a massive 175 zettabytes of data annually. Amidst this deluge, a substantial portion is vital for preserving our collective heritage and personal histories.  

Presently, magnetic technologies like tape and hard disk drives provide the most economical storage, but they come with limitations. Magnetic media lacks the longevity and durability essential for enduring archival storage, requiring data to be periodically migrated to new media—for hard disk drives, this is every five years, for magnetic tape, it’s around ten. Moreover, ensuring data longevity on magnetic media requires regular “scrubbing,” a process involving reading data to identify corruption and fixing any errors. This leads to substantial energy consumption. We need a sustainable solution, one that ensures the preservation of our digital heritage without imposing an ongoing environmental and financial burden.

Project Silica: Sustainable and durable cloud archival storage

Our paper, “Project Silica: Towards Sustainable Cloud Archival Storage in Glass (opens in new tab),” presented at SOSP 2023 (opens in new tab), describes Project Silica, a cloud-based storage system underpinned by quartz glass. This type of glass is a durable, chemically inert, resilient, and low-cost medium, impervious to electromagnetic interference. Because data written in quartz glass can last thousands of years, it is ideal for archival storage, offering a sustainable solution and eliminating the need for periodic data refreshes.

Writing, reading, and decoding data

Ultrafast femtosecond lasers enable the writing process. Data is written inside a square glass platter similar in size to a DVD through voxels: permanent modifications to the physical structure of the glass made using femtosecond-scale laser pulses. Voxels encode multiple bits of data and are written in 2D layers across the XY plane. Hundreds of these layers are then stacked along the Z axis. To achieve high write throughput, we rapidly scan the laser pulses across the length of the medium using a scanner similar to that used in barcode readers.

To read data, we employ polarization microscopy to image the platter. The read drive scans sectors in a single swift Z-pattern, and the resulting images are processed for decoding. Different read drive options offer varying throughput, balancing cost and performance.

Data decoding relies on ML models that analyze images captured by the read drive, accurately converting signals from analog to digital. The glass library design includes independent read, write, and storage racks. Platters are stored in power-free storage racks and moved by free-roaming shuttles, ensuring minimal resource consumption for passive storage, as shown in Video 1. A one-way system between write racks and the rest of the library ensures that a written platter cannot be over-written under any circumstances, enforcing data integrity.

Video 1. The Silica library prototype demonstrates the flexible and scalable design of the system and its ability to sustainably service archival workloads. 

Azure workload analysis informs Silica’s design

To build an optimal storage system around the core Silica technology, we extensively studied cloud archival data workloads from Azure Storage. Surprisingly, we discovered that small read requests dominate the read workload, yet a small percentage of requests constitute the majority of read bytes, creating a skewed distribution, as illustrated in Figure 1.
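
As a simple illustration of how this kind of skew can be measured in a request trace, the sketch below buckets a synthetic log of read sizes and compares each bucket's share of operations with its share of bytes read. The sizes are made up for illustration; they are not Azure data.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    # Synthetic request sizes in bytes: many small reads, a few very large ones.
    sizes = np.concatenate([
        rng.integers(1_000, 4_000_000, size=9_000),          # small requests dominate counts
        rng.integers(256_000_000, 2_000_000_000, size=100),  # rare large requests dominate bytes
    ])

    buckets = pd.cut(sizes,
                     bins=[0, 4e6, 256e6, np.inf],
                     labels=["<4MB", "4MB-256MB", ">256MB"])
    trace = pd.DataFrame({"bucket": buckets, "bytes": sizes})

    summary = trace.groupby("bucket", observed=True).agg(
        share_of_ops=("bytes", lambda s: len(s) / len(trace)),
        share_of_bytes=("bytes", lambda s: s.sum() / trace["bytes"].sum()))
    print(summary)  # the small-file bucket holds most operations but few of the bytes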

Figure 1. The distribution of read request sizes. Most requests are for small files, but they make up a small percentage of the total load in bytes.

This implies that minimizing the latency of mechanical movement in the library is crucial for optimal performance. Silica glass, a random-seeking storage medium, can suitably meet these requirements as it eliminates the necessity for spooling, unlike magnetic tape. Figure 2 illustrates substantial differences in read demand across various datacenters. These results suggest that we need a flexible library design that can scale resources for each datacenter’s workload. Studying these archival workloads has been instrumental in helping us establish the core design principles for the Silica storage system.

Figure 2. Tail over median read load for different datacenters. The data shows significant variation across and within datacenters.

Project Silica’s versatile storage system

We designed and evaluated a comprehensive storage system that manages error correction, data layout, request scheduling, and shuttle traffic management. Our design effectively manages IOPS-intensive tasks, meeting the expected service level objective (SLO) of an archival storage tier, approximately 15 hours. Interestingly, even in volume-intensive scenarios where a large number of bytes are read, our system efficiently handles requests using read drives with low throughput. In both cases, throughput demands are significantly below those of traditional tape drives. This is shown in Figure 3. The paper provides an extensive description of this system, and the video above shows our prototype library’s capabilities. 

Figure 3. Volume and IOPS workloads represent different extremes in the spectrum of read workloads. Our design can service both workloads well within the expected SLO for an archival storage tier, at about 15 hours.

Diverse applications for sustainably archiving humanity’s data

Project Silica holds promise in numerous sectors, such as healthcare, scientific research, and finance, where secure and durable archival storage of sensitive data is crucial. Research institutions could benefit from Silica’s ability to store vast datasets generated from experiments and simulations, ensuring the integrity and accessibility of research findings over time. Similarly, healthcare organizations could securely archive patient records, medical imaging data, and research outcomes for long-term reference and analysis. 

As the volume of globally generated data grows, traditional storage solutions will continue to face challenges in terms of scalability, energy-efficiency, and long-term durability. Moreover, as technologies like AI and advanced analytics progress, the need for reliable and accessible archival data will continue to intensify. Project Silica is well-positioned to play a pivotal role in supporting these technologies by providing a stable, secure, and sustainable repository for the vast amounts of data we create and rely on.


Research Focus: Week of October 23, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.


NEW RESEARCH

Kosmos-2.5: A Multimodal Literate Model 

Current large language models (LLMs) primarily focus on textual information and cannot understand visual information. However, advancements in the field of multimodal large language models (MLLMs) aim to address this limitation. MLLMs combine visual and textual information within a single Transformer-based model, enabling the model to learn and generate content based on both modalities.

While existing MLLMs have mainly focused on natural images with lower resolutions, the exploration of text images requires further investigation. Incorporating text images into the training process and developing models based on textual and visual information can unlock new possibilities for multimodal applications involving high-resolution text-intensive images.

In a new paper: Kosmos-2.5: A Multimodal Literate Model, researchers from Microsoft present Kosmos-2.5, an MLLM for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels at: (1) generating spatially aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures in markdown format. The model can be adapted to any text-intensive image understanding task with different prompts through supervised fine-tuning. This work paves the way for future scaling of MLLMs.

Microsoft Research Podcast

AI Frontiers: The future of causal reasoning with Emre Kiciman and Amit Sharma

Emre Kiciman and Amit Sharma discuss their paper “Causal Reasoning and Large Language Models: Opening a New Frontier for Causality” and how it examines the causal capabilities of large language models (LLMs) and their implications.


NEW RESEARCH

Evaluation of Dependency Structure for Multivariate Weather Predictors using Copulas

In the Global South (opens in new tab), climate change is driving more frequent and severe weather events such as droughts, floods, and storms. This leads to crop failures, food insecurity, and job loss. These effects are expected to increase in intensity, further disadvantaging marginalized communities and exacerbating existing inequalities. The need for prevention and adaptation is urgent. But despite advances in machine learning and numerical modeling, accurate weather forecasting remains challenging, due to complex interactions among atmospheric and oceanic variables.

In a new paper: Evaluation of Dependency Structure for Multivariate Weather Predictors using Copulas, researchers from Microsoft explore the potential of vine copulas to explain complex relationships of different weather variables in three African locations. Copulas separate marginal distributions from the dependency structure, offering a flexible way to model dependence between random variables for improved risk assessments and simulations. Vine copulas are based on a variety of bivariate copulas, including Gaussian, Student’s t, Clayton, Gumbel, and Frank copulas. They are effective in high-dimensional problems and offer a hierarchy of trees to express conditional dependence. The researchers propose applying this framework within subseasonal forecasting models to enhance the prediction of different weather events or variables.
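
To make the core idea of separating marginals from dependence concrete, the sketch below fits a plain bivariate Gaussian copula to two synthetic weather variables by rank-transforming each to pseudo-observations and estimating the correlation of their normal scores. This is only a toy illustration of the copula concept; the paper works with vine copulas built from several bivariate families, which this sketch does not implement.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Synthetic daily humidity and rainfall with some dependence (illustrative only).
    humidity = rng.normal(70, 10, size=365)
    rainfall = np.exp(0.05 * humidity + rng.normal(0, 0.5, size=365))

    def pseudo_observations(x):
        """Rank-transform a sample to (0, 1), removing its marginal distribution."""
        return stats.rankdata(x) / (len(x) + 1)

    u = pseudo_observations(humidity)
    v = pseudo_observations(rainfall)

    # Gaussian copula: the correlation of the normal scores captures the dependence
    # structure independently of the marginals.
    z_u, z_v = stats.norm.ppf(u), stats.norm.ppf(v)
    rho = np.corrcoef(z_u, z_v)[0, 1]
    print(f"estimated copula correlation: {rho:.2f}")

    # Simulate new dependent uniforms from the fitted copula.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=1000)
    simulated_u, simulated_v = stats.norm.cdf(z[:, 0]), stats.norm.cdf(z[:, 1])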


NEW RESEARCH

Adaptive Training System

Adaptive training has been defined as training in which the problem, stimulus, or task is varied as a function of how well the trainee performs. Researchers have shown that this type of training outperforms comparative training that is non-adaptive or fixed across a range of populations and learning contexts. Virtual reality offers new opportunities for applying this type of training and has already demonstrated its effectiveness (opens in new tab) across a variety of simulated tasks. By using a computational model of the training process, we can derive recommendations for optimal scenario difficulty, resulting in faster and enhanced training.

In a new paper: Adaptive Training System, researchers from Microsoft propose an adaptive training algorithm that accelerates the training process based on a parametric model of trainees and training scenarios. The proposed approach makes trial-by-trial recommendations on optimal scenario difficulty selections to maximize improvements in the trainee’s absolute skill level. The Adaptive Training System is applied to the task of training pilots on a virtual reality flight simulator. The system was designed for scenarios varying in difficulty from easy, with full visibility, to flight in fog with side wind, which is difficult even for experienced pilots. 

Adaptive Training System applied to the task of training pilots on a virtual reality flight simulator. On the left, a flight scenario with fog. On the right, a flight scenario with full visibility.
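
The sketch below illustrates the general shape of such a trial-by-trial adaptive loop with a deliberately simple stand-in model: trainee skill is a single number, success probability follows a logistic curve in the difference between skill and difficulty, and the next scenario is the one whose predicted success rate is closest to a target. The actual Adaptive Training System uses a parametric model of trainees and scenarios that this toy version does not attempt to reproduce.

    import numpy as np

    rng = np.random.default_rng(0)

    def p_success(skill: float, difficulty: float) -> float:
        return 1.0 / (1.0 + np.exp(-(skill - difficulty)))  # logistic performance model

    def choose_difficulty(skill_estimate: float, levels: np.ndarray,
                          target: float = 0.7) -> float:
        # Pick the level whose predicted success rate is closest to the target.
        preds = np.array([p_success(skill_estimate, d) for d in levels])
        return float(levels[np.argmin(np.abs(preds - target))])

    levels = np.linspace(-2.0, 4.0, 13)  # easy (full visibility) ... hard (fog, side wind)
    true_skill, skill_estimate, lr = 1.5, 0.0, 0.3

    for trial in range(20):
        d = choose_difficulty(skill_estimate, levels)
        outcome = rng.random() < p_success(true_skill, d)   # simulated trainee
        # Update the skill estimate toward the observed outcome.
        skill_estimate += lr * (float(outcome) - p_success(skill_estimate, d))
        true_skill += 0.05 * float(outcome)                  # practice improves skill
    print(f"final skill estimate: {skill_estimate:.2f}")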

NEW RESEARCH

CodePlan: Repository-level Coding using LLMs and Planning

Software engineering activities such as package migration, fixing error reports from static analysis or testing, and adding type annotations or other specifications to a codebase, involve pervasively editing the entire repository of code. These activities are formulated as repository-level coding tasks.

Large language model-powered coding assistants, like GitHub Copilot, have succeeded in offering high-quality solutions to localized coding problems. But repository-level coding tasks are more involved and cannot be solved directly using LLMs, since code within a repository is interdependent and the entire repository may be too large to fit into the prompt.

In a new paper: CodePlan: Repository-level Coding using LLMs and Planning, researchers from Microsoft frame LLM-driven repository-level coding as a planning problem, where the goal is to take the repository from its initial state to a target state whose specifications are provided in natural language. They present CodePlan, a task-agnostic framework that solves this problem by synthesizing a multi-step chain of edits, where each step results in a call to an LLM on a code location, with context derived from the entire repository, previous code changes, and task-specific instructions. The research evaluates the effectiveness of CodePlan on two repository-level tasks, package migration (C#) and temporal code edits (Python), and shows that CodePlan aligns more closely with the ground truth than baselines do.
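
The pseudocode-style sketch below conveys the general planning loop suggested by this description: keep a frontier of code locations to edit, ask an LLM to edit each one with repository-derived context, and use dependency analysis on the resulting change to discover further locations. The helper functions and the repo object are hypothetical placeholders; this is not the CodePlan implementation.

    from collections import deque

    # Hypothetical helpers standing in for real components:
    #   build_context(repo, loc, task)  -> prompt context from the repository and prior edits
    #   llm_edit(context)               -> edited code for that location
    #   dependents_of(repo, loc, edit)  -> locations whose code is affected by the edit
    def build_context(repo, loc, task): ...
    def llm_edit(context): ...
    def dependents_of(repo, loc, edit): ...

    def repository_level_edit(repo, seed_locations, task_instructions):
        """Toy multi-step edit chain over a repository (illustrative only)."""
        frontier = deque(seed_locations)
        completed = set()
        while frontier:
            loc = frontier.popleft()
            if loc in completed:
                continue
            context = build_context(repo, loc, task_instructions)
            edit = llm_edit(context)   # one LLM call per code location
            repo.apply(loc, edit)      # assumes the repo object tracks changes
            completed.add(loc)
            for dep in dependents_of(repo, loc, edit):
                if dep not in completed:
                    frontier.append(dep)  # propagate the change through dependencies
        return completed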


NEW ARTICLE

The intimacy triple bind: Structural inequalities and relational labor in the influencer industry

Social media content creators, or influencers, depend heavily on their ability to cultivate and maintain an invested audience-community. They are encouraged to practice “relational labor,” commodifying their personalities, lives and tastes in order to build authentic self-brands and intimacy with audiences.

In a new article (opens in new tab), a researcher from Microsoft draws on an ethnographic study of the London influencer industry to examine relational labor through an intersectional feminist lens, exploring the ways in which structural inequalities shape relationships between creators and their audiences. Managing audience relationships is harder for marginalized creators – especially those making stigmatized and less brandable content genres – who are at higher risk of trolling and harassment.

This article explores four key tactics for managing such conditions: (1) leaning into making rather than being content; (2) (dis)engaging with anti-fans through silence; (3) retreating into private community spaces, away from the exposure of public platforms; and, in parallel, (4) turning off public comments.



Abstracts: October 23, 2023

Microsoft Research Podcast - Abstracts

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Andy Gordon, a Partner Research Manager, and Carina Negreanu, a Senior Researcher, both at Microsoft Research, join host Dr. Gretchen Huizinga to discuss “Co-audit: Tools to help humans double-check AI-generated content.” This paper brings together current understanding of generative AI performance to explore the need and context for tools to help people using the technology find and fix mistakes in AI output.

Transcript

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot, or a podcast abstract, of their new and noteworthy papers. Today, I’m talking to Dr. Andy Gordon, a Partner Research Manager, and Dr. Carina Negreanu, a Senior Researcher, both at Microsoft Research. Doctors Gordon and Negreanu are co-editors of a paper called “Co-audit: Tools to help humans double-check AI-generated content,” and you can read a preprint of this paper now on arXiv. Andy Gordon, Carina Negreanu, thanks for joining us on Abstracts!


ANDY GORDON: Great to be here.

CARINA NEGREANU: Likewise.

HUIZINGA: Let’s start with you, Andy. In a few sentences, describe the issue or problem your paper addresses and why people should care about it.

GORDON: Well, generative AI is amazing. Things like Bing Chat or ChatGPT, all these things powered by large language models. Totally amazing. But it’s really important for everyone to remember that these AIs can make mistakes. For example, you ask when your favorite actor got married, and the model says the year but gets it wrong. Or you ask for some Python code, and it works on positive numbers, but occasionally you give it negative numbers and it goes wrong. Another example, you get a summary of some text. It’s great but unfortunately misses one of the important points. Or thinking about images, you ask for a portrait of a character from the AI and there’s some glitch, and it produces a hand with six fingers. So as users, we need to get into the habit of carefully checking AI outputs for mistakes. And we refer to that as “audit” in a sense of a systematic review. Coming to the paper, it’s about what we call co-audit. And that’s our term for any tool support that helps the human audit the AI output. And some examples of co-audit are tools that can help check for hallucinations, like when the actor’s date of birth is wrong, or to check Python code to find some errors or show how a summary has been constructed to help people find errors.

HUIZINGA: Carina, let’s talk to you. What related research does this paper build on, and how does your work add to it?

NEGREANU: So there was no direct work on the co-audit brand before us. We’re just introducing it. But there has been a lot of research that either motivates the need for co-audit or provides relevant framing for it or even like early examples of what we start thinking of co-audit. So as you’re probably aware, there has been a really great effort in the last years to assess the quality of generations by large language models across a multitude, really, of tasks. And currently we use this body of work as motivation for our research. It basically shows there really is a need for this kind of work. And we hope that in the future, we can also use it to benchmark co-audit tools that we are going to produce in our wider community. But the idea of dealing with errors has been a key part of research on human-AI interaction for ages. And there have been some really cool guidelines that came out recently, especially from Amershi in 2019, on human-AI interactions that are concerned with this part of the world. And more recently, Glassman had a really cool paper about conversational frameworks for human-AI and communication and basically links these concepts to psychology. And in our work, as you can read in our paper, we are trying to basically frame co-audit within her framework, and we find that it’s a natural fit. But before we started defining formally co-audit and building this paper, our group has built co-audit tools in the co-generation space. One such tool is GAM, which is grounded abstraction matching, where we basically help users learn how to effectively communicate with large language models so that they both understand what the large language model understands they’re asking and also get good feedback back. We also built ColDeco, which is a spreadsheet tool for inspecting and verifying calculated columns without the user requiring to view the underlying code produced by the large language models. But really, any tool that focuses on debugging or basically getting information back from human-generated content is useful here. So even tools that are like early debugging tools like FxD are very important here as we learn how people use these kinds of tools and we try to basically apply the same concepts in the context of LLM-generated content. So basically, we are building on top of work that helps understand the needs and challenges that end-user programmers have when working in this space and trying to extrapolate them to co-auditing tools for LLM-generated content.

HUIZINGA: Well, Andy, how would you describe the research approach you used or your methodology for this paper, and how did it come about?

GORDON: Great question, Gretchen, and it was actually quite an unusual methodology for us. So as Carina says, we’ve been looking at co-audit in the very specific setting of spreadsheet computations, and we began to realize that co-audit was really important for any kind of AI-generated output, and we started to see other people doing research that was doing the same sort of thing we were doing but in different settings. So, for example, there was a paper where they were generating bits of Python and deliberately showing multiple pieces of code after they’d been generated to kind of nudge the human user to make a decision about which one was better. I mean, it’s really important to get people to think about the outputs, and this was a nice trick. So we thought, look, this is actually quite an important problem, and MSR (Microsoft Research) should step up and sort of gather people. So we organized a workshop inside Microsoft in the spring and got folks together to share their perspectives on co-audit. And since then, we’ve reflected on those discussions and tried to pull them together in a more coherent form than the whiteboards and sticky notes we produced back then. And that’s produced this paper. I think one of the key things we learned in that process, which we hadn’t been thinking about before, was that co-audit really complements prompt engineering. So you hear a lot about prompt engineering, and it’s the first part of what we call the prompt-response-audit loop. And this is related to what Carina was saying about Elena Glassman’s work on human-AI interaction. So the first step is you formulate a prompt. For example, you ask for Python code. That’s the first step. The second step is we wait for the response from the AI. And then the third step is that we need to inspect the response—that’s the audit part—decide if it meets our needs or if there is a mistake, and if that’s the case, we need to go around the loop again. So that’s the prompt-response-audit loop. And prompt engineering is the tools and techniques that you use in that first step to create the prompt. So, for example, some tools will automatically include a data context in a prompt if you’re trying to create some Python to apply to a table in a spreadsheet or something like that. And then dually, co-audit is the tools and techniques we have to help the human audit the response in the third step of this loop. And that’s like these tools I’ve been mentioning that show maybe two or three candidates for the code that’s to be used.
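To make the loop Andy describes concrete, here is a minimal, hypothetical sketch in Python of the prompt-response-audit cycle. The function names (build_prompt, call_model, audit_response) and the overall structure are illustrative assumptions, not part of the paper or any specific tool; the sketch simply shows prompt engineering feeding step one and a co-audit check gating step three.

```python
# A minimal, hypothetical sketch of the prompt-response-audit loop described above.
# build_prompt, call_model, and audit_response are placeholder names (assumptions),
# standing in for prompt engineering, the model call, and a co-audit step.

def build_prompt(task: str, context: str) -> str:
    """Prompt engineering (step 1): combine the user's task with relevant data context."""
    return f"{context}\n\nTask: {task}"

def call_model(prompt: str) -> str:
    """Step 2: send the prompt to a language model and return its response.
    Stubbed here; replace with a call to the model of your choice."""
    return "<model response for: " + prompt[:40] + "...>"

def audit_response(response: str) -> list[str]:
    """Step 3 (co-audit): return a list of issues found in the response.
    A real co-audit tool might check citations, run tests on generated code,
    or flag passages that are likely to be wrong."""
    return []

def prompt_response_audit_loop(task: str, context: str, max_rounds: int = 3) -> str:
    """Repeat prompt -> response -> audit until the audit passes or we give up."""
    response = ""
    for _ in range(max_rounds):
        prompt = build_prompt(task, context)       # formulate the prompt
        response = call_model(prompt)              # wait for the response
        issues = audit_response(response)          # inspect the response
        if not issues:
            return response                        # no mistakes found: accept it
        # Fold the audit findings back into the next prompt and repeat the loop.
        context += "\nKnown issues: " + "; ".join(issues)
    return response
```

In this framing, prompt-engineering tools improve what goes into build_prompt, while co-audit tools improve audit_response, which is exactly the complementarity described above.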

HUIZINGA: Carina, let’s move over to what kinds of things you came away with. Your takeaways or your findings from this workshop. Talk about that and how you chose to articulate them in the paper.

NEGREANU: So as part of our research, we found that basically one co-audit tool does not fit all needs, which in a way was great because we have a bigger field to explore, but in other ways a bit daunting, as it means you have to think of many things. And one thing that really came to light was that even though we can’t, you know, build something that fits everything, we can build a set of principles that we think are important. So really, we wrote our paper around the 10 principles we identified from the workshop, and we are trying to promote them as things people should think about when they start on the journey of building co-auditing tools. One example is that we really think we should ground outputs, for example by citing reliable sources, similar to what Bing Chat does today. We think that’s a really valuable, important principle that people should follow, and they should think about what it means in the context of their co-auditing tool. In the case of Bing, it’s quite simple, as it’s factual references, but if it becomes referencing code, that gets trickier but still super interesting going forward. We also propose that co-auditing tools should be able to prioritize the user’s attention on the most likely errors, as we need to be mindful of the user’s cognitive effort and keep a positive cost-benefit balance. Basically, if we flood users with errors and flags, it might be too much, and adoption might be quite difficult going forward. And finally, this is something that’s really core to our research area in spreadsheets: it’s about thinking beyond text. We know visuals are so important in how we explain things and in how we teach in schools and universities. So how do we include them in the co-auditing process going forward? I think that’s going to be a really interesting challenge, and we hope we’re going to see some interesting work in that space.

HUIZINGA: Yeah. Well, principles are one thing, Andy, but how does this paper contribute to real-world impact? We talked about that a bit at the beginning. Who benefits most from this tool?

GORDON: That is a great question, Gretchen, and actually that was a question we talked about at the workshop. We think that some application areas are going to benefit more than others. So co-audit really matters when correctness really matters and when mistakes have bad consequences, so in terms of application area, that’s areas like finance or technology development or medicine. But you asked particularly about who, and we think some people will benefit more from co-audit than others. And we found this really striking example, I guess it’s an anecdotal example someone posted on social media. A professor was teaching a class using generative AI tools for the first time to generate code, and he found some evidence that people who have low self-confidence with computers can be intimidated by generative AI. So he found that some of the class were really confident users, and they would ask it to, you know, generate some Python to do such and such, and it would come back with code with a bunch of mistakes in it. And the confident users were happy just to swat that away; they were even a little arrogant about it, saying things like, this is a stupid computer. But, Gretchen, he found that a lot of his students who were less confident with computers were quite intimidated, because the AI was very confidently saying, oh look, all this code is going to work. And they kind of got stuck, and some of them were scrolling around through this code, trying to understand how it worked, when in fact it was just really broken. So he thought this was pretty bad, that these able students who were just less confident were being intimidated and were making less good use of the generative AI. Now that is an anecdote from social media from a reputable professor, but we looked into it, and there are peer-reviewed studies in the literature that show a similar effect. So I’d say we need co-audit tools that will encourage these less confident users to question when the AI is mistaken rather than getting stuck, and I think otherwise they’re not going to see the benefits of generative AI.

HUIZINGA: Well, Carina, sometimes I like to boil things down to a nugget or a beautiful takeaway. So if there’s one thing you want our listeners to take away from this work, this paper, what would it be?

NEGREANU: I think what this study has taught us is that we really need significantly more research. A good co-auditing experience can really be the element that makes or breaks how we incorporate LLMs safely into our day-to-day lives. But to make this happen, we need people from across the field working towards the same goal. It’s really interdisciplinary work, and I don’t think we can do it by working in isolated groups, as we are currently doing. So I would urge our listeners to think about how they could contribute in this space and to reach out to us with feedback and questions. We are more than open to collaboration. Really, we are just starting this journey, and we’d love to see this area become a research priority going forward in 2024.

HUIZINGA: Well, Andy, as an opportunity to give some specificity to Carina’s call for help, what potential pitfalls have you already identified that represent ongoing research challenges? And what’s next on your, and potentially others’, research agenda in this field?

GORDON: Well, one point, and I think Carina made this, is that co-audit techniques will themselves never be perfect. I mean, we’re saying that language models are never going to be perfect; mistakes will come through. But the co-audit techniques themselves won’t be perfect either. So sometimes a user who is using the tools will still miss some mistakes. For example, at the workshop, we thought about security questions and co-audit tools themselves. We were thinking, for instance, about deliberate attacks on a generative AI. There are various techniques that people are talking about at the moment where you might poison the inputs that generative AI models pick up on. And in principle, co-audit tools could help users realize that there are deliberate mistakes that have been engineered by the attacker. So that’s good. But on the other hand, security always becomes an arms race. So if we did have a good tool that could detect those kinds of mistakes, attackers would then start to engineer around the co-audit tools, trying to make them less effective. So that will be an ongoing problem, I think. We’ll also find that if co-audit tools give too many warnings, users will start to ignore them, and there’ll be a sort of under-reliance on co-audit tools. And of course, if we give too few, users will miss the mistakes. So an interesting balance needs to be struck. And also, we don’t expect there’s going to be one overarching co-audit experience; we think there’ll be many different realizations. And so, as Carina says, we hope that common lessons can be learned, and that’s why we want to keep documenting this space and building a research community. So I echo what Carina was saying: if you’re listening and you think that what you’re working on is co-audit, do reach out.

HUIZINGA: Well, Andy Gordon, Carina Negreanu, thanks for joining us today. And to our listeners, thanks for tuning in. If you’re interested in learning more about this paper and this research, you can find a link at aka.ms/abstracts, or you can read the preprint on arXiv. See you next time on Abstracts!
