Microsoft AI – Page 14

The Crossroads of Innovation and Privacy: Private Synthetic Data for Generative AI

May 29, 2024

by Alyssa Hughes Microsoft AI

A flow chart with four successive blocks. Starting with a data owner, private data is provisioned to train a language model with differential privacy. The language model is subsequently prompted to generate novel synthetic data resembling the private data. This data can be used for down-stream applications such as machine learning, feedback analysis or statistical analysis.

Introduction

In today’s data-driven world, organizations strive to leverage data to train and adapt AI models. However, this pursuit often faces an important challenge: balancing the value of data with the need to safeguard individuals’ right to privacy and comply with data privacy regulations like the General Data Protection Regulation (opens in new tab) (GDPR) and the EU AI Act (opens in new tab).

Synthetic data has emerged as a powerful solution to privacy and compliance challenges. It allows organizations to create realistic and useful datasets, tailored to specific use cases, without compromising individual privacy. This enables organizations to:

Train and adapt AI models: Synthetic data can be used to train and adapt models to specific domains and industries, even when real-world data is limited, or privacy concerns exist.
Comply with regulations: Since it doesn’t require user data, synthetic data generation helps organizations adhere to data privacy regulations.
Unlock new possibilities: Synthetic data opens doors to innovative AI applications that were previously limited by data availability or privacy constraints.

Microsoft’s Phi-3 (opens in new tab) small language model (SLM) is a good example of how synthetic data can contribute to responsible AI development, enabling the creation of powerful language models without compromising privacy. Phi-3 leverages a combination of “textbook quality” web data and LLM-generated synthetic content, creating a strategic approach that doesn’t need real-world personal data.

However, synthetic data carries limitations. It can be difficult to artificially generate realistic data that anticipates a wide range of use cases and individual scenarios. Furthermore, synthetic data generated by pre-trained large-language models (LLMs) can sometimes reduce accuracy and increase bias on down-stream tasks (opens in new tab). So, how could we generate synthetic data that accurately captures the diversity and specificity of private data while maintaining strict privacy protections for data contributors?

Differential privacy: A bridge between innovation and privacy

Differentially private (DP) synthetic data generation is a promising solution. It allows developers to pursue innovations in machine learning while prioritizing privacy. The goal of synthetic data generation is to produce data statistically similar to real-world data sources. However, when the data is too similar, replicating uniquely identifying details of the source data, the promise of preserving privacy is compromised. This is where DP can help. DP is a mathematical framework for providing a guarantee that a particular computation is relatively invariant to the addition or removal of a single data contributor. Using DP techniques, researchers can generate synthetic datasets that retain the statistical properties of the original data while ensuring that information that could help identify data contributors remains obscured.

This blog post explores recent advancements in private synthetic data generation. We examine four recently published research papers that propose innovative techniques for generating synthetic data with strong privacy guarantees, while maintaining its usefulness for analytics, training AI models, and other tasks.

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe (opens in new tab) by Yue et al., which appeared at ACL 2023 (opens in new tab), proposes using DP in the fine-tuned training process of a generative LLM. This approach injects noise into the model’s updates during training, ensuring privacy guarantees while maintaining the model’s ability to generate realistic text.
Differentially Private Synthetic Data via Foundation Model APIs 1: Images (opens in new tab) and Differentially Private Synthetic Data via Foundation Model APIs 2: Text (opens in new tab) by Lin, Xie, et al., which appeared at ICLR 2024 (opens in new tab) and ICML 2024 (opens in new tab), respectively, present an approach to data synthesis that focuses on leveraging pre-trained foundation models as black boxes. This method utilizes differentially private queries to the models’ inference APIs for data generation, offering an API-based, training-free approach.
Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation by Tang et al., which appeared at ICLR 2024, explores applying DP to the task of few-shot learning, where models are conditioned on a handful of synthetically generated demonstration examples at inference time. This approach is useful when only private labeled examples are available, and the generalizing power of an LLM can be leveraged to solve an in-context task.

In the remainder of this blog post, we describe each approach in more detail, and present experimental results illustrating their value.

Technical deep dive: Differentially private synthetic data generation

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

Generative LLMs offer the opportunity to produce synthetic text by sampling from LLM outputs. One avenue to generating realistic synthetic text is to fine-tune an LLM using representative data. For example, we could consider fine-tuning a pre-trained LLM on a corpus of scientific papers, enabling the model to more readily produce text that captures the knowledge and writing style used in scientific writing. Suppose, however, that we want to produce synthetic text based on a private corpus of documents. What steps can we take to protect the document authors and any sensitive information in their documents? For example, we may want to produce synthetic medical notes, or personal emails. LLMs have a well-known capacity to memorize training examples, and a model with the potential for reproducing samples from the training set might pose significant privacy risks.

In the paper Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe, researchers from Microsoft presented an approach to leveraging a private data corpus for synthetic generation, without compromising the privacy of the data subjects. This approach uses differentially private stochastic gradient descent (DP-SGD) to fine-tune an LLM on the private documents with a strong privacy guarantee. Differentially private model training provides a mathematical guarantee that the trained model parameters, and any subsequent model outputs, are relatively unaffected by the addition or removal of any single user’s training examples.

The synthetic generation approach described in this work was validated by training on restaurant reviews with varying levels of privacy protection, then prompting the model to generate novel reviews. These reviews were then used for downstream classification tasks, such as sentiment prediction and restaurant genre classification, and the results, which are shown in Table 1, demonstrated only small accuracy penalties compared to training on the raw private data. This approach unlocks a powerful way for realistic synthetic data to be generated from private data without compromising privacy or confidentiality.

A table of results with four columns and four rows. The columns indicate data type, data generator, epsilon, rating and category. The first row indicates “original” data type and no entry for data generator or epsilon. The rating is 0.733 and category is 0.775. The following three rows all indicate Synthetic for data type and GPT2, GPT2-Medium, and GPT2-Large for the data generator. Each row is further divided into two rows corresponding to epsilon = 4 and epsilon = infinity respectively. In all cases the rating and category scores are lower than the row marked original by a few percentage points. The rows corresponding to epsilon = 4 are lower than corresponding rows marked epsilon=infinity by 1-2 percentage points. In general the epsilon = 4 rows have increased scores for larger GPT2 models, while the epsilon=infinity rows are relatively flat. — Table 1: Various versions of GPT-2 were trained on restaurant reviews both with (ε=4) and without (ε =∞) a privacy guarantee. These models were used to produce synthetic training sets, which were used to train classification models for review rating and restaurant category, and subsequently evaluated for accuracy on a private hold-out set. The results show that models trained on the synthetic data can achieve accuracy competitive with models trained without a privacy guarantee.

Differentially Private Synthetic Data via Foundation Model APIs

While the ACL paper demonstrated a robust approach to synthetic data generation, fine-tuning a large model can be impractical. Model training requires significant computing capacity and some of the most powerful models available are proprietary and not accessible for DP training. Recognizing this challenge, researchers at Microsoft explored whether synthetic data can be generated directly using only inference API access to a model, even while utilizing an untrusted model controlled by a third party. Crucially, the synthetic data should resemble a targeted private corpus, and yield a similar DP guarantee as was met in the previous work based on model training. In two separate papers, the authors demonstrate an approach to this problem using a differentially private sampling approach called Private Evolution (PE).

Two independent flow charts. In the first, private data is applied to a pre-trained model using DP-SGD. The fine-tuned model is used to produce differentially private synthetic data. In the second chart, a pre-trained model is prompted via its API to produce generic data. Private data is used to inform selection of the generated data, with a strong privacy guarantee, yielding differentially private synthetic data. — Figure 2: Instead of fine-tuning pre-trained models with DP-SGD (top figure), Private Evolution (PE) only requires accessing the inference APIs of a model (bottom figure). Thus, PE is easily compatible with foundation models that are difficult to DP-fine-tune (e.g., because they are too large) or infeasible to fine-tune (e.g., they are only accessible through inference APIs).

Synthetic image generation using foundation model APIs: In Differentially Private Synthetic Data via Foundation Model APIs 1: Images, the authors introduced Private Evolution (PE), an approach that enables DP image synthesis merely through inference APIs of a generative model. PE operates by sampling from a pre-trained diffusion model such as Stable Diffusion, which has no knowledge of the private corpus. PE then iteratively compares these samples to the private corpus, keeps the ones that are most similar to the private corpus, and uses the pre-trained model to generate more such samples. Crucially, the comparison to the private corpus is done with a DP guarantee, so that any information revealed about the private corpus is strictly bounded. Also, all the queries to the foundation model APIs satisfy the same DP guarantee, so that we can safely use APIs provided by (untrusted) third parties.

Figure 3: Overview of PE. We use two private and synthetic images for illustration. Step 1 (RANDOM_API): we use the model API to generate random images. Step 2: We iteratively go through steps 2.1-2.3 to refine the synthetic images towards the private images. Step 2.1: Each private image votes for their closet synthetic image in the embedding space. In this example, we assume that the bird image gets two votes, and the car image gets zero votes. We then add Gaussian noise to the votes to ensure DP. This gives us the DP Nearest Neighbor Histogram (DP_NN_HISTOGRAM). Step 2.2: We resample the generated images proportional to the histogram. We assume that only the bird image remains. Step 2.3 (VARIATION_API): We use the model API to generate new similar images to the bird image, which are the initial synthetic images in the next iteration.

Even without doing any model training, PE significantly advances state-of-the-art results on some of the datasets. For example, on CIFAR10 dataset (opens in new tab), we achieve FID score (image quality measure, smaller is better) ≤ 7.9 with DP privacy cost ϵ = 0.67, significantly improving the previous SOTA from ϵ = 32. In the paper, we also show that PE requires less computational resource (GPU hours) than DP fine-tuning to achieve such results.

A 2D line chart with six line series, comprising conditional and unconditional variations on the private evolution and DP-MEPF methods, as well as DP-GAN and DP-Diffusion. The x axis presents values of epsilon from 0 to 32. The y axis presents values of the image quality measure FID from 0 to 80, where values are better. All six series show decreasing values of FID for increasing values of epsilon. Both of the series corresponding to private evolution show significantly lower FID values, ranging from about epsilon = 0.1 to epsilon = 2. — Figure 4: FID (image quality measure, lower is better) vs. DP privacy cost ϵ on CIFAR10 (δ = 10⁻⁵ ). (Un)cond means (un)conditional generation. Ours achieves the best privacy-quality trade-off compared to prior training-based approaches.

An array of ten rows of thumbnails, each row depicting ten instances of generated synthetic images. The rows include birds, cars, cats, dogs, and other animals, planes, boats and trucks. Most of the images appear to be realistic with some exhibiting unusual artifacts. — Figure 5: Private Evolution-generated samples using CIFAR-10 as the private corpus (ε =0.67, δ =10^-5). Each row corresponds to one object class.

Synthetic Text Generation using foundation model APIs: the PE approach described above works well for images since it is easy to produce nearby perturbations of promising images. In Differentially Private Synthetic Data via Foundation Model APIs 2: Text, Microsoft researchers explored whether a similar approach could be applied to text. Their method, called Augmented Private Evolution (Aug-PE), operates similarly to the basic PE approach, but leverages the power of a pre-trained LLM to produce variations and re-wordings of input text. Aug-PE also proposes some fundamental algorithmic improvements that may benefit future development of PE.

An overview of the Augmented Private Evolution algorithm for synthetic text generation. Step 1 invokes a language model to produce random text. Step 2.1 uses private data and differential private to vote on the best candidates from step 1, Step 2.2 samples from this differentially private histogram to produce a selected set of generations. Step 2.3 prompts a language model to produce variants of the selected generations, and steps 2.1 to 2.3 are repeated. — Figure 6: Augmented Private Evolution (Aug-PE) leverages a foundational LLM to synthesize text and compare in a privacy-preserving way with a private corpus. Similar to PE for images, in Aug-PE, samples that more closely resemble the private data are retained and refined to produce new synthetic text with a strong privacy guarantee. The illustration shows how we generate DP synthetic reviews for restaurants given two private samples.

Results show that Aug-PE is a promising alternative to DP-fine-tuning for DP text synthesis. With the same foundation model, PE can match or even beat DP-fine-tuning in terms of the trade-off between text quality and privacy. Moreover, as Aug-PE only requires inference APIs, Aug-PE can easily work with the most advanced LLMs such as GPT-3.5, LLaMA, and Mixtral to further improve the text quality. In terms of computational cost (GPU hours), PE can achieve up to 65.7x speedup compared to the DP fine-tuning approach.

A table of results for area and rating classification accuracy for a variety of models and comparing PE with DP synthesis. The table contains the remark that with the same model PE matches or beats DP fine-tuning on text quality vs privacy, and PE works well with advanced LLMs which may be challenging or impossible to fine-tune. The models compared include three sizes of GPT-2, several major open source models, and GPT-3.5. PE on the Mixtral model shows the strongest Area classification accuracy at 43.6 while PE on GPT-3.5 shows the strongest Rating classification accuracy at 43.1. — Table 2: Results on ICLR 2023 paper reviews (ϵ = 1). We use each method to generate DP synthetic paper reviews and test the utility of the data by training downstream paper area or rating classifiers and evaluate their accuracies on the real hold-out data (higher is better). Under the same base model (GPT-2 families), PE achieves competitive results with DP fine-tuning. PE also supports advanced LLMs that may be challenging to work with DP fine-tuning due to large model sizes or black box access.

Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation

In-context learning is a technique for performing tasks with an LLM by providing a sample of demonstration examples in the prompt of the LLM before presenting it with a specific task. For example, we might show a few movie plots and their genre and ask the LLM to suggest the genre for a particular plot of interest. In-context learning harnesses the strong generalization capabilities of LLMs, but it requires a sample of labeled demonstration examples at inference time. How can we perform in-context learning when the only available labeled examples are private? A naïve solution might be to use the private examples but hide the demonstration prompt from the user. However, the threat posed by jailbreak attacks puts these examples at risk for exposure to a malicious user.

In Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation, Microsoft researchers explored how demonstration examples can be synthesized from a private corpus with a privacy guarantee. The method operates by incrementally drawing samples from a token distribution defined by the private examples but with noise added to the distribution. The noise is calibrated to ensure a bound on the privacy lost with each sample. The research demonstrated that in-context learning can out-perform zero-shot learning (querying a model without any demonstration examples) and comes close to performing at the same level as the case with no privacy mitigations, as shown in Table 3.

An overview of differentially private few-shot generation. A round of token generation is depicted with four steps. Given the tokens generated so far, step 1 selects the relevant private data. Step 2 takes an M by N sample of the private data, producing M batches of N examples. Step 3 assembles M LLM prompts with task instructions and the N examples appended. Step 4 feeds the M prompts to the LLM and performs noisy aggregation over the LLM’s output probabilities to select the next generated token. — Figure 7: Illustration of DP few-shot generation. The example shows a synthetic demonstration generated token by token for the topic school with a differentially private guarantee. As new tokens are sampled, the private examples inform the sampling probability of each subsequent token, with noise injected to preserve privacy.

A table of results for private in-context learning tasks, including text classification on three datasets (AGNews, DBPedia, and TREC) and information extraction on two datasets (MIT-G and MIT-D). Accuracy is compared across two cases where epsilon = 0 (zero-shot and four-shot) and values of epsilon at 1, 2, 4, 8 and infinity. Generally, accuracy improves as epsilon increases but epsilon = 8 often outperforms epsilon = infinity. — Table 3: For classification and information extraction tasks, DP in-context learning achieves accuracy similar to non-private ICL (ϵ =∞)

Conclusion

Synthetic data generation presents enormous opportunities to develop AI systems without compromising end-user privacy. In this blog post, we have explored recent innovations in synthetic data generation with strong privacy guarantees. These approaches can enable practitioners to produce synthetic data from private entities, while mitigating the risk that private information might be revealed. While these approaches are highly promising, they do have limitations. For example, we are currently limited to producing relatively short text passages. Future work will continue to explore the opportunities presented by these approaches, with an aim to produce increasingly realistic data with strong privacy guarantees.

Acknowledgments: The authors are grateful for the contributions of the co-authors of the papers reviewed in this blog post: Xiang Yue, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Chulin Xie, Arturs Backurs, Sivakanth Gopi, Da Yu, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Janardhan Kulkarni, Xinyu Tang, Richard Shin, Andre Manoel, and Niloofar Mireshghallah.

The post The Crossroads of Innovation and Privacy: Private Synthetic Data for Generative AI appeared first on Microsoft Research.

Research Focus: Week of May 27, 2024

May 29, 2024

by Brenda Potts Microsoft AI

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

EVENT

Register now for Research Forum on June 4

Join us for Research Forum (opens in new tab), an event series that explores recent research advances, bold new ideas, and important discussions with the global research community in the era of general AI.

In Episode 3, researchers at Microsoft emphasize the importance of globally equitable AI, and will share novel use cases, transformative applications from industry to material design, and provide updates on AutoGen and MatterGen.

Your registration includes access to our live chat with researchers on the event day.

Episode 3 will air Tuesday, June 4 at 9:00 AM PT.

NEW RESEARCH

Generative AI and the Politics of Visibility

Generative AI tools have a remarkable capacity to produce complicated and lengthy texts, with just simple direction from users. AI proponents assert they can help writers, providing creative suggestions, completing half-written sentences or story fragments, and inventing character backstories. But this raises questions about the politics of visibility: what kinds of stories do these tools tend to generate, and what do they generally leave out? Do these tools fully represent diverse or marginalized populations and non-normative communities?

In a recent paper: Generative AI and the Politics of Visibility, a researcher from Microsoft tested three widely available generative AI tools (Bing Chat, ChatGPT, and Google’s Bard, now Gemini) with prompts designed to reveal their normative assumptions, prompting the tools multiple times with each to track the diversity of the outputs to the same query. His research demonstrates that, at least as currently designed and trained, generative AI tools tend to reproduce normative identities and narratives, rarely representing less common arrangements and perspectives unless specifically prompted. When they do generate variety, it is often narrow, maintaining deeper normative assumptions in what remains absent.

Read the paper

NEW RESEARCH

ACM MMSys 2024 Bandwidth Estimation in Real Time Communications Challenge

Videoconferencing has become indispensable for everything from global business operations to accessible education, transforming the way people communicate across physical barriers and geographical divides. The quality of experience (QoE) delivered by video conferencing systems depends in part on correctly estimating the capacity of the bottleneck link between the sender and the receiver over time. Bandwidth estimation for real-time communications (RTC) remains a significant challenge, primarily due to the continuously evolving heterogeneous network architectures and technologies. From the first bandwidth estimation challenge hosted by Microsoft at ACM MMSys 2021, researchers learned that bandwidth estimation models trained with reinforcement learning (RL) in simulations to maximize network-based reward functions may not be optimal, due to the sim-to-real gap and the difficulty of aligning network-based rewards with user-perceived QoE. In this year’s ACM MMSys 2024 Bandwidth Estimation in Real Time Communications Challenge, researchers from Microsoft aim to align reward maximization with user-perceived QoE optimization using offline RL and a real-world dataset released by Microsoft Teams. The challenge received enthusiastic participation from both academia and industry. All models submitted to the grand challenge underwent initial evaluation, and top models were further evaluated on a geographically distributed testbed. Challenge results show that by leveraging real-world data and integrating objective audio/video quality scores as rewards, offline RL can facilitate the development of competitive bandwidth estimators for RTC.

Read the paper

NEW RESEARCH

Player-Driven Emergence in LLM-Driven Game Narrative

Game creation is a labor-intensive process, with limited automation of non-graphic game elements related to dialogue and narrative structure. These elements are typically hand-coded and rigidly deterministic, with few options presented to the player. Large language models (LLMs) are beginning to show potential in the creation of richer and more expansive narrative spaces.

In a recent paper: Player-Driven Emergence in LLM-Driven Game Narrative, accepted for presentation at the IEEE Conference on Games 2024, researchers from Microsoft in collaboration with members of the Xbox organization explore how interaction with LLMs can empower players to participate in the evolution of game narratives. As a testbed, they created a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise but can freely interact with non-player characters generated by GPT-4, a state-of-the-art LLM. They recruited 28 gamers to play the game and used GPT-4 to automatically convert the game logs into a node-graph representing the narrative in the player’s gameplay. Through their interactions with the non-deterministic behavior of the LLM, players were able to discover interesting new emergent nodes that were not a part of the original narrative but have potential for being fun and engaging. Players that created the most emergent nodes tended to be those that often enjoy games that facilitate discovery, exploration and experimentation.

Read the paper

NEW RESEARCH

Segmentation using large language models: A new typology of American neighborhoods

The U.S. Census Bureau’s American Community Survey (ACS) is the country’s primary source of social and economic data. But much of the data is low quality, especially at the highest levels of geographic detail (Block Groups). As one zooms in geographically on a map, the resolution of social and economic data decreases, which is counterintuitive. Typically, zooming in generates more detail, not less. Recent changes in the U.S. statistical system have amplified this geographic-demographic resolution trade-off.

In a recent paper: Segmentation using large language models: A new typology of American neighborhoods, researchers from Microsoft present a solution to this problem in the form of an AI-based open and reproducible geodemographic classification system using small area estimates from the ACS. They employ a partitioning clustering algorithm to a range of socio-economic, demographic, and built environment variables. Using an open-source software pipeline ensures adaptability to future data updates. One key innovation is the integration of GPT-4, to generate intuitive cluster descriptions and names. This represents a novel application of natural language processing in geodemographic research and showcases the potential for human-AI collaboration within the geospatial domain.

Read the paper

NEW RESEARCH

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables LLMs to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as: “What are the main themes in the dataset?”, since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods fail to scale to the quantities of text indexed by typical RAG systems.

In a recent preprint: From Local to Global: A Graph RAG Approach to Query-Focused Summarization, researchers from Microsoft propose combining the strengths of these contrasting methods through a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed. This approach uses an LLM to build a graph-based text index in two stages: first to derive an entity knowledge graph from the source documents, then to pre-generate community summaries for all groups of closely-related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, Graph RAG leads to substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of generated answers.

Read the paper

Microsoft Research in the news

Microsoft Announces New Foundation Model For Digital Pathology, Diving Deeper Into Clinical Medicine

Forbes | May 22, 2024

In partnership with Providence health system and the University of Washington, Microsoft has leveraged its significant work with generative AI to launch GigaPath, the first whole-slide foundation model for digital pathology that has been pre-trained with real-world data.

Spanish mini-satellites bring the internet to isolated areas (en español)

La Razon | May 17, 2024

The Spanish company Fossa, with help from Microsoft Research, has successfully tested a small satellite weighing less than a kilogram that improves connectivity in places with little or no coverage, a potential boost for the internet of things (IoT).

View more news and awards

The post Research Focus: Week of May 27, 2024 appeared first on Microsoft Research.

Ideas: Designing AI for people with Abigail Sellen

May 23, 2024

by Brenda Potts Microsoft AI

Microsoft Research Podcast | Ideas | Abigail Sellen

Behind every emerging technology is a great idea propelling it forward. In the new Microsoft Research Podcast series, Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In this episode, host Gretchen Huizinga talks with Distinguished Scientist and Lab Director Abigail Sellen. The idea that computers could be designed for people is commonplace today, but when Sellen was pursuing an advanced degree in psychology, it was a novel one that set her on course for a career in human-centric computing. Today, Sellen and the teams she oversees are studying how AI could—and should—be designed for people, focusing on helping to ensure new developments support people in growing the skills and qualities they value. Sellen explores those efforts through the AI, Cognition, and the Economy initiative—or AICE, for short—a collective of interdisciplinary scientists examining the short- and long-term effects of generative AI on human cognition, organizational structures, and the economy.

Learn more:

AI, Cognition, and the Economy (AICE)
Initiative page

Responsible AI Principles and Approach | Microsoft AI

The Rise of the AI Co-Pilot: Lessons for Design from Aviation and Beyond
Publication, 2023

The Myth of the Paperless Office
Book, 2003

Transcript

[SPOT]

GRETCHEN HUIZINGA: Hey, listeners. It’s host Gretchen Huizinga. Microsoft Research podcasts are known for bringing you stories about the latest in technology research and the scientists behind it. But if you want to dive even deeper, I encourage you to attend Microsoft Research Forum. Each episode is a series of talks and panels exploring recent advances in research, bold new ideas, and important discussions with the global research community in the era of general AI. The next episode is coming up on June 4, and you can register now at aka.ms/MyResearchForum (opens in new tab). Now, here’s today’s show.

[END OF SPOT]

[TEASER]

[MUSIC PLAYS UNDER DIALOGUE]

ABIGAIL SELLEN: I’m not saying that we shouldn’t take concerns seriously about AI or be hugely optimistic about the opportunities, but rather, my view on this is that we can do research to get, kind of, line of sight into the future and what is going to happen with AI. And more than this, we should be using research to not just get line of sight but to steer the future, right. We can actually help to shape it. And especially being at Microsoft, we have a chance to do that.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Dr. Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

My guest on this episode is Abigail Sellen, known by her friends and colleagues as Abi. A social scientist by training and an expert in human-computer interaction, Abi has a long list of accomplishments and honors, and she’s a fellow of many technical academies and societies. But today I’m talking to her in her role as distinguished scientist and lab director of Microsoft Research Cambridge, UK, where she oversees a diverse portfolio of research, some of which supports a new initiative centered around the big idea of AI, Cognition, and the Economy, also known as AICE. Abi Sellen. I’m so excited to talk to you today. Welcome to Ideas!

ABIGAIL SELLEN: Thanks! Me, too.

HUIZINGA: So before we get into an overview of the ideas behind AICE research, let’s talk about the big ideas behind you. Tell us your own research origin story, as it were, and if there was one, what big idea or animating “what if?” captured your imagination and inspired you to do what you’re doing today?

SELLEN: OK, well, you’re asking me to go back in the mists of time a little bit, but let me try. [LAUGHTER] So I would say, going … this goes back to my time when I started doing my PhD at UC San Diego. So I had just graduated as a psychologist from the University of Toronto, and I was going to go off and do a PhD in psychology with a guy called Don Norman. So back then, I really had very little interest in computers. And in fact, computers weren’t really a thing that normal people used. [LAUGHTER] They were things that you might, like, put punch cards into. Or, in fact, in my undergrad days, I actually programmed in hexadecimal, and it was horrible. But at UCSD, they were using computers everywhere, and it was, kind of, central to how everyone worked. And we even had email back then. So computers weren’t really for personal use, and it was clear that they were designed for engineers by engineers. And so they were horrible to use, people grappling with them, people were making mistakes. You could easily remove all your files just by doing rm*. So the big idea that was going around the lab at the time—and this was by a bunch of psychologists, not just Don, but other ones—was that we could design computers for people, for people to use, and take into account, you know, how people act and interact with things and what they want. And that was a radical idea at the time. And that was the start of this field called human-computer interaction, which is … you know, now we talk about designing computers for people and “user-friendly” and that’s a, kind of, like, normal thing, but back then …

HUIZINGA: Yeah …

SELLEN: … it was a radical idea. And so, to me, that changed everything for me to think about how we could design technology for people. And then, if I can, I’ll talk about one other thing that happened …

HUIZINGA: Yeah, please.

SELLEN: … during that time. So at that time, there was another gang of psychologists, people like Dave Rumelhart, Geoff Hinton, Jay McClelland, people like that, who were thinking about, how do we model human intelligence—learning, memory, cognition—using computers? And so these were psychologists thinking about, how do people represent ideas and knowledge, and how can we do that with computers?

HUIZINGA: Yeah …

SELLEN: And this was radical at the time because cognitive psychologists back then were thinking about … they did lots of, kind of, flow chart models of human cognition. And people like Dave Rumelhart did networks, neural networks, …

HUIZINGA: Ooh …

SELLEN: … and they were using what were then called spreading activation models of memory and things, which came from psychology. And that’s interesting because not only were they modeling human cognition in this, kind of, what they called parallel distributed processing, but they operationalized it. And that’s where Hinton and others came up with the back-propagation algorithm, and that was a huge leap forward in AI. So psychologists were actually directly responsible for the wave of AI we see today. A lot of computer scientists don’t know that. A lot of machine learning people don’t know that. But so, for me, long story short, that time in my life and doing my PhD at UC San Diego led to me understanding that social science, psychology in particular, and computing should be seen as things which mutually support one another and that can lead to huge breakthroughs in how we design computers and computer algorithms and how we do computing. So that, kind of, set the path for the rest of my career. And that was 40 years ago!

HUIZINGA: Did you have what we’ll call metacognition of that being an aha moment for you, and like, I’m going to embrace this, and this is my path forward? Or was it just, sort of, more iterative: these things interest you, you take the next step, these things interest you more, you take that step?

SELLEN: I think it was an aha moment at certain points. Like, for example, the day that Francis Crick walked into our seminar and started talking about biologically inspired models of computing, I thought, “Ooh, there’s something big going on here!”

HUIZINGA: Wow, yeah.

SELLEN: Because even then I knew that he was a big deal. So I knew there was something happening that was really, really interesting. I didn’t think so much about it from the point of view of, you know, I would have a career of helping to design human-centric computing, but more, wow, there’s a breakthrough in psychology and how we understand the human mind. And I didn’t realize at that time that that was going to lead to what’s happening in AI today.

HUIZINGA: Well, let’s talk about some of these people that were influential for you as a follow-up to the animating “big idea.” If I’m honest, Abi, my jaw dropped a little when I read your bio because it’s like a who’s who of human-centered computing and UX design. And now these people are famous. Maybe they weren’t so much at the time. But tell us about the influential people in your life, and how their ideas inspired you?

SELLEN: Yeah, sure, happy to. In fact, I’ll start with one person who is not a, sort of, HCI person, but my stepfather, John Senders, was this remarkable human being. He died three years ago at the age of 98. He worked almost to his dying day. Just an amazing man. He entered my life when I was about 13. He joined the family. And he went to Harvard. He trained with people like Skinner. He was taught by these, kind of, famous psychologists of the 20th century, and they were his friends and his colleagues, and he introduced me to a lot of them. You know, people like Danny Kahneman and, you know, Amos Tversky and Alan Baddeley, and all these people that, you know, I had learned about as an undergrad. But the main thing that John did for me was to open my eyes to how you could think about modeling humans as machines. And he really believed that. He was not only a psychologist, but he was an engineer. And he, sort of, kicked off or he was one of the founders of the field of human factors engineering. And that’s what human factors engineers do. They look at people, and they think, how can we mathematically model them? So, you know, we’d be sitting by a pool, and he’d say, “You can use information sampling to model the frequency with which somebody has to watch a baby as they go towards the pool. And it depends on their velocity and then their trajectory… !” [LAUGHTER] Or we go into a bank, and he’d say, “Abi, how would you use queuing theory to, you know, estimate the mean wait time?” Like, you know, so he got me thinking like that, and he recognized in me that I had this curiosity about the world and about people, but also, that I loved mathematics. So he was the first guy. Don Norman, I’ve already mentioned as my PhD supervisor, and I’ve said something about already how he, sort of, had this radical idea about designing computers for people. And I was fortunate to be there when the field of human-computer interaction was being born, and that was mainly down to him. And he’s just [an] incredible guy. He’s still going. He’s still working, consulting, and he wrote this famous book called The Psychology of Everyday Things, which now is, I think it’s been renamed The Design of Everyday Things, and he was really influential and been a huge supporter of mine. And then the third person I’ll mention is Bill Buxton. And …

HUIZINGA: Yeah …

SELLEN: Bill, Bill …

HUIZINGA: Bill, Bill, Bill! [LAUGHTER]

SELLEN: Yeah. I met Bill at, first, well, actually first at University of Toronto; when I was a grad student, I went up and told him his … the experiment he was describing was badly designed. And instead of, you know, brushing me off, he said, “Oh really, OK, I want to talk to you about that.” And then I met him at Apple later when I was an intern, and we just started working together. And he is, he’s just … amazing designer. Everything he does is based on, kind of, theory and deep thought. And he’s just so much fun. So I would say those three people have been big influences on me.

HUIZINGA: Yeah. What about Marilyn Tremaine? Was she a factor in what you did?

SELLEN: Yes, yeah, she was great. And Ron Baecker. So…

HUIZINGA: Yeah …

SELLEN: … after I did my PhD, I did a postdoc at Toronto in the Dynamic Graphics Project Lab. And they were building a media space, and they asked me to join them. And Marilyn and Ron and Bill were building this video telepresence media space, which was way ahead of its time.

HUIZINGA: Yeah.

SELLEN: So I worked with all three of them, and they were great fun.

HUIZINGA: Well, let’s talk about the research initiative AI, Cognition, and the Economy. For context, this is a global, interdisciplinary effort to explore the impact of generative AI on human cognition and thinking, work dynamics and practices, and labor markets and the economy. Now, we’ve already lined up some AICE researchers to come on the podcast and talk about specific projects, including pilot studies, workshops, and extended collaborations, but I’d like you to act as a, sort of, docent or tour guide for the initiative, writ large, and tell us why, particularly now, you think it’s important to bring this group of scientists together and what you hope to accomplish.

SELLEN: I think it’s important now because I think there are so many extreme views out there about how AI is going to impact people. A lot of hyperbole, right. So there’s a lot of fear about, you know, jobs going away, people being replaced, robots taking over the world. And there’s a lot of enthusiasm about how, you know, we’re all going to be more productive, have more free time, how it’s going to be the answer to all our problems. And so I think there are people at either end of that conversation. And I always … I love the Helen Fielding quote … I don’t know if you know Helen Fielding. She wrote…

HUIZINGA: Yeah, Bridget Jones’s Diary …

SELLEN: … Bridget Jones’s Diary. Yeah. [LAUGHTER] She says, “Nothing is either as bad or as good as it seems,” right. And I live by that because I think things are usually somewhere in the middle. So I’m not saying that we shouldn’t take concerns seriously about AI or be hugely optimistic about the opportunities, but rather, my view on this is that we can do research to get, kind of, line of sight into the future and what is going to happen with AI. And more than this, we should be using research to not just get line of sight but to steer the future, right. We can actually help to shape it. And especially being at Microsoft, we have a chance to do that. So what I mean here is that let’s begin by understanding first the capabilities of AI and get a good understanding of where it’s heading and the pace that it’s heading at because it’s changing so fast, right.

HUIZINGA: Mm-hmm …

SELLEN: And then let’s do some research looking at the impact, both in the short term and the long term, about its impact on tasks, on interaction, and, most importantly for me anyway, on people. Yeah, and then we can extrapolate out how this is going to impact jobs, skills, organizations, society at large, you know. So we get this, kind of, arc that we can trace, but we do it because we do research. We don’t just rely on the hyperbole and speculation, but we actually try and do it more systematically. And then I think the last piece here is that if we’re going to do this well and if we think about what AI’s impact can be, which we think is going to impact on a global scale, we need many different skills and disciplines. We need not just machine learning people and engineering and computer scientists at large, but we need designers, we need social scientists, we need even philosophers, and we need domain experts, right. So we need to bring all of these people together to do this properly.

HUIZINGA: Interesting. Well, let’s do break it down a little bit then. And I want to ask you a couple questions about each of the disciplines within the acronym A-I-C-E, or AICE. And I’ll start with AI and another author that we can refer to. Sci-fi author and futurist Arthur C. Clarke famously said that “any sufficiently advanced technology is indistinguishable from magic,” and for many people, AI systems seem to be magic. So in response to that, many in the industry have emphatically stated that AI is just a tool. But you’ve said things like AI is more a “collaborative copilot than a mere tool,” and recently, you said we might even think of it as a “very smart and intuitive butler.” So how do those ideas from the airline industry and Downton Abbey help us better understand and position AI and its role in our world?

SELLEN: Well, I’m going to use Wodehouse here in a minute as well, but um … so I think AI is different from many other tech developments in a number of important ways. One is, it has agency, right. So it can take initiative and do things on your behalf. It’s highly complex, and, you know, it’s getting more complex by the day. It changes. It’s dynamic. It’s probabilistic rather than deterministic, so it will give you different answers depending on when, you know, when you ask it and what you ask it. And it’s based on human-generated data. So it’s a vastly different kind of tool than HCI, as a field, has studied in the past. There are lots of downsides to that, right. One is it means it’s very hard to understand how it works under the hood, right …

HUIZINGA: Yeah …

SELLEN: … and understanding the output. It’s fraught with uncertainty because the output changes every time you use it. But then let’s think about the upsides, especially, large language models give us a way of conversationally interacting with AI like never before, right. So it really is a new interaction paradigm which has finally come of age. So I do think it’s going to get more personal over time and more anticipatory of our needs. And if we design it right, it can be like the perfect butler. So if you know P.G. Wodehouse, Jeeves and Wooster, you know, Jeeves knows that Bertie has had a rough night and has a hangover, so he’s there at the bedside with a tonic and a warm bath already ready for him. But he also knows what Wooster enjoys and what decisions should be left to him, and he knows when to get out of the way. He also knows when to be very discreet, right. So when I use that butler metaphor, I think about how it’s going to take time to get this right, but eventually, we may live in a world where AI helps us with good attention to privacy of getting that kind of partnership right between Jeeves and Wooster.

HUIZINGA: Right. Do you think that’s possible?

SELLEN: I don’t think we’ll ever get it exactly right, but if we have a conversational system where we can mutually shape the interaction, then even if Jeeves doesn’t get things right, Wooster can train him to do a better job.

HUIZINGA: Go back to the copilot analogy, which is a huge thing at Microsoft — in fact, they’ve got products named Copilot — and the idea of a copilot, which is, sort of, assuaging our fears that it would be the pilot …

SELLEN: Yeah …

HUIZINGA: … AI.

SELLEN: Yeah, yeah.

HUIZINGA: So how do we envision that in a way that … you say it’s more than a mere tool, but it’s more like a copilot?

SELLEN: Yeah, I actually like the copilot metaphor for what you’re alluding to, which is that the pilot is the one who has the final say, who has the, kind of, oversight of everything that’s happening and can step in. And also that the copilot is there in a supportive role, who kind of trains by dint of the fact that they work next to the pilot, and that they have, you know, specialist skills that can help.

HUIZINGA: Right …

SELLEN: So I really like that metaphor. I think there are other metaphors that we will explore in future and which will make sense for different contexts, but I think, as a metaphor for a lot of the things we’re developing today, it makes a lot of sense.

HUIZINGA: You know, it also feels like, in the conversation, words really matter in how people perceive what the tool is. So having these other frameworks to describe it and to implement it, I think, could be really helpful.

SELLEN: Yes, I agree.

[MUSIC BREAK]

HUIZINGA: Well, let’s talk about intelligence for a second. One of the most interesting things about AI is it’s caused us to pay attention to other kinds of intelligence. As author Meghan O’Gieblyn puts it, “God, human, animal, machine … ” So why do you think, Abi, it’s important to understand the characteristics of each kind of intelligence, and how does that impact how we conceptualize, make, and use what we’re calling artificial intelligence?

SELLEN: Yeah, well, I actually prefer the term machine intelligence to artificial intelligence …

HUIZINGA: Me too! Thank you! [LAUGHTER]

SELLEN: Because the latter implies that there’s one kind of intelligence, and also, it does allude to the fact that that is human-like. You know, we’re trying to imitate the human. But if you think about animals, I think that’s really interesting. I mean, many of us have good relationships with our pets, right. And we know that they have a different kind of intelligence. And it’s different from ours, but that doesn’t mean we can’t understand it to some extent, right. And if you think about … animals are superhuman in many ways, right. They can do things we can’t. So whether it’s an ox pulling a plow or a dog who can sniff out drugs or ferrets who can, you know, thread electrical cables through pipes, they can do things. And bee colonies are fascinating to me, right. And they work as a, kind of, a crowd intelligence, or hive mind, right. [LAUGHTER] That’s where that comes from. And so in so many ways, animals are smarter than humans. But it doesn’t matter—like this “smarter than” thing also bugs me. It’s about being differently intelligent, right. And the reason I think that’s important when we think about machine intelligence is that machine intelligence is differently intelligent, as well. So the conversational interface allows us to explore the nature of that machine intelligence because we can speak to it in a kind of human-like way, but that doesn’t mean that it is intelligent in the same way a human is intelligent. And in fact, we don’t really want it to be, right.

HUIZINGA: Right …

SELLEN: Because we want it, we want it to be a partner with us, to do things that we can’t, you know, just like using the plow and the ox. That partnership works because the ox is stronger than we are. So I think machine intelligence is a much better word, and understanding it’s not human is a good thing. I do worry that, because it sounds like a human, it can seduce us into thinking it’s a human …

HUIZINGA: Yeah …

SELLEN: … and that can be problematic. You know, there are instances where people have been on, for example, dating sites and a bot is sounding like a human and people get fooled. So I think we don’t want to go down the path of fooling people. We want to be really careful about that.

HUIZINGA: Yeah, this idea of conflating different kinds of intelligences to our own … I think we can have a separate vision of animal intelligence, but machines are, like you say, kind of seductively built to be like us.

SELLEN: Yeah …

HUIZINGA: And so back to your comment about shaping how this technology moves forward and the psychology of it, how might we envision how we could shape, either through language or the way these machines operate, that we build in a “I’m not going to fool you” mechanism?

SELLEN: Well, I mean, there are things that we do at the, kind of, technical level in terms of guardrails and metaprompts, and we have guidelines around that. But there’s also the language that an AI character will use in terms of, you know, expressing thoughts and feelings and some suggestion of an inner life, which … these machines don’t have an inner life, right.

HUIZINGA: Right!

SELLEN: So … and one of the reasons we talk to people is we want to discover something about their inner life.

HUIZINGA: Yessss …

SELLEN: And so why would I talk to a machine to try and discover that? So I think there are things that we can do in terms of how we design these systems so that they’re not trying to deceive us. Unless we want them to deceive us. So if we want to be entertained or immersed, maybe that’s a good thing, right? That they deceive us. But we enter into that knowing that that’s what’s happening, and I think that’s the difference.

HUIZINGA: Well, let’s talk about the C in A-I-C-E, which is cognition. And we’ve just talked about other kinds of intelligence. Let’s broaden the conversation and talk about the impact of AI on humans themselves. Is there any evidence to indicate that machine intelligence actually has an impact on human intelligence, and if so, why is that an important data point?

SELLEN: Yeah, OK, great topic. This is one of my favorite topics. [LAUGHTER] So, well, let me just backtrack a little bit for a minute. A lot of the work that’s coming out today looking at the impact of AI on people is in terms of their productivity, in terms of how fast they can do something, how efficiently they can do a job, or the quality of the output of the tasks. And I do think that’s important to understand because, you know, as we deploy these new tools in peoples’ hands, we want to know what’s happening in terms of, you know, peoples’ productivity, workflow, and so on. But there’s far less of it on looking at the impact of using AI on people themselves and on how people think, on their cognitive processes, and how are these changing over time? Are they growing? Are they atrophying as they use them? And, relatedly, what’s happening to our skills? You know, over time, what’s going to be valued, and what’s going to drop away? And I think that’s important for all kinds of reasons. So if you think about generative AI, right, these are these AI systems that will write something for us or make a slide deck or a picture or a video. What they’re doing is they are taking the cognitive work of generation of an artifact or the effort of self-expression that most of us, in the old-fashioned world, will do, right—we write something, we make something—they’re doing that for us on our behalf. And so our job then is to think about how do we specify our intention to the machine, how do we talk to it to get it to do the things we want, and then how do we evaluate the output at the end? So it’s really radically shifting what we do, the work that we do, the cognitive and mental work that we do, when we engage with these tools. Now why is that a problem? Or should it be a problem? One concern is that many of us think and structure our thoughts through the process of making things, right. Through the process of writing or making something. So a big question for me is, if we’re removed from that process, how deeply will we learn or understand what we’re writing about? A second one is, you know, if we’re not deeply engaged in the process of generating these things, does that actually undermine our ability to evaluate the output when we do get presented with it?

HUIZINGA: Right …

SELLEN: Like, if it writes something for us and it’s full of problems and errors, if we stop writing for ourselves, are we going to be worse at, kind of, judging the output? Another one is, as we hand things over to more and more of these automated processes, will we start to blindly accept or over-rely on our AI assistants, right. And the aviation industry has known that for years …

HUIZINGA: Yeah …

SELLEN: … which is why they stick pilots in simulators. Because they rely on autopilot so much that they forget those key skills. And then another one is, kind of, longer term, which is like these new generations of people who are going to grow up with this technology, what are the fundamental skills that they’re going to need to not just to use the AI but to be kind of citizens of the world and also be able to judge the output of these AI systems? So the calculator, right, is a great example. When it was first introduced, there was a huge outcry around, you know, kids won’t be able to do math anymore! Or we don’t need to teach it anymore. Well, we do still teach it because when you use a calculator, you need to be able to see whether or not the output the machine is giving you is in the right ballpark, right.

HUIZINGA: Right …

SELLEN: You need to know the basics. And so what are the basics that kids are going to need to know? We just don’t have the answer to those questions. And then the last thing I’ll say on this, because I could go on for a long time, is we also know that there are changes in the brain when we use these new technologies. There are shifts in our cognitive skills, you know, things get better and things do deteriorate. So I think Susan Greenfield is famous for her work looking at what happens to the neural pathways in the age of the internet, for example. So she found that all the studies were pointing to the fact that reading online and on the internet meant that our visual-spatial skills were being boosted, but our capacity to do deep processing, mindful knowledge acquisition, critical thinking, reflection, were all decreasing over time. And I think any parent who has a teenager will know that focus of attention, flitting from one thing to another, multitasking, is, sort of, the order of the day. Well, not just for teenagers. I think all of us are suffering from this now. It’s much harder. I find it much harder to sit down and read something in a long, focused way …

HUIZINGA: Yeah …

SELLEN: … than I used to. So all of this long-winded answer is to say, we don’t understand what the impact of these new AI systems is going to be. We need to do research to understand it. And we need to do that research both looking at short-term impacts and long-term impacts. Not to say that this is all going to be bad, but we need to understand where it’s going so we can design around it.

HUIZINGA: You know, even as you asked each of those questions, Abi, I found myself answering it preemptively, “Yes. That’s going to happen. That’s going to happen.” [LAUGHS] And so even as you say all of this and you say we need research, do you already have some thinking about, you know, if research tells us the answer that we thought might be true already, do we have a plan in place or a thought process in place to address it?

SELLEN: Well, yes, and I think we’ve got some really exciting research going on in the company right now and in the AICE program, and I’m hoping your future guests will be able to talk more in-depth about these things. But we are looking at things like the impact of AI on writing, on comprehension, on mathematical abilities. But more than that. Not just understanding the impact on these skills and abilities, but how can we design systems better to help people think better, right?

HUIZINGA: Yeah …

SELLEN: To help them think more deeply, more creatively. I don’t think AI needs to necessarily de-skill us in the critical skills that we want and need. It can actually help us if we design them properly. And so that’s the other part of what we’re doing. It’s not just understanding the impact, but now saying, OK, now that we understand what’s going on, how do we design these systems better to help people deepen their skills, change the way that they think in ways that they want to change—in being more creative, thinking more deeply, you know, reading in different ways, understanding the world in different ways.

HUIZINGA: Right. Well, that is a brilliant segue into my next question. Because we’re on the last letter, E, in AICE: the economy. And that I think instills a lot of fear in people. To cite another author, since we’re on a citing authors roll, Clay Shirky, in his book Here Comes Everybody, writes about technical revolutions in general and the impact they have on existing economic paradigms. And he says, “Real revolutions don’t involve an orderly transition from point A to point B. Rather, they go from A, through a long period of chaos, and only then reach B. And in that chaotic period the old systems get broken long before the new ones become stable.” Let’s take Shirky’s idea and apply it to generative AI. If B equals the future of work, what’s getting broken in the period of transition from how things were to how things are going to be, what do we have to look forward to, and how do we progress toward B in a way that minimizes chaos?

SELLEN: Hmm … oh, those are big questions! [LAUGHS]

HUIZINGA: Too many questions! [LAUGHS]

SELLEN: Yeah, well, I mean, Shirky was right. Things take a long time to bed in, right. And much of what happens over time, I don’t think we can actually predict. You know, so who would have predicted echo chambers or the rise of deepfakes or, you know, the way social media could start revolutions in those early days of social media, right. So good and bad things happen, and a lot of it’s because it rolls out over time, it scales up, and then people get involved. And that’s the really unpredictable bit, is when people get involved en masse. I think we’re going to see the same thing with AI systems. They are going to take a long time to bed in, and its impact is going to be global, and it’s going to take a long time to unfold. So I think what we can do is, to some extent, we can see the glimmerings of what’s going to happen, right. So I think it’s the William Gibson quote is, you know, “The future’s already here; it’s just not evenly distributed,” or something like that, right. We can see some of the problems that are playing out, both in the hands of bad actors and things that will happen unintentionally. We can see those, and we can design for them, and we can do things about it because we are alert and we are looking to see what happens. And also, the good things, right. And all the good things that are playing out, …

HUIZINGA: Yeah …

SELLEN: … we can make the most of those. Other things we can do is, you know, at Microsoft, we have a set of responsible AI principles that we make sure all our products go through to make sure that we look into the future as much as we can, consider what the consequences might be, and then deploy things in very careful steps, evaluating as we go. And then, coming back to what I said earlier, doing deep research to try and get a better line of sight. So in terms of what’s going to happen with the future of work, I think, again, we need to steer it. Some of the things I talked about earlier in terms of making sure we build skills rather than undermine them, making sure we don’t over automate, making sure that we put agency in the hands of people. And always making sure that we design our AI experiences with human hope, aspirations, and needs in mind. If we do that, I think we’re on a good track, but we should always be vigilant, you know, to what’s evolving, what’s happening here.

HUIZINGA: Yeah …

SELLEN: I can’t really predict whether we’re headed for chaos or not. I don’t think we are, as long as we’re mindful.

HUIZINGA: Yeah. And it sounds like there’s a lot more involved outside of computer science, in terms of support systems and education and communication, to acclimatize people to a new kind of economy, which like you say, you can’t … I’m shocked that you can’t predict it, Abi. I was expecting that you could, but … [LAUGHTER]

SELLEN: Sorry.

HUIZINGA: Sorry! But yeah, I mean, do you see the ancillary industries, we’ll call them, in on this? And how can, you know, sort of, a lab in Cambridge, and labs around the world that are doing AI, how can they spread out to incorporate these other things to help the people who know nothing about what’s going on in your lab move forward here?

SELLEN: Well, I think, you know, there are lots of people that we need to talk to and to take account of. The word stakeholder … I hate that word stakeholder! I’m not sure why. [LAUGHTER] But anyway, stakeholders in this whole AI odyssey that we’re on … you know, public perceptions are one thing. I’m a member of a lot of societies where we do a lot of outreach and talks about AI and what’s going on, and I think that’s really, really important. And get people excited also about the possibilities of what could happen.

HUIZINGA: Yeah …

SELLEN: Because I think a lot of the media, a lot of the stories that get out there are very dystopian and scary, and it’s right that we are concerned and we are alert to possibilities, but I don’t think it does anybody any good to make people scared or anxious. And so I think there’s a lot we can do with the public. And there’s a lot we can do with, when I think about the future of work, different domains, you know, and talking to them about their needs and how they see AI fitting into their particular work processes.

HUIZINGA: So, Abi, we’re kind of [LAUGHS] dancing around these dystopian narratives, and whether they’re right or wrong, they have gained traction. So it’s about now that I ask all of my guests what could go wrong if you got everything right? So maybe you could present, in this area, some more hopeful, we’ll call them “-topias,” or preferred futures, if you will, around AI and how you and/or your lab and other people in the industry are preparing for them.

SELLEN: Well, again, I come back to the idea that the future is all around us to some extent, and we’re seeing really amazing breakthroughs, right, with AI. For example, scientific breakthroughs in terms of, you know, drug discovery, new materials to help tackle climate change, all kinds of things that are going to help us tackle some of the world’s biggest problems. Better understandings of the natural world, right, and how interventions can help us. New tools in the hands of low-literacy populations and support for, you know, different ways of working in different cultures. I think that’s another big area in which AI can help us. Personalization—personalized medicine, personalized tutoring systems, right. So we talked about education earlier. I think that AI could do a lot if we design it right to really help in education and help support people’s learning processes. So I think there’s a lot here, and there’s a lot of excitement—with good reason. Because we’re already seeing these things happening. And we should bear those things in mind when we start to get anxious about AI. And I personally am really, really excited about it. I’m excited about, you know, what the company I work for is doing in this area and other companies around the world. I think that it’s really going to help us in the long term, build new skills, see the world in new ways, you know, tackle some of these big problems.

HUIZINGA: I recently saw an ad—I’m not making this up—it was the quote-unquote “productivity app,” and it was simply a small wooden box filled with pieces of paper. And there was a young man who had a how-to video on how to use it on YouTube. [LAUGHS] He was clearly born into the digital age and found writing lists on paper to be a revolutionary idea. But I myself have toggled back and forth between what we’ll call the affordances of the digital world and the familiarity and comfort of the physical world. And you actually studied this and wrote about it in a book called The Myth of the Paperless Office. That was 20 years ago. Why did you do the work then, what’s changed in the ensuing years, and why in the age of AI do I love paper so much?

SELLEN: Yeah, so, that was quite a while ago now. It was a book that I cowrote with my husband. He’s a sociologist, so we, sort of, came together on that book, me as a psychologist and he as a sociologist. What we were responding to at the time was a lot of hype about the paperless office and the paperless future. At the time, I was working at EuroPARC, you know, which is the European sister lab of Xerox PARC. And so, obviously, they had big investment in this. And there were many people in that lab who really believed in the paperless office, and lots of great inventions came out of the fact that people were pursuing that vision. So that was a good side of that, but we also saw where things could go horribly wrong when you just took a paper-based system away and you just replaced it with a digital system.

HUIZINGA: Yeah …

SELLEN: I remember some of the disasters in air traffic control, for example, when they took the paper flight strips away and just made them all digital. And those are places where you don’t want to mess around with something that works.

HUIZINGA: Right.

SELLEN: You have to be really careful about how you introduce digital systems. Likewise, many people remember things that went wrong when hospitals tried to go paperless with health records being paperless. Now, those things are digital now, but we were talking about chaos earlier. There was a lot of chaos on the path. So what we’ve tried to say in that book to some extent is, let’s understand the work that paper is doing in these different work contexts and the affordances of paper. You know, what is it doing for people? Anything from, you know, I hand a document over to someone else; a physical document gives me the excuse to talk to that person …

HUIZINGA: Right…

SELLEN: … through to, you know, when I place a document on somebody’s desk, other people in the workplace can see that I’ve passed it on to someone else. Those kind of nuanced observations are useful because you then need to think, how’s the digital system going to replace that? Not in the same way, but it’s got to do the same job, right. So you need to talk to people, you need to understand the context of their work, and then you need to carefully plan out how you’re going to make the transition. So if we just try to inject AI into workflows or totally replace parts of workflows with AI without a really deep understanding of how that work is currently done, what the workers get from it, what is the value that the workers bring to that process, we could go through that chaos. And so it’s really important to get social scientists involved in this and good designers, and that’s where the, kind of, multidisciplinary thing really comes into its own. That’s where it’s really, really valuable.

HUIZINGA: Yeah … You know, it feels super important, that book, about a different thing, how it applies now and how you can take lessons from that arc to what you’re talking about with AI. I feel like people should go back and read that book.

SELLEN: I wouldn’t object! [LAUGHTER]

[MUSIC BREAK]

HUIZINGA: Let’s talk about some research ideas that are on the horizon. Lots of research is basically just incremental building on what’s been done before, but there are always those moonshot ideas that seem outrageous at first. Now, you’re a scientist and an inventor yourself, and you’re also a lab director, so you’ve seen a lot of ideas over the years. [LAUGHS] You’ve probably had a lot of ideas. Have any of them been outrageous in your mind? And if so, what was the most outrageous, and how did it work out?

SELLEN: OK, well, I’m a little reluctant to say this one, but I always believed that the dream of AI was outrageous. [LAUGHTER] So, you know, going back to those early days when, you know, I was a psychologist in the ’80s and seeing those early expert systems that were being built back then and trying to codify and articulate expert knowledge into machines to make them artificially intelligent, it just seemed like they were on a road to nowhere. I didn’t really believe in the whole vision of AI for many, many years. I think that when deep learning, that whole revolution’s kicked off, I never saw where it was heading. So I am, to this day, amazed by what these systems can do and never believed that these things would be possible. And so I was a skeptic, and I am no longer a skeptic, [LAUGHTER] with a proviso of everything else I’ve said before, but I thought it was an outrageous idea that these systems would be capable of what they’re now capable of.

HUIZINGA: You know, that’s funny because, going back to what you said earlier about your stepdad walking you around and asking you how you’d codify a human into a machine … was that just outrageous to you, or is that just part of the exploratory mode that your stepdad, kind of, brought you into?

SELLEN: Well, so, back then I was quite young, and I was willing to believe him, and I, sort of, signed up to that. But later, especially when I met my husband, a sociologist, I realized that I didn’t agree with any of that at all. [LAUGHTER] So we had great, I’ll say, “energetic” discussions with my stepdad after that, which was fun.

HUIZINGA: I bet.

SELLEN: But yeah, but so, it was how I used to think and then I went through this long period of really rejecting all of that. And part of that was, you know, seeing these AI systems really struggle and fail. And now here we are today. So yeah.

HUIZINGA: Yeah, I just had Rafah Hosn on the podcast and when we were talking about this “outrageous ideas” question, she said, “Well, I don’t really see much that’s outrageous.” And I said, “Wait a minute! You’re living in outrageous! You are in AI Frontiers at Microsoft Research.” Maybe it’s just because it’s so outrageous that it’s become normal?

SELLEN: Yeah …

HUIZINGA: And yeah, well … Well, finally, Abi, your mentor and adviser, Don Norman … you referred to a book that he wrote, and I know it as The Design of Everyday Things, and in it he wrote this: “Design is really an act of communication, which means having a deep understanding of the person with whom the designer is communicating.” So as we close, I’d love it if you’d speak to this statement in the context of AI, Cognition, and the Economy. How might we see the design of AI systems as an act of communication with people, and how do we get to a place where an understanding of deeply human qualities plays a larger role in informing these ideas, and ultimately the products, that emerge from a lab like yours?

SELLEN: So this is absolutely critical to getting AI development and design right. It’s deeply understanding people and what they need, what their aspirations are, what human values are we designing for. You know, I would say that as a social scientist, but I also believe that most of the technologists and computer scientists and machine learning people that I interact with on a daily basis also believe that. And that’s one thing that I love about the lab that I’m a part of, is that it’s very interdisciplinary. We’re always putting the, kind of, human-centric spin on things. And, you know, Don was right. And that’s what he’s been all about through his career. We really need to understand, who are we designing this technology for? Ultimately, it’s for people; it’s for society; it’s for the, you know, it’s for the common good. And so that’s what we’re all about. Also, I’m really excited to say we are becoming, as an organization, much more globally distributed. Just recently taken on a lab in Nairobi. And the cultural differences and the differences in different countries casts a whole new light on how these technologies might be used. And so I think that it’s not just about understanding different people’s needs but different cultures and different parts of the world and how this is all going to play out on a global scale.

HUIZINGA: Yeah … So just to, kind of, put a cap on it, when I said the term “deeply human qualities,” what I’m thinking about is the way we collaborate and work as a team with other people, having empathy and compassion, being innovative and creative, and seeking well-being and prosperity. Those are qualities that I have a hard time superimposing onto or into a machine. Do you think that AI can help us?

SELLEN: Yeah, I think all of these things that you just named are things which, as you say, are deeply human, and they are the aspects of our relationship with technology that we want to not only protect and preserve but support and amplify. And I think there are many examples I’ve seen in development and coming out which have that in mind, which seek to augment those different aspects of human nature. And that’s exciting. And we always need to keep that in mind as we design these new technologies.

HUIZINGA: Yeah. Well, Abi Sellen, I’d love to stay and chat with you for another couple hours, but how fun to have you on the show. Thanks for joining us today on Ideas.

SELLEN: It’s been great. I really enjoyed it. Thank you.

[MUSIC]

The post Ideas: Designing AI for people with Abigail Sellen appeared first on Microsoft Research.

What’s Your Story: Jacki O’Neill

May 16, 2024

by Alyssa Hughes Microsoft AI

Circle photo of Jacki O'Neill, director of the Microsoft Africa Research Institute (MARI), with a microphone in the corner on a blue and green gradient background

In the Microsoft Research Podcast series What’s Your Story, Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. A systems expert whose 10 years with Microsoft spans research and product, Gehrke talks to members of the company’s research community about what motivates their work and how they got where they are today.

In this episode, Gehrke is joined by Jacki O’Neill, director of Microsoft Research Africa, Nairobi (formerly the Microsoft Africa Research Institute, or MARI) in Kenya. O’Neill pitched the idea for the lab after seeing an opportunity to expand the Microsoft research portfolio. She shares how a desire to build tech that can have global societal impact and a familial connection to the continent factored into the decision; how a belief that life is meant to be exciting has allowed her to take big personal and professional swings; and how her team in Nairobi is applying their respective expertise in human-computer interaction, machine learning, and data science to pursue globally equitable AI.

To learn more about the global impact of AI, efforts to make AI more equitable, and related topics, register for Microsoft Research Forum (opens in new tab), a series of panel discussions and lightning talks around science and technology research in the era of general AI.

Photos of Jacki O'Neill, director of the Microsoft Africa Research Institute (MARI), throughout her life.

Learn more:

Editor’s note, May 16, 2024 – Since the recording of this podcast episode, the name of the Microsoft Africa Research Institute (MARI) has changed. The name of the lab is now Microsoft Research Africa, Nairobi.

Transcript

[TEASER]

[MUSIC PLAYS UNDER DIALOGUE]

JACKI O’NEILL: I love living in different places, and those experiences are what help us innovate better and design things that are, like, taking another point of view, more creative, I think. Just sparks things in your, in your head. And, I mean, it’s so much fun.

[TEASER ENDS]

JOHANNES GEHRKE: Microsoft Research works at the cutting edge. But how much do we know about the people behind the science and technology that we create? This is What’s Your Story, and I’m Johannes Gehrke. In my 10 years with Microsoft, across product and research, I’ve been continuously excited and inspired by the people I work with, and I’m curious about how they became the talented and passionate people they are today. So I sat down with some of them. Now, I’m sharing their stories with you. In this podcast series, you’ll hear from them about how they grew up, the critical choices that shaped their lives, and their advice to others looking to carve a similar path.

[MUSIC FADES]

In this episode, I’m talking with Jacki O’Neill, director of the Microsoft Africa Research Institute—or MARI, for short—in Nairobi, Kenya. Jacki’s decadelong career at Microsoft began at the company’s India research lab, where she applied her ethnographic and human-computer interaction expertise to advancing equity in the country.

After the opening of two Microsoft software engineering centers in Africa, Jacki made the case for a research lab on the continent. She now leads the MARI team in making technology more inclusive, a role that allows her to pursue her goal of positive local change with global impact. Here’s my conversation with Jacki, beginning with her time growing up in Plymouth, England.

GEHRKE: We just had a discussion maybe a couple of years ago, right, when you were just in transition to Africa. So it’s really great to have you here and both learn a little bit what’s happening there, but also to learn a bit more about your story. Where did you grow up, and how did you end up here at Microsoft?

O’NEILL: Yeah, thanks for asking that. I’ve had a very, well, it’s definitely not been a straight road to get here, but the windy roads are the most interesting ones. I grew up in Plymouth, which is a dockyard and naval town in the southwest of England, so a socially deprived working-class town. So when I was growing up, it was a thriving working-class town, but of course with those industries, you know, they didn’t, they didn’t pass so well through those years. So, you know, by the time I was leaving school, it was quite a deprived city and still is. I think that it’s really important to be in those type of places, though, because you get a very rich view of life, and I left them as soon as I could, [LAUGHS] so …

GEHRKE: When you went to university?

O’NEILL: Went to, well, I went and I was a cook for a year in the Lake District, which is a very beautiful part of the UK, and then went to university.

GEHRKE: Where’s the Lake District?

O’NEILL: It is northwest, and it’s all hills. It’s, like, Wordsworth Country. It’s all hills and poetry and beautiful houses. And, yeah, it was a fantastic time working as a cook there. And then I went to Manchester to do my degree.

GEHRKE: OK. And what is your degree in?

O’NEILL: Ah, so, yes, I had, I did a social science degree to start with. I started at the time when you could get a degree in anything and get any job at the end of it. But by the time I came out of my degree, it was a recession.

GEHRKE: But did you have, did you have specific plans while you were studying of what you want, you know, what profession you wanted to go into?

O’NEILL: Not really. I didn’t. I think I’d, I think like many young people, I didn’t really know, but I felt that I would find something interesting when I came out. And then, you know, I just worked lots of different jobs. [LAUGHS]

GEHRKE: What is your favorite college course?

O’NEILL: My favorite college course—in my degree? Gosh, that’s a good question. It was all so long ago. [LAUGHS]

GEHRKE: OK …

O’NEILL: My favorite, I guess, yeah, no, I, so, I did … my degree was in psychology. I worked, and then I did my master’s in computer science and then my PhD in human-computer interaction.

GEHRKE: That’s quite a change, right, from psychology into computer science, then.

O’NEILL: Yes, yes. And I just, you know, I’d always just wanted to do computing, but when I was at school, it was … we had one computer in the school, and so it was, like, a computer at home or you don’t do computer science. So, you know, I didn’t do it.

GEHRKE: Right.

O’NEILL: So then as computers became more prominent, more available, you know, I was working in libraries, and they started computerizing, and I worked on that project, and then that led me to do a master’s. And so I was like, hey, this is the opportunity to really get into this area, and I loved it. It was fantastic. And Manchester’s computer science department is one of the top departments, and I had an amazing … Carole Goble was my thesis supervisor. She was absolutely amazing and strong for women in computing. But at the end of it, I was like, OK, so I didn’t want to do pure social science and I didn’t want to do pure computer science. What I want to do is do human-computer science, so where you really merge the two. And that’s how I got into HCI, and I think that’s where I started finding my favorite courses. You know, I loved the research methods. I loved those types of things.

GEHRKE: And what is your PhD about?

O’NEILL: Ooh, it was very boring. [LAUGHTER] My PhD was in computer-supported cooperative work [CSCW], and …

GEHRKE: OK. Oh, yeah. Very relevant now, right?

O’NEILL: Yeah, very relevant now. And that was a really exciting time for CSCW, as well, because there were so many different labs. There were Sun Systems, there was Xerox, there was Microsoft—all doing really cool, like, collaborative technologies. So it seemed like a brilliant area to go into. But I was looking at, can we support networking events for businesses?

GEHRKE: Wow. Uh-huh …

O’NEILL: So it was just at the time of the first, you know, things like Webex and things, you know, the first collaborative seminar-y …

GEHRKE: Yeah, so you’re way ahead of the social networks, right, and everything, right?

O’NEILL: Yeah, yeah.

GEHRKE: And there was a whole conference at that point in time, right? CSCW, I think I remember. Wasn’t there …

O’NEILL: Yes, yes, yes.

GEHRKE: So it was and still is, I think, a really big field.

O’NEILL: Yes, it’s a, it’s a, it’s really interesting. And I think one of the things that’s interesting with the foundational models now is many of the things that people like me, HCI people, have been wanting to happen—”Oh, if only we can enable people to interact with technology like this”—are now suddenly possible, which is quite exciting.

GEHRKE: Yeah, so we’ll get to that in a little bit because I think, you know, as you said, the whole field of HCI is now changing with foundational models and what the interfaces are, will be. I think it’s a really interesting, deep research question right now. So, so, OK, so you got your PhD; you’re in Manchester. What’s the next step in your career? Where did you go next?

O’NEILL: Yeah, I actually got a job before I finished my PhD. So I took quite a long time to do my PhD. I think it was seven years in the end, partly because I was teaching. When I was doing—like, lecturing when I was doing my PhD, and I also had a job as a consultant occasionally, working with, I think, I worked with the Co-op Bank. I worked with some usability companies, and you could, I could make enough money to live for a term on, like, two weeks’ consultancy because I didn’t have very high costs. [LAUGHS]

GEHRKE: Right. You lived as a grad student, right?

O’NEILL: Yes. Yeah. Yeah. And, actually, you know, I was living in Manchester. I was living in a squat, so I wasn’t paying any rent, [LAUGHS] so …

GEHRKE: Oh, really?

O’NEILL: Yes. So I didn’t have very many costs.

GEHRKE: OK.

O’NEILL: Which was very handy. So I didn’t have any real incentive to finish my PhD until I got a job, you know. When I finished my master’s, I looked at the job market, and with my computer science master’s, the main job was database manager, [LAUGHS] which didn’t appeal.

GEHRKE: That sounds now really interesting. [LAUGHTER]

O’NEILL: Yeah. So I, actually, that’s why I ended up doing a PhD, because I was like, I don’t want to go back to work yet. You know, I’ve been working for five years before. So, so, yeah, I just was enjoying doing a PhD and doing pieces of work here and there. And then I got a job at Xerox in Cambridge, and then that’s when I got motivated to finish my PhD because working and doing a PhD at the same time is not much fun.

GEHRKE: Right, right. So you got your PhD, had your job lined up, and then you’re starting at Xerox. What were you doing in Xerox?

O’NEILL: Human-computer interaction. Yeah, it was a really exciting time. There was so much going on in the industry. I was so delighted. It was like my dream job to be in industry and to maybe create cool interfaces and, you know, cool collaborative systems. So … and then they closed the lab [LAUGHS] within six months. It wasn’t my fault.

GEHRKE: So quickly?

O’NEILL: Mm-hmm.

GEHRKE: Wow. And what did you do then? I mean, this is your first big job, and …

O’NEILL: Yes …

GEHRKE: … such a quick setback.

O’NEILL: They offered me a job in their lab in France. So I stayed in the UK for a while and worked half in France, half in the UK, and then I shifted to France full time.

GEHRKE: OK. Oh, wow. So do you … where in France did you live then?

O’NEILL: Grenoble.

GEHRKE: OK, yeah. In the middle of …

O’NEILL: In the French Alps.

GEHRKE: … the French Alps. Exactly. Beautiful place.

O’NEILL: Absolutely … yes. Yeah. Skiing, climbing, hiking. So much fun.

GEHRKE: And, OK, so you’re at Xerox PARC in the French Alps. What’s, what’s next?

O’NEILL: They were opening, Xerox was opening a research lab in India. And I’d always wanted to travel. You know, I’d always wanted … and I never really had the money or the opportunity to travel. So when they said they were opening it, I just went to my boss and said, hey, I don’t know what you’d want me to do, but if there’s any opportunities for me to do anything to help …

GEHRKE: Wow.

O’NEILL: … the opening of India, I’d love to. And I went out for a month and then I went out for three months.

GEHRKE: I mean, both of these sound like really bold steps to me. First of all, I mean, Grenoble is probably pure French speaking, right? And, I don’t know, did you have high school French or you were good … [LAUGHS]

O’NEILL: I had high school French, yes, and then we drove, we drove from the UK to Grenoble listening to “learn French” tapes [LAUGHS] …

GEHRKE: OK, wow … [LAUGHS]

O’NEILL: …in the car. Yeah.

GEHRKE: Wow. And that was enough then to get by with a daily …

O’NEILL: Actually, so it was great in France because they expect you to learn the language, so you have French lessons at work. And then, actually, I did an evening class, as well, that was paid for by work, a really intensive one-month, like two hours a night, every night of the week. And that really helped. Yeah, it was, it’s fantastic.

GEHRKE: Wow, that’s really great. And then, and then you took the even bigger step to move to India, right. How was that like, and what was your experience there?

O’NEILL: Yeah, India is just magical. You know, initially, I just went for one month, then three months, and it was just—the people, the culture, the work I was doing, the research I was doing was like no research … you know, I’d spent a lot of time in call centers around Europe doing studies, ethnographic studies, and designing technology. Lots of time looking at photocopiers because I was with Xerox. [LAUGHS] And then so going to India, suddenly, you know, I’m looking at social enterprises. I’m looking at all sorts of businesses and different ways of life and different people. And it was just so rich and so amazing that I was like, OK, I really want to do this. And that’s actually when I applied to Microsoft because Microsoft had the Technology for Emerging Markets group there, which is world-class research in that space. So I was like, OK, if I want to keep on doing this, then that’s what I’m going to apply to. And luckily enough, I got the job, and that’s how I joined Microsoft.

GEHRKE: Wow. So, so, OK, so you’re now at Microsoft in India. That was in Bangalore, right, where our research lab there is?

O’NEILL: Mm-hmm.

GEHRKE: And so what, what were you working on there for the next few years?

O’NEILL: Yeah. So initially, I looked at a few different things. I joined some existing projects. So I was on MEC, which was the educational platform, looking at whether we could bring the power of MOOCs [Massive Open Online Courses] to Indian education to improve the level of education because they have amazing colleges at the top, but, actually, the vast majority of students go to these intermediate colleges, and the teaching level really varies. And so the idea was, can you help with blended learning? Can you help the teachers teach better? That turns out to be really challenging. And, actually, the system ended up being used by the students to teach themselves.

GEHRKE: Oh, like for independent learning?

O’NEILL: Mm-hmm. Mm-hmm. And that was really, so that was interesting, doing some studies there. I looked at … Indrani [Medhi Thies] had done an amazing project where they’d built “Facebook for Farmers.” So I did a study of that, which was really, really fun. And then I worked in financial inclusion, one of my big areas. I spent about five years working with auto-rickshaw drivers in Bangalore, designing technologies to help them understand the loans they’d taken out, which was really, really fun. They’re a very great community to work [with]. You don’t get any nonsense from an auto-rickshaw driver. [LAUGHS]

GEHRKE: Well, I was just thinking, what was it like to, like, live in India and just move there and start out there?

O’NEILL: Uh, it was, I mean, it was fantastic. It’s a great place to live. The people are amazing. The food is amazing. Moving with Microsoft makes it very easy because Microsoft takes care of you when you move so you’re not, you know, some of the stresses that you might have around the move are taken care of. I had a young family. I had a 2-year-old son when we moved out there and within a year had another one, which was not 100 percent planned, because you don’t usually move to a new company and then have a baby. You’re like, oh, sorry. [LAUGHS] But that was all fine. Yeah.

GEHRKE: And, and, you know, you worked with all of these different communities in India, right. How did you connect to the communities? I mean, these were teachers …

O’NEILL: Yeah, you need to, you really need to go with people, so you have to convince some organization that what you’re going to do is going to be beneficial to them and useful for them. And then if they’re trusted by the community, they give you access. And that’s really great because you do have access that you wouldn’t otherwise have. You know, if you’re really wanting to build technologies to support people, you really need to understand what they care about—what do they want help with?—and you only get that if you’ve got a trusted relationship with them. So we worked with, there was one organization that worked with the auto-rickshaw drivers’ wives. It was about empowering women, and we got access to the drivers initially through that organization.

GEHRKE: That’s amazing. I mean, you know, I’ve visited India many times, but I can only imagine how it is to live there, actually. So do you have some of the stories of what is, sort of, most surprising for you given that you’ve lived there?

O’NEILL: Yeah … what’s most surprising? I think, so one thing is, one thing is people want to tell you what they think you want to hear. So if you’re lost, you need to ask quite a few people for directions and then make some sort of assessment about whether the person was just saying “yes, yes, that way” because he knew the way or “yes, yes that way” because he just didn’t want to tell you that he didn’t know. And so you have to, sort of, judge. [LAUGHS] So that’s one, like, useful piece of …

GEHRKE: So the first few times you went in the wrong direction? [LAUGHS]

O’NEILL: Yes, exactly. And then you’re like, “But they said …”; you ask someone else, and they’re like, “No, it’s over there.” And then someone … so that’s … the most useful piece of advice I could give to anyone who’s visiting India, is when you cross the road, just find someone else who’s already crossing the road and cross with them.

GEHRKE: Because it’s so dangerous if you go by yourself potentially?

O’NEILL: Yes, yeah. You get used to it quite quickly, and there’s obviously something that changes in you when you’ve been there a while. You know, when you first go there, all the auto-rickshaw drivers are going to overcharge you and drive around the block twice and all of those things. And I find after about four to five weeks when you’ve been there, they know, like, there must be something that changes in your attitude because they actually know that you’re there longer term and you’re not going to take any nonsense.

GEHRKE: So, so do you behave differently? What’s the change there?

O’NEILL: I don’t know. That’s, I’ve tried to think about this, but I think, I don’t know, it must be just an air of confidence or an air of certainty or something. But, yeah, it’s like something just clicks or changes.

GEHRKE: That’s so interesting. Is it only for the drivers, or is it in other aspects of your life, as well, where, sort of, you get treated differently because you suddenly have become a native?

O’NEILL: I think you notice it most in the drivers because they’re the ones that you’re interacting so much with to get about, you know, to get … you’re always getting a tuk-tuk to go from here to there. And they really do, you know, if they can make extra money out of you, they are going to make extra money out of you.

GEHRKE: They smell it, that you’re a tourist.

O’NEILL: Yeah, yeah, yes. [LAUGHS]

GEHRKE: And then so you were in India and then another opportunity came along. So tell us a little bit about that opportunity, where you ended up now.

O’NEILL: Yes, yes. So when I heard that the ADCs were opening—the Africa Development Center, so our software engineering center in Nairobi and Lagos—I thought that that was a great time to pitch for research in Africa for Microsoft. It seemed like a bit of a hole in our portfolio. I have family connections to Africa. So, actually, one of the reasons for joining Microsoft was partly because I thought there might be opportunities eventually in Africa because we had a great Africa startup program, for example. So, you know, but there wasn’t any research there. And so when I heard the ADCs were open, I just put together a, like, pitch for setting up research in Africa within the ADCs, and, you know, all sorts of people really helped me hone that pitch. And then I flew at the end of February 2020. I flew …

GEHRKE: Oh, just right before the pandemic.

O’NEILL: Mm-hmm. I flew to … I was in Barcelona for a Future of Work event, and then I flew to Nairobi and then Lagos to meet the people who were running the ADCs and to think about where, which one I would want to set up research in if such a thing were to happen. And I did that. I decided that Nairobi was the right one. And when I went there, Jack Ngare ran the ADC, and he was so enthusiastic about having research there. So I did a pitch and got some funding just—I think if it had been two weeks later, I’m not sure. But, you know, it was just before we knew how bad COVID was going to be, so I was very lucky with timing.

GEHRKE: And, I mean, you’ve made these amazing moves throughout your career, right. You, sort of, raised your hand for India when the lab was open; now here in Africa. Why, and how? I’m just, I mean, so curious because people make the most unexpected turns in their careers from time to time. But it’s more like because, you know, they lose their current job or they, their manager moves away and they really think about their career. But you, like, raise your hand from time to time and make these really bold and amazing moves.

O’NEILL: Yeah, I mean, life’s meant to be exciting, isn’t it?

GEHRKE: OK …

O’NEILL: I think. You know, life’s meant to be exciting. I love living in different places and, you know, as an ethnographer, as a person interested in human-computer interaction, it’s, like, those experiences are what help us innovate better and design things that are, like, taking another point of view, more creative, I think. Like, just sparks things in your, in your head. And, I mean, it’s so much fun. Like, I don’t understand why everyone doesn’t do it. [LAUGHS]

GEHRKE: So it’s just really amazing. So if I think about, you know, India, where you said, right, the experience for you was that the drivers were treating you suddenly differently. Did you have a similar experience in Africa, or what is one of the or a few of the defining experiences and stories there?

O’NEILL: Yeah, I think … so the animals are amazing in Kenya. They’ve done such an amazing job at conservation. I imagine that they would, you would only see, like, these big animals in the national parks, but—they’re not everywhere. They’re not going to be, you’re not going to find a hippo walking down the road in Nairobi. But they are all over the place. So you can go camping in Lake Naivasha, which is just an hour and a half from Nairobi, and I was camping with a friend, and the kids were in their tent, and my friend was in her tent, and I was just sitting by the fire. It’s about 10 o’clock. I said, yeah, I might go to bed in a minute. And then I just heard this snort, and I get up with my torch, and I look, and there’s a hippo, [LAUGHS] like, probably less than a meter and a half …

GEHRKE: Wow …

O’NEILL: … away from me. So I carefully went and sat back down by the fire and waited for a while before I moved. [LAUGHS]

GEHRKE: So are they dangerous in that aspect, if you’ve startled them or so … ?

O’NEILL: Yeah, I think … they say that you should never get between a hippo and the water. So, luckily, I was on the other side of the, [LAUGHS] of the hippo and the water. But they are big. I mean, they can be very grumpy.

GEHRKE: And so you should, just, shouldn’t startle them or … ? I’m just trying to understand what’s the recommended behavior. Don’t get between the hippo and the water.

O’NEILL: Yes, that’s recommended, and don’t, yeah, don’t startle them, and just, you know, stay very, stay very calm. So, actually, when you’re camping, if you don’t have an electric fence around the campsite, then you shouldn’t come out of your tent at night. So don’t drink too much beer before you go to bed, [LAUGHTER] because it’s the “zip.” When you unzip it, you can really startle … If there’s any wild animals, lions, or whatever around, then you can really scare them. And you don’t want to scare a lion.

GEHRKE: Yeah, I was thinking, just, actually, about the lions or so, right. I mean, they could be probably even more dangerous than the hippos or, or not really?

O’NEILL: Hippos are actually more dangerous than lions. Yeah, lions will generally not attack you. And apparently, the thing—I haven’t had to try this, I’m glad to say—but the thing you should do if you encounter a lion is just look them in the eye, and then they’ll go off.

GEHRKE: Stare them down.

O’NEILL: Mm-hmm.

GEHRKE: OK.

O’NEILL: I hope I never have to try that because they are quite scary … [LAUGHS]

GEHRKE: I hope I never have to do that but good advice …

O’NEILL: Yes, yeah, yeah. I think hippos are more likely to charge at you. Like, a lion’s more likely to go off in the other direction.

GEHRKE: And what’s the daily life like, you know, living in Nairobi, right? I mean, is it, I mean, it must be very, very different from living in both India, as well as, you know, Great Britain or here.

O’NEILL: Yeah. I mean it is very different. The traffic’s bad but not as crazy as India. Like, I drive in Kenya. I didn’t drive in India because it was a bit too scary with the bikes and everything. It’s a really, it’s a really nice pace, I think, in Nairobi. It’s a beautiful city. There’s nightlife, and there’s cafes and restaurants, but you’ve got countryside so close. You know, compared to Bangalore, it’s quite a small city. And the weather is amazing, and the people are really friendly and kind, and, you know, it’s just, it’s a very nice, it’s a very nice place to live.

GEHRKE: That’s amazing, and you now are leading the Microsoft Africa Research Institute there, right?

O’NEILL: Yes.

GEHRKE: What is the focus of the institute, and what are you studying there?

O’NEILL: Mm-hmm. Yeah, we’re mainly focused on foundational models. It won’t be a surprise to anybody. [LAUGHS] Which actually, you know, it’s worked out very well for us because, you know, we have a mixed disciplinary team. We have HCI and AI and ML and data science.

GEHRKE: And all local?

O’NEILL: All local. Yeah. And, yeah, we’re looking at multilingual languages in models. So we’re working with MSR [Microsoft Research] India, thinking about how can you benchmark these models for different languages. And we’re thinking all the way along the scale from your high-resource, you know, French and German, to your mid-resource Swahili, Hindi, all the way to your low-resource languages because, you know, the vast majority of training data is in English. So we’ve been working a lot. That’s nice because we’re having, you know, in a very short amount of time, you know, four or five months, we’re having both scientific impact with papers but also product impact, working with the Copilot Language Globalization team as they’re rolling out Copilot in different languages.

GEHRKE: I see. So the research that you have will go into, let’s say, Word or PowerPoint or so to make it available in some of the languages from the continent.

O’NEILL: Yes, exactly. Because it’s not just about translation. It’s also if you think about RAI, responsible AI, you know, a lot of that is language based. And so how do … you can’t just translate this to words. You have to find the right list of words in those languages. And then what about things like tone and stuff? So that’s one area. And then related to that, it’s in a much bigger space of equity, the models and equity. You know, what’s going to happen to the digital divide with these models? In some ways, you could imagine that they may be flattening it, but in other ways, they could be increasing it. So we really are trying to map out how … the different elements of the digital divide as it plays out in these models. Because you obviously have your traditional things like access to devices, access to, you know, infrastructure, and things like that. But there’s also the data divide. So not only is most of the training material in English; it’s also mostly from America and the Global North. So it embodies very particular world views. And if you think about data on Africa, data on Africa tends to be collected by particular organizations. So there’s lots of data on poverty and disease and forced migration and things like that. Not much data on, like, the stories, the creativity, wealth, innovation. So what does that mean? Even if the models can speak perfectly, which they can’t yet, but they’ll eventually get quite good at, you know, even smaller languages like Luo, if that model is just translating English content into Luo, that’s not necessarily what we want from a model. So there’s some really interesting questions there to be answered.

GEHRKE: Well, it seems to me like it’s clearly also a question of, like, getting the right kind of data. So where do you get the data, and how do you get the data?

O’NEILL: Yeah, that’s a big question. And it was already a challenge, you know, before these models. You know, many people have been working with Masakhane, which is one of the African NLP communities which is around creating datasets in African languages for training the models. So that was, you know, getting good quality training data is already a challenge. Sriram [Rajamani] from MSR India, though, was telling me of a really interesting project they’ve got going on in India with the Indian government where they are trying to collect data from each region of India so that they can use it to train the OpenAI models, which would be really cool. And we should think about, is that what we can do for different African countries and contexts?

GEHRKE: Exactly. It seems to be very much like a citizen science project, right, where you, sort of, involve the citizens that speak different dialects and then involve them in collecting the right kind of data.

O’NEILL: Yeah, yeah. And maybe collecting the stories, you know, and the cultural attributes and assets from different places.

GEHRKE: That’ll be really, really exciting probably also about preservation of the culture and history, right.

O’NEILL: Yes, yes. But challenging.

GEHRKE: But challenging. [LAUGHTER]

O’NEILL: Yeah.

GEHRKE: So that’s one big aspect of the work. Anything else that’s happening there?

O’NEILL: Yeah. So we’re doing a lot of work, you’ll be unsurprised to hear, on Future of Work and AI. And so we’ve got a project on modern work and LLMs, so looking at the work that enterprise workers, frontline and knowledge workers, are doing and then what bits of their job they would like to get rid of if they could and what bits they would keep and how we can use LLMs to support them. And we’ve also, like, Maxamed [Axmed] on my team, also worked with The Garage to train them up in foundational models, both the LLMs and the vision models, and then they’ve introduced them to a whole load of small businesses in Kenya.

GEHRKE: Oh, wow.

O’NEILL: So that’s really interesting. You got everyone from like car salespeople to lawyers who are now using, like, LLMs as part of their everyday work, which is amazing.

GEHRKE: As part of like composing messages or part of … what’s …

O’NEILL: Yeah. Writing contracts, sales documents for cars, all sorts of really interesting things.

GEHRKE: Oh, wow.

O’NEILL: So we’re going to go out and look at what they’re doing and think about how, you know, what else is needed, what, what more do they need.

GEHRKE: What’s the prevalent form factor in terms of if I think about, like, a computer there? Is it my, is it a mobile phone? Is it a tablet?

O’NEILL: Yeah.

GEHRKE: It’s a mobile phone?

O’NEILL: It’s a mobile phone. Yeah.

GEHRKE: So you have to rethink also, probably, all the interfaces.

O’NEILL: Yes, I mean …

GEHRKE: You mentioned that early on, right, as you think about the next generation of HCI with AI in it, right.

O’NEILL: Yes, yes. I mean conversational interfaces. The idea that you can talk to your phone or enter existing text, you know. If you look at small businesses, a lot of their interactions with customers are on chat. If you can enter that chat into an LLM and extract structured data from it, then suddenly you’ve got all this data that’s been lost to the business becomes usable. So it’s a really exciting space, and I think voice interfaces are going to become really, really, really big. And that’s why there’s opportunities for leapfrogging, because suddenly everyone with a mobile phone potentially has a really powerful office productivity tool in their hand and can do things … you know, many of the small businesses, they don’t employ a designer; they don’t employ an accountant. But now they could maybe have an accountant or a designer in their pocket, which enables them to do more, which is definitely the more positive side of the future of work than some of the …

GEHRKE: Right. You know, this whole enablement story of people is just really amazing, what you can do with LLMs and especially with voice interfaces, as well. Let me conclude maybe with a question about your career. I mean, it seems like you’ve always amazingly managed to somewhat align your career moves with your passion. You moved to India because you’re just excited to live in India. You moved then to, you know, Microsoft Research, but then you moved to Africa again for, what I hear, is a little bit the adventure, as well, right?

O’NEILL: Yes.

GEHRKE: So what’s your advice for people who want to, sort of, align these two and who want to not only work but also want to work on something they’re really passionate about? How do you manage to create that alignment?

O’NEILL: That is a good question. I don’t know. It just, sort of, happens. I mean, I think you have to, you have to be passionate about it; you have to talk about it and decide what you want to do. You know, I never really imagined MARI would happen. But I just started talking to people, and people were saying, before I did the pitch, people were saying to me, oh, what would you like to do in five years, Jacki? And I was like, oh, you know what? If I had my way, I’d love to run a research center in Africa. And then within a couple of years … it was nothing more than an idea in my head. So I think that you have to have the ideas, verbalize it, and maybe it can happen.

GEHRKE: And why a research center in Africa? What’s personal for you there?

O’NEILL: So my children are African; my children are Cameroonian. So I wanted them to grow, spend some time on the continent, and, you know, as a family, we’d always had that idea of moving to the continent eventually. So that was part, that was a personal motivation in there as well as the passion. Yeah.

GEHRKE: So it’s, well, sort of, the confluence of, I guess, opportunity but then also drive on your side? Because that’s what I’ve heard. Very often in careers, that it’s not only about, well, this is what I finally want to do but also watching out for that opportunity.

O’NEILL: Yes.

GEHRKE: So it seems like that played a big role here, as well. And so when you heard about, you know, that there was an Africa Development Center, how did you, what were your next steps then? I mean, you must have been excited, but you also had to take some action.

O’NEILL: Yeah, I mean, I created, [LAUGHS] I created a small pitch, a small set of slides, and then I just started talking to everybody I knew who was doing anything. I didn’t have any contact with the ADCs.

GEHRKE: So you created that energy and excitement about it?

O’NEILL: I just started to, you know, every time anyone would come to India, you know, I was just like, oh, this is what I’d like to do. And you just almost talk it into being, I think.

GEHRKE: And were there some setbacks, or was it just like a straight line from, sort of, the excitement all the way up to realization?

O’NEILL: No, I mean, I didn’t, I don’t think I ever really imagined it would happen, you know. But you’re just doing it, and you’re plugging away, and then taking the, you know, taking the advice of people.

GEHRKE: Really an awesome story. So maybe as a last question, where do you see the center being in like three to five years? I mean, you’re starting off right now, but I’m sure you have really big ambitions for the center, and there’s so much to do on the whole continent.

O’NEILL: No, absolutely. I think that I have a few ambitions. So the most important, I think, I want it to be really established as this thing that’s really beneficial to Microsoft, that Microsoft is like, really, “Yeah, the guys at MARI, they’re doing great research. We really like them.” So that it, sort of, exists without me, you know. At the moment, I think I’m the driver of it. I would …

GEHRKE: So you want to grow the next generation that is basically going to be the next generation of leaders?

O’NEILL: Yes, exactly, exactly. And then I think also grow, I would love to help in growing Microsoft’s market in Africa. We don’t have a particularly big market in Africa, but I think there’s a lot of opportunity, especially now with these, with these large language models. I think that we … so that would be really exciting, you know, if we can help. I don’t see our success only being about growing the African market, but I think it’s part of what we can do, and if we can grow that market, as well as do research that’s relevant for Redmond and relevant globally, that’s really, that’s really exciting, I think, you know. So everything we do, I think, has to have a relevance globally. And I think, you know, at the beginning I was talking about different ways of viewing the world and how that leads to innovation. I think by having researchers who are African, based in Africa, doing this great research, we can create better products for everyone.

GEHRKE: That’s such a great finishing note. Thank you so much for the great conversation, Jacki.

O’NEILL: Thank you, Johannes. It’s been fun.

[MUSIC]

To learn more about Jacki or to see photos of Jacki living and working abroad, visit aka.ms/ResearcherStories (opens in new tab).

[MUSIC FADES]

The post What’s Your Story: Jacki O’Neill appeared first on Microsoft Research.

Research Focus: Week of May 13, 2024

May 15, 2024

by Brenda Potts Microsoft AI

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning

Large language models (LLMs) have shown remarkable performance in generating text similar to that created by people, proving to be a valuable asset across various applications. However, adapting these models to incorporate new, out-of-domain knowledge remains a challenge, particularly for facts and events that occur after the model’s training knowledge cutoff date.

In a recent paper: Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning, researchers from Microsoft investigate the effectiveness of supervised fine-tuning (SFT) as a method for knowledge injection in LLMs, specifically focusing on recent sporting events. They compare different dataset generation strategies—token-based and fact-based scaling—to create training data that helps the model learn new information. Their experiments on GPT-4 demonstrate that while token-based scaling can lead to improvements in Q&A accuracy, it may not provide uniform coverage of new knowledge. Fact-based scaling, on the other hand, offers a more systematic approach to ensure even coverage across all facts. The researchers present a novel dataset generation process that leads to more effective knowledge ingestion through SFT, and results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge.

Read the paper

NEW RESEARCH

A Reflection on Human-Notebook Experiences in the Era of AI

Computational notebooks provide an interactive way to work with data. They have been widely used by data professionals to write code, explore data, and generate visualizations, all in one document. Previous research has revealed unique pain points around the user experience in computational notebooks. However, as AI tools like ChatGPT or Copilot have emerged, it is unclear whether these pain points have been reduced or changed, or whether new pain points have arisen. Due to the fast pace of advances in AI technology, most of the development of new AI tools has been primarily driven by technology and not by user experience.

In a recent paper: A Reflection on Human-Notebook Experiences in the Era of AI, researchers from Microsoft summarize literature on how new AI technology has impacted human-notebook interaction and human-computer interaction (HCI) paradigms, new challenges and user behavior around using AI assistants, and recent research on AI assistants in computational notebook scenarios. They outline gaps in existing literature and suggest a future focus on improving macro human-notebook experiences throughout a user’s workflow, measuring and quantifying the value of AI systems, and establishing a set of standards and best practices for AI tools.

Read the paper

NEW RESEARCH

Jacdac: Service-Based Prototyping of Embedded Systems

The traditional approach to programming embedded systems is monolithic: firmware on a microcontroller contains both application code and the drivers needed to communicate with sensors and actuators, using low-level protocols such as I2C, SPI, and RS232. In comparison, software development for the cloud has moved to a service-based development and operation paradigm: a service provides a discrete unit of functionality that can be accessed remotely by an application, or other service, but is independently managed and updated.

In a recent paper: Jacdac: Service-Based Prototyping of Embedded Systems (opens in new tab), researchers from Microsoft propose, design, implement, and evaluate a service-based approach to prototyping embedded systems called Jacdac (opens in new tab). Jacdac defines a service specification language, designed especially for embedded systems, along with a host of specifications for a variety of sensors and actuators. With Jacdac, each sensor/actuator in a system is paired with a low-cost microcontroller that advertises the services that represent the functionality of the underlying hardware over an efficient and low-cost single-wire bus protocol. A separate microcontroller executes the user’s application program, which is a client of the Jacdac services on the bus.

Three Jacdac kits, comprising over twenty modules, have been produced by third-party manufacturers: KittenBot (opens in new tab) and Forward Education (opens in new tab).

Read the paper

NEW RESEARCH

PARIKSHA: A Scalable, Democratic, Transparent Evaluation Platform for Assessing Indic Large Language Models

Evaluation of multilingual LLMs is challenging due to a variety of factors – the lack of benchmarks with sufficient linguistic diversity, contamination of popular benchmarks into LLM pre-training data, and the lack of local, cultural nuances in translated benchmarks. Hence, it is difficult to extensively evaluate LLMs in a multilingual setting, leading to lack of fair comparisons between models and difficulties in replicating the evaluation setup used by some models. Recently, several Indic (Indian language) LLMs have been created to help build more locally and culturally relevant LLMs.

In a recent paper: PARIKSHA: A Scalable, Democratic, Transparent Evaluation Platform for Assessing Indic Large Language Models, researchers from Microsoft present an evaluation framework, which is the first comprehensive evaluation of Indic LLMs using a combination of human and LLM-based evaluation. The researchers conduct a total of 90,000 human evaluations and 50,000 LLM-based evaluations of 29 models to present leaderboards for 10 Indic languages. Pariksha provides inclusive evaluation by engaging a community of workers that represent India’s large and diverse workforce and also serves as a research platform for improving the process of evaluation. For transparency on the process, the evaluation artifacts will be released. Conducting Pariksha at regular intervals, the researchers aim to enable models to improve over time with insights and artifacts from their evaluations.

Read the paper

NEW RESEARCH

Tinker, Tailor, Configure, Customize: The Articulation Work of Customizing AI Fairness Checklists

Many responsible AI resources, such as toolkits, playbooks, and checklists, have been developed to support AI practitioners in identifying, measuring, and mitigating potential fairness-related harms. These resources are often designed to be general purpose, in order to address a variety of use cases, domains, and deployment contexts. However, this can lead to decontextualization, where such resources lack the level of relevance or specificity needed to use them.

To understand how AI practitioners might contextualize one such resource, an AI fairness checklist, for their particular use cases, domains, and deployment contexts, researchers from Microsoft conducted a retrospective contextual inquiry with 13 AI practitioners from seven organizations. In a recent paper: Tinker, Tailor, Configure, Customize: The Articulation Work of Customizing AI Fairness Checklists, they identify how contextualizing this checklist introduces new forms of work for AI practitioners and other stakeholders, while opening up new sites for negotiation and contestation of values in AI. The researchers also identify how the contextualization process may help AI practitioners develop a shared language around AI fairness. They also identify dynamics related to ownership over this process that suggest larger issues of accountability in responsible AI work.

Read the paper

NEW RESEARCH

MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels

LLMs are becoming indispensable tools for many creative and information related tasks, but they still come with limitations, including a tendency to fabricate content. State-of-the-art algorithms pair the LLM with an external, dynamically updated knowledge base to ground the LLM’s answers and provide up-to-date information. However, these techniques require large amounts of relevant, labeled training data that have not previously been publicly available.

In a recent paper: MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels presented at the 2024 ACM Web Conference, researchers from Microsoft introduce a novel dataset that closely mimics real-world web document and query distribution. MS MARCO Web Search contains 10 million unique queries across 93 languages with millions of relevant labeled query-document pairs. It uses ClueWeb22’s 10 billion high-quality web pages as the document corpus and provides rich information for various kinds of downstream tasks.

This dataset unlocks several new research directions that previous datasets cannot well support, including generic end-to-end neural indexer models, generic embedding models, and next generation information access systems with LLMs. MS MARCO Web Search offers a retrieval benchmark with three web scale retrieval challenge tasks, each with automatic evaluation and leaderboard. These tasks demand innovation in both machine learning and information retrieval systems. The researchers intend for MS MARCO Web Search to lay the groundwork for future advancements in AI and systems research.

View dataset

Read the paper

VIDEO

AI Case Studies for Natural Science Research with Bonnie Kruft

Among the stunning changes and disruptions driven by AI, one of the most significant is the impact on scientific discovery. In her presentation at EmTech Digital 2024 (opens in new tab), Bonnie Kruft, partner deputy director at Microsoft Research AI for Science, outlined some examples of how generative AI enables groundbreaking research in the natural sciences. Recent breakthroughs aided by AI include small molecular inhibitors for treating infectious disease, the discovery of new materials for energy storage, and new drug development.

Catch a replay of the presentation, including a follow-up Q&A with the audience, and hear how researchers are reducing discovery times from years to months. The discussion explores safe and responsible AI practices, how large language models can work with science-based models, and what lies ahead for AI in science.

Watch the video

Microsoft Research in the news

The tiny glass blocks that can preserve your data for centuries

The Times UK | April 27, 2024

Microsoft’s Project Silica is an innovative form of long-term storage – potentially revolutionizing how important data can be preserved for future generations.

These Recyclable Circuit Boards Could Stem E-Waste

IEEE Spectrum | May 2, 2024

New research from the University of Washington and Microsoft show that vitrimer-based PCBs can be broken down into a gel for repeated reuse. The research stems from the Microsoft Research Climate Initiative.

Today’s AI models are impressive. Teams of them will be formidable

The Economist | May 13, 2024

Teams of LLMs are more capable and intelligent than solitary agents because a single job can be split into many smaller, more specialized tasks, says Chi Wang, a principal researcher at Microsoft Research in Redmond, Washington.

You Only Cache Once: Decoder-Decoder Architectures for Language Models

Microsoft Research LinkedIn | May 11, 2024

YOCO is a novel decoder-decoder architecture for LLMs, enhancing memory efficiency by caching key-value pairs only once. It slashes KV cache memory and prefilling time and makes 1M-length LLMs practical.

Peter Lee discusses new technologies that will drive the future of drug discovery

AAPS | May 10, 2024

The president of Microsoft Research explores how new advances in technologies, such as AI and machine learning, are transforming biotechnology, in the closing plenary of the AAPS National Biotechnology Conference (NBC) on Thursday, May 16.

PKSHA develops advanced LLMs in collaboration with Microsoft Japan

Business Wire | April 29, 2024

PKSHA Technology has developed one of the first Japanese-English LLMs in collaboration with Microsoft Japan. This development primarily focuses on boosting productivity within contact centers and corporate help desks.

BRAID fellowships include three collaborations with Microsoft Research

Bridging Responsible AI Divides | May 2024

BRAID fellowships support individual researchers in partnership with public and private organizations to address challenges in the field of responsible AI. Among the latest fellowships are three supported by Microsoft Research.

View more news and awards

The post Research Focus: Week of May 13, 2024 appeared first on Microsoft Research.

Microsoft at CHI 2024: Innovations in human-centered design

May 15, 2024

by Brenda Potts Microsoft AI

The ways people engage with technology, through its design and functionality, determine its utility and acceptance in everyday use, setting the stage for widespread adoption. When computing tools and services respect the diversity of people’s experiences and abilities, technology is not only functional but also universally accessible. Human-computer interaction (HCI) plays a crucial role in this process, examining how technology integrates into our daily lives and exploring ways digital tools can be shaped to meet individual needs and enhance our interactions with the world.

The ACM CHI Conference on Human Factors in Computing Systems is a premier forum that brings together researchers and experts in the field, and Microsoft is honored to support CHI 2024 as a returning sponsor. We’re pleased to announce that 33 papers by Microsoft researchers and their collaborators have been accepted this year, with four winning the Best Paper Award and seven receiving honorable mentions.

This research aims to redefine how people work, collaborate, and play using technology, with a focus on design innovation to create more personalized, engaging, and effective interactions. Several projects emphasize customizing the user experience to better meet individual needs, such as exploring the potential of large language models (LLMs) to help reduce procrastination. Others investigate ways to boost realism in virtual and mixed reality environments, using touch to create a more immersive experience. There are also studies that address the challenges of understanding how people interact with technology. These include applying psychology and cognitive science to examine the use of generative AI and social media, with the goal of using the insights to guide future research and design directions. This post highlights these projects.

Best Paper Award recipients

DynaVis: Dynamically Synthesized UI Widgets for Visualization Editing
Priyan Vaithilingam, Elena L. Glassman, Jeevana Priya Inala, Chenglong Wang
GUIs used for editing visualizations can overwhelm users or limit their interactions. To address this, the authors introduce DynaVis, which combines natural language interfaces with dynamically synthesized UI widgets, enabling people to initiate and refine edits using natural language.

Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking
Nikhil Sharma, Q. Vera Liao, Ziang Xiao
Conversational search systems powered by LLMs potentially improve on traditional search methods, yet their influence on increasing selective exposure and fostering echo chambers remains underexplored. This research suggests that LLM-driven conversational search may enhance biased information querying, particularly when the LLM’s outputs reinforce user views, emphasizing significant implications for the development and regulation of these technologies.

Piet: Facilitating Color Authoring for Motion Graphics Video
Xinyu Shi, Yinghou Wang, Yun Wang, Jian Zhao
Motion graphic (MG) videos use animated visuals and color to effectively communicate complex ideas, yet existing color authoring tools are lacking. This work introduces Piet, a tool prototype that offers an interactive palette and support for quick theme changes and controlled focus, significantly streamlining the color design process.

The Metacognitive Demands and Opportunities of Generative AI
Lev Tankelevitch, Viktor Kewenig, Auste Simkute, Ava Elizabeth Scott, Advait Sarkar, Abigail Sellen, Sean Rintel
Generative AI systems offer unprecedented opportunities for transforming professional and personal work, yet they present challenges around prompting, evaluating and relying on outputs, and optimizing workflows. This paper shows that metacognition—the psychological ability to monitor and control one’s thoughts and behavior—offers a valuable lens through which to understand and design for these usability challenges.

Honorable Mentions

Big or Small, It’s All in Your Head: Visuo-Haptic Illusion of Size-Change Using Finger-Repositioning
Myung Jin Kim, Eyal Ofek, Michel Pahud, Mike J. Sinclair, Andrea Bianchi
This research introduces a fixed-sized VR controller that uses finger repositioning to create a visuo-haptic illusion of dynamic size changes in handheld virtual objects, allowing users to perceive virtual objects as significantly smaller or larger than the actual device.

LLMR: Real-time Prompting of Interactive Worlds Using Large Language Models
Fernanda De La Torre, Cathy Mengying Fang, Han Huang, Andrzej Banburski-Fahey, Judith Amores, Jaron Lanier
Large Language Model for Mixed Reality (LLMR) is a framework for the real-time creation and modification of interactive mixed reality experiences using LLMs. It uses novel strategies to tackle difficult cases where ideal training data is scarce or where the design goal requires the synthesis of internal dynamics, intuitive analysis, or advanced interactivity.

Observer Effect in Social Media Use
Koustuv Saha, Pranshu Gupta, Gloria Mark, Emre Kiciman, Munmun De Choudhury
This work investigates the observer effect in behavioral assessments on social media use. The observer effect is a phenomenon in which individuals alter their behavior due to awareness of being monitored. Conducted over an average of 82 months (about 7 years) retrospectively and five months prospectively using Facebook data, the study found that deviations in expected behavior and language post-enrollment in the study reflected individual psychological traits. The authors recommend ways to mitigate the observer effect in these scenarios.

Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming
Hussein Mozannar, Gagan Bansal, Adam Fourney, Eric Horvitz
By investigating how developers use GitHub Copilot, the authors created CUPS, a taxonomy of programmer activities during system interaction. This approach not only elucidates interaction patterns and inefficiencies but can also drive more effective metrics and UI design for code-recommendation systems with the goal of improving programmer productivity.

SharedNeRF: Leveraging Photorealistic and View-dependent Rendering for Real-time and Remote Collaboration
Mose Sakashita, Bala Kumaravel, Nicolai Marquardt, Andrew D. Wilson
SharedNeRF, a system for synchronous remote collaboration, utilizes neural radiance field (NeRF) technology to provide photorealistic, viewpoint-specific renderings that are seamlessly integrated with point clouds to capture dynamic movements and changes in a shared space. A preliminary study demonstrated its effectiveness, as participants used this high-fidelity, multi-perspective visualization to successfully complete a flower arrangement task.

Understanding the Role of Large Language Models in Personalizing and Scaffolding Strategies to Combat Academic Procrastination
Ananya Bhattacharjee, Yuchen Zeng, Sarah Yi Xu, Dana Kulzhabayeva, Minyi Ma, Rachel Kornfield, Syed Ishtiaque Ahmed, Alex Mariakakis, Mary P. Czerwinski, Anastasia Kuzminykh, Michael Liut, Joseph Jay Williams
In this study, the authors explore the potential of LLMs for customizing academic procrastination interventions, employing a technology probe to generate personalized advice. Their findings emphasize the need for LLMs to offer structured, deadline-oriented advice and adaptive questioning techniques, providing key design insights for LLM-based tools while highlighting cautions against their use for therapeutic guidance.

Where Are We So Far? Understanding Data Storytelling Tools from the Perspective of Human-AI Collaboration
Haotian Li, Yun Wang, Huamin Qu
This paper evaluates data storytelling tools using a dual framework to analyze the stages of the storytelling workflow—analysis, planning, implementation, communication—and the roles of humans and AI in each stage, such as creators, assistants, optimizers, and reviewers. The study identifies common collaboration patterns in existing tools, summarizes lessons from these patterns, and highlights future research opportunities for human-AI collaboration in data storytelling.

Learn more about our work and contributions to CHI 2024, including our full list of publications, on our conference webpage.

The post Microsoft at CHI 2024: Innovations in human-centered design appeared first on Microsoft Research.

RASCAL: Novel robotics for scalable and highly available automated storage and retrieval

May 14, 2024

by Brenda Potts Microsoft AI

This research paper was presented at the
41^st IEEE International Conference on Robotics and Automation (opens in new tab) (ICRA 2024), the premier international forum for robotics research.

Over the past decade, robotics has revolutionized numerous industries that rely on storage systems, such as manufacturing and warehousing. In these contexts, robotics streamlines operations and increase efficiency, and automated storage and retrieval systems (ASRS) are at the heart of this technological shift, exemplifying the transition to smarter, computer-controlled logistics solutions. These systems quickly move items from storage to fulfilment stations, helping to increase speed and accuracy in the overall process. Yet despite these advances, current ASRS—whether rail-based, fixed, or free-roaming—continue to face challenges, often sacrificing scalability and availability for higher throughput capacity. For instance, the use of fixed robots in traditional tape storage libraries, typically used for archival storage, can lead to availability limitations, as the robots cannot pass each other, and a single robot failure can restrict access to a significant portion of the library.

Our paper, published at ICRA 2024, introduces RASCAL: A Scalable, High-redundancy Robot for Automated Storage and Retrieval Systems, which addresses these concerns. RASCAL is an untethered robot that improves the efficiency of vertical storage systems by operating across evenly spaced, parallel shelves and horizontal rails. Designed to maximize scalability and redundancy, it handles the storage and retrieval of small objects. RASCAL was inspired by the challenges of managing archival storage media in datacenters, and it’s the key component of Project Silica’s storage and retrieval system. However, RASCAL’s modularity enables it to be used in other scenarios as well.

An innovative approach to archival storage

RASCAL’s design is based on four key principles:

Addressability: This allows any robot to access any item being stored on the shelves.
Scalability: The system can adjust retrieval capacity and storage space by adding or removing robots and shelving with negligible downtime.
Availability: A single robot failure minimally impacts access to items and routing, and it does not obstruct the operation of other robots.
Serviceability: Robots can easily be added or removed from the rails without the need for special training.

RASCAL’s motion system supports horizontal and vertical movement along storage panels assembled from contiguous storage racks. The parallel rail system enables independent and flexible movement. These rails are designed to be passive—functioning without the need for active power or energy sources, relying instead on their physical structure and positioning to guide and support the robot’s movement along the storage panels. The robot can travel along and between these rails using various pathways to reach a given item. Video 1 shows how RASCAL operates multiple robots on a single storage panel.

Video 1. Multiple robots in action

RASCAL utilizes a special rail geometry, allowing the robot to passively latch onto the rails with opposing wheels mounted on each end, as illustrated in Figure 1. This design ensures that the robot is securely held in place by gravity alone. The passive nature of this latching mechanism simplifies the process of adding or removing robots from the rails, as it does not require any tools or power.

Picture of a RASCAL prototype mounted on a Silica library. The library is composed of a series of connected storage racks that hold glass media. The storage panel's front has parallel rails mounted horizontally to allow the robot to move vertically and horizontally. RASCAL uses a pair of opposing wheels to latch onto these rails. — Figure 1. The RASCAL prototype in a Silica library.

The robot features two rotating assemblies known as wings, each equipped with wheels that allow it to move horizontally. The wings rotate in a choreographed sequence to enable ascent and descent. RASCAL climbs by unlatching one wing from its current rail while remaining attached to the other. It then rotates and secures its free wing to a new rail either two levels up or down. This is shown in Video 2.

Video 2. RASCAL’s novel climbing maneuver.

Video 3. RASCAL performing a pick operation.

Video 3 demonstrates RASCAL’s item-selection system, or picker interface, which is designed to handle various robotic tool attachments for precise pick-and-place operations. This interface can rotate in alternating directions during climbs, ensuring that the robotic tool attachment, or end effector, remains oriented towards the shelving while stationary, preventing the cables from tangling.

Advancing robotics and automation

As digital economies grow, the need for efficient storage and retrieval systems becomes increasingly urgent. Breakthroughs in robotics technology are poised to drive productivity, efficiency, and innovation across numerous industries. Developments like RASCAL, with its flexible design and advanced capabilities, are leading the way for the next generation of robotics and automation.

The post RASCAL: Novel robotics for scalable and highly available automated storage and retrieval appeared first on Microsoft Research.

Enhanced autoscaling with VASIM: Vertical Autoscaling Simulator Toolkit

May 13, 2024

by Alyssa Hughes Microsoft AI

This research was presented as a demonstration at the 40^th IEEE International Conference on Data Engineering (opens in new tab) (ICDE 2024), one of the premier conferences on data and information engineering.

Since the inception of cloud computing, autoscaling has been an essential technique for optimizing resources and performance. By dynamically adjusting the number of computing resources allocated to a service based on current demand, autoscaling ensures that the service can handle the load efficiently while optimizing costs. However, developing and fine-tuning autoscaling algorithms, which govern this process, present significant challenges. The complexity and cost associated with testing these algorithms can lead to inefficient resource management and impede the development of more effective autoscaling strategies.

Publication

VASIM: Vertical Autoscaling Simulator Toolkit

In our paper, “VASIM: Vertical Autoscaling Simulator Toolkit,” presented at ICDE 2024, we introduce a tool designed to address the complexities involved in assessing autoscaling algorithms. While existing simulation tools cover a range of capabilities, such as energy efficiency and fault tolerance, VASIM stands out by evaluating the critical recommender component within the algorithm and suggesting optimal resource scaling actions based on usage data, balancing performance and cost. This enables developers to iterate more rapidly, enhancing algorithmic performance, and improving resource efficiency and cost savings.

VASIM’s user-friendly interface simplifies the evaluation of autoscaling policies, as illustrated in Figure 1. First steps entail uploading historical data and defining autoscaling policies, including the algorithm and its parameters, shown in the left panel. The Simulation Run feature enables the modification of algorithm parameters, imported via a configuration file, and the execution of simulations based on the selected trace. A results screen displays the CPU limits determined by the selected policies as well as the actual CPU usage tailored to these limits. Additionally, VASIM provides fundamental metrics like throttling incidents, number of scaling operations, and amount of unused capacity, or slack, for the current simulation.

[On the left] Image of VASIM user interface. On the left panel, it has options to select from “Simulation Run”, “Simulation Tuning”, “Simulation Tuning History”. Option “Simulation Run” is selected. Below user has loaded a trace from csv file on disk (c_26742_perf_event_log.csv), algorithm C, metadata config json file from disk. Button “Visualize workload” was clicked and loaded trace is displayed.

[On the right] On the right panel, user picked other parameters for simulation run (lag – how often recommender gives decision and initial core count) and algorithm parameter from json are shown for edit.

Image of VASIM UI when simulation was run for selected algorithm, trace and parameter setting. It shows a graph with cpu usage in blue and the limit calculated by selected algorithm in red. It is different from the trace plot that was shown before because calculated limits were below cpu utilization, so the latter was cut off. On top of the plot it shows metrics of the simulation like average slack, average insufficient CPU, sum slack, sum insufficient CPU, number of scalings, number of times of insufficient CPU etc. — Figure 1. The VASIM user interface comprises a run simulation pane on the left and a results pane on the right.

VASIM achieves several important goals:

Resource efficiency and cost reduction. VASIM reduces costs by removing the need to test scaling operations in real-time, which would be resource intensive. This enables developers to adjust algorithms iteratively in a controlled, cost-efficient environment, accelerating development cycles. Because the tool allows users to upload CPU performance history and algorithm parameters, it delivers the results of scaling operations across the entire workload in minutes rather than hours.

Multi-objective optimization. It’s challenging to develop an autoscaling method that handles conflicting parameters. VASIM makes this easier by applying Pareto optimization techniques (opens in new tab), helping developers to find a balance among key metrics. Figure 2 depicts scatter plots for two metrics: average slack and average insufficient CPU. It also shows three optimization objectives: the optimal amount of slack, throttling, and number of scaling operations.

[On the left] A graph that plots the average slack on the Y axis and the average insufficient cpu on the X axis. It shows that the more average insufficient cpu decreases, the more average slack increases. There are six points in red that are pareto frontier points, all on the very edge of the graph but not too close to each other, showing some possible choices of configuration.

[On the right] A 3D scatter plot displays the total slack on the X axis, cpu total throttle on the Y axis, and the amount of scalings in Z axis. It shows that as you aim to lower total slack and throttle, the amount of scalings increases. — Figure 2. The 2D diagram on the left shows a scatter plot of tuning with Pareto points. The 3D graph on the right shows a scatter plot with the three objectives.

Recommender algorithm testing. VASIM simplifies the process of testing and evaluating recommendation algorithms across diverse workloads. With all tuning jobs running in parallel, computation occurs more quickly, allowing users to efficiently adjust their recommender parameters as necessary. To assess the algorithm’s generalizability, we ran VASIM against 11 available open cluster traces (opens in new tab) for benchmarking and internal product workload traces. This enabled us to evaluate the algorithms’ robustness across a variety of workload types, including cyclical, bursty, and monotonic variations, demonstrating their reliability across different scenarios.

Versatility and adaptability. VASIM provides users with the flexibility to modify components, experiment with recommendation strategies, and evaluate the impact of changes in a controlled and customizable environment. Figure 3 shows the results of a simulation run on the same algorithm and historical performance data but with different parameters. This versatility ensures that infrastructure engineers can tailor the system to meet their needs, enhancing the overall effectiveness of their autoscaling strategies.

These graphs display VASIM running an identical algorithm on the same historical data but with varying parameters, affecting slack, throttling, and the frequency of scaling events. The objective is to maintain a minimal gap between the peak and the lowest resource utilization levels (the top of the bottom line and the bottom of the top line, respectively), and to reduce the space between the response lag indicated by the trailing edges to the left of the lines. Simultaneously, it's important to minimize the occurrence of scaling events to prevent disruptions in workload execution. — Figure 3. These graphs show VASIM running an identical algorithm on the same historical data but with varying parameters, affecting slack, throttling, and the frequency of scaling events. The objective is to maintain a minimal gap between the peak and the lowest resource utilization levels—the top of the bottom line and the bottom of the top line, respectively. The goal is also to reduce the space between the response lag indicated by the trailing edges to the left of the lines. Simultaneously, it’s important to minimize the occurrence of scaling events to prevent disruptions in workload execution.

Optimizing scalability and costs in Kubernetes environments

Our research on vertically autoscaling monolithic applications with a container-as-a-service algorithm (opens in new tab) helped us to better understand the tradeoffs between cost and availability that different algorithm variations introduce. Because VASIM is similar to standard autoscaling architecture (as in the Kubernetes Vertical Pod Autoscaler (opens in new tab) [VPA]) it allows us to test autoscaling algorithms for pods, applications, and virtual machine (VM) capacity. This is possible because these systems share similar components, including resource updaters, controllers, and recommenders. Despite differences in specific systems, their underlying architectures are sufficiently similar, enabling VASIM to effectively mimic them, as shown in Figure 4.

The image depicts how VASIM works. It has a Simulation Controller in the middle, which asks Recommender for decisions using one of the algorithms, Simulation Scaler with a scale function, Cloud State Provider to get traces and use them for time simulation, Analyzer to get metrics after each run. Params Tuning Controller tells Simulation Controller to run for every tuning setting and calls Analyzer to get pareto front to find tradeoff between multiple goals after multiple configs were evaluated. Recommender also needs data from Cloud State Provider to access historical data. — Figure 4. VASIM architecture mimics the main components of general autoscaling architectures, allowing users to parametrize those modules to fit their specific needs.

Implications and looking ahead

Looking forward, we plan to broaden the scope of VASIM’s support beyond just CPUs to include a wide range of resources, such as memory, disk I/O, and network bandwidth. This expansion will provide future users with a comprehensive understanding of system performance and enable them to make more accurate decisions regarding system management and resource optimization. Additionally, a deeper understanding of system performance will help inform proactive optimization strategies focused on maximizing system efficiency and performance.

The post Enhanced autoscaling with VASIM: Vertical Autoscaling Simulator Toolkit appeared first on Microsoft Research.

MatterSim: A deep-learning model for materials under real-world conditions

May 13, 2024

by Brenda Potts Microsoft AI

The image features a complex network of interconnected nodes with a molecular structure, illuminated in blue against a dark background.

In the quest for groundbreaking materials crucial to nanoelectronics, energy storage, and healthcare, a critical challenge looms: predicting a material’s properties before it is even created. This is no small feat, with any combination of 118 elements in the periodic table, and the range of temperatures and pressures under which materials are synthesized and operated. These factors drastically affect atomic interactions within materials, making accurate property prediction and behavior simulation exceedingly demanding.

Here at Microsoft Research, we developed MatterSim, a deep-learning model for accurate and efficient materials simulation and property prediction over a broad range of elements, temperatures, and pressures to enable the in silico materials design. MatterSim employs deep learning to understand atomic interactions from the very fundamental principles of quantum mechanics, across a comprehensive spectrum of elements and conditions—from 0 to 5,000 Kelvin (K), and from standard atmospheric pressure to 10,000,000 atmospheres. In our experiment, MatterSim efficiently handles simulations for a variety of materials, including metals, oxides, sulfides, halides, and their various states such as crystals, amorphous solids, and liquids. Additionally, it offers customization options for intricate prediction tasks by incorporating user-provided data.

Figure 1: There are two subfigures. On the left-hand side, atomic structures of 12 materials belonging to metals, oxides, sulfides, halides, and organic molecules are shown. On the right-hand side, the temperature and pressure ranges of materials' application and synthesis are plotted. — Figure 1. MatterSim can model materials properties and behaviors under realistic temperature and pressure conditions for wide ranges of applications.

Simulating materials under realistic conditions across the periodic table

MatterSim’s learning foundation is built on large-scale synthetic data, generated through a blend of active learning, generative models, and molecular dynamics simulations. This data generation strategy ensures extensive coverage of material space, enabling the model to predict energies, atomic forces, and stresses. It serves as a machine-learning force field with a level of accuracy compatible with first-principles predictions. Notably, MatterSim achieves a10-fold increase in accuracy for material property predictions at finite temperatures and pressures when compared to previous state-of-the-art models. Our research demonstrates its proficiency in simulating a vast array of material properties, including thermal, mechanical, and transport properties, and can even predict phase diagrams.

Figure 2: There are three subfigures. The panel on the left shows a comparison of the highest phonon frequency predicted by MatterSim and by first-principles methods. The two values are for each material is very close, leading to a nearly straight line in the parity plot. The middle panel depicts the same relation of free energies of around 50 materials and comparison between MatterSim and first-principles results. The right panel shows the phase diagram of MgO predicted using MatterSim. The x-axis denotes the temperature and the y-axis denotes the pressure. The pressure ranges of where MgO’s B1 phase is below 500 GPa and this range decreases with temperature increase. The blue lines show the prediction from MatterSim and fits well with the shaded region which is the result from experiment measurement. — Figure 2. MatterSim achieves high accuracy in predicting mechanical properties, vibrational properties, and phases diagrams of material comparable to quantum mechanics and experimental measurements. The figure shows the comparison between the predicted properties and the experimental measured results.

Adapting to complex design tasks

While trained on broad synthetic datasets, MatterSim is also adaptable for specific design requirements by incorporating additional data. The model utilizes active learning and fine-tuning to customize predictions with high data efficiency. For example, simulating water properties — a task seemingly straightforward but computationally intensive — is significantly optimized with MatterSim’s adaptive capability. The model requires only 3% of the data compared to traditional methods, to match experimental accuracy that would otherwise require 30 times more resources for a specialized model and exponentially more for first-principles methods.

Figure 3: There are two panels in this figure. The right panel shows the structure of Li2B12H12, a complex material system used for solid-state batteries. This system is used in the benchmark of the performance of MatterSim. The left panel panels show the comparison between number of data point needed to train a model from scratch and customize from MatterSim to achieve the same accuracy. MatterSim requires 3% and 10% of the data for the two tasks compared with training from scratch. — Figure 3. MatterSim achieves high data efficiency with 90%-97% data save for complex simulation tasks.

Bridging the gap between atomistic models and real-world measurements

Translating material properties from atomic structures is a complex task, often too intricate for current methods based on statistics, such as molecular dynamics. MatterSim addresses this by mapping these relationships directly through machine learning. It incorporates custom adaptor modules that refine the model to predict material properties from structural data, eliminating the need for intricate simulations. Benchmarking against MatBench (opens in new tab), a renowned material property prediction benchmark set, MatterSim demonstrates significant accuracy improvement and outperforms all specialized property-specific models, showcasing its robust capability in direct material property prediction from domain-specific data.

Looking ahead

As MatterSim research advances, the emphasis is on experimental validation to reinforce its potential role in pivotal sectors, including the design of catalysts for sustainability, energy storage breakthroughs, and nanotechnology advancements. The planned integration of MatterSim with generative AI models and reinforcement learning heralds a new era in the systematic pursuit of novel materials. This synergy is expected to revolutionize the field, streamlining guided creation of materials tailored for diverse applications ranging from semiconductor technologies to biomedical engineering. Such progress promises to expedite material development and bolster sustainable industrial practices, thereby fostering technological advancements that will benefit society.

The post MatterSim: A deep-learning model for materials under real-world conditions appeared first on Microsoft Research.

LLM profiling guides KV cache optimization

May 8, 2024

by Alyssa Hughes Microsoft AI

This research paper was presented at the 12^th International Conference on Learning Representations (opens in new tab) (ICLR 2024), the premier conference dedicated to the advancement of deep learning.

Large language models (LLMs) rely on complex internal mechanisms that require more memory than what is typically available to operate on standard devices. One such mechanism is the key-value (KV) cache, which stores and retrieves previously computed data, helping the model generate responses quickly without needing to recalculate information it has already processed. This method uses a substantial amount of memory because it keeps a large amount of this data readily accessible to enhance the model’s speed and efficiency. Consequently, the KV cache can become prohibitively large as the complexity of the tasks increases, sometimes requiring up to 320 GB for a single operation. To address this, we developed FastGen, a novel method aimed at reducing the memory demands for LLMs.

Publication

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

Our paper, “Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs (opens in new tab),” presented at ICLR 2024, we describe how FastGen optimizes the way LLMs store and access data, potentially cutting memory use by half while preserving their efficiency. This approach represents a significant step toward making sophisticated AI tools more accessible and affordable for broader applications. We are honored to share that this paper has been awarded an Honorable Mention for the Outstanding Paper Award (opens in new tab).

Observations of the KV cache

The development of FastGen is underpinned by our observations of how the KV cache functions. We first observed that not all the data in the KV cache is needed for LLMs to complete their required tasks, as shown in Figure 1. By providing the KV cache with the mechanism to discard unnecessary data, it is possible to significantly cut memory use. For example, some LLM modules don’t require broad contexts to process input. For this, it is possible to construct a KV cache that removes data that contains less important long-range contexts, such as several sentences or paragraphs. Also, some LLM modules primarily attend only to special tokens, such as punctuation, for which it is possible to create a KV cache that retains only those tokens. Finally, some LLM modules broadly need all tokens, and for these we can employ the standard KV cache and store all words.

Another key observation in our study is that attention modules in different layers and positions in the LLM behave differently and need different preferences for their KV cache, as shown on the right in Figure 1.

Graphs depicting the different structures of the KV cache. The graph on the left contains common structures. The circle graphs on the right contain compositions of three modules that are in the same layer, but the way they store data is different. — Figure 1: These graphs depict the different structures of the KV cache. The graph on the left contains common structures. The circle graphs on the right contain compositions of three modules that are in the same layer, but the way they store data is different.

FastGen accounts for the diversity of KV cache structures

Because different KV caches have different structures, they need to be handled differently. We based the development of the FastGen algorithm on our observations, enabling it to categorize and optimize the data that is stored in a given KV cache. FastGen first analyzes the specific behaviors of different modules to understand their structures, a method called profiling. It then uses the results to adjust how data is stored in real-time, making the process more efficient. Our tests show that FastGen can reduce the amount of memory by 50% without sacrificing quality. Additional experiments, discussed in detail in our paper, confirm that the profiling process is crucial and significantly improves the efficiency of the KV cache.

The broader picture

Fueled by unprecedented advances in data handling and computational capabilities, LLM pretraining has emerged as a cornerstone of deep learning, transforming natural language processing tasks and continuously challenging our understanding of learning and cognition.

However, greater capabilities can bring challenges. As models scale larger, customizing them for specific tasks can become more resource-intensive. At Microsoft Research, we are exploring different approaches to more efficient model editing. A critical strategy involves targeted model profiling, which identifies essential components of a model that align with predefined goals. This profiling informs precise model modifications, optimizing resource use and effectiveness.

The two research projects we are presenting at ICLR 2024 support these goals. Both adopt the profile-then-edit paradigm to address different problems. FastGen reduces memory consumption. Our related work, Post-hoc Attention Steering for LLMs (PASTA), focuses on better controllability. These approaches are designed to be resource-efficient, as they do not require tuning or back propagation. Looking ahead, our goal is to further develop these techniques to improve the resource-efficiency of LLM applications, making them more accessible to a wider audience.

The post LLM profiling guides KV cache optimization appeared first on Microsoft Research.

Introduction

Differential privacy: A bridge between innovation and privacy

Technical deep dive: Differentially private synthetic data generation

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

Differentially Private Synthetic Data via Foundation Model APIs

Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation

Conclusion

EVENT

Register now for Research Forum on June 4

NEW RESEARCH

Generative AI and the Politics of Visibility

Microsoft Research Forum

NEW RESEARCH

ACM MMSys 2024 Bandwidth Estimation in Real Time Communications Challenge

NEW RESEARCH

Player-Driven Emergence in LLM-Driven Game Narrative

NEW RESEARCH

Segmentation using large language models: A new typology of American neighborhoods

NEW RESEARCH

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Microsoft Research in the news

Learn more:

Subscribe to the Microsoft Research Podcast:

Transcript

Learn more:

Subscribe to the Microsoft Research Podcast:

Transcript

NEW RESEARCH

Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning

NEW RESEARCH

A Reflection on Human-Notebook Experiences in the Era of AI

Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi

NEW RESEARCH

Jacdac: Service-Based Prototyping of Embedded Systems

NEW RESEARCH

PARIKSHA: A Scalable, Democratic, Transparent Evaluation Platform for Assessing Indic Large Language Models

NEW RESEARCH

Tinker, Tailor, Configure, Customize: The Articulation Work of Customizing AI Fairness Checklists

NEW RESEARCH

MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels

VIDEO

AI Case Studies for Natural Science Research with Bonnie Kruft

Microsoft Research in the news

Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi

Best Paper Award recipients

Honorable Mentions

An innovative approach to archival storage

Advancing robotics and automation

AI Explainer: Foundation models ​and the next era of AI

Optimizing scalability and costs in Kubernetes environments

Implications and looking ahead

Simulating materials under realistic conditions across the periodic table

Adapting to complex design tasks

Microsoft Research Forum

Bridging the gap between atomistic models and real-world measurements

Looking ahead

Observations of the KV cache

Microsoft Research Forum

FastGen accounts for the diversity of KV cache structures

The broader picture

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.

AI Explainer: Foundation models and the next era of AI