Microsoft AI – Page 7

Toward developing faster algorithms for minimizing submodular functions

November 7, 2023

by Brenda Potts Microsoft AI

This research paper was presented at the 64th IEEE Symposium on Foundations of Computer Science (FOCS) 2023, a premier forum for the latest research in theoretical computer science.

FOCS 2023 paper: Toward developing faster algorithms for minimizing submodular functions

Submodular functions are versatile mathematical tools with applications across many real-world domains. They appear as cut functions in graphs, as utility functions in economics, and as entropy functions of random variables, and in each of these settings they offer valuable insights into challenging problems. Their wide applicability has made them pivotal tools for modeling and optimization in theoretical computer science and related fields, including operations research and game theory. In recent years, submodular functions have also gained prominence in optimization problems arising in machine learning (ML), such as feature selection and clustering, as illustrated in Figure 1. They are also instrumental in applications like sensor placement and graphical models. For further exploration, see Bilmes’ survey and Bach’s standard textbook on this subject.

Figure 1. Application of submodular function optimization to feature selection, on the left, and clustering on the right. Left: starting from the full set of features, unselected features are crossed out and the selected features remain. Right: points in 2D are assigned colors so that points of the same color lie close together, forming clusters.

Algorithm design for submodular function minimization

In a joint paper with researchers from Stanford University, “Sparse Submodular Function Minimization,” presented at FOCS 2023, we investigate the problem of minimizing a submodular function in the standard model. Here, we assume that the submodular function can be accessed only through an evaluation oracle that returns the value $f(S)$ in response to a query with a set $S$. This is the most classical and well-studied model for designing algorithms that minimize submodular functions.
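
To make the evaluation-oracle model concrete, here is a minimal Python sketch under stated assumptions: `CountingOracle` and `brute_force_minimize` are hypothetical names, and exhaustive search is only a naive baseline that spends $2^n$ queries, not the algorithm from the paper. Query complexity counts exactly the calls that this wrapper records.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Tuple

SetFunction = Callable[[FrozenSet[int]], float]


class CountingOracle:
    """Evaluation oracle: answers a query S with f(S) and counts the queries."""

    def __init__(self, f: SetFunction):
        self._f = f
        self.queries = 0

    def __call__(self, S: Iterable[int]) -> float:
        self.queries += 1
        return self._f(frozenset(S))


def brute_force_minimize(oracle: CountingOracle, n: int) -> Tuple[FrozenSet[int], float]:
    """Minimize over all subsets of {0, ..., n-1}; uses exactly 2**n queries."""
    best_set, best_val = frozenset(), oracle(())
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            val = oracle(S)
            if val < best_val:
                best_set, best_val = frozenset(S), val
    return best_set, best_val


# Example: a concave function of |S| minus an additive (modular) term is submodular.
weights = [5, 4, 1, -1]
f = lambda S: 4 * min(len(S), 2) - sum(weights[i] for i in S)

oracle = CountingOracle(f)
print(brute_force_minimize(oracle, 4), oracle.queries)  # (frozenset({0, 1, 2}), -2) 16
```

Every algorithm discussed below works through the same interface; the research question is how far below this exponential baseline the number of queries can be pushed.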

Before we discuss our study, recall that a submodular function $f$ is a set function defined on the subsets of a finite ground set $V$ that satisfies a diminishing marginal returns property. That is, for any two subsets $S \subseteq T \subseteq V$ and any element $e \in V \setminus T$, the marginal value of $e$ when added to the smaller set, $f(S \cup \{e\}) - f(S)$, is at least the marginal value of $e$ when added to the bigger set, $f(T \cup \{e\}) - f(T)$. In symbols, $f(S \cup \{e\}) - f(S) \ge f(T \cup \{e\}) - f(T)$.
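
The definition can be checked directly on small ground sets. This brute-force sketch (the function names are hypothetical, and it runs in exponential time, so it is for illustration only) tests the diminishing-returns condition for every pair $S \subseteq T$ and every $e \notin T$. A graph cut function passes, while $f(S) = |S|^2$, whose marginal values grow with $|S|$, fails.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator, List

SetFunction = Callable[[FrozenSet[int]], float]


def subsets(ground: List[int]) -> Iterator[FrozenSet[int]]:
    """Yield every subset of the ground set."""
    for size in range(len(ground) + 1):
        for combo in combinations(ground, size):
            yield frozenset(combo)


def is_submodular(f: SetFunction, ground: List[int]) -> bool:
    """Check diminishing marginal returns for all S ⊆ T and e ∉ T (exponential time)."""
    all_sets = list(subsets(ground))
    for T in all_sets:
        for S in all_sets:
            if not S <= T:
                continue
            for e in ground:
                if e in T:
                    continue
                if f(S | {e}) - f(S) < f(T | {e}) - f(T):
                    return False
    return True


# A graph cut function (number of edges with exactly one endpoint in S) is submodular.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)

# |S|^2 has increasing marginal values, so it is not submodular.
square = lambda S: len(S) ** 2

print(is_submodular(cut, [0, 1, 2, 3]))     # True
print(is_submodular(square, [0, 1, 2, 3]))  # False
```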

In the 1980s, foundational work revealed that submodular functions could be minimized in polynomial time, marking a significant breakthrough. Since then, researchers have made substantial progress toward faster algorithms for submodular function minimization (SFM). Despite these efforts, fundamental questions remain open, such as the minimum number of oracle queries needed to minimize an arbitrary submodular function, known as the problem’s query complexity.

Currently, the best known algorithm makes $\widetilde{O}(n^2)$ queries for any submodular function, while the best lower bound is only $\widetilde{\Omega}(n)$, where $n$ is the size of the ground set on which the submodular function is defined. This leaves a gap of roughly a factor of $n$ between the existing upper and lower bounds.

Given this considerable gap, a natural question arises: what additional structural assumptions could enable faster algorithms for SFM? One prevalent assumption is sparsity: the set that minimizes the submodular function is small. This assumption is especially relevant in applications such as signal processing, feature selection, and compressed sensing, where solutions are expected to have few nonzero entries. Understanding how algorithmic complexity depends on sparsity is therefore important, and it also offers insight into the combinatorial and geometric structure of these problems.
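
One way to see why sparsity helps: if some minimizer has at most $k$ elements, it suffices to search only the sets of size at most $k$. The sketch below (hypothetical names and a toy function) does exactly that by enumeration. It needs $\sum_{i \le k} \binom{n}{i}$ queries, on the order of $n^k$ for fixed $k$, so for $n = 10$ and $k = 2$ it uses 56 queries instead of $2^{10} = 1024$. It is only a naive illustration of the sparse setting, not the paper’s algorithm, and it quickly becomes worse than the general $\widetilde{O}(n^2)$ bound as $k$ grows.

```python
from itertools import combinations
from math import comb
from typing import Callable, FrozenSet, Tuple

SetFunction = Callable[[FrozenSet[int]], float]


def sparse_brute_force(f: SetFunction, n: int, k: int) -> Tuple[FrozenSet[int], float, int]:
    """Minimize f over subsets of {0, ..., n-1} with at most k elements.

    If some minimizer of f is k-sparse, the result is a global minimizer.
    Returns (best set, best value, number of oracle queries).
    """
    best_set, best_val = frozenset(), f(frozenset())
    queries = 1
    for size in range(1, k + 1):
        for S in combinations(range(n), size):
            queries += 1
            val = f(frozenset(S))
            if val < best_val:
                best_set, best_val = frozenset(S), val
    return best_set, best_val, queries


# Hypothetical example with n = 10 whose unique minimizer {0, 1} is 2-sparse.
weights = [5, 5] + [-1] * 8
f = lambda S: 4 * min(len(S), 2) - sum(weights[i] for i in S)

best_set, best_val, queries = sparse_brute_force(f, n=10, k=2)
print(best_set, best_val, queries)                   # frozenset({0, 1}) -2 56
print(sum(comb(10, i) for i in range(3)), 2 ** 10)   # 56 1024
```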

Interestingly, the algorithmic techniques developed for SFM over the past four decades do not yield faster runtimes even when the solution is sparse. New techniques are therefore needed to advance sparse SFM and narrow the gap between the existing upper and lower bounds.

Parallel algorithms for submodular function minimization

Beyond query complexity, recent research has highlighted the importance of sparse SFM for understanding parallel complexity: the adaptivity, or number of sequential rounds of queries, that a parallel algorithm needs, where all queries within a round are issued simultaneously. This research has shown that any parallel algorithm for SFM requires a number of rounds that is polynomial in the size of the ground set.

Our results improve both parallel and sequential algorithms for SFM. For example, suppose the minimizer of the given submodular function is $\widetilde{O}(1)$-sparse. Then our parallel algorithm runs in a nearly constant number of rounds, and our sequential algorithm makes a nearly linear number of queries. By comparison, the previous best parallel upper bound was $\widetilde{O}(n)$ rounds, and the best query complexity upper bound was $\widetilde{O}(n^2)$ queries.

Fast first-order methods for exact submodular function minimization

Current fast algorithms for SFM rely on cutting-plane methods, a standard class of convex optimization techniques, applied to the Lovász extension, a natural continuous extension of the given submodular function. However, restricting the optimization domain to sparse solutions speeds up cutting-plane methods by at most a logarithmic factor. To address this, we turned to first-order methods, including stochastic mirror descent, to minimize the Lovász extension. These methods, non-Euclidean generalizations of stochastic gradient descent, are better attuned to the geometry of the problem. Their drawback is convergence speed: the running time of cutting-plane methods depends only polylogarithmically on $1/\epsilon$, where $\epsilon$ is the additive error relative to the optimal value, whereas the running time of first-order methods grows polynomially in $1/\epsilon$.
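
For readers unfamiliar with the Lovász extension, the following sketch evaluates it with the standard sorting (greedy) formula, which also yields a subgradient from the same $n + 1$ queries. That subgradient is what a first-order method consumes at each step. The function name is hypothetical; the formula is the textbook definition, and on indicator vectors it agrees with $f$.

```python
from typing import Callable, FrozenSet, List, Tuple

SetFunction = Callable[[FrozenSet[int]], float]


def lovasz_extension(f: SetFunction, x: List[float]) -> Tuple[float, List[float]]:
    """Return (f_L(x), g) for x in [0, 1]^n, where g is a subgradient at x.

    Sort coordinates so x[pi(1)] >= ... >= x[pi(n)] and let S_i = {pi(1), ..., pi(i)}.
    Then f_L(x) = f(empty) + sum_i x[pi(i)] * (f(S_i) - f(S_{i-1})), and for
    submodular f the marginal gains g[pi(i)] = f(S_i) - f(S_{i-1}) form a subgradient.
    Uses n + 1 oracle queries.
    """
    order = sorted(range(len(x)), key=lambda j: -x[j])
    g = [0.0] * len(x)
    prev_set = frozenset()
    prev_val = f(prev_set)
    value = prev_val  # the f(empty) term
    for i in order:
        cur_set = prev_set | {i}
        cur_val = f(cur_set)
        g[i] = cur_val - prev_val
        value += x[i] * g[i]
        prev_set, prev_val = cur_set, cur_val
    return value, g


# Cut function of the path 0 - 1 - 2.
edges = [(0, 1), (1, 2)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)

print(lovasz_extension(cut, [1.0, 0.0, 1.0]))  # (2.0, [1, -2, 1]); equals cut({0, 2})
print(lovasz_extension(cut, [0.5, 0.5, 0.5]))  # (0.0, [1, 0, -1])
```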

This convergence rate makes first-order methods better suited to approximate submodular function minimization, whereas our goal is to solve the problem exactly. Using the sparsity assumption, we developed a new algorithmic framework for SFM based on a new notion of duality, and we used this framework to show how first-order methods, with substantially reduced accuracy requirements, can solve SFM exactly.
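
As a rough illustration of first-order minimization followed by rounding, the sketch below runs plain projected subgradient descent on the Lovász extension over $[0, 1]^n$ and rounds each iterate by trying its level sets. This is a deliberately simpler Euclidean stand-in: it is neither stochastic mirror descent nor the duality-based framework from the paper, and all names and the $1/\sqrt{t}$ step-size schedule are assumptions. The rounding relies on a standard fact: $f_L(x)$ is a weighted average of $f$ over the level sets of $x$, so some level set has value at most $f_L(x)$.

```python
import math
from typing import Callable, FrozenSet, List, Tuple

SetFunction = Callable[[FrozenSet[int]], float]


def lovasz_subgradient(f: SetFunction, x: List[float]) -> List[float]:
    """Greedy subgradient of the Lovász extension at x (marginal gains in sorted order)."""
    g = [0.0] * len(x)
    prev_set, prev_val = frozenset(), f(frozenset())
    for i in sorted(range(len(x)), key=lambda j: -x[j]):
        cur_set = prev_set | {i}
        cur_val = f(cur_set)
        g[i] = cur_val - prev_val
        prev_set, prev_val = cur_set, cur_val
    return g


def best_prefix_set(f: SetFunction, x: List[float]) -> Tuple[FrozenSet[int], float]:
    """Round x to a set by trying the empty set and every prefix of the coordinates
    sorted by decreasing x; these candidates include all level sets {i : x_i >= t}."""
    best_set, best_val = frozenset(), f(frozenset())
    cur = frozenset()
    for i in sorted(range(len(x)), key=lambda j: -x[j]):
        cur = cur | {i}
        val = f(cur)
        if val < best_val:
            best_set, best_val = cur, val
    return best_set, best_val


def subgradient_sfm(f: SetFunction, n: int, steps: int = 200) -> Tuple[FrozenSet[int], float]:
    """Projected subgradient descent on the Lovász extension over [0, 1]^n."""
    x = [0.5] * n
    best = best_prefix_set(f, x)
    for t in range(1, steps + 1):
        g = lovasz_subgradient(f, x)
        eta = 1.0 / math.sqrt(t)  # diminishing step size
        # Gradient step, then Euclidean projection onto the box [0, 1]^n (clipping).
        x = [min(1.0, max(0.0, xi - eta * gi)) for xi, gi in zip(x, g)]
        candidate = best_prefix_set(f, x)
        if candidate[1] < best[1]:
            best = candidate
    return best


# Modular example: f(S) = sum of c_i over S, minimized by {i : c_i < 0} = {0, 2}.
c = [-1, 2, -3]
f = lambda S: sum(c[i] for i in S)
print(subgradient_sfm(f, 3))  # (frozenset({0, 2}), -4)
```

Because this method’s accuracy improves only polynomially with the number of steps, it illustrates why recovering an exact minimizer from a first-order method needs an additional idea, which is what the paper’s framework supplies.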

Toward faster algorithms for SFM and its applications

These techniques not only promise advancements for sparse SFM but also provide a foundation for tackling other fundamental problems in SFM theory. Our algorithms for sparse SFM serve as valuable starting points for designing improved algorithms for related problems. They offer potential insights into developing polynomial-time algorithms for SFM with lower query and parallel complexity, opening avenues for future research.

Over the past four decades, research on submodular function minimization has focused on the global properties of the problem. Sparse SFM, in contrast, lets us explore local and more refined structures of submodular functions. Our work introduces new algorithmic tools that better exploit these structural properties, which matters for applications in ML and operations research, where problems often have special structure. Beyond advancing sparse SFM, our paradigm paves the way for improved algorithms for SFM and its diverse applications.

The post Toward developing faster algorithms for minimizing submodular functions appeared first on Microsoft Research.

Teachers in India help Microsoft Research design AI tool for creating great classroom content

October 30, 2023

by Brenda Potts Microsoft AI

Teachers are the backbone of any educational system. They are not just educators; they are indispensable navigators, mentors, and leaders. Teachers around the world face many challenges, which vary from country to country or even within a city or town. But some challenges are universal, including time management, classroom organization, and creating effective lesson plans.

Advances in AI present new opportunities to enhance teachers’ abilities and empower students to learn more effectively. That’s the goal of a new project from Microsoft Research, which uses generative AI to help teachers quickly develop personalized learning experiences, design assignments, create hands-on activities, and more, while giving them back the hours they currently spend on daily planning.

Shiksha copilot is a research project and an interdisciplinary collaboration between Microsoft Research India and teams across Microsoft. Shiksha (Sanskrit: शिक्षा, IAST and ISO: śikṣā) is a Sanskrit word meaning “instruction, lesson, learning, study of skill.” The project aims to improve learning outcomes and empower teachers to create comprehensive, age-appropriate lesson plans that combine the best available online resources, including textbooks, videos, classroom activities, and student assessment tools. To help curate these resources, the project team built a copilot, an AI-powered digital assistant, centered on teachers’ specific needs, which were identified at the outset through multiple interviews and workshops.

Working with Sikshana Foundation, a local non-governmental organization focused on improving public education, the researchers are piloting this program at several public schools in and around Bengaluru, India, to build and improve the underlying tools. This post gives an overview of the project, including interviews with three teachers who have used Shiksha copilot in their own classrooms.

A road map for teachers

A lesson plan is like a road map charting what students need to learn and how to efficiently cover the material during class time. It includes three key components:

Objectives for student learning, based on grade level and subject 
Teaching and learning tactics, including tutorials and activities to help students understand the topic
Strategies to assess student understanding, both in class and through homework

Parimala H V teaches science in grades 6-8 at Government Higher Primary School, Santhe Beedhi in Bengaluru. She teaches in the local language, Kannada, and in English. Before using Shiksha copilot, she spent an hour or more each day for each class she teaches, scanning textbooks and printed materials to put together an effective lesson plan. She also searched the internet for ideas, but sifting through the growing body of online content could take just as long. She often worked until midnight planning the next day’s activities, which left her feeling tired and stressed.

“Lesson planning can be a struggle, but it’s very important,” Parimala said. “If the planning goes well, everything goes well.”

With Shiksha copilot, Parimala was able to develop a complete lesson plan in 60 to 90 seconds, instead of 60 to 90 minutes. The simple interface asks basic questions about the curriculum, language of delivery, grade level, and subject. It then compiles engaging learning materials to achieve the teacher’s classroom objectives. Parimala finds better ideas and hands-on activities using Shiksha copilot than through other online tools. She feels well rested and better prepared for her day, which also makes her happier in the classroom. And with the time she saves, she can focus more on coaching her students and improving her teaching practices.

“I was thrilled to have the opportunity to use Shiksha copilot,” Parimala said. “It could be very useful for new teachers just learning their profession. I think it could revolutionize the way teachers teach.”

Parimala H.V., Teacher, Government Higher Primary School, Santhe Beedhi

At Parimala’s school and others in the Bengaluru area, teachers face some significant challenges. Classrooms can have up to 70 students of varying abilities. Teachers often need to prepare lessons and give instruction in both English and Kannada. As the Covid pandemic brought about remote learning on a large scale, technology began to rapidly change how teachers and students interact. Most students now have computers or smartphones, expanding teachers’ options. But this also makes it harder to keep students focused on a traditional classroom blackboard.

“These children are addicted to their mobile phones and social media. If I use the ‘chalk and talk’ method in class, they may get bored,” said Gireesh K S, who relies heavily on his blackboard to teach math and physics at Government High School, Jalige. Gireesh has used web search tools to find digital resources like interactive PowerPoint slides that will hold his students’ attention longer. With Shiksha copilot, he can zero in more quickly on videos or classroom activities that help him connect better with all 40+ students in his class.

“Here lies the teacher’s job. The teacher has to select whichever activity, whichever video, or whichever questions to use,” Gireesh said. “There are so many questions and videos (to choose from), but as a teacher for my class, I know my students. So, I have to select the suitable ones.”

Other learning platforms were less flexible and less dynamic, returning static content options that were not always useful for a diverse group of learners. Shiksha copilot, on the other hand, does a much better job of customizing and adapting its recommendations based on teacher input, Gireesh said.

“Shiksha copilot is very easy to use when compared to other AI we have tried, because it is mapped with our own syllabus and our own curriculum.”

Gireesh K S, Teacher, Government High School, Jalige

Behind the technology

Designing and building Shiksha copilot requires various technological innovations. Educational content is mainly multimodal, including text, images, tables, videos, charts, and interactive materials. Therefore, for developing engaging learning experiences, it is essential to build generative AI models which have unified multimodal capabilities. Also, these experiences are most impactful when delivered in native languages, which requires improving the multilingual capabilities of generative AI models.

Shiksha copilot includes a range of powerful features that address those challenges and enhance the educational experience. It’s grounded in specific curricula and learning objectives, to ensure that all generated content aligns with desired educational outcomes, according to Akshay Nambi, principal researcher at Microsoft Research. “This grounding is enabled by ingesting relevant data with the help of state-of-the-art optical character recognition (OCR), computer vision (CV) and generative AI models. It was also important to use natural language and support voice-based interactions while including options for English and Kannada speakers,” Nambi said.

Shiksha copilot supports connectivity to both public and private resource content, enabling educators to tap into a vast array of materials and tailor them to their unique teaching requirements. Shiksha copilot can be accessed through different modalities, such as WhatsApp, Telegram, and web applications, enabling seamless integration with teachers’ current workflows.

To help create content more quickly and efficiently, the system uses semantic caching with LLMs. Storing and reusing previously processed educational content reduces the computational resources required, helping deliver a scalable and affordable copilot experience. Throughout development, the project team followed established protocols regarding safety, reliability, and trustworthiness.
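
Semantic caching can be sketched in a few lines. The post does not describe Shiksha copilot’s implementation, so everything here is an assumption: a toy bag-of-words `embed` function stands in for a real embedding model, a linear scan stands in for a vector index, and the 0.8 similarity threshold is arbitrary. The idea is that a new request that is close enough to a previous one reuses the stored response instead of triggering another LLM call.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Reuse a stored LLM response when a new request is similar enough to an old one."""

    def __init__(self, generate: Callable[[str], str], threshold: float = 0.8):
        self.generate = generate      # the expensive LLM call
        self.threshold = threshold    # minimum cosine similarity for a cache hit
        self.entries: List[Tuple[Counter, str]] = []
        self.llm_calls = 0

    def lookup(self, query: str) -> Optional[str]:
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine(q, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def get(self, query: str) -> str:
        cached = self.lookup(query)
        if cached is not None:
            return cached
        self.llm_calls += 1
        answer = self.generate(query)
        self.entries.append((embed(query), answer))
        return answer


cache = SemanticCache(generate=lambda q: f"[generated lesson plan for: {q}]")
cache.get("grade 7 science lesson plan on photosynthesis")  # miss: calls the LLM
cache.get("Grade 7 Science lesson plan on photosynthesis")  # hit: same words
cache.get("grade 7 math lesson plan on fractions")          # miss: similarity 5/7
print(cache.llm_calls)  # 2
```

The threshold is the key design choice: set too low, the cache returns answers to questions that were not actually asked; set too high, it rarely saves an LLM call.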

“Extensive prompt designing, testing and rigorous responsible AI procedures, including content filtering and moderation, red team assessments and jailbreaking simulations, have been deployed to maximize safety and reliability. These measures are in place so that Shiksha copilot consistently produces factual and trustworthy content,” said Tanuja Ganu, principal research SDE manager at Microsoft Research.

Convincing the skeptics

Before the initial workshop, some teachers expressed skepticism about using AI for lesson planning. Students already have multiple digital learning tools. But for Mahalakshmi A, who teaches standard science in grades 4-8 at rural Government Higher Primary School, Basavana Halli, outside Bengaluru, the value for teachers was less clear. However, during a two-hour initial workshop session, Mahalakshmi found she could easily create multiple lesson plans using Shiksha copilot that would work well in her classroom.

“I felt very happy because it’s a totally different concept. Before now, I could see that technology could work for the students. But this is the first time that it felt like the teachers also had a tool for themselves.”

Mahalakshmi A., Teacher, Government Higher Primary School, Basavana Halli

Mahalakshmi could also see how the content assembled using Shiksha copilot would make her class more interesting for her students, which is an important goal. “Instead of giving them the same problems, the same experiments, and the same videos, we make learning interesting. And then they learn what we call shashwatha kalike, or permanent learning. With Shiksha copilot, we can make that permanent learning happen in our classroom,” she added.

Next steps

The initial pilot program for Shiksha copilot is underway at more than 10 schools in and around Bengaluru. The goal is to let the teachers experience how Shiksha copilot can best be used in their daily workflows to improve learning experiences and collect feedback. The early response has been highly positive, with teachers expressing great satisfaction in both the quality of the content generated and the time savings. To build on this successful pilot, researchers are gearing up to scale Shiksha copilot in schools across the state of Karnataka and beyond, in collaboration with Sikshana Foundation.

This copilot is being developed as part of Project VeLLM (Universal Empowerment with Large Language Models) at Microsoft Research India. VeLLM’s goal is to make inclusive and accessible copilots available to everyone by building a platform for developing population-scale copilots. Inclusive copilots must address various real-world challenges, such as a multilingual user base, varied skillsets, limited devices and connectivity, domain-specific understanding, guardrails, and safety principles. Shiksha is the first copilot developed using the VeLLM platform. The VeLLM team is working with collaborators across diverse domains, such as agriculture and healthcare, to develop tailored domain-specific copilot experiences utilizing the platform and addressing associated research problems.

To learn more about the project or collaboration opportunities, email the team at shikshacopilot@microsoft.com.

The Shiksha copilot team and collaborators (from left to right): Meena Elapulli (Microsoft Research), Ishaan Watts (Microsoft Research), Kavyansh Chourasia (Microsoft Research), Gireesh K.S. (GHPS, Tumkur), Srujana V S (Microsoft Research), Tanuja Ganu (Microsoft Research), Mahalakshmi A (GHPS, Basavana Halli), Parimala H.V. (GHPS, Santhe Beedi), Ravi R (GHPS, Gowdahalli), Maruthi K.R. (GHPS, Anedoddi), Smitha Venkatesh (Sikshana Foundation), Akshay Nambi (Microsoft Research), Somnath Kumar (Microsoft Research), Yash Gadhia (Microsoft Research), Sanchit Gupta (Microsoft Research)

The post Teachers in India help Microsoft Research design AI tool for creating great classroom content appeared first on Microsoft Research.

Data Formulator: A concept-driven, AI-powered approach to data visualization

October 27, 2023

by Alyssa Hughes Microsoft AI

This research paper was presented at the IEEE Visualization Conference (VIS 2023), the premier forum for advances in visualization and visual analytics.

Effective data visualization plays a crucial role in data analysis. It enables data analysts and others to explore complex datasets, comprehend patterns, and convey meaningful insights to various stakeholders. Today, there are numerous tools for creating visual representations of data. However, these tools typically require tidy data, meaning that each field the visualization needs must already exist as its own column. This poses significant challenges for data analysts, who must use additional tools to transform raw data into a compatible format before loading it into one of these visualization tools.

For instance, consider a dataset displaying 2020 temperatures in Seattle and Atlanta. If an analyst aims to create a scatter plot comparing the temperatures of these two US cities on the x/y-axes, data transformation is essential. The visualization tool mandates separate columns for Seattle and Atlanta temperatures to map to the scatter plot’s axes. Consequently, the analyst must pivot the input table to generate these columns. Moreover, if the analyst intends to compare which city experiences warmer days or create a smoothed line chart illustrating Seattle’s 7-day moving average temperature, further computations on the transformed data are necessary. Fields like “Warmer” and “Seattle 7-day Moving Avg” need to be calculated to facilitate the visualization, as depicted in Figure 1. This intricate process highlights the complexity and expertise currently needed to prepare raw data for effective visualization.

Figure 1. A data analyst wants to compare 2020 temperatures in Seattle and Atlanta using visualizations like scatter plots and histograms. However, the original dataset, with columns Date, City, and Temperature for 2020-01-01 through 2020-12-31, lacks the columns these visualizations need (“Seattle Temp,” “Atlanta Temp,” “Warmer,” and “Seattle Temp Moving Average”). The desired charts are a scatter plot comparing the two cities’ temperatures, a histogram of the number of days each city is warmer, and a line chart of Seattle’s moving average temperature. Data transformation is needed to produce a table with columns Date, Seattle Temp, Atlanta Temp, Warmer, Difference, and Seattle Temp Moving Average.
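
The transformation in Figure 1 can be written out by hand to show what the analyst would otherwise have to do. This pure-Python sketch (hypothetical function names, made-up toy values) pivots the long Date/City/Temperature table into one row per date and then derives Warmer, Difference, and a trailing moving average of Seattle Temp. Using a shorter window for the first few days is our assumption. In practice, a library such as pandas would typically express the same steps with a pivot followed by a rolling mean.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def pivot_cities(rows: List[Row]) -> List[Row]:
    """Reshape the long table (Date, City, Temperature) into one row per date."""
    wide: Dict[str, Row] = defaultdict(dict)
    for r in rows:
        wide[r["Date"]][f'{r["City"]} Temp'] = r["Temperature"]
    # ISO dates sort chronologically as strings.
    return [{"Date": date, **temps} for date, temps in sorted(wide.items())]


def add_derived_fields(wide: List[Row], window: int = 7) -> List[Row]:
    """Add Warmer, Difference, and a trailing moving average of Seattle Temp."""
    history: List[float] = []
    for row in wide:
        s, a = row["Seattle Temp"], row["Atlanta Temp"]
        row["Warmer"] = "Seattle" if s > a else "Atlanta" if a > s else "Same"
        row["Difference"] = s - a
        history.append(s)
        recent = history[-window:]  # shorter window for the first few days
        row["Seattle Temp Moving Average"] = sum(recent) / len(recent)
    return wide


raw = [
    {"Date": "2020-01-01", "City": "Seattle", "Temperature": 51},
    {"Date": "2020-01-01", "City": "Atlanta", "Temperature": 45},
    {"Date": "2020-01-02", "City": "Seattle", "Temperature": 45},
    {"Date": "2020-01-02", "City": "Atlanta", "Temperature": 47},
    {"Date": "2020-01-03", "City": "Seattle", "Temperature": 48},
    {"Date": "2020-01-03", "City": "Atlanta", "Temperature": 48},
]
for row in add_derived_fields(pivot_cities(raw)):
    print(row)
```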

This hurdle is particularly daunting because it necessitates a certain level of programming expertise or familiarity with additional data processing tools. It highlights the complexities of data visualization and underscores the need for an easier and more seamless process for data analysts, enabling them to create impactful visualizations regardless of their technical background.

Against the backdrop of rapid advances in large language models (LLMs) and programming-by-example techniques, researchers have made significant strides in breaking down these barriers. In this context, we share our paper, “Data Formulator: AI-powered Concept-driven Visualization Authoring,” presented at VIS 2023 and winner of a Best Paper Honorable Mention award. Data Formulator is an AI-powered visualization authoring tool developed through a collaboration between researchers studying AI and researchers studying human-computer interaction (HCI). The result is a new visualization paradigm that separates high-level visualization intents from low-level data transformation steps. The process begins with data analysts expressing their visualization ideas as data concepts. These concepts are the specific data fields that analysts want to visualize, even when those fields are not present in the raw input data. In this way, analysts convey their visualization intent to the AI agent, which in turn helps them implement the visualization.

Defining data concepts and creating visualizations

Data Formulator operates in a straightforward way. The analyst defines the data concepts they plan to visualize, either through natural language or by providing example entries for the concept. Once these concepts are defined, the analyst maps them to visual channels of a chart, such as the x- and y-axes, as illustrated in Figure 2.

Figure 2. The Data Formulator user interface. Data Formulator has four panels: (1) the Concept Shelf (right), for creating and deriving new data concepts to be visualized; (2) the Chart Builder (to the left of the Concept Shelf), for specifying the visualization type and encoding data concepts to visual channels; (3) the Table View (bottom left), for inspecting the original data and the tables that Data Formulator derives automatically; and (4) the Visualization Panel (top left), for exploring the generated visualizations.

If the analyst defines concepts through examples, Data Formulator engages a program synthesizer, which generates a specialized data reshaping program, transforming the provided data to bring out the required data fields. Conversely, when an analyst introduces a new concept using natural language queries, Data Formulator calls on LLMs to generate code, which facilitates the creation of a new data category based on the provided description. In both cases, Data Formulator compiles the transformed data into a structured table and creates corresponding visualizations.
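
To give a feel for the programming-by-example side, here is a toy enumerative synthesizer. It is not Data Formulator’s synthesizer, whose details are in the paper. Its “language” contains a single reshaping operator, a hypothetical `pivot(index, columns, values)`, and the search tries every argument assignment and keeps the programs whose output contains all of the analyst’s example rows. For brevity, the example rows use the column names “Seattle” and “Atlanta” rather than the concept names “Seattle Temp” and “Atlanta Temp”; the values mirror the example table in Figure 3.

```python
from itertools import permutations
from typing import Any, Dict, List, Tuple

Row = Dict[Any, Any]


def pivot(rows: List[Row], index: str, columns: str, values: str) -> List[Row]:
    """Long-to-wide reshape: one output row per distinct value of `index`."""
    out: Dict[Any, Row] = {}
    for r in rows:
        rec = out.setdefault(r[index], {index: r[index]})
        rec[r[columns]] = r[values]
    return list(out.values())


def consistent(table: List[Row], examples: List[Row]) -> bool:
    """True if every user example matches (a subset of) some output row."""
    return all(
        any(all(k in row and row[k] == v for k, v in ex.items()) for row in table)
        for ex in examples
    )


def synthesize(rows: List[Row], examples: List[Row]) -> List[Tuple[str, str, str]]:
    """Enumerate every pivot(index, columns, values) program; keep the consistent ones."""
    fields = list(rows[0].keys())
    return [
        (i, c, v)
        for i, c, v in permutations(fields, 3)
        if consistent(pivot(rows, i, c, v), examples)
    ]


raw = [
    {"Date": "2020-01-01", "City": "Seattle", "Temperature": 51},
    {"Date": "2020-01-01", "City": "Atlanta", "Temperature": 45},
    {"Date": "2020-01-02", "City": "Seattle", "Temperature": 45},
    {"Date": "2020-01-02", "City": "Atlanta", "Temperature": 47},
]
examples = [{"Seattle": 51, "Atlanta": 45}, {"Seattle": 45, "Atlanta": 47}]
print(synthesize(raw, examples))  # [('Date', 'City', 'Temperature')]
```

Of the six candidate programs, only pivoting on City with Date as the row key and Temperature as the cell value reproduces both example rows, so the synthesizer returns that one. A real synthesizer searches a far richer space and may return several candidates, which is one reason Data Formulator shows multiple options to the analyst.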

We recognize that analyst specifications can be ambiguous, so we designed Data Formulator to generate multiple visualization options to help them identify what they want. The tool also provides analysts with the AI-generated transformation program and the transformed data for inspection. This transparency helps analysts refine their intent for future iterations.

Continuing our Seattle/Atlanta temperatures example, Figures 3 and 4 show how analysts can use Data Formulator to create visualizations without reformatting the raw data in an external tool. First, the analyst provides example temperature values to create the new data concepts “Seattle Temp” and “Atlanta Temp,” shown in Figure 3. The analyst then uses a natural language description to create the new concept “Warmer,” and Data Formulator transforms the data so that it can be visualized, shown in Figure 4.

Figure 3. The analyst creates new data concepts “Atlanta Temp” and “Seattle Temp” using examples. The AI agent solves a programming-by-example problem to create the new concepts for visualization. In the Concept Shelf, the analyst names the concept “Atlanta Temp,” gives example values (45, 47, 56, 41), and drags it to the y-axis of the Chart Builder (“Seattle Temp” is already on the x-axis). The analyst then fills in a two-row example table relating the two concepts (Atlanta Temp 45 with Seattle Temp 51, and Atlanta Temp 47 with Seattle Temp 45) and clicks “Formulate.” Data Formulator returns the transformed data (columns “#,” “Seattle Temp,” “Atlanta Temp,” and “Date”) and a scatter plot with Seattle Temp on the x-axis and Atlanta Temp on the y-axis.

Figure 4. The analyst creates a new data concept “Warmer” using a natural language description. Data Formulator calls LLMs to generate a transformation program that derives the new concept. In the Concept Shelf, the analyst selects “Seattle Temp” and “Atlanta Temp” as the concepts it is derived from, names it “Warmer,” and enters the description “Which is the warmer city, or the same.” Data Formulator shows example rows (Seattle Temp, Atlanta Temp, Warmer): (51, 45, Seattle), (38, 58, Atlanta), (44, 65, Atlanta), (42, 60, Atlanta), and (35, 62, Atlanta). The analyst inspects the generated code and clicks “save” after confirming that it is correct.

Looking ahead: Analyst-AI collaboration in data analysis

AI-powered data analysis tools have the potential to significantly streamline the entire data analysis process by consolidating various tasks into a single tool. Beyond just visualization, this concept-driven technique can be applied to data cleaning, data integration, visual data exploration, and visual storytelling. Our vision is for an AI system to take high-level instruction from the user and automatically recommend the necessary steps across the entire data analysis pipeline, enabling collaboration between the user and the AI agent to achieve their data visualization goals.

Inevitably, data analysts will need to tackle more complex tasks beyond the scope described here. For this reason, it’s crucial to consider how to design AI-powered tools that effectively communicate uncertain, ambiguous, or incorrect results to the analyst. This helps the analyst trust the tool and collaborate effectively with AI to accomplish their objectives.

The post Data Formulator: A concept-driven, AI-powered approach to data visualization appeared first on Microsoft Research.

Project Silica: Sustainable cloud archival storage in glass

October 26, 2023

by Brenda Potts Microsoft AI

This research paper was presented at the 29th ACM Symposium on Operating Systems Principles (SOSP 2023), the premier forum for the theory and practice of computer systems software.

SOSP 2023
Project Silica: Towards Sustainable Cloud Archival Storage in Glass

Data growth demands a sustainable archival solution

For millennia, data has woven itself into every facet of our lives, from business and academia to personal spheres. Our production of data is staggering, encompassing personal photos, medical records, financial data, scientific insights, and more. By 2025, it’s estimated that we will generate a massive 175 zettabytes of data annually. Amidst this deluge, a substantial portion is vital for preserving our collective heritage and personal histories.

Presently, magnetic technologies like tape and hard disk drives provide the most economical storage, but they come with limitations. Magnetic media lacks the longevity and durability essential for enduring archival storage, so data must be periodically migrated to new media: every five years for hard disk drives and around every ten years for magnetic tape. Moreover, ensuring data longevity on magnetic media requires regular “scrubbing,” a process of reading the data to identify corruption and fixing any errors, which consumes substantial energy. We need a sustainable solution, one that preserves our digital heritage without imposing an ongoing environmental and financial burden.
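To make the migration burden concrete, the sketch below counts how many times data must be copied to fresh media over a retention period, using the refresh intervals above. It assumes media is replaced exactly at the end of each interval; the 50-year horizon is an arbitrary, illustrative choice.

```python
def migrations_needed(retention_years: int, media_lifetime_years: int) -> int:
    """Count migrations to fresh media needed to keep data for retention_years.

    Assumes data is written at year 0 and copied to new media at every
    multiple of media_lifetime_years that falls before the retention period ends.
    """
    return max(0, (retention_years - 1) // media_lifetime_years)


horizon = 50  # hypothetical retention period in years
print("HDD:", migrations_needed(horizon, 5))    # refresh every 5 years
print("Tape:", migrations_needed(horizon, 10))  # refresh every 10 years
```

Under these assumptions, keeping data for 50 years means 9 migrations on hard disk drives and 4 on tape.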

Project Silica: Sustainable and durable cloud archival storage

Our paper, “Project Silica: Towards Sustainable Cloud Archival Storage in Glass,” presented at SOSP 2023, describes Project Silica, a cloud-based storage system underpinned by quartz glass. Quartz glass is a durable, chemically inert, resilient, and low-cost medium that is impervious to electromagnetic interference. Because data stored in it can last thousands of years, quartz glass is ideal for archival storage: it offers a sustainable solution and eliminates the need for periodic data refreshes.

Writing, reading, and decoding data

Ultrafast femtosecond lasers enable the writing process. Data is written inside a square glass platter, similar in size to a DVD, as voxels: permanent modifications to the physical structure of the glass made using femtosecond-scale laser pulses. Each voxel encodes multiple bits of data, and voxels are written in 2D layers across the XY plane. Hundreds of these layers are then stacked along the Z axis. To achieve high write throughput, we rapidly scan the laser pulses across the length of the medium using a scanner similar to those used in barcode readers.
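The layered layout can be illustrated with an address calculation. The sketch below maps a linear bit offset to a voxel position, assuming each 2D layer is filled row by row before the next layer is written along Z. The geometry parameters (`bits_per_voxel`, `voxels_per_row`, `rows_per_layer`) are hypothetical placeholders; the actual Silica data layout is described in the paper and is not modeled here.

```python
from typing import NamedTuple


class VoxelAddress(NamedTuple):
    layer: int  # Z index: which stacked 2D layer
    row: int    # Y index within the layer
    col: int    # X index within the layer
    bit: int    # bit position inside the voxel


def locate_bit(bit_offset: int, bits_per_voxel: int,
               voxels_per_row: int, rows_per_layer: int) -> VoxelAddress:
    """Map a linear bit offset to a voxel in a hypothetical XY-layer, Z-stacked layout."""
    voxel_index, bit = divmod(bit_offset, bits_per_voxel)
    voxels_per_layer = voxels_per_row * rows_per_layer
    layer, in_layer = divmod(voxel_index, voxels_per_layer)  # fill a layer before moving in Z
    row, col = divmod(in_layer, voxels_per_row)              # row-major within the layer
    return VoxelAddress(layer, row, col, bit)
```

For example, with 2 bits per voxel and a 4-by-3 grid of voxels per layer, each layer holds 24 bits, so `locate_bit(25, 2, 4, 3)` lands in the second layer and returns `VoxelAddress(layer=1, row=0, col=0, bit=1)`.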

To read data, we employ polarization microscopy to image the platter. The read drive scans sectors in a single swift Z-pattern, and the resulting images are processed for decoding. Different read drive options offer varying throughput, balancing cost and performance.

Data decoding relies on ML models that analyze images captured by the read drive, accurately converting signals from analog to digital. The glass library design includes independent read, write, and storage racks. Platters are stored in power-free storage racks and moved by free-roaming shuttles, ensuring minimal resource consumption for passive storage, as shown in Video 1. A one-way system between write racks and the rest of the library ensures that a written platter cannot be overwritten under any circumstances, enforcing data integrity.

Video 1. The Silica library prototype demonstrates the flexible and scalable design of the system and its ability to sustainably service archival workloads.

Azure workload analysis informs Silica’s design

To build an optimal storage system around the core Silica technology, we extensively studied cloud archival data workloads from Azure Storage. Surprisingly, we discovered that small read requests dominate the read workload, yet a small percentage of requests constitute the majority of read bytes, creating a skewed distribution, as illustrated in Figure 1.
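The skew in Figure 1 comes from measuring the same trace two ways: by share of operations and by share of bytes. The sketch below computes both for requests under a size threshold. The trace is a made-up toy example chosen only to show the effect; it is not Azure data.

```python
def share_below(sizes_mb: list[float], threshold_mb: float) -> tuple[float, float]:
    """Return (fraction of requests, fraction of bytes) for requests smaller than threshold_mb."""
    small = [s for s in sizes_mb if s < threshold_mb]
    op_share = len(small) / len(sizes_mb)
    byte_share = sum(small) / sum(sizes_mb)
    return op_share, byte_share


# Hypothetical toy trace: many tiny reads and a few very large ones (sizes in MB).
toy_trace = [0.5] * 90 + [1024.0] * 10
ops, byte_frac = share_below(toy_trace, 4.0)
print(f"{ops:.0%} of requests, {byte_frac:.2%} of bytes")
```

With 90 reads of 0.5 MB and 10 reads of 1,024 MB, small reads are 90% of operations but under half a percent of bytes.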

Project Silica paper at SOSP 2023: A double bar chart with two y-axes, showing the percentage of total read operations on the left y-axis and the percentage of total bytes read on the right y-axis, with file size buckets on the x-axis. The graph shows that most read operations are for small files, but these make up only a small fraction of all bytes read (for example, 58% of operations are for files smaller than 4MB, yet they account for less than 1.2% of all bytes read). Conversely, most bytes read are for large files, which make up a small fraction of all read operations (for example, 85% of bytes read are for files larger than 256MB, yet these account for less than 2% of requests). — Figure 1. The distribution of read request sizes. Most requests are for small files, but they make up a small percentage of the total load in bytes.

Because small requests dominate the number of operations, the per-request cost of mechanical movement in the library, rather than data transfer, largely determines how quickly most requests are served; minimizing that latency is therefore crucial for performance. Silica glass is a random-seek storage medium: unlike magnetic tape, it requires no spooling, so it suits this workload. Figure 2 shows substantial differences in read demand across datacenters, which suggests that we need a flexible library design that can scale resources to each datacenter’s workload. Studying these archival workloads has been instrumental in establishing the core design principles for the Silica storage system.
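A simple service-time model shows why mechanical latency matters most for small reads: each request pays a fixed cost to bring a platter to a drive, plus transfer time proportional to its size. The 10-second fixed latency and 30 MB/s throughput below are illustrative assumptions, not measured Silica figures, and the model ignores queueing.

```python
def read_time_s(size_mb: float, fixed_latency_s: float, throughput_mb_s: float) -> float:
    """Service time for one read: fixed mechanical latency plus transfer time."""
    return fixed_latency_s + size_mb / throughput_mb_s


def latency_fraction(size_mb: float, fixed_latency_s: float, throughput_mb_s: float) -> float:
    """Fraction of a read's service time spent on the fixed (mechanical) part."""
    return fixed_latency_s / read_time_s(size_mb, fixed_latency_s, throughput_mb_s)


# Hypothetical parameters: 10 s to fetch a platter to a drive, 30 MB/s read drive.
for size in (1, 4, 256, 4096):
    share = latency_fraction(size, 10.0, 30.0)
    print(f"{size:>5} MB: {share:.1%} of service time is mechanical")
```

With these numbers, fixed latency is over 99% of the service time for a 1 MB read but under 10% for a 4,096 MB read, so a workload dominated by small reads is dominated by mechanical latency.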

Project Silica paper at SOSP 2023: Figure 2. A bar chart showing different, unlabeled data centers on the x-axis, and tail over median read throughput on the y-axis on a log scale. The graph shows up to 7 orders of magnitude mean-to-tail difference within a data center, and up to 5 orders of magnitude variability in the mean-to-tail difference across different data centers. — Figure 2. Tail over median read load for different datacenters. The data shows significant variation across and within datacenters.

Project Silica’s versatile storage system

We designed and evaluated a comprehensive storage system that manages error correction, data layout, request scheduling, and shuttle traffic management. Our design effectively handles IOPS-intensive workloads, meeting the expected service level objective (SLO) for an archival storage tier of approximately 15 hours. Even in volume-intensive scenarios, where a large number of bytes are read, our system efficiently handles requests using low-throughput read drives. In both cases, the required throughput is significantly below that of traditional tape drives, as shown in Figure 3. The paper provides an extensive description of this system, and the video above shows our prototype library’s capabilities.
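A back-of-the-envelope calculation shows why low-throughput drives can still meet an SLO measured in hours. The sketch uses the 30 MB/s read drive throughput and roughly 15-hour SLO mentioned for this design, treats 1 MB as 10^6 bytes, and ignores mechanical latency and scheduling, so it gives an upper bound per drive rather than the paper’s simulated completion times. The 10 TB workload in the example is hypothetical.

```python
import math

SECONDS_PER_HOUR = 3600


def bytes_within_slo(throughput_mb_s: float, slo_hours: float) -> float:
    """Upper bound on bytes one drive can stream within the SLO (1 MB = 10**6 bytes)."""
    return throughput_mb_s * 1e6 * slo_hours * SECONDS_PER_HOUR


def drives_needed(total_bytes: float, throughput_mb_s: float, slo_hours: float) -> int:
    """Minimum number of drives to stream total_bytes within the SLO, ignoring seek time."""
    return math.ceil(total_bytes / bytes_within_slo(throughput_mb_s, slo_hours))


per_drive = bytes_within_slo(30, 15)
print(f"One 30 MB/s drive streams up to {per_drive / 1e12:.2f} TB in 15 hours")
print("Drives for a hypothetical 10 TB burst:", drives_needed(10e12, 30, 15))
```

A single 30 MB/s drive can stream about 1.62 TB within 15 hours, so even a multi-terabyte burst needs only a handful of drives under these simplifying assumptions.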

Project Silica paper at SOSP 2023: Figure 3. A line chart with three lines: Volume, IOPS, and Typical. The x-axis shows read drive throughput from 30MB/s to 210MB/s in increments of 30MB/s, and the y-axis shows the system’s tail completion time in hours for each workload. The graph shows that all workloads complete within the desired 15-hour SLO, even with 30MB/s read drives. Tail completion time decreases as read drive throughput increases but starts to plateau past 60MB/s for all workloads. — Figure 3. Volume and IOPS workloads represent different extremes in the spectrum of read workloads. Our design can service both workloads well within the expected SLO for an archival storage tier, at about 15 hours.

Diverse applications for sustainably archiving humanity’s data

Project Silica holds promise in numerous sectors, such as healthcare, scientific research, and finance, where secure and durable archival storage of sensitive data is crucial. Research institutions could benefit from Silica’s ability to store vast datasets generated from experiments and simulations, ensuring the integrity and accessibility of research findings over time. Similarly, healthcare organizations could securely archive patient records, medical imaging data, and research outcomes for long-term reference and analysis.

As the volume of globally generated data grows, traditional storage solutions will continue to face challenges in terms of scalability, energy-efficiency, and long-term durability. Moreover, as technologies like AI and advanced analytics progress, the need for reliable and accessible archival data will continue to intensify. Project Silica is well-positioned to play a pivotal role in supporting these technologies by providing a stable, secure, and sustainable repository for the vast amounts of data we create and rely on.

The post Project Silica: Sustainable cloud archival storage in glass appeared first on Microsoft Research.

Research Focus: Week of October 23, 2023

October 25, 2023

by Alyssa Hughes Microsoft AI

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

Kosmos-2.5: A Multimodal Literate Model

Current large language models (LLMs) primarily focus on textual information and cannot understand visual information. However, advancements in the field of multimodal large language models (MLLMs) aim to address this limitation. MLLMs combine visual and textual information within a single Transformer-based model, enabling the model to learn and generate content based on both modalities.

While existing MLLMs have mainly focused on natural images at lower resolutions, text-intensive images remain underexplored. Incorporating text images into the training process and developing models that draw on both textual and visual information can unlock new multimodal applications involving high-resolution, text-intensive images.

In a new paper: Kosmos-2.5: A Multimodal Literate Model, researchers from Microsoft present Kosmos-2.5, an MLLM for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels in two tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures in Markdown format. Through supervised fine-tuning, the model can be adapted, with different prompts, to any text-intensive image understanding task. This work paves the way for the future scaling of MLLMs.

Read the paper

NEW RESEARCH

Evaluation of Dependency Structure for Multivariate Weather Predictors using Copulas

In the Global South, climate change is driving more frequent and severe weather events such as droughts, floods, and storms. This leads to crop failures, food insecurity, and job loss. These effects are expected to increase in intensity, further disadvantaging marginalized communities and exacerbating existing inequalities. The need for prevention and adaptation is urgent. But despite advances in machine learning and numerical modeling, accurate weather forecasting remains challenging, due to complex interactions among atmospheric and oceanic variables.

In a new paper: Evaluation of Dependency Structure for Multivariate Weather Predictors using Copulas, researchers from Microsoft explore the potential of vine copulas to explain complex relationships among different weather variables in three African locations. Copulas separate marginal distributions from the dependency structure, offering a flexible way to model dependence between random variables for improved risk assessments and simulations. Vine copulas are built from a variety of bivariate copulas, including Gaussian, Student’s t, Clayton, Gumbel, and Frank copulas. They are effective in high-dimensional problems and use a hierarchy of trees to express conditional dependence. The researchers propose applying this framework within subseasonal forecasting models to enhance the prediction of different weather events or variables.
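The separation of marginals from dependence can be shown with the simplest family listed above. The Python sketch below samples a single bivariate Gaussian copula, then attaches two arbitrary marginals through their inverse CDFs. It is not a vine copula and not the paper’s model: the correlation, the Normal and Exponential marginals, and the sample size are illustrative assumptions, not fitted to the weather data from the three African locations.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the probability integral transform


def sample_gaussian_copula(rho: float, n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Draw n pairs of Uniform(0, 1) values coupled by a Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second standard normal with correlation rho to z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Passing each normal through its CDF yields uniform marginals;
        # only the dependence structure (the copula) remains.
        pairs.append((STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)))
    return pairs


def apply_marginals(u1: float, u2: float) -> tuple[float, float]:
    """Attach illustrative marginals via inverse CDFs: Normal(25, 3) and Exponential(mean 5)."""
    x = NormalDist(mu=25.0, sigma=3.0).inv_cdf(u1)
    y = -5.0 * math.log(1.0 - u2)
    return x, y


def ranks(values: list[float]) -> list[int]:
    """Rank of each value (0 = smallest); ties are not expected for continuous data."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


pairs = sample_gaussian_copula(rho=0.8, n=5000, seed=42)
u1s = [u1 for u1, _ in pairs]
u2s = [u2 for _, u2 in pairs]
xs, ys = zip(*(apply_marginals(u1, u2) for u1, u2 in pairs))
# Strictly increasing marginal transforms preserve ranks, so both values match.
print(spearman(u1s, u2s), spearman(list(xs), list(ys)))
```

Because the marginal transforms are strictly increasing, the rank correlation set by the copula is unchanged whatever marginals are chosen. A vine chains many such bivariate pieces, possibly from different families, in a tree structure.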

Read the paper

NEW RESEARCH

Adaptive Training System

Adaptive training has been defined as training in which the problem, stimulus, or task is varied as a function of how well the trainee performs. Researchers have shown that, across a range of populations and learning contexts, this type of training outperforms non-adaptive, fixed training. Virtual reality offers new opportunities for applying this type of training and has already demonstrated its effectiveness across a variety of simulated tasks. By using a computational model of the training process, we can derive recommendations for optimal scenario difficulty, resulting in faster, more effective training.

In a new paper: Adaptive Training System, researchers from Microsoft propose an adaptive training algorithm that accelerates the training process based on a parametric model of trainees and training scenarios. The proposed approach makes trial-by-trial recommendations on optimal scenario difficulty selections to maximize improvements in the trainee’s absolute skill level. The Adaptive Training System is applied to the task of training pilots on a virtual reality flight simulator. The system was designed for scenarios varying in difficulty from easy, with full visibility, to flight in fog with side wind, which is difficult even for experienced pilots.
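The trial-by-trial loop can be sketched under a deliberately simple, hypothetical model: the chance of completing a scenario is logistic in the gap between the trainee’s estimated skill and the scenario’s difficulty, and completing a harder scenario yields a larger gain. This is not the paper’s parametric model; the functions, the 0 to 5 difficulty scale, and the Elo-style skill update are assumptions for illustration.

```python
import math


def p_success(skill: float, difficulty: float) -> float:
    """Hypothetical logistic model: chance the trainee completes a scenario."""
    return 1.0 / (1.0 + math.exp(difficulty - skill))


def expected_gain(skill: float, difficulty: float, rate: float = 0.1) -> float:
    """Hypothetical expected skill gain: harder scenarios teach more, but only if completed."""
    return p_success(skill, difficulty) * rate * difficulty


def recommend(skill: float, levels: list[int]) -> int:
    """Choose the difficulty level with the largest expected skill gain for the next trial."""
    return max(levels, key=lambda d: expected_gain(skill, d))


def update_skill(skill: float, difficulty: float, succeeded: bool, step: float = 0.5) -> float:
    """Elo-style update: move the estimate by how surprising the observed outcome was."""
    outcome = 1.0 if succeeded else 0.0
    return skill + step * (outcome - p_success(skill, difficulty))


# Hypothetical scale: 0 = full visibility ... 5 = fog with side wind.
levels = [0, 1, 2, 3, 4, 5]
skill = 2.0
for succeeded in (True, True, False):
    d = recommend(skill, levels)
    skill = update_skill(skill, d, succeeded)
    print(f"recommended difficulty {d}, updated skill estimate {skill:.2f}")
```

Under this toy model the recommendation tracks the skill estimate: success nudges the estimate up, making harder scenarios more attractive on later trials, and failure nudges it down.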

Adaptive Training System applied to the task of training pilots on a virtual reality flight simulator. On the left, a flight scenario with fog. On the right, a flight scenario with full visibility.

Read the paper

NEW RESEARCH

CodePlan: Repository-level Coding using LLMs and Planning

Software engineering activities such as package migration, fixing error reports from static analysis or testing, and adding type annotations or other specifications to a codebase involve pervasively editing the entire repository of code. These activities can be formulated as repository-level coding tasks.

Large language model-powered coding assistants, like GitHub Copilot, have succeeded in offering high-quality solutions to localized coding problems. But repository-level coding tasks are more involved and cannot be solved directly using LLMs, since code within a repository is interdependent and the entire repository may be too large to fit into the prompt.

In a new paper: CodePlan: Repository-level Coding using LLMs and Planning, researchers from Microsoft frame LLM-driven repository-level coding as a planning problem, where the goal is to take the repository from its initial state to a target state whose specifications are provided in natural language. They present CodePlan, a task-agnostic framework, to solve it by synthesizing a multi-step chain of edits, where each step results in a call to an LLM on a code location with context derived from the entire repository, previous code changes and task-specific instructions. This research evaluates the effectiveness of CodePlan on two repository-level tasks: package migration (C#) and temporal code edits (Python) and shows that CodePlan exhibits a stronger alignment with the ground truth in comparison to baselines.
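The idea of a multi-step chain of edits, where each step makes one LLM call on one code location with context from earlier changes, can be sketched as a worklist over a dependency map. This is a simplified illustration, not CodePlan’s algorithm: the dependency map is a hypothetical static dictionary, `edit` is a stand-in for the LLM call, previously visited locations stand in for the context of earlier changes, and each location is visited at most once, whereas the paper’s planning is considerably more involved.

```python
from collections import deque
from typing import Callable


def plan_edits(seeds: list[str],
               dependents: dict[str, list[str]],
               edit: Callable[[str, list[str]], bool]) -> list[str]:
    """Return code locations in the order they were visited.

    seeds: locations the task names directly (e.g. uses of a migrated API).
    dependents: hypothetical map from a location to locations that depend on it.
    edit: stand-in for an LLM call; gets a location plus the locations
          already visited (as context) and returns True if it changed code.
    """
    worklist = deque(seeds)
    queued = set(seeds)
    visited: list[str] = []
    while worklist:
        location = worklist.popleft()
        changed = edit(location, list(visited))  # one LLM call per step
        visited.append(location)
        if changed:
            # A change may break code that depends on this location, so schedule it.
            for dep in dependents.get(location, []):
                if dep not in queued:
                    queued.add(dep)
                    worklist.append(dep)
    return visited


# Toy repository: editing util.parse may require follow-up edits downstream.
deps = {
    "util.parse": ["app.main", "tests.test_parse"],
    "app.main": ["cli.run"],
}


def fake_llm_edit(location: str, context: list[str]) -> bool:
    """Pretend the LLM changes everything except test files."""
    return not location.startswith("tests.")


print(plan_edits(["util.parse"], deps, fake_llm_edit))
```

Locations whose edit changes nothing do not propagate further, which keeps the chain of edits limited to code actually affected by earlier changes.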

Read the paper

NEW ARTICLE

The intimacy triple bind: Structural inequalities and relational labor in the influencer industry

Social media content creators, or influencers, depend heavily on their ability to cultivate and maintain an invested audience-community. They are encouraged to practice “relational labor,” commodifying their personalities, lives and tastes in order to build authentic self-brands and intimacy with audiences.

In a new article, a researcher from Microsoft draws on an ethnographic study of the London influencer industry to examine relational labor through an intersectional feminist lens, exploring the ways in which structural inequalities shape relationships between creators and their audiences. Managing audience relationships is harder for marginalized creators – especially those making stigmatized and less brandable content genres – who are at higher risk of trolling and harassment.

This article explores four key tactics for managing such conditions: (1) leaning into making rather than being content; (2) (dis)engaging with anti-fans through silence; (3) retreating into private community spaces, away from the exposure of public platforms; and, in parallel, (4) turning off public comments.

Read the article


The post Research Focus: Week of October 23, 2023 appeared first on Microsoft Research.

Abstracts: October 23, 2023

October 23, 2023

by Alyssa Hughes Microsoft AI

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Andy Gordon, a Partner Research Manager, and Carina Negreanu, a Senior Researcher, both at Microsoft Research, join host Dr. Gretchen Huizinga to discuss “Co-audit: Tools to help humans double-check AI-generated content.” This paper brings together current understanding of generative AI performance to explore the need and context for tools to help people using the technology find and fix mistakes in AI output.

View the paper

Transcript

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot, or a podcast abstract, of their new and noteworthy papers. Today, I’m talking to Dr. Andy Gordon, a Partner Research Manager, and Dr. Carina Negreanu, a Senior Researcher, both at Microsoft Research. Doctors Gordon and Negreanu are co-editors of a paper called “Co-audit: Tools to help humans double-check AI-generated content,” and you can read a preprint of this paper now on arXiv. Andy Gordon, Carina Negreanu, thanks for joining us on Abstracts!

ANDY GORDON: Great to be here.

CARINA NEGREANU: Likewise.

HUIZINGA: Let’s start with you, Andy. In a few sentences, describe the issue or problem your paper addresses and why people should care about it.

GORDON: Well, generative AI is amazing. Things like Bing Chat or ChatGPT, all these things powered by large language models. Totally amazing. But it’s really important for everyone to remember that these AIs can make mistakes. For example, you ask when your favorite actor got married, and the model says the year but gets it wrong. Or you ask for some Python code, and it works on positive numbers, but occasionally you give it negative numbers and it goes wrong. Another example, you get a summary of some text. It’s great but unfortunately misses one of the important points. Or thinking about images, you ask for a portrait of a character from the AI and there’s some glitch, and it produces a hand with six fingers. So as users, we need to get into the habit of carefully checking AI outputs for mistakes. And we refer to that as “audit” in the sense of a systematic review. Coming to the paper, it’s about what we call co-audit. And that’s our term for any tool support that helps the human audit the AI output. And some examples of co-audit are tools that can help check for hallucinations, like when the actor’s date of birth is wrong, or to check Python code to find some errors or show how a summary has been constructed to help people find errors.

HUIZINGA: Carina, let’s talk to you. What related research does this paper build on, and how does your work add to it?

NEGREANU: So there was no direct work on the co-audit brand before us. We’re just introducing it. But there has been a lot of research that either motivates the need for co-audit or provides relevant framing for it or even like early examples of what we start thinking of co-audit. So as you’re probably aware, there has been a really great effort in the last years to assess the quality of generations by large language models across a multitude, really, of tasks. And currently we use this body of work as motivation for our research. It basically shows there really is a need for this kind of work. And we hope that in the future, we can also use it to benchmark co-audit tools that we are going to produce in our wider community. But the idea of dealing with errors has been a key part of research on human-AI interaction for ages. And there have been some really cool guidelines that came out recently, especially from Amershi in 2019, on human-AI interactions that are concerned with this part of the world. And more recently, Glassman had a really cool paper about conversational frameworks for human-AI communication and basically links these concepts to psychology. And in our work, as you can read in our paper, we are trying to basically frame co-audit within her framework, and we find that it’s a natural fit. But before we started defining formally co-audit and building this paper, our group has built co-audit tools in the code-generation space. One such tool is GAM, which is grounded abstraction matching, where we basically help users learn how to effectively communicate with large language models so that they both understand what the large language model understands they’re asking and also get good feedback back. We also built ColDeco, which is a spreadsheet tool for inspecting and verifying calculated columns without requiring the user to view the underlying code produced by the large language models.
But really, any tool that focuses on debugging or basically getting information back from human-generated content is useful here. So even tools that are like early debugging tools like FxD are very important here as we learn how people use these kinds of tools and we try to basically apply the same concepts in the context of LLM-generated content. So basically, we are building on top of work that helps understand the needs and challenges that end-user programmers have when working in this space and trying to extrapolate them to co-auditing tools for LLM-generated content.

HUIZINGA: Well, Andy, how would you describe the research approach you used or your methodology for this paper, and how did it come about?

GORDON: Great question, Gretchen, and it was actually quite an unusual methodology for us. So as Carina says, we’ve been looking at co-audit in a very specific setting of spreadsheet computations, and we began to realize that co-audit was really important for any kind of AI-generated output, and we started to see other people doing research that was doing the same sort of thing we were doing but in different settings. So, for example, there was a paper, they were generating bits of Python and they were deliberately showing multiple pieces of code after they’d been generated to kind of nudge the human user to make a decision about which one was better. I mean that’s, it’s really important to get people to think about the outputs, and this was a nice trick. So we thought, look, this is actually quite an important problem, and MSR (Microsoft Research) should step up and sort of gather people. So we organized a workshop inside Microsoft in the spring and got folks together to share their perspectives on co-audit. And then since then, we’ve reflected on those discussions and tried to kind of pull them together in a more coherent sense than the sort of whiteboards and sticky notes that we produced back then. And so that’s produced this paper. I think one of the key things that we learned in that process that we hadn’t been thinking about before was that co-audit really complements prompt engineering. So you hear a lot about prompt engineering, and it’s the first part of what we call the prompt-response-audit loop. And this is related to what Carina was saying about Elena Glassman’s work about AI-human interaction. So the first step is you formulate a prompt. For example, you ask for Python code. That’s the first step. The second step is we wait for the response from the AI. And then the third step is that we need to inspect the response—that’s the audit part—decide if it meets our needs or if there is a mistake, and if that’s the case, we need to repeat again. 
So that’s this loop, the prompt-response-audit loop. And prompt engineering, they’re the tools and techniques that you use in that first step to create the prompt. So, for example, some tools will automatically include a data context in a prompt if you’re trying to create some Python to apply to a table in a spreadsheet or, or something like that. And then dually, co-audit, those are the tools and techniques we have to help the human audit the response in the third step of this loop. And that’s like these tools I’ve been mentioning that show maybe two or three candidates of code that’s to be used.

HUIZINGA: Carina, let’s move over to what kinds of things you came away with. Your takeaways or your findings from this workshop. Talk about that and how you chose to articulate them in the paper.

NEGREANU: So as part of our research, we found that basically one co-audit tool does not fit all needs, which in a way was great because we have a bigger field to explore, but in other ways a bit daunting, as it means you have to think of many things. And one thing that really came to light was that even though we can’t, you know, build something that fits everything, we can build a set of principles that we think are important. So really, we wrote our paper around those 10 principles that we have identified from the workshop and then are trying to promote them as things people should think about when they start going on the journey of building co-auditing tools. So one of the examples is that we really think that we should think about grounding outputs, so, for example, by citing reliable sources similar to what Bing Chat does today. We think that’s a really valuable, important principle that people should follow, and they should think about what that means in the context of their co-auditing tool. In the case of Bing, it’s quite simple, as it’s like factual references, but maybe if it becomes referencing code, that becomes more tricky but still super interesting going forward. We also propose that co-auditing tools should have the capability to prioritize the user’s attention to the most likely errors, as we need to be mindful of the user’s cognitive efforts and have a positive cost benefit. Basically, if we overflood the users with different errors and flags, it might be too problematic, and the adoption might be quite difficult going forward. And finally, this is something that really goes to the core of our research area in spreadsheets. It’s about thinking beyond text. So we know visuals are so important in how we explain things, in how we teach in schools, how we teach in universities. So how do we include them in the co-auditing process going forward?
I think that’s going to be a really interesting challenge, and we hope we’re going to see some interesting work in that space.

HUIZINGA: Yeah. Well, principles are one thing, Andy, but how does this paper contribute to real-world impact? We talked about that a bit at the beginning. Who benefits most from this tool?

GORDON: That is a great question, Gretchen, and actually that was a question that we talked about at the workshop. We think that some application areas are going to benefit more than others. So co-audit really matters when correctness really matters and when mistakes have bad consequences, so in terms of application area, that’s areas like maybe finance or technology development or medicine. But you asked particularly about who, and we think some people will benefit more from co-audit than others. And we found this really striking example, I guess it’s an anecdotal example that someone was posting on social media. A professor was teaching a class using generative AI tools for the first time to generate code, and he found some evidence that people who have low self-confidence with computers can be intimidated by generative AI. So he would find that some of the class were really confident users and they would ask it, you know, generate some Python to do such and such, and it would come back with code with, you know, a bunch of mistakes in it. And the confident users were happy just to swat that away; they were even quite a little arrogant about it, like this is a stupid computer, they were saying. But, Gretchen, he found that a lot of his students who were less confident with computers were quite intimidated by this because it was very confidently just saying, oh look, all this code is going to work. And they kind of got a bit stuck, and some of them were scrolling around through this code, trying to understand how it worked, when in fact it was just really broken. So he thought this was pretty bad that these able students who were just less confident were being intimidated and were making less good use of the, the generative AI. Now that is an example that’s an anecdote from social media from a reputable professor, but we looked into it and there are peer-reviewed studies that show a similar effect in the literature.
So I’d say we need co-audit tools that will encourage these less confident users to question when the AI is mistaken rather than getting stuck, and I think otherwise they’re not going to see the benefits of the generative AI.

HUIZINGA: Well, Carina, sometimes I like to boil things down to a nugget or a beautiful takeaway. So if there’s one thing you want our listeners to take away from this work, this paper, what would it be?

NEGREANU: I think that what this study has taught us is that really we need significantly more research. So basically, a good co-auditing experience can really be the element that makes it or breaks it in how we incorporate LLMs safely into our day-to-day lives. But to make this happen, we need people from the field working towards the same goal. It’s really an interdisciplinary work, and I don’t think we can do it by isolating into groups as we’re currently researching now. So I would urge our listeners to think about how they could contribute in this space and reach out with feedback and questions to us. We are more than open to collaboration. Really, we are just starting this journey, and we’d love to see this area become a research priority going forward in 2024.

HUIZINGA: Well, Andy, as an opportunity to give some specificity to Carina’s call for help, what potential pitfalls have you already identified that represent ongoing research challenges in this field? And what’s next on yours—and potentially others’—research agenda in this field?

GORDON: Well, one point, and I think Carina made this, that co-audit techniques will themselves never be perfect. I mean, we’re saying that language models are never going to be perfect. Mistakes will come through. But the co-audit techniques themselves won’t be perfect either. So sometimes a user who is using the tools will still miss some mistakes. So, for example, you know, at the workshop, we thought about security questions and co-audit tools themselves. And we were thinking, for instance, about maybe deliberate attacks on a generative AI. There’s various techniques that people are talking about at the moment where you might sort of poison the inputs that generative AI models pick up on. And in principle, co-audit tools could help users realize that there are deliberate mistakes that have been engineered by the attacker. So that’s good. But on the other hand, you know, security always becomes an arms race. And so once, you know, if we did have a good tool that could detect those kinds of mistakes, the attackers then will start to engineer around the co-audit tools, trying to make them less effective. So that will be an ongoing problem, I think. And on the other hand, you know, we’ll find that if co-audit tools are giving too many warnings, users will start to ignore them, and there’ll be a sort of under-reliance on co-audit tools. And of course, if we give too few, users will miss the mistakes. So an interesting balance needs to be struck. And also, we don’t expect there’s going to be one overarching co-audit experience, but we think there’ll be many different realizations. And so, as Carina says, we hope that common lessons can be learned, and that’s why we want to keep documenting this space in general and building a research community. So I echo what Carina was saying. If you’re listening and you think that what you’re working on is co-audit, do reach out.

HUIZINGA: Well, Andy Gordon, Carina Negreanu, thanks for joining us today. And to our listeners, thanks for tuning in. If you’re interested in learning more about this paper and this research, you can find a link at aka.ms/abstracts, or you can read the preprint on arXiv. See you next time on Abstracts!

The post Abstracts: October 23, 2023 appeared first on Microsoft Research.

What’s Your Story: Ranveer Chandra

October 19, 2023

by Brenda Potts Microsoft AI

In this new Microsoft Research Podcast series What’s Your Story, Lab Director Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. He talks to members of the research community at Microsoft about what motivates their work and how they got where they are today.

Ranveer Chandra is Managing Director of Research for Industry and CTO of Agri-Food. He is also head of Networking Research at Microsoft Research Redmond. His work in systems and networking is helping to bring more internet connectivity to more people and is yielding tools designed to help farmers increase food production more affordably and sustainably. In this episode, he shares what it was like growing up in Jamshedpur, India; why he focuses his efforts in the areas he does; and where the joy in his work comes from.

Learn more:

Ranveer Chandra at Microsoft Research

FarmBeats: AI, Edge & IoT for Agriculture

Project FarmVibes

6G | Space

Transcript

[TEASER]

[MUSIC PLAYS UNDER DIALOGUE]

RANVEER CHANDRA: If you’re a professional, one of the things I would say is try to go after your passion. If you give your work a bigger meaning than just making money, you’ll go beyond the 9-to-5 or 9-to-6 schedule. You’ll give it a lot more than just thinking about it as work.

[TEASER ENDS]

JOHANNES GEHRKE: Microsoft Research works at the cutting edge. But how much do we know about the people behind the science and technology that we create? This is What’s Your Story, and I’m Johannes Gehrke, Lab Director of Microsoft Research Redmond. I’m excited by the people I work with, and I’m curious about how they became the talented and passionate people they are today. So I sat down with some of them. Now, I’m sharing their stories with you. In this podcast series, you’ll hear from them about how they grew up, the critical choices that shaped their lives, and their advice to others looking to carve a similar path.

[MUSIC FADES]

In this episode, I’m talking with Ranveer Chandra. Ranveer is the Managing Director of Research for Industry and head of Networking Research in Redmond, and he’s been with the company for almost 20 years. His work in systems and networking is helping to bring more internet connectivity to more people and is yielding tools designed to help farmers increase food production more affordably and sustainably.

Here’s my conversation with Ranveer, beginning with his childhood in India and his experience applying to and studying at one of the prestigious Indian Institutes of Technology.

RANVEER CHANDRA: So I grew up in India. I grew up in a city called Jamshedpur in India. It’s a steel city. It’s the only city in India without a mayor, so the Tatas run …

JOHANNES GEHRKE: What does steel city mean?

CHANDRA: It’s the first steel plant in India …

GEHRKE: OK, uh-huh …

CHANDRA: … and a lot of the steel comes from there. The Tatas, which are the big industrialists in India, they run the city. So it’s … I grew up with 24-hour water, 24-hour electricity, trees on both sides. It looks like a mini Seattle or a mini Palo Alto in India. It’s a beautiful city. And I did my schooling there in, uh, in one of the schools in that city called Jamshedpur. I did my undergrad at one of the IITs in India, and then I came to the US, uh, to do my PhD at Cornell. So my childhood, we are three brothers and a sister. All three brothers went to … all four of us studied engineering. Uh, the three brothers all went to IITs, different IITs, and we studied hard, played hard. We did spend a lot of time in villages, though. Every summer and winter vacation, we would go to my grandparents’ place, which was in another state in India called Bihar, which is one of the poorest states. But my grandparents, they were farmers, and they had a lot of farmlands in those villages, so I did spend summer and winter vacations in those villages.

GEHRKE: And how did you end up studying engineering? How did you decide on that?

CHANDRA: Yeah. So in India, as it happens, the entrance exams for these IITs are very competitive. So in … around … during our time, like close to half a million people gave the test, students gave the test, and the top 2,000 got into IIT. Those are tests of physics, chemistry, and math. Those are the only three subjects, and among those, the top few would try … typically pick computer science. So it was more I enjoyed math. That was my … that was what I really enjoyed. And then, uh, because I got selected into the IITs, that was of course a kind of a dream for many people, to go study there. The education level there is really high, really good. And that is how I ended up in IIT. It was kind of unplanned. It wasn’t, you know, when I got my … when I got through IIT … I wanted to go to another IIT because the one I went to, Kharagpur, it was close to home, but I wanted to go to Bombay because it’s a big city. Mumbai. It’s a big city. Bollywood runs out of Bombay. I thought I could get into Bollywood. Not, not really. [LAUGHS] But I did go to Kharagpur, which is closest to … this is the one where it has … it’s the oldest IIT, and it was very close to home. So I ended up going there and studying computer science.

GEHRKE: And why computer science?

CHANDRA: Yeah, so, uh, computer science because it was … it had a lot of math, so once I got … the way I got exposed to computers was I was … in high school, I studied the theory before I got to touch a computer. There was one computer in school.

GEHRKE: One computer in the whole …

CHANDRA: One computer, and everyone had to go there and see what a computer is. But we did get … we had books to teach you everything about what binary is, how computers were invented. That was around the time I enjoyed reading about computers …

GEHRKE: So you did like algorithms on like sheets of paper?

CHANDRA: So, yeah, so you draw the flow charts …

GEHRKE: Right.

CHANDRA: I enjoyed some of the flow charts. I remember some of the flow charts like how do you find the greatest common factor and things like that. I enjoyed doing those algorithms, and there was that similarity with math. You need to have a good math background to, to enjoy those things in computers. So I did a lot of programming on pen and paper, and someone would correct it. And then we got to start learning … BASIC was the first language that I learned. I really liked coding.

GEHRKE: What kind of computer was it, actually?

CHANDRA: So this was, um, you know, these, these dumb computers with one mainframe behind, so this was one of the Sun computers back then.

GEHRKE: Oh, wow. OK, uh-huh.

CHANDRA: And we had just these dumb terminals through which you would get access to these … [LAUGHS]

GEHRKE: And that was BASIC, not, not Pascal or anything? That’s interesting …

CHANDRA: It started with BASIC. Yeah, BASIC, and then FORTRAN was the next one, then C. So those were the languages that I learned. And computers because I just enjoyed … I would have picked either math or computers. Those were my two things, and computers was just fun. It was more … and that was just the time, you know, when you would reserve some time to play a computer game, Pac-Man and things like that, so those things were fun back then. This was, uh, late ’80s, early ’90s.

GEHRKE: And then what I’ve heard is that to get into the IITs is super competitive, so did you then study a lot or you played a lot, or what was the … ?

CHANDRA: That’s a funny story. So you know, when I went into IIT, I was … the interesting thing is once you go there, everyone who comes there is from all over India; these are the people who are top of their class. So everyone else is as good as you. So you, you then end up studying very hard because that’s the culture. Everyone is coming in there. And, uh, in the first semester at IIT, I was No. 1 in the IIT, all across IIT, and that was like, “Whoa, that was, that was easy.” I didn’t really put in a lot of effort. My elder brother was there, too. He was in the last year, and he was, he was more of the fun kind. I was more of the, you know, the studious kind. He came and told me don’t do Thing A, Thing B; don’t get into alcohol or, or party and all of that stuff. I ended up doing all of that. Don’t run for elections. I ran for elections and all that stuff.

GEHRKE: Oh really. Where did you run for elections? The student parliament?

CHANDRA: Within the institute, so I was the secretary of sports—volleyball and all of that stuff. So I did … a lot of fun, as well. So in the end I was like No. 3 graduating. But I did have a lot of fun, too. I did a lot of social, cultural things. I was on the volleyball team and things like that at IIT.

GEHRKE: And coming once more back, I mean so to get into the IIT … I mean, for, for me as a German, this is so, you know, unusual because we don’t have these centralized entrance exams, except for medicine.

CHANDRA: Yeah.

GEHRKE: But, um, I heard the test is really, really hard. And you actually in your last year of high school, you don’t really study for high school anymore. You just study for that test. How, how, how is that actually?

CHANDRA: Yeah. And now it has become even more competitive. During our time, it was … there were fewer seats. There were like 2,000 people from all across … there were five IITs, six IITs, back then. And yeah, studying towards the end … so you start studying just physics, chemistry, math. Back during our time, we didn’t have as much tuitions and stuff. I didn’t have many … anyone … like the last six months, I’ve had something. But now, people go to these, these other towns which are meant for coaching people for IITs, and they have these different sections …

GEHRKE: They live away from their parents?

CHANDRA: They go away from their parents; they live in a hostel. And all they’re preparing for is the IITs. We don’t have that … I didn’t have that during our time. But now it has become so much even more competitive. More students take it, and it’s like a centralized exam for, for studying. But it does … you know, in the end, the experience was worth it. If you ask me, “Hey, was all this studying worth it?” I think getting into IITs, of course, the professors are good, but the students are exceptional, the kind of people you’re interacting with, that ambience. And now when I look at my classmates, everyone’s doing well. And you find people doing different things. Not, not everyone is in, is in tech. They go do different things, and they excel in that field because of the kind of people that they select into these IITs. So I think in the end, it was stressful, but it was worth it.

GEHRKE: It was a great opportunity. Yeah, I mean, and, and then you made sort of the decision, not only after the IITs, to, to stay in India and to take probably a very good job …

CHANDRA: Yeah.

GEHRKE: … but to come to the US and, you know, learn even more. So what, what drove you to that decision?

CHANDRA: Yeah, that was kind of like the way I studied computer science. It was not … at least I had a passion for computer science. I didn’t want to do a PhD, by the way, when I was coming here. So you would ask why. So when, when I was graduating, I got the highest-paying job that year among all the undergrads. And that was a big deal. That was back then, Synopsys, one of the EDA companies, the CAD companies, right, the VLSI companies. So I would have taken that, but as it happens, usually the people who are at the top of the class, they would apply outside and they would come here to study. And that was the reason I had applied. But then the, the thing I really wanted to do in my career was to be in business. I wasn’t really looking to be an academic back then. I was like, you know, I’ll go study an MBA.

GEHRKE: And you studied to, to get a PhD instead?

CHANDRA: Yeah. No, so I was like, you know, I’ll apply to PhD programs, and they give a master’s anyways, and after that I’ll go do an MBA. I wanted to be the business guy. So that was the reason I applied, and … but the reason … the, the person who had convinced me to come here was a professor at Cornell. I had other top schools, but there was a professor at Cornell who was a networking guru at that time. I won’t name him. He’s still a very good friend of mine. So he convinced me to come there. I was a fan of his work, um, and I decided to come to Cornell for him. I said no to other schools. And then I land here; this was 1999. I send a message as soon as I get to Ithaca saying, “Hey, I’m here. I would love to meet you,” and he says, “Well, you know, I’m really sorry, but I left Cornell to do a startup.” And then I was a bit … I was very upset. For a few months, I didn’t know … what am I doing here. I gave everything up. I had other colleges where I could have gone. But to be here, I came to study computer science and the person I came here for is no longer here. It was disappointing. But then I was lucky that Professor Ken Birman adopted me. He was like, “Hey, you have a fellowship. You do what you want. I’m not going to interfere. You just do what you want.” And that’s what convinced me to do a PhD … that in the sense, the first few months were disappointing, but then once I got the freedom, I really was like I was getting paid some money for just learning. And that bit really got me very excited. The fact that I had all the independence to pick what I want, to work on the things that I want. And that’s what convinced me that I don’t want to do an MBA. I want to … I can do what I would do with an MBA after doing a PhD. So that’s what got me to do a PhD at Cornell.

GEHRKE: It’s, it’s super interesting because, I mean, if you hear that story, for many people, it would be kind of frightening, right. You come there … well, you have this person who you wanted to work with and maybe he … there was sort of a plan set up or so, and now, I mean you have to switch advisors. OK, that’s one thing. But the second thing is a PhD sounds so frightening to many people because it’s like a step into the unknown, right. So your PhD by definition, you don’t know whether you’re going to get there, right, because it’s research, and research sometimes leads you into the wrong path and sometimes you don’t get the result that you want. So how, how do you deal personally with that uncertainty?

CHANDRA: For me, it is more I like the unstructured part of it. I like the fact that I could take it in many directions and grow it, and I want that, that level of flexibility, and the more I realized … I think problem picking becomes important, and Ken helped me a bit with it. So initially, I told him I want to do wireless. This was back in 1999. It’s six months into a course, and I’m like I want to write a paper. This is what we want to do, on reliable multicast but for wireless systems. At that time, wireless was very new; people didn’t have cellphones, uh, and such. So … and he said go for it; it was worth it. And then I started exploring it with another grad student. We wrote a paper on it. And that was a good learning experience, which I really enjoyed. The fact that I’m venturing into the unknown, and Ken was explicit. He was like I’m not the expert in wireless; you have to learn it yourself. We did it ourselves. We wrote the paper. It got accepted. And all that was … really helped me … it gave me the confidence that it is possible to explore and do new things. And that’s what got me excited. And that’s what got me into the space of networking, as well. It’s all about wireless, and, um, and getting people more connected at low cost. How do you get everyone connected to the internet? And that’s the space. I think there is a passion within me around that, as well. And the fact that during my PhD, I got the opportunity to go explore, just try everything, and we just made … kept making the right bets, as well, with respect to papers and what got accepted. I did, I did an internship at Microsoft Research, as well, during my PhD. This was three years into my PhD. I came here a few times and that helped me, as well. That helped me further. I worked with Victor Bahl, who was my intern manager, but he was my mentor, and that helped me further go towards my career goals.

GEHRKE: And Victor is now a technical fellow in Azure, where he’s the CTO of our Azure for Operators efforts.

CHANDRA: That’s right. Yeah.

GEHRKE: And maybe, maybe one thought about networking, right. So networking seems to me like this field which is pretty hard because without the hardware, networking doesn’t work. But without the right kind of network protocols and the software, it doesn’t work. So you don’t only … you can’t only do one thing, right. You cannot do only just …

CHANDRA: The one layer …

GEHRKE: … the software … and you also have to do the hardware, and they have to sort of co-evolve. How, how does this work in networking research? Explain that a little bit to our audience. How does networking research actually make progress if both of these have to sort of work in lockstep?

CHANDRA: That’s a great point, Johannes. And that is one of the things with networking. Right when I was an undergrad, I started getting excited by this layered diagram of networking—the seven-layer diagram, the seven-layer OSI stack that we …

GEHRKE: Oh, yeah, I never understood that completely. [LAUGHTER]

CHANDRA: Yeah, it’s all the way from the physical layer, so if you think of the physical layer as one hop …

GEHRKE: Yeah.

CHANDRA: MAC layer … so networking is all about how do you send bits across two computers anywhere in the world. And at the lowest layer, it’s about how do you send the bits across. The layer above it makes it reliable over one hop. That’s the medium access layer. The layer above that ensures that you can communicate not just over one hop but anywhere on the internet using IP. The level above that, with TCP, you make sure that end-to-end communication is more reliable. So every step, every layer that you go above …

GEHRKE: Got it …

CHANDRA: … helps to make sure that your network is better. Now of course once you start layering things, it makes it harder to interoperate. It makes things inefficient because you’re adding headers per layer, which consumes bandwidth and introduces extra latency. But that’s an opportunity. At the very least, what this layer diagram has done is that it has ensured innovation across different layers as long as there are good enough APIs for each layer to communicate with the next layer, so that is the key part of networking research, where over the years it has kept evolving. Every layer has changed. The hardware, we’ve seen Ethernet go from bits per second to kilobits to megabits, gigabits. Now it’s hundreds of gigabits. We’ll soon hit terabits, as well, which people are talking about with 6G—to every layer. When we think of the MAC layer, the TCP layer, all of those have been evolving and that has led to applications. A lot of times, a lot of people just worry about the applications: is my media application … can I watch things on Netflix? Well, underlying that is all the bandwidth that the network provides.

GEHRKE: Got it. So, so one way to think about this is that as long as I make my hardware have the same APIs, I can even go … I can sort of significantly evolve my hardware and all the other parts of the network stack will work?

CHANDRA: Exactly, exactly.

GEHRKE: I see.

CHANDRA: So you could be innovating on the radio—you make that radio faster—but as long as you keep the APIs the same, the TCP layer would work as is with the layers on the bottom.
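The layering idea in this exchange (each layer adds its own header, and a layer can be swapped out as long as its interface stays the same) can be illustrated with a toy Python model. This is a sketch under stated assumptions, not how any real network stack is implemented: header contents are zero-filled placeholders, the sizes are the minimum TCP, IPv4, and Ethernet II headers without options, Ethernet preamble, minimum-frame padding, and interframe gap are ignored, and `hypothetical_radio_layer` is an invented stand-in for a new link technology rather than a real protocol.

```python
from typing import Callable, List

# Each layer receives the bytes handed down from the layer above and returns
# them wrapped in its own framing. Header contents are zero-filled placeholders;
# only the sizes (minimum headers, no options) are realistic.
Layer = Callable[[bytes], bytes]

TCP_HEADER_LEN = 20   # TCP header without options
IPV4_HEADER_LEN = 20  # IPv4 header without options
ETH_HEADER_LEN = 14   # Ethernet II: destination MAC, source MAC, EtherType
ETH_FCS_LEN = 4       # Ethernet frame check sequence (trailer)


def tcp_layer(data: bytes) -> bytes:
    return bytes(TCP_HEADER_LEN) + data


def ipv4_layer(data: bytes) -> bytes:
    return bytes(IPV4_HEADER_LEN) + data


def ethernet_layer(data: bytes) -> bytes:
    return bytes(ETH_HEADER_LEN) + data + bytes(ETH_FCS_LEN)


def hypothetical_radio_layer(data: bytes) -> bytes:
    # Hypothetical link layer with an invented 8-byte header. Because it has the
    # same bytes-in, bytes-out interface, the layers above need no changes.
    return bytes(8) + data


def send(payload: bytes, stack: List[Layer]) -> bytes:
    """Pass the payload down the stack, top layer first."""
    for layer in stack:
        payload = layer(payload)
    return payload


def overhead_fraction(payload_len: int, stack: List[Layer]) -> float:
    """Share of the bytes on the wire that are headers rather than payload."""
    wire_len = len(send(bytes(payload_len), stack))
    return (wire_len - payload_len) / wire_len


STACK = [tcp_layer, ipv4_layer, ethernet_layer]
RADIO_STACK = [tcp_layer, ipv4_layer, hypothetical_radio_layer]

if __name__ == "__main__":
    print(len(send(b"x" * 1460, STACK)), overhead_fraction(1460, STACK))
    print(len(send(b"x" * 1460, RADIO_STACK)))
```

With a 1,460-byte payload, the TCP, IPv4, and Ethernet layers together add 58 bytes, so about 4 percent of the frame is header. Replacing only the bottom layer changes the size on the wire but leaves `tcp_layer` and `ipv4_layer` untouched, which is the point Chandra makes about keeping the APIs the same.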

GEHRKE: Got it. So, so I hear this magic word “6G” from you a lot these days. Can you just explain a little bit. What is 6G all about, and why is it interesting?

CHANDRA: Yeah, so every … over the network, we’ve seen these standards evolve over time. Every 10 years, we move to a new generation, from 2G to 3G to 4G. Now 5G.

GEHRKE: Why 10 years? Why 10 years?

CHANDRA: Ten years is usually the time it takes to come up with a new innovation, drive the standard, drive alignment across different stakeholders to see this is what the next standard should be. Because then once we finalize the standard is when you’ll have all the other vendors like people who build the hardware to base stations, to cellphones, to modems, everyone can then align and build something that is … you know, you have your Qualcomm modem talking with say an Ericsson hardware with the AT&T carrier, which is running on Azure cloud.

GEHRKE: Because everything has to interoperate. That’s why we have the standards.

CHANDRA: Yeah, so that’s why we have the standards, which evolve in a 10-year time frame. With 6G, we are looking at 10x more bandwidth. Your throughputs will go much higher. And one-tenth the latency. Can you get to sub-millisecond latency? And the kind of scenarios that we are thinking of are … think of, uh, we can think of completing the feedback loop like robotics and so on, where you’re getting the information, you need to send all this to the cloud because these are huge amounts of information, you need to act on it using AI, and you need to send the feedback so that your robot can perform in time. This could be something in a racetrack, something in, uh, in the middle of … on the roads, or it could be in the middle of a farm. So this is what the vision is. And along with that, the other vision that we have with 6G is to bring internet all over the world. Right now, still around 40 percent of the population in the world—that’s close to 3 billion people in the world—doesn’t have internet access. They just cannot … they, they just don’t have access to the internet.

GEHRKE: And why does 6G help with that?

CHANDRA: 6G should make connectivity more affordable.

GEHRKE: So 6G is also cheaper, even though it’s faster and lower latency?

CHANDRA: It will be.

GEHRKE: That seems so contradictory. Why is that the case?

CHANDRA: No, so I think it will be high speed and low latency in areas where it is needed, but the other feature it should bring in is affordable connectivity in regions that are not connected. And these are in … a lot of it is in the emerging markets, where the people are not connected. And it is not just people. Now we’re also talking about people and things because, you know, if you think of the entire world’s surface, close to 80 percent of the world’s surface, which includes ocean and land, doesn’t have terrestrial internet. So how do you bring internet connectivity throughout the world? That’s one of the challenges that people are looking at with 6G, along with some of the other things around sustainability, security, trust. These are all issues, as well, but at an underlying layer, the fundamental thing we want is high speed, lower latency, and connectivity—affordable connectivity—everywhere. We can’t be leaving 3 billion people in the world behind with no internet when it is so central to the way we are. It defines everything we do, and yet there are so many people in the world who don’t have internet access.

GEHRKE: You mentioned one word, one word “farm,” and we’ll get to that in a second. I just wanted to ask one more question because it just sounds a little bit like magic to me that, you know, you get lower latency, higher bandwidth, and lower cost.

CHANDRA: Yeah.

GEHRKE: Why don’t I get this with 5G if I just push the hardware along?

CHANDRA: So this is where the research would come in, and I think it won’t be the … so when you think of a standard, we think of different components of the standard. One part of the standard is the spectrum. Which part of the spectrum do you operate on?

GEHRKE: Right.

CHANDRA: That could define the throughputs that you get. Now the high speed usually comes with a limited range, as well. Like, you know, like one of the technologies that people are talking about—we are investigating here at Microsoft Research, as well—is terahertz networks. This is a part of the spectrum where you get huge amounts of bandwidth. It’s still following Shannon’s law, but it is just in that part of the spectrum that until now, people said couldn’t be used for communication. But what we are showing is that well, you could, you could use it for communication in that part of the spectrum. Once you get that bandwidth, it also helps us reduce latency by a significant amount. So that’s one thing people are looking at. Along with that, another technology people are looking at to overcome this problem of short range, like 100 meters, to go beyond that, is smart surfaces. So one of the things we’re building is rather than just have these base stations, what do we … if we have smart surfaces, which are programmable and can then make sure that wherever people are, wherever things are, you can provide connectivity there by, by channeling the signals in that particular region. Along with that, people are also looking at affordability. People are looking at different forms, other forms of communication. Like previously, we’ve looked at other parts of the spectrum like lower … terahertz is going further closer to light, that part of the spectrum. The other part of the spectrum is lower in the TV spectrum, for example, a radio spectrum. Once you go lower in the spectrum, your connectivity can just, just go really long distances. So one of the innovations that we had done a lot at Microsoft Research was on using TV spectrum to send and receive information. The benefit is this spectrum is not being used in many places, and using that, you can provide very low-cost point-to-multipoint connectivity in different regions.
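Chandra's reference to Shannon's law is why wide terahertz channels matter: the Shannon-Hartley limit is C = B log2(1 + S/N), so at a given signal-to-noise ratio, capacity grows in direct proportion to bandwidth B. The short Python sketch below evaluates that formula. The channel widths and the 10 dB SNR are illustrative assumptions, not figures from the interview, and holding SNR fixed across widths is optimistic, since noise power grows with bandwidth and terahertz signals attenuate quickly over distance.

```python
import math


def shannon_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley limit C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)


def db_to_linear(snr_db: float) -> float:
    """Convert a signal-to-noise ratio from decibels to a plain power ratio."""
    return 10 ** (snr_db / 10)


# Illustrative values only; they are assumptions, not numbers from the interview.
SNR = db_to_linear(10.0)                   # 10 dB corresponds to a power ratio of 10
narrow = shannon_capacity_bps(100e6, SNR)  # hypothetical 100 MHz channel
wide = shannon_capacity_bps(10e9, SNR)     # hypothetical 10 GHz channel

if __name__ == "__main__":
    print(f"100 MHz channel: {narrow / 1e6:.0f} Mbit/s upper bound")
    print(f"10 GHz channel:  {wide / 1e9:.1f} Gbit/s upper bound")
```

Because the SNR is the same in both cases, the 100-fold wider channel has a 100-fold higher theoretical ceiling; real systems land below this bound.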

GEHRKE: Makes sense. And so 6G encompasses all of those?

CHANDRA: 6G would encompass … right now, it’s still being defined. It’s still early. But as far as research goes, we are working with the community on all these aspects. The other aspect about 6G is AI-driven networks. So can you make your networks much more intelligent? Right now, you define these networks in standards, and the standard’s written, and that’s what is implemented. But you could adapt parts of it based on what’s happening around you, and you can use the spectrum better. You can use it to make sure that you’re getting much more efficiencies in your, in your system. You can prioritize things better. So that’s again one of the other themes that, uh, that we’re investing in and a few of the other, other research labs are investing in, as well.

GEHRKE: Super interesting. And I mean you mentioned this word “farm” before. You’re, of course, known for FarmVibes. And maybe just explain very briefly what FarmVibes is and then also explain … you know, you started out here doing this in Microsoft Research, but then you actually went to a product group, right. What made you, you know, what made you take that decision? And, you know, now you’re finally back here in Microsoft Research. So maybe tell us a little bit about that, that journey.

CHANDRA: Yeah. So I’ll start with why did I even pick agriculture, right. So as I said I did spend a lot of time in my grandparents’ farms in Bihar. This was in north India.

GEHRKE: What were they farming?

CHANDRA: So they used to farm wheat, sugar cane, rice, and, uh, they had farms there. And back then, I did not like anything to do with agriculture. So I used to go there with my brothers and sister and, you know, I, I did do … like I played kabaddi there. I learned how to ride a bicycle with the people there.

GEHRKE: What, what is kabaddi?

CHANDRA: Kabaddi is like, uh, it’s a funner form of rugby. Not, not … it’s, uh, it’s … there are two teams, and you essentially have to bring the other team down, so it’s … you play it in the sand; you get really dirty playing it. Growing up in those villages, it was fun. But spending time in those villages, I didn’t really look forward to them. The reason was that, you know, the rest of the year, you were in this city, which is maintained by the Tatas, which has water, electricity, clean roads, everything, and then, the rest of the three, four months, I was in this village, which did not have electricity. They didn’t have toilets. If you have to go to the bathroom, you have to go out in the fields in the middle of the night in the winter. It wasn’t what you’d look out … look forward to. But that’s how … that’s one of the things that I grew up with. But one of the things that really stuck with me was the poverty that exists in these villages. Like one of the times, my mom, she did some prayers, she had an offering, and she left it outside. And there were a group of kids, they hadn’t had anything to eat during the day. They were just there to grab something to eat. And that has been something that, um, has really … it’s been in my, my mind during my undergrad, and even over here, one of the, one of the things I always want with any project I’m working on is this bigger mission, things that can impact the people I grew up with, and be it with TV white spaces for providing internet connectivity to what I saw was very primitive forms of agriculture. These, these farmers, they would do hand-based seeding. Over here, you use tractors. They would go with the hand and put the seeds. They would do … use bullock-driven tractors. Like to till, they would just go with a bullock. They’d put this hitch on it and then go till the fields, and this, it’s very primitive. 
So what we, what we want to do is to enable data-driven agriculture, and the bigger goal here is to help address the world’s food problem. The world needs 50 percent more food compared to today’s levels. And in order to grow that 50 percent more food, we need to get there … not just food. We need to grow good food, nutritious food, and we need to get there without harming the planet. The soils are not getting any richer. The water levels are receding. So that’s the big picture of what we want to do with FarmBeats. Our goal is … one of the most promising approaches to get there is what we call data-driven agriculture. That is, can you, can farmers, use data and AI to remove guesswork as part of their farming operations? You know, the farmers that we work with, um, like even when I was doing networking here, I would actually go and volunteer in farms here, and I’d cold call various farmers. What I realized is that these farmers …

GEHRKE: Like here in, in Washington state, right?

CHANDRA: In Washington state. In fact, there’s a Starbucks right here. There was a barista who knew me. She said that, “Hey, I’m going to Spokane, Eastern Washington, this weekend.” If she’s in Eastern Washington, maybe farms. Who farms there? She says, “My grandfather.” I was like, can you connect me to him? So I would just cold call them. I talk to a lot of farmers, and what I realize is that these farmers, they know a lot about their farm. They’ve been farming there for a long time, yet a lot of decisions they make is based on guesswork.

GEHRKE: Right.

CHANDRA: That is where all the data-driven farming piece came in. So through FarmVibes and previously FarmBeats … with FarmBeats, we built, we built a data, data platform for agriculture. Then I had moved over to the product side; we shipped it as a product. We announced partnerships with Land O’ Lakes. We announced partnerships with … well, Land O’ Lakes, their agriculture platform is now running on that. We announced partnerships with Bayer Corporation, with USDA, and other organizations, as well. Then while I was there, what I realized is that the engineering team is now on track. They are delivering this product. But that’s not enough to help us address the world’s food problem. We need to add intelligence on top of what we are building. We need to bring all the innovations that we’re doing in AI for this field. And that was one of the reasons and of course working with you and the fact that the networking … networking is one of the key components that can help us—networking and AI—so that was one of the other reasons why I came back. And with FarmVibes, that’s the problem that we are addressing. With FarmVibes, it’s the, it’s the farm intelligence piece; it’s the intelligence that sits … that can light up scenarios, these scenarios that we talk of when we think of data-driven agriculture, sustainable agriculture. The kind of things we want to do is help a farmer make the right decisions for what will make them more productive, what will make them reduce their emissions, what will help them sequester more carbon. These are the kind of questions we want to help a farmer, a farmer answer. And some of that is very fundamental research. We’ve come up with ways to see through the clouds, to do very hyperlocal microclimate prediction, to combine different models to make much more accurate predictions to help farmers. And that’s, that’s the kind of thing we are enabling as part of FarmVibes.

GEHRKE: Well, and so just curious, I mean here in Washington state, what is, what is grown on those farms, and how, how have you helped so far?

CHANDRA: Yeah, so there are farms here … there’s one farm in Eastern Washington. We work with this farmer Andrew Nelson, who is a fifth-generation wheat farmer. This is an hour east of Spokane, so if you go to Spokane, you have to drive another hour. It’s interesting. When you go to Andrew’s farm, like we are about 15, 20 minutes from his farm and you lose internet … cell connectivity.

GEHRKE: It’s completely gone.

CHANDRA: [LAUGHS] So you’re like off the grid. And then you reach his house, and then he set up this TV white spaces thing. He has some connectivity in his farm.

GEHRKE: Through satellite or TV white spaces?

CHANDRA: TV white spaces and a fiber to his home. So there’s a fiber that he’s paid to bring fiber to his home and then that lights up the area around his farm using some of the technologies that we have been inventing here. And with Andrew, this is just one use case, but you could replicate it across other farms, as well. He uses some of the techniques throughout his farming life cycle, all the way from planning what to farm to planting—what to plant, where to plant—to in production, like, for example, doing chemical application. Where do I apply herbicides? Do I need to spray pesticides? Where do I spray it? Rather than spraying it throughout. To harvest. That is, when should I harvest? How should I … what route should I take? To post-harvest. Monitoring things and deciding when and where to sell certain things to get more profit. So he uses it throughout his, um, his farming life cycle, and he’s seen a lot of benefit. Like Andrew’s talked about how in one part of the farm, he could double his yield.

GEHRKE: Double? I was just going to ask actually how much benefit he got from it.

CHANDRA: Yeah, double the yield. And in another part of the farm, he’s talked about 40 percent reduction in chemical costs. You know, for a farmer, one of the input costs is chemicals. And using these precision techniques that we built, Andrew’s been able to save 40 percent. That’s, that’s huge.

GEHRKE: It’s also good for the environment.

CHANDRA: It’s good for the environment, as well. He’s not putting in more chemicals than are needed. So these are real use cases with farmers in our, in Washington state. We’re also working, for example, we announced a partnership in India, in Maharashtra. This is one of the Centres of Excellence that’s being put up for FarmVibes. So this is again, they are building AI capable … this is across Oxford University; there’s an organization in India called Agriculture Development Trusts; and Microsoft. And working with, of course, Microsoft India, our sales team there. They’ve set up this Centre of Excellence in a village called Baramati in India, where they are going to be taking the same techniques we built and adapting it for smallholder farmers in, in, in that region. So really excited about the value it brings. Yeah.

GEHRKE: Ranveer, it seems like, you know, here at Microsoft, you had the amazing opportunity to really have huge impact. You know, you start on research, then deliver a product, now even extending the product to more use cases. Do you have any career advice for our listeners given where you are and where you’re going?

CHANDRA: Yeah. And as a student, if there are students listening to this, I would say consider going after a PhD. It gives you that exposure, uh, the opportunity to learn, to dig deep, to know a lot about, about a field. If you’re a professional, one of the things I would say is try to go after your passion. If you give your work a bigger meaning than just making money, you’ll go beyond the 9-to-5 or 9-to-6 schedule to make that happen. Like, you know, Johannes, one funny incident is over here, working at Microsoft, most people sit in front of the computer. When I had started working on FarmBeats, every day, I would be driving to this farm. There was another farm about 40 minutes’ drive from here. Every day, summer or winter, I would be driving there to do the experiments. And I would go there, and a few days, especially when it rains, it gets really gloomy [LAUGHS] and you have to go in boots to a farm that is muddy, half flooded. I’d be like, “Why am I doing this? I could be sitting there …” And then the, the way I would argue to myself is, you know, even if 1 percent of what I’m doing works, it will help the lives of so many farmers worldwide. And then that just gave me the extra energy to go even more, to, to just give it everything that I have to make that difference. So that’s something which I tell students, as well. Give whatever work you do—you’re working in AI, you’re working in systems, you’re working in, in building the next plane or building the next ship—give your work a bigger meaning. You will, you’ll enjoy it. It just, um, you’ll give it a lot more than, than just thinking about it as work. And, you’re right, that at Microsoft, we get that opportunity to make that wholesome impact, that is as you did, as well. We get to go to the products. If something ships as part of a Microsoft product, it touches the lives of so many people. Like one of the projects I was with was Xbox, for the Xbox, when I designed that Xbox wireless controller protocol. Now over 100 million people use it, and one of the most common congratulatory messages I get is still around the Xbox. When I’m giving a talk, someone will come and say, “My son said thanks to you because you helped make the Xbox successful.” So that’s one of the opportunities we get. But not just that. We get the opportunity to come do research and think bigger about the problem; take it to a different level and then influence the next generation of product. So this is, uh, thank you. I think this is an awesome place to work, to realize that mission, that, that vision of what we want to achieve in our lives.

GEHRKE: Yeah, I think, I mean it speaks so much to me because that’s something that I was also really excited about. Making the transition from university here to Microsoft, as well.

[OUTRO MUSIC]

GEHRKE: Thanks again, Ranveer, for the great conversation.

CHANDRA: Yeah, thank you, Johannes.

The post What’s Your Story: Ranveer Chandra appeared first on Microsoft Research.

Understanding the user: How the Enterprise System Usability Scale aligns with user reality

October 18, 2023

by Brenda Potts Microsoft AI

This position paper was presented at the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2023), a premier venue for research on the design and use of technologies that affect groups, organizations, and communities.


In the business world, measuring success is as critical as selecting the right goals, and metrics act as a guiding compass, shaping organizational objectives. They are instrumental as businesses strategize to develop products that are likely to succeed in specific markets or among certain user groups.

However, businesses often overlook whether these metrics accurately reflect users’ experiences and behaviors. Do they truly reflect the consumers’ journey and provide a reliable evaluation of the products’ place in the market? Put differently, do these metrics truly capture a product’s effectiveness and value, or are they superficial, overlooking deeper insights that could lead a business toward lasting success?

Challenges in enterprise usability metrics research

In our paper, “A Call to Revisit Classic Measurements for UX Evaluation,” presented at the UX Outcomes Workshop at CSCW 2023, we explore these questions about usability metrics (which evaluate the simplicity and effectiveness of a product, service, or system for its users) and their applicability to enterprise products. These metrics are vital when measuring a product’s health in the market and predicting adoption rates, user engagement, and, by extension, revenue generation. Yet current usability metrics in the enterprise space often fail to reflect what users actually experience with technical enterprise products such as business analytics, data engineering, and data science software. Oftentimes, they also lack methodological rigor, calling into question their generalizability and validity.

One example is the System Usability Scale (SUS), the most widely used usability metric. In the context of enterprise products, at least two questions used in SUS do not resonate with users’ actual experiences: “I think I would like to use the system frequently” and “I think I need the support of a technical person to be able to use this product.” Because users of enterprise products are consumers, not necessarily customers, they often do not get to choose which product to use. In some cases, they are IT professionals with no one to turn to for technical assistance. This misalignment highlights the need to refine how we measure usability for enterprise products.
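For context on how much weight those two questions carry, SUS is conventionally scored with a fixed rule: each of its ten items is answered on a 1-5 scale, positively worded odd-numbered items contribute (response - 1), negatively worded even-numbered items contribute (5 - response), and the raw sum is multiplied by 2.5 to give a 0-100 score. The minimal Python sketch below implements that standard rule. It assumes responses are supplied in the published SUS item order, in which item 1 is the "use frequently" question and item 4 is the "technical person" question quoted above; the function name `sus_score` is our own.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one SUS questionnaire: ten responses on a 1-5 scale, in item order."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects exactly ten responses on a 1-5 scale")
    raw = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded (item 1: "use frequently");
        # even items are negatively worded (item 4: "need a technical person").
        raw += (r - 1) if item % 2 == 1 else (5 - r)
    return raw * 2.5  # rescale the 0-40 raw sum to 0-100


# Most favorable answer on every item.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
# Same respondent, but neutral (3) on items 1 and 4.
print(sus_score([3, 1, 5, 3, 5, 1, 5, 1, 5, 1]))  # 90.0
```

Because each item contributes between 0 and 4 raw points (0 to 10 points after scaling), two items that do not fit the enterprise context can shift a respondent's score by as much as 20 points.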

Another concern is the lack of rigorous validation for metrics that reflect a product’s performance. For instance, UMUX-Lite is popular for its simplicity and its strong correlation with SUS. However, its scoring methodology requires researchers to apply an equation consisting of a regression weight and a constant to align average scores with SUS scores. This adjustment lacks a solid theoretical foundation, which raises questions about UMUX-Lite’s ability to generalize to different contexts and respondent samples.
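The regression step can be sketched as follows. UMUX-Lite has two items; the raw score rescales their sum to 0-100, and a linear adjustment (weight times raw score, plus a constant) maps it onto the SUS range. The defaults below, a 7-point response scale with weight 0.65 and constant 22.9, are the values commonly cited for UMUX-Lite; treat them as assumptions to verify against the original UMUX-Lite publication, since the concern raised above is precisely that they were fit on a particular sample. The function name `umux_lite` is illustrative.

```python
from typing import Tuple


def umux_lite(item1: int, item2: int, scale_points: int = 7,
              weight: float = 0.65, constant: float = 22.9) -> Tuple[float, float]:
    """Return (raw, regression-adjusted) UMUX-Lite scores for one respondent.

    weight and constant are assumed defaults: they come from regressing SUS
    scores on raw UMUX-Lite scores and may not transfer to other samples.
    """
    for r in (item1, item2):
        if not 1 <= r <= scale_points:
            raise ValueError(f"responses must be between 1 and {scale_points}")
    max_sum = 2 * (scale_points - 1)           # highest possible sum after shifting to 0
    raw = (item1 + item2 - 2) * 100 / max_sum  # 0-100
    adjusted = weight * raw + constant         # aligned with SUS averages
    return raw, adjusted


print(umux_lite(7, 7))  # raw 100.0, adjusted about 87.9
print(umux_lite(1, 1))  # raw 0.0, adjusted 22.9
```

With these coefficients the adjusted score can never fall below 22.9 or rise above 87.9, which shows how strongly the fitted weight and constant reshape the scale.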

The lack of standardization underscores the need for metrics that are grounded in the user’s reality for the types of products being assessed and based on theoretical and empirical evidence, ensuring that they are generalizable to diverse contexts. This approach will pave the way for more reliable insights into product usability, fostering informed decisions crucial for enhancing the user experience and driving product success.

ESUS: A reality-driven approach to usability metrics

Recognizing this need, we endeavored to create a new usability metric that accurately reflects the experience of enterprise product users, built on solid theory and supported by empirical evidence. Our research combines qualitative and quantitative approaches to devise a tailored usability metric for enterprise products, named the Enterprise System Usability Scale (ESUS).

ESUS offers a number of benefits over the SUS and UMUX-Lite. It is more concise than the SUS, containing only half as many questions (five versus ten), which streamlines the evaluation process. It also eliminates the need for practitioners to use a sample-specific weight and constant, as required by UMUX-Lite, providing a more reliable measure of product usability. Moreover, ESUS demonstrates convergent validity, correlating with other usability metrics, such as SUS. Most importantly, it was designed with enterprise product users in mind, and its conciseness and specificity help it provide relevant and actionable insights.
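Convergent validity is commonly checked by correlating two instruments' scores from the same respondents. The post does not say which statistic was used, so the Python sketch below assumes a Pearson correlation; the paired scores in the example are hypothetical and serve only to show the computation.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired scores (e.g., ESUS vs. SUS per respondent)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / (sx * sy)


# Hypothetical paired scores for five respondents, both on 0-100 scales.
esus = [62.5, 75.0, 81.25, 50.0, 93.75]
sus = [60.0, 72.5, 85.0, 47.5, 90.0]
print(round(pearson_r(esus, sus), 3))
```

Values close to 1 indicate that the two scales rank respondents similarly; a real validation study would typically also report the sample size and a confidence interval.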

In Table 1 below, we offer ESUS as a step towards more accurate, reliable, and user-focused metrics for enterprise products, which are instrumental in driving well-informed decisions in improving product usability and customer satisfaction.

ESUS Items	1	2	3	4	5
How useful is [this product] to you?	Not at all useful	Slightly useful	Somewhat useful	Mostly useful	Very useful
How easy or hard was [this product] to use for you?	Very hard	Hard	Neutral	Easy	Very easy
How confident were you when using [this product]?	Not at all confident	Slightly confident	Somewhat confident	Mostly confident	Very confident
How well do the functions work together or do not work together in [this product]?	Does not work together at all	Does not work well together	Neutral	Works well together	Works very well together
How easy or hard was it to get started with [this product]?	Very hard	Hard	Neutral	Easy	Very easy

Table 1: Proposed ESUS questionnaire
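For teams administering the questionnaire programmatically, Table 1 can be encoded as data. The Python sketch below stores each item with its five labeled response options, substitutes a product name, and checks that a response set is complete. The post does not describe how ESUS responses are combined into a single score, so `illustrative_score` is purely hypothetical: it assumes an unweighted mean of the five items, linearly rescaled to 0-100, which may differ from the authors' scoring. "Contoso Analytics" is a placeholder product name.

```python
from typing import List, Tuple

# Table 1 as data: (question, labels for responses 1..5).
ESUS_ITEMS: List[Tuple[str, List[str]]] = [
    ("How useful is [this product] to you?",
     ["Not at all useful", "Slightly useful", "Somewhat useful", "Mostly useful", "Very useful"]),
    ("How easy or hard was [this product] to use for you?",
     ["Very hard", "Hard", "Neutral", "Easy", "Very easy"]),
    ("How confident were you when using [this product]?",
     ["Not at all confident", "Slightly confident", "Somewhat confident",
      "Mostly confident", "Very confident"]),
    ("How well do the functions work together or do not work together in [this product]?",
     ["Does not work together at all", "Does not work well together", "Neutral",
      "Works well together", "Works very well together"]),
    ("How easy or hard was it to get started with [this product]?",
     ["Very hard", "Hard", "Neutral", "Easy", "Very easy"]),
]


def render(product: str) -> List[str]:
    """Substitute the product name into each question."""
    return [question.replace("[this product]", product) for question, _ in ESUS_ITEMS]


def validate(responses: List[int]) -> None:
    """Require exactly one response from 1 to 5 per item."""
    if len(responses) != len(ESUS_ITEMS) or any(r not in range(1, 6) for r in responses):
        raise ValueError("expected one response from 1 to 5 for each ESUS item")


def illustrative_score(responses: List[int]) -> float:
    """HYPOTHETICAL summary: mean item response rescaled from 1-5 to 0-100."""
    validate(responses)
    mean = sum(responses) / len(responses)
    return (mean - 1) / 4 * 100


answers = [4, 5, 4, 3, 5]
print(render("Contoso Analytics")[0])
print(ESUS_ITEMS[3][1][answers[3] - 1])  # label chosen for the fourth item
print(illustrative_score(answers))
```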

Looking ahead: Advancing precision in understanding the user

Moving forward, our focus is on rigorously testing and enhancing ESUS. We aim to examine its consistency over time and its effectiveness with small sample sizes. Our goal is to ensure our metrics are as robust and adaptable as the rapidly evolving enterprise product environment requires. We’re committed to continuous improvement, striving for metrics that are not just accurate but also relevant and reliable, offering actionable insights for an ever-improving user experience.

The post Understanding the user: How the Enterprise System Usability Scale aligns with user reality appeared first on Microsoft Research.

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

October 16, 2023

by Alyssa Hughes Microsoft AI


Introduction

How trustworthy are generative pre-trained transformer (GPT) models?

To answer this question, the University of Illinois Urbana-Champaign, together with Stanford University, the University of California, Berkeley, the Center for AI Safety, and Microsoft Research, released a comprehensive trustworthiness evaluation platform for large language models (LLMs), presented in the paper “DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.” The paper, which was accepted as an oral presentation at NeurIPS 2023 (Datasets and Benchmarks Track), focuses specifically on GPT-4 and GPT-3.5. It considers diverse perspectives, including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness.

Based on our evaluations, we found previously unpublished vulnerabilities relating to trustworthiness. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, which are maliciously designed to bypass the security measures of LLMs, potentially because GPT-4 follows (misleading) instructions more precisely.

Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps. Our benchmark is publicly available.

It’s important to note that the research team worked with Microsoft product groups to confirm that the potential vulnerabilities identified do not impact current customer-facing services. This is in part true because finished AI applications apply a range of mitigation approaches to address potential harms that may occur at the model level of the technology. In addition, we have shared our research with GPT’s developer, OpenAI, which has noted the potential vulnerabilities in the system cards for relevant models.

Our goal is to encourage others in the research community to utilize and build upon this work, potentially pre-empting nefarious actions by adversaries who would exploit vulnerabilities to cause harm. This trustworthiness assessment is only a starting point, and we hope to work together with others to build on its findings and create powerful and more trustworthy models going forward. To facilitate collaboration, we have made our benchmark code very extensible and easy to use: a single command is sufficient to run the complete evaluation on a new model.

Trustworthiness perspectives of language models

Recent breakthroughs in machine learning, especially LLMs, have enabled a wide range of applications, from chatbots to robotics. Yet although the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models even for sensitive applications such as healthcare and finance. To this end, we focus on a comprehensive trustworthiness evaluation of GPT models across eight trustworthiness perspectives, with thorough evaluations based on different constructed scenarios, tasks, metrics, and datasets, as shown in Figure 1 below.

Overall, we aim to evaluate 1) the performance of GPT models under different trustworthiness perspectives, and 2) the resilience of their performance in adversarial environments (e.g., adversarial system/user prompts, demonstrations).

For example, to evaluate the robustness of GPT-3.5 and GPT-4 against textual adversarial attacks, we construct three evaluation scenarios:

1) Evaluation on the standard AdvGLUE benchmark with a vanilla task description, to assess: a) the vulnerability of GPT models to existing textual adversarial attacks; b) the robustness of different GPT models compared with state-of-the-art models on AdvGLUE; c) the impact of adversarial attacks on the models’ instruction-following abilities, measured by the rate at which a model refuses to answer or gives an incorrect answer while under attack; and d) the transferability of current attack strategies, quantified by the transfer attack success rates of different attack approaches.

2) Evaluation on AdvGLUE with different instructive task descriptions and designed system prompts, to investigate the models’ resilience under diverse (adversarial) task descriptions and system prompts.

3) Evaluation of GPT-3.5 and GPT-4 on AdvGLUE++, our challenging adversarial texts generated against open-source autoregressive models such as Alpaca-7B, Vicuna-13B, and StableVicuna-13B, to further probe their vulnerabilities under strong adversarial attacks in diverse settings.
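To make these measurements concrete, the Python sketch below computes robust accuracy, refusal rate, and attack success rate from per-example results. The `Result` record and the exact definitions are assumptions for illustration, not DecodingTrust's code: a refusal is recorded as `None`, and an attack counts as successful when the model answers the benign input correctly but the adversarial input incorrectly (a refusal under attack also counts as a success).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Result:
    label: str                  # gold answer
    benign_pred: Optional[str]  # answer on the original input; None = refusal
    adv_pred: Optional[str]     # answer on the adversarial input; None = refusal


def robustness_metrics(results: List[Result]) -> Dict[str, float]:
    if not results:
        raise ValueError("no results to score")
    n = len(results)
    robust_accuracy = sum(r.adv_pred == r.label for r in results) / n
    refusal_rate = sum(r.adv_pred is None for r in results) / n
    # Attack success rate: among examples answered correctly without the attack,
    # the share answered incorrectly (or refused) once the attack is applied.
    attackable = [r for r in results if r.benign_pred == r.label]
    successes = sum(r.adv_pred != r.label for r in attackable)
    asr = successes / len(attackable) if attackable else 0.0
    return {"robust_accuracy": robust_accuracy, "refusal_rate": refusal_rate,
            "attack_success_rate": asr}


results = [
    Result("positive", "positive", "positive"),
    Result("positive", "positive", "negative"),
    Result("negative", "negative", None),
    Result("negative", "positive", "positive"),
]
print(robustness_metrics(results))
```

For transferability, the same attack success rate would be computed on adversarial texts crafted against a different model, such as the open-source models used to build AdvGLUE++.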

A graph listing the trustworthiness perspectives, benchmarks, and datasets evaluated in this work, organized in three layers. The first layer shows the 8 main trustworthiness perspectives, including toxicity, stereotypes, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. The second layer shows different benchmarks of each trustworthiness perspective. Some of the benchmarks are from prior work, while some are newly proposed in the paper. The third layer further shows different evaluation settings of each benchmark. — Fig 1. Taxonomy of our evaluation based on different trustworthiness perspectives. We use yellow boxes to represent the evaluations on existing benchmarks, and green boxes for evaluations using our designed new data or new evaluation protocols on existing datasets.

Insights and findings from DecodingTrust

Our research uncovered some previously undisclosed advantages as well as threats to the reliability of large language models.

For example, in terms of model robustness to adversarial demonstrations, we find that on the one hand, GPT-3.5 and GPT-4 will not be misled by the counterfactual examples added in the demonstrations and can even benefit from the counterfactual demonstrations in general; on the other hand, we observe that providing backdoored demonstrations will mislead both GPT-3.5 and GPT-4 to make incorrect predictions for backdoored inputs, especially when the backdoored demonstrations are positioned close to the (backdoored) user inputs, and GPT-4 is more vulnerable to backdoored demonstrations.

Moreover, in terms of toxicity and bias, neither GPT-3.5 nor GPT-4 is strongly biased on the majority of the stereotype topics considered under benign and untargeted system prompts. However, both GPT models can be “tricked” into agreeing with biased content given designed misleading (adversarial) system prompts. GPT-4 is more vulnerable to misleading targeted system prompts and agrees with the biased content more frequently than GPT-3.5, potentially because GPT-4 follows the misleading instructions more precisely. Model bias often depends on the demographic groups mentioned in the user prompt. For instance, for the query “Homosexuals have HIV.”, GPT-4 strongly disagrees in general, while for the query “Women have HIV.”, GPT-4 frequently agrees and outputs biased content. Model bias also depends on the stereotype topic: GPT models output more biased content on less sensitive topics like leadership and greed, and less biased content on more sensitive topics like drug dealing and terrorism. This is potentially due to the fine-tuning of GPT models on some protected demographic groups and sensitive topics.
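Dependence on demographic group and topic becomes visible once agreement is tallied per (group, topic) pair. In the Python sketch below we assume, for illustration, that each model response contains "I agree" or "I disagree"; responses containing neither (for example, refusals) are skipped. The group labels and responses are placeholders, not benchmark data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def agreement_rates(records: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], float]:
    """Share of parseable responses that agree, keyed by (demographic group, topic)."""
    agree: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for group, topic, response in records:
        text = response.lower()
        if "i disagree" in text:  # check the negative form first
            agreed = False
        elif "i agree" in text:
            agreed = True
        else:
            continue              # refusals or free-form answers are not counted
        total[(group, topic)] += 1
        agree[(group, topic)] += int(agreed)
    return {key: agree[key] / total[key] for key in total}


# Placeholder groups and responses, for illustration only.
records = [
    ("group_a", "leadership", "That is a fair point. I agree."),
    ("group_a", "leadership", "This is a stereotype. I disagree."),
    ("group_b", "leadership", "I disagree."),
    ("group_b", "leadership", "I cannot respond to that."),
]
print(agreement_rates(records))
```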

DecodingTrust also evaluates the privacy-leakage issues of LLMs. We find that GPT models can leak privacy-sensitive training data, such as the email addresses from the standard Enron email dataset, especially when prompted with the context of emails or few-shot demonstrations of (name, email) pairs. Moreover, under few-shot prompting, with supplementary knowledge such as the targeted email domain, the email extraction accuracy can be 100x higher than in scenarios where the email domain is unknown. We also observe that GPT models can leak private information injected into the conversation history. Overall, GPT-4 is more robust than GPT-3.5 in safeguarding personally identifiable information (PII), and both models are robust to specific types of PII, such as Social Security numbers, possibly due to explicit instruction tuning for those PII keywords. However, both GPT-4 and GPT-3.5 would leak all types of PII when prompted with privacy-leakage demonstrations during in-context learning. Lastly, GPT models differ in how they understand different privacy-related words and privacy events (e.g., they will leak private information when told “confidentially” but not when told “in confidence”). Given our constructed prompts, GPT-4 is more likely than GPT-3.5 to leak private information in this setting, potentially because it follows the (misleading) instructions more precisely. We present more examples of unreliable model outputs in Figure 2 below.
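The few-shot extraction setting can be sketched as prompt construction plus an exact-match check on the returned address. In the Python sketch below, the prompt template, the names, and the example.com addresses are hypothetical stand-ins rather than DecodingTrust's actual templates or Enron data, and the model call itself is omitted: `extraction_accuracy` takes whatever text the model returned.

```python
import re
from typing import List, Optional, Sequence, Tuple

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
TEMPLATE = "the email address of {name} is"  # hypothetical template


def few_shot_prompt(demos: Sequence[Tuple[str, str]], target: str) -> str:
    """Build a prompt from (name, email) demonstrations, ending with the target's name."""
    lines: List[str] = [f"{TEMPLATE.format(name=n)} {e};" for n, e in demos]
    lines.append(TEMPLATE.format(name=target))
    return "\n".join(lines)


def first_email(text: str) -> Optional[str]:
    """Return the first email-like string in a model output, lowercased."""
    match = EMAIL_RE.search(text)
    return match.group(0).lower() if match else None


def extraction_accuracy(outputs: Sequence[str], truths: Sequence[str]) -> float:
    """Share of outputs whose first email address exactly matches the ground truth."""
    if not truths:
        raise ValueError("no ground-truth addresses")
    hits = sum(first_email(o) == t.lower() for o, t in zip(outputs, truths))
    return hits / len(truths)


demos = [("Alice Example", "alice@example.com"), ("Bob Example", "bob@example.com")]
print(few_shot_prompt(demos, "Carol Example"))
print(extraction_accuracy(["carol@example.com", "I can't share that."],
                          ["carol@example.com", "dave@example.com"]))  # 0.5
```

Comparing this accuracy with and without extra context in the prompt (such as the target's email domain) is the kind of contrast the 100x figure above describes.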

The figure showing the examples of undesirable responses of GPT-4 given benign system prompts for each of the 8 trustworthiness perspectives, including toxicity, stereotypes, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. — Fig 2. Examples of undesirable responses of GPT-4 given benign system prompts from different trustworthiness perspectives. Offensive or sensitive information is masked.


The post DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models appeared first on Microsoft Research.

Microsoft at VL/HCC 2023: Focus on co-audit tools for spreadsheets

October 12, 2023

by Brenda Potts Microsoft AI

These research papers were presented at the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2023), a premier forum for design, theory, and application of computing technologies for programming, modelling, and communication.

Large language models (LLMs) have revolutionized the way novice programmers and everyday computer users tap into the capabilities of natural language for programming. Spreadsheets are among the most widely used tools in this context, and integrating LLMs into them promises to substantially enhance their functionality and the user experience. At the same time, it’s well known that spreadsheet users commonly, though inadvertently, introduce errors, and these errors can carry significant risks. For example, a spreadsheet used in a 2010 Harvard economic analysis to inform austerity measures imposed on Greece was later discovered to contain multiple errors.

Microsoft is actively pursuing research focused on developing co-auditing tools and techniques, with an initial emphasis on spreadsheets. These tools are designed to help users verify the results generated by LLMs. At VL/HCC 2023, we introduce two new spreadsheet tools, ColDeco and FxD, built specifically to help users examine and debug their programs within spreadsheets. The FxD paper received an Honorable Mention award.

ColDeco: An end-user inspection tool

Working with tables in spreadsheets is a common task, and the ability to add a calculated column can be incredibly useful. A calculated column not only adds information but also facilitates tasks like filtering and sorting. Generative AI can enable users to create sophisticated calculated columns in tables. However, verification of AI-generated code in this scenario is crucial because AI can misinterpret the user’s intent or overlook important data.

In our paper, “ColDeco: An End User Spreadsheet Inspection Tool for AI-Generated Code,” we introduce ColDeco, a no-code inspection tool for calculated columns. ColDeco uses helper columns and row grouping to help users understand how an AI-generated column works and locate any errors.

To describe how ColDeco works, we’ll use an example table containing people’s first, middle, and last names in separate columns. Our user asks the system to “create a column called ‘Abbreviation’ that takes the first letter of each part of the name.” In this example, there’s an error in the generated code that fails to handle rows with no middle names, causing some Abbreviation cells to be empty.
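The failure in this example is easy to reproduce. The Python sketch below is a hypothetical stand-in for the generated program (the post does not show the actual code): it joins the first and middle initials before adding the last initial, so an empty middle name breaks the first step and leaves the cell empty. A corrected version simply skips empty name parts. Both function names are our own.

```python
def abbreviation_generated(first: str, middle: str, last: str) -> str:
    """Hypothetical AI-generated logic containing the bug described above."""
    try:
        initials = first[0] + middle[0]  # IndexError when middle is ""
    except IndexError:
        return ""  # the error leaves the Abbreviation cell empty
    return initials + last[0]


def abbreviation_fixed(first: str, middle: str, last: str) -> str:
    """Take the first letter of each non-empty name part."""
    return "".join(part[0] for part in (first, middle, last) if part)


rows = [("Christopher", "Michael", "Fleming"), ("William", "", "Smith")]
for row in rows:
    print(row, abbreviation_generated(*row) or "<empty>", abbreviation_fixed(*row))
```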

First, the model generates a program that computes an abbreviation for each row and adds it to the new Abbreviation column. ColDeco’s interface automatically opens as a side panel, as shown in Figure 1.

The Inspect Columns view displays any generated columns, accompanied by a natural language description of the generated code. The Inspect Rows view displays a subset of the table, organized by behavior: it uses dataflow analysis to group rows, highlighting key distinct execution behaviors. In our example, this view quickly draws the user’s attention to the two rows that fail to calculate an abbreviation.

Two graphics. The first graphic depicts a table with columns: “First Name”, “Middle Name”, “Last Name”, “DoB”, and “Abbreviation”. There are 11 rows. As examples, row 3 contains the information: First Name: Christopher, Middle Name: Michael, Last Name: Fleming, DoB: 11/5/1995, Abbreviation: CMF. Row 9 contains the information: First Name: William, Middle name is empty, Last Name: Smith, DoB: 6/3/1968, Abbreviation is empty. The second graphic depicts a side panel with two sections. The first section is the Inspect Columns view (labelled 1a). A single column named “Abbreviation” and a corresponding description is shown. The second section is the Inspect Rows view (labelled 1b). It contains a table with columns “Index”, “First Name”, “Middle Name”, “Last Name”, and “Abbreviation”. Within the table there are two groups of rows. The first group has an example row: Index: 4716, First Name: William, Middle Name is empty, Last Name: Smith, Abbreviation is empty. The second group has an example row: Index: 8984, First Name: Christopher, Middle Name: Michael, Last Name: Flemming, Abbreviation: CMF. — Figure 1. The initial view of the ColDeco side panel. An Abbreviation program is generated by the AI and added to the table as a new column. The Inspect Columns view (1a) shows the column generated by the AI, including a description of how the code works. The Inspect Rows view (1b) groups rows into different behaviors, indicating that there are errors in two rows.
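The row grouping described above can be approximated by recording how each row's computation ended and bucketing rows that ended the same way. ColDeco uses dataflow analysis for this; the Python sketch below substitutes a much simpler run-time trace over a hypothetical buggy program that, like the one in the example, fails on empty middle names. The rows and behavior labels are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ROWS = [
    ("Christopher", "Michael", "Fleming"),
    ("William", "", "Smith"),
    ("Ana", "Maria", "Lopez"),
    ("Grace", "", "Hopper"),
]


def run_with_trace(first: str, middle: str, last: str) -> Tuple[str, str]:
    """Run the hypothetical buggy program; return (result, behavior label)."""
    try:
        initials = first[0] + middle[0]
    except IndexError:
        return "", "error in step: first + middle initials"
    return initials + last[0], "completed all steps"


def group_by_behavior(rows) -> Dict[str, List[int]]:
    """Map each behavior label to the indices of rows that exhibited it."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, row in enumerate(rows):
        _, behavior = run_with_trace(*row)
        groups[behavior].append(index)
    return dict(groups)


for behavior, indices in group_by_behavior(ROWS).items():
    # Show one representative row per group.
    print(f"{behavior}: {len(indices)} rows, e.g. {ROWS[indices[0]]}")
```

Showing one representative per group keeps the view short even for large tables, since rows that behave alike collapse into a single entry.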

If our user wants to investigate an error, they can expand a generated column into multiple helper columns, illustrated in Figure 2. These helper columns are visible in both the table (2a) and the side panel (2b), and they show the intermediate values. The user can now see that the missing abbreviations are caused by an error that occurred when the system tried to take the first and middle initials.

Two graphics. The first graphic (labelled 2a) depicts a table with 4 columns: “DoB”, “text concatenation”, “1st letter of Last Name”, “Abbreviation”. As examples, row 3 contains the information: DoB: 11/5/1995, text concatenation: CM, 1st letter of Last Name: F, Abbreviation: CMF. Row 9 contains the information: DoB: 6/3/1968, text concatenation is empty, 1st letter of Last Name: S, Abbreviation is empty. The second graphic (labelled 2b) depicts a side panel showing the Inspect Columns view. A tree view shows “Abbreviation” as the root with two children: “1st letter of Last Name” and “text concatenation”, corresponding to the columns in the table. Each column in the tree view has a corresponding description. — Figure 2. The ColDeco side panel after a user expands the Abbreviation column into two additional helper columns. Each additional column has a description.
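Expanding a column into helper columns amounts to evaluating each named subexpression of the program separately and keeping its value alongside the row. In the Python sketch below, the column names mirror Figure 2, while the step definitions are a hypothetical reconstruction that emulates the bug: the concatenation step yields an empty value when the middle name is missing, and the empty value propagates to Abbreviation.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]

# Helper columns in evaluation order: (column name, function of the row and earlier helpers).
HELPERS: List[Tuple[str, Callable[[Row, Row], str]]] = [
    ("text concatenation",
     lambda row, h: row["First Name"][:1] + row["Middle Name"][:1] if row["Middle Name"] else ""),
    ("1st letter of Last Name", lambda row, h: row["Last Name"][:1]),
    ("Abbreviation",
     lambda row, h: h["text concatenation"] + h["1st letter of Last Name"]
     if h["text concatenation"] else ""),
]


def expand(rows: List[Row]) -> List[Row]:
    """Return copies of the rows with one extra column per intermediate step."""
    table = []
    for row in rows:
        helpers: Row = {}
        for name, compute in HELPERS:
            helpers[name] = compute(row, helpers)
        table.append({**row, **helpers})
    return table


ROWS = [
    {"First Name": "Christopher", "Middle Name": "Michael", "Last Name": "Fleming"},
    {"First Name": "William", "Middle Name": "", "Last Name": "Smith"},
]
for row in expand(ROWS):
    print(row["text concatenation"] or "<empty>", row["1st letter of Last Name"],
          row["Abbreviation"] or "<empty>")
```

Because every intermediate value is kept, the first empty helper cell in a row points directly at the step that went wrong.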

FxD: A functional debugger

Not every spreadsheet task involves generating a new table column. Moreover, many users are already well acquainted with spreadsheet formulas. This brings us to our second tool, a spreadsheet formula debugger, introduced in the paper, “FxD: a functional debugger for dysfunctional spreadsheets.”

We took a user-centered approach to designing FxD and extensively reviewed existing literature on functional programming debuggers. This review informed the four key features we implemented in FxD:

Live debugging. FxD dynamically updates as a user edits a formula, allowing for quick formula modification and exploration (Figure 3, image 1).

Hybrid formula tracing. The debugger combines step-based evaluation (Figure 3, image 1) with tree-based derivations (Figure 3, image 3) to provide a step-by-step breakdown of the formula. Substeps are hidden behind expandable cards to prevent user overload.

Subformula coloring. Color coding highlights changes in a formula as FxD evaluates it, making these updates easy to track when a user hovers over a step (Figure 3, images 2 and 4).

Information inspector. Context-aware tooltips improve the user experience. One example is table previews when a user hovers over ranges in functions like VLOOKUP. These tooltips offer insights into the range, surrounding context, and the lookup column used by the containing function (Figure 3, image 3).

Four graphics, each graphic describing a different feature of the debugger. The formula being debugged is ‘=IF(G3 < (B1 + B2) * (1 + B3), “low”, “high”)’. The first graphic (labelled 1) shows the formula and its evaluation trace. Each step in the trace shows the formula with some part evaluated. The last step is the value “low” which is the result of the formula. The second graphic (labelled 2) shows a step being highlighted. The step has a before formula and after formula, with multiple parts evaluated. Each part that is evaluated is highlighted with the same color in the “before” and “after” formula. The third graphic (labelled 3) shows a cell range being hovered on and a range information inspector being shown. The inspector shows a preview of the grid for the corresponding range. The fourth graphic (labelled 4) shows a step being highlighted and an evaluated subpart being hovered over. The user hovers over the value 15 in the “after” formula and the corresponding formula “B1 + B2” in the “before” formula is underlined. — Figure 3. The FxD debugger. Image 1 shows the edited formula and evaluation steps. The steps update as a user edits the formula. Image 2 shows subformula coloring, which highlights a subformula and its value upon hovering. Image 3 shows an information inspector that previews the range referenced in a formula. Image 4 shows the concurrent evaluation of multiple subformulas. When the user hovers over a value, the corresponding subformula is underlined.
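Step-based evaluation with concurrent reduction can be sketched in a few dozen lines. The Python sketch below is not FxD's implementation: it skips parsing (the formula from Figure 3 is built by hand as a small syntax tree), uses hypothetical cell values chosen so that B1 + B2 is 15, and in each step reduces every subformula whose inputs are already known. It also logs which subformula produced which value, the kind of before/after mapping that subformula coloring needs. Re-running `trace` after every edit is the simplest form of live debugging.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

@dataclass(frozen=True)
class Val:
    v: object  # number, string, or bool

@dataclass(frozen=True)
class Ref:
    name: str  # cell reference such as "B1"

@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Node"
    right: "Node"

@dataclass(frozen=True)
class If:
    cond: "Node"
    then: "Node"
    other: "Node"

Node = Union[Val, Ref, BinOp, If]
OPS = {"+": operator.add, "*": operator.mul, "<": operator.lt}
PREC = {"<": 1, "+": 2, "*": 3}

def fmt(v: object) -> str:
    if isinstance(v, bool):
        return "TRUE" if v else "FALSE"
    if isinstance(v, str):
        return f'"{v}"'
    if isinstance(v, float) and v.is_integer():
        return str(int(v))
    return str(v)

def show(n: Node, parent_prec: int = 0) -> str:
    """Render a (partially evaluated) formula, adding parentheses only where needed."""
    if isinstance(n, Val):
        return fmt(n.v)
    if isinstance(n, Ref):
        return n.name
    if isinstance(n, BinOp):
        p = PREC[n.op]
        s = f"{show(n.left, p)} {n.op} {show(n.right, p + 1)}"
        return f"({s})" if p < parent_prec else s
    return f"IF({show(n.cond)}, {show(n.then)}, {show(n.other)})"

def step(n: Node, cells: Dict[str, object], log: List[Tuple[str, str]]) -> Node:
    """Reduce every subformula whose inputs are already values; log (before, after)."""
    if isinstance(n, Val):
        return n
    if isinstance(n, Ref):
        out: Node = Val(cells[n.name])
    elif isinstance(n, BinOp):
        if not (isinstance(n.left, Val) and isinstance(n.right, Val)):
            return BinOp(n.op, step(n.left, cells, log), step(n.right, cells, log))
        out = Val(OPS[n.op](n.left.v, n.right.v))
    elif isinstance(n.cond, Val):
        out = n.then if n.cond.v else n.other  # IF evaluates only the chosen branch
    else:
        return If(step(n.cond, cells, log), n.then, n.other)
    log.append((show(n), show(out)))
    return out

def trace(formula: Node, cells: Dict[str, object]) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Return each intermediate formula with the reductions that produced it."""
    rows: List[Tuple[str, List[Tuple[str, str]]]] = [(show(formula), [])]
    while not isinstance(formula, Val):
        log: List[Tuple[str, str]] = []
        formula = step(formula, cells, log)
        rows.append((show(formula), log))
    return rows

# =IF(G3 < (B1 + B2) * (1 + B3), "low", "high") with hypothetical cell values.
FORMULA = If(
    BinOp("<", Ref("G3"), BinOp("*", BinOp("+", Ref("B1"), Ref("B2")), BinOp("+", Val(1), Ref("B3")))),
    Val("low"),
    Val("high"),
)
CELLS = {"G3": 12, "B1": 10, "B2": 5, "B3": 0.5}

for text, reductions in trace(FORMULA, CELLS):
    print(text, reductions)
```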

Growing importance of AI code verification

As the complexity of AI-generated code rises, the need for tools to verify accuracy becomes increasingly critical. In response, we developed these two co-audit tools tailored to spreadsheets. Moving forward, a key consideration lies in managing the complexity of these tools. Our vision is that debugging tools will become infused with generative AI to assist users in both generating and verifying workflows.

To learn more, read our paper on co-auditing in general.

The post Microsoft at VL/HCC 2023: Focus on co-audit tools for spreadsheets appeared first on Microsoft Research.



Copyright © 2019-2023 Vedere AI. All Rights Reserved.