Inaugural Day of AI brings new digital literacy to classrooms worldwide

The first annual Day of AI on Friday, May 13 introduced artificial intelligence literacy to classrooms all over the world. An initiative of MIT Responsible AI for Social Empowerment and Education (RAISE), Day of AI is an opportunity for teachers to introduce K-12 students of all backgrounds to artificial intelligence (AI) and its role in their lives.

With over 3,000 registrations from educators across 88 countries — far exceeding the first-year goal of 1,000 registrations in the United States — the initiative has clearly struck a chord with students and teachers who want to better understand the technology that’s increasingly part of everyday life.

In today’s technology-driven world, kids are exposed to and interact with AI in ways they might not realize — from search algorithms to smart devices, video recommendations to facial recognition. Day of AI aims to help educators and students develop AI literacy with an easy entry point, with free curricula and hands-on activities developed by MIT RAISE for grades 3-12.

Professor Cynthia Breazeal, director of MIT RAISE, dean for digital learning, and head of the MIT Media Lab’s Personal Robots research group, says “We’re so inspired by the enthusiasm that students have expressed about learning about AI. We created this program because we want students and their teachers to be able to learn about these technologies in a way that’s engaging, that’s meaningful, that gives them the experience so they know that they can do AI too.”

AI is for everyone

The MIT RAISE team designed all Day of AI activities to be accessible to educators and students of all backgrounds and abilities, including those with little or no technology experience. In collaboration with education provider i2 Learning, MIT RAISE also offered teachers free professional development sessions prior to teaching the material. “That really helped me understand GANs and how that works,” says Gar-Hay Kit, a sixth-grade teacher from Mary Lyon School in Boston. “The slides that we were given were easy to work with and my class was engaged with all of the activities that we did that day.”

Students engaged with AI topics such as deepfakes, generative adversarial networks (GANs), algorithmic bias in datasets, and responsible design in social media platforms. Through hands-on activities and accessible, age-appropriate lessons, they learned what these technologies do, how they’re built, their potential dangers, and how to design and use them responsibly — bringing benefit while mitigating unintended negative consequences.

To celebrate the inaugural Day of AI, the RAISE team hosted an event at WBUR CitySpace. Fifth- and sixth-grade students from the Mary Lyon School shared projects they had created using the Day of AI curriculum during the previous few days. They demonstrated how Google QuickDraw was more likely to recognize spotted cows because most users had submitted drawings of cows with spots; the AI didn’t have a wide enough dataset to account for breeds of cows with different patterns or solid colors.

In a project about responsible social media and game design, students showed how the Roblox game platform only recommends gendered clothing for characters based on the user-entered gender. The solution the students proposed was to change the design of the recommendation system by inputting more options that were less overtly gendered, and allowing all users access to all of the clothing.

When asked what stuck out the most about the Day of AI activities, sixth-grade student Julia said, “It was cool how they were teaching young students AI and how we got to watch videos, and draw on the website.”

“One of the great benefits of this program is that no experience is necessary. You can be from anywhere and still have access to this career,” said Lieutenant Governor Karyn Polito at the event. The accessibility of Day of AI curricula relates to the tenet of Massachusetts STEM Week, “See yourself in STEM,” and Massachusetts’ STEM education goals at large. When Polito asked the audience of fifth- and sixth-graders from Mary Lyon School if they saw themselves in STEM, dozens of hands shot up in the air.

Breazeal echoed that sentiment, saying, “No matter your background, we want you to feel empowered and see a place where you can be inventing and driving these technologies in responsible ways to make a better world.” Working professionals and graduate students who use AI aren’t the only ones affected by this technology. RAISE pursues research, innovation, and outreach programs like Day of AI so K-12 students of all ages can recognize AI, evaluate its influence, and learn how to use it responsibly. Addressing the students, Breazeal said, “As you grow up, you’ll have a voice in our democracy to say how you want to see AI used.” 

More than just robots … but sometimes robots

Breazeal also moderated a panel of professionals who work with AI every day: Daniella DiPaola, PhD student at the MIT Media Lab; Steve Idowu, senior manager of strategic innovation at Liberty Mutual; Alex Aronov, executive director of data strategy and solutions at Vertex; and Sara Saperstein, head of data science, cybersecurity, and fraud at MassMutual. The panelists discussed how they’re able to leverage AI in a variety of different ways at their jobs.

Aronov explained that in a broad sense, AI can help automate “mundane” tasks so employees can focus on projects that require creative, innately “human” thinking. Idowu uses AI to improve customer and employee experiences, from claims to risk assessments. DiPaola addressed the common misconception that AI refers to sentient robots: when the Media Lab developed the social robot Jibo, the AI in action was not the robot itself but the natural language understanding technology that helps Jibo interpret what people say and mean. Throughout her academic career, DiPaola has been interested in how people interact with technology. “AI is helping us uncover things about ourselves,” she said.

The panelists also spoke to the broader goals of Day of AI — not only to introduce a younger generation to the STEM concepts at the core of AI technology, but to help them envision a future for themselves that uses those skills in new ways. “It’s not just the math and computer science, it’s about thinking deeply about what we’re doing — and how,” said Saperstein.

Jeffrey Leiden, executive chair of Vertex Pharmaceuticals (a founding sponsor of Day of AI as well as the CitySpace event), said, “Twenty years ago, I don’t think any of us could have predicted how much AI and machine learning would be in our lives. We have Siri on our phones, AI can tell us what’s in our fridges, it can change the temperature automatically on our thermostats.” As someone working in the medical industry, he’s particularly excited about how AI can detect medical events before they happen so patients can be treated proactively.

By introducing STEM subjects as early as elementary and middle school, educators can build pathways for students to pursue STEM in high school and beyond. Exposure to future careers as scientists and researchers working in fields ranging from life sciences to robotics can empower students to bring their ideas forward and come up with even better solutions for science’s great questions.

The first Day of AI was hugely successful, with teachers posting photos and stories of their students’ enthusiasm from all over the world on social media using #DayofAI. Further Day of AI events are planned in Australia and Hong Kong later this summer, and the MIT RAISE team is already planning new curriculum modules, resources, and community-building efforts in advance of next year’s event. Plans include engaging the growing global community for language translation, more cultural localization for curriculum modules, and more.

In bias we trust?

When the stakes are high, machine-learning models are sometimes used to aid human decision-makers. For instance, a model could predict which law school applicants are most likely to pass the bar exam to help an admissions officer determine which students should be accepted.

These models often have millions of parameters, so how they make predictions is nearly impossible for researchers to fully understand, let alone an admissions officer with no machine-learning experience. Researchers sometimes employ explanation methods that mimic a larger model by creating simple approximations of its predictions. These approximations, which are far easier to understand, help users determine whether to trust the model’s predictions.
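
As a rough illustration of the idea (not the study’s own pipeline), the sketch below trains a black-box classifier on synthetic data, fits a small decision tree to mimic its predictions, and measures how often the two agree; all model and data choices here are assumptions made for the sake of the example.

```python
# Sketch of a post-hoc explanation model: fit a simple, interpretable surrogate
# to mimic a black-box classifier, then check how often the two agree.
# Model choices and data are illustrative, not those used in the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

black_box = GradientBoostingClassifier().fit(X, y)        # the complex model
bb_preds = black_box.predict(X)                           # predictions to mimic

surrogate = DecisionTreeClassifier(max_depth=3).fit(X, bb_preds)  # the "explanation"

fidelity = (surrogate.predict(X) == bb_preds).mean()      # agreement with the black box
print(f"Surrogate fidelity: {fidelity:.2f}")
```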

But are these explanation methods fair? If an explanation method provides better approximations for men than for women, or for white people than for Black people, it may encourage users to trust the model’s predictions for some people but not for others.

MIT researchers took a hard look at the fairness of some widely used explanation methods. They found that the approximation quality of these explanations can vary dramatically between subgroups and that the quality is often significantly lower for minoritized subgroups.

In practice, this means that if the approximation quality is lower for female applicants, there is a mismatch between the explanations and the model’s predictions that could lead the admissions officer to wrongly reject more women than men.

Once the MIT researchers saw how pervasive these fairness gaps are, they tried several techniques to level the playing field. They were able to shrink some gaps, but couldn’t eradicate them.

“What this means in the real world is that people might incorrectly trust predictions more for some subgroups than for others. So, improving explanation models is important, but communicating the details of these models to end users is equally important. These gaps exist, so users may want to adjust their expectations as to what they are getting when they use these explanations,” says lead author Aparna Balagopalan, a graduate student in the Healthy ML group of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

Balagopalan wrote the paper with CSAIL graduate students Haoran Zhang and Kimia Hamidieh; CSAIL postdoc Thomas Hartvigsen; Frank Rudzicz, associate professor of computer science at the University of Toronto; and senior author Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group. The research will be presented at the ACM Conference on Fairness, Accountability, and Transparency.

High fidelity

Simplified explanation models can approximate predictions of a more complex machine-learning model in a way that humans can grasp. An effective explanation model maximizes a property known as fidelity, which measures how well it matches the larger model’s predictions.

Rather than focusing on average fidelity for the overall explanation model, the MIT researchers studied fidelity for subgroups of people in the model’s dataset. In a dataset with men and women, the fidelity should be very similar for each group, and both groups should have fidelity close to that of the overall explanation model.

“When you are just looking at the average fidelity across all instances, you might be missing out on artifacts that could exist in the explanation model,” Balagopalan says.

They developed two metrics to measure fidelity gaps, or disparities in fidelity between subgroups. One is the difference between the average fidelity across the entire explanation model and the fidelity for the worst-performing subgroup. The second calculates the absolute difference in fidelity between all possible pairs of subgroups and then computes the average.
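
The two metrics might look something like the following sketch, given per-example agreement between the explanation and the black-box model along with subgroup labels; the function name and toy data are illustrative, not the paper’s code.

```python
# Sketch of the two fidelity-gap metrics described above, given per-example
# agreement between an explanation model and the black box, plus group labels.
import itertools
import numpy as np

def fidelity_gaps(agreement, groups):
    """agreement: 1/0 array (explanation matches black-box prediction);
    groups: array of subgroup labels (e.g., sex or race)."""
    overall = agreement.mean()
    per_group = {g: agreement[groups == g].mean() for g in np.unique(groups)}

    # Metric 1: overall fidelity minus the worst-performing subgroup's fidelity.
    worst_gap = overall - min(per_group.values())

    # Metric 2: mean absolute fidelity difference over all pairs of subgroups.
    pairs = itertools.combinations(per_group.values(), 2)
    mean_pairwise_gap = np.mean([abs(a - b) for a, b in pairs])
    return worst_gap, mean_pairwise_gap

# Toy usage with two subgroups
agreement = np.array([1, 1, 0, 1, 0, 1, 1, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(fidelity_gaps(agreement, groups))
```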

With these metrics, they searched for fidelity gaps using two types of explanation models that were trained on four real-world datasets for high-stakes situations, such as predicting whether a patient dies in the ICU, whether a defendant reoffends, or whether a law school applicant will pass the bar exam. Each dataset contained protected attributes, like the sex and race of individual people. Protected attributes are features that may not be used for decisions, often due to laws or organizational policies; exactly which attributes are protected can vary with the task and decision setting.

The researchers found clear fidelity gaps for all datasets and explanation models. The fidelity for disadvantaged groups was often much lower, up to 21 percent in some instances. The law school dataset had a fidelity gap of 7 percent between race subgroups, meaning the approximations for some subgroups were wrong 7 percent more often on average. If there are 10,000 applicants from these subgroups in the dataset, for example, a significant portion could be wrongly rejected, Balagopalan explains.

“I was surprised by how pervasive these fidelity gaps are in all the datasets we evaluated. It is hard to overemphasize how commonly explanations are used as a ‘fix’ for black-box machine-learning models. In this paper, we are showing that the explanation methods themselves are imperfect approximations that may be worse for some subgroups,” says Ghassemi.

Narrowing the gaps

After identifying fidelity gaps, the researchers tried some machine-learning approaches to fix them. They trained the explanation models to identify regions of a dataset that could be prone to low fidelity and then focus more on those samples. They also tried using balanced datasets with an equal number of samples from all subgroups.

These robust training strategies did reduce some fidelity gaps, but they didn’t eliminate them.

The researchers then modified the explanation models to explore why fidelity gaps occur in the first place. Their analysis revealed that an explanation model might indirectly use protected group information, like sex or race, that it could learn from the dataset, even if group labels are hidden.

They want to explore this conundrum more in future work. They also plan to further study the implications of fidelity gaps in the context of real-world decision making.

Balagopalan is excited to see that concurrent work on explanation fairness from an independent lab has arrived at similar conclusions, highlighting the importance of understanding this problem well.

As she looks to the next phase in this research, she has some words of warning for machine-learning users.

“Choose the explanation model carefully. But even more importantly, think carefully about the goals of using an explanation model and who it eventually affects,” she says.

This work was funded, in part, by the MIT-IBM Watson AI Lab, the Quanta Research Institute, a Canadian Institute for Advanced Research AI Chair, and Microsoft Research.

Is diversity the key to collaboration? New AI research suggests so

As artificial intelligence gets better at performing tasks once solely in the hands of humans, like driving cars, many see teaming intelligence as a next frontier. In this future, humans and AI are true partners in high-stakes jobs, such as performing complex surgery or defending from missiles. But before teaming intelligence can take off, researchers must overcome a problem that corrodes cooperation: humans often do not like or trust their AI partners.

Now, new research points to diversity as a key parameter for making AI a better team player.

MIT Lincoln Laboratory researchers have found that training an AI model with mathematically “diverse” teammates improves its ability to collaborate with other AI it has never worked with before, in the card game Hanabi. Moreover, both Facebook and Google’s DeepMind concurrently published independent work that also infused diversity into training to improve outcomes in human-AI collaborative games.  

Altogether, the results may point researchers down a promising path to making AI that can both perform well and be seen as good collaborators by human teammates.  

“The fact that we all converged on the same idea — that if you want to cooperate, you need to train in a diverse setting — is exciting, and I believe it really sets the stage for the future work in cooperative AI,” says Ross Allen, a researcher in Lincoln Laboratory’s Artificial Intelligence Technology Group and co-author of a paper detailing this work, which was recently presented at the International Conference on Autonomous Agents and Multi-Agent Systems.   

Adapting to different behaviors

To develop cooperative AI, many researchers are using Hanabi as a testing ground. Hanabi challenges players to work together to stack cards in order, but players can only see their teammates’ cards and can only give sparse clues to each other about which cards they hold. 

In a previous experiment, Lincoln Laboratory researchers tested one of the world’s best-performing Hanabi AI models with humans. They were surprised to find that humans strongly disliked playing with this AI model, calling it a confusing and unpredictable teammate. “The conclusion was that we’re missing something about human preference, and we’re not yet good at making models that might work in the real world,” Allen says.  

The team wondered if cooperative AI needs to be trained differently. The type of AI being used, called reinforcement learning, traditionally learns how to succeed at complex tasks by discovering which actions yield the highest reward. It is often trained and evaluated against models similar to itself. This process has created unmatched AI players in competitive games like Go and StarCraft.

But for AI to be a successful collaborator, perhaps it has to care not only about maximizing reward when collaborating with other AI agents, but also about something more intrinsic: understanding and adapting to others’ strengths and preferences. In other words, it needs to learn from and adapt to diversity.

How do you train such a diversity-minded AI? The researchers came up with “Any-Play.” Any-Play augments the process of training an AI Hanabi agent by adding another objective, besides maximizing the game score: the AI must correctly identify the play-style of its training partner.

This play-style is encoded within the training partner as a latent, or hidden, variable that the agent must estimate. It does this by observing differences in the behavior of its partner. This objective also requires its partner to learn distinct, recognizable behaviors in order to convey these differences to the receiving AI agent.
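
A minimal sketch of what such an auxiliary objective could look like appears below: the usual reinforcement-learning loss is combined with a cross-entropy loss for guessing the partner’s latent play-style from observed behavior. The network shapes, loss weighting, and variable names are assumptions, not the Any-Play implementation.

```python
# Conceptual sketch of an Any-Play-style auxiliary objective: alongside the
# usual RL loss, the agent predicts its partner's hidden play-style from
# observed behavior. Dimensions, names, and weighting are assumptions.
import torch
import torch.nn as nn

n_styles = 8                       # number of latent play-styles
obs_dim, hidden = 64, 128

encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
style_head = nn.Linear(hidden, n_styles)   # guesses the partner's latent style

def total_loss(rl_loss, partner_obs, true_style, aux_weight=0.5):
    """Combine the game-score (RL) loss with a play-style identification loss."""
    logits = style_head(encoder(partner_obs))
    style_loss = nn.functional.cross_entropy(logits, true_style)
    return rl_loss + aux_weight * style_loss

# Toy usage with random tensors
rl_loss = torch.tensor(1.2)
partner_obs = torch.randn(32, obs_dim)          # batch of observed partner behavior
true_style = torch.randint(0, n_styles, (32,))  # latent style labels during training
print(total_loss(rl_loss, partner_obs, true_style))
```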

Though this method of inducing diversity is not new to the field of AI, the team extended the concept to collaborative games by leveraging these distinct behaviors as diverse play-styles of the game.

“The AI agent has to observe its partners’ behavior in order to identify that secret input they received and has to accommodate these various ways of playing to perform well in the game. The idea is that this would result in an AI agent that is good at playing with different play styles,” says first author Keane Lucas, a PhD candidate at Carnegie Mellon University, who led the experiments during a prior internship at the laboratory.

Playing with others unlike itself

The team augmented that earlier Hanabi model (the one they had tested with humans in their prior experiment) with the Any-Play training process. To evaluate if the approach improved collaboration, the researchers teamed up the model with “strangers” — more than 100 other Hanabi models that it had never encountered before and that were trained by separate algorithms — in millions of two-player matches. 

The Any-Play pairings outperformed all other teams when those teams were likewise made up of algorithmically dissimilar partners. It also scored better when partnering with the original version of itself that was not trained with Any-Play.

The researchers view this type of evaluation, called inter-algorithm cross-play, as the best predictor of how cooperative AI would perform in the real world with humans. Inter-algorithm cross-play contrasts with more commonly used evaluations that test a model against copies of itself or against models trained by the same algorithm.
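
The protocol can be sketched roughly as follows: take agents trained by one algorithm, pair them only with agents trained by other algorithms, and average the resulting team scores. The agent pools and the placeholder match function below are hypothetical stand-ins, not the laboratory’s actual evaluation harness.

```python
# Sketch of inter-algorithm cross-play: pair an agent with partners trained by
# *other* algorithms and average the team scores, skipping self-play and
# intra-algorithm pairings. Everything here is a stand-in.
import itertools
import random
import statistics

def play_match(agent_a, agent_b):
    # Placeholder for a two-player Hanabi game; returns a team score.
    random.seed(hash((agent_a, agent_b)) % 2**32)
    return random.uniform(0, 25)

agent_pools = {
    "any_play": ["any_play_0", "any_play_1"],
    "algo_B":   ["B_0", "B_1"],
    "algo_C":   ["C_0", "C_1"],
}

def cross_play_score(focal_algo):
    """Average score of focal-algorithm agents paired with agents from other algorithms."""
    scores = []
    for partner_algo, partners in agent_pools.items():
        if partner_algo == focal_algo:
            continue  # exclude pairings with agents trained the same way
        for a, b in itertools.product(agent_pools[focal_algo], partners):
            scores.append(play_match(a, b))
    return statistics.mean(scores)

for algo in agent_pools:
    print(algo, round(cross_play_score(algo), 2))
```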

“We argue that those other metrics can be misleading and artificially boost the apparent performance of some algorithms. Instead, we want to know, ‘if you just drop in a partner out of the blue, with no prior knowledge of how they’ll play, how well can you collaborate?’ We think this type of evaluation is most realistic when evaluating cooperative AI with other AI, when you can’t test with humans,” Allen says.  

Indeed, this work did not test Any-Play with humans. However, research published by DeepMind concurrently with the laboratory’s work used a similar diversity-training approach to develop an AI agent to play the collaborative game Overcooked with humans. “The AI agent and humans showed remarkably good cooperation, and this result leads us to believe our approach, which we find to be even more generalized, would also work well with humans,” Allen says. Facebook similarly used diversity in training to improve collaboration among Hanabi AI agents, but used a more complicated algorithm that required modifications of the Hanabi game rules to be tractable.

Whether inter-algorithm cross-play scores are actually good indicators of human preference is still a hypothesis. To bring human perspective back into the process, the researchers want to try to correlate a person’s feelings about an AI, such as distrust or confusion, to specific objectives used to train the AI. Uncovering these connections could help accelerate advances in the field.  

“The challenge with developing AI to work better with humans is that we can’t have humans in the loop during training telling the AI what they like and dislike. It would take millions of hours and personalities. But if we could find some kind of quantifiable proxy for human preference — and perhaps diversity in training is one such proxy — then maybe we’ve found a way through this challenge,” Allen says.

Early sound exposure in the womb shapes the auditory system

Inside the womb, fetuses can begin to hear some sounds around 20 weeks of gestation. However, the input they are exposed to is limited to low-frequency sounds because of the muffling effect of the amniotic fluid and surrounding tissues.

A new MIT-led study suggests that this degraded sensory input is beneficial, and perhaps necessary, for auditory development. Using simple computer models of human auditory processing, the researchers showed that initially limiting input to low-frequency sounds as the models learned to perform certain tasks actually improved their performance.

Along with an earlier study from the same team, which showed that early exposure to blurry faces improves computer models’ subsequent generalization ability to recognize faces, the findings suggest that receiving low-quality sensory input may be key to some aspects of brain development.

“Instead of thinking of the poor quality of the input as a limitation that biology is imposing on us, this work takes the standpoint that perhaps nature is being clever and giving us the right kind of impetus to develop the mechanisms that later prove to be very beneficial when we are asked to deal with challenging recognition tasks,” says Pawan Sinha, a professor of vision and computational neuroscience in MIT’s Department of Brain and Cognitive Sciences, who led the research team.

In the new study, the researchers showed that exposing a computational model of the human auditory system to a full range of frequencies from the beginning led to worse generalization performance on tasks that require absorbing information over longer periods of time — for example, identifying emotions from a voice clip. From the applied perspective, the findings suggest that babies born prematurely may benefit from being exposed to lower-frequency sounds rather than the full spectrum of frequencies that they now hear in neonatal intensive care units, the researchers say.

Marin Vogelsang and Lukas Vogelsang, currently both students at EPFL Lausanne, are the lead authors of the study, which appears in the journal Developmental Science. Sidney Diamond, a retired neurologist and now an MIT research affiliate, is also an author of the paper.

Low-quality input

Several years ago, Sinha and his colleagues became interested in studying how low-quality sensory input affects the brain’s subsequent development. This question arose in part after the researchers had the opportunity to meet and study a young boy who had been born with cataracts that were not removed until he was four years old.

This boy, who was born in China, was later adopted by an American family and referred to Sinha’s lab at the age of 10. Studies revealed that his vision was nearly normal, with one notable exception: He performed very poorly in recognizing faces. Other studies of children born blind have also revealed deficits in face recognition after their sight was restored.

The researchers hypothesized that this impairment might be a result of missing out on some of the low-quality visual input that babies and young children normally receive. When babies are born, their visual acuity is very poor — around 20/800, 1/40 the strength of normal 20/20 vision. This is in part because of the lower packing density of photoreceptors in the newborn retina. As the baby grows, the receptors become more densely packed and visual acuity improves.

“The theory we proposed was that this initial period of blurry or degraded vision was very important. Because everything is so blurry, the brain needs to integrate over larger areas of the visual field,” Sinha says.

To explore this theory, the researchers used a type of computational model of vision known as a convolutional neural network. They trained the model to recognize faces, giving it either blurry input followed later by clear input, or clear input from the beginning. They found that the models that received fuzzy input early on showed superior generalization performance on facial recognition tasks. Additionally, the neural networks’ receptive fields — the size of the visual area that they cover — were larger than the receptive fields in models trained on the clear input from the beginning.

After that study was published in 2018, the researchers wanted to explore whether this phenomenon could also be seen in other types of sensory systems. For audition, the timeline of development is slightly different, as full-term babies are born with nearly normal hearing across the sound spectrum. However, during the prenatal period, while the auditory system is still developing, babies are exposed to degraded sound quality in the womb.

To examine the effects of that degraded input, the researchers trained a computational model of human audition to perform a task that requires integrating information over long time periods — identifying emotion from a voice clip. As the models learned the task, the researchers fed them one of four different types of auditory input: low frequency only, full frequency only, low frequency followed by full frequency, and full frequency followed by low frequency.
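
One way to picture the four regimens is the sketch below, which low-pass filters audio clips to mimic womb-like input and orders filtered and unfiltered clips differently across training; the filter design, cutoff frequency, and switch point are illustrative assumptions rather than the study’s parameters.

```python
# Sketch of the four training regimens: audio is either low-pass filtered
# ("in the womb") or left at full bandwidth, presented in different orders
# over training. Filter choices are assumptions, not the study's settings.
import numpy as np
from scipy.signal import butter, sosfilt

def low_pass(audio, sample_rate=16000, cutoff_hz=500):
    """Crudely mimic womb-like input by keeping only low frequencies."""
    sos = butter(4, cutoff_hz, btype="low", fs=sample_rate, output="sos")
    return sosfilt(sos, audio)

def make_curriculum(clips, regimen, switch_at=0.5):
    """regimen: 'low', 'full', 'low_then_full', or 'full_then_low'."""
    n_switch = int(len(clips) * switch_at)
    if regimen == "low":
        return [low_pass(c) for c in clips]
    if regimen == "full":
        return list(clips)
    if regimen == "low_then_full":
        return [low_pass(c) for c in clips[:n_switch]] + list(clips[n_switch:])
    if regimen == "full_then_low":
        return list(clips[:n_switch]) + [low_pass(c) for c in clips[n_switch:]]
    raise ValueError(regimen)

# Toy usage: ten one-second clips of noise, ordered low-frequency-first
clips = [np.random.randn(16000) for _ in range(10)]
curriculum = make_curriculum(clips, "low_then_full")
```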

Low frequency followed by full frequency most closely mimics what developing infants are exposed to, and the researchers found that the computer models exposed to that scenario exhibited the most generalized performance profile on the emotion recognition task. Those models also generated larger temporal receptive fields, meaning that they were able to analyze sounds occurring over a longer time period.

This suggests, just like the vision study, that degraded input early in development actually promotes better sensory integration abilities later in life.

“It supports the idea that starting with very limited information, and then getting better and better over time might actually be a feature of the system rather than being a bug,” Lukas Vogelsang says.

Effects of premature birth

Previous research by other labs has found that babies born prematurely do show impairments in processing low-frequency sounds. Later in life, they perform worse than full-term babies on tests of emotion classification. The MIT team’s computational findings suggest that these impairments may be the result of missing out on some of the low-quality sensory input they would normally receive in the womb.

“If you provide full-frequency input right from the get-go, then you are taking away the impetus on the part of the brain to try to discover long range or extended temporal structure. It can get by with just local temporal structure,” Sinha says. “Presumably that is what immediate immersion in full-frequency soundscapes does to the brain of a prematurely born child.”

The researchers suggest that for babies born prematurely, it could be beneficial to expose them to primarily low-frequency sounds after birth, to mimic the womb-like conditions they’re missing out on.

The research team is now exploring other areas in which this kind of degraded input may be beneficial to brain development. These include aspects of vision, such as color perception, as well as qualitatively different domains such as linguistic development.

“We have been surprised by how consistent the narrative and the hypothesis of the experimental results are, to this idea of initial degradations being adaptive for developmental purposes,” Sinha says. “I feel that this work illustrates the gratifying surprises science offers us. We did not expect that the ideas which germinated from our work with congenitally blind children would have much bearing on our thinking about audition. But, in fact, there appears to be a beautiful conceptual commonality between the two domains. And, maybe that common thread goes even beyond these two sensory modalities. There are clearly a host of exciting research questions ahead of us.”

The research was funded by the National Institutes of Health.

President Guðni Thorlacius Jóhannesson of Iceland visits MIT

Guðni Thorlacius Jóhannesson, the president of Iceland, visited MIT on Friday, engaging in talks with several campus leaders and professors, and touring the Media Lab.

Jóhannesson visited the Institute along with a substantial delegation of officials and scholars from Iceland. They met with MIT scholars, who delivered a variety of presentations on research, design, and entrepreneurship; the Iceland delegation also had a particular interest in the inclusion of the Icelandic language in artificial intelligence-driven tools that automatically recognize, translate, and deploy speech and texts.

“We are determined to make sure that Icelandic has a place in the digital age,” Jóhannesson said. “AI plays a key role there.” In working to have Icelandic represented in machine-translation, voice-recognition, and associated language tools, he added: “We want to make sure it is in our interest, and to the benefit of mankind.”

In addition to the presentations, the delegation also met with Hashim Sarkis, dean of MIT’s School of Architecture and Planning (SA+P).

“It is important to put the human being front and center,” Sarkis said, describing the priorities of Media Lab researchers and MIT scholars and students.

Sarkis added that many researchers at the Media Lab, SA+P, and across the Institute have “a very deep interest in design, as [an] approach to solving problems in the world.”

Jóhannesson was first elected president of Iceland in 2016, then reelected with a large majority to his second term in 2020. By training, Jóhannesson is a professional historian who has studied and written extensively about modern Iceland. He received his undergraduate degree in history and political science from the University of Warwick in England; his MA in history from the University of Iceland; an MSt in history from St. Antony’s College at Oxford University; and his PhD in history from Queen Mary University of London.

The Icelandic delegation sat down for a series of discussions with MIT faculty and administrators. Krystyn Van Vliet, associate provost, associate vice president for research, and the Michael and Sonja Koerner Professor of Materials Science and Biological Engineering, discussed MIT’s research mission and the Kendall Square innovation ecosystem.

Daniela Rus, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science, director of the Computer Science and Artificial Intelligence Laboratory (CSAIL), and deputy dean of research for the MIT Schwarzman College of Computing, talked about CSAIL, the state of AI, and, in dialogue with the Iceland group, discussed the application of AI to language-recognition and translation tasks.

The group also had a face-to-face discussion about entrepreneurship and the MIT Regional Entrepreneurship Acceleration Program with Scott Stern, the David Sarnoff Professor of Management at the MIT Sloan School of Management.

Kelly Gavin, manager of philanthropic relations at the Media Lab, gave the group a tour of the lab, which included a talk on natural language processing by Media Lab PhD student Pedro Colon-Hernandez; the group discussed research in the field and the potential of AI-driven tools to assist people in everyday life.

“You need technology, but you also need some ethical thinking behind it,” Jóhannesson said.

Jóhannesson displayed a quick wit during his visit, quipping to Sarkis that he was “on unpaid leave” from his academic post while serving as Iceland’s president. When Rus mentioned that MIT identifies buildings by number rather than name, and that she worked in Building 32, the president joked that in Iceland, “I am in Building 1.”

The visit also incorporated a lunch with Icelandic MIT faculty and Boston-area researchers, including Elfar Adalsteinsson, the Eaton-Peabody Professor in the Department of Electrical Engineering and Computer Science; Svafa Grönfeldt, a professor of the practice in the School of Architecture and Planning and faculty director of MITdesignX; and Jónas Oddur Jónasson, an assistant professor at the MIT Sloan School of Management.

The president’s visit concluded with another sit-down talk, with Grönfeldt and Gilad Rosenzweig, executive director of MITdesignX.

The Icelandic delegation consisted of about a dozen other government officials, business leaders, and scholars, including Sif Gunnarsdóttir, the president’s chief of staff; Lilja Dögg Alfreðsdóttir, Iceland’s minister for culture and business; and Nikulás Hannigan, consul general and trade commissioner at Iceland’s U.S. Embassy.

Artificial intelligence predicts patients’ race from their medical images

The miseducation of algorithms is a critical problem; when artificial intelligence mirrors unconscious thoughts, racism, and biases of the humans who generated these algorithms, it can lead to serious harm. Computer programs, for example, have wrongly flagged Black defendants as twice as likely to reoffend as someone who’s white. When an AI used cost as a proxy for health needs, it falsely named Black patients as healthier than equally sick white ones, as less money was spent on them. Even AI used to write a play relied on using harmful stereotypes for casting. 

Removing sensitive features from the data seems like a viable tweak. But what happens when it’s not enough? 

Examples of bias in natural language processing are boundless — but MIT scientists have investigated another important, largely underexplored modality: medical images. Using both private and public datasets, the team found that AI can accurately predict self-reported race of patients from medical images alone. Using imaging data of chest X-rays, limb X-rays, chest CT scans, and mammograms, the team trained a deep learning model to identify race as white, Black, or Asian — even though the images themselves contained no explicit mention of the patient’s race. This is a feat even the most seasoned physicians cannot do, and it’s not clear how the model was able to do this. 

In an attempt to tease out and make sense of the enigmatic “how” of it all, the researchers ran a slew of experiments. To investigate possible mechanisms of race detection, they looked at variables like differences in anatomy, bone density, resolution of images — and many more, and the models still prevailed with high ability to detect race from chest X-rays. “These results were initially confusing, because the members of our research team could not come anywhere close to identifying a good proxy for this task,” says paper co-author Marzyeh Ghassemi, an assistant professor in the MIT Department of Electrical Engineering and Computer Science and the Institute for Medical Engineering and Science (IMES), who is an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and of the MIT Jameel Clinic. “Even when you filter medical images past where the images are recognizable as medical images at all, deep models maintain a very high performance. That is concerning because superhuman capacities are generally much more difficult to control, regulate, and prevent from harming people.”

In a clinical setting, algorithms can help tell us whether a patient is a candidate for chemotherapy, dictate the triage of patients, or decide if a movement to the ICU is necessary. “We think that the algorithms are only looking at vital signs or laboratory tests, but it’s possible they’re also looking at your race, ethnicity, sex, whether you’re incarcerated or not — even if all of that information is hidden,” says paper co-author Leo Anthony Celi, principal research scientist in IMES at MIT and associate professor of medicine at Harvard Medical School. “Just because you have representation of different groups in your algorithms, that doesn’t guarantee it won’t perpetuate or magnify existing disparities and inequities. Feeding the algorithms with more data with representation is not a panacea. This paper should make us pause and truly reconsider whether we are ready to bring AI to the bedside.” 

The study, “AI recognition of patient race in medical imaging: a modeling study,” was published in Lancet Digital Health on May 11. Celi and Ghassemi wrote the paper alongside 20 other authors in four countries.

To set up the tests, the scientists first showed that the models were able to predict race across multiple imaging modalities, various datasets, and diverse clinical tasks, as well as across a range of academic centers and patient populations in the United States. They used three large chest X-ray datasets, and tested the model on an unseen subset of the dataset used to train the model and a completely different one. Next, they trained the racial identity detection models for non-chest X-ray images from multiple body locations, including digital radiography, mammography, lateral cervical spine radiographs, and chest CTs to see whether the model’s performance was limited to chest X-rays. 

The team covered many bases in an attempt to explain the model’s behavior: differences in physical characteristics between different racial groups (body habitus, breast density), disease distribution (previous studies have shown that Black patients have a higher incidence for health issues like cardiac disease), location-specific or tissue-specific differences, effects of societal bias and environmental stress, the ability of deep learning systems to detect race when multiple demographic and patient factors were combined, and whether specific image regions contributed to recognizing race.

What emerged was truly staggering: The ability of the models to predict race from diagnostic labels alone was much lower than that of the chest X-ray image-based models.

For example, the bone density test used images where the thicker part of the bone appeared white and the thinner part appeared more gray or translucent. Scientists assumed that since Black people generally have higher bone mineral density, the color differences helped the AI models to detect race. To cut that off, they clipped the images with a filter so the model couldn’t use color differences. It turned out that cutting off the color supply didn’t faze the model — it could still accurately predict race. (The area under the curve, a measure of the accuracy of a quantitative diagnostic test, was 0.94–0.96.) As such, the learned features of the model appeared to rely on all regions of the image, meaning that controlling this type of algorithmic behavior presents a messy, challenging problem.
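
A sketch of that style of check is shown below: suppress brightness extremes with a clipping filter, then score a trained classifier’s race predictions on the filtered images using the area under the ROC curve. The clipping thresholds, dummy model, and data are hypothetical; they only illustrate how such an evaluation might be wired together, not the study’s pipeline.

```python
# Sketch: filter brightness cues out of images, then measure how well a trained
# model still predicts self-reported race, using one-vs-rest AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

def clip_brightness(images, low=0.2, high=0.8):
    """Flatten intensity extremes so brightness cues (e.g., from bone density) are suppressed."""
    return np.clip(images, low, high)

class DummyModel:
    """Stand-in for a trained race classifier over three self-reported groups."""
    def predict_proba(self, images):
        rng = np.random.default_rng(0)
        p = rng.random((len(images), 3))
        return p / p.sum(axis=1, keepdims=True)

def race_prediction_auc(model, images, race_labels):
    """One-vs-rest AUC of race predictions on filtered images (~0.5 chance, ~1.0 near-perfect)."""
    probs = model.predict_proba(clip_brightness(images))
    return roc_auc_score(race_labels, probs, multi_class="ovr")

# Toy usage with random images and labels covering all three classes
images = np.random.rand(120, 64, 64)
race_labels = np.arange(120) % 3
print(race_prediction_auc(DummyModel(), images, race_labels))
```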

The scientists acknowledge the limited availability of racial identity labels, which led them to focus on Asian, Black, and white populations, and note that their ground truth was a self-reported detail. Forthcoming work may look at isolating different signals before image reconstruction because, as with the bone density experiments, they couldn’t account for residual bone tissue that was on the images.

Notably, other work by Ghassemi and Celi led by MIT student Hammaad Adam has found that models can also identify patient self-reported race from clinical notes even when those notes are stripped of explicit indicators of race. Just as in this work, human experts are not able to accurately predict patient race from the same redacted clinical notes.

“We need to bring social scientists into the picture. Domain experts, which are usually the clinicians, public health practitioners, computer scientists, and engineers are not enough. Health care is a social-cultural problem just as much as it’s a medical problem. We need another group of experts to weigh in and to provide input and feedback on how we design, develop, deploy, and evaluate these algorithms,” says Celi. “We need to also ask the data scientists, before any exploration of the data, are there disparities? Which patient groups are marginalized? What are the drivers of those disparities? Is it access to care? Is it from the subjectivity of the care providers? If we don’t understand that, we won’t have a chance of being able to identify the unintended consequences of the algorithms, and there’s no way we’ll be able to safeguard the algorithms from perpetuating biases.”

“The fact that algorithms ‘see’ race, as the authors convincingly document, can be dangerous. But an important and related fact is that, when used carefully, algorithms can also work to counter bias,” says Ziad Obermeyer, associate professor at the University of California at Berkeley, whose research focuses on AI applied to health. “In our own work, led by computer scientist Emma Pierson at Cornell, we show that algorithms that learn from patients’ pain experiences can find new sources of knee pain in X-rays that disproportionately affect Black patients — and are disproportionately missed by radiologists. So just like any tool, algorithms can be a force for evil or a force for good — which one depends on us, and the choices we make when we build algorithms.”

The work is supported, in part, by the National Institutes of Health.

Living better with algorithms

Laboratory for Information and Decision Systems (LIDS) student Sarah Cen remembers the lecture that sent her down the track to an upstream question.

At a talk on ethical artificial intelligence, the speaker brought up a variation on the famous trolley problem, which outlines a philosophical choice between two undesirable outcomes.

The speaker’s scenario: Say a self-driving car is traveling down a narrow alley with an elderly woman walking on one side and a small child on the other, and no way to thread between both without a fatality. Who should the car hit?

Then the speaker said: Let’s take a step back. Is this the question we should even be asking?

That’s when things clicked for Cen. Instead of considering the point of impact, a self-driving car could have avoided choosing between two bad outcomes by making a decision earlier on — the speaker pointed out that, when entering the alley, the car could have determined that the space was narrow and slowed to a speed that would keep everyone safe.

Recognizing that today’s AI safety approaches often resemble the trolley problem, focusing on downstream regulation such as liability after someone is left with no good choices, Cen wondered: What if we could design better upstream and downstream safeguards to such problems? This question has informed much of Cen’s work.

“Engineering systems are not divorced from the social systems on which they intervene,” Cen says. Ignoring this fact risks creating tools that fail to be useful when deployed or, more worryingly, that are harmful.

Cen arrived at LIDS in 2018 via a slightly roundabout route. She first got a taste for research during her undergraduate degree at Princeton University, where she majored in mechanical engineering. For her master’s degree, she changed course, working on radar solutions in mobile robotics (primarily for self-driving cars) at Oxford University. There, she developed an interest in AI algorithms, curious about when and why they misbehave. So, she came to MIT and LIDS for her doctoral research, working with Professor Devavrat Shah in the Department of Electrical Engineering and Computer Science, for a stronger theoretical grounding in information systems.

Auditing social media algorithms

Together with Shah and other collaborators, Cen has worked on a wide range of projects during her time at LIDS, many of which tie directly to her interest in the interactions between humans and computational systems. In one such project, Cen studies options for regulating social media. Her recent work provides a method for translating human-readable regulations into implementable audits.

To get a sense of what this means, suppose that regulators require that any public health content — for example, on vaccines — not be vastly different for politically left- and right-leaning users. How should auditors check that a social media platform complies with this regulation? Can a platform be made to comply with the regulation without damaging its bottom line? And how does compliance affect the actual content that users do see?

Designing an auditing procedure is difficult in large part because there are so many stakeholders when it comes to social media. Auditors have to inspect the algorithm without accessing sensitive user data. They also have to work around trade secrets, which can prevent them from getting a close look at the very algorithm they are auditing because it is legally protected. Other considerations come into play as well, such as balancing the removal of misinformation with the protection of free speech.

To meet these challenges, Cen and Shah developed an auditing procedure that does not need more than black-box access to the social media algorithm (which respects trade secrets), does not remove content (which avoids issues of censorship), and does not require access to users (which preserves users’ privacy).
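
In very rough outline, such an audit might look like the sketch below: query the platform’s recommendation engine as a black box on behalf of synthetic left- and right-leaning profiles and compare the share of public-health content each side is shown. Everything here — the API, the profiles, the health tag, the threshold — is a hypothetical stand-in, not the researchers’ procedure.

```python
# Sketch of a black-box audit: no platform internals, no real user data, no
# content removal; just query the recommender and compare exposure rates.
import random
import statistics

def audit_health_exposure(recommend, left_profiles, right_profiles, threshold=0.1):
    """`recommend(profile)` is the platform's black-box API; each returned item is
    assumed to carry an `is_health` tag marking public-health content."""
    def mean_health_share(profiles):
        shares = []
        for profile in profiles:
            items = recommend(profile)
            shares.append(sum(item["is_health"] for item in items) / len(items))
        return statistics.mean(shares)

    gap = abs(mean_health_share(left_profiles) - mean_health_share(right_profiles))
    return {"gap": gap, "passes_audit": gap <= threshold}

# Toy usage with a fake recommender standing in for the real black-box API
def fake_recommend(profile):
    random.seed(profile)                      # deterministic per synthetic profile
    return [{"is_health": random.random() < 0.3} for _ in range(20)]

print(audit_health_exposure(fake_recommend, left_profiles=[1, 2, 3], right_profiles=[4, 5, 6]))
```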

In their design process, the team also analyzed the properties of their auditing procedure, finding that it ensures a desirable property they call decision robustness. As good news for the platform, they show that a platform can pass the audit without sacrificing profits. Interestingly, they also found the audit naturally incentivizes the platform to show users diverse content, which is known to help reduce the spread of misinformation, counteract echo chambers, and more.

Who gets good outcomes and who gets bad ones?

In another line of research, Cen looks at whether people can receive good long-term outcomes when they not only compete for resources, but also don’t know upfront what resources are best for them.

Some platforms, such as job-search platforms or ride-sharing apps, are part of what is called a matching market, which uses an algorithm to match one set of individuals (such as workers or riders) with another (such as employers or drivers). In many cases, individuals have matching preferences that they learn through trial and error. In labor markets, for example, workers learn their preferences about what kinds of jobs they want, and employers learn their preferences about the qualifications they seek from workers.

But learning can be disrupted by competition. If workers with a particular background are repeatedly denied jobs in tech because of high competition for tech jobs, for instance, they may never get the knowledge they need to make an informed decision about whether they want to work in tech. Similarly, tech employers may never see and learn what these workers could do if they were hired.

Cen’s work examines this interaction between learning and competition, studying whether it is possible for individuals on both sides of the matching market to walk away happy.

Modeling such matching markets, Cen and Shah found that it is indeed possible to get to a stable outcome (workers aren’t incentivized to leave the matching market), with low regret (workers are happy with their long-term outcomes), fairness (happiness is evenly distributed), and high social welfare.

Interestingly, it’s not obvious that it’s possible to get stability, low regret, fairness, and high social welfare simultaneously.  So another important aspect of the research was uncovering when it is possible to achieve all four criteria at once and exploring the implications of those conditions.

What is the effect of X on Y?

For the next few years, though, Cen plans to work on a new project, studying how to quantify the effect of an action X on an outcome Y when it’s expensive — or impossible — to measure this effect, focusing in particular on systems that have complex social behaviors.

For instance, when Covid-19 cases surged in the pandemic, many cities had to decide what restrictions to adopt, such as mask mandates, business closures, or stay-home orders. They had to act fast and balance public health with community and business needs, public spending, and a host of other considerations.

Typically, in order to estimate the effect of restrictions on the rate of infection, one might compare the rates of infection in areas that underwent different interventions. If one county has a mask mandate while its neighboring county does not, one might think comparing the counties’ infection rates would reveal the effectiveness of mask mandates. 
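
That naive comparison amounts to a simple difference in means, as in the toy sketch below (with made-up numbers); the next paragraph explains why it can mislead.

```python
# The naive comparison described above: a difference in mean infection rates
# between counties with and without a mask mandate. As noted next, this
# ignores interactions between counties. All numbers are made up.
mandate_counties = {"A": 0.021, "B": 0.018}      # weekly infection rate per capita
no_mandate_counties = {"C": 0.034, "D": 0.029}

def mean(rates):
    return sum(rates.values()) / len(rates)

naive_effect = mean(mandate_counties) - mean(no_mandate_counties)
print(f"Naive estimated effect of mandate: {naive_effect:+.3f}")
```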

But of course, no county exists in a vacuum. If, for instance, residents of both counties gather every week to watch a football game in the maskless county, the two populations mix. These complex interactions matter, and Cen plans to study questions of cause and effect in such settings.

“We’re interested in how decisions or interventions affect an outcome of interest, such as how criminal justice reform affects incarceration rates or how an ad campaign might change the public’s behaviors,” Cen says.

Cen has also applied the principles of promoting inclusivity to her work in the MIT community.

As one of three co-presidents of the Graduate Women in MIT EECS student group, she helped organize the inaugural GW6 research summit featuring the research of women graduate students — not only to showcase positive role models to students, but also to highlight the many successful graduate women at MIT who are not to be underestimated.

Whether in computing or in the community, a system taking steps to address bias is one that enjoys legitimacy and trust, Cen says. “Accountability, legitimacy, trust — these principles play crucial roles in society and, ultimately, will determine which systems endure with time.” 

Can artificial intelligence overcome the challenges of the health care system?

Even as rapid improvements in artificial intelligence have led to speculation over significant changes in the health care landscape, the adoption of AI in health care has been minimal. A 2020 survey by Brookings, for example, found that less than 1 percent of job postings in health care required AI-related skills.

The Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic), a research center within the MIT Schwarzman College of Computing, recently hosted the MITxMGB AI Cures Conference in an effort to accelerate the adoption of clinical AI tools by creating new opportunities for collaboration between researchers and physicians focused on improving care for diverse patient populations.

Previously held virtually, the AI Cures Conference returned to in-person attendance at MIT’s Samberg Conference Center on the morning of April 25, welcoming over 300 attendees, primarily researchers and physicians from MIT and Mass General Brigham (MGB).

MIT President L. Rafael Reif began the event by welcoming attendees and speaking to the “transformative capacity of artificial intelligence and its ability to detect, in a dark river of swirling data, the brilliant patterns of meaning that we could never see otherwise.” MGB’s president and CEO Anne Klibanski followed up by lauding the joint partnership between the two institutions and noting that the collaboration could “have a real impact on patients’ lives” and “help to eliminate some of the barriers to information-sharing.”

Domestically, about $20 million in subcontract work currently takes place between MIT and MGB. MGB’s chief academic officer and AI Cures co-chair Ravi Thadhani thinks that five times that amount would be necessary in order to do more transformative work. “We could certainly be doing more,” Thadhani said. “The conference … just scratched the surface of a relationship between a leading university and a leading health-care system.”

MIT Professor and AI Cures Co-Chair Regina Barzilay echoed similar sentiments during the conference. “If we’re going to take 30 years to take all the algorithms and translate them into patient care, we’ll be losing patient lives,” she said. “I hope the main impact of this conference is finding a way to translate it into a clinical setting to benefit patients.”

This year’s event featured 25 speakers and two panels, with many of the speakers addressing the obstacles facing the mainstream deployment of AI in clinical settings, from fairness and clinical validation to regulatory hurdles and the challenge of translating AI tools into practice.

Of note on the speaker list was Amir Khan, a senior fellow at the U.S. Food and Drug Administration (FDA), who fielded a number of questions from curious researchers and clinicians about the FDA’s ongoing efforts and challenges in regulating AI in health care.

The conference also covered many of the impressive advances AI has made in the past several years: Lecia Sequist, a lung cancer oncologist from MGB, spoke about her collaborative work with MGB radiologist Florian Fintelmann and Barzilay to develop an AI algorithm that could detect lung cancer up to six years in advance. MIT Professor Dina Katabi, together with MGB doctors Ipsit Vahia and Aleksandar Videnovic, presented an AI device that could detect the presence of Parkinson’s disease simply by monitoring a person’s breathing patterns while asleep. “It is an honor to collaborate with Professor Katabi,” Videnovic said during the presentation.

MIT Assistant Professor Marzyeh Ghassemi, whose presentation concerned designing machine-learning processes for more equitable health systems, found compelling the longer-range perspectives shared by the speakers during the first panel, on how AI is changing clinical science.

“What I really liked about that panel was the emphasis on how relevant technology and AI has become in clinical science,” Ghassemi says. “You heard some panel members [Eliezer Van Allen, Najat Khan, Isaac Kohane, Peter Szolovits] say that they used to be the only person at a conference from their university that was focused on AI and ML [machine learning], and now we’re in a space where we have a miniature conference with posters just with people from MIT.”

The 88 posters accepted to AI Cures were on display for attendees to peruse during the lunch break. The presented research spanned different areas of focus from clinical AI and AI for biology to AI-powered systems and others. 

“I was really impressed with the breadth of work going on in this space,” says Collin Stultz, a professor at MIT. Stultz also spoke at AI Cures, focusing primarily on the risks of interpretability and explainability when using AI tools in a clinical setting, and using cardiovascular care as an example of how algorithms could potentially mislead clinicians, with grave consequences for patients.

“There are a growing number of failures in this space where companies or algorithms strive to be the most accurate, but do not take into consideration how the clinician views the algorithm and their likelihood of using it,” Stultz said. “This is about what the patient deserves and how the clinician is able to explain and justify their decision-making to the patient.” 

Phil Sharp, MIT Institute Professor and chair of the advisory board for the Jameel Clinic, found the conference energizing and thought the in-person interactions were crucial to gaining insight and motivation, in a way that many conferences still hosted virtually cannot match.

“The broad participation by students and leaders and members of the community indicate that there’s an awareness that this is a tremendous opportunity and a tremendous need,” Sharp says. He pointed out that AI and machine learning are being used to predict the structures of “almost everything” from protein structures to drug efficacy. “It says to young people, watch out, there might be a machine revolution coming.” 


On the road to cleaner, greener, and faster driving

No one likes sitting at a red light. But signalized intersections aren’t just a minor nuisance for drivers; vehicles consume fuel and emit greenhouse gases while waiting for the light to change.

What if motorists could time their trips so they arrive at the intersection when the light is green? While that might be just a lucky break for a human driver, it could be achieved more consistently by an autonomous vehicle that uses artificial intelligence to control its speed.

In a new study, MIT researchers demonstrate a machine-learning approach that can learn to control a fleet of autonomous vehicles as they approach and travel through a signalized intersection in a way that keeps traffic flowing smoothly.

Using simulations, they found that their approach reduces fuel consumption and emissions while improving average vehicle speed. The technique gets the best results if all cars on the road are autonomous, but even if only 25 percent use their control algorithm, it still leads to substantial fuel and emissions benefits.

“This is a really interesting place to intervene. No one’s life is better because they were stuck at an intersection. With a lot of other climate change interventions, there is a quality-of-life difference that is expected, so there is a barrier to entry there. Here, the barrier is much lower,” says senior author Cathy Wu, the Gilbert W. Winslow Career Development Assistant Professor in the Department of Civil and Environmental Engineering and a member of the Institute for Data, Systems, and Society (IDSS) and the Laboratory for Information and Decision Systems (LIDS).

The lead author of the study is Vindula Jayawardana, a graduate student in LIDS and the Department of Electrical Engineering and Computer Science. The research will be presented at the European Control Conference.

Intersection intricacies

While humans may drive past a green light without giving it much thought, intersections can present billions of different scenarios depending on the number of lanes, how the signals operate, the number of vehicles and their speeds, the presence of pedestrians and cyclists, etc.

Typical approaches for tackling intersection control problems use mathematical models to solve one simple, ideal intersection. That looks good on paper, but likely won’t hold up in the real world, where traffic patterns are often about as messy as they come.

Wu and Jayawardana shifted gears and approached the problem using a model-free technique known as deep reinforcement learning. Reinforcement learning is a trial-and-error method where the control algorithm learns to make a sequence of decisions. It is rewarded when it finds a good sequence. With deep reinforcement learning, the algorithm leverages assumptions learned by a neural network to find shortcuts to good sequences, even if there are billions of possibilities.

This is useful for solving a long-horizon problem like this; the control algorithm must issue upwards of 500 acceleration instructions to a vehicle over an extended time period, Wu explains.

“And we have to get the sequence right before we know that we have done a good job of mitigating emissions and getting to the intersection at a good speed,” she adds.
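
To make that long-horizon structure concrete, the toy Python sketch below rolls out a single 500-step episode with a random policy in a made-up intersection environment. The environment dynamics, fuel model, and reward weights are invented purely for illustration; they are stand-ins for the researchers’ simulator and learned controller, not reproductions of them.

```python
import numpy as np

# Hypothetical stand-ins: the researchers' simulator and learned policy are not
# shown in the article, so a toy environment and a random policy are used here
# purely to illustrate the long-horizon structure of the problem.
class ToyIntersectionEnv:
    """Toy single-vehicle approach to a signalized intersection (illustrative only)."""

    def __init__(self, horizon=500, dt=0.1):
        self.horizon = horizon
        self.dt = dt

    def reset(self):
        self.t = 0
        self.speed = 10.0      # m/s
        self.pos = 0.0         # m traveled; the stop line is assumed 300 m ahead
        self.fuel_used = 0.0   # arbitrary units
        return self._state()

    def _state(self):
        # state: [speed, distance to stop line, seconds until the light changes]
        return np.array([self.speed,
                         max(0.0, 300.0 - self.pos),
                         max(0.0, 30.0 - self.t * self.dt)])

    def step(self, accel):
        self.t += 1
        self.speed = max(0.0, self.speed + accel * self.dt)
        self.pos += self.speed * self.dt
        # crude fuel model: rolling cost plus a cost for hard acceleration
        self.fuel_used += (0.01 * self.speed + 0.05 * max(accel, 0.0) ** 2) * self.dt
        return self._state(), self.t >= self.horizon


def random_policy(state):
    """Placeholder for a learned policy: returns an acceleration in m/s^2."""
    return np.random.uniform(-1.0, 1.0)


# One trial-and-error episode: how good the full 500-step sequence was is only
# known once the episode ends.
env = ToyIntersectionEnv()
state, done = env.reset(), False
while not done:
    state, done = env.step(random_policy(state))
episode_return = -env.fuel_used + 0.1 * state[0]   # crude emissions-vs-speed trade-off
print(f"Return for this sequence of accelerations: {episode_return:.3f}")
```

Only at the end of the rollout does the agent see a single return summarizing fuel use and speed, which is why the learning algorithm must figure out how to credit individual acceleration decisions made hundreds of steps earlier.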

But there’s an additional wrinkle. The researchers want the system to learn a strategy that reduces fuel consumption and limits the impact on travel time. These goals can be conflicting.

“To reduce travel time, we want the car to go fast, but to reduce emissions, we want the car to slow down or not move at all. Those competing rewards can be very confusing to the learning agent,” Wu says.

While it is challenging to solve this problem in its full generality, the researchers employed a workaround using a technique known as reward shaping. With reward shaping, they give the system some domain knowledge it is unable to learn on its own. In this case, they penalized the system whenever the vehicle came to a complete stop, so it would learn to avoid that action.
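
As a rough illustration of that workaround, the hypothetical reward function below trades per-step fuel use against speed and subtracts an extra penalty whenever the vehicle is fully stopped. The weights and the exact functional form are assumptions made for this sketch, not values reported by the researchers.

```python
def shaped_reward(speed, fuel_this_step, time_weight=0.1, stop_penalty=1.0):
    """Illustrative shaped reward; weights are made up for this sketch."""
    reward = -fuel_this_step          # discourage fuel use and emissions
    reward += time_weight * speed     # encourage progress, i.e., shorter travel time
    if speed == 0.0:                  # reward shaping: inject the domain knowledge
        reward -= stop_penalty        # that coming to a complete stop is undesirable
    return reward
```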

Traffic tests

Once they developed an effective control algorithm, they evaluated it using a traffic simulation platform with a single intersection. The control algorithm is applied to a fleet of connected autonomous vehicles, which can communicate with upcoming traffic lights to receive signal phase and timing information and observe their immediate surroundings. The control algorithm tells each vehicle how to accelerate and decelerate.
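
A minimal sketch of how a trained controller might be applied to such a connected fleet is shown below. The observation fields, the signal phase and timing message, and the placeholder policy are all illustrative assumptions rather than the researchers’ actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SignalPhaseAndTiming:
    """Simplified stand-in for the signal data a connected vehicle might receive;
    the field names are illustrative, not a standard message format."""
    phase: str                 # "green", "yellow", or "red"
    seconds_to_change: float

@dataclass
class VehicleObservation:
    speed: float                   # m/s
    distance_to_stop_line: float   # m
    gap_to_leader: float           # m to the vehicle directly ahead
    spat: SignalPhaseAndTiming

def control_fleet(observations: List[VehicleObservation],
                  policy: Callable[[VehicleObservation], float]) -> List[float]:
    """Apply an (assumed already-trained) control policy to every connected
    autonomous vehicle and return one acceleration command per vehicle."""
    return [policy(obs) for obs in observations]

def placeholder_policy(obs: VehicleObservation) -> float:
    """Hypothetical rule standing in for the learned controller: ease off if the
    vehicle would otherwise reach the stop line before the red light ends."""
    eta = obs.distance_to_stop_line / max(obs.speed, 0.1)
    if obs.spat.phase == "red" and eta < obs.spat.seconds_to_change:
        return -0.5
    return 0.0
```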

Their system didn’t create any stop-and-go traffic as vehicles approached the intersection. (Stop-and-go traffic occurs when cars are forced to come to a complete stop because of stopped traffic ahead.) In simulations, more cars made it through in a single green phase, outperforming a model that simulates human drivers. And compared with other optimization methods also designed to avoid stop-and-go traffic, their technique produced greater reductions in fuel consumption and emissions. If every vehicle on the road is autonomous, their control system can reduce fuel consumption by 18 percent and carbon dioxide emissions by 25 percent, while boosting travel speeds by 20 percent.

“A single intervention having 20 to 25 percent reduction in fuel or emissions is really incredible. But what I find interesting, and was really hoping to see, is this non-linear scaling. If we only control 25 percent of vehicles, that gives us 50 percent of the benefits in terms of fuel and emissions reduction. That means we don’t have to wait until we get to 100 percent autonomous vehicles to get benefits from this approach,” she says.

Down the road, the researchers want to study interaction effects between multiple intersections. They also plan to explore how different intersection set-ups (number of lanes, signals, timings, etc.) can influence travel time, emissions, and fuel consumption. In addition, they intend to study how their control system could impact safety when autonomous vehicles and human drivers share the road. For instance, even though autonomous vehicles may drive differently than human drivers, slower roadways and roadways with more consistent speeds could improve safety, Wu says.

While this work is still in its early stages, Wu sees the approach as one that could feasibly be implemented in the near term.

“The aim in this work is to move the needle in sustainable mobility. We want to dream, as well, but these systems are big monsters of inertia. Identifying points of intervention that are small changes to the system but have significant impact is something that gets me up in the morning,” she says.  

This work was supported, in part, by the MIT-IBM Watson AI Lab.


Technique protects privacy when making online recommendations

Algorithms recommend products while we shop online or suggest songs we might like as we listen to music on streaming apps.

These algorithms work by using personal information like our past purchases and browsing history to generate tailored recommendations. The sensitive nature of such data makes preserving privacy extremely important, but existing methods for solving this problem rely on heavy cryptographic tools requiring enormous amounts of computation and bandwidth.

MIT researchers may have a better solution. They developed a privacy-preserving protocol that is so efficient it can run on a smartphone over a very slow network. Their technique safeguards personal data while ensuring recommendation results are accurate.

In addition to user privacy, their protocol minimizes the unauthorized transfer of information from the database, known as leakage, even if a malicious agent tries to trick a database into revealing secret information.

The new protocol could be especially useful in situations where data leaks could violate user privacy laws, like when a health care provider uses a patient’s medical history to search a database for other patients who had similar symptoms or when a company serves targeted advertisements to users under European privacy regulations.

“This is a really hard problem. We relied on a whole string of cryptographic and algorithmic tricks to arrive at our protocol,” says Sacha Servan-Schreiber, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper that presents this new protocol.

Servan-Schreiber wrote the paper with fellow CSAIL graduate student Simon Langowski and their advisor and senior author Srinivas Devadas, the Edwin Sibley Webster Professor of Electrical Engineering. The research will be presented at the IEEE Symposium on Security and Privacy.

The data next door

The technique at the heart of algorithmic recommendation engines is known as a nearest neighbor search, which involves finding the data point in a database that is closest to a query point. Data points that are mapped nearby share similar attributes and are called neighbors.

These searches involve a server that is linked with an online database which contains concise representations of data point attributes. In the case of a music streaming service, those attributes, known as feature vectors, could be the genre or popularity of different songs.

To find a song recommendation, the client (user) sends a query to the server that contains a certain feature vector, like a genre of music the user likes or a compressed history of their listening habits. The server then provides the ID of a feature vector in the database that is closest to the client’s query, without revealing the actual vector. In the case of music streaming, that ID would likely be a song title. The client learns the recommended song title without learning the feature vector associated with it.
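
As a point of reference, here is a toy, nonprivate version of that lookup in Python, with invented songs and feature vectors. The function returns only the ID of the closest vector, never the vector itself, which is exactly the computation the protocol must perform without seeing the data in the clear.

```python
import numpy as np

# Toy, nonprivate nearest neighbor search; the song titles and feature
# vectors here are made up for illustration.
database = {
    "Song A": np.array([0.9, 0.1, 0.3]),
    "Song B": np.array([0.2, 0.8, 0.5]),
    "Song C": np.array([0.4, 0.4, 0.9]),
}

def nearest_neighbor_id(query: np.ndarray) -> str:
    """Return only the ID of the closest feature vector, not the vector itself."""
    return min(database, key=lambda song: np.linalg.norm(database[song] - query))

client_query = np.array([0.3, 0.7, 0.6])   # e.g., a compressed summary of listening habits
print(nearest_neighbor_id(client_query))   # prints "Song B"
```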

“The server has to be able to do this computation without seeing the numbers it is doing the computation on. It can’t actually see the features, but still needs to give you the closest thing in the database,” says Langowski.

To achieve this, the researchers created a protocol that relies on two separate servers that access the same database. Using two servers makes the process more efficient and enables the use of a cryptographic technique known as private information retrieval. This technique allows a client to query a database without revealing what it is searching for, Servan-Schreiber explains.
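
For intuition about why two servers help, the sketch below shows the classic textbook two-server PIR construction based on XOR secret sharing of the query. It illustrates the general technique only; it is not the specific protocol the MIT team developed.

```python
import secrets
from typing import List

def make_queries(num_items: int, index: int):
    """Split the desired index into two random-looking bit vectors; each server
    alone sees a uniformly random selection of indices."""
    share_a = [secrets.randbits(1) for _ in range(num_items)]
    share_b = list(share_a)
    share_b[index] ^= 1               # the shares differ only at the wanted index
    return share_a, share_b

def server_answer(database: List[bytes], query: List[int]) -> bytes:
    """Each server XORs together the records its query selects."""
    result = bytes(len(database[0]))  # all-zero record of the right length
    for record, bit in zip(database, query):
        if bit:
            result = bytes(x ^ y for x, y in zip(result, record))
    return result

# Example with a tiny database of fixed-length records (contents are made up).
db = [b"item0000", b"item0001", b"item0002", b"item0003"]
query_a, query_b = make_queries(len(db), index=2)
answer_a = server_answer(db, query_a)   # computed by server A
answer_b = server_answer(db, query_b)   # computed by server B
recovered = bytes(x ^ y for x, y in zip(answer_a, answer_b))
print(recovered)                        # b'item0002'
```

Because each share on its own is a uniformly random bit vector, neither server learns which record was requested; only the client, combining both answers, recovers it.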

Overcoming security challenges

But while private information retrieval is secure on the client side, it doesn’t provide database privacy on its own. The database offers a set of candidate vectors — possible nearest neighbors — for the client, which are typically winnowed down later by the client using brute force. However, doing so can reveal a lot about the database to the client. The additional privacy challenge is to prevent the client from learning those extra vectors. 

The researchers employed a tuning technique that eliminates many of the extra vectors in the first place, and then used a different trick, which they call oblivious masking, to hide any additional data points except for the actual nearest neighbor. This efficiently preserves database privacy, so the client won’t learn anything about the feature vectors in the database.  

Once they designed this protocol, they tested it with a nonprivate implementation on four real-world datasets to determine how to tune the algorithm to maximize accuracy. Then, they used their protocol to conduct private nearest neighbor search queries on those datasets.

Their technique requires a few seconds of server processing time per query and less than 10 megabytes of communication between the client and servers, even with databases that contained more than 10 million items. By contrast, other secure methods can require gigabytes of communication or hours of computation time. With each query, their method achieved greater than 95 percent accuracy (meaning that nearly every time it found the actual approximate nearest neighbor to the query point). 

The techniques they used to enable database privacy will thwart a malicious client even if it sends false queries to try and trick the server into leaking information.

“A malicious client won’t learn much more information than an honest client following protocol. And it protects against malicious servers, too. If one deviates from protocol, you might not get the right result, but they will never learn what the client’s query was,” Langowski says.

In the future, the researchers plan to adjust the protocol so it can preserve privacy using only one server. This could enable it to be applied in more real-world situations, since it would not require the use of two noncolluding entities (which don’t share information with each other) to manage the database.  

“Nearest neighbor search undergirds many critical machine-learning driven applications, from providing users with content recommendations to classifying medical conditions. However, it typically requires sharing a lot of data with a central system to aggregate and enable the search,” says Bayan Bruss, head of applied machine-learning research at Capital One, who was not involved with this work. “This research provides a key step towards ensuring that the user receives the benefits from nearest neighbor search while having confidence that the central system will not use their data for other purposes.”
