An easier way to teach robots new skills

With e-commerce orders pouring in, a warehouse robot picks mugs off a shelf and places them into boxes for shipping. Everything is humming along, until the warehouse processes a change and the robot must now grasp taller, narrower mugs that are stored upside down.

Reprogramming that robot involves hand-labeling thousands of images that show it how to grasp these new mugs, then training the system all over again.

But a new technique developed by MIT researchers would require only a handful of human demonstrations to reprogram the robot. This machine-learning method enables a robot to pick up and place never-before-seen objects that are in random poses it has never encountered. Within 10 to 15 minutes, the robot would be ready to perform a new pick-and-place task.

The technique uses a neural network specifically designed to reconstruct the shapes of 3D objects. With just a few demonstrations, the system uses what the neural network has learned about 3D geometry to grasp new objects that are similar to those in the demos.

In simulations and using a real robotic arm, the researchers show that their system can effectively manipulate never-before-seen mugs, bowls, and bottles, arranged in random poses, using only 10 demonstrations to teach the robot.

“Our major contribution is the general ability to much more efficiently provide new skills to robots that need to operate in more unstructured environments where there could be a lot of variability. The concept of generalization by construction is a fascinating capability because this problem is typically so much harder,” says Anthony Simeonov, a graduate student in electrical engineering and computer science (EECS) and co-lead author of the paper.

Simeonov wrote the paper with co-lead author Yilun Du, an EECS graduate student; Andrea Tagliasacchi, a staff research scientist at Google Brain; Joshua B. Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in the Department of Brain and Cognitive Sciences and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); Alberto Rodriguez, the Class of 1957 Associate Professor in the Department of Mechanical Engineering; and senior authors Pulkit Agrawal, a professor in CSAIL, and Vincent Sitzmann, an incoming assistant professor in EECS. The research will be presented at the International Conference on Robotics and Automation.

Grasping geometry

A robot may be trained to pick up a specific item, but if that object is lying on its side (perhaps it fell over), the robot sees this as a completely new scenario. This is one reason it is so hard for machine-learning systems to generalize to new object orientations.

To overcome this challenge, the researchers created a new type of neural network model, a Neural Descriptor Field (NDF), that learns the 3D geometry of a class of items. The model computes the geometric representation for a specific item using a 3D point cloud, which is a set of data points or coordinates in three dimensions. The data points can be obtained from a depth camera that provides information on the distance between the object and a viewpoint. While the network was trained in simulation on a large dataset of synthetic 3D shapes, it can be directly applied to objects in the real world.
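
To make that input concrete, here is a minimal sketch, assuming a standard pinhole camera model (the intrinsics fx, fy, cx, cy below are illustrative, not values from the paper), of how a depth image is back-projected into the kind of point cloud the model consumes:

```python
import numpy as np

def depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Back-project a depth image into 3D points using a pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth                        # distance along the camera's optical axis
    x = (u - cx) * z / fx            # recover lateral offsets pixel by pixel
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth reading

# Stand-in for one frame from a depth camera (values in meters).
cloud = depth_to_point_cloud(np.random.rand(480, 640))
```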

The team designed the NDF with a property known as equivariance. With this property, if the model is shown an image of an upright mug, and then shown an image of the same mug on its side, it understands that the second mug is the same object, just rotated.
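
As a toy numerical illustration of what that property buys (a hand-made descriptor, not the researchers’ trained network): a signature built only from distances between a query point and the object’s points is unchanged when the same rigid transform is applied to both.

```python
import numpy as np

def toy_descriptor(cloud, query):
    # Distances from the query point to the cloud are preserved by any
    # rigid transform applied to both, so this signature depends on the
    # object's shape around the query, not on the object's pose.
    return np.sort(np.linalg.norm(cloud - query, axis=1))[:16]

rng = np.random.default_rng(0)
cloud = rng.normal(size=(200, 3))              # stand-in object point cloud
query = np.array([0.5, 0.1, -0.2])             # a point near the "handle"

R, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal transform
t = np.array([1.0, -2.0, 0.3])                 # random translation

same = np.allclose(toy_descriptor(cloud, query),
                   toy_descriptor(cloud @ R.T + t, query @ R.T + t))
print(same)  # True: upright or on its side, the descriptor agrees
```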

“This equivariance is what allows us to much more effectively handle cases where the object you observe is in some arbitrary orientation,” Simeonov says.

As the NDF learns to reconstruct shapes of similar objects, it also learns to associate related parts of those objects. For instance, it learns that the handles of mugs are similar, even if some mugs are taller or wider than others, or have smaller or longer handles.

“If you wanted to do this with another approach, you’d have to hand-label all the parts. Instead, our approach automatically discovers these parts from the shape reconstruction,” Du says.

The researchers use this trained NDF model to teach a robot a new skill with only a few physical examples. They move the hand of the robot onto the part of an object they want it to grip, like the rim of a bowl or the handle of a mug, and record the locations of the fingertips.

Because the NDF has learned so much about 3D geometry and how to reconstruct shapes, it can infer the structure of a new shape, which enables the system to transfer the demonstrations to new objects in arbitrary poses, Du explains.
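
A hypothetical sketch of that transfer step (the real system optimizes a full gripper pose by gradient descent through the network; the nearest-descriptor search here just conveys the idea):

```python
import numpy as np

def toy_descriptor(cloud, query):
    # Pose-invariant toy signature, as in the equivariance sketch above.
    return np.sort(np.linalg.norm(cloud - query, axis=1))[:16]

rng = np.random.default_rng(0)
demo_cloud = rng.normal(size=(200, 3))       # demo object's point cloud
grasp_point = demo_cloud[17]                 # where the fingertips were placed

# "New" object: the same shape observed in an arbitrary new pose.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
new_cloud = demo_cloud @ R.T + np.array([0.4, -1.0, 2.0])

target = toy_descriptor(demo_cloud, grasp_point)   # recorded at demo time
scores = [np.linalg.norm(toy_descriptor(new_cloud, p) - target)
          for p in new_cloud]
transferred_grasp = new_cloud[int(np.argmin(scores))]  # same part, new pose
```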

Picking a winner

They tested their model in simulations and on a real robotic arm using mugs, bowls, and bottles as objects. Their method had a success rate of 85 percent on pick-and-place tasks with new objects in new orientations, while the best baseline was only able to achieve a success rate of 45 percent. Success means grasping a new object and placing it on a target location, like hanging mugs on a rack.

Many baselines use 2D image information rather than 3D geometry, which makes it more difficult for these methods to integrate equivariance. This is one reason the NDF technique performed so much better.

While the researchers were happy with its performance, their method works only for the particular object category on which it is trained. A robot taught to pick up mugs won’t be able to pick up boxes or headphones, since these objects have geometric features that are too different from what the network was trained on.

“In the future, scaling it up to many categories or completely letting go of the notion of category altogether would be ideal,” Simeonov says.

They also plan to adapt the system for nonrigid objects and, in the longer term, enable the system to perform pick-and-place tasks when the target area changes.

This work is supported, in part, by the Defense Advanced Research Projects Agency, the Singapore Defense Science and Technology Agency, and the National Science Foundation.

A new state of the art for unsupervised vision

Labeling data can be a chore. It’s the main source of sustenance for computer-vision models; without it, they’d have a lot of difficulty identifying objects, people, and other important image characteristics. Yet producing just an hour of tagged and labeled data can take a whopping 800 hours of human time. Machines develop a high-fidelity understanding of the world only as they get better at perceiving and interacting with our surroundings, and they need help to get there.

Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), Microsoft, and Cornell University have attempted to solve this problem plaguing vision models by creating “STEGO,” an algorithm that can jointly discover and segment objects without any human labels at all, down to the pixel.

STEGO learns something called “semantic segmentation” — fancy speak for the process of assigning a label to every pixel in an image. Semantic segmentation is an important skill for today’s computer-vision systems because images can be cluttered with objects. Even more challenging is that these objects don’t always fit into literal boxes; algorithms tend to work better for discrete “things” like people and cars as opposed to “stuff” like vegetation, sky, and mashed potatoes. A previous system might simply perceive a nuanced scene of a dog playing in the park as just a dog, but by assigning every pixel of the image a label, STEGO can break the image into its main ingredients: a dog, sky, grass, and its owner.
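
In code terms, the task looks roughly like this minimal sketch, with random scores standing in for a trained model’s output:

```python
import numpy as np

# Semantic segmentation assigns a class label to every pixel: given
# per-pixel class scores, the predicted map is an argmax over classes.
H, W = 4, 6                                   # a tiny "image"
CLASSES = ["dog", "grass", "sky", "person"]   # "stuff" and "things" alike
scores = np.random.rand(H, W, len(CLASSES))   # stand-in for model output
label_map = scores.argmax(axis=-1)            # shape (H, W): a label per pixel
print(label_map)
```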

Assigning every single pixel of the world a label is ambitious — especially without any kind of feedback from humans. The majority of algorithms today get their knowledge from mounds of labeled data, which can take painstaking human-hours to source. Just imagine the excitement of labeling every pixel of 100,000 images! To discover these objects without a human’s helpful guidance, STEGO looks for similar objects that appear throughout a dataset. It then associates these similar objects together to construct a consistent view of the world across all of the images it learns from.

Seeing the world

Machines that can “see” are crucial for a wide array of new and emerging technologies like self-driving cars and predictive modeling for medical diagnostics. Since STEGO can learn without labels, it can detect objects in many different domains, even those that humans don’t yet understand fully. 

“If you’re looking at oncological scans, the surface of planets, or high-resolution biological images, it’s hard to know what objects to look for without expert knowledge. In emerging domains, sometimes even human experts don’t know what the right objects should be,” says Mark Hamilton, a PhD student in electrical engineering and computer science at MIT, research affiliate of MIT CSAIL, software engineer at Microsoft, and lead author on a new paper about STEGO. “In these types of situations where you want to design a method to operate at the boundaries of science, you can’t rely on humans to figure it out before machines do.”

STEGO was tested on a slew of visual domains spanning general images, driving images, and high-altitude aerial photographs. In each domain, STEGO was able to identify and segment relevant objects that were closely aligned with human judgments. STEGO’s most diverse benchmark was the COCO-Stuff dataset, which is made up of diverse images from all over the world, from indoor scenes to people playing sports to trees and cows. In most cases, the previous state-of-the-art system could capture a low-resolution gist of a scene, but struggled on fine-grained details: A human was a blob, a motorcycle was captured as a person, and it couldn’t recognize any geese. On the same scenes, STEGO doubled the performance of previous systems and discovered concepts like animals, buildings, people, furniture, and many others.

STEGO not only doubled the performance of prior systems on the COCO-Stuff benchmark, but made similar leaps forward in other visual domains. When applied to driverless car datasets, STEGO successfully segmented out roads, people, and street signs with much higher resolution and granularity than previous systems. On images from space, the system broke down every single square foot of the surface of the Earth into roads, vegetation, and buildings. 

Connecting the pixels

STEGO — which stands for “Self-supervised Transformer with Energy-based Graph Optimization” — builds on top of the DINO algorithm, which learned about the world through 14 million images from the ImageNet database. STEGO refines the DINO backbone through a learning process that mimics our own way of stitching together pieces of the world to make meaning. 

For example, you might consider two images of dogs walking in the park. Even though they’re different dogs, with different owners, in different parks, STEGO can tell (without humans) how each scene’s objects relate to each other. The authors even probe STEGO’s mind to see how each little, brown, furry thing in the images is similar, and likewise with other shared objects like grass and people. By connecting objects across images, STEGO builds a consistent view of the world.
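
A rough sketch of that mechanism, based only on the high-level description above (not the released STEGO code): compare frozen backbone features across two images, and treat strongly correlated patch pairs as the same concept.

```python
import numpy as np

def patch_correlation(feats_a, feats_b):
    # feats: (num_patches, dim) from a frozen self-supervised backbone
    # such as DINO. Cosine similarity between every pair of patches says
    # which regions of image A "look like" which regions of image B.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return a @ b.T   # (patches_a, patches_b) similarity matrix

rng = np.random.default_rng(1)
dino_a = rng.normal(size=(196, 384))   # 14x14 patches, 384-dim features
dino_b = rng.normal(size=(196, 384))   # a second image of another dog
corr = patch_correlation(dino_a, dino_b)
# High-correlation pairs (dog fur with dog fur, grass with grass) supply
# the training signal for a small segmentation head, whose outputs are
# then clustered into pixel-level labels.
```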

“The idea is that these types of algorithms can find consistent groupings in a largely automated fashion so we don’t have to do that ourselves,” says Hamilton. “It might have taken years to understand complex visual datasets like biological imagery, but if we can avoid spending 1,000 hours combing through data and labeling it, we can find and discover new information that we might have missed. We hope this will help us understand the visual world in a more empirically grounded way.”

Looking ahead

Despite its improvements, STEGO still faces certain challenges. One is that labels can be arbitrary. For example, the labels of the COCO-Stuff dataset distinguish between “food-things” like bananas and chicken wings, and “food-stuff” like grits and pasta. STEGO doesn’t see much of a distinction there. In other cases, STEGO was confused by odd images — like one of a banana sitting on a phone receiver — where the receiver was labeled “foodstuff,” instead of “raw material.” 

In upcoming work, they plan to explore giving STEGO a bit more flexibility than labeling pixels into a fixed number of classes, since things in the real world can sometimes be multiple things at the same time (like “food,” “plant,” and “fruit”). The authors hope this will give the algorithm room for uncertainty, trade-offs, and more abstract thinking.

“In making a general tool for understanding potentially complicated datasets, we hope that this type of an algorithm can automate the scientific process of object discovery from images. There’s a lot of different domains where human labeling would be prohibitively expensive, or humans simply don’t even know the specific structure, like in certain biological and astrophysical domains. We hope that future work enables application to a very broad scope of datasets. Since you don’t need any human labels, we can now start to apply ML tools more broadly,” says Hamilton.

“STEGO is simple, elegant, and very effective. I consider unsupervised segmentation to be a benchmark for progress in image understanding, and a very difficult problem. The research community has made terrific progress in unsupervised image understanding with the adoption of transformer architectures,” says Andrea Vedaldi, professor of computer vision and machine learning and a co-lead of the Visual Geometry Group at the engineering science department of the University of Oxford. “This research provides perhaps the most direct and effective demonstration of this progress on unsupervised segmentation.” 

Hamilton wrote the paper alongside MIT CSAIL PhD student Zhoutong Zhang, Assistant Professor Bharath Hariharan of Cornell University, Associate Professor Noah Snavely of Cornell Tech, and MIT professor William T. Freeman. They will present the paper at the 2022 International Conference on Learning Representations (ICLR). 

Anticipating others’ behavior on the road

Humans may be one of the biggest roadblocks keeping fully autonomous vehicles off city streets.

If a robot is going to navigate a vehicle safely through downtown Boston, it must be able to predict what nearby drivers, cyclists, and pedestrians are going to do next.

Behavior prediction is a tough problem, however, and current artificial intelligence solutions are either too simplistic (they may assume pedestrians always walk in a straight line), too conservative (to avoid pedestrians, the robot just leaves the car in park), or limited to forecasting the next moves of a single agent (roads typically carry many users at once).

MIT researchers have devised a deceptively simple solution to this complicated challenge. They break a multiagent behavior prediction problem into smaller pieces and tackle each one individually, so a computer can solve this complex task in real time.

Their behavior-prediction framework first guesses the relationships between two road users — which car, cyclist, or pedestrian has the right of way, and which agent will yield — and uses those relationships to predict future trajectories for multiple agents.

These estimated trajectories were more accurate than those from other machine-learning models, compared to real traffic flow in an enormous dataset compiled by autonomous driving company Waymo. The MIT technique even outperformed Waymo’s recently published model. And because the researchers broke the problem into simpler pieces, their technique used less memory.

“This is a very intuitive idea, but no one has fully explored it before, and it works quite well. The simplicity is definitely a plus. We are comparing our model with other state-of-the-art models in the field, including the one from Waymo, the leading company in this area, and our model achieves top performance on this challenging benchmark. This has a lot of potential for the future,” says co-lead author Xin “Cyrus” Huang, a graduate student in the Department of Aeronautics and Astronautics and a research assistant in the lab of Brian Williams, professor of aeronautics and astronautics and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Joining Huang and Williams on the paper are three researchers from Tsinghua University in China: co-lead author Qiao Sun, a research assistant; Junru Gu, a graduate student; and senior author Hang Zhao PhD ’19, an assistant professor. The research will be presented at the Conference on Computer Vision and Pattern Recognition.

Multiple small models

The researchers’ machine-learning method, called M2I, takes two inputs: past trajectories of the cars, cyclists, and pedestrians interacting in a traffic setting such as a four-way intersection, and a map with street locations, lane configurations, etc.

Using this information, a relation predictor infers which of two agents has the right of way first, classifying one as a passer and one as a yielder. Then a prediction model, known as a marginal predictor, guesses the trajectory for the passing agent, since this agent behaves independently.

A second prediction model, known as a conditional predictor, then guesses what the yielding agent will do based on the actions of the passing agent. The system predicts a number of different trajectories for the yielder and passer, computes the probability of each one individually, and then selects the six joint results with the highest likelihood of occurring.
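
In pseudocode, the factorization reads roughly as follows; the function names and interfaces are hypothetical stand-ins for the paper’s components, not the authors’ API:

```python
def predict_joint(relation_predictor, marginal, conditional, scene, k=6):
    """Predict the top-k joint futures for two interacting road users."""
    passer, yielder = relation_predictor(scene)      # who has right of way
    passer_trajs, p_pass = marginal(scene, passer)   # acts independently
    joints = []
    for traj, p1 in zip(passer_trajs, p_pass):
        # The yielder's behavior is predicted conditioned on each
        # possible future of the passing agent.
        yield_trajs, p_yield = conditional(scene, yielder, traj)
        joints += [((traj, ytraj), p1 * p2)
                   for ytraj, p2 in zip(yield_trajs, p_yield)]
    joints.sort(key=lambda item: item[1], reverse=True)
    return joints[:k]   # the six most likely joint outcomes, for k=6
```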

M2I outputs a prediction of how these agents will move through traffic for the next eight seconds. In one example, their method caused a vehicle to slow down so a pedestrian could cross the street, then speed up when they cleared the intersection. In another example, the vehicle waited until several cars had passed before turning from a side street onto a busy, main road.

While this initial research focuses on interactions between two agents, M2I could infer relationships among many agents and then guess their trajectories by linking multiple marginal and conditional predictors.

Real-world driving tests

The researchers trained the models using the Waymo Open Motion Dataset, which contains millions of real traffic scenes involving vehicles, pedestrians, and cyclists recorded by lidar (light detection and ranging) sensors and cameras mounted on the company’s autonomous vehicles. They focused specifically on cases with multiple agents.

To determine accuracy, they compared each method’s six prediction samples, weighted by their confidence levels, to the actual trajectories followed by the cars, cyclists, and pedestrians in a scene. Their method was the most accurate. It also outperformed the baseline models on a metric known as overlap rate; if two trajectories overlap, that indicates a collision. M2I had the lowest overlap rate.
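
Sketched in code, with assumed array shapes (the benchmark’s exact metric definitions differ in detail), the two ideas look like this:

```python
import numpy as np

def confidence_weighted_error(pred_trajs, confidences, gt_traj):
    # pred_trajs: (6, T, 2) candidate futures; gt_traj: (T, 2) ground truth.
    per_sample = np.linalg.norm(pred_trajs - gt_traj, axis=-1).mean(axis=-1)
    return float(np.sum(np.asarray(confidences) * per_sample))

def trajectories_overlap(traj_a, traj_b, radius=1.0):
    # Two predicted agents "overlap" if they come within a collision
    # radius at the same timestep, i.e., the model predicted a crash.
    return bool((np.linalg.norm(traj_a - traj_b, axis=-1) < radius).any())
```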

“Rather than just building a more complex model to solve this problem, we took an approach that is more like how a human thinks when they reason about interactions with others. A human does not reason about all hundreds of combinations of future behaviors. We make decisions quite fast,” Huang says.

Another advantage of M2I is that, because it breaks the problem down into smaller pieces, it is easier for a user to understand the model’s decision making. In the long run, that could help users put more trust in autonomous vehicles, says Huang.

But the framework can’t account for cases where two agents are mutually influencing each other, like when two vehicles each nudge forward at a four-way stop because the drivers aren’t sure who should be yielding.

They plan to address this limitation in future work. They also want to use their method to simulate realistic interactions between road users, which could be used to verify planning algorithms for self-driving cars or create huge amounts of synthetic driving data to improve model performance.

“Predicting future trajectories of multiple, interacting agents is under-explored and extremely challenging for enabling full autonomy in complex scenes. M2I provides a highly promising prediction method with the relation predictor to discriminate agents predicted marginally or conditionally which significantly simplifies the problem,” wrote Masayoshi Tomizuka, the Cheryl and John Neerhout, Jr. Distinguished Professor of Mechanical Engineering at University of California at Berkeley and Wei Zhan, an assistant professional researcher, in an email. “The prediction model can capture the inherent relation and interactions of the agents to achieve the state-of-the-art performance.” The two colleagues were not involved in the research.

This research is supported, in part, by the Qualcomm Innovation Fellowship. Toyota Research Institute also provided funds to support this work.

Learning to think critically about machine learning

Students in the MIT course 6.036 (Introduction to Machine Learning) study the principles behind powerful models that help physicians diagnose disease or aid recruiters in screening job candidates.

Now, thanks to the Social and Ethical Responsibilities of Computing (SERC) framework, these students will also stop to ponder the implications of these artificial intelligence tools, which sometimes come with their share of unintended consequences.

Last winter, a team of SERC Scholars worked with instructor Leslie Kaelbling, the Panasonic Professor of Computer Science and Engineering, and the 6.036 teaching assistants to infuse weekly labs with material covering ethical computing, data and model bias, and fairness in machine learning. The process was initiated in the fall of 2019 by Jacob Andreas, the X Consortium Assistant Professor in the Department of Electrical Engineering and Computer Science. SERC Scholars collaborate in multidisciplinary teams to help postdocs and faculty develop new course material.

Because 6.036 is such a large course, more than 500 students who were enrolled in the 2021 spring term grappled with these ethical dimensions alongside their efforts to learn new computing techniques. For some, it may have been their first experience thinking critically in an academic setting about the potential negative impacts of machine learning.

The SERC Scholars evaluated each lab to develop concrete examples and ethics-related questions to fit that week’s material. Each brought a different toolset. Serena Booth is a graduate student in the Interactive Robotics Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL). Marion Boulicault was a graduate student in the Department of Linguistics and Philosophy, and is now a postdoc in the MIT Schwarzman College of Computing, where SERC is based. And Rodrigo Ochigame was a graduate student in the Program in History, Anthropology, and Science, Technology, and Society (HASTS) and is now an assistant professor at Leiden University in the Netherlands. They collaborated closely with teaching assistant Dheekshita Kumar, MEng ’21, who was instrumental in developing the course materials.

They brainstormed and iterated on each lab, while working closely with the teaching assistants to ensure the content fit and would advance the core learning objectives of the course. At the same time, they helped the teaching assistants determine the best way to present the material and lead conversations on topics with social implications, such as race, gender, and surveillance.

“In a class like 6.036, we are dealing with 500 people who are not there to learn about ethics. They think they are there to learn the nuts and bolts of machine learning, like loss functions, activation functions, and things like that. We have this challenge of trying to get those students to really participate in these discussions in a very active and engaged way. We did that by tying the social questions very intimately with the technical content,” Booth says.

For instance, in a lab on how to represent input features for a machine learning model, they introduced different definitions of fairness, asked students to consider the pros and cons of each definition, then challenged them to think about the features that should be input into a model to make it fair.
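
As a hedged illustration of that kind of exercise (a toy example, not the course’s actual lab code), two common fairness definitions can already disagree on the same set of decisions:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    # Difference in positive-decision rates between the two groups.
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_gap(y_pred, y_true, group):
    # Difference in true-positive rates between the two groups.
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return abs(tpr(0) - tpr(1))

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # model's hire/loan decisions
y_true = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # actual outcomes
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # a protected attribute
print(demographic_parity_gap(y_pred, group))
print(equal_opportunity_gap(y_pred, y_true, group))
```

Deciding which of these gaps matters for a given application, and what input features make them shrink or grow, is exactly the kind of discussion the labs aim to provoke.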

Four labs have now been published on MIT OpenCourseWare. A new team of SERC Scholars is revising the other eight, based on feedback from the instructors and students, with a focus on learning objectives, filling in gaps, and highlighting important concepts.

An intentional approach

The students’ efforts on 6.036 show how SERC aims to work with faculty in ways that work for them, says Julie Shah, associate dean of SERC and professor of aeronautics and astronautics. They adapted the SERC process due to the unique nature of this large course and tight time constraints.

SERC was established more than two years ago through the MIT Schwarzman College of Computing as an intentional approach to bring faculty from divergent disciplines together into a collaborative setting to co-create and launch new course material focused on social and responsible computing.

Each semester, the SERC team invites about a dozen faculty members to join an Action Group dedicated to developing new curricular materials (there are several SERC Action Groups, each with a different mission). They are purposeful in whom they invite, and seek to include faculty members who will likely form fruitful partnerships in smaller subgroups, says David Kaiser, associate dean of SERC, the Germeshausen Professor of the History of Science, and professor of physics.

These subgroups of two or three faculty members hone their shared interest over the course of the term to develop new ethics-related material. But rather than one discipline serving another, the process is a two-way street; every faculty member brings new material back to their course, Shah explains. Faculty are drawn to the Action Groups from all of MIT’s five schools.

“Part of this involves going outside your normal disciplinary boundaries and building a language, and then trusting and collaborating with someone new outside of your normal circles. That’s why I think our intentional approach has been so successful. It is good to pilot materials and bring new things back to your course, but building relationships is the core. That makes this something valuable for everybody,” she says.

Making an impact

Over the past two years, Shah and Kaiser have been impressed by the energy and enthusiasm surrounding these efforts.

They have worked with about 80 faculty members since the program started, and more than 2,100 students took courses that included new SERC content in the last year alone. Those students aren’t all necessarily engineers — about 500 were exposed to SERC content through courses offered in the School of Humanities, Arts, and Social Sciences, the Sloan School of Management, and the School of Architecture and Planning.

Central to SERC is the principle that ethics and social responsibility in computing should be integrated into all areas of teaching at MIT, so it becomes just as relevant as the technical parts of the curriculum, Shah says. Technology, and AI in particular, now touches nearly every industry, so students in all disciplines should have training that helps them understand these tools, and think deeply about their power and pitfalls.

“It is not someone else’s job to figure out the why or what happens when things go wrong. It is all of our responsibility and we can all be equipped to do it. Let’s get used to that. Let’s build up that muscle of being able to pause and ask those tough questions, even if we can’t identify a single answer at the end of a problem set,” Kaiser says.

For the three SERC Scholars, it was uniquely challenging to carefully craft ethical questions when there was no answer key to refer to. But thinking deeply about such thorny problems also helped Booth, Boulicault, and Ochigame learn, grow, and see the world through the lens of other disciplines.

They are hopeful the undergraduates and teaching assistants in 6.036 take these important lessons to heart, and into their future careers.

“I was inspired and energized by this process, and I learned so much, not just the technical material, but also what you can achieve when you collaborate across disciplines. Just the scale of this effort felt exciting. If we have this cohort of 500 students who go out into the world with a better understanding of how to think about these sorts of problems, I feel like we could really make a difference,” Boulicault says.

Three MIT students awarded 2022 Paul and Daisy Soros Fellowships for New Americans

MIT graduate student Fernanda De La Torre, alumna Trang Luu ’18, SM ’20, and senior Syamantak Payra are recipients of the 2022 Paul and Daisy Soros Fellowships for New Americans.

De La Torre, Luu, and Payra are among 30 New Americans selected from a pool of over 1,800 applicants. The fellowship honors the contributions of immigrants and children of immigrants by providing $90,000 in funding for graduate school.

Students interested in applying to the P.D. Soros Fellowship for future years may contact Kim Benard, associate dean of distinguished fellowships in Career Advising and Professional Development.

Fernanda De La Torre

Fernanda De La Torre is a PhD student in the Department of Brain and Cognitive Sciences. With Professor Josh McDermott, she studies how we integrate vision and sound, and with Professor Robert Yang, she develops computational models of imagination. 

De La Torre spent her early childhood with her younger sister and grandmother in Guadalajara, Mexico. At age 12, she crossed the Mexican border to reunite with her mother in Kansas City, Missouri. Shortly after, an abusive home environment forced De La Torre to leave her family and support herself throughout her early teens.

Despite her difficult circumstances, De La Torre excelled academically in high school. By winning various scholarships that would discreetly take applications from undocumented students, she was able to continue her studies in computer science and mathematics at Kansas State University. There, she became intrigued by the mysteries of the human mind. During college, De La Torre received invaluable mentorship from her former high school principal, Thomas Herrera, who helped her become documented through the Violence Against Women Act. Her college professor, William Hsu, supported her interests in artificial intelligence and encouraged her to pursue a scientific career.

After her undergraduate studies, De La Torre won a post-baccalaureate fellowship from the Department of Brain and Cognitive Sciences at MIT, where she worked with Professor Tomaso Poggio on the theory of deep learning. She then transitioned into the department’s PhD program. Beyond contributing to scientific knowledge, De La Torre plans to use science to create spaces where all people, including those from backgrounds like her own, can innovate and thrive.

She says: “Immigrants face many obstacles, but overcoming them gives us a unique strength: We learn to become resilient, while relying on friends and mentors. These experiences foster both the desire and the ability to pay it forward to our community.”

Trang Luu

Trang Luu graduated from MIT with a BS in mechanical engineering in 2018, and a master of engineering degree in 2020. Her Soros award will support her graduate studies at Harvard University in the MBA/MS engineering sciences program.

Born in Saigon, Vietnam, Luu was 3 when her family immigrated to Houston, Texas. Watching her parents’ efforts to make a living in a land where they did not understand the culture or speak the language well, Luu wanted to alleviate hardship for her family. She took full responsibility for her education and found mentors to help her navigate the American education system. At home, she assisted her family in making and repairing household items, which fueled her excitement for engineering.

As an MIT undergraduate, Luu focused on assistive technology projects, applying her engineering background to solve problems impeding daily living. These projects included a new adaptive socket liner for below-the-knee amputees in Kenya, Ethiopia, and Thailand; a walking stick adapter for wheelchairs; a computer head pointer for patients with limited arm mobility; a safer makeshift cook stove design for street vendors in South Africa; and a quicker method to test new drip irrigation designs. As a graduate student in MIT D-Lab under the direction of Professor Daniel Frey, Luu was awarded a National Science Foundation Graduate Research Fellowship. In her graduate studies, Luu researched methods to improve evaporative cooling devices for off-grid farmers to reduce rapid fruit and vegetable deterioration.

These projects strengthened Luu’s commitment to innovating new technology and devices for people struggling with basic daily tasks. During her senior year, Luu collaborated on developing a working prototype of a wearable device that noninvasively reduces hand tremors associated with Parkinson’s disease or essential tremor. Observing patients’ joy after their tremors stopped compelled Luu and three co-founders to continue developing the device after college. Four years later, Encora Therapeutics has accomplished major milestones, including Breakthrough Device designation by the U.S. Food and Drug Administration.

Syamantak Payra

Hailing from Houston, Texas, Syamantak Payra is a senior majoring in electrical engineering and computer science, with minors in public policy and entrepreneurship and innovation. He will be pursuing a PhD in engineering, with the goal of creating new biomedical devices that can help improve daily life for patients worldwide and enhance health care outcomes for decades to come.

Payra’s parents had emigrated from India, and he grew up immersed in his grandparents’ rich Bengali culture. As a high school student, he conducted projects with NASA engineers at Johnson Space Center, experimented at home with his scientist parents, and competed in spelling bees and science fairs across the United States. Through these avenues and activities, Syamantak not only gained perspectives on bridging gaps between people, but also found passions for language, scientific discovery, and teaching others.

After watching his grandmother struggle with asthma and chronic obstructive pulmonary disease and losing his baby brother to brain cancer, Payra devoted himself to trying to use technology to solve health-care challenges. Payra’s proudest accomplishments include building a robotic leg brace for his paralyzed teacher and conducting free literacy workshops and STEM outreach programs that reached nearly a thousand underprivileged students across the Greater Houston Area.

At MIT, Payra has worked in Professor Yoel Fink’s research laboratory, creating digital sensor fibers that have been woven into intelligent garments that can assist in diagnosing illnesses, and in Professor Joseph Paradiso’s research laboratory, where he contributed to next-generation spacesuit prototypes that better protect astronauts on spacewalks. Payra’s research has been published by multiple scientific journals, and he was inducted into the National Gallery for America’s Young Inventors.

MIT Schwarzman College of Computing unveils Break Through Tech AI

Aimed at driving diversity and inclusion in artificial intelligence, the MIT Stephen A. Schwarzman College of Computing is launching Break Through Tech AI, a new program to bridge the talent gap for women and underrepresented genders in AI positions in industry.

Break Through Tech AI will provide skills-based training, industry-relevant portfolios, and mentoring to qualified undergraduate students in the Greater Boston area in order to position them more competitively for careers in data science, machine learning, and artificial intelligence. The free, 18-month program will also provide each student with a stipend for participation to lower the barrier for those typically unable to engage in an unpaid, extra-curricular educational opportunity.

“Helping position students from diverse backgrounds to succeed in fields such as data science, machine learning, and artificial intelligence is critical for our society’s future,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and Henry Ellis Warren Professor of Electrical Engineering and Computer Science. “We look forward to working with students from across the Greater Boston area to provide them with skills and mentorship to help them find careers in this competitive and growing industry.”

The college is collaborating with Break Through Tech — a national initiative launched by Cornell Tech in 2016 to increase the number of women and underrepresented groups graduating with degrees in computing — to host and administer the program locally. In addition to Boston, the inaugural artificial intelligence and machine learning program will be offered in two other metropolitan areas — one based in New York hosted by Cornell Tech and another in Los Angeles hosted by the University of California at Los Angeles Samueli School of Engineering.

“Break Through Tech’s success at diversifying who is pursuing computer science degrees and careers has transformed lives and the industry,” says Judith Spitz, executive director of Break Through Tech. “With our new collaborators, we can apply our impactful model to drive inclusion and diversity in artificial intelligence.”

The new program will kick off this summer at MIT with an eight-week, skills-based online course and in-person lab experience that teaches industry-relevant tools to build real-world AI solutions. Students will learn how to analyze datasets and use several common machine learning libraries to build, train, and implement their own ML models in a business context.

Following the summer course, students will be matched with machine-learning challenge projects for which they will convene monthly at MIT and work in teams to build solutions and collaborate with an industry advisor or mentor throughout the academic year, resulting in a portfolio of resume-quality work. The participants will also be paired with young professionals in the field to help build their network, prepare their portfolio, practice for interviews, and cultivate workplace skills.

“Leveraging the college’s strong partnership with industry, Break Through AI will offer unique opportunities to students that will enhance their portfolio in machine learning and AI,” says Asu Ozdaglar, deputy dean of academics of the MIT Schwarzman College of Computing and head of the Department of Electrical Engineering and Computer Science. Ozdaglar, who will be the MIT faculty director of Break Through Tech AI, adds: “The college is committed to making computing inclusive and accessible for all. We’re thrilled to host this program at MIT for the Greater Boston area and to do what we can to help increase diversity in computing fields.”

Break Through Tech AI is part of the MIT Schwarzman College of Computing’s focus to advance diversity, equity, and inclusion in computing. The college aims to improve and create programs and activities that broaden participation in computing classes and degree programs, increase the diversity of top faculty candidates in computing fields, and ensure that faculty search and graduate admissions processes have diverse slates of candidates and interviews.

“By engaging in activities like Break Through Tech AI that work to improve the climate for underrepresented groups, we’re taking an important step toward creating more welcoming environments where all members can innovate and thrive,” says Alana Anderson, assistant dean for diversity, equity and inclusion for the Schwarzman College of Computing.

Engineers enlist AI to help scale up advanced solar cell manufacturing

Perovskites are a family of materials that are currently the leading contender to potentially replace today’s silicon-based solar photovoltaics. They hold the promise of panels that are far thinner and lighter, that could be made with ultra-high throughput at room temperature instead of at hundreds of degrees, and that are cheaper and easier to transport and install. But bringing these materials from controlled laboratory experiments into a product that can be manufactured competitively has been a long struggle.

Manufacturing perovskite-based solar cells involves optimizing at least a dozen or so variables at once, even within one particular manufacturing approach among many possibilities. But a new system based on a novel approach to machine learning could speed up the development of optimized production methods and help make the next generation of solar power a reality.

The system, developed by researchers at MIT and Stanford University over the last few years, makes it possible to integrate data from prior experiments, and information based on personal observations by experienced workers, into the machine learning process. This makes the outcomes more accurate and has already led to the manufacturing of perovskite cells with an energy conversion efficiency of 18.5 percent, a competitive level for today’s market.

The research is reported today in the journal Joule, in a paper by MIT professor of mechanical engineering Tonio Buonassisi, Stanford professor of materials science and engineering Reinhold Dauskardt, recent MIT research assistant Zhe Liu, Stanford doctoral graduate Nicholas Rolston, and three others.

Perovskites are a group of layered crystalline compounds defined by the configuration of the atoms in their crystal lattice. There are thousands of such possible compounds and many different ways of making them. While most lab-scale development of perovskite materials uses a spin-coating technique, that’s not practical for larger-scale manufacturing, so companies and labs around the world have been searching for ways of translating these lab materials into a practical, manufacturable product.

“There’s always a big challenge when you’re trying to take a lab-scale process and then transfer it to something like a startup or a manufacturing line,” says Rolston, who is now an assistant professor at Arizona State University. The team looked at a process that they felt had the greatest potential, a method called rapid spray plasma processing, or RSPP.

The manufacturing process would involve a moving roll-to-roll surface, or series of sheets, on which the precursor solutions for the perovskite compound would be sprayed or ink-jetted as the sheet rolled by. The material would then move on to a curing stage, providing a rapid and continuous output “with throughputs that are higher than for any other photovoltaic technology,” Rolston says.

“The real breakthrough with this platform is that it would allow us to scale in a way that no other material has allowed us to do,” he adds. “Even materials like silicon require a much longer timeframe because of the processing that’s done. Whereas you can think of [this approach as more] like spray painting.”

Within that process, at least a dozen variables may affect the outcome, some of them more controllable than others. These include the composition of the starting materials, the temperature, the humidity, the speed of the processing path, the distance of the nozzle used to spray the material onto a substrate, and the methods of curing the material. Many of these factors can interact with each other, and if the process is in open air, then humidity, for example, may be uncontrolled. Evaluating all possible combinations of these variables through experimentation is impossible, so machine learning was needed to help guide the experimental process.

But while most machine-learning systems use raw data such as measurements of the electrical and other properties of test samples, they don’t typically incorporate human experience such as qualitative observations made by the experimenters of the visual and other properties of the test samples, or information from other experiments reported by other researchers. So, the team found a way to incorporate such outside information into the machine learning model, using a probability factor based on a mathematical technique called Bayesian Optimization.
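
A minimal sketch of that loop, assuming a single tunable knob and a made-up efficiency curve (the team’s actual code handles many process variables and the prior-knowledge weighting described above):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def run_experiment(temperature):
    # Hypothetical stand-in for fabricating and measuring one cell.
    return -(temperature - 120.0) ** 2 / 500.0 + 18.5

# Seed the surrogate model with data from prior experiments.
X = np.array([[80.0], [100.0], [160.0]])
y = np.array([run_experiment(t) for t in X.ravel()])

candidates = np.linspace(60.0, 200.0, 141).reshape(-1, 1)
for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mean, std = gp.predict(candidates, return_std=True)
    pick = candidates[np.argmax(mean + 1.96 * std)]   # optimistic choice
    X = np.vstack([X, pick.reshape(1, 1)])            # run it, record it
    y = np.append(y, run_experiment(pick[0]))

print(X[np.argmax(y)][0], y.max())   # best setting found and its efficiency
```

The upper-confidence-bound rule in the loop is one common acquisition strategy; it favors settings that are either predicted to be good or still highly uncertain, which is how the method avoids exhaustively testing every combination.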

Using the system, he says, “having a model that comes from experimental data, we can find out trends that we weren’t able to see before.” For example, they initially had trouble adjusting for uncontrolled variations in humidity in their ambient setting. But the model showed them “that we could overcome our humidity challenges by changing the temperature, for instance, and by changing some of the other knobs.”

The system now allows experimenters to much more rapidly guide their process in order to optimize it for a given set of conditions or required outcomes. In their experiments, the team focused on optimizing the power output, but the system could also be used to simultaneously incorporate other criteria, such as cost and durability — something members of the team are continuing to work on, Buonassisi says.

The researchers were encouraged by the Department of Energy, which sponsored the work, to commercialize the technology, and they’re currently focusing on tech transfer to existing perovskite manufacturers. “We are reaching out to companies now,” Buonassisi says, and the code they developed has been made freely available through an open-source server. “It’s now on GitHub, anyone can download it, anyone can run it,” he says. “We’re happy to help companies get started in using our code.”

Already, several companies are gearing up to produce perovskite-based solar panels, even though they are still working out the details of how to produce them, says Liu, who is now at the Northwestern Polytechnical University in Xi’an, China. He says companies there are not yet doing large-scale manufacturing, but instead starting with smaller, high-value applications such as building-integrated solar tiles where appearance is important. Three of these companies “are on track or are being pushed by investors to manufacture 1 meter by 2-meter rectangular modules [comparable to today’s most common solar panels], within two years,” he says.

“The problem is, they don’t have a consensus on what manufacturing technology to use,” Liu says. The RSPP method, developed at Stanford, “still has a good chance” to be competitive, he says. And the machine learning system the team developed could prove to be important in guiding the optimization of whatever process ends up being used.

“The primary goal was to accelerate the process, so it required less time, less experiments, and less human hours to develop something that is usable right away, for free, for industry,” he says.

“Existing work on machine-learning-driven perovskite PV fabrication largely focuses on spin-coating, a lab-scale technique,” says Ted Sargent, University Professor at the University of Toronto, who was not associated with this work, which he says demonstrates “a workflow that is readily adapted to the deposition techniques that dominate the thin-film industry. Only a handful of groups have the simultaneous expertise in engineering and computation to drive such advances.” Sargent adds that this approach “could be an exciting advance for the manufacture of a broader family of materials” including LEDs, other PV technologies, and graphene, “in short, any industry that uses some form of vapor or vacuum deposition.” 

The team also included Austin Flick and Thomas Colburn at Stanford and Zekun Ren at the Singapore-MIT Alliance for Science and Technology (SMART). In addition to the Department of Energy, the work was supported by a fellowship from the MIT Energy Initiative, the Graduate Research Fellowship Program from the National Science Foundation, and the SMART program.

MIT’s FutureMakers programs help kids get their minds around — and hands on — AI

As she was looking for a camp last summer, Yabesra Ewnetu, who’d just finished eighth grade, found a reference to MIT’s FutureMakers Create-a-Thon. Ewnetu had heard that it’s hard to detect bias in artificial intelligence because AI algorithms are so complex, but this didn’t make sense to her. “I was like, well, we’re the ones coding it, shouldn’t we be able to see what it’s doing and explain why?” She signed up for the six-week virtual FutureMakers program so she could delve into AI herself.

FutureMakers is part of the MIT-wide Responsible AI for Social Empowerment and Education (RAISE) initiative launched earlier this year. RAISE is headquartered in the MIT Media Lab and run in collaboration with MIT Schwarzman College of Computing and MIT Open Learning.

MIT piloted FutureMakers with students from all over the United States last year, in two formats.

During one-week, themed FutureMakers Workshops organized around key topics related to AI, students learn how AI technologies work, including social implications, then build something that uses AI.

And during six-week summer Create-a-Thons, middle school and high school students do a deep dive into AI and coding for four weeks, then take two weeks to design an app for social good. The Create-a-Thon culminates in a competition where teams present their ideas and prototypes to an expert panel of judges.

“We want to remove as many barriers as we possibly can to support diverse students and teachers,” says Cynthia Breazeal, a professor of media arts and sciences at MIT who founded the Media Lab’s Personal Robots Group and also heads up the RAISE initiative. All RAISE programs are free for educators and students. The courses are designed to meet students and teachers where they are in terms of resources, comfort with technology, and interests.

But it’s not all about learning to code.

“AI is shaping our behaviors, it’s shaping the way we think, it’s shaping the way we learn, and a lot of people aren’t even aware of that,” says Breazeal. “People now need to be AI literate given how AI is rapidly changing digital literacy and digital citizenship.”

The one-week FutureMaker Workshops are offered year-round. MIT trains teachers or people who work at STEM educational organizations so they can bring the tools and project-based hands-on curriculum and activities to their students. One year in, MIT has trained 60 teachers who have given workshops to more than 300 students, many from underserved and under-represented communities across the United States. Teachers and mentors choose from among four workshop themes for their training: Conversational AI, Dancing With AI, Creativity and AI, and How to Train Your Robot.

MIT worked with Lili’uokalani Trust in Hawaii to teach the How to Train Your Robot workshop during a spring break program on the remote islands of Moloka’i and Lana’i.

When the trust visited the MIT Media Lab on an East Coast study tour, “we were immediately inspired by the vast array of AI and STEM programs and decided to pilot How to Train Your Robot,” says Lili’uokalani Trust program manager Kau’ilani Arce.

The workshop introduced students to AI, image classification, and algorithmic bias, and taught them to program robots using a custom block-based coding environment built using the Scratch programming language, which was developed at the Media Lab.

On the first day, “we learned about algorithmic bias and how it can lead to deeply rooted issues, such as social and racial injustices,” Arce says. “It was a wonderful opportunity to critically think about how Native Hawaiians are equally represented in algorithms we use daily.”

The best moment for sixth-grader Yesmine Kiroloss: “When I got to program my robot!”

For students without previous AI experience, it took grit to understand the correlation between a coding environment and a functioning robot, says Arce. “There was an overwhelming sense of accomplishment.”

MIT collaborated with SureStart, a startup aimed at mentoring high school and college students in AI, for the first six-week FutureMakers Create-a-Thon last summer.

The Create-a-Thon had two tracks: an MIT App Inventor track with 30 students, including Ewnetu, and a Deep Learning track with 45 students. The 78 students hailed from more than 20 states, and just over two-thirds were female.

For the first four weeks students worked in groups of eight with two graduate student mentors who summarized each day’s lessons and held office hours so the students could ask questions.

In the final two weeks, the students applied what they’d learned to create something with societal or environmental impact.

A key step was plotting out a minimum viable product: a web or mobile app that contained the minimum elements needed to illustrate their idea.

At the end of the six weeks, 15 teams of four students and a mentor showed off their ideas in an entrepreneurial-style pitch competition judged by experts from academia and industry.

Ewnetu’s team, Team Dyadic, built a prediction model to warn people about wildfires. The idea was inspired by a team member from California. The team bootstrapped a website, collected a dataset, trained a machine-learning model, and added an interactive map. “Our code is a prediction of how close the current conditions are to a fire condition,” says Ewnetu, who’s now a first-year student at Justice High School in Falls Church, Virginia.

The team members had a mix of experience. “There were people in the class who had a lot of [coding] experience and there were people in the class like me who had very little to no experience,” says Ewnetu. She needed a lot of help from the mentors in the first couple of weeks, but then everything clicked, she says. “It went from like an error every other line to an error maybe every other section.”

Ewnetu “is the perfect embodiment of what happens when you just provide people with support,” says SureStart founder Taniya Mishra. “[Having] high expectations is a good thing, especially if you can provide a lot of scaffolding.”

Team Dyadic made the finals. “To see all of our work culminate and then pay off just made us feel like winners,” Ewnetu says.

Meanwhile, Team Youth of Tech created the Vividly app, which allows parents to input questions for their child. When the child logs in, a bot named Viviana asks whether they’re happy, sad, frustrated, or angry, then poses the parents’ questions; the child chats with the bot, knowing the parents can see the conversation.

The idea is to give kids a way to be open with their parents in a very comfortable environment, says Bella Baidak, a first-year master’s student in information science at Cornell Tech in New York City and the team’s mentor.

“It’s a form of facilitating better communication so they can talk more,” says team member Netra Rameshbabu, now a first-year student at Metea Valley High School in Aurora, Illinois.

“Our idea was to make this a routine, like brushing your teeth.”

Team Youth of Tech made the finals, then won. When the winning team was announced, “I was screaming I was so excited! I was in tears. I was in joy,” says Rameshbabu.

The mentees did “a brilliant job,” says Kunjal Panchal, head mentor and PhD student at the University of Massachusetts at Amherst. “They know how to use AI and they know how to use it for the common good.”

This year’s six-week FutureMakers program starts July 6. Middle school, high school, and undergraduate students can apply here.

Teachers can reach out to RAISE to learn about the one-week training workshops.

Students and teachers can also get started with AI this May 13 during the Day of AI. Students and their teachers from all over the country can learn about AI literacy through a modular, hands-on curriculum that supports up to four hours of learning per grade track. The Day of AI materials can be taught by teachers with a wide range of technology backgrounds and are designed to be accessible to all students.

This year’s Day of AI, on May 13, includes teaching materials for upper elementary through high school. The program will eventually span all of K-12. Teachers can register here for a two-hour Day of AI teacher training program. Teaching materials are available under a Creative Commons license.

For this year’s Day of AI, students in grades 3 to 5 will learn about datasets, algorithms, predictions and bias, and will create an AI application that can tell the difference, say, between an image of a dog and that of a cat. Middle school students will learn about generative adversarial networks, which can produce both deepfakes and art. High school students will learn about the recommendation systems used by social media and their implications for individuals and for society. Students and their teachers can register here to participate.

Read More

An optimized solution for face recognition

The human brain seems to care a lot about faces. It’s dedicated a specific area to identifying them, and the neurons there are so good at their job that most of us can readily recognize thousands of individuals. With artificial intelligence, computers can now recognize faces with a similar efficiency — and neuroscientists at MIT’s McGovern Institute for Brain Research have found that a computational network trained to identify faces and other objects discovers a surprisingly brain-like strategy to sort them all out.

The finding, reported March 16 in Science Advances, suggests that the millions of years of evolution that have shaped circuits in the human brain have optimized our system for facial recognition.

“The human brain’s solution is to segregate the processing of faces from the processing of objects,” explains Katharina Dobs, who led the study as a postdoc in the lab of McGovern investigator Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience at MIT. The artificial network that she trained did the same. “And that’s the same solution that we hypothesize any system that’s trained to recognize faces and to categorize objects would find,” she adds.

“These two completely different systems have figured out what a — if not the — good solution is. And that feels very profound,” says Kanwisher.

Functionally specific brain regions

More than 20 years ago, Kanwisher and her colleagues discovered a small spot in the brain’s temporal lobe that responds specifically to faces. This region, which they named the fusiform face area, is one of many brain regions Kanwisher and others have found that are dedicated to specific tasks, such as the detection of written words, the perception of vocal songs, and understanding language.

Kanwisher says that as she has explored how the human brain is organized, she has always been curious about the reasons for that organization. Does the brain really need special machinery for facial recognition and other functions? “‘Why questions’ are very difficult in science,” she says. But with a sophisticated type of machine learning called a deep neural network, her team could at least find out how a different system would handle a similar task.

Dobs, who is now a research group leader at Justus Liebig University Giessen in Germany, assembled hundreds of thousands of images with which to train a deep neural network in face and object recognition. The collection included the faces of more than 1,700 different people and hundreds of different kinds of objects, from chairs to cheeseburgers. All of these were presented to the network, with no clues about which was which. “We never told the system that some of those are faces, and some of those are objects. So it’s basically just one big task,” Dobs says. “It needs to recognize a face identity, as well as a bike or a pen.”
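A rough sketch of that single-task setup follows; the backbone, class counts, and training loop are placeholder assumptions rather than the study’s actual code.

```python
# Hedged sketch of the training setup Dobs describes: one network, one
# classification head spanning both face identities and object categories,
# with nothing telling the model which inputs are faces.
import torch
import torch.nn as nn
from torchvision import models

NUM_FACE_IDENTITIES = 1700  # from the article: more than 1,700 people
NUM_OBJECT_CLASSES = 300    # assumption: "hundreds of kinds of objects"

# One backbone, one output layer over all labels: "just one big task."
net = models.vgg16(weights=None)
net.classifier[6] = nn.Linear(4096, NUM_FACE_IDENTITIES + NUM_OBJECT_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-2, momentum=0.9)

def training_step(images, labels):
    """labels mix face identities (0..1699) and object classes (1700..)."""
    optimizer.zero_grad()
    loss = criterion(net(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Whether VGG16 matches the study’s architecture is an assumption here; the key point is the single shared label space.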

As the program learned to identify the objects and faces, it organized itself into an information-processing network that included units specifically dedicated to face recognition. As in the brain, this specialization occurred during the later stages of image processing. In both the brain and the artificial network, early steps in facial recognition involve more general vision-processing machinery, and final stages rely on face-dedicated components.
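One plausible way to look for such face-dedicated units, sketched below with a simple selectivity contrast that is not necessarily the paper’s exact analysis, is to compare each unit’s mean response to faces against its mean response to objects, layer by layer.

```python
# Illustrative probe for face-selective units: the selectivity measure (a
# normalized contrast of mean activations) is an assumption, not the
# study's published method.
import torch

@torch.no_grad()
def face_selectivity(net, layer, face_batch, object_batch):
    """Per-unit selectivity in [-1, 1]; values near +1 mean a unit
    responds almost exclusively to faces."""
    acts = {}
    handle = layer.register_forward_hook(
        lambda module, inputs, output: acts.__setitem__("out", output)
    )
    net(face_batch)
    face_resp = acts["out"].flatten(1).mean(0)  # mean activation per unit
    net(object_batch)
    obj_resp = acts["out"].flatten(1).mean(0)
    handle.remove()
    return (face_resp - obj_resp) / (face_resp + obj_resp + 1e-8)
```

Run over successive layers, a probe like this would show little separation early on and strongly face-selective units late, mirroring the article’s description.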

It’s not known how face-processing machinery arises in a developing brain, but based on their findings, Kanwisher and Dobs say networks don’t necessarily require an innate face-processing mechanism to acquire that specialization. “We didn’t build anything face-ish into our network,” Kanwisher says. “The networks managed to segregate themselves without being given a face-specific nudge.”

Kanwisher says it was thrilling to see the deep neural network segregate itself into separate parts for face and object recognition. “That’s what we’ve been looking at in the brain for 20-some years,” she says. “Why do we have a separate system for face recognition in the brain? This tells me it is because that is what an optimized solution looks like.”

Now, she is eager to use deep neural nets to ask similar questions about why other brain functions are organized the way they are. “We have a new way to ask why the brain is organized the way it is,” she says. “How much of the structure we see in human brains will arise spontaneously by training networks to do comparable tasks?”

Read More

Does this artificial intelligence think like a human?

In machine learning, understanding why a model makes certain decisions is often just as important as whether those decisions are correct. For instance, a machine-learning model might correctly predict that a skin lesion is cancerous, but it could have done so using an unrelated blip on a clinical photo.

While tools exist to help experts make sense of a model’s reasoning, often these methods only provide insights on one decision at a time, and each must be manually evaluated. Models are commonly trained using millions of data inputs, making it almost impossible for a human to evaluate enough decisions to identify patterns.

Now, researchers at MIT and IBM Research have created a method that enables a user to aggregate, sort, and rank these individual explanations to rapidly analyze a machine-learning model’s behavior. Their technique, called Shared Interest, incorporates quantifiable metrics that compare how well a model’s reasoning matches that of a human.

Shared Interest could help a user easily uncover concerning trends in a model’s decision-making — for example, perhaps the model often becomes confused by distracting, irrelevant features, like background objects in photos. Aggregating these insights could help the user quickly and quantitatively determine whether a model is trustworthy and ready to be deployed in a real-world situation.

“In developing Shared Interest, our goal is to be able to scale up this analysis process so that you could understand on a more global level what your model’s behavior is,” says lead author Angie Boggust, a graduate student in the Visualization Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Boggust wrote the paper with her advisor, Arvind Satyanarayan, an assistant professor of computer science who leads the Visualization Group, as well as Benjamin Hoover and senior author Hendrik Strobelt, both of IBM Research. The paper will be presented at the Conference on Human Factors in Computing Systems.

Boggust began working on this project during a summer internship at IBM, under the mentorship of Strobelt. After returning to MIT, Boggust and Satyanarayan expanded on the project and continued the collaboration with Strobelt and Hoover, who helped deploy the case studies that show how the technique could be used in practice.

Human-AI alignment

Shared Interest leverages popular techniques that show how a machine-learning model made a specific decision, known as saliency methods. If the model is classifying images, saliency methods highlight areas of an image that are important to the model when it made its decision. These areas are visualized as a type of heatmap, called a saliency map, that is often overlaid on the original image. If the model classified the image as a dog, and the dog’s head is highlighted, that means those pixels were important to the model when it decided the image contains a dog.
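As a concrete illustration, a basic vanilla-gradient saliency map can be computed in a few lines. This is one of several saliency methods an approach like Shared Interest could build on, and the function below is an illustrative sketch, not code from the paper.

```python
# Vanilla-gradient saliency: how strongly each input pixel influences the
# model's score for a chosen class. One simple saliency method among many.
import torch

def saliency_map(model, image, target_class):
    """image: [1, 3, H, W] tensor; returns an [H, W] importance heatmap."""
    model.eval()
    image = image.clone().requires_grad_(True)
    score = model(image)[0, target_class]
    score.backward()  # gradients of the class score w.r.t. input pixels
    return image.grad.abs().max(dim=1).values.squeeze(0)  # max over channels
```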

Shared Interest works by comparing saliency methods to ground-truth data. In an image dataset, ground-truth data are typically human-generated annotations that surround the relevant parts of each image. In the dog example above, the annotation would be a box surrounding the entire dog in the photo. When evaluating an image classification model, Shared Interest compares the model-generated saliency data and the human-generated ground-truth data for the same image to see how well they align.

The technique uses several metrics to quantify that alignment (or misalignment) and then sorts a particular decision into one of eight categories. The categories run the gamut from perfectly human-aligned (the model makes a correct prediction and the highlighted area in the saliency map is identical to the human-generated box) to completely distracted (the model makes an incorrect prediction and does not use any image features found in the human-generated box).
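A simplified sketch of that comparison step follows. The real technique defines eight categories from several alignment metrics, so the specific scores, thresholds, and three-way labeling below are illustrative assumptions only.

```python
# Simplified Shared-Interest-style comparison: overlap scores between the
# model's salient region and the human-annotated region, then a coarse
# binning. Metric definitions and thresholds here are assumptions.
import numpy as np

def alignment_scores(saliency, ground_truth, threshold=0.5):
    """saliency: [H, W] heatmap scaled to [0, 1];
    ground_truth: [H, W] boolean mask of the human annotation."""
    salient = saliency >= threshold
    inter = np.logical_and(salient, ground_truth).sum()
    union = np.logical_or(salient, ground_truth).sum()
    iou = inter / max(union, 1)
    gt_coverage = inter / max(ground_truth.sum(), 1)  # human evidence used
    sal_coverage = inter / max(salient.sum(), 1)      # saliency inside box
    return iou, gt_coverage, sal_coverage

def categorize(iou, prediction_correct):
    """Coarse three-way version of the paper's eight categories."""
    if prediction_correct and iou > 0.9:
        return "human-aligned"          # right answer, same evidence
    if not prediction_correct and iou == 0:
        return "completely distracted"  # wrong answer, none of the evidence
    return "partially aligned"
```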

“On one end of the spectrum, your model made the decision for the exact same reason a human did, and on the other end of the spectrum, your model and the human are making this decision for totally different reasons. By quantifying that for all the images in your dataset, you can use that quantification to sort through them,” Boggust explains.

The technique works similarly with text-based data, where key words are highlighted instead of image regions.

Rapid analysis

The researchers used three case studies to show how Shared Interest could be useful to both nonexperts and machine-learning researchers.

In the first case study, they used Shared Interest to help a dermatologist determine if he should trust a machine-learning model designed to help diagnose cancer from photos of skin lesions. Shared Interest enabled the dermatologist to quickly see examples of the model’s correct and incorrect predictions. Ultimately, the dermatologist decided he could not trust the model because it made too many predictions based on image artifacts, rather than actual lesions.

“The value here is that using Shared Interest, we are able to see these patterns emerge in our model’s behavior. In about half an hour, the dermatologist was able to make a confident decision of whether or not to trust the model and whether or not to deploy it,” Boggust says.

In the second case study, they worked with a machine-learning researcher to show how Shared Interest can evaluate a particular saliency method by revealing previously unknown pitfalls in the model. Their technique enabled the researcher to analyze thousands of correct and incorrect decisions in a fraction of the time required by typical manual methods.

In the third case study, they used Shared Interest to dive deeper into a specific image classification example. By manipulating the ground-truth area of the image, they were able to conduct a what-if analysis to see which image features were most important for particular predictions.   

The researchers were impressed by how well Shared Interest performed in these case studies, but Boggust cautions that the technique is only as good as the saliency methods it is based upon. If those techniques contain bias or are inaccurate, then Shared Interest will inherit those limitations.

In the future, the researchers want to apply Shared Interest to different types of data, particularly tabular data which is used in medical records. They also want to use Shared Interest to help improve current saliency techniques. Boggust hopes this research inspires more work that seeks to quantify machine-learning model behavior in ways that make sense to humans.

This work is funded, in part, by the MIT-IBM Watson AI Lab, the United States Air Force Research Laboratory, and the United States Air Force Artificial Intelligence Accelerator.

Read More