Making health care more personal

The health care system today largely focuses on helping people after they have problems. When they do receive treatment, it’s based on what has worked best on average across a huge, diverse group of patients.

Now the company Health at Scale is making health care more proactive and personalized — and, true to its name, it’s doing so for millions of people.

Health at Scale makes its care recommendations using new classes of machine-learning models that work even when only small amounts of data on individual patients, providers, and treatments are available.

The company is already working with health plans, insurers, and employers to match patients with doctors. It’s also helping to identify people at rising risk of visiting the emergency department or being hospitalized in the future, and to predict the progression of chronic diseases. Recently, Health at Scale showed its models can identify people at risk of severe respiratory infections like influenza or pneumonia, or, potentially, Covid-19.

“From the beginning, we decided all of our predictions would be related to achieving better outcomes for patients,” says John Guttag, chief technology officer of Health at Scale and the Dugald C. Jackson Professor of Computer Science and Electrical Engineering at MIT. “We’re trying to predict what treatment or physician or intervention would lead to better outcomes for people.”

A new approach to improving health

Health at Scale co-founder and CEO Zeeshan Syed met Guttag while studying electrical engineering and computer science at MIT. Guttag served as Syed’s advisor for his bachelor’s and master’s degrees. When Syed decided to pursue his PhD, he only applied to one school, and his advisor was easy to choose.

Syed did his PhD through the Harvard-MIT Program in Health Sciences and Technology (HST). During that time, he looked at how patients who’d had heart attacks could be better managed. The work was personal for Syed: His father had recently suffered a serious heart attack.

Through the work, Syed met Mohammed Saeed SM ’97, PhD ’07, who was also in the HST program. Syed, Guttag, and Saeed founded Health at Scale in 2015 along with David Guttag ’05, focusing on using core advances in machine learning to solve some of health care’s hardest problems.

“It started with the burning itch to address real challenges in health care about personalization and prediction,” Syed says.

From the beginning, the founders knew their solutions needed to work with widely available data like health care claims, which include information on diagnoses, tests, prescriptions, and more. They also sought to build tools for cleaning up and processing raw data sets, so that their models would be part of what Guttag refers to as a “full machine-learning stack for health care.”

Finally, to deliver effective, personalized solutions, the founders knew their models needed to work with small numbers of encounters for individual physicians, clinics, and patients, which posed severe challenges for conventional AI and machine learning.

“The large companies getting into [the health care AI] space had it wrong in that they viewed it as a big data problem,” Guttag says. “They thought, ‘We’re the experts. No one’s better at crunching large amounts of data than us.’ We thought if you want to make the right decision for individuals, the problem was a small data problem: Each patient is different, and we didn’t want to recommend to patients what was best on average. We wanted what was best for each individual.”

The company’s first models helped recommend skilled nursing facilities for post-acute care patients. Many such patients experience further health problems and return to the hospital. Health at Scale’s models showed that some facilities were better at helping specific kinds of people with specific health problems. For example, a 64-year-old man with a history of cardiovascular disease may fare better at one facility compared to another.

Today the company’s recommendations help guide patients to the primary care physicians, surgeons, and specialists that are best suited for them. Guttag even used the service when he got his hip replaced last year.

Health at Scale also helps organizations identify people at rising risk of specific adverse health events, like heart attacks, in the future.

“We’ve gone beyond the notion of identifying people who have frequently visited emergency departments or hospitals in the past, to get to the much more actionable problem of finding those people at an inflection point, where they are likely to experience worse outcomes and higher costs,” Syed says.

The company’s other solutions help determine the best treatment options for patients and help reduce health care fraud, waste, and abuse. Each use case is designed to improve patient health outcomes by giving health care organizations decision-support for action.

“Broadly speaking, we are interested in building models that can be used to help avoid problems, rather than simply predict them,” says Guttag. “For example, identifying those individuals at highest risk for serious complications of a respiratory infection [enables care providers] to target them for interventions that reduce their chance of developing such an infection.”

Impact at scale

Earlier this year, as the scope of the Covid-19 pandemic was becoming clear, Health at Scale began considering ways its models could help.

“The lack of data in the beginning of the pandemic motivated us to look at the experiences we have gained from combatting other respiratory infections like influenza and pneumonia,” says Saeed, who serves as Health at Scale’s chief medical officer.

The idea led to a peer-reviewed paper where researchers affiliated with the company, the University of Michigan, and MIT showed Health at Scale’s models could accurately predict hospitalizations and visits to the emergency department related to respiratory infections.

“We did the work on the paper using the tech we’d already built,” Guttag says. “We had interception products deployed for predicting patients at risk of emergent hospitalizations for a variety of causes, and we saw that we could extend that approach. We had customers that we gave the solution to for free.”

The paper proved out another use case for a technology that is already being used by some of the largest health plans in the U.S. That’s an impressive customer base for a five-year-old company of only 20 people — about half of whom have MIT affiliations.

“The culture MIT creates to solve problems that are worth solving, to go after impact, I think that’s been reflected in the way the company got together and has operated,” Syed says. “I’m deeply proud that we’ve maintained that MIT spirit.”

And, Syed believes, there’s much more to come.

“We set out with the goal of driving impact,” Syed says. “We currently run some of the largest production deployments of machine learning at scale, affecting millions, if not tens of millions, of patients, and we are only just getting started.”

Six strategic areas identified for shared faculty hiring in computing

Nearly every aspect of the modern world is being transformed by computing. As computing technology continues to revolutionize the way people live, work, learn, and interact, computing research and education are increasingly playing a role in a broad range of academic disciplines, and are in turn being shaped by this expanding breadth.

To connect computing and other disciplines in addressing critical challenges and opportunities facing the world today, the MIT Stephen A. Schwarzman College of Computing is planning to create 25 new faculty positions that will be shared between the college and an MIT department or school. Hiring for these new positions will be focused on six strategic areas of inquiry, to build capacity at MIT in key computing domains that cut across departments and schools. The shared faculty members are expected to engage in research and teaching that contributes to their home department, that is of mutual value to that department and the college, and that helps form and strengthen cross-departmental ties.

“These new shared faculty positions present an unprecedented opportunity to develop crucial areas at MIT which connect computing with other disciplines,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing. “By coordinated hiring between the college and departments and schools, we expect to have significant impact with multiple touch points across MIT.”

The six strategic areas and the schools expected to be involved in hiring for each are as follows:

Social, Economic, and Ethical Implications of Computing and Networks. Associated schools: School of Humanities, Arts, and Social Sciences and MIT Sloan School of Management.

There have been tremendous advances in new digital platforms and algorithms, which have already transformed our economic, social, and even political lives. But the future societal implications of these technologies and the consequences of the use and misuse of massive social data are poorly understood. There are exciting opportunities for building on the growing intellectual connections between computer science, data science, and social science and humanities, in order to bring a better conceptual framework to understand the social and economic implications, ethical dimensions, and regulation of these technologies.

Focusing on the interplay between computing systems and our understanding of individuals and societal institutions, this strategic hiring area will include faculty whose work focuses on the broader consequences of the changing digital and information environment, market design, digital commerce and competition, and economic and social networks. Issues of interest include how computing and AI technologies have shaped and are shaping the work of the future; how social media tools have reshaped political campaigns, changed the nature and organization of mass protests, and spurred governments to either reduce or dramatically enhance censorship and social control; increasing challenges in adjudicating what information is reliable, what is slanted, and what is entirely fake; conceptions of privacy, fairness, and transparency of algorithms; and the effects of new technologies on democratic governance.

Computing and Natural Intelligence: Cognition, Perception, and Language. Associated schools: School of Science; School of Humanities, Arts, and Social Sciences; and School of Architecture and Planning.

Intelligence — what it is, how the brain produces it, and how it can be engineered — is simultaneously one of the greatest open questions in the natural sciences and one of the most important engineering challenges of our time. Significant advances in computing and machine learning have enabled a better understanding of the brain and the mind. Concurrently, neuroscience and cognitive science have started to give meaningful engineering guidance to AI and related computing efforts. Yet, huge gaps remain in connecting the science and engineering of intelligence.

Integrating science, computing, and social sciences and humanities, this strategic hiring area aims to address the gap between science and engineering of intelligence, in order to make transformative advances in AI and deepen our understanding of natural intelligence. Hiring in this area is expected to advance a holistic approach to understanding human perception and cognition through work such as the study of computational properties of language by bridging linguistic theory, cognitive science, and computer science; improving the art of listening by re-engineering music through music classification and machine learning, music cognition, and new interfaces for musical expression; discovering how artificial systems might help explain natural intelligence and vice versa; and seeking ways in which computing can aid in human expression, communications, health, and meaning.

Computing in Health and Life Sciences. Associated schools: School of Engineering; School of Science; and MIT Sloan School of Management.

Computing is increasingly becoming an indispensable tool in the health and life sciences. A key area is facilitating new approaches to identifying molecular and biomolecular agents with desired functions and for discovering new medications and new means of diagnosis. For instance, machine learning provides a unique opportunity in the pursuit of molecular and biomolecular discovery to parameterize and augment physics-based models, or possibly even replace them, and enable a revolution in molecular science and engineering. Another major area is health-care delivery, where novel algorithms, high performance computing, and machine learning offer new possibilities to transform health monitoring and treatment planning, facilitating better patient care, and enabling more effective ways to help prevent disease. In diagnosis, machine learning methods hold the promise of improved detection of diseases, increasing both specificity and sensitivity of imaging and testing.

This strategic area aims to hire faculty who help create transformative new computational methods in health and life sciences, while complementing the considerable existing work at MIT by forging additional connections. The broad scope ranges from computational approaches to fundamental problems in molecular design and synthesis for human health; to reshaping health-care delivery and personalized medicine; to understanding radiation effects and optimizing dose delivery on target cells; to improving tracing, imaging, and diagnosis techniques.

Computing for Health of the Planet. Associated schools: School of Engineering; School of Science; and School of Architecture and Planning.

The health of the planet is one of the most important challenges facing humankind today. Rapid industrialization has led to a number of serious threats to human and ecosystem health, including climate change, unsafe levels of air and water pollution, coastal and agricultural land erosion, and many others. Ensuring the health and safety of our planet necessitates an interdisciplinary approach that connects scientific understanding, engineering solutions, and social, economic, and political considerations with new computational methods, providing data-driven models and solutions for clean air, usable water, resilient food, and efficient transportation systems, and for identifying sustainable sources of energy.

This strategic hiring area will help facilitate such collaborations by bringing together expertise that will enable us to advance physical understanding of low-carbon energy solutions, earth-climate modeling, and urban planning through high performance computing, transformational numerical methods, and/or machine learning techniques.

Computing and Human Experience. Associated schools: School of Humanities, Arts, and Social Sciences and School of Architecture and Planning.

Computing and digital technologies are challenging the very ways in which people understand reality and our role in it. These technologies are embedded in the everyday lives of people around the world, and while frequently highly useful, they can reflect cultural assumptions and technological heritage, even though they are often viewed as being neutral prescriptions for structuring the world. Indeed, as becomes increasingly apparent, these technologies are able to alter individual and societal perceptions and actions, or affect societal institutions, in ways that are not broadly understood or intended. Moreover, although these technologies are conventionally developed for improved efficacy or efficiency, they can also provide opportunities for less utilitarian purposes such as supporting introspection and personal reflection.

This strategic hiring area focuses on growing the set of scholars in the social sciences, humanities, and computing who examine technology designs, systems, policies, and practices that can address the dual challenges of the lack of understanding of these technologies and their implications, including the design of systems that may help ameliorate rather than exacerbate inequalities. It further aims to develop techniques and systems that help people interpret and gain understanding from societal and historical data, including in humanities disciplines such as comparative literature, history, and art and architectural history.

Quantum Computing. Associated schools: School of Engineering and School of Science.

One of the most promising directions for continuing improvements in computing power comes from quantum mechanics. In the coming years, new hardware, algorithms, and discoveries offer the potential to dramatically increase the power of quantum computers far beyond current machines. Achieving these advances poses challenges that span multiple scientific and engineering fields, from quantum hardware to quantum computing algorithms. Potential quantum computing applications span a broad range of fields, including chemistry, biology, materials science, atmospheric modeling, urban system simulation, nuclear engineering, finance, optimization, and others, requiring a deep understanding of both quantum computing algorithms and the problem space.

This strategic hiring area aims to build on MIT’s rich set of activities in the space to catalyze research and education in quantum computing and quantum information across the Institute, including the study of quantum materials; developing robust controllable quantum devices and networks that can faithfully transmit quantum information; and new algorithms for machine learning, AI, optimization, and data processing to fully leverage the promise of quantum computing.

A coordinated approach

Over the past few months, the MIT Schwarzman College of Computing has undertaken a strategic planning exercise to identify key areas for hiring the new shared faculty. The process has been led by Huttenlocher, together with MIT Provost Martin Schmidt and the deans of the five schools — Anantha Chandrakasan, dean of the School of Engineering; Melissa Nobles, Kenan Sahin Dean of the School of Humanities, Arts, and Social Sciences; Hashim Sarkis, dean of the School of Architecture and Planning; David Schmittlein, John C. Head III Dean of MIT Sloan; and Michael Sipser, dean of the School of Science — beginning with input from departments across the Institute.

This input was in the form of proposals for interdisciplinary computing areas that were solicited from department heads. A total of 29 proposals were received. Over a six-week period, the committee worked with proposing departments to identify strategic hiring themes. The process yielded the six areas that cover several critically important directions. 

“These areas not only bring together computing with numerous departments and schools, but also involve multiple modes of academic inquiry, offering opportunities for new collaborations in research and teaching across a broad range of fields,” says Schmidt. “I’m excited to see us launch this critical part of the college’s mission.”

The college will also coordinate with each of the five schools to ensure that diversity, equity, and inclusion are at the forefront for all of the hiring areas.

Hiring for the 2020-21 academic year

While the number of searches and involved schools will vary from year to year, the plan for the coming academic year is to have five searches, one with each school. These searches will be in three of the six strategic hiring areas as follows:

Social, Economic, and Ethical Implications of Computing and Networks will focus on two searches, one with the Department of Philosophy in the School of Humanities, Arts, and Social Sciences, and one with the MIT Sloan School of Management.

Computing and Natural Intelligence: Cognition, Perception, and Language will focus on one search with the Department of Brain and Cognitive Sciences in the School of Science.

Computing for Health of the Planet will focus on two searches, one with the Department of Urban Studies and Planning in the School of Architecture and Planning, and one with a department to be identified in the School of Engineering.

Toward a machine learning model that can reason about everyday actions

The ability to reason abstractly about events as they unfold is a defining feature of human intelligence. We know instinctively that crying and writing are means of communicating, and that a panda falling from a tree and a plane landing are variations on descending. 

Organizing the world into abstract categories does not come easily to computers, but in recent years researchers have inched closer by training machine learning models on words and images infused with structural information about the world, and how objects, animals, and actions relate. In a new study at the European Conference on Computer Vision this month, researchers unveiled a hybrid language-vision model that can compare and contrast a set of dynamic events captured on video to tease out the high-level concepts connecting them. 

Their model did as well as or better than humans at two types of visual reasoning tasks — picking the video that conceptually best completes the set, and picking the video that doesn’t fit. Shown videos of a dog barking and a man howling beside his dog, for example, the model completed the set by picking the crying baby from a set of five videos. Researchers replicated their results on two datasets for training AI systems in action recognition: MIT’s Multi-Moments in Time and DeepMind’s Kinetics.

“We show that you can build abstraction into an AI system to perform ordinary visual reasoning tasks close to a human level,” says the study’s senior author Aude Oliva, a senior research scientist at MIT, co-director of the MIT Quest for Intelligence, and MIT director of the MIT-IBM Watson AI Lab. “A model that can recognize abstract events will give more accurate, logical predictions and be more useful for decision-making.”

As deep neural networks become expert at recognizing objects and actions in photos and video, researchers have set their sights on the next milestone: abstraction, and training models to reason about what they see. In one approach, researchers have merged the pattern-matching power of deep nets with the logic of symbolic programs to teach a model to interpret complex object relationships in a scene. Here, in another approach, researchers capitalize on the relationships embedded in the meanings of words to give their model visual reasoning power.

“Language representations allow us to integrate contextual information learned from text databases into our visual models,” says study co-author Mathew Monfort, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). “Words like ‘running,’ ‘lifting,’ and ‘boxing’ share some common characteristics that make them more closely related to the concept ‘exercising,’ for example, than ‘driving.’ ”

Using WordNet, a database of word meanings, the researchers mapped the relation of each action-class label in Moments and Kinetics to the other labels in both datasets. Words like “sculpting,” “carving,” and “cutting,” for example, were connected to higher-level concepts like “crafting,” “making art,” and “cooking.” Now when the model recognizes an activity like sculpting, it can pick out conceptually similar activities in the dataset. 

This relational graph of abstract classes is used to train the model to perform two basic tasks. Given a set of videos, the model creates a numerical representation for each video that aligns with the word representations of the actions shown in the video. An abstraction module then combines the representations generated for each video in the set to create a new set representation that is used to identify the abstraction shared by all the videos in the set.
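
For readers curious how such a module might look in code, below is a minimal sketch in PyTorch. It is not the authors’ architecture: the projection layer, the mean-pooling “abstraction module,” the cosine scoring, and all names and dimensions are illustrative assumptions meant only to convey the general idea of aligning video features with word embeddings and scoring candidate abstractions for a set of videos.

```python
# A minimal sketch (not the authors' architecture) of aligning video features with
# word embeddings and scoring candidate abstractions for a set of videos. All names,
# dimensions, and the mean-pooling "abstraction module" are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SetAbstraction(nn.Module):
    def __init__(self, video_dim=512, word_dim=300):
        super().__init__()
        # Project video features into the word-embedding space (assumed design choice).
        self.project = nn.Linear(video_dim, word_dim)

    def forward(self, video_feats, concept_embeddings):
        # video_feats: (num_videos, video_dim), e.g., from a pretrained video encoder.
        # concept_embeddings: (num_concepts, word_dim) for candidate abstract actions.
        aligned = F.normalize(self.project(video_feats), dim=-1)
        # Pool the per-video representations into one "set" representation.
        set_repr = F.normalize(aligned.mean(dim=0, keepdim=True), dim=-1)
        # Score each candidate abstraction by cosine similarity to the set.
        return (set_repr @ F.normalize(concept_embeddings, dim=-1).T).squeeze(0)

# Toy usage: 4 videos, 3 candidate concepts (say, "crafting", "exercising", "cooking").
model = SetAbstraction()
scores = model(torch.randn(4, 512), torch.randn(3, 300))
print(scores)  # the highest-scoring concept is the inferred shared abstraction
```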

To see how the model would do compared to humans, the researchers asked human subjects to perform the same set of visual reasoning tasks online. To their surprise, the model performed as well as humans in many scenarios, sometimes with unexpected results. In a variation on the set completion task, after watching a video of someone wrapping a gift and covering an item in tape, the model suggested a video of someone at the beach burying someone else in the sand. 

“It’s effectively ‘covering,’ but very different from the visual features of the other clips,” says Camilo Fosco, a PhD student at MIT who is co-first author of the study with PhD student Alex Andonian. “Conceptually it fits, but I had to think about it.”

Limitations of the model include a tendency to overemphasize some features. In one case, it suggested completing a set of sports videos with a video of a baby and a ball, apparently associating balls with exercise and competition.

A deep learning model that can be trained to “think” more abstractly may be capable of learning with fewer data, say researchers. Abstraction also paves the way toward higher-level, more human-like reasoning.

“One hallmark of human cognition is our ability to describe something in relation to something else — to compare and to contrast,” says Oliva. “It’s a rich and efficient way to learn that could eventually lead to machine learning models that can understand analogies and are that much closer to communicating intelligently with us.”

Other authors of the study are Allen Lee from MIT, Rogerio Feris from IBM, and Carl Vondrick from Columbia University.

Robot takes contact-free measurements of patients’ vital signs

The research described in this article has been published on a preprint server but has not yet been peer-reviewed by scientific or medical experts.

During the current coronavirus pandemic, one of the riskiest parts of a health care worker’s job is assessing people who have symptoms of Covid-19. Researchers from MIT and Brigham and Women’s Hospital hope to reduce that risk by using robots to remotely measure patients’ vital signs.

The robots, which are controlled by a handheld device, can also carry a tablet that allows doctors to ask patients about their symptoms without being in the same room.

“In robotics, one of our goals is to use automation and robotic technology to remove people from dangerous jobs,” says Henwei Huang, an MIT postdoc. “We thought it should be possible for us to use a robot to remove the health care worker from the risk of directly exposing themselves to the patient.”

Using four cameras mounted on a dog-like robot developed by Boston Dynamics, the researchers have shown that they can measure skin temperature, breathing rate, pulse rate, and blood oxygen saturation in healthy patients, from a distance of 2 meters. They are now making plans to test it in patients with Covid-19 symptoms.

“We are thrilled to have forged this industry-academia partnership in which scientists with engineering and robotics expertise worked with clinical teams at the hospital to bring sophisticated technologies to the bedside,” says Giovanni Traverso, an MIT assistant professor of mechanical engineering, a gastroenterologist at Brigham and Women’s Hospital, and the senior author of the study.

The researchers have posted a paper on their system on the preprint server techRxiv, and have submitted it to a peer-reviewed journal. Huang is one of the lead authors of the study, along with Peter Chai, an assistant professor of emergency medicine at Brigham and Women’s Hospital, and Claas Ehmke, a visiting scholar from ETH Zurich.

Measuring vital signs

When Covid-19 cases began surging in Boston in March, many hospitals, including Brigham and Women’s, set up triage tents outside their emergency departments to evaluate people with Covid-19 symptoms. One major component of this initial evaluation is measuring vital signs, including body temperature.

The MIT and BWH researchers came up with the idea to use robotics to enable contactless monitoring of vital signs, to allow health care workers to minimize their exposure to potentially infectious patients. They decided to use existing computer vision technologies that can measure temperature, breathing rate, pulse, and blood oxygen saturation, and worked to make them mobile.

To achieve that, they used a robot known as Spot, which can walk on four legs, similarly to a dog. Health care workers can maneuver the robot to wherever patients are sitting, using a handheld controller. The researchers mounted four different cameras onto the robot — an infrared camera plus three monochrome cameras that filter different wavelengths of light.

The researchers developed algorithms that allow them to use the infrared camera to measure both elevated skin temperature and breathing rate. For body temperature, the camera measures skin temperature on the face, and the algorithm correlates that temperature with core body temperature. The algorithm also takes into account the ambient temperature and the distance between the camera and the patient, so that measurements can be taken from different distances, under different weather conditions, and still be accurate.

Measurements from the infrared camera can also be used to calculate the patient’s breathing rate. As the patient breathes in and out, wearing a mask, their breath changes the temperature of the mask. Measuring this temperature change allows the researchers to calculate how rapidly the patient is breathing.

The three monochrome cameras each filter a different wavelength of light — 670, 810, and 880 nanometers. These wavelengths allow the researchers to measure the slight color changes that result when hemoglobin in blood cells binds to oxygen and flows through blood vessels. The researchers’ algorithm uses these measurements to calculate both pulse rate and blood oxygen saturation.
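
As a rough illustration of how a periodic camera signal becomes a rate, the sketch below picks the dominant frequency of a simulated mask-temperature trace with an FFT and converts it to cycles per minute; the same idea, applied to the monochrome intensity traces in a higher frequency band, would yield a pulse-rate estimate. The frame rate, frequency bands, and signal model are assumptions chosen for illustration, not the researchers’ published algorithm.

```python
# A minimal sketch (not the researchers' actual algorithm) of turning a periodic
# camera-derived trace into a rate: find the dominant frequency via an FFT and
# convert it to cycles per minute. Sampling rate and signal names are assumed.
import numpy as np

def dominant_rate_per_minute(signal, fs, fmin, fmax):
    """Return the strongest periodic component of `signal` (sampled at `fs` Hz),
    searched within [fmin, fmax] Hz, expressed in cycles per minute."""
    signal = np.asarray(signal) - np.mean(signal)        # remove the DC offset
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)              # physiologically plausible band
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_freq

# Toy example: a 0.25 Hz mask-temperature oscillation (15 breaths/min) plus noise.
fs = 30.0                                                 # assumed camera frame rate
t = np.arange(0, 30, 1 / fs)
mask_temp = 34.0 + 0.3 * np.sin(2 * np.pi * 0.25 * t) + 0.05 * np.random.randn(t.size)
print(dominant_rate_per_minute(mask_temp, fs, fmin=0.1, fmax=0.7))  # ~15 breaths/min
# Applying the same function to the monochrome intensity trace in a 0.7-3 Hz band
# would give a pulse-rate estimate in beats per minute.
```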

“We didn’t really develop new technology to do the measurements,” Huang says. “What we did is integrate them together very specifically for the Covid application, to analyze different vital signs at the same time.”

Continuous monitoring

In this study, the researchers performed the measurements on healthy volunteers, and they are now making plans to test their robotic approach in people who are showing symptoms of Covid-19, in a hospital emergency department.

While the researchers plan to focus on triage applications in the near term, in the longer term they envision that the robots could be deployed in patients’ hospital rooms. This would allow the robots to continuously monitor patients and also allow doctors to check on them, via tablet, without having to enter the room. Both applications would require approval from the U.S. Food and Drug Administration.

The research was funded by the MIT Department of Mechanical Engineering and the Karl van Tassel (1925) Career Development Professorship.

National Science Foundation announces MIT-led Institute for Artificial Intelligence and Fundamental Interactions

The U.S. National Science Foundation (NSF) announced today an investment of more than $100 million to establish five artificial intelligence (AI) institutes, each receiving roughly $20 million over five years. One of these, the NSF AI Institute for Artificial Intelligence and Fundamental Interactions (IAIFI), will be led by MIT’s Laboratory for Nuclear Science (LNS) and become the intellectual home of more than 25 physics and AI senior researchers at MIT and Harvard, Northeastern, and Tufts universities. 

By merging research in physics and AI, the IAIFI seeks to tackle some of the most challenging problems in physics, including precision calculations of the structure of matter, gravitational-wave detection of merging black holes, and the extraction of new physical laws from noisy data.

“The goal of the IAIFI is to develop the next generation of AI technologies, based on the transformative idea that artificial intelligence can directly incorporate physics intelligence,” says Jesse Thaler, an associate professor of physics at MIT, LNS researcher, and IAIFI director.  “By fusing the ‘deep learning’ revolution with the time-tested strategies of ‘deep thinking’ in physics, we aim to gain a deeper understanding of our universe and of the principles underlying intelligence.”

IAIFI researchers say their approach will enable making groundbreaking physics discoveries, and advance AI more generally, through the development of novel AI approaches that incorporate first principles from fundamental physics.  

“Invoking the simple principle of translational symmetry — which in nature gives rise to conservation of momentum — led to dramatic improvements in image recognition,” says Mike Williams, an associate professor of physics at MIT, LNS researcher, and IAIFI deputy director. “We believe incorporating more complex physics principles will revolutionize how AI is used to study fundamental interactions, while simultaneously advancing the foundations of AI.”
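
As a concrete, deliberately simple illustration of the symmetry Williams mentions, the snippet below checks numerically that a convolution commutes with translation: shifting a signal and then convolving gives the same interior result as convolving and then shifting. This is an editorial aside using a one-dimensional toy signal, not IAIFI code.

```python
# A small numerical check that convolution commutes with translation, which is one way
# the physics notion of translational symmetry shows up in image-recognition models.
# (1-D signals are used here purely for brevity.)
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=20)
kernel = rng.normal(size=3)

shifted_signal = np.roll(signal, 2)                              # translate input by 2 steps
out_then_shift = np.roll(np.convolve(signal, kernel, mode="same"), 2)
shift_then_out = np.convolve(shifted_signal, kernel, mode="same")

# Away from the boundaries, the two agree: translating the input translates the output.
print(np.allclose(out_then_shift[3:-3], shift_then_out[3:-3]))   # True
```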

In addition, a core element of the IAIFI mission is to transfer their technologies to the broader AI community.

“Recognizing the critical role of AI, NSF is investing in collaborative research and education hubs, such as the NSF IAIFI anchored at MIT, which will bring together academia, industry, and government to unearth profound discoveries and develop new capabilities,” says NSF Director Sethuraman Panchanathan. “Just as prior NSF investments enabled the breakthroughs that have given rise to today’s AI revolution, the awards being announced today will drive discovery and innovation that will sustain American leadership and competitiveness in AI for decades to come.”

Research in AI and fundamental interactions

Fundamental interactions are described by two pillars of modern physics: at short distances by the Standard Model of particle physics, and at long distances by the Lambda Cold Dark Matter model of Big Bang cosmology. Both models are based on physical first principles such as causality and space-time symmetries.  An abundance of experimental evidence supports these theories, but also exposes where they are incomplete, most pressingly that the Standard Model does not explain the nature of dark matter, which plays an essential role in cosmology.

AI has the potential to help answer these questions and others in physics.

For many physics problems, the governing equations that encode the fundamental physical laws are known. However, undertaking key calculations within these frameworks, as is essential to test our understanding of the universe and guide physics discovery, can be computationally demanding or even intractable. IAIFI researchers are developing AI for such first-principles theory studies, which naturally require AI approaches that rigorously encode physics knowledge. 

“My group is developing new provably exact algorithms for theoretical nuclear physics,” says Phiala Shanahan, an assistant professor of physics and LNS researcher at MIT. “Our first-principles approach turns out to have applications in other areas of science and even in robotics, leading to exciting collaborations with industry partners.”

Incorporating physics principles into AI could also have a major impact on many experimental applications, such as designing AI methods that are more easily verifiable. IAIFI researchers are working to enhance the scientific potential of various facilities, including the Large Hadron Collider (LHC) and the Laser Interferometer Gravitational-Wave Observatory (LIGO). 

“Gravitational-wave detectors are among the most sensitive instruments on Earth, but the computational systems used to operate them are mostly based on technology from the previous century,” says Principal Research Scientist Lisa Barsotti of the MIT Kavli Institute for Astrophysics and Space Research. “We have only begun to scratch the surface of what can be done with AI; just enough to see that the IAIFI will be a game-changer.”

The unique features of these physics applications also offer compelling research opportunities in AI more broadly. For example, physics-informed architectures and hardware development could lead to advances in the speed of AI algorithms, and work in statistical physics is providing a theoretical foundation for understanding AI dynamics. 

“Physics has inspired many time-tested ideas in machine learning: maximizing entropy, Boltzmann machines, and variational inference, to name a few,” says Pulkit Agrawal, an assistant professor of electrical engineering and computer science at MIT, and researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL). “We believe that close interaction between physics and AI researchers will be the catalyst that leads to the next generation of machine learning algorithms.” 

Cultivating early-career talent

AI technologies are advancing rapidly, making it both important and challenging to train junior researchers at the intersection of physics and AI. The IAIFI aims to recruit and train a talented and diverse group of early-career researchers, including at the postdoc level through its IAIFI Fellows Program.  

“By offering our fellows their choice of research problems, and the chance to focus on cutting-edge challenges in physics and AI, we will prepare many talented young scientists to become future leaders in both academia and industry,” says MIT professor of physics Marin Soljacic of the Research Laboratory of Electronics (RLE). 

IAIFI researchers hope these fellows will spark interdisciplinary and multi-investigator collaborations, generate new ideas and approaches, translate physics challenges beyond their native domains, and help develop a common language across disciplines. Applications for the inaugural IAIFI fellows are due in mid-October. 

Another related effort spearheaded by Thaler, Williams, and Alexander Rakhlin, an associate professor of brain and cognitive science at MIT and researcher in the Institute for Data, Systems, and Society (IDSS), is the development of a new interdisciplinary PhD program in physics, statistics, and data science, a collaborative effort between the Department of Physics and the Statistics and Data Science Center.

“Statistics and data science are among the foundational pillars of AI. Physics joining the interdisciplinary doctoral program will bring forth new ideas and areas of exploration, while fostering a new generation of leaders at the intersection of physics, statistics, and AI,” says Rakhlin.  

Education, outreach, and partnerships 

The IAIFI aims to cultivate “human intelligence” by promoting education and outreach. For example, IAIFI members will contribute to establishing a MicroMasters degree program at MIT for students from non-traditional backgrounds.    

“We will increase the number of students in both physics and AI from underrepresented groups by providing fellowships for the MicroMasters program,” says Isaac Chuang, professor of physics and electrical engineering, senior associate dean for digital learning, and RLE researcher at MIT. “We also plan on working with undergraduate MIT Summer Research Program students, to introduce them to the tools of physics and AI research that they might not have access to at their home institutions.”

The IAIFI plans to expand its impact via numerous outreach efforts, including a K-12 program in which students are given data from the LHC and LIGO and tasked with rediscovering the Higgs boson and gravitational waves. 

“After confirming these recent Nobel Prizes, we can ask the students to find tiny artificial signals embedded in the data using AI and fundamental physics principles,” says assistant professor of physics Phil Harris, an LNS researcher at MIT. “With projects like this, we hope to disseminate knowledge about — and enthusiasm for — physics, AI, and their intersection.”

In addition, the IAIFI will collaborate with industry and government to advance the frontiers of both AI and physics, as well as societal sectors that stand to benefit from AI innovation. IAIFI members already have many active collaborations with industry partners, including DeepMind, Microsoft Research, and Amazon. 

“We will tackle two of the greatest mysteries of science: how our universe works and how intelligence works,” says MIT professor of physics Max Tegmark, an MIT Kavli Institute researcher. “Our key strategy is to link them, using physics to improve AI and AI to improve physics. We’re delighted that the NSF is investing the vital seed funding needed to launch this exciting effort.”

Building new connections at MIT and beyond

Leveraging MIT’s culture of collaboration, the IAIFI aims to generate new connections and to strengthen existing ones across MIT and beyond.

Of the 27 current IAIFI senior investigators, 16 are at MIT and members of the LNS, RLE, MIT Kavli Institute, CSAIL, and IDSS. In addition, IAIFI investigators are members of related NSF-supported efforts at MIT, such as the Center for Brains, Minds, and Machines within the McGovern Institute for Brain Research and the MIT-Harvard Center for Ultracold Atoms.  

“We expect a lot of creative synergies as we bring physics and computer science together to study AI,” says Bill Freeman, the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science and researcher in CSAIL. “I’m excited to work with my physics colleagues on topics that bridge these fields.”

More broadly, the IAIFI aims to make Cambridge, Massachusetts, and the surrounding Boston area a hub for collaborative efforts to advance both physics and AI. 

“As we teach in 8.01 and 8.02, part of what makes physics so powerful is that it provides a universal language that can be applied to a wide range of scientific problems,” says Thaler. “Through the IAIFI, we will create a common language that transcends the intellectual borders between physics and AI to facilitate groundbreaking discoveries.”

Real-time data for a better response to disease outbreaks

Kinsa was founded by MIT alumnus Inder Singh MBA ’06, SM ’07 in 2012, with the mission of collecting information about when and where infectious diseases are spreading in real time. Today the company is fulfilling that mission along several fronts.

It starts with families. More than 1.5 million of Kinsa’s “smart” thermometers have been sold or given away across the country, including hundreds of thousands to families from low-income school districts. The thermometers link to an app that helps users decide if they should seek medical attention based on age, fever, and symptoms.

At the community level, the data generated by the thermometers are anonymized and aggregated, and can be shared with parents and school officials, helping them understand what illnesses are going around and prevent the spread of disease in classrooms.

By working with over 2,000 schools to date in addition to many businesses, Kinsa has also developed predictive models that can forecast flu seasons each year. In the spring of this year, the company showed it could predict flu spread 12-20 weeks in advance at the city level.

The milestone prepared Kinsa for its most profound scale-up yet. When Covid-19 came to the U.S., the company was able to estimate its spread in real time by tracking fever levels above what would normally be expected. Now Kinsa is working with health officials in five states and three cities to help contain and control the virus.

“By the time the CDC [U.S. Centers for Disease Control] gets the data, it has been processed, deidentified, and people have entered the health system to see a doctor,” says Singh, who is Kinsa’s CEO as well as its founder. “There’s a huge delay from when someone contracts an illness and when they see a doctor. The current health care system only sees the latter; we see the former.”

Today Kinsa finds itself playing a central role in America’s Covid-19 response. In addition to its local partnerships, the company has become a central information hub for the public, media, and researchers with its Healthweather tool, which maps unusual rates of fevers — among the most common symptoms of Covid-19 — to help visualize the prevalence of illness in communities.

Singh says Kinsa’s data complement other methods of containing the virus like testing, contact tracing, and the use of face masks.

Better data for better responses

Singh’s first exposure to MIT came while he was attending the Harvard University Kennedy School of Government as a graduate student.

“I remember interacting with some MIT undergrads; we brainstormed some social-impact ideas,” Singh recalls. “A week later I got an email from them saying they’d prototyped what we were talking about. I was like, ‘You prototyped what we talked about in a week!?’ I was blown away, and it was an insight into how MIT is such a do-er campus. It was so entrepreneurial. I was like, ‘I want to do that.’”

Soon Singh enrolled in the Harvard-MIT Program in Health Sciences and Technology, an interdisciplinary program where he earned his master’s and MBA degrees while working with leading research hospitals in the area. The program also set him on a course to improve the way we respond to infectious disease.

Following his graduation, he joined the Clinton Health Access Initiative (CHAI), where he brokered deals between pharmaceutical companies and low-resource countries to lower the cost of medicines for HIV, malaria, and tuberculosis. Singh described CHAI as a dream job, but it opened his eyes to several shortcomings in the global health system.

“The world tries to curb the spread of infectious illness with almost zero real-time information about when and where disease is spreading,” Singh says. “The question I posed to start Kinsa was ‘how do you stop the next outbreak before it becomes an epidemic if you don’t know where and when it’s starting and how fast it’s spreading?’”

Kinsa was started in 2012 with the insight that better data were needed to control infectious diseases. In order to get that data, the company needed a new way of providing value to sick people and families.

“The behavior in the home when someone gets sick is to grab the thermometer,” Singh says. “We piggy-backed off of that to create a communication channel to the sick, to help them get better faster.”

Kinsa started by selling its thermometers and creating a sponsorship program for corporate donors to fund thermometer donations to Title 1 schools, which serve high numbers of economically disadvantaged students. Singh says 40 percent of families that received a Kinsa thermometer through that program did not previously have any thermometer in their house.

The company says its program has been shown to help schools improve attendance, and has yielded years of real-time data on fever rates to help compare to official estimates and develop its models.

“We had been forecasting flu incidence accurately several weeks out for years, and right around early 2020, we had a massive breakthrough,” Singh recalls. “We showed we could predict flu 12 to 20 weeks out — then March hit. We said, let’s try to remove the fever levels associated with cold and flu from our observed real-time signal. What’s left over is unusual fevers, and we saw hotspots across the country. We observed six years of data and there’d been hotspots, but nothing like we were seeing in early March.”
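
Conceptually, the step Singh describes amounts to subtracting an expected seasonal fever baseline from the observed fever signal and flagging regions whose residual is unusually large. The toy sketch below illustrates that idea; the region names, numbers, baseline model, and threshold are invented for illustration and are not Kinsa’s production system.

```python
# A minimal sketch of the "unusual fevers" idea: subtract the expected seasonal
# (cold-and-flu) fever rate from the observed rate, and flag regions whose residual
# is far above normal. All values and names below are illustrative placeholders.
import numpy as np

def unusual_fever_signal(observed, expected, history_std, z_threshold=3.0):
    """observed, expected: per-region fever rates; history_std: typical residual spread."""
    residual = observed - expected                  # what seasonal illness can't explain
    z_scores = residual / history_std
    return residual, z_scores > z_threshold         # True = potential hotspot

regions = ["NYC", "Boston", "Austin", "Detroit"]
expected = np.array([0.020, 0.018, 0.015, 0.017])   # modeled cold/flu fever rates
observed = np.array([0.048, 0.021, 0.016, 0.034])   # measured fever rates from thermometers
history_std = np.array([0.004, 0.004, 0.003, 0.004])
residual, hotspots = unusual_fever_signal(observed, expected, history_std)
for name, flag in zip(regions, hotspots):
    print(name, "hotspot" if flag else "normal")    # NYC and Detroit get flagged
```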

The company quickly made their real-time data available to the public, and on March 14, Singh got on a call with the former New York State health commissioner, the former head of the U.S. Food and Drug Administration, and the man responsible for Taiwan’s successful Covid-19 response.

“I said, ‘There’s hotspots everywhere,’” Singh recalls. “They’re in New York, around the Northeast, Texas, Michigan. They said, ‘This is interesting, but it doesn’t look credible because we’re not seeing case reports of Covid-19.’ Lo and behold, days and weeks later, we saw the Covid cases start building up.”

A tool against Covid-19

Singh says Kinsa’s data provide an unprecedented look into the way a disease is spreading through a community.

“We can predict the entire incidence curve [of flu season] on a city-by-city basis,” Singh says. “The next best model is [about] three weeks out, at a multistate level. It’s not because we’re smarter than others; it’s because we have better data. We found a way to communicate with someone consistently when they’ve just fallen ill.”

Kinsa has been working with health departments and research groups around the country to help them interpret the company’s data and react to early warnings of Covid-19’s spread. It’s also helping companies around the country as they begin bringing employees back to offices.

Now Kinsa is working on expanding its international presence to help curb infectious diseases on multiple fronts around the world, just like it’s doing in the U.S. The company’s progress promises to help authorities monitor diseases long after Covid-19.

“I started Kinsa to create a global, real-time outbreak monitoring and detection system, and now we have predictive power beyond that,” Singh says. “When you know where and when symptoms are starting and how fast they’re spreading, you can empower local individuals, families, communities, and governments.”

Rewriting the rules of machine-generated art

Horses don’t normally wear hats, and deep generative models, or GANs, don’t normally follow rules laid out by human programmers. But a new tool developed at MIT lets anyone go into a GAN and tell the model, like a coder, to put hats on the heads of the horses it draws. 

In a new study appearing at the European Conference on Computer Vision this month, researchers show that the deep layers of neural networks can be edited, like so many lines of code, to generate surprising images no one has seen before.

“GANs are incredible artists, but they’re confined to imitating the data they see,” says the study’s lead author, David Bau, a PhD student at MIT. “If we can rewrite the rules of a GAN directly, the only limit is human imagination.”

Generative adversarial networks, or GANs, pit two neural networks against each other to create hyper-realistic images and sounds. One neural network, the generator, learns to mimic the faces it sees in photos, or the words it hears spoken. A second network, the discriminator, compares the generator’s outputs to the original. The generator then iteratively builds on the discriminator’s feedback until its fabricated images and sounds are convincing enough to pass for real.
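
For readers who want to see the adversarial loop in code, here is a minimal GAN training sketch in PyTorch. The tiny fully connected networks, the two-dimensional toy “data,” and all hyperparameters are placeholders chosen for brevity, not the image-scale models discussed in the article.

```python
# A minimal GAN training sketch illustrating the generator/discriminator loop.
# The tiny networks and 2-D toy data are placeholders, not the article's models.
import torch
import torch.nn as nn

latent_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=64):
    # Stand-in for real training data: points clustered around (2, 2).
    return torch.randn(n, data_dim) * 0.5 + 2.0

for step in range(2000):
    # 1) Train the discriminator to tell real samples from the generator's fakes.
    real = real_batch()
    fake = G(torch.randn(real.size(0), latent_dim)).detach()
    loss_d = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake), torch.zeros(fake.size(0), 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Train the generator to fool the discriminator.
    fake = G(torch.randn(64, latent_dim))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(torch.randn(5, latent_dim)))  # samples should drift toward the "real" cluster
```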

GANs have captivated artificial intelligence researchers for their ability to create representations that are stunningly lifelike and, at times, deeply bizarre, from a receding cat that melts into a pile of fur to a wedding dress standing in a church door as if abandoned by the bride. Like most deep learning models, GANs depend on massive datasets to learn from. The more examples they see, the better they get at mimicking them. 

But the new study suggests that big datasets are not essential. If you understand how a model is wired, says Bau, you can edit the numerical weights in its layers to get the behavior you desire, even if no literal example exists. No dataset? No problem. Just create your own.

“We’re like prisoners to our training data,” he says. “GANs only learn patterns that are already in our data. But here I can manipulate a condition in the model to create horses with hats. It’s like editing a genetic sequence to create something entirely new, like inserting the DNA of a firefly into a plant to make it glow in the dark.”

Bau was a software engineer at Google, and had led the development of Google Hangouts and Google Image Search, when he decided to go back to school. The field of deep learning was exploding and he wanted to pursue foundational questions in computer science. Hoping to learn how to build transparent systems that would empower users, he joined the lab of MIT Professor Antonio Torralba. There, he began probing deep nets and their millions of mathematical operations to understand how they represent the world.

Bau showed that you could slice into a GAN, like a layer cake, to isolate the artificial neurons that had learned to draw a particular feature, like a tree, and switch them off to make the tree disappear. With this insight, Bau helped create GANPaint, a tool that lets users add and remove features like doors and clouds from a picture. In the process, he discovered that GANs have a stubborn streak: they wouldn’t let you draw doors in the sky.

“It had some rule that seemed to say, ‘doors don’t go there,’” he says. “That’s fascinating, we thought. It’s like an ‘if’ statement in a program. To me, it was a clear signal that the network had some kind of inner logic.”

Over several sleepless nights, Bau ran experiments and picked through the layers of his models for the equivalent of a conditional statement. Finally, it dawned on him. “The neural network has different memory banks that function as a set of general rules, relating one set of learned patterns to another,” he says. “I realized that if you could identify one line of memory, you could write a new memory into it.” 
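
The “write a new memory” intuition can be illustrated with a plain linear associative map: change the weights as little as possible so that a chosen key now produces a new desired value. The rank-one update below is a minimal sketch of that idea only; the actual ECCV method operates on the layers of a deep generative model and involves considerably more machinery.

```python
# A minimal sketch of writing a new memory into a linear associative layer: given
# weights W mapping keys to values, change W as little as possible so that key k now
# maps to a new value v. This illustrates the concept, not the paper's exact method.
import numpy as np

def write_memory(W, k, v):
    """Return W' minimizing ||W' - W|| (Frobenius) subject to W' @ k == v."""
    k = k.reshape(-1, 1)
    v = v.reshape(-1, 1)
    residual = v - W @ k                      # how far the current output is from the goal
    return W + residual @ k.T / (k.T @ k)     # rank-one correction along the key direction

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                   # toy "layer": 3-dim keys -> 4-dim values
k = rng.normal(size=3)                        # the context we want to rewrite
v = rng.normal(size=4)                        # the new output we want for that context
W_new = write_memory(W, k, v)
print(np.allclose(W_new @ k, v))              # True: the new rule is now stored
print(np.linalg.norm(W_new - W))              # small change to the rest of the memory
```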

In a short version of his ECCV talk, Bau demonstrates how to edit the model and rewrite memories using an intuitive interface he designed. He copies a tree from one image and pastes it into another, placing it, improbably, on a building tower. The model then churns out enough pictures of tree-sprouting towers to fill a family photo album. With a few more clicks, Bau transfers hats from human riders to their horses, and wipes away a reflection of light from a kitchen countertop.

The researchers hypothesize that each layer of a deep net acts as an associative memory, formed after repeated exposure to similar examples. Fed enough pictures of doors and clouds, for example, the model learns that doors are entryways to buildings, and clouds float in the sky. The model effectively memorizes a set of rules for understanding the world.

The effect is especially striking when GANs manipulate light. When GANPaint added windows to a room, for example, the model automatically added nearby reflections. It’s as if the model had an intuitive grasp of physics and how light should behave on object surfaces. “Even this relationship suggests that associations learned from data can be stored as lines of memory, and not only located but reversed,” says Torralba, the study’s senior author. 

GAN editing has its limitations. It’s not easy to identify all of the neurons corresponding to objects and animals the model renders, the researchers say. Some rules also appear edit-proof; some changes the researchers tried to make failed to execute.

Still, the tool has immediate applications in computer graphics, where GANs are widely studied, and in training expert AI systems to recognize rare features and events through data augmentation. The tool also brings researchers closer to understanding how GANs learn visual concepts with minimal human guidance. If the models learn by imitating what they see, forming associations in the process, they may be a springboard for new kinds of machine learning applications. 

The study’s other authors are Steven Liu, Tongzhou Wang, and Jun-Yan Zhu.

Data systems that learn to be better

Big data has gotten really, really big: By 2025, all the world’s data will add up to an estimated 175 trillion gigabytes. For a visual, if you stored that amount of data on DVDs, the stack would be long enough to circle the Earth 222 times.

One of the biggest challenges in computing is handling this onslaught of information while still being able to efficiently store and process it. A team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) believes that the answer rests with something called “instance-optimized systems.”

Traditional storage and database systems are designed to work for a wide range of applications because of how long it can take to build them — months or, often, several years. As a result, for any given workload such systems provide performance that is good, but usually not the best. Even worse, they sometimes require administrators to painstakingly tune the system by hand to provide even reasonable performance. 

In contrast, the goal of instance-optimized systems is to build systems that optimize and partially re-organize themselves for the data they store and the workload they serve. 

“It’s like building a database system for every application from scratch, which is not economically feasible with traditional system designs,” says MIT Professor Tim Kraska. 

As a first step toward this vision, Kraska and colleagues developed Tsunami and Bao. Tsunami uses machine learning to automatically re-organize a dataset’s storage layout based on the types of queries that its users make. Tests show that it can run queries up to 10 times faster than state-of-the-art systems. What’s more, its datasets can be organized via a series of “learned indexes” that are up to 100 times smaller than the indexes used in traditional systems. 
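The intuition behind a learned index can be shown with a toy example (a sketch only, with invented data; Tsunami’s actual models and storage layouts are far more sophisticated): instead of walking a B-tree over a sorted column, fit a small model that predicts roughly where a key sits, then correct the guess with a bounded local search.

    import numpy as np

    # A sorted column of unique keys, standing in for data on disk.
    rng = np.random.default_rng(1)
    keys = np.sort(rng.choice(1_000_000, size=100_000, replace=False))

    # "Learned index": a single linear model approximating the key -> position mapping,
    # plus the worst-case prediction error so lookups stay correct.
    positions = np.arange(len(keys))
    slope, intercept = np.polyfit(keys, positions, deg=1)
    max_err = int(np.max(np.abs(slope * keys + intercept - positions))) + 1

    def lookup(key):
        """Predict the position, then binary-search only a small error window."""
        guess = int(slope * key + intercept)
        lo, hi = max(0, guess - max_err), min(len(keys), guess + max_err + 1)
        i = lo + np.searchsorted(keys[lo:hi], key)
        return i if i < len(keys) and keys[i] == key else None

    print(lookup(int(keys[12_345])) == 12_345)   # True

In this toy version the entire index is two floats and an error bound, which is the kind of compactness that gives learned indexes their space advantage over traditional tree structures.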

Kraska has been exploring the topic of learned indexes for several years, going back to his influential work with colleagues at Google in 2017. 

Harvard University Professor Stratos Idreos, who was not involved in the Tsunami project, says that a unique advantage of learned indexes is their small size, which, in addition to space savings, brings substantial performance improvements.

“I think this line of work is a paradigm shift that’s going to impact system design long-term,” says Idreos. “I expect approaches based on models will be one of the core components at the heart of a new wave of adaptive systems.”

Bao, meanwhile, focuses on improving the efficiency of query optimization through machine learning. A query optimizer rewrites a high-level declarative query into a query plan, which can actually be executed over the data to compute the query’s result. However, there is often more than one query plan that can answer a given query; picking the wrong one can cause a query to take days, rather than seconds, to compute its answer.

Traditional query optimizers take years to build, are very hard to maintain, and, most importantly, do not learn from their mistakes. Bao is the first learning-based approach to query optimization that has been fully integrated into the popular database management system PostgreSQL. Lead author Ryan Marcus, a postdoc in Kraska’s group, says that Bao produces query plans that run up to 50 percent faster than those created by the PostgreSQL optimizer, meaning that it could help to significantly reduce the cost of cloud services, like Amazon’s Redshift, that are based on PostgreSQL.

By fusing the two systems together, Kraska hopes to build the first instance-optimized database system that can provide the best possible performance for each individual application without any manual tuning. 

The goal is to not only relieve developers from the daunting and laborious process of tuning database systems, but to also provide performance and cost benefits that are not possible with traditional systems.

Traditionally, the systems we use to store data offer only a few storage options and, as a result, cannot provide the best possible performance for a given application. Tsunami, by contrast, dynamically changes the structure of the data storage based on the kinds of queries it receives, creating ways of storing data that are not feasible with more traditional approaches.

Johannes Gehrke, a managing director at Microsoft Research who also heads up machine learning efforts for Microsoft Teams, says that the work opens up many interesting applications, such as doing so-called “multidimensional queries” in main-memory data warehouses. Harvard’s Idreos also expects the project to spur further work on how to maintain the good performance of such systems when new data and new kinds of queries arrive.

Bao is short for “bandit optimizer,” a play on words related to the so-called “multi-armed bandit” analogy, in which a gambler tries to maximize their winnings at multiple slot machines that have different rates of return. The multi-armed bandit problem arises in any situation that trades off exploring multiple options against exploiting a single one, from risk optimization to A/B testing.
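Stripped to its essentials, that bandit loop looks like the sketch below (illustrative only; the candidate plans and their latencies are invented, and this is not Bao’s actual algorithm): each candidate plan is an arm, and the optimizer balances re-using the fastest plan seen so far against occasionally trying the others.

    import random

    # Hypothetical candidate plans for one query, with unknown true latencies (seconds).
    PLANS = ["hash_join", "merge_join", "nested_loop"]
    TRUE_LATENCY = {"hash_join": 1.0, "merge_join": 1.4, "nested_loop": 6.0}  # made up

    def run_plan(plan):
        # Stand-in for executing the plan and measuring how long it took.
        return max(0.0, random.gauss(TRUE_LATENCY[plan], 0.1))

    observed = {p: [] for p in PLANS}

    def choose_plan(epsilon=0.1):
        """Epsilon-greedy bandit: mostly exploit the fastest plan so far,
        occasionally explore another one."""
        untried = [p for p in PLANS if not observed[p]]
        if untried:
            return random.choice(untried)
        if random.random() < epsilon:
            return random.choice(PLANS)
        return min(PLANS, key=lambda p: sum(observed[p]) / len(observed[p]))

    for _ in range(200):
        plan = choose_plan()
        observed[plan].append(run_plan(plan))

    best = min(PLANS, key=lambda p: sum(observed[p]) / len(observed[p]))
    print(best)   # almost certainly "hash_join"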

“Query optimizers have been around for years, but they often make mistakes, and usually they don’t learn from them,” says Kraska. “That’s where we feel that our system can make key breakthroughs, as it can quickly learn for the given data and workload what query plans to use and which ones to avoid.”

Kraska says that in contrast to other learning-based approaches to query optimization, Bao learns much faster and can outperform open-source and commercial optimizers with as little as one hour of training time. In the future, his team aims to integrate Bao into cloud systems to improve resource utilization in environments where disk, RAM, and CPU time are scarce resources.

“Our hope is that a system like this will enable much faster query times, and that people will be able to answer questions they hadn’t been able to answer before,” says Kraska.

A related paper about Tsunami was co-written by Kraska, PhD students Jialin Ding and Vikram Nathan, and MIT Professor Mohammad Alizadeh. A paper about Bao was co-written by Kraska, Marcus, PhD students Parimarjan Negi and Hongzi Mao, visiting scientist Nesime Tatbul, and Alizadeh.

The work was done as part of the Data System and AI Lab (DSAIL@CSAIL), which is sponsored by Intel, Google, Microsoft, and the U.S. National Science Foundation. 

Read More

Shrinking deep learning’s carbon footprint

In June, OpenAI unveiled the largest language model in the world, a text-generating tool called GPT-3 that can write creative fiction, translate legalese into plain English, and answer obscure trivia questions. It’s the latest feat of intelligence achieved by deep learning, a machine learning method patterned after the way neurons in the brain process and store information.

But it came at a hefty price: at least $4.6 million and 355 years in computing time, assuming the model was trained on a standard neural network chip, or GPU. The model’s colossal size — 1,000 times larger than a typical language model — is the main factor in its high cost.
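Those two figures are consistent with a simple back-of-envelope calculation (the per-hour price below is inferred from the article’s numbers rather than stated anywhere, and real training runs many GPUs in parallel):

    gpu_years = 355
    gpu_hours = gpu_years * 365 * 24          # about 3.1 million GPU-hours
    implied_price = 4_600_000 / gpu_hours     # roughly $1.50 per GPU-hour
    print(f"{gpu_hours:,} GPU-hours at about ${implied_price:.2f}/hour")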

“You have to throw a lot more computation at something to get a little improvement in performance,” says Neil Thompson, an MIT researcher who has tracked deep learning’s unquenchable thirst for computing. “It’s unsustainable. We have to find more efficient ways to scale deep learning or develop other technologies.”

Some of the excitement over AI’s recent progress has shifted to alarm. In a study last year, researchers at the University of Massachusetts at Amherst estimated that training a large deep-learning model produces 626,000 pounds of planet-warming carbon dioxide, equal to the lifetime emissions of five cars. As models grow bigger, their demand for computing is outpacing improvements in hardware efficiency. Chips specialized for neural-network processing, like GPUs (graphics processing units) and TPUs (tensor processing units), have offset the demand for more computing, but not by enough. 

“We need to rethink the entire stack — from software to hardware,” says Aude Oliva, MIT director of the MIT-IBM Watson AI Lab and co-director of the MIT Quest for Intelligence. “Deep learning has made the recent AI revolution possible, but its growing cost in energy and carbon emissions is untenable.”

Computational limits have dogged neural networks from their earliest incarnation — the perceptron — in the 1950s. As computing power exploded, and the internet unleashed a tsunami of data, they evolved into powerful engines for pattern recognition and prediction. But each new milestone brought an explosion in cost, as data-hungry models demanded increased computation. GPT-3, for example, trained on half a trillion words and ballooned to 175 billion parameters — the learned values, or weights, that tie the model together — making it 100 times bigger than its predecessor, itself just a year old.

In work posted on the pre-print server arXiv, Thompson and his colleagues show that the ability of deep learning models to surpass key benchmarks tracks their nearly exponential rise in computing power use. (Like others seeking to track AI’s carbon footprint, the team had to guess at many models’ energy consumption due to a lack of reporting requirements). At this rate, the researchers argue, deep nets will survive only if they, and the hardware they run on, become radically more efficient.

Toward leaner, greener algorithms

The human perceptual system is extremely efficient at using data. Researchers have borrowed this idea to make models for recognizing actions, in video and in real life, more compact. In a paper at the European Conference on Computer Vision (ECCV) in August, researchers at the MIT-IBM Watson AI Lab describe a method for unpacking a scene from a few glances, as humans do, by cherry-picking the most relevant data.

Take a video clip of someone making a sandwich. Under the method outlined in the paper, a policy network strategically picks frames of the knife slicing through roast beef, and meat being stacked on a slice of bread, to represent at high resolution. Less-relevant frames are skipped over or represented at lower resolution. A second model then uses the abbreviated CliffsNotes version of the movie to label it “making a sandwich.” The approach leads to faster video classification at half the computational cost of the next-best model, the researchers say.
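In spirit, the pipeline resembles the sketch below (a simplified PyTorch illustration with random tensors standing in for video features, not the lab’s released code): a lightweight policy scores cheap previews of every frame, only the top-scoring frames are processed at full resolution, and a classifier labels the clip from that reduced set.

    import torch
    import torch.nn as nn

    class FrameSelector(nn.Module):
        """Scores cheap previews of each frame and keeps only the top-k
        for expensive full-resolution processing."""
        def __init__(self, preview_dim=64, k=4):
            super().__init__()
            self.k = k
            self.scorer = nn.Sequential(nn.Linear(preview_dim, 32), nn.ReLU(), nn.Linear(32, 1))

        def forward(self, previews):                  # previews: (num_frames, preview_dim)
            scores = self.scorer(previews).squeeze(-1)
            topk = torch.topk(scores, k=self.k).indices
            return torch.sort(topk).values            # keep temporal order

    class ClipClassifier(nn.Module):
        """Labels the clip from the selected frames' full-resolution features."""
        def __init__(self, feat_dim=512, num_classes=10):
            super().__init__()
            self.head = nn.Linear(feat_dim, num_classes)

        def forward(self, selected_feats):            # (k, feat_dim)
            return self.head(selected_feats.mean(dim=0))

    # Toy usage: random tensors stand in for 32 preview and full-resolution frame features.
    previews, full_feats = torch.randn(32, 64), torch.randn(32, 512)
    selector, classifier = FrameSelector(), ClipClassifier()
    idx = selector(previews)                          # indices of the 4 selected frames
    logits = classifier(full_feats[idx])
    print(idx.tolist(), logits.shape)                 # 4 indices, torch.Size([10])

Because picking the top-k frames is not differentiable, selection policies like this are typically trained with reinforcement learning or a smoothed relaxation of the selection step.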

“Humans don’t pay attention to every last detail — why should our models?” says the study’s senior author, Rogerio Feris, research manager at the MIT-IBM Watson AI Lab. “We can use machine learning to adaptively select the right data, at the right level of detail, to make deep learning models more efficient.”

In a complementary approach, researchers are using deep learning itself to design more economical models through an automated process known as neural architecture search. Song Han, an assistant professor at MIT, has used automated search to design models with fewer weights for language understanding and for scene recognition, where quickly picking out looming obstacles is acutely important in driving applications.

In a paper at ECCV, Han and his colleagues propose a model architecture for three-dimensional scene recognition that can spot safety-critical details like road signs, pedestrians, and cyclists with relatively less computation. They used an evolutionary-search algorithm to evaluate 1,000 architectures before settling on a model they say is three times faster and uses eight times less computation than the next-best method. 

In another recent paper, they use evolutionary search within an augmented design space to find the most efficient architectures for machine translation on a specific device, be it a GPU, smartphone, or tiny Raspberry Pi. Separating the search and training process leads to huge reductions in computation, they say.
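Stripped of the real search spaces, accuracy predictors, and hardware cost models, an evolutionary architecture search follows a simple loop, sketched below (purely illustrative; the knobs and the scoring function are invented stand-ins):

    import random

    # Hypothetical search space: depth, channel width, and convolution kernel size.
    SPACE = {"depth": [8, 12, 16, 20], "width": [32, 48, 64, 96], "kernel": [3, 5, 7]}

    def sample():
        return {k: random.choice(v) for k, v in SPACE.items()}

    def mutate(arch):
        child = dict(arch)
        knob = random.choice(list(SPACE))
        child[knob] = random.choice(SPACE[knob])
        return child

    def score(arch):
        # Stand-in for "accuracy minus latency"; a real search would evaluate a trained
        # (or weight-shared) model and measure latency on the target device here.
        capacity = arch["depth"] * arch["width"]
        accuracy_proxy = 1.0 - 1.0 / (1.0 + capacity / 500.0)
        latency_proxy = capacity * arch["kernel"] ** 2 / 50_000.0
        return accuracy_proxy - latency_proxy

    population = [sample() for _ in range(20)]
    for _ in range(50):
        parents = sorted(population, key=score, reverse=True)[:5]        # keep the fittest
        population = parents + [mutate(random.choice(parents)) for _ in range(15)]

    print(max(population, key=score))    # e.g. a mid-sized net with small kernels

A real search replaces the scoring stand-in with measured accuracy and device latency, which is where most of the computational cost goes and where separating search from training pays off.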

In a third approach, researchers are probing the essence of deep nets to see if it might be possible to train just a small part of even hyper-efficient networks like those above. In their lottery ticket hypothesis, PhD student Jonathan Frankle and MIT Professor Michael Carbin propose that within each model lies a tiny subnetwork that could have been trained in isolation with as few as one-tenth as many weights — what they call a “winning ticket.”

They showed that an algorithm could retroactively find these winning subnetworks in small image-classification models. Now, in a paper at the International Conference on Machine Learning (ICML), they show that the algorithm finds winning tickets in large models, too; the models just need to be rewound to an early, critical point in training when the order of the training data no longer influences the training outcome. 
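The procedure behind those experiments (train, prune the smallest-magnitude weights, rewind the survivors, and repeat) can be sketched in a few lines; the training step is left abstract, and this is not the authors’ code:

    import numpy as np

    def find_winning_ticket(init_weights, train_fn, rounds=5, prune_frac=0.2, rewind=None):
        """Iterative magnitude pruning with rewinding.

        init_weights: initial weight vector
        train_fn:     function (weights, mask) -> trained weights (left abstract here)
        rewind:       weights from an early point in training to rewind to;
                      defaults to the original initialization
        """
        rewind = init_weights if rewind is None else rewind
        mask = np.ones_like(init_weights)
        weights = init_weights.copy()
        for _ in range(rounds):
            trained = train_fn(weights, mask)
            # Prune the smallest surviving weights by magnitude.
            surviving = np.abs(trained[mask == 1])
            threshold = np.quantile(surviving, prune_frac)
            mask = np.where((np.abs(trained) > threshold) & (mask == 1), 1.0, 0.0)
            # Rewind the surviving weights rather than keeping their trained values.
            weights = rewind * mask
        return mask, weights

    # Toy run with a fake "training" step that just adds noise.
    rng = np.random.default_rng(0)
    w0 = rng.standard_normal(1000)
    fake_train = lambda w, m: (w + 0.1 * rng.standard_normal(w.shape)) * m
    mask, ticket = find_winning_ticket(w0, fake_train)
    print(f"{mask.mean():.0%} of weights remain")   # roughly (0.8)**5, about 33%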

In less than two years, the lottery ticket idea has been cited more than 400 times, including by Facebook researcher Ari Morcos, who has shown that winning tickets can be transferred from one vision task to another, and that winning tickets exist in language and reinforcement learning models, too.

“The standard explanation for why we need such large networks is that overparameterization aids the learning process,” says Morcos. “The lottery ticket hypothesis disproves that — it’s all about finding an appropriate starting point. The big downside, of course, is that, currently, finding these ‘winning’ starting points requires training the full overparameterized network anyway.”

Frankle says he’s hopeful that an efficient way to find winning tickets will be found. In the meantime, recycling those winning tickets, as Morcos suggests, could lead to big savings.

Hardware designed for efficient deep net algorithms

As deep nets push classical computers to the limit, researchers are pursuing alternatives, from optical computers that transmit and store data with photons instead of electrons, to quantum computers, which have the potential to increase computing power exponentially by representing data in multiple states at once.

Until a new paradigm emerges, researchers have focused on adapting the modern chip to the demands of deep learning. The trend began with the discovery that video-game graphical chips, or GPUs, could turbocharge deep-net training with their ability to perform massively parallelized matrix computations. GPUs are now one of the workhorses of modern AI, and have spawned new ideas for boosting deep net efficiency through specialized hardware. 

Much of this work hinges on finding ways to store and reuse data locally, across the chip’s processing cores, rather than waste time and energy shuttling data to and from a designated memory site. Processing data locally not only speeds up model training but improves inference, allowing deep learning applications to run more smoothly on smartphones and other mobile devices.

Vivienne Sze, a professor at MIT, has literally written the book on efficient deep nets. In collaboration with book co-author Joel Emer, an MIT professor and researcher at NVIDIA, Sze has designed a chip that’s flexible enough to process the widely varying shapes of both large and small deep learning models. Called Eyeriss 2, the chip uses 10 times less energy than a mobile GPU.

Its versatility lies in its on-chip network, called a hierarchical mesh, that adaptively reuses data and adjusts to the bandwidth requirements of different deep learning models. After reading from memory, it reuses the data across as many processing elements as possible to minimize data transportation costs and maintain high throughput. 

“The goal is to translate small and sparse networks into energy savings and fast inference,” says Sze. “But the hardware should be flexible enough to also efficiently support large and dense deep neural networks.”

Other hardware innovators are focused on reproducing the brain’s energy efficiency. Former Go world champion Lee Sedol may have lost his title to a computer, but his performance was fueled by a mere 20 watts of power. AlphaGo, by contrast, burned an estimated megawatt of power, roughly 50,000 times more.

Inspired by the brain’s frugality, researchers are experimenting with replacing the binary, on-off switch of classical transistors with analog devices that mimic the way that synapses in the brain grow stronger and weaker during learning and forgetting.

An electrochemical device, developed at MIT and recently published in Nature Communications, is modeled after the way resistance between two neurons grows or subsides as calcium, magnesium, or potassium ions flow across the synaptic membrane dividing them. The device uses the flow of protons — the smallest and fastest ion in the solid state — into and out of a crystalline lattice of tungsten trioxide to tune its resistance along a continuum, in an analog fashion.

“Even though it is not yet optimized, it gets to the order of energy consumption per unit area per unit change in conductance that’s close to that in the brain,” says the study’s senior author, Bilge Yildiz, a professor at MIT.

Energy-efficient algorithms and hardware can shrink AI’s environmental impact. But there are other reasons to innovate, says Sze, listing them off: Efficiency will allow computing to move from data centers to edge devices like smartphones, making AI accessible to more people around the world; shifting computation from the cloud to personal devices reduces the flow, and potential leakage, of sensitive data; and processing data on the edge eliminates transmission costs, leading to faster inference with a shorter reaction time, which is key for interactive driving and augmented/virtual reality applications.

“For all of these reasons, we need to embrace efficient AI,” she says.

Read More

3 Questions: John Leonard on the future of autonomous vehicles

As part of the MIT Task Force on the Work of the Future’s new series of research briefs, Professor John Leonard teamed with professor of aeronautics and astronautics and of history David Mindell and with doctoral candidate Erik Stayton to explore the future of autonomous vehicles (AV), an area that has arguably become a touchstone in recent years for discussions about the jobs of the future. Leonard is the Samuel C. Collins Professor of Mechanical and Ocean Engineering in the Department of Mechanical Engineering, a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), and a member of the MIT Task Force on the Work of the Future. His research addresses navigation and mapping for autonomous mobile robots operating in challenging environments.

Their research brief, “Autonomous Vehicles, Mobility, and Employment Policy: The Roads Ahead,” looks at how the AV transition will affect jobs and explores how sustained investments in workforce training for advanced mobility can help drivers and other mobility workers transition into new careers that support mobility systems and technologies. It also highlights the policies that will greatly ease the integration of automated systems into urban mobility systems, including investing in local and national infrastructure and forming public-private partnerships. Leonard spoke recently about some of the findings in the brief.

Q: When would you say Level 4 autonomous vehicle systems — those that can operate without active supervision by a human driver — increase their area of operation beyond today’s limited local deployments?

A: The widespread deployment of Level 4 automated vehicles will take much longer than many have predicted — at least a decade for favorable environments, and possibly much longer. Despite substantial recent progress by the community, major challenges remain before we will see the disruptive rollout of fully automated driving systems that operate over large areas with no safety driver onboard. Expansion will likely be gradual, and will happen region-by-region in specific categories of transportation, resulting in wide variations in availability across the country. The key question is not just when, but where, the technology will be available and profitable.

Driver assistance and active safety systems (known as Level 2 automation) will continue to become more widespread on personal vehicles. These systems, however, will have limited impacts on jobs, since a human driver must be on board and ready to intervene at any moment. Level 3 systems can operate without active engagement by the driver in certain geographic settings, so long as the driver is ready to intervene when requested; however, these systems will likely be restricted to low-speed traffic.

Impacts on trucking are also expected to be less than many have predicted, due to technological challenges and risks that remain, even for more structured highway environments.

Q: In the brief, you make the argument that AV transition, while threatening numerous jobs, will not be “jobless.” Can you explain?  What are the likely impacts to mobility jobs — including transit, vehicle sales, vehicle maintenance, delivery, and other related industries?

A: The longer rollout time for Level 4 autonomy provides time for sustained investments in workforce training that can help drivers and other mobility workers transition into new careers that support mobility systems and technologies. Transitioning from current-day driving jobs to these new roles represents a potential pathway for employment, so long as job-training resources are available. Because the geographical rollout of Level 4 automated driving is expected to be slow, human workers will remain essential to the operation of these systems for the foreseeable future, in roles that are both old and new.

In some cases, Level 4 remote driving systems could move driving jobs from vehicles to fixed-location centers, but these might represent a step down in job quality for many professional drivers. The skills required for these jobs are largely unknown, but they are likely to combine call-center, dispatcher, technician, and maintenance roles with strong language skills. More advanced engineering roles could also be sources of good jobs if automated taxi fleets are deployed at scale, but will require strong technical training that may be out of reach for many.

Increasing availability of Level 2 and Level 3 systems will change the nature of work for professional drivers, but will not necessarily affect job numbers to the extent that other systems might, because these systems do not remove drivers from vehicles.

While the employment implications of widespread Level 4 automation in trucking could eventually be considerable, as with other domains, the rollout is expected to be gradual. Truck drivers do more than just drive, and so human presence within even highly automated trucks would remain valuable for other reasons such as loading, unloading, and maintenance. Human-autonomous truck platooning, in which multiple Level 4 trucks follow a human-driven lead truck, may be more viable than completely operator-free Level 4 operations in the near term.  

Q: How should we prepare policy in the three key areas of infrastructure, jobs, and innovation? 

A: Policymakers can act now to prepare for and minimize disruptions to the millions of jobs in ground transportation and related industries that may come in the future, while also fostering greater economic opportunity and mitigating environmental impacts by building safe and accessible mobility systems. Investing in local and national infrastructure, and forming public-private partnerships, will greatly ease integration of automated systems into urban mobility systems.  

Automated vehicles should be thought of as one element in a mobility mix, and as a potential feeder for public transit rather than a replacement for it, but unintended consequences such as increased congestion remain risks. The crucial role of public transit for connecting workers to workplaces will endure: the future of work depends in large part on how people get to work.

Policy recommendations in the trucking sector include strengthening career pathways for drivers, increasing labor standards and worker protections, advancing public safety, creating good jobs via human-led truck platooning, and promoting safe and electric trucks.

Read More