Q&A: Chris Rackauckas on the equations at the heart of practically everything

Some people pass the time with hobbies like crossword puzzles or Sudoku. When Chris Rackauckas has a spare moment, he often uses it to answer questions about numerical differential equations that people have posed online. Rackauckas — previously an MIT applied mathematics instructor, now an MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) research affiliate and the co-principal investigator of the MIT Julia Lab — has already posted thousands of these answers, and if you have a question, the odds are that he has already addressed it. His research, unsurprisingly, revolves around differential equations and computational methods — using AI and other techniques — to solve them quickly and efficiently.

During his graduate studies in mathematics at the University of California at Irvine, which culminated in a PhD in 2018, Rackauckas focused on medical and pharmacological applications of his work. In fact, he developed the core software and techniques for Pumas-AI — a Baltimore-based firm that provides software for pharmaceutical modeling and simulation purposes — when he was still a graduate student. He now serves as the company’s director of scientific research.

Since coming to MIT in 2019, Rackauckas has found a much wider range of applications for his “accelerated” differential equation solvers, including global climate modeling and building heating, ventilation, and air conditioning (HVAC) systems. He took time from his efforts to find ever-more rapid ways of attacking differential equations to talk about this work, which has earned him numerous honors, including the 2020 United States Air Force Artificial Intelligence Accelerator Scientific Excellence Award.

Q: How did you get into what you’re doing today?

A: As an undergraduate math major at Oberlin College, I mostly focused on the “methods courses” in scientific domains — statistical methods in psychology, time series econometrics, computational modeling in physics, and so forth. I didn’t have a well-thought-out game plan. I just wanted to understand how science is really done and how we know when our scientific approaches are giving us a correct model of a given system. Fortuitously, that path turned out to be a good one for someone in my current line of work.

In graduate school, I went into biology — specifically combining differential equation solvers with systems biology. The goal there was to make predictive models of how the randomness of a chemical, and its concentration, changes in the body, although at the time I was working with zebrafish. It turns out that systems biology is very close to systems pharmacology. You basically replace fish with humans.

Q: Why are differential equations so important in the world around us?

A: The way I like to describe it is that all scientific experiments are measuring how something changes. How do I go from an understanding of how things change to a prediction of what will happen? That’s what the process of solving a differential equation is all about. Simulations, which are experiments that we carry out on computers, can involve solving thousands upon thousands of differential equations.

Such a simulation might tell you, for instance, not only how a drug concentration changes over time but also how the effects of the drug on the body change. It’s not the same for every person, so you have to adapt the equations for individuals, depending on their age, weight, etc.
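To make that concrete, here is a minimal sketch in Python of the simplest such model: a one-compartment pharmacokinetic equation, in which drug concentration decays at a rate set by patient-specific clearance and volume parameters. The numbers are hypothetical, and the production tools Rackauckas builds (such as Pumas, written in Julia) are far more elaborate.

```python
from scipy.integrate import solve_ivp

def dcdt(t, c, CL, V):
    # One-compartment elimination: dC/dt = -(CL / V) * C
    return -(CL / V) * c

# Hypothetical patient parameters: clearance (L/h) and volume (L)
patients = {"patient A": (5.0, 40.0), "patient B": (8.0, 55.0)}
for name, (CL, V) in patients.items():
    sol = solve_ivp(dcdt, (0.0, 24.0), [10.0], args=(CL, V), dense_output=True)
    # Predicted concentration 12 hours after a 10 mg/L initial dose
    print(name, float(sol.sol(12.0)[0]))
```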

Q: Given your focus on “accelerated” equation solvers, where can you find the best opportunities for speeding things up?

A: The clinical trials for a new drug have a set period of time; you can’t just make the human element faster. But in the preclinical domain, there’s always a period of analysis. Developing a new drug could cost $10 billion, so before you start something like that, you want to know the probability that a drug will work on its target population, as well as the optimal dose for an individual. That’s the purpose of preclinical analysis and quantitative systems pharmacology. Suppose that you typically spend three months on analysis and six months on clinical trials. If you can shorten that analysis from three months to a day — roughly a 100-fold acceleration — you will have cut the time to release a drug by a third.

Then there’s clinical pharmacology, where if you can understand how to get the first dose correct you might be able to save time on repeating elements of the trials. It turns out that my Pumas colleagues and I have already achieved a 175-fold acceleration in preclinical analyses carried out for Pfizer. Moderna has also publicly used Pumas and our methods in its clinical analysis of the Covid-19 vaccine and other drugs.

Here’s another opportunity for time and cost savings: Mitsubishi has a facility in Japan for testing HVAC systems. You have to build the entire system and then test it in a building. Each experiment can cost millions of dollars. We’re now working with them to test, say, 10 different ideas on a computer in order to pick the one that ought to go forward to a prototype and subsequent experiments.

Q: Can you discuss some other examples of how your work is used?

A: The SciML.ai website keeps a (woefully incomplete) showcase of the amazing ways people have used these methods. CliMA — an Earth system model developed by scientists at Caltech, MIT, and other institutions — relies on the differential equation solvers that I wrote. Recently I was at an applied math conference where a group, independent of me, reported that they had used my software tools to make NASA launch simulations run 15,000 times faster.

Q: What are your plans for the future?

A: There are a lot of things in the pipeline. One application I’ve just started to pursue is predicting the flow of wildfires; another is to predict transient cardiac events like heart attacks, strokes, and arrhythmias. A third area I’m moving into is the realm of neuropsychopharmacology — trying to predict things like the individualized biosignals in bipolar disorder, depression, and schizophrenia in order to design drugs that are better suited for treating these disorders. This is an area of dire need, where better models could lead to much more effective treatments.

In between these projects, I might take a moment to answer the odd question about differential equations. You’ve got to relax sometime.


Unpacking black-box models

Modern machine-learning models, such as neural networks, are often referred to as “black boxes” because they are so complex that even the researchers who design them can’t fully understand how they make predictions.

To provide some insights, researchers use explanation methods that seek to describe individual model decisions. For example, they may highlight words in a movie review that influenced the model’s decision that the review was positive.

But these explanation methods don’t do any good if humans can’t easily understand them, or if they misunderstand them. So, MIT researchers created a mathematical framework to formally quantify and evaluate the understandability of explanations for machine-learning models. This can help pinpoint insights about model behavior that might be missed if the researcher is only evaluating a handful of individual explanations to try to understand the entire model.

“With this framework, we can have a very clear picture of not only what we know about the model from these local explanations, but more importantly what we don’t know about it,” says Yilun Zhou, an electrical engineering and computer science graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of a paper presenting this framework.

Zhou’s co-authors include Marco Tulio Ribeiro, a senior researcher at Microsoft Research, and senior author Julie Shah, a professor of aeronautics and astronautics and the director of the Interactive Robotics Group in CSAIL. The research will be presented at the Conference of the North American Chapter of the Association for Computational Linguistics.

Understanding local explanations

One way to understand a machine-learning model is to find another model that mimics its predictions but uses transparent reasoning patterns. However, recent neural network models are so complex that this technique usually fails. Instead, researchers resort to using local explanations that focus on individual inputs. Often, these explanations highlight words in the text to signify their importance to one prediction made by the model.

Implicitly, people then generalize these local explanations to overall model behavior. Someone may see that a local explanation method highlighted positive words (like “memorable,” “flawless,” or “charming”) as being the most influential when the model decided a movie review had a positive sentiment. They are then likely to assume that all positive words make positive contributions to a model’s predictions, but that might not always be the case, Zhou says.

The researchers developed a framework, known as ExSum (short for explanation summary), that formalizes those types of claims into rules that can be tested using quantifiable metrics. ExSum evaluates a rule on an entire dataset, rather than just the single instance for which it is constructed.

Using a graphical user interface, an individual writes rules that can then be tweaked, tuned, and evaluated. For example, when studying a model that learns to classify movie reviews as positive or negative, one might write a rule that says “negation words have negative saliency,” which means that words like “not,” “no,” and “nothing” contribute negatively to the sentiment of movie reviews.

Using ExSum, the user can see if that rule holds up using three specific metrics: coverage, validity, and sharpness. Coverage measures how broadly applicable the rule is across the entire dataset. Validity is the percentage of individual examples that agree with the rule. Sharpness describes how precise the rule is; a highly valid rule could be so generic that it isn’t useful for understanding the model.
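As a schematic illustration (this is not the ExSum code itself, and the word list and saliency values are invented), coverage and validity for the negation-word rule could be computed like this, with sharpness additionally measuring how tightly the rule pins down the saliency values:

```python
# Each instance is a (word, saliency) pair taken from local explanations.
negation_words = {"not", "no", "nothing", "never"}

def applies(word):          # does the rule cover this instance?
    return word in negation_words

def satisfied(saliency):    # does the instance agree with the rule?
    return saliency < 0.0

def evaluate_rule(instances):
    covered = [(w, s) for w, s in instances if applies(w)]
    coverage = len(covered) / len(instances)
    validity = (sum(satisfied(s) for _, s in covered) / len(covered)
                if covered else 0.0)
    return coverage, validity

instances = [("not", -0.8), ("charming", 0.6), ("no", -0.3), ("no", 0.1)]
print(evaluate_rule(instances))  # -> (0.75, ~0.67)
```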

Testing assumptions

If a researcher seeks a deeper understanding of how her model is behaving, she can use ExSum to test specific assumptions, Zhou says.

If she suspects her model discriminates on the basis of gender, she could create rules stating that male pronouns have a positive contribution and female pronouns have a negative contribution. If these rules have high validity, it means they hold across the dataset and the model is likely biased.

ExSum can also reveal unexpected information about a model’s behavior. For example, when evaluating the movie review classifier, the researchers were surprised to find that negative words tend to have more pointed and sharper contributions to the model’s decisions than positive words. This could be due to review writers trying to be polite and less blunt when criticizing a film, Zhou explains.

“To really confirm your understanding, you need to evaluate these claims much more rigorously on a lot of instances. This kind of understanding at this fine-grained level, to the best of our knowledge, has never been uncovered in previous works,” he says.

“Going from local explanations to global understanding was a big gap in the literature. ExSum is a good first step at filling that gap,” adds Ribeiro.

Extending the framework

In the future, Zhou hopes to build upon this work by extending the notion of understandability to other criteria and explanation forms, like counterfactual explanations (which indicate how to modify an input to change the model prediction). For now, they focused on feature attribution methods, which describe the individual features a model used to make a decision (like the words in a movie review).

In addition, he wants to further enhance the framework and user interface so people can create rules faster. Writing rules can require hours of human involvement — and some level of human involvement is crucial because humans must ultimately be able to grasp the explanations — but AI assistance could streamline the process.

As he ponders the future of ExSum, Zhou hopes their work highlights a need to shift the way researchers think about machine-learning model explanations.

“Before this work, if you have a correct local explanation, you are done. You have achieved the holy grail of explaining your model. We are proposing this additional dimension of making sure these explanations are understandable. Understandability needs to be another metric for evaluating our explanations,” says Zhou.

This research is supported, in part, by the National Science Foundation.


Artificial intelligence system learns concepts shared across video, audio, and text

Humans observe the world through a combination of different modalities, like vision, hearing, and our understanding of language. Machines, on the other hand, interpret the world through data that algorithms can process.

So, when a machine “sees” a photo, it must encode that photo into data it can use to perform a task like image classification. This process becomes more complicated when inputs come in multiple formats, like videos, audio clips, and images.

“The main challenge here is, how can a machine align those different modalities? As humans, this is easy for us. We see a car and then hear the sound of a car driving by, and we know these are the same thing. But for machine learning, it is not that straightforward,” says Alexander Liu, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and first author of a paper tackling this problem. 

Liu and his collaborators developed an artificial intelligence technique that learns to represent data in a way that captures concepts which are shared between visual and audio modalities. For instance, their method can learn that the action of a baby crying in a video is related to the spoken word “crying” in an audio clip.

Using this knowledge, their machine-learning model can identify where a certain action is taking place in a video and label it.

It performs better than other machine-learning methods at cross-modal retrieval tasks, which involve finding a piece of data, like a video, that matches a user’s query given in another form, like spoken language. Their model also makes it easier for users to see why the machine thinks the video it retrieved matches their query.

This technique could someday be used to help robots learn about concepts in the world through perception, more like the way humans do.

Joining Liu on the paper are CSAIL postdoc SouYoung Jin; grad students Cheng-I Jeff Lai and Andrew Rouditchenko; Aude Oliva, senior research scientist in CSAIL and MIT director of the MIT-IBM Watson AI Lab; and senior author James Glass, senior research scientist and head of the Spoken Language Systems Group in CSAIL. The research will be presented at the Annual Meeting of the Association for Computational Linguistics.

Learning representations

The researchers focus their work on representation learning, which is a form of machine learning that seeks to transform input data to make it easier to perform a task like classification or prediction.

The representation learning model takes raw data, such as videos and their corresponding text captions, and encodes them by extracting features, or observations about objects and actions in the video. Then it maps those data points onto a grid, known as an embedding space. The model clusters similar data together as single points in the grid. Each of these data points, or vectors, is represented by an individual word.

For instance, a video clip of a person juggling might be mapped to a vector labeled “juggling.”

The researchers constrain the model so it can only use 1,000 words to label vectors. The model can decide which actions or concepts it wants to encode into a single vector, but it can only use 1,000 vectors. The model chooses the words it thinks best represent the data.

Rather than encoding data from different modalities onto separate grids, their method employs a shared embedding space where two modalities can be encoded together. This enables the model to learn the relationship between representations from two modalities, like video that shows a person juggling and an audio recording of someone saying “juggling.”

To help the system process data from multiple modalities, they designed an algorithm that guides the machine to encode similar concepts into the same vector.

“If there is a video about pigs, the model might assign the word ‘pig’ to one of the 1,000 vectors. Then if the model hears someone saying the word ‘pig’ in an audio clip, it should still use the same vector to encode that,” Liu explains.
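A conceptual sketch of this shared-codebook idea, assuming a PyTorch-style setup with stand-in encoder outputs (this is not the authors' implementation):

```python
import torch

dim, num_codes = 512, 1000
codebook = torch.nn.Embedding(num_codes, dim)   # the 1,000 shared vectors

def quantize(embedding):
    # embedding: (batch, dim) output of a video or audio encoder
    dists = torch.cdist(embedding, codebook.weight)  # (batch, num_codes)
    idx = dists.argmin(dim=1)                        # nearest code per item
    return codebook(idx), idx

video_emb = torch.randn(4, dim)   # stand-in for video encoder output
audio_emb = torch.randn(4, dim)   # stand-in for audio encoder output
v_codes, v_idx = quantize(video_emb)
a_codes, a_idx = quantize(audio_emb)
# Training would pull matched video/audio pairs toward the same code index,
# so a pig video and the spoken word "pig" land on the same vector.
```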

A better retriever

They tested the model on cross-modal retrieval tasks using three datasets: a video-text dataset with video clips and text captions, a video-audio dataset with video clips and spoken audio captions, and an image-audio dataset with images and spoken audio captions.

For example, in the video-audio dataset, the model chose 1,000 words to represent the actions in the videos. Then, when the researchers fed it audio queries, the model tried to find the clip that best matched those spoken words.

“Just like a Google search, you type in some text and the machine tries to tell you the most relevant things you are searching for. Only we do this in the vector space,” Liu says.

Not only was their technique more likely to find better matches than the models they compared it to, it is also easier to understand.

Because the model could only use 1,000 total words to label vectors, a user can more easily see which words the machine used to conclude that the video and spoken words are similar. This could make the model easier to apply in real-world situations where it is vital that users understand how it makes decisions, Liu says.

The model still has some limitations they hope to address in future work. For one, their research focused on data from two modalities at a time, but in the real world humans encounter many data modalities simultaneously, Liu says.

“And we know 1,000 words works on this kind of dataset, but we don’t know if it can be generalized to a real-world problem,” he adds.

Plus, the images and videos in their datasets contained simple objects or straightforward actions; real-world data are much messier. They also want to determine how well their method scales up when there is a wider diversity of inputs.

This research was supported, in part, by the MIT-IBM Watson AI Lab and its member companies, Nexplore and Woodside, and by the MIT Lincoln Laboratory.


A one-up on motion capture

From “Star Wars” to “Happy Feet,” many beloved films contain scenes that were made possible by motion capture technology, which records movement of objects or people through video. Further, applications for this tracking, which involve complicated interactions between physics, geometry, and perception, extend beyond Hollywood to the military, sports training, medical fields, and computer vision and robotics, allowing engineers to understand and simulate action happening within real-world environments.

As this can be a complex and costly process — often requiring markers placed on objects or people and recording the action sequence — researchers are working to shift the burden to neural networks, which could acquire this data from a simple video and reproduce it in a model. Work in physics simulations and rendering shows promise to make this more widely used, since it can characterize realistic, continuous, dynamic motion from images and transform back and forth between a 2D render and 3D scene in the world. However, to do so, current techniques require precise knowledge of the environmental conditions where the action is taking place, and the choice of renderer, both of which are often unavailable.

Now, a team of researchers from MIT and IBM has developed a trained neural network pipeline that avoids this issue, with the ability to infer the state of the environment and the actions happening, the physical characteristics of the object or person of interest (system), and its control parameters. When tested, the technique can outperform other methods in simulations of four physical systems of rigid and deformable bodies, which illustrate different types of dynamics and interactions, under various environmental conditions. Further, the methodology allows for imitation learning — predicting and reproducing the trajectory of a real-world, flying quadrotor from a video.

“The high-level research problem this paper deals with is how to reconstruct a digital twin from a video of a dynamic system,” says Tao Du PhD ’21, a postdoc in the Department of Electrical Engineering and Computer Science (EECS), a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), and a member of the research team. In order to do this, Du says, “we need to ignore the rendering variances from the video clips and try to grasp the core information about the dynamic system or the dynamic motion.”

Du’s co-authors include lead author Pingchuan Ma, a graduate student in EECS and a member of CSAIL; Josh Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in the Department of Brain and Cognitive Sciences and a member of CSAIL; Wojciech Matusik, professor of electrical engineering and computer science and CSAIL member; and MIT-IBM Watson AI Lab principal research staff member Chuang Gan. This work was presented this week at the International Conference on Learning Representations.

While capturing videos of characters, robots, or dynamic systems to infer dynamic movement makes this information more accessible, it also brings a new challenge. “The images or videos [and how they are rendered] depend largely on the lighting conditions, on the background info, on the texture information, on the material information of your environment, and these are not necessarily measurable in a real-world scenario,” says Du. Without this rendering configuration information or knowledge of which renderer is used, it’s presently difficult to glean dynamic information and predict behavior of the subject of the video. Even if the renderer is known, current neural network approaches still require large sets of training data. However, with their new approach, this can become a moot point. “If you take a video of a leopard running in the morning and in the evening, of course, you’ll get visually different video clips because the lighting conditions are quite different. But what you really care about is the dynamic motion: the joint angles of the leopard — not if they look light or dark,” Du says.

In order to take rendering domains and image differences out of the issue, the team developed a pipeline system containing a neural network, dubbed the “rendering invariant state-prediction” (RISP) network. RISP transforms differences in images (pixels) into differences in states of the system — i.e., the environment of action — making their method generalizable and agnostic to rendering configurations. RISP is trained using random rendering parameters and states, which are fed into a differentiable renderer, a type of renderer that measures the sensitivity of pixels with respect to rendering configurations, e.g., lighting or material colors. This generates a set of varied images and video from known ground-truth parameters, which later allows RISP to reverse the process, predicting the environment state from an input video. The team additionally minimized RISP’s rendering gradients, so that its predictions were less sensitive to changes in rendering configurations, allowing it to learn to forget about visual appearances and focus on learning dynamical states. This, too, is made possible by the differentiable renderer.

The method then uses two similar pipelines, run in parallel. One is for the source domain, with known variables. Here, system parameters and actions are entered into a differentiable simulation. The generated simulation’s states are combined with different rendering configurations into a differentiable renderer to generate images, which are fed into RISP. RISP then outputs predictions about the environmental states. At the same time, a similar target domain pipeline is run with unknown variables. RISP in this pipeline is fed these output images, generating a predicted state. When the predicted states from the source and target domains are compared, a new loss is produced; this difference is used to adjust and optimize some of the parameters in the source domain pipeline. This process can then be iterated on, further reducing the loss between the pipelines.
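In rough terms, that optimization loop might look like the following sketch, where the differentiable simulator, renderer, and trained RISP network are replaced by stand-in linear layers purely for illustration:

```python
import torch

# Placeholder stand-ins (the real pipeline uses a differentiable simulator,
# a differentiable renderer, and the trained RISP network):
sim_net = torch.nn.Linear(8, 16)      # "simulate": parameters -> states
render_net = torch.nn.Linear(16, 32)  # "render": states -> image features
risp_net = torch.nn.Linear(32, 16)    # "RISP": images -> predicted states
target_video = torch.randn(1, 32)     # stand-in for real video frames

params = torch.randn(1, 8, requires_grad=True)   # unknown system parameters
optimizer = torch.optim.Adam([params], lr=1e-2)

for step in range(200):
    states = sim_net(params)              # source domain: simulate
    images = render_net(states)           # render with known configuration
    pred_source = risp_net(images)        # RISP on the simulated video
    pred_target = risp_net(target_video)  # RISP on the real (target) video
    # Mismatch between the two predicted states drives the parameter update
    loss = torch.nn.functional.mse_loss(pred_source, pred_target.detach())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```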

To determine the success of their method, the team tested it in four simulated systems: a quadrotor (a flying rigid body that doesn’t have any physical contact), a cube (a rigid body that interacts with its environment, like a die), an articulated hand, and a rod (a deformable body that can move like a snake). The tasks included estimating the state of a system from an image, identifying the system parameters and action control signals from a video, and discovering the control signals from a target image that direct the system to the desired state. Additionally, they created baselines and an oracle, comparing the novel RISP process in these systems to similar methods that, for example, lack the rendering gradient loss, don’t train a neural network with any loss, or lack the RISP neural network altogether. The team also looked at how the gradient loss impacted the state prediction model’s performance over time. Finally, the researchers deployed their RISP system to infer the motion of a real-world quadrotor, which has complex dynamics, from video. They compared the performance to other techniques that lacked a loss function and used pixel differences, or one that included manual tuning of a renderer’s configuration.

In nearly all of the experiments, the RISP procedure outperformed similar or the state-of-the-art methods available, imitating or reproducing the desired parameters or motion, and proving to be a data-efficient and generalizable competitor to current motion capture approaches.

For this work, the researchers made two important assumptions: that information about the camera, such as its position and settings, is known, and that the geometry and physics governing the object or person being tracked are known. Future work is planned to address these limitations.

“I think the biggest problem we’re solving here is to reconstruct the information in one domain to another, without very expensive equipment,” says Ma. Such an approach should be “useful for [applications such as the] metaverse, which aims to reconstruct the physical world in a virtual environment,” adds Gan. “It is basically an everyday, available solution, that’s neat and simple, to cross-domain reconstruction or the inverse dynamics problem,” says Ma.

This research was supported, in part, by the MIT-IBM Watson AI Lab, Nexplore, DARPA Machine Common Sense program, Office of Naval Research (ONR), ONR MURI, and Mitsubishi Electric.


Engineers use artificial intelligence to capture the complexity of breaking waves

Waves break once they swell to a critical height, before cresting and crashing into a spray of droplets and bubbles. These waves can be as large as a surfer’s point break and as small as a gentle ripple rolling to shore. For decades, the dynamics of how and when a wave breaks have been too complex to predict.

Now, MIT engineers have found a new way to model how waves break. The team used machine learning along with data from wave-tank experiments to tweak equations that have traditionally been used to predict wave behavior. Engineers typically rely on such equations to help them design resilient offshore platforms and structures. But until now, the equations have not been able to capture the complexity of breaking waves.

The updated model made more accurate predictions of how and when waves break, the researchers found. For instance, the model estimated a wave’s steepness just before breaking, and its energy and frequency after breaking, more accurately than the conventional wave equations.

Their results, published today in the journal Nature Communications, will help scientists understand how a breaking wave affects the water around it. Knowing precisely how these waves interact can help hone the design of offshore structures. It can also improve predictions for how the ocean interacts with the atmosphere. Having better estimates of how waves break can help scientists predict, for instance, how much carbon dioxide and other atmospheric gases the ocean can absorb.

“Wave breaking is what puts air into the ocean,” says study author Themis Sapsis, an associate professor of mechanical and ocean engineering and an affiliate of the Institute for Data, Systems, and Society at MIT. “It may sound like a detail, but if you multiply its effect over the area of the entire ocean, wave breaking starts becoming fundamentally important to climate prediction.”

The study’s co-authors include lead author and MIT postdoc Debbie Eeltink, Hubert Branger and Christopher Luneau of Aix-Marseille University, Amin Chabchoub of Kyoto University, Jerome Kasparian of the University of Geneva, and T.S. van den Bremer of Delft University of Technology.

Learning tank

To predict the dynamics of a breaking wave, scientists typically take one of two approaches: They either attempt to precisely simulate the wave at the scale of individual molecules of water and air, or they run experiments to try to characterize waves with actual measurements. The first approach is computationally expensive and remains difficult even over a small area; the second requires a huge amount of time to run enough experiments to yield statistically significant results.

The MIT team instead borrowed pieces from both approaches to develop a more efficient and accurate model using machine learning. The researchers started with a set of equations that is considered the standard description of wave behavior. They aimed to improve the model by “training” it on data of breaking waves from actual experiments.

“We had a simple model that doesn’t capture wave breaking, and then we had the truth, meaning experiments that involve wave breaking,” Eeltink explains. “Then we wanted to use machine learning to learn the difference between the two.”

The researchers obtained wave breaking data by running experiments in a 40-meter-long tank. The tank was fitted at one end with a paddle which the team used to initiate each wave. The team set the paddle to produce a breaking wave in the middle of the tank. Gauges along the length of the tank measured the water’s height as waves propagated down the tank.

“It takes a lot of time to run these experiments,” Eeltink says. “Between each experiment you have to wait for the water to completely calm down before you launch the next experiment, otherwise they influence each other.”

Safe harbor

In all, the team ran about 250 experiments, the data from which they used to train a type of machine-learning algorithm known as a neural network. Specifically, the algorithm is trained to compare the real waves in experiments with the predicted waves in the simple model, and based on any differences between the two, the algorithm tunes the model to fit reality.
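A minimal sketch of that correction-learning setup, with an assumed toy architecture and invented tensor shapes rather than the authors' actual code:

```python
import torch

net = torch.nn.Sequential(              # learns the model-vs-experiment gap
    torch.nn.Linear(64, 128), torch.nn.Tanh(), torch.nn.Linear(128, 64))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

def train_step(simple_pred, measured):
    # simple_pred, measured: (batch, 64) snapshots of the wave surface
    correction = net(simple_pred)       # learned correction term
    loss = torch.nn.functional.mse_loss(simple_pred + correction, measured)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

simple = torch.randn(16, 64)                   # stand-in simple-model output
measured = simple + 0.1 * torch.randn(16, 64)  # stand-in tank measurements
print(train_step(simple, measured))
```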

After training the algorithm on their experimental data, the team introduced the model to entirely new data — in this case, measurements from two independent experiments, each run at separate wave tanks with different dimensions. In these tests, they found the updated model made more accurate predictions than the simple, untrained model, for instance making better estimates of a breaking wave’s steepness.

The new model also captured an essential property of breaking waves known as the “downshift,” in which the frequency of a wave is shifted to a lower value. The speed of a wave depends on its frequency. For ocean waves, lower frequencies move faster than higher frequencies. Therefore, after the downshift, the wave will move faster. The new model predicts the change in frequency, before and after each breaking wave, which could be especially relevant in preparing for coastal storms.

“When you want to forecast when high waves of a swell would reach a harbor, and you want to leave the harbor before those waves arrive, then if you get the wave frequency wrong, then the speed at which the waves are approaching is wrong,” Eeltink says.

The team’s updated wave model is in the form of an open-source code that others could potentially use, for instance in climate simulations of the ocean’s potential to absorb carbon dioxide and other atmospheric gases. The code can also be worked into simulated tests of offshore platforms and coastal structures.

“The number one purpose of this model is to predict what a wave will do,” Sapsis says. “If you don’t model wave breaking right, it would have tremendous implications for how structures behave. With this, you could simulate waves to help design structures better, more efficiently, and without huge safety factors.”

This research is supported, in part, by the Swiss National Science Foundation, and by the U.S. Office of Naval Research.


How can we reduce the carbon footprint of global computing?

The voracious appetite for energy from the world’s computers and communications technology presents a clear threat for the globe’s warming climate. That was the blunt assessment from presenters in the intensive two-day Climate Implications of Computing and Communications workshop held on March 3 and 4, hosted by MIT’s Climate and Sustainability Consortium (MCSC), MIT-IBM Watson AI Lab, and the Schwarzman College of Computing.

The virtual event featured rich discussions and highlighted opportunities for collaboration among an interdisciplinary group of MIT faculty and researchers and industry leaders across multiple sectors — underscoring the power of academia and industry coming together.

“If we continue with the existing trajectory of compute energy, by 2040, we are supposed to hit the world’s energy production capacity. The increase in compute energy and demand has been increasing at a much faster rate than the world energy production capacity increase,” said Bilge Yildiz, the Breene M. Kerr Professor in the MIT departments of Nuclear Science and Engineering and Materials Science and Engineering, one of the workshop’s 18 presenters. This computing energy projection draws from the Semiconductor Research Corporation’s decadal report.

To cite just one example: Information and communications technology already accounts for more than 2 percent of global energy demand, putting it on a par with the aviation industry’s emissions from fuel.

“We are at the very beginning of this data-driven world. We really need to start thinking about this and act now,” said presenter Evgeni Gousev, senior director at Qualcomm.

Innovative energy-efficiency options

To that end, the workshop presentations explored a host of energy-efficiency options, including specialized chip design, data center architecture, better algorithms, hardware modifications, and changes in consumer behavior. Industry leaders from AMD, Ericsson, Google, IBM, iRobot, NVIDIA, Qualcomm, Tertill, Texas Instruments, and Verizon outlined their companies’ energy-saving programs, while experts from across MIT provided insight into current research that could yield more efficient computing.

Panel topics ranged from “Custom hardware for efficient computing” to “Hardware for new architectures” to “Algorithms for efficient computing,” among others.

The goal, said Yildiz, is to improve energy efficiency associated with computing by more than a million-fold.

“I think part of the answer of how we make computing much more sustainable has to do with specialized architectures that have very high level of utilization,” said Darío Gil, IBM senior vice president and director of research, who stressed that solutions should be as “elegant” as possible.  

For example, Gil illustrated an innovative chip design that uses vertical stacking to reduce the distance data has to travel, and thus reduces energy consumption. Surprisingly, more effective use of tape — a traditional medium for primary data storage — combined with specialized hard drives (HDD), can yield dramatic savings in carbon dioxide emissions.

Gil and presenters Bill Dally, chief scientist and senior vice president of research of NVIDIA; Ahmad Bahai, CTO of Texas Instruments; and others zeroed in on storage. Gil compared data to a floating iceberg in which we can have fast access to the “hot data” of the smaller visible part while the “cold data,” the large underwater mass, represents data that tolerates higher latency. Think about digital photo storage, Gil said. “Honestly, are you really retrieving all of those photographs on a continuous basis?” Storage systems should provide an optimized mix of HDD for hot data and tape for cold data based on data access patterns.

Bahai stressed the significant energy savings gained from segmenting standby and full processing. “We need to learn how to do nothing better,” he said. Dally spoke of mimicking the way our brain wakes up from a deep sleep: “We can wake [computers] up much faster, so we don’t need to keep them running at full speed.”

Several workshop presenters spoke of a focus on “sparsity,” matrices in which most of the elements are zero, as a way to improve efficiency in neural networks. Or as Dally said, “Never put off till tomorrow, where you could put off forever,” explaining that efficiency is not “getting the most information with the fewest bits. It’s doing the most with the least energy.”

Holistic and multidisciplinary approaches

“We need both efficient algorithms and efficient hardware, and sometimes we need to co-design both the algorithm and the hardware for efficient computing,” said Song Han, a panel moderator and assistant professor in the Department of Electrical Engineering and Computer Science (EECS) at MIT.

Some presenters were optimistic about innovations already underway. According to Ericsson’s research, as much as 15 percent of global carbon emissions can be reduced through the use of existing solutions, noted Mats Pellbäck Scharp, head of sustainability at Ericsson. For example, GPUs are more efficient than CPUs for AI, and the progression from 3G to 5G networks boosts energy savings.

“5G is the most energy efficient standard ever,” said Scharp. “We can build 5G without increasing energy consumption.”

Companies such as Google are optimizing energy use at their data centers through improved design, technology, and renewable energy. “Five of our data centers around the globe are operating near or above 90 percent carbon-free energy,” said Jeff Dean, Google’s senior fellow and senior vice president of Google Research.

Yet, pointing to the possible slowdown in the doubling of transistors in an integrated circuit — or Moore’s Law — “We need new approaches to meet this compute demand,” said Sam Naffziger, AMD senior vice president, corporate fellow, and product technology architect. Naffziger spoke of addressing performance “overkill.” For example, “we’re finding in the gaming and machine learning space we can make use of lower-precision math to deliver an image that looks just as good with 16-bit computations as with 32-bit computations, and instead of legacy 32b math to train AI networks, we can use lower-energy 8b or 16b computations.”

Other presenters singled out compute at the edge as a prime energy hog.

“We also have to change the devices that are put in our customers’ hands,” said Heidi Hemmer, senior vice president of engineering at Verizon. As we think about how we use energy, it is common to jump to data centers — but it really starts at the device itself, and the energy that the devices use. Then, we can think about home web routers, distributed networks, the data centers, and the hubs. “The devices are actually the least energy-efficient out of that,” concluded Hemmer.

Some presenters had different perspectives. Several called for developing dedicated silicon chipsets for efficiency. However, panel moderator Muriel Medard, the Cecil H. Green Professor in EECS, described research at MIT, Boston University, and Maynooth University on the GRAND (Guessing Random Additive Noise Decoding) chip, saying, “rather than having obsolescence of chips as the new codes come in and in different standards, you can use one chip for all codes.”

Whatever the chip or new algorithm, Helen Greiner, CEO of Tertill (a weeding robot) and co-founder of iRobot, emphasized that to get products to market, “We have to learn to go away from wanting to get the absolute latest and greatest, the most advanced processor that usually is more expensive.” She added, “I like to say robot demos are a dime a dozen, but robot products are very infrequent.”

Greiner emphasized consumers can play a role in pushing for more energy-efficient products — just as drivers began to demand electric cars.

Dean also sees an environmental role for the end user.

“We have enabled our cloud customers to select which cloud region they want to run their computation in, and they can decide how important it is that they have a low carbon footprint,” he said, also citing other interfaces that might allow consumers to decide which air flights are more efficient or what impact installing a solar panel on their home would have.

However, Scharp said, “Prolonging the life of your smartphone or tablet is really the best climate action you can do if you want to reduce your digital carbon footprint.”

Facing increasing demands

Despite their optimism, the presenters acknowledged the world faces increasing compute demand from machine learning, AI, gaming, and especially, blockchain. Panel moderator Vivienne Sze, associate professor in EECS, noted the conundrum.

“We can do a great job in making computing and communication really efficient. But there is this tendency that once things are very efficient, people use more of it, and this might result in an overall increase in the usage of these technologies, which will then increase our overall carbon footprint,” Sze said.

Presenters saw great potential in academic/industry partnerships, particularly from research efforts on the academic side. “By combining these two forces together, you can really amplify the impact,” concluded Gousev.

Presenters at the Climate Implications of Computing and Communications workshop also included: Joel Emer, professor of the practice in EECS at MIT; David Perreault, the Joseph F. and Nancy P. Keithley Professor of EECS at MIT; Jesús del Alamo, MIT Donner Professor and professor of electrical engineering in EECS at MIT; Heike Riel, IBM Fellow and head of science and technology at IBM; and Takashi Ando, principal research staff member at IBM Research. The recorded workshop sessions are available on YouTube.


Aging Brain Initiative awards fund five new ideas to study, fight neurodegeneration

Neurodegenerative diseases are defined by an increasingly widespread and debilitating death of nervous system cells, but they also share other grim characteristics: Their cause is rarely discernible and they have all eluded cures. To spur fresh, promising approaches and to encourage new experts and expertise to join the field, MIT’s Aging Brain Initiative (ABI) this month awarded five seed grants after a competition among labs across the Institute.

Founded in 2015 by nine MIT faculty members, the ABI promotes research, symposia, and related activities to advance fundamental insights that can lead to clinical progress against neurodegenerative conditions, such as Alzheimer’s disease, with an age-related onset. With an emphasis on spurring research at an early stage before it is established enough to earn more traditional funding, the ABI derives support from philanthropic gifts.

“Solving the mysteries of how health declines in the aging brain and turning that knowledge into effective tools, treatments, and technologies is of the utmost urgency given the millions of people around the world who suffer with no meaningful treatment options,” says ABI director and co-founder Li-Huei Tsai, the Picower Professor of Neuroscience in The Picower Institute for Learning and Memory and the Department of Brain and Cognitive Sciences. “We were very pleased that many groups across MIT were eager to contribute their expertise and creativity to that goal. From here, five teams will be able to begin testing their innovative ideas and the impact they could have.”

To address the clinical challenge of accurately assessing cognitive decline during Alzheimer’s disease progression and healthy aging, a team led by Thomas Heldt, associate professor of electrical and biomedical engineering in the Department of Electrical Engineering and Computer Science (EECS) and the Institute for Medical Engineering and Science, proposes to use artificial intelligence tools to bring diagnostics based on eye movements during cognitive tasks to everyday consumer electronics such as smartphones and tablets. By moving these capabilities to common at-home platforms, the team, which also includes EECS Associate Professor Vivian Sze, hopes to increase monitoring beyond what can only be intermittently achieved with high-end specialized equipment and dedicated staffing in specialists’ offices. The team will pilot their technology in a small study at Boston Medical Center in collaboration with neurosurgeon James Holsapple.

Institute Professor Ann Graybiel’s lab in the Department of Brain and Cognitive Sciences (BCS) and the McGovern Institute for Brain Research will test the hypothesis that mutations on a specific gene may lead to the early emergence of Alzheimer’s disease (AD) pathology in the striatum. That’s a brain region crucial for motivation and movement that is directly and severely impacted by other neurodegenerative disorders, including Parkinson’s and Huntington’s diseases, but that has largely been unstudied in Alzheimer’s. By editing the mutations into normal and AD-modeling mice, Research Scientist Ayano Matsushima and Graybiel hope to determine whether and how pathology, such as the accumulation of amyloid proteins, may result. Determining that could provide new insight into the progression of disease and introduce a new biomarker in a region that virtually all other studies have overlooked.

Numerous recent studies have highlighted a potential role for immune inflammation in Alzheimer’s disease. A team led by Gloria Choi, the Mark Hyman Jr. Associate Professor in BCS and The Picower Institute for Learning and Memory, will track one potential source of such activity by determining whether the meninges, the membranes that envelop the brain, become a conduit for immune cells activated by gut bacteria to circulate near the brain, where they may release signaling molecules that promote Alzheimer’s pathology. Working in mice, Choi’s lab will test whether such activity is prone to increase in Alzheimer’s and whether it contributes to disease.

A collaboration led by Peter Dedon, the Singapore Professor in MIT’s Department of Biological Engineering, will explore whether Alzheimer’s pathology is driven by dysregulation of transfer RNAs (tRNAs) and the dozens of natural tRNA modifications in the epitranscriptome, which play a key role in the process by which proteins are assembled based on genetic instructions. With Benjamin Wolozin of Boston University, Sherif Rashad of Tohoku University in Japan, and Thomas Begley of the State University of New York at Albany, Dedon will assess how the tRNA pool and epitranscriptome may differ in Alzheimer’s model mice and whether genetic instructions mistranslated because of tRNA dysregulation play a role in Alzheimer’s disease.

With her seed grant, Ritu Raman, the d’Arbeloff Assistant Professor of Mechanical Engineering, is launching an investigation of possible disruption of intercellular messages in amyotrophic lateral sclerosis (ALS), a terminal condition in which the death of motor neurons causes loss of muscle control. Equipped with a new tool to finely sample interstitial fluid within tissues, Raman’s team will be able to monitor and compare cell-cell signaling in models of the junction between nerve and muscle. These models will be engineered from stem cells derived from patients with ALS. By studying biochemical signaling at the junction, the lab hopes to discover new targets that could be therapeutically modified.

Major support for the seed grants, which provide each lab with $100,000, came from generous gifts by David Emmes SM ’76; Kathleen SM ’77, PhD ’86 and Miguel Octavio; the Estate of Margaret A. Ridge-Pappis, wife of the late James Pappis ScD ’59; the Marc Haas Foundation; and the family of former MIT President Paul Gray ’54, SM ’55, ScD ‘60, with additional funding from many annual fund donors to the Aging Brain Initiative Fund.


Machine learning, harnessed to extreme computing, aids fusion energy development

MIT research scientists Pablo Rodriguez-Fernandez and Nathan Howard have just completed one of the most demanding calculations in fusion science — predicting the temperature and density profiles of a magnetically confined plasma via first-principles simulation of plasma turbulence. Solving this problem by brute force is beyond the capabilities of even the most advanced supercomputers. Instead, the researchers used an optimization methodology developed for machine learning to dramatically reduce the CPU time required while maintaining the accuracy of the solution.

Fusion energy

Fusion offers the promise of unlimited, carbon-free energy through the same physical process that powers the sun and the stars. It requires heating the fuel to temperatures above 100 million degrees, well above the point where the electrons are stripped from their atoms, creating a form of matter called plasma. On Earth, researchers use strong magnetic fields to isolate and insulate the hot plasma from ordinary matter. The stronger the magnetic field, the better the quality of the insulation that it provides.

Rodriguez-Fernandez and Howard have focused on predicting the performance expected in the SPARC device, a compact, high-magnetic-field fusion experiment, currently under construction by the MIT spin-out company Commonwealth Fusion Systems (CFS) and researchers from MIT’s Plasma Science and Fusion Center. While the calculation required an extraordinary amount of computer time, over 8 million CPU-hours, what was remarkable was not how much time was used, but how little, given the daunting computational challenge.

The computational challenge of fusion energy

Turbulence, which is the mechanism for most of the heat loss in a confined plasma, is one of science’s grand challenges and the greatest problem remaining in classical physics. The equations that govern fusion plasmas are well known, but analytic solutions are not possible in the regimes of interest, where nonlinearities are important and solutions encompass an enormous range of spatial and temporal scales. Scientists resort to solving the equations by numerical simulation on computers. It is no accident that fusion researchers have been pioneers in computational physics for the last 50 years.

One of the fundamental problems for researchers is reliably predicting plasma temperature and density given only the magnetic field configuration and the externally applied input power. In confinement devices like SPARC, the external power and the heat input from the fusion process are lost through turbulence in the plasma. The turbulence itself is driven by the difference in the extremely high temperature of the plasma core and the relatively cool temperatures of the plasma edge (merely a few million degrees). Predicting the performance of a self-heated fusion plasma therefore requires a calculation of the power balance between the fusion power input and the losses due to turbulence.

These calculations generally start by assuming plasma temperature and density profiles at a particular location, then computing the heat transported locally by turbulence. However, a useful prediction requires a self-consistent calculation of the profiles across the entire plasma, which includes both the heat input and turbulent losses. Directly solving this problem is beyond the capabilities of any existing computer, so researchers have developed an approach that stitches the profiles together from a series of demanding but tractable local calculations. This method works, but since the heat and particle fluxes depend on multiple parameters, the calculations can be very slow to converge.

However, techniques emerging from the field of machine learning are well suited to optimizing just such a calculation. Starting with a set of computationally intensive local calculations run with the full-physics, first-principles CGYRO code (provided by a team from General Atomics led by Jeff Candy), Rodriguez-Fernandez and Howard fit a surrogate mathematical model, which was used to explore and optimize a search within the parameter space. The results of the optimization were compared to the exact calculations at each optimum point, and the system was iterated to a desired level of accuracy. The researchers estimate that the technique reduced the number of runs of the CGYRO code by a factor of four.
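The general pattern — fit a cheap surrogate, optimize it, then verify with an expensive run and refine — can be sketched generically. In the toy below, a trivial function and a Gaussian-process surrogate stand in for CGYRO and the team's actual methodology:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_flux_calc(x):                   # toy stand-in for a CGYRO run
    return np.sin(3 * x) + 0.5 * x ** 2

X = np.linspace(-2, 2, 5).reshape(-1, 1)      # a few costly seed evaluations
y = np.array([expensive_flux_calc(v[0]) for v in X])

for _ in range(10):
    surrogate = GaussianProcessRegressor().fit(X, y)   # cheap surrogate model
    grid = np.linspace(-2, 2, 400).reshape(-1, 1)
    x_new = grid[surrogate.predict(grid).argmin()]     # optimize the surrogate
    y_new = expensive_flux_calc(x_new[0])              # verify with a real run
    X = np.vstack([X, [x_new]])                        # refine and iterate
    y = np.append(y, y_new)

print(X[y.argmin()][0], y.min())              # best parameter found
```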

New approach increases confidence in predictions

This work, described in a recent publication in the journal Nuclear Fusion, is the highest fidelity calculation ever made of the core of a fusion plasma. It refines and confirms predictions made with less demanding models. Professor Jonathan Citrin, of the Eindhoven University of Technology and leader of the fusion modeling group for DIFFER, the Dutch Institute for Fundamental Energy Research, commented: “The work significantly accelerates our capabilities in more routinely performing ultra-high-fidelity tokamak scenario prediction. This algorithm can help provide the ultimate validation test of machine design or scenario optimization carried out with faster, more reduced modeling, greatly increasing our confidence in the outcomes.” 

In addition to increasing confidence in the fusion performance of the SPARC experiment, this technique provides a roadmap to check and calibrate reduced physics models, which run with a small fraction of the computational power. Such models, cross-checked against the results generated from turbulence simulations, will provide a reliable prediction before each SPARC discharge, helping to guide experimental campaigns and improving the scientific exploitation of the device. It can also be used to tweak and improve even simple data-driven models, which run extremely quickly, allowing researchers to sift through enormous parameter ranges to narrow down possible experiments or possible future machines.

The research was funded by CFS, with computational support from the National Energy Research Scientific Computing Center, a U.S. Department of Energy Office of Science User Facility.


A smarter way to develop new drugs

Pharmaceutical companies are using artificial intelligence to streamline the process of discovering new medicines. Machine-learning models can propose new molecules that have specific properties which could fight certain diseases, doing in minutes what might take humans months to achieve manually.

But there’s a major hurdle that holds these systems back: The models often suggest new molecular structures that are difficult or impossible to produce in a laboratory. If a chemist can’t actually make the molecule, its disease-fighting properties can’t be tested.

A new approach from MIT researchers constrains a machine-learning model so it only suggests molecular structures that can be synthesized. The method guarantees that molecules are composed of materials that can be purchased and that the chemical reactions that occur between those materials follow the laws of chemistry.

When compared to other methods, their model proposed molecular structures that scored as high and sometimes better using popular evaluations, but were guaranteed to be synthesizable. Their system also takes less than one second to propose a synthetic pathway, while other methods that separately propose molecules and then evaluate their synthesizability can take several minutes. In a search space that can include billions of potential molecules, those time savings add up.

“This process reformulates how we ask these models to generate new molecular structures. Many of these models think about building new molecular structures atom by atom or bond by bond. Instead, we are building new molecules building block by building block and reaction by reaction,” says Connor Coley, the Henri Slezynger Career Development Assistant Professor in the MIT departments of Chemical Engineering and Electrical Engineering and Computer Science, and senior author of the paper.

Joining Coley on the paper are first author Wenhao Gao, a graduate student, and Rocío Mercado, a postdoc. The research is being presented this week at the International Conference on Learning Representations.

Building blocks

To create a molecular structure, the model simulates the process of synthesizing a molecule to ensure it can be produced.

The model is given a set of viable building blocks, which are chemicals that can be purchased, and a list of valid chemical reactions to work with. These chemical reaction templates are hand-made by experts. By restricting these inputs, allowing only certain chemicals or specific reactions, the researchers can limit the size of the search space for a new molecule.
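For a concrete sense of what such a template looks like, here is a minimal sketch using the open-source RDKit library (an illustration, not the authors' code). It encodes one hand-written template, an amide coupling, as a reaction SMARTS pattern and applies it to two purchasable building blocks; the particular template and molecules are illustrative assumptions.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hand-written reaction template: carboxylic acid + primary amine -> amide
template = AllChem.ReactionFromSmarts(
    "[C:1](=[O:2])[OX2H1].[NX3;H2:3]>>[C:1](=[O:2])[N:3]"
)

# Two purchasable building blocks (illustrative choices)
acid = Chem.MolFromSmiles("CC(=O)O")      # acetic acid
amine = Chem.MolFromSmiles("NCc1ccccc1")  # benzylamine

# Applying the template performs one synthesis step
products = template.RunReactants((acid, amine))
print(Chem.MolToSmiles(products[0][0]))   # N-benzylacetamide
```

Chaining such steps, with each product fed back in as a reactant alongside new building blocks, is what produces the synthesis tree described next.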

The model uses these inputs to build a tree by selecting building blocks and linking them through chemical reactions, one at a time, to build the final molecule. At each step, the molecule becomes more complex as additional chemicals and reactions are added.

It outputs both the final molecular structure and the tree of chemicals and reactions that would synthesize it.

“Instead of directly designing the product molecule itself, we design an action sequence to obtain that molecule. This allows us to guarantee the quality of the structure,” Gao says.

To train their model, the researchers input a complete molecular structure and a set of building blocks and chemical reactions, and the model learns to create a tree that synthesizes the molecule. After seeing hundreds of thousands of examples, the model learns to come up with these synthetic pathways on its own.

Molecule optimization

The trained model can be used for optimization. Researchers define certain properties they want to achieve in a final molecule, given certain building blocks and chemical reaction templates, and the model proposes a synthesizable molecular structure.

“What was surprising is what a large fraction of molecules you can actually reproduce with such a small template set. You don’t need that many building blocks to generate a large amount of available chemical space for the model to search,” says Mercado.

They tested the model by evaluating how well it could reconstruct synthesizable molecules. It was able to reproduce 51 percent of these molecules, and took less than a second to recreate each one.

Their technique is faster than some other methods because the model isn’t searching through all the options for each step in the tree. It has a defined set of chemicals and reactions to work with, Gao explains.

When they used their model to propose molecules with specific properties, their method suggested higher-quality molecular structures with stronger binding affinities than those from other methods. This means the molecules would be better able to attach to a protein and block a certain activity, like stopping a virus from replicating.

For instance, when proposing a molecule that could dock with SARS-CoV-2, their model suggested several molecular structures that may be better able to bind with viral proteins than existing inhibitors. As the authors acknowledge, however, these are only computational predictions.

“There are so many diseases to tackle,” Gao says. “I hope that our method can accelerate this process so we don’t have to screen billions of molecules each time for a disease target. Instead, we can just specify the properties we want and it can accelerate the process of finding that drug candidate.”

Their model could also improve existing drug discovery pipelines. If a company has identified a particular molecule that has desired properties but can’t be produced, it could use this model to propose synthesizable molecules that closely resemble the original, Mercado says.

Now that they have validated their approach, the team plans to continue improving the chemical reaction templates to further enhance the model’s performance. With additional templates, they can run more tests on certain disease targets and, eventually, apply the model to the drug discovery process.

“Ideally, we want algorithms that automatically design molecules and give us the synthesis tree at the same time, quickly,” says Marwin Segler, who leads a team working on machine learning for drug discovery at Microsoft Research Cambridge (UK), and was not involved with this work. “This elegant approach by Prof. Coley and team is a major step forward to tackle this problem. While there are earlier proof-of-concept works for molecule design via synthesis tree generation, this team really made it work. For the first time, they demonstrated excellent performance on a meaningful scale, so it can have practical impact in computer-aided molecular discovery.

“The work is also very exciting because it could eventually enable a new paradigm for computer-aided synthesis planning. It will likely be a huge inspiration for future research in the field.”

This research was supported, in part, by the U.S. Office of Naval Research and the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium. 

Estimating the informativeness of data

Not all data are created equal. But how much information is any piece of data likely to contain? This question is central to medical testing, designing scientific experiments, and even to everyday human learning and thinking. MIT researchers have developed a new way to solve this problem, opening up new applications in medicine, scientific discovery, cognitive science, and artificial intelligence.

In theory, the 1948 paper, “A Mathematical Theory of Communication,” by the late MIT Professor Emeritus Claude Shannon answered this question definitively. One of Shannon’s breakthrough results is the idea of entropy, which lets us quantify the amount of information inherent in any random object, including random variables that model observed data. Shannon’s results created the foundations of information theory and modern telecommunications. The concept of entropy has also proven central to computer science and machine learning.

The challenge of estimating entropy

Unfortunately, the use of Shannon’s formula can quickly become computationally intractable. It requires precisely calculating the probability of the data, which in turn requires calculating every possible way the data could have arisen under a probabilistic model. If the data-generating process is very simple — for example, a single toss of a coin or roll of a loaded die — then calculating entropies is straightforward. But consider the problem of medical testing, where a positive test result reflects hundreds of interacting variables, all unknown. With just 10 binary unknowns, there are already about 1,000 possible explanations for the data. With a few hundred, there are more possible explanations than atoms in the known universe, which makes calculating the entropy exactly an unmanageable problem.
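The contrast is easy to demonstrate. The short sketch below (an illustration, not the researchers' code) computes Shannon entropy exactly for a loaded die, where the outcome space is tiny, and then shows how the number of terms in the exact sum explodes with the number of binary unknowns.

```python
import math

# Exact entropy of a loaded die: easy, because the outcome space is tiny.
p_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
H = -sum(p * math.log2(p) for p in p_die)
print(f"loaded die: {H:.3f} bits")

# Exact entropy over n binary unknowns needs a sum over all 2**n joint
# outcomes -- manageable at n = 10, hopeless at a few hundred.
for n in (10, 50, 300):
    print(f"{n} unknowns -> {2**n:.3e} terms in the exact sum")
```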

MIT researchers have developed a new method to estimate good approximations to many information quantities, such as Shannon entropy, by using probabilistic inference. The work appears in a paper presented at AISTATS 2022 by authors Feras Saad ’16, MEng ’16, a PhD candidate in electrical engineering and computer science; Marco Cusumano-Towner PhD ’21; and Vikash Mansinghka ’05, MEng ’09, PhD ’09, a principal research scientist in the Department of Brain and Cognitive Sciences. The key insight is, rather than enumerating all explanations, to use probabilistic inference algorithms first to infer which explanations are probable, and then to use these probable explanations to construct high-quality entropy estimates. The paper shows that this inference-based approach can be much faster and more accurate than previous approaches.

Estimating entropy and information in a probabilistic model is fundamentally hard because it often requires solving a high-dimensional integration problem. Many previous works have developed estimators of these quantities for certain special cases, but the new estimators of entropy via inference (EEVI) offer the first approach that can deliver sharp upper and lower bounds on a broad set of information-theoretic quantities. Having both an upper and a lower bound means that although we don’t know the true entropy exactly, we can pin it between two computable numbers.
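To see why sampling can beat enumeration, consider the minimal sketch below. It is not the EEVI estimator itself (the paper's contribution is rigorous upper and lower bounds obtained via inference over latent variables); it only shows the underlying idea that entropy is an expectation, -E[log p(X)], so averaging over probable samples can estimate it without visiting every outcome. The mixture model here is an invented toy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented toy model: X is an equal mixture of two unit-variance Gaussians.
def p(x):
    norm = 1.0 / np.sqrt(2 * np.pi)
    return 0.5 * norm * np.exp(-0.5 * (x + 2) ** 2) + \
           0.5 * norm * np.exp(-0.5 * (x - 2) ** 2)

# Sample from the model: pick a latent component, then emit x.
z = rng.integers(0, 2, size=100_000)
x = rng.normal(loc=np.where(z == 0, -2.0, 2.0), scale=1.0)

# Monte Carlo estimate of the differential entropy H(X) = -E[log p(X)],
# built from samples rather than an integral over all outcomes.
H_est = -np.mean(np.log(p(x)))
print(f"estimated entropy ~ {H_est:.3f} nats")
```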

“The upper and lower bounds on entropy delivered by our method are particularly useful for three reasons,” says Saad. “First, the difference between the upper and lower bounds gives a quantitative sense of how confident we should be about the estimates. Second, by using more computational effort we can drive the difference between the two bounds to zero, which ‘squeezes’ the true value with a high degree of accuracy. Third, we can compose these bounds to form estimates of many other quantities that tell us how informative different variables in a model are of one another.”

Solving fundamental problems with data-driven expert systems

Saad says he is most excited about the possibility that this method gives for querying probabilistic models in areas like machine-assisted medical diagnoses. He says one goal of the EEVI method is to be able to solve new queries using rich generative models for things like liver disease and diabetes that have already been developed by experts in the medical domain. For example, suppose we have a patient with a set of observed attributes (height, weight, age, etc.) and observed symptoms (nausea, blood pressure, etc.). Given these attributes and symptoms, EEVI can be used to help determine which medical tests for symptoms the physician should conduct to maximize information about the absence or presence of a given liver disease (like cirrhosis or primary biliary cholangitis).

For insulin diagnosis, the authors showed how to use the method for computing optimal times to take blood glucose measurements that maximize information about a patient’s insulin sensitivity, given an expert-built probabilistic model of insulin metabolism and the patient’s personalized meal and medication schedule. As routine medical tracking like glucose monitoring moves away from doctor’s offices and toward wearable devices, there are even more opportunities to improve data acquisition, if the value of the data can be estimated accurately in advance.
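As a toy illustration of that kind of query, the sketch below scores candidate measurement times by the marginal entropy of the predicted reading; with fixed Gaussian measurement noise, the conditional entropy of the reading given the parameter is constant, so the time that maximizes the marginal entropy also maximizes the mutual information between reading and parameter. The decay model, parameter ranges, and noise level are all invented, not taken from the paper's insulin model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented toy model: a reading at time t decays at a rate set by a
# latent "sensitivity" s, plus fixed Gaussian measurement noise.
def reading(s, t):
    return 100 + 80 * np.exp(-s * t) + rng.normal(0, 5, size=s.shape)

s = rng.uniform(0.2, 2.0, size=50_000)     # prior draws of the parameter

def marginal_entropy(y, bins=60):
    # Histogram estimate of H(y); with fixed noise, maximizing H(y)
    # over t also maximizes the mutual information I(s; y).
    p, edges = np.histogram(y, bins=bins, density=True)
    widths = np.diff(edges)
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask]) * widths[mask])

times = np.linspace(0.1, 4.0, 20)
best_t = max(times, key=lambda t: marginal_entropy(reading(s, t)))
print(f"most informative measurement time ~ t = {best_t:.2f}")
```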

Vikash Mansinghka, senior author on the paper, adds, “We’ve shown that probabilistic inference algorithms can be used to estimate rigorous bounds on information measures that AI engineers often think of as intractable to calculate. This opens up many new applications. It also shows that inference may be more computationally fundamental than we thought. It also helps to explain how human minds might be able to estimate the value of information so pervasively, as a central building block of everyday cognition, and helps us engineer AI expert systems that have these capabilities.”

The paper, “Estimators of Entropy and Information via Inference in Probabilistic Models,” was presented at AISTATS 2022.
