From “cheetah-noids” to humanoids

In November 2018, MIT Professor Sangbae Kim brought his mini cheetah robot onto “The Tonight Show’s” Tonight Show-botics segment. Much to the delight of host Jimmy Fallon, the mini cheetah did some yoga, got back up after falling, and executed a perfect backflip. Backstage, Benjamin Katz ’16, SM ’18 was remotely controlling the cheetah’s nimble maneuvers.

For Katz, waiting in the wings as the robot performed in front of a national audience was the culmination of nearly five years of work.

As an undergraduate at MIT, Katz studied mechanical engineering, opting for the flexible Course 2A degree program with a concentration in controls, instrumentation, and robotics. Toward the end of his first year, he emailed Kim to see if there were any job opportunities in Kim’s Biomimetic Robotics Lab. He then spent the summer in Kim’s lab as part of the MIT Undergraduate Research Opportunities Program (UROP). For his UROP research and undergraduate thesis, he began to look at how to utilize pieces built for the electronics hobby market in robotics. “You can find really high-performance motors built for things like remote control airplanes and drones. I basically thought you could also use these parts for robots, which is something no one was doing,” recalls Katz.

Kim was immediately impressed by Katz’s abilities as an engineer and designer.

“Ben is an extremely versatile engineer who can cover structure and mechanism design, electric motor dynamics, power electronics, and classical control, a range of expertise usually requiring four-to-five engineers to cover,” says Kim.

After deciding to pursue a master’s degree in mechanical engineering at MIT, Katz continued working in Kim’s lab and developed solutions for actuators in robotics. While working on the third iteration of Kim’s robot, known as Cheetah 3, Katz and his labmates shifted their focus to developing a smaller version of the robot.

“There are a lot of nice things about having a smaller robot: If something breaks you can easily fix it, it’s cheaper, and it’s safe enough for one person to wrangle alone,” says Katz. “Even though a small robot may not always be the most practical for real-world applications, its controllers, software, and research can be trivially ported to a big robot that can carry larger payloads.”

Drawing upon his undergraduate research, Katz and the research team used 12 motors originally designed for drones to build actuators in each joint of the small quadruped robot that would be dubbed the “mini cheetah.”

Armed with this smaller robot, Katz set out to make the mini cheetah more agile and resilient. Alongside then-EECS student Jared Di Carlo ’19, MEng ’20, Katz focused on controls related to locomotion in the mini cheetah. In class 6.832 (Underactuated Robotics), taught by Professor Russ Tedrake, the pair worked on a project that would allow the mini cheetah to safely backflip from a crouched position.

“It was basically a giant offline optimization problem to get the mini cheetah to backflip,” says Katz.

Using offline nonlinear optimization to generate the backflip trajectory, he and Di Carlo were able to program the mini cheetah to crouch and rotate 360 degrees around an axis.

While working on the cheetah, Katz was constantly pursuing other engineering projects as a hobby. This included a very different rotating robot as a pet project. Alongside Di Carlo, Katz utilized the MIT community makerspace known as MITERS to develop a robot that could solve a Rubik’s Cube in a record-breaking 0.38 seconds.

“That project was purely for fun during MIT’s Independent Activities Period,” recalls Katz. “We used custom-built actuators on each of the Rubik’s Cube’s faces alongside webcams to identify the colors and move the blocks accordingly.”

He chronicled his other pet projects on his “build-its” blog, which developed a strong following. Projects included planar magnetic headphones, a desktop Furuta pendulum, and an electric travel ukulele.

“Ben was constantly building and analyzing something along with our lab and class projects during his entire time at MIT,” says Kim. “His incessant desire to learn, build, and analyze is quite remarkable.”

After graduating with his master’s degree in 2018, Katz worked as a technical associate in Kim’s lab before accepting a position at Boston Dynamics in 2019.

As a designer at Boston Dynamics, Katz has transitioned from cheetah robots to humanoid robots on ATLAS, a research platform billed as the “world’s most dynamic humanoid robot.” Much like the mini cheetah, ATLAS can execute incredibly dynamic maneuvers, including backflips and even parkour.

While the mini cheetah holding yoga poses and ATLAS doing parkour seem like entertainment befitting “The Tonight Show,” Katz is quick to remind others that these robots are fulfilling a real-world need. The robots could someday maneuver in areas that are too dangerous for humans — including buildings that are on fire and disaster areas. They could open new possibilities for lifesaving disaster relief and assist first responders in emergencies.

“What we did in Sangbae’s lab is going to help make these machines ubiquitous and actually useful in the real world as viable products,” adds Katz.

Machine learning speeds up vehicle routing

Waiting for a holiday package to be delivered? There’s a tricky math problem that needs to be solved before the delivery truck pulls up to your door, and MIT researchers have a strategy that could speed up the solution.

The approach applies to vehicle routing problems such as last-mile delivery, where the goal is to deliver goods from a central depot to multiple cities while keeping travel costs down. While there are algorithms designed to solve this problem for a few hundred cities, these solutions become too slow when applied to a larger set of cities.

To remedy this, Cathy Wu, the Gilbert W. Winslow Career Development Assistant Professor in Civil and Environmental Engineering and the Institute for Data, Systems, and Society, and her students have come up with a machine-learning strategy that accelerates some of the strongest algorithmic solvers by 10 to 100 times.

The solver algorithms work by breaking up the problem of delivery into smaller subproblems to solve — say, 200 subproblems for routing vehicles between 2,000 cities. Wu and her colleagues augment this process with a new machine-learning algorithm that identifies the most useful subproblems to solve, instead of solving all the subproblems, to increase the quality of the solution while using orders of magnitude less compute.
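
As a rough illustration of that decompose-and-delegate loop, the sketch below carves a synthetic 2,000-city instance into spatial subproblems and improves only the handful that a scoring function ranks highest. The grid decomposition, the scoring function, and the toy improvement step are stand-ins for illustration, not the researchers’ actual pipeline, in which the score comes from a trained neural network and the improvement step is a state-of-the-art routing solver.

```python
import numpy as np

rng = np.random.default_rng(0)
cities = rng.random((2000, 2))                 # synthetic coordinates for 2,000 cities
depot = np.array([0.5, 0.5])

# 1) Decompose: carve the instance into ~200 spatial subproblems (here, a 14x14 grid).
cells = np.clip((cities * 14).astype(int), 0, 13)
subproblems = {}
for idx, key in enumerate(map(tuple, cells)):
    subproblems.setdefault(key, []).append(idx)
subproblems = list(subproblems.values())

def route_length(order):
    path = np.vstack([depot, cities[order], depot])
    return np.linalg.norm(np.diff(path, axis=0), axis=1).sum()

def improve(sub):
    """Stand-in 'solver': order a subproblem's cities by angle around the depot."""
    rel = cities[sub] - depot
    return [sub[i] for i in np.argsort(np.arctan2(rel[:, 1], rel[:, 0]))]

def predicted_gain(sub):
    """Stand-in for the learned scorer: a hand-made guess at how much a subproblem helps."""
    return len(sub) * cities[sub].std()

# 2) Delegate: solve only the k subproblems the scorer ranks highest, not all ~200.
k = 20
for sub in sorted(subproblems, key=predicted_gain, reverse=True)[:k]:
    sub[:] = improve(sub)

route = [i for sub in subproblems for i in sub]
print(f"route length after delegating {k} of {len(subproblems)} subproblems: {route_length(route):.1f}")
```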

Their approach, which they call “learning-to-delegate,” can be used across a variety of solvers and a variety of similar problems, including scheduling and pathfinding for warehouse robots, the researchers say.

The work pushes the boundaries on rapidly solving large-scale vehicle routing problems, says Marc Kuo, founder and CEO of Routific, a smart logistics platform for optimizing delivery routes. Some of Routific’s recent algorithmic advances were inspired by Wu’s work, he notes.

“Most of the academic body of research tends to focus on specialized algorithms for small problems, trying to find better solutions at the cost of processing times. But in the real-world, businesses don’t care about finding better solutions, especially if they take too long for compute,” Kuo explains. “In the world of last-mile logistics, time is money, and you cannot have your entire warehouse operations wait for a slow algorithm to return the routes. An algorithm needs to be hyper-fast for it to be practical.”

Wu, social and engineering systems doctoral student Sirui Li, and electrical engineering and computer science doctoral student Zhongxia Yan presented their research this week at the 2021 NeurIPS conference.

Selecting good problems

Vehicle routing problems are a class of combinatorial optimization problems, which are typically tackled with heuristic algorithms that find “good-enough” solutions. It’s usually not possible to come up with the one “best” answer to these problems, because the number of possible solutions is far too large.

“The name of the game for these types of problems is to design efficient algorithms … that are optimal within some factor,” Wu explains. “But the goal is not to find optimal solutions. That’s too hard. Rather, we want to find as good of solutions as possible. Even a 0.5% improvement in solutions can translate to a huge revenue increase for a company.”

Over the past several decades, researchers have developed a variety of heuristics to yield quick solutions to combinatorial problems. They usually do this by starting with a poor but valid initial solution and then gradually improving the solution — by trying small tweaks to improve the routing between nearby cities, for example. For a large problem like a 2,000-plus city routing challenge, however, this approach just takes too much time.
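
A classic version of this improve-by-small-tweaks strategy is 2-opt local search, sketched below on a small synthetic instance (a generic textbook heuristic, not any particular solver from the study). Because every candidate tweak re-evaluates the whole tour, the cost grows quickly with the number of cities, which is exactly why this approach bogs down at the 2,000-city scale.

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.random((100, 2))               # 100 random cities; real instances have thousands

def length(tour):
    closed = tour + tour[:1]             # return to the starting city
    return np.linalg.norm(np.diff(pts[closed], axis=0), axis=1).sum()

tour = list(range(len(pts)))             # a poor but valid initial solution
improved = True
while improved:                          # keep applying small tweaks until none helps
    improved = False
    for i in range(1, len(tour) - 1):
        for j in range(i + 1, len(tour)):
            candidate = tour[:i] + tour[i:j][::-1] + tour[j:]   # 2-opt: reverse one segment
            if length(candidate) < length(tour) - 1e-12:
                tour, improved = candidate, True

print(f"tour length after 2-opt: {length(tour):.3f}")
```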

More recently, machine-learning methods have been developed to solve the problem, but while faster, they tend to be more inaccurate, even at the scale of a few dozen cities. Wu and her colleagues decided to see if there was a beneficial way to combine the two methods to find speedy but high-quality solutions.

“For us, this is where machine learning comes in,” Wu says. “Can we predict which of these subproblems, that if we were to solve them, would lead to more improvement in the solution, saving computing time and expense?”

Traditionally, a large-scale vehicle routing heuristic might choose which subproblems to solve, and in what order, either randomly or by applying yet another carefully devised heuristic. In this case, the MIT researchers ran sets of subproblems through a neural network they created to automatically find the subproblems that, when solved, would lead to the greatest gain in quality of the solutions. This sped up the selection of subproblems by 1.5 to 2 times, Wu and colleagues found.

“We don’t know why these subproblems are better than other subproblems,” Wu notes. “It’s actually an interesting line of future work. If we did have some insights here, these could lead to designing even better algorithms.”

Surprising speed-up

Wu and colleagues were surprised by how well the approach worked. In machine learning, the idea of garbage-in, garbage-out applies — that is, the quality of a machine-learning approach relies heavily on the quality of the data. A combinatorial problem is so difficult that even its subproblems can’t be optimally solved. A neural network trained on the “medium-quality” subproblem solutions available as the input data “would typically give medium-quality results,” says Wu. In this case, however, the researchers were able to leverage the medium-quality solutions to achieve high-quality results, significantly faster than state-of-the-art methods.

For vehicle routing and similar problems, users often must design very specialized algorithms to solve their specific problem. Some of these heuristics have been in development for decades.

The learning-to-delegate method offers an automatic way to accelerate these heuristics for large problems, no matter what the heuristic or — potentially — what the problem.

Since the method can work with a variety of solvers, it may be useful for a variety of resource allocation problems, says Wu. “We may unlock new applications that now will be possible because the cost of solving the problem is 10 to 100 times less.”

The research was supported by the MIT Indonesia Seed Fund, the U.S. Department of Transportation Dwight David Eisenhower Transportation Fellowship Program, and the MIT-IBM Watson AI Lab.

Machine-learning system flags remedies that might do more harm than good

Sepsis claims the lives of nearly 270,000 people in the U.S. each year. The unpredictable medical condition can progress rapidly, leading to a swift drop in blood pressure, tissue damage, multiple organ failure, and death.

Prompt interventions by medical professionals save lives, but some sepsis treatments can also contribute to a patient’s deterioration, so choosing the optimal therapy can be a difficult task. For instance, in the early hours of severe sepsis, administering too much fluid intravenously can increase a patient’s risk of death.

To help clinicians avoid remedies that may potentially contribute to a patient’s death, researchers at MIT and elsewhere have developed a machine-learning model that could be used to identify treatments that pose a higher risk than other options. Their model can also warn doctors when a septic patient is approaching a medical dead end — the point when the patient will most likely die no matter what treatment is used — so that they can intervene before it is too late.

When applied to a dataset of sepsis patients in a hospital intensive care unit, the researchers’ model indicated that about 12 percent of treatments given to patients who died were detrimental. The study also reveals that about 3 percent of patients who did not survive entered a medical dead end up to 48 hours before they died.

“We see that our model is almost eight hours ahead of a doctor’s recognition of a patient’s deterioration. This is powerful because in these really sensitive situations, every minute counts, and being aware of how the patient is evolving, and the risk of administering certain treatment at any given time, is really important,” says Taylor Killian, a graduate student in the Healthy ML group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Joining Killian on the paper are his advisor, Assistant Professor Marzyeh Ghassemi, head of the Healthy ML group and senior author; lead author Mehdi Fatemi, a senior researcher at Microsoft Research; and Jayakumar Subramanian, a senior research scientist at Adobe India. The research is being presented at this week’s Conference on Neural Information Processing Systems.  

A dearth of data

This research project was spurred by a 2019 paper Fatemi wrote that explored the use of reinforcement learning in situations where it is too dangerous to explore arbitrary actions, which makes it difficult to generate enough data to effectively train algorithms. These situations, where more data cannot be proactively collected, are known as “offline” settings.

In reinforcement learning, the algorithm is trained through trial and error and learns to take actions that maximize its accumulation of reward. But in a health care setting, it is nearly impossible to generate enough data for these models to learn the optimal treatment, since it isn’t ethical to experiment with possible treatment strategies.

So, the researchers flipped reinforcement learning on its head. They used the limited data from a hospital ICU to train a reinforcement learning model to identify treatments to avoid, with the goal of keeping a patient from entering a medical dead end.

Learning what to avoid is a more statistically efficient approach that requires fewer data, Killian explains.

“When we think of dead ends in driving a car, we might think that is the end of the road, but you could probably classify every foot along that road toward the dead end as a dead end. As soon as you turn away from another route, you are in a dead end. So, that is the way we define a medical dead end: Once you’ve gone on a path where whatever decision you make, the patient will progress toward death,” Killian says.

“One core idea here is to decrease the probability of selecting each treatment in proportion to its chance of forcing the patient to enter a medical dead-end — a property that is called treatment security. This is a hard problem to solve as the data do not directly give us such an insight. Our theoretical results allowed us to recast this core idea as a reinforcement learning problem,” Fatemi says.

To develop their approach, called Dead-end Discovery (DeD), they created two copies of a neural network. The first neural network focuses only on negative outcomes — when a patient died — and the second network only focuses on positive outcomes — when a patient survived. Using two neural networks separately enabled the researchers to detect a risky treatment in one and then confirm it using the other.

They fed each neural network patient health statistics and a proposed treatment. The networks output an estimated value of that treatment and also evaluate the probability that the patient will enter a medical dead end. The researchers compared those estimates against set thresholds to see whether the situation raised any flags.

A yellow flag means that a patient is entering an area of concern while a red flag identifies a situation where it is very likely the patient will not recover.
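
Schematically, the flagging logic might look like the sketch below, in which two stand-in networks score a proposed treatment and simple thresholds turn those scores into flags. The feature layout, the fake “networks,” and the threshold values are illustrative assumptions, not the published DeD model.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, TREAT_DIM = 8, 2        # illustrative: a few vitals/labs plus (IV fluid, vasopressor) doses

# Stand-ins for the two trained networks: one fit only on trajectories that ended in
# death, one fit only on trajectories that ended in survival.
w_death = rng.normal(size=STATE_DIM + TREAT_DIM)
w_survive = rng.normal(size=STATE_DIM + TREAT_DIM)

def dead_end_prob(state, treatment):
    """'Negative-outcome' network stand-in: chance this treatment pushes toward a dead end."""
    x = np.concatenate([state, treatment])
    return 1.0 / (1.0 + np.exp(-x @ w_death))

def recovery_value(state, treatment):
    """'Positive-outcome' network stand-in: estimated value of the treatment for survival."""
    x = np.concatenate([state, treatment])
    return np.tanh(x @ w_survive)

YELLOW, RED = 0.4, 0.7             # illustrative thresholds, not the paper's

def flag(state, treatment):
    p = dead_end_prob(state, treatment)
    v = recovery_value(state, treatment)
    if p > RED and v < -0.5:       # both networks agree the treatment looks very risky
        return "red"
    if p > YELLOW and v < 0.0:     # area of concern
        return "yellow"
    return "ok"

state = rng.normal(size=STATE_DIM)                     # current (standardized) patient measurements
for dose in ([0.0, 0.0], [1.0, 0.0], [1.0, 1.0]):
    print(dose, flag(state, np.asarray(dose)))
```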

Treatment matters

The researchers tested their model using a dataset of patients presumed to be septic from the Beth Israel Deaconess Medical Center intensive care unit. This dataset contains about 19,300 admissions with observations drawn from a 72-hour period centered around when the patients first manifested symptoms of sepsis. Their results confirmed that some patients in the dataset encountered medical dead ends.

The researchers also found that 20 to 40 percent of patients who did not survive raised at least one yellow flag prior to their death, and many raised that flag at least 48 hours before they died. The results also showed that, when comparing the trends of patients who survived versus patients who died, once a patient raises their first flag, there is a very sharp deviation in the value of administered treatments. The window of time around the first flag is a critical point when making treatment decisions.

“This helped us confirm that treatment matters and the treatment deviates in terms of how patients survive and how patients do not. We found that upward of 11 percent of suboptimal treatments could have potentially been avoided because there were better alternatives available to doctors at those times. This is a pretty substantial number, when you consider the worldwide volume of patients who have been septic in the hospital at any given time,” Killian says.

Ghassemi is also quick to point out that the model is intended to assist doctors, not replace them.

“Human clinicians are who we want making decisions about care, and advice about what treatment to avoid isn’t going to change that,” she says. “We can recognize risks and add relevant guardrails based on the outcomes of 19,000 patient treatments — that’s equivalent to a single caregiver seeing more than 50 septic patient outcomes every day for an entire year.”

Moving forward, the researchers also want to estimate causal relationships between treatment decisions and the evolution of patient health. They plan to continue enhancing the model so it can create uncertainty estimates around treatment values that would help doctors make more informed decisions. Another way to provide further validation of the model would be to apply it to data from other hospitals, which they hope to do in the future.

This research was supported in part by Microsoft Research, a Canadian Institute for Advanced Research Azrieli Global Scholar Chair, a Canada Research Council Chair, and a Natural Sciences and Engineering Research Council of Canada Discovery Grant.

A tool to speed development of new solar cells

In the ongoing race to develop ever-better materials and configurations for solar cells, there are many variables that can be adjusted to try to improve performance, including material type, thickness, and geometric arrangement. Developing new solar cells has generally been a tedious process of making small changes to one of these parameters at a time. While computational simulators have made it possible to evaluate such changes without having to actually build each new variation for testing, the process remains slow.

Now, researchers at MIT and Google Brain have developed a system that makes it possible not just to evaluate one proposed design at a time, but to provide information about which changes will provide the desired improvements. This could greatly increase the rate of discovery of new, improved configurations.

The new system, called a differentiable solar cell simulator, is described in a paper published today in the journal Computer Physics Communications, written by MIT junior Sean Mann, research scientist Giuseppe Romano of MIT’s Institute for Soldier Nanotechnologies, and four others at MIT and at Google Brain.

Traditional solar cell simulators, Romano explains, take the details of a solar cell configuration and produce as their output a predicted efficiency — that is, what percentage of the energy of incoming sunlight actually gets converted to an electric current. But this new simulator both predicts the efficiency and shows how much that output is affected by any one of the input parameters. “It tells you directly what happens to the efficiency if we make this layer a little bit thicker, or what happens to the efficiency if we for example change the property of the material,” he says.

In short, he says, “we didn’t discover a new device, but we developed a tool that will enable others to discover more quickly other higher performance devices.” Using this system, “we are decreasing the number of times that we need to run a simulator to give quicker access to a wider space of optimized structures.” In addition, he says, “our tool can identify a unique set of material parameters that has been hidden so far because it’s very complex to run those simulations.”

While traditional approaches use essentially a random search of possible variations, Mann says, with his tool “we can follow a trajectory of change because the simulator tells you what direction you want to be changing your device. That makes the process much faster because instead of exploring the entire space of opportunities, you can just follow a single path” that leads directly to improved performance.
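
The toy sketch below captures that idea: a made-up, differentiable “efficiency” function stands in for the real device physics, and gradient ascent follows its derivatives to adjust two layer thicknesses. Only the use of automatic differentiation reflects the actual simulator; the formula, parameters, and step size are invented for illustration.

```python
import jax
import jax.numpy as jnp

# A made-up, differentiable "efficiency" model: a thicker absorber captures more light but
# loses more carriers to recombination, and a thick window layer shades the cell.
# The real simulator solves the device physics; only the use of gradients is the point here.
def efficiency(thicknesses_nm):
    absorber, window = thicknesses_nm
    absorption = 1.0 - jnp.exp(-absorber / 300.0)
    recombination = 0.15 * (absorber / 1000.0) ** 2
    shading = 0.05 * (window / 100.0)
    return 0.30 * absorption - recombination - shading

grad_efficiency = jax.grad(efficiency)                 # sensitivity to every design parameter

design = jnp.array([200.0, 80.0])                      # initial guess: 200 nm absorber, 80 nm window
for _ in range(300):
    design = design + 2e4 * grad_efficiency(design)    # follow the gradient uphill
    design = jnp.clip(design, 10.0, 2000.0)            # keep thicknesses physically plausible

print("optimized thicknesses (nm):", design)
print("predicted 'efficiency':", efficiency(design))
```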

Since advanced solar cells often are composed of multiple layers interlaced with conductive materials to carry electric charge from one to the other, this computational tool reveals how changing the relative thicknesses of these different layers will affect the device’s output. “This is very important because the thickness is critical. There is a strong interplay between light propagation and the thickness of each layer and the absorption of each layer,” Mann explains.

Other variables that can be evaluated include the amount of doping (the introduction of atoms of another element) that each layer receives, or the dielectric constant of insulating layers, or the bandgap, a measure of the energy levels of photons of light that can be captured by different materials used in the layers.

This simulator is now available as an open-source tool that can be used immediately to help guide research in this field, Romano says. “It is ready, and can be taken up by industry experts.” To make use of it, researchers would couple this device’s computations with an optimization algorithm, or even a machine learning system, to rapidly assess a wide variety of possible changes and home in quickly on the most promising alternatives.

At this point, the simulator is based on just a one-dimensional version of the solar cell, so the next step will be to expand its capabilities to include two- and three-dimensional configurations. But even this 1D version “can cover the majority of cells that are currently under production,” Romano says. Certain variations, such as so-called tandem cells using different materials, cannot yet be simulated directly by this tool, but “there are ways to approximate a tandem solar cell by simulating each of the individual cells,” Mann says.

The simulator is “end-to-end,” Romano says, meaning it computes the sensitivity of the efficiency, also taking into account light absorption. He adds: “An appealing future direction is composing our simulator with advanced existing differentiable light-propagation simulators, to achieve enhanced accuracy.”

Moving forward, Romano says, because this is an open-source code, “that means that once it’s up there, the community can contribute to it. And that’s why we are really excited.” Although this research group is “just a handful of people,” he says, now anyone working in the field can make their own enhancements and improvements to the code and introduce new capabilities.

“Differentiable physics is going to provide new capabilities for the simulations of engineered systems,” says Venkat Viswanathan, an associate professor of mechanical engineering at Carnegie Mellon University, who was not associated with this work. “The differentiable solar cell simulator is an incredible example of differentiable physics, that can now provide new capabilities to optimize solar cell device performance,” he says, calling the study “an exciting step forward.”

In addition to Mann and Romano, the team included Eric Fadel and Steven Johnson at MIT, and Samuel Schoenholz and Ekin Cubuk at Google Brain. The work was supported in part by Eni S.p.A. and the MIT Energy Initiative, and the MIT Quest for Intelligence.

Tiny machine learning design alleviates a bottleneck in memory usage on internet-of-things devices

Machine learning provides researchers with powerful tools to identify and predict patterns and behaviors, as well as learn, optimize, and perform tasks. Applications range from vision systems on autonomous vehicles and social robots, to smart thermostats, to wearable and mobile devices like smartwatches and apps that can monitor health changes. While these algorithms and their architectures are becoming more powerful and efficient, they typically require tremendous amounts of memory, computation, and data to train and make inferences.

At the same time, researchers are working to reduce the size and complexity of the devices that these algorithms can run on, all the way down to a microcontroller unit (MCU) that’s found in billions of internet-of-things (IoT) devices. An MCU is a memory-limited minicomputer housed in a compact integrated circuit that lacks an operating system and runs simple commands. These relatively cheap edge devices require little power, computing, and bandwidth, and offer many opportunities to inject AI technology to expand their utility, increase privacy, and democratize their use — a field called TinyML.

Now, an MIT team working in TinyML in the MIT-IBM Watson AI Lab and the research group of Song Han, assistant professor in the Department of Electrical Engineering and Computer Science (EECS), has designed a technique to shrink the amount of memory needed even further, while improving its performance on image recognition in live videos.

“Our new technique can do a lot more and paves the way for tiny machine learning on edge devices,” says Han, who designs TinyML software and hardware.

To increase TinyML efficiency, Han and his colleagues from EECS and the MIT-IBM Watson AI Lab analyzed how memory is used on microcontrollers running various convolutional neural networks (CNNs). CNNs are biologically inspired models patterned after neurons in the brain and are often applied to evaluate and identify visual features within imagery, like a person walking through a video frame. In their study, they discovered an imbalance in memory utilization, causing front-loading on the computer chip and creating a bottleneck. By developing a new inference technique and neural architecture, the team alleviated the problem and reduced peak memory usage by four to eight times. Further, the team deployed it on their own tinyML vision system, equipped with a camera and capable of human and object detection, creating its next generation, dubbed MCUNetV2. When compared to other machine learning methods running on microcontrollers, MCUNetV2 outperformed them with high accuracy on detection, opening the doors to additional vision applications not previously possible.

The results will be presented in a paper at the conference on Neural Information Processing Systems (NeurIPS) this week. The team includes Han, lead author and graduate student Ji Lin, postdoc Wei-Ming Chen, graduate student Han Cai, and MIT-IBM Watson AI Lab Research Scientist Chuang Gan.

A design for memory efficiency and redistribution

TinyML offers numerous advantages over deep machine learning that happens on larger devices, like remote servers and smartphones. These, Han notes, include privacy, since the data are not transmitted to the cloud for computing but processed on the local device; robustness, as the computing is quick and the latency is low; and low cost, because IoT devices cost roughly $1 to $2. Further, some larger, more traditional AI models can emit as much carbon as five cars in their lifetimes, require many GPUs, and cost billions of dollars to train. “So, we believe such TinyML techniques can enable us to go off-grid to save the carbon emissions and make the AI greener, smarter, faster, and also more accessible to everyone — to democratize AI,” says Han.

However, small MCU memory and digital storage limit AI applications, so efficiency is a central challenge. MCUs contain only 256 kilobytes of memory and 1 megabyte of storage. In comparison, mobile AI on smartphones and cloud computing may have 256 gigabytes and terabytes of storage, respectively, as well as 16,000 and 100,000 times more memory. Because memory is such a precious resource, the team wanted to optimize its use, so they profiled the MCU memory usage of CNN designs — a task that had been overlooked until now, Lin and Chen say.

Their findings revealed that memory usage peaked in the first five of the network’s roughly 17 convolutional blocks. Each block contains many connected convolutional layers, which help to filter for the presence of specific features within an input image or video, creating a feature map as the output. During this initial memory-intensive stage, most of the blocks operated beyond the 256KB memory constraint, offering plenty of room for improvement. To reduce the peak memory, the researchers developed a patch-based inference schedule, which operates on only a small fraction, roughly 25 percent, of the layer’s feature map at one time, before moving on to the next quarter, until the whole layer is done. This method saved four to eight times the memory of the previous layer-by-layer computational method, without any latency penalty.

“As an illustration, say we have a pizza. We can divide it into four chunks and only eat one chunk at a time, so you save about three-quarters. This is the patch-based inference method,” says Han. “However, this was not a free lunch.” Like photoreceptors in the human eye, the network can only take in and examine part of an image at a time; this receptive field is a patch of the total image or field of view. As the size of these receptive fields (or pizza slices, in this analogy) grows, there is increasing overlap, which amounts to redundant computation that the researchers found to be about 10 percent. The researchers proposed to also redistribute the neural network across the blocks, in parallel with the patch-based inference method, without losing any of the accuracy in the vision system. However, the question remained about which blocks needed the patch-based inference method and which could use the original layer-by-layer one, together with the redistribution decisions; hand-tuning all of these knobs was labor-intensive, and better left to AI.
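
The snippet below illustrates the core of patch-based inference on a single 3x3 convolution: the layer is computed in two horizontal strips, each padded with a one-row halo so the stitched result matches the all-at-once computation. The real MCUNetV2 schedule, block redistribution, and architecture search are considerably more involved; this is only a minimal sketch of the memory-saving idea.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 16, 64, 64)                    # input feature map
w = torch.randn(32, 16, 3, 3)                     # weights of a 3x3 convolution

full = F.conv2d(x, w, padding=1)                  # layer-by-layer: whole 64x64 map at once

# Patch-based inference: compute the same layer in two horizontal strips. Each strip
# carries a one-row "halo" of extra input because of the 3x3 receptive field; that
# halo is the small redundant computation described above.
h = x.shape[2] // 2
top = F.conv2d(x[:, :, : h + 1, :], w, padding=1)[:, :, :h, :]
bottom = F.conv2d(x[:, :, h - 1 :, :], w, padding=1)[:, :, 1:, :]
patched = torch.cat([top, bottom], dim=2)

# Same numbers either way; on an MCU, only one strip's activations would need to be
# resident in memory at a time, cutting the peak roughly in half.
print(torch.allclose(full, patched, atol=1e-5))   # True
```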

“We want to automate this process by doing a joint automated search for optimization, including both the neural network architecture, like the number of layers, number of channels, the kernel size, and also the inference schedule including number of patches, number of layers for patch-based inference, and other optimization knobs,” says Lin, “so that non-machine learning experts can have a push-button solution to improve the computation efficiency but also improve the engineering productivity, to be able to deploy this neural network on microcontrollers.”

A new horizon for tiny vision systems

The co-design of the network architecture with the neural network search optimization and inference scheduling provided significant gains and was adopted into MCUNetV2; it outperformed other vision systems in peak memory usage, and in image and object detection and classification. The MCUNetV2 device includes a small screen and a camera, and is about the size of an earbud case. Compared to the first version, the new version needed four times less memory for the same amount of accuracy, says Chen. When placed head-to-head against other tinyML solutions, MCUNetV2 was able to detect the presence of objects in image frames, like human faces, with an improvement of nearly 17 percent. Further, it set a record for accuracy, at nearly 72 percent, for thousand-class image classification on the ImageNet dataset, using 465KB of memory. The researchers also tested for what’s known as visual wake words, meaning how well their MCU vision model could identify the presence of a person within an image; even with the limited memory of only 30KB, it achieved greater than 90 percent accuracy, beating the previous state-of-the-art method. This means the method is accurate enough and could be deployed to help in, say, smart-home applications.

With the high accuracy and low energy utilization and cost, MCUNetV2’s performance unlocks new IoT applications. Due to their limited memory, Han says, vision systems on IoT devices were previously thought to be good only for basic image classification tasks, but their work has helped to expand the opportunities for TinyML use. Further, the research team envisions it in numerous fields, from monitoring sleep and joint movement in the health-care industry, to sports coaching and analyzing movements like a golf swing, to plant identification in agriculture, as well as in smarter manufacturing, from identifying nuts and bolts to detecting malfunctioning machines.

“We really push forward for these larger-scale, real-world applications,” says Han. “Without GPUs or any specialized hardware, our technique is so tiny it can run on these small cheap IoT devices and perform real-world applications like these visual wake words, face mask detection, and person detection. This opens the door for a brand-new way of doing tiny AI and mobile vision.”

This research was sponsored by the MIT-IBM Watson AI Lab, Samsung, Woodside Energy, and the National Science Foundation.

Machines that see the world more like humans do

Computer vision systems sometimes make inferences about a scene that fly in the face of common sense. For example, if a robot were processing a scene of a dinner table, it might completely ignore a bowl that is visible to any human observer, estimate that a plate is floating above the table, or misperceive a fork to be penetrating a bowl rather than leaning against it.

Move that computer vision system to a self-driving car and the stakes become much higher — for example, such systems have failed to detect emergency vehicles and pedestrians crossing the street.

To overcome these errors, MIT researchers have developed a framework that helps machines see the world more like humans do. Their new artificial intelligence system for analyzing scenes learns to perceive real-world objects from just a few images, and perceives scenes in terms of these learned objects.

The researchers built the framework using probabilistic programming, an AI approach that enables the system to cross-check detected objects against input data, to see if the images recorded from a camera are a likely match to any candidate scene. Probabilistic inference allows the system to infer whether mismatches are likely due to noise or to errors in the scene interpretation that need to be corrected by further processing.

This common-sense safeguard allows the system to detect and correct many errors that plague the “deep-learning” approaches that have also been used for computer vision. Probabilistic programming also makes it possible to infer probable contact relationships between objects in the scene, and use common-sense reasoning about these contacts to infer more accurate positions for objects.

“If you don’t know about the contact relationships, then you could say that an object is floating above the table — that would be a valid explanation. As humans, it is obvious to us that this is physically unrealistic and the object resting on top of the table is a more likely pose of the object. Because our reasoning system is aware of this sort of knowledge, it can infer more accurate poses. That is a key insight of this work,” says lead author Nishad Gothoskar, an electrical engineering and computer science (EECS) PhD student with the Probabilistic Computing Project.

In addition to improving the safety of self-driving cars, this work could enhance the performance of computer perception systems that must interpret complicated arrangements of objects, like a robot tasked with cleaning a cluttered kitchen.

Gothoskar’s co-authors include recent EECS PhD graduate Marco Cusumano-Towner; research engineer Ben Zinberg; visiting student Matin Ghavamizadeh; Falk Pollok, a software engineer in the MIT-IBM Watson AI Lab; recent EECS master’s graduate Austin Garrett; Dan Gutfreund, a principal investigator in the MIT-IBM Watson AI Lab; Joshua B. Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in the Department of Brain and Cognitive Sciences (BCS) and a member of the Computer Science and Artificial Intelligence Laboratory; and senior author Vikash K. Mansinghka, principal research scientist and leader of the Probabilistic Computing Project in BCS. The research is being presented at the Conference on Neural Information Processing Systems in December.

A blast from the past

To develop the system, called “3D Scene Perception via Probabilistic Programming (3DP3),” the researchers drew on a concept from the early days of AI research, which is that computer vision can be thought of as the “inverse” of computer graphics.

Computer graphics focuses on generating images based on the representation of a scene; computer vision can be seen as the inverse of this process. Gothoskar and his collaborators made this technique more learnable and scalable by incorporating it into a framework built using probabilistic programming.

“Probabilistic programming allows us to write down our knowledge about some aspects of the world in a way a computer can interpret, but at the same time, it allows us to express what we don’t know, the uncertainty. So, the system is able to automatically learn from data and also automatically detect when the rules don’t hold,” Cusumano-Towner explains.

In this case, the model is encoded with prior knowledge about 3D scenes. For instance, 3DP3 “knows” that scenes are composed of different objects, and that these objects often lie flat on top of each other — but they may not always be in such simple relationships. This enables the model to reason about a scene with more common sense.

Learning shapes and scenes

To analyze an image of a scene, 3DP3 first learns about the objects in that scene. After being shown only five images of an object, each taken from a different angle, 3DP3 learns the object’s shape and estimates the volume it would occupy in space.

“If I show you an object from five different perspectives, you can build a pretty good representation of that object. You’d understand its color, its shape, and you’d be able to recognize that object in many different scenes,” Gothoskar says.

Mansinghka adds, “This is way less data than deep-learning approaches. For example, the Dense Fusion neural object detection system requires thousands of training examples for each object type. In contrast, 3DP3 only requires a few images per object, and reports uncertainty about the parts of each object’s shape that it doesn’t know.”

The 3DP3 system generates a graph to represent the scene, where each object is a node and the lines that connect the nodes indicate which objects are in contact with one another. This enables 3DP3 to produce a more accurate estimation of how the objects are arranged. (Deep-learning approaches rely on depth images to estimate object poses, but these methods don’t produce a graph structure of contact relationships, so their estimations are less accurate.)
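
A toy version of such a contact graph might look like the sketch below, where enforcing a contact relationship snaps a “floating” pose estimate down onto its supporting surface, much like the bowl-on-the-table example. The data structure and numbers are purely illustrative, not 3DP3’s actual representation.

```python
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    name: str
    z: float                      # estimated height of the object's base above the floor
    height: float                 # the object's own extent along z

@dataclass
class SceneGraph:
    objects: dict = field(default_factory=dict)
    contacts: list = field(default_factory=list)   # (supporting object, supported object)

    def add(self, obj):
        self.objects[obj.name] = obj

    def add_contact(self, below, above):
        self.contacts.append((below, above))

    def enforce_contacts(self):
        # Contact reasoning: an object resting on another should sit exactly on top of it,
        # so snap any "floating" pose estimate down onto its support.
        for below, above in self.contacts:
            b, a = self.objects[below], self.objects[above]
            a.z = b.z + b.height

scene = SceneGraph()
scene.add(SceneObject("table", z=0.0, height=0.75))
scene.add(SceneObject("bowl", z=0.83, height=0.10))   # noisy estimate: floating 8 cm above the table
scene.add_contact("table", "bowl")

scene.enforce_contacts()
print(scene.objects["bowl"].z)                        # 0.75: the bowl now rests on the table
```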

Outperforming baseline models

The researchers compared 3DP3 with several deep-learning systems, all tasked with estimating the poses of 3D objects in a scene.

In nearly all instances, 3DP3 generated more accurate poses than other models and performed far better when some objects were partially obstructing others. And 3DP3 only needed to see five images of each object, while each of the baseline models it outperformed needed thousands of images for training.

When used in conjunction with another model, 3DP3 was able to improve its accuracy. For instance, a deep-learning model might predict that a bowl is floating slightly above a table, but because 3DP3 has knowledge of the contact relationships and can see that this is an unlikely configuration, it is able to make a correction by aligning the bowl with the table.

“I found it surprising to see how large the errors from deep learning could sometimes be — producing scene representations where objects really didn’t match with what people would perceive. I also found it surprising that only a little bit of model-based inference in our causal probabilistic program was enough to detect and fix these errors. Of course, there is still a long way to go to make it fast and robust enough for challenging real-time vision systems — but for the first time, we’re seeing probabilistic programming and structured causal models improving robustness over deep learning on hard 3D vision benchmarks,” Mansinghka says.

In the future, the researchers would like to push the system further so it can learn about an object from a single image, or a single frame in a movie, and then be able to detect that object robustly in different scenes. They would also like to explore the use of 3DP3 to gather training data for a neural network. It is often difficult for humans to manually label images with 3D geometry, so 3DP3 could be used to generate more complex image labels.

The 3DP3 system “combines low-fidelity graphics modeling with common-sense reasoning to correct large scene interpretation errors made by deep learning neural nets. This type of approach could have broad applicability as it addresses important failure modes of deep learning. The MIT researchers’ accomplishment also shows how probabilistic programming technology previously developed under DARPA’s Probabilistic Programming for Advancing Machine Learning (PPAML) program can be applied to solve central problems of common-sense AI under DARPA’s current Machine Common Sense (MCS) program,” says Matt Turek, DARPA Program Manager for the Machine Common Sense Program, who was not involved in this research, though the program partially funded the study.

Additional funders include the Singapore Defense Science and Technology Agency collaboration with the MIT Schwarzman College of Computing, Intel’s Probabilistic Computing Center, the MIT-IBM Watson AI Lab, the Aphorism Foundation, and the Siegel Family Foundation.
