Deep learning helps predict traffic crashes before they happen

Today’s world is one big maze, connected by layers of concrete and asphalt that afford us the luxury of navigation by vehicle. Yet for all of our road-related advancements — GPS lets us fire fewer neurons thanks to map apps, cameras alert us to potentially costly scrapes and scratches, and electric autonomous cars have lower fuel costs — our safety measures haven’t quite caught up. We still rely on a steady diet of traffic signals, trust, and the steel surrounding us to safely get from point A to point B.

To get ahead of the uncertainty inherent to crashes, scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Qatar Center for Artificial Intelligence developed a deep learning model that predicts very high-resolution crash risk maps. Fed on a combination of historical crash data, road maps, satellite imagery, and GPS traces, the risk maps describe the expected number of crashes over a period of time in the future, to identify high-risk areas and predict future crashes. 

Typically, these types of risk maps are captured at much lower resolutions that hover around hundreds of meters, which means glossing over crucial details, since the roads become blurred together. The new maps, however, are built on a grid of 5×5-meter cells, and the higher resolution brings newfound clarity: The scientists found that a highway road, for example, has a higher risk than nearby residential roads, and ramps merging and exiting the highway have an even higher risk than other roads.

“By capturing the underlying risk distribution that determines the probability of future crashes at all places, and without any historical data, we can find safer routes, enable auto insurance companies to provide customized insurance plans based on driving trajectories of customers, help city planners design safer roads, and even predict future crashes,” says MIT CSAIL PhD student Songtao He, a lead author on a new paper about the research. 

Even though car crashes are sparse, they cost about 3 percent of the world’s GDP and are the leading cause of death in children and young adults. This sparsity makes inferring maps at such a high resolution a tricky task. Crashes at this level are thinly scattered — the average annual odds of a crash in a 5×5 grid cell are about one in 1,000 — and they rarely happen at the same location twice. Previous attempts to predict crash risk have been largely “historical,” as an area would only be considered high-risk if there was a previous nearby crash.

The team’s approach casts a wider net to capture critical data. It identifies high-risk locations using GPS trajectory patterns, which give information about density, speed, and direction of traffic, and satellite imagery that describes road structures, such as the number of lanes, whether there’s a shoulder, or if there’s a large number of pedestrians. Then, even if a high-risk area has no recorded crashes, it can still be identified as high-risk, based on its traffic patterns and topology alone. 
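The paper’s exact architecture isn’t described here, but the general idea — fusing rasterized inputs into a per-cell crash estimate — can be sketched as follows. This is a minimal, hypothetical illustration in PyTorch; the channel layout, network size, and Poisson loss are assumptions, not the authors’ design.

```python
# Hypothetical sketch: fuse rasterized inputs (satellite imagery, a road mask,
# GPS-trace statistics) into a per-cell expected crash count. The channel
# layout, architecture, and loss are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

class RiskMapNet(nn.Module):
    def __init__(self, in_channels=7):  # e.g., 3 satellite + 1 road mask + 3 GPS stats (assumed)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),  # one risk value per 5x5 m grid cell
        )

    def forward(self, x):
        # Softplus keeps the predicted expected crash count non-negative.
        return nn.functional.softplus(self.net(x))

model = RiskMapNet()
rasters = torch.randn(8, 7, 64, 64)                              # batch of map tiles
crash_counts = torch.poisson(torch.full((8, 1, 64, 64), 0.001))  # sparse crash labels
loss = nn.PoissonNLLLoss(log_input=False)(model(rasters), crash_counts)
loss.backward()
```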

To evaluate the model, the scientists used crashes and data from 2017 and 2018, and tested its performance at predicting crashes in 2019 and 2020. Many locations with no recorded crashes in the training data were nonetheless identified as high-risk, and they did go on to experience crashes during the follow-up years.

“Our model can generalize from one city to another by combining multiple clues from seemingly unrelated data sources. This is a step toward general AI, because our model can predict crash maps in uncharted territories,” says Amin Sadeghi, a lead scientist at Qatar Computing Research Institute (QCRI) and an author on the paper. “The model can be used to infer a useful crash map even in the absence of historical crash data, which could translate to positive use for city planning and policymaking by comparing imaginary scenarios.” 

The dataset covered 7,500 square kilometers from Los Angeles, New York City, Chicago and Boston. Among the four cities, L.A. was the most unsafe, since it had the highest crash density, followed by New York City, Chicago, and Boston. 

“If people can use the risk map to identify potentially high-risk road segments, they can take action in advance to reduce the risk of trips they take. Apps like Waze and Apple Maps have incident feature tools, but we’re trying to get ahead of the crashes — before they happen,” says He. 

He and Sadeghi wrote the paper alongside Sanjay Chawla, research director at QCRI, and MIT professors of electrical engineering and computer science Mohammad Alizadeh, ​​Hari Balakrishnan, and Sam Madden. They will present the paper at the 2021 International Conference on Computer Vision.

Enabling AI-driven health advances without sacrificing patient privacy

There’s a lot of excitement at the intersection of artificial intelligence and health care. AI has already been used to improve disease treatment and detection, discover promising new drugs, identify links between genes and diseases, and more.

By analyzing large datasets and finding patterns, virtually any new algorithm has the potential to help patients — AI researchers just need access to the right data to train and test those algorithms. Hospitals, understandably, are hesitant to share sensitive patient information with research teams. When they do share data, it’s difficult to verify that researchers are only using the data they need and deleting it after they’re done.

Secure AI Labs (SAIL) is addressing those problems with a technology that lets AI algorithms run on encrypted datasets that never leave the data owner’s system. Health care organizations can control how their datasets are used, while researchers can protect the confidentiality of their models and search queries. Neither party needs to see the data or the model to collaborate.

SAIL’s platform can also combine data from multiple sources, creating rich insights that fuel more effective algorithms.

“You shouldn’t have to schmooze with hospital executives for five years before you can run your machine learning algorithm,” says SAIL co-founder and MIT Professor Manolis Kellis, who co-founded the company with CEO Anne Kim ’16, SM ’17. “Our goal is to help patients, to help machine learning scientists, and to create new therapeutics. We want new algorithms — the best algorithms — to be applied to the biggest possible data set.”

SAIL has already partnered with hospitals and life science companies to unlock anonymized data for researchers. In the next year, the company hopes to be working with about half of the top 50 academic medical centers in the country.

Unleashing AI’s full potential

As an undergraduate at MIT studying computer science and molecular biology, Kim worked with researchers in the Computer Science and Artificial Intelligence Laboratory (CSAIL) to analyze data from clinical trials, gene association studies, hospital intensive care units, and more.

“I realized there is something severely broken in data sharing, whether it was hospitals using hard drives, ancient file transfer protocol, or even sending stuff in the mail,” Kim says. “It was all just not well-tracked.”

Kellis, who is also a member of the Broad Institute of MIT and Harvard, has spent years establishing partnerships with hospitals and consortia across a range of diseases including cancers, heart disease, schizophrenia, and obesity. He knew that smaller research teams would struggle to get access to the same data his lab was working with.

In 2017, Kellis and Kim decided to commercialize technology they were developing to allow AI algorithms to run on encrypted data.

In the summer of 2018, Kim participated in the delta v startup accelerator run by the Martin Trust Center for MIT Entrepreneurship. The founders also received support from the Sandbox Innovation Fund and the Venture Mentoring Service, and made various early connections through their MIT network.

To participate in SAIL’s program, hospitals and other health care organizations make parts of their data available to researchers by setting up a node behind their firewall. SAIL then sends encrypted algorithms to the servers where the datasets reside in a process called federated learning. The algorithms crunch the data locally in each server and transmit the results back to a central model, which updates itself. No one — not the researchers, the data owners, or even SAIL — has access to the models or the datasets.
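Stripped of the encryption layer, the federated learning loop described above — local updates on each node, aggregation into a central model — can be sketched in a few lines. The toy linear-regression nodes below are purely illustrative and are not SAIL’s platform.

```python
# Minimal federated-averaging sketch (encryption layer omitted): each data
# owner computes an update locally; only model parameters leave the node.
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient step of linear regression on a node's private data."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(weights, nodes):
    """Average the locally updated weights; raw data never leaves a node."""
    updates = [local_update(weights, X, y) for X, y in nodes]
    return np.mean(updates, axis=0)

rng = np.random.default_rng(0)
nodes = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
weights = np.zeros(3)
for _ in range(100):
    weights = federated_round(weights, nodes)
```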

The approach allows a much broader set of researchers to apply their models to large datasets. To further engage the research community, Kellis’ lab at MIT has begun holding competitions in which it gives access to datasets in areas like protein function and gene expression, and challenges researchers to predict results.

“We invite machine learning researchers to come and train on last year’s data and predict this year’s data,” says Kellis. “If we see there’s a new type of algorithm that is performing best in these community-level assessments, people can adopt it locally at many different institutions and level the playing field. So, the only thing that matters is the quality of your algorithm rather than the power of your connections.”

By enabling a large number of datasets to be anonymized into aggregate insights, SAIL’s technology also allows researchers to study rare diseases, in which small pools of relevant patient data are often spread out among many institutions. That fragmentation has historically made it difficult to apply AI models to the data.

“We’re hoping that all of these datasets will eventually be open,” Kellis says. “We can cut across all the silos and enable a new era where every patient with every rare disorder across the entire world can come together in a single keystroke to analyze data.”

Enabling the medicine of the future

To work with large amounts of data around specific diseases, SAIL has increasingly sought to partner with patient associations and consortia of health care groups, including an international health care consulting company and the Kidney Cancer Association. The partnerships also align SAIL with patients, the group they’re most trying to help.

Overall, the founders are happy to see SAIL solving problems they faced in their labs for researchers around the world.

“The right place to solve this is not an academic project. The right place to solve this is in industry, where we can provide a platform not just for my lab but for any researcher,” Kellis says. “It’s about creating an ecosystem of academia, researchers, pharma, biotech, and hospital partners. I think it’s the blending of all of these different areas that will make that vision of medicine of the future become a reality.”

3 Questions: Kalyan Veeramachaneni on hurdles preventing fully automated machine learning

The proliferation of big data across domains, from banking to health care to environmental monitoring, has spurred increasing demand for machine learning tools that help organizations make decisions based on the data they gather.

That growing industry demand has driven researchers to explore the possibilities of automated machine learning (AutoML), which seeks to automate the development of machine learning solutions in order to make them accessible for nonexperts, improve their efficiency, and accelerate machine learning research. For example, an AutoML system might enable doctors to use their expertise interpreting electroencephalography (EEG) results to build a model that can predict which patients are at higher risk for epilepsy — without requiring the doctors to have a background in data science.

Yet, despite more than a decade of work, researchers have been unable to fully automate all steps in the machine learning development process. Even the most efficient commercial AutoML systems still require a prolonged back-and-forth between a domain expert, like a marketing manager or mechanical engineer, and a data scientist, making the process inefficient.

Kalyan Veeramachaneni, a principal research scientist in the MIT Laboratory for Information and Decision Systems who has been studying AutoML since 2010, has co-authored a paper in the journal ACM Computing Surveys that details a seven-tiered schematic to evaluate AutoML tools based on their level of autonomy.

A system at level zero has no automation and requires a data scientist to start from scratch and build models by hand, while a tool at level six is completely automated and can be easily and effectively used by a nonexpert. Most commercial systems fall somewhere in the middle.

Veeramachaneni spoke with MIT News about the current state of AutoML, the hurdles that prevent truly automatic machine learning systems, and the road ahead for AutoML researchers.

Q: How has automatic machine learning evolved over the past decade, and what is the current state of AutoML systems?

A: In 2010, we started to see a shift, with enterprises wanting to invest in getting value out of their data beyond just business intelligence. So then came the question, maybe there are certain things in the development of machine learning-based solutions that we can automate? The first iteration of AutoML was to make our own jobs as data scientists more efficient. Can we take away the grunt work that we do on a day-to-day basis and automate that by using a software system? That area of research ran its course until about 2015, when we realized we still weren’t able to speed up this development process.

Then another thread emerged. There are a lot of problems that could be solved with data, and they come from experts who know those problems, who live with them on a daily basis. These individuals have very little to do with machine learning or software engineering. How do we bring them into the fold? That is really the next frontier.

There are three areas where these domain experts have strong input in a machine learning system. The first is defining the problem itself and then helping to formulate it as a prediction task to be solved by a machine learning model. Second, they know how the data have been collected, so they also know intuitively how to process that data. And then third, at the end, machine learning models only give you a very tiny part of a solution — they just give you a prediction. The output of a machine learning model is just one input to help a domain expert get to a decision or action.

Q: What steps of the machine learning pipeline are the most difficult to automate, and why has automating them been so challenging?

A: The problem-formulation part is extremely difficult to automate. For example, if I am a researcher who wants to get more government funding, and I have a lot of data about the content of the research proposals that I write and whether or not I receive funding, can machine learning help there? We don’t know yet. In problem formulation, I use my domain expertise to translate the problem into something that is more tangible to predict, and that requires somebody who knows the domain very well. And he or she also knows how to use that information post-prediction. That problem is refusing to be automated.

There is one part of problem-formulation that could be automated. It turns out that we can look at the data and mathematically express several possible prediction tasks automatically. Then we can share those prediction tasks with the domain expert to see if any of them would help in the larger problem they are trying to tackle. Then once you pick the prediction task, there are a lot of intermediate steps you do, including feature engineering, modeling, etc., that are very mechanical steps and easy to automate.

But defining the prediction tasks has typically been a collaborative effort between data scientists and domain experts because, unless you know the domain, you can’t translate the domain problem into a prediction task. And then sometimes domain experts don’t know what is meant by “prediction.” That leads to the major, significant back and forth in the process. If you automate that step, then machine learning penetration and the use of data to create meaningful predictions will increase tremendously.

Then what happens after the machine learning model gives a prediction? We can automate the software and technology part of it, but at the end of the day, it is root cause analysis and human intuition and decision making. We can augment them with a lot of tools, but we can’t fully automate that.

Q: What do you hope to achieve with the seven-tiered framework for evaluating AutoML systems that you outlined in your paper?

A: My hope is that people start to recognize that some levels of automation have already been achieved and some still need to be tackled. In the research community, we tend to focus on what we are comfortable with. We have gotten used to automating certain steps, and then we just stick to it. Automating these other parts of the machine learning solution development is very important, and that is where the biggest bottlenecks remain.

My second hope is that researchers will very clearly understand what domain expertise means. A lot of this AutoML work is still being conducted by academics, and the problem is that we often don’t do applied work. There is not a crystal-clear definition of what a domain expert is; in itself, “domain expert” is a very nebulous phrase. What we mean by domain expert is the expert in the problem you are trying to solve with machine learning. And I am hoping that everyone unifies around that because that would make things so much clearer.

I still believe that we are not able to build that many models for that many problems, but even for the ones that we are building, the majority of them are not getting deployed and used in day-to-day life. The output of machine learning is just going to be another data point, an augmented data point, in someone’s decision making. How they make those decisions, based on that input, how that will change their behavior, and how they will adapt their style of working, that is still a big, open question. Once we automate everything, that is what’s next.

We have to determine what has to fundamentally change in the day-to-day workflow of someone giving loans at a bank, or an educator trying to decide whether he or she should change the assignments in an online class. How are they going to use machine learning’s outputs? We need to focus on the fundamental things we have to build out to make machine learning more usable.

Making roadway spending more sustainable

The share of federal spending on infrastructure has reached an all-time low, falling from 30 percent in 1960 to just 12 percent in 2018.

While the nation’s ailing infrastructure will require more funding to reach its full potential, recent MIT research finds that more sustainable and higher performing roads are still possible even with today’s limited budgets.

The research, conducted by a team of current and former MIT Concrete Sustainability Hub (MIT CSHub) scientists and published in Transportation Research D, finds that a set of innovative planning strategies could improve the environmental and performance outcomes of pavement networks even if budgets don’t increase.

The paper presents a novel budget allocation tool and pairs it with three innovative strategies for managing pavement networks: a mix of paving materials, a mix of short- and long-term paving actions, and a long evaluation period for those actions.

This novel approach offers numerous benefits. When applied to a 30-year case study of the Iowa U.S. Route network, the MIT CSHub model and management strategies cut emissions by 20 percent while sustaining current levels of road quality. Achieving this with a conventional planning approach would require the state to spend 32 percent more than it does today. The key to its success is the consideration of a fundamental — but fraught — aspect of pavement asset management: uncertainty.

Predicting unpredictability

The average road must last many years and support the traffic of thousands — if not millions — of vehicles. Over that time, a lot can change. Material prices may fluctuate, budgets may tighten, and traffic levels may intensify. Climate (and climate change), too, can hasten unexpected repairs.

Managing these uncertainties effectively means looking long into the future and anticipating possible changes.

“Capturing the impacts of uncertainty is essential for making effective paving decisions,” explains Fengdi Guo, the paper’s lead author and a departing CSHub research assistant.

“Yet, measuring and relating these uncertainties to outcomes is also computationally intensive and expensive. Consequently, many DOTs [departments of transportation] are forced to simplify their analysis to plan maintenance — often resulting in suboptimal spending and outcomes.”

To give DOTs accessible tools to factor uncertainties into their planning, CSHub researchers have developed a streamlined planning approach. It offers greater specificity and is paired with several new pavement management strategies.

The planning approach, known as Probabilistic Treatment Path Dependence (PTPD), is based on machine learning and was devised by Guo.

“Our PTPD model is composed of four steps,” he explains. “These steps are, in order, pavement damage prediction; treatment cost prediction; budget allocation; and pavement network condition evaluation.”

The model begins by investigating every segment in an entire pavement network and predicting future possibilities for pavement deterioration, cost, and traffic.

“We [then] run thousands of simulations for each segment in the network to determine the likely cost and performance outcomes for each initial and subsequent sequence, or ‘path,’ of treatment actions,” says Guo. “The treatment paths with the best cost and performance outcomes are selected for each segment, and then across the network.”
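In spirit, that step resembles a Monte Carlo search over candidate treatment paths: simulate many uncertain futures per path, then keep the path with the best expected outcomes. The sketch below is a toy version; the paths, deterioration rates, and costs are invented for illustration, and the real PTPD model is far richer.

```python
# Toy Monte Carlo over treatment paths: for each candidate path, simulate many
# uncertain futures and compare the resulting cost and condition outcomes.
# All numbers and path definitions are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(1)
paths = {
    "short_term_fixes": {"cost": 1.0, "condition_gain": 0.3},
    "long_term_rebuild": {"cost": 3.0, "condition_gain": 1.0},
}

def simulate(path, years=30, n_sims=1000):
    results = []
    for _ in range(n_sims):
        condition, total_cost = 1.0, 0.0
        for _ in range(years):
            condition -= rng.uniform(0.02, 0.08)                # uncertain deterioration
            if condition < 0.5:                                 # treatment is triggered
                total_cost += path["cost"] * rng.uniform(0.8, 1.2)  # uncertain price
                condition = min(1.0, condition + path["condition_gain"])
        results.append((total_cost, condition))
    costs, conditions = np.array(results).T
    return costs.mean(), conditions.mean()

for name, path in paths.items():
    print(name, simulate(path))
```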

The PTPD model not only seeks to minimize costs to agencies but also to users — in this case, drivers. These user costs can come primarily in the form of excess fuel consumption due to poor road quality.

“One improvement in our analysis is the incorporation of electric vehicle uptake into our cost and environmental impact predictions,” says Randolph Kirchain, a principal research scientist at MIT CSHub and MIT Materials Research Laboratory (MRL) and one of the paper’s co-authors. “Since the vehicle fleet will change over the next several decades due to electric vehicle adoption, we made sure to consider how these changes might impact our predictions of excess energy consumption.”

After developing the PTPD model, Guo wanted to see how the efficacy of various pavement management strategies might differ. To do this, he developed a sophisticated deterioration prediction model.

A novel aspect of this deterioration model is its treatment of multiple deterioration metrics at once. Using a multi-output neural network, a tool of artificial intelligence, the model can predict several forms of pavement deterioration simultaneously, thereby accounting for the correlations among them.
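A multi-output regressor of this general kind — one network predicting several correlated metrics from shared inputs — can be sketched briefly. The feature and target names below are invented; this is not the CSHub model.

```python
# Sketch of a multi-output regressor: one network predicts several
# deterioration metrics at once, so shared structure between them is learned.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 6))       # e.g., traffic, age, climate features (invented)
rutting = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=500)
cracking = 0.5 * rutting + rng.normal(scale=0.1, size=500)   # correlated target
Y = np.column_stack([rutting, cracking])

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
model.fit(X, Y)                     # MLPRegressor supports multi-output targets
print(model.predict(X[:3]))
```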

The MIT team selected two key metrics to compare the effectiveness of various treatment paths: pavement quality and greenhouse gas emissions. These metrics were then calculated for all pavement segments in the Iowa network.

Improvement through variation

 The MIT model can help DOTs make better decisions, but that decision-making is ultimately constrained by the potential options considered.

Guo and his colleagues, therefore, sought to expand current decision-making paradigms by exploring a broad set of network management strategies and evaluating them with their PTPD approach. Based on that evaluation, the team discovered that networks had the best outcomes when the management strategy includes using a mix of paving materials, a variety of long- and short-term paving repair actions (treatments), and longer time periods on which to base paving decisions.

They then compared this proposed approach with a baseline management approach that reflects current, widespread practices: the use of solely asphalt materials, short-term treatments, and a five-year period for evaluating the outcomes of paving actions.

With these two approaches established, the team used them to plan 30 years of maintenance across the Iowa U.S. Route network. They then measured the subsequent road quality and emissions.

Their case study found that the MIT approach offered substantial benefits. Pavement-related greenhouse gas emissions would fall by around 20 percent across the network over the whole period. Pavement performance improved as well. To achieve the same level of road quality as the MIT approach, the baseline approach would need a 32 percent greater budget.

“It’s worth noting,” says Guo, “that since conventional practices employ less effective allocation tools, the difference between them and the CSHub approach should be even larger in practice.”

Much of the improvement derived from the precision of the CSHub planning model. But the three treatment strategies also play a key role.

“We’ve found that a mix of asphalt and concrete paving materials allows DOTs not only to find materials best-suited to certain projects, but also to mitigate the risk of material price volatility over time,” says Kirchain.

It’s a similar story with a mix of paving actions. Employing a mix of short- and long-term fixes gives DOTs the flexibility to choose the right action for the right project.

The final strategy, a long-term evaluation period, enables DOTs to see the entire scope of their choices. If the ramifications of a decision are predicted over only five years, many long-term implications won’t be considered. Expanding the window for planning, then, can introduce beneficial, long-term options.

It’s not surprising that paving decisions are daunting to make; their impacts on the environment, driver safety, and budget levels are long-lasting. But rather than simplify this fraught process, the CSHub method aims to reflect its complexity. The result is an approach that provides DOTs with the tools to do more with less.

This research was supported through the MIT Concrete Sustainability Hub by the Portland Cement Association and the Ready Mixed Concrete Research and Education Foundation.

Using AI and old reports to understand new medical images

Getting a quick and accurate reading of an X-ray or some other medical images can be vital to a patient’s health and might even save a life. Obtaining such an assessment depends on the availability of a skilled radiologist and, consequently, a rapid response is not always possible. For that reason, says Ruizhi “Ray” Liao, a postdoc and a recent PhD graduate at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), “we want to train machines that are capable of reproducing what radiologists do every day.” Liao is first author of a new paper, written with other researchers at MIT and Boston-area hospitals, that is being presented this fall at MICCAI 2021, an international conference on medical image computing.

Although the idea of utilizing computers to interpret images is not new, the MIT-led group is drawing on an underused resource — the vast body of radiology reports that accompany medical images, written by radiologists in routine clinical practice — to improve the interpretive abilities of machine learning algorithms. The team is also utilizing a concept from information theory called mutual information — a statistical measure of the interdependence of two different variables — in order to boost the effectiveness of their approach.

Here’s how it works: First, a neural network is trained to determine the extent of a disease, such as pulmonary edema, by being presented with numerous X-ray images of patients’ lungs, along with a doctor’s rating of the severity of each case. That information is encapsulated within a collection of numbers. A separate neural network does the same for text, representing its information in a different collection of numbers. A third neural network then integrates the information between images and text in a coordinated way that maximizes the mutual information between the two datasets. “When the mutual information between images and text is high, that means that images are highly predictive of the text and the text is highly predictive of the images,” explains MIT Professor Polina Golland, a principal investigator at CSAIL.
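One standard way to maximize mutual information between two embedding spaces is a contrastive (InfoNCE-style) objective, in which matched image-report pairs must score higher than mismatched ones. Whether this is the exact estimator the team uses isn’t stated here, so the PyTorch sketch below should be read as a generic illustration.

```python
# Generic InfoNCE-style sketch: paired image/text embeddings are pushed to be
# mutually predictive; matched pairs must outscore mismatched ones.
import torch
import torch.nn.functional as F

def info_nce(image_emb, text_emb, temperature=0.1):
    """A standard lower-bound estimator of mutual information between paired embeddings."""
    image_emb = F.normalize(image_emb, dim=1)
    text_emb = F.normalize(text_emb, dim=1)
    logits = image_emb @ text_emb.T / temperature     # similarity of every image-text pair
    labels = torch.arange(len(image_emb))             # the true pairs sit on the diagonal
    return F.cross_entropy(logits, labels)

image_emb = torch.randn(16, 128, requires_grad=True)  # from the image network
text_emb = torch.randn(16, 128, requires_grad=True)   # from the report-text network
loss = info_nce(image_emb, text_emb)
loss.backward()
```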

Liao, Golland, and their colleagues have introduced another innovation that confers several advantages: Rather than working from entire images and radiology reports, they break the reports down to individual sentences and the portions of those images that the sentences pertain to. Doing things this way, Golland says, “estimates the severity of the disease more accurately than if you view the whole image and whole report. And because the model is examining smaller pieces of data, it can learn more readily and has more samples to train on.”

While Liao finds the computer science aspects of this project fascinating, a primary motivation for him is “to develop technology that is clinically meaningful and applicable to the real world.”

To that end, a pilot program is currently underway at the Beth Israel Deaconess Medical Center to see how MIT’s machine learning model could influence the way doctors managing heart failure patients make decisions, especially in an emergency room setting where speed is of the essence.

The model could have very broad applicability, according to Golland. “It could be used for any kind of imagery and associated text — inside or outside the medical realm. This general approach, moreover, could be applied beyond images and text, which is exciting to think about.”

Liao wrote the paper alongside MIT CSAIL postdoc Daniel Moyer and Golland; Miriam Cha and Keegan Quigley at MIT Lincoln Laboratory; William M. Wells at Harvard Medical School and MIT CSAIL; and clinical collaborators Seth Berkowitz and Steven Horng at Beth Israel Deaconess Medical Center.

The work was sponsored by the NIH NIBIB Neuroimaging Analysis Center, Wistron, MIT-IBM Watson AI Lab, MIT Deshpande Center for Technological Innovation, MIT Abdul Latif Jameel Clinic for Machine Learning in Health (J-Clinic), and MIT Lincoln Lab.

Toward a smarter electronic health record

Electronic health records have been widely adopted with the hope they would save time and improve the quality of patient care. But due to fragmented interfaces and tedious data entry procedures, physicians often spend more time navigating these systems than they do interacting with patients.

Researchers at MIT and the Beth Israel Deaconess Medical Center are combining machine learning and human-computer interaction to create a better electronic health record (EHR). They developed MedKnowts, a system that unifies the processes of looking up medical records and documenting patient information into a single, interactive interface.

Driven by artificial intelligence, this “smart” EHR automatically displays customized, patient-specific medical records when a clinician needs them. MedKnowts also provides autocomplete for clinical terms and auto-populates fields with patient information to help doctors work more efficiently.

“In the origins of EHRs, there was this tremendous enthusiasm that getting all this information organized would be helpful to be able to track billing records, report statistics to the government, and provide data for scientific research. But few stopped to ask the deep questions around whether they would be of use for the clinician. I think a lot of clinicians feel they have had this burden of EHRs put on them for the benefit of bureaucracies and scientists and accountants. We came into this project asking how EHRs might actually benefit clinicians,” says David Karger, professor of computer science in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and senior author of the paper.

The research was co-authored by CSAIL graduate students Luke Murray, who is the lead author, Divya Gopinath, and Monica Agrawal. Other authors include Steven Horng, an emergency medicine attending physician and clinical lead for machine learning at the Center for Healthcare Delivery Science of Beth Israel Deaconess Medical Center, and David Sontag, associate professor of electrical engineering and computer science at MIT, a member of CSAIL and the Institute for Medical Engineering and Science, and a principal investigator at the Abdul Latif Jameel Clinic for Machine Learning in Health. The work will be presented at the Association for Computing Machinery Symposium on User Interface Software and Technology next month.

A problem-oriented tool

To design an EHR that would benefit doctors, the researchers had to think like doctors.

They created a note-taking editor with a side panel that displays relevant information from the patient’s medical history. That historical information appears in the form of cards that are focused on particular problems or concepts.

For instance, if MedKnowts identifies the clinical term “diabetes” in the text as a clinician types, the system automatically displays a “diabetes card” containing medications, lab values, and snippets from past records that are relevant to diabetes treatment.

Most EHRs store historical information on separate pages and list medications or lab values alphabetically or chronologically, forcing the clinician to search through data to find the information they need, Murray says. MedKnowts only displays information relevant to the particular concept the clinician is writing about.

“This is a closer match to the way doctors think about information. A lot of times, doctors will do this subconsciously. They will look through a medications page and only focus on the medications that are relevant to the current conditions. We are helping to do that process automatically and hopefully move some things out of the doctor’s head so they have more time to think about the complex part, which is determining what is wrong with the patient and coming up with a treatment plan,” Murray says.

Pieces of interactive text called chips serve as links to related cards. As a physician types a note, the autocomplete system recognizes clinical terms, such as medications, lab values, or conditions, and transforms them into chips. Each chip is displayed as a word or phrase that has been highlighted in a certain color depending on its category (red for a medical condition, green for a medication, yellow for a procedure, etc.).
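Conceptually, a chip is the output of a term recognizer plus a category-to-color mapping. The dictionary lookup below is only a toy stand-in for MedKnowts’ learned autocomplete; the example terms are invented, while the colors follow the scheme above.

```python
# Toy stand-in for chip creation: recognize known clinical terms in a note and
# tag each with its category color. MedKnowts uses learned autocomplete models;
# this dictionary lookup and its example terms are purely illustrative.
TERM_CATEGORIES = {
    "diabetes": "condition",
    "metformin": "medication",
    "appendectomy": "procedure",
}
CATEGORY_COLORS = {"condition": "red", "medication": "green", "procedure": "yellow"}

def chips_in(note: str):
    chips = []
    for raw in note.lower().split():
        term = raw.strip(".,;:()")
        if term in TERM_CATEGORIES:
            category = TERM_CATEGORIES[term]
            chips.append({"term": term, "category": category,
                          "color": CATEGORY_COLORS[category]})
    return chips

print(chips_in("Patient with diabetes, currently on metformin."))
```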

Through the use of autocomplete, structured data on the patient’s conditions, symptoms, and medication usage is collected with no additional effort from the physician.

Sontag says he hopes the advance will “change the paradigm of how to create large-scale health datasets for studying disease progression and assessing the real-world effectiveness of treatments.”

In practice

After a year-long iterative design process, the researchers tested MedKnowts by deploying the software in the emergency department at Beth Israel Deaconess Medical Center in Boston. They worked with an emergency physician and four hospital scribes who enter notes into the electronic health record.

Deploying the software in an emergency department, where doctors operate in a high-stress environment, involved a delicate balancing act, Agrawal says.

“One of the biggest challenges we faced was trying to get people to shift what they currently do. Doctors who have used the same system, and done the same dance of clicks so many times, form a sort of muscle memory. Whenever you are going to make a change, there is a question of is this worth it? And we definitely found that some features had greater usage than others,” she says.

The Covid-19 pandemic complicated the deployment, too. The researchers had been visiting the emergency department to get a sense of the workflow, but were forced to end those visits due to Covid-19 and were unable to be in the hospital while the system was being deployed.

Despite those initial challenges, MedKnowts became popular with the scribes over the course of the one-month deployment. They gave the system an average rating of 83.75 (out of 100) for usability.

Scribes found the autocomplete function especially useful for speeding up their work, according to survey results. Also, the color-coded chips helped them quickly scan notes for relevant information.

Those initial results are promising, but as the researchers consider the feedback and work on future iterations of MedKnowts, they plan to proceed with caution.

“What we are trying to do here is smooth the pathway for doctors and let them accelerate. There is some risk there. Part of the purpose of bureaucracy is to slow things down and make sure all the i’s are dotted and all the t’s are crossed. And if we have a computer dotting the i’s and crossing the t’s for doctors, that may actually be countering the goals of the bureaucracy, which is to force doctors to think twice before they make a decision. We have to be thinking about how to protect doctors and patients from the consequences of making the doctors more efficient,” Karger says.

A longer-term vision

The researchers plan to improve the machine learning algorithms that drive MedKnowts so the system can more effectively highlight parts of the medical record that are most relevant, Agrawal says.

They also want to consider the needs of different medical users. The researchers designed MedKnowts with an emergency department in mind — a setting where doctors are typically seeing patients for the first time. A primary care physician who knows their patients much better would likely have some different needs.

In the longer-term, the researchers envision creating an adaptive system that clinicians can contribute to. For example, perhaps a doctor realizes a certain cardiology term is missing from MedKnowts and adds that information to a card, which would update the system for all users.

The team is exploring commercialization as an avenue for further deployment.

“We want to build tools that let doctors create their own tools. We don’t expect doctors to learn to be programmers, but with the right support they might be able to radically customize whatever medical applications they are using to really suit their own needs and preferences,” Karger says.

This research was funded by the MIT Abdul Latif Jameel Clinic for Machine Learning in Health.

The real promise of synthetic data

Each year, the world generates more data than the previous year. In 2020 alone, an estimated 59 zettabytes of data will be “created, captured, copied, and consumed,” according to the International Data Corporation — enough to fill about a trillion 64-gigabyte hard drives.

But just because data are proliferating doesn’t mean everyone can actually use them. Companies and institutions, rightfully concerned with their users’ privacy, often restrict access to datasets — sometimes within their own teams. And now that the Covid-19 pandemic has shut down labs and offices, preventing people from visiting centralized data stores, sharing information safely is even more difficult.

Without access to data, it’s hard to make tools that actually work. Enter synthetic data: artificial information developers and engineers can use as a stand-in for real data.

Synthetic data is a bit like diet soda. To be effective, it has to resemble the “real thing” in certain ways. Diet soda should look, taste, and fizz like regular soda. Similarly, a synthetic dataset must have the same mathematical and statistical properties as the real-world dataset it’s standing in for. “It looks like it, and has formatting like it,” says Kalyan Veeramachaneni, principal investigator of the Data to AI (DAI) Lab and a principal research scientist in MIT’s Laboratory for Information and Decision Systems. If it’s run through a model, or used to build or test an application, it performs like that real-world data would.

But — just as diet soda should have fewer calories than the regular variety — a synthetic dataset must also differ from a real one in crucial aspects. If it’s based on a real dataset, for example, it shouldn’t contain or even hint at any of the information from that dataset.

Threading this needle is tricky. After years of work, Veeramachaneni and his collaborators recently unveiled a set of open-source data generation tools — a one-stop shop where users can get as much data as they need for their projects, in formats from tables to time series. They call it the Synthetic Data Vault.

Maximizing access while maintaining privacy

Veeramachaneni and his team first tried to create synthetic data in 2013. They had been tasked with analyzing a large amount of information from the online learning program edX, and wanted to bring in some MIT students to help. The data were sensitive, and couldn’t be shared with these new hires, so the team decided to create artificial data that the students could work with instead — figuring that “once they wrote the processing software, we could use it on the real data,” Veeramachaneni says.

This is a common scenario. Imagine you’re a software developer contracted by a hospital. You’ve been asked to build a dashboard that lets patients access their test results, prescriptions, and other health information. But you aren’t allowed to see any real patient data, because it’s private.

Most developers in this situation will make “a very simplistic version” of the data they need, and do their best, says Carles Sala, a researcher in the DAI lab. But when the dashboard goes live, there’s a good chance that “everything crashes,” he says, “because there are some edge cases they weren’t taking into account.”

High-quality synthetic data — as complex as what it’s meant to replace — would help to solve this problem. Companies and institutions could share it freely, allowing teams to work more collaboratively and efficiently. Developers could even carry it around on their laptops, knowing they weren’t putting any sensitive information at risk.

Perfecting the formula — and handling constraints

Back in 2013, Veeramachaneni’s team gave themselves two weeks to create a data pool they could use for that edX project. The timeline “seemed really reasonable,” Veeramachaneni says. “But we failed completely.” They soon realized that if they built a series of synthetic data generators, they could make the process quicker for everyone else.

In 2016, the team completed an algorithm that accurately captures correlations between the different fields in a real dataset — think a patient’s age, blood pressure, and heart rate — and creates a synthetic dataset that preserves those relationships, without any identifying information. When data scientists were asked to solve problems using this synthetic data, their solutions were as effective as those made with real data 70 percent of the time. The team presented this research at the 2016 IEEE International Conference on Data Science and Advanced Analytics.

For the next go-around, the team reached deep into the machine learning toolbox. In 2019, PhD student Lei Xu presented his new algorithm, CTGAN, at the 33rd Conference on Neural Information Processing Systems in Vancouver. CTGAN (for “conditional tabular generative adversarial networks”) uses GANs to build and perfect synthetic data tables. GANs are pairs of neural networks that “play against each other,” Xu says. The first network, called a generator, creates something — in this case, a row of synthetic data — and the second, called the discriminator, tries to tell if it’s real or not.

“Eventually, the generator can generate perfect [data], and the discriminator cannot tell the difference,” says Xu. GANs are more often used in artificial image generation, but they work well for synthetic data, too: CTGAN outperformed classic synthetic data creation techniques in 85 percent of the cases tested in Xu’s study.
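The generator-discriminator pairing Xu describes can be sketched with two small networks scoring tabular rows; CTGAN adds conditional sampling and per-column normalization on top of this basic recipe. The dimensions and data below are arbitrary placeholders.

```python
# Bare-bones GAN pair for tabular rows: the generator maps noise to a synthetic
# row; the discriminator scores whether a row looks real. CTGAN layers
# conditional sampling and per-column normalization on top of this idea.
import torch
import torch.nn as nn

n_features, noise_dim = 10, 32
generator = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(),
                          nn.Linear(64, n_features))
discriminator = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                              nn.Linear(64, 1))

real_rows = torch.randn(128, n_features)            # stand-in for rows of a real table
fake_rows = generator(torch.randn(128, noise_dim))
bce = nn.BCEWithLogitsLoss()

# One discriminator step: real rows labeled 1, synthetic rows labeled 0.
d_loss = bce(discriminator(real_rows), torch.ones(128, 1)) + \
         bce(discriminator(fake_rows.detach()), torch.zeros(128, 1))

# One generator step: try to make the discriminator call synthetic rows real.
g_loss = bce(discriminator(fake_rows), torch.ones(128, 1))
```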

Statistical similarity is crucial. But depending on what they represent, datasets also come with their own vital context and constraints, which must be preserved in synthetic data. DAI lab researcher Sala gives the example of a hotel ledger: a guest always checks out after he or she checks in. The dates in a synthetic hotel reservation dataset must follow this rule, too: “They need to be in the right order,” he says.

Large datasets may contain a number of different relationships like this, each strictly defined. “Models cannot learn the constraints, because those are very context-dependent,” says Veeramachaneni. So the team recently finalized an interface that allows people to tell a synthetic data generator where those bounds are. “The data is generated within those constraints,” Veeramachaneni says.
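One simple way to honor such a rule is to reject (or repair) generated rows that violate it. The pandas snippet below illustrates the hotel example; the column names and reject-filter strategy are assumptions for illustration, not the Synthetic Data Vault’s actual constraint interface.

```python
# Toy constraint handling for the hotel example: keep only synthetic rows where
# check-out follows check-in. Column names and strategy are illustrative; the
# Synthetic Data Vault exposes its own constraint interface.
import pandas as pd

synthetic = pd.DataFrame({
    "check_in":  pd.to_datetime(["2021-03-01", "2021-03-05", "2021-03-10"]),
    "check_out": pd.to_datetime(["2021-03-04", "2021-03-02", "2021-03-12"]),
})
valid = synthetic[synthetic["check_out"] > synthetic["check_in"]]
print(valid)
```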

Such precise data could aid companies and organizations in many different sectors. One example is banking, where increased digitization, along with new data privacy rules, have “triggered a growing interest in ways to generate synthetic data,” says Wim Blommaert, a team leader at ING financial services. Current solutions, like data masking, often destroy valuable information that banks could otherwise use to make decisions, he says. A tool like SDV has the potential to sidestep the sensitive aspects of data while preserving these important constraints and relationships.

One vault to rule them all

The Synthetic Data Vault combines everything the group has built so far into “a whole ecosystem,” says Veeramachaneni. The idea is that stakeholders — from students to professional software developers — can come to the vault and get what they need, whether that’s a large table, a small amount of time-series data, or a mix of many different data types.

The vault is open-source and expandable. “There are a whole lot of different areas where we are realizing synthetic data can be used as well,” says Sala. For example, if a particular group is underrepresented in a sample dataset, synthetic data can be used to fill in those gaps — a sensitive endeavor that requires a lot of finesse. Or companies might also want to use synthetic data to plan for scenarios they haven’t yet experienced, like a huge bump in user traffic.

As use cases continue to come up, more tools will be developed and added to the vault, Veeramachaneni says. It may occupy the team for another seven years at least, but they are ready: “We’re just touching the tip of the iceberg.”

Machine learning uncovers potential new TB drugs

Machine learning is a computational tool used by many biologists to analyze huge amounts of data, helping them to identify potential new drugs. MIT researchers have now incorporated a new feature into these types of machine-learning algorithms, improving their prediction-making ability.

Using this new approach, which allows computer models to account for uncertainty in the data they’re analyzing, the MIT team identified several promising compounds that target a protein required by the bacteria that cause tuberculosis.

This method, which has previously been used by computer scientists but has not taken off in biology, could also prove useful in protein design and many other fields of biology, says Bonnie Berger, the Simons Professor of Mathematics and head of the Computation and Biology group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL).

“This technique is part of a known subfield of machine learning, but people have not brought it to biology,” Berger says. “This is a paradigm shift, and is absolutely how biological exploration should be done.”

Berger and Bryan Bryson, an assistant professor of biological engineering at MIT and a member of the Ragon Institute of MGH, MIT, and Harvard, are the senior authors of the study, which appears today in Cell Systems. MIT graduate student Brian Hie is the paper’s lead author.

Better predictions

Machine learning is a type of computer modeling in which an algorithm learns to make predictions based on data that it has already seen. In recent years, biologists have begun using machine learning to scour huge databases of potential drug compounds to find molecules that interact with particular targets.

One limitation of this method is that while the algorithms perform well when the data they’re analyzing are similar to the data they were trained on, they’re not very good at evaluating molecules that are very different from the ones they have already seen.

To overcome that, the researchers used a technique called a Gaussian process to assign uncertainty values to the data that the algorithms are trained on. That way, when the models are analyzing the training data, they also take into account how reliable those predictions are.

For example, if the data going into the model predict how strongly a particular molecule binds to a target protein, as well as the uncertainty of those predictions, the model can use that information to make predictions for protein-target interactions that it hasn’t seen before. The model also estimates the certainty of its own predictions. When analyzing new data, the model’s predictions may have lower certainty for molecules that are very different from the training data. Researchers can use that information to help them decide which molecules to test experimentally.
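Gaussian process regression makes that concrete: every prediction comes with a standard deviation, so candidates far from the training data can be flagged as uncertain and deprioritized. Below is a generic scikit-learn sketch with toy data, not the authors’ exact model.

```python
# Gaussian process regression returns a predictive mean and a standard
# deviation, so candidates far from the training data come with wide
# uncertainty and can be deprioritized for experimental testing.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)
X_train = rng.normal(size=(72, 16))                    # toy compound/kinase feature vectors
y_train = X_train[:, 0] + 0.1 * rng.normal(size=72)    # toy binding-affinity labels

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gp.fit(X_train, y_train)

X_candidates = rng.normal(size=(5, 16))
mean, std = gp.predict(X_candidates, return_std=True)
for m, s in zip(mean, std):
    print(f"predicted affinity {m:.2f} ± {s:.2f}")
```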

Another advantage of this approach is that the algorithm requires only a small amount of training data. In this study, the MIT team trained the model with a dataset of 72 small molecules and their interactions with more than 400 proteins called protein kinases. They were then able to use this algorithm to analyze nearly 11,000 small molecules, which they took from the ZINC database, a publicly available repository that contains millions of chemical compounds. Many of these molecules were very different from those in the training data.

Using this approach, the researchers were able to identify molecules with very strong predicted binding affinities for the protein kinases they put into the model. These included three human kinases, as well as one kinase found in Mycobacterium tuberculosis. That kinase, PknB, is critical for the bacteria to survive, but is not targeted by any frontline TB antibiotics.

The researchers then experimentally tested some of their top hits to see how well they actually bind to their targets, and found that the model’s predictions were very accurate. Among the molecules that the model assigned the highest certainty, about 90 percent proved to be true hits — much higher than the 30 to 40 percent hit rate of existing machine learning models used for drug screens.

The researchers also used the same training data to train a traditional machine-learning algorithm, which does not incorporate uncertainty, and then had it analyze the same 11,000 molecule library. “Without uncertainty, the model just gets horribly confused and it proposes very weird chemical structures as interacting with the kinases,” Hie says.

The researchers then took some of their most promising PknB inhibitors and tested them against Mycobacterium tuberculosis grown in bacterial culture media, and found that they inhibited bacterial growth. The inhibitors also worked in human immune cells infected with the bacterium.

A good starting point

Another important element of this approach is that once the researchers get additional experimental data, they can add it to the model and retrain it, further improving the predictions. Even a small amount of data can help the model get better, the researchers say.

“You don’t really need very large data sets on each iteration,” Hie says. “You can just retrain the model with maybe 10 new examples, which is something that a biologist can easily generate.”

This study is the first in many years to propose new molecules that can target PknB, and should give drug developers a good starting point to try to develop drugs that target the kinase, Bryson says. “We’ve now provided them with some new leads beyond what has been already published,” he says.

The researchers also showed that they could use this same type of machine learning to boost the fluorescent output of a green fluorescent protein, which is commonly used to label molecules inside living cells. It could also be applied to many other types of biological studies, says Berger, who is now using it to analyze mutations that drive tumor development.

The research was funded by the U.S. Department of Defense through the National Defense Science and Engineering Graduate Fellowship; the National Institutes of Health; the Ragon Institute of MGH, MIT, and Harvard; and MIT’s Department of Biological Engineering.

How we make moral decisions

Imagine that one day you’re riding the train and decide to hop the turnstile to avoid paying the fare. It probably won’t have a big impact on the financial well-being of your local transportation system. But now ask yourself, “What if everyone did that?” The outcome is much different — the system would likely go bankrupt and no one would be able to ride the train anymore.

Moral philosophers have long believed this type of reasoning, known as universalization, is the best way to make moral decisions. But do ordinary people spontaneously use this kind of moral judgment in their everyday lives?

In a study of several hundred people, MIT and Harvard University researchers have confirmed that people do use this strategy in particular situations called “threshold problems.” These are social dilemmas in which harm can occur if everyone, or a large number of people, performs a certain action. The authors devised a mathematical model that quantitatively predicts the judgments they are likely to make. They also showed, for the first time, that children as young as 4 years old can use this type of reasoning to judge right and wrong.

“This mechanism seems to be a way that we spontaneously can figure out what are the kinds of actions that I can do that are sustainable in my community,” says Sydney Levine, a postdoc at MIT and Harvard and the lead author of the study.

Other authors of the study are Max Kleiman-Weiner, a postdoc at MIT and Harvard; Laura Schulz, an MIT professor of cognitive science; Joshua Tenenbaum, a professor of computational cognitive science at MIT and a member of MIT’s Center for Brains, Minds, and Machines and Computer Science and Artificial Intelligence Laboratory (CSAIL); and Fiery Cushman, an assistant professor of psychology at Harvard. The paper is appearing this week in the Proceedings of the National Academy of Sciences.

Judging morality

The concept of universalization has been included in philosophical theories since at least the 1700s. Universalization is one of several strategies that philosophers believe people use to make moral judgments, along with outcome-based reasoning and rule-based reasoning. However, there have been few psychological studies of universalization, and many questions remain regarding how often this strategy is used, and under what circumstances.

To explore those questions, the MIT/Harvard team asked participants in their study to evaluate the morality of actions taken in situations where harm could occur if too many people perform the action. In one hypothetical scenario, John, a fisherman, is trying to decide whether to start using a new, more efficient fishing hook that will allow him to catch more fish. However, if every fisherman in his village decided to use the new hook, there would soon be no fish left in the lake.

The researchers found that many subjects did use universalization to evaluate John’s actions, and that their judgments depended on a variety of factors, including the number of people who were interested in using the new hook and the number of users it would take to trigger a harmful outcome.

To tease out the impact of those factors, the researchers created several versions of the scenario. In one, no one else in the village was interested in using the new hook, and in that scenario, most participants deemed it acceptable for John to use it. However, if others in the village were interested but chose not to use it, then John’s decision to use it was judged to be morally wrong.

The researchers also found that they could use their data to create a mathematical model that explains how people take different factors into account, such as the number of people who want to do the action and the number of people doing it that would cause harm. The model accurately predicts how people’s judgments change when these factors change.

In their last set of studies, the researchers created scenarios that they used to test judgments made by children between the ages of 4 and 11. One story featured a child who wanted to take a rock from a path in a park for his rock collection. Children were asked to judge if that was OK, under two different circumstances: In one, only one child wanted a rock, and in the other, many other children also wanted to take rocks for their collections.

The researchers found that most of the children deemed it wrong to take a rock if everyone wanted to, but permissible if there was only one child who wanted to do it. However, the children were not able to specifically explain why they had made those judgments.

“What’s interesting about this is we discovered that if you set up this carefully controlled contrast, the kids seem to be using this computation, even though they can’t articulate it,” Levine says. “They can’t introspect on their cognition and know what they’re doing and why, but they seem to be deploying the mechanism anyway.”

In future studies, the researchers hope to explore how and when the ability to use this type of reasoning develops in children.

Collective action

In the real world, there are many instances where universalization could be a good strategy for making decisions, but it’s not necessary because rules are already in place governing those situations.

“There are a lot of collective action problems in our world that can be solved with universalization, but they’re already solved with governmental regulation,” Levine says. “We don’t rely on people to have to do that kind of reasoning, we just make it illegal to ride the bus without paying.”

However, universalization can still be useful in situations that arise suddenly, before any government regulations or guidelines have been put in place. For example, at the beginning of the Covid-19 pandemic, before many local governments began requiring masks in public places, people contemplating wearing masks might have asked themselves what would happen if everyone decided not to wear one.

The researchers now hope to explore the reasons why people sometimes don’t seem to use universalization in cases where it could be applicable, such as combating climate change. One possible explanation is that people don’t have enough information about the potential harm that can result from certain actions, Levine says.

The research was funded by the John Templeton Foundation, the Templeton World Charity Foundation, and the Center for Brains, Minds, and Machines.

Read More

SMART researchers receive Intra-CREATE grant for personalized medicine and cell therapy

Researchers from Critical Analytics for Manufacturing Personalized-Medicine (CAMP), an interdisciplinary research group at Singapore-MIT Alliance for Research and Technology (SMART), MIT’s research enterprise in Singapore, have been awarded Intra-CREATE grants from the National Research Foundation (NRF) Singapore to help support research on retinal biomechanics for glaucoma progression and neural cell implantation therapy for spinal cord injuries. The grants are part of the NRF’s initiative to bring together researchers from Campus for Research Excellence And Technological Enterprise (CREATE) partner institutions, in order to achieve greater impact from collaborative research efforts.

SMART CAMP was formed in 2019 to focus on ways to produce living cells as medicine delivered to humans to treat a range of illnesses and medical conditions, including tissue degenerative diseases, cancer, and autoimmune disorders.

“Singapore’s well-established biopharmaceutical ecosystem brings with it a thriving research ecosystem that is supported by skilled talents and strong manufacturing capabilities. We are excited to collaborate with our partners in Singapore, bringing together an interdisciplinary group of experts from MIT and Singapore, for new research areas at SMART. In addition to our existing research on our three flagship projects, we hope to develop breakthroughs in manufacturing other cell therapy platforms that will enable better medical treatments and outcomes for society,” says Krystyn Van Vliet, co-lead principal investigator at SMART CAMP, professor of materials science and engineering, and associate provost at MIT.

Understanding glaucoma progression for better-targeted treatments

Hosted by SMART CAMP, the first research project, Retinal Analytics via Machine learning aiding Physics (RAMP), brings together an interdisciplinary group of ophthalmologists, data scientists, and optical scientists from SMART, Singapore Eye Research Institute (SERI), Agency for Science, Technology and Research (A*STAR), Duke-NUS Medical School, MIT, and National University of Singapore (NUS). The team will seek to establish first principles-founded and statistically confident models of glaucoma progression in patients. Through retinal biomechanics, the models will enable rapid and reliable forecast of the rate and trajectory of glaucoma progression, leading to better-targeted treatments.

Glaucoma, an eye condition often caused by stress-induced damage over time at the optic nerve head, accounts for 5.1 million of the estimated 38 million blind people in the world and 40 percent of blindness in Singapore. Currently, health practitioners face challenges in forecasting glaucoma progression and selecting treatment strategies, because existing research and technology cannot accurately establish the relationship between properties such as the elasticity of the retina and optic nerve head, blood flow, and intraocular pressure and, ultimately, damage to the optic nerve head.
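The article does not describe how the RAMP models will actually be constructed. Purely as a hedged illustration of relating the properties listed above to a progression forecast with an uncertainty estimate, one could fit a simple regression and bootstrap it; the feature set, the numbers, and the model form below are all assumed for the sake of the sketch.

```python
import numpy as np

# Hypothetical per-patient features: [tissue elasticity (kPa), blood flow (a.u.), IOP (mmHg)]
X = np.array([
    [12.0, 0.80, 21.0],
    [15.5, 0.60, 24.0],
    [10.2, 0.90, 18.0],
    [14.1, 0.70, 26.0],
    [13.3, 0.75, 22.0],
    [16.0, 0.55, 27.0],
    [11.0, 0.85, 19.0],
    [14.8, 0.65, 25.0],
])
# Hypothetical observed progression rates (visual-field loss, dB/year)
y = np.array([0.40, 0.90, 0.20, 1.10, 0.55, 1.30, 0.30, 1.00])

# Ordinary least-squares fit with an intercept column
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Bootstrap the fit to attach a rough confidence interval to a forecast
rng = np.random.default_rng(0)
new_patient = np.array([13.0, 0.72, 23.0, 1.0])  # features plus intercept term
forecasts = []
for _ in range(1000):
    idx = rng.integers(0, len(y), len(y))
    c, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
    forecasts.append(new_patient @ c)
low, high = np.percentile(forecasts, [2.5, 97.5])
print(f"forecast: {new_patient @ coef:.2f} dB/yr (95% CI {low:.2f} to {high:.2f})")
```

A first-principles biomechanical model of the kind the project proposes would replace this generic regression with physically grounded relationships, but the idea of reporting a confidence interval alongside each forecast carries over.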

The research is co-led by George Barbastathis, principal investigator at SMART CAMP and professor of mechanical engineering at MIT, and Aung Tin, executive director at SERI and professor at the Department of Ophthalmology at NUS. The team includes CAMP principal investigators Nicholas Fang, also a professor of mechanical engineering at MIT; Lisa Tucker-Kellogg, assistant professor with the Cancer and Stem Biology program at Duke-NUS; and Hanry Yu, professor of physiology with the Yong Loo Lin School of Medicine, NUS and CAMP’s co-lead principal investigator.

“We look forward to leveraging the ideas fostered in SMART CAMP to build data analytics and optical imaging capabilities for this pressing medical challenge of glaucoma prediction,” says Barbastathis.

Cell transplantation to treat irreparable spinal cord injury

Engineering Scaffold-Mediated Neural Cell Therapy for Spinal Cord Injury Treatment (ScaNCellS), the second research project, gathers an interdisciplinary group of engineers, cell biologists, and clinician scientists from SMART, Nanyang Technological University (NTU), NUS, A*STAR’s Institute of Molecular and Cell Biology (IMCB), the French National Centre for Scientific Research (CNRS), the University of Cambridge, and MIT. The team will seek to design a combined scaffold and neural cell implantation therapy for spinal cord injury treatment that is safe, efficacious, and reproducible, paving the way forward for similar neural cell therapies for other neurological disorders. The project, an intersection of engineering and health, will achieve its goals through an enhanced biological understanding of the regeneration process of nerve tissue and optimized engineering methods to prepare cells and biomaterials for treatment.

Spinal cord injury (SCI), affecting between 250,000 and 500,000 people yearly, is expected to incur higher societal costs than other common conditions such as dementia, multiple sclerosis, and cerebral palsy. SCI can lead to temporary or permanent changes in spinal cord function, including numbness or paralysis. Currently, even with the best possible treatment, the injury generally results in some incurable impairment.

The research is co-led by Chew Sing Yian, principal investigator at SMART CAMP and associate professor of the School of Chemical and Biomedical Engineering and Lee Kong Chian School of Medicine at NTU, and Laurent David, professor at University of Lyon (France) and leader of the Polymers for Life Sciences group at CNRS Polymer Engineering Laboratory. The team includes CAMP principal investigators Ai Ye from Singapore University of Technology and Design; Jongyoon Han and Zhao Xuanhe, both professors at MIT; as well as Shi-Yan Ng and Jonathan Loh from Institute of Molecular and Cell Biology, A*STAR.

Chew says, “Our earlier SMART and NTU scientific collaborations on progenitor cells in the central nervous system are now being extended to cell therapy translation. This helps us address SCI in a new way, and connect to the methods of quality analysis for cells developed in SMART CAMP.”

“Cell therapy, one of the fastest-growing areas of research, will provide patients with access to more options that will prevent and treat illnesses, some of which are currently incurable. Glaucoma and spinal cord injuries affect many. Our research will seek to plug current gaps and deliver valuable impact to cell therapy research and medical treatments for both conditions. With a good foundation to work on, we will be able to pave the way for future exciting research for further breakthroughs that will benefit the health-care industry and society,” says Hanry Yu, co-lead principal investigator at SMART CAMP, professor of physiology with the Yong Loo Lin School of Medicine, NUS, and group leader of the Institute of Bioengineering and Nanotechnology at A*STAR.

The grants for both projects will commence on Oct. 1, with RAMP expected to run until Sept. 30, 2022, and ScaNCellS expected to run until Sept. 30, 2023.

SMART was established by MIT in partnership with the NRF in 2007. SMART is the first entity in CREATE developed by NRF. SMART serves as an intellectual and innovation hub for research interactions between MIT and Singapore, undertaking cutting-edge research projects in areas of interest to both Singapore and MIT. SMART currently comprises an Innovation Centre and five interdisciplinary research groups (IRGs): Antimicrobial Resistance, CAMP, Disruptive and Sustainable Technologies for Agricultural Precision, Future Urban Mobility, and Low Energy Electronic Systems.

CAMP is a SMART IRG launched in June 2019. It focuses on better ways to produce living cells as medicine, or cellular therapies, to provide more patients access to promising and approved therapies. The investigators at CAMP address two key bottlenecks facing the production of a range of potential cell therapies: critical quality attributes (CQA) and process analytic technologies (PAT). Leveraging deep collaborations within Singapore and MIT in the United States, CAMP invents and demonstrates CQA/PAT capabilities from stem to immune cells. Its work addresses ailments ranging from cancer to tissue degeneration, targeting adherent and suspended cells, with and without genetic engineering.

CAMP is the R&D core of a comprehensive national effort on cell therapy manufacturing in Singapore.

Read More