Making roadway spending more sustainable

The share of federal spending on infrastructure has reached an all-time low, falling from 30 percent in 1960 to just 12 percent in 2018.

While the nation’s ailing infrastructure will require more funding to reach its full potential, recent MIT research finds that more sustainable and higher performing roads are still possible even with today’s limited budgets.

The research, conducted by a team of current and former MIT Concrete Sustainability Hub (MIT CSHub) scientists and published in Transportation Research Part D, finds that a set of innovative planning strategies could improve the environmental and performance outcomes of pavement networks even if budgets don’t increase.

The paper presents a novel budget allocation tool and pairs it with three innovative strategies for managing pavement networks: a mix of paving materials, a mix of short- and long-term paving actions, and a long evaluation period for those actions.

This novel approach offers numerous benefits. When applied to a 30-year case study of the Iowa U.S. Route network, the MIT CSHub model and management strategies cut emissions by 20 percent while sustaining current levels of road quality. Achieving this with a conventional planning approach would require the state to spend 32 percent more than it does today. The key to its success is the consideration of a fundamental — but fraught — aspect of pavement asset management: uncertainty.

Predicting unpredictability

The average road must last many years and support the traffic of thousands — if not millions — of vehicles. Over that time, a lot can change. Material prices may fluctuate, budgets may tighten, and traffic levels may intensify. Climate (and climate change), too, can hasten the need for unexpected repairs.

Managing these uncertainties effectively means looking long into the future and anticipating possible changes.

“Capturing the impacts of uncertainty is essential for making effective paving decisions,” explains Fengdi Guo, the paper’s lead author and a departing CSHub research assistant.

“Yet, measuring and relating these uncertainties to outcomes is also computationally intensive and expensive. Consequently, many DOTs [departments of transportation] are forced to simplify their analysis to plan maintenance — often resulting in suboptimal spending and outcomes.”

To give DOTs accessible tools to factor uncertainties into their planning, CSHub researchers have developed a streamlined planning approach. It offers greater specificity and is paired with several new pavement management strategies.

The planning approach, known as Probabilistic Treatment Path Dependence (PTPD), is based on machine learning and was devised by Guo.

“Our PTPD model is composed of four steps,” he explains. “These steps are, in order, pavement damage prediction; treatment cost prediction; budget allocation; and pavement network condition evaluation.”

The model begins by investigating every segment in an entire pavement network and predicting future possibilities for pavement deterioration, cost, and traffic.

“We [then] run thousands of simulations for each segment in the network to determine the likely cost and performance outcomes for each initial and subsequent sequence, or ‘path,’ of treatment actions,” says Guo. “The treatment paths with the best cost and performance outcomes are selected for each segment, and then across the network.”
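
To make that search concrete, here is a minimal, hypothetical Kotlin sketch of the treatment-path evaluation loop; the data types, scoring rule, and simulate function are illustrative stand-ins rather than the CSHub implementation:

// Illustrative only: for one pavement segment, average cost and condition over
// many simulated futures for each candidate treatment path, then keep the best.
data class Outcome(val cost: Double, val condition: Double)

fun evaluatePath(
    path: List<String>,
    simulate: (List<String>) -> Outcome,   // one random draw of prices, traffic, climate
    runs: Int = 1000
): Outcome {
    var totalCost = 0.0
    var totalCondition = 0.0
    repeat(runs) {
        val outcome = simulate(path)
        totalCost += outcome.cost
        totalCondition += outcome.condition
    }
    return Outcome(totalCost / runs, totalCondition / runs)
}

fun bestPath(
    candidatePaths: List<List<String>>,
    simulate: (List<String>) -> Outcome
): List<String> =
    candidatePaths.maxByOrNull { path ->
        val avg = evaluatePath(path, simulate)
        avg.condition - avg.cost   // toy trade-off between performance and cost
    } ?: candidatePaths.first()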

The PTPD model seeks to minimize costs not only to agencies but also to users — in this case, drivers. These user costs come primarily in the form of excess fuel consumption due to poor road quality.

“One improvement in our analysis is the incorporation of electric vehicle uptake into our cost and environmental impact predictions,” says Randolph Kirchain, a principal research scientist at MIT CSHub and MIT Materials Research Laboratory (MRL) and one of the paper’s co-authors. “Since the vehicle fleet will change over the next several decades due to electric vehicle adoption, we made sure to consider how these changes might impact our predictions of excess energy consumption.”

After developing the PTPD model, Guo wanted to see how the efficacy of various pavement management strategies might differ. To do this, he developed a sophisticated deterioration prediction model.

A novel aspect of this deterioration model is its treatment of multiple deterioration metrics at once. Using a multi-output neural network, a tool of artificial intelligence, the model can predict several forms of pavement deterioration simultaneously, thereby accounting for the correlations among them.

The MIT team selected two key metrics to compare the effectiveness of various treatment paths: pavement quality and greenhouse gas emissions. These metrics were then calculated for all pavement segments in the Iowa network.

Improvement through variation

The MIT model can help DOTs make better decisions, but that decision-making is ultimately constrained by the potential options considered.

Guo and his colleagues, therefore, sought to expand current decision-making paradigms by exploring a broad set of network management strategies and evaluating them with their PTPD approach. Based on that evaluation, the team discovered that networks had the best outcomes when the management strategy includes using a mix of paving materials, a variety of long- and short-term paving repair actions (treatments), and longer time periods on which to base paving decisions.

They then compared this proposed approach with a baseline management approach that reflects current, widespread practices: the use of solely asphalt materials, short-term treatments, and a five-year period for evaluating the outcomes of paving actions.

With these two approaches established, the team used them to plan 30 years of maintenance across the Iowa U.S. Route network. They then measured the subsequent road quality and emissions.

Their case study found that the MIT approach offered substantial benefits. Pavement-related greenhouse gas emissions would fall by around 20 percent across the network over the whole period. Pavement performance improved as well. To achieve the same level of road quality as the MIT approach, the baseline approach would need a 32 percent greater budget.

“It’s worth noting,” says Guo, “that since conventional practices employ less effective allocation tools, the difference between them and the CSHub approach should be even larger in practice.”

Much of the improvement derived from the precision of the CSHub planning model. But the three treatment strategies also play a key role.

“We’ve found that a mix of asphalt and concrete paving materials allows DOTs to not only find materials best-suited to certain projects, but also mitigates the risk of material price volatility over time,” says Kirchain.

It’s a similar story with a mix of paving actions. Employing a mix of short- and long-term fixes gives DOTs the flexibility to choose the right action for the right project.

The final strategy, a long-term evaluation period, enables DOTs to see the entire scope of their choices. If the ramifications of a decision are predicted over only five years, many long-term implications won’t be considered. Expanding the window for planning, then, can introduce beneficial, long-term options.

It’s not surprising that paving decisions are daunting to make; their impacts on the environment, driver safety, and budget levels are long-lasting. But rather than simplify this fraught process, the CSHub method aims to reflect its complexity. The result is an approach that provides DOTs with the tools to do more with less.

This research was supported through the MIT Concrete Sustainability Hub by the Portland Cement Association and the Ready Mixed Concrete Research and Education Foundation.


Optical character recognition with TensorFlow Lite: A new example app

Posted by Wei Wei, TensorFlow Developer Advocate

As the old adage goes, “a picture is worth a thousand words.” Images are rich in visual information, but sometimes the key information is the text within them. While it is easy for literate human beings to read words embedded in images, how do we use computer vision and machine learning to teach computers to do so?

Today, we are going to show you how to use TensorFlow Lite to extract text from images on Android devices. We will walk you through the key steps of the Optical Character Recognition (OCR) Android app that we recently open sourced, which you can refer to for the complete code. You can see how the app extracts the product names from three Google product logos in the animation below.

Optical Character Recognition demo

The process of recognizing text from images is called Optical Character Recognition and is widely used in many domains. For example, Google Maps uses OCR technology to automatically extract information from the geo-located imagery to improve Google Maps.

Generally speaking, OCR is a pipeline with multiple steps, usually consisting of text detection followed by text recognition:

  • Use a text detection model to find bounding boxes around text;
  • Do some post-processing to transform the bounding boxes;
  • Transform the images within those bounding boxes into grayscale, so that a text recognition model can map out the words and numbers.

In our case, we are going to leverage the text detection and text recognition models from TensorFlow Hub. There are several different model versions for speed/accuracy tradeoffs; we use the float16 quantized models here. For more information on model quantization, please refer to the TensorFlow Lite quantization section. We also use OpenCV, a widely used computer vision library, for Non-Maximum Suppression (NMS) and perspective transformation (we’ll expand on these later) to post-process the detection results. In addition, we use the TFLite Support Library to grayscale and normalize the images.
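
For context, loading the two TensorFlow Hub models into TFLite interpreters typically looks something like the sketch below; the file names, the context reference, and the thread count are placeholders rather than the example app’s actual code:

import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil

// Sketch: load the float16-quantized detection and recognition models from the
// app's assets into two TFLite interpreters. File names are placeholders.
val detectionInterpreter = Interpreter(
    FileUtil.loadMappedFile(context, "text_detection.tflite"),
    Interpreter.Options().setNumThreads(4)
)
val recognitionInterpreter = Interpreter(
    FileUtil.loadMappedFile(context, "text_recognition.tflite"),
    Interpreter.Options().setNumThreads(4)
)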

OCR pipeline from text detection, perspective transformation, to recognition.

For text detection, since the detection model accepts a fixed input size of 320×320, we use the TFLite Support Library to resize and normalize the input image:

val imageProcessor =
    ImageProcessor.Builder()
        .add(ResizeOp(height, width, ResizeOp.ResizeMethod.BILINEAR))
        .add(NormalizeOp(means, stds))
        .build()

// Load the bitmap into a float32 tensor and apply the resize/normalize steps.
var tensorImage = TensorImage(DataType.FLOAT32)
tensorImage.load(bitmapIn)
tensorImage = imageProcessor.process(tensorImage)

Then we use TFLite to run the detection model:

detectionInterpreter.runForMultipleInputsOutputs(detectionInputs, detectionOutputs)
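
The detectionInputs array and detectionOutputs map are not shown in the snippet above. A minimal sketch of how they could be assembled, sizing each output buffer from the interpreter’s own tensor metadata so that no model-specific shapes are hard-coded (an illustration, not necessarily how the example app does it; TFLite accepts direct ByteBuffers as output buffers):

import java.nio.ByteBuffer
import java.nio.ByteOrder

// Sketch: one input tensor, plus one output buffer per model output, sized
// from the interpreter's tensor metadata.
val detectionInputs = arrayOf<Any>(tensorImage.buffer)
val detectionOutputs = HashMap<Int, Any>()
for (i in 0 until detectionInterpreter.outputTensorCount) {
    val numBytes = detectionInterpreter.getOutputTensor(i).numBytes()
    detectionOutputs[i] = ByteBuffer.allocateDirect(numBytes).order(ByteOrder.nativeOrder())
}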

The output of the detection model is a set of rotated bounding boxes that contain the text in the image. We then run Non-Maximum Suppression with OpenCV to keep a single bounding box for each text block:

NMSBoxesRotated(
    boundingBoxesMat,
    detectedConfidencesMat,
    detectionConfidenceThreshold.toFloat(),
    detectionNMSThreshold.toFloat(),
    indicesMat
)

Sometimes text inside an image is distorted by a perspective angle (e.g., the ‘kubernetes’ sticker on my laptop):

Perspective transformation demo

If we just feed the raw rotated bounding box into the recognition model, the model is unlikely to correctly identify the characters. In this case, we need to use OpenCV to do perspective transformation:

val rotationMatrix = getPerspectiveTransform(srcPtsMat, targetPtsMat)

warpPerspective(
    srcBitmapMat,
    recognitionBitmapMat,
    rotationMatrix,
    Size(recognitionImageWidth.toDouble(), recognitionImageHeight.toDouble())
)

After that, we use the TFLite Support Library again to resize, grayscale, and normalize the transformed images inside the bounding boxes:

val imageProcessor =
    ImageProcessor.Builder()
        .add(ResizeOp(height, width, ResizeOp.ResizeMethod.BILINEAR))
        .add(TransformToGrayscaleOp())
        .add(NormalizeOp(mean, std))
        .build()
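
Before the recognition model runs, the warped crop still has to be converted and passed through this processor, and an output buffer has to be prepared. A plausible sketch of that glue code, assuming the OpenCV Mat is converted back to a Bitmap first (the variable names and buffer sizing here are illustrative, not necessarily what the example app does):

import android.graphics.Bitmap
import org.opencv.android.Utils
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Sketch: convert the warped OpenCV Mat back to a Bitmap, preprocess it, and
// allocate an output buffer for the recognition model. Names and sizing are
// assumptions chosen to match the snippet that follows.
val recognitionBitmap = Bitmap.createBitmap(
    recognitionImageWidth, recognitionImageHeight, Bitmap.Config.ARGB_8888
)
Utils.matToBitmap(recognitionBitmapMat, recognitionBitmap)

var recognitionTensorImage = TensorImage(DataType.FLOAT32)
recognitionTensorImage.load(recognitionBitmap)
recognitionTensorImage = imageProcessor.process(recognitionTensorImage)

// One 8-byte slot per predicted character index (see the loop below).
val recognitionResult = ByteBuffer.allocateDirect(recognitionModelOutputSize * 8)
    .order(ByteOrder.nativeOrder())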

Finally, we run the text recognition model, map out the characters and numbers from the model output, and update the app UI:

recognitionInterpreter.run(recognitionTensorImage.buffer, recognitionResult)

var recognizedText = ""
for (k in 0 until recognitionModelOutputSize) {
    // Output elements are laid out 8 bytes apart in the result buffer; each one
    // is an index into the alphabet string.
    val alphabetIndex = recognitionResult.getInt(k * 8)
    if (alphabetIndex in alphabets.indices) {
        recognizedText += alphabets[alphabetIndex]
    }
}
Log.d("Recognition result:", recognizedText)
if (recognizedText != "") {
    ocrResults.put(recognizedText, getRandomColor())
}

That’s it. We are now able to extract text from input images using TFLite within our app.

Finally, if you just want a ready-to-use OCR SDK, Google also offers on-device OCR functionality through ML Kit, which uses TFLite underneath and should be sufficient for most OCR use cases. There are, however, some situations where you may want to build your own OCR solution with TFLite, such as:

  • You have your own text detection / recognition TFLite models that you would like to use;
  • You have special business requirements (e.g. recognizing upside-down text) and need to customize the OCR pipeline;
  • You want to support languages not covered by ML Kit;
  • Your target user devices don’t necessarily have Google Play services installed;
  • You want to have control over hardware backends (CPU / GPU / etc.) used to run your models.

In these cases, I hope that this tutorial and our example implementation can help you get started on building your own OCR functionality in your app.

You can learn more about OCR with the resources below.

Acknowledgements

The author would like to thank Tian Lin for the helpful feedback and community contributors @Tulasi123789 and @risingsayak for their prior work on OCR using TFLite (creating and uploading the models to TF Hub, providing accompanying notebooks, etc.).


Using AI and old reports to understand new medical images

Getting a quick and accurate reading of an X-ray or some other medical image can be vital to a patient’s health and might even save a life. Obtaining such an assessment depends on the availability of a skilled radiologist and, consequently, a rapid response is not always possible. For that reason, says Ruizhi “Ray” Liao, a postdoc and a recent PhD graduate at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), “we want to train machines that are capable of reproducing what radiologists do every day.” Liao is first author of a new paper, written with other researchers at MIT and Boston-area hospitals, that is being presented this fall at MICCAI 2021, an international conference on medical image computing.

Although the idea of utilizing computers to interpret images is not new, the MIT-led group is drawing on an underused resource — the vast body of radiology reports that accompany medical images, written by radiologists in routine clinical practice — to improve the interpretive abilities of machine learning algorithms. The team is also utilizing a concept from information theory called mutual information — a statistical measure of the interdependence of two different variables — in order to boost the effectiveness of their approach.

Here’s how it works: First, a neural network is trained to determine the extent of a disease, such as pulmonary edema, by being presented with numerous X-ray images of patients’ lungs, along with a doctor’s rating of the severity of each case. That information is encapsulated within a collection of numbers. A separate neural network does the same for text, representing its information in a different collection of numbers. A third neural network then integrates the information between images and text in a coordinated way that maximizes the mutual information between the two datasets. “When the mutual information between images and text is high, that means that images are highly predictive of the text and the text is highly predictive of the images,” explains MIT Professor Polina Golland, a principal investigator at CSAIL.
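
For reference, the mutual information being maximized is the standard information-theoretic quantity (written here in LaTeX notation), where $X$ is the image representation and $Y$ is the text representation:

$$ I(X;Y) = \mathbb{E}_{p(x,y)}\!\left[ \log \frac{p(x,y)}{p(x)\,p(y)} \right] $$

It is zero when the two representations are statistically independent and grows as each becomes more predictive of the other, which is the property Golland describes.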

Liao, Golland, and their colleagues have introduced another innovation that confers several advantages: Rather than working from entire images and radiology reports, they break the reports down to individual sentences and the portions of those images that the sentences pertain to. Doing things this way, Golland says, “estimates the severity of the disease more accurately than if you view the whole image and whole report. And because the model is examining smaller pieces of data, it can learn more readily and has more samples to train on.”

While Liao finds the computer science aspects of this project fascinating, a primary motivation for him is “to develop technology that is clinically meaningful and applicable to the real world.”

To that end, a pilot program is currently underway at the Beth Israel Deaconess Medical Center to see how MIT’s machine learning model could influence the way doctors managing heart failure patients make decisions, especially in an emergency room setting where speed is of the essence.

The model could have very broad applicability, according to Golland. “It could be used for any kind of imagery and associated text — inside or outside the medical realm. This general approach, moreover, could be applied beyond images and text, which is exciting to think about.”

Liao wrote the paper alongside MIT CSAIL postdoc Daniel Moyer and Golland; Miriam Cha and Keegan Quigley at MIT Lincoln Laboratory; William M. Wells at Harvard Medical School and MIT CSAIL; and clinical collaborators Seth Berkowitz and Steven Horng at Beth Israel Deaconess Medical Center.

The work was sponsored by the NIH NIBIB Neuroimaging Analysis Center, Wistron, MIT-IBM Watson AI Lab, MIT Deshpande Center for Technological Innovation, MIT Abdul Latif Jameel Clinic for Machine Learning in Health (J-Clinic), and MIT Lincoln Lab.


Toward a smarter electronic health record

Electronic health records have been widely adopted with the hope they would save time and improve the quality of patient care. But due to fragmented interfaces and tedious data entry procedures, physicians often spend more time navigating these systems than they do interacting with patients.

Researchers at MIT and the Beth Israel Deaconess Medical Center are combining machine learning and human-computer interaction to create a better electronic health record (EHR). They developed MedKnowts, a system that unifies the processes of looking up medical records and documenting patient information into a single, interactive interface.

Driven by artificial intelligence, this “smart” EHR automatically displays customized, patient-specific medical records when a clinician needs them. MedKnowts also provides autocomplete for clinical terms and auto-populates fields with patient information to help doctors work more efficiently.

“In the origins of EHRs, there was this tremendous enthusiasm that getting all this information organized would be helpful to be able to track billing records, report statistics to the government, and provide data for scientific research. But few stopped to ask the deep questions around whether they would be of use for the clinician. I think a lot of clinicians feel they have had this burden of EHRs put on them for the benefit of bureaucracies and scientists and accountants. We came into this project asking how EHRs might actually benefit clinicians,” says David Karger, professor of computer science in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and senior author of the paper.

The research was co-authored by CSAIL graduate students Luke Murray, who is the lead author, Divya Gopinath, and Monica Agrawal. Other authors include Steven Horng, an emergency medicine attending physician and clinical lead for machine learning at the Center for Healthcare Delivery Science of Beth Israel Deaconess Medical Center, and David Sontag, associate professor of electrical engineering and computer science at MIT and a member of CSAIL and the Institute for Medical Engineering and Science, and a principal investigator at the Abdul Latif Jameel Clinic for Machine Learning in Health. It will be presented at the Association for Computing Machinery Symposium on User Interface Software and Technology next month.

A problem-oriented tool

To design an EHR that would benefit doctors, the researchers had to think like doctors.

They created a note-taking editor with a side panel that displays relevant information from the patient’s medical history. That historical information appears in the form of cards that are focused on particular problems or concepts.

For instance, if MedKnowts identifies the clinical term “diabetes” in the text as a clinician types, the system automatically displays a “diabetes card” containing medications, lab values, and snippets from past records that are relevant to diabetes treatment.

Most EHRs store historical information on separate pages and list medications or lab values alphabetically or chronologically, forcing the clinician to search through data to find the information they need, Murray says. MedKnowts only displays information relevant to the particular concept the clinician is writing about.

“This is a closer match to the way doctors think about information. A lot of times, doctors will do this subconsciously. They will look through a medications page and only focus on the medications that are relevant to the current conditions. We are helping to do that process automatically and hopefully move some things out of the doctor’s head so they have more time to think about the complex part, which is determining what is wrong with the patient and coming up with a treatment plan,” Murray says.

Pieces of interactive text called chips serve as links to related cards. As a physician types a note, the autocomplete system recognizes clinical terms, such as medications, lab values, or conditions, and transforms them into chips. Each chip is displayed as a word or phrase that has been highlighted in a certain color depending on its category (red for a medical condition, green for a medication, yellow for a procedure, etc.).
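
As a rough illustration of that chip structure, here is a hypothetical Kotlin sketch (not MedKnowts code; the extra category and the specific color values are assumptions):

// Illustrative only: one way to model the color-coded chip categories
// described above. The LAB_VALUE entry and hex colors are assumptions.
enum class ChipCategory(val colorHex: String) {
    CONDITION("#E57373"),   // red: medical condition
    MEDICATION("#81C784"),  // green: medication
    PROCEDURE("#FFF176"),   // yellow: procedure
    LAB_VALUE("#64B5F6")    // hypothetical additional category
}

data class Chip(val text: String, val category: ChipCategory)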

Through the use of autocomplete, structured data on the patient’s conditions, symptoms, and medication usage is collected with no additional effort from the physician.

Sontag says he hopes the advance will “change the paradigm of how to create large-scale health datasets for studying disease progression and assessing the real-world effectiveness of treatments.”

In practice

After a year-long iterative design process, the researchers tested MedKnowts by deploying the software in the emergency department at Beth Israel Deaconess Medical Center in Boston. They worked with an emergency physician and four hospital scribes who enter notes into the electronic health record.

Deploying the software in an emergency department, where doctors operate in a high-stress environment, involved a delicate balancing act, Agrawal says.

“One of the biggest challenges we faced was trying to get people to shift what they currently do. Doctors who have used the same system, and done the same dance of clicks so many times, form a sort of muscle memory. Whenever you are going to make a change, there is a question of is this worth it? And we definitely found that some features had greater usage than others,” she says.

The Covid-19 pandemic complicated the deployment, too. The researchers had been visiting the emergency department to get a sense of the workflow, but were forced to end those visits due to Covid-19 and were unable to be in the hospital while the system was being deployed.

Despite those initial challenges, MedKnowts became popular with the scribes over the course of the one-month deployment. They gave the system an average rating of 83.75 (out of 100) for usability.

Scribes found the autocomplete function especially useful for speeding up their work, according to survey results. Also, the color-coded chips helped them quickly scan notes for relevant information.

Those initial results are promising, but as the researchers consider the feedback and work on future iterations of MedKnowts, they plan to proceed with caution.

“What we are trying to do here is smooth the pathway for doctors and let them accelerate. There is some risk there. Part of the purpose of bureaucracy is to slow things down and make sure all the i’s are dotted and all the t’s are crossed. And if we have a computer dotting the i’s and crossing the t’s for doctors, that may actually be countering the goals of the bureaucracy, which is to force doctors to think twice before they make a decision. We have to be thinking about how to protect doctors and patients from the consequences of making the doctors more efficient,” Karger says.

A longer-term vision

The researchers plan to improve the machine learning algorithms that drive MedKnowts so the system can more effectively highlight parts of the medical record that are most relevant, Agrawal says.

They also want to consider the needs of different medical users. The researchers designed MedKnowts with an emergency department in mind — a setting where doctors are typically seeing patients for the first time. A primary care physician who knows their patients much better would likely have some different needs.

In the longer-term, the researchers envision creating an adaptive system that clinicians can contribute to. For example, perhaps a doctor realizes a certain cardiology term is missing from MedKnowts and adds that information to a card, which would update the system for all users.

The team is exploring commercialization as an avenue for further deployment.

“We want to build tools that let doctors create their own tools. We don’t expect doctors to learn to be programmers, but with the right support they might be able to radically customize whatever medical applications they are using to really suit their own needs and preferences,” Karger says.

This research was funded by the MIT Abdul Latif Jameel Clinic for Machine Learning in Health.


Googler Marian Croak is now in the Inventors Hall of Fame

Look around you right now and consider everything that was created by an inventor. The computer you’re reading this article on, the internet necessary to load this article, the electricity that powers the screen, even the coffee maker you used this morning. 

To recognize the incredible contributions of those inventors and the benefits they bring to our everyday life, the National Inventors Hall of Fame has inducted a new group of honorees every year since 1973. In this year’s combined inductee class of 2020/2021, Googler Marian Croak is being honored for her work in advancing VoIP (Voice over Internet Protocol) Technology, which powers the online calls and video chats that have helped businesses and families stay connected through the COVID-19 pandemic. She holds more than 200 patents, and recently was honored by the U.S. Patent and Trademark Office. 

These days, Marian leads our Research Center for Responsible AI and Human Centered Technology, which is responsible for ensuring Google develops artificial intelligence responsibly and that it has a positive impact. We chatted over Google Meet to find out how plumbers and electricians sparked her interest in science, how her inventions have made life in a pandemic a tiny bit easier for everyone, and what the NIHF honor means to her.

When was the first time you realized you were interested in technology?

I was probably around 5 or 6. I know that we don’t usually think of things like plumbing or electricity as necessarily technology, but they are. I was very enchanted with plumbers and electricians who would come to our house and fix things. They would be dirty and greasy, but I would love the smell, you know? I felt like, Wow, what a miracle worker! I would follow them around, trying to figure out how they’d fix something. I still do that today! 

So when you have electricians come to your house, you’re still like, “Hey, how did you do that?”

There was a leak once, and I was asking the plumber all these questions, and he asked me to quiet down! Because he needed to listen to the invisible flow of water through the pipes to determine the problem. It was amazing to me how similar it was to network engineering!

You’ve had a few different roles at Google and Alphabet so far. How did you move to where you are today?

When I first came to Google, my first role was bringing the Internet to emerging markets. Laying fiber in Africa, building public Wi-Fi in railroad stations in India and then exploring the landscape in countries like Cuba and countries where there wasn’t an openness yet for the Internet. And that was a fascinating job. It was a merger of technology, policy and governmental affairs, combined with an understanding of communities and regions. 

Then I worked on bringing features and technology and Google’s products to the next billion users. And after I did that for a few years, I joined the Site Reliability Engineering organization to help enhance the performance of Google’s complex, integrated systems. Now my current role is leading the Research Center for Responsible AI and Human Centered Technology group. I’m inspired that my work has the potential to positively impact so many of our users. 

Today you’re being inducted into the National Inventors Hall of Fame for your work in advancing VoIP technology. What inspired you to work on VoIP, and can you describe that process of bringing the technology to life?

I have always been motivated by the desire to change the world, and to do that I try to change the world that I’m currently in. What I mean by that is I work on problems that I am aware of, and that I can tackle within the world that surrounds me. So when I began working on VoIP technology, it was at a time in the late ‘90s when there was a lot of change happening involving the internet. Netscape had put a user-friendly web browser in place and there was a lot of new activity beginning to bubble up all over the online world.

I was part of a team that was also very interested in doing testing and prototyping of voice communications over the internet. There were some existing technologies but they didn’t scale and they were proprietary in nature, so we were thinking of ways we could open it up, make it scalable, make it reliable and be able to support billions of daily calls. We started to work on this but had a lot of doubters telling us that this wouldn’t work, and that no one would ever use this “toy like” technology. And at the time, they were right: It wasn’t working and it wasn’t reliable. But over time we were able to get it to a point where it started working very well. So much so that eventually the senior leaders within AT&T began to adopt the technology for their core network. It was challenging but an exciting thing for me to do because I like to bring change to things, especially when people doubt that it can happen.

What advice would you give to aspiring inventors? 

Most importantly, don’t give up, and during the process of creation, listen to your critics. I received so much criticism and in many ways it was valid. That type of feedback motivated me to improve the technology, and really address a variety of pain points that I hadn’t necessarily thought of. 

What does being inducted into the NIHF mean to you? 

Well, it’s humbling, and a great experience. At the time I never thought the work that I was doing was that significant or that it would lead to this, but I’m very grateful for the recognition.

What does it mean to be a part of a class that sees the first two Black women inducted into the NIHF?

I find that it inspires people when they see someone who looks like themselves on some dimension, and I’m proud to offer that type of representation. People also see that I’m just a normal person like themselves and I think that also inspires them to accomplish their goals. I want people to understand that it may be difficult but that they can overcome obstacles and that it will be so worth it.
