In this post, we demonstrate how to use Amazon Comprehend Medical to extract medication names and medical conditions to monitor drug safety and adverse events. Amazon Comprehend Medical is a natural language processing (NLP) service that uses machine learning (ML) to extract relevant medical information from unstructured text. We query the OpenFDA API, a public API published by the FDA, and the ClinicalTrials.gov API, published by the National Library of Medicine (NLM) at the National Institutes of Health (NIH), to get information on past adverse events, recalls, and clinical trials for the drug or medical condition in question. You can then use this data in population-scale studies to further analyze the drug’s safety and efficacy.
Launching a new drug is an extensive process. By some estimates, it takes about 12 years to go from invention to launch, through stages such as preclinical testing, phase 1–3 clinical trials, and approval by the Food and Drug Administration (FDA). New drugs also require huge financial investments by pharmaceutical organizations. According to a study published in JAMA Network, the median cost of bringing a drug to market is $918 million, with estimates ranging from $314 million to $2.8 billion.
Even after launch, pharmaceutical companies continuously monitor for safety risks. Consumers can also directly report adverse drug reactions to the FDA. This could result in a drug recall, thereby jeopardizing millions of development dollars. Moreover, consumers who are taking these drugs and clinicians who are prescribing them need to be aware of such adverse reactions and decide whether corrective actions are necessary.
While no investment is guaranteed, drug manufacturers are starting to rely more on ML to achieve better outcomes and improve the chances of market success for new drugs they develop.
How can machine learning help?
To ensure drug safety, the FDA uses real-world data (RWD) and real-world evidence (RWE) to monitor post-market drug safety and adverse events. For more information, see how RWD and RWE are playing an increasing role in health care decisions. RWD is also useful for healthcare professionals who develop guidelines and decision support tools. Drug manufacturers can benefit from RWD analysis and use it to design better clinical trials and develop new and innovative treatment approaches.
One of the major challenges with analyzing RWD effectively is that much of this data is unstructured: it isn’t stored in rows and columns that lend themselves to analytical queries. RWD can exist in multiple formats and span a variety of sources, so it’s impractical to process it at population scale with conventional analytical techniques. For more information, see Building a Real World Evidence Platform on AWS.
Advances in natural language processing (NLP) can help fill this gap. For example, you can use models trained on RWD to derive key entities (like medications and medical conditions) from adverse reactions that patients report in natural language. After you extract these entities, you can store them in a database and integrate them into a variety of reporting applications. You can use them in population-scale studies to identify cohorts susceptible to certain drugs or to analyze a drug’s safety and efficacy.
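The storage step can be sketched with Python’s built-in `sqlite3` module standing in for your database. This is a minimal illustration under stated assumptions, not part of the solution: the `entities` table and its columns are hypothetical, and each entity is assumed to arrive as a `(text, category, score)` tuple. The counting query hints at how stored entities can feed a population-scale analysis, such as how many reports mention each medication.

```python
import sqlite3


def store_entities(conn, report_id, entities):
    """Persist extracted entities for one adverse-event report.

    `entities` is a list of (text, category, score) tuples. The schema is a
    hypothetical example, not the original solution's data model.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entities (
               report_id TEXT,
               text TEXT,
               category TEXT,
               score REAL
           )"""
    )
    conn.executemany(
        "INSERT INTO entities (report_id, text, category, score) VALUES (?, ?, ?, ?)",
        [(report_id, text, category, score) for text, category, score in entities],
    )
    conn.commit()


def count_reports_mentioning(conn, category):
    """Count distinct reports per (case-insensitive) entity text in one category."""
    cursor = conn.execute(
        """SELECT lower(text), COUNT(DISTINCT report_id)
           FROM entities
           WHERE category = ?
           GROUP BY lower(text)
           ORDER BY COUNT(DISTINCT report_id) DESC, lower(text)""",
        (category,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_entities(conn, "r1", [("Aspirin", "MEDICATION", 0.99), ("headache", "MEDICAL_CONDITION", 0.95)])
    store_entities(conn, "r2", [("aspirin", "MEDICATION", 0.98)])
    print(count_reports_mentioning(conn, "MEDICATION"))  # [('aspirin', 2)]
```

Normalizing to lowercase in the query keeps "Aspirin" and "aspirin" in the same group; a production system would more likely normalize to a coded vocabulary such as RxNorm before storing.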
The following diagram represents the overall architecture of the solution. In addition to Amazon Comprehend Medical, you use Amazon API Gateway, AWS Lambda, Amazon Simple Storage Service (Amazon S3), and AWS CloudFormation.
The architecture includes the following steps:
- The demo solution is a simple HTML page that a Lambda function serves on the first invocation of the API Gateway URL. You can find the URL on the Outputs tab of the CloudFormation stack or in the API Gateway console.
- The Submit buttons on the page asynchronously invoke two other Lambda functions through API Gateway.
- Both Lambda functions use a common Lambda layer that passes the user’s free text to Amazon Comprehend Medical and returns the detected medications and medical conditions.
- The Lambda functions use these entities to query the public ClinicalTrials.gov and OpenFDA APIs. The HTML page renders the output from each function in its own table.
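The entity-filtering job of the shared layer can be sketched as follows. This is a minimal illustration, not the solution’s actual layer code: it assumes the layer calls the Comprehend Medical `DetectEntitiesV2` operation (`detect_entities_v2` in boto3) and keeps only entities in the `MEDICATION` and `MEDICAL_CONDITION` categories. The function names and the 0.5 score threshold are hypothetical, and skipping entities tagged with the `NEGATION` trait (for example, "denies fever") is an added assumption so that negated symptoms don’t trigger searches.

```python
MEDICATION = "MEDICATION"
MEDICAL_CONDITION = "MEDICAL_CONDITION"


def extract_search_terms(response, min_score=0.5):
    """Split a DetectEntitiesV2 response into (medications, conditions).

    Terms are lowercased and de-duplicated in order of appearance. Entities
    scoring below `min_score` (a hypothetical threshold) or carrying a
    NEGATION trait are skipped.
    """
    medications, conditions = [], []
    for entity in response.get("Entities", []):
        if entity.get("Score", 0.0) < min_score:
            continue
        traits = {trait["Name"] for trait in entity.get("Traits", [])}
        if "NEGATION" in traits:
            continue
        term = entity["Text"].strip().lower()
        if entity["Category"] == MEDICATION and term not in medications:
            medications.append(term)
        elif entity["Category"] == MEDICAL_CONDITION and term not in conditions:
            conditions.append(term)
    return medications, conditions


def detect_search_terms(text, region_name="us-east-1"):
    """Call Amazon Comprehend Medical on free text (needs AWS credentials)."""
    import boto3  # imported lazily so the parsing logic works without boto3

    client = boto3.client("comprehendmedical", region_name=region_name)
    return extract_search_terms(client.detect_entities_v2(Text=text))


if __name__ == "__main__":
    # Hand-written sample shaped like a DetectEntitiesV2 response (not real output).
    sample = {
        "Entities": [
            {"Text": "Lipitor", "Category": "MEDICATION", "Type": "BRAND_NAME",
             "Score": 0.99, "Traits": []},
            {"Text": "muscle pain", "Category": "MEDICAL_CONDITION", "Type": "DX_NAME",
             "Score": 0.93, "Traits": [{"Name": "SYMPTOM", "Score": 0.9}]},
            {"Text": "fever", "Category": "MEDICAL_CONDITION", "Type": "DX_NAME",
             "Score": 0.97, "Traits": [{"Name": "NEGATION", "Score": 0.95}]},
        ]
    }
    print(extract_search_terms(sample))  # (['lipitor'], ['muscle pain'])
```

Keeping the parsing separate from the boto3 call makes the filtering rules easy to test without calling the service.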
To complete this walkthrough, you must have the following prerequisites:
- An AWS account
- An AWS Identity and Access Management (IAM) user with access to API Gateway, Lambda, Amazon Comprehend Medical, and AWS CloudFormation
- An S3 bucket
Configuring the CloudFormation stack
To configure your CloudFormation stack, complete the following steps:
- Sign in to the AWS Management Console.
- Choose us-east-1 as your Region.
- Launch the CloudFormation stack.
- Choose Next.
- For Stack name, enter a name for the stack.
- In the Parameters section, update the API Gateway names as necessary.
- Provide the name of an S3 bucket in us-east-1 to store the CSV files.
- Choose Next.
- Select I acknowledge that AWS CloudFormation might create IAM resources.
- Choose Create stack.
The stack takes a few minutes to complete.
- On the Outputs tab, record the URL for the API Gateway.
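If you prefer to script these console steps, the following sketch shows how they might map onto the CloudFormation API through boto3. Selecting the IAM acknowledgment corresponds to passing `CAPABILITY_IAM` (templates that name their IAM resources need `CAPABILITY_NAMED_IAM` instead). The template URL, the parameter key `BucketName`, and the output key `ApiGatewayUrl` are hypothetical placeholders; check the template’s actual Parameters and Outputs sections before using them.

```python
def build_create_stack_request(stack_name, template_url, bucket_name):
    """Assemble keyword arguments for CloudFormation's create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # Hypothetical parameter key; use the name defined in the template.
        "Parameters": [{"ParameterKey": "BucketName", "ParameterValue": bucket_name}],
        # API equivalent of "I acknowledge that AWS CloudFormation might create IAM resources."
        "Capabilities": ["CAPABILITY_IAM"],
    }


def find_output(stack_description, output_key):
    """Return an output value from a describe_stacks response, or None if absent."""
    for stack in stack_description.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output["OutputKey"] == output_key:
                return output["OutputValue"]
    return None


def deploy(stack_name, template_url, bucket_name, output_key="ApiGatewayUrl"):
    """Create the stack in us-east-1, wait for it, and return the API Gateway URL."""
    import boto3  # needs credentials allowed to create the stack and its IAM roles

    cfn = boto3.client("cloudformation", region_name="us-east-1")
    cfn.create_stack(**build_create_stack_request(stack_name, template_url, bucket_name))
    # Blocks until CREATE_COMPLETE; raises a WaiterError if creation fails.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return find_output(cfn.describe_stacks(StackName=stack_name), output_key)
```

The waiter replaces watching the console while the stack takes a few minutes to complete, and `find_output` plays the role of reading the Outputs tab.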
Searching for information related to drugs and medical conditions
When you open the URL from the previous step, you can enter text related to drugs and medical conditions and choose Submit.
The output shows three tables with the following information:
- Adverse effects of the related drugs and symptoms – This information is queried from ClinicalTrials.gov, and records are limited to a maximum of 10.
- Drug recall-related information – This information is queried from open.fda.gov, and records are limited to a maximum of 5 for every drug and symptom.
- Clinical trials for the related symptoms and drugs – This information is queried from ClinicalTrials.gov.
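The queries behind these tables can be sketched as URL builders. This sketch assumes the public OpenFDA endpoints `https://api.fda.gov/drug/enforcement.json` (recall enforcement reports) and `https://api.fda.gov/drug/event.json` (adverse event reports), plus the ClinicalTrials.gov API v2 `studies` endpoint; the solution’s Lambda functions may use different endpoints, search fields, or API versions. The default limits are illustrative, with the recall default mirroring the 5-record cap above.

```python
from urllib.parse import urlencode

# Public endpoints assumed by this sketch; verify against the current API docs.
OPENFDA_BASE = "https://api.fda.gov/drug"
CTGOV_STUDIES = "https://clinicaltrials.gov/api/v2/studies"


def openfda_recall_url(drug, limit=5):
    """OpenFDA drug enforcement (recall) reports whose product description mentions `drug`."""
    params = {"search": f'product_description:"{drug}"', "limit": limit}
    return f"{OPENFDA_BASE}/enforcement.json?{urlencode(params)}"


def openfda_adverse_event_url(drug, limit=10):
    """OpenFDA adverse event reports that list `drug` as a medicinal product."""
    params = {"search": f'patient.drug.medicinalproduct:"{drug}"', "limit": limit}
    return f"{OPENFDA_BASE}/event.json?{urlencode(params)}"


def clinical_trials_url(condition=None, intervention=None, page_size=10):
    """ClinicalTrials.gov v2 studies matching a condition and/or an intervention."""
    params = {"format": "json", "pageSize": page_size}
    if condition:
        params["query.cond"] = condition
    if intervention:
        params["query.intr"] = intervention
    return f"{CTGOV_STUDIES}?{urlencode(params)}"


if __name__ == "__main__":
    print(openfda_recall_url("aspirin"))
    # https://api.fda.gov/drug/enforcement.json?search=product_description%3A%22aspirin%22&limit=5
    print(clinical_trials_url(condition="migraine", intervention="sumatriptan"))
```

To run a query, you could pass a URL to `urllib.request.urlopen` and parse the body with `json.load`; OpenFDA returns matches under a `results` key, and the v2 studies endpoint returns them under `studies`. OpenFDA responds with HTTP 404 when a search has no matches, so a caller should treat that status as an empty result rather than a failure.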
In addition to the tables, the page displays two hyperlinks to download the clinical trial information and the OpenFDA results as CSV files. These files contain a maximum of 100 records for clinical trials and 100 records for every drug and medical condition in OpenFDA.
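A sketch of the CSV export, assuming each result has already been flattened into a dictionary of column values: it writes at most `max_rows` records (100, matching the cap described above) using Python’s `csv` module. The function name and the column names are hypothetical and not taken from the solution’s code.

```python
import csv
import io
from itertools import islice


def records_to_csv(records, fieldnames, max_rows=100):
    """Serialize up to `max_rows` dict records to CSV text.

    Keys missing from a record become empty cells; keys not listed in
    `fieldnames` are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore", restval="")
    writer.writeheader()
    # islice enforces the row cap without materializing extra records.
    writer.writerows(islice(records, max_rows))
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [{"product": f"drug {i}", "reason": "contamination"} for i in range(150)]
    text = records_to_csv(rows, ["product", "reason"])
    print(len(text.splitlines()))  # 101: one header row plus 100 records
```

In the deployed solution, the resulting text would then be written to the S3 bucket you named when creating the stack, for example with the boto3 S3 client’s `put_object(Bucket=..., Key=..., Body=text.encode("utf-8"))`.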
This post demonstrated a simple application that allows drug manufacturers, healthcare professionals, and consumers to look up useful information from trusted sources like the FDA and NIH. Using this architecture and the available code base, you can integrate this solution into other downstream applications for analyzing and reporting adverse events. We hope this lowers the barrier to entry and increases the adoption of ML to improve patient outcomes and quality of care.
About the authors
Varad Ram is a Senior Solutions Architect on the Partner team at Amazon Web Services. He likes to help customers adopt cloud technologies and is particularly interested in artificial intelligence. He believes deep learning will power future technology growth. In his spare time, his daughter and son keep him busy biking and hiking.
Ujjwal Ratan is a Principal Machine Learning Specialist Solutions Architect on the Global Healthcare and Lifesciences team at Amazon Web Services. He works on applying machine learning and deep learning to real-world industry problems such as medical imaging, unstructured clinical text, genomics, precision medicine, clinical trials, and quality of care improvement. He has expertise in scaling machine learning and deep learning algorithms on the AWS Cloud for accelerated training and inference. In his free time, he enjoys listening to (and playing) music and taking unplanned road trips with his family.
Babu Srinivasan is a Senior Cloud Architect at Deloitte. He works closely with customers to build scalable and resilient cloud-based architectures and accelerate the adoption of the AWS Cloud to solve business problems. Babu is also an APN (AWS Partner Network) Ambassador, passionate about sharing his AWS technical expertise with the technical community. In his spare time, Babu loves performing close-up card magic for friends and colleagues, wood turning in his garage woodshop, and working on his AWS DeepRacer car.