Generate actionable insights for predictive maintenance management with Amazon Monitron and Amazon Kinesis

Reliability managers and technicians in industrial environments such as manufacturing production lines, warehouses, and industrial plants are keen to improve equipment health and uptime to maximize product output and quality. Machine and process failures are often addressed by reactive activity after incidents happen or by costly preventive maintenance, where you run the risk of over-maintaining the equipment or missing issues that could happen between the periodic maintenance cycles. Predictive condition-based maintenance is a proactive strategy that is better than reactive or preventive ones. Indeed, this approach combines continuous monitoring, predictive analytics, and just-in-time action. This enables maintenance and reliability teams to service equipment only when necessary, based on the actual equipment condition.

There have been common challenges with condition-based monitoring to generate actionable insights for large industrial asset fleets. These challenges include but are not limited to: build and maintain a complex infrastructure of sensors collecting data from the field, obtain a reliable high-level summary of industrial asset fleets, efficiently manage failure alerts, identify possible root causes of anomalies, and effectively visualize the state of industrial assets at scale.

Amazon Monitron is an end-to-end condition monitoring solution that enables you to start monitoring equipment health with the aid of machine learning (ML) in minutes, so you can implement predictive maintenance and reduce unplanned downtime. It includes sensor devices to capture vibration and temperature data, a gateway device to securely transfer data to the AWS Cloud, the Amazon Monitron service that analyzes the data for anomalies with ML, and a companion mobile app to track potential failures in your machinery. Your field engineers and operators can directly use the app to diagnose and plan maintenance for industrial assets.

From the operational technology (OT) team standpoint, using the Amazon Monitron data also opens up new ways to improve how they operate large industrial asset fleets thanks to AI. OT teams can reinforce the predictive maintenance practice from their organization by building a consolidated view across multiple hierarchies (assets, sites, and plants). They can combine actual measurement and ML inference results with unacknowledged alarms, sensors or getaways connectivity status, or asset state transitions to build a high-level summary for the scope (asset, site, project) they are focused on.

With the recently launched Amazon Monitron Kinesis data export v2 feature, your OT team can stream incoming measurement data and inference results from Amazon Monitron via Amazon Kinesis to AWS Simple Storage Service (Amazon S3) to build an Internet of Things (IoT) data lake. By leveraging the latest data export schema, you can obtain sensors connectivity status, gateways connectivity status, measurement classification results, closure reason code and details of asset state transition events.

Use cases overview

The enriched data stream Amazon Monitron now exposes enables you to implement several key use cases such as automated work order creation, enriching an operational single pane of glass or automating failure reporting. Let’s dive into these use cases.

You can use the Amazon Monitron Kinesis data export v2 to create work orders in Enterprise Asset Management (EAM) systems such as Infor EAM, SAP Asset Management, or IBM Maximo. For example, in the video avoiding mechanical issues with predictive maintenance & Amazon Monitron, you can discover how our Amazon Fulfillment Centers are avoiding mechanical issues on conveyor belts with Amazon Monitron sensors integrated with third-party software such as the EAM used at Amazon as well as with the chat rooms technicians used. This shows how you can naturally integrate Amazon Monitron insights into your existing workflows. Stay tuned in the coming months to read the next installment of this series with an actual implementation of this integration works.

You can also use the data stream to ingest Amazon Monitron insights back into a shop floor system such as a Supervisory Control and Data Acquisition (SCADA) or a Historian. Shop floor operators are more efficient when all the insights about their assets and processes are provided in a single pane of glass. In this concept, Amazon Monitron doesn’t become yet another tool technicians have to monitor, but another data source with insights provided in the single view they are already used to. Later this year, we will also describe an architecture you can use to perform this task and send Amazon Monitron feedback to major third-party SCADA systems and Historians.

Last but not least, the new data stream from Amazon Monitron includes the asset state transitions and closure codes provided by users when acknowledging alarms (which trigger the transition to a new state). Thanks to this data, you can automatically build visualizations that provide real-time reporting of the failures and actions taken while operating their assets.

Your team can then build a broader data analytics dashboard to support your industrial fleet management practice by combining this asset state data with Amazon Monitron measurement data and other IoT data across large industrial asset fleets by using key AWS services, which we describe in this post. We explain how to build an IoT data lake, the workflow to produce and consume the data, as well as a summary dashboard to visualize Amazon Monitron sensors data and inference results. We use an Amazon Monitron dataset coming from about 780 sensors installed in an industrial warehouse, which has been running for more than 1 year. For the detailed Amazon Monitron installation guide, refer to Getting started with Amazon Monitron.

Solution overview

Amazon Monitron provides ML inference of asset health status after 21 days of the ML model training period for each asset. In this solution, the measurement data and ML inference from these sensors are exported to Amazon S3 via Amazon Kinesis Data Streams by using the latest Amazon Monitron data export feature. As soon as Amazon Monitron IoT data is available in Amazon S3, a database and table are created in Amazon Athena by using an AWS Glue crawler. You can query Amazon Monitron data via AWS Glue tables with Athena, and visualize the measurement data and ML inference with Amazon Managed Grafana. With Amazon Managed Grafana, you can create, explore, and share observability dashboards with your team, and spend less time managing your Grafana infrastructure. In this post, you connect Amazon Managed Grafana to Athena, and learn how to build a data analytics dashboard with Amazon Monitron data to help you plan industrial asset operations at scale.

The following screenshot is an example of what you can achieve at the end of this post. This dashboard is divided into three sections:

Plant View – Analytical information from all sensors across plants; for example, the overall counts of various states of sensors (Healthy, Warning, or Alarm), number of unacknowledged and acknowledged alarms, gateway connectivity, and average time for maintenance
Site View – Site-level statistics, such as asset status statistics at each site, total number of days that an alarm remains unacknowledged, top/bottom performing assets at each site, and more
Asset View – Summary information for the Amazon Monitron project at the asset level, such as the alarm type for an unacknowledged alarm (ISO or ML), the timeline for an alarm, and more

These panels are examples that can help strategic operational planning, but they are not exclusive. You can use a similar workflow to customize the dashboard according to your targeted KPI.

Architecture overview

The solution you will build in this post combines Amazon Monitron, Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon S3, AWS Glue, Athena, and Amazon Managed Grafana.

The following diagram illustrates the solution architecture. Amazon Monitron sensors measure and detect anomalies from equipment. Both measurement data and ML inference outputs are exported at a frequency of once per hour to a Kinesis data stream, and they are delivered to Amazon S3 via Kinesis Data Firehose with a 1-minute buffer. The exported Amazon Monitron data is in JSON format. An AWS Glue crawler analyzes the Amazon Monitron data in Amazon S3 at a chosen frequency of once per hour, builds a metadata schema, and creates tables in Athena. Finally, Amazon Managed Grafana uses Athena to query the Amazon S3 data, allowing dashboards to be built to visualize both measurement data and device health status.

To build this solution, you complete the following high-level steps:

Enable a Kinesis Data Stream export from Amazon Monitron and create a data stream.
Configure Kinesis Data Firehose to deliver data from the data stream to an S3 bucket.
Build the AWS Glue crawler to create a table of Amazon S3 data in Athena.
Create a dashboard of Amazon Monitron devices with Amazon Managed Grafana.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account. If you don’t have an AWS account, follow the instructions to create one, unless you have been provided event engine details.
A user role with administrator access (service access associated with this role can be constrained further when the workflow goes to production).
A running Amazon Monitron project with sensors deployed on your equipment.
The latest version of Firefox or Chrome.
No specialized knowledge is required to build this solution, but basic Linux, Python, and SQL knowledge will help.

Additionally, make sure that all the resources you deploy are in the same Region.

Enable a Kinesis data stream export from Amazon Monitron and create a data stream

To configure your data stream export, complete the following steps:

On the Amazon Monitron console, from your project’s main page, choose Start live data export.
Under Select Amazon Kinesis data stream, choose Create a new data stream.
Under Data stream configuration, enter your data stream name.
For Data stream capacity, choose On-demand.
Choose Create data stream.

Note that any live data export enabled after April 4th, 2023 will stream data following the Kinesis Data Streams v2 schema. If you have an existing data export that was enabled before this date, the schema will follow the v1 format.

You can now see live data export information on the Amazon Monitron console with your specified Kinesis data stream.

Configure Kinesis Data Firehose to deliver data to an S3 bucket

To configure your Firehose delivery stream, complete the following steps:

On the Kinesis console, choose Delivery streams in the navigation pane.
Choose Create delivery stream.
For Source, select Amazon Kinesis Data Streams.
For Destination, select Amazon S3.
Under Source settings, for Kinesis data stream, enter the ARN of your Kinesis data stream.
Under Delivery stream name, enter the name of your Kinesis data stream.
Under Destination settings, choose an S3 bucket or enter a bucket URI. You can either use an existing S3 bucket to store Amazon Monitron data, or you can create a new S3 bucket.
Enable dynamic partitioning using inline parsing for JSON:
- Choose Enabled for Dynamic partitioning.
- Choose Enabled for Inline parsing for JSON.
- Under Dynamic partitioning keys, add the following partition keys:

Key Name	JQ Expression
project	`.projectName\| "project=(.)"`
site	`.eventPayload.siteName\| "site=(.)"`
asset	`.eventPayload.assetName\| "asset=(.)"`
position	`.eventPayload.positionName\| "position=(.)"`
time	`.timestamp\| sub(" [0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}$"; "")\| "time=(.)"`

Choose Apply dynamic partitioning keys and confirm the generated S3 bucket prefix is:

!{partitionKeyFromQuery:project}/!{partitionKeyFromQuery:site}/!{partitionKeyFromQuery:asset}/!{partitionKeyFromQuery:position}/!{partitionKeyFromQuery:time}/.

Enter a prefix for S3 bucket error output prefix. Any JSON payload that doesn’t contain the keys described earlier will be delivered in this prefix. For instance, thegatewayConnectedand gatewayDisconnected events are not linked to a given asset or position. Therefore, they won’t contain the assetName and positionName fields. Specifying this optional prefix here allows you to monitor this location and process these events accordingly.
Choose Create delivery stream.

You can inspect the Amazon Monitron data in the S3 bucket. Note that the Amazon Monitron data will export live data at a frequency of once per hour, so wait for 1 hour to inspect the data.

This Kinesis Data Firehose setup enables dynamic partitioning, and the S3 objects delivered will use the following key format:

/project={projectName}/site={siteDisplayName}/asset={assetDisplayName}/ position={sensorPositionDisplayName}/time={yyyy-mm-dd 00:00:00}/{filename}.

Build the AWS Glue crawler to create a table of Amazon S3 data in Athena

After the live data has been exported to Amazon S3, we use an AWS Glue crawler to generate the metadata tables. In this post, we use AWS Glue crawlers to automatically infer database and table schema from Amazon Monitron data exported in Amazon S3, and store the associated metadata in the AWS Glue Data Catalog. Athena then uses the table metadata from the Data Catalog to find, read, and process the data in Amazon S3. Complete the following steps to create your database and table schema:

On the AWS Glue console, choose Crawlers in the navigation pane.
Choose Create crawler.
Enter a name for the crawler (for example,XXX_xxxx_monitron).
Choose Next.
For Is your data already mapped to Glue tables, choose Not yet.
For Data Source, choose S3.
For Location of S3 data, choose In this Account, and enter the path of your S3 bucket directory you set up in the previous section (s3://YourBucketName).
For Repeat crawls of S3 data stores, select Crawl all sub-folders.
Finally, choose Next.
Select Create new IAM role and enter a name for the role.
Choose Next.
Select Add Database, and enter a name for the database. This creates the Athena database where your metadata tables are located after the crawler is complete.
For Crawler Schedule, select a preferred time-based scheduler (for example, hourly) to refresh the Amazon Monitron data in the database, and choose Next.
Review the crawler details and choose Create.
On the Crawlers page of the AWS Glue console, select the crawler you created and choose Run crawler.

You may need to wait a few minutes, depending on the size of the data. When it’s complete, the crawler’s status shows as Ready. To see the metadata tables, navigate to your database on the Databases page and choose Tables in the navigation pane.

You can also view data by choosing Table data on the console.

You’re redirected to the Athena console to view the top 10 records of the Amazon Monitron data in Amazon S3.

Create a dashboard of Amazon Monitron devices with Amazon Managed Grafana

In this section, we build a customized dashboard with Amazon Managed Grafana to visualize Amazon Monitron data in Amazon S3, so that OT team can get streamlined access to assets in alarm across their whole Amazon Monitron sensors fleet. This will enable the OT team to plan next step actions based on the possible root cause of the anomalies.

To create a Grafana workspace, complete the following steps:

Ensure that your user role is admin or editor.
On the Amazon Managed Grafana console, choose Create workspace.
For Workspace name, enter a name for the workspace.
Choose Next.
For Authentication access, select AWS IAM Identity Center (successor to AWS Single Sign-On). You can use the same AWS IAM Identity Center user that you used to set up your Amazon Monitron project.
Choose Next.
For this first workspace, confirm that Service managed is selected for Permission type. This selection enables Amazon Managed Grafana to automatically provision the permissions you need for the AWS data sources that you use for this workspace.
Choose Current account.
Choose Next.
Confirm the workspace details, and choose Create workspace. The workspace details page appears. Initially, the status is CREATING.
Wait until the status is ACTIVE to proceed to the next step.

To configure your Athena data source, complete the following steps:

On the Amazon Managed Grafana console, choose the workspace you want to work on.
On the Data sources tab, select Amazon Athena, and choose Actions, Enable service-managed policy.
Choose Configure in Grafana in the Amazon Athena row.
Sign in to the Grafana workspace console using IAM Identity Center if necessary. The user should have the Athena access policy attached to the user or role to have access to the Athena data source. See AWS managed policy: AmazonGrafanaAthenaAccess for more info.
On the Grafana workspace console, in the navigation pane, choose the lower AWS icon (there are two) and then choose Athena on the Data sources menu.
Select the default Region that you want the Athena data source to query from, select the accounts that you want, then choose Add data source.
Follow the steps to configure Athena details.

If your workgroup in Athena doesn’t have an output location configured already, you need to specify an S3 bucket and folder to use for query results. After setting up the data source, you can view or edit it in the Configuration pane.

In the following subsections, we demonstrate several panels in the Amazon Monitron dashboard built in Amazon Managed Grafana to gain operational insights. The Athena data source provides a standard SQL query editor that we’ll use to analyze the Amazon Monitron data to generate desired analytics.

First, if there are many sensors in the Amazon Monitron project and they are in different states (healthy, warning, alarm, and needs maintenance), the OT team wants to visually see the count of positions that sensors are in various states. You can obtain such information as a pie chart widget in Grafana via the following Athena query:

Select * FROM (Select latest_status, COUNT(assetdisplayname)OVER (PARTITION BY latest_status) AS asset_health_count FROM (SELECT timestamp, sitedisplayname, assetdisplayname, assetState.newState as latest_status, RANK() OVER (PARTITION BY assetdisplayname ORDER BY timestamp DESC)AS rnk FROM "AwsDataCatalog"."Replace with your Athena database name"."Replace with your Athena table name") tt WHERE tt.rnk=1) GROUP BY latest_status, asset_health_count;

The following screenshot shows a panel with the latest distribution of Amazon Monitron sensor status.

To format your SQL query for Amazon Monitron data, refer to Understanding the data export schema.

Next, your Operations Technology team may want to plan predictive maintenance based on assets that are in alarm status, and therefore they want to quickly know the total number of acknowledged alarms vs. unacknowledged alarms. You can show the summary information of alarm state as simple stats panels in Grafana:

Select COUNT(*) FROM (Select timestamp, sitedisplayname, assetdisplayname, assetState.newState as latest_status, RANK() OVER (PARTITION BY assetdisplayname ORDER BY timestamp DESC)AS rnk FROM "AwsDataCatalog"."Replace with your Athena database name"."Replace with your Athena table name") tt WHERE tt.rnk=1 AND tt.latest_status = 'Alarm';

The following panel shows acknowledged and unacknowledged alarms.

The OT team can also query the amount of time the sensors remain in alarm status, so that they can decide their maintenance priority:

Select c.assetdisplayname, b.sensorpositiondisplayname, b.alarm_date FROM (Select a.assetdisplayname, a.sensorpositiondisplayname, COUNT(*)/24+1 AS number_of_days_in_alarm_state FROM (Select * FROM "AwsDataCatalog"."Replace with your Athena database name"."Replace with your Athena table name" WHERE (assetState.newState = 'ALARM' AND assetState.newState = assetState.previousState) ORDER BY timestamp DESC) a GROUP BY a.assetdisplayname, a.sensorpositiondisplayname) b INNER JOIN (Select * FROM (Select timestamp, sitedisplayname, assetdisplayname, assetState.newState AS latest_status, RANK() OVER (PARTITION BY assetdisplayname ORDER BY timestamp DESC)AS rnk FROM "AwsDataCatalog"."Replace with your Athena database name"."Replace with your Athena table name") tt WHERE tt.rnk=1 AND tt.latest_status = 'ALARM') c ON b.assetdisplayname = c.assetdisplayname;

The output of this analysis can be visualized by a bar chart in Grafana, and the alarm in alarm state can be easily visualized as shown in the following screenshot.

To analyze top/bottom asset performance based on the total amount of time the assets are in an alarm or need maintenance state, use the following query:

Select s.sitedisplayname, s.assetdisplayname, COUNT(s.timestamp)/24 AS trouble_time FROM (Select timestamp, sitedisplayname, assetdisplayname, sensorpositiondisplayname, assetState.newState FROM "AwsDataCatalog"."Replace with your Athena database name"."Replace with your Athena table name" WHERE assetState.newState = 'ALARM' OR assetState.newState = 'NEEDS_MAINTENANCE') AS s GROUP BY s.assetdisplayname, s.sitedisplayname ORDER BY trouble_time, s.assetdisplayname ASC LIMIT 5;

The following bar gauge is used to visualize the preceding query output, with the top performing assets showing 0 days of alarm states, and the bottom performing assets showing accumulated alarming states over the past year.

To help the OT team understand the possible root cause of an anomaly, the alarm types can be displayed for these assets still in alarm state with the following query:

Select a.assetdisplayname, a.sensorpositiondisplayname, a.latest_status, CASE WHEN a.temperatureML != 'HEALTHY' THEN 'TEMP' WHEN a.vibrationISO != 'HEALTHY' THEN 'VIBRATION_ISO' ELSE 'VIBRATION_ML' END AS alarm_type  FROM (Select sitedisplayname, assetdisplayname, sensorpositiondisplayname, models.temperatureML.persistentClassificationOutput as temperatureML, models.vibrationISO.persistentClassificationOutput as vibrationISO, models.vibrationML.persistentClassificationOutput as vibrationML, assetState.newState as latest_status FROM (Select *, RANK() OVER (PARTITION BY assetdisplayname, sensorpositiondisplayname ORDER BY timestamp DESC)AS rnk FROM "AwsDataCatalog"."Replace with your Athena database name"."Replace with your Athena table name") tt WHERE tt.rnk=1 AND assetState.newState = 'ALARM' ) a WHERE (a.temperatureML != 'HEALTHY' OR a. vibrationISO != 'HEALTHY' OR a. vibrationML != 'HEALTHY');

You can visualize this analysis as a table in Grafana. In this Amazon Monitron project, two alarms were triggered by ML models for vibration measurement.

The Amazon Managed Grafana dashboard is shown here for illustration purposes. You can adapt the dashboard design according to your own business needs.

Failure Reports

When a user acknowledges an alarm in the Amazon Monitron app, the associated assets transition to a new state. The user also has the opportunity to provide some details about this alarm:

Failure cause – This can be one of the following: ADMINISTRATION, DESIGN, FABRICATION, MAINTENANCE, OPERATION, OTHER, QUALITY, WEAR, or UNDEDETERMINED
Failure mode – This can be one of the following: NO_ISSUE, BLOCKAGE, CAVITATION, CORROSION, DEPOSIT, IMBALANCE, LUBRICATION, MISALIGNMENT, OTHER, RESONANCE, ROTATING_LOOSENESS, STRUCTURAL_LOOSENESS, TRANSMITTED_FAULT, or UNDETERMINED
Action taken – This can be ADJUST, CLEAN, LUBRICATE, MODIFY, OVERHAUL, REPLACE, NO_ACTION, or OTHER

The event payload associated to the asset state transition contains all this information, the previous state of the asset, and the new state of the asset. Stay tuned for an update of this post with more details on how you can use this information in an additional Grafana panel to build Pareto charts of the most common failures and actions taken across your assets.

Conclusion

Enterprise customers of Amazon Monitron are looking for a solution to build an IoT data lake with Amazon Monitron’s live data, so they can manage multiple Amazon Monitron projects and assets, and generate analytics reports across multiple Amazon Monitron projects. This post provide a detailed walkthrough of a solution to build this IoT data lake with the latest Amazon Monitron Kinesis data export v2 feature. This solution also showed how to use other AWS services, such as AWS Glue and Athena to query the data, generate analytics outputs, and visualize such outputs with Amazon Managed Grafana with frequent refresh.

As a next step, you can expand this solution by sending ML inference results to other EAM systems that you might use for work order management. This will allow your operation team to integrate Amazon Monitron with other enterprise applications, and improve their operation efficiency. You can also start building more in-depth insights into your failure modes and actions taken by processing the asset state transitions and the closure codes that are now part of the Kinesis data stream payload.

About the authors

Julia Hu is a Sr. AI/ML Solutions Architect at Amazon Web Services. She has extensive experience in IoT architecture and Applied Data Science, and is part of both the Machine Learning and IoT Technical Field Community. She works with customers, ranging from start-ups to enterprises, to develop AWSome IoT machine learning (ML) solutions, at the Edge and in the Cloud. She enjoys leveraging latest IoT and big data technology to scale up her ML solution, reduce latency, and accelerate industry adoption.

Bishr Tabbaa is a solutions architect at Amazon Web Services. Bishr specializes in helping customers with machine learning, security, and observability applications. Outside of work, he enjoys playing tennis, cooking, and spending time with family.

Shalika Pargal is a Product Manager at Amazon Web Services. Shalika focuses on building AI products and services for Industrial customers. She brings significant experience at the intersection of Product, Industrial and Business Development. She recently shared Monitron’s success story at Reinvent 2022.

Garry Galinsky is a Principal Solutions Architect supporting Amazon on AWS. He has been involved with Monitron since its debut and has helped integrate and deploy the solution into Amazon’s worldwide fulfillment network. He recently shared Amazon’s Monitron success story at re:Invent 2022.

Michaël Hoarau is an AI/ML Specialist Solutions Architect at AWS who alternates between data scientist and machine learning architect, depending on the moment. He is passionate about bringing the AI/ML power to the shop floors of his industrial customers and has worked on a wide range of ML use cases, ranging from anomaly detection to predictive product quality or manufacturing optimization. He published a book on time series analysis in 2022 and regularly writes about this topic on LinkedIn and Medium. When not helping customers develop the next best machine learning experiences, he enjoys observing the stars, traveling, or playing the piano.

Vedere AI