Posted by Megha Malpani, Google Product Manager and Ard Oerlemans, Google Software Engineer
Coral reefs are some of the most diverse and important ecosystems in the world, both for marine life and society more broadly. Not only are healthy reefs critical to fisheries and food security, they also protect coastlines from storm surge, support tourism-based economies, and advance drug discovery research, among other countless benefits.
Reefs face a number of rising threats, most notably climate change, pollution, and overfishing. In the past 30 years alone, there have been dramatic losses in coral cover and habitat in the Great Barrier Reef (GBR), with other reefs experiencing similar declines. In Australia, outbreaks of the coral-eating crown of thorns starfish (COTS) have been shown to cause major coral loss. While COTS naturally exist in the Indo-Pacific, reductions in the abundance of natural predators and excess run-off nutrients have led to massive outbreaks that are devastating already vulnerable coral communities. Controlling COTS populations is critical to promoting coral growth and resilience.
The Great Barrier Reef Foundation established an innovation program to develop new survey and intervention methods that radically improve COTS control. Google teamed up with CSIRO, Australia’s national science agency, to develop innovative machine learning technology that can analyze video sequences accurately, efficiently, and in near real-time. The goal is to transform the underwater survey, monitoring and mapping reefs at scale to help rapidly identify and prioritize COTS outbreaks. This project is part of a broader partnership with CSIRO under Google’s Digital Future Initiative in Australia.
CSIRO developed an edge ML platform (built on top of the NVIDIA Jetson AGX Xavier) that can analyze underwater image sequences and map out detections in near real-time. Our goal was to use the annotated dataset CSIRO had built over multiple field trips to develop the most accurate object detection model (across a variety of environments, weather conditions, and COTS populations) within a set of performance constraints, most notably, processing more than 10 frames per second (FPS) on a <30 watt device.
We hosted a Kaggle competition, leveraging insights from the open source community to drive our experimentation plan. With over 2,000 teams and 61,000 submissions, we were able to learn from the successes and failures of far more experiments than we could hope to execute on our own. We used these insights to define our experimentation roadmap and ended up running hundreds of experiments on Google TPUs.
We used TensorFlow 2’s Model Garden library as our foundation, making use of its scaled YOLOv4 model and corresponding training pipeline implementations. Our team of modeling experts then got to work, modifying the pipeline, experimenting with different image resolutions and model sizes, and applying various data augmentation and quantization techniques to create the most accurate model within our performance constraints.
Due to the limited amount of annotated data, a key part of this problem was figuring out the most effective data augmentation techniques. We ran hundreds of experiments based on what we learned from the Kaggle submissions to determine which techniques in combination were most effective in increasing our model’s accuracy.
In parallel with our modeling workstream, we experimented with batching, XLA, and auto mixed precision (which converts parts of the model to fp16) to try and improve our performance, all of which resulted in increasing our FPS by 3x. We found however, that on the Jetson module, using TensorFlow-TensorRT (converting the entire model to fp16) by itself actually resulted in a 4x total speed up, so we used TF-TRT exclusively moving forward.
After the starfish are detected in specific frames, a tracker is applied that links detections over time. This means that every detected starfish will be assigned a unique ID that it keeps as long as it stays visible in the video. We link detections in subsequent frames to each other by first using optical flow to predict where the starfish will be in the next frame, and then matching detections to predictions based on their Intersection over Union (IoU) score.
In a task like this where recall is more important than precision (i.e. we care more about not missing COTS than false positives), it is useful to consider the F2 metric to assess model accuracy. This metric can be used to evaluate a model’s performance on individual frames. However, our ultimate goal was to determine the total number of COTS present in the video stream. Thus, we cared more about evaluating the entire pipeline’s accuracy (model + tracker) than frame-by-frame performance (i.e. it’s okay if the model has inaccurate predictions on a frame or two as long as the pipeline correctly identifies the starfish’s overall existence and location). We ended up using a sequence-based F2 metric that determines how many “tracks” are found at a certain average IoU threshold.
Our current 1080p model using TensorFlow TensorRT runs at 11 FPS on the Jetson AGX Xavier, reaching a sequence-based F2 score of 0.80! We additionally trained a 720p model that runs at 22 FPS on the Jetson module, with a sequence-based F2 score of 0.78.
Google & CSIRO are thrilled to announce that we are open-sourcing both COTS Object Detection models and have created a Colab notebook to demonstrate the server-side inference workflow. Our Colab tutorial allows students, marine researchers, or data scientists to evaluate our COTS ML models on image sequences with zero configuration/ML knowledge. Additionally, it provides a blueprint for implementing an optimized inference pipeline for edge ML platforms, such as the Jetson module. Please stay tuned as we plan to continue updating our models & trackers, ultimately open-sourcing a full TFX pipeline and dataset so that conservation organizations and other governments around the world can retrain and modify our model with their own datasets. Please reach out to us if you have a specific use case you’d like to collaborate on!
A huge thank you to everyone who’s hard work made this project possible!
We couldn’t have done this without our partners at CSIRO – Brano Kusy, Jiajun Liu, Yang Li, Lachlan Tychsen-Smith, David Ahmedt-Aristizabal, Ross Marchant, Russ Babcock, Mick Haywood, Brendan Do, Jeremy Oorloff, Lars Andersson, and Joey Crosswell, the amazing Kaggle community, and last but not least, the team at Google – Glenn Cameron, Scott Riddle, Di Zhu, Abdullah Rashwan, Rebecca Borgo, Evan Rosen, Wolff Dobson, Tei Jeong, Addison Howard, Will Cukierski, Sohier Dane, Mark McDonald, Phil Culliton, Ryan Holbrook, Khanh LeViet, Mark Daoust, George Karpenkov, and Swati Singh.