Posted by Carlos Mendonça, Coral
Fall has finally arrived and with it a new release of Coral’s C++ and Python APIs and tools, along with new models optimized for the Edge TPU and further support for TensorFlow 2.0-based workflows.
Coral is a complete toolkit to build products with local AI. Our on-device inferencing capabilities allow you to build products that are efficient, private, fast and offline with the help of TensorFlow Lite and the Edge TPU.
From the beginning, we’ve provided APIs in Python and C++ that enable developers to take advantage of the Edge TPU’s local inference speed. Offline processing for machine learning models allows for considerable savings on bandwidth and cloud compute costs, it keeps data local, and it preserves user privacy. More recently, we’ve been hard at work to refactor our APIs and make them more modular, reusable and performant, while at the same time eliminating unnecessary API abstractions and surfacing more of the native TensorFlow Lite APIs that developers are familiar with.
So in our latest release, we’re now offering two separate reusable libraries, each built upon the powerful TensorFlow Lite APIs and each isolated in their own repositories: libcoral for C++ and PyCoral for Python.
Unlike some of our previous APIs, libcoral doesn’t hide tflite::Interpreter. Instead, we’re making this native TensorFlow Lite class a first-class component and offering additional helper APIs that simplify your code when working with common models such as classification and detection.
With our new libcoral library, developers should typically follow the pattern below to perform an inference in C++:
- Create a tflite::Interpreter instance with the Edge TPU context and allocate memory. To simplify this step, libcoral provides the coral::MakeEdgeTpuInterpreterOrDie() function:

// Load the model
auto model = coral::LoadModelOrDie(absl::GetFlag(FLAGS_model_path));

// Get the Edge TPU context
auto tpu_context = coral::ContainsEdgeTpuCustomOp(*model)
    ? coral::GetEdgeTpuContextOrDie()
    : nullptr;

// Get the interpreter
auto interpreter = coral::MakeEdgeTpuInterpreterOrDie(*model, tpu_context.get());
- Configure the interpreter’s input.
- Invoke the interpreter:

interpreter->Invoke();
- Process the interpreter’s output.
As an alternative to Invoke(), you can achieve higher performance with the InvokeWithMemBuffer() and InvokeWithDmaBuffer() functions, which enable processing the input data without copying it from another region of memory or from a DMA file descriptor, respectively.
To simplify this step, libcoral provides some adapters, requiring less code from you:
auto result = coral::GetClassificationResults(
    *interpreter, /* threshold= */0.0f, /* top_k= */3);
The above is an example of the classification adapter, where developers can specify the minimum confidence threshold, as well as the maximum number of results to return. The API also features a detection adapter with its own result filtering parameters.
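As a rough illustration of what such an adapter does, the result filtering boils down to a threshold pass followed by a top-k selection. The sketch below is plain Python, not the libcoral implementation; the function name and the list-of-pairs return shape are our own for illustration:

```python
def get_classification_results(scores, threshold=0.0, top_k=3):
    """Filter raw classification scores the way a result adapter would.

    Keeps classes whose score clears the threshold, then returns the
    top_k highest-scoring (class_id, score) pairs, best first.
    """
    candidates = [(i, s) for i, s in enumerate(scores) if s >= threshold]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]

# For scores [0.1, 0.8, 0.05, 0.6] with threshold=0.1 and top_k=2,
# this keeps classes 1 and 3.
```

The two parameters play different roles: the threshold drops low-confidence noise regardless of how many results remain, while top_k caps the result count regardless of confidence.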
For a full view of the example application source code, see classify_image.cc on GitHub and for instructions on how to integrate libcoral into your application, refer to README.md on GitHub.
This new release also brings updates to on-device retraining, decoupling the imprinting functions from inference in the updated ImprintingEngine. The new design makes the imprinting engine work with the tflite::Interpreter directly.
To easily address the Edge TPUs available on the host, libcoral supports labels such as "pci:1". This should make it easier to manage resources on multi-Edge TPU systems.
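The label format itself is just an interface name and a device index. A hypothetical parser for it (our own illustration, not a PyCoral or libcoral API) could look like:

```python
def parse_device_label(label):
    """Split an Edge TPU device label like 'pci:1' into (interface, index).

    A bare interface name with no index defaults to device 0.
    """
    interface, _, index = label.partition(":")
    return interface, int(index) if index else 0

# parse_device_label("pci:1") -> ("pci", 1)
```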
Finally, we’ve made a number of performance improvements such as more efficient memory usage and memory-based instead of file-based abstractions. Also, the design of the API is more consistent by leveraging the Abseil library for error propagation, generic interfaces and other common patterns, which should provide a more consistent and stable developer experience.
The new PyCoral library (provided in a new pycoral Python module) follows some of the design patterns introduced with libcoral, and brings parity across our C++ and Python APIs. PyCoral implements the same imprinting decoupling design, model adapters for classification and detection, and the same label-based TPU addressing semantics.
On PyCoral, the "run inference" functionality is now entirely delegated to the native TensorFlow Lite library, as we've done away with the model "engines" that abstracted the TensorFlow Lite interpreter. This change eliminates the code duplication introduced by the Coral-specific BasicEngine, ClassificationEngine and DetectionEngine classes (those APIs, from the "Edge TPU Python library", are now deprecated).
To perform an inference with PyCoral, we follow a similar pattern to that of libcoral:
- Create an interpreter:

interpreter = edgetpu.make_interpreter(model_file)

- Configure the interpreter's input.
- Invoke the interpreter:

interpreter.invoke()

- Process the interpreter's output:

classes = classify.get_classes(interpreter, top_k=3)
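To make that lifecycle concrete without Edge TPU hardware, here is a toy stand-in (emphatically not the real tflite.Interpreter; the class and its doubling "model" are invented for illustration) that follows the same create / configure / invoke / read-output sequence:

```python
class FakeInterpreter:
    """A stand-in mimicking the TF Lite interpreter lifecycle."""

    def __init__(self):
        self._input = None
        self._output = None

    def allocate_tensors(self):
        pass  # A real interpreter allocates tensor buffers here.

    def set_input(self, data):
        self._input = data

    def invoke(self):
        # A real interpreter runs the model; we just double each value.
        self._output = [2 * x for x in self._input]

    def get_output(self):
        return self._output


interpreter = FakeInterpreter()    # 1. Create the interpreter
interpreter.allocate_tensors()
interpreter.set_input([1, 2, 3])   # 2. Configure its input
interpreter.invoke()               # 3. Invoke it
result = interpreter.get_output()  # 4. Process its output
```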
For fully detailed example code, check out our documentation for Python.
Updates to the Coral model garden
With this release, we're further expanding the Coral model garden with MobileDet. MobileDets are a family of lightweight, single-shot detectors built with the TensorFlow Object Detection API that achieve a state-of-the-art accuracy-latency tradeoff on Edge TPUs. Compared to the MobileNet family of models, MobileDets offer lower latency with better accuracy.
Check out the full collection of models available from Coral for the Edge TPU, including Classification, Detection, Segmentation and models specially prepared for on-device training.
Migrating our entire workflow and model collection to TensorFlow 2 is an ongoing effort. This release of the Coral machine learning API starts introducing support for TensorFlow 2-based workflows. For now, MobileNet v1 (ImageNet), MobileNet v2 (ImageNet), MobileNet v3 (ImageNet), ResNet50 v1 (ImageNet), and UNet MobileNet v2 (Oxford pets) all support training and conversion with TensorFlow 2.
Both libcoral and PyCoral have graduated the model pipelining functionality from Beta to General Availability. Model pipelining makes it possible to partition large models and distribute them across multiple Edge TPUs, running them considerably faster.
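Conceptually, pipelining works like an assembly line: each segment runs on its own device, and intermediate tensors flow from one stage to the next so that different inputs are processed by different stages at the same time. A minimal sketch of that dataflow in plain Python threads (illustrative only, not the Coral pipelining API; the stage functions here are arbitrary placeholders for model segments):

```python
import queue
import threading

def run_pipeline(segments, inputs):
    """Run each 'segment' in its own worker thread, passing intermediate
    results through queues, as pipelined Edge TPUs pass tensors."""
    queues = [queue.Queue() for _ in range(len(segments) + 1)]
    SENTINEL = object()  # Marks end of the input stream.

    def worker(segment, q_in, q_out):
        while True:
            item = q_in.get()
            if item is SENTINEL:
                q_out.put(SENTINEL)  # Propagate shutdown downstream.
                return
            q_out.put(segment(item))

    threads = [
        threading.Thread(target=worker, args=(seg, queues[i], queues[i + 1]))
        for i, seg in enumerate(segments)
    ]
    for t in threads:
        t.start()

    for x in inputs:
        queues[0].put(x)
    queues[0].put(SENTINEL)

    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

# Two toy "segments": while input 2 is in stage one, input 1 is
# already in stage two.
outputs = run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])
```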
Refer to the documentation for examples of the API in C++ and Python.
The partitioning of models is done with the Edge TPU Compiler, which employs a parameter count algorithm, dividing the model into segments with similar parameter sizes. For cases where this algorithm doesn't provide the throughput you need, this release introduces a new tool that supports a profiling-based algorithm, which divides the segments based on latency observed by actually running the model multiple times, possibly resulting in a more balanced output.
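To illustrate the idea behind a parameter count algorithm (a simplified greedy sketch under our own assumptions, not the compiler's actual implementation), consecutive layers can be grouped so each segment's total parameter count approaches an equal share:

```python
def partition_by_parameter_count(layer_params, num_segments):
    """Greedily split consecutive layers into segments of roughly
    equal total parameter count."""
    total = sum(layer_params)
    target = total / num_segments
    segments, current, running = [], [], 0
    for i, params in enumerate(layer_params):
        current.append(params)
        running += params
        remaining_layers = len(layer_params) - i - 1
        remaining_segments = num_segments - len(segments) - 1
        # Close the segment once it reaches the target share, but leave
        # at least one layer for each remaining segment.
        if (running >= target and remaining_segments > 0
                and remaining_layers >= remaining_segments):
            segments.append(current)
            current, running = [], 0
    if current:
        segments.append(current)
    return segments

# Four equal layers split into two segments of two layers each.
parts = partition_by_parameter_count([4, 4, 4, 4], 2)
```

A profiling-based partitioner replaces the parameter-count heuristic with measured per-segment latency, which can balance better when parameter count is a poor proxy for runtime.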
This profiling-based partitioning is performed with the new profiling_partition tool.
For more information about the Coral APIs mentioned above, see the following documentation:
- Run inference on the Edge TPU with C++
- Run inference on the Edge TPU with Python
- Pipeline a model with multiple Edge TPUs
- Perform transfer learning on the Edge TPU
- Coral Model Garden