Extend your TFX pipeline with TFX-Addons

Posted by Hannes Hapke and Robert Crowe

TFX provides a configuration framework and shared libraries to integrate common components needed to define, launch, and monitor your machine learning system.

What is TFX-Addons?

TFX-Addons is a special interest group (SIG) for TFX users who are extending the standard set of components provided by Google’s TensorFlow team. The addons are implemented by other machine learning companies and developers who rely heavily on TFX for their production machine learning operations.

Common MLOps patterns, such as ingesting data into machine learning pipelines, are solved through TFX components. For example, members of TFX-Addons developed and open-sourced a TFX component that ingests data from a Feast feature store. The component is maintained by machine learning engineers at Twitter and Apple.

How can you use the TFX-Addons components or examples?

The TFX-Addons components and examples are accessible via a simple pip installation. To install the latest version, run the following:

pip install tfx-addons

To ensure you have a compatible version of dependencies for any given project, you can specify the project name as an extra requirement during install:

pip install tfx-addons[feast_examplegen]

To use TFX-Addons:

from tfx import v1 as tfx
import tfx_addons as tfxa

# Each project is then available as tfxa.{project_name}, for example:

tfxa.feast_examplegen.FeastExampleGen(...)

The TFX-Addons components can be used in any TFX pipeline. Most components support all TFX orchestrators including Google Cloud’s Vertex Pipelines, Apache Beam, Apache Airflow, or Kubeflow Pipelines.

Which additional components are currently available?

The list of components, libraries, and examples keeps growing, and several new projects are in development. As of this writing, the following components are available.

Feast Component

The Example Generator allows you to ingest data samples from a Feast Feature Store.

Message Exit Handler

This component provides an exit handler for TFX pipelines that notifies the user about the final state of the pipeline (failed or succeeded), for example via a Slack message. If the pipeline fails, the notification includes the error message. The component supports several message providers (e.g. Slack, stdout, logging providers) and can easily be extended to support others, such as Twilio. It also serves as an example of how to write exit handlers for TFX pipelines.
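The provider idea can be illustrated with a minimal, standard-library-only sketch. This is not the TFX-Addons implementation: the names `PROVIDERS`, `format_status`, and `notify` are hypothetical and exist only to show how a final pipeline status might be formatted once and routed to interchangeable message providers.

```python
import logging
from typing import Callable, Dict, Optional

# Hypothetical provider type: any function that delivers a text message.
MessageProvider = Callable[[str], None]


def stdout_provider(text: str) -> None:
    print(text)


def logging_provider(text: str) -> None:
    logging.getLogger("pipeline").info(text)


# Hypothetical registry; supporting a new provider means adding one entry.
PROVIDERS: Dict[str, MessageProvider] = {
    "stdout": stdout_provider,
    "logging": logging_provider,
}


def format_status(pipeline_name: str, succeeded: bool,
                  error: Optional[str] = None) -> str:
    """Builds the notification text; the error is included only on failure."""
    if succeeded:
        return f"Pipeline '{pipeline_name}' succeeded."
    return f"Pipeline '{pipeline_name}' failed: {error or 'unknown error'}"


def notify(provider_name: str, pipeline_name: str, succeeded: bool,
           error: Optional[str] = None) -> str:
    """Formats the final status and sends it through the chosen provider."""
    message = format_status(pipeline_name, succeeded, error)
    PROVIDERS[provider_name](message)
    return message


if __name__ == "__main__":
    notify("stdout", "taxi_pipeline", succeeded=False,
           error="Trainer ran out of memory")
```

Keeping the formatting separate from the delivery is what makes a new provider cheap to add: it only has to accept a string.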

Schema Curation Component

This component allows its users to modify the schema produced by the SchemaGen component and curate it based on domain knowledge. The curated schema can be used to stop pipelines if feature drift is detected.
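As a rough illustration of the curation idea, the sketch below represents a schema as a plain dictionary of allowed value ranges. This is a deliberate simplification and an assumption of this example: the actual component works on the schema artifact produced by SchemaGen, and the helpers `curate` and `find_violations` are hypothetical names, not part of TFX or TFX-Addons. A simple range check stands in for drift detection here.

```python
from typing import Dict, List, Tuple

# Simplified stand-in for a schema: feature name -> allowed (min, max).
Schema = Dict[str, Tuple[float, float]]


def curate(inferred: Schema, overrides: Schema) -> Schema:
    """Returns a copy of the inferred schema with expert-defined ranges applied."""
    unknown = set(overrides) - set(inferred)
    if unknown:
        raise KeyError(f"Overrides reference unknown features: {sorted(unknown)}")
    return {**inferred, **overrides}


def find_violations(schema: Schema, batch: Dict[str, List[float]]) -> List[str]:
    """Lists features that have at least one value outside the schema's range."""
    violations = []
    for name, (low, high) in schema.items():
        values = batch.get(name, [])
        if any(v < low or v > high for v in values):
            violations.append(name)
    return violations


# Ranges as they might be inferred automatically from training data.
inferred = {"fare": (0.0, 900.0), "trip_miles": (0.0, 350.0)}
# Domain knowledge (assumed for this example): fares above 200 are suspect.
curated = curate(inferred, {"fare": (0.0, 200.0)})

new_batch = {"fare": [12.5, 480.0], "trip_miles": [1.2, 30.0]}
violations = find_violations(curated, new_batch)
if violations:
    print(f"Stopping pipeline, out-of-range features: {violations}")
```

The inferred schema alone would accept this batch. Only the curated range flags it, which is the value that domain knowledge adds.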

Feature Selection Component

This component allows users to select features from datasets based on statistical feature selection metrics.
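To make "statistical feature selection" concrete, here is a small sketch that ranks features by the absolute Pearson correlation with the label and keeps the top k. The choice of metric, the function names `pearson` and `select_top_k`, and the toy data are all assumptions made for illustration. The sketch does not reflect which metrics the component itself supports.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(features: Dict[str, Sequence[float]],
                 label: Sequence[float], k: int) -> List[str]:
    """Ranks features by |correlation with the label| and keeps the best k."""
    scores = {name: abs(pearson(values, label))
              for name, values in features.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Toy data: one informative feature, one noisy feature, one constant feature.
features = {
    "trip_miles": [1.0, 2.0, 3.0, 4.0],
    "noise": [5.0, 1.0, 4.0, 2.0],
    "constant": [7.0, 7.0, 7.0, 7.0],
}
label = [2.1, 3.9, 6.2, 8.0]

print(select_top_k(features, label, k=2))
```

The constant feature scores zero and is dropped first, which is the typical behavior of univariate selection metrics.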

XGBoost Evaluator Component

This component extends the standard TFX Evaluator component to support trained XGBoost models, enabling deep analysis of model performance.

Sampling Component

This component allows users to balance their training datasets by random sampling: undersampling reduces every class to the size of the lowest-frequency class, while oversampling grows every class to the size of the highest-frequency class.
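The two balancing strategies can be sketched in a few lines of plain Python. This is an in-memory illustration under the assumption that each example is a `(features, label)` tuple. The function names are hypothetical, and the sketch says nothing about how the component itself processes data at scale.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

Example = Tuple[Dict[str, float], Hashable]  # (features, class label)


def group_by_class(examples: List[Example]) -> Dict[Hashable, List[Example]]:
    """Buckets examples by their class label."""
    groups: Dict[Hashable, List[Example]] = defaultdict(list)
    for example in examples:
        groups[example[1]].append(example)
    return groups


def undersample(examples: List[Example], seed: int = 0) -> List[Example]:
    """Randomly drops examples until every class matches the rarest class."""
    rng = random.Random(seed)
    groups = group_by_class(examples)
    target = min(len(members) for members in groups.values())
    balanced: List[Example] = []
    for members in groups.values():
        balanced.extend(rng.sample(members, target))
    return balanced


def oversample(examples: List[Example], seed: int = 0) -> List[Example]:
    """Randomly duplicates examples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(examples)
    target = max(len(members) for members in groups.values())
    balanced: List[Example] = []
    for members in groups.values():
        balanced.extend(members)
        # Draw the missing examples with replacement from the same class.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Toy dataset: 2 "fraud" examples and 8 "ok" examples.
data = [({"x": float(i)}, "fraud" if i < 2 else "ok") for i in range(10)]

print(Counter(label for _, label in undersample(data)))
print(Counter(label for _, label in oversample(data)))
```

Undersampling discards data from the majority class, while oversampling repeats minority examples, so the right choice depends on how much data you can afford to lose versus how much duplication the model tolerates.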

Pandas Transform Component

This component can be used instead of the standard TFX Transform component, and allows you to work with Pandas dataframes for your feature engineering. Processing is distributed using Beam for scalability.

Firebase Publisher

This project helps users to publish trained models directly from a TFX pipeline to Firebase ML.

HuggingFace Model Pusher

The HuggingFace Model Pusher (HFModelPusher) pushes a blessed model to the HuggingFace Model Hub. It can optionally also push an application to HuggingFace Space Hub.

How can you participate?

The TFX-Addons SIG is all about sharing reusable components and best practices. If you are interested in MLOps, join our bi-weekly conference calls. Whether you are new to TFX or an experienced ML engineer, everyone is welcome, and the SIG accepts open source contributions from all participants.

If you want to join our next meeting, sign up for our mailing list, sig-tfx-addons@tensorflow.org.

Already using TFX-Addons?

If you’re already using TFX-Addons, we’d love to hear from you! Use this form to send us your story!

Thanks to all Contributors

Big thanks for all the open-source component contributions from the following members:
Badrul Chowdhury, Daniel Kim, Fatimah Adwan, Gerard Casas Saez, Hannes Hapke, Marcus Chang, Kshitijaa Jaglan, Pratishtha Abrol, Robert Crowe, Nirzari Gupta, Thea Lamkin, Wihan Booyse, Michael Hu, Vulko Milev, and all the other contributors! Open-source only happens when people like you contribute!