We’re happy to share the work completed by Google Summer of Code students working with TensorFlow Hub this year. If you’re a student who is interested in writing open source code, then you’ll likely be interested in Google’s Summer of Code program.
Through this program, students propose project ideas to open source organizations, and if selected, receive a stipend to work with them to complete their projects over the summer. Students have the opportunity to learn directly from mentors within their selected organization, and organizations benefit from the students’ contributions. This year, 17 successful students completed their projects with the TensorFlow organization on many projects. In this article, we’ll focus on some of the work completed on TensorFlow Hub.
We’re Sayak and Morgan, two mentors for projects on TensorFlow Hub (TF Hub). Here we share what the students learned about building and publishing state-of-the-art models, training them on large-scale benchmark datasets, what we learned as mentors, and how rewarding summer of code was for each of us, and for the community.
We had the opportunity to mentor two students – Aditya Kane and Vasudev Gupta. Aditya successfully implemented several variants of RegNets including one based on this paper, and trained them on the ImageNet-1k dataset. Vasudev ported the pre-trained wav2vec2 weights from this paper to TensorFlow, which required him to implement the model architecture from scratch. He then demonstrated fine-tuning these pre-trained checkpoints on the LibriSpeech dataset, making his work more customizable and relevant for the community.
With model training happening at such a large scale, it becomes especially important to follow good engineering practices during the implementation. These include code modularization, unit tests, good design patterns, optimizations, and so on. Models were trained on Cloud TPUs to accelerate training time, and as such, substantial effort was put into the data input pipelines to ensure maximum accelerator utilization.
All of these factors collectively contributed to the complexity of the projects. Thanks to the Summer of Code program, students have the opportunity to tackle these challenges with the help of experienced mentors. This also enables students to gain insight into their organizations, and interact with people with many skillsets who cooperate to make large projects possible. A big thank you here to our students, who gracefully handled this engineering work and listened to our feedback.
Vasudev and Aditya contributed significant pre-trained models to TensorFlow Hub, along with tutorials (Wav2Vec, RegNetY) on their use, and TensorFlow implementations for folks who want to dig deeper. In their own words:
The last 2-3 months were full of lots of learning and coding. GSoC helped me get into the speech domain and motivated me to explore more about the TensorFlow ecosystem. I am thankful to my mentors for their continuous & timely feedback. I am looking forward to contributing more to the TensorFlow community and other awesome open source projects out there. – Vasudev Gupta
More about RegNets and Wav2Vec2
Almost 6 years after they were first published, ResNets are still widely used as benchmark architectures across image understanding tasks. Many recent self-supervised and semi-supervised learning frameworks still leverage ResNet50 as their backbone architectures. However, ResNets often do not scale well under larger data regimes and suffer from large training and inference time latencies as they grow. In contrast, RegNets were developed specifically to be a scalable architecture framework that maintains low latency while demonstrating high performance on standard image recognition tasks. Aditya’s models are published on TF Hub, with code and tutorials on GitHub.
Self-supervised learning is an important area of machine learning research. Many recent success stories have been focused on NLP and Computer Vision, and for Vasudev’s project, we wanted to explore speech. Last year, a group of researchers released the wav2vec2 framework for learning representations from audio in a self-supervised manner, benefiting downstream tasks like speech-to-text.
Using wav2vec2, you can now pre-train speech models without labeled data, and fine-tune those models on downstream tasks like speaker recognition. Vasudev’s models are available on TF Hub, along with a new tutorial on fine-tuning, and code on GitHub.
We’d like to say a heartfelt thank you to all the students, mentors, and organizers who made Summer of Code a success despite this year’s many challenges. We encourage you to check out these models and share what you have built with us by tagging #TFHub on your social media posts, or share your work for the community spotlight program. If you have questions or want to learn more about these new models, you can ask them on discuss.tensorflow.org.