30 years of family videos in an AI archive

My dad got his first video camera the day I was born nearly three decades ago. “Say hello to the camera!” are the first words he caught on tape, as he pointed it at a red, puffy baby (me) in a hospital bassinet. The clips got more embarrassing from there, as he continued to film through many diaper changes, temper tantrums and—worst of all—puberty.

Most of those potential blackmail tokens sat trapped on miniDV tapes or scattered across SD cards until two years ago when my dad uploaded them all to Google Drive. Theoretically, since they were now stored in the cloud, my family and I could watch them whenever we wanted. But with more than 456 hours of footage, watching it all would have been a herculean effort. You can only watch old family friends open Christmas gifts so many times. So, as an Applied AI Engineer, I got down to business and built an AI-powered searchable archive of our family videos.

If you’ve ever used Google Photos, you’ve seen the power of using AI to search and organize images and videos. The app uses machine learning to identify people and pets, as well as objects and text in images. So, if I search “pool” in the Google Photos app, it’ll show me all the pictures and videos I ever took of pools.

But for this project, I needed a couple of features Photos doesn’t (yet!) support. First, because my dad’s first camera recorded footage to miniDV tapes, those videos were uploaded as meaty, two-hour-long movies with no useful metadata. Instead, my dad would start a clip by saying, “let me put a date on the screen here…” and a little white text snippet would appear in the bottom right corner of the frame. In between shots on a single reel, he’d say: “Say goodbye, I’m going to fade out now.” I would scream, “NO, DON’T FADE OUT,” while the screen faded to black. So, my first step was to use machine learning to automatically parse the date shown on the screen, and split the single long video into shorter clips after each fade out.


In this picture, you can see the timestamp shown on screen. Using the Vision API, I could extract it to sort my videos by date.
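As a rough illustration of the date-parsing step, here is a minimal Python sketch. It assumes the on-screen text has already been extracted as a plain string, and that the camera stamps dates in a form like "DEC. 25 1995". The real format on the tapes isn't stated here, and `parse_onscreen_date` is a hypothetical helper name, not part of any Google API.

```python
import re
from datetime import date

# Assumed stamp format: three-letter month, day, four-digit year (e.g. "DEC. 25 1995").
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}
DATE_PATTERN = re.compile(r"\b([A-Z]{3})\.?\s+(\d{1,2}),?\s+(\d{4})\b")


def parse_onscreen_date(text):
    """Return a date found in on-screen text, or None if nothing parses."""
    match = DATE_PATTERN.search(text.upper())
    if not match:
        return None
    month_name, day, year = match.groups()
    month = MONTHS.get(month_name)
    if month is None:
        return None
    try:
        return date(int(year), month, int(day))
    except ValueError:
        # Text recognition can misread digits, e.g. "FEB 30"; treat that as no date.
        return None


if __name__ == "__main__":
    print(parse_onscreen_date("Dec. 25 1995"))
```

Returning `None` instead of raising keeps one misread frame from stopping a batch run over hundreds of hours of footage.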

For this, I turned to the Video Intelligence API, a Google Cloud tool that lets developers analyze videos with machine learning. It allows you to replicate many of the features found in the Google Photos app, like tagging objects in images and recognizing on-screen text, and a whole lot more. For example, the API’s shot change detection feature automatically finds the timestamps in videos where a scene changes. This allowed me to split those long videos into smaller chunks.
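To make the splitting step concrete, here is a minimal Python sketch under a few stated assumptions: the shot boundaries are already available as (start, end) pairs in seconds, the cutting is done with the ffmpeg command-line tool, and the `clip_NNN.mp4` naming scheme and `build_split_commands` helper are hypothetical. It only builds the commands and does not run them.

```python
def build_split_commands(source, shots, min_length=1.0):
    """Build one ffmpeg command per shot, skipping very short shots.

    `shots` is a list of (start_seconds, end_seconds) pairs, such as those
    a shot change detector might report.
    """
    commands = []
    for index, (start, end) in enumerate(shots):
        if end - start < min_length:
            continue  # skip blips such as a frame or two of black
        output = f"clip_{index:03d}.mp4"  # hypothetical naming scheme
        commands.append([
            "ffmpeg", "-i", source,
            "-ss", f"{start:.3f}",  # where the clip starts
            "-to", f"{end:.3f}",    # where the clip ends
            "-c", "copy",           # copy the streams without re-encoding
            output,
        ])
    return commands


if __name__ == "__main__":
    example_shots = [(0.0, 12.5), (12.5, 12.9), (12.9, 60.0)]
    for command in build_split_commands("tape_01.mp4", example_shots):
        print(" ".join(command))
```

Each command could then be handed to `subprocess.run`. Copying streams is fast but cuts on keyframes, so clip edges may land slightly off the detected boundary; re-encoding would be more precise at the cost of time.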

Using the label detection feature, I could search for all sorts of different events, like “bridal shower,” “wedding,” “bat and ball games” and “baby.” By searching “performance,” I was able to finally find one of my life’s proudest accomplishments on tape—a starring role singing “It’s Not Easy Being Green” in my kindergarten’s production of the Sesame Street musical.


My starring role as Kermit the Frog in my school’s Sesame Street musical. The Video Intelligence API tagged it as “performance.”

The Video Intelligence API’s real “killer feature” for me was its ability to do audio transcription. By transcribing my videos, I was able to query clips by what people said in them. I could search for specific names (“Scott,” “Dale,” “grandma”), proper nouns (“Chuck E Cheese,” “Pokemon”), and unique phrases. By searching “first steps,” I found a clip of my dad saying, “Here she comes… plunk. That’s the first time she’s taken major steps” alongside a video of me managing, just barely, to waddle along.


My first steps that I was able to find with the Video Intelligence API’s Transcription feature. Here, my dad says, “…this is the first time she’s taken major steps.”
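Searching by labels and by spoken words can be sketched in a few lines of Python. This is only an illustration, not the archive's actual search: it assumes each clip's labels and transcript are already stored as plain text, and the `Clip` class and `search_clips` function are hypothetical names. It matches on words rather than exact phrases, which is why “first steps” can find a sentence where the two words are several words apart.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Clip:
    name: str
    labels: list = field(default_factory=list)
    transcript: str = ""


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.casefold()))


def search_clips(clips, query):
    """Return names of clips whose labels or transcript contain every query word."""
    wanted = tokenize(query)
    if not wanted:
        return []
    hits = []
    for clip in clips:
        searchable = tokenize(clip.transcript + " " + " ".join(clip.labels))
        if wanted <= searchable:  # every query word appears somewhere
            hits.append(clip.name)
    return hits


if __name__ == "__main__":
    archive = [
        Clip("steps.mp4", ["baby"],
             "Here she comes... plunk. That's the first time she's taken major steps"),
        Clip("kermit.mp4", ["performance"], "It's not easy being green"),
    ]
    print(search_clips(archive, "first steps"))
```

A linear scan like this is fine for a few thousand clips; a larger archive would likely want a proper search index.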

In the end, machine learning helped me build exactly the kind of archive I wanted—one that let me search my family videos by memories, not timestamps.

P.S. Want to see how I built it? Check out my technical blog post or catch the video on the Cloud YouTube channel.


Tools for language access during COVID-19

Translation services make it easier to communicate with someone who doesn’t speak the same language, whether you’re traveling abroad or living in a new country. But in the context of a global pandemic, government and health officials urgently need to deliver vital information to their communities, and every member of the community needs access to information in a language they understand. In the U.S. alone, that means reaching 51 million migrants in at least 350 languages, with information ranging from how to keep people and their families safe, to financial, employment or food resources.

To better understand the challenges in addressing these translation needs, we conducted a research study, and interviewed health and government officials responsible for disseminating critical information. We assessed the current shortcomings in providing this information in the relevant languages, and how translation tools could help mitigate them.

The struggle for language access 

When organizations, from health departments to government agencies, update information on a website, it needs to be quickly accessible in a wide variety of languages. We learned that these organizations are struggling to keep up with the high volume of rapidly changing content and lack the resources to translate it into the needed languages.

Officials, who are already spread thin, can barely keep up with the many updates surrounding COVID-19—from the evolving scientific understanding, to daily policy amendments, to new resources for the public. Nearly all new information is coming in as PDFs several times a day, and many officials report not being able to offer professional translation for all needed languages. This is where machine translation can serve as a useful tool.  

How machine translation can help

Machine translation is an automated way to translate text or speech from one language to another. It can take volumes of data and provide translations into a large number of supported languages. Although not intended to fully replace human translators, it can provide value when immediate translations are needed for a wide variety of languages.

If you’re looking to translate content on the web, you have several options.

Use your browser

Many popular browsers offer translation capabilities, which are either built in (e.g. Chrome) or require installing an add-on or extension (e.g. Microsoft Edge or Firefox). To translate web content in Chrome, all you have to do is go to a webpage in another language, then click “Translate” at the top.

Use a website translation widget

If you are a webmaster of a government, non-profit, and/or non-commercial website (e.g. academic institutions), you may be eligible to sign up for the Google Translate Website Translator widget. This tool translates web page content into 100+ different languages. To find out more, please visit the webmasters blog.

Upload PDFs and documents

Google Translate supports translating many different document formats (.doc, .docx, .odf, .pdf, .ppt, .pptx, .ps, .rtf, .txt, .xls, .xlsx). By simply uploading the document, you can get a translated version in the language that you choose.

Millions of people need translations of resources at this time. Google’s researchers, designers and product developers are listening. We are continuously looking for ways to improve our products and come to people’s aid as we navigate the pandemic. 
