What made me want to fight for fair AI

My life has always involved centering the voices of those historically marginalized in order to foster equitable communities. Growing up, I lived in a small suburb just outside of Cleveland, Ohio, and was fortunate enough to attend Laurel School, an all-girls school focused on encouraging young women to think critically and solve difficult world problems. But my lived experience at school was so different from that of kids who lived even on my own street. I watched families around me contend with an economic recession and lose whatever financial security they had, and I wanted to do everything I could to change that. Even though my favorite courses at the time were engineering and African American literature, I was encouraged to pursue economics.

I was fortunate enough to continue my education at Princeton University, starting in the economics department. Unfortunately, I struggled to find the connections between what I was learning and the challenges I saw my community and people of color in the United States facing through the economic crisis. Interestingly enough, it was through an art and social justice movements class in the School of Architecture that I found my fit. Every day, I focused on building creative solutions to difficult community problems through qualitative research, received feedback and iterated. The deeper I went into my studies, the more I realized that my passion was working with locally based researchers and organizations to center their voices in designing solutions to complex and large-scale problems. It wasn’t until I came to Google that I realized this work directly translated to human-centered design and community-based participatory research. My undergraduate studies culminated in the creation of a social good startup focused on providing fresh produce to food deserts in central New Jersey, where our team interviewed over 100 community members and leaders, secured a $16,000 grant, and provided pounds of free fresh produce to local residents.

Already committed to a Ph.D. program in Social Policy at Brandeis University, I channeled my passion for social enterprise and solving complex problems into developing research skills. Knowing that I ultimately did not want to go into academia, I joked with my friends that the job I was searching for didn’t exist yet, but hopefully it would by the time I graduated. I knew that my heart was equal parts in understanding technology and in closing equity gaps, but I did not know how I would be able to do both.

Through Brandeis, I found language for the experiences of family and friends who had lost financial stability during the Great Recession, and methodologies for researching systematic inequalities across human identity. It was in this work that I witnessed Angela Glover-Blackwell, founder of PolicyLink, speak for the first time. From her discussion highlighting community-based equitable practices, I knew I had to support her work. Through PolicyLink’s graduate internship program in Oakland, I was able to bridge the gap between research and application – I even found a research topic for my dissertation! And then Mike Brown was shot.

Mike was from the Midwest, just like me. He reminded me of my cousins and of friends from my block growing up. Watching what happened to Mike Brown so publicly gave weight to the research and policies that I advocated for in my Ph.D. program and at work – it somehow made it more personal than my experience with the Great Recession. At Brandeis, I led a town hall interviewing the late Civil Rights activist and politician Julian Bond, and I still remember his admonishment to shift from talk to action, and to have clear, centralized values and priorities from which to guide equity work. While advocating for social justice, I used my work grading papers and teaching as a graduate teaching assistant to supplement my doctoral grant – including graduate courses on “Ethics, Rights, and Development” and “Critical Race Theory.”

The next summer I had the privilege of working at a think tank now known as Prosperity Now, supporting local practitioners and highlighting their findings at the national level. This amazing experience was coupled with meeting my now husband, who attended my aunt and uncle’s church. By the end of the summer, my work and personal experiences in DC had become so important to me that I decided to stay. Having finished my coursework at Brandeis, I wrote my dissertation in the evenings as I shifted to a more permanent position at the Center for Global Policy Solutions, led by Dr. Maya Rockeymoore. I managed national research projects and then brought the findings to the Hill, making the case to policymakers for equitable policies like closing the racial wealth gap. Knocking on doors in Capitol buildings taught me the importance of finding shared language and translating research into measurable change.

By the end of 2016, I was a bit burned out by my work on the Hill and welcomed the transition of getting married and moving to Los Angeles. The change of scenery allowed me to finally hone my technical skills as a Program Manager for the LA-based ed-tech nonprofit 9 Dots. I spent my days partnering with school districts, principals, teaching fellows and software developers to provide CS education to historically underserved students. Being part of a group that created a hybrid working space for new parents was icing on the cake. Soon after, I got a call from a recruiter at Google.

It had been almost a year since Google’s AI Principles had been publicly released, and the team was searching for candidates who had a deep understanding of socio-technical research and program management to operationalize the Principles. Every role and research pursuit that I’d followed led to my dream role – Senior Strategist focused on centering the voices of historically underrepresented and marginalized communities in machine learning through research and collaboration.

During my time at Google, I’ve had the opportunity to develop an internal workshop focused on equitable and inclusive language practices, which led to a collaboration with UC Berkeley’s Center for Equity, Gender, and Leadership; launch the Equitable AI Research Roundtable with Jamila Smith-Loud and external experts focused on equitable cross-disciplinary research practices (including PolicyLink!); and present Google’s work in Responsible AI at industry-wide conferences like MozFest. With all that I’ve learned, I’m still determined to bring more voices to the table. My work in Responsible AI has led me to build out globally focused resources for machine learning engineers, analysts, and product decision makers. When we center the experiences of our users – the communities who faced the economic recession with grit and resilience, who searched for insights from Civil Rights leaders, and who developed shared language to inspire inclusion – all else will follow. I’m honored to be one of many at Google driving the future of responsible and equitable AI for all.


The ML Glossary: Five years of new language

Over guacamole and corn chips at a party, a friend mentions that her favorite phone game uses augmented reality. Another friend points her phone at the host and shouts, “Watch out—a t-rex is sneaking up behind you.” Eager to join the conversation, you blurt, “My blender has an augmented reality setting.”

If only you had looked up augmented reality in Google’s Machine Learning Glossary, which defines over 460 terms related to artificial intelligence, you’d know what the heck your friends are talking about. If you’ve ever wondered what a neural network is, or if you chronically confuse the negative class with the positive class at the doctor’s office (“Wait, the negative class means I’m healthy?”), the Glossary has you covered.

AI is increasingly intertwined with our future, and as the language of AI sneaks its way into household conversation, learning AI’s specialized vocabulary could be helpful to understanding many key technological advances — or what’s being said at a guacamole party.

A team of technical writers and AI experts produces the definitions. Sure, the definitions have to be technically accurate, but they also have to be as clear as possible. Clarity is rare in a field as notoriously complicated as artificial intelligence, which is why we created Google’s Machine Learning Glossary in 2016. Since then, we’ve published nine full revisions, adding almost 300 terms.

Good glossaries are quicksand for the curious. You’ll come for the accuracy, stay for the class-imbalanced datasets, and then find yourself an hour later embedded in overfitting. It’s fun, educational, and blessedly blender-free.


How AI is making information more useful

Today, there’s more information accessible at people’s fingertips than at any point in human history. And advances in artificial intelligence will radically transform the way we use that information, with the ability to uncover new insights that can help us both in our daily lives and in the ways we are able to tackle complex global challenges.

At our Search On livestream event today, we shared how we’re bringing the latest in AI to Google’s products, giving people new ways to search and explore information in more natural and intuitive ways.

Making multimodal search possible with MUM

Earlier this year at Google I/O, we announced we’ve reached a critical milestone for understanding information with Multitask Unified Model, or MUM for short.

We’ve been experimenting with using MUM’s capabilities to make our products more helpful and enable entirely new ways to search. Today, we’re sharing an early look at what will be possible with MUM. 

In the coming months, we’ll introduce a new way to search visually, with the ability to ask questions about what you see. Here are a couple of examples of what will be possible with MUM.


With this new capability, you can tap on the Lens icon when you’re looking at a picture of a shirt, and ask Google to find you the same pattern — but on another article of clothing, like socks. This helps when you’re looking for something that might be difficult to describe accurately with words alone. You could type “white floral Victorian socks,” but you might not find the exact pattern you’re looking for. By combining images and text into a single query, we’re making it easier to search visually and express your questions in more natural ways.


Some questions are even trickier: Your bike has a broken thingamajig, and you need some guidance on how to fix it. Instead of poring over catalogs of parts and then looking for a tutorial, the point-and-ask mode of searching will make it easier to find the exact moment in a video that can help.

Helping you explore with a redesigned Search page

We’re also announcing how we’re applying AI advances like MUM to redesign Google Search. These new features are the latest steps we’re taking to make searching more natural and intuitive.

First, we’re making it easier to explore and understand new topics with “Things to know.” Let’s say you want to decorate your apartment, and you’re interested in learning more about creating acrylic paintings.

The search results page for the query “acrylic painting” scrolls to a new feature called “Things to know,” which lists out various aspects of the topic, like “step by step,” “styles” and “using household items.”

If you search for “acrylic painting,” Google understands how people typically explore this topic, and shows the aspects people are likely to look at first. For example, we can identify more than 350 topics related to acrylic painting, and help you find the right path to take.

We’ll be launching this feature in the coming months. In the future, MUM will unlock deeper insights you might not have known to search for — like “how to make acrylic paintings with household items” — and connect you with content on the web that you wouldn’t have otherwise found.

Two phone screens side by side highlight a set of queries and tappable features that allow you to refine to more specific searches for acrylic painting or broaden to concepts like famous painters.

Second, to help you further explore ideas, we’re making it easy to zoom in and out of a topic with new features to refine and broaden searches. 

In this case, you can learn more about specific techniques, like puddle pouring, or art classes you can take. You can also broaden your search to see other related topics, like other painting methods and famous painters. These features will launch in the coming months.

A scrolling results page for the query “pour painting ideas” that shows results with bold images and video thumbnails.

Third, we’re making it easier to find visual inspiration with a newly designed, browsable results page. If puddle pouring caught your eye, just search for “pour painting ideas” to see a visually rich page full of ideas from across the web, with articles, images, videos and more that you can easily scroll through. 

This new visual results page is designed for searches that are looking for inspiration, like “Halloween decorating ideas” or “indoor vertical garden ideas,” and you can try it today.

Get more from videos

We already use advanced AI systems to identify key moments in videos, like the winning shot in a basketball game, or steps in a recipe. Today, we’re taking this a step further, introducing a new experience that identifies related topics in a video, with links to easily dig deeper and learn more. 

Using MUM, we can even show related topics that aren’t explicitly mentioned in the video, based on our advanced understanding of information in the video. In this example, while the video doesn’t say the words “macaroni penguin’s life story,” our systems understand that topics contained in the video relate to this topic, like how macaroni penguins find their family members and navigate predators. The first version of this feature will roll out in the coming weeks, and we’ll add more visual enhancements in the coming months.

Across all these MUM experiences, we look forward to helping people discover more web pages, videos, images and ideas that they may not have come across or otherwise searched for. 

A more helpful Google

The updates we’re announcing today don’t end with MUM, though. We’re also making it easier to shop from the widest range of merchants, big and small, no matter what you’re looking for. And we’re helping people better evaluate the credibility of information they find online. Plus, for the moments that matter most, we’re finding new ways to help people get access to information and insights. 

All this work not only helps people around the world, but also creators, publishers and businesses. Every day, we send visitors to well over 100 million different websites, and every month, Google connects people with more than 120 million businesses that don’t have websites, by enabling phone calls, driving directions and local foot traffic.

As we continue to build more useful products and push the boundaries of what it means to search, we look forward to helping people find the answers they’re looking for, and inspiring more questions along the way.


Googler Marian Croak is now in the Inventors Hall of Fame

Look around you right now and consider everything that was created by an inventor. The computer you’re reading this article on, the internet necessary to load this article, the electricity that powers the screen, even the coffee maker you used this morning. 

To recognize the incredible contributions of those inventors and the benefits they bring to our everyday lives, the National Inventors Hall of Fame has inducted a new group of honorees every year since 1973. In this year’s combined inductee class of 2020/2021, Googler Marian Croak is being honored for her work in advancing VoIP (Voice over Internet Protocol) technology, which powers the online calls and video chats that have helped businesses and families stay connected through the COVID-19 pandemic. She holds more than 200 patents, and was recently honored by the U.S. Patent and Trademark Office.

These days, Marian leads our Research Center for Responsible AI and Human Centered Technology, which is responsible for ensuring Google develops artificial intelligence responsibly and that it has a positive impact. We chatted over Google Meet to find out how plumbers and electricians sparked her interest in science, how her inventions have made life in a pandemic a tiny bit easier for everyone, and what the NIHF honor means to her.

When was the first time you realized you were interested in technology?

I was probably around 5 or 6. I know that we don’t usually think of things like plumbing or electricity as necessarily technology, but they are. I was very enchanted with plumbers and electricians who would come to our house and fix things. They would be dirty and greasy, but I would love the smell, you know? I felt like, Wow, what a miracle worker! I would follow them around, trying to figure out how they’d fix something. I still do that today! 

So when you have electricians come to your house, you’re still like, “Hey, how did you do that?”

There was a leak once, and I was asking the plumber all these questions, and he asked me to quiet down! Because he needed to listen to the invisible flow of water through the pipes to determine the problem. It was amazing to me how similar it was to network engineering!

You’ve had a few different roles at Google and Alphabet so far. How did you move to where you are today?

When I first came to Google, my role was bringing the Internet to emerging markets: laying fiber in Africa, building public Wi-Fi in railroad stations in India, and then exploring the landscape in places like Cuba and other countries where there wasn’t yet an openness to the Internet. And that was a fascinating job. It was a merger of technology, policy and governmental affairs, combined with an understanding of communities and regions.

Then I worked on bringing features and technology and Google’s products to the next billion users. And after I did that for a few years, I joined the Site Reliability Engineering organization to help enhance the performance of Google’s complex, integrated systems. Now my current role is leading the Research Center for Responsible AI and Human Centered Technology group. I’m inspired that my work has the potential to positively impact so many of our users. 

Today you’re being inducted into the National Inventors Hall of Fame for your work in advancing VoIP technology. What inspired you to work on VoIP, and can you describe that process of bringing the technology to life?

I have always been motivated by the desire to change the world, and to do that I try to change the world that I’m currently in. What I mean by that is I work on problems that I am aware of, and that I can tackle within the world that surrounds me. So when I began working on VoIP technology, it was at a time in the late ‘90s when there was a lot of change happening involving the internet. Netscape had put a user-friendly web browser in place and there was a lot of new activity beginning to bubble up all over the online world.

I was part of a team that was also very interested in doing testing and prototyping of voice communications over the internet. There were some existing technologies but they didn’t scale and they were proprietary in nature, so we were thinking of ways we could open it up, make it scalable, make it reliable and be able to support billions of daily calls. We started to work on this but had a lot of doubters telling us that this wouldn’t work, and that no one would ever use this “toy-like” technology. And at the time, they were right: It wasn’t working and it wasn’t reliable. But over time we were able to get it to a point where it started working very well. So much so that eventually the senior leaders within AT&T began to adopt the technology for their core network. It was challenging but an exciting thing for me to do because I like to bring change to things, especially when people doubt that it can happen.

What advice would you give to aspiring inventors? 

Most importantly, don’t give up, and during the process of creation, listen to your critics. I received so much criticism and in many ways it was valid. That type of feedback motivated me to improve the technology, and really address a variety of pain points that I hadn’t necessarily thought of. 

What does being inducted into the NIHF mean to you? 

Well, it’s humbling, and a great experience. At the time I never thought the work that I was doing was that significant and that it would lead to this, but I’m so very grateful for the recognition.

What does it mean to be a part of a class that sees the first two Black women inducted into the NIHF?

I find that it inspires people when they see someone who looks like themselves on some dimension, and I’m proud to offer that type of representation. People also see that I’m just a normal person like themselves and I think that also inspires them to accomplish their goals. I want people to understand that it may be difficult but that they can overcome obstacles and that it will be so worth it.


This Googler’s team is making shopping more inclusive

There’s a lot to love about online shopping: It’s fast, it’s easy and there are a ton of options to choose from. But there’s one obvious challenge — you can’t try anything on. This is something Google product manager Debbie Biswas noticed, as a tech industry veteran and startup founder herself. “Historically, the fashion industry only celebrates people of a certain size and skin color,” she says. “This was something I wanted to change.”

Debbie grew up in India and moved to the U.S. after she graduated college. “I started a company in the women’s apparel space, where I learned to solve user pain points around shopping for clothes, sizing and styling.” While working on her startup, Debbie realized how hard shopping was for women, including herself — the models in the images didn’t show her how something would look on her. 

“When I got an opportunity to work at Google Shopping, I realized I could solve so many of these problems at scale using the best AI/ML tech in the industry,” she says. “As a woman of color, and someone who doesn’t conform to the ‘traditional beautiful size,’ I feel very motivated to solve apparel shopping problems for people like me.”


These researchers are bringing AI to farmers

“Farmers feed the entire world — so how might we support them to be resilient and build sustainable systems that also support global food security?” It’s a question that Diana Akrong found herself asking last year. Diana is a UX researcher based in Accra, Ghana, and the founding member of Google’s Accra UX team.

Across the world, her manager, Dr. Courtney Heldreth, was equally interested in answering this question. Courtney is a social psychologist and a staff UX researcher based in Seattle, and both women work as part of Google’s People + Artificial Intelligence Research (PAIR) group. “Looking back on history, we can see how the industrial revolution played a significant role in creating global inequality,” she says. “It set most of Western Europe onto a path of economic dominance that was then followed by both military and political dominance.” Courtney and Diana teamed up on an exploratory effort focused on how AI can help better the lives of small, local farming communities in the Global South. They and their team want to understand what farmers need, their practices, their value systems and what their social lives are like — and make sure that Google products reflect these dynamics.

One result of their work is a recently published research paper. The paper — written alongside their colleagues Dr. Jess Holbrook at Google and Dr. Norman Makoto Su of Indiana University and published in the ACM Interactions trade journal — dives into why we need farmer-centered AI research, and what it could mean not just for farmers, but for everyone they feed. I recently took some time to learn more about their work.

How would you explain your job to someone who isn’t in tech?

Courtney: I would say I’m a researcher trying to understand underserved and historically marginalized users’ lives and needs so we can create products that work better for them. 

Diana: I’m a researcher who looks at how people interact with technology. My superpower is my curiosity and it’s my mission to understand and advocate for user needs, explore business opportunities and share knowledge.

What’s something on your mind right now? 

Diana: Because of COVID-19, there’s the threat of a major food crisis in India and elsewhere. We’re wondering how we can work with small farms as well as local consumers, policymakers, agricultural workers, agribusiness owners and NGOs to solve this problem.

Agriculture is very close to my heart, personally. Prior to joining Google, I spent a lot of time learning from smallholder farmers across my country and helping design concepts to address their needs. 


Courtney: I’ve been thinking about how AI can be seen as this magical, heroic thing, but there are also many risks to using it in places where there aren’t laws to protect people. When I think about Google’s AI Principles — be socially beneficial, be accountable to people, avoid reinforcing bias, prioritize safety — those things define what projects I want to work on. It’s also why my colleague Tabitha Yong and I developed a set of best practices for designing more equitable AI products.

Can you tell me more about your paper, “What Does AI Mean for Smallholder Farmers? A Proposal for Farmer-Centered AI Research,” recently published in ACM Interactions?

Courtney: The impact and failures of AI are often very western and U.S.-centric. We’re trying to think about how to make this more fair and inclusive for communities with different needs around the globe. For example, in our farmer-centered AI research, we know that most existing AI solutions are designed for large farms in the developed world. However, many farmers in the Global South live and work in rural areas, which trail behind urban areas in terms of connectivity and digital adoption. By focusing on the daily realities of these farmers, we can better understand different perspectives, especially those of people who don’t live in the U.S. and Europe, so that Google’s products work for everyone, everywhere.

  • In 2019, Courtney and Diana led a workshop at the CGIAR Platform for Big Data in Agriculture summit; Courtney also participated in a panel discussion. In 2020, Diana spoke at a virtual CGIAR panel on human-centered design.

  • Diana (left) and Courtney (right) are dedicated to building inclusive AI for farmers with small rural businesses in Africa and Asia. Diana is based in Accra, Ghana, and Courtney is based in Seattle, Washington, in the U.S.

  • Courtney pictured during a research trip to India.

  • Diana is all smiles at a team building event.

Why did you want to work at Google?

Diana: I see Google as home to teams with diverse experiences and skills who work collaboratively to tackle complex, important issues that change real people’s lives. I’ve thrived here because I get to work on projects I care about and play a critical role in growing the UX community here in Ghana.

Courtney: I chose Google because we work on the world’s hardest problems. Googlers are  fearless and the reach of Google’s products and services is unprecedented. As someone who comes from an underrepresented group, I never thought I would work here. To be here at this moment is so important to me, my community and my family. When I look at issues I care about the most — marginalized and underrepresented communities — the work we do plays a critical role in preventing algorithmic bias, bridging the digital divide and lessening these inequalities. 

How have you seen your research help real people? 

Courtney: In 2018, we worked with Titi Akinsanmi, Google’s Policy and Government Relations Lead for West and Francophone Africa, and PAIR Co-lead and Principal Research Scientist Fernanda Viegas on the report for AI in Nigeria. Since then, the Ministry of Technology and Science reached out to Google to help form a strategy around AI. We’ve seen government bodies in sub-Saharan Africa use this paper as a roadmap to develop their own responsible AI policies.

How should aspiring AI thinkers and future technologists prepare for a career in this field?

Diana: My main advice? Start with people and their needs. A digital solution or AI may not be necessary to solve every problem. The PAIR Guidebook is a great reference for best practices and examples for designing with AI.


A crossword puzzle with a big purpose

Before the pandemic, Alicia Chang was working on a new project. “I was experimenting with non-traditional ways to help teach Googlers the AI Principles,” she says. Alicia is a technical writer on the Engineering Education team focused on designing learning experiences to help Googlers learn about our AI Principles and how to apply them in their own work.

The challenge for Alicia would be how many people she needed to educate. “There are so many people spread over different locations, time zones, countries!” But when the world started working from home, she was inspired by the various workarounds people were using to connect virtually. 


Alicia Chang

“I started testing out activities like haiku-writing contests and online trivia,” Alicia says. “Then one day a friend mentioned an online escape room activity someone had arranged for a COVID-safe birthday gathering. Something really clicked with me when she mentioned that, and I started to think about designing an immersive learning experience.” Alicia decided to research how some of the most creative, dedicated people deliver information: She looked at what teachers were doing. 

Alicia soon stumbled upon a YouTube video about using Google Sheets to create a crossword puzzle, so she decided to make her own — and Googlers loved it. Since the crossword was such a success, Alicia decided to make more interactive games. She used Google Forms to create a fun “Which AI Principle are you?” quiz, and Google Docs to make a word search. Then there’s the Emoji Challenge, where players have to figure out which AI Principles a set of emoji describe. All of this became part of what is now known as the Responsible Innovation Challenge, a set of various puzzle activities built with Google products — including Forms, Sheets, Docs and Sites — that focus on teaching Google’s AI Principles.

The purpose of the Responsible Innovation Challenge is to introduce Google’s AI Principles to new technical hires in onboarding courses, and to help Googlers put the AI Principles into practice in everyday product development situations. The first few puzzles are fairly simple and help players remember and recall the Principles, which serve as a practical framework for responsible innovation. As Googlers level up, the puzzles get a bit more complex. There’s even a bonus level where Googlers are asked to think about various technical resources and tools they can use to develop AI responsibly by applying them to their existing workflow when creating a machine learning model.

Alicia added a points system and a leaderboard with digital badges — and even included prizes. “I noticed that people were motivated by some friendly competition. Googlers really got involved and referred their coworkers to play, too,” she says. “We had over 1,000 enroll in the first 30 days alone!” To date, more than 2,800 Googlers have participated from across 41 countries, and people continue to sign up. 

It’s been encouraging for Alicia to see how much Googlers are enjoying the puzzles, especially when screen time burnout is all too real. Most importantly, though, she’s thrilled that more people are learning about Google’s AI Principles. “Each of the billions of people who use Google products has a unique story and life experience,” Alicia says. “And that’s what we want to think about so we can make the best products for individual people.” 


SoundStream: An End-to-End Neural Audio Codec

Posted by Neil Zeghidour, Research Scientist and Marco Tagliasacchi, Staff Research Scientist, Google Research

Audio codecs are used to efficiently compress audio to reduce either storage requirements or network bandwidth. Ideally, audio codecs should be transparent to the end user, so that the decoded audio is perceptually indistinguishable from the original and the encoding/decoding process does not introduce perceivable latency.

Over the past few years, different audio codecs have been successfully developed to meet these requirements, including Opus and Enhanced Voice Services (EVS). Opus is a versatile speech and audio codec, supporting bitrates from 6 kbps (kilobits per second) to 510 kbps, which has been widely deployed across applications ranging from video conferencing platforms, like Google Meet, to streaming services, like YouTube. EVS is the latest codec developed by the 3GPP standardization body targeting mobile telephony. Like Opus, it is a versatile codec operating at multiple bitrates, 5.9 kbps to 128 kbps. The quality of the reconstructed audio using either of these codecs is excellent at medium-to-low bitrates (12–20 kbps), but it degrades sharply when operating at very low bitrates (⪅3 kbps). While these codecs leverage expert knowledge of human perception as well as carefully engineered signal processing pipelines to maximize the efficiency of the compression algorithms, there has been recent interest in replacing these handcrafted pipelines by machine learning approaches that learn to encode audio in a data-driven manner.

Earlier this year, we released Lyra, a neural audio codec for low-bitrate speech. In “SoundStream: an End-to-End Neural Audio Codec”, we introduce a novel neural audio codec that extends those efforts by providing higher-quality audio and expanding to encode different sound types, including clean speech, noisy and reverberant speech, music, and environmental sounds. SoundStream is the first neural network codec to work on speech and music, while being able to run in real-time on a smartphone CPU. It is able to deliver state-of-the-art quality over a broad range of bitrates with a single trained model, which represents a significant advance in learnable codecs.

Learning an Audio Codec from Data
The main technical ingredient of SoundStream is a neural network, consisting of an encoder, decoder and quantizer, all of which are trained end-to-end. The encoder converts the input audio stream into a coded signal, which is compressed using the quantizer and then converted back to audio using the decoder. SoundStream leverages state-of-the-art solutions in the field of neural audio synthesis to deliver audio at high perceptual quality, by training a discriminator that computes a combination of adversarial and reconstruction loss functions that induce the reconstructed audio to sound like the uncompressed original input. Once trained, the encoder and decoder can be run on separate clients to efficiently transmit high-quality audio over a network.

SoundStream training and inference. During training, the encoder, quantizer and decoder parameters are optimized using a combination of reconstruction and adversarial losses, computed by a discriminator, which is trained to distinguish between the original input audio and the reconstructed audio. During inference, the encoder and quantizer on a transmitter client send the compressed bitstream to a receiver client that can then decode the audio signal.

Learning a Scalable Codec with Residual Vector Quantization
The encoder of SoundStream produces vectors that can take an indefinite number of values. In order to transmit them to the receiver using a limited number of bits, it is necessary to replace them by close vectors from a finite set (called a codebook), a process known as vector quantization. This approach works well at bitrates around 1 kbps or lower, but quickly reaches its limits when using higher bitrates. For example, even at a bitrate as low as 3 kbps, and assuming the encoder produces 100 vectors per second, one would need to store a codebook with more than 1 billion vectors, which is infeasible in practice.

In SoundStream, we address this issue by proposing a new residual vector quantizer (RVQ), consisting of several layers (up to 80 in our experiments). The first layer quantizes the code vectors with moderate resolution, and each of the following layers processes the residual error from the previous one. By splitting the quantization process into several layers, the codebook size can be reduced drastically. As an example, with 100 vectors per second at 3 kbps, and using 5 quantizer layers, the codebook size goes from 1 billion to 320. Moreover, we can easily increase or decrease the bitrate by adding or removing quantizer layers, respectively.
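To make the layered quantization concrete, here is a minimal NumPy sketch of residual vector quantization. It uses small random codebooks purely for illustration; in SoundStream the quantizer is trained end-to-end with the rest of the network, so this is a sketch of the idea rather than the actual implementation.

```python
import numpy as np

def rvq_encode(vector, codebooks):
    """Residual vector quantization: each layer quantizes the residual
    left over by the previous layer and emits one codebook index."""
    residual = vector.copy()
    indices = []
    for codebook in codebooks:                  # codebook shape: (num_entries, dim)
        distances = np.sum((codebook - residual) ** 2, axis=1)
        idx = int(np.argmin(distances))         # nearest codebook entry
        indices.append(idx)
        residual = residual - codebook[idx]     # pass the residual to the next layer
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the selected entry of every layer."""
    return sum(codebook[idx] for codebook, idx in zip(codebooks, indices))

# Illustration of the numbers in the text: 5 layers of 64 entries each store
# 5 * 6 = 30 bits per vector, so 100 vectors per second gives 3 kbps, while the
# total codebook size is only 5 * 64 = 320 vectors instead of 2**30 (~1 billion).
rng = np.random.default_rng(0)
codebooks = [rng.standard_normal((64, 8)) for _ in range(5)]
x = rng.standard_normal(8)
codes = rvq_encode(x, codebooks)
print(codes, np.linalg.norm(x - rvq_decode(codes, codebooks)))
```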

Because network conditions can vary while transmitting audio, ideally a codec should be “scalable” so that it can change its bitrate from low to high depending on the state of the network. While most traditional codecs are scalable, previous learnable codecs need to be trained and deployed specifically for each bitrate.

To circumvent this limitation, we leverage the fact that the number of quantization layers in SoundStream controls the bitrate, and propose a new method called “quantizer dropout”. During training, we randomly drop some quantization layers to simulate a varying bitrate. This pushes the decoder to perform well at any bitrate of the incoming audio stream, and thus helps SoundStream to become “scalable” so that a single trained model can operate at any bitrate, performing as well as models trained specifically for these bitrates.
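Continuing the hypothetical sketch above, quantizer dropout amounts to truncating the stack of RVQ layers at random during training, and fixing the number of layers from the target bitrate at inference time:

```python
# During training: pick a random number of quantizer layers per example so the
# model learns to reconstruct audio from any prefix of the RVQ codes.
n_active = rng.integers(1, len(codebooks) + 1)     # quantizer dropout
train_codes = rvq_encode(x, codebooks[:n_active])

# During inference: choose the number of layers from the desired bitrate, e.g.
# 3 layers of 64-entry codebooks at 100 vectors per second is 1.8 kbps.
infer_codes = rvq_encode(x, codebooks[:3])
```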

Comparison of SoundStream models (higher is better) that are trained at 18 kbps with quantizer dropout (bitrate scalable), without quantizer dropout (not bitrate scalable) and evaluated with a variable number of quantizers, or trained and evaluated at a fixed bitrate (bitrate specific). The bitrate-scalable model (a single model for all bitrates) does not lose any quality when compared to bitrate-specific models (a different model for each bitrate), thanks to quantizer dropout.

A State-of-the-Art Audio Codec
SoundStream at 3 kbps outperforms Opus at 12 kbps and approaches the quality of EVS at 9.6 kbps, while using 3.2x–4x fewer bits. This means that encoding audio with SoundStream can provide a similar quality while using a significantly lower amount of bandwidth. Moreover, at the same bitrate, SoundStream outperforms the current version of Lyra, which is based on an autoregressive network. Unlike Lyra, which is already deployed and optimized for production usage, SoundStream is still at an experimental stage. In the future, Lyra will incorporate the components of SoundStream to provide both higher audio quality and reduced complexity.

SoundStream at 3kbps vs. state-of-the-art codecs. MUSHRA score is an indication of subjective quality (the higher the better).

A demonstration of SoundStream’s performance compared to Opus, EVS, and the original Lyra codec is presented in these audio examples, which include speech and music samples encoded with the reference audio, Lyra (3 kbps), Opus (6 kbps), EVS (5.9 kbps) and SoundStream (3 kbps).

Joint Audio Compression and Enhancement
In traditional audio processing pipelines, compression and enhancement (the removal of background noise) are typically performed by different modules. For example, it is possible to apply an audio enhancement algorithm at the transmitter side, before audio is compressed, or at the receiver side, after audio is decoded. In such a setup, each processing step contributes to the end-to-end latency. Conversely, we design SoundStream in such a way that compression and enhancement can be carried out jointly by the same model, without increasing the overall latency. In the following examples, we show that it is possible to combine compression with background noise suppression, by activating and deactivating denoising dynamically (no denoising for 5 seconds, denoising for 5 seconds, no denoising for 5 seconds, etc.).

The audio examples include the original noisy audio and the denoised output, demonstrated by turning denoising on and off every 5 seconds.

Conclusion
Efficient compression is necessary whenever one needs to transmit audio, whether when streaming a video, or during a conference call. SoundStream is an important step towards improving machine learning-driven audio codecs. It outperforms state-of-the-art codecs, such as Opus and EVS, can enhance audio on demand, and requires deployment of only a single scalable model, rather than many.

SoundStream will be released as a part of the next, improved version of Lyra. By integrating SoundStream with Lyra, developers can leverage the existing Lyra APIs and tools for their work, providing both flexibility and better sound quality. We will also release it as a separate TensorFlow model for experimentation.

Acknowledgments
The work described here was authored by Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund and Marco Tagliasacchi. We are grateful for all discussions and feedback on this work that we received from our colleagues at Google.


Demonstrating the Fundamentals of Quantum Error Correction

Posted by Jimmy Chen, Quantum Research Scientist and Matt McEwen, Student Researcher, Google Quantum AI

The Google Quantum AI team has been building quantum processors made of superconducting quantum bits (qubits) that have achieved the first beyond-classical computation, as well as the largest quantum chemical simulations to date. However, current generation quantum processors still have high operational error rates — in the range of 10⁻³ per operation, compared to the 10⁻¹² believed to be necessary for a variety of useful algorithms. Bridging this tremendous gap in error rates will require more than just making better qubits — quantum computers of the future will have to use quantum error correction (QEC).

The core idea of QEC is to make a logical qubit by distributing its quantum state across many physical data qubits. When a physical error occurs, one can detect it by repeatedly checking certain properties of the qubits, allowing it to be corrected, preventing any error from occurring on the logical qubit state. While logical errors may still occur if a series of physical qubits experience an error together, this error rate should exponentially decrease with the addition of more physical qubits (more physical qubits need to be involved to cause a logical error). This exponential scaling behavior relies on physical qubit errors being sufficiently rare and independent. In particular, it’s important to suppress correlated errors, where one physical error simultaneously affects many qubits at once or persists over many cycles of error correction. Such correlated errors produce more complex patterns of error detections that are more difficult to correct and more easily cause logical errors.

Our team has recently implemented the ideas of QEC in our Sycamore architecture using quantum repetition codes. These codes consist of one-dimensional chains of qubits that alternate between data qubits, which encode the logical qubit, and measure qubits, which we use to detect errors in the logical state. While these repetition codes can only correct for one kind of quantum error at a time¹, they contain all of the same ingredients as more sophisticated error correction codes and require fewer physical qubits per logical qubit, allowing us to better explore how logical errors decrease as logical qubit size grows.

In “Removing leakage-induced correlated errors in superconducting quantum error correction”, published in Nature Communications, we use these repetition codes to demonstrate a new technique for reducing the amount of correlated errors in our physical qubits. Then, in “Exponential suppression of bit or phase flip errors with repetitive error correction”, published in Nature, we show that the logical errors of these repetition codes are exponentially suppressed as we add more and more physical qubits, consistent with expectations from QEC theory.

Layout of the repetition code (21 qubits, 1D chain) and distance-2 surface code (7 qubits) on the Sycamore device.

Leaky Qubits
The goal of the repetition code is to detect errors on the data qubits without measuring their states directly. It does so by entangling each pair of data qubits with their shared measure qubit in a way that tells us whether those data qubit states are the same or different (i.e., their parity) without telling us the states themselves. We repeat this process over and over in rounds that last only one microsecond. When the measured parities change between rounds, we’ve detected an error.

However, one key challenge stems from how we make qubits out of superconducting circuits. While a qubit needs only two energy states, which are usually labeled |0⟩ and |1⟩, our devices feature a ladder of energy states, |0⟩, |1⟩, |2⟩, |3⟩, and so on. We use the two lowest energy states to encode our qubit with information to be used for computation (we call these the computational states). We use the higher energy states (|2⟩, |3⟩ and higher) to help achieve high-fidelity entangling operations, but these entangling operations can sometimes allow the qubit to “leak” into these higher states, earning them the name leakage states.

Population in the leakage states builds up as operations are applied, which increases the error of subsequent operations and even causes other nearby qubits to leak as well — resulting in a particularly challenging source of correlated error. In our early 2015 experiments on error correction, we observed that as more rounds of error correction were applied, performance declined as leakage began to build.

Mitigating the impact of leakage required us to develop a new kind of qubit operation that could “empty out” leakage states, called multi-level reset. We manipulate the qubit to rapidly pump energy out into the structures used for readout, where it will quickly move off the chip, leaving the qubit cooled to the |0⟩ state, even if it started in |2⟩ or |3⟩. Applying this operation to the data qubits would destroy the logical state we’re trying to protect, but we can apply it to the measure qubits without disturbing the data qubits. Resetting the measure qubits at the end of every round dynamically stabilizes the device so leakage doesn’t continue to grow and spread, allowing our devices to behave more like ideal qubits.

Applying the multi-level reset gate to the measure qubits almost totally removes leakage, while also reducing the growth of leakage on the data qubits.

Exponential Suppression
Having mitigated leakage as a significant source of correlated error, we next set out to test whether the repetition codes give us the predicted exponential reduction in error when increasing the number of qubits. Every time we run our repetition code, it produces a collection of error detections. Because the detections are linked to pairs of qubits rather than individual qubits, we have to look at all of the detections to try to piece together where the errors have occurred, a procedure known as decoding. Once we’ve decoded the errors, we then know which corrections we need to apply to the data qubits. However, decoding can fail if there are too many error detections for the number of data qubits used, resulting in a logical error.
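As a loose classical analogy (not the matching-based decoder actually used for the Sycamore experiments), the sketch below simulates a bit-flip repetition code in which independent flips hit the data bits and a simple majority vote does the decoding. It illustrates why adding qubits suppresses logical errors only while physical errors stay rare and uncorrelated.

```python
import numpy as np

def logical_error_rate(n_data, p_flip, trials=100_000, seed=0):
    """Toy model: encode logical 0 as n_data zeros, flip each bit
    independently with probability p_flip, then decode by majority vote."""
    rng = np.random.default_rng(seed)
    flips = rng.random((trials, n_data)) < p_flip
    # Majority-vote decoding fails when more than half of the bits flipped.
    return float(np.mean(flips.sum(axis=1) > n_data // 2))

# The logical error rate falls roughly exponentially with the number of bits,
# as long as p_flip is well below 50% and the flips are independent.
for n in (3, 5, 7, 9, 11):
    print(n, logical_error_rate(n, p_flip=0.05))
```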

To test our repetition codes, we run codes with sizes ranging from 5 to 21 qubits while also varying the number of error correction rounds. We also run two different types of repetition codes — either a phase-flip code or bit-flip code — that are sensitive to different kinds of quantum errors. By finding the logical error probability as a function of the number of rounds, we can fit a logical error rate for each code size and code type. In our data, we see that the logical error rate does in fact get suppressed exponentially as the code size is increased.

Probability of getting a logical error after decoding versus number of rounds run, shown for various sizes of phase-flip repetition code.

We can quantify the error suppression with the error scaling parameter Lambda (Λ), where a Lambda value of 2 means that we halve the logical error rate every time we add four data qubits to the repetition code. In our experiments, we find Lambda values of 3.18 for the phase-flip code and 2.99 for the bit-flip code. We can compare these experimental values to a numerical simulation of the expected Lambda based on a simple error model with no correlated errors, which predicts values of 3.34 and 3.78 for the bit- and phase-flip codes respectively.

Logical error rate per round versus number of qubits for the phase-flip (X) and bit-flip (Z) repetition codes. The line shows an exponential decay fit, and Λ is the scale factor for the exponential decay.
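Taking the definition of Λ above at face value (every four added data qubits divide the logical error rate by Λ), a quick back-of-the-envelope extrapolation might look like the following; the 1% starting error rate is a made-up placeholder, not a measured value.

```python
def extrapolate_error_rate(base_error, added_data_qubits, lam):
    """Per the definition of lambda above, every four added data qubits
    divide the logical error rate by lam."""
    return base_error * lam ** (-added_data_qubits / 4)

# Hypothetical example: with lambda = 3.18 (phase-flip code), adding eight data
# qubits divides a placeholder 1% logical error rate by 3.18**2, i.e. about 10x.
print(extrapolate_error_rate(1e-2, added_data_qubits=8, lam=3.18))
```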

This is the first time Lambda has been measured in any platform while performing multiple rounds of error detection. We’re especially excited about how close the experimental and simulated Lambda values are, because it means that our system can be described with a fairly simple error model without many unexpected errors occurring. Nevertheless, the agreement is not perfect, indicating that there’s more research to be done in understanding the non-idealities of our QEC architecture, including additional sources of correlated errors.

What’s Next
This work demonstrates two important prerequisites for QEC: first, the Sycamore device can run many rounds of error correction without building up errors over time thanks to our new reset protocol, and second, we were able to validate QEC theory and error models by showing exponential suppression of error in a repetition code. These experiments were the largest stress test of a QEC system yet, using 1000 entangling gates and 500 qubit measurements in our largest test. We’re looking forward to taking what we learned from these experiments and applying it to our target QEC architecture, the 2D surface code, which will require even more qubits with even better performance.


¹ A true quantum error-correcting code would require a two-dimensional array of qubits in order to correct for all of the errors that could occur.


The C4_200M Synthetic Dataset for Grammatical Error Correction

Posted by Felix Stahlberg and Shankar Kumar, Research Scientists, Google Research

Grammatical error correction (GEC) attempts to model grammar and other types of writing errors in order to provide grammar and spelling suggestions, improving the quality of written output in documents, emails, blog posts and even informal chats. Over the past 15 years, there has been a substantial improvement in GEC quality, which can in large part be credited to recasting the problem as a “translation” task. When introduced in Google Docs, for example, this approach resulted in a significant increase in the number of accepted grammar correction suggestions.

One of the biggest challenges for GEC models, however, is data sparsity. Unlike other natural language processing (NLP) tasks, such as speech recognition and machine translation, there is very limited training data available for GEC, even for high-resource languages like English. A common remedy for this is to generate synthetic data using a range of techniques, from heuristic-based random word- or character-level corruptions to model-based approaches. However, such methods tend to be simplistic and do not reflect the true distribution of error types from actual users.

In “Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models”, presented at the EACL 16th Workshop on Innovative Use of NLP for Building Educational Applications, we introduce tagged corruption models. Inspired by the popular back-translation data synthesis technique for machine translation, this approach enables the precise control of synthetic data generation, ensuring diverse outputs that are more consistent with the distribution of errors seen in practice. We used tagged corruption models to generate a new 200M sentence dataset, which we have released in order to provide researchers with realistic pre-training data for GEC. By integrating this new dataset into our training pipeline, we were able to significantly improve on GEC baselines.

Tagged Corruption Models
The idea behind applying a conventional corruption model to GEC is to begin with a grammatically correct sentence and then to “corrupt” it by adding errors. A corruption model can be easily trained by switching the source and target sentences in existing GEC datasets, a method that previous studies have shown can be very effective for generating improved GEC datasets.

A conventional corruption model generates an ungrammatical sentence (red) given a clean input sentence (green).

The tagged corruption model that we propose builds on this idea by taking a clean sentence as input along with an error type tag that describes the kind of error one wishes to reproduce. It then generates an ungrammatical version of the input sentence that contains the given error type. Choosing different error types for different sentences increases the diversity of corruptions compared to a conventional corruption model.

Tagged corruption models generate corruptions (red) for the clean input sentence (green) depending on the error type tag. A determiner error may lead to dropping the “a”, whereas a noun-inflection error may produce the incorrect plural “sheeps”.

To use this model for data generation we first randomly selected 200M clean sentences from the C4 corpus, and assigned an error type tag to each sentence such that their relative frequencies matched the error type tag distribution of the small development set BEA-dev. Since BEA-dev is a carefully curated set that covers a wide range of different English proficiency levels, we expect its tag distribution to be representative for writing errors found in the wild. We then used a tagged corruption model to synthesize the source sentence.

Synthetic data generation with tagged corruption models. The clean C4 sentences (green) are paired with the corrupted sentences (red) in the synthetic GEC training corpus. The corrupted sentences are generated using a tagged corruption model by following the error type frequencies in the development set (bar chart).
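As a rough sketch of the tag-assignment step, the snippet below samples one error-type tag per clean sentence so that the aggregate tag frequencies follow a target distribution. The tag names and probabilities are made up for illustration, and the corruption model itself appears only in a comment, since it is a trained neural model rather than something reproduced here.

```python
import random

# Hypothetical error-type tags and frequencies, standing in for the relative
# frequencies measured on BEA-dev (the real tag set is larger).
tag_distribution = {
    "PUNCTUATION": 0.20,
    "DETERMINER": 0.15,
    "SPELLING": 0.15,
    "NOUN_INFLECTION": 0.10,
    "OTHER": 0.40,
}

def assign_tags(clean_sentences, distribution, seed=0):
    """Sample one error-type tag per clean sentence so that, in aggregate,
    the tag frequencies match the target distribution."""
    rng = random.Random(seed)
    tags = list(distribution)
    weights = [distribution[tag] for tag in tags]
    return [(sentence, rng.choices(tags, weights=weights)[0])
            for sentence in clean_sentences]

# Each (clean sentence, tag) pair would then be fed to the tagged corruption
# model, which generates the ungrammatical source side of the training pair.
pairs = assign_tags(["She walked to the store.", "The sheep are grazing."],
                    tag_distribution)
print(pairs)
```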

Results
In our experiments, tagged corruption models outperformed untagged corruption models on two standard development sets (CoNLL-13 and BEA-dev) by more than three F0.5-points (a standard metric in GEC research that combines precision and recall with more weight on precision), advancing the state-of-the-art on the two widely used academic test sets, CoNLL-14 and BEA-test.
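For reference, F0.5 is the standard Fβ score with β = 0.5, which weights precision more heavily than recall; a small helper makes the definition explicit (the example numbers are arbitrary).

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta = 0.5 emphasizes precision over recall, which suits
    GEC because proposing a wrong "correction" is worse than missing one."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Arbitrary example: precision 0.60 and recall 0.40 give an F0.5 of about 0.545.
print(f_beta(0.60, 0.40))
```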

In addition, the use of tagged corruption models not only yields gains on standard GEC test sets, it is also able to adapt GEC systems to the proficiency levels of users. This could be useful, for example, because the error tag distribution for native English writers often differs significantly from the distributions for non-native English speakers. For example, native speakers tend to make more punctuation and spelling mistakes, whereas determiner errors (e.g., missing or superfluous articles, like “a”, “an” or “the”) are more common in text from non-native writers.

Conclusion
Neural sequence models are notoriously data-hungry, but the availability of annotated training data for grammatical error correction is rare. Our new C4_200M corpus is a synthetic dataset containing diverse grammatical errors, which yields state-of-the-art performance when used to pre-train GEC systems. By releasing the dataset we hope to provide GEC researchers with a valuable resource to train strong baseline systems.
