KELM: Integrating Knowledge Graphs with Language Model Pre-training Corpora

Posted by Siamak Shakeri, Staff Software Engineer and Oshin Agarwal, Research Intern, Google Research

Large pre-trained natural language processing (NLP) models, such as BERT, RoBERTa, GPT-3, T5 and REALM, leverage natural language corpora that are derived from the Web and fine-tuned on task specific data, and have made significant advances in various NLP tasks. However, natural language text alone represents a limited coverage of knowledge, and facts may be contained in wordy sentences in many different ways. Furthermore, existence of non-factual information and toxic content in text can eventually cause biases in the resulting models.

Alternate sources of information are knowledge graphs (KGs), which consist of structured data. KGs are factual in nature because the information is usually extracted from more trusted sources, and post-processing filters and human editors ensure inappropriate and incorrect content are removed. Therefore, models that can incorporate them carry the advantages of improved factual accuracy and reduced toxicity. However, their different structural format makes it difficult to integrate them with the existing pre-training corpora in language models.

In “Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training” (KELM), accepted at NAACL 2021, we explore converting KGs to synthetic natural language sentences to augment existing pre-training corpora, enabling their integration into the pre-training of language models without architectural changes. To that end, we leverage the publicly available English Wikidata KG and convert it into natural language text in order to create a synthetic corpus. We then augment REALM, a retrieval-based language model, with the synthetic corpus as a method of integrating natural language corpora and KGs in pre-training. We have released this corpus publicly for the broader research community.

Converting KG to Natural Language Text
KGs consist of factual information represented explicitly in a structured format, generally in the form of [subject entity, relation, object entity] triples, e.g., [10×10 photobooks, inception, 2012]. A group of related triples is called an entity subgraph. An example of an entity subgraph that builds on the previous example of a triple is { [10×10 photobooks, instance of, Nonprofit Organization], [10×10 photobooks, inception, 2012] }, which is illustrated in the figure below. A KG can be viewed as interconnected entity subgraphs.

Converting subgraphs into natural language text is a standard task in NLP known as data-to-text generation. Although there have been significant advances on data-to-text-generation on benchmark datasets such as WebNLG, converting an entire KG into natural text has additional challenges. The entities and relations in large KGs are more vast and diverse than small benchmark datasets. Moreover, benchmark datasets consist of predefined subgraphs that can form fluent meaningful sentences. With an entire KG, such a segmentation into entity subgraphs needs to be created as well.

An example illustration of how the pipeline converts an entity subgraph (in bubbles) into synthetic natural sentences (far right).

In order to convert the Wikidata KG into synthetic natural sentences, we developed a verbalization pipeline named “Text from KG Generator” (TEKGEN), which is made up of the following components: a large training corpus of heuristically aligned Wikipedia text and Wikidata KG triples, a text-to-text generator (T5) to convert the KG triples to text, an entity subgraph creator for generating groups of triples to be verbalized together, and finally, a post-processing filter to remove low quality outputs. The result is a corpus containing the entire Wikidata KG as natural text, which we call the Knowledge-Enhanced Language Model (KELM) corpus. It consists of ~18M sentences spanning ~45M triples and ~1500 relations.

Converting a KG to natural language, which is then used for language model augmentation

Integrating Knowledge Graph and Natural Text for Language Model Pre-training
Our evaluation shows that KG verbalization is an effective method of integrating KGs with natural language text. We demonstrate this by augmenting the retrieval corpus of REALM, which includes only Wikipedia text.

To assess the effectiveness of verbalization, we augment the REALM retrieval corpus with the KELM corpus (i.e., “verbalized triples”) and compare its performance against augmentation with concatenated triples without verbalization. We measure the accuracy with each data augmentation technique on two popular open-domain question answering datasets: Natural Questions and Web Questions.

Augmenting REALM with even the concatenated triples improves accuracy, potentially adding information not expressed in text explicitly or at all. However, augmentation with verbalized triples allows for a smoother integration of the KG with the natural language text corpus, as demonstrated by the higher accuracy. We also observed the same trend on a knowledge probe called LAMA that queries the model using fill-in-the-blank questions.

Conclusion
With KELM, we provide a publicly-available corpus of a KG as natural text. We show that KG verbalization can be used to integrate KGs with natural text corpora to overcome their structural differences. This has real-world applications for knowledge-intensive tasks, such as question answering, where providing factual knowledge is essential. Moreover, such corpora can be applied in pre-training of large language models, and can potentially reduce toxicity and improve factuality. We hope that this work encourages further advances in integrating structured knowledge sources into pre-training of large language models.

Acknowledgements
This work has been a collaborative effort involving Oshin Agarwal, Heming Ge, Siamak Shakeri and Rami Al-Rfou. We thank William Woods, Jonni Kanerva, Tania Rojas-Esponda, Jianmo Ni, Aaron Cohen and Itai Rolnick for rating a sample of the synthetic corpus to evaluate its quality. We also thank Kelvin Guu for his valuable feedback on the paper.

Read More

11 ways we’re innovating with AI

AI is integral to so much of the work we do at Google. Fundamental advances in computing are helping us confront some of the greatest challenges of this century, like climate change. Meanwhile, AI is also powering updates across our products, including Search, Maps and Photos — demonstrating how machine learning can improve your life in both big and small ways. 

In case you missed it, here are some of the AI-powered updates we announced at Google I/O.


LaMDA is a breakthrough in natural language understanding for dialogue.

Human conversations are surprisingly complex. They’re grounded in concepts we’ve learned throughout our lives; are composed of responses that are both sensible and specific; and unfold in an open-ended manner. LaMDA — short for “Language Model for Dialogue Applications” — is a machine learning model designed for dialogue and built on Transformer, a neural network architecture that Google invented and open-sourced. We think that this early-stage research could unlock more natural ways of interacting with technology and entirely new categories of helpful applications. Learn more about LaMDA.


And MUM, our new AI language model, will eventually help make Google Search a lot smarter.

In 2019 we launched BERT, a Transformer AI model that can better understand the intent behind your Search queries. Multitask Unified Model (MUM), our latest milestone, is 1000x more powerful than BERT. It can learn across 75 languages at once (most AI models train on one language at a time), and it can understand information across text, images, video and more. We’re still in the early days of exploring MUM, but the goal is that one day you’ll be able to type a long, information-dense, and natural sounding query like “I’ve hiked Mt. Adams and now want to hike Mt. Fuji next fall, what should I do differently to prepare?” and more quickly find relevant information you need. Learn more about MUM.

 

Project Starline will help you feel like you’re there, together.

Imagine looking through a sort of magic window. And through that window, you see another person, life-size, and in three dimensions. You can talk naturally, gesture and make eye contact.

  • A woman communicates with her sister and baby using Project Starline.

    We brought in people to reconnect using Project Starline.

  • A woman communicates with her friend using Project Starline.

    We brought in people to reconnect using Project Starline.

  • A woman and a man communicate using sign language using Project Starline.

    We brought in people to reconnect using Project Starline.

  • A woman communicates with her friend using Project Starline.

    We brought in people to reconnect using Project Starline.

Project Starline is a technology project that combines advances in hardware and software to enable friends, family and co-workers to feel together, even when they’re cities (or countries) apart. To create this experience, we’re applying research in computer vision, machine learning, spatial audio and real-time compression. And we’ve developed a light field display system that creates a sense of volume and depth without needing additional glasses or headsets. It feels like someone is sitting just across from you, like they’re right there. Learn more about Project Starline.

Within a decade, we’ll build the world’s first useful, error-corrected quantum computer. And our new Quantum AI campus is where it’ll happen. 

Confronting many of the world’s greatest challenges, from climate change to the next pandemic, will require a new kind of computing. A useful, error-corrected quantum computer will allow us to mirror the complexity of nature, enabling us to develop new materials, better batteries, more effective medicines and more. Our new Quantum AI campus — home to research offices, a fabrication facility, and our first quantum data center — will help us build that computer before the end of the decade. Learn more about our work on the Quantum AI campus.


Maps will help reduce hard-braking moments while you drive.

Soon, Google Maps will use machine learning to reduce your chances of experiencing hard-braking moments — incidents where you slam hard on your brakes, caused by things like sudden traffic jams or confusion about which highway exit to take. 

When you get directions in Maps, we calculate your route based on a lot of factors, like how many lanes a road has or how direct the route is. With this update, we’ll also factor in the likelihood of hard-braking. Maps will identify the two fastest route options for you, and then we’ll automatically recommend the one with fewer hard-braking moments (as long as your ETA is roughly the same). We believe these changes have the potential to eliminate over 100 million hard-braking events in routes driven with Google Maps each year. Learn more about our updates to Maps.


Your Memories in Google Photos will become even more personalized.

With Memories, you can already look back on important photos from years past or highlights from the last week. Using machine learning, we’ll soon be able to identify the less-obvious patterns in your photos. Starting later this summer, when we find a set of three or more photos with similarities like shape or color, we’ll highlight these little patterns for you in your Memories. For example, Photos might identify a pattern of your family hanging out on the same couch over the years — something you wouldn’t have ever thought to search for, but that tells a meaningful story about your daily life. Learn more about our updates to Google Photos.


And Cinematic moments will bring your pictures to life.

When you’re trying to get the perfect photo, you usually take the same shot two or three (or 20) times. Using neural networks, we can take two nearly identical images and fill in the gaps by creating new frames in between. This creates vivid, moving images called Cinematic moments. 

Producing this effect from scratch would take professional animators hours, but with machine learning we can automatically generate these moments and bring them to your Recent Highlights. Best of all, you don’t need a specific phone; Cinematic moments will come to everyone across Android and iOS. Learn more about Cinematic moments in Google Photos.

Two very similar pictures of a child and their baby sibling get transformed into a moving image thanks to AI.

Cinematic moments bring your pictures to life, thanks to AI.

New features in Google Workspace help make collaboration more inclusive. 

In Google Workspace, assisted writing will suggest more inclusive language when applicable. For example, it may recommend that you use the word “chairperson” instead of “chairman” or “mail carrier” instead of “mailman.” It can also give you other stylistic suggestions to avoid passive voice and offensive language, which can speed up editing and help make your writing stronger. Learn more about our updates to Workspace.

Google Shopping shows you the best products for your particular needs, thanks to our Shopping Graph.

To help shoppers find what they’re looking for, we need to have a deep understanding of all the products that are available, based on information from images, videos, online reviews and even inventory in local stores. Enter the Shopping Graph: our AI-enhanced model tracks products, sellers, brands, reviews, product information and inventory data — as well as how all these attributes relate to one another. With people shopping across Google more than a billion times a day, the Shopping Graph makes those sessions more helpful by connecting people with over 24 billion listings from millions of merchants across the web. Learn how we’re working with merchants to give you more ways to shop.

A dermatology assist tool can help you figure out what’s going on with your skin.

Each year we see billions of Google Searches related to skin, nail and hair issues, but it can be difficult to describe what you’re seeing on your skin through words alone.

With our CE marked AI-powered dermatology assist tool, a web-based application that we aim to make available for early testing in the EU later this year, it’s easier to figure out what might be going on with your skin. Simply use your phone’s camera to take three images of the skin, hair or nail concern from different angles. You’ll then be asked questions about your skin type, how long you’ve had the issue and other symptoms that help the AI to narrow down the possibilities. The AI model analyzes all of this information and draws from its knowledge of 288 conditions to give you a list of possible conditions that you can then research further. It’s not meant to be a replacement for diagnosis, but rather a good place to start. Learn more about our AI-powered dermatology assist tool.

And AI could help improve screening for tuberculosis.

Tuberculosis (TB) is one of the leading causes of death worldwide, infecting 10 million people per year and disproportionately impacting people in low-to-middle-income countries. It’s also really tough to diagnose early because of how similar symptoms are to other respiratory diseases. Chest X-rays help with diagnosis, but experts aren’t always available to read the results. That’s why the World Health Organization (WHO) recently recommended using technology to help with screening and triaging for TB. Researchers at Google are exploring how AI can be used to identify potential TB patients for follow-up testing, hoping to catch the disease early and work to eradicate it. Learn more about our ongoing research into tuberculosis screening.

Read More

Maysam Moussalem teaches Googlers human-centered AI

Originally, Maysam Moussalem dreamed of being an architect. “When I was 10, I looked up to see the Art Nouveau dome over the Galeries Lafayette in Paris, and I knew I wanted to make things like that,” she says. “Growing up between Austin, Paris, Beirut and Istanbul just fed my love of architecture.” But she found herself often talking to her father, a computer science (CS) professor, about what she wanted in a career. “I always loved art and science and I wanted to explore the intersections between fields. CS felt broader to me, and so I ended up there.”

While in grad school for CS, her advisor encouraged her to apply for a National Science Foundation Graduate Research Fellowship. “Given my lack of publications at the time, I wasn’t sure I should apply,” Maysam remembers. “But my advisor gave me some of the best advice I’ve ever received: ‘If you try, you may not get it. But if you don’t try, you definitely won’t get it.’” Maysam received the scholarship, which supported her throughout grad school. “I’ll always be grateful for that advice.” 

Today, Maysam works in AI, in Google’s Machine Learning Education division and also as the co-author and editor-in-chief of the People + AI Research (PAIR) Guidebook. She’s hosting a session at Google I/O on “Building trusted AI products” as well, which you can view when it’s live at 9 am PT Thursday, May 20, as a part of Google Design’s I/O Agenda. We recently took some time to talk to Maysam about what landed her at Google, and her path toward responsible innovation.

How would you explain your job to someone who isn’t in tech?

I create different types of training, like workshops and labs for Googlers who work in machine learning and data science. I also help create guidebooks and courses that people who don’t work at Google use.

What’s something you didn’t realize would help you in your career one day?

I didn’t think that knowing seven languages would come in handy for my work here, but it did! When I was working on the externalization of the Machine Learning Crash Course, I was so happy to be able to review modules and glossary entries for the French translation!

How do you apply Google’s AI Principles in your work? 

I’m applying the AI Principles whenever I’m helping teams learn best practices for building user-centered products with AI. It’s so gratifying when someone who’s taken one of my classes tells me they had a great experience going through the training, they enjoyed learning something new and they feel ready to apply it in their work. Just like when I was an engineer, anytime someone told me the tool I’d worked on helped them do their job better and addressed their needs, it drove home the fourth AI principle: Being accountable to people. It’s so important to put people first in our work. 

This idea was really important when I was working on Google’s People + AI Research (PAIR) Guidebook. I love PAIR’s approach of putting humans at the center of product development. It’s really helpful when people in different roles come together and pool their skills to make better products. 

How did you go from being an engineer to doing what you’re doing now? 

At Google, it feels like I don’t have to choose between learning and working. There are tech talks every week, plus workshops and codelabs constantly. I’ve loved continuing to learn while working here.

Being raised by two professors also gave me a love of teaching. I wanted to share what I’d learned with others. My current role enables me to do this and use a wider range of my skills.

My background as an engineer gives me a strong understanding of how we build software at Google’s scale. This inspires me to think more about how to bring education into the engineering workflow, rather than forcing people to learn from a disconnected experience.

How can aspiring AI thinkers and future technologists prepare for a career in responsible innovation? 

Pick up and exercise a variety of skills! I’m a technical educator, but I’m always happy to pick up new skills that aren’t traditionally specific to my job. For example, I was thinking of a new platform to deliver internal data science training, and I learned how to create a prototype using UX tools so that I could illustrate my ideas really clearly in my proposal. I write, code, teach, design and I’m always interested in learning new techniques from my colleagues in other roles.

And spend time with your audience, the people who will be using your product or the coursework you’re creating or whatever it is you’re working on. When I was an engineer, I’d always look for opportunities to sit with, observe, and talk with the people who were using my team’s products. And I learned so much from this process.

Read More

Project Guideline: Enabling Those with Low Vision to Run Independently

Posted by Xuan Yang, Software Engineer, Google Research

For the 285 million people around the world living with blindness or low vision, exercising independently can be challenging. Earlier this year, we announced Project Guideline, an early-stage research project, developed in partnership with Guiding Eyes for the Blind, that uses machine learning to guide runners through a variety of environments that have been marked with a painted line. Using only a phone running Guideline technology and a pair of headphones, Guiding Eyes for the Blind CEO Thomas Panek was able to run independently for the first time in decades and complete an unassisted 5K in New York City’s Central Park.

Safely and reliably guiding a blind runner in unpredictable environments requires addressing a number of challenges. Here, we will walk through the technology behind Guideline and the process by which we were able to create an on-device machine learning model that could guide Thomas on an independent outdoor run. The project is still very much under development, but we’re hopeful it can help explore how on-device technology delivered by a mobile phone can provide reliable, enhanced mobility and orientation experiences for those who are blind or low vision.

Thomas Panek using Guideline technology to run independently outdoors.

Project Guideline
The Guideline system consists of a mobile device worn around the user’s waist with a custom belt and harness, a guideline on the running path marked with paint or tape, and bone conduction headphones. Core to the Guideline technology is an on-device segmentation model that takes frames from a mobile device’s camera as input and classifies every pixel in the frame into two classes, “guideline” and “not guideline”. This simple confidence mask, applied to every frame, allows the Guideline app to predict where runners are with respect to a line on the path, without using location data. Based on this prediction and the proceeding smoothing/filtering function, the app sends audio signals to the runners to help them orient and stay on the line, or audio alerts to tell runners to stop if they veer too far away.

Project Guideline uses Android’s built-in Camera 2 and MLKit APIs and adds custom modules to segment the guideline, detect its position and orientation, filter false signals, and send a stereo audio signal to the user in real-time.

We faced a number of important challenges in building the preliminary Guideline system:

  1. System accuracy: Mobility for the blind and low vision community is a challenge in which user safety is of paramount importance. It demands a machine learning model that is capable of generating accurate and generalized segmentation results to ensure the safety of the runner in different locations and under various environmental conditions.
  2. System performance: In addition to addressing user safety, the system needs to be performative, efficient, and reliable. It must process at least 15 frames per second (FPS) in order to provide real-time feedback for the runner. It must also be able to run for at least 3 hours without draining the phone battery, and must work offline, without the need for internet connection should the walking/running path be in an area without data service.
  3. Lack of in-domain data: In order to train the segmentation model, we needed a large volume of video consisting of roads and running paths that have a yellow line on them. To generalize the model, data variety is equally as critical as data quantity, requiring video frames taken at different times of day, with different lighting conditions, under different weather conditions, at different locations, etc.

Below, we introduce solutions for each of these challenges.

Network Architecture
To meet the latency and power requirements, we built the line segmentation model on the DeepLabv3 framework, utilizing MobilenetV3-Small as the backbone, while simplifying the outputs to two classes – guideline and background.

The model takes an RGB frame and generates an output grayscale mask, representing the confidence of each pixel’s prediction.

To increase throughput speed, we downsize the camera feed from 1920 x 1080 pixels to 513 x 513 pixels as input to the DeepLab segmentation model. To further speed-up the DeepLab model for use on mobile devices, we skipped the last up-sample layer, and directly output the 65 x 65 pixel predicted masks. These 65 x 65 pixel predicted masks are provided as input to the post processing. By minimizing the input resolution in both stages, we’re able to improve the runtime of the segmentation model and speed up post-processing.

Data Collection
To train the model, we required a large set of training images in the target domain that exhibited a variety of path conditions. Not surprisingly, the publicly available datasets were for autonomous driving use cases, with roof mounted cameras and cars driving between the lines, and were not in the target domain. We found that training models on these datasets delivered unsatisfying results due to the large domain gap. Instead, the Guideline model needed data collected with cameras worn around a person’s waist, running on top of the line, without the adversarial objects found on highways and crowded city streets.

The large domain gap between autonomous driving datasets and the target domain. Images on the left courtesy of the Berkeley DeepDrive dataset.

With preexisting open-source datasets proving unhelpful for our use case, we created our own training dataset composed of the following:

  1. Hand-collected data: Team members temporarily placed guidelines on paved pathways using duct tape in bright colors and recorded themselves running on and around the lines at different times of the day and in different weather conditions.
  2. Synthetic data: The data capture efforts were complicated and severely limited due to COVID-19 restrictions. This led us to build a custom rendering pipeline to synthesize tens of thousands of images, varying the environment, weather, lighting, shadows, and adversarial objects. When the model struggled with certain conditions in real-world testing, we were able to generate specific synthetic datasets to address the situation. For example, the model originally struggled with segmenting the guideline amidst piles of fallen autumn leaves. With additional synthetic training data, we were able to correct for that in subsequent model releases.
Rendering pipeline generates synthetic images to capture a broad spectrum of environments.

We also created a small regression dataset, which consisted of annotated samples of the most frequently seen scenarios combined with the most challenging scenarios, including tree and human shadows, fallen leaves, adversarial road markings, sunlight reflecting off the guideline, sharp turns, steep slopes, etc. We used this dataset to compare new models to previous ones and to make sure that an overall improvement in accuracy of the new model did not hide a reduction in accuracy in particularly important or challenging scenarios.

Training Procedure
We designed a three-stage training procedure and used transfer learning to overcome the limited in-domain training dataset problem. We started with a model that was pre-trained on Cityscape, and then trained the model using the synthetic images, as this dataset is larger but of lower quality. Finally, we fine-tuned the model using the limited in-domain data we collected.

Three-stage training procedure to overcome the limited data issue. Images in the left column courtesy of Cityscapes.

Early in development, it became clear that the segmentation model’s performance suffered at the top of the image frame. As the guidelines travel further away from the camera’s point of view at the top of the frame, the lines themselves start to vanish. This causes the predicted masks to be less accurate at the top parts of the frame. To address this problem, we computed a loss value that was based on the top k pixel rows in every frame. We used this value to select those frames that included the vanishing guidelines with which the model struggled, and trained the model repeatedly on those frames. This process proved to be very helpful not only in addressing the vanishing line problem, but also for solving other problems we encountered, such as blurry frames, curved lines and line occlusion by adversarial objects.

The segmentation model’s accuracy and robustness continuously improved even in challenging cases.

System Performance
Together with Tensorflow Lite and ML Kit, the end-to-end system runs remarkably fast on Pixel devices, achieving 29+ FPS on Pixel 4 XL and 20+ FPS on Pixel 5. We deployed the segmentation model entirely on DSP, running at 6 ms on Pixel 4 XL and 12 ms on Pixel 5 with high accuracy. The end-to-end system achieves 99.5% frame success rate, 93% mIoU on our evaluation dataset, and passes our regression test. These model performance metrics are incredibly important and enable the system to provide real-time feedback to the user.

What’s Next
We’re still at the beginning of our exploration, but we’re excited about our progress and what’s to come. We’re starting to collaborate with additional leading non-profit organizations that serve the blind and low vision communities to put more Guidelines in parks, schools, and public places. By painting more lines, getting direct feedback from users, and collecting more data under a wider variety of conditions, we hope to further generalize our segmentation model and improve the existing feature-set. At the same time, we are investigating new research and techniques, as well as new features and capabilities that would improve the overall system robustness and reliability.

To learn more about the project and how it came to be, read Thomas Panek’s story. If you want to help us put more Guidelines in the world, please visit goo.gle/ProjectGuideline.

Acknowledgements
Project Guideline is a collaboration across Google Research, Google Creative Lab, and the Accessibility Team. We especially would like to thank our team members: Mikhail Sirotenko, Sagar Waghmare, Lucian Lonita, Tomer Meron, Hartwig Adam, Ryan Burke, Dror Ayalon, Amit Pitaru, Matt Hall, John Watkinson, Phil Bayer, John Mernacaj, Cliff Lungaretti, Dorian Douglass, Kyndra LoCoco. We also thank Fangting Xia, Jack Sim and our other colleagues and friends from the Mobile Vision team and Guiding Eyes for the Blind.

Read More

Google I/O 2021: Being helpful in moments that matter

It’s great to be back hosting our I/O Developers Conference this year. Pulling up to our Mountain View campus this morning, I felt a sense of normalcy for the first time in a long while. Of course, it’s not the same without our developer community here in person. COVID-19 has deeply affected our entire global community over the past year and continues to take a toll. Places such as Brazil, and my home country of India, are now going through their most difficult moments of the pandemic yet. Our thoughts are with everyone who has been affected by COVID and we are all hoping for better days ahead.

The last year has put a lot into perspective. At Google, it’s also given renewed purpose to our mission to organize the world’s information and make it universally accessible and useful. We continue to approach that mission with a singular goal: building a more helpful Google, for everyone. That means being helpful to people in the moments that matter and giving everyone the tools to increase their knowledge, success, health and happiness. 

Helping in moments that matter

Sometimes it’s about helping in big moments, like keeping 150 million students and educators learning virtually over the last year with Google Classroom. Other times it’s about helping in little moments that add up to big changes for everyone. For example, we’re introducing safer routing in Maps. This AI-powered capability in Maps can identify road, weather and traffic conditions where you are likely to brake suddenly; our aim is to reduce up to 100 million events like this every year. 

Reimagining the future of work

One of the biggest ways we can help is by reimagining the future of work. Over the last year, we’ve seen work transform in unprecedented ways, as offices and coworkers have been replaced by kitchen countertops and pets. Many companies, including ours, will continue to offer flexibility even when it’s safe to be in the same office again. Collaboration tools have never been more critical, and today we announced a new smart canvas experience in Google Workspace that enables even richer collaboration. 

GIF of Smart Canvas integration with Google Meet

 Smart Canvas integration with Google Meet

Responsible next-generation AI

We’ve made remarkable advances over the past 22 years, thanks to our progress in some of the most challenging areas of AI, including translation, images and voice. These advances have powered improvements across Google products, making it possible to talk to someone in another language using Assistant’s interpreter mode, view cherished memories on Photos or use Google Lens to solve a tricky math problem. 

We’ve also used AI to improve the core Search experience for billions of people by taking a huge leap forward in a computer’s ability to process natural language. Yet, there are still moments when computers just don’t understand us. That’s because language is endlessly complex: We use it to tell stories, crack jokes and share ideas — weaving in concepts we’ve learned over the course of our lives. The richness and flexibility of language make it one of humanity’s greatest tools and one of computer science’s greatest challenges. 

Today I am excited to share our latest research in natural language understanding: LaMDA. LaMDA is a language model for dialogue applications. It’s open domain, which means it is designed to converse on any topic. For example, LaMDA understands quite a bit about the planet Pluto. So if a student wanted to discover more about space, they could ask about Pluto and the model would give sensible responses, making learning even more fun and engaging. If that student then wanted to switch over to a different topic — say, how to make a good paper airplane — LaMDA could continue the conversation without any retraining.

This is one of the ways we believe LaMDA can make information and computing radically more accessible and easier to use (and you can learn more about that here). 

We have been researching and developing language models for many years. We’re focused on ensuring LaMDA meets our incredibly high standards on fairness, accuracy, safety and privacy, and that it is developed consistently with our AI Principles. And we look forward to incorporating conversation features into products like Google Assistant, Search and Workspace, as well as exploring how to give capabilities to developers and enterprise customers.

LaMDA is a huge step forward in natural conversation, but it’s still only trained on text. When people communicate with each other they do it across images, text, audio and video. So we need to build multimodal models (MUM) to allow people to naturally ask questions across different types of information. With MUM you could one day plan a road trip by asking Google to “find a route with beautiful mountain views.” This is one example of how we’re making progress towards more natural and intuitive ways of interacting with Search.

Pushing the frontier of computing

Translation, image recognition and voice recognition laid the foundation for complex models like LaMDA and multimodal models. Our compute infrastructure is how we drive and sustain these advances, and TPUs, our custom-built machine learning processes, are a big part of that. Today we announced our next generation of TPUs: the TPU v4. These are powered by the v4 chip, which is more than twice as fast as the previous generation. One pod can deliver more than one exaflop, equivalent to the computing power of 10 million laptops combined. This is the fastest system we’ve ever deployed, and a historic milestone for us. Previously to get to an exaflop, you needed to build a custom supercomputer. And we’ll soon have dozens of TPUv4 pods in our data centers, many of which will be operating at or near 90% carbon-free energy. They’ll be available to our Cloud customers later this year.

Images of a TPU v4 chip tray, and of TPU v4 pods at our Oklahoma data center

Left: TPU v4 chip tray; Right: TPU v4 pods at our Oklahoma data center 

It’s tremendously exciting to see this pace of innovation. As we look further into the future, there are types of problems that classical computing will not be able to solve in reasonable time. Quantum computing can help. Achieving our quantum milestone was a tremendous accomplishment, but we’re still at the beginning of a multiyear journey. We continue to work to get to our next big milestone in quantum computing: building an error-corrected quantum computer, which could help us increase battery efficiency, create more sustainable energy and improve drug discovery. To help us get there, we’ve opened a new state of the art Quantum AI campus with our first quantum data center and quantum processor chip fabrication facilities.

A photo of the interior of our new Quantum AI campus

Inside our new Quantum AI campus.

Safer with Google

At Google we know that our products can only be as helpful as they are safe. And advances in computer science and AI are how we continue to make them better. We keep more users safe by blocking malware, phishing attempts, spam messages and potential cyber attacks than anyone else in the world.

Our focus on data minimization pushes us to do more, with less data. Two years ago at I/O, I announced Auto-Delete, which encourages users to have their activity data automatically and continuously deleted. We’ve since made Auto-Delete the default for all new Google Accounts. Now, after 18 months we automatically delete your activity data, unless you tell us to do it sooner. It’s now active for over 2 billion accounts.

All of our products are guided by three important principles: With one of the world’s most advanced security infrastructures, our products are secure by default. We strictly uphold responsible data practices so every product we build is private by design. And we create easy to use privacy and security settings so you’re in control.

Long-term research: Project Starline

We were all grateful to have video conferencing over the last year to stay in touch with family and friends, and keep schools and businesses going. But there is no substitute for being together in the room with someone. 

Several years ago we kicked off a project called Project Starline to use technology to explore what’s possible. Using high-resolution cameras and custom-built depth sensors, it captures your shape and appearance from multiple perspectives, and then fuses them together to create an extremely detailed, real-time 3D model. The resulting data is many gigabits per second, so to send an image this size over existing networks, we developed novel compression and streaming algorithms that reduce the data by a factor of more than 100. We also developed a breakthrough light-field display that shows you the realistic representation of someone sitting in front of you. As sophisticated as the technology is, it vanishes, so you can focus on what’s most important. 

We’ve spent thousands of hours testing it at our own offices, and the results are promising. There’s also excitement from our lead enterprise partners, and we’re working with partners in health care and media to get early feedback. In pushing the boundaries of remote collaboration, we’ve made technical advances that will improve our entire suite of communications products. We look forward to sharing more in the months ahead.

A person in a booth talking to someone over Project Starline

A person having a conversation with someone over Project Starline.

Solving complex sustainability challenges

Another area of research is our work to drive forward sustainability. Sustainability has been a core value for us for more than 20 years. We were the first major company to become carbon neutral in 2007. We were the first to match our operations with 100% renewable energy in 2017, and we’ve been doing it ever since. Last year we eliminated our entire carbon legacy. 

Our next ambition is our biggest yet: operating on carbon free energy by the year 2030. This represents a significant step change from current approaches and is a moonshot on the same scale as quantum computing. It presents equally hard problems to solve, from sourcing carbon-free energy in every place we operate to ensuring it can run every hour of every day. 

Building on the first carbon-intelligent computing platform that we rolled out last year, we’ll soon be the first company to implement carbon-intelligent load shifting across both time and place within our data center network. By this time next year we’ll be shifting more than a third of non-production compute to times and places with greater availability of carbon-free energy. And we are working to apply our Cloud AI with novel drilling techniques and fiber optic sensing to deliver geothermal power in more places, starting in our Nevada data centers next year.

Investments like these are needed to get to 24/7 carbon-free energy, and it’s happening in Mountain View, California, too. We’re building our new campus to the highest sustainability standards. When completed, these buildings will feature a first-of-its-kind dragonscale solar skin, equipped with 90,000 silver solar panels and the capacity to generate nearly 7 megawatts. They will house the largest geothermal pile system in North America to help heat buildings in the winter and cool them in the summer. It’s been amazing to see it come to life.

Images with a rendering of the new Charleston East campus in Mountain View, California; and a model view with dragon scale solar skin.

Left: Rendering of the new Charleston East campus in Mountain View, California; Right: Model view with dragon scale solar skin.

A celebration of technology

I/O isn’t just a celebration of technology but of the people who use it, and build it — including the millions of developers around the world who joined us virtually today. Over the past year we’ve seen people use technology in profound ways: To keep themselves healthy and safe, to learn and grow, to connect and to help one another through really difficult times. It’s been inspiring to see and has made us more committed than ever to being helpful in the moments that matter. 

I look forward to seeing everyone at next year’s I/O — in person, I hope. Until then, be safe and well.

Read More

Using AI to help find answers to common skin conditions

Artificial intelligence (AI) has the potential to help clinicians care for patients and treat disease — from improving the screening process for breast cancer to helping detect tuberculosis more efficiently. When we combine these advances in AI with other technologies, like smartphone cameras, we can unlock new ways for people to stay better informed about their health, too.  

Today at  I/O, we shared a preview of an AI-powered dermatology assist tool that helps you understand what’s going on with issues related to your body’s largest organ: your skin, hair and nails. Using many of the same techniques that detect diabetic eye disease or lung cancer in CT scans, this tool gets you closer to identifying dermatologic issues — like a rash on your arm that’s bugging you — using your phone’s camera. 

How our AI-powered dermatology tool works 

Each year we see almost ten billion Google Searches related to skin, nail and hair issues. Two billion people worldwide suffer from dermatologic issues, but there’s a global shortage of specialists. While many people’s first step involves going to a Google Search bar, it can be difficult to describe what you’re seeing on your skin through words alone.

Our AI-powered dermatology assist tool is a web-based application that we hope to launch as a pilot later this year, to make it easier to figure out what might be going on with your skin. Once you launch the tool, simply use your phone’s camera to take three images of the skin, hair or nail concern from different angles. You’ll then be asked questions about your skin type, how long you’ve had the issue and other symptoms that help the tool narrow down the possibilities. The AI model analyzes this information and draws from its knowledge of 288 conditions to give you a list of possible matching conditions that you can then research further.

For each matching condition, the tool will show dermatologist-reviewed information and answers to commonly asked questions, along with similar matching images from the web. The tool is not intended to provide a diagnosis nor be a substitute for medical advice as many conditions require clinician review, in-person examination, or additional testing like a biopsy. Rather we hope it gives you access to authoritative information so you can make a more informed decision about your next step.

Image of a phone showing you each step of using the AI-powered dermatology assist tool.

Based on the photos and information you provide, our AI-powered dermatology assist tool will offer suggested conditions. This product has been CE marked as a Class I medical device in the EU. It is not available in the United States.

Developing an AI model that assesses issues for all skin types 

Our tool is the culmination of over three years of machine learning research and product development. To date, we’ve published several peer-reviewed papers that validate our AI model and more are in the works. 

Our landmark study, featured in Nature Medicine, debuted our deep learning approach to assessing skin diseases and showed that our AI system can achieve accuracy that is on par with U.S. board-certified dermatologists. Our most recent paper in JAMA Network Open demonstrated how non-specialist doctors can use AI-based tools to improve their ability to interpret skin conditions

To make sure we’re building for everyone, our model accounts for factors like age, sex, race and skin types — from pale skin that does not tan to brown skin that rarely burns. We developed and fine-tuned our model with de-identified data encompassing around 65,000 images and case data of diagnosed skin conditions, millions of curated skin concern images and thousands of examples of healthy skin — all across different demographics. 

Recently, the AI model that powers our tool successfully passed clinical validation, and the tool has been CE marked as a Class I medical device in the EU.¹ In the coming months, we plan to build on this work so more people can use this tool to answer questions about common skin issues. If you’re interested in this tool, sign up here to be notified (subject to availability in your region).

¹This tool has not been evaluated by the U.S. FDA for safety or efficacy. It is not available in the United States.

Read More

Tackling tuberculosis screening with AI

Today we’re sharing new AI research that aims to improve screening for one of the top causes of death worldwide: tuberculosis (TB). TB infects 10 million people per year and disproportionately affects people in low-to-middle-income countries. Diagnosing TB early is difficult because its symptoms can mimic those of common respiratory diseases.

Cost-effective screening, specifically chest X-rays, has been identified as one way to improve the screening process. However, experts aren’t always available to interpret results. That’s why the World Health Organization (WHO) recently recommended the use of computer-aided detection (CAD) for screening and triaging.

To help catch the disease early and work toward eventually eradicating it, Google researchers developed an AI-based tool that builds on our existing work in medical imaging to identify potential TB patients for follow-up testing. 

A deep learning system to detect active pulmonary tuberculosis  

In a new study released this week, we found that the right deep learning system can be used to accurately identify patients who are likely to have active TB based on their chest X-ray. By using this screening tool as a preliminary step before ordering a more expensive diagnostic test, our study showed that effective AI-powered screening could save up to 80% of the cost per positive TB case detected. 

Our AI-based tool was able to accurately detect active pulmonary TB cases with false-negative and false-positive detection rates that were similar to 14 radiologists. This accuracy was maintained even when examining patients who were HIV-positive, a population that is at higher risk of developing TB and is challenging to screen because their chest X-rays may differ from typical TB cases.

To make sure the model worked for patients from a wide range of races and ethnicities, we used de-identified data from nine countries to train the model and tested it on cases from five countries. These findings build on our previousresearch that showed AI can detect common issues like collapsed lungs, nodules or fractures in chest X-rays. 

Applying these findings in the real world

The AI system produces a number between 0 and 1 that indicates the risk of TB. For the system to be useful in a real-world setting, there needs to be agreement about what risk level indicates that patients should be recommended for additional testing. Calibrating this threshold can be time-consuming and expensive because administrators can only come to this number after running the system on hundreds of patients, testing these patients, and analyzing the results. 

Based on the performance of our model, our research suggests that any clinic could start from this default threshold and be confident that the model will perform similarly to radiologists, making it easier to deploy this technology. From there, clinics can adjust the threshold based on local needs and resources. For example, regions with fewer resources may use a higher cut-off point to reduce the number of follow-up tests needed. 

The path to eradicating tuberculosis

The WHO’s “The End TB Strategy” lays out the global efforts that are underway to dramatically reduce the incidence of tuberculosis in the coming decade. Because TB can remain pervasive in communities, even if a relatively low number of people have it at a given time, more and earlier screenings are critical to reducing its prevalence. 

We’ll keep contributing to these efforts — especially when it comes to research and development. Later this year, we plan to expand this work through two separate research studies with our partners, Apollo Hospitals in India and the Centre for Infectious Disease Research in Zambia (CIDRZ). 

Read More

A smoother ride and a more detailed Map thanks to AI

AI is a critical part of what makes Google Maps so helpful. With it, we’re able to map roads over 10 times faster than we could five years ago, and we can bring maps filled with useful information to virtually every corner of the world. Today, we’re giving you a behind-the-scenes look at how AI makes two of the features we announced at I/O possible.

Teaching Maps to identify and forecast when people are hitting the brakes

Let’s start with our routing update that helps you avoid situations that cause you to slam on the brakes, such as confusing lane changes or freeway exits. We use AI and navigation information to identify hard-braking events — moments that cause drivers to decelerate sharply and are known indicators of car crash likelihood — and then suggest alternate routes when available. We believe these updates have the potential to eliminate over 100 million hard-braking events in routes driven with Google Maps each year. But how exactly do we find when and where these moments are likely to occur?

That’s where AI comes in. To do this, we train our machine learning models on two sets of data. The first set of information comes from phones using Google Maps. Mobile phone sensors can determine deceleration along a route, but this data is highly prone to false alarms because your phone can move independently of your car. This is what makes it hard for our systems to decipher you tossing your phone into the cupholder or accidentally dropping it on the floor from an actual hard-braking moment. To combat this, we also use information from routes driven with Google Maps when it’s projected on a car’s display, like Android Auto. This represents a relatively small subset of data, but it’s highly accurate because Maps is now tethered to a stable spot — your car display. Training our models on both sets of data makes it possible to spot actual deceleration moments from fake ones, making detection across all trips more accurate. 

Understanding spots along a route that are likely to cause hard-braking is just one part of the equation. We’re also working to identify other contextual factors that lead to hard-braking events, like construction or visibility conditions. For example, if there’s a sudden increase in hard-braking events along a route during a certain time of day when people are likely to be driving toward the glare of the sun, our system could detect those events and offer alternate routes. These details inform future routing so we can suggest safer, smoother routes.

Using AI to go beyond driving

When you’re walking or biking or taking public transit, AI is also there helping you move along safely and easily. Last August we launched detailed street maps which show accurate road widths, along with details about where the sidewalks, crosswalks and pedestrian islands are in an area so people can better understand its layout and how to navigate it. Today, we announced that detailed street maps will expand to 50 more cities by the end of 2021. While this sounds straightforward, a lot is going on under the hood — especially with AI — to make this possible! 

A GIF that shows a before and after comparison of detailed streets maps built from satellite imagery

A before and after comparison of detailed streets maps built from satellite imagery

Imagine that you’re taking a stroll down a typical San Francisco street. As you approach the intersection, you’ll notice that the crosswalk uses a “zebra” pattern — vertical stripes that show you where to walk. But if you were in another city, say London, then parallel dotted lines would define the crosswalks. To account for these differences and accurately display them on the map, our systems need to know what crosswalks look like — not just in one city but across the entire world. It gets even trickier since urban design can change at the country, state, and even city level.

  • A street-level picture of crosswalks in San Francisco

    Crosswalks in San Francisco

  • A street-level image of crosswalks in London

    Crosswalks in London

  • A street-level image of crosswalks in Tokyo

    Crosswalks in Tokyo

  • A street-level image of crosswalks in Madrid

    Crosswalks in Madrid

  • A street-level image of crosswalks in Zurich

    Crosswalks in Zurich

To expand globally and account for local differences, we needed to completely revamp our mapmaking process. Traditionally, we’ve approached mapmaking like baking a cake — one layer at a time. We trained machine learning models to identify and classify features one by one across our index of millions of Street View, satellite and aerial images — starting first with roads, then addresses, buildings and so on. 

But detailed street maps require significantly more granularity and precision than a normal map. To map these dense urban features correctly, we’ve updated our models to identify all objects in a scene at once. This requires a ton of AI smarts. The model has to understand not only what the objects are, but the relationships between them — like where exactly a street ends and a sidewalk begins. With these new full-scene models, we’re able to detect and classify broad sets of features at a time without sacrificing accuracy, allowing us to map a single city faster than ever before. 

An image of Google Maps’ single-feature AI models

Single-feature AI model that classifies buildings.

An image of Google Maps’ full-scene AI models

Full-scene AI models that capture multiple categories of objects at once.

Once we have a model trained on a particular city, we can then expand it to other cities with similar urban designs. For example, the sidewalks, curbs, and traffic lights look similar in Atlanta and Ho Chi Minh City — despite being over 9,000 miles away. And the same model works in Madrid as it does in Dallas, something that may be hard to believe at first glance. With our new advanced machine learning techniques combined with our collection of high-definition imagery, we’re on track to bring a level of detail to the map at scale like never before.

AI will continue to play an important role as we build the most helpful map for people around the globe. For more behind-the-scenes looks at the technology that powers Google Maps, check out the rest of our Maps 101 blog series.

More from this Series

Maps 101

Google Maps helps you navigate, explore, and get things done every single day. In this series, we’ll take a look under the hood at how Google Maps uses technology to build helpful products—from using flocks of sheep and laser beams to gather high-definition imagery to predicting traffic jams that haven’t even happened yet.

View more from Maps 101

Read More

Unveiling our new Quantum AI campus

Within the decade, Google aims to build a useful, error-corrected quantum computer. This will accelerate solutions for some of the world’s most pressing problems, like sustainable energy and reduced emissions to feed the world’s growing population, and unlocking new scientific discoveries, like more helpful AI.

To begin our journey, today we’re unveiling our new Quantum AI campus in Santa Barbara, California. This campus includes our first quantum data center, our quantum hardware research laboratories, and our own quantum processor chip fabrication facilities. Here, our team is working to build an error-corrected quantum computer for the world.

This is a drone's perspective of entering the new Quantum AI campus

Our new Quantum AI campus in Santa Barbara, CA will include our first quantum data center, new research laboratories, and quantum processor fabrication facilities.

Google began using machine learning 20 years ago (for spell checking in Search), and led the deep learning revolution 10 years ago (advancing neural nets, the leading approach to modern AI). These advances in AI and other technologies have enabled many of the incredible applications we’re seeing today. As we look 10 years into the future, many of the greatest global challenges, from climate change to handling the next pandemic, demand a new kind of computing.

To build better batteries (to lighten the load on the power grid), or to create fertilizer to feed the world without creating 2% of global carbon emissions (as nitrogen fixation does today), or to create more targeted medicines (to stop the next pandemic before it starts), we need to understand and design molecules better. That means simulating nature accurately. But you can’t simulate molecules very well using classical computers. As you get to even modestly sized molecules, you quickly run out of computing resources. Nature is quantum mechanical: The bonds and interactions among atoms behave probabilistically, with richer dynamics that exhaust the simple classical computing logic.

A look inside of our cryostats

The inside of our cryostats, like the ones found in the Quantum AI campus, are some of the coldest places in the universe, reaching temperatures around 10 milliKelvin

This is where quantum computers come in. Quantum computers use quantum bits, or “qubits,” which can be entangled in a complex superposition of states, naturally mirroring the complexity of molecules in the real world. With an error-corrected quantum computer, we’ll be able to simulate how molecules behave and interact, so we can test and invent new chemical processes and new materials before investing in costly real-life prototypes. These new computing capabilities will help to accelerate the discovery of better batteries, energy-efficient fertilizers, and targeted medicines, as well as improved optimization, new AI architectures, and more.

Depicting the journey to building an error-corrected quantum computer

Our journey to build an error-corrected quantum computer within the decade includes several scientific milestones, including building an error-corrected logical qubit.

To reach this goal, we’re on a journey to build 1,000,000 physical qubits that work in concert inside a room-sized error-corrected quantum computer. That’s a big leap from today’s modestly-sized systems of fewer than 100 qubits.

To get there, we must build the world’s first “quantum transistor” — two error-corrected “logical qubits” performing quantum operations together — and then figure out how to tile hundreds to thousands of them to form the error-corrected quantum computer. That will take years.

To get there, we need to show we can encode one logical qubit — with 1,000 physical qubits. Using quantum error-correction, these physical qubits work together to form a long-lived nearly perfect qubit — a forever qubit that maintains coherence until power is removed, ushering in the digital era of quantum computing. Again, we expect years of concerted development to achieve this goal.

And to get THERE(!), we need to show that the more physical qubits participate in error correction, the more you can cut down on errors in the first place — this is a crucial step given how error-prone physical qubits are. We’re doing that research right now on our Quantum AI campus.

Already we run quantum computers that can perform calculations beyond the reach of classical computers. To continue this journey towards a useful error-corrected quantum computer and provide humanity with a new tool tuned to the way nature works, we’re assembling an amazing team to invent the future of computing together right here, right now, at Google’s Quantum AI campus.

  • Picture of the Sycamore quantum processor

    The Sycamore quantum processor has 54 individually controllable qubits, and 88 tunable couplers. The couplers are used to enable fast quantum operations between qubits.

  • Current generation of cryostats

    The current generation of cryostats that hold our quantum processors are about the size of three household refrigerators.

  • Art in the Quantum AI campus

    Quantum computing could help us understand and simulate the natural world around us. The art in the Quantum AI campus is influenced by nature.

Read More

LaMDA: our breakthrough conversation technology

We’ve always had a soft spot for language at Google. Early on, we set out to translate the web. More recently, we’ve invented machine learning techniques that help us better grasp the intent of Search queries. Over time, our advances in these and other areas have made it easier and easier to organize and access the heaps of information conveyed by the written and spoken word.

But there’s always room for improvement. Language is remarkably nuanced and adaptable. It can be literal or figurative, flowery or plain, inventive or informational. That versatility makes language one of humanity’s greatest tools — and one of computer science’s most difficult puzzles. 

LaMDA, our latest research breakthrough, adds pieces to one of the most tantalizing sections of that puzzle: conversation.

Animation of a multi-turn conversation between a person and LaMDA. The transcript reads:  USER: Hi, I’d like to have a conversation demonstration for a blog post.  LaMDA: Hi. I’m a friendly and knowledgeable demonstration for your blog post. I can engage in interesting conversation about virtually anything. What would you like to chat about?  USER: if my washing machine and tv are both malfunctioning at the same time what could the reason be?  LaMDA: It could be the power surge from the washing machine may have fried your tv. If it’s a power surge problem it may be time to get a new washing machine. Or a new TV.  USER: yeah maybe. btw how many legs does a horse have, and could it fix my washing machine?  LaMDA: Horses have four legs. How could they help?  USER: you tell me  LaMDA: Oh, well they could certainly help you. By moving you to a new home where you could install a new washing machine.

While conversations tend to revolve around specific topics, their open-ended nature means they can start in one place and end up somewhere completely different. A chat with a friend about a TV show could evolve into a discussion about the country where the show was filmed before settling on a debate about that country’s best regional cuisine. 

That meandering quality can quickly stump modern conversational agents (commonly known as chatbots), which tend to follow narrow, pre-defined paths. But LaMDA — short for “Language Model for Dialogue Applications” — can engage in a free-flowing way about a seemingly endless number of topics, an ability we think could unlock more natural ways of interacting with technology and entirely new categories of helpful applications.

The long road to LaMDA

LaMDA’s conversational skills have been years in the making. Like many recent language models, including BERT and GPT-3, it’s built on Transformer, a neural network architecture that Google Research invented and open-sourced in 2017. That architecture produces a model that can be trained to read many words (a sentence or paragraph, for example), pay attention to how those words relate to one another and then predict what words it thinks will come next. 

But unlike most other language models, LaMDA was trained on dialogue. During its training, it picked up on several of the nuances that distinguish open-ended conversation from other forms of language. One of those nuances is sensibleness. Basically: Does the response to a given conversational context make sense? For instance, if someone says:

“I just started taking guitar lessons.”

You might expect another person to respond with something like: 

“How exciting! My mom has a vintage Martin that she loves to play.”

That response makes sense, given the initial statement. But sensibleness isn’t the only thing that makes a good response. After all, the phrase “that’s nice” is a sensible response to nearly any statement, much in the way “I don’t know” is a sensible response to most questions. Satisfying responses also tend to be specific, by relating clearly to the context of the conversation. In the example above, the response is sensible and specific.

LaMDA builds on earlier Google research, published in 2020, that showed Transformer-based language models trained on dialogue could learn to talk about virtually anything. Since then, we’ve also found that, once trained, LaMDA can be fine-tuned to significantly improve the sensibleness and specificity of its responses. 

Responsibility first

These early results are encouraging, and we look forward to sharing more soon, but sensibleness and specificity aren’t the only qualities we’re looking for in models like LaMDA. We’re also exploring dimensions like “interestingness,” by assessing whether responses are insightful, unexpected or witty. Being Google, we also care a lot about factuality (that is, whether LaMDA sticks to facts, something language models often struggle with), and are investigating ways to ensure LaMDA’s responses aren’t just compelling but correct.

But the most important question we ask ourselves when it comes to our technologies is whether they adhere to our AI Principles. Language might be one of humanity’s greatest tools, but like all tools it can be misused. Models trained on language can propagate that misuse — for instance, by internalizing biases, mirroring hateful speech, or replicating misleading information. And even when the language it’s trained on is carefully vetted, the model itself can still be put to ill use. 

Our highest priority, when creating technologies like LaMDA, is working to ensure we minimize such risks. We’re deeply familiar with issues involved with machine learning models, such as unfair bias, as we’ve been researching and developing these technologies for many years. That’s why we build and open-source resources that researchers can use to analyze models and the data on which they’re trained; why we’ve scrutinized LaMDA at every step of its development; and why we’ll continue to do so as we work to incorporate conversational abilities into more of our products.

Read More