Cohere brings language AI to Amazon SageMaker

This is a guest post by Sudip Roy, Manager of Technical Staff at Cohere.

It’s an exciting day for the development community. Cohere’s state-of-the-art language AI is now available through Amazon SageMaker. This makes it easier for developers to deploy Cohere’s pre-trained generation language model to Amazon SageMaker, an end-to-end machine learning (ML) service. Developers, data scientists, and business analysts use Amazon SageMaker to build, train, and deploy ML models quickly and easily using its fully managed infrastructure, tools, and workflows.

At Cohere, the focus is on language. The company’s mission is to enable developers and businesses to add language AI to their technology stack and build game-changing applications with it. Cohere helps developers and businesses automate a wide range of tasks, such as copywriting, named entity recognition, paraphrasing, text summarization, and classification. The company builds and continually improves its general-purpose large language models (LLMs), making them accessible via a simple-to-use platform. Companies can use the models out of the box or tailor them to their particular needs using their own custom data.

Developers using SageMaker will have access to Cohere’s Medium generation language model. The Medium generation model excels at tasks that require fast responses, such as question answering, copywriting, or paraphrasing. The Medium model is deployed in containers that enable low-latency inference on a diverse set of hardware accelerators available on AWS, providing different cost and performance advantages for SageMaker customers.

“Amazon SageMaker provides the broadest and most comprehensive set of services that eliminate heavy lifting from each step of the machine learning process. We’re excited to offer Cohere’s general purpose large language model with Amazon SageMaker. Our joint customers can now leverage the broad range of Amazon SageMaker services and integrate Cohere’s model with their applications for accelerated time-to-value and faster innovation.”

-Rajneesh Singh, General Manager AI/ML at Amazon Web Services.

“As Cohere continues to push the boundaries of language AI, we are excited to join forces with Amazon SageMaker. This partnership will allow us to bring our advanced technology and innovative approach to an even wider audience, empowering developers and organizations around the world to harness the power of language AI and stay ahead of the curve in an increasingly competitive market.”

-Saurabh Baji, Senior Vice President of Engineering at Cohere.

The Cohere Medium generation language model, available through SageMaker, provides developers with three key benefits:

  • Build, iterate, and deploy quickly – Cohere empowers any developer (no NLP, ML, or AI expertise required) to quickly get access to a pre-trained, state-of-the-art generation model that understands context and semantics at unprecedented levels. This high-quality, large language model reduces the time-to-value for customers by providing an out-of-the-box solution for a wide range of language understanding tasks.
  • Private and secure – With SageMaker, customers can spin up containers serving Cohere’s models without having to worry about their data leaving these self-managed containers.
  • Speed and accuracy – Cohere’s Medium model offers customers a good balance across quality, cost, and latency. Developers can easily integrate the Cohere Generate endpoint into apps using a simple API and SDK.

Get started with Cohere in SageMaker

Developers can use the visual interface of the SageMaker JumpStart foundation models to test Cohere’s models without writing a single line of code. You can evaluate the model on your specific language understanding task and learn the basics of using generative language models. See Cohere’s documentation and blog for various tutorials and tips-and-tricks related to language modeling.

Deploy the SageMaker endpoint using a notebook

Cohere has packaged Medium models, along with an optimized, low-latency inference framework, in containers that can be deployed as SageMaker inference endpoints. Cohere’s containers can be deployed on a range of different instances (including ml.p3.2xlarge, ml.g5.xlarge, and ml.g5.2xlarge) that offer different cost/performance trade-offs. These containers are currently available in two Regions: us-east-1 and eu-west-1. Cohere intends to expand its offering in the near future, including adding to the number and size of models available, the set of supported tasks (such as the endpoints built on top of these models), the supported instances, and the available Regions.

To help developers get started quickly, Cohere has provided Jupyter notebooks that make it easy to deploy these containers and run inference on the deployed endpoints. With the preconfigured set of constants in the notebook, deploying the endpoint can be easily done with only a couple of lines of code as shown in the following example:
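
The following is a minimal sketch of such a deployment using the SageMaker Python SDK rather than Cohere’s notebook helpers; the model package ARN, endpoint name, and instance type are placeholders, and the actual values come from Cohere’s AWS Marketplace listing and the constants preconfigured in the notebook.

import sagemaker
from sagemaker import ModelPackage

role = sagemaker.get_execution_role()
session = sagemaker.Session()

# Placeholder ARN; the real value comes from Cohere's Marketplace listing.
model_package_arn = "arn:aws:sagemaker:us-east-1:<account>:model-package/<cohere-medium-package>"

model = ModelPackage(
    role=role,
    model_package_arn=model_package_arn,
    sagemaker_session=session,
)

# Deploy a real-time inference endpoint on one of the supported GPU instances.
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    endpoint_name="cohere-medium",
)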

After the endpoint is deployed, users can use Cohere’s SDK to run inference. The SDK can be installed easily from PyPI as follows:
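
For example, assuming the package name used in Cohere’s SageMaker examples at the time of writing (check Cohere’s documentation for the current name):

pip install cohere-sagemaker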

It can also be installed from the source code in Cohere’s public SDK GitHub repository.

After the endpoint is deployed, users can use the Cohere Generate endpoint to accomplish multiple generative tasks, such as text summarization, long-form content generation, entity extraction, or copywriting. The Jupyter notebook and GitHub repository include examples demonstrating some of these use cases.
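
As a rough illustration (not a replacement for those examples), a deployed endpoint can also be invoked directly with boto3. The request and response fields below are assumptions and should be checked against the payload format documented in Cohere’s notebook and SDK.

import json
import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

# Hypothetical payload; field names loosely follow Cohere's Generate API
# conventions but must be confirmed against the notebook examples.
payload = {
    "prompt": "Write a short product description for noise-cancelling headphones.",
    "max_tokens": 100,
    "temperature": 0.8,
}

response = runtime.invoke_endpoint(
    EndpointName="cohere-medium",        # endpoint created in the previous step
    ContentType="application/json",
    Body=json.dumps(payload),
)

print(json.loads(response["Body"].read()))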

Conclusion

The availability of Cohere natively on SageMaker via the AWS Marketplace represents a major milestone in the field of NLP. The Cohere model’s ability to generate high-quality, coherent text makes it a valuable tool for anyone working with text data.

If you’re interested in using Cohere for your own SageMaker projects, you can now access it on SageMaker JumpStart. Additionally, you can reference Cohere’s GitHub notebook for instructions on deploying the model and accessing it from the Cohere Generate endpoint.


About the authors

Sudip Roy is Manager of Technical Staff at Cohere, a provider of cutting-edge natural language processing (NLP) technology. Sudip is an accomplished researcher who has published and served on program committees for top conferences like NeurIPS, MLSys, OOPSLA, SIGMOD, VLDB, and SIGKDD, and his work has earned Outstanding Paper awards from SIGMOD and MLSys.

Karthik Bharathy is the product leader for the Amazon SageMaker team with over a decade of product management, product strategy, execution, and launch experience.

Karl Albertsen leads product, engineering, and science for Amazon SageMaker Algorithms and JumpStart, SageMaker’s machine learning hub. He is passionate about applying machine learning to unlock business value.


NVIDIA CEO Ignites AI Conversation in Stockholm

More than 600 entrepreneurs, developers, researchers and executives from across the Nordics flocked Tuesday to Stockholm’s sleek Sergel Hub conference center in a further sign of the strength of the region’s AI ecosystem.

The highlight: a far-reaching conversation between NVIDIA founder and CEO Jensen Huang and Swedish industrialist Marcus Wallenberg exploring the intersections of AI, green computing, and Scandinavia’s broader tech scene.

“This generative AI phenomenon is creating a whole slew of new startups, new ideas, new video editing, image editing, new text,” Huang said. “It can achieve capabilities that previous computing platforms cannot.”

The Berzelius supercomputer, named for Jöns Jacob Berzelius, one of the fathers of modern chemistry, has just been upgraded to 94 NVIDIA DGX A100 AI computing systems, delivering nearly half an exaflop of AI performance, placing it among the world’s 100 fastest AI supercomputers.

“Years ago, Marcus and I started talking about a new way of doing computer science. Having a key instrument, like Berzelius, would be a fundamental instrument of future science,” Huang told the audience. “The work that is done on this instrument would make tremendous impacts to life sciences, material sciences, physical sciences and computer science.”

Maximum Efficiency, Minimal Impact

Rising electricity consumption contributes to global warming, and powerful, energy-efficient computers are crucial to fighting climate change through green computing.

Huang explained that whether for data centers or the latest smartphone, computer chips, systems and software must be designed and used to maximize energy efficiency and minimize environmental impact.

“Companies large and small have to sign up for the carbon footprint that we use to build the work that we do,” said Huang. “If there’s an opportunity for us to help accelerate workloads and reduce energy use and improve energy efficiency, we will.”

Sweden’s Role in AI

The upgrade comes as AI is powering change in every industry across the globe, with leaders from across the Nordics accelerating the growth of some of the world’s most powerful AI solutions, explained Wallenberg.

“From the perspective of the foundations, we’re trying to work for the betterment of Sweden by promoting the areas of research, technology and medicine,” said Wallenberg, whose family has for generations been deeply involved across the nation’s economy. “We are working together as a team to create possibilities and the foundations for more work to be done.”

The Berzelius system was used for training the first Swedish large language model. Increasing in size 10x every year for the last few years, large language models are just one state-of-the-art AI technology that promises transformation through learned knowledge.

Neural networks trained with massive datasets on powerful systems, LLMs are accelerating discoveries across industries such as healthcare and climate science with software frameworks like NVIDIA BioNeMo. Models like ChatGPT are making a name for themselves as a new way to use AI.

“You can connect models together to retrieve new information so that models like ChatGPT could report on the news today, who won that game, or the latest weather,” Huang said. “The combination of these capabilities means not only the ability to respond and answer questions and write stories, but it can also write programs and solve problems.”

Knowledge From Data

Solving problems requires reliable, physically accurate data. The industrial metaverse, where digital twins of real factories, rail networks or retail stores can be created, is already being used by large companies like Amazon, BMW, Ericsson and Siemens.

Following the conversation between Huang and Wallenberg, Staffan Truvé, CTO and co-founder of cybersecurity company Recorded Future, talked about how data can be used to model intelligence as a digital twin to get an end-to-end view of threats and targets.

“Today, there are three major converging threat areas. Physical, cyber and influence, which is the threat to our brains,” Truvé explained. “By creating an intelligence graph, we’re building a full picture of a threat.”

Digital twins are not the only way to gather valuable insights when developing for the future. Sara Mazur, vice executive director of the Knut and Alice Wallenberg Foundation and chair of the Wallenberg AI Autonomous Systems and Software Program, highlighted the importance of collaboration between academia and industry.


Deciphering Clinical Abbreviations with Privacy Protecting ML

Today many people have digital access to their medical records, including their doctor’s clinical notes. However, clinical notes are hard to understand because of the specialized language that clinicians use, which contains unfamiliar shorthand and abbreviations. In fact, there are thousands of such abbreviations, many of which are specific to certain medical specialities and locales or can mean multiple things in different contexts. For example, a doctor might write in their clinical notes, “pt referred to pt for lbp“, which is meant to convey the statement: “Patient referred to physical therapy for low back pain.” Coming up with this translation is tough for laypeople and computers because some abbreviations are uncommon in everyday language (e.g., “lbp” means “low back pain”), and even familiar abbreviations, such as “pt” for “patient”, can have alternate meanings, such as “physical therapy.” To disambiguate between multiple meanings, the surrounding context must be considered. It’s no easy task to decipher all the meanings, and prior research suggests that expanding the shorthand and abbreviations can help patients better understand their health, diagnoses, and treatments.

In “Deciphering clinical abbreviations with a privacy protecting machine learning system”, published in Nature Communications, we report our findings on a general method that deciphers clinical abbreviations in a way that is both state-of-the-art and on par with board-certified physicians at this task. We built the model using only public data on the web that wasn’t associated with any patient (i.e., no potentially sensitive data) and evaluated performance on real, de-identified notes from inpatient and outpatient clinicians from different health systems. To enable the model to generalize from web data to notes, we created a way to algorithmically rewrite large amounts of internet text to look as if it were written by a doctor (called web-scale reverse substitution), and we developed a novel inference method (called elicitive inference).

The model input is a string that may or may not contain medical abbreviations. We trained a model to output a corresponding string in which all abbreviations are simultaneously detected and expanded. If the input string does not contain an abbreviation, the model will output the original string. By Rajkomar et al., used under CC BY 4.0, cropped from original.

Rewriting Text to Include Medical Abbreviations

Building a system to translate doctors’ notes would usually start with a large, representative dataset of clinical text where all abbreviations are labeled with their meanings. But no such dataset for general use by researchers exists. We therefore sought to develop an automated way to create such a dataset but without the use of any actual patient notes, which might include sensitive data. We also wanted to ensure that models trained on this data would still work well on real clinical notes from multiple hospital sites and types of care, such as both outpatient and inpatient.

To do this, we referenced a dictionary of thousands of clinical abbreviations and their expansions, and found sentences on the web that contained uses of the expansions from this dictionary. We then “rewrote” those sentences by abbreviating each expansion, resulting in web data that looked like it was written by a doctor. For instance, if a website contained the phrase “patients with atrial fibrillation can have chest pain,” we would rewrite this sentence to “pts with af can have cp.” We then used the abbreviated text as input to the model, with the original text serving as the label. This approach provided us with large amounts of data to train our model to perform abbreviation expansion.
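
As an illustrative sketch of the core rewriting step (not the production pipeline), reverse substitution can be thought of as a dictionary-driven replacement; the dictionary entries below are examples only.

import re

# Illustrative expansion-to-abbreviation dictionary; the real dictionary
# contains thousands of clinical abbreviations.
EXPANSION_TO_ABBREV = {
    "atrial fibrillation": "af",
    "chest pain": "cp",
    "patients": "pts",
    "low back pain": "lbp",
}

def reverse_substitute(text: str) -> str:
    # Replace longer expansions first so phrases are not partially rewritten.
    for expansion, abbrev in sorted(EXPANSION_TO_ABBREV.items(), key=lambda kv: -len(kv[0])):
        text = re.sub(rf"\b{re.escape(expansion)}\b", abbrev, text, flags=re.IGNORECASE)
    return text

original = "Patients with atrial fibrillation can have chest pain."
model_input = reverse_substitute(original)   # "pts with af can have cp."
training_label = original                    # the un-abbreviated text serves as the label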

The idea of “reverse substituting” the long-forms for their abbreviations was introduced in prior research, but our distributed algorithm allows us to extend the technique to large, web-sized datasets. Our algorithm, called web-scale reverse substitution (WSRS), is designed to ensure that rare terms occur more frequently and common terms are down-sampled across the public web to derive a more balanced dataset. With this data in-hand, we trained a series of large transformer-based language models to expand the web text.

We generate text to train our model on the decoding task by extracting phrases from public web pages that have corresponding medical abbreviations (shaded boxes on the left) and then substituting in the appropriate abbreviations (shaded dots, right). Since some words are found much more frequently than others (“patient” more than “posterior tibialis”, both of which can be abbreviated “pt”), we downsampled common expansions to derive a more balanced dataset across the thousands of abbreviations. By Rajkomar et al used under CC BY 4.0.

Adapting Protein Alignment Algorithms to Unstructured Clinical Text

Evaluation of these models on the particular task of abbreviation expansion is difficult. Because they produce unstructured text as output, we had to figure out which abbreviations in the input correspond to which expansion in the output. To achieve this, we created a modified version of the Needleman-Wunsch algorithm, which was originally designed for divergent sequence alignment in molecular biology, to align the model input and output and extract the corresponding abbreviation-expansion pairs. Using this alignment technique, we were able to evaluate the model’s capacity to detect and expand abbreviations accurately. We evaluated Text-to-Text Transfer Transformer (T5) models of various sizes (ranging from 60 million to over 60 billion parameters) and found that larger models performed translation better than smaller models, with the biggest model achieving the best performance.
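
For intuition, here is a compact sketch of classic Needleman-Wunsch global alignment over tokens; the scoring values are illustrative, and the paper’s modified version additionally handles one-to-many abbreviation expansions.

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # Score matrix with gap-initialized first row and column.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back to recover aligned token pairs; None marks a gap.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    return list(reversed(pairs))

print(needleman_wunsch(
    "pt referred to pt for lbp".split(),
    "patient referred to physical therapy for low back pain".split(),
))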

Creating New Model Inference Techniques to Coax the Model

However, we did find something unexpected. When we evaluated the performance on multiple external test sets from real clinical notes, we found the models would leave some abbreviations unexpanded, and for larger models, the problem of incomplete expansion was even worse. This is mainly due to the fact that while we substitute expansions on the web for their abbreviations, we have no way of handling the abbreviations that are already present. This means that the abbreviations appear in both the original and rewritten text used as respective labels and input, and the model learns not to expand them.

To address this, we developed a new inference-chaining technique in which the model output is fed again as input to coax the model to make further expansions as long as the model is confident in the expansion. In technical terms, our best-performing technique, which we call elicitive inference, involves examining the outputs from a beam search above a certain log-likelihood threshold. Using elicitive inference, we were able to achieve state-of-the-art capability of expanding abbreviations in multiple external test sets.
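
A conceptual sketch of this chaining loop is shown below; generate_with_scores is a hypothetical stand-in for a beam-search decode that returns candidate strings with their log-likelihoods (highest first), and the threshold value is illustrative.

def elicitive_inference(model, text, threshold=-1.0, max_rounds=5):
    for _ in range(max_rounds):
        # Hypothetical call: beam-search decode returning (candidate, log_likelihood)
        # pairs sorted by likelihood, highest first.
        candidates = model.generate_with_scores(text)
        # Keep only candidates the model is confident about.
        confident = [c for c, score in candidates if score >= threshold]
        if not confident or confident[0] == text:
            break  # no further confident expansion; stop chaining
        text = confident[0]  # feed the expansion back in for another pass
    return text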

Real example of the model’s input (left) and output (right).

Comparative Performance

We also sought to understand how patients and doctors currently perform at deciphering clinical notes, and how our model compared. We found that lay people (people without specific medical training) demonstrated less than 30% comprehension of the abbreviations present in the sample medical texts. When we allowed them to use Google Search, their comprehension increased to nearly 75%, still leaving 1 out of 5 abbreviations indecipherable. Unsurprisingly, medical students and trained physicians performed much better at the task with an accuracy of 90%. We found that our largest model was capable of matching or exceeding experts, with an accuracy of 98%.

How does the model perform so well compared to physicians in this task? There are two important factors in the model’s high comparative performance. Part of the discrepancy is that there were some abbreviations that clinicians did not even attempt to expand (such as “cm” for centimeter), which partly lowered the measured performance. This might seem unimportant, but for non-English speakers, these abbreviations may not be familiar, and so it may be helpful to have them written out. In contrast, our model is designed to comprehensively expand abbreviations. In addition, clinicians are familiar with abbreviations they commonly see in their speciality, but other specialists use shorthand that is not understood by those outside their fields. Our model is trained on thousands of abbreviations across multiple specialities and therefore can decipher a breadth of terms.

Towards Improved Health Literacy

We think there are numerous avenues in which large language models (LLMs) can help advance the health literacy of patients by augmenting the information they see and read. Most LLMs are trained on data that does not look like clinical note data, and the unique distribution of this data makes it challenging to deploy these models in an out-of-the-box fashion. We have demonstrated how to overcome this limitation. Our model also serves to “normalize” clinical note data, facilitating additional capabilities of ML to make the text easier for patients of all educational and health-literacy levels to understand.

Acknowledgements

This work was carried out in collaboration with Yuchen Liu, Jonas Kemp, Benny Li, Ming-Jun Chen, Yi Zhang, Afroz Mohiddin, and Juraj Gottweis. We thank Lisa Williams, Yun Liu, Arelene Chung, and Andrew Dai for many useful conversations and discussions about this work.


Google Research, 2022 & Beyond: Responsible AI


The past year saw tremendous breakthroughs in artificial intelligence (AI), particularly in large language models (LLMs) and text-to-image models. These technological advances require that we be thoughtful and intentional in how they are developed and deployed. In this blog post, we share ways we have approached Responsible AI across our research in the past year and where we’re headed in 2023. We highlight four primary themes covering foundational and socio-technical research, applied research, and product solutions, as part of our commitment to build AI products in a responsible and ethical manner, in alignment with our AI Principles.

  · Theme 1: Responsible AI Research Advancements
  · Theme 2: Responsible AI Research in Products
  · Theme 3: Tools and Techniques
  · Theme 4: Demonstrating AI’s Societal Benefit

Theme 1: Responsible AI Research Advancements

Machine Learning Research

When machine learning (ML) systems are used in real world contexts, they can fail to behave in expected ways, which reduces their realized benefit. Our research identifies situations in which unexpected behavior may arise, so that we can mitigate undesired outcomes.

Across several types of ML applications, we showed that models are often underspecified, which means they perform well in exactly the situation in which they are trained, but may not be robust or fair in new situations, because the models rely on “spurious correlations” — specific side effects that are not generalizable. This poses a risk to ML system developers, and demands new model evaluation practices.

We surveyed evaluation practices currently used by ML researchers and introduced improved evaluation standards in work addressing common ML pitfalls. We identified and demonstrated techniques to mitigate causal “shortcuts”, which lead to a lack of ML system robustness and dependency on sensitive attributes, such as age or gender.

Shortcut learning: Age impacts correct medical diagnosis.

To better understand the causes of and mitigations for robustness issues, we decided to dig deeper into model design in specific domains. In computer vision, we studied the robustness of new vision transformer models and developed new negative data augmentation techniques to improve their robustness. For natural language tasks, we similarly investigated how different data distributions improve generalization across different groups and how ensembles and pre-trained models can help.

Another key part of our ML work involves developing techniques to build models that are more inclusive. For example, we look to external communities to guide understanding of when and why our evaluations fall short using participatory systems, which explicitly enable joint ownership of predictions and allow people to choose whether to disclose on sensitive topics.

Sociotechnical Research

In our quest to include a diverse range of cultural contexts and voices in AI development and evaluation, we have strengthened community-based research efforts, focusing on particular communities who are less represented or may experience unfair outcomes of AI. We specifically looked at evaluations of unfair gender bias, both in natural language and in contexts such as gender-inclusive health. This work is advancing more accurate evaluations of unfair gender bias so that our technologies evaluate and mitigate harms for people with queer and non-binary identities.

Alongside our fairness advancements, we also reached key milestones in our larger efforts to develop culturally-inclusive AI. We championed the importance of cross-cultural considerations in AI — in particular, cultural differences in user attitudes towards AI and mechanisms for accountability — and built data and techniques that enable culturally-situated evaluations, with a focus on the global south. We also described user experiences of machine translation, in a variety of contexts, and suggested human-centered opportunities for their improvement.

Human-Centered Research

At Google, we focus on advancing human-centered research and design. Recently, our work showed how LLMs can be used to rapidly prototype new AI-based interactions. We also published five new interactive explorable visualizations that introduce key ideas and guidance to the research community, including how to use saliency to detect unintended biases in ML models, and how federated learning can be used to collaboratively train a model with data from multiple users without any raw data leaving their devices.

Our interpretability research explored how we can trace the behavior of language models back to the training data itself, suggested new ways to compare differences in what models pay attention to, how we can explain emergent behavior, and how to identify human-understandable concepts learned by models. We also proposed a new approach for recommender systems that uses natural language explanations to make it easier for people to understand and control their recommendations.

Creativity and AI Research

We initiated conversations with creative teams on the rapidly changing relationship between AI technology and creativity. In the creative writing space, Google’s PAIR and Magenta teams developed a novel prototype for creative writing, and facilitated a writers’ workshop to explore the potential and limits of AI to assist creative writing. The stories from a diverse set of creative writers were published as a collection, along with workshop insights. In the fashion space, we explored the relationship between fashion design and cultural representation, and in the music space, we started examining the risks and opportunities of AI tools for music.


Theme 2: Responsible AI Research in Products

The ability to see yourself reflected in the world around you is important, yet image-based technologies often lack equitable representation, leaving people of color feeling overlooked and misrepresented. In addition to efforts to improve representation of diverse skin tones across Google products, we introduced a new skin tone scale designed to be more inclusive of the range of skin tones worldwide. Partnering with Harvard professor and sociologist, Dr. Ellis Monk, we released the Monk Skin Tone (MST) Scale, a 10-shade scale that is available for the research community and industry professionals for research and product development. Further, this scale is being incorporated into features on our products, continuing a long line of our work to improve diversity and skin tone representation on Image Search and filters in Google Photos.

The 10 shades of the Monk Skin Tone Scale.

This is one of many examples of how Responsible AI in Research works closely with products across the company to inform research and develop new techniques. In another example, we leveraged our past research on counterfactual data augmentation in natural language to improve SafeSearch, reducing unexpected shocking Search results by 30%, especially on searches related to ethnicity, sexual orientation, and gender. To improve video content moderation, we developed new approaches for helping human raters focus their attention on segments of long videos that are more likely to contain policy violations. And, we’ve continued our research on developing more precise ways of evaluating equal treatment in recommender systems, accounting for the broad diversity of users and use cases.

In the area of large models, we incorporated Responsible AI best practices as part of the development process, creating Model Cards and Data Cards (more details below), Responsible AI benchmarks, and societal impact analysis for models such as GLaM, PaLM, Imagen, and Parti. We also showed that instruction fine-tuning results in many improvements for Responsible AI benchmarks. Because generative models are often trained and evaluated on human-annotated data, we focused on human-centric considerations like rater disagreement and rater diversity. We also presented new capabilities using large models for improving responsibility in other systems. For example, we have explored how language models can generate more complex counterfactuals for counterfactual fairness probing. We will continue to focus on these areas in 2023, also understanding the implications for downstream applications.


Theme 3: Tools and Techniques

Responsible Data

Data Documentation:

Extending our earlier work on Model Cards and the Model Card Toolkit, we released Data Cards and the Data Cards Playbook, providing developers with methods and tools to document appropriate uses and essential facts related to a model or dataset. We have also advanced research on best practices for data documentation, such as accounting for a dataset’s origins, annotation processes, intended use cases, ethical considerations, and evolution. We also applied this to healthcare, creating “healthsheets” as a foundation for our international Standing Together collaboration, which brings together patients, health professionals, and policy-makers to develop standards that ensure datasets are diverse and inclusive and to democratize AI.

New Datasets:

Fairness: We released a new dataset to assist in ML fairness and adversarial testing tasks, primarily for generative text datasets. The dataset contains 590 words and phrases that show interactions between adjectives, words, and phrases that have been shown to have stereotypical associations with specific individuals and groups based on their sensitive or protected characteristics.

A partial list of the sensitive characteristics in the dataset denoting their associations with adjectives and stereotypical associations.

Toxicity: We constructed and publicly released a dataset of 10,000 posts to help identify when a comment’s toxicity depends on the comment it’s replying to. This improves the quality of moderation-assistance models and supports the research community working on better ways to remedy online toxicity.

Societal Context Data: We used our experimental societal context repository (SCR) to supply the Perspective team with auxiliary identity and connotation context data for terms relating to categories such as ethnicity, religion, age, gender, or sexual orientation — in multiple languages. This auxiliary societal context data can help augment and balance datasets to significantly reduce unintended biases, and was applied to the widely used Perspective API toxicity models.

Learning Interpretability Tool (LIT)

An important part of developing safer models is having the tools to help debug and understand them. To support this, we released a major update to the Learning Interpretability Tool (LIT), an open-source platform for visualization and understanding of ML models, which now supports images and tabular data. The tool has been widely used in Google to debug models, review model releases, identify fairness issues, and clean up datasets. It also now lets you visualize 10x more data than before, supporting up to 100s of thousands of data points at once.

A screenshot of the Learning Interpretability Tool displaying generated sentences on a data table.

Counterfactual Logit Pairing

ML models are sometimes susceptible to flipping their prediction when a sensitive attribute referenced in an input is either removed or replaced. For example, in a toxicity classifier, examples such as “I am a man” and “I am a lesbian” may incorrectly produce different outputs. To enable users in the Open Source community to address unintended bias in their ML models, we launched a new library, Counterfactual Logit Pairing (CLP), which improves a model’s robustness to such perturbations, and can positively influence a model’s stability, fairness, and safety.
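
As a toy illustration of the underlying idea (not the CLP library’s API), counterfactual probing swaps identity terms and checks whether a classifier’s prediction flips; the term pairs and the classify callable below are placeholders.

# Identity-term swaps used to build counterfactual examples; illustrative only.
COUNTERFACTUAL_TERMS = {"man": "lesbian", "lesbian": "man"}

def counterfactual(text: str) -> str:
    return " ".join(COUNTERFACTUAL_TERMS.get(tok, tok) for tok in text.split())

def flip_rate(classify, examples):
    # Fraction of examples whose predicted label changes under the swap;
    # a model trained with counterfactual logit pairing should keep this near zero.
    flips = sum(classify(x) != classify(counterfactual(x)) for x in examples)
    return flips / len(examples)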

Illustration of prediction flips that can be mitigated using counterfactual logit pairing.


Theme 4: Demonstrating AI’s Societal Benefit

We believe that AI can be used to explore and address hard, unanswered questions around humanitarian and environmental issues. Our research and engineering efforts span many areas, including accessibility, health, and media representation, with the end goal of promoting inclusion and meaningfully improving people’s lives.

Accessibility

Following many years of research, we launched Project Relate, an Android app that uses a personalized AI-based speech recognition model to enable people with non-standard speech to communicate more easily with others. The app is available to English speakers 18+ in Australia, Canada, Ghana, India, New Zealand, the UK, and the US.

To help catalyze advances in AI to benefit people with disabilities, we also launched the Speech Accessibility Project. This project represents the culmination of a collaborative, multi-year effort between researchers at Google, Amazon, Apple, Meta, Microsoft, and the University of Illinois Urbana-Champaign. Together, this group built a large dataset of impaired speech that is available to developers to empower research and product development for accessibility applications. This work also complements our efforts to assist people with severe motor and speech impairments through improvements to techniques that make use of a user’s eye gaze.

Health

We’re also focused on building technology to better the lives of people affected by chronic health conditions, while addressing systemic inequities, and allowing for transparent data collection. As consumer technologies — such as fitness trackers and mobile phones — become central in data collection for health, we’ve explored use of technology to improve interpretability of clinical risk scores and to better predict disability scores in chronic diseases, leading to earlier treatment and care. And, we advocated for the importance of infrastructure and engineering in this space.

Many health applications use algorithms that are designed to calculate biometrics and benchmarks, and generate recommendations based on variables that include sex at birth, but might not account for users’ current gender identity. To address this issue, we completed a large, international study of trans and non-binary users of consumer technologies and digital health applications to learn how data collection and algorithms used in these technologies can evolve to achieve fairness.

Media

We partnered with the Geena Davis Institute on Gender in Media (GDI) and the Signal Analysis and Interpretation Laboratory (SAIL) at the University of Southern California (USC) to study 12 years of representation in TV. Based on an analysis of over 440 hours of TV programming, the report highlights findings and brings attention to significant disparities in screen and speaking time for light and dark skinned characters, male and female characters, and younger and older characters. This first-of-its-kind collaboration uses advanced AI models to understand how people-oriented stories are portrayed in media, with the ultimate goal to inspire equitable representation in mainstream media.

MUSE demo. Source: Video Collection / Getty Images.


Plans for 2023 and Beyond

We’re committed to creating research and products that exemplify positive, inclusive, and safe experiences for everyone. This begins by understanding the many aspects of AI risks and safety inherent in the innovative work that we do, and including diverse sets of voices in coming to this understanding.

  • Responsible AI Research Advancements: We will strive to understand the implications of the technology that we create, through improved metrics and evaluations, and devise methodology to enable people to use technology to become better world citizens.
  • Responsible AI Research in Products: As products leverage new AI capabilities for new user experiences, we will continue to collaborate closely with product teams to understand and measure their societal impacts and to develop new modeling techniques that enable the products to uphold Google’s AI Principles.
  • Tools and Techniques: We will develop novel techniques to advance our ability to discover unknown failures, explain model behaviors, and to improve model output through training, responsible generation, and failure mitigation.
  • Demonstrating AI’s Social Benefit: We plan to expand our efforts on AI for the Global Goals, bringing together research, technology, and funding to accelerate progress on the Sustainable Development Goals. This commitment will include $25 million to support NGOs and social enterprises. We will further our work on inclusion and equity by forming more collaborations with community-based experts and impacted communities. This includes continuing the Equitable AI Research Roundtables (EARR), focused on the potential impacts and downstream harms of AI with community based experts from the Othering and Belonging Institute at UC Berkeley, PolicyLink, and Emory University School of Law.

Building ML models and products in a responsible and ethical manner is both our core focus and core commitment.

Acknowledgements

This work reflects the efforts from across the Responsible AI and Human-Centered Technology community, from researchers and engineers to product and program managers, all of whom contribute to bringing our work to the AI community.

Google Research, 2022 & Beyond

This was the second blog post in the “Google Research, 2022 & Beyond” series. Other posts in this series are listed below:

  • Language Models
  • Computer Vision
  • Multimodal Models
  • Generative Models
  • Responsible AI
  • Algorithms*
  • ML & Computer Systems
  • Robotics
  • Health
  • General Science & Quantum
  • Community Engagement

* Articles will be linked as they are released.


Supersizing AI: Sweden Turbocharges Its Innovation Engine

Sweden is outfitting its AI supercomputer for a journey to the cutting edge of machine learning, robotics and healthcare.

It couldn’t ask for a better guide than Anders Ynnerman (above). His signature blue suit, black spectacles and gentle voice act as calm camouflage for a pioneering spirit.

Early on, he showed a deep interest in space, but his career took a different direction. He established the country’s first network of supercomputing centers and went on to pioneer scientific visualization technologies used in hospitals and museums around the world.

Today, he leads Sweden’s largest research effort, WASP — the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program — focused on AI innovation.

The Big Picture

“This is a year when people are turning their focus to sustainability challenges we face as a planet,” said the Linköping University professor. “Without advances in AI and other innovations, we won’t have a sustainable future.”

To supercharge environmental efforts and more, Sweden will upgrade its Berzelius supercomputer. Based on the NVIDIA DGX SuperPOD, it will deliver nearly half an exaflop of AI performance, placing it among the world’s 100 fastest AI supercomputers.

“A machine like Berzelius is fundamental not only for the results it delivers, but the way it catalyzes expertise in Sweden,” he said. “We’re a knowledge-driven nation, so our researchers and companies need access to the latest technology to compete.”

AI Learns Swedish

In June, the system trained GPT-SW3, a family of large language models capable of drafting a speech or answering questions in Swedish.

Today, a more powerful version sports 20 billion parameters, a popular measure of a neural network’s smarts. It can help developers write software and handle other complex tasks.

Long term, researchers aim to train a version with a whopping 175 billion parameters that’s also fluent in Nordic languages like Danish and Norwegian.

One of Sweden’s largest banks is already exploring use of the latest GPT-SW3 variant for a chatbot and other applications.

A Memory Boost

To build big AIs, Berzelius will add 34 NVIDIA DGX A100 systems to the cluster of 60 that make up the SuperPOD. The new units will sport GPUs with 80GB of memory each.

Ynnerman with Berzelius at the system’s March 2021 launch.

“Having really fat nodes with large memory is important for some of these models,” Ynnerman said. Atos, the system integrator, is providing “a very smooth ride getting the whole process set up,” he added.

Seeking a Cure for Cancer

In healthcare, a data-driven life sciences program, funded by the Wallenberg Foundation, will be a major Berzelius user. The program spans 10 universities and will, among other applications, employ AI to understand protein folding, fundamentally important to understanding diseases like cancer.

Others will use Berzelius to improve detection of cancer cells and navigate the massive mounds of data in human genomes.

Some researchers are exploring tools such as NVIDIA Omniverse Avatar Cloud Engine and NVIDIA BotMaker to create animated patients. Powered by GPT-SW3, they could help doctors practice telemedicine skills.

Robots in Zero Gravity

Sweden’s work in image and video recognition will get a boost from Berzelius. Such algorithms advance work on the autonomous systems used in modern factories and warehouses.

One project is exploring how autonomous systems act in space and undersea. It’s a topic close to the heart of a recent addition to WASP, researcher Christer Fuglesang, who was named Sweden’s first astronaut in 1992.

Fuglesang went to the International Space Station in 2006 and 2008. Later, as a professor of physics at Sweden’s Royal Institute of Technology, he collaborated with Ynnerman on live shows about life in space, presented in the WISDOME dome theater at the Visualization Center C Ynnerman founded and directs.

Thanks to his expertise in visualization, “I can go to Mars whenever I want,” Ynnerman quipped.

He took NVIDIA founder and CEO Jensen Huang and Marcus Wallenberg — scion of Sweden’s leading industrial family — on a tour of outer space at the dome to mark the Berzelius upgrade. The dome can show the Martian surface in 8K resolution at 120 frames per second, thanks to its use of 12 NVIDIA Quadro RTX 8000 GPUs.

Inspiring the Next Generation

Ynnerman’s algorithms have touched millions who’ve seen visualizations of Egyptian mummies at the British Museum.

“That makes me even more proud than some of my research papers, because many of those visitors are young people we can inspire with a love for science and technology,” he said.

A passion for science and technology has attracted more than 400 active Ph.D. candidates so far to WASP, which is on the way to exceeding its goal of 600 grads by 2031.

But even a visualization specialist can’t be everywhere. So Ynnerman’s pet project will use AI to create a vibrant, virtual museum guide.

“I think we can provide more people a ‘wow’ experience — I want a copilot when I’m navigating the universe,” he said.


3D Artist Enters the Node Zone, Creating Alien Artifacts This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

Artist Ducky 3D creates immersive experiences through vibrant visuals and beautiful 3D environments in the alien-inspired animation Stylized Alien Landscape — this week In the NVIDIA Studio.

Ducky 3D is a modern Renaissance man who works with musicians from around the world, creating tour packages and visual art related to the music industry. As a 3D fanatic who specializes in Blender, he often guides emerging and advanced 3D artists to new creative heights. It’s no surprise that his weekly YouTube tutorials have garnered over 400,000 subscribers.

Stylized Alien Landscape is uniquely individualistic and was built entirely in Blender using geometry nodes, or geo nodes.

Geo nodes can add organic style and customization to Blender scenes and animation.

The use of geo nodes in Blender has recently skyrocketed. That’s because they virtually make modeling a completely procedural process — allowing for non-linear, non-destructive workflows and the instancing of objects — to create incredibly detailed scenes using small amounts of data. Geo nodes can also organically modify all types of geometry, including meshes, curves, volumes, instances and more. Many of these were edited in the making of Stylized Alien Landscape.

Ducky 3D opened a new scene, created a simple cube and applied several popular geo nodes, including random value, triangulate and dual mesh. By simple trial and error with numeric values, he was able to create a provocative, alien-inspired visual.

“I use geometry nodes to take advantage of the dual mesh, which creates organic shapes by manipulating with simple deformations,” he said.

Ducky 3D’s GeForce RTX 4090 GPU ensured smooth movement in the viewport with virtually no noise.

Simply adding a transform node to the mix got the animation going. Ducky 3D then copied all nodes and scaled the duplicated render to create two animations rotating simultaneously.

Next, Ducky 3D turned his focus to lighting the object, selecting the Blender Cycles renderer to do so.

“Rendering lighting is drastically better in Cycles, but you do you,” he said with candor.

Blender Cycles RTX-accelerated OptiX ray tracing in the viewport unlocks interactive, photoreal rendering for modeling and animation work.

Ducky 3D applies shading nodes to “Stylized Alien Landscape.”  

Here, Ducky 3D can quickly create more realism in two ways: adding depth of field by playing with distance options and the flat shaded view, and bringing the background out of focus and the object into focus.

Volume “just makes things look cool,” Ducky 3D added. Selecting the world and clicking principled volume made the scene nearly photorealistic.

With the help of geo nodes, Ducky 3D refined the texture to his desired effect, using the bump node, color ramp and noise texture.

For more on the making of Stylized Alien Landscape, check out the video below.

“I needed my viewport to perform well enough to see detail through the added volume,” he said. “Thank goodness for the AI-powered NVIDIA OptiX ray tracing API that my GeForce RTX 4090 GPU enables.”

Ducky 3D accomplished the slightly odd atmosphere that he wanted for his piece through the addition of fog.

“Fog is tough to render, and the GPU helped me see my viewport clearly,” he said.

3D artist Ducky 3D’s workstation. 

For more Blender tutorials, check out Ducky 3D’s YouTube channel or the NVIDIA Studio Blender playlist.

Enter the #NewYearNewArt Challenge 

A new year comes with new art, and we’d love to see yours! Use the hashtag #NewYearNewArt and tag @NVIDIAStudio to show off your most recent creations for a chance to be featured on our channels.

There have been stunning animations like this lively work from the amazing @stillmanvisual.

There’s also explosive new content from @TheRealYarnHub featuring some action-packed, historically-based battles.

Catch even more #NewYearNewArt entries from other creators on the NVIDIA Studio Instagram stories.

Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.


Fresh AI on Security: Digital Fingerprinting Deters Identity Attacks

Add AI to the list of defenses against identity attacks, one of the most common and hardest types of breaches to prevent.

More than 40% of all data compromises involved stolen credentials, according to the 2022 Verizon Data Breach Investigations Report. And a whopping 80% of all web application breaches involved credential abuse.

“Credentials are the favorite data type of criminal actors because they are so useful for masquerading as legitimate users on the system,” the report said.

In today’s age of zero trust, security experts say it’s not a matter of if but when they’ll experience an identity attack.

A Response From R&D

The director of cybersecurity engineering and R&D at NVIDIA, Bartley Richardson, articulates the challenge simply.

“We need to look for when Bartley is not acting like Bartley,” he said.

Last year, his team described a concept called digital fingerprinting. In the wake of highly publicized attacks in February, he came up with a simple but ambitious idea for implementing it.

A Big Ask

He called a quick meeting with his two tech leads to share the idea. Richardson told them he wanted to create a deep learning model for every account, server, application and device on the network.

The models would learn individual behavior patterns and alert security staff when an account was acting in an uncharacteristic way. That’s how they would deter attacks.

The tech leads thought it was a crazy idea. It was computationally impossible, they told him, and no one was even using GPUs for security yet.

Richardson listened to their concerns and slowly convinced them it was worth a try. They would start with just a model for every account.

Everybody’s Problem

Security managers know it’s a big-data problem.

Companies collect terabytes of data on network events every day. That’s just a fraction of the petabytes of events a day companies could log if they had the resources, according to Daniel Rohrer, NVIDIA’s vice president of software product security.

The fact that it’s a big-data problem is also good news, Rohrer said in a talk at GTC in September (watch free with registration). “We’re already well on the way to combining our cybersecurity and AI efforts,” he said.

Starting With a Proof of Concept

By mid-March, Richardson’s team was focused on ways to run thousands of AI models in tandem. They used NVIDIA Morpheus, an AI security software library announced a year earlier, to build a proof of concept in two months.

Once an entire, albeit crude, product was done, they spent another two months optimizing each portion.

Then they reached out to about 50 NVIDIANs to review their work — security operations and product security teams, and IT folks who would be alpha users.

An Initial Deployment

Three months later, in early October, they had a solution NVIDIA could deploy on its global networks — security software for AI-powered digital fingerprinting.

The software is a kind of LEGO kit, an AI framework anyone can use to create a custom cybersecurity solution.

Version 2.0 is running across NVIDIA’s networks today on just four NVIDIA A100 Tensor Core GPUs. IT staff can create their own models, changing aspects of them to create specific alerts.

Tested and Released

NVIDIA is making these capabilities available in a digital fingerprinting AI workflow included with NVIDIA AI Enterprise 3.0 announced in December.

For identity attackers, “the models Bartley’s team built have anomaly scores that are off the charts, and we’re able to visualize events so we can see things in new ways,” said Jason Recla, NVIDIA’s senior director of information security.

As a result, instead of facing a tsunami of 100 million network events a week, an IT team may have just 8-10 incidents to investigate daily. That cuts the time to detect certain attack patterns from weeks to minutes.

Tailoring AI for Small Events

The team already has big ideas for future versions.

“Our software works well on major identity attacks, but it’s not every day you have an incident like that,” Richardson said. “So, now we’re tuning it with other models to make it more applicable to everyday vanilla security incidents.”

Meanwhile, Richardson’s team used the software to create a proof of concept for a large consulting firm.

“They wanted it to handle a million records in a tenth of a second. We did it in a millionth of a second, so they’re fully on board,” Richardson said.

The Outlook for AI Security

Looking ahead, the team has ideas for applying AI and accelerated computing to secure digital identities and generate hard-to-find training data.

Richardson imagines passwords and multi-factor authentication will be replaced by models that know how fast a person types, with how many typos, what services they use and when they use them. Such detailed digital identities will prevent attackers from hijacking accounts and pretending they are legitimate users.

Data on network events is gold for building AI models that harden networks, but no one wants to share details of real users and break-ins. Synthetic data, generated by a variant of digital fingerprinting, could fill the gap, letting users create what they need to fit their use case.

In the meantime, Recla has advice security managers can act on now.

“Get up to speed on AI,” he said. “Start investing in AI engineering and data science skills — that’s the biggest thing.”

Digital fingerprinting is not a panacea. It’s one more brick in an ever-evolving digital wall that a community of security specialists is building against the next big attack.

You can try this AI-powered security workflow live on NVIDIA LaunchPad starting Jan. 23. And you can watch the video below to learn more about digital fingerprinting.


OpenAI and Microsoft Extend Partnership

We’re happy to announce that OpenAI and Microsoft are extending our partnership.

This multi-year, multi-billion dollar investment from Microsoft follows their previous investments in 2019 and 2021, and will allow us to continue our independent research and develop AI that is increasingly safe, useful, and powerful.

In pursuit of our mission to ensure advanced AI benefits all of humanity, OpenAI remains a capped-profit company and is governed by the OpenAI non-profit. This structure allows us to raise the capital we need to fulfill our mission without sacrificing our core beliefs about broadly sharing benefits and the need to prioritize safety.

Microsoft shares this vision and our values, and our partnership is instrumental to our progress.

  • We’ve worked together to build multiple supercomputing systems powered by Azure, which we use to train all of our models. Azure’s unique architecture design has been crucial in delivering best-in-class performance and scale for our AI training and inference workloads. Microsoft will increase their investment in these systems to accelerate our independent research and Azure will remain the exclusive cloud provider for all OpenAI workloads across our research, API and products.
  • Learning from real-world use – and incorporating those lessons – is a critical part of developing powerful AI systems that are safe and useful. Scaling that use also ensures AI’s benefits can be distributed broadly. So, we’ve partnered with Microsoft to deploy our technology through our API and the Azure OpenAI Service — enabling enterprise and developers to build on top of GPT, DALL·E, and Codex. We’ve also worked together to build OpenAI’s technology into apps like GitHub Copilot and Microsoft Designer.
  • In an effort to build and deploy safe AI systems, our teams regularly collaborate to review and synthesize shared lessons – and use them to inform iterative updates to our systems, future research, and best practices for use of these powerful AI systems across the industry.

We look forward to continued collaboration and advancing this progress with Microsoft.

OpenAI