Human Following in Mobile Platforms with Person Re-Identification

Human following is an important human-robot interaction feature, but real-world scenarios make it challenging, particularly for a mobile agent. The main challenge is that when a mobile agent tries to locate and follow a targeted person, this person can be in a crowd, be occluded by other people, and/or be facing (partially) away from the mobile agent. To address this challenge, we present a novel person re-identification module, which contains three parts: 1) a 360-degree visual registration process, 2) a neural-based person re-identification mechanism using multiple body parts – human faces…

Privacy-Preserving Quantile Treatment Effect Estimation for Randomized Controlled Trials

In accordance with the principle of “data minimization,” many internet companies are opting to record less data. However, this is often at odds with A/B testing efficacy. For experiments with units with multiple observations, one popular data-minimizing technique is to aggregate data for each unit. However, exact quantile estimation requires the full observation-level data. In this paper, we develop a method for approximate Quantile Treatment Effect (QTE) analysis using histogram aggregation. In addition, we can also achieve formal privacy guarantees using differential privacy.

What Can CLIP Learn From Task-specific Experts?

This paper has been accepted to the UniReps Workshop in NeurIPS 2023.
Contrastive language-image pretraining (CLIP) has become the standard approach for training vision-language models. Despite the utility of CLIP visual features as global representations for images, they have limitations when it comes to tasks involving object localization, pixel-level understanding of the image, or 3D perception. Multi-task training is a popular solution to address this drawback, but collecting a large-scale annotated multi-task dataset incurs significant costs. Furthermore, training on separate task-specific…

Knowledge Bases for Amazon Bedrock now supports hybrid search

At AWS re:Invent 2023, we announced the general availability of Knowledge Bases for Amazon Bedrock. With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for fully managed Retrieval Augmented Generation (RAG).

In a previous post, we described how Knowledge Bases for Amazon Bedrock manages the end-to-end RAG workflow for you and shared details about some of the recent feature launches.

For RAG-based applications, the accuracy of the generated response from large language models (LLMs) is dependent on the context provided to the model. Context is retrieved from the vector database based on the user query. Semantic search is widely used because it is able to understand more human-like questions—a user’s query is not always directly related to the exact keywords in the content that answers it. Semantic search helps provide answers based on the meaning of the text. However, it has limitations in capturing all the relevant keywords. Its performance relies on the quality of the word embeddings used to represent the meaning of the text. To overcome such limitations, combining semantic search with keyword search (hybrid search) gives better results.

In this post, we discuss the new feature of hybrid search, which you can select as a query option alongside semantic search.

Hybrid search overview

Hybrid search takes advantage of the strengths of multiple search algorithms, integrating their unique capabilities to enhance the relevance of returned search results. For RAG-based applications, semantic search capabilities are commonly combined with traditional keyword-based search to improve the relevance of search results. It enables searching over both the content of documents and their underlying meaning. For example, consider the following query:

What is the cost of the book "<book_name>" on <website_name>?

In this query for a book name and website name, a keyword search will give better results, because we want the cost of the specific book. However, the term “cost” might have synonyms such as “price,” so it is also useful to apply semantic search, which understands the meaning of the text. Hybrid search brings the best of both approaches: precision of semantic search and coverage of keywords. It works well for RAG-based applications where the retriever has to handle a wide variety of natural language queries. The keywords help cover specific entities in the query such as product name, color, and price, while semantics better captures the meaning and intent within the query. For example, if you want to build a chatbot for an ecommerce website to handle customer queries such as the return policy or details of the product, hybrid search will be most suitable.

Use cases for hybrid search

The following are some common use cases for hybrid search:

  • Open domain question answering – This involves answering questions on a wide variety of topics. This requires searching over large collections of documents with diverse content, such as website data, which can include various topics such as sustainability, leadership, financial results, and more. Semantic search alone can’t generalize well for this task, because it lacks the capacity for lexical matching of unseen entities, which is important for handling out-of-domain examples. Therefore, combining keyword-based search with semantic search can help narrow down the scope and provide better results for open domain question answering.
  • Contextual-based chatbots – Conversations can rapidly change direction and cover unpredictable topics. Hybrid search can better handle such open-ended dialogs.
  • Personalized search – Web-scale search over heterogeneous content benefits from a hybrid approach. Semantic search handles popular head queries, while keywords cover rare long-tail queries.

Although hybrid search offers wider coverage by combining two approaches, semantic search has precision advantages when the domain is narrow and semantics are well-defined, or when there is little room for misinterpretation, like factoid question answering systems.

Benefits of hybrid search

Both keyword and semantic search will return a separate set of results along with their relevancy scores, which are then combined to return the most relevant results. Knowledge Bases for Amazon Bedrock currently supports four vector stores: Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL-Compatible Edition, Pinecone, and Redis Enterprise Cloud. As of this writing, the hybrid search feature is available for OpenSearch Serverless, with support for other vector stores coming soon.
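
To build intuition for how two ranked result lists can be merged, the following is a minimal, illustrative sketch of reciprocal rank fusion in Python. It is not the fusion algorithm Knowledge Bases for Amazon Bedrock uses internally, and the document IDs and rankings are invented for the example.

from collections import defaultdict

def reciprocal_rank_fusion(keyword_results, semantic_results, k=60):
    # Each document's fused score is the sum of 1 / (k + rank) over the lists
    # it appears in, so documents ranked highly by either retriever rise to
    # the top of the combined list.
    scores = defaultdict(float)
    for results in (keyword_results, semantic_results):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from a keyword (lexical) and a semantic retriever
keyword_results = ["doc-leases", "doc-property", "doc-depreciation"]
semantic_results = ["doc-property", "doc-depreciation", "doc-incentives"]

print(reciprocal_rank_fusion(keyword_results, semantic_results))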

The following are some of the benefits of using hybrid search:

  • Improved accuracy – The accuracy of the generated response from the FM is directly dependent on the relevancy of retrieved results. Based on your data, it can be challenging to improve the accuracy of your application only using semantic search. The key benefit of using hybrid search is to get improved quality of retrieved results, which in turn helps the FM generate more accurate answers.
  • Expanded search capabilities – Keyword search casts a wider net and finds documents that may be relevant but might not contain semantic structure throughout the document. It allows you to search on keywords as well as the semantic meaning of the text, thereby expanding the search capabilities.

In the following sections, we demonstrate how to use hybrid search with Knowledge Bases for Amazon Bedrock.

Use hybrid search and semantic search options via SDK

When you call the Retrieve API, Knowledge Bases for Amazon Bedrock selects the right search strategy for you to give you the most relevant results. You have the option to override the default and use either hybrid or semantic search in the API.

Retrieve API

The Retrieve API is designed to fetch relevant search results by providing the user query, knowledge base ID, and number of results that you want the API to return. This API converts user queries into embeddings, searches the knowledge base using either hybrid search or semantic (vector) search, and returns the relevant results, giving you more control to build custom workflows on top of the search results. For example, you can add postprocessing logic to the retrieved results or add your own prompt and connect with any FM provided by Amazon Bedrock for generating answers.

To show you an example of switching between hybrid and semantic (vector) search options, we have created a knowledge base using the Amazon 10-K document for 2023. For more details on creating a knowledge base, refer to Build a contextual chatbot application using Knowledge Bases for Amazon Bedrock.

To demonstrate the value of hybrid search, we use the following query:

As of December 31st 2023, what is the leased square footage for physical stores in North America?

The answer for the preceding query involves a few keywords, such as the date, physical stores, and North America. The correct response is 22,871 thousand square feet. Let’s observe the difference in the search results for both hybrid and semantic search.

The following code shows how to use hybrid or semantic (vector) search using the Retrieve API with Boto3:

import boto3

# Runtime client for the Retrieve and RetrieveAndGenerate APIs
bedrock_agent_runtime = boto3.client(
    service_name="bedrock-agent-runtime"
)

def retrieve(query, kbId, numberOfResults=5):
    # Query the knowledge base and return the top matching chunks with their scores
    return bedrock_agent_runtime.retrieve(
        retrievalQuery={
            'text': query
        },
        knowledgeBaseId=kbId,
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': numberOfResults,
                'overrideSearchType': "HYBRID/SEMANTIC", # optional; set to "HYBRID" or "SEMANTIC"
            }
        }
    )

response = retrieve("As of December 31st 2023, what is the leased square footage for physical stores in North America?", "<knowledge base id>")["retrievalResults"]

The overrideSearchType option in retrievalConfiguration offers the choice to use either HYBRID or SEMANTIC. By default, the service selects the right strategy for you to give you the most relevant results; if you want to override that default, set the value to either HYBRID or SEMANTIC. The output of the Retrieve API includes the retrieved text chunks, the location type and URI of the source data, and the relevancy scores of the retrievals. The scores help determine which chunks best match the query.
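
Because the Retrieve API hands back the chunks and scores directly, you can layer your own postprocessing on top of the response before prompting an FM. The following is a small, illustrative sketch that reuses the retrieve function defined above, keeps only the highest-scoring chunks, and assembles them into a context string; the score threshold, chunk limit, and prompt wording are arbitrary choices for this example, not part of the API.

def build_context(retrieval_results, min_score=0.5, max_chunks=3):
    # Keep the highest-scoring chunks and concatenate their text as context
    ranked = sorted(retrieval_results, key=lambda r: r["score"], reverse=True)
    kept = [r for r in ranked if r["score"] >= min_score][:max_chunks]
    return "\n\n".join(r["content"]["text"] for r in kept)

results = retrieve(
    "As of December 31st 2023, what is the leased square footage "
    "for physical stores in North America?",
    "<knowledge base id>",
)["retrievalResults"]

context = build_context(results)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    "Question: As of December 31st 2023, what is the leased square footage "
    "for physical stores in North America?"
)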

The following are the results for the preceding query using hybrid search (with some of the output redacted for brevity):

[
  {
    "content": {
      "text": "... Description of Use Leased Square Footage (1).... Physical stores (2) 22,871  ..."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.6389407
  },
  {
    "content": {
      "text": "Property and equipment, net by segment is as follows (in millions): December 31, 2021 2022 2023 North America $ 83,640 $ 90,076 $ 93,632 International 21,718 23,347 24,357 AWS 43,245 60,324 72,701 Corporate 1.."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.6389407
  },
  {
    "content": {
      "text": "..amortization of property and equipment acquired under finance leases of $9.9 billion, $6.1 billion, and $5.9 billion for 2021, 2022, and 2023. 54 Table of Contents Note 4 — LEASES We have entered into non-cancellable operating and finance leases for fulfillment network, data center, office, and physical store facilities as well as server and networking equipment, aircraft, and vehicles. Gross assets acquired under finance leases, ..."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.61908984
  }
]

The following are the results for semantic search (with some of the output redacted for brevity):

[
  {
    "content": {
      "text": "Property and equipment, net by segment is as follows (in millions):    December 31,    2021 2022 2023   North America $ 83,640 $ 90,076 $ 93,632  International 21,718 23,347 24,357  AWS 43,245 60,324 72,701.."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.6389407
  },
  {
    "content": {
      "text": "Depreciation and amortization expense on property and equipment was $22.9 billion, $24.9 billion, and $30.2 billion which includes amortization of property and equipment acquired under finance leases of $9.9 billion, $6.1 billion, and $5.9 billion for 2021, 2022, and 2023.   54        Table of Contents   Note 4 — LEASES We have entered into non-cancellable operating and finance leases for fulfillment network, data center, office, and physical store facilities as well a..."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.61908984
  },
  {
    "content": {
      "text": "Incentives that we receive from property and equipment   vendors are recorded as a reduction to our costs. Property includes buildings and land that we own, along with property we have acquired under build-to-suit lease arrangements when we have control over the building during the construction period and finance lease arrangements..."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.61353767
  }
]

As you can see in the results, hybrid search was able to retrieve the chunk containing the leased square footage for physical stores in North America, as asked for in the user query. The main reason is that hybrid search was able to use the keywords in the query, such as the date, physical stores, and North America, whereas semantic search was not. Therefore, when the search results are combined with the user query and the prompt, the FM can't provide the correct response in the case of semantic search.

Now let’s look at the RetrieveAndGenerate API with hybrid search to understand the final response generated by the FM.

RetrieveAndGenerate API

The RetrieveAndGenerate API queries a knowledge base and generates a response based on the retrieved results. You specify the knowledge base ID as well as the FM to generate a response from the results. Amazon Bedrock converts the queries into embeddings, queries the knowledge base based on the search type, and then augments the FM prompt with the search results as context information and returns the FM-generated response.

Let’s use the query “As of December 31st 2023, what is the leased square footage for physical stores in North America?” and ask the RetrieveAndGenerate API to generate the response using our query:

def retrieveAndGenerate(input, kbId):
    # Query the knowledge base and let the specified FM generate the final answer
    return bedrock_agent_runtime.retrieve_and_generate(
        input={
            'text': input
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kbId,
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1',
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'overrideSearchType': 'HYBRID/SEMANTIC', # optional; set to 'HYBRID' or 'SEMANTIC'
                    }
                }
            }
        }
    )

response = retrieveAndGenerate("As of December 31st 2023, what is the leased square footage for physical stores in North America?", "<knowledge base id>")["output"]["text"]

The following are the results using hybrid search:

22,871 thousand leased square feet

The following are the results using semantic search:

The search results do not contain any information about the leased square footage for physical stores in North America for 2023.

The actual answer for the query is 22,871 thousand leased square feet, which hybrid search produced. The retrieved search results for hybrid search included the information about the leased square footage for physical stores in North America, whereas semantic search wasn't able to fetch the right chunk from the vector store based on embeddings alone. Therefore, the FM couldn't provide the correct response because it didn't have the correct and most relevant search results.

However, for more generic questions that don’t involve entities such as physical stores or North America, both hybrid and semantic search give similar results.

The following are sample responses from a few queries demonstrating cases when both hybrid and semantic search yield similar results.

Question: How does Amazon serve the developers and enterprises?
Semantic search (RAG API): We serve developers and enterprises of all sizes, including start-ups, government agencies, and academic institutions, through AWS, which offers a broad set of on-demand technology services, including compute, storage, database, analytics, and machine learning, and other services.
Hybrid search (RAG API): We serve developers and enterprises of all sizes, including start-ups, government agencies, and academic institutions, through AWS, which offers a broad set of on-demand technology services, including compute, storage, database, analytics, and machine learning, and other services.

Question: Who are the Executive Officers and Directors for Amazon as of January 24, 2024?
Semantic search (RAG API): The executive officers of Amazon as of 2024 include Andrew R. Jassy as President and Chief Executive Officer, Douglas J. Herrington as CEO Worldwide Amazon Stores, Brian T. Olsavsky as Senior Vice President and Chief Financial Officer, Shelley L. Reynolds as Vice President, Worldwide Controller, and Principal Accounting Officer, Adam N. Selipsky as CEO Amazon Web Services, and David A. Zapolsky as Senior Vice President, Global Public Policy and General Counsel.
Hybrid search (RAG API): As of 2024, Jeffrey P. Bezos serves as Executive Chair of Amazon.com. Andrew R. Jassy serves as President and Chief Executive Officer. Other executive officers include Douglas J. Herrington as CEO Worldwide Amazon Stores, Brian T. Olsavsky as Senior Vice President and Chief Financial Officer, Shelley L. Reynolds as Vice President, Worldwide Controller, and Principal Accounting Officer, and Adam N. Selipsky as CEO Amazon Web Services. David A. Zapolsky serves as Senior Vice President, Global Public Policy and General Counsel.

Use hybrid search and semantic search options via the Amazon Bedrock console

To use hybrid and semantic search options on the Amazon Bedrock console, complete the following steps:

  1. On the Amazon Bedrock console, choose Knowledge base in the navigation pane.
  2. Choose the knowledge base you created.
  3. Choose Test knowledge base.
  4. Choose the configurations icon.
  5. For Search type, select Hybrid search (semantic & text).

By default, you can choose an FM to get a generated response for your query. If you want to see only the retrieved results, you can toggle Generate response off to get only retrieved results.

Conclusion

In this post, we covered the new query feature in Knowledge Bases for Amazon Bedrock, which enables hybrid search. We learned how to configure the hybrid search option in the SDK and the Amazon Bedrock console. This helps overcome some of the limitations of relying solely on semantic search, especially for searching over large collections of documents with diverse content. The use of hybrid search depends on the document type and the use case that you are trying to implement.

For additional resources, refer to the following:

References

Improving Retrieval Performance in RAG Pipelines with Hybrid Search


About the Authors

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Pallavi Nargund is a Principal Solutions Architect at AWS. In her role as a cloud technology enabler, she works with customers to understand their goals and challenges, and gives prescriptive guidance to achieve their objectives with AWS offerings. She is passionate about women in technology and is a core member of Women in AI/ML at Amazon. She speaks at internal and external conferences such as AWS re:Invent, AWS Summits, and webinars. Outside of work she enjoys volunteering, gardening, cycling, and hiking.

Automakers Electrify Geneva International Motor Show

The Geneva International Motor Show, one of the most important and long-standing global auto exhibitions, opened this week, with the spotlight on several Chinese and U.S. EV makers building on NVIDIA DRIVE that are expanding their presence in Europe.

BYD

One of the key reveals is BYD’s Yangwang U8 plug-in hybrid large SUV, built on the NVIDIA DRIVE Orin platform. It features an electric drivetrain with a range of up to 1,000 kilometers, or 621 miles, thanks to its 49 kilowatt-hour battery and range-extender gasoline engine.

IM Motors

The premium EV brand by SAIC and Alibaba, IM Motors, unveiled its IM L6 mid-size electric saloon (aka sedan), which will soon be available to the European market.

The L6 features IM’s proprietary Intelligent Mobility Autonomous Driving System, powered by NVIDIA DRIVE Orin. The advanced system uses a comprehensive sensor suite, including one lidar, three radars, 11 cameras and 12 ultrasonic sensors. IM reports that the L6 is capable of highway navigation on autopilot, marking a milestone in the company’s mission to bring advanced self-driving vehicles to market.

IM Motors has been working with NVIDIA since 2021, using NVIDIA DRIVE Orin as the AI brain for its flagship LS7 SUV and L7 sedan.

IM Motors IM L6

Lucid 

The Geneva Motor Show marks the European debut of Lucid’s Gravity SUV.

The Gravity aims to set fresh benchmarks for sustainability and technological innovation when its production commences in late 2024. Powered by NVIDIA DRIVE, the luxury SUV features supercar levels of performance and an impressive battery range to mitigate range anxiety.

Lucid Gravity SUV

Explore the latest developments in mobility and meet NVIDIA DRIVE ecosystem partners showcasing their next-gen vehicles at GTC, the conference for the era of AI, running from March 18-21 at the San Jose Convention Center and online.

Find additional details on automotive-specific programming at GTC.

No Noobs Here: Top Pro Gamers Bolster Software Quality Assurance Testing

For some NVIDIANs, it’s always game day.

Our Santa Clara-based software quality assurance team boasts some of the world’s top gamers, whose search for bugs and errors is as strategic as their battle plans for toppling top-tier opponents in video games.

Two members of the QA team — friendly colleagues in the office but fierce rivals in the esports arena — recently competed against one another at the finals of the Guildhouse Fighters tournament, a local circuit in Northern California.

Eduardo “PR Balrog” Perez-Frangie, a veteran Street Fighter player, fought his way to the Grand Final to face Miky “Samurai” Chea.

Perez-Frangie came out on top, but there was a twist: he’d brought to the contest his two-year-old son, who fell asleep on his father’s chest mid-match. “I played the rest of the game with a deadweight in my lap,” he said.

Perez-Frangie and Chea play at work — guess who won? QA team members Alyssa Ruiz and DaJuan McDaniel cheer them on.

A Competitive Spirit

Perez-Frangie has competed for 15 years in a series of fighting game titles, including Marvel vs. Capcom, Killer Instinct and Mortal Kombat. He was part of the Evil Geniuses esports organization when he joined NVIDIA almost a decade ago but now plays without a sponsor so he can enjoy more family time.

He’s played against Chea in esports events for years, and they’re just as competitive in the office as in the stadiums.

“Even when we’re in the test environment, when we’re looking for bugs, we are competitive,” Perez-Frangie said. “But Miky stays calm — he was a teacher, so he can put everyone in their place.”

Chea’s days teaching kindergarten through eighth grade in Fresno, California, are behind him, but the coaching aspect of his current gaming-related role reminds him of the classroom — a place to share insights and takeaways.

As new games are released and older ones updated, “the hardware and software stack needs to work in harmony,” Chea said. “Our pro gaming team is the last line of defense to ensure our customers have the best gaming experience possible.”

The QA team gathers to check out a new game.

QA team member DaJuan “Shroomed” McDaniel is a top-ranked Super Smash Bros. Melee player whose signature characters are Sheik and Marth. He’s also widely considered to be the best Dr. Mario player of all time.

“Being a competitive gamer, visual fidelity is so important,” McDaniel said. “We can see and feel visual anomalies, frame discrepancies, general latency and anything that’s off in ways that others won’t see.”

McDaniel playing “Cyberpunk 2077.”

A Winning Formula

Alyssa Ruiz joined the QA team a year ago, initially testing drivers as part of the pro gaming team before switching to testing NVIDIA DLSS, a suite of neural rendering techniques that use deep learning to improve image quality and performance.

Introduced to gaming by her brothers through Halo 3, she later dedicated hours to Fortnite before deciding to stream the gameplay directly from her console. She posted the content to TikTok and began playing in online tournaments. By then, her game of choice was Riot Games’ Valorant.

“The game has a large female player base with visually appealing graphics and an engrossing storyline,” she said. “It can be more complex than a fighting game because it relies on a combination of abilities with strategies. It’s also a team game, so if someone isn’t pulling their weight, it’s a loss for all of us.”

That’s not unlike the team dynamic in the office.

Perez-Frangie, Ruiz, Chea and McDaniel.

Each member brings their own specialties to the testing environment, where they’re using their keen eyes to scrutinize DLSS technologies.

Their acute awareness of game latency and image fidelity — honed through hundreds of hours of gameplay — means the team can achieve better test coverage all around.

“We’re all very competitive, but there’s a real diversity that contributes to a stronger team,” Ruiz said. “And we all get along really well.”

Learn more about NVIDIA life, culture and careers

What Is Trustworthy AI?

Artificial intelligence, like any transformative technology, is a work in progress — continually growing in its capabilities and its societal impact. Trustworthy AI initiatives recognize the real-world effects that AI can have on people and society, and aim to channel that power responsibly for positive change.

What Is Trustworthy AI?

Trustworthy AI is an approach to AI development that prioritizes safety and transparency for those who interact with it. Developers of trustworthy AI understand that no model is perfect, and take steps to help customers and the general public understand how the technology was built, its intended use cases and its limitations.

In addition to complying with privacy and consumer protection laws, trustworthy AI models are tested for safety, security and mitigation of unwanted bias. They’re also transparent — providing information such as accuracy benchmarks or a description of the training dataset — to various audiences including regulatory authorities, developers and consumers.

Principles of Trustworthy AI

Trustworthy AI principles are foundational to NVIDIA’s end-to-end AI development. They have a simple goal: to enable trust and transparency in AI and support the work of partners, customers and developers.

Privacy: Complying With Regulations, Safeguarding Data

AI is often described as data hungry. Often, the more data an algorithm is trained on, the more accurate its predictions.

But data has to come from somewhere. To develop trustworthy AI, it’s key to consider not just what data is legally available to use, but what data is socially responsible to use.

Developers of AI models that rely on data such as a person’s image, voice, artistic work or health records should evaluate whether individuals have provided appropriate consent for their personal information to be used in this way.

For institutions like hospitals and banks, building AI models means balancing the responsibility of keeping patient or customer data private while training a robust algorithm. NVIDIA has created technology that enables federated learning, where researchers develop AI models trained on data from multiple institutions without confidential information leaving a company’s private servers.

NVIDIA DGX systems and NVIDIA FLARE software have enabled several federated learning projects in healthcare and financial services, facilitating secure collaboration by multiple data providers on more accurate, generalizable AI models for medical image analysis and fraud detection.

Safety and Security: Avoiding Unintended Harm, Malicious Threats

Once deployed, AI systems have real-world impact, so it’s essential they perform as intended to preserve user safety.

The freedom to use publicly available AI algorithms creates immense possibilities for positive applications, but also means the technology can be used for unintended purposes.

To help mitigate risks, NVIDIA NeMo Guardrails keeps AI language models on track by allowing enterprise developers to set boundaries for their applications. Topical guardrails ensure that chatbots stick to specific subjects. Safety guardrails set limits on the language and data sources the apps use in their responses. Security guardrails seek to prevent malicious use of a large language model that’s connected to third-party applications or application programming interfaces.
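
As a rough illustration of how such boundaries are expressed, the sketch below defines a simple topical rail with the open-source NeMo Guardrails Python package. The Colang flow, the model settings, and the topic itself are invented for this example; check the NeMo Guardrails documentation for the exact configuration options.

from nemoguardrails import LLMRails, RailsConfig

# Illustrative Colang flow: steer the bot away from a disallowed topic
colang_content = """
define user ask about competitors
  "what do you think about competitor products?"

define bot refuse competitors
  "I can only help with questions about our own products."

define flow competitors
  user ask about competitors
  bot refuse competitors
"""

# Minimal YAML config naming an LLM backend (values here are assumptions)
yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
"""

config = RailsConfig.from_content(colang_content=colang_content, yaml_content=yaml_content)
rails = LLMRails(config)

response = rails.generate(messages=[{"role": "user", "content": "Tell me about competitor products."}])
print(response["content"])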

NVIDIA Research is working with the DARPA-run SemaFor program to help digital forensics experts identify AI-generated images. Last year, researchers published a novel method for addressing social bias using ChatGPT. They’re also creating methods for avatar fingerprinting — a way to detect if someone is using an AI-animated likeness of another individual without their consent.

To protect data and AI applications from security threats, NVIDIA H100 and H200 Tensor Core GPUs are built with confidential computing, which ensures sensitive data is protected while in use, whether deployed on premises, in the cloud or at the edge. NVIDIA Confidential Computing uses hardware-based security methods to ensure unauthorized entities can’t view or modify data or applications while they’re running — traditionally a time when data is left vulnerable.

Transparency: Making AI Explainable

To create a trustworthy AI model, the algorithm can’t be a black box — its creators, users and stakeholders must be able to understand how the AI works to trust its results.

Transparency in AI is a set of best practices, tools and design principles that helps users and other stakeholders understand how an AI model was trained and how it works. Explainable AI, or XAI, is a subset of transparency covering tools that inform stakeholders how an AI model makes certain predictions and decisions.

Transparency and XAI are crucial to establishing trust in AI systems, but there’s no universal solution to fit every kind of AI model and stakeholder. Finding the right solution involves a systematic approach to identify who the AI affects, analyze the associated risks and implement effective mechanisms to provide information about the AI system.

Retrieval-augmented generation, or RAG, is a technique that advances AI transparency by connecting generative AI services to authoritative external databases, enabling models to cite their sources and provide more accurate answers. NVIDIA is helping developers get started with a RAG workflow that uses the NVIDIA NeMo framework for developing and customizing generative AI models.

NVIDIA is also part of the National Institute of Standards and Technology’s U.S. Artificial Intelligence Safety Institute Consortium, or AISIC, to help create tools and standards for responsible AI development and deployment. As a consortium member, NVIDIA will promote trustworthy AI by leveraging best practices for implementing AI model transparency.

And on NVIDIA’s hub for accelerated software, NGC, model cards offer detailed information about how each AI model works and was built. NVIDIA’s Model Card ++ format describes the datasets, training methods and performance measures used, licensing information, as well as specific ethical considerations.

Nondiscrimination: Minimizing Bias

AI models are trained by humans, often using data that is limited by size, scope and diversity. To ensure that all people and communities have the opportunity to benefit from this technology, it’s important to reduce unwanted bias in AI systems.

Beyond following government guidelines and antidiscrimination laws, trustworthy AI developers mitigate potential unwanted bias by looking for clues and patterns that suggest an algorithm is discriminatory, or involves the inappropriate use of certain characteristics. Racial and gender bias in data are well-known, but other considerations include cultural bias and bias introduced during data labeling. To reduce unwanted bias, developers might incorporate different variables into their models.

Synthetic datasets offer one solution to reduce unwanted bias in training data used to develop AI for autonomous vehicles and robotics. If data used to train self-driving cars underrepresents uncommon scenes such as extreme weather conditions or traffic accidents, synthetic data can help augment the diversity of these datasets to better represent the real world, helping improve AI accuracy.

NVIDIA Omniverse Replicator, a framework built on the NVIDIA Omniverse platform for creating and operating 3D pipelines and virtual worlds, helps developers set up custom pipelines for synthetic data generation. And by integrating the NVIDIA TAO Toolkit for transfer learning with Innotescus, a web platform for curating unbiased datasets for computer vision, developers can better understand dataset patterns and biases to help address statistical imbalances.

Learn more about trustworthy AI on NVIDIA.com and the NVIDIA Blog. For more on tackling unwanted bias in AI, watch this talk from NVIDIA GTC and attend the trustworthy AI track at the upcoming conference, taking place March 18-21 in San Jose, Calif., and online.

Expedite your Genesys Cloud Amazon Lex bot design with the Amazon Lex automated chatbot designer

The rise of artificial intelligence (AI) has created opportunities to improve the customer experience in the contact center space. Machine learning (ML) technologies continually improve and power the contact center customer experience by providing solutions for capabilities like self-service bots, live call analytics, and post-call analytics. Self-service bots integrated with your call center can help you achieve decreased wait times, intelligent routing, decreased time to resolution through self-service functions or data collection, and improved net promoter scores (NPS). Some examples include a customer calling to check on the status of an order and receiving an update from a bot, or a customer needing to submit a renewal for a license and the chatbot collecting the necessary information, which it hands over to an agent for processing.

With Amazon Lex bots, you can use conversational AI capabilities to enable these capabilities within your call center. Amazon Lex uses automatic speech recognition (ASR) and natural language understanding (NLU) to understand the customer’s needs and assist them on their journey.

Genesys Cloud (an omni-channel orchestration and customer relationship platform) provides a contact center platform in a public cloud model that enables quick and simple integration of AWS Contact Center Intelligence (AWS CCI) to transform the modern contact center from a cost center into a profit center. As part of AWS CCI, Genesys Cloud integrates with Amazon Lex, which enables self-service, intelligent routing, and data collection capabilities.

When exploring AWS CCI capabilities with Amazon Lex and Genesys Cloud, you may be unsure of where to start on your bot design journey. To assist those who may be starting with a blank canvas, Amazon Lex provides the Amazon Lex automated chatbot designer. The automated chatbot designer uses ML to provide an initial bot design that you can then refine and launch conversational experiences faster based on your current call transcripts. With the automated chatbot designer, Amazon Lex customers and partners have a straightforward and intuitive way of designing chatbots and can reduce bot design time from weeks to hours. However, the automated chatbot designer requires transcripts to be in a certain format that is not aligned to Genesys Cloud transcript exports.

In this post, we show how you can implement an architecture using Amazon EventBridge, Amazon Simple Storage Service (Amazon S3), and AWS Lambda to automatically collect, transform, and load your Genesys call transcripts in the required format for the Amazon Lex automated chatbot designer. You can then run the automated chatbot designer on your transcripts, be given recommendations for bot design, and streamline your bot design journey.

Solution overview

The following diagram illustrates the solution architecture.

The solution workflow consists of the following steps:

  1. Genesys Cloud sends iterative transcripts events to your EventBridge event bus.
  2. Lambda receives the iterative transcripts from EventBridge, determines when a conversation is complete, invokes the Transcript API within Genesys Cloud, and drops the full transcript into an S3 bucket.
  3. When a new full transcript is uploaded to Amazon S3, Lambda converts the Genesys Cloud formatted transcript into the required format for the Amazon Lex automated chatbot designer and copies it to an S3 bucket (a minimal sketch of this transformation follows this list).
  4. The Amazon Lex automated chatbot designer uses ML to build an initial bot design based on the provided Genesys Cloud transcripts.
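
The following is a minimal sketch of the transformation Lambda function in step 3. The Genesys input fields (phrases, participant, text, conversationId), the target bucket name, and the output layout are simplified assumptions for illustration only; the actual source schema comes from the Genesys Transcript API, and the exact target format is defined in the Amazon Lex "Prepare transcripts" documentation.

import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Read the full Genesys transcript that was just uploaded to the raw bucket
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    genesys = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())

    # Map each conversation turn to the chatbot designer's transcript layout
    # (field names here are illustrative, not the exact required schema)
    turns = [
        {
            "ParticipantId": turn.get("participant", "CUSTOMER"),
            "Id": f"T{i}",
            "Content": turn.get("text", ""),
        }
        for i, turn in enumerate(genesys.get("phrases", []), start=1)
    ]

    converted = {
        "Participants": [
            {"ParticipantId": "CUSTOMER", "ParticipantRole": "CUSTOMER"},
            {"ParticipantId": "AGENT", "ParticipantRole": "AGENT"},
        ],
        "CustomerMetadata": {"ContactId": genesys.get("conversationId", key)},
        "Transcript": turns,
    }

    # Write the converted transcript to the bucket read by the chatbot designer
    s3.put_object(
        Bucket="<transformed-transcript-bucket>",  # placeholder bucket name
        Key=key.replace(".json", "-lex.json"),
        Body=json.dumps(converted),
    )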

Prerequisites

Before you deploy the solution, you must complete the following prerequisites:

  1. Set up your Genesys Cloud CX account and make sure that you are able to log in. For more information on setting up your account, refer to the Genesys documentation.
  2. Make sure that the right permissions are set for enabling and publishing transcripts from Genesys. For more information on setting up the required permissions, refer to Roles and permissions overview.
  3. If PCI and PII encryption is required for transcription, make sure it is set up in Genesys. For more information on setting up the required permissions, refer to Are interaction transcripts encrypted when stored in the cloud.
  4. Set up an AWS account with the appropriate permissions.

Deploy the Genesys EventBridge integration

To enable the EventBridge integration with Genesys Cloud, complete the following steps:

  1. Log in to the Genesys Cloud environment.
  2. Choose Admin, Integrations, Add Integrations, and Amazon EventBridge Source.
  3. On the Configuration tab, provide the following information:
    1. For AWS Account ID, enter your AWS account ID.
    2. For AWS Account Region, enter the Region where you want EventBridge to be set up.
    3. For Event Source Suffix, enter a suffix (for example, genesys-eb-poc-demo).
  4. Save your configuration.
  5. On the EventBridge console, choose Integration in the navigation pane, then choose Partner event sources.

There should be an event source listed with a name like aws.partner/genesys.com/…/genesys-eb-poc-demo.

  1. Select the partner event source and choose Associate with event bus.

The status changes from Pending to Active. This sets up the EventBridge configuration for Genesys.
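
If you prefer to script this association rather than use the console, EventBridge associates a partner event source by creating an event bus with the same name. A minimal sketch with boto3 follows; the Region is an example and the elided portion of the source name is organization-specific.

import boto3

events = boto3.client("events", region_name="us-east-1")  # example Region

# Partner event source name exactly as it appears on the EventBridge console;
# the "..." segment is organization-specific and elided here
source_name = "aws.partner/genesys.com/.../genesys-eb-poc-demo"

# Creating an event bus whose name matches the partner event source
# associates the source with it (status changes from Pending to Active)
events.create_event_bus(Name=source_name, EventSourceName=source_name)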

Next, you set up OAuth2 credentials in Genesys Cloud for authorizing the API call to get the final transcript.

  1. Navigate to the Genesys Cloud instance.
  2. Choose Admin, Integrations, and OAuth.
  3. Choose Add Client.
  4. On the Client Details tab, provide the following information:
    1. For App Name, enter a name (for example, TranscriptInvoke-creds).
    2. For Grant Types, select Client Credentials.

Make sure you’re using the right role that has access to invoke the Genesys Transcript API.

  1. Choose Save.

This generates new values for Client ID and Client Secret. Copy these values to use in the next section, where you configure the template for the solution.
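
The deployed Lambda function uses these credentials to call the Genesys Transcript API. The following is a rough sketch of the OAuth2 client-credentials exchange; the login host pattern derived from your GenCloudEnv value is an assumption and should be verified against the Genesys Cloud documentation.

import base64
import requests

def get_genesys_token(client_id, client_secret, environment="usw2.pure.cloud"):
    # Encode the client ID and secret for HTTP Basic authentication
    auth = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    response = requests.post(
        f"https://login.{environment}/oauth/token",  # host pattern is an assumption
        data={"grant_type": "client_credentials"},
        headers={"Authorization": f"Basic {auth}"},
        timeout=10,
    )
    response.raise_for_status()
    # The access token is sent as a bearer token on subsequent API calls
    return response.json()["access_token"]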

Deploy the solution

After you have set up the Genesys EventBridge integration, you can deploy an AWS Serverless Application Model (AWS SAM) template, which deploys the remainder of the architecture. To deploy the solution in your account, complete the following steps:

  1. Install AWS SAM if not installed already. For instructions, refer to Installing the AWS SAM CLI.
  2. Download the GitHub repo and unzip to your directory.
  3. Navigate to the genesys-to-lex-automated-chatbot-designer folder and run the following commands:
    sam build --use-container
    sam deploy --guided

The first command builds the source of your application. The second command packages and deploys your application to AWS, with a series of prompts:

  • Stack Name – Enter the name of the stack to deploy to AWS CloudFormation. This should be unique to your account and Region; a good starting point is something matching your project name.
  • AWS Region – Enter the Region you want to deploy your app to. Make sure it is deployed in the same Region as the EventBridge event bus.
  • Parameter GenesysBusname – Enter the bus name created when you configured the Genesys integration. The pattern of the bus name should look like aws.partner/genesys.com/*.
  • Parameter ClientId – Enter the client ID you copied earlier.
  • Parameter ClientSecret – Enter the client secret you copied earlier.
  • Parameter FileNamePrefix – Change the default file name prefix for the target transcript file in the raw S3 bucket or keep the default.
  • Parameter GenCloudEnv – Enter the cloud environment for the specific Genesys organization. Genesys is available in more than 15 Regions worldwide as of this writing, so this value is mandatory and should point to the environment where your organization is created in Genesys (for example, usw2.pure.cloud).
  • Confirm changes before deploy – If set to yes, any change sets will be shown to you before deployment for manual review. If set to no, the AWS SAM CLI will automatically deploy application changes.
  • Allow SAM CLI IAM role creation – Many AWS SAM templates, including this example, create AWS Identity and Access Management (IAM) roles required for the Lambda functions included to access AWS services. By default, these are scoped down to the minimum required permissions. To deploy a CloudFormation stack that creates or modifies IAM roles, you must provide the CAPABILITY_IAM value for capabilities. If permission isn’t provided through this prompt, to deploy this example, you must explicitly pass --capabilities CAPABILITY_IAM to the sam deploy command.
  • Save arguments to samconfig.toml – If set to yes, your choices will be saved to a configuration file inside the project, so that in the future you can rerun sam deploy without parameters to deploy changes to your application.

After you deploy your AWS SAM application in your account, you can test that Genesys transcripts are being sent to your account and being transformed into the required format for the Amazon Lex automated chatbot designer.

Make a test call to validate the solution

After you have set up the Genesys EventBridge integration and deployed the preceding AWS SAM template, you can make test calls and validate that files are ending up in the S3 bucket for transformed files. At a high level, you need to perform the following steps:

  1. Make a test call to your Genesys instance to create a transcript.
  2. Wait a few minutes and check the TransformedTranscript bucket for the output.

Run the automated chatbot designer

After you have a few days’ worth of transcripts saved in Amazon S3, you can run the automated chatbot designer through the Amazon Lex console using the steps in this section. For more information about the minimum and maximum amount of turns for the service, refer to Prepare transcripts.

  1. On the Amazon Lex V2 console, choose Bots in the navigation pane.
  2. Choose Create bot.
  3. Select Start with transcripts as the creation method.
  4. Give the bot a name (for this example, InsuranceBot) and provide an optional description.
  5. Select Create a role with basic Amazon Lex permissions and use this as your runtime role.
  6. After you fill out the other fields, choose Next to proceed to the language configuration.
  7. Choose the language and voice for your interaction.
  8. Specify the Amazon S3 location of the transcripts that the solution has converted for you.
  9. Add additional local paths if you have a specific folder structure within your S3 bucket.
  10. Apply a filter (date range) for your input transcripts.
  11. Choose Done.

You can use the status bar on the Amazon S3 console to track the analysis. Within a few hours, the automated chatbot designer surfaces a chatbot design that includes user intents, sample phrases associated with those intents, and a list of all the information required to fulfill them. The amount of time it takes to complete training depends on several factors, including the volume of transcripts and the complexity of the conversations. Typically, 600 lines of transcript are analyzed every minute.

  1. Choose Review to view the intents and slot types discovered by the automated chatbot designer.

The Intents tab lists all the intents along with sample phrases and slots, and the Slot types tab provides a list of all the slot types along with slot type values.

  1. Choose any of the intents to review the sample utterances and slots. For example, in the following screenshot, we choose ChangePassword to view the utterances.
  2. Choose the Associated transcripts tab to review the conversations used to identify the intents.
  3. After you review the results, select the intents and slot types relevant to your use case and choose Add.

This adds the selected intents and slot types to the bot. You can now iterate on this design by making changes such as adding prompts, merging intents or slot types, and renaming slots.

You have now used the Amazon Lex automated chatbot designer to identify common intents, utterances mapped to those intents, and information that the chatbot needs to collect to fulfill certain business functions.

Clean up

When you’re finished, clean up your resources by using the following command within the AWS SAM CLI:

sam delete

Conclusion

This post showed you how to use the Genesys Cloud CX and EventBridge integration to send your Genesys CX transcripts to your AWS account, transform them, and use them with the Amazon Lex automated chatbot designer to create sample bots, intents, utterances, and slots. This architecture can help first-time AWS CCI users and current AWS CCI users onboard more chatbots using the Genesys CX and Amazon Lex integration, or in continuous improvement opportunities where you may want to compare your current intent design to that outputted by the Amazon Lex automated chatbot designer. For more information about other AWS CCI capabilities, see Contact Center Intelligence.


About the Authors

Joe Morotti is a Solutions Architect at Amazon Web Services (AWS), helping Enterprise customers across the Midwest US. He has held a wide range of technical roles and enjoys showing customers the art of the possible. In his free time, he enjoys spending quality time with his family exploring new places and overanalyzing his sports team’s performance.

Anand Bose is a Senior Solutions Architect at Amazon Web Services, supporting ISV partners who build business applications on AWS. He is passionate about creating differentiated solutions that unlock customers for cloud adoption. Anand lives in Dallas, Texas and enjoys travelling.

Teri Ferris is responsible for architecting great customer experiences alongside business partners, leveraging Genesys technology solutions that enable Experience Orchestration for contact centers. In her role she advises on solution architecture, integrations, IVR, routing, reporting analytics, self-service, AI, outbound, mobile capabilities, omnichannel, social channels, digital, unified communications (UCaaS), and analytics and how they can streamline the customer experience. Before Genesys, she held senior leadership roles at Human Resources, Payroll, and Learning Management companies, including overseeing the Contact Center.

Live at GTC: Hear From Industry Leaders Using AI to Drive Innovation and Agility

Interest in new AI applications reached a fever pitch last year as business leaders began exploring AI pilot programs. This year, they’re focused on strategically implementing these programs to create new value and sharpen their competitive advantage.

GTC, NVIDIA’s conference on AI and accelerated computing, set for March 18-21 at the San Jose Convention Center, will feature leaders across a broad swath of industries discussing how they’re charting the path to AI-driven innovation.

Execs from Bentley Systems, Lowe’s, Siemens and Verizon are among those sharing their companies’ AI journeys.

Don’t miss NVIDIA founder and CEO Jensen Huang’s GTC keynote on Monday, March 18, at 1 p.m. PT.

AI Takes Center Stage in Enterprise Technology Priorities

Nearly three-quarters of C-suite executives plan to increase their company’s tech investments this year, according to a BCG survey of C-suite executives, and 89% rank AI and generative AI among their top three priorities. More than half expect AI to deliver cost savings, primarily through productivity gains, improved customer service and IT efficiencies.

However, challenges to driving value with AI remain, including reskilling workers, prioritizing the right AI use cases and developing a strategy to implement responsible AI.

Join us in person or online to learn how industry leaders are overcoming these challenges to thrive with AI.

Here’s a preview of top industry sessions:

Financial Services

Navigating the Opportunity for Generative AI in Financial Services, featuring speakers from NVIDIA, MasterCard, Capital One and Goldman Sachs.

Enterprise AI in Banking: How One Leader Is Investing in “AI First,” featuring Alexandra V. Mousavizadeh, CEO of Evident, and Chintan Mehta, chief information officer and head of digital technology and innovation at Wells Fargo.

How PayPal Reduced Cloud Costs by up to 70% With Spark RAPIDS, featuring Illay Chen, software engineer at PayPal.

Public Sector

Generative AI Adoption and Operational Challenges in Government, featuring speakers from Microsoft, NVIDIA and the U.S. Army.

How to Apply Generative AI to Improve Cybersecurity, featuring Bartley Richardson, director of cybersecurity engineering at NVIDIA.

Healthcare

Healthcare Is Adopting Generative AI, Becoming One of the Largest Tech Industries, featuring Kimberly Powell, vice president of healthcare and life sciences at NVIDIA.

The Role of Generative AI in Modern Medicine, featuring speakers from ARK Investment Management, NVIDIA, Microsoft and Scripps Research.

How Artificial Intelligence Is Powering the Future of Biomedicine, featuring Priscilla Chan, cofounder and co-CEO of the Chan Zuckerberg Initiative, and Mona Flores, global head of medical AI at NVIDIA.

Retail and Consumer Packaged Goods

Augmented Marketing in Beauty With Generative AI, featuring Asmita Dubey, chief digital and marketing officer at L’Oréal.

AI and the Radical Transformation of Marketing, featuring Stephan Pretorius, chief technology officer at WPP.

How Lowe’s Is Driving Innovation and Agility With AI, featuring Azita Martin, vice president of artificial intelligence for retail and consumer packaged goods at NVIDIA, and Seemantini Godbole, executive vice president and chief digital and information officer at Lowe’s.

Telecommunications

Special Address: Three Ways Artificial Intelligence Is Transforming Telecommunications, featuring Ronnie Vasishta, senior vice president of telecom at NVIDIA.

Generative AI as an Innovative Accelerator in Telcos, featuring Asif Hasan, cofounder of Quantiphi; Lilach Ilan, global head of business development, telco operations at NVIDIA; and Chris Halton, vice president of product strategy and innovation at Verizon.

How Telcos Are Enabling National AI Infrastructure and Platforms, featuring speakers from Indosat, NVIDIA, Singtel and Telconet.

Manufacturing

Accelerating Aerodynamics Analysis at Mercedes-Benz, featuring Liam McManus, technical product manager at Siemens; Erich Jehle-Graf of Mercedes-Benz; and Ian Pegler, global business development, computer-aided design at NVIDIA.

Omniverse-Based Fab Digital Twin Platform for Semiconductor Industry, featuring Seokjin Youn, corporate vice president and head of the management information systems team at Samsung Electronics.

Digitalizing Global Manufacturing Supply Chains With Digital Twins, Powered by OpenUSD, featuring Kirk Fleischhaue, senior vice president at Foxconn.

Automotive

Applying AI & LLMs to Transform the Luxury Automotive Experience, featuring Chrissie Kemp, chief data and digital product officer at JLR (Jaguar Land Rover).

Accelerating Automotive Workflows With Large Language Models, featuring Bryan Goodman, director of artificial intelligence at Ford Motor Co.

How LLMs and Generative AI Will Enhance the Way We Experience Self-Driving Cars, featuring Alex Kendall, cofounder and CEO of Wayve.

Robotics 

Robotics and the Role of AI: Past, Present and Future, featuring Marc Raibert, executive director at The AI Institute, and Dieter Fox, senior director of robotics research at NVIDIA.

Breathing Life into Disney’s Robotic Characters With Deep Reinforcement Learning, featuring Moritz Bächer, associate lab director of robotics at Disney Research.

Media and Entertainment 

Unlocking Creative Potential: The Synergy of AI and Human Creativity, featuring Andrea Gagliano, senior director of data science, AI/ML at Getty Images.

Beyond the Screen: Unraveling the Impact of AI in the Film Industry, featuring Nikola Todorovic, cofounder and CEO at Wonder Dynamics; Chris Jacquemin, head of digital strategy at WME; and Sanja Fidler, vice president of AI research at NVIDIA.

Revolutionizing Fan Engagement: Unleashing the Power of AI in Software-Defined Production, featuring ​​Lewis Smithingham, senior vice president of innovation and creative solutions at Media.Monks.

Energy

Panel: Building a Lower-Carbon Future With HPC and AI in Energy, featuring speakers from NVIDIA, Shell, ExxonMobil, Schlumberger and Petrobras.

The Increasing Complexity of the Electric Grid Demands Edge Computing, featuring Marissa Hummon, chief technology officer at Utilidata.

Browse a curated list of GTC sessions for business leaders of every technical level and area of interest.  

NLPositionality: Characterizing Design Biases of Datasets and Models

TLDR; Design biases in NLP systems, such as performance differences for different populations, often stem from their creator’s positionality, i.e., views and lived experiences shaped by identity and background. Despite the prevalence and risks of design biases, they are hard to quantify because researcher, system, and dataset positionality are often unobserved.

We introduce NLPositionality, a framework for characterizing design biases and quantifying the positionality of NLP datasets and models. We find that datasets and models align predominantly with Western, White, college-educated, and younger populations. Additionally, certain groups such as nonbinary people and non-native English speakers are further marginalized by datasets and models as they rank least in alignment across all tasks.

Figure 1. Carl from the U.S. and Aditya from India both want to use Perspective API, but it works better for Carl than for Aditya. This is because toxicity researchers’ positionalities lead them to make design choices that give toxicity datasets, and thus Perspective API, Western-centric positionalities.

Imagine the following scenario (see Figure 1): Carl, who works for the New York Times, and Aditya, who works for the Times of India, both want to use Perspective API. However, Perspective API fails to label instances containing derogatory terms in Indian contexts as “toxic”, leading it to work better overall for Carl than for Aditya. This is because toxicity researchers’ positionalities lead them to make design choices that give toxicity datasets, and thus Perspective API, Western-centric positionalities.

In this study, we developed NLPositionality, a framework to quantify the positionalities of datasets and models. Prior work has introduced the concept of model positionality, defining it as “the social and cultural position of a model with regard to the stakeholders with which it interfaces.” We extend this definition to add that datasets also encode positionality in a similar way to models. Thus, model and dataset positionality results in perspectives embedded within language technologies, making them less inclusive towards certain populations.

In this work, we highlight the importance of considering design biases in NLP. Our findings showcase the usefulness of our framework in quantifying dataset and model positionality. In a discussion of the implications of our results, we consider how positionality may manifest in other NLP tasks.

Figure 2. Overview of the NLPositionality Framework. Collection (steps 1-4): A subset of datasets’ instances are re-annotated via diverse volunteers recruited on LabintheWild. Processing (step 5): We compare the labels collected from LabintheWild with the dataset’s original labels and models’ predictions. Analysis (step 6): We compute the Pearson’s r correlation between the LabintheWild annotations by demographic for the dataset’s original labels and the models’ predictions. We apply the Bonferroni correction to account for multiple hypothesis testing.
Figure 3. Example Annotation. An example instance from the Social Chemistry dataset that was sent to LabintheWild along with the mean of the received annotation scores across various demographics.

NLPositionality: Quantifying Dataset and Model Positionality

Our NLPositionality framework follows a two-step process for characterizing the design biases and positionality of datasets and models. We present an overview of the NLPositionality framework in Figure 2.

First, a subset of data for a task is re-annotated by annotators from around the world to obtain globally representative data with which to quantify positionality. An example re-annotation is shown in Figure 3. We perform re-annotation for two tasks: hate speech detection (i.e., detecting harmful speech targeting specific group characteristics) and social acceptability (i.e., judging how acceptable certain actions are in society). For hate speech detection, we study the DynaHate dataset along with the following models: Perspective API, Rewire API, ToxiGen RoBERTa, and zero-shot GPT-4. For social acceptability, we study the Social Chemistry dataset along with the following models: the Delphi model and zero-shot GPT-4.

Then, the positionality of the dataset or model is computed by calculating Pearson’s r between the responses of the dataset or model and the responses of each demographic group on identical instances. These scores are then compared with one another to determine how models and datasets are biased.

While relying on demographics as a proxy for positionality is limited, we use demographic information for an initial exploration in uncovering design biases in datasets and models.
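
To make this step concrete, below is a minimal sketch (not the authors’ released code) of how per-group positionality scores could be computed, assuming a long-format table of re-annotations with columns instance_id, demographic_group, and label, plus a series of dataset labels or model predictions indexed by instance_id; the names here are illustrative.

```python
# Hypothetical sketch of the per-demographic positionality computation.
import pandas as pd
from scipy.stats import pearsonr

def positionality_scores(annotations: pd.DataFrame,
                         targets: pd.Series,
                         n_tests: int,
                         alpha: float = 0.05) -> pd.DataFrame:
    """Pearson's r between each demographic group's mean labels and the
    dataset/model labels, with a Bonferroni-adjusted significance flag."""
    rows = []
    for group, sub in annotations.groupby("demographic_group"):
        # Average the group's annotations per instance, then align with the
        # dataset labels or model predictions on the same instances.
        group_means = sub.groupby("instance_id")["label"].mean()
        aligned = pd.concat([group_means, targets], axis=1, join="inner").dropna()
        r, p = pearsonr(aligned.iloc[:, 0], aligned.iloc[:, 1])
        rows.append({
            "group": group,
            "n_annotations": len(sub),
            "pearson_r": r,
            # Bonferroni correction: significance threshold is alpha / number of tests.
            "significant": p < alpha / n_tests,
        })
    return pd.DataFrame(rows)
```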

Table 1: Positionality of NLP datasets and models quantified using Pearson’s r correlation coefficients. # denotes the number of annotations associated with a demographic group. α denotes Krippendorff’s alpha of a demographic group for a task. * denotes statistical significance (p<2.04e−05 after Bonferroni correction). For each dataset or model, we denote the minimum and maximum Pearson’s r value for each demographic category in red and blue respectively.

The demographic groups collected from LabintheWild appear as rows in the table; the Pearson’s r scores between each demographic group’s labels and each dataset or model occupy the last three columns of the social acceptability section and the last five columns of the toxicity and hate speech section. For example, the value 0.76 in the fifth row and third column indicates that Social Chemistry has a Pearson’s r of 0.76 with annotators from English-speaking countries, i.e., a stronger correlation with this population.

Experimental Results

Our results are displayed in Table 1. Overall, across all tasks, models, and datasets, we find statistically significant moderate correlations with Western, educated, White, and young populations, indicating that language technologies skew WEIRD (Western, Educated, Industrialized, Rich, Democratic), though each to a varying degree. Also, certain demographics consistently rank lowest in their alignment with datasets and models across both tasks compared to other demographics of the same type.

Social acceptability. Social Chemistry is most aligned with people who grow up and live in English-speaking countries, have a college education, are White, and are 20-30 years old. Delphi exhibits a similar pattern, though to a lesser degree: it most strongly aligns with people who grow up and live in English-speaking countries, have a college education (r=0.66), are White, and are 20-30 years old. We observe a similar pattern with GPT-4, which has its highest Pearson’s r values for people who grow up and live in English-speaking countries, are college-educated, are White, and are 20-30 years old.

Non-binary people align less with Social Chemistry, Delphi, and GPT-4 than men and women do. Black, Latinx, and Native American populations consistently rank lowest in correlation for the education and ethnicity categories.

Hate speech detection. DynaHate is highly correlated with people who grow up in English-speaking countries, have a college education, are White, and are 20-30 years old. Perspective API also tends to align with WEIRD populations, though to a lesser degree than DynaHate: it exhibits some alignment with people who grow up and live in English-speaking countries, have a college education, are White, and are 20-30 years old. Rewire API shows a similar bias, with a moderate correlation with people who grow up and live in English-speaking countries, have a college education, are White, and are 20-30 years old. A Western bias also appears in ToxiGen RoBERTa, which shows some alignment with people who grow up and live in English-speaking countries, have a college education, are White, and are between 20-30 years of age. We observe similar behavior with GPT-4: the demographics with some of the higher Pearson’s r values in its category are people who grow up and live in English-speaking countries, are college-educated, are White, and are 20-30 years old. It shows stronger alignment with Asian-Americans than with White people.

Non-binary people align less with DynaHate, Perspective API, Rewire API, ToxiGen RoBERTa, and GPT-4 than other genders do. Also, people who are Black, Latinx, and Native American rank least in alignment for the education and ethnicity categories.

What can we do about dataset and model positionality?

Based on these findings, we have recommendations for researchers on how to handle dataset and model positionality:

  1. Keep a record of all design choices made while building datasets and models. This can improve reproducibility and aid others in understanding the rationale behind the decisions, revealing some of the researcher’s positionality.
  2. Report your positionality and the assumptions you make.
  3. Use methods to center the perspectives of communities who are harmed by design biases. This can be done using approaches such as participatory design as well as value-sensitive design.
  4. Make concerted efforts to recruit annotators from diverse backgrounds. Since new design biases could be introduced in this process, we recommend following the practice of documenting the demographics of annotators to record a dataset’s positionality.
  5. Be mindful of different perspectives by sharing datasets with disaggregated annotations and finding modeling techniques that can handle inherent disagreements or distributions, instead of forcing a single answer in the data (see the sketch after this list).
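
As an illustration of the last recommendation, here is a small hypothetical sketch of keeping a per-instance label distribution rather than collapsing annotations to a single majority answer; the labels and helper name are made up for the example.

```python
# Hypothetical sketch: preserve disaggregated annotations as soft labels.
from collections import Counter

def label_distribution(labels: list[str]) -> dict[str, float]:
    """Normalized label frequencies, usable as soft training targets
    (e.g., cross-entropy against the full distribution) instead of a
    single majority-vote label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Three annotators disagree on whether an action is socially acceptable.
print(label_distribution(["acceptable", "acceptable", "unacceptable"]))
# -> {'acceptable': 0.67, 'unacceptable': 0.33} (approximately)
```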

Finally, we argue that the notion of “inclusive NLP” does not mean that all language technologies have to work for everyone. Specialized datasets and models are immensely valuable when the data collection process and other design choices are intentional and made to uplift minority voices or historically underrepresented cultures and languages, such as Masakhane-NER and AfroLM.

To learn more about this work, its methodology, and/or results, please read our paper: https://aclanthology.org/2023.acl-long.505/. This work was done in collaboration with Sebastin Santy and Katharina Reinecke from the University of Washington, Ronan Le Bras from the Allen Institute for AI, and Maarten Sap from Carnegie Mellon University.

Read More