Use custom metadata created by Amazon Comprehend to intelligently process insurance claims using Amazon Kendra

Structured data follows a fixed pattern, such as information stored in database columns, whereas unstructured data, such as text, images, or social media posts, lacks a specific form or pattern. Both continue to grow as organizations produce and consume them. For instance, according to International Data Corporation (IDC), the world’s data volume is expected to increase tenfold by 2025, with unstructured data accounting for a significant portion. When ingesting documents, enterprises may want to extend intelligent search by adding custom metadata, such as document types (W-2 forms or paystubs) and entity types (names, organizations, and addresses), in addition to standard metadata like file type, date created, or size. Custom metadata helps organizations and enterprises categorize information in their preferred way and can be used, for example, for filtering and searching. Customers can create the custom metadata using Amazon Comprehend, a natural-language processing (NLP) service managed by AWS that extracts insights about the content of documents, and ingest it into an Amazon Kendra index along with their data. Amazon Kendra is a highly accurate and easy-to-use enterprise search service powered by machine learning (ML). The custom metadata can then be used to enrich the content for better filtering and faceting. In Amazon Kendra, facets are scoped views of a set of search results. For example, you can provide search results for cities across the world, where documents are filtered by the specific city with which they are associated. You could also create facets to display results by a specific author.

Insurance companies are burdened with increasing numbers of claims that they must process. The complexity of claims processing is also increasing because of the diverse types of insurance documents involved and the custom entities in each of these documents. In this post, we describe a use case for custom content enrichment for insurance providers. The insurance provider receives payout claims from the beneficiary’s attorney for different insurance types, such as home, auto, and life insurance. In this use case, the documents received by the insurance provider do not contain any metadata that allows searching the content based on certain entities and classes. The insurance provider wants to filter Amazon Kendra content based on custom entities and classes specific to their business domain. This post illustrates how you can automate and simplify metadata generation using custom models by Amazon Comprehend. The metadata generated can be customized during the ingestion process with Amazon Kendra Custom Document Enrichment (CDE) custom logic.

Let’s look at a few examples of Amazon Kendra search with or without filtering and facets capabilities.

In the following screenshot, Amazon Kendra provides a search result but there is no option to further narrow down the search results by using any filters.

The following screenshot shows that Amazon Kendra search results can be narrowed down by filtering on facets, such as Law Firm and Policy Numbers, that are created from custom metadata.

The solution discussed in this post can easily be applied to other businesses/use-cases as well, such as healthcare, manufacturing, and research.

Solution overview

In this proposed solution, we will 1) classify insurance claims submissions into various classes, and 2) retrieve insurance-specific entities from these documents. When this is complete, the document can be routed to the appropriate department or downstream process.

The following diagram outlines the proposed solution architecture.

The Amazon Comprehend custom classification API is used to organize your documents into categories (classes) that you define. Custom classification is a two-step process. First, you train a custom classification model (also called a classifier) to recognize the classes that are of interest to you. Then, you use your model to classify any number of document sets.

The Amazon Comprehend custom entity recognition feature is used to identify specific entity types (such as insurance company names, insurer names, and policy numbers) beyond the generic entity types available by default. Building a custom entity recognition model is a more effective approach than using string matching or regular expressions to extract entities from documents. A custom entity recognition model can learn the context where those names are likely to appear. Additionally, string matching will not detect entities that have typos or follow new naming conventions, whereas a custom model can.

Before diving deeper, let’s take a moment to explore Amazon Kendra. Amazon Kendra is a highly accurate and easy-to-use enterprise search service powered by machine learning. It allows users to find the information they need within the vast amount of content spread across their organization, ranging from websites and databases to intranet sites. We will first create an Amazon Kendra index to ingest the documents. While ingesting the data, it’s essential to consider the concept of Custom Document Enrichment (CDE). CDE enables you to enhance the search capability by incorporating external knowledge into the search index. For more information, refer to Enriching your documents during ingestion. In this post, the CDE logic invokes the custom APIs of Amazon Comprehend to enrich the documents with identified classes and entities. Finally, we use the Amazon Kendra search page to show how the metadata enhanced the search capability by adding faceting and filtering capabilities.

The high-level steps to implement this solution are as follows:

  1. Train the Amazon Comprehend custom classifier using training data
  2. Train the Amazon Comprehend custom entity recognition using training data
  3. Create the Amazon Comprehend custom classifier and custom entity recognition endpoints
  4. Create and deploy a Lambda function for post extraction enrichment
  5. Create and populate the Amazon Kendra index
  6. Use the extracted entities to filter searches in Amazon Kendra

We have also provided a sample application in the GitHub repo for reference.

Data security and IAM considerations

With security as the top priority, this solution follows the principle of least privilege for the services and features used. The IAM role used by Amazon Comprehend custom classification and custom entity recognition has permissions to access the dataset from the test bucket only. The Amazon Kendra service has access to a specific S3 bucket and to the Lambda function used to call the Amazon Comprehend APIs. The Lambda function has permissions to call the Amazon Comprehend APIs only. For more information, review sections 1.2 and 1.3 in the notebook.

We recommend you do the following in a non-production environment prior to implementing the solution in the production environment.

Train the Comprehend custom classifier using training data

Amazon Comprehend Custom Classification supports two data format types for annotation files:

  • CSV file
  • Augmented manifest file

Because our data is already labeled and stored in CSV files, we will use the CSV file format for the annotation file as an example. We have to provide the labeled training data as UTF-8 encoded text in a CSV file. Do not include a header row in the CSV file. Adding a header row in your file may cause runtime errors. An example of the training data CSV file is as follows:

CLASS,Text of document 1
CLASS,Text of document 2

To prepare classifier training data, refer to Preparing classifier training data. For each row in the CSV file, the first column contains one or more class labels. A class label can be any valid UTF-8 string. We recommend using clear class names that don’t overlap in meaning. The name can include white space, and can consist of multiple words connected by underscores or hyphens. Do not leave any space characters before or after the commas that separate the values in a row.
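
The formatting rules above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's code: the helper name write_training_rows and the two sample rows are hypothetical, and the standard csv module is used so that document text containing commas is quoted rather than split into extra columns.

```python
import csv
import io


def write_training_rows(rows, stream):
    """Write (label, text) pairs as a headerless CSV for classifier training."""
    writer = csv.writer(stream, lineterminator="\n")
    for label, text in rows:
        # Collapse line breaks and repeated spaces so each document stays on one row.
        writer.writerow([label.strip(), " ".join(text.split())])


# Hypothetical sample rows; real rows would come from your labeled claims.
sample_rows = [
    ("Home_Insurance", "Claim for roof damage,\nfiled by the policy holder."),
    ("Auto_Insurance", "Collision claim submitted by the attorney."),
]

buffer = io.StringIO()
write_training_rows(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

No header row is written, and no spaces are placed around the separating commas.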

Next, you will train either using Multi-class mode or Multi-label mode. Specifically, in multi-class mode, classification assigns one class for each document, while in multi-label mode, individual classes represent different categories that aren’t mutually exclusive. In our case we will be using the Multi-Class mode for Plain-text models.

You can prepare separate training and testing datasets for Amazon Comprehend custom classifier training and model evaluation. Alternatively, you can provide a single dataset for both training and testing, in which case Amazon Comprehend automatically selects 10% of the provided dataset to use as testing data. In this example, we are providing separate training and testing datasets.

The following example shows a CSV file containing the class names associated with the various documents.

Document format – Type of Insurance, Content of document 1

When the custom classification model is trained, it can capture different classes of insurance on the documents (Home, Auto, or Life insurance).
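
As a rough sketch of how such a training job could be started with the AWS SDK for Python (Boto3), the following builds the request for create_document_classifier in multi-class mode with separate training and testing datasets. The function names, classifier name, role ARN, and S3 URIs are placeholders chosen for illustration; the notebook in the repo may structure this differently.

```python
def build_classifier_request(name, role_arn, train_uri, test_uri=None):
    """Build keyword arguments for Comprehend's create_document_classifier."""
    input_config = {"DataFormat": "COMPREHEND_CSV", "S3Uri": train_uri}
    if test_uri:
        # Without a test set, Comprehend holds out part of the training data.
        input_config["TestS3Uri"] = test_uri
    return {
        "DocumentClassifierName": name,
        "DataAccessRoleArn": role_arn,
        "LanguageCode": "en",
        "Mode": "MULTI_CLASS",  # one class per document
        "InputDataConfig": input_config,
    }


def start_classifier_training(request):
    """Submit the request; needs boto3 and AWS credentials (not run here)."""
    import boto3  # imported lazily so the sketch loads without the SDK

    response = boto3.client("comprehend").create_document_classifier(**request)
    return response["DocumentClassifierArn"]


# Placeholder values for illustration only.
request = build_classifier_request(
    name="insurance-type-classifier",
    role_arn="arn:aws:iam::111122223333:role/ExampleComprehendRole",
    train_uri="s3://example-bucket/train/classifier.csv",
    test_uri="s3://example-bucket/test/classifier.csv",
)
```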

Train the Amazon Comprehend custom entity recognizer (NER) using training data

The training dataset for Amazon Comprehend Custom Entity Recognition (NER) can be prepared in one of two different ways:

  • Annotations – Provides a dataset that contains the annotated entities for model training
  • Entity lists (plain text only) – Provides a list of entities and their label type (such as “Insurance company names”) and a set of unannotated documents containing those entities for model training

For more information, refer to Preparing entity recognizer training data.

When training a model using entity list, we need to provide two pieces of information: a list of entity names with their associated custom entity types and a collection of unannotated documents in which the entities appear.

Automatic training requires having two types of information: sample documents and the entity list or annotations. Once the recognizer is trained, you can use it to detect custom entities in your documents. You can quickly analyze a small body of text in real time, or you can analyze a large set of documents with an asynchronous job.

You can prepare separate training and testing datasets for Amazon Comprehend custom entity recognizer training and model evaluation. Alternatively, you can provide a single dataset for both training and testing, in which case Amazon Comprehend automatically selects 10% of the provided dataset to use as testing data. In the below example, we specified the training dataset as Documents.S3Uri under InputDataConfig.

The following example shows a CSV file containing the list of entities:

Once the custom entities (NER) model is trained, it will be able to extract the various entities like “PAYOUT”, “INSURANCE_COMPANY”, “LAW_FIRM”, “POLICY_HOLDER_NAME”, and “POLICY_NUMBER”.
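
A minimal sketch of the corresponding Boto3 request for training with an entity list is shown below. The entity types are the ones named in this post; the function names, recognizer name, role ARN, and S3 URIs are hypothetical placeholders, and the notebook may pass additional options.

```python
ENTITY_TYPES = [
    "PAYOUT",
    "INSURANCE_COMPANY",
    "LAW_FIRM",
    "POLICY_HOLDER_NAME",
    "POLICY_NUMBER",
]


def build_recognizer_request(name, role_arn, documents_uri, entity_list_uri):
    """Build keyword arguments for Comprehend's create_entity_recognizer."""
    return {
        "RecognizerName": name,
        "DataAccessRoleArn": role_arn,
        "LanguageCode": "en",
        "InputDataConfig": {
            "EntityTypes": [{"Type": entity_type} for entity_type in ENTITY_TYPES],
            # Unannotated documents in which the listed entities appear.
            "Documents": {"S3Uri": documents_uri},
            # CSV entity list mapping entity text to its custom type.
            "EntityList": {"S3Uri": entity_list_uri},
        },
    }


def start_recognizer_training(request):
    """Submit the request; needs boto3 and AWS credentials (not run here)."""
    import boto3  # imported lazily so the sketch loads without the SDK

    response = boto3.client("comprehend").create_entity_recognizer(**request)
    return response["EntityRecognizerArn"]


# Placeholder values for illustration only.
recognizer_request = build_recognizer_request(
    name="insurance-entity-recognizer",
    role_arn="arn:aws:iam::111122223333:role/ExampleComprehendRole",
    documents_uri="s3://example-bucket/train/documents.txt",
    entity_list_uri="s3://example-bucket/train/entity_list.csv",
)
```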

Create the Amazon Comprehend custom classifier and custom entities (NER) endpoints

Amazon Comprehend’s endpoints make your custom models available for real-time classification. After you create an endpoint, you can make changes to it as your business needs evolve. For example, you can monitor your endpoint utilization and apply auto scaling to automatically set endpoint provisioning to fit your capacity needs. You can manage all your endpoints from a single view, and when you no longer need an endpoint, you can delete it to save costs. Amazon Comprehend supports both synchronous and asynchronous options. If real-time classification isn’t required for your use case, you can submit a batch job to Amazon Comprehend for asynchronous data classification.

For this use case, you create an endpoint to make your custom model available for real-time analysis.

To meet your text processing needs, you assign inference units to the endpoint, and each unit allows a throughput of 100 characters per second. You can then adjust the throughput up or down.
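
The sizing arithmetic can be sketched as follows: with 100 characters per second per inference unit, the number of units is the peak character rate divided by 100, rounded up. The helper names and the example rate of 250 characters per second are assumptions for illustration; the request keys are those of the Boto3 create_endpoint call.

```python
import math

CHARS_PER_SECOND_PER_UNIT = 100  # throughput of one inference unit, per this post


def inference_units_needed(peak_chars_per_second):
    """Return the smallest number of inference units covering the peak rate."""
    if peak_chars_per_second <= 0:
        raise ValueError("peak_chars_per_second must be positive")
    return math.ceil(peak_chars_per_second / CHARS_PER_SECOND_PER_UNIT)


def build_endpoint_request(endpoint_name, model_arn, peak_chars_per_second):
    """Build keyword arguments for Comprehend's create_endpoint."""
    return {
        "EndpointName": endpoint_name,
        "ModelArn": model_arn,
        "DesiredInferenceUnits": inference_units_needed(peak_chars_per_second),
    }


# A hypothetical peak of 250 characters per second needs 3 inference units.
units = inference_units_needed(250)
print(units)
```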

Create and deploy a Lambda function for post extraction enrichment

The post-extraction Lambda function allows you to implement the logic to process the text extracted by Amazon Kendra from the ingested document. The post-extraction function we configured invokes Amazon Comprehend to detect custom entities in, and classify, the text extracted by Amazon Kendra. It then uses the results to update the document metadata, which is presented as facets in an Amazon Kendra search. The function code is embedded in the notebook. The PostExtractionLambda code works as follows:

  • Splits the page text into sections that do not exceed the maximum byte length limit of the Amazon Comprehend detect_entities API. (See Limits.)
    NOTE: The script uses a naive character length splitting algorithm for simplicity. Production use cases should implement overlapping or sentence boundary splits, based on UTF-8 byte length.
  • For each section of the text, calls the Amazon Comprehend real-time endpoints for custom entities and custom classifier to detect the following entity types: [“PAYOUT”, “INSURANCE_COMPANY”, “LAW_FIRM”, “POLICY_HOLDER_NAME”, “POLICY_NUMBER”, “INSURANCE_TYPE”].
  • Filters out detected entities that are below the confidence score threshold. We use a threshold of 0.50, which means only entities detected with more than 50% confidence are used. This can be tuned based on the use case and requirements.
  • Tracks the frequency count of each entity.
  • Selects only the top N (10) unique entities for each page, based on frequency of occurrence.
  • For document classification, the multi-class classifier assigns only one class for each document. In this Lambda function, the documents will be classified as Auto Insurance, Home Insurance, or Life Insurance.

#The function to read the input text and detect entities in it using Comprehend.
#It relies on names defined elsewhere in the notebook: compre (the boto3 Comprehend client),
#categories, compre_text_size, endpoint_custom_entity, min_score, elimit, and logger.
def entity_detector(doc_text):
    #Lists of recognized text strings, keyed by entity type
    entity_data = dict()
    #Upper-cased text strings already seen, keyed by entity type
    category_text = dict()
    #Frequency of each upper-cased text string, keyed by entity type
    text_frequency = dict()
    for et in categories:
        entity_data[ et ] = []
        category_text[ et ] = []
        text_frequency[ et ] = dict()

    #Make the detect_entities call in a loop to work within the text size limit
    for i in range(0, len(doc_text), compre_text_size):
        try:
            entities = compre.detect_entities(Text=doc_text[i:i+compre_text_size], LanguageCode='en', EndpointArn=endpoint_custom_entity)
        except Exception as ex:
            logger.info("Exiting - detect_entities terminated with exception: %s", ex)
            return []
        for e in entities["Entities"]:
            #For each of the recognized entities take only those that have confidence score higher than min_score,
            #are printable, don't contain quotes and are previously unseen
            if ((e["Score"] > min_score) and (e["Text"].isprintable()) and ('"' not in e["Text"]) and (e["Text"].upper() not in category_text[e["Type"]])):
                #Append the text to entity data to be used for a Kendra custom attribute
                entity_data[e["Type"]].append(e["Text"])
                #Keep track of text in upper case so that we don't treat the same text written in different cases differently
                category_text[e["Type"]].append(e["Text"].upper())
                #Start the frequency count so that we can later take the text with the highest frequency of occurrence
                text_frequency[e["Type"]][e["Text"].upper()] = 1
            elif (e["Text"].upper() in category_text[e["Type"]]):
                #The text was seen before, so only increase its frequency count
                text_frequency[e["Type"]][e["Text"].upper()] += 1
    #The Kendra attribute metadata JSON object to be populated
    metadata = dict()
    for et in categories:
        metadata[et] = []
        #Take at most elimit number of recognized text strings having the highest frequency of occurrence
        el = [pair[0] for pair in sorted(text_frequency[et].items(), key=lambda item: item[1], reverse=True)][0:elimit]
        for d in entity_data[et]:
            if (d.upper() in el):
                metadata[et].append(d)
    #List of metadata updates returned to Kendra (created locally here so the snippet is self-contained)
    metaUL = []
    for md in metadata:
        metaUL.append({
            "name": md,
            "value": {
                "stringListValue": metadata[md]
            }
        })
    return metaUL

Note that as of this writing, CDE supports only synchronous calls; if a call has to be asynchronous, an explicit wait loop is needed. For the post-extraction Lambda function, the maximum execution time is 1 minute. You can change the custom logic in the Lambda function to fit the requirements of your use case.
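
One such change is the text splitting mentioned in the note earlier. The following is a minimal sketch of a splitter that measures UTF-8 bytes rather than characters and prefers to cut at sentence ends. It is not the notebook's implementation: the function name, the simple punctuation-based sentence rule, and the sample text are assumptions, and it does not add overlap between chunks.

```python
import re


def split_by_utf8_bytes(text, max_bytes):
    """Split text into chunks of at most max_bytes UTF-8 bytes, preferring sentence ends."""
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4, the longest UTF-8 character")
    # Sentences keep their trailing whitespace, so joining the chunks restores the text.
    sentences = re.findall(r".+?(?:[.!?]+\s+|\Z)", text, flags=re.DOTALL)
    chunks, current, current_bytes = [], [], 0

    def flush():
        nonlocal current, current_bytes
        if current:
            chunks.append("".join(current))
        current, current_bytes = [], 0

    for sentence in sentences:
        size = len(sentence.encode("utf-8"))
        if current_bytes + size > max_bytes:
            flush()
        if size <= max_bytes:
            current.append(sentence)
            current_bytes += size
            continue
        # A single sentence longer than the limit is cut on character boundaries.
        for ch in sentence:
            ch_size = len(ch.encode("utf-8"))
            if current_bytes + ch_size > max_bytes:
                flush()
            current.append(ch)
            current_bytes += ch_size
    flush()
    return chunks


# Hypothetical sample text with non-ASCII characters.
sample = "Claim one was filed late. Claim two was paid. Ünïcödé names stay intact!"
pieces = split_by_utf8_bytes(sample, 30)
```

Each chunk can then be passed to the real-time endpoint in place of the fixed-size character slices.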

Create and populate the Amazon Kendra index

In this step, we will ingest the data to the Amazon Kendra index and make it searchable for the users. During the ingestion, we will use the Lambda function created in the previous step as a post extraction step and the Lambda function will call the custom classification and custom entity recognition (NER) endpoints to create the custom metadata fields.
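
To illustrate how a classification result could become a custom metadata field, the sketch below picks the highest-scoring class from a classify_document style response and wraps it in the same name/value shape that the entity function returns. The helper names and sample response are hypothetical, and the outer response layout (version, s3ObjectKey, metadataUpdates) reflects our understanding of the CDE Lambda data contract, so verify it against the Amazon Kendra documentation before relying on it.

```python
def top_class_attribute(classify_response, attribute_name="INSURANCE_TYPE"):
    """Turn the highest-scoring class into a Kendra metadata update, or None."""
    classes = classify_response.get("Classes", [])
    if not classes:
        return None
    best = max(classes, key=lambda item: item["Score"])
    return {"name": attribute_name, "value": {"stringListValue": [best["Name"]]}}


def build_cde_response(s3_object_key, metadata_updates):
    """Assemble the post-extraction Lambda response (layout is an assumption)."""
    return {
        "version": "v0",
        "s3ObjectKey": s3_object_key,
        "metadataUpdates": [update for update in metadata_updates if update],
    }


# Hypothetical classifier output for one document.
sample_response = {
    "Classes": [
        {"Name": "Home Insurance", "Score": 0.08},
        {"Name": "Auto Insurance", "Score": 0.90},
        {"Name": "Life Insurance", "Score": 0.02},
    ]
}
attribute = top_class_attribute(sample_response)
cde_response = build_cde_response("extracted/text_25.json", [attribute, None])
```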

The high-level steps to implement this solution are as follows:

  1. Create Amazon Kendra Index.
  2. Create Amazon Kendra Data source – There are different data sources which can be used to ingest dataset. In this post we are using an S3 bucket.
  3. Create Facets – Law_Firm, Payout, Insurance_Company, Policy_Number, Policy_Holder_Name, Insurance_Type with string type as ‘STRING_LIST_VALUE’.
  4. Create Kendra CDE and point it to the post-extraction Lambda function previously created.
  5. Perform the sync process to ingest the dataset.
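
Steps 3 and 4 can be sketched with Boto3 request structures as follows. The facet names are taken from step 3; the function names, ARNs, bucket name, and index ID are placeholders, and the attribute names should match whatever names the Lambda function emits in its metadata updates.

```python
FACET_NAMES = [
    "Law_Firm",
    "Payout",
    "Insurance_Company",
    "Policy_Number",
    "Policy_Holder_Name",
    "Insurance_Type",
]


def build_facet_updates(names=FACET_NAMES):
    """Build DocumentMetadataConfigurationUpdates entries for Kendra's update_index."""
    return [
        {
            "Name": name,
            "Type": "STRING_LIST_VALUE",
            "Search": {"Facetable": True, "Displayable": True},
        }
        for name in names
    ]


def build_cde_config(lambda_arn, s3_bucket, role_arn):
    """Build a CustomDocumentEnrichmentConfiguration with a post-extraction hook."""
    return {
        "PostExtractionHookConfiguration": {
            "LambdaArn": lambda_arn,
            "S3Bucket": s3_bucket,
        },
        "RoleArn": role_arn,
    }


def apply_facets(index_id, updates):
    """Register the custom attributes on the index (needs boto3 and credentials)."""
    import boto3  # imported lazily so the sketch loads without the SDK

    boto3.client("kendra").update_index(
        Id=index_id, DocumentMetadataConfigurationUpdates=updates
    )


facet_updates = build_facet_updates()
# Placeholder ARNs and bucket name for illustration only.
cde_config = build_cde_config(
    lambda_arn="arn:aws:lambda:us-east-1:111122223333:function:PostExtractionLambda",
    s3_bucket="example-cde-bucket",
    role_arn="arn:aws:iam::111122223333:role/ExampleKendraCdeRole",
)
```

The CDE configuration would be passed as CustomDocumentEnrichmentConfiguration when creating or updating the S3 data source.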

Once the sync is complete, the index is populated with the insurance data. Because the Amazon Kendra CDE invokes the post-extraction Lambda function during ingestion, you can filter searches based on the custom entity types and custom classification, which are stored as custom metadata fields.

Use the extracted entities to filter searches in Kendra

Now the index is populated and ready to use. In the Amazon Kendra console, choose Search Indexed Content under Data Management and do the following.

Query the following: List of insurance failed due to late filing?

The results show an answer from the HOME INSURANCE policy type, with text_18 and text_14 as the top results.

Choose “Filter search results” on the left. Now you will see all the Entity types and classification values extracted using Comprehend, and for each entity value and classification you will see the number of matching documents.

Under INSURANCE_TYPE, choose “Auto-Insurance”, and you will get an answer from the text_25 file.

Note that your results may vary slightly from the results shown in the screenshot.

Try searching with your own queries, and observe how the entities and document classification identified by Amazon Comprehend quickly allow you to:

  • See how your search results are distributed across the categories.
  • Narrow your search by filtering on any of the entity/classification values.
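
The same filtering can be done programmatically. The sketch below builds the arguments for the Amazon Kendra Query API with an attribute filter on a string-list field and a request for facet counts. The function names and index ID are placeholders; the query text, attribute key, and value mirror the console example above.

```python
def build_filtered_query(index_id, query_text, attribute, value, facet_keys=()):
    """Build keyword arguments for Kendra's query call with one facet filter."""
    params = {
        "IndexId": index_id,
        "QueryText": query_text,
        # ContainsAny matches documents whose string-list attribute holds the value.
        "AttributeFilter": {
            "ContainsAny": {"Key": attribute, "Value": {"StringListValue": [value]}}
        },
    }
    if facet_keys:
        # Ask Kendra to return document counts per value for these attributes.
        params["Facets"] = [{"DocumentAttributeKey": key} for key in facet_keys]
    return params


def run_query(params):
    """Run the query (needs boto3 and AWS credentials; not executed here)."""
    import boto3  # imported lazily so the sketch loads without the SDK

    return boto3.client("kendra").query(**params)


# Placeholder index ID for illustration only.
query_params = build_filtered_query(
    index_id="00000000-0000-0000-0000-000000000000",
    query_text="List of insurance failed due to late filing?",
    attribute="INSURANCE_TYPE",
    value="Auto-Insurance",
    facet_keys=("LAW_FIRM", "POLICY_NUMBER"),
)
```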

Clean up

After you have experimented with the search and tried the notebook provided in the GitHub repository, delete the infrastructure you provisioned in your AWS account to avoid any unwanted charges. You can run the cleanup cells in the notebook. Alternatively, you can delete the resources manually through the AWS console:

  • Amazon Kendra Index
  • Comprehend custom classifier and custom entity recognition (NER) endpoints
  • Comprehend custom classifier and custom entity recognition (NER) custom models
  • Lambda function
  • S3 bucket
  • IAM roles and policies

Conclusion

In this post, we showed how Amazon Comprehend custom entity recognition and custom classification enable Amazon Kendra search, powered by the CDE feature, to help end-users perform better searches on structured and unstructured data. Custom entity recognition and custom classification make this approach useful for different use cases and domain-specific data. For more information about how to use Amazon Comprehend, refer to Amazon Comprehend developer resources, and for Amazon Kendra, refer to Amazon Kendra developer resources.

Give this solution a try for your use case. We invite you to leave your feedback in the comments section.


About the Authors

Amit Chaudhary is a Senior Solutions Architect at Amazon Web Services. His focus area is AI/ML, and he helps customers with generative AI, large language models, and prompt engineering. Outside of work, Amit enjoys spending time with his family.

Yanyan Zhang is a Senior Data Scientist in the Energy Delivery team with AWS Professional Services. She is passionate about helping customers solve real problems with AI/ML knowledge. Recently, her focus has been on exploring the potential of Generative AI and LLM. Outside of work, she loves traveling, working out and exploring new things.

Nikhil Jha is a Senior Technical Account Manager at Amazon Web Services. His focus areas include AI/ML, and analytics. In his spare time, he enjoys playing badminton with his daughter and exploring the outdoors.

Foundational data protection for enterprise LLM acceleration with Protopia AI

This post is written in collaboration with Balaji Chandrasekaran, Jennifer Cwagenberg, Andrew Sansom, and Eiman Ebrahimi from Protopia AI.

New and powerful large language models (LLMs) are changing businesses rapidly, improving efficiency and effectiveness for a variety of enterprise use cases. Speed is of the essence, and adoption of LLM technologies can make or break a business’s competitive advantage. AWS is especially well suited to provide enterprises the tools necessary for deploying LLMs at scale to enable critical decision-making.

In their implementation of generative AI technology, enterprises have real concerns about data exposure and ownership of confidential information that may be sent to LLMs. These concerns of privacy and data protection can slow down or limit the usage of LLMs in organizations. Enterprises need a responsible and safer way to send sensitive information to the models without needing to take on the often prohibitively high overheads of on-premises DevOps.

The post describes how you can overcome the challenges of retaining data ownership and preserving data privacy while using LLMs by deploying Protopia AI’s Stained Glass Transform to protect your data. Protopia AI has partnered with AWS to deliver the critical component of data protection and ownership for secure and efficient enterprise adoption of generative AI. This post outlines the solution and demonstrates how it can be used in AWS for popular enterprise use cases like Retrieval Augmented Generation (RAG) and with state-of-the-art LLMs like Llama 2.

Stained Glass Transform overview

Organizations seek to retain full ownership and control of their sensitive enterprise data. This is a pillar of responsible AI and an emerging data protection and privacy requirement above and beyond basic security and legal guarantees of LLM providers.

Although enterprise business units want to utilize LLMs for various tasks, they are also concerned about trade secrets, intellectual property, and other proprietary information leaking through data sent to these models. At the same time, enterprise security, compliance, data management, and information offices are apprehensive of exposing or leaking plain text customer information or other regulated data outside of the enterprise. AWS and Protopia AI are partnering to deliver the critical component that solves this common enterprise customer need.

Protopia AI’s Stained Glass Transform (SGT) solves these challenges by converting unprotected enterprise data to a randomized re-representation, referred to as RmoRed data, as shown in the following figure. This representation is a stochastic embedding of the original data, preserving the information the target LLM needs to function without exposing sensitive prompts or queries, context, or fine-tuning data. This re-representation is a one-way transformation that can’t be reversed, ensuring holistic privacy of enterprise data and protection against leaking plain text sensitive information to LLMs. SGT’s applicability is not limited to language models. Randomized re-representations can also be generated for visual and structured data. The name Stained Glass Transform is rooted in the visual appearance of randomized re-representations of visual data that can resemble viewing the data through stained glass, as demonstrated in this US Navy use case.

SGT works with state-of-the-art LLMs such as Llama 2. The following figure shows an example of applying SGT to a Llama 2 model for instruction following while adding a layer of protection to the instruction and context. The left side of the figure shows an example of a financial document as context, with the instruction asking the model to summarize the document. On the bottom left, the response generated by Llama 2 when operating on the raw prompt is shown. When using SGT, the embeddings associated with this prompt are transformed on the client side into stochastic embeddings, as described in more detail later in this post. The bottom right shows Llama 2 can still generate a correct response if the RmoRed data (post-transformation embeddings) are sent instead of the unprotected embeddings. The top right shows that if the RmoRed data leaked, a reconstruction of the original prompt would result in unintelligible text.

To create an SGT for a given model such as Llama 2, Protopia AI provides a lightweight library called the Stained Glass SDK, which is an extension of PyTorch. As shown in the following figure, after an SGT is created, it can be integrated into deployment pipelines in multiple ways. The transform that is created from the SDK can be deployed locally, in a hybrid setup, or completely on the cloud. This is possible because SGT is designed to be a lightweight process requiring very little compute resources and as such has minimal impact on the inference critical path. Another key evaluation is retention of model accuracy using re-represented data. We observe that across different data types and model variations, accuracy is retained within desirable tolerance limits when using re-represented data.

These deployment options, together with the retained accuracy, allow all the stakeholders within an enterprise organization to adopt SGT with confidence. To further protect the output of the LLM, Protopia AI can encode query outputs to a representation whose decoder is only available to the enterprise data owner.

Solution overview

The previous section described how you can use Stained Glass Transform in a variety of architectures. The following figure details the steps involved in creating, deploying, and using SGT for LLMs:

  • SGT creation – The team that trains the baseline LLM foundation model (providers of proprietary LLMs, cloud service provider, or enterprise ML teams creating their own LLMs) runs Protopia AI’s Stained Glass SDK software without altering their existing practices for training and deploying the LLM. After the foundation model training is complete, the SDK runs as an optimization pass over the language model to compute the SGT. This optimization pass is delivered through an extension to PyTorch. The SDK wraps the foundation model and mathematically discovers a unique Stained Glass Transform for that LLM. Further details of the underlying math can be found in the accompanying whitepaper. Note that because the team training the LLM itself is also running the Stained Glass SDK, there is no exposure or sending of model weights that is necessary for this step to be completed.
  • SGT release and deployment – The SGT that is output from the earlier optimization step is deployed as part of the data pipeline that feeds the trained LLM. As described in the previous section, the SGT sits on the enterprise client side.
  • SGT use – The SGT runs on the prompts created by the enterprise and generates protected prompts, which are sent to the deployed LLM. This enables the enterprise to retain ownership of their sensitive queries and context. Using Protopia AI Stained Glass, the unprotected sensitive data does not leave the enterprise’s site or trust zone.

You can use the Stained Glass SDK to create an SGT in multiple ways. For example, you can use the Stained Glass SDK in self-managed machine learning (ML) environments with Amazon Elastic Kubernetes Service (Amazon EKS) for training and inferencing or within Amazon Elastic Compute Cloud (Amazon EC2) directly. Alternatively, the SDK can run within Amazon SageMaker to create an SGT for a given trained model. Transforming the input for deployment during inference from the client is independent of the chosen deployment implementation.

The following figure illustrates a possible implementation in a self-managed ML environment where training a Stained Glass Transform is performed on Amazon EKS.

In this workflow, a container is created using the Stained Glass SDK and deployed to Amazon Elastic Container Registry (Amazon ECR). This container is then deployed on Amazon EKS to train an SGT that is saved to Amazon Simple Storage Service (Amazon S3). If you’re using Amazon EC2, you can train a transformation directly on your instance as part of your ML setup. The Stained Glass SDK can run on a variety of instance types, including Amazon P5, P4, or G5 instance families, based on your base LLM requirements. After the LLM is deployed to be used for inference, the client application uses the created SGT, which is a lightweight operation, to transform prompts and context before sending them to the LLM. By doing so, only transformed data is exposed to the LLM, and ownership of the original input is retained on the client side.

The following figure demonstrates how you can train a transform and run inferencing on SageMaker.

The creation of the SGT follows a similar path as the Amazon EKS setup by ingesting the training data from Amazon S3, training an SGT on a container, and saving it to Amazon S3. You can use the Stained Glass SDK in your existing SageMaker setup with Amazon SageMaker Studio, SageMaker notebooks, and a SageMaker training job. The LLM is hosted as a SageMaker endpoint that is accessible by the client application. The inferencing for the client application is also identical to the Amazon EKS setup, except for what is serving the model.

Randomized re-representations to protect LLM prompts and fine-tuning data

This section covers a variety of use cases demonstrating how randomized re-representation protects LLM prompts. The examples illustrate major implications for enterprise generative AI efforts: opening new doors to AI use cases, accelerating speed to market while properly protecting enterprise data, and retaining ownership of the sensitive data required for use in LLM prompts.

RAG use case

A popular enterprise use case for LLMs is Retrieval Augmented Generation (RAG). The following figure shows an illustrative example where the prompts and sources are protected using Stained Glass. The left side of the figure shows the unprotected prompts and source information. In an enterprise implementation of RAG, the sources could include sensitive information such as enterprise trade secrets, intellectual property, or financial information. The right side shows the best possible reconstruction in human readable text from the RmoRed prompts created by the SGT.

We can observe that even in the best possible reconstruction, the information is completely obfuscated. However, the response from the model with and without the transformation is the same, with pointers to the original source documents, thereby preserving the accuracy of both the question and source documents while performing this popular enterprise use case.

Broad applicability across LLMs and languages

One of the highlights of the Stained Glass SDK is that it’s highly resilient to model advancements and adaptable to state-of-the-art models such as Llama 2. The following figure shows an SGT that was created on a Llama 2 LLM that was previously fine-tuned for working with Japanese text. This example further illustrates that SGTs can be created and applied for any language and that even inputs for fine-tuned models can be transformed. The general applicability of SGT is driven by the robust foundation of the Stained Glass SDK being model- and data-agnostic.

Protecting fine-tuning data as well as prompts

Stained Glass Transform is not limited solely to protecting data at inference time; it can also protect data used to fine-tune a foundation model. The process for creating the transformation for fine-tuning datasets is the same as that explained in the solution architecture section earlier in this post. The transformation is created for the foundation model to be fine-tuned without accessing the fine-tuning data. After the SGT has been created and trained for the foundation model, the fine-tuning dataset is transformed to randomized re-representations that will then be used to fine-tune the foundation model. This process is explained in more detail in the accompanying whitepaper.

In the following example, an enterprise customer needed to fine-tune an existing model for network log anomaly detection. They used Stained Glass to transform the sensitive fine-tuning dataset to randomized embeddings, which were used to fine-tune their foundation model. They found that the detection model that was fine-tuned on the transformed representations performed with almost identical accuracy compared to the hypothetical scenario of fine-tuning the foundation model on the unprotected fine-tuning dataset. The following table shows two examples of plain text data records from the fine-tuning dataset and a reconstruction to text of those same data records from their transformed representations.

Under the hood of Stained Glass Transform for LLMs

When applied to computer vision, SGT operates on input pixel features, and for LLMs, it operates at the embedding level. To highlight how Stained Glass Transform works, imagine the prompt embeddings as a matrix, as illustrated on the left of the following figure. In each entry, there is a deterministic value. This value can be mapped to the original data, exposing the unprotected prompt. Stained Glass Transform converts this matrix of deterministic values to a matrix whose elements are a cloud of possibilities.

The transformed prompt is rendered by sampling noise from probability distributions defined by the SGT and adding the sampled noise to the deterministic embeddings, which randomizes the original prompt values irreversibly. The model still understands the randomized re-represented prompt at the mathematical level and can carry out its task accurately.
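The idea of adding sampled noise to deterministic embeddings can be illustrated with a minimal sketch. This is not Protopia AI's implementation: the function name `randomize_embeddings`, the choice of independent Gaussian noise per entry, and the toy values for `embeddings`, `means`, and `stds` are all assumptions made for illustration. In a real SGT, the distributions would be learned for a specific model rather than set by hand.

```python
import random


def randomize_embeddings(embeddings, means, stds, rng):
    """Return a noisy copy of `embeddings`.

    Each entry gets its own noise sample drawn from a Gaussian with the
    matching mean and standard deviation (hypothetical stand-ins for the
    distributions an SGT would define).
    """
    transformed = []
    for row, mean_row, std_row in zip(embeddings, means, stds):
        transformed.append([
            value + rng.gauss(mu, sigma)
            for value, mu, sigma in zip(row, mean_row, std_row)
        ])
    return transformed


# Toy prompt: 3 tokens, each with a 4-dimensional embedding (made-up values).
embeddings = [
    [0.10, -0.20, 0.30, 0.05],
    [0.40, 0.00, -0.10, 0.25],
    [-0.30, 0.15, 0.20, -0.05],
]

# Assumed noise parameters: zero mean and a fixed spread for every entry.
means = [[0.0] * 4 for _ in embeddings]
stds = [[0.5] * 4 for _ in embeddings]

# Two independent samples of the same prompt give different matrices,
# which is the "cloud of possibilities" described above.
sample_a = randomize_embeddings(embeddings, means, stds, random.Random(0))
sample_b = randomize_embeddings(embeddings, means, stds, random.Random(1))
```

In this sketch the matrix shape is preserved, so a downstream model could consume the noisy embeddings in place of the originals. Whether the model still performs its task accurately depends on how the noise distributions are chosen, which is the part the SGT training process handles and this example does not attempt.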

Conclusion

This post discussed how Protopia AI’s Stained Glass Transform decouples raw data ownership and protection from the ML operations process, enabling enterprises to retain ownership and maintain privacy of sensitive information in LLM prompts and fine-tuning data. By using this state-of-the-art data protection for LLM usage, enterprises can accelerate adoption of foundation models and LLMs by worrying less about exposure of sensitive information. By safely unlocking the value in real enterprise data, organizations can enable the promised efficiencies and business outcomes of LLMs more efficiently and quickly. To learn more about this technology, you can find further reading in the accompanying whitepaper and connect with Protopia AI to get access and try it on your enterprise data.

About Protopia AI

Protopia AI is a leader in data protection and privacy-preserving AI/ML technologies based in Austin, Texas, and specializes in enabling AI algorithms and software platforms to operate without the need to access plain text information. Over the past 2 years, Protopia AI has successfully demonstrated its flagship Stained Glass Transform product across a variety of ML use cases and data types with the US Navy, leading financial services, and global technology providers.

Protopia AI works with enterprises, generative AI and LLM providers, and Cloud Service Providers (CSPs) to enable maintaining ownership and confidentiality of enterprise data while using AI/ML solutions. Protopia AI has partnered with AWS to deliver a critical component of data protection and ownership for enterprise adoption of generative AI, and was one of 21 startups selected for the inaugural AWS Generative AI Accelerator in 2023.


About the authors

Balaji Chandrasekaran is the VP for Go-to-Market & Customer Enablement at Protopia AI, where he works closely with clients to leverage AI in their business while prioritizing data protection and privacy. Prior to Protopia AI, Balaji was the Product Lead for AI Solutions at Infor, developing value-centric products while acting as a trusted partner for enterprise customers across diverse industries. Outside work, he enjoys music, hiking, and traveling with family.

Jennifer Cwagenberg leads the engineering team at Protopia AI and works to ensure that the Stained Glass technology meets the needs of their customers to protect their data. Jennifer has prior security experience from working at Toyota in their Product Cybersecurity Group, managing cloud workloads at N-able, and being responsible for data at Match.com.

Andrew Sansom is an AI Solutions Engineer at Protopia AI where he helps enterprises use AI while preserving private and sensitive information in their data. Prior to Protopia AI, he worked as a Technical Consultant focused on enabling AI solutions for clients across many industries including Finance, Manufacturing, Healthcare, and Education. He also taught Computer Science and Math to High School, University, and Professional students.

Eiman Ebrahimi, PhD, is a co-founder and the Chief Executive Officer of Protopia AI. Dr. Ebrahimi is passionate about enabling AI to enrich the human experience across different societal and industry verticals. Protopia AI is a vision for enhancing the lens through which AI observes the necessary and quality data it needs while creating novel capabilities for safeguarding sensitive information. Prior to Protopia AI, he was a Senior Research Scientist at NVIDIA for 9 years. His work at NVIDIA research aimed to solve problems of accessing massive datasets in ML/AI. He also co-authored peer-reviewed publications on how to utilize the power of thousands of GPUs to make training large language models feasible.

Rohit Talluri is a Generative AI GTM Specialist at Amazon Web Services (AWS). He is partnering with top generative AI model builders, strategic customers, key AI/ML partners, and AWS Service Teams to enable the next generation of artificial intelligence, machine learning, and accelerated computing on AWS. He was previously an Enterprise Solutions Architect, and the Global Solutions Lead for AWS Mergers & Acquisitions Advisory.

Exploring LLMs’ potential to help facilitators enhance online healthcare communities

This research paper was presented at the Fourth African Human Computer Interaction Conference (AfriCHI 2023), the pan-African conference on interactive digital technology design.

Online health communities can be a lifeline for people seeking healthcare support, enabling them to share experiences, ask questions, and receive help. These are particularly vital in low-and-middle-income countries (LMICs), where access to quality healthcare can be limited and online health communities function as a doorway for receiving expert advice and accessing trustworthy content. One platform that is widely used for this purpose is WhatsApp due to its popularity and ability to host facilitated communities for specific groups, like patients affiliated with a particular clinic.

For all their benefits, online health communities also face challenges due to the myriad responsibilities placed on facilitators and an equal lack of support for them. Facilitators must answer questions, respond to ongoing discussions, and review reports. Facilitation requires staying abreast of ongoing chat threads, verifying facts, and generally just being available. Given that most healthcare professionals already have a full day of in-person healthcare work, facilitation occurs during lunch breaks, evenings, and even mornings before the workday begins.

Our paper, “Can Large Language Models Support Medical Facilitation Work? A Speculative Analysis,” presented at AfriCHI 2023, discusses research conducted in collaboration with the University of Washington, where we examined facilitated WhatsApp groups created for young people living with HIV in informal settlements in Kenya. Facilitation involved moderating chats, providing emotional support, conducting administrative tasks, sharing information, and resolving conflicts. Because many discussions occurred at night, facilitators struggled to keep up with the chats, often missing important questions or responding to them a few days after they were posted. Facilitators also found it difficult to defuse tensions, which occurred from time to time.

LLMs’ potential in supporting online health facilitators

To help resolve these challenges, we explored ways large language models (LLMs) could potentially support facilitators, for example, by flagging important messages and helping with content authoring. LLMs’ language translation capabilities and capacity to answer questions and summarize information made them great candidates for online health communities, with the understanding that facilitators should always verify the content that LLMs create. To explore their potential, we tested their application on chat log data. We concluded that an LLM-enabled copilot could help facilitators in several ways, such as:

  • Coproducing compelling content: LLMs could help facilitators create educational and informative content for group members. They can summarize frequently asked questions, patient stories, and best practices for managing chronic conditions.
  • Summarizing messages: LLMs could summarize long discussions in the chat, making it easier for facilitators to get up to date and identify important issues. Summarization can also help participants who need to be offline and might otherwise miss important information.
  • Providing recommendations: LLMs could help facilitators conduct research when answering questions. However, facilitators must exercise due diligence and verify any suggestions the LLM makes.
  • Performing sentiment analysis: LLMs could flag potential trouble spots in messages, such as declines in mental health, tension among participants, harmful advice, and misinformation.
  • Assigning badges: LLMs could assign badges to group members in recognition for participating in discussions, completing tasks, or achieving milestones. This could help to motivate and engage members.

Importance of human facilitation

While LLMs offer numerous potential benefits for healthcare facilitation, it’s important to consider their challenges and limitations. We strongly believe that LLMs should be used to augment, not replace, human facilitation. One crucial reason is that this technology cannot provide the emotional support essential in these groups. Another challenge involves the potential for bias and harm. LLMs are trained on massive datasets of text and code, which might contain harmful biases and stereotypes. Additionally, LLMs can produce errors when dealing with content from outside the training data, such as cultural backgrounds that are underrepresented in this data.

Our research shows that the benefits these groups provide lie beyond merely providing information. Their success, gauged by participation levels, perceived value by members, and adherence to medical protocols, is attributed not only to the facilitators’ expertise but also to their empathy, humor, and care. These are human qualities that LLMs cannot replace.

Looking forward

When used to augment and support existing medical professionals, LLMs show promise in healthcare solutions, such as those for patients with chronic diseases in LMICs. We recommend that future research and practice in this area prioritize the following:

  • Developing and testing LLM-enabled copilot systems that are tailored to specific patient populations and online health communities.
  • Ensuring that design supports medical professionals, taking special care to preserve their agency.
  • Designing copilot systems so that users can easily evaluate output as well as identify and correct erroneous content.
  • Developing guidelines and regulations to ensure quality and safety when using LLMs for healthcare purposes.

Overall, the use of LLMs to support the work of online health community facilitation is an exciting new area of research. By making the facilitators’ tasks easier, they can pave the way for groups supporting more patients, improve adherence to medical protocols, and enhance well-being. While our research focused on a specific type of WhatsApp group, the potential of LLMs reaches far beyond. These models have the potential to support facilitators of online health communities across a diverse range of platforms.

The post Exploring LLMs’ potential to help facilitators enhance online healthcare communities appeared first on Microsoft Research.

Collaborators: Teachable AI with Cecily Morrison and Karolina Pakėnaitė

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with.

In this episode, Gretchen Huizinga speaks with Cecily Morrison, MBE, a Senior Principal Research Manager at Microsoft Research, and Karolina Pakėnaitė, who also goes by Caroline, a PhD student and member of the citizen design team working with Morrison on the research project Find My Things. An AI phone application designed to help people who are blind or have low vision locate their personal items, Find My Things is an example of a broader research approach known as Teachable AI. Morrison and Pakėnaitė explore the Teachable AI goal of empowering people to make an AI experience work for them. They also discuss how “designing for one” when it comes to inclusive design leads to innovative solutions and what they learned about optimizing these types of systems for real-world use (spoiler: it’s not necessarily more or higher-quality data).

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

CECILY MORRISON: One of the things about Teachable AI is that it’s not about the AI system. It’s about the relationship between the user and the AI system. And the key to that relationship is the mental model of the user. They need to make good judgments about how to give good teaching examples if we want that whole cycle between user and AI system to go well.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research Podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.

[MUSIC FADES]

Today I’m talking to Dr. Cecily Morrison, MBE, a Senior Principal Research Manager at Microsoft Research, and Karolina Pakėnaitė, a PhD student and a participant on the citizen design team for the Teachable AI research project Find My Things. Cecily and Karolina are part of a growing movement to bring accessible technologies to people with different abilities by closely collaborating with those communities during research and development. Cecily, Karolina, welcome to Collaborators!


CECILY MORRISON: Thank you, Gretchen.

KAROLINA PAKĖNAITĖ: Yeah, thank you.

HUIZINGA: Before we hear more about Find My Things, let’s get to know the both of you. And, Cecily, I’ll start with you. Give us a brief overview of your background, including your training and expertise, and what you’re up to in general right now. We’ll get specific shortly, but I just want to have sort of the umbrella of your raison d’être, or your reason for research being, as it were.

MORRISON: Sure, I’m a researcher in human-computer interaction with a very specific focus on AI and inclusion. Now this for me brings together an undergraduate degree in anthropology—understanding people—a PhD in computer science—understanding computers and technology—as well as a life role as a parent of a disabled child. And I’m currently leading a team that’s really trying to push the boundaries of what’s possible in human-AI interaction and motivated by creating technologies that lead us to a more inclusive world.

HUIZINGA: As a quick follow-up, Cecily, for our non-UK listeners, tell us what MBE stands for and why you were awarded this honor.

MORRISON: Yes, MBE. I also had to look it up when I first received the, uh, the award. [LAUGHTER] It stands for Member of the British Empire, and it’s part of the UK honor system. My MBE was awarded in 2020 for services to inclusive design. Now much of my career at Microsoft Research has been dedicated to innovating inclusive technology and then ensuring that it gets into the hands for those whom we made it for.

HUIZINGA: Right. Was there a big ceremony?

MORRISON: Things were a little bit different during the, the COVID times, but I did have the honor of going to Buckingham Palace to receive the award. And it was a wonderful time bringing my mother and my manager, uh, the important women around me, who’ve made it possible for me to do this work.

HUIZINGA: That’s wonderful. Well, Karolina, let’s talk to you for a minute here. You’re one of the most unique guests we’ve ever had on this podcast. Tell us a bit about yourself. Obviously, we’d like to know where you’re studying and what you’re studying, but this would be a great opportunity to share a little bit about your life story, including the rare condition that brought you to this collaboration.

PAKĖNAITĖ: Thank you so much again for having me. What an amazing opportunity to be here on the podcast. So I’m a PhD student at the University of Bath looking into making visual photographs accessible through text. Maybe you can tell from my speech that I am deaf-blind. So I got diagnosed with Usher syndrome type 2A at the age of 19, which means that I was born hard of hearing but then started to lose sight just around my early 20s. It has been a journey accepting this condition, but it’s also brought me some opportunities like becoming part of this collaboration for Microsoft Research project.

HUIZINGA: Karolina, a quick follow-up for you. Because of the nature of your condition, you’ve encountered some unique challenges, um, one of which made the news a couple of years ago. Can you talk a little bit about how perceptions about people with varying degrees of disability can cause skepticism, both from others and in fact, as you’ve pointed out, yourself? What can we learn about this here?

PAKĖNAITĖ: Yeah, so I have experienced many misunderstandings, and I know I’m not alone. So I have tunnel vision, a progressive condition at the stage where my specialists have registered me as blind instead of partially sighted. My central sight is still excellent, so that means I can still make eye contact, read books, do photography. Some people even tell me that I don’t look blind, but what does that even mean? [LAUGHTER] So since my early 20s, I became very, very clumsy. I stepped over children, walked into elderly, stepped on cat tails, experienced too many near-miss car accidents. So my brain no longer processes the world in the same way as before. But, yeah, for the longest time in my sight-loss journey, I felt like I had imposter syndrome, being completely skeptical about my own diagnosis despite the clumsy experiences, extensive eye tests, and genetic confirmation. I think the major reason is because of a lack of representation of the blind community in the media. Blindness is not black and white. Statistically, most of us have some remaining vision. Disability is not about having a certain look. This also applies to people with some form of visual impairment. I love it, how I can … how there’s so many more new Instagrammers and YouTubers who are just like me, but I still think there is a long way to go before having disability representation becoming a norm for greater understanding and inclusivity.

HUIZINGA: You know, I have to say, this is a great reminder that there is a kind of a spectrum of ability, and that we should be gracious to people as opposed to critical of them. So, um, thank you so much for that understanding that you bring to this, Karolina. Before we get into specifics of this collaboration—and that’s what we’re here for on this podcast—I think the idea of Teachable AI warrants some explication. So, Cecily, what is Teachable AI, and why is it an important line of research, including its applications in things like Find My Things?

MORRISON: Gretchen, that’s a great question. Teachable AI enables users to provide examples or higher-level constraints to an AI model in order to personalize that AI system to meet their own needs. Now most people are familiar with personalization. Our favorite shopping site or entertainment service offers us personalized suggestions. But we don’t always have a way to shape those suggestions. So you can imagine it’s pretty annoying, for example, if you keep being offered nappies by your favorite shopping service because you’ve been buying them for a friend, but actually, you don’t have or even plan to have a baby. So now Teachable AI gives us, the user, agency in personalizing that AI system to make a choice about what are the things you want to be reflected in yourself, your identity, when you work or interact with that AI system? Now this is really important for AI systems that enable inclusion. So if we consider disability to be a mismatch between a person’s capabilities and their environment, then AI has a really significant role to play in reducing that mismatch. However, as we were working on this, we soon discovered that the number of potential mismatches between a person and their environment is incredibly large. I mean, it’s like the number of stars, right.

HUIZINGA: Right, right.

MORRISON: Because disability is a really heterogeneous group. But then we say, oh, well, let’s just consider people who are blind. Well, as Karolina has just shown us, um, even people who are blind are very, very diverse. So there are people with different amounts of vision or different types of vision. People who have different … experience the world with vision or without. People can lose their vision later in life. They can be born blind. People have different personalities. Some people are happy to go with whatever. Some people not so much.

HUIZINGA: Right.

MORRISON: People are from different cultures. Maybe they, they are used to being in an interdependent context. Other people might have intersecting disabilities like deaf-blindness and have, again, its own set of needs. So as we got into building AI for accessibility and AI for inclusion more generally, we realized that we needed to figure out how can we make AI systems work for individuals, not quote-unquote “people with disabilities”? So we focused on Teachable AI so that each user could shape the AI system to work for their own needs as an individual in a way that they choose, not somebody else. So Find My Things is a simple but working example of a Teachable AI system. And in this example, people can personalize an object finder or object detector for the personal items that matter to them. And they can do this by taking four videos of that personal item that’s important to them and then training, on their phone, a little model that will then recognize those items and guide them to those items. So you might say, well, recognizing objects with phone, we can do that now for a couple of years. And that’s very true. But much of what’s been recognizable wasn’t necessarily very helpful for people who are blind and low vision. Now it’s great if you can recognize doors, chairs, but carnivores and sombrero hats? [LAUGHTER] You know, perhaps this is less handy on a day-to-day basis. But your own keys, your friend’s front door, your guide cane, maybe even the TV remote that somebody’s always putting somewhere else. I mean these are the things that people want to keep track of. And each person has their own set of things that they want. So the Find My Things research prototype allows people to choose what they want to train or to teach to their phone and then be able to teach it and to find those things.

HUIZINGA: OK, so just to clarify, I have my phone. I’ve trained it to find certain objects that I want to find. What’s the mechanism that I use to say, what, you know … do you just say, “Find my keys,” and your phone leads you there through beeps or, you know, Marco Polo? Closer? Warmer?

MORRISON: Sure, how, how does it work?

HUIZINGA: Yeah!

MORRISON: Well, that’s a great question. So you then have a list of things that you can find. So for most people, there’s five or 10 things that are pretty important to them. And then you would find that … then you would scan your phone around the room. And you need to be within sort of 4 to 6 meters of something that you want to find. So if, if it’s in your back studio in the garden, it’s not going to find it. It’s not telepathic in that regard. It’s a computer vision system using vision. If it’s underneath your sofa, you probably won’t find it either. But we found that with all things human-AI interaction, we, we rely on the interaction between the person and the AI to make things work. So most people know where things might be. So if you’re looking for a TV remote, it’s probably not in the bathtub, right? It’s probably going to be somewhere in the living room, but, you know, your, your daughter or your brother or your housemate might have dropped it on the floor; they might have accidentally taken it into the kitchen. But you probably have some good ideas of where that thing might be. So this is then going to help you find it a little bit faster so you don’t need to get on your hands and knees and feel around to where it is.

HUIZINGA: Gotcha. The only downside of this is “find my phone,” which would help me find my things! [LAUGHTER] Anyway, that’s all …

MORRISON: Well, well, I think Apple has solved that one.

HUIZINGA: They do! They have, they have an app. Find My phone. I don’t know how that works. Well, listen, let’s talk about the collaboration a bit and, and talk about the meetup, as I say, on how you started working together. I like to call this bit “how I met your mother” because I’m always interested to hear each side of the collaboration story. So, Karolina, why don’t you take the lead here and then Cecily can fill in the blanks from her side on how you got together.

PAKĖNAITĖ: Um, yeah, so I found this opportunity to join this collaboration for Microsoft Research project as a citizen designer through an email newsletter from a charity, VICTA. From the newsletter, it looked like it was organized in a way where you were way more than just a participant for another research project. It looked like an amazing opportunity to actually get some experiences and skills. So gaining just as much as giving. So, yeah, I thought that I shouldn’t miss out.

HUIZINGA: So you responded to the email, “Yeah, I’m in.”

PAKĖNAITĖ: Yeah.

HUIZINGA: Cecily, what, what was going on from your side? How did you put this out there with this charity and bring this thing together?

MORRISON: So VICTA is a fantastic charity in the UK that works with, uh, blind and low vision young people up to the age of 30. And they’re constantly trying to bring educational and meaningful experiences to the people that they serve. And we thought this would be a great moment of collaboration where we could bring an educational experience about learning how to do design and they could help us reach out to the people who might want to learn about design and might want to be part of this collaboration.

HUIZINGA: So Karolina was one of many? How many other citizen designers on this project did you end up with?

MORRISON: Oh, that’s a great question. We had a lot of interest, I do have to say, and from there, we selected eight citizen designers from around the UK who were willing to make the journey to Cambridge and work with us over a period of almost six months. People came up to us about monthly, although we did some virtual ones, as well.

HUIZINGA: Well, Cecily, let’s talk about this idea of citizen designers. I, I like that term very much. Inclusive design isn’t new in computer-human interaction circles—or human-computer interaction circles—and you already operate on the principle of “nothing about us without us,” so tell us how the concept of citizen designer is different and why you think citizen designers take user input to another level.

MORRISON: Sure, I think citizen designer is a really interesting concept and one that we, we need more of. But let me first start with inclusive design and how that brings us to think about citizen designers. So inclusive design has been a really productive innovation tool because it brings us unusual constraints to the design problem. Within the Microsoft Inclusive Design toolkit, we refer to this as “designing for one.” And once you’ve got this very novel design that emerges, we then optimize it to work for everyone, or we extend it to many. So this approach really jogs the mind to radical solutions. So let me give you just one example. In years past, we developed a physical coding language to support blind and sighted children to learn to code together. So we thought, ah, OK, sighted children have blocks on a screen, so we’re going to make blocks on a table. Well, our young design team lined up the blocks on the table, put their hands in their lap, and I looked at them and I thought, we failed! [LAUGHTER] So we started again, and we said, OK, show us. And we worked with them to show us what excites the hands. You know, here are kids who live through their hands. You know, what are the shapes? What are the interactions? What are the kinds of things they want to do with their hands? And through this, we developed a completely different base idea and design, and we found that it didn’t just excite the hands of children who are blind or low vision, but it excited the hands of all children. They had brought us their expertise in thinking about the world in a different way. And so now we have this product Code Jumper, which kids just can’t put down.

HUIZINGA: Right.

MORRISON: So that’s great. So we, we know that inclusive design is going to generate great ideas. We also know that diverse teams generate the best ideas because diverse life experience can prompt us to think out of the box. But how do we get diverse teams when it can be hard for people with disabilities to enter the field of design and technology? So design assumes often good visual skills; it assumes the ability to draw. And that can knock out a lot of people who might be great at designing technology experiences without those skills. So with our citizen design team, we wanted to open up the opportunity to young people who are blind and low vision to really set the stage for them to think about what would a career in technology design be like? Could I be part of this? Can I be that generation who’s going to design the next cohort of accessible, inclusive technologies? So we did this through teaching key design skills like the design process itself, prototyping, as well as having, uh, this team act as full members of our own R&D team, so in an apprenticeship style. So our citizen designers weren’t just giving feedback as, as participants might, but they were creating prototypes, running A/B tests, and it was our hope and I think we succeeded in making it a give-give situation. We were giving them a set of skills, and they were giving us their design knowledge that was really valuable to our innovation process.

HUIZINGA: That is so awesome. I’m, you know, just thinking of, of the sense of belonging that you might get instead of being, as Karolina kind of referred to, it’s not just another user-research study where you’ll go and be part of a project that someone else is doing. You’re actually integrally connected to the project. And on that note, Karolina, talk a little bit about what it’s like to be a citizen designer. What were some of your aha moments on the project, maybe the items that you wanted to be able to find and what surprises you encountered in the process of developing a technique to teach a personal item?

PAKĖNAITĖ: Yeah, so it was, uh, incredibly fascinating to play the role of a citizen designer and testing a Teachable AI for use and providing further comments. It took me a bit of time to really understand how this tool is different from existing ones, but then I realized it’s literally in the name, a Teachable AI. [LAUGHTER] So it’s a tool designed for teaching it about your very own personal items. Yeah, your items may, may not look like a typical standard item; maybe you personalized them with engravings or stickers, or maybe it’s a unique gadget or maybe, say, a medical device. So it’s not about teaching every single item that you own, but rather a tool, a tool that lets us identify what matters most to you. So, yeah, I have about five to 10 small personal items that I always carry with me, and most of them are like very, very, very important to me. Like losing a bus pass means I can’t get anywhere. Losing a key means I can’t get home. Because these items are small and I use them daily, that means they are also, uh, being lost most commonly. So now I have a tool that is able to locate my personal items if they happen to be lost.

HUIZINGA: Right. And as you said earlier, you do have some sight. It’s, it’s tunnel vision at this point, so the peripheral part, um, is more challenging for you. But having this tool helps you to focus in a broader spectrum of, of visual sight. Cecily, this would be a great time to get a bit more specific about your Teachable AI discovery process. Tell us some research stories. How did you go about optimizing this AI system, and what things did you learn from both your successes and your failures?

MORRISON: Ah, yes, lots of research stories with this system, I’m afraid, but I think the very first thing we did was, OK, a user wants to teach this system, so we need to tell the user what makes a good teaching example. Well, we don’t know. Actually, we assumed we did know because in machine learning, the idea is more data, better quote-unquote “quality data,” and the system will work better. So the first thing that really surprised us when we actually ran some experimental analysis was that more data was not better and higher-quality data, or data that has less blur or is perfectly framed, was also not better. So what we realized is that it wasn’t our aim to kind of squeeze as much data as we could from the users but really to get the data that was the right kind of data. So we did need the object in the image. It’s, it’s really hard to train a system to recognize an object that’s not there at all. But what we needed was data that looked exactly like what the user was going to use when they were finding the objects. So if the user moves the camera really fast and the image becomes blurry, then we need those teaching examples to have blur, too.

HUIZINGA: Right.

MORRISON: So it was in understanding this relationship between the teaching examples and the user that really helped us craft a process that was going to help the user get the best result from the system. One of the things about Teachable AI is that it’s not about the AI system. It’s about the relationship between the user and the AI system. And the key to that relationship is the mental model of the user. They need to make good judgments about how to give good teaching examples if we want that whole cycle between user and AI system to go well. So I remember watching Karolina taking her teaching frames, and she was moving very far away. And I was thinking, hmm, I don’t think that data is going to work very well because there’s just not going to be enough pixels of the object to make a good representation for the system. So I asked Karolina about her strategy, and she said, well, if I want it to work from far away, then I should take teaching examples from far away. And I thought, ah, that’s a very logical mental model.

HUIZINGA: Right.

MORRISON: But unfortunately, we’ve broken the user’s mental model because that’s not actually how the system works because we were cropping frames and taking pixels out and doing all kinds of fancy image manipulation to, actually, to improve the performance under the hood. So I think this was an experience where we thought, ah, we want the user to develop a good mental model, but to do that, we need to actually structure this teaching process so they don’t need to think so hard and we’re guiding them into the, the kinds of things that make the system work well as opposed to not, and then they don’t need to guess. So the other thing that we found was that teaching should be fast and easy. Otherwise, it’s just too much work. No matter how personalized something is, if you have to work too hard, it’s a no-go. So we thought, ah, we want this to be really fast. We want it to take as few frames as possible. And we want the users to be really confident that they’ve got the object in the frame because that’s the one thing we really need. So we’re going to tell them all the time if the object’s in the frame: it’s in frame; it’s in frame; it’s in frame; it’s in frame; it’s in frame; it’s in frame. Well, there’s … citizen designers [LAUGHTER], including Karolina, came back to us and said, you know, this is really stressful. You know, I’m constantly worrying, “Is it in frame? Is it in frame? Is it in frame?” And actually, the cognitive load of that, even though we were trying to make the process really, really easy, um, was, was really overwhelming. And one of them said to us, well, why don’t I just assume that I’m doing a good job unless you tell me otherwise? [LAUGHTER] And that really helped shift our mindset to say, well, OK, we can help the user by giving them a gentle nudge back on track, but we don’t need to grab all their cognitive attention to make the perfect video!

HUIZINGA: [LAUGHS] That’s, that’s so hilarious. Well, Cecily, I want to stay with you for a minute and discuss the broader benefits of what you call “designing outside the mean.” And despite the challenges of developing technologies, we’ve seen specialized research deliver the so-called curb-cut effect over and over. Now you’ve already alluded to this a bit earlier. But clearly people with blindness and low vision aren’t the only ones who can’t find their things. So might this research help other people? Could it, could it be something I could incorporate into my phone?

MORRISON: That’s a great question. And I think an important question when we do any research is how do we broaden this out to meet the, the widest need possible? So I’m going to think about rather than Find My Things specifically, I’m going to think about Teachable AI. And Teachable AI should benefit everybody who needs something specific to themselves. And who of us don’t think that we need things to be specific to ourselves in this day and age?

HUIZINGA: Right … [LAUGHS]

MORRISON: But it’s going to be particularly useful for people on the margins of technology design for many reasons. So it doesn’t matter—it could be where your home is different or the way you go about your daily lives or perhaps the intersection of your identities. By having Teachable AI, we make systems that are going to work for individuals. Regardless of the labels that you might have or the life experience you might have, we want an AI system that works for you. And this is an approach that’s moving us in that direction.

HUIZINGA: You know, I love … I, I remembered what you said earlier, and it was for individuals, not people with disabilities. And I just love that framing anyway because we’re all individuals, and everyone has some kind of a disability, whether you call it that or not. So I just love this work so much. Karolina, back to you for a minute. You have said you’re a very tactile person. What role does haptics, which is the touch/feel part of computer science, play for you in this research, and how do physical cues work for you in this technology?

PAKĖNAITĖ: Yeah, so because I’m deaf-blind, I think my brain naturally craves information through senses which I have full access to. For me, it’s touch. So I find it very stimulating when the tools are tactile, whether that’s vibrations or textures. Tactile feedback not only enhances the experiences, but I think it’s also a good accessibility cue, as well. For example, one big instance happened that as a citizen designer was when I was pointing my camera at an object and, being hard of hearing, that means I couldn’t hear what it was saying, so I had to bring it close to my, my ear, and that meant that the object was lost in the camera view. [LAUGHS]

HUIZINGA: Right … [LAUGHS]

PAKĖNAITĖ: So … yeah, yeah, I think having tactile cues could be very beneficial for people like me who are deaf-blind but also others. Like, for example, you don’t always want your phone to be on sound all the time. Maybe in a quiet train, in a quiet tube, you don’t want your phone to start talking; you might be feeling self-conscious. So, yeah, I think …

HUIZINGA: Right …

PAKĖNAITĖ: … always adding those tactile cues will benefit me and everyone else.

HUIZINGA: Yeah, so to clarify, is haptics or touch involved in any of this particular Teachable AI technology, Cecily? I know that Karolina has that as a, you know, a “want to have” kind of thing. Where does it stand here?

MORRISON: Yeah, no, I, I think Karolina’s participation, um, was actually fairly critical in us adding, um, vibration cues to the experience.

HUIZINGA: Yeah, so it does use the, the haptic …

MORRISON: Yeah, we use auditory, visual, and, and vibration as a means of interaction. And I think in general, we should be designing all of our experiences with technology to be multisensory because, as Karolina pointed out, in certain circumstances, you don’t really want your computer talking at you. In other circumstances, you need something else. And in our different individual needs, we might need something else. So this allows people to be as flexible as possible for their context and for their own needs to make an experience work for them.

HUIZINGA: Right. Yeah, and I feel like this is already kind of part of our lives when our phones buzz or, or, you know, vibrate or when you wear the watch that gives you a little tip on your wrist that you’ve got a notification or you need to turn left or [LAUGHTER] whatever you’re using it for. Cecily, I always like to know where a project is on the spectrum from lab to life, as we say on this show. What’s the status of Teachable AI in general and Find My Things in particular, and how close is it to being able to be used in real life by a broader audience than your citizen designers and your team?

MORRISON: So it’s really important for us that the technologies we research become available to the communities to whom they are valuable. And in the past, we’ve had a whole set of partners, including Seeing AI, American Printing House for the Blind, to help us take ideas, research prototypes, and make them into products that people can have. Now Teachable AI is a grand vision. I think we are … showed with this work in Find My Things that the machine learning is there. We can do this, and it’s coming. And as we move into this new era of machine learning with these very large models, we’re going to need it there, too, because the larger the model, the more personalized we’re probably going to need the experience. In terms of Find My Things, we are also on that journey to finding the right opportunity to bring it out to the blind community.

HUIZINGA: So this has been fascinating. I’m … there’s so many more questions I want to ask, but we don’t have a lot of time to ask them all. I’m sure that we’re going to be watching as this unfolds and probably becomes part of all of our lives at some point thanks to the wonderful people doing the research. I like to end the podcast with a little future casting from each of my guests, and, Karolina, I’d like you to go first. I have a very specific question for you. Aside from your studies and your research work, you’ve said you’re on a mission. What’s that mission, and what does Mount Everest have to do with it?

PAKĖNAITĖ: So firstly, I’m hoping to complete my PhD this year. That’s my big priority for, for this year. And then, uh, I will be on a mission, an ambitious one that I feel a little bit nervous to share but also very excited. As an adventurer at heart, my dream is to summit Mount Everest. So before it’s always seemed like a fantasy, but I recently came back from an Everest base camp trek just a few months ago, and I met some mountaineers who were on their way to the top, and I found myself quietly saying, what if? And then, as I was thinking how I’m slowly losing my sight, I realized that if I do want to summit Everest, I would want to go there while I still can see with my remaining vision, so I realized that it would have to be now or never.

HUIZINGA: Right!

PAKĖNAITĖ: So when I came back, I decided … I just made some actions. So I reached out to different organizations and surprisingly a film production team is eager to document this journey and … yeah, it seems like something might be happening. So this mission isn’t just about me potentially becoming the first deaf-blind person to summit Everest but also a commitment to raising awareness and providing representation for the blind and deaf-blind community. I hope to stay in the research field, and I believe this mission has some potential for research. So I think that, for example, I’m looking for accessibility tools for, for me to climb Everest so that I can be the best climber I can be as a deaf-blind person, being independent but part of the team, or maybe make a documentary film a multisensory experience, accessible to a wider community, including deaf-blind. So, yeah, I’m actively looking for collaborators and would love to be contacted by anyone.

HUIZINGA: I love the fact that you are bringing awareness to the fact, first of all, that the deaf-blind community or even the blind community isn’t a one-size-fits-all. So, um, yeah, I hope you get to summit Everest to be able to see the world from the tallest peak in the world before anything progresses that way. Well, Cecily, I’d like to close with you. Go with me on a little forward-thinking, backward-thinking journey. You’re at the end of your career looking back. What have you accomplished as a researcher, and how has your work disrupted the field of accessible technology and made the world a better place?

MORRISON: Where would I like to be? I would say more like where would we like to be. So in collaboration with colleagues, I hope we have brought a sense of individual’s agency in their experience with AI systems, which allow people to shape them for their own unique experience, whoever they might be and wherever they might be in the world. And I think this idea is no less important, or perhaps it’s even more important, as we move into a world of large foundation models that underpin many or perhaps all of our experiences as we, as we go forward. And I think particularly large foundation models will bring really significant change to accessibility, and I hope the approach of teachability will be a significantly positive influence in making those experiences just what we need them to be. And I have to say, in my life role, I’m personally really very hopeful for my own blind child’s opportunities in the world of work in 10 years’ time. At the moment, only 25 percent of people who are blind or low vision work. I think technology can play a huge role in getting rid of this mismatch between the environment and a person and allowing many more people with disabilities to enjoy being in the workplace.

HUIZINGA: This is exciting research and really a wonderful collaboration. I’m so grateful, Cecily Morrison and Karolina Pakėnaitė, for coming on the show and talking about it with us today. Thank you so much.

MORRISON: Thank you, Gretchen, and thank you, Karolina.

PAKĖNAITĖ: Thank you.

The post Collaborators: Teachable AI with Cecily Morrison and Karolina Pakėnaitė appeared first on Microsoft Research.

‘Christmas Rush’ 3D Scene Brings Holiday Cheer This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. 

‘Tis the season for friends, family and beautifully rendered Santa animations from this week’s In the NVIDIA Studio artist, 3D expert Božo Balov.

This week also marks an incredible milestone, with over 500 NVIDIA RTX-powered games and creative apps now available with support for ray tracing and AI-powered technologies like NVIDIA DLSS. Over 120 of the most popular apps (including the Adobe Creative Cloud suite, Autodesk Maya, Blender, Blackmagic Design’s DaVinci Resolve, OBS, Unity and more) use RTX to accelerate workflows by orders of magnitude, power new AI tools and enhancements and enable real-time, ray-traced previews.

To celebrate, NVIDIA GeForce is hosting a giveaway for gift cards, rare, sought-after #RTXON keyboard keycaps and more. Follow GeForce on Facebook, Instagram, TikTok or X (formerly known as Twitter) for instructions on how to enter.

Say it ain’t snow: the NVIDIA Studio #WinterArtChallenge is back. Through the end of the year, share winter-themed art on Facebook, Instagram or X for a chance to be featured on NVIDIA Studio social media channels. Be sure to tag #WinterArtChallenge to join.

Finally, 80 Level — the creative community for digital artists, animators and computer-generated imagery specialists — is hosting its Community Metasites Challenge. Artists can showcase their creativity by applying unique aesthetics to a simple block level via characters, game mechanics, visual effects and more — with a chance to win a new NVIDIA Studio laptop. Register today.

Wrapper’s Delight

Balov’s Christmas Rush 3D animation reimagines Santa as a resident of the coastal city of Split, Croatia — but with a harsher, less jolly edge.

 

Balov jumped straight into modeling edgy Saint Nick in the virtual-reality modeling software Quill. He deployed vertex-painting techniques and used a photogrammetry scan of a Vespa as a base, adding brushstrokes to blend it with the rest of the scene.

 

To achieve a flickering effect on Santa’s clothing, Balov created a custom texture with different brush strokes in Adobe Photoshop. The texture doubles as an alpha map, which intentionally clips the geometry.

 

“When it comes to rendering 3D graphics, nothing really comes close to NVIDIA GPUs.” — Božo Balov

He then used Adobe Photoshop to paint monochromatic background layers. Balov’s GeForce RTX 3080 Ti GPU unlocked over 30 GPU-accelerated features, including blur gallery, liquify, smart sharpen and perspective warp.

Balov then converted the files to the FBX adaptable file format for 3D software before importing them into Blender, where he animated the layers to move in the opposite direction of the character to create a sense of speed. He kept the lighting fairly simple, with one light source as the base and a few supplemental ones to emphasize specific parts of the scene.

 

Balov prefers working in Blender’s real-time engine EEVEE to animate his scene, cutting wait times. RTX-accelerated OptiX ray tracing in the viewport enabled greater interactivity with smoother movement, speeding his ideation and creative workflow.

Extraordinary detail.

“Rendering is a joy on NVIDIA RTX cards,” said Balov. “Since OptiX made its debut, rendering times have been cut in half or more — Blender Cycles feels like a real-time engine.”

When asked for advice to give aspiring artists, Balov emphasized the importance of individual passion.

“Pursue what matters to you,” he said. “Don’t spend time fulfilling other people’s ideas of what art should be.”

 

Check out Balov’s art portfolio on Instagram.

Follow NVIDIA Studio on Facebook, Instagram and X. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 

Snowflake Joins the PyTorch Foundation as a General Member

The PyTorch Foundation, a neutral home for the deep learning community to collaborate on the open source PyTorch framework and ecosystem, is announcing today that Snowflake has joined as a general member.

Snowflake enables thousands of organizations to unite siloed data, discover and securely share data, power data applications, and execute diverse AI/ML and analytic workloads across multiple clouds and geographies.

“By joining the PyTorch community, we know that Snowflake will help accelerate data warehousing solutions and cutting-edge AI frameworks. This showcases the commitment to advancing innovation for data and artificial intelligence,” said Ibrahim Haddad, Executive Director, PyTorch Foundation. “We are thrilled to have Snowflake join the PyTorch Foundation, marking a significant stride in the convergence of data management and deep learning technologies.”

Snowflake enables collaboration with AI technologies to handle the storage and analysis of large datasets generated by machine learning and AI applications through scalability and SQL support.

With the integrated repository of Python libraries from Anaconda in Snowpark, Snowflake users have always had a streamlined experience to deploy pre-trained PyTorch models in Snowflake to easily and securely make them a part of applications. Now with the addition of GPU instances in Snowpark Container Services (in private preview), training and other computationally intensive processing using PyTorch will also be streamlined, providing teams with an end-to-end solution for AI development and deployment.

“Most if not all of our customers incorporate open source software as part of their data stacks, so it is critical for us to work with open source ecosystems like the PyTorch Foundation, alongside incorporating open source to meet the needs of our customers,” said Adrien Treuille, Co-Founder of Streamlit, Director of Product Management at Snowflake. “As AI developers continue to integrate their models as part of applications, the power of Snowflake and PyTorch — coupled with Streamlit as the powerful front-end — creates near-limitless innovation for developers looking to build next-generation apps and unlock even more use cases.”

To learn more about the power of Snowflake and PyTorch, tune into Snowflake’s developer conference for AI and apps, BUILD.

To learn more about how you can be a part of the PyTorch Foundation, visit our website.

About Snowflake

Snowflake enables every organization to mobilize their data with Snowflake’s Data Cloud. Customers use the Data Cloud to unite siloed data, discover and securely share data, power data applications, and execute diverse AI/ML and analytic workloads. Wherever data or users live, Snowflake delivers a single data experience that spans multiple clouds and geographies. Thousands of customers across many industries, including 639 of the 2023 Forbes Global 2000 (G2K) as of July 31, 2023, use Snowflake Data Cloud to power their businesses. Learn more at snowflake.com.

About PyTorch Foundation

The PyTorch Foundation is a neutral home for the deep learning community to collaborate on the open source PyTorch framework and ecosystem. The PyTorch Foundation is supported by its members and leading contributors to the PyTorch open source project. The Foundation leverages resources provided by members and contributors to enable community discussions and collaboration.

About The Linux Foundation

The Linux Foundation is the world’s leading home for collaboration on open source software, hardware, standards, and data. Linux Foundation projects are critical to the world’s infrastructure including Linux, Kubernetes, Node.js, ONAP, PyTorch, RISC-V, SPDX, OpenChain, and more. The Linux Foundation focuses on leveraging best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org. The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see its trademark usage page. Linux is a registered trademark of Linus Torvalds.

A new quantum algorithm for classical mechanics with an exponential speedup

Quantum computers promise to solve some problems exponentially faster than classical computers, but there are only a handful of examples with such a dramatic speedup, such as Shor’s factoring algorithm and quantum simulation. Of those few examples, the majority of them involve simulating physical systems that are inherently quantum mechanical — a natural application for quantum computers. But what about simulating systems that are not inherently quantum? Can quantum computers offer an exponential advantage for this?

In “Exponential quantum speedup in simulating coupled classical oscillators”, published in Physical Review X (PRX) and presented at the Symposium on Foundations of Computer Science (FOCS 2023), we report on the discovery of a new quantum algorithm that offers an exponential advantage for simulating coupled classical harmonic oscillators. These are some of the most fundamental, ubiquitous systems in nature and can describe the physics of countless natural systems, from electrical circuits to molecular vibrations to the mechanics of bridges. In collaboration with Dominic Berry of Macquarie University and Nathan Wiebe of the University of Toronto, we found a mapping that can transform any system involving coupled oscillators into a problem describing the time evolution of a quantum system. Given certain constraints, this problem can be solved with a quantum computer exponentially faster than it can with a classical computer. Further, we use this mapping to prove that any problem efficiently solvable by a quantum algorithm can be recast as a problem involving a network of coupled oscillators, albeit exponentially many of them. In addition to unlocking previously unknown applications of quantum computers, this result provides a new method for designing quantum algorithms by reasoning purely about classical systems.

Simulating coupled oscillators

The systems we consider consist of classical harmonic oscillators. An example of a single harmonic oscillator is a mass (such as a ball) attached to a spring. If you displace the mass from its rest position, then the spring will induce a restoring force, pushing or pulling the mass in the opposite direction. This restoring force causes the mass to oscillate back and forth.

A simple example of a harmonic oscillator is a mass connected to a wall by a spring. [Image Source: Wikimedia]

Now consider coupled harmonic oscillators, where multiple masses are attached to one another through springs. Displace one mass, and it will induce a wave of oscillations to pulse through the system. As one might expect, simulating the oscillations of a large number of masses on a classical computer gets increasingly difficult.

An example system of masses connected by springs that can be simulated with the quantum algorithm.
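To make the classical cost concrete, here is a minimal sketch of a direct classical simulation of a line of masses joined by springs. It is only an illustration, not the method used in the paper: the equal masses, equal spring constants, fixed walls at both ends, and the velocity-Verlet integrator are all assumptions chosen for brevity, and the function names are hypothetical.

```python
def simulate_chain(n, steps, dt=0.01, k=1.0, m=1.0):
    """Integrate n equal masses in a line, joined by equal springs.

    Both ends of the chain are attached to fixed walls. The first mass
    starts displaced by 1.0 and everything starts at rest.
    """
    x = [0.0] * n
    v = [0.0] * n
    x[0] = 1.0  # displace one mass; the disturbance then spreads

    def accel(pos):
        out = []
        for j in range(n):
            left = pos[j - 1] if j > 0 else 0.0       # wall on the left
            right = pos[j + 1] if j < n - 1 else 0.0  # wall on the right
            out.append(k * (left - 2.0 * pos[j] + right) / m)
        return out

    a = accel(x)
    for _ in range(steps):
        # Velocity-Verlet update: positions, then forces, then velocities.
        x = [xj + vj * dt + 0.5 * aj * dt * dt for xj, vj, aj in zip(x, v, a)]
        a_new = accel(x)
        v = [vj + 0.5 * (aj + bj) * dt for vj, aj, bj in zip(v, a, a_new)]
        a = a_new
    return x, v


def energy(x, v, k=1.0, m=1.0):
    """Total energy: kinetic energy of the masses plus spring potential."""
    kinetic = 0.5 * m * sum(vj * vj for vj in v)
    ext = [0.0] + list(x) + [0.0]  # include the two walls
    potential = 0.5 * k * sum(
        (ext[j + 1] - ext[j]) ** 2 for j in range(len(ext) - 1)
    )
    return kinetic + potential


x, v = simulate_chain(n=8, steps=100)
print("displacements after t = 1:", [round(xj, 3) for xj in x])
print("total energy:", round(energy(x, v), 4))
```

Each time step touches every mass once, so the work and memory of this approach grow at least linearly with the number of masses, which is the cost that becomes prohibitive for very large systems.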

To enable the simulation of a large number of coupled harmonic oscillators, we came up with a mapping that encodes the positions and velocities of all masses and springs into the quantum wavefunction of a system of qubits. Since the number of parameters describing the wavefunction of a system of qubits grows exponentially with the number of qubits, we can encode the information of N balls into a quantum mechanical system of only about log(N) qubits. As long as there is a compact description of the system (i.e., the properties of the masses and the springs), we can evolve the wavefunction to learn coordinates of the balls and springs at a later time with far fewer resources than if we had used a naïve classical approach to simulate the balls and springs.
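The logarithmic scaling can be illustrated with a small counting sketch. It assumes, for illustration only, that the encoding stores one amplitude per mass and one per spring; the function name `qubits_needed` is hypothetical, and the sketch counts register size only, ignoring any extra qubits a real implementation would need.

```python
def qubits_needed(num_masses, num_springs):
    """Qubits needed to hold one amplitude per mass and one per spring.

    Assumption (for illustration): the state vector has
    num_masses + num_springs entries, so n qubits suffice once
    2**n is at least that dimension.
    """
    dimension = num_masses + num_springs
    if dimension < 1:
        raise ValueError("need at least one mass or spring")
    # Smallest n with 2**n >= dimension, using exact integer arithmetic.
    return (dimension - 1).bit_length()


# A line of masses joined to nearest neighbours has num_masses - 1 springs.
for n_masses in (2**10, 2**20, 2**30):
    print(n_masses, "masses ->", qubits_needed(n_masses, n_masses - 1), "qubits")
```

Multiplying the number of masses by roughly a thousand adds only about ten qubits, which is the sense in which the quantum description is exponentially more compact.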

We showed that a certain class of coupled-classical oscillator systems can be efficiently simulated on a quantum computer. But this alone does not rule out the possibility that there exists some as-yet-unknown clever classical algorithm that is similarly efficient in its use of resources. To show that our quantum algorithm achieves an exponential speedup over any possible classical algorithm, we provide two additional pieces of evidence.

The glued-trees problem and the quantum oracle

For the first piece of evidence, we use our mapping to show that the quantum algorithm can efficiently solve a famous problem about graphs known to be difficult to solve classically, called the glued-trees problem. The problem takes two branching trees — a graph whose nodes each branch to two more nodes, resembling the branching paths of a tree — and glues their branches together through a random set of edges, as shown in the figure below.

A visual representation of the glued trees problem. Here we start at the node labeled ENTRANCE and are allowed to locally explore the graph, which is obtained by randomly gluing together two binary trees. The goal is to find the node labeled EXIT.

The goal of the glued-trees problem is to find the exit node (the “root” of the second tree) as efficiently as possible. But the exact configuration of the nodes and edges of the glued trees is initially hidden from us. To learn about the system, we must query an oracle, which can answer specific questions about the setup. This oracle allows us to explore the trees, but only locally. Decades ago, it was shown that the number of queries required to find the exit node on a classical computer grows polynomially with N, the total number of nodes, and therefore exponentially with the depth of the trees.

But by recasting this as a problem with balls and springs, we can imagine each node as a ball and each connection between two nodes as a spring. Pluck the entrance node (the root of the first tree), and the oscillations will pulse through the trees. It only takes a time that scales with the depth of the tree (which is exponentially smaller than N) to reach the exit node. So, by mapping the glued-trees ball-and-spring system to a quantum system and evolving it for that time, we can detect the vibrations of the exit node and determine it exponentially faster than we could using a classical computer.
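The gap between the depth of the trees and the total number of nodes can be checked on a small instance. The sketch below is an illustration under stated assumptions: the leaves are glued with a random alternating cycle (one common way to set up the problem, not necessarily the exact construction used here), and the helper names are hypothetical. It uses full knowledge of the graph, so it does not model the oracle restriction; it only shows that the exit sits 2 × depth + 1 edges from the entrance while N grows exponentially with the depth.

```python
import random
from collections import deque


def glued_trees(depth, seed=0):
    """Two complete binary trees whose leaves are joined by a random cycle.

    Returns (adjacency, entrance, exit). Nodes of the first tree are
    heap-indexed 0..size-1; the second tree is shifted by size.
    """
    size = 2 ** (depth + 1) - 1  # nodes per tree
    adj = {node: set() for node in range(2 * size)}
    for offset in (0, size):
        for i in range(size):
            for child in (2 * i + 1, 2 * i + 2):
                if child < size:
                    adj[offset + i].add(offset + child)
                    adj[offset + child].add(offset + i)
    first_leaf = 2 ** depth - 1
    left = list(range(first_leaf, size))
    right = [size + leaf for leaf in range(first_leaf, size)]
    rng = random.Random(seed)
    rng.shuffle(left)
    rng.shuffle(right)
    count = len(left)
    # Alternating cycle: left[0]-right[0]-left[1]-right[1]-...-left[0].
    for j in range(count):
        for a, b in ((left[j], right[j]), (right[j], left[(j + 1) % count])):
            adj[a].add(b)
            adj[b].add(a)
    return adj, 0, size


def bfs_distance(adj, start, goal):
    """Number of edges on a shortest path from start to goal."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


for depth in (2, 4, 8):
    adj, entrance, exit_node = glued_trees(depth)
    print(depth, len(adj), bfs_distance(adj, entrance, exit_node))
```

The printed columns are the depth, the total number of nodes N, and the entrance-to-exit distance; the distance grows linearly with the depth while N roughly doubles with each extra level.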

BQP-completeness

The second and strongest piece of evidence that our algorithm is exponentially more efficient than any possible classical algorithm is revealed by examination of the set of problems a quantum computer can solve efficiently (i.e., solvable in polynomial time), referred to as bounded-error quantum polynomial time or BQP. The hardest problems in BQP are called “BQP-complete”.

While it is generally accepted that there exist some problems that a quantum algorithm can solve efficiently and a classical algorithm cannot, this has not yet been proven. So, the best evidence we can provide is that our problem is BQP-complete, that is, it is among the hardest problems in BQP. If someone were to find an efficient classical algorithm for solving our problem, then every problem solved by a quantum computer efficiently would be classically solvable! Not even the factoring problem (finding the prime factors of a given large number), which forms the basis of modern encryption and was famously solved by Shor’s algorithm, is expected to be BQP-complete.

A diagram showing the believed relationships of the classes BPP and BQP, which are the sets of problems that can be efficiently solved on a classical computer and a quantum computer, respectively. BQP-complete problems are the hardest problems in BQP.

To show that our problem of simulating balls and springs is indeed BQP-complete, we start with a standard BQP-complete problem of simulating universal quantum circuits, and show that every quantum circuit can be expressed as a system of many balls coupled with springs. Therefore, our problem is also BQP-complete.

Implications and future work

This effort also sheds light on work from 2002, when theoretical computer scientist Lov K. Grover and his colleague, Anirvan M. Sengupta, used an analogy to coupled pendulums to illustrate how Grover’s famous quantum search algorithm could find the correct element in an unsorted database quadratically faster than could be done classically. With the proper setup and initial conditions, it would be possible to tell whether one of N pendulums was different from the others — the analogue of finding the correct element in a database — after the system had evolved for time that was only ~√(N). While this hints at a connection between certain classical oscillating systems and quantum algorithms, it falls short of explaining why Grover’s quantum algorithm achieves a quantum advantage.

Our results make that connection precise. We showed that the dynamics of any classical system of harmonic oscillators can indeed be equivalently understood as the dynamics of a corresponding quantum system of exponentially smaller size. In this way we can simulate Grover and Sengupta’s system of pendulums on a quantum computer of log(N) qubits, and find a different quantum algorithm that can find the correct element in time ~√(N). The analogy we discovered between classical and quantum systems can be used to construct other quantum algorithms offering exponential speedups, where the reason for the speedups is now more evident from the way that classical waves propagate.

Our work also reveals that every quantum algorithm can be equivalently understood as the propagation of a classical wave in a system of coupled oscillators. This would imply that, for example, we can in principle build a classical system that solves the factoring problem after it has evolved for time that is exponentially smaller than the runtime of any known classical algorithm that solves factoring. This may look like an efficient classical algorithm for factoring, but the catch is that the number of oscillators is exponentially large, making it an impractical way to solve factoring.

Coupled harmonic oscillators are ubiquitous in nature, describing a broad range of systems from electrical circuits to chains of molecules to structures such as bridges. While our work here focuses on the fundamental complexity of this broad class of problems, we expect that it will guide us in searching for real-world examples of harmonic oscillator problems in which a quantum computer could offer an exponential advantage.

Acknowledgements

We would like to thank our Quantum Computing Science Communicator, Katie McCormick, for helping to write this blog post.

Bringing Personality to Pixels, Inworld Levels Up Game Characters Using Generative AI

To enhance the gaming experience, studios and developers spend tremendous effort creating photorealistic, immersive in-game environments.

But non-playable characters (NPCs) often get left behind. Many behave in ways that lack depth and realism, making their interactions repetitive and forgettable.

Inworld AI is changing the game by using generative AI to drive NPC behaviors that are dynamic and responsive to player actions. The Mountain View, Calif.-based startup’s Character Engine, which can be used with any character design, is helping studios and developers enhance gameplay and improve player engagement.

Elevate Gaming Experiences: Achievement Unlocked

The Inworld team aims to develop AI-powered NPCs that can learn, adapt and build relationships with players while delivering high-quality performance and maintaining in-game immersion.

To make it easier for developers to integrate AI-based NPCs into their games, Inworld built Character Engine, which uses generative AI running on NVIDIA technology to create immersive, interactive characters. It’s built to be production-ready, scalable and optimized for real-time experiences.

The Character Engine comprises three layers: Character Brain, Contextual Mesh and Real-Time AI.

Character Brain orchestrates a character’s performance by syncing its multiple personality machine learning models, such as those for text-to-speech, automatic speech recognition, emotions, gestures and animations.

The layer also enables AI-based NPCs to learn and adapt, navigate relationships and perform motivated actions. For example, users can create triggers using the “Goals and Action” feature to program NPCs to behave in a certain way in response to a given player input.

Contextual Mesh allows developers to set parameters for content and safety mechanisms, custom knowledge and narrative controls. Game developers can use the “Relationships” feature to create emergent narratives, such that an ally can turn into an enemy or vice versa based on how players treat an NPC.

One big challenge developers face when using generative AI is keeping NPCs in-world and on-message. Inworld’s Contextual Mesh layer helps overcome this hurdle by rendering characters within the logic and fantasy of their worlds, effectively avoiding the hallucinations that commonly appear when using large language models (LLMs).

The Real-Time AI layer ensures optimal performance and scalability for real-time experiences.

Powering Up AI Workflows With NVIDIA 

Inworld, a member of the NVIDIA Inception program, which supports startups through every stage of their development, uses NVIDIA A100 Tensor Core GPUs and NVIDIA Triton Inference Server as integral parts of its generative AI training and deployment infrastructure.

Inworld used the open-source NVIDIA Triton Inference Server software to standardize other non-generative machine learning model deployments required to power Character Brain features, such as emotions. The startup also plans to use the open-source NVIDIA TensorRT-LLM library to optimize inference performance. Both NVIDIA Triton Inference Server and TensorRT-LLM are available with the NVIDIA AI Enterprise software platform, which provides security, stability and support for production AI.

Inworld also used NVIDIA A100 GPUs within Slurm-managed bare-metal machines for its production training pipelines. Similar machines wrapped in Kubernetes help manage character interactions during gameplay. This setup delivers real-time generative AI at the lowest possible cost.

“We chose to use NVIDIA A100 GPUs because they provided the best, most cost-efficient option for our machine learning workloads compared to other solutions,” said Igor Poletaev, vice president of AI at Inworld.

“Our customers and partners are looking to find novel and innovative ways to drive player engagement metrics by integrating AI NPC functionalities into their gameplay,” said Poletaev. “There’s no way to achieve real-time performance without hardware accelerators, which is why we required GPUs to be integrated into our backend architecture from the very beginning.”

Inworld’s generative AI-powered NPCs have enabled dynamic, evergreen gaming experiences that keep players coming back. Developers and gamers alike have reported enhanced player engagement, satisfaction and retention.

Inworld has powered AI-based NPC experiences from Niantic, LG UPlus, Alpine Electronics and more. One open-world virtual reality game using the Inworld Character Engine saw a 5% increase in playtime, while a detective-themed indie game garnered over $300,000 in free publicity after some of the most popular Twitch streamers discovered it.

Learn more about Inworld AI and NVIDIA technologies for game developers.

Summary report optimization in the Privacy Sandbox Attribution Reporting API

In recent years, the Privacy Sandbox initiative was launched to explore responsible ways for advertisers to measure the effectiveness of their campaigns, by aiming to deprecate third-party cookies (subject to resolving any competition concerns with the UK’s Competition and Markets Authority). Cookies are small pieces of data containing user preferences that websites store on a user’s device; they can be used to provide a better browsing experience (e.g., allowing users to automatically sign in) and to serve relevant content or ads. The Privacy Sandbox attempts to address concerns around the use of cookies for tracking browsing data across the web by providing a privacy-preserving alternative.

Many browsers use differential privacy (DP) to provide privacy-preserving APIs, such as the Attribution Reporting API (ARA), that don’t rely on cookies for ad conversion measurement. ARA encrypts individual user actions and collects them in an aggregated summary report, which estimates measurement goals like the number and value of conversions (useful actions on a website, such as making a purchase or signing up for a mailing list) attributed to ad campaigns.

The task of configuring API parameters, e.g., allocating a contribution budget across different conversions, is important for maximizing the utility of the summary reports. In “Summary Report Optimization in the Privacy Sandbox Attribution Reporting API”, we introduce a formal mathematical framework for modeling summary reports. Then, we formulate the problem of maximizing the utility of summary reports as an optimization problem to obtain the optimal ARA parameters. Finally, we evaluate the method using real and synthetic datasets, and demonstrate significantly improved utility compared to baseline non-optimized summary reports.

ARA summary reports

We use the following example to illustrate our notation. Imagine a fictional gift shop called Du & Penc that uses digital advertising to reach its customers. The table below captures their holiday sales, where each record contains impression features with (i) an impression ID, (ii) the campaign, and (iii) the city in which the ad was shown, as well as conversion features with (i) the number of items purchased and (ii) the total dollar value of those items.

Impression and conversion feature logs for Du & Penc.

Mathematical model

ARA summary reports can be modeled by four algorithms: (1) Contribution Vector, (2) Contribution Bounding, (3) Summary Reports, and (4) Reconstruct Values. Contribution Bounding and Summary Reports are performed by the ARA, while Contribution Vector and Reconstruct Values are performed by an AdTech provider, that is, a provider of the tools and systems that enable businesses to buy and sell digital advertising. The objective of this work is to assist AdTechs in optimizing summary report algorithms.

The Contribution Vector algorithm converts measurements into an ARA format that is discretized and scaled. Scaling needs to account for the overall contribution limit per impression. Here we propose a method that clips and performs randomized rounding. The outcome of the algorithm is a histogram of aggregatable keys and values.
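As a minimal sketch of the clip-and-round idea, assuming a single non-negative measurement, a capping parameter `cap`, and an integer contribution budget `budget` for that measurement, the encoding could look like the following. The function name and signature are hypothetical and are not part of the ARA.

```python
import math
import random


def contribution_value(value, cap, budget, rng=random):
    """Clip `value` to [0, cap], scale it to [0, budget], then round randomly.

    Randomized rounding rounds up with probability equal to the fractional
    part, so the result is an integer whose expectation equals the scaled value.
    """
    clipped = min(max(value, 0.0), cap)
    scaled = clipped * budget / cap
    lower = math.floor(scaled)
    round_up = rng.random() < (scaled - lower)
    return lower + (1 if round_up else 0)


if __name__ == "__main__":
    # A $3 purchase, capped at $10, encoded with a budget of 1000 units.
    print(contribution_value(3.0, cap=10.0, budget=1000))
```

Clipping introduces bias for values above the cap, while a larger cap spreads the same budget over a wider range and so makes each unit coarser. That tension is what the optimization later in the post has to balance.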

Next, the Contribution Bounding algorithm runs on client devices and enforces the contribution bound on attributed reports: any further contributions that would exceed the limit are dropped. The output is a histogram of attributed conversions.
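A rough sketch of this bounding step is shown below. It assumes contributions arrive in order as (key, value) pairs for a single impression, and that a contribution which would push the running total over the limit is skipped while later, smaller ones may still be accepted. That skipping rule, the function name, and the limit used in the example are assumptions for illustration.

```python
def bound_contributions(contributions, limit):
    """Build a histogram of contributions whose running total stays within `limit`.

    `contributions` is an ordered list of (key, value) pairs with integer values.
    """
    histogram, total = {}, 0
    for key, value in contributions:
        if total + value > limit:
            continue  # this contribution would exceed the bound, so drop it
        histogram[key] = histogram.get(key, 0) + value
        total += value
    return histogram


if __name__ == "__main__":
    reports = [("count", 30000), ("value", 30000), ("count", 10000)]
    print(bound_contributions(reports, limit=65536))
```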

The Summary Reports algorithm runs on the server side inside a trusted execution environment and returns noisy aggregate results that satisfy DP. Noise is sampled from the discrete Laplace distribution, and to enforce privacy budgeting, a report may be queried only once.
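For intuition about the noise, the sketch below samples from a discrete Laplace distribution using only the Python standard library. It relies on the known fact that the difference of two independent, identically distributed geometric variables is discrete Laplace. How `scale` relates to the privacy budget inside the actual aggregation service is not specified here, so treat that parameter as an assumption.

```python
import math
import random


def sample_discrete_laplace(scale, rng=random):
    """Draw an integer k with probability proportional to exp(-|k| / scale)."""
    q = math.exp(-1.0 / scale)

    def geometric():
        # Number of failures before the first success, with P(failure) = q,
        # sampled by inverting the CDF.
        u = 1.0 - rng.random()  # u lies in (0, 1], which avoids log(0)
        return int(math.floor(math.log(u) / math.log(q)))

    return geometric() - geometric()


if __name__ == "__main__":
    true_total = 1200
    print(true_total + sample_discrete_laplace(scale=50.0))
```

A larger `scale` means more noise and stronger privacy. Because the noise is integer-valued, the noisy aggregate stays on the same discrete grid as the encoded contributions.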

Finally, the Reconstruct Values algorithm converts measurements back to the original scale. The Reconstruct Values and Contribution Vector algorithms are designed by the AdTech, and both impact the utility received from the summary report.

Illustrative usage of ARA summary reports, which include Contribution Vector (Algorithm A), Contribution Bounding (Algorithm C), Summary Reports (Algorithm S), and Reconstruct Values (Algorithm R). Algorithms C and S are fixed in the API. The AdTech designs A and R.

Error metrics

There are several factors to consider when selecting an error metric for evaluating the quality of an approximation. Because the metric also serves as the objective function for optimization, we looked for properties that suit both roles, and chose the 𝜏-truncated root mean square relative error (RMSRE𝜏). See the paper for a detailed discussion and comparison to other possible metrics.
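As a small illustration, the sketch below computes a 𝜏-truncated RMSRE under the assumption that each error is divided by max(𝜏, |true value|) before squaring and averaging. The paper remains the authoritative source for the exact definition, and the function name is hypothetical.

```python
import math


def rmsre_tau(true_values, estimates, tau):
    """tau-truncated root mean square relative error.

    Each error is divided by max(tau, |true value|), so very small true
    values do not blow up the relative error.
    """
    if len(true_values) != len(estimates):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for q, q_hat in zip(true_values, estimates):
        total += ((q_hat - q) / max(tau, abs(q))) ** 2
    return math.sqrt(total / len(true_values))


if __name__ == "__main__":
    print(rmsre_tau([100, 0], [110, 5], tau=10))
```

In the example, the second query has a true value of 0, where a plain relative error would be undefined. The truncation at 𝜏 = 10 turns its error of 5 into a relative error of 0.5.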

Optimization

To optimize utility as measured by RMSRE𝜏, we choose a capping parameter, C, and privacy budget, 𝛼, for each slice. Together they determine how an actual measurement (such as two conversions with a total value of $3) is encoded on the AdTech side and then passed to the ARA for processing by the Contribution Bounding algorithm. RMSRE𝜏 can be computed exactly, since it can be expressed in terms of the bias from clipping and the variance of the noise distribution. From this expression we find that RMSRE𝜏 is convex in either parameter when the other (the privacy budget, 𝛼, or the capping parameter, C) is held fixed, so the error-minimizing value of the free parameter can be obtained efficiently. In the joint variables (C, 𝛼), however, it is non-convex, so we may not always be able to select the best possible parameters. In any case, any off-the-shelf optimizer can be used to select privacy budgets and capping parameters. In our experiments, we use the SLSQP minimizer from the scipy.optimize library.
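The bias-versus-variance trade-off can be sketched with a deliberately simplified model for a single query and a fixed privacy budget. It assumes the estimate is the sum of clipped values plus zero-mean noise whose standard deviation grows linearly with the cap C, and it swaps the SLSQP optimizer for a plain grid search to stay dependency-free. The model, the `noise_std_per_cap` parameter, and the function names are all assumptions, not the formulation from the paper.

```python
import math


def expected_rmsre(values, cap, noise_std_per_cap, tau):
    """Expected truncated relative error of one clipped, noisy sum."""
    true_sum = sum(values)
    bias = sum(max(v - cap, 0.0) for v in values)  # mass lost to clipping
    variance = (noise_std_per_cap * cap) ** 2      # noise grows with the cap
    # Mean squared error of a biased estimate with zero-mean noise.
    return math.sqrt(bias ** 2 + variance) / max(tau, true_sum)


def best_cap(values, noise_std_per_cap, tau, candidates):
    """Grid search for the candidate cap with the smallest expected error."""
    return min(
        candidates,
        key=lambda c: expected_rmsre(values, c, noise_std_per_cap, tau),
    )


if __name__ == "__main__":
    training_values = [1, 2, 3, 50]
    print(best_cap(training_values, 2.0, 1.0, [1, 2, 3, 5, 10, 50]))
```

With these toy numbers, a small cap loses most of the outlier's value to clipping, while the largest cap drowns the sum in noise, so an intermediate cap wins.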

Synthetic data

Different ARA configurations can be evaluated empirically by testing them on a conversion dataset. However, access to such data can be restricted or slow due to privacy concerns, or simply unavailable. One way to address these limitations is to use synthetic data that replicates the characteristics of real data.

We present a method for generating synthetic data responsibly through statistical modeling of real-world conversion datasets. We first perform an empirical analysis of real conversion datasets to uncover relevant characteristics for ARA. We then design a pipeline that uses this distribution knowledge to create a realistic synthetic dataset that can be customized via input parameters.

The pipeline first generates impressions drawn from a power-law distribution (step 1), then for each impression it generates conversions drawn from a Poisson distribution (step 2) and finally, for each conversion, it generates conversion values drawn from a log-normal distribution (step 3). With dataset-dependent parameters, we find that these distributions closely match ad-dataset characteristics. Thus, one can learn parameters from historical or public datasets and generate synthetic datasets for experimentation.
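The three steps can be sketched with the Python standard library as follows. The sketch makes several assumptions for illustration: the power law is applied to how impressions are distributed across slices, and the parameter names and default values are hypothetical rather than fitted to any real dataset.

```python
import math
import random


def poisson(rate, rng):
    """Knuth's method for sampling a Poisson variate (suitable for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def generate_dataset(num_impressions, num_slices=5, exponent=1.5,
                     conversion_rate=0.3, mu=3.0, sigma=1.0, seed=0):
    """Generate synthetic impressions with conversions and conversion values."""
    rng = random.Random(seed)
    # Step 1: assign each impression to a slice using power-law weights.
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_slices + 1)]
    rows = []
    for impression_id in range(num_impressions):
        slice_id = rng.choices(range(num_slices), weights=weights)[0]
        # Step 2: draw the number of conversions for this impression.
        num_conversions = poisson(conversion_rate, rng)
        # Step 3: draw a log-normal value for each conversion.
        values = [rng.lognormvariate(mu, sigma) for _ in range(num_conversions)]
        rows.append({
            "impression_id": impression_id,
            "slice": slice_id,
            "conversion_values": values,
        })
    return rows


if __name__ == "__main__":
    for row in generate_dataset(5):
        print(row)
```

In practice, the parameters would be learned from historical or public datasets rather than chosen by hand.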

Overall dataset generation steps with features for illustration.

Experimental evaluation

We evaluate our algorithms on three real-world datasets (Criteo, AdTech Real Estate, and AdTech Travel) and three synthetic datasets. Criteo consists of 15M clicks, Real Estate consists of 100K conversions, and Travel consists of 30K conversions. Each dataset is partitioned into a training set and a test set. The training set is used to choose contribution budgets, clipping threshold parameters, and the conversion count limit (the real-world datasets have only one conversion per click), and the error is evaluated on the test set. Each dataset is partitioned into slices using impression features. For real-world datasets, we consider three queries for each slice; for synthetic datasets, we consider two queries for each slice.

For each query we set the 𝜏 in RMSRE𝜏 to five times the median value of the query on the training dataset. This ensures invariance of the error metric to data rescaling, and allows us to combine the errors from features of different scales by using a separate 𝜏 for each feature.
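The rescaling invariance can be checked with a short sketch. The helper names are hypothetical, and the truncated relative error below assumes the error is divided by max(𝜏, |true value|).

```python
import statistics


def choose_tau(training_query_values, multiplier=5.0):
    """Set tau to a multiple of the median query value on the training data."""
    return multiplier * statistics.median(training_query_values)


def truncated_relative_error(true_value, estimate, tau):
    """Absolute error divided by max(tau, |true value|)."""
    return abs(estimate - true_value) / max(tau, abs(true_value))


if __name__ == "__main__":
    training = [2, 4, 6, 8, 100]
    tau = choose_tau(training)
    # Express the same data in different units (for example, cents or
    # thousandths of a unit): tau rescales along with it.
    scaled_tau = choose_tau([1000 * v for v in training])
    print(truncated_relative_error(4, 7, tau))
    print(truncated_relative_error(4000, 7000, scaled_tau))
```

Because 𝜏 is derived from the data, multiplying every value by a constant multiplies 𝜏 by the same constant, and the relative error stays unchanged.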

Scatter plots of real-world datasets illustrating the probability of observing a conversion value. The fitted curves represent best log-normal distribution models that effectively capture the underlying patterns in the data.

Results

We compare our optimization-based algorithm to a simple baseline approach. For each query, the baseline uses an equal contribution budget and a fixed quantile of the training data to choose the clipping threshold. Our algorithms produce substantially lower error than baselines on both real-world and synthetic datasets. Our optimization-based approach adapts to the privacy budget and data.

RMSREτ for privacy budgets {1, 2, 4, 8, 16, 32, 64} for our algorithms and baselines on three real-world and three synthetic datasets. Our optimization-based approach consistently achieves lower error than baselines that use a fixed quantile for the clipping threshold and split the contribution budget equally among the queries.

Conclusion

We study the optimization of summary reports in the ARA, which is currently deployed on hundreds of millions of Chrome browsers. We present a rigorous formulation of the contribution budgeting optimization problem for ARA with the goal of equipping researchers with a robust abstraction that facilitates practical improvements.

Our recipe, which leverages historical data to bound and scale the contributions of future data under differential privacy, is quite general and applicable to settings beyond advertising. One approach based on this work is to use past data to learn the parameters of the data distribution, and then to apply synthetic data derived from this distribution for privacy budgeting for queries on future data. Please see the paper and accompanying code for detailed algorithms and proofs.

Acknowledgements

This work was done in collaboration with Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, and Avinash Varadarajan. We thank Akash Nadan for his help.
