Generate synthetic counterparty (CR) risk data with generative AI using Amazon Bedrock LLMs and RAG

Generate synthetic counterparty (CR) risk data with generative AI using Amazon Bedrock LLMs and RAG

Data is the lifeblood of modern applications, driving everything from application testing to machine learning (ML) model training and evaluation. As data demands continue to surge, the emergence of generative AI models presents an innovative solution. These large language models (LLMs), trained on expansive data corpora, possess the remarkable capability to generate new content across multiple media formats—text, audio, and video—and across various business domains, based on provided prompts and inputs.

In this post, we explore how you can use these LLMs with advanced Retrieval Augmented Generation (RAG) to generate high-quality synthetic data for a finance domain use case. You can use the same technique for synthetic data for other business domain use cases as well. For this post, we demonstrate how to generate counterparty risk (CR) data, which would be beneficial for over-the-counter (OTC) derivatives that are traded directly between two parties, without going through a formal exchange.

Solution overview

OTC derivatives are typically customized contracts between counterparties and include a variety of financial instruments, such as forwards, options, swaps, and other structured products. A counterparty is the other party involved in a financial transaction. In the context of OTC derivatives, the counterparty refers to the entity (such as a bank, financial institution, corporation, or individual) with whom a derivative contract is made.

For example, in an OTC swap or option contract, one entity agrees to terms with another party, and each entity becomes the counterparty to the other. The responsibilities, obligations, and risks (such as credit risk) are shared between these two entities according to the contract.

As financial institutions continue to navigate the complex landscape of CR, the need for accurate and reliable risk assessment models has become paramount. For our use case, ABC Bank, a fictional financial services organization, has taken on the challenge of developing an ML model to assess the risk of a given counterparty based on their exposure to OTC derivative data.

Building such a model presents numerous challenges. Although ABC Bank has gathered a large dataset from various sources and in different formats, the data may be biased, skewed, or lack the diversity needed to train a highly accurate model. The primary challenge lies in collecting and preprocessing the data to make it suitable for training an ML model. Deploying a poorly suited model could result in misinformed decisions and significant financial losses.

We propose a generative AI solution that uses the RAG approach. RAG is a widely used approach that enhances LLMs by supplying extra information from external data sources not included in their original training. The entire solution can be broadly divided into three steps: indexing, data generation, and validation.

Data indexing

In the indexing step, we parse, chunk, and convert the representative CR data into vector format using the Amazon Titan Text Embeddings V2 model and store this information in a Chroma vector database. Chroma is an open source vector database known for its ease of use, efficient similarity search, and support for multimodal data and metadata. It offers both in-memory and persistent storage options, integrates well with popular ML frameworks, and is suitable for a wide range of AI applications. It is particularly beneficial for smaller to medium-sized datasets and projects requiring local deployment or low resource usage. The following diagram illustrates this architecture.

Here are the steps for data indexing:

  • The sample CR data is segmented into smaller, manageable chunks to optimize it for embedding generation.
  • These segmented data chunks are then passed to a method responsible for both generating embeddings and storing them efficiently.
  • The Amazon Titan Text Embeddings V2 API is called upon to generate high-quality embeddings from the prepared data chunks.
  • The resulting embeddings are then stored in the Chroma vector database, providing efficient retrieval and similarity searches for future use.

Data generation

When the user requests data for a certain scenario, the request is converted into vector format and then looked up in the Chroma database to find matches with the stored data. The retrieved data is augmented with the user request and additional prompts to Anthropic’s Claude Haiku on Amazon Bedrock. Anthropic’s Claude Haiku was chosen primarily for its speed, processing over 21,000 tokens per second, which significantly outpaces its peers. Moreover, Anthropic’s Claude Haiku’s efficiency in data generation is remarkable, with a 1:5 input-to-output token ratio. This means it can generate a large volume of data from a relatively small amount of input or context. This capability not only enhances the model’s effectiveness, but also makes it cost-efficient for our application, where we need to generate numerous data samples from a limited set of examples. Anthropic’s Claude Haiku LLM is invoked iteratively to efficiently manage token consumption and help prevent reaching the maximum token limit. The following diagram illustrates this workflow.

Here are the steps for data generation:

  • The user initiates a request to generate new synthetic counterparty risk data based on specific criteria.
  • The Amazon Titan Text Embeddings V2 LLM is employed to create embeddings for the user’s request prompts, transforming them into a machine-interpretable format.
  • These newly generated embeddings are then forwarded to a specialized module designed to identify matching stored data.
  • The Chroma vector database, which houses previously stored embeddings, is queried to find data that closely matches the user’s request.
  • The identified matching data and the original user prompts are then passed to a module responsible for generating new synthetic data.
  • Anthropic’s Claude Haiku 3.0 model is invoked, using both the matching embeddings and user prompts as input to create high-quality synthetic data.
  • The generated synthetic data is then parsed and formatted into a .csv file using the Pydantic library, providing a structured and validated output.
  • To confirm the quality of the generated data, several statistical methods are applied, including quantile-quantile (Q-Q) plots and correlation heat maps of key attributes, providing a comprehensive validation process.

Data validation

When validating the synthetic CR data generated by the LLM, we employed Q-Q plots and correlation heat maps focusing on key attributes such as cp_exposure, cp_replacement_cost, and cp_settlement_risk. These statistical tools serve crucial roles in promoting the quality and representativeness of the synthetic data. By using the Q-Q plots, we can assess whether these attributes follow a normal distribution, which is often expected in many clinical and financial variables. By comparing the quantiles of our synthetic data against theoretical normal distributions, we can identify significant deviations that might indicate bias or unrealistic data generation.

Simultaneously, the correlation heat maps provide a visual representation of the relationships between these attributes and others in the dataset. This is particularly important because it helps verify that the LLM has maintained the complex interdependencies typically observed in real CR data. For instance, we would expect certain correlations between exposure and replacement cost, or between replacement cost and settlement risk. By making sure these correlations are preserved in our synthetic data, we can be more confident that analyses or models built on this data will yield insights that are applicable to real-world scenarios. This rigorous validation process helps to mitigate the risk of introducing artificial patterns or biases, thereby enhancing the reliability and utility of our synthetic CR dataset for subsequent research or modeling tasks.

We’ve created a Jupyter notebook containing three parts to implement the key components of the solution. We provide code snippets from the notebooks for better understanding.

Prerequisites

To set up the solution and generate test data, you should have the following prerequisites:

  • Python 3 must be installed on your machine
  • We recommend that an integrated development environment (IDE) that can run Jupyter notebooks be installed
  • You can also create a Jupyter notebook instance using Amazon SageMaker from AWS console and develop the code there.
  • You need to have an AWS account with access to Amazon Bedrock and the following LLMs enabled (be careful not to share the AWS account credentials):
    • Amazon Titan Text Embeddings V2
    • Anthropic’s Claude 3 Haiku

Setup

Here are the steps to setup the environment.

import sys!{sys.executable} -m pip install -r requirements.txt

The content of the requirements.txt is given here.

boto3
langchain
langchain-community
streamlit
chromadb==0.4.15
numpy
jq
langchain-aws
seaborn
matplotlib
scipy

The following code snippet will perform all the necessary imports.

from pprint import pprint 
from uuid import uuid4 
import chromadb 
from langchain_community.document_loaders import JSONLoader 
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import Chroma 
from langchain_text_splitters import RecursiveCharacterTextSplitter

Index data in the Chroma database

In this section, we show how indexing of data is done in a Chroma database as a locally maintained open source vector store. This index data is used as context for generating data.

The following code snippet shows the preprocessing steps of loading the JSON data from a file and splitting it into smaller chunks:

def load_using_jsonloaer(path):
    loader = JSONLoader(path,
                            jq_schema=".[]",
                            text_content=False)
    documents = loader.load()
    return documents

def split_documents(documents):
    doc_list = [item for item in documents]
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1200, chunk_overlap=0)
    texts = text_splitter.split_documents(doc_list)
    return texts

The following snippet shows how an Amazon Bedrock embedding instance is created. We used the Amazon Titan Embeddings V2 model:

def get_bedrock_embeddings():
    aws_region = "us-east-1"
    model_id = "amazon.titan-embed-text-v2:0" #look for latest version of model
    bedrock_embeddings = BedrockEmbeddings(model_id=model_id, region_name=aws_region)
    return bedrock_embeddings

The following code shows how the embeddings are created and then loaded in the Chroma database:

persistent_client = chromadb.PersistentClient(path="../data/chroma_index")
collection = persistent_client.get_or_create_collection("test_124")
print(collection)
    #     query the database
vector_store_with_persistent_client = Chroma(collection_name="test_124",
                                                 persist_directory="../data/chroma_index",
                                                 embedding_function=get_bedrock_embeddings(),
                                                 client=persistent_client)
load_json_and_index(vector_store_with_persistent_client)

Generate data

The following code snippet shows the configuration used during the LLM invocation using Amazon Bedrock APIs. The LLM used is Anthropic’s Claude 3 Haiku:

config = Config(
    region_name='us-east-1',
    signature_version='v4',
    retries={
        'max_attempts': 2,
        'mode': 'standard'
    }
)
bedrock_runtime = boto3.client('bedrock-runtime', config=config)
model_id = "anthropic.claude-3-haiku-20240307-v1:0" #look for latest version of model
model_kwrgs = {
    "temperature": 0,
    "max_tokens": 8000,
    "top_p": 1.0,
    "top_k": 25,
    "stop_sequences": ["company-1000"],
}
# Initialize the language model
llm = ChatBedrock(
    model_id=model_id,
    model_kwargs=model_kwrgs,
    client=bedrock_runtime,
)

The following code shows how the context is fetched by looking up the Chroma database (where data was indexed) for matching embeddings. We use the same Amazon Titan model to generate the embeddings:

def get_context(scenario):
    region_name = 'us-east-1'
    credential_profile_name = "default"
    titan_model_id = "amazon.titan-embed-text-v2:0"
    kb_context = []
    be = BedrockEmbeddings(region_name=region_name,
                           credentials_profile_name=credential_profile_name,
                           model_id=titan_model_id)

    vector_store = Chroma(collection_name="test_124", persist_directory="../data/chroma_index",
                      embedding_function=be)
    search_results = vector_store.similarity_search(scenario, k=3)
    for doc in search_results:
        kb_context.append(doc.page_content)
    return json.dumps(kb_context)

The following snippet shows how we formulated the detailed prompt that was passed to the LLM. We provided examples for the context, scenario, start index, end index, records count, and other parameters. The prompt is subjective and can be adjusted for experimentation.

# Create a prompt template
prompt_template = ChatPromptTemplate.from_template(
    "You are a financial data expert tasked with generating records "
    "representing company OTC derivative data and "
    "should be good enough for investor and lending ML model to take decisions "
    "and data should accurately represent the scenario: {scenario} n "
    "and as per examples given in context: "
    "and context is {context} "
    "the examples given in context is for reference only, do not use same values while generating dataset."
    "generate dataset with the diverse set of samples but record should be able to represent the given scenario accurately."
    "Please ensure that the generated data meets the following criteria: "
    "The data should be diverse  and realistic, reflecting various industries, "
    "company sizes, financial metrics. "
    "Ensure that the generated data follows logical relationships and correlations between features "
    "(e.g., higher revenue typically corresponds to more employees, "
    "better credit ratings, and lower risk). "
    "And Generate {count} records starting from index {start_index}. "
    "generate just JSON as per schema and do not include any text or message before or after JSON. "
    "{format_instruction} n"
    "If continuing, start after this record: {last_record}n"
    "If stopping, do not include this record in the output."
    "Please ensure that the generated data is well-formatted and consistent."
)

The following code snippet shows the process for generating the synthetic data. You can call this method in an iterative manner to generate more records. The input parameters include scenario, context, count, start_index, and last_record. The response data is also formatted into CSV format using the instruction provided by the following:

output_parser.get_format_instructions():

 def generate_records(start_index, count, scenario, context, last_record=""):
    try:
        response = chain.invoke({
            "count": count,
            "start_index": start_index,
            "scenario": scenario,
            "context": context,
            "last_record": last_record,
            "format_instruction": output_parser.get_format_instructions(),
            "data_set_class_schema": DataSet.schema_json()
        })
        
        return response
    except Exception as e:
        print(f"Error in generate_records: {e}")
        raise e

Parsing the output generated by the LLM and representing it in CSV was quite challenging. We used a Pydantic parser to parse the JSON output generated by the LLM, as shown in the following code snippet:

class CustomPydanticOutputParser(PydanticOutputParser):
    def parse(self, text: str) -> BaseModel:
        # Extract JSON from the text
        try:
            # Find the first occurrence of '{'
            start = text.index('{')
            # Find the last occurrence of '}'
            end = text.rindex('}') + 1
            json_str = text[start:end]

            # Parse the JSON string
            parsed_json = json.loads(json_str)

            # Use the parent class to convert to Pydantic object
            return super().parse_with_cls(parsed_json)
        except (ValueError, json.JSONDecodeError) as e:
            raise ValueError(f"Failed to parse output: {e}")

The following code snippet shows how the records are generated in an iterative manner with 10 records in each invocation to the LLM:

def generate_full_dataset(total_records, batch_size, scenario, context):
    dataset = []
    total_generated = 0
    last_record = ""
    batch: DataSet = generate_records(total_generated,
                                      min(batch_size, total_records - total_generated),
                                      scenario, context, last_record)
    # print(f"batch: {type(batch)}")
    total_generated = len(batch.records)
    dataset.extend(batch.records)
    while total_generated < total_records:
        try:
            batch = generate_records(total_generated,
                                     min(batch_size, total_records - total_generated),
                                     scenario, context, batch.records[-1].json())
            processed_batch = batch.records

            if processed_batch:
                dataset.extend(processed_batch)
                total_generated += len(processed_batch)
                last_record = processed_batch[-1].start_index
                print(f"Generated {total_generated} records.")
            else:
                print("Generated an empty or invalid batch. Retrying...")
                time.sleep(10)
        except Exception as e:
            print(f"Error occurred: {e}. Retrying...")
            time.sleep(5)

    return dataset[:total_records]  # Ensure exactly the requested number of records

Verify the statistical properties of the generated data

We generated Q-Q plots for key attributes of the generated data: cp_exposure, cp_replacement_cost, and cp_settlement_risk, as shown in the following screenshots. The Q-Q plots compare the quantiles of the data distribution with the quantiles of a normal distribution. If the data isn’t skewed, the points should approximately follow the diagonal line.

As the next step of verification, we created a corelation heat map of the following attributes: cp_exposure, cp_replacement_cost, cp_settlement_risk, and risk. The plot is perfectly balanced with the diagonal elements showing a value of 1. The value of 1 indicates the column is perfectly co-related to itself. The following screenshot is the correlation heatmap.

Clean up

It’s a best practice to clean up the resources you created as part of this post to prevent unnecessary costs and potential security risks from leaving resources running. If you created the Jupyter notebook instance in SageMaker please complete the following steps:

  1. Save and shut down the notebook:
    # First save your work
    # Then close all open notebooks by clicking File -> Close and Halt 
  2. Clear the output (if needed before saving):
    # Option 1: Using notebook menu
    # Kernel -> Restart & Clear Output
    
    # Option 2: Using code
    from IPython.display import clear_output
    clear_output()
  3. Stop and delete the Jupyter notebook instance created in SageMaker:
    # Option 1: Using aws cli
    # Stop the notebook instance when not in use
    aws sagemaker stop-notebook-instance --notebook-instance-name <your-notebook-name>
    
    # If you no longer need the notebook instance
    aws sagemaker delete-notebook-instance --notebook-instance-name <your-notebook-name>
    
    # Option 2: Using Sagemager Console
    # Amazon Sagemaker -> Notebooks
    # Select the Notebook and click Actions drop-down and hit Stop.
    Click Actions drop-down and hit Delete

Responsible use of AI

Responsible AI use and data privacy are paramount when using AI in financial applications. Although synthetic data generation can be a powerful tool, it’s crucial to make sure that no real customer information is used without proper authorization and thorough anonymization. Organizations must prioritize data protection, implement robust security measures, and adhere to relevant regulations. Additionally, when developing and deploying AI models, it’s essential to consider ethical implications, potential biases, and the broader societal impact. Responsible AI practices include regular audits, transparency in decision-making processes, and ongoing monitoring to help prevent unintended consequences. By balancing innovation with ethical considerations, financial institutions can harness the benefits of AI while maintaining trust and protecting individual privacy.

Conclusion

In this post, we showed how to generate a well-balanced synthetic dataset representing various aspects of counterparty data, using RAG-based prompt engineering with LLMs. Counterparty data analysis is imperative for making OTC transactions between two counterparties. Because actual business data in this domain isn’t easily available, using this approach you can generate synthetic training data for your ML models at minimal cost often within minutes. After you train the model, you can use it to make intelligent decisions before entering into an OTC derivative transaction.

For more information about this topic, refer to the following resources:


About the Authors

Santosh Kulkarni is a Senior Moderation Architect with over 16 years of experience, specialized in developing serverless, container-based, and data architectures for clients across various domains. Santosh’s expertise extends to machine learning, as a certified AWS ML specialist. Currently, engaged in multiple initiatives leveraging AWS Bedrock and hosted Foundation models.

Joyanta Banerjee is a Senior Modernization Architect with AWS ProServe and specializes in building secure and scalable cloud native application for customers from different industry domains. He has developed an interest in the AI/ML space particularly leveraging Gen AI capabilities available on Amazon Bedrock.

Mallik Panchumarthy is a Senior Specialist Solutions Architect for generative AI and machine learning at AWS. Mallik works with customers to help them architect efficient, secure and scalable AI and machine learning applications. Mallik specializes in generative AI services Amazon Bedrock and Amazon SageMaker.

Read More

Turbocharging premium audit capabilities with the power of generative AI: Verisk’s journey toward a sophisticated conversational chat platform to enhance customer support

Turbocharging premium audit capabilities with the power of generative AI: Verisk’s journey toward a sophisticated conversational chat platform to enhance customer support

This post is co-written with Sajin Jacob, Jerry Chen, Siddarth Mohanram, Luis Barbier, Kristen Chenowith, and Michelle Stahl from Verisk.

Verisk (Nasdaq: VRSK) is a leading data analytics and technology partner for the global insurance industry. Through advanced analytics, software, research, and industry expertise across more than 20 countries, Verisk helps build resilience for individuals, communities, and businesses. The company is committed to ethical and responsible AI development with human oversight and transparency. Verisk is using generative AI to enhance operational efficiencies and profitability for insurance clients while adhering to its ethical AI principles.

Verisk’s Premium Audit Advisory Service (PAAS®) is the leading source of technical information and training for premium auditors and underwriters. PAAS helps users classify exposure for commercial casualty insurance, including general liability, commercial auto, and workers’ compensation. PAAS offers a wide range of essential services, including more than 40,000 classification guides and more than 500 bulletins. PAAS now includes PAAS AI, the first commercially available interactive generative-AI chats specifically developed for premium audit, which reduces research time and empower users to make informed decisions by answering questions and quickly retrieving and summarizing multiple PAAS documents like class guides, bulletins, rating cards, etc.

In this post, we describe the development of the customer support process in PAAS, incorporating generative AI, the data, the architecture, and the evaluation of the results. Conversational AI assistants are rapidly transforming customer and employee support. Verisk has embraced this technology and developed its own PAAS AI, which provides an enhanced self-service capability to the PAAS platform.

The opportunity

The Verisk PAAS platform houses a vast array of documents—including class guides, advisory content, and bulletins—that aid Verisk’s customers in determining the appropriate rules and classifications for workers’ compensation, general liability, and commercial auto business. When premium auditors need accurate answers within this extensive document repository, the challenges they face are:

  • Overwhelming volume – The sheer volume of documents (advisories, bulletins, and so on) makes manual searching time-consuming and inefficient
  • Slow response times – Finding accurate information within this vast repository can be slow, hindering timely decision-making
  • Inconsistent quality of responses – Manual searches might yield irrelevant or incomplete results, leading to uncertainty and potential errors

To address this issue, Verisk PAAS AI is designed to alleviate the burden by providing round-the-clock support for business processing and delivering precise and quick responses to customer queries. This technology is deeply integrated into Verisk’s newly reimagined PAAS platform, using all of Verisk’s documentation, training materials, and collective expertise. It employs a retrieval augmented generation (RAG) approach and a combination of AWS services alongside proprietary evaluations to promptly answer most user questions about the capabilities of the Verisk PAAS platform.

When deployed at scale, this PAAS AI will enable Verisk staff to dedicate more time to complex issues, critical projects, and innovation, thereby enhancing the overall customer experience. Throughout the development process, Verisk encountered several considerations, key findings, and decisions that provide valuable insights for any enterprise looking to explore the potential of generative AI.

The approach

When creating an interactive agent using large language models (LLMs), two common approaches are RAG and model fine-tuning. The choice between these methods depends on the specific use case and available data. Verisk PAAS began developing a RAG pipeline for its PAAS AI and has progressively improved this solution. Here are some reasons why continuing with a RAG architecture was beneficial for Verisk:

  • Dynamic data access – The PAAS platform is constantly evolving, adding new business functions and technical capabilities. Verisk needed to make sure its responses are based on the most current information. The RAG approach allows access to continuously updated data, providing responses with the latest information without frequently retraining the model.
  • Multiple data sources – Besides data recency, another crucial aspect is the ability to draw from multiple PAAS resources to acquire relevant context. The ease of expanding the knowledge base without the need for fine-tuning new data sources makes the solution adaptable.
  • Reduced hallucinations – Retrieval minimizes the risk of hallucinations compared with free-form text generation because responses come directly from the provided excerpts. Verisk developed an evaluation tool to enhance response quality.
  • LLM linguistics – Although appropriate context can be retrieved from enterprise data sources, the underlying LLM manages the linguistics and fluency.
  • Transparency – Verisk aimed to consistently improve the PAAS AI’s response generation ability. A RAG architecture offered the transparency required in the context retrieval process, which would ultimately be used to generate user responses. This transparency helped Verisk identify areas where document restructuring was needed.
  • Data governance – With diverse users accessing the platform and differing data access permissions, data governance and isolation were critical. Verisk implemented controls within the RAG pipeline to restrict data access based on user permissions, helping to ensure that responses are delivered only to authorized users.

Although both RAG and fine-tuning have their pros and cons, RAG is the best approach for building a PAAS AI on the PAAS platform, given Verisk’s needs for real-time accuracy, explainability, and configurability. The pipeline architecture supports iterative enhancement as the use cases for the Verisk PAAS platform develop.

Solution overview

The following diagram showcases a high-level architectural data flow that highlights various AWS services used in constructing the solution. Verisk’s system demonstrates a complex AI setup, where multiple components interact and frequently call on the LLM to provide user responses. Employing the PAAS platform to manage these varied components was an intuitive decision.

Premium Audit Advisory Service AI Pipeline

The key components are as follows:

Amazon ElastiCache

Verisk’s PAAS team determined that ElastiCache is the ideal solution for storing all chat history. This storage approach allows for seamless integration in conversational chats and enables the display of recent conversations on the website, providing an efficient and responsive user experience.

Amazon Bedrock

Anthropic’s Claude, available in Amazon Bedrock, played various roles within Verisk’s solution:

  • Response generation – When building their PAAS AI, Verisk conducted a comprehensive evaluation of leading LLMs, using their extensive dataset to test each model’s capabilities. Through Amazon Bedrock, Verisk gained streamlined access to multiple best-in-class foundation models (FMs), enabling efficient testing and comparison across key performance criteria. The Amazon Bedrock unified API and robust infrastructure provided the ideal platform to develop, test, and deploy LLM solutions at scale. After this extensive testing, Verisk found Anthropic’s Claude model consistently outperformed across key criteria. Anthropic’s Claude demonstrated superior language understanding in Verisk’s complex business domain, allowing more pertinent responses to user questions. Given the model’s standout results across Verisk PAAS platform use cases, it was the clear choice to power the PAAS AI’s natural language capabilities.
  • Conversation summarization – When a user asks a follow-up question, the PAAS AI can continue the conversational thread. To enable this, Verisk used Claude to summarize the dialogue to update the context from ElastiCache. The full conversation summary and new excerpts are input to the LLM to generate the next response. This conversational flow allows the PAAS AI to answer user follow-up questions and have a more natural, contextual dialogue, bringing Verisk PAAS closer to having a true AI assistant that can engage in useful, back-and-forth conversations with users.
  • Keyword extraction – Keywords are extracted from user questions and previous conversations to be used for creating the new summarized prompt and to be input to Verisk’s knowledge base retrievers to perform vector similarity search.

Amazon OpenSearch Service

Primarily used for the storage of text embeddings, OpenSearch facilitates efficient document retrieval by enabling rapid access to indexed data. These embeddings serve as semantic representations of documents, allowing for advanced search capabilities that go beyond simple keyword matching. This semantic search functionality enhances the system’s ability to retrieve relevant documents that are contextually similar to the search queries, thereby improving the overall accuracy and speed of data queries. Additionally, OpenSearch functions as a semantic cache for similarity searches, optimizing performance by reducing the computational load and improving response times during data retrieval operations. This makes it an indispensable tool in the larger PAAS ecosystem, where the need for quick and precise information access is paramount.

Snowflake in Amazon

The integration of Snowflake in the PAAS AI ecosystem helps provide scalable and real-time access to data, allowing Verisk to promptly address customer concerns and improve its services. By using Snowflake’s capabilities, Verisk can perform advanced analytics, including sentiment analysis and predictive modeling, to better understand customer needs and enhance user experiences. This continuous feedback loop is vital for refining the PAAS AI and making sure it remains responsive and relevant to user demands.

Structuring and retrieving the data

An essential element in developing the PAAS AI’s knowledge base was properly structuring and effectively querying the data to deliver accurate answers. Verisk explored various techniques to optimize both the organization of the content and the methods to extract the most relevant information:

  • Chunking – A key step in preparing the accumulated questions and answers was splitting the data into individual documents to facilitate indexing into OpenSearch Service. Rather than uploading large files containing multiple pages of content, Verisk chunked the data into smaller segments by document section and character lengths. By splitting the data into small, modular chunks focused on a single section of a document, Verisk could more easily index each document and had greater success in pulling back the correct context. Chunking the data also enabled straightforward updating and reindexing of the knowledge base over time.
  • Hybrid query – When querying the knowledge base, Verisk found that using just standard vector search wasn’t enough to retrieve all the relevant contexts pertaining to a question. Therefore, a solution was implemented to combine a sparse bm25 search in combination with the dense vector search to create a hybrid search approach, which yielded much better context retrieval results.
  • Data separation and filters – Another issue Verisk ran into was that, because of the vast amount of documents and the overlapping content within certain topics, incorrect documents were being retrieved for some questions that asked for specific topics that were present across multiple sources—some of these weren’t needed or appropriate in the context of the user’s question. Therefore, data separation was implemented to split the documents based on document type and filter by line of business to improve context retrieval within the application.

By thoroughly experimenting and optimizing both the knowledge base powering the PAAS AI and the queries to extract answers from it, Verisk was able to achieve very high answer accuracy during the proof of concept, paving the way for further development. The techniques explored—hybrid querying, HTML section chunking, and index filtering—became core elements of Verisk’s approach for extracting quality contexts.

LLM parameters and models

Experimenting with prompt structure, length, temperature, role-playing, and context was key to improving the quality and accuracy of the PAAS AI’s Claude-powered responses. The prompt design guidelines provided by Anthropic were incredibly helpful.

Verisk crafted prompts that provided Anthropic’s Claude with clear context and set roles for answering user questions. Setting the temperature to 0 helped reduce the randomness and indeterministic nature of LLM-generated responses.

Verisk also experimented with different models to improve the efficiency of the overall solution. For scenarios where latency was more important and less reasoning was required, Anthropic’s Claude Haiku was the perfect solution. For other scenarios such as question answering using provided contexts where it was more important for the LLM to be able to understand every detail given in the prompt, Anthropic’s Claude Sonnet was the better choice to balance latency, performance, and cost.

Guardrails

LLM guardrails were implemented in the PAAS AI project using both the guardrails provided by Amazon Bedrock and specialized sections within the prompt to detect unrelated questions and prompt attack attempts. Amazon Bedrock guardrails can be attached to any Amazon Bedrock model invocation call and automatically detect if the given model input and output are in violation of the language filters that are set (violence, misconduct, sexual, and so on), which helps with screening user inputs. The specialized prompts further improve LLM security by creating a second net that uses the power of the LLMs to catch any inappropriate inputs from the users.

This allows Verisk to be confident that the model will only answer to its intended purpose surrounding premium auditing services and will not be misused by threat actors.

PAAS Evaluation API Pipeline

After validating several evaluation tools such as Deepeval, Ragas, Trulens, and so on, the Verisk PAAS team realized that there were certain limitations to using these tools for their specific use case. Consequently, the team decided to develop its own evaluation API, shown in the following figure.

This custom API evaluates the answers based on three major metrics:

  • Answer relevancy score – Using LLMs, the process assesses whether the answers provided are relevant to the customer’s prompt. This helps make sure that the responses are directly addressing the questions posed.
  • Context relevancy score – By using LLMs, the process evaluates whether the context retrieved is appropriate and aligns well with the question. This helps make sure that the LLM has the appropriate and accurate contexts to generate a response.
  • Faithfulness score – Using LLMs, the process checks if the responses are generated based on their retrieved context or if they are hallucinated. This is crucial for maintaining the integrity and reliability of the information provided.

This custom evaluation approach helps make sure that the answers generated are not only relevant and contextually appropriate but also faithful to the established generative AI knowledge base, minimizing the risk of misinformation. By incorporating these metrics, Verisk has enhanced the robustness and reliability of their PAAS AI, providing customers with accurate and trustworthy responses.

Feedback loop of PAAS AI platform

The Verisk PAAS team has implemented a comprehensive feedback loop mechanism, shown in the following figure, to support continuous improvement and address any issues that might arise.

This feedback loop is structured around the following key components:

  • Customer feedback analysis – The team actively collects and analyzes feedback from customers to identify potential data issues or problems with the generative AI responses. This analysis helps pinpoint specific areas that need improvement.
  • Issue categorization – After an issue is identified, it’s categorized based on its nature. If it’s a data-related issue, it’s assigned to the internal business team for resolution. If it’s an application issue, a Jira ticket is automatically created for the PAAS IT team to address and fix the problem.
  • QA test case updates – The system provides an option to update QA test cases based on the feedback received. This helps make sure that the test scenarios remain relevant and comprehensive, covering a wide range of potential issues.
  • Ground truth agreements – Ground truth agreements, which serve as the benchmark for evaluating LLM response quality, are periodically reviewed and updated. This helps make sure that the evaluation metrics remain accurate and reflective of the desired standards.
  • Ongoing evaluations – Regular evaluations of the LLM responses are conducted using the updated QA test cases and ground truth agreements. This helps in maintaining high-quality responses and quickly addressing any deviations from the expected standards.

This robust feedback loop mechanism enables Verisk to continuously fine-tune the PAAS AI, making sure that it delivers precise, relevant, and contextually appropriate answers to customer queries. By integrating customer feedback, categorizing issues efficiently, updating test scenarios, and adhering to stringent evaluation protocols, Verisk maintains a high standard of service and drives continuous improvement in its generative AI capabilities.

Business impact

Verisk initially rolled out the PAAS AI to one beta customer to demonstrate real-world performance and impact. Supporting a customer in this way is a stark contrast to how Verisk has historically engaged with and supported customers in the past, where Verisk would typically have a team allocated to interact with the customer directly. Verisk’s PAAS AI has revolutionized the way subject matter experts (SMEs) work and cost-effectively scales while still providing high-quality assistance. What previously took hours of manual review can now be accomplished in minutes, resulting in an extraordinary 96–98% reduction in processing time per specialist. This dramatic improvement in efficiency not only streamline operations but also allows Verisk’s experts to focus on more strategic initiatives that drive greater value for the organization.

In analyzing this early usage data, Verisk uncovered additional areas where it can drive business value for its customers. As Verisk collects additional information, this data will help uncover what will be needed to improve results and prepare to roll out to a wider customer base of approximately 15,000 users.

Ongoing development will focus on expanding these capabilities, prioritized based on the collected questions. Most exciting, though, are the new possibilities on the horizon with generative AI. Verisk knows this technology is rapidly advancing and is eager to harness innovations to bring even more value to customers. As new models and techniques emerge, Verisk plans to adapt the PAAS AI to take advantage of the latest capabilities. Although the PAAS AI currently focuses on responding to user questions, this is only the starting point. Verisk plans to quickly improve its capabilities to proactively make suggestions and configure functionality directly in the system itself. The Verisk PAAS team is inspired by the challenge of pushing the boundaries of what’s possible with generative AI and is excited to test those boundaries.

Conclusion

Verisk’s development of a PAAS AI for its PAAS platform demonstrates the transformative power of generative AI in customer support and operational efficiency. Through careful data harvesting, structuring, retrieval, and the use of LLMs, semantic search functionalities, and stringent evaluation protocols, Verisk has crafted a robust system that delivers accurate, real-time answers to user questions. By continuing to enhance the PAAS AI’s features while maintaining ethical and responsible AI practices, Verisk is set to provide increased value to its customers, enable staff to concentrate on innovation, and establish new benchmarks for customer service in the insurance sector.

For more information, see the following resources:


About the Authors

Sajin Jacob is the Director of Software Engineering at Verisk, where he leads the Premium Audit Advisory Service (PAAS) development team. In this role, Sajin plays a crucial part in designing the architecture and providing strategic guidance to eight development teams, optimizing their efficiency and ensuring the maintainability of all solutions. He holds an MS in Software Engineering from Periyar University, India.

Jerry Chen is a Lead Software Developer at Verisk, based in Jersey City. He leads the GenAi development team, working on solutions for projects within the Verisk Underwriting department to enhance application functionalities and accessibility. Within PAAS, he has worked on the implementation of the conversational RAG architecture with enhancements such as hybrid search, guardrails, and response evaluations. Jerry holds a degree in Computer Science from Stevens Institute of Technology.

Sid Mohanram is the Senior Vice President of Core Lines Technology at Verisk. His area of expertise includes data strategy, analytics engineering, and digital transformation. Sid is head of the technology organization with global teams across five countries. He is also responsible for leading the technology transformation for the multi-year Core Lines Reimagine initiative. Sid holds an MS in Information Systems from Stevens Institute of Technology.

Luis Barbier is the Chief Technology Officer (CTO) of Verisk Underwriting at Verisk. He provides guidance to the development teams’ architectures to maximize efficiency and maintainability for all underwriting solutions. Luis holds an MBA from Iona University.

Kristen Chenowith, MSMSL, CPCU, WCP, APA, CIPA, AIS, is PAAS Product Manager at Verisk. She is currently the product owner for the Premium Audit Advisory Service (PAAS) product suite, including PAAS AI, a first to market generative AI chat tool for premium audit that accelerates research for many consultative questions by 98% compared to traditional methods. Kristen holds an MS in Management, Strategy and Leadership at Michigan State University and a BS in Business Administration at Valparaiso University. She has been in the commercial insurance industry and premium audit field since 2006.

Michelle Stahl, MBA, CPCU, AIM, API, AIS, is a Digital Product Manager with Verisk. She has over 20 years of experience building and transforming technology initiatives for the insurance industry. She has worked as a software developer, project manager, and product manager throughout her career.

Arun Pradeep Selvaraj is a Senior Solutions Architect at AWS. Arun is passionate about working with his customers and stakeholders on digital transformations and innovation in the cloud while continuing to learn, build, and reinvent. He is creative, fast-paced, deeply customer-obsessed, and uses the working backward process to build modern architectures to help customers solve their unique challenges. Connect with him on LinkedIn.

Ryan Doty is a Solutions Architect Manager at AWS, based out of New York. He helps financial services customers accelerate their adoption of the AWS Cloud by providing architectural guidelines to design innovative and scalable solutions. Coming from a software development and sales engineering background, the possibilities that the cloud can bring to the world excite him.

Apoorva Kiran, PhD, is a Senior Solutions Architect at AWS, based out of New York. He is aligned with the financial service industry, and is responsible for providing architectural guidelines to design innovative and scalable fintech solutions. He specializes in developing and commercializing artificial intelligence and machine learning products. Connect with him on LinkedIn.

Read More

Exploring the structural changes driving protein function with BioEmu-1

Exploring the structural changes driving protein function with BioEmu-1

The image shows eight different 3D models of protein structures. Each model is color-coded with various segments in blue, green, orange, and other colors to highlight different parts of the protein.

From forming muscle fibers to protecting us from disease, proteins play an essential role in almost all biological processes in humans and other life forms alike. There has been extraordinary progress in recent years toward better understanding protein structures using deep learning, enabling the accurate prediction of protein structures from their amino acid sequences. However, predicting a single protein structure from its amino acid sequence is like looking at a single frame of a movie—it offers only a snapshot of a highly flexible molecule. Biomolecular Emulator-1 (BioEmu-1) is a deep-learning model that provides scientists with a glimpse into the rich world of different structures each protein can adopt, or structural ensembles, bringing us a step closer to understanding how proteins work. A deeper understanding of proteins enables us to design more effective drugs, as many medications work by influencing protein structures to boost their function or prevent them from causing harm.

One way to model different protein structures is through molecular dynamics (MD) simulations. These tools simulate how proteins move and deform over time and are widely used in academia and industry. However, in order to simulate functionally important changes in structure, MD simulations must be run for a long time. This is a computationally demanding task and significant effort has been put into accelerating simulations, going as far as designing custom computer architectures (opens in new tab). Yet, even with these improvements, many proteins remain beyond what is currently possible to simulate and would require simulation times of years or even decades. 

Enter BioEmu-1 (opens in new tab)—a deep learning model that can generate thousands of protein structures per hour on a single graphics processing unit. Today, we are making BioEmu-1 open-source (opens in new tab), following our preprint (opens in new tab) from last December, to empower protein scientists in studying structural ensembles with our model. It provides orders of magnitude greater computational efficiency compared to classical MD simulations, thereby opening the door to insights that have, until now, been out of reach. BioEmu-1 is featured in Azure AI Foundry Labs (opens in new tab), a hub for developers, startups, and enterprises to explore groundbreaking innovations from research at Microsoft.

on-demand event

Microsoft Research Forum Episode 4

Learn about the latest multimodal AI models, advanced benchmarks for AI evaluation and model self-improvement, and an entirely new kind of computer for AI inference and hard optimization.


We have enabled this by training BioEmu-1 on three types of data sets: (1) AlphaFold Database (AFDB) (opens in new tab) structures (2) an extensive MD simulation dataset, and (3) an experimental protein folding stability dataset (opens in new tab). Training BioEmu-1 on the AFDB structures is like mapping distinct islands in a vast ocean of possible structures. When preparing this dataset, we clustered similar protein sequences so that BioEmu-1 can recognize that a protein sequence maps to multiple distinct structures. The MD simulation dataset helps BioEmu-1 predict physically plausible structural changes around these islands, mapping out the plethora of possible structures that a single protein can adopt. Finally, through fine-tuning on the protein folding stability dataset, BioEmu-1 learns to sample folded and unfolded structures with the right probabilities.

Figure 1: BioEmu-1 predicts diverse structures of LapD protein unseen during training. We sampled structures independently and reordered the samples to create a movie connecting two experimentally known structures.

Combining these advances, BioEmu-1 successfully generalizes to unseen protein sequences and predicts multiple structures. In Figure 1, we show that BioEmu-1can predict structures of the LapD protein (opens in new tab) from Vibrio cholerae bacteria, which causes cholera. BioEmu-1 predicts structures of LapD when it is bound and unbound with c-di-GMP molecules, both of which are experimentally known but not in the training set. Furthermore, our model offers a view on intermediate structures, which have never been experimentally observed, providing viable hypotheses about how this protein functions. Insights into how proteins function pave the way for further advancements in areas like drug development.

The figure compares Molecular Dynamics (MD) simulation and BioEmu-1, and shows that BioEmu-1 can emulate the equilibrium distribution 100,000 times faster than running a MD simulation to full convergence. The middle part of the figure shows that the 2D projections of the structure distributions obtained from MD simulation and BioEmu-1 are nearly identical. The bottom part of the figure shows three representative structures from the equilibrium distribution.
Figure 2: BioEmu-1 reproduces the D. E. Shaw research (DESRES) simulation of Protein G accurately with a fraction of the computational cost. On the top, we compare the distributions of structures obtained by extensive MD simulation (left) and independent sampling from BioEmu-1 (right). Three representative sample structures are shown at the bottom.

Moreover, BioEmu-1 reproduces MD equilibrium distributions accurately with a tiny fraction of the computational cost. In Figure 2, we compare 2D projections of the structural distribution of D. E. Shaw research (DESRES) simulation of Protein G (opens in new tab) and samples from BioEmu-1. BioEmu-1 reproduces the MD distribution accurately, while requiring 10,000-100,000 times fewer GPU hours.

The left panel of the figure shows a scatter plot of the experimental folding free energies ΔG against those predicted by BioEmu-1. The plot shows a good correlation between the two. The right panel of the figure shows folded and unfolded structures of a protein.
Figure 3: BioEmu-1 accurately predicts protein stability. On the left, we plot the experimentally measured free energy differences ΔG against those predicted by BioEmu-1. On the right, we show a protein in folded and unfolded structures.

Furthermore, BioEmu-1 accurately predicts protein stability, which we measure by computing the folding free energies—a way to quantify the ratio between the folded and unfolded states of a protein. Protein stability is an important factor when designing proteins, e.g., for therapeutic purposes. Figure 3 shows the folding free energies predicted by BioEmu-1, obtained by sampling protein structures and counting folded versus unfolded protein structures, compared against experimental folding free energy measurements. We see that even on sequences that BioEmu-1 has never seen during training, the predicted free energy values correlate well with experimental values.

Professor Martin Steinegger (opens in new tab) of Seoul National University, who was not part of the study, says “With highly accurate structure prediction, protein dynamics is the next frontier in discovery. BioEmu marks a significant step in this direction by enabling blazing-fast sampling of the free-energy landscape of proteins through generative deep learning.”

We believe that BioEmu-1 is a first step toward generating the full ensemble of structures that a protein can take. In these early days, we are also aware of its limitations. With this open-source release, we hope scientists will start experimenting with BioEmu-1, helping us carve out its potentials and shortcomings so we can improve it in the future. We are looking forward to hearing how it performs on various proteins you care about.

Acknowledgements

BioEmu-1 is the result of highly collaborative team effort at Microsoft Research AI for Science. The full authors: Sarah Lewis, Tim Hempel, José Jiménez-Luna, Michael Gastegger, Yu Xie, Andrew Y. K. Foong, Victor García Satorras, Osama Abdin, Bastiaan S. Veeling, Iryna Zaporozhets, Yaoyi Chen, Soojung Yang, Arne Schneuing, Jigyasa Nigam, Federico Barbero, Vincent Stimper, Andrew Campbell, Jason Yim, Marten Lienen, Yu Shi, Shuxin Zheng, Hannes Schulz, Usman Munir, Ryota Tomioka, Cecilia Clementi, Frank Noé

The post Exploring the structural changes driving protein function with BioEmu-1 appeared first on Microsoft Research.

Read More

It’s a Sign: AI Platform for Teaching American Sign Language Aims to Bridge Communication Gaps

It’s a Sign: AI Platform for Teaching American Sign Language Aims to Bridge Communication Gaps

American Sign Language is the third most prevalent language in the United States — but there are vastly fewer AI tools developed with ASL data than data representing the country’s most common languages, English and Spanish.

NVIDIA, the American Society for Deaf Children and creative agency Hello Monday are helping close this gap with Signs, an interactive web platform built to support ASL learning and the development of accessible AI applications.

Sign language learners can access the platform’s validated library of ASL signs to expand their vocabulary with the help of a 3D avatar that demonstrates signs — and use an AI tool that analyzes webcam footage to receive real-time feedback on their signing. Signers of any skill level can contribute by signing specific words to help build an open-source video dataset for ASL.

The dataset — which NVIDIA aims to grow to 400,000 video clips representing 1,000 signed words — is being validated by fluent ASL users and interpreters to ensure the accuracy of each sign, resulting in a high-quality visual dictionary and teaching tool.

“Most deaf children are born to hearing parents. Giving family members accessible tools like Signs to start learning ASL early enables them to open an effective communication channel with children as young as six to eight months old,” said Cheri Dowling, executive director of the American Society for Deaf Children. “And knowing that professional ASL teachers have validated all the vocabulary on the platform, users can be confident in what they’re learning.”

NVIDIA teams plan to use this dataset to further develop AI applications that break down communication barriers between the deaf and hearing communities. The data is slated to be available to the public as a resource for building accessible technologies including AI agents, digital human applications and video conferencing tools. It could also be used to enhance Signs and enable ASL platforms across the ecosystem with real-time, AI-powered support and feedback.

Three people practicing sign language using Signs AI platform
Whether novice or expert, volunteers can record themselves signing to contribute to the ASL dataset.

Supporting ASL Education, Exploring Language Nuance

During the data collection phase, Signs already provides a powerful platform for ASL language acquisition, offering opportunities for individuals to learn and practice an initial set of 100 signs so they can more effectively communicate with friends or family members who use ASL.

“The Signs learning platform could help families with deaf children quickly search for a specific word and see how to make the corresponding sign. It’s a tool that can help support their everyday use of ASL outside of a more formal class,” Dowling said. “I see both kids and parents exploring it — and I think they could play with it together.”person signing the word "vegetable" using Signs AI platform

While Signs currently focuses on hand movements and finger positions for each sign, ASL also incorporates facial expressions and head movements to convey meaning. The team behind Signs is exploring how these non-manual signals can be tracked and integrated in future versions of the platform.

They’re also investigating how other nuances, like regional variations and slang terms, can be represented in Signs to enrich its ASL database — and working with researchers at the Rochester Institute of Technology’s Center for Accessibility and Inclusion Research to evaluate and further improve the user experience of the Signs platform for deaf and hard-of-hearing users.

“Improving ASL accessibility is an ongoing effort,” said Anders Jessen, founding partner of Hello Monday/DEPT, which built the Signs web platform and previously worked with the American Society for Deaf Children on Fingerspelling.xyz, an application that taught users the ASL alphabet. “Signs can serve the need for advanced AI tools that help transcend communication barriers between the deaf and hearing communities.”

The dataset behind Signs is planned for release later this year.

Start learning or contributing with Signs at signs-ai.com, and learn more about NVIDIA’s trustworthy AI initiatives. Attendees of NVIDIA GTC, a global AI conference taking place March 17-21 in San Jose, will be able to participate in Signs live at the event.

Read More

Step Into the World of ‘Avowed’ on GeForce NOW

Step Into the World of ‘Avowed’ on GeForce NOW

Wield magic and steel as GeForce NOW’s fifth-anniversary celebration summons Obsidian Entertainment’s highly anticipated Avowed to the cloud.

This first-person fantasy role-playing game is ready to enchant cloud gamers, leading the charge of six titles joining the over 2,000 games in the cloud gaming library.

GeForce NOW day passes are available to purchase again, in limited quantities each day. Members can currently purchase one day at a time, based on available capacity. Day pass users get 24-hour access to powerful cloud gaming with all the benefits of a GeForce NOW Ultimate or Performance membership. Stay tuned for updates as more membership options become available.

Choose Your Own Adventure

Avowed on GeForce NOW
Cloudy with a chance of dragons.

Embark on a thrilling adventure in Avowed, set in the captivating world of Eora. As an envoy of Aedyr, explore the mysterious Living Lands, an island teeming with ancient magic and shifting secrets, as a dire threat looms over the realm: a mysterious plague that defies nature and reason, spreading chaos across the sprawling wilderness.

The Living Lands offer a diverse array of environments to explore, each with a unique ecosystem. Engage in visceral combat by mixing and matching swords, spells, guns and shields. Companions of various species, each with their own abilities and quests, will join the adventure, their fates intertwined with the players’ choices. As the story unfolds, every decision will ripple across the Living Lands, shaping the future of its inhabitants and testing the players’ resolve in the face of intrigue, power and danger.

GeForce NOW members can dive into this immersive fantasy world with the power of GeForce RTX-powered gaming rigs in the cloud. Ultimate members can stream the game at up to 4K resolution and 60 frames per second with high dynamic range on supported devices. These members enjoy additional benefits like NVIDIA DLSS 3 technology for enhanced frame rates and NVIDIA Reflex for ultra-low latency, delivering a seamless and visually stunning adventure through the Living Lands.

Time to Play

Lost Records: Bloom & Rage on GeForce NOW
Some mixtapes are better left unplayed.

Lost Records: Bloom & Rage is the recently released narrative-adventure game by Don’t Nod, the creators of Life Is Strange. Set in the fictional Michigan town of Velvet Cove, the game follows four friends — Swann, Nora, Autumn and Kat — during the summer of 1995, as well as 27 years later in 2022.

Explore Swann’s world through a nostalgic 90s lens, complete with a camcorder for capturing and reliving memories. The story unfolds across two timelines, delving into themes of friendship and identity, as well as a mysterious secret that tore the group apart. With its immersive storytelling, interactive environments and choice-driven gameplay, Lost Records: Bloom & Rage promises a captivating journey through time, nostalgia and the complexities of lifelong friendships.

Look for the following games available to stream in the cloud this week:

  • Avowed (New release on Steam, Battle.net and Xbox, available on PC Game Pass, Feb. 18)
  • Warhammer 40,000: Rogue Trader (New release on Xbox, available on PC Game Pass, Feb. 20)
  • Lost Records: Bloom & Rage (New release on Steam, Feb. 18)
  • Abiotic Factor (Steam)
  • HUMANITY (Steam)
  • Songs of Silence (Steam)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

Into the Omniverse: How OpenUSD and Synthetic Data Are Shaping the Future for Humanoid Robots

Into the Omniverse: How OpenUSD and Synthetic Data Are Shaping the Future for Humanoid Robots

Editor’s note: This post is part of Into the Omniverse, a series focused on how developers, 3D practitioners and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

Humanoid robots are rapidly becoming a reality. Those built on NVIDIA Isaac GR00T are already learning to walk, manipulate objects and otherwise interact with the real world.

Gathering diverse and large datasets to train these sophisticated machines can be time-consuming and costly. Using synthetic data (SDG), generated from physically-accurate digital twins, researchers and developers can train and validate their AI models in simulation before deployment in the real world.

Universal Scene Description, aka OpenUSD, is a powerful framework that makes it easy to build these physically accurate virtual environments. Once 3D environments are built, OpenUSD allows teams to develop detailed, scalable simulations along with lifelike scenarios where robots can practice, learn and improve their skills.

This synthetic data is essential for humanoid robots to learn humanlike behaviors such as walking, grasping objects and navigating complex environments. OpenUSD is enhancing the development of humanoid robots and paving the way for a future where these machines can seamlessly integrate into people’s daily lives.

The NVIDIA Omniverse platform, powered by OpenUSD, provides developers a way to unify 3D assets from disparate sources such as 3DCAD and digital content creation (DCC) tools. This allows them to build large-scale 3D virtual environments and run complex simulations to train their robots, streamlining the entire process and delivering faster, more cost-effective ways to collaborate and develop physical AI.

Advancing Robot Training With Synthetic Motion Data

At CES last month, NVIDIA announced the Isaac GR00T Blueprint for synthetic motion generation to help developers generate exponentially larger synthetic motion datasets to train humanoids using imitation learning.

Highlights of the release include:

  • Large-Scale Motion Data Generation: Uses simulation as well generative AI techniques to generate exponentially large and diverse datasets of humanlike movements, speeding up the data collection process.
  • Faster Data Augmentation: NVIDIA Cosmos world foundation models generate photorealistic videos at scale using the ground-truth simulation from Omniverse. This equips  developers to augment synthetic datasets faster, for training physical AI models, reducing the simulation-to-real gap.
  • Simulation-First Training: Instead of relying solely on real-world testing, developers can train robots in virtual environments, making the process faster and more cost-effective.
  • Bridging Virtual to Reality: The combination of real and synthetic data along with simulation-based training and testing allows developers to transfer the robots skills learned in the virtual world to the real-world seamlessly.

Simulating the Future of Robotics

Humanoid robots are enhancing efficiency, safety and adaptability across industries like manufacturing, warehouse and logistics, and healthcare by automating complex tasks and increasing safety conditions for human workers.

Major robotics companies including Boston Dynamics and Figure have already started adopting and demonstrating results with Isaac GR00T.

Get Plugged Into the World of OpenUSD

Learn more about OpenUSD, humanoid robots and the latest AI advancements at NVIDIA GTC, a global AI conference running March 17-21 in San Jose, California.

Don’t miss NVIDIA founder and CEO Jensen Huang’s GTC keynote on Tuesday, March 18 — in person at the SAP Center or online. He’ll share the latest technologies driving the next wave in AI, digital twins, cloud technologies and sustainable computing.

The inaugural GTC Humanoid Developer Day will take place on Wednesday, March 18. Following the sessions, join the Physical AI Developer Meetup to network with developers and researchers at NVIDIA GTC. Discuss the latest breakthroughs in OpenUSD and generative AI-powered simulation and digital twins, as well as innovations in generalist robotics for the next frontier of industries.

Learn how to use USD and continue to optimize 3D workflows with the new self-paced “Learn OpenUSD” curriculum for 3D developers and practitioners, available for free through the NVIDIA Deep Learning Institute. For more resources on OpenUSD, explore the Alliance for OpenUSD forum and the AOUSD website.

Stay up to date by subscribing to NVIDIA news, joining the community and following NVIDIA Omniverse on Instagram, LinkedIn, Medium and X.

Featured image courtesy of Fourier.

Read More

Calling All Creators: GeForce RTX 5070 Ti GPU Accelerates Generative AI and Content Creation Workflows in Video Editing, 3D and More

Calling All Creators: GeForce RTX 5070 Ti GPU Accelerates Generative AI and Content Creation Workflows in Video Editing, 3D and More

The NVIDIA GeForce RTX 5070 Ti graphics cards — built on the NVIDIA Blackwell architecture — are out now, ready to power generative AI content creation and accelerate creative performance.

GeForce RTX 5070 Ti GPUs feature fifth-generation Tensor Cores with support for FP4, doubling performance and reducing VRAM requirements to run generative AI models.

In addition, the GPU comes equipped with two ninth-generation encoders and a sixth-generation decoder that add support for the 4:2:2 pro-grade color format and increase encoding quality for HEVC and AV1. This combo accelerates video editing workflows, reducing export times by 8x compared with single encoder GPUs without 4:2:2 support like the GeForce RTX 3090.

The GeForce RTX 5070 Ti GPU also includes 16GB of fast GDDR7 memory and 896 GB/sec of total memory bandwidth — a 78% increase over the GeForce RTX 4070 Ti GPU.

The GeForce RTX 5070 Ti GPU — a game changer.

NVIDIA DLSS 4, a suite of neural rendering technologies that uses AI to boost frames per second (fps) and improve image quality, is now available in professional-grade 3D apps like Chaos Vantage. D5 Render also adds DLSS 4 in beta with the new Multi Frame Generation feature to boost frame rates by 3x. 3D rendering software Maxon Redshift also added NVIDIA Blackwell support, providing a 30% performance increase.

The February NVIDIA Studio Driver, with support for the GeForce RTX 5070 Ti GPU, will be ready for download next week. For automatic Studio Driver notifications, download the NVIDIA app.

Use NVIDIA’s product finder to pick up a GeForce RTX 5070 Ti GPU or prebuilt system today. Check back regularly after 6 a.m. PT, as retail partners list their available models. Explore complete specifications.

Ready for the Generative AI Era

Black Forest Lab’s FP4-optimized FLUX.1 [dev] suite of image generation models is now available on Hugging Face.

FP4 is a lower quantization method, similar to file compression, that decreases model sizes. FLUX.1 [dev] at FP4 requires less than 10GB of VRAM, compared with over 23GB at FP16.

This means the state-of-the-art FLUX.1 [dev] model can run on the GeForce RTX 5070 Ti GPU as well as all GeForce RTX 50 Series GPUs. This is important because the FLUX.1 [dev] model wouldn’t be able to run at FP16, given memory constraints.

On the GeForce RTX 5070 Ti GPU, the FLUX.1 [dev] model can generate images in just over eight seconds on FP4, compared with 20 seconds on FP8 on a GeForce RTX 4070 Ti GPU.

Versatile Viewports

DLSS 4 is now available in Chaos Vantage and D5 Render in beta — popular professional-grade 3D apps for architects, animators and designers.

Both apps natively support DLSS 4’s improved Super Resolution and Ray Reconstruction models — powered by transformers — to increase image detail and improve stability.

D5 Render also supports DLSS 4’s DLSS Multi Frame Generation to boost frame rates by using AI to generate up to three frames per traditionally rendered frame.

This enables animators to smoothly navigate a scene with multiplied frame rates and render 3D content, even with massive file sizes, at 60 fps or more.

Maxon Redshift — a 3D rendering software that uses GPU acceleration to visualize 3D models, scenes, animations and designs — has released an update to fully harness GeForce RTX 50 Series GPUs, accelerating performance by up to 30%.

Every month brings new creative app updates and optimizations powered by the NVIDIA Studio platform.  Follow NVIDIA Studio on Instagram, X and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter

See notice regarding software product information.

Read More

Evaluating Sample Utility for Data Selection by Mimicking Model Weights

Foundation models are trained on large-scale web-crawled datasets, which often contain noise, biases, and irrelevant information. This motivates the use of data selection techniques, which can be divided into model-free variants — relying on heuristic rules and downstream datasets — and model-based, e.g., using influence functions. The former can be expensive to design and risk introducing unwanted dependencies, while the latter are often computationally prohibitive. Instead, we propose an efficient, model-based approach using the Mimic Score, a new data quality metric that leverages the…Apple Machine Learning Research

Wearable Accelerometer Foundation Models for Health via Knowledge Distillation

Modern wearable devices can conveniently record various biosignals in the many different environments of daily living, enabling a rich view of individual health. However, not all biosignals are the same: high-fidelity biosignals, such as photoplethysmogram (PPG), contain more physiological information, but require optical sensors with a high power footprint. Alternatively, a lower-fidelity biosignal such as accelerometry has a significantly smaller power footprint and is available in almost any wearable device. While accelerometry is widely used for activity recognition and fitness, it is less…Apple Machine Learning Research