February 2024 – Vedere AI

ViSNet: A general molecular geometry modeling framework for predicting molecular properties and simulating molecular dynamics

Figure 1. The general model architecture of ViSNet. (a) Model sketch of ViSNet. ViSNet embeds the 3D structures of molecules, extracts the geometric information through a series of ViSNet blocks, and outputs molecular properties such as energy, forces, and HOMO-LUMO gap through an output block. (b) Flowchart of one ViSNet block. One ViSNet block consists of two modules: i) Scalar2Vec, responsible for attaching scalar embeddings to vectors; ii) Vec2Scalar. The inputs of Scalar2Vec are the node embedding, edge embedding, direction unit, and the relative positions between two atoms.

Molecular geometry modeling is a powerful tool for understanding the intricate relationships between molecular structure and biological activity – a field known as structure-activity relationships (SAR). The main premise of SAR is that the biological activity of a molecule is dictated by its specific chemical structure, not only the connections between nuclei but also how the molecule is twisted and arranged in a three-dimensional configuration. The holy grail in SAR is to be able to predict how molecular configurations influence vital processes such as drug interactions, chemical reactivity, and protein functionality. If this were possible, scientists could predict the efficacy of a drug, as well as its side effects and toxicity, long before it is ever tested on people.

The vector-scalar interactive graph neural network (ViSNet) framework, developed by Microsoft, is a novel approach to molecular geometry modeling. ViSNet is designed to help researchers predict molecular properties, simulate molecular dynamics, and gain a more precise understanding of structure-activity relationships. As a result, ViSNet has the potential to help transform drug discovery, materials science, and other critical fields.

Our research aims to improve the interpretability of molecular data, reduce computing costs, and evaluate real-world application utility. “Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing” was published in Nature Communications in January 2024 and selected for “Editors’ Highlights” in both the “AI and machine learning” and “biotechnology and method” categories.

Geometry deep learning and SAR

Geometry deep learning is a method at the forefront of SAR investigations: a powerful computational approach that harnesses the power of deep-learning techniques to analyze and understand the three-dimensional structures of molecules. Traditional deep-learning methods primarily focus on processing data organized in grid-like structures, such as images or sequences of text. However, molecules are inherently three-dimensional entities with complex geometries, making them challenging to analyze using conventional deep-learning approaches. Geometry deep learning addresses this challenge by building specialized architectures and algorithms capable of handling three-dimensional data. These methods enable computers to learn and extract meaningful features from the spatial arrangement of atoms within molecules, capturing crucial information about their structure and behavior.

Despite significant recent strides in geometry deep learning, however, challenges persist. These include:

Insufficient molecular interpretability – We are limited in our ability to understand and interpret the inner workings of deep neural networks when applied to molecular geometry modeling. While these networks excel at making predictions based on large datasets and complex patterns, they often operate as “black boxes,” meaning the rationale behind their predictions isn’t always understandable or transparent. In the context of molecular geometry, this lack of interpretability poses challenges in comprehending why certain molecular structures lead to specific outcomes, such as biological activity or chemical reactivity.
Rapidly increasing computing costs as molecular size increases – As molecules increase in size and complexity, the computational resources required to analyze them escalate dramatically. This challenge becomes particularly pronounced when employing advanced computational techniques, such as those using high-order Clebsch–Gordan coefficients. The Clebsch–Gordan coefficients are mathematical quantities used in quantum mechanics to describe the coupling of the angular momentum properties of particles. In the context of molecular modeling, these coefficients are employed in sophisticated quantum mechanical calculations to help account for the interactions between electrons and nuclei within a molecule. For large molecules, the number of atoms and electrons involved increases exponentially, resulting in an astronomical number of possible interactions that must be considered. As a result, the calculations involving high-order Clebsch–Gordan coefficients become tremendously complex and computationally demanding.
Need for blind tests and evaluations in real applications – Assessing predictive models in real-world applications through blind tests is crucial for evaluating their reliability and applicability beyond controlled benchmarks. However, challenges arise due to the scarcity of diverse and representative datasets, and complex system dynamics. There are also ethical considerations in animal and human trials, which naturally restrict the availability of such data. Overcoming these challenges requires interdisciplinary collaboration, innovative methodologies, and transparent validation frameworks to ensure the robustness and trustworthiness of predictive models in addressing real-world challenges.

Enhancing molecular geometry representations by ViSNet

Originally, our goal was to develop a model capable of effectively harnessing the intricate structures of molecules. Traditional molecular dynamics (MD) simulations track molecular movements by considering factors like bond length, bond angle, and dihedral angles. Taking inspiration from these methods, we introduced a novel approach called the vector-scalar interactive graph neural network (ViSNet).

Instead of directly integrating bond angle and dihedral information into our model in a straightforward manner, we introduced a concept termed “direction units.” These units represent nodes within the molecular structure as vectors, calculated by summing normalized vectors pointing from the central node to its neighboring nodes. We expanded traditional calculations of bond length, bond angle, and dihedral angles into interactions involving pairs of atoms (two-body), triplets of atoms (three-body), and quadruplets of atoms (four-body). To efficiently manage these interactions, we devised a runtime geometry calculation (RGC) module, which accurately captures the complex relationships between atoms in a molecule. Remarkably, the RGC module’s computations for three-body and four-body interactions exhibit linear time complexity, ensuring computational efficiency.
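To see why the three-body term can be evaluated in linear time, note that the sum of cos θ over all ordered neighbor pairs (j, k) around an atom equals the squared norm of that atom's direction unit, because both are the double sum of u_j · u_k. The sketch below checks this identity numerically in plain Python. The function names and toy coordinates are illustrative assumptions rather than the ViSNet implementation, and only the angular (three-body) case is covered.

```python
import math
from itertools import product


def unit(a, b):
    """Unit vector pointing from point a to point b."""
    d = [bj - aj for aj, bj in zip(a, b)]
    n = math.sqrt(sum(x * x for x in d))
    return [x / n for x in d]


def direction_unit(center, neighbors):
    """Sum of the unit vectors from the central atom to each neighbor."""
    units = [unit(center, nb) for nb in neighbors]
    return [sum(u[k] for u in units) for k in range(3)]


def angle_sum_quadratic(center, neighbors):
    """Explicit sum of cos(theta_jk) over all ordered neighbor pairs: O(n^2)."""
    units = [unit(center, nb) for nb in neighbors]
    return sum(
        sum(a * b for a, b in zip(u, w)) for u, w in product(units, repeat=2)
    )


def angle_sum_linear(center, neighbors):
    """Same quantity from the direction unit alone: O(n)."""
    v = direction_unit(center, neighbors)
    return sum(x * x for x in v)


# Toy geometry (hypothetical coordinates): one center with three neighbors.
center = (0.0, 0.0, 0.0)
neighbors = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, -3.0)]
quadratic = angle_sum_quadratic(center, neighbors)
linear = angle_sum_linear(center, neighbors)
```

Both routes give the same number, but the second never enumerates neighbor pairs, which is the kind of saving the linear-time claim refers to.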

Additionally, we introduced a mechanism known as vector-scalar interactive message passing (ViS-MP), facilitating the exchange of information between nodes and edges in the molecular graph. This mechanism iteratively updates the direction units of nodes based on scalar representations of nodes and edges, and vice versa, through the RGC module. These distinctive features of the RGC and ViS-MP significantly enhance our model’s capacity to encode molecular geometry and streamline the process of information exchange within the molecular graph neural network.

ViSNet in real-world applications for molecular modeling and property predictions

To gauge ViSNet’s practical utility, we rigorously evaluated its performance using established benchmarks for predicting molecular properties. Across a range of datasets, including MD17, revised MD17, MD22, QM9, and Molecule3D, ViSNet consistently outperformed existing algorithms, showcasing its exceptional accuracy in representing molecular geometry.

We then put ViSNet to the test by simulating the behavior of the Chignolin protein through molecular dynamics (MD) simulations. Trained on the AIMD-Chig dataset, featuring protein data calculated using advanced density functional theory (DFT) methods, ViSNet outperformed traditional empirical force fields and showed promise when compared to contemporary machine-learning force fields. Notably, simulations with ViSNet closely mirrored outcomes from rigorous DFT calculations, highlighting its potential for precise and efficient data simulations.

We used ViSNet to participate in the First Global AI Drug Development Competition, an international competition to predict the inhibitors against the main protease of SARS-CoV-2, given the sequence information (i.e., SMILES) of small molecules. Worldwide, 1,105 participants from 878 teams took part in the competition. ViSNet helped us win the competition, demonstrating its promising prediction accuracy.

Figure 2. ViSNet in the PyTorch Geometric Library. A PyTorch module that implements the equivariant vector-scalar interactive graph neural network (ViSNet) from the “Enhancing Geometric Representations for Molecules with Equivariant Vector-Scalar Interactive Message Passing” paper.

To make ViSNet more accessible and user-friendly, Microsoft has integrated it into the PyTorch Geometric Library as a core model for molecular modeling and property prediction. This integration aims to broaden the scope of applications and simplify the usage of ViSNet for researchers and practitioners. Additionally, to ensure ongoing support and improvement, a regularly updated version of ViSNet is now available on GitHub, providing users with the latest enhancements.

Recognizing the potential limitations of graph neural networks, such as the risk of “over-smoothing” (i.e., making nodes indistinguishable from one another) as models grow larger and more complex, we developed a Transformer-based version of ViSNet known as Geoformer (short for Geometric Transformer). This novel variant, introduced in our publication at NeurIPS 2023, addresses scalability challenges by transferring the key components of ViSNet into the Transformer architecture. This includes incorporating the RGC module into the Transformer attention mechanism and introducing a new method called interatomic positional encoding (IPE) to capture spatial relationships between atoms.

Figure 3. The overall pipeline of AI²BMD (see demos at https://microsoft.github.io/AI2BMD/index.html). Proteins are divided into protein units by a fragmentation process. The AI²BMD potential is designed based on ViSNet, and the datasets are generated at the DFT level. It calculates the energy and atomic forces for the whole protein. The AI²BMD simulation system is built upon all these components and provides a generalizable solution for simulating various proteins. It achieves ab initio accuracy in energy and force calculations. Through comprehensive analysis of both kinetics and thermodynamics, AI²BMD shows good agreement with wet-lab experimental data and detects phenomena that differ from those seen with molecular mechanics.

Looking forward: Toward AI-powered MD simulations with ab initio accuracy

As a crucial component of the AI-powered Ab Initio Molecular Dynamics (AI²BMD) project, ViSNet plays a pivotal role in accelerating molecular dynamics simulations. The project’s primary objective is to enhance the accuracy and efficiency of these simulations, with the aim of achieving results comparable to those obtained through rigorous ab initio methods, even for large molecular systems.

By integrating ViSNet into AI²BMD, significant strides have been made toward achieving this goal. ViSNet enables AI²BMD to achieve levels of accuracy in energy and force calculations that closely approach those of ab initio methods, even for complex proteins containing over 10,000 atoms. By leveraging ViSNet in protein dynamics simulations, AI²BMD aims to enhance the precision of free energy estimations and provide valuable insights into protein folding thermodynamics.

ViSNet’s contributions extend beyond energy calculations to the characterization of various protein properties. These insights have the potential to complement experimental research efforts by offering predictive capabilities and guiding further investigations into protein structure and function. The advancements in molecular geometry modeling, demonstrated by the innovative ViSNet framework, portend a new era of precision and efficiency in computational chemistry and biophysics.

Through meticulous design and rigorous validation, ViSNet has emerged as a versatile tool capable of giving insight into the intricate relationships between molecular structure and biological activity – getting us one step closer to the holy grail of structure-activity relationships. The integration of ViSNet into established libraries and frameworks, coupled with ongoing research efforts to enhance scalability and accuracy, underscores its potential to revolutionize drug discovery, materials science, and more.

The post ViSNet: A general molecular geometry modeling framework for predicting molecular properties and simulating molecular dynamics appeared first on Microsoft Research.

Decarbonizing paper packaging

Amazon teams up with RTI International, Schlumberger, and International Paper on a project selected by the US Department of Energy to scale carbon capture and storage for the pulp and paper industry.

Use RAG for drug discovery with Knowledge Bases for Amazon Bedrock

Amazon Bedrock provides a broad range of models from Amazon and third-party providers, including Anthropic, AI21, Meta, Cohere, and Stability AI, and covers a wide range of use cases, including text and image generation, embedding, chat, high-level agents with reasoning and orchestration, and more. Knowledge Bases for Amazon Bedrock allows you to build performant and customized Retrieval Augmented Generation (RAG) applications on top of AWS and third-party vector stores using both AWS and third-party models. Knowledge Bases for Amazon Bedrock automates synchronization of your data with your vector store, including diffing the data when it’s updated, document loading, and chunking, as well as semantic embedding. It allows you to seamlessly customize your RAG prompts and retrieval strategies—we provide the source attribution, and we handle memory management automatically. Knowledge Bases is completely serverless, so you don’t need to manage any infrastructure, and when using Knowledge Bases, you’re only charged for the models, vector databases and storage you use.

RAG is a popular technique that combines the use of private data with large language models (LLMs). RAG starts with an initial step to retrieve relevant documents from a data store (most commonly a vector index) based on the user’s query. It then employs a language model to generate a response by considering both the retrieved documents and the original query.

In this post, we demonstrate how to build a RAG workflow using Knowledge Bases for Amazon Bedrock for a drug discovery use case.

Overview of Knowledge Bases for Amazon Bedrock

Knowledge Bases for Amazon Bedrock supports a broad range of common file types, including .txt, .docx, .pdf, .csv, and more. To enable effective retrieval from private data, a common practice is to first split these documents into manageable chunks. Knowledge Bases has implemented a default chunking strategy that works well in most cases to allow you to get started faster. If you want more control, Knowledge Bases lets you control the chunking strategy through a set of preconfigured options. You can control the maximum token size and the amount of overlap to be created across chunks to provide coherent context to the embedding. Knowledge Bases for Amazon Bedrock manages the process of synchronizing data from your Amazon Simple Storage Service (Amazon S3) bucket, splits it into smaller chunks, generates vector embeddings, and stores the embeddings in a vector index. This process comes with intelligent diffing, throughput, and failure management.
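The effect of the maximum token size and overlap settings can be illustrated with a small fixed-size chunker. This is only a sketch of the general technique: it treats whitespace-separated words as stand-ins for tokens, and the function name and defaults are assumptions for illustration, not the service's actual chunking logic.

```python
def chunk_fixed_size(text, max_tokens=300, overlap_pct=20):
    """Split text into chunks of at most max_tokens words.

    Consecutive chunks share roughly overlap_pct percent of their words so
    that context is not cut off abruptly at chunk boundaries.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_pct < 100:
        raise ValueError("overlap_pct must be in [0, 100)")

    tokens = text.split()  # assumption: words approximate model tokens
    overlap = max_tokens * overlap_pct // 100
    step = max_tokens - overlap  # always >= 1 because overlap < max_tokens

    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


example_chunks = chunk_fixed_size("a b c d e f g h i j", max_tokens=4, overlap_pct=50)
```

With a chunk size of 4 and 50% overlap, each chunk repeats the last two words of the previous one. A larger overlap gives each embedding more surrounding context at the cost of storing more chunks.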

At runtime, an embedding model is used to convert the user’s query to a vector. The vector index is then queried to find documents similar to the user’s query by comparing document vectors to the user query vector. In the final step, semantically similar documents retrieved from the vector index are added as context for the original user query. When generating a response for the user, the semantically similar documents are passed to the text model in the prompt, together with source attribution for traceability.
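The runtime flow described above (embed the query, rank stored chunks by vector similarity, then add the best matches to the prompt) can be sketched in a few lines. The three-dimensional vectors and chunk texts below are hand-written toy values standing in for real embeddings, and the helper names are hypothetical; a managed vector index would replace the linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question, retrieved):
    """Place the retrieved chunks ahead of the question as context."""
    context = "\n".join(text for _, text in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy index: (chunk text, made-up embedding vector).
toy_index = [
    ("Lithium may cause mild nausea.", [0.9, 0.1, 0.0]),
    ("Participants attend about 20 visits.", [0.1, 0.9, 0.0]),
    ("Study records are kept in locked files.", [0.0, 0.1, 0.9]),
]
toy_query_vector = [0.8, 0.2, 0.0]  # pretend embedding of the question

results = top_k(toy_query_vector, toy_index, k=2)
prompt = build_prompt("What are the side effects?", results)
```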

Knowledge Bases for Amazon Bedrock supports multiple vector databases, including Amazon OpenSearch Serverless, Amazon Aurora, Pinecone, and Redis Enterprise Cloud. The Retrieve and RetrieveAndGenerate APIs allow your applications to directly query the index using a unified and standard syntax without having to learn separate APIs for each different vector database, reducing the need to write custom index queries against your vector store. The Retrieve API takes the incoming query, converts it into an embedding vector, and queries the backend store using the algorithms configured at the vector database level; the RetrieveAndGenerate API uses a user-configured LLM provided by Amazon Bedrock and generates the final answer in natural language. The native traceability support informs the requesting application about the sources used to answer a question. For enterprise implementations, Knowledge Bases supports AWS Key Management Service (AWS KMS) encryption, AWS CloudTrail integration, and more.

In the following sections, we demonstrate how to build a RAG workflow using Knowledge Bases for Amazon Bedrock, backed by the OpenSearch Serverless vector engine, to analyze an unstructured clinical trial dataset for a drug discovery use case. This data is information rich but can be vastly heterogeneous. Proper handling of specialized terminology and concepts in different formats is essential to detect insights and ensure analytical integrity. With Knowledge Bases for Amazon Bedrock, you can access detailed information through simple, natural queries.

Build a knowledge base for Amazon Bedrock

In this section, we demo the process of creating a knowledge base for Amazon Bedrock via the console. Complete the following steps:

On the Amazon Bedrock console, under Orchestration in the navigation pane, choose Knowledge base.
Choose Create knowledge base.

In the Knowledge base details section, enter a name and optional description.
In the IAM permissions section, select Create and use a new service role.
For Service role name, enter a name for your role, which must start with AmazonBedrockExecutionRoleForKnowledgeBase_.
Choose Next.

In the Data source section, enter a name for your data source and the S3 URI where the dataset sits. Knowledge Bases supports the following file formats:
- Plain text (.txt)
- Markdown (.md)
- HyperText Markup Language (.html)
- Microsoft Word document (.doc/.docx)
- Comma-separated values (.csv)
- Microsoft Excel spreadsheet (.xls/.xlsx)
- Portable Document Format (.pdf)

Under Additional settings, choose your preferred chunking strategy (for this post, we choose Fixed size chunking) and specify the chunk size and overlap as a percentage. Alternatively, you can use the default settings.
Choose Next.

In the Embeddings model section, choose the Titan Embeddings model from Amazon Bedrock.
In the Vector database section, select Quick create a new vector store, which manages the process of setting up a vector store.
Choose Next.

Review the settings and choose Create knowledge base.

Wait for the knowledge base creation to complete and confirm its status is Ready.
In the Data source section, or on the banner at the top of the page or the popup in the test window, choose Sync to trigger the process of loading data from the S3 bucket, splitting it into chunks of the size you specified, generating vector embeddings using the selected text embedding model, and storing them in the vector store managed by Knowledge Bases for Amazon Bedrock.

The sync function supports ingesting, updating, and deleting the documents from the vector index based on changes to documents in Amazon S3. You can also use the StartIngestionJob API to trigger the sync via the AWS SDK.

When the sync is complete, the Sync history shows status Completed.
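The sync can also be started and monitored from code. The following is a minimal sketch that assumes a Boto3 `bedrock-agent` client is passed in and that the job reports one of the terminal status strings listed below; the helper name, polling interval, and status set are assumptions, so check them against the SDK version you use.

```python
import time

# Assumption: status strings after which the job no longer changes.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(agent_client, knowledge_base_id, data_source_id,
                     poll_seconds=5.0, max_polls=120):
    """Start an ingestion job and poll until it reaches a terminal status."""
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]

    for _ in range(max_polls):
        if job["status"] in TERMINAL_STATUSES:
            return job["status"]
        time.sleep(poll_seconds)
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]

    if job["status"] in TERMINAL_STATUSES:
        return job["status"]
    raise TimeoutError("ingestion job did not finish within the polling budget")


# Usage (requires AWS credentials; the IDs are placeholders):
# import boto3
# status = sync_data_source(
#     boto3.client("bedrock-agent"), "<YOUR_KNOWLEDGE_BASE_ID>", "<YOUR_DATA_SOURCE_ID>"
# )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real knowledge base.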

Query the knowledge base

In this section, we demonstrate how to access detailed information in the knowledge base through straightforward and natural queries. We use an unstructured synthetic dataset consisting of PDF files, each ranging from 10 to 100 pages, that simulate a clinical trial plan for a proposed new medicine, including statistical analysis methods and participant consent forms. We use the Knowledge Bases for Amazon Bedrock retrieve_and_generate and retrieve APIs with Amazon Bedrock LangChain integration.

Before you can write scripts that use the Amazon Bedrock API, you’ll need to install the appropriate version of the AWS SDK in your environment. For Python scripts, this will be the AWS SDK for Python (Boto3):

pip install langchain
pip install boto3

Additionally, enable access to the Amazon Titan Embeddings model and Anthropic Claude v2 or v1. For more information, refer to Model access.

Generate questions using Amazon Bedrock

We can use Anthropic Claude 2.1 for Amazon Bedrock to propose a list of questions to ask on the clinical trial dataset:

import boto3
from langchain.llms.bedrock import Bedrock

bedrock_client = boto3.client("bedrock-runtime")

# Start with the query
prompt = "For medical research trial consent forms to sign, what are the top 5 questions can be asked?"

claude_llm = Bedrock(
    model_id="anthropic.claude-v2:1",
    model_kwargs={"temperature": 0, "top_k": 10, "max_tokens_to_sample": 3000},
    client=bedrock_client,
)

# Provide the prompt to the LLM to generate an answer to the query without any additional context provided
response = claude_llm(prompt)
questions = [
    item.split(".")[1].strip() for item in response.strip().split("\n\n")[1:-1]
]
questions
>>> answer:
'What is the purpose of the study? Make sure you understand the goals of the research and what the study procedures will entail',
'What are the risks and potential benefits? The form should explain all foreseeable risks, side effects, or discomforts you might experience from participating',
'What will participation involve? Get details on what tests, medications, lifestyle changes, or procedures you will go through, how much time it will take, and how long the study will last',
'Are there any costs or payments? Ask if you will be responsible for any costs related to the study or get paid for participating',
'How will my privacy be protected? The form should explain how your personal health information will be kept confidential before, during, and after the trial'

Use the Amazon Bedrock RetrieveAndGenerate API

For a fully managed RAG experience, you can use the native Knowledge Bases for Amazon Bedrock RetrieveAndGenerate API to obtain the answers directly:

bedrock_agent_client = boto3.client("bedrock-agent-runtime")

kb_id = "<YOUR_KNOWLEDGE_BASE_ID>"

def retrieveAndGenerate(
    input: str,
    kbId: str,
    region: str = "us-east-1",
    sessionId: str = None,
    model_id: str = "anthropic.claude-v2:1",
):
    model_arn = f"arn:aws:bedrock:{region}::foundation-model/{model_id}"

    # Build the request once; only the optional session ID differs between calls
    request = {
        "input": {"text": input},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kbId,
                "modelArn": model_arn,
            },
        },
    }

    # Passing a session ID continues an existing conversation
    if sessionId:
        request["sessionId"] = sessionId

    return bedrock_agent_client.retrieve_and_generate(**request)

response = retrieveAndGenerate(
    "What are the potential risks and benefits of participating?", kb_id
)

generated_text = response["output"]["text"]
>>> "The potential risks include side effects from the study medication lithium such as nausea, loose stools, thirst, urination changes, shakiness, headaches, sweating, fatigue, decreased concentration, and skin rash. There is also a risk of lithium interaction with other medications. For women, there is a risk of birth defects if lithium is taken during pregnancy. There are no guaranteed benefits, but possible benefits include new information that could help the participant from the interviews and tests conducted during the study."

The cited information source can be obtained via the following code (with some of the output redacted for brevity):

response["citations"]

>>> [
    {
        "generatedResponsePart": {
            "textResponsePart": {
                "text": " The potential risks include side effects from the study...",
                "span": {"start": 0, "end": 361},
            }
        },
        "retrievedReferences": [
            {
                "content": {
                    "text": "590 ICF#2 Page 7 of 19 The primary risks and discomforts of participation…"
                },
                "location": {"type": "S3", "s3Location": {"uri": "s3://XXXX/XXXX.pdf"}},
            },
            {
                "content": {
                    "text": "N/A CSP 590 ICF#2 Page 10 of 19 Risks associated with suddenly stopping study medications..."
                },
                "location": {"type": "S3", "s3Location": {"uri": "s3://XXXX/XXXX.pdf"}},
            },
        ],
    },
    {
        "generatedResponsePart": {
            "textResponsePart": {
                "text": " There are no guaranteed benefits, but possible benefits include...",
                "span": {"start": 363, "end": 531},
            }
        },
        "retrievedReferences": [
            {
                "content": {
                    "text": "research, not usual clinical care. After these are done we ask..."
                },
                "location": {"type": "S3", "s3Location": {"uri": "s3://XXXX/XXXX.pdf"}},
            }
        ],
    },
]
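To turn the citation structure above into something easier to display, you could pair each generated text span with the S3 URIs it cites. The helper below is a sketch that assumes the response has the shape shown in the output above; the function name is hypothetical.

```python
def citation_sources(response):
    """Return (generated_text, [source_uri, ...]) pairs from a response dict."""
    rows = []
    for citation in response.get("citations", []):
        part = citation["generatedResponsePart"]["textResponsePart"]
        uris = [
            ref["location"]["s3Location"]["uri"]
            for ref in citation.get("retrievedReferences", [])
            if ref.get("location", {}).get("type") == "S3"
        ]
        rows.append((part["text"].strip(), uris))
    return rows


# Hand-written sample mirroring the redacted output shown above.
sample_response = {
    "citations": [
        {
            "generatedResponsePart": {
                "textResponsePart": {
                    "text": " The potential risks include side effects...",
                    "span": {"start": 0, "end": 361},
                }
            },
            "retrievedReferences": [
                {
                    "content": {"text": "590 ICF#2 Page 7 of 19 ..."},
                    "location": {"type": "S3", "s3Location": {"uri": "s3://XXXX/XXXX.pdf"}},
                }
            ],
        }
    ]
}
sources = citation_sources(sample_response)
```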

By passing the session ID of the RetrieveAndGenerate API, you can preserve the conversation context and ask follow-up questions. For example, without the context, if you ask for more details from the previous answer, it may not be able to answer correctly:

retrieveAndGenerate("elaborate more on the first side effect", kb_id, sessionId=None)["output"]["text"]
>>> "The search results do not provide additional details about the mild nausea side effect that would allow me to elaborate further on it."

But by passing the session ID, the RAG pipeline is able to identify the corresponding context and return relevant answers:

retrieveAndGenerate("elaborate more on the first side effect", kb_id, sessionId=response["sessionId"])["output"]["text"]
>>> "The search results provide details that nausea from taking lithium is usually mild and goes away after days or weeks for most people. Specifically, up to 75% of people may experience mild nausea when first starting lithium, but this goes away in 90-99% of people who continue taking it."
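If you ask several follow-up questions, it can be convenient to keep the session ID in one place rather than threading it through each call. The class below is one possible sketch: the class name is hypothetical, the client is assumed to be a Boto3 `bedrock-agent-runtime` client, and the request mirrors the fields used in the calls above.

```python
class KnowledgeBaseSession:
    """Remembers the session ID so follow-up questions share context."""

    def __init__(self, client, knowledge_base_id, model_arn):
        self.client = client
        self.knowledge_base_id = knowledge_base_id
        self.model_arn = model_arn
        self.session_id = None  # set after the first answer comes back

    def ask(self, question):
        request = {
            "input": {"text": question},
            "retrieveAndGenerateConfiguration": {
                "type": "KNOWLEDGE_BASE",
                "knowledgeBaseConfiguration": {
                    "knowledgeBaseId": self.knowledge_base_id,
                    "modelArn": self.model_arn,
                },
            },
        }
        if self.session_id:
            request["sessionId"] = self.session_id

        response = self.client.retrieve_and_generate(**request)
        # Keep the ID returned by the service for the next turn.
        self.session_id = response.get("sessionId", self.session_id)
        return response["output"]["text"]


# Usage (requires AWS credentials; the ID and ARN are placeholders):
# import boto3
# chat = KnowledgeBaseSession(
#     boto3.client("bedrock-agent-runtime"), "<YOUR_KNOWLEDGE_BASE_ID>", "<MODEL_ARN>"
# )
# chat.ask("What are the potential risks and benefits of participating?")
# chat.ask("elaborate more on the first side effect")
```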

The following table shows the retrieved answers to all the corresponding questions.

Question	Answer
What is the purpose of the study? Make sure you understand the goals of the research and what the study procedures will entail.	The purpose of the study is to test whether lithium is effective at preventing repeated suicidal self-directed violence in patients with depression or bipolar disorder.
What are the risks and potential benefits? The form should explain all foreseeable risks, side effects, or discomforts you might experience from participating.	The possible risks or discomforts include: the interview questions causing discomfort, side effects from the lithium medication such as nausea, loose stools, thirst, urination changes, shakiness, headaches, sweating, fatigue, decreased concentration, skin rash, thyroid changes, worsening acne/psoriasis, lithium toxicity, and risks if the medication is suddenly stopped. The potential benefits are that the tests may lead to new information to help the participant, and lithium may help prevent repeated suicidal self-directed violence for those with depression or bipolar disorder.
What will participation involve? Get details on what tests, medications, lifestyle changes, or procedures you will go through, how much time it will take, and how long the study will last.	Participation will involve completing an interview and questionnaires covering thinking, behaviors, mental health treatment, medications, alcohol and drug use, home and social supports, and understanding of the research study. This takes about two hours and can be done in multiple sessions, in person and by phone. If eligible for the full study, there will be about 20 study visits over one year. This will involve taking study medication, having vital signs checked, completing questionnaires, reviewing side effects, and continuing normal medical and mental health care.
Are there any costs or payments? Ask if you will be responsible for any costs related to the study or get paid for participating.	Yes, there are costs and payments discussed in the search results. You will not be charged for any treatments or procedures that are part of the study. However, you will still have to pay any usual VA co-payments for care and medications not related to the study. You will not be paid for participation, but the study will reimburse expenses related to participation like transportation, parking, etc. Reimbursement amounts and process are provided.
How will my privacy be protected? The form should explain how your personal health information will be kept confidential before, during, and after the trial.	Your privacy will be protected by conducting interviews in private, keeping written notes in locked files and offices, storing electronic information in encrypted and password protected files, and obtaining a Confidentiality Certificate from the Department of Health and Human Services to prevent disclosing information that identifies you. Information that identifies you may be shared with doctors responsible for your care or for audits and evaluations by government agencies, but talks and papers about the study will not identify you.

Query using the Amazon Bedrock Retrieve API

To customize your RAG workflow, you can use the Retrieve API to fetch the relevant chunks based on your query and pass it to any LLM provided by Amazon Bedrock. To use the Retrieve API, define it as follows:

def retrieve(query: str, kbId: str, numberOfResults: int = 5):
    return bedrock_agent_client.retrieve(
        retrievalQuery={"text": query},
        knowledgeBaseId=kbId,
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": numberOfResults}
        },
    )

Retrieve the corresponding context (with some of the output redacted for brevity):

query = "What is the purpose of the medical research study?"
response = retrieve(query, kb_id, 3)
retrievalResults = response["retrievalResults"]
>>> [
    {
        "content": {"text": "You will not be charged for any procedures that..."},
        "location": {"type": "S3", "s3Location": {"uri": "s3://XXXXX/XXXX.pdf"}},
        "score": 0.6552521,
    },
    {
        "content": {"text": "and possible benefits of the study. You have been..."},
        "location": {"type": "S3", "s3Location": {"uri": "s3://XXXX/XXXX.pdf"}},
        "score": 0.6581577,
    },
    ...,
]

Extract the context for the prompt template:

def get_contexts(retrievalResults):
    contexts = []
    for retrievedResult in retrievalResults:
        contexts.append(retrievedResult["content"]["text"])
    return " ".join(contexts)

contexts = get_contexts(retrievalResults)
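
Each retrieved chunk also carries a location and a score, which can be useful for citing sources or dropping weak matches before building the prompt. The following is a minimal sketch, assuming the result structure shown above; summarize_sources and min_score are hypothetical names that are not part of the Retrieve API, and the sample data is made up for illustration:

```python
def summarize_sources(retrieval_results, min_score=0.0):
    """Return (uri, score) pairs, highest score first, dropping low-scoring chunks."""
    pairs = []
    for result in retrieval_results:
        score = result.get("score", 0.0)
        if score < min_score:
            continue
        # The URI is nested under location -> s3Location in the sample output above.
        uri = result.get("location", {}).get("s3Location", {}).get("uri")
        pairs.append((uri, score))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up results that mimic the shape of response["retrievalResults"].
sample = [
    {
        "content": {"text": "chunk A"},
        "location": {"type": "S3", "s3Location": {"uri": "s3://bucket/a.pdf"}},
        "score": 0.41,
    },
    {
        "content": {"text": "chunk B"},
        "location": {"type": "S3", "s3Location": {"uri": "s3://bucket/b.pdf"}},
        "score": 0.66,
    },
]
print(summarize_sources(sample, min_score=0.5))
```

With a real response, you would pass retrievalResults instead of sample.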

Import the Python modules and set up the in-context question answering prompt template, then generate the final answer:

from langchain.prompts import PromptTemplate

PROMPT_TEMPLATE = """
Human: You are an AI system working on medical trial research, and you provide answers to questions 
by using fact-based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

<context>
{context_str}
</context>

<question>
{query_str}
</question>

The response should be specific and use statistics or numbers when possible.

Assistant:"""

claude_prompt = PromptTemplate(
    template=PROMPT_TEMPLATE, input_variables=["context_str", "query_str"]
)

prompt = claude_prompt.format(context_str=contexts, query_str=query)
response = claude_llm(prompt)
>>> "Based on the context provided, the purpose of this medical research study is to evaluate the efficacy of lithium compared to a placebo in preventing suicide over a 1 year period. Specifically, participants will be randomly assigned to receive either lithium or a placebo pill for 1 year, with their doctors and the participants themselves not knowing which treatment they receive (double-blind). Blood lithium levels will be monitored and doses adjusted over the first 6-8 visits, then participants will be followed monthly for 1 year to assess outcomes."

Query using Amazon Bedrock LangChain integration

To create an end-to-end customized Q&A application, Knowledge Bases for Amazon Bedrock provides integration with LangChain. To set up the LangChain retriever, provide the knowledge base ID and specify the number of results to return from the query:

from langchain.retrievers.bedrock import AmazonKnowledgeBasesRetriever

retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id=kb_id,
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 4}},
)

Now set up LangChain RetrievalQA and generate answers from the knowledge base:

from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=claude_llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": claude_prompt},
)

[qa(q)["result"] for q in questions]

This will generate corresponding answers similar to the ones listed in the earlier table.

Clean up

Make sure to delete the following resources to avoid incurring additional charges:

The Amazon Bedrock knowledge base. For instructions, refer to Manage your knowledge base.
The OpenSearch Serverless collection. For instructions, refer to Deleting collections.
The S3 bucket.

Conclusion

Amazon Bedrock provides a broad set of deeply integrated services to power RAG applications of all scales, making it straightforward to get started with analyzing your company data. Knowledge Bases for Amazon Bedrock integrates with Amazon Bedrock foundation models to build scalable document embedding pipelines and document retrieval services to power a wide range of internal and customer-facing applications. We are excited about the future ahead, and your feedback will play a vital role in guiding the progress of this product. To learn more about the capabilities of Amazon Bedrock and knowledge bases, refer to Knowledge base for Amazon Bedrock.

About the Authors

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS Certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for Women in Manufacturing Education Foundation. She leads machine learning (ML) projects in various domains such as computer vision, natural language processing, and generative AI. She helps customers build, train, and deploy large machine learning models at scale. She speaks at internal and external conferences such as re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Dr. Baichuan Sun, currently serving as a Sr. AI/ML Solution Architect at AWS, focuses on generative AI and applies his knowledge in data science and machine learning to provide practical, cloud-based business solutions. With experience in management consulting and AI solution architecture, he addresses a range of complex challenges, including robotics computer vision, time series forecasting, and predictive maintenance, among others. His work is grounded in a solid background of project management, software R&D, and academic pursuits. Outside of work, Dr. Sun enjoys the balance of traveling and spending time with family and friends.

Derrick Choo is a Senior Solutions Architect at AWS focused on accelerating customers’ journeys to the cloud and transforming their businesses through the adoption of cloud-based solutions. His expertise is in full stack application and machine learning development. He helps customers design and build end-to-end solutions covering frontend user interfaces, IoT applications, API and data integrations, and machine learning models. In his free time, he enjoys spending time with his family and experimenting with photography and videography.

Frank Winkler is a Senior Solutions Architect and Generative AI Specialist at AWS based in Singapore, focused on machine learning and generative AI. He works with global digital native companies to architect scalable, secure, and cost-effective products and services on AWS. In his free time, he spends time with his son and daughter, and travels to enjoy the waves across ASEAN.

Nihir Chadderwala is a Sr. AI/ML Solutions Architect in the Global Healthcare and Life Sciences team. His expertise is in building big data and AI-powered solutions to customer problems, especially in the biomedical, life sciences, and healthcare domains. He is also excited about the intersection of quantum information science and AI and enjoys learning and contributing to this space. In his spare time, he enjoys playing tennis, traveling, and learning about cosmology.

Unlock personalized experiences powered by AI using Amazon Personalize and Amazon OpenSearch Service

OpenSearch is a scalable, flexible, and extensible open source software suite for search, analytics, security monitoring, and observability applications, licensed under the Apache 2.0 license. Amazon OpenSearch Service is a fully managed service that makes it straightforward to deploy, scale, and operate OpenSearch in the AWS Cloud.

OpenSearch uses a probabilistic ranking framework called BM25 to calculate relevance scores. If a distinctive keyword appears more frequently in a document, BM25 assigns a higher relevance score to that document. This framework, however, doesn’t consider user behavior like click-through or purchase data, which can further improve relevance for individual users.
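
To make the keyword-frequency behavior concrete, the following is a minimal sketch of a textbook BM25 scorer. It is not OpenSearch's implementation; the function name bm25_scores, the toy documents, and the k1 and b values (commonly used defaults) are assumptions chosen for illustration:

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook BM25."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates and is normalized by document length.
            norm = term_freq[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "tom cruise stars in an action movie".split(),
    "tom cruise tom cruise drama".split(),
    "a documentary about the ocean".split(),
]
scores = bm25_scores(["cruise"], docs)
print(scores)
```

The second document mentions the query term more often in fewer words, so it scores highest, and the third document scores zero. Nothing in the score depends on who is searching, which is the gap that personalization addresses.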

Improving the functionality of search is an integral aspect of enhancing the overall user experience and engagement on a website or application. Search traffic is considered high intent because users are actively seeking a particular item, and they have been found to convert up to two times more than non-site search visitors on average. By using user interaction data such as clicks, likes, and purchases, businesses can improve search relevancy to capitalize on this traffic and reduce instances of users abandoning their sessions due to difficulties in finding the desired items. By refining the quality of search results, businesses can significantly improve their customer engagement, satisfaction, and loyalty, as well as increase their conversion rates, ultimately leading to greater profitability and success.

Amazon Personalize allows you to add sophisticated personalization capabilities to your applications by using the same machine learning (ML) technology used on Amazon.com for over 20 years. No ML expertise is required.

Amazon Personalize supports the automatic adjustment of recommendations based on contextual information about your user, such as device type, location, time of day, or other information you provide. You supply Amazon Personalize with historical data about your users and their interactions within your application, such as purchase history, ratings, and likes. You can add data to Amazon Personalize in bulk by importing large historical datasets all at once from an Amazon Simple Storage Service (Amazon S3) CSV file, using a format required by Amazon Personalize. You can also add data incrementally by importing records using the Amazon Personalize console or API. After your historical data is imported, you can continue to provide new data in real time by sending user interaction events. Based on the use case you want to address, such as product recommendations, you select a pre-built recipe that is optimized for that goal. Amazon Personalize analyzes your data and trains a custom ML model based on the parameters in the recipe to generate personalized recommendations optimized for your users and application. After the model is trained, you can generate real-time personalized recommendations for your users.

With the newly launched Amazon Personalize Search Ranking plugin for Amazon OpenSearch Service, you can use your users’ interaction histories and interests to enhance their search results. By using an Amazon Personalize recipe such as Personalized-Ranking, you can boost search results for relevant items based on user interests at the time the results are retrieved from OpenSearch Service.

This post explains how to integrate the Amazon Personalize Search Ranking plugin with OpenSearch Service to enable personalized search experiences. To build Amazon Personalize artifacts in this post, we use a dataset from IMDb, the world’s most authoritative source for movie, TV, and celebrity content, available on AWS Marketplace, as well as the MovieLens dataset prepared by GroupLens research at the University of Minnesota, consisting of user ratings for various movies.

Solution overview

The following diagram illustrates the solution architecture.

The workflow includes the following steps:

A user issues a search request through their website or portal. This search request is sent to OpenSearch Service.
The top N search results are returned from the OpenSearch Service index and sent to the plugin to preprocess and prepare the input for an Amazon Personalize campaign.
The request is sent to Amazon Personalize to get the re-ranked search results.
Amazon Personalize returns the personalized ranking of the search results with the relevant score for each result.
The reranked hits are returned by the plugin to OpenSearch Service, with a weighting applied between the OpenSearch Service relevance score and the Amazon Personalize personalized ranking score. You specify a weight parameter (between 0.0–1.0) that controls the balance between OpenSearch Service and Amazon Personalize when reranking results. A higher weight means more influence from the Amazon Personalize ranking scores vs. the OpenSearch Service scores. This allows you to customize how much the personalized recommendations affect the final search results ranking returned to the user.
The user gets personalized search results based on their preferences and interactions.
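
The effect of the weight parameter in the workflow above can be illustrated with a simple linear blend. This is a sketch under stated assumptions: the post does not give the plugin's exact formula, so the min-max normalization, the linear mix, and the name blend_rankings are hypothetical and only meant to show how a higher weight shifts influence toward Amazon Personalize:

```python
def blend_rankings(opensearch_scores, personalize_scores, weight):
    """Rank item IDs by a weighted mix of two min-max normalized score sets."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0.0 and 1.0")

    def normalize(scores):
        low, high = min(scores.values()), max(scores.values())
        span = high - low
        # If every score is equal, treat all items as equally relevant.
        return {k: (v - low) / span if span else 1.0 for k, v in scores.items()}

    os_norm = normalize(opensearch_scores)
    p_norm = normalize(personalize_scores)
    blended = {
        item: (1.0 - weight) * os_norm[item] + weight * p_norm.get(item, 0.0)
        for item in os_norm
    }
    return sorted(blended, key=blended.get, reverse=True)


# Made-up scores for three items.
opensearch_scores = {"a": 9.0, "b": 7.0, "c": 5.0}
personalize_scores = {"a": 0.1, "b": 0.3, "c": 0.6}

print(blend_rankings(opensearch_scores, personalize_scores, 0.0))
print(blend_rankings(opensearch_scores, personalize_scores, 0.8))
```

With a weight of 0.0 the order follows the OpenSearch Service scores alone; as the weight approaches 1.0, the order follows the personalized scores.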

Prerequisites

You should have the following prerequisites:

An AWS account.
An AWS Identity and Access Management (IAM) role with appropriate access permissions. We provide AWS CloudFormation templates and Jupyter notebooks to help set up the required IAM role and access.
To enable personalization in OpenSearch Service, you need to set up the required Amazon Personalize resources, including a dataset group, solution version, and campaign. We have provided a Jupyter notebook that creates all the Amazon Personalize resources, taking advantage of the fully managed Jupyter notebook instance capabilities of Amazon SageMaker.

Deploy the CloudFormation stack

The CloudFormation stack automates the deployment of the OpenSearch Service domain and SageMaker Notebook instance. Complete the following steps to deploy the stack:

Sign in to the AWS Management Console with your credentials in the account where you want to deploy the CloudFormation stack.
Launch the CloudFormation stack directly.
On the Specify details page, provide any parameters required by the template, such as OpenSearch Service and SageMaker instance sizes.
On the Configure stack options page, specify a stack name and any other options you want to set.
Complete creating the stack and monitor the status on the stack details page.
After the stack is created, open the SageMaker notebook instance from the console.

The notebook instance will already be preloaded with the required notebooks.

Set up and complete the Amazon Personalize workflow

Open the 1.Configure_Amazon_Personalize.ipynb notebook to set up the Amazon Personalize artifacts. This notebook walks you through the following steps:

Download the dataset and preprocess the data to create the required input files for creating the datasets.
Create a dataset group.
Create datasets and schemas.
Prepare and import data.
Create a solution and a solution version.
Create a campaign for the solution version.

Install the Amazon Personalize Search Ranking plugin using a Jupyter notebook

Open the 2.Configure_Amazon_OpenSearch.ipynb notebook and run through the instructions. This notebook walks you through the following steps:

Ingest sample index data into the OpenSearch Service instance. Populating the index with representative data facilitates thorough testing and validation of the plugin.
Install the plugin package in the OpenSearch Service domain. This integrates the personalization capabilities into the OpenSearch environment.
Set up search pipelines to activate the plugin’s functionality. Search pipelines contain request preprocessors and response postprocessors that transform queries and results. When constructing a pipeline, specify the Amazon Personalize campaign ARN created earlier in a personalized_search_ranking postprocessor to enable personalized re-ranking. This configures the plugin to retrieve real-time personalization results from Amazon Personalize for application during result processing. Defining pipelines allows the plugin to augment search relevance based on user preferences.

Install the Amazon Personalize Search Ranking plugin using the console

You can also set up the Amazon Personalize search plugin from the console. You only need to do this if you have not installed the plugin using the Jupyter notebook from earlier.

To install the Amazon Personalize Search Ranking plugin on OpenSearch Service, complete the following steps:

On the OpenSearch Service console, navigate to your domain.
On the Packages tab, choose Associate package to associate the Amazon Personalize Search Ranking plugin with your OpenSearch Service domain. The plugin version must match the OpenSearch Service domain version.

The Amazon Personalize Search Ranking plugin can be installed on OpenSearch Service versions 2.9 and above.

Locate the Amazon Personalize Search Ranking plugin in the list of available plugins.
Choose Associate next to the plugin to install it and associate it with your existing OpenSearch Service domain.

After you associate the plugin, it appears in the list of packages with the plugin type, and the installation is complete.

Enable the Amazon Personalize Search Ranking plugin

The Amazon Personalize Search Ranking plugin uses the search pipeline feature of OpenSearch Service, available starting with version 2.9. The plugin relies on this feature to apply Amazon Personalize ranking to the search results returned by OpenSearch Service, and it must be set up as a search pipeline response processor. The pipeline definition contains the configuration for the plugin, including the Amazon Personalize campaign to call for rankings, the IAM role used to access Amazon Personalize resources, and the parameters defined in the following table.

Settings	Required	Default	Description
`campaign_arn`	Yes	None	Specify the ARN of the Amazon Personalize campaign to use to personalize results.
`recipe`	Yes	None	Specify the name of the Amazon Personalize recipe to use. As of this writing, `aws-personalized-ranking` is the only supported value.
`item_id_field`	No	“_id”	If the `_id` field for an indexed document in OpenSearch doesn’t correspond with your Amazon Personalize `itemId`, specify the name of the field that does.
`weight`	Yes	None	Specify the emphasis that the response processor puts on personalization when it re-ranks results. Specify a value within a range of 0.0–1.0. The closer to 1.0 that it is, the more likely it is that results from Amazon Personalize rank higher. If you specify 0.0, no personalization occurs and OpenSearch Service takes precedence.
`tag`	No	None	Specify an identifier for the processor.
`iam_role_arn`	Yes	None	Specify the IAM role to access Amazon Personalize resources. This is required for OpenSearch Service, and optional for open source OpenSearch.
`aws_region`	Yes	None	Specify the AWS Region where you created your Amazon Personalize campaign.
`ignore_failure`	No	None	Specify whether the plugin ignores any processor failures. For values, specify `true` or `false`. For your production environments, we recommend that you specify `true` to avoid any interruptions for query responses. For test environments, you can specify `false` to view any errors that the plugin generates.
`external_account_iam_role_arn`	No	None	If you use OpenSearch Service and your Amazon Personalize and OpenSearch Service resources exist in different accounts, specify the ARN of the role that has permission to access to Amazon Personalize.
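
Before sending a pipeline definition, it can help to check the processor settings against the table above. The following is a minimal sketch; validate_processor_config is a hypothetical helper, not part of the plugin, and it uses the key name campaign_arn as it appears in the pipeline definition in this post:

```python
REQUIRED_SETTINGS = ("campaign_arn", "recipe", "weight", "iam_role_arn", "aws_region")


def validate_processor_config(config):
    """Return a list of problems found in a personalized_search_ranking config."""
    problems = []
    for key in REQUIRED_SETTINGS:
        if key not in config:
            problems.append(f"missing required setting: {key}")
    if "weight" in config:
        try:
            weight = float(config["weight"])
        except (TypeError, ValueError):
            problems.append("weight must be a number")
        else:
            if not 0.0 <= weight <= 1.0:
                problems.append("weight must be between 0.0 and 1.0")
    # The table lists aws-personalized-ranking as the only supported recipe.
    if config.get("recipe") not in (None, "aws-personalized-ranking"):
        problems.append("recipe must be aws-personalized-ranking")
    return problems


# Placeholder values for illustration only.
example = {
    "campaign_arn": "<campaign ARN>",
    "recipe": "aws-personalized-ranking",
    "weight": "0.3",
    "iam_role_arn": "<role ARN>",
    "aws_region": "<region>",
}
print(validate_processor_config(example))
print(validate_processor_config({"recipe": "other", "weight": "2"}))
```

An empty list means the settings pass these basic checks; it does not confirm that the ARNs or Region are valid.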

You create the search pipeline one time, as part of the notebook that accompanies this post.

Define search pipeline for personalized ranking

You can use the following Python code to create a search pipeline with a personalized_search_ranking response processor on an OpenSearch Service domain. Replace domain endpoint with your domain endpoint URL. For example: https://<domain name>.<AWS region>.es.amazonaws.com.

import requests
from requests_auth_aws_sigv4 import AWSSigV4

# Replace with your domain endpoint URL and a name for the pipeline
domain_endpoint = 'domain endpoint'
pipeline_name = 'pipeline name'
url = f'{domain_endpoint}/_search/pipeline/{pipeline_name}'
# Sign requests with SigV4 for the OpenSearch Service ('es') API
auth = AWSSigV4('es')

headers = {'Content-Type': 'application/json'}

# Pipeline definition with a single response processor that re-ranks results
body = {
  "description": "A pipeline to apply custom re-ranking from Amazon Personalize",
  "response_processors": [
    {
      "personalized_search_ranking" : {
        "campaign_arn" : "<Replace with Amazon Personalize Campaign ARN>",
        "item_id_field" : "itemId",
        "recipe" : "aws-personalized-ranking",
        "weight" : "0.3",
        "tag" : "personalize-processor",
        "iam_role_arn": "<Replace with Role ARN>",
        "aws_region": "<Replace with AWS region>",
        "ignore_failure": True
      }
    }
  ]
}
try:
    # PUT creates the pipeline, or updates it if it already exists
    response = requests.put(url, auth=auth, json=body, headers=headers)
    print(response.text)
except Exception as e:
    print(f"Error: {e}")

Apply a search pipeline to an individual query

After you configure a search pipeline with a personalized_search_ranking response processor, you can apply the Amazon Personalize Search Ranking plugin to your OpenSearch queries and view the re-ranked results. Update the following code to specify your domain endpoint, your OpenSearch Service index, the name of the pipeline you configured earlier, and your query (we use “Tom Cruise”). For user_id, specify the ID of the user that you’re getting search results for. This user must be in the data that you used to create your Amazon Personalize solution version.

import requests
from requests_auth_aws_sigv4 import AWSSigV4

domain_endpoint = 'domain endpoint'
index = 'index name'
url = f'{domain_endpoint}/{index}/_search/'

auth = AWSSigV4('es')
headers = {'Content-Type': 'application/json'}
params = {"search_pipeline": "<Replace with pipeline-name>"}
body = {
    "query": {
        "multi_match": {
            "query": "Tom Cruise",
            "fields": ["title", "plot", "genres", "directedBy", "starring"]
        }
    },
    "ext": {
        "personalize_request_parameters": {
            "user_id": "<Replace with USER ID>"
        }
    }
}
try:
    response = requests.post(url, auth=auth, params=params, json=body, headers=headers)
    print(response.text)
except Exception as e:
    print(f"Error: {e}")
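
The search response is a JSON document in which the matching documents are listed under hits.hits, each with an _id, a _score, and a _source. The following is a minimal sketch for pulling out the top results; top_hits is a hypothetical helper, the title field is assumed from the multi_match fields in the query above, and the sample response is made up:

```python
def top_hits(search_response, field="title", limit=5):
    """Return (id, score, field value) tuples for the first few hits."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [
        (hit.get("_id"), hit.get("_score"), hit.get("_source", {}).get(field))
        for hit in hits[:limit]
    ]


# Made-up response that mimics the shape of an OpenSearch search result.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "m1", "_score": 3.2, "_source": {"title": "Movie A"}},
            {"_id": "m2", "_score": 2.9, "_source": {"title": "Movie B"}},
        ]
    }
}
print(top_hits(sample_response, limit=1))
```

With a real request, you would pass response.json() instead of sample_response.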

Evaluate the results

Open the 3.Testing.ipynb notebook and walk through the steps to test and compare the results for queries that use personalization and those that don’t. The Amazon Personalize Search Ranking plugin re-ranks the search results in the OpenSearch Service query response. It considers both the ranking from Amazon Personalize and the ranking from OpenSearch Service. This notebook walks you through the following steps:

Define the necessary connection parameters to establish a connection with your OpenSearch Service domain. This involves specifying the domain endpoint, authentication credentials, and any additional configuration settings required for your specific OpenSearch Service setup.
Create a set of sample queries, including queries with personalization parameters and queries without personalization parameters. These queries will be used to evaluate the impact of personalization on the search results.
Run and compare the results for queries that use personalization and those that do not.

For our example, we used a query for “Tom Cruise” and, for the personalization parameter, a user with a recent history of viewing drama and romance films. The search results show how the plugin re-ranks results based on the user’s observed viewing behavior, so that titles matching that user’s preferences appear higher in the list.

Personalized vs. non-personalized results

Let’s consider personalizing results for a user with ID 12. First, we check this user’s recent interactions by running the code in the 3.Testing.ipynb notebook to retrieve their interaction history. This allows us to see what types of movies this user has reviewed recently, which can inform how we personalize recommendations for them.

In this example, we see that the user has expressed interest in drama, romance, and thriller movie genres. To provide personalized recommendations, we first run queries with personalization parameters enabled, utilizing the user’s genre preferences. We then run the same queries without personalization enabled, for comparison. The following results show the difference between the non-personalized and personalized recommendation outputs.

The first two columns display the default OpenSearch Service results for the query “Tom Cruise” on a movies index, showing a variety of Tom Cruise films across different genres. The next two columns showcase personalized OpenSearch Service results for the same “Tom Cruise” query, but customized for a user interested in drama, romance, and thriller genres. Compared to the generic results, the personalized results prominently feature Tom Cruise movies in the user’s preferred drama, romance, and thriller genres. The delta highlights how the personalized results have been re-ranked relative to the non-personalized results, prioritizing films that match the user’s genre preferences. This demonstrates how personalization can tailor OpenSearch Service results to individual users’ tastes and interests.
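
The delta described above can be computed by comparing each item's position in the two result lists. The following is a minimal sketch; rank_deltas and the placeholder titles are hypothetical and are not taken from the notebook:

```python
def rank_deltas(baseline, personalized):
    """Map each item to how many positions it moved up (positive) or down (negative)."""
    baseline_position = {item: index for index, item in enumerate(baseline)}
    return {
        item: baseline_position[item] - index
        for index, item in enumerate(personalized)
        if item in baseline_position
    }


# Placeholder titles standing in for the two result lists.
baseline = ["Movie A", "Movie B", "Movie C", "Movie D"]
personalized = ["Movie C", "Movie A", "Movie B", "Movie D"]
print(rank_deltas(baseline, personalized))
```

In this made-up example, Movie C moves up two positions, Movie A and Movie B each move down one, and Movie D stays in place.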

This comparison demonstrates how Amazon Personalize can customize OpenSearch Service movie results to match an individual user’s interests. Although standard OpenSearch Service aims to universally serve relevant movie results for Tom Cruise, Amazon Personalize tailors the results to focus on Tom Cruise films it predicts this user will enjoy based on their unique viewing history and preferences.

The side-by-side results illustrate how Amazon Personalize provides a more targeted, user-centric search experience by personalizing the movie results to the individual.

Clean up

Complete the following steps to clean up your resources:

Follow the steps in the 4.Cleanup.ipynb notebook to clean up the resources created through the notebook.
On the AWS CloudFormation console, delete the stack that you created.

Conclusion

The Amazon Personalize Search Ranking plugin integrates seamlessly with OpenSearch Service to enable personalized search experiences. By using user behavior data and the ML capabilities of Amazon Personalize, the plugin can reorder OpenSearch Service result rankings to boost relevance for each unique user. This creates a custom-tailored search experience that surfaces the most relevant content higher in the results. The plugin is configurable to balance personalization with OpenSearch Service native scoring to fit diverse use cases. Overall, the Amazon Personalize Search Ranking plugin is a powerful way to enhance OpenSearch Service search relevance and engagement by factoring in the individual interests and preferences of your users. With just a few configuration steps, you can start serving hyper-relevant results that resonate strongly with your users.

Additional resources

Amazon Personalize Developer Guide
Personalizing search results from OpenSearch
Setting up Amazon Personalize
Amazon Personalize workflow
Configuring the plugin
Configuring permissions when resources are in different accounts
The IMDb dataset is available on AWS Data Exchange and provides over 1.6 billion user ratings; credits for more than 13 million cast and crew members; 10 million movie, TV, and entertainment titles; and global box office reporting data from more than 60 countries

About the Authors

James Jory is a Principal Solutions Architect in Applied AI with AWS. He has a special interest in personalization and recommender systems and a background in ecommerce, marketing technology, and customer data analytics. In his spare time, he enjoys camping and auto racing simulations.

Reagan Rosario is a Solutions Architect at AWS, specializing in building scalable, highly available, and secure cloud solutions for education technology companies. With over 10 years of experience in software engineering and architecture roles, Reagan loves using his technical knowledge to help AWS customers architect robust cloud solutions that leverage the breadth and depth of AWS.

Automate Amazon SageMaker Pipelines DAG creation

Creating scalable and efficient machine learning (ML) pipelines is crucial for streamlining the development, deployment, and management of ML models. In this post, we present a framework for automating the creation of a directed acyclic graph (DAG) for Amazon SageMaker Pipelines based on simple configuration files. The framework code and examples presented here only cover model training pipelines, but can be readily extended to batch inference pipelines as well.

This dynamic framework uses configuration files to orchestrate preprocessing, training, evaluation, and registration steps for both single-model and multi-model use cases based on user-defined Python scripts, infrastructure needs (including Amazon Virtual Private Cloud (Amazon VPC) subnets and security groups, AWS Identity and Access Management (IAM) roles, AWS Key Management Service (AWS KMS) keys, containers registry, and instance types), input and output Amazon Simple Storage Service (Amazon S3) paths, and resource tags. Configuration files (YAML and JSON) allow ML practitioners to specify undifferentiated code for orchestrating training pipelines using declarative syntax. This enables data scientists to quickly build and iterate on ML models, and empowers ML engineers to run through continuous integration and continuous delivery (CI/CD) ML pipelines faster, decreasing time to production for models.

Solution overview

The proposed framework code starts by reading the configuration files. It then dynamically creates a SageMaker Pipelines DAG based on the steps declared in the configuration files and the interactions and dependencies among steps. This orchestration framework caters to both single-model and multi-model use cases, and provides a smooth flow of data and processes. The following are the key benefits of this solution:

Automation – The entire ML workflow, from data preprocessing to model registry, is orchestrated with no manual intervention. This reduces the time and effort required for model experimentation and operationalization.
Reproducibility – With a predefined configuration file, data scientists and ML engineers can reproduce the entire workflow, achieving consistent results across multiple runs and environments.
Scalability – Amazon SageMaker is used throughout the pipeline, enabling ML practitioners to process large datasets and train complex models without infrastructure concerns.
Flexibility – The framework is flexible and can accommodate a wide range of ML use cases, ML frameworks (such as XGBoost and TensorFlow), multi-model training, and multi-step training. Every step of the training DAG can be customized via the configuration file.
Model governance – The Amazon SageMaker Model Registry integration allows for tracking model versions, and therefore promoting them to production with confidence.

The following architecture diagram depicts how you can use the proposed framework during both experimentation and operationalization of ML models. During experimentation, you can clone the framework code repository provided in this post and your project-specific source code repositories into Amazon SageMaker Studio, and set your virtual environment (detailed later in this post). You can then iterate on preprocessing, training, and evaluation scripts, as well as configuration choices. To create and run a SageMaker Pipelines training DAG, you can call the framework’s entry point, which will read all the configuration files, create the necessary steps, and orchestrate them based on the specified step ordering and dependencies.

During operationalization, the CI pipeline clones the framework code repository and project-specific training repositories into an AWS CodeBuild job, where the framework’s entry point script is called to create or update the SageMaker Pipelines training DAG, and then run it.
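
The core idea of building a DAG from declared steps and their dependencies can be sketched with Python's standard library. This is not the framework's code or its configuration schema: the dependencies dictionary and the order_steps function are hypothetical, and the step names follow the six steps of a modeling unit described in this post:

```python
from graphlib import CycleError, TopologicalSorter


def order_steps(dependencies):
    """Return step names in an order that respects the declared dependencies."""
    # Each key is a step; its value lists the steps that must finish first.
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical dependency declaration for one modeling unit.
dependencies = {
    "process": [],
    "train": ["process"],
    "create_model": ["train"],
    "transform": ["create_model"],
    "metrics": ["transform"],
    "register_model": ["metrics"],
}
print(order_steps(dependencies))

# A cycle is not a valid DAG, so it is rejected.
try:
    order_steps({"train": ["metrics"], "metrics": ["train"]})
except CycleError as error:
    print(f"Invalid configuration: {error.args[0]}")
```

A real pipeline would map each ordered name to a SageMaker Pipelines step object; here the ordering alone shows how declared dependencies determine the sequence.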

Repository structure

The GitHub repository contains the following directories and files:

/framework/conf/ – This directory contains a configuration file that is used to set common variables across all modeling units, such as subnets, security groups, and IAM role, at runtime. A modeling unit is a sequence of up to six steps for training an ML model.
/framework/createmodel/ – This directory contains a Python script that creates a SageMaker model object based on model artifacts from a SageMaker Pipelines training step. The model object is later used in a SageMaker batch transform job for evaluating model performance on a test set.
/framework/modelmetrics/ – This directory contains a Python script that creates an Amazon SageMaker Processing job for generating a model metrics JSON report for a trained model based on results of a SageMaker batch transform job performed on test data.
/framework/pipeline/ – This directory contains Python scripts that use Python classes defined in other framework directories to create or update a SageMaker Pipelines DAG based on the specified configurations. The model_unit.py script is used by pipeline_service.py to create one or more modeling units. Each modeling unit is a sequence of up to six steps for training an ML model: process, train, create model, transform, metrics, and register model. Configurations for each modeling unit should be specified in the model’s respective repository. The pipeline_service.py also sets dependencies among SageMaker Pipelines steps (how steps within and across modeling units are sequenced or chained) based on the sagemakerPipeline section, which should be defined in the configuration file of one of the model repositories (the anchor model). This allows you to override default dependencies inferred by SageMaker Pipelines. We discuss the configuration file structure later in this post.
/framework/processing/ – This directory contains a Python script that creates a SageMaker Processing job based on the specified Docker image and entry point script.
/framework/registermodel/ – This directory contains a Python script for registering a trained model along with its calculated metrics in SageMaker Model Registry.
/framework/training/ – This directory contains a Python script that creates a SageMaker training job.
/framework/transform/ – This directory contains a Python script that creates a SageMaker batch transform job. In the context of model training, this is used to calculate the performance metric of a trained model on test data.
/framework/utilities/ – This directory contains utility scripts for reading and joining configuration files, as well as logging.
/framework_entrypoint.py – This file is the entry point of the framework code. It calls a function defined in the /framework/pipeline/ directory to create or update a SageMaker Pipelines DAG and run it.
/examples/ – This directory contains several examples of how you can use this automation framework to create simple and complex training DAGs.
/env.env – This file allows you to set common variables such as subnets, security groups, and IAM role as environment variables.
/requirements.txt – This file specifies Python libraries that are required for the framework code.

Prerequisites

You should have the following prerequisites before deploying this solution:

An AWS account
SageMaker Studio
A SageMaker role with Amazon S3 read/write and AWS KMS encrypt/decrypt permissions
An S3 bucket for storing data, scripts, and model artifacts
Optionally, the AWS Command Line Interface (AWS CLI)
Python3 (Python 3.7 or greater) and the following Python packages:
- boto3
- sagemaker
- PyYAML
Additional Python packages used in your custom scripts

Deploy the solution

Complete the following steps to deploy the solution:

Organize your model training repository according to the following structure:

<MODEL-DIR-REPO>
 .
├── <MODEL-DIR>
|    ├── conf
|    |   └── conf.yaml
|    └── scripts
|        ├── preprocess.py
|        ├── train.py
|        ├── transform.py
|        └── evaluate.py
└── README.md

Clone the framework code and your model source code from the Git repositories:

- Clone dynamic-sagemaker-pipelines-framework repo into a training directory. In the following code, we assume the training directory is called aws-train:
```
git clone https://github.com/aws-samples/dynamic-sagemaker-pipelines-framework.git aws-train
```
- Clone the model source code under the same directory. For multi-model training, repeat this step for as many models as you need to train.
```
git clone https:<MODEL-DIR-REPO>.git aws-train
```

For single-model training, your directory should look like the following:

<aws-train>  
.  
├── framework
└── <MODEL-DIR>

For multi-model training, your directory should look like the following:

<aws-train>  
.  
├── framework
├── <MODEL-DIR-1>
├── <MODEL-DIR-2>
└── <MODEL-DIR-3>

Set up the following environment variables. Asterisks indicate environment variables that are required; the rest are optional.

Environment Variable	Description
`SMP_ACCOUNTID`*	AWS account where the SageMaker pipeline is run
`SMP_REGION`*	AWS Region where the SageMaker pipeline is run
`SMP_S3BUCKETNAME`*	S3 bucket name
`SMP_ROLE`*	SageMaker role
`SMP_MODEL_CONFIGPATH`*	Relative path of the single-model or multi-model configuration files
`SMP_SUBNETS`	Subnet IDs for SageMaker networking configuration
`SMP_SECURITYGROUPS`	Security group IDs for SageMaker networking configuration
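
A quick pre-flight check of these variables can save a failed run. The following is a minimal sketch that assumes only the variable names from the table above; the helper `missing_required_vars` is hypothetical and is not part of the framework.

```python
import os

# Variable names come from the table above; the split into two tuples is illustrative.
REQUIRED_VARS = (
    "SMP_ACCOUNTID",
    "SMP_REGION",
    "SMP_S3BUCKETNAME",
    "SMP_ROLE",
    "SMP_MODEL_CONFIGPATH",
)
OPTIONAL_VARS = ("SMP_SUBNETS", "SMP_SECURITYGROUPS")


def missing_required_vars(env=None):
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_required_vars()
    if missing:
        print("Missing required variables: " + ", ".join(missing))
    else:
        print("All required variables are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in a unit test or a CI job.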

For single-model use cases, SMP_MODEL_CONFIGPATH will be <MODEL-DIR>/conf/conf.yaml. For multi-model use cases, SMP_MODEL_CONFIGPATH will be */conf/conf.yaml, which allows you to find all conf.yaml files using Python’s glob module and combine them to form a global configuration file. During experimentation (local testing), you can specify environment variables inside the env.env file and then export them by running the following command in your terminal:

source env.env

Note that the values of environment variables in env.env should be placed inside quotation marks (for example, SMP_REGION="us-east-1"). During operationalization, these environment variables should be set by the CI pipeline.
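
The multi-model lookup described above (find every conf.yaml with a glob pattern, then combine them) can be sketched as follows. This is an illustration only: the framework's own utilities may work differently, `find_config_files` and `deep_merge` are hypothetical names, and the dictionaries stand in for already parsed conf.yaml content.

```python
import glob
import os
import tempfile


def find_config_files(base_dir, pattern="*/conf/conf.yaml"):
    """Return all configuration files under base_dir that match the glob pattern."""
    return sorted(glob.glob(os.path.join(base_dir, pattern)))


def deep_merge(base, extra):
    """Recursively merge extra into a copy of base; values in extra win on conflicts."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Build a throwaway directory layout with two hypothetical model directories.
    with tempfile.TemporaryDirectory() as root:
        for model_dir in ("model-a", "model-b"):
            os.makedirs(os.path.join(root, model_dir, "conf"))
            open(os.path.join(root, model_dir, "conf", "conf.yaml"), "w").close()
        print([os.path.relpath(p, root) for p in find_config_files(root)])

    # Dictionaries standing in for two parsed configuration files.
    a = {"models": {"model-a": {"train": {}}}}
    b = {"models": {"model-b": {"train": {}}}}
    print(deep_merge(a, b))
```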

Create and activate a virtual environment by running the following commands:
```
python -m venv .venv

source .venv/bin/activate
```
Install the required Python packages by running the following command:
```
pip install -r requirements.txt
```
Edit your model training conf.yaml files. We discuss the configuration file structure in the next section.
From the terminal, call the framework’s entry point to create or update and run the SageMaker Pipelines training DAG:
```
python framework/framework_entrypoint.py
```
View and debug the SageMaker Pipelines run on the Pipelines tab of the SageMaker Studio UI.

Configuration file structure

There are two types of configuration files in the proposed solution: framework configuration and model configuration. In this section, we describe each in detail.

Framework configuration

The /framework/conf/conf.yaml file sets the variables that are common across all modeling units. This includes SMP_S3BUCKETNAME, SMP_ROLE, SMP_MODEL_CONFIGPATH, SMP_SUBNETS, SMP_SECURITYGROUPS, and SMP_MODELNAME. Refer to Step 3 of deployment instructions for descriptions of these variables and how to set them via environment variables.

Model configuration

For each model in the project, we need to specify the following in the <MODEL-DIR>/conf/conf.yaml file (asterisks indicate required sections; the rest are optional):

/conf/models* – In this section, you can configure one or more modeling units. When the framework code is run, it will automatically read all configuration files during runtime and append them to the config tree. Theoretically, you can specify all modeling units in the same conf.yaml file, but it’s recommended to specify each modeling unit configuration in its respective directory or Git repository to minimize errors. The units are as follows:
- {model-name}* – The name of the model.
- source_directory* – A common source_dir path to use for all steps within the modeling unit.
- preprocess – This section specifies preprocessing parameters.
- train* – This section specifies training job parameters.
- transform* – This section specifies SageMaker Transform job parameters for making predictions on the test data.
- evaluate – This section specifies SageMaker Processing job parameters for generating a model metrics JSON report for the trained model.
- registry* – This section specifies parameters for registering the trained model in SageMaker Model Registry.
/conf/sagemakerPipeline* – This section defines the SageMaker Pipelines flow, including dependencies among steps. For single-model use cases, this section is defined at the end of the configuration file. For multi-model use cases, the sagemakerPipeline section only needs to be defined in the configuration file of one of the models (any of the models). We refer to this model as the anchor model. The parameters are as follows:
- pipelineName* – Name of the SageMaker pipeline.
- models* – Nested list of modeling units:
  - {model-name}* – Model identifier, which should match a {model-name} identifier in the /conf/models section.
    - steps* –
      - step_name* – Step name to be displayed in the SageMaker Pipelines DAG.
      - step_class* – (Union[Processing, Training, CreateModel, Transform, Metrics, RegisterModel])
      - step_type* – This parameter is only required for preprocessing steps, for which it should be set to preprocess. This is needed to distinguish preprocess and evaluate steps, both of which have a step_class of Processing.
      - enable_cache – ([Union[True, False]]). This indicates whether to enable SageMaker Pipelines caching for this step.
      - chain_input_source_step – ([list[step_name]]). You can use this to set the channel outputs of another step as input to this step.
      - chain_input_additional_prefix – This is only allowed for steps of the Transform step_class, and can be used in conjunction with chain_input_source_step parameter to pinpoint the file that should be used as the input to the transform step.
- dependencies – This section specifies the sequence in which the SageMaker Pipelines steps should be run. We have adapted the Apache Airflow notation for this section (for example, {step_name} >> {step_name}). If this section is left blank, explicit dependencies specified by the chain_input_source_step parameter or implicit dependencies define the SageMaker Pipelines DAG flow.
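
To make the Airflow-style notation concrete, the following sketch turns `{step_name} >> {step_name}` expressions into explicit (upstream, downstream) pairs. It is an assumption about how such strings could be interpreted, not the framework's actual parser, which may accept additional syntax; `parse_dependencies` and the step names are hypothetical.

```python
def parse_dependencies(lines):
    """Turn 'a >> b >> c' expressions into a list of (upstream, downstream) pairs."""
    edges = []
    for line in lines:
        steps = [part.strip() for part in line.split(">>")]
        if len(steps) < 2 or not all(steps):
            raise ValueError(f"Invalid dependency expression: {line!r}")
        # Each step depends on the step written immediately before it.
        edges.extend(zip(steps, steps[1:]))
    return edges


if __name__ == "__main__":
    # Hypothetical step names, for illustration only.
    print(parse_dependencies(["preprocess >> train >> register"]))
```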

Note that we recommend having one training step per modeling unit. If multiple training steps are defined for a modeling unit, the subsequent steps implicitly take the last training step to create the model object, calculate metrics, and register the model. If you need to train multiple models, it’s recommended to create multiple modeling units.
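
The recommendation above can be checked mechanically. The sketch below assumes each step is represented as a dictionary with the `step_name` and `step_class` keys described in the previous section; the helper names and the example steps are hypothetical.

```python
def training_steps(steps):
    """Return the names of all steps whose step_class is Training, in order."""
    return [s["step_name"] for s in steps if s.get("step_class") == "Training"]


def effective_training_step(steps):
    """Return the training step that later steps would implicitly use (the last one)."""
    names = training_steps(steps)
    return names[-1] if names else None


if __name__ == "__main__":
    # Hypothetical modeling unit with two training steps.
    unit = [
        {"step_name": "prep", "step_class": "Processing", "step_type": "preprocess"},
        {"step_name": "train-a", "step_class": "Training"},
        {"step_name": "train-b", "step_class": "Training"},
        {"step_name": "register", "step_class": "RegisterModel"},
    ]
    if len(training_steps(unit)) > 1:
        print("More than one training step; consider splitting into modeling units.")
    print(effective_training_step(unit))
```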

Examples

In this section, we demonstrate three examples of ML model training DAGs created using the presented framework.

Single-model training: LightGBM

This is a single-model example for a classification use case where we use LightGBM in script mode on SageMaker. The dataset consists of categorical and numerical variables to predict the binary label Revenue (to predict if the subject makes a purchase or not). The preprocessing script is used to model the data for training and testing and then stage it in an S3 bucket. The S3 paths are then provided to the training step in the configuration file.

When the training step runs, SageMaker loads the file on the container at /opt/ml/input/data/{channelName}/, accessible via the environment variable SM_CHANNEL_{channelName} on the container (channelName = ‘train’ or ‘test’). The training script does the following:

Load the files from the local container paths using NumPy’s load function.
Set hyperparameters for the training algorithm.
Save the trained model at the local container path /opt/ml/model/.

SageMaker takes the content under /opt/ml/model/ to create a tarball that is used to deploy the model to SageMaker for hosting.
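
A training script typically resolves these container paths from the environment. The following sketch assumes script mode, where the channel name appears upper-cased in the variable name (for example, SM_CHANNEL_TRAIN) and SM_MODEL_DIR points to the model directory; the fallback paths mirror the defaults quoted above, and the helper names are hypothetical.

```python
import os


def channel_dir(channel, env=None):
    """Return the local directory for an input channel such as 'train' or 'test'."""
    env = os.environ if env is None else env
    default = f"/opt/ml/input/data/{channel}"
    return env.get(f"SM_CHANNEL_{channel.upper()}", default)


def model_dir(env=None):
    """Return the directory whose content SageMaker packages as the model artifact."""
    env = os.environ if env is None else env
    return env.get("SM_MODEL_DIR", "/opt/ml/model")


if __name__ == "__main__":
    print(channel_dir("train"))
    print(channel_dir("test"))
    print(model_dir())
```

Reading the paths from the environment, rather than hard-coding them, also lets you run the same script locally by exporting the variables yourself.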

The transform step takes the staged test file and the trained model as input and generates predictions with the trained model. The output of the transform step is chained to the metrics step to evaluate the model against the ground truth, which is explicitly supplied to the metrics step. Finally, the output of the metrics step is implicitly chained to the register step to register the model in SageMaker Model Registry with information about the model’s performance produced in the metrics step. The following figure shows a visual representation of the training DAG. You can refer to the scripts and configuration file for this example in the GitHub repo.

Single-model training: LLM fine-tuning

This is another single-model training example, where we orchestrate fine-tuning of a Falcon-40B large language model (LLM) from Hugging Face Hub for a text summarization use case. The preprocessing script loads the samsum dataset from Hugging Face, loads the tokenizer for the model, and processes the train/test data splits for fine-tuning the model on this domain data in the falcon-text-summarization-preprocess step.

The output is chained to the falcon-text-summarization-tuning step, where the training script loads the Falcon-40B LLM from Hugging Face Hub and starts accelerated fine-tuning using LoRA on the train split. The model is evaluated in the same step after fine-tuning, and the evaluation loss acts as a gate: if the loss is not acceptable, the falcon-text-summarization-tuning step fails, which causes the SageMaker pipeline to stop before it can register the fine-tuned model. Otherwise, the falcon-text-summarization-tuning step runs successfully and the model is registered in SageMaker Model Registry. The following figure shows a visual representation of the LLM fine-tuning DAG. The scripts and configuration file for this example are available in the GitHub repo.
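
One way to implement such a gate inside a training script is to raise an exception when the evaluation loss is too high; an uncaught exception makes the script exit with a non-zero status, which fails the training job and therefore the pipeline step. The sketch below is an assumption about how this could look, not the code from the example repository; the threshold value and the names `EvaluationGateError` and `enforce_eval_loss_gate` are hypothetical.

```python
import math


class EvaluationGateError(RuntimeError):
    """Raised when the fine-tuned model does not pass the evaluation gate."""


def enforce_eval_loss_gate(eval_loss, max_eval_loss):
    """Return eval_loss if it passes the gate; raise EvaluationGateError otherwise."""
    if math.isnan(eval_loss) or eval_loss > max_eval_loss:
        raise EvaluationGateError(
            f"Evaluation loss {eval_loss} did not pass the gate "
            f"(maximum allowed: {max_eval_loss})"
        )
    return eval_loss


if __name__ == "__main__":
    # Hypothetical numbers; in a real script the loss would come from the evaluation run.
    print(enforce_eval_loss_gate(eval_loss=0.8, max_eval_loss=1.0))
```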

Multi-model training

This is a multi-model training example where a principal component analysis (PCA) model is trained for dimensionality reduction, and a TensorFlow Multilayer Perceptron model is trained for California Housing Price prediction. The TensorFlow model’s preprocessing step uses a trained PCA model to reduce dimensionality of its training data. We add a dependency in the configuration to ensure the TensorFlow model is registered after PCA model registration. The following figure shows a visual representation of the multi-model training DAG example. The scripts and configuration files for this example are available in the GitHub repo.
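
To see how an added dependency constrains the run order, the following sketch computes one valid ordering from (upstream, downstream) pairs. SageMaker Pipelines resolves the ordering itself; this is only an illustration using the standard-library `graphlib` module (Python 3.9 or later), and all step names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_order(edges):
    """Return one valid execution order for (upstream, downstream) step pairs."""
    sorter = TopologicalSorter()
    for upstream, downstream in edges:
        # add(node, *predecessors): the downstream step waits for the upstream step.
        sorter.add(downstream, upstream)
    return list(sorter.static_order())


if __name__ == "__main__":
    # Hypothetical step names for the PCA and TensorFlow modeling units.
    edges = [
        ("pca-train", "pca-register"),
        ("pca-train", "tf-preprocess"),
        ("tf-preprocess", "tf-train"),
        ("tf-train", "tf-register"),
        ("pca-register", "tf-register"),  # explicit dependency added in the configuration
    ]
    print(run_order(edges))
```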

Clean up

Complete the following steps to clean up your resources:

Use the AWS CLI to list and remove any remaining pipelines that are created by the Python scripts.
Optionally, delete other AWS resources such as the S3 bucket or IAM role created outside SageMaker Pipelines.
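
If you prefer Python to the AWS CLI, the first step can be sketched with the SageMaker `list_pipelines` and `delete_pipeline` API operations. The function below takes the client as an argument (for example, `boto3.client("sagemaker")`) and defaults to a dry run; the function name and the name prefix are hypothetical, so review the returned list before deleting anything.

```python
import uuid


def delete_pipelines(sm_client, name_prefix, dry_run=True):
    """List pipelines whose names start with name_prefix; delete them unless dry_run."""
    names = []
    kwargs = {"PipelineNamePrefix": name_prefix}
    while True:
        page = sm_client.list_pipelines(**kwargs)
        names.extend(s["PipelineName"] for s in page.get("PipelineSummaries", []))
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # fetch the next page of results

    if not dry_run:
        for name in names:
            sm_client.delete_pipeline(
                PipelineName=name,
                ClientRequestToken=str(uuid.uuid4()),  # idempotency token
            )
    return names


# Example usage (requires boto3 and AWS credentials; the prefix is hypothetical):
#   import boto3
#   print(delete_pipelines(boto3.client("sagemaker"), "my-training-pipeline"))
```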

Conclusion

In this post, we presented a framework for automating SageMaker Pipelines DAG creation based on configuration files. The proposed framework offers a forward-looking solution to the challenge of orchestrating complex ML workloads. By using a configuration file, SageMaker Pipelines provides the flexibility to build orchestration with minimal code, so you can streamline the process of creating and managing both single-model and multi-model pipelines. This approach not only saves time and resources, but also promotes MLOps best practices, contributing to the overall success of ML initiatives. For more information about implementation details, review the GitHub repo.

About the Authors

Luis Felipe Yepez Barrios is a Machine Learning Engineer with AWS Professional Services, focused on scalable distributed systems and automation tooling to expedite scientific innovation in the field of Machine Learning (ML). Furthermore, he assists enterprise clients in optimizing their machine learning solutions through AWS services.

Jinzhao Feng is a Machine Learning Engineer at AWS Professional Services. He focuses on architecting and implementing large-scale Generative AI and classical ML pipeline solutions. He specializes in FMOps, LLMOps, and distributed training.

Harsh Asnani is a Machine Learning Engineer at AWS. His background is in Applied Data Science with a focus on operationalizing Machine Learning workloads in the cloud at scale.

Hasan Shojaei is a Sr. Data Scientist with AWS Professional Services, where he helps customers across different industries solve their business challenges through the use of big data, machine learning, and cloud technologies. Prior to this role, Hasan led multiple initiatives to develop novel physics-based and data-driven modeling techniques for top energy companies. Outside of work, Hasan is passionate about books, hiking, photography, and history.

Alec Jenab is a Machine Learning Engineer who specializes in developing and operationalizing machine learning solutions at scale for enterprise customers. Alec is passionate about bringing innovative solutions to market, especially in areas where machine learning can meaningfully improve end user experience. Outside of work, he enjoys playing basketball, snowboarding, and discovering hidden gems in San Francisco.

Accelerating large-scale neural network training on CPUs with ThirdAI and AWS Graviton

This guest post is written by Vihan Lakshman, Tharun Medini, and Anshumali Shrivastava from ThirdAI.

Large-scale deep learning has recently produced revolutionary advances in a vast array of fields. Although this stunning progress in artificial intelligence remains remarkable, the financial costs and energy consumption required to train these models have emerged as a critical bottleneck due to the need for specialized hardware like GPUs. Traditionally, even modestly sized neural models have required costly hardware accelerators for training, which limits the number of organizations with the financial means to take full advantage of this technology.

Founded in 2021, ThirdAI Corp. is a startup dedicated to the mission of democratizing artificial intelligence technologies through algorithmic and software innovations that fundamentally change the economics of deep learning. We have developed a sparse deep learning engine, known as BOLT, that is specifically designed for training and deploying models on standard CPU hardware as opposed to costly and energy-intensive accelerators like GPUs. Many of our customers have reported strong satisfaction with ThirdAI’s ability to train and deploy deep learning models for critical business problems on cost-effective CPU infrastructure.

In this post, we investigate the potential for the AWS Graviton3 processor to accelerate neural network training for ThirdAI’s unique CPU-based deep learning engine.

The benefits of high-performance CPUs

At ThirdAI, we achieve these breakthroughs in efficient neural network training on CPUs through proprietary dynamic sparse algorithms that activate only a subset of neurons for a given input (see the following figure), thereby side-stepping the need for full dense computations. Unlike other approaches to sparse neural network training, ThirdAI uses locality-sensitive hashing to dynamically select neurons for a given input as shown in the bold lines below. In certain cases, we have even observed that our sparse CPU-based models train faster than the comparable dense architecture on GPUs.
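
To give a feel for hash-based neuron selection, the following toy sketch uses SimHash (random-hyperplane locality-sensitive hashing), a standard technique swapped in here for illustration. It is not ThirdAI's proprietary algorithm or the BOLT implementation; the layer sizes, the single hash table, and all function names are assumptions.

```python
import random


def signature(vector, hyperplanes):
    """SimHash signature: one sign bit per random hyperplane."""
    return tuple(
        sum(v * h for v, h in zip(vector, plane)) >= 0 for plane in hyperplanes
    )


def build_table(neuron_weights, hyperplanes):
    """Group neuron indices into buckets keyed by the signature of their weights."""
    table = {}
    for neuron_id, weights in enumerate(neuron_weights):
        table.setdefault(signature(weights, hyperplanes), []).append(neuron_id)
    return table


def active_neurons(x, table, hyperplanes):
    """Return the neurons whose weight vectors hash to the same bucket as x."""
    return table.get(signature(x, hyperplanes), [])


def sparse_forward(x, neuron_weights, table, hyperplanes):
    """Compute ReLU activations only for the neurons selected by the hash lookup."""
    return {
        i: max(0.0, sum(a * w for a, w in zip(x, neuron_weights[i])))
        for i in active_neurons(x, table, hyperplanes)
    }


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_neurons, n_bits = 8, 64, 4
    hyperplanes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_neurons)]
    table = build_table(weights, hyperplanes)
    x = weights[3]  # an input aligned with neuron 3 lands in neuron 3's bucket
    out = sparse_forward(x, weights, table, hyperplanes)
    print(f"computed {len(out)} of {n_neurons} activations")
```

Vectors that point in similar directions tend to share a signature, so the lookup favors neurons likely to have large activations while skipping the dot products for the rest of the layer.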

Given that many of our target customers operate in the cloud—and among those, the majority use AWS—we were excited to try out the AWS Graviton3 processor to see if the impressive price-performance improvements of Amazon’s silicon innovation would translate to our unique workload of sparse neural network training and thereby provide further savings for customers. Although both the research community and the AWS Graviton team have delivered exciting advances in accelerating neural network inference on CPU instances, we at ThirdAI are, to our knowledge, the first to seriously study how to train neural models on CPUs efficiently.

As shown in our results, we observed a significant training speedup with AWS Graviton3 over the comparable Intel and NVIDIA instances on several representative modeling workloads.

Instance types

For our evaluation, we considered two comparable AWS CPU instances: a c6i.8xlarge machine powered by Intel’s Ice Lake processor and a c7g.8xlarge powered by AWS Graviton3. For the GPU baselines, we used a g5g.8xlarge instance with an NVIDIA T4G GPU. The following table summarizes the details of each instance.

Instance	vCPU	RAM (GB)	Processor	On-Demand Price (us-east-1)
c7g.8xlarge	32	64	AWS Graviton3	$1.1562/hr
c6i.8xlarge	32	64	Intel Ice Lake	$1.36/hr
g5g.8xlarge (GPU)	32	64 with 16 GB GPU Memory	AWS Graviton2 processors with 1 NVIDIA T4G GPU	$1.3720/hr

Evaluation 1: Extreme classification

For our first evaluation, we focus on the problem of extreme multi-label classification (XMC), an increasingly popular machine learning (ML) paradigm with a number of practical applications in search and recommendations (including at Amazon). For our evaluation, we focus on the public Amazon-670K product recommendation task, which, given an input product, identifies similar products from a collection of over 670,000 items.

In this experiment, we benchmark ThirdAI’s BOLT engine against TensorFlow 2.11 and PyTorch 2.0 on the aforementioned hardware choices: Intel Ice Lake, AWS Graviton3, and an NVIDIA T4G GPU. For our experiments on Intel and AWS Graviton, we use the AWS Deep Learning AMI (Ubuntu 18.04) version 59.0. For our GPU evaluation, we use the NVIDIA GPU-Optimized Arm64 AMI, available via the AWS Marketplace. For this evaluation, we use the SLIDE model architecture, which achieves both competitive performance on this extreme classification task and strong training performance on CPUs. For our TensorFlow and PyTorch comparisons, we implement the analogous version of the SLIDE multi-layer perceptron (MLP) architecture with dense matrix multiplications. We train each model for five epochs (full passes through the training dataset) with a fixed batch size of 256 and learning rate of 0.001. We observed that all models achieved the same test accuracy of 33.6%.

The following chart compares the training time of ThirdAI’s BOLT to TensorFlow 2.11 and PyTorch 2.0 on the Amazon-670K extreme classification benchmark. All models achieve the same test accuracy. We observe that AWS Graviton3 considerably accelerates the performance of BOLT out of the box, with no customizations needed, by approximately 40%. ThirdAI’s BOLT on AWS Graviton3 also achieves considerably faster training than the TensorFlow or PyTorch models trained on the GPU. Note that there is no ThirdAI result on the NVIDIA GPU benchmark because BOLT is designed to run on CPUs. We do not include TensorFlow and PyTorch CPU benchmarks because of the prohibitively long training time.

The following table summarizes the training time and test accuracy for each processor (CPU or GPU).

Processor	Engine	Training Time (s)	Test Accuracy
Intel Ice Lake (c6i.8xlarge)	BOLT	1470	33.6
AWS Graviton3 (c7g.8xlarge)	BOLT	935	33.6
NVIDIA T4G (g5g.8xlarge)	TensorFlow	7550	33.6
NVIDIA T4G (g5g.8xlarge)	PyTorch	5130	33.6
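
The relative gains can be read off the table with a little arithmetic. The sketch below uses only the training times listed above; the helper names are hypothetical.

```python
# Training times in seconds, copied from the table above.
TRAINING_TIME_S = {
    ("Intel Ice Lake", "BOLT"): 1470,
    ("AWS Graviton3", "BOLT"): 935,
    ("NVIDIA T4G", "TensorFlow"): 7550,
    ("NVIDIA T4G", "PyTorch"): 5130,
}


def time_reduction(baseline_s, new_s):
    """Fraction of training time saved relative to the baseline."""
    return 1.0 - new_s / baseline_s


def speedup(baseline_s, new_s):
    """How many times faster the new run is than the baseline."""
    return baseline_s / new_s


if __name__ == "__main__":
    graviton = TRAINING_TIME_S[("AWS Graviton3", "BOLT")]
    for (processor, engine), seconds in TRAINING_TIME_S.items():
        if (processor, engine) == ("AWS Graviton3", "BOLT"):
            continue
        print(
            f"vs {processor} ({engine}): "
            f"{time_reduction(seconds, graviton):.1%} less time, "
            f"{speedup(seconds, graviton):.2f}x faster"
        )
```

With these numbers, BOLT on AWS Graviton3 needs roughly 36% less training time than on Intel Ice Lake (about 1.57x faster) and is more than five times faster than either GPU baseline.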

Evaluation 2: Yelp Polarity sentiment analysis

For our second evaluation, we focus on the popular Yelp Polarity sentiment analysis benchmark, which involves classifying a review as positive or negative. For this evaluation, we compare ThirdAI’s Universal Deep Transformers (UDT) model against a fine-tuned DistilBERT network, a compressed pre-trained language model that achieves near-state-of-the-art performance with reduced inference latency. Because fine-tuning DistilBERT models on a CPU would take a prohibitively long time (at least several days), we benchmark ThirdAI’s CPU-based models against DistilBERT fine-tuned on a GPU. We train all models with a batch size of 256 for a single pass through the data (one epoch). We note that we can achieve slightly higher accuracy with BOLT with additional passes through the data, but we restrict ourselves to a single pass in this evaluation for consistency.

As shown in the following figure, AWS Graviton3 again accelerates ThirdAI’s UDT model training considerably. Furthermore, UDT is able to achieve comparable test accuracy to DistilBERT with a fraction of the training time and without the need for a GPU. We note that there has also been recent work in optimizing the fine-tuning of Yelp Polarity on CPUs. Our models, however, still achieve greater efficiency gains and avoid the cost of pre-training, which is substantial and requires the use of hardware accelerators like GPUs.

The following table summarizes the training time, test accuracy, and inference latency.

Processor	Engine	Model	Training Time (s)	Test Accuracy	Inference Latency (ms)
Intel Ice Lake (c6i.8xlarge)	BOLT	UDT	47	93.2	<1
Graviton3 (c7g.8xlarge)	BOLT	UDT	29	92.9	<1
T4G GPU (g5g.8xlarge)	TensorFlow	DistilBERT	4200	93.3	8.7
T4G GPU (g5g.8xlarge)	PyTorch	DistilBERT	3780	93.4	8.3

Evaluation 3: Multi-class text classification (DBPedia)

For our final evaluation, we focus on the problem of multi-class text classification, which involves assigning a label to a given input text from a set of more than two output classes. We focus on the DBPedia benchmark, which consists of 14 possible output classes. Again, we see that AWS Graviton3 accelerates UDT performance over the comparable Intel instance by roughly 40%. We also see that BOLT achieves comparable results to the DistilBERT transformer-based model fine-tuned on a GPU while achieving sub-millisecond latency.

The following table summarizes the training time, test accuracy, and inference latency.

Processor	Engine	Model	Training Time (s)	Test Accuracy	Inference Latency (ms)
Intel Ice Lake (c6i.8xlarge)	BOLT	UDT	23	98.23	<1
Graviton3 (c7g.8xlarge)	BOLT	UDT	14	98.10	<1
T4G GPU (g5g.8xlarge)	TensorFlow	DistilBERT	4320	99.23	8.6
T4G GPU (g5g.8xlarge)	PyTorch	DistilBERT	3480	99.29	8

Get started with ThirdAI on AWS Graviton

We have designed our BOLT software for compatibility with all major CPU architectures, including AWS Graviton3. In fact, we didn’t have to make any customizations to our code to run on AWS Graviton3. Therefore, you can use ThirdAI for model training and deployment on AWS Graviton3 with no additional effort. In addition, as detailed in our recent research whitepaper, we have developed a set of novel mathematical techniques to automatically tune the specialized hyperparameters associated with our sparse models, allowing our models to work well immediately out of the box.

We also note that our models primarily work well for search, recommendation, and natural language processing tasks that typically feature large, high-dimensional output spaces and a requirement of extremely low inference latency. We are actively working on extending our methods to additional domains, such as computer vision, but be aware that our efficiency improvements do not translate to all ML domains at this time.

Conclusion

In this post, we investigated the potential for the AWS Graviton3 processor to accelerate neural network training for ThirdAI’s unique CPU-based deep learning engine. Our benchmarks on search, text classification, and recommendation tasks suggest that AWS Graviton3 can accelerate ThirdAI’s model training workloads by 30–40% over the comparable x86 instances with a price-performance improvement of nearly 50%. Furthermore, because AWS Graviton3 instances are available at a lower cost than the analogous Intel and NVIDIA machines and enable shorter training and inference times, you can further unlock the value of the AWS pay-as-you-go usage model by using lower-cost machines for shorter durations of time.
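
As a rough cross-check of the price-performance figure, the sketch below multiplies the on-demand hourly price by the BOLT training time for each benchmark, using the numbers from the tables in this post. Treating "price × training time" as the cost of a run is our assumption; the post does not state how its own figure was calculated, and the helper names are hypothetical.

```python
# On-demand prices (USD per hour, us-east-1) from the instance table.
PRICE_PER_HOUR = {"c7g.8xlarge": 1.1562, "c6i.8xlarge": 1.36}

# BOLT training times in seconds from the three evaluation tables.
BOLT_TRAINING_TIME_S = {
    "Amazon-670K": {"c6i.8xlarge": 1470, "c7g.8xlarge": 935},
    "Yelp Polarity": {"c6i.8xlarge": 47, "c7g.8xlarge": 29},
    "DBPedia": {"c6i.8xlarge": 23, "c7g.8xlarge": 14},
}


def run_cost(instance, seconds):
    """Cost in USD of one training run, assuming price x time with no other charges."""
    return PRICE_PER_HOUR[instance] * seconds / 3600.0


def cost_reduction(times):
    """Fraction of the Intel run cost saved by running on AWS Graviton3."""
    intel = run_cost("c6i.8xlarge", times["c6i.8xlarge"])
    graviton = run_cost("c7g.8xlarge", times["c7g.8xlarge"])
    return 1.0 - graviton / intel


if __name__ == "__main__":
    for benchmark, times in BOLT_TRAINING_TIME_S.items():
        print(f"{benchmark}: {cost_reduction(times):.1%} lower cost per run")
```

Under this assumption, the cost per training run drops by roughly 46% to 48% across the three benchmarks.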

We are very excited by the price and performance savings of AWS Graviton3 and will look to pass on these improvements to our customers so they can enjoy faster ML training and inference with improved performance on low-cost CPUs. As customers of AWS ourselves, we are delighted by the speed at which AWS Graviton3 allows us to experiment with our models, and we look forward to using more cutting-edge silicon innovation from AWS going forward. The Graviton Technical Guide is a good resource to consider while evaluating your ML workloads to run on Graviton. You can also try the Graviton t4g instances free trial.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post. At the time of writing, the most current Intel-based instances were c6i, so the comparison was done with c6i instances.

About the Authors

Vihan Lakshman – Vihan Lakshman is a research scientist at ThirdAI Corp. focused on developing systems for resource-efficient deep learning. Prior to ThirdAI, he worked as an Applied Scientist at Amazon and received undergraduate and master’s degrees from Stanford University. Vihan is also a recipient of a National Science Foundation research fellowship.

Tharun Medini – Tharun Medini is the co-founder and CTO of ThirdAI Corp. He did his PhD in “Hashing Algorithms for Search and Information Retrieval” at Rice University. Prior to ThirdAI, Tharun worked at Amazon and Target. Tharun is the recipient of numerous awards for his research, including the Ken Kennedy Institute BP Fellowship, the American Society of Indian Engineers Scholarship, and a Rice University Graduate Fellowship.

Anshumali Shrivastava – Anshumali Shrivastava is an associate professor in the computer science department at Rice University. He is also the Founder and CEO of ThirdAI Corp, a company that is democratizing AI on commodity hardware through software innovations. His broad research interests include probabilistic algorithms for resource-frugal deep learning. In 2018, Science News named him one of the Top-10 scientists under 40 to watch. He is a recipient of the National Science Foundation CAREER Award, a Young Investigator Award from the Air Force Office of Scientific Research, a machine learning research award from Amazon, and a Data Science Research Award from Adobe. He has won numerous paper awards, including Best Paper Awards at NIPS 2014 and MLSys 2022, as well as the Most Reproducible Paper Award at SIGMOD 2019. His work on efficient machine learning technologies on CPUs has been covered by the popular press, including the Wall Street Journal, the New York Times, TechCrunch, and NDTV.

Supercharge your AI team with Amazon SageMaker Studio: A comprehensive view of Deutsche Bahn’s AI platform transformation

AI’s growing influence in large organizations brings crucial challenges in managing AI platforms. These include developing a scalable and operationally efficient platform that adheres to organizational compliance and security standards. Amazon SageMaker Studio offers a comprehensive set of capabilities for machine learning (ML) practitioners and data scientists. These include a fully managed AI development environment with an integrated development environment (IDE), simplifying the end-to-end ML workflow. Its collaborative capabilities, such as real-time coediting and sharing notebooks within the team, ensure smooth teamwork, while its scalability and high-performance training cater to large datasets. With built-in security, cost-effectiveness, and a range of pre-built tools like Amazon SageMaker Autopilot, Amazon SageMaker JumpStart, and Amazon SageMaker Feature Store, SageMaker Studio is a powerful platform for accelerating AI projects and empowering data scientists at every level of expertise.

Deutsche Bahn is a leading transportation organization in Germany with a revenue of 56.3 billion EUR (in 2022), a workforce of 336,884 employees (including 221,343 employees in Germany), and operations spanning 130 countries. They offer a wide range of services, including public and regional transport, freight services, and rail infrastructure. Through the integrated operation of traffic and railway infrastructure, as well as the economically and ecologically intelligent connection of all modes of transport, Deutsche Bahn moves people and goods. Deutsche Bahn has been at the forefront in adopting AI, using SageMaker Studio as a key AI platform. At Deutsche Bahn, a dedicated AI platform team manages and operates the SageMaker Studio platform, and multiple data analytics teams within the organization use the platform to develop, train, and run various analytics and ML activities.

The AI platform team’s key objective is to ensure seamless access to Workbench services and SageMaker Studio for all Deutsche Bahn teams and projects, with a primary focus on data scientists and ML engineers. This platform helps Deutsche Bahn realize a spectrum of use cases, ranging from railway maintenance and forecasting to future applications in generative AI.

The AI platform managed service, built on SageMaker Studio, seamlessly aligns with Deutsche Bahn’s group-wide platform strategy. It meets the company’s compliance requirements, enables a swift project initiation for the team by provisioning a SageMaker domain, and reduces maintenance overhead due to an overarching operating model. Major benefits include high scalability of the service, in large part due to automation and a self-service model, and an attractive pricing model that’s primarily based on resource consumption.

“SageMaker Studio provided us a common platform that is scalable, security compliant, and addresses the development needs of data scientists from multiple data analytics teams within the DB organization. Before this, each team managed and operated their own JupyterLab notebooks, which was not efficient or cost-effective. Within 8 weeks, we onboarded over 120 developers, provisioned 25 SageMaker domains, and quickly got started using this platform.”

– Emmanuel Drosos, product owner at DB Systel.

In this post, we explore how Deutsche Bahn scaled and operated their AI platform using SageMaker Studio for multiple teams, while ensuring robust security and oversight.

Solution overview

The architecture at Deutsche Bahn consists of a central platform account managed by a platform team responsible for managing infrastructure and operations for SageMaker Studio. SageMaker Studio resources are grouped by SageMaker domains, each consisting of an associated Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations. At Deutsche Bahn, data scientists from various teams use SageMaker domains for their ML activities; each team has a dedicated SageMaker domain that they use to develop and test ML models and to collaborate using features such as notebook sharing.

From an infrastructure perspective, the VPC provisioned in the AI platform account as shown in the following figure has no outbound internet connectivity to ensure security and compliance. For high availability, multiple identical private isolated subnets are provisioned. The SageMaker Studio domains are deployed in VPC only mode, which creates an elastic network interface for communication between the SageMaker service account (AWS service account) and the platform account’s VPC. The endpoints like SageMaker API, SageMaker Studio, and SageMaker notebook facilitate secure and reliable communication between the platform account’s VPC and the SageMaker domain managed by AWS in the SageMaker service account.

Each data analytics team is able to request one or multiple SageMaker domains through the company’s internal self-service portal. This process of ordering a SageMaker domain is orchestrated through a separate workflow process (via AWS Step Functions). During this orchestration flow, an Azure Active Directory (AD) group for the data analytics team is provisioned with the AD group name corresponding to the domain name. The orchestration leads to a continuous integration and continuous deployment (CI/CD) pipeline deploying an AWS Cloud Development Kit (AWS CDK) app consisting of a SageMaker domain for the respective team.

In addition to the SageMaker domain, a customized AWS Identity and Access Management (IAM) role (SageMaker-execution-role), Amazon Simple Storage Service (Amazon S3) bucket (data-bucket), customer managed key (CMK), and other AWS resources are provisioned during the deployment process by the AWS CDK app, as illustrated in the following figure. The AD group contains the scientists who need access to their team’s SageMaker domain. The AD group name corresponds to the SageMaker domain’s name and is primarily used during the authorization process.

Client separation is implemented on the level of SageMaker domains by using IAM authentication mode. A domain-specific IAM role (SageMaker-execution-role) is attached to each domain that follows the principle of least privilege and is assumed by the data analytics team during the login process. This role grants data scientists in the team the ability to perform various activities, such as running processing jobs, hyperparameter tuning jobs, transformation jobs, and experiments, as well as creating models. These ML activities are run on behalf of the user by SageMaker using the IAM pass role permission. However, certain actions like creating S3 buckets, modifying IAM roles, updating SageMaker domains, and provisioning large instances are restricted for security, compliance, and cost control reasons. The associated IAM policy makes sure that the data analytics team only has access to the relevant S3 bucket and CMK for their authorized domain, as depicted in the following figure. Additionally, the role SageMaker-execution-role allows the team members to assume roles in other accounts within the Deutsche Bahn organization from SageMaker Studio, providing them with flexibility to access resources like Amazon Relational Database Service (Amazon RDS), other S3 buckets, and Amazon Athena. The IAM policy uses aws:RequestTag and aws:ResourceTag for fine-grained access control during SageMaker activities, such as processing jobs, training jobs, and model creation. These tags also help track associated costs for the domain. For more information, refer to Actions, resources, and condition keys for Amazon SageMaker.
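To make the tag-based access control above more concrete, the following is a minimal sketch of what such policy statements could look like, written as a plain Python dictionary. The tag key `sagemaker-domain`, the statement IDs, and the exact list of actions are assumptions for illustration; the post does not show the real policy used at Deutsche Bahn.

```python
import json

# Hypothetical tag key; the post does not name the tag the platform uses.
DOMAIN_TAG_KEY = "sagemaker-domain"


def tag_scoped_statements(domain_name: str) -> list[dict]:
    """Build IAM statements that tie SageMaker activity to one domain's tag."""
    return [
        {
            # New jobs and models may only be created with the domain tag attached.
            "Sid": "CreateOnlyWithDomainTag",
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateProcessingJob",
                "sagemaker:CreateTrainingJob",
                "sagemaker:CreateModel",
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {f"aws:RequestTag/{DOMAIN_TAG_KEY}": domain_name}
            },
        },
        {
            # Existing jobs may only be inspected or stopped if they carry the tag.
            "Sid": "ManageOnlyOwnTaggedResources",
            "Effect": "Allow",
            "Action": [
                "sagemaker:DescribeTrainingJob",
                "sagemaker:StopTrainingJob",
                "sagemaker:DescribeProcessingJob",
                "sagemaker:StopProcessingJob",
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {f"aws:ResourceTag/{DOMAIN_TAG_KEY}": domain_name}
            },
        },
    ]


policy = {"Version": "2012-10-17", "Statement": tag_scoped_statements("team1")}
print(json.dumps(policy, indent=2))
```

In this sketch, aws:RequestTag constrains what can be created, while aws:ResourceTag constrains what can be touched afterward. A real policy would likely need further statements, for example permission to attach tags at creation time, which are omitted here.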

The CMK encrypts both the SageMaker domain’s file system contents stored in Amazon EFS and the contents of the S3 bucket (data-bucket) that is provisioned to store data for SageMaker processing and transformation jobs. In addition, resource-based policies, such as the bucket policy and CMK policy, provide an extra layer of security, restricting both access to only authorized AI team members and permitted actions on these resources.

The AI team does not have AWS Management Console access to the AI platform team’s account. To access SageMaker Studio, as illustrated in the following figure, the data scientists from the data analytics team use a presigned URL that is generated after they authenticate through an Amazon Cognito based custom login application. After the user logs in to this custom application, they receive an OAuth access token that contains information such as the AD group name. The user then requests SageMaker domain access through the UI, which triggers an Amazon API Gateway call to generate a presigned URL. API Gateway invokes the PreSignUrlGenerator AWS Lambda function and uses an Amazon Cognito authorizer to validate the OAuth access token in the request header. The PreSignUrlGenerator function validates user access permissions for the requested SageMaker domain by comparing the AD group name in the access token against the name of the requested SageMaker domain. Upon successful authorization, the PreSignUrlGenerator function creates a SageMaker user profile if this is the user’s first login and generates a presigned URL response. The custom login application then redirects the user to the requested SageMaker domain.
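The authorization step described above can be sketched roughly as follows. This is not the actual PreSignUrlGenerator code: the function name, the claim names (`cognito:groups`, `username`), and passing the domain ID as an argument are assumptions for illustration. The `sagemaker` argument is expected to behave like a boto3 SageMaker client, and the token is assumed to have been validated already by the Amazon Cognito authorizer.

```python
def generate_presigned_url(claims: dict, domain_name: str, domain_id: str, sagemaker) -> str:
    """Authorize the caller against the requested domain and return a Studio URL.

    `claims` are the already-validated access token claims; `sagemaker` is a
    boto3 SageMaker client (or a stand-in exposing the same three methods).
    """
    # Assumed claim that carries the AD group names; the group name must
    # match the name of the requested SageMaker domain.
    groups = claims.get("cognito:groups", [])
    if domain_name not in groups:
        raise PermissionError(f"user is not a member of group '{domain_name}'")

    user = claims["username"]

    # Create the user profile only on first login.
    existing = sagemaker.list_user_profiles(
        DomainIdEquals=domain_id, UserProfileNameContains=user
    )["UserProfiles"]
    if not any(p["UserProfileName"] == user for p in existing):
        sagemaker.create_user_profile(DomainId=domain_id, UserProfileName=user)

    response = sagemaker.create_presigned_domain_url(
        DomainId=domain_id, UserProfileName=user
    )
    return response["AuthorizedUrl"]
```

Passing the client in as a parameter keeps the authorization logic testable without AWS access. A production function would probably also wait for a newly created profile to become ready before requesting the URL; that detail is left out of this sketch.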

AWS CDK

The solution at Deutsche Bahn uses AWS CDK as infrastructure as code (IaC) to provision a SageMaker domain along with resources like S3 buckets and a CMK. The following figure illustrates the stacks and associated resources used for SageMaker deployment. The infrastructure stack takes care of setting up essential resources like VPC, subnets, and multiple SageMaker endpoints. The resources such as VPC, subnets, and service control policies (SCPs) are managed by a central cloud team through a different stack (but are shown here for simplicity). The SageMakerStudioStack is primarily responsible for provisioning a SageMaker domain, a dedicated data bucket, a CMK, and the dedicated IAM role SageMaker-execution-role. Notably, each SageMaker domain is provisioned through its individual SageMakerStudioStack.

The solution uses a purpose-built L3 construct (SageMaker Studio domain), as shown in the following figure, for the SageMaker domain resource. SageMaker Studio has a lifecycle configuration feature that enables specific initializations during the startup of JupyterLab or KernelGateway apps.

Deutsche Bahn uses the lifecycle configuration as shown in the following figure to automatically detect and shut down idle instances in the SageMaker domain, reducing unnecessary costs. Due to restricted outbound connectivity, the data analytics team uses internally hosted images and third-party libraries from the company’s internal artifactory. The lifecycle configuration script for KernelGateway configures pip and conda package managers to redirect downloads to the internally hosted artifactory location. As of this writing, there is no AWS CDK construct for the lifecycle configuration resource; therefore, they use a custom CDK resource to provision and manage the LifeCycleConfig script. Custom resources in AWS CDK offer the ability to provision and manage resources not directly supported by AWS CloudFormation or AWS CDK constructs.
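As an illustration of the package manager redirection described above, the following sketch builds a small KernelGateway lifecycle script and base64-encodes it, which is the form the SageMaker lifecycle configuration API expects for the script content. The artifactory URLs and function names are hypothetical, and the real script used at Deutsche Bahn is not shown in this post.

```python
import base64

# Hypothetical internal mirror URLs; substitute the real artifactory locations.
PIP_INDEX_URL = "https://artifactory.example.internal/api/pypi/pypi/simple"
CONDA_CHANNEL_URL = "https://artifactory.example.internal/api/conda/conda-forge"


def build_lifecycle_script(pip_index_url: str, conda_channel_url: str) -> str:
    """Return a shell script that points pip and conda at internal mirrors."""
    return "\n".join(
        [
            "#!/bin/bash",
            "set -eu",
            # Redirect pip downloads to the internal package index.
            f"pip config set global.index-url {pip_index_url}",
            # Add the internal conda channel to the conda configuration.
            f"conda config --add channels {conda_channel_url}",
            "",
        ]
    )


def encode_for_sagemaker(script: str) -> str:
    """Base64-encode the script for use as lifecycle configuration content."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


script = build_lifecycle_script(PIP_INDEX_URL, CONDA_CHANNEL_URL)
content = encode_for_sagemaker(script)
# `content` would be supplied as StudioLifecycleConfigContent, together with
# StudioLifecycleConfigAppType="KernelGateway", when the configuration is created.
print(script)
```

A custom resource, as mentioned above, could then pass the encoded content to the SageMaker API during deployment. The idle shutdown logic is a separate concern and is not covered by this sketch.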

Installation

The sample AWS CDK application demonstrates how various components, including the SageMaker domain, lifecycle configuration, Amazon Cognito, and IAM role with the least privileges, function together. Within the application, the SagemakerStudioStack class handles the provisioning of a SageMaker domain, the IAM role (sagemaker-execution-role) that users assume, a CMK, the lifecycle configuration, a SageMaker user profile, an S3 bucket for data processing, and an Amazon Cognito user group. The SagemakerLoginStack, on the other hand, is responsible for deploying the Amazon Cognito user pool, Lambda function, and API Gateway for generating presigned URLs. The CognitoUserStack primarily focuses on deploying a user within the Amazon Cognito user pool.

You can run the following commands to compile, synthesize, and deploy the application. You should adjust the account, user, and password in the sample code for your application. The password should be at least 8 characters, with uppercase characters and numbers. The user parameter is the SageMaker domain user that will be authenticated by Amazon Cognito.
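If you want to check a candidate password against the rule stated above before deploying, a minimal sketch could look like the following. The function name is made up for illustration, and the Amazon Cognito user pool in the sample may enforce additional requirements that this check does not cover.

```python
import re


def is_valid_demo_password(password: str) -> bool:
    """Check the stated rule: at least 8 characters, an uppercase letter, a digit."""
    return (
        len(password) >= 8
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"[0-9]", password) is not None
    )
```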

Download the source code from the GitHub repo.
Bootstrap the AWS account. In the following code, adjust the account number and Region as needed:
```
cdk bootstrap aws://11111111111/eu-central-1
```
Install the packages and compile the code:
```
npm install
npm run build
```

Synthesize the AWS CDK application:
```
npx cdk synth -c account=11111111111 -c region='eu-central-1' -c domain-name=team1 -c user=demo-user -c password=<your password>
```
Deploy the application with all stacks into the account and Region of your choice:
```
npx cdk deploy --all -c account=11111111111 -c region='eu-central-1' -c domain-name=team1 -c user=demo-user -c password=<password>
```

Download the Postman app to make an API call.

If you don’t have a Postman account, create a free account with your email. If you already have an account, sign in to your account.

On the File menu, choose Import and import the Postman environment JSON file included in the GitHub repo.
On the Environments tab in Postman, locate the environment called SageMaker.

Add the following environment variables, which you see as part of the stack deployment output from SagemakerLoginStack:

..... output from the cdk deploy .....

//PreSignedURLApi

SageMaker-login-stack.PreSignedURLApiEndpointXXXX= https://xxxxxxx.execute-api.eu-central-1.amazonaws.com/prod/

//UserPoolClientId

SageMaker-login-stack.UserPoolUserPoolClientIdFXXXX = xxxxxxxxxxxxxxxx

//UserPoolClientSecret

SageMaker-login-stack.UserPoolUserPoolClientSecretC1D088A5 = xxxxxxxxxxxxxxx

//CognitoSigninDomain

SageMaker-login-stack.UserPoolCognitoSigninDomainD3B08161 = https://SageMaker-login-xxxxx.auth.eu-central-1.amazoncognito.com/oauth2

Use the following parameters (fetch the values from the output during cdk deploy):

- domainName – The domain name parameter you passed in cdk deploy, for example team1
- client-id – The Amazon Cognito client ID
- client-secret – The Amazon Cognito client secret
- SageMaker-presigned-api – The URL of the API Gateway created by AWS CDK, which generates the presigned URL
- cognito-signin-endpoint – The endpoint URL of the Amazon Cognito domain where the client app (in this case, Postman) authenticates by providing credentials of the user (demo-user)

The next step is to generate an OAuth2 token.

On the Authorization tab, choose the SageMaker environment and choose Generate New Access Token.

All the values on this tab should be prefilled.

Update the environment variables and choose Get New Access Token.

In the pop-up window that opens, log in to Amazon Cognito with the user name (demo-user) and password you used earlier.

Upon successful authentication, a new access token is generated.

Choose Use Token.
Choose GeneratePresignedUrlDemo in the Postman SageMaker collections and choose Send.
Make sure you selected the right environment (SageMaker) on the drop-down list.

This makes a REST API call to API Gateway and generates a presigned URL to access the SageMaker domain. You can see this URL in the response body.

Copy this URL and enter it in the browser window.

A new SageMaker domain will be launched with your user profile.

This demo application supports SageMaker features like training jobs, processing jobs, and model endpoints. Note that features like Amazon SageMaker Canvas, SageMaker JumpStart, and SageMaker Feature Store are not activated.

Clean up

Complete the following steps to clean up your resources:

On the SageMaker console, in the navigation pane, choose Domain, User Profile, and Apps.
Delete all running apps (KernelGateway or JupyterLab) from this solution.
Delete all the SageMaker user profiles you created during the login step.
On the Amazon EFS console, delete the EFS file system created for this post.
Run the following command to delete the resources created with the AWS CDK:
```
npx cdk destroy --all
```

Conclusion

The post highlighted how Deutsche Bahn effectively used SageMaker Studio to revamp its AI platform, resulting in a scalable, automated, and manageable solution to support its diverse data analytics teams. This architecture features a central platform account, a self-service domain ordering process, and infrastructure provisioning using AWS CDK. The deployment process incorporates a CI/CD pipeline, ensuring the smooth delivery of SageMaker domains.

Overall, the transformation brought about by SageMaker Studio has empowered Deutsche Bahn to construct a robust platform for their AI initiatives, catering to over 100 developers and managing 20 SageMaker domains within a single AWS account.

Lastly, we extend our sincere appreciation to Nico Seegert (d-fine) and Philipp Vollmer (Deutsche Bahn), whose invaluable contributions were instrumental in shaping this architecture.

___________________________________________________________________________________________

About the authors

Prasanna Tuladhar is a Cloud Infrastructure Architect at AWS Professional Services in Munich, Germany. Specializing in cloud infrastructure, workload migration, and DevOps on the AWS platform, he empowers customers to achieve their business objectives. Outside of work, he enjoys jogging, hiking, and quality time with his family.

Emmanuel Drosos is a Product Owner for the AI platform at DB Systel, a subsidiary of Deutsche Bahn (DB) Germany. With a passion for innovation and technology, Emmanuel spearheads initiatives aimed at leveraging the power of the cloud to drive the AI platform at DB. The AI.Platform is one of DB’s group-wide development platforms. It includes AI services and tools for the development of AI (machine learning) models, as well as directly usable AI services that are simple, integrated, and scalable. He works closely with other DB customers to unlock the full potential of the AI platform, enabling them to achieve their business objectives efficiently and effectively. Outside of his professional activities, Emmanuel enjoys traveling and is an enthusiastic nature and hiking lover.

Vishwanath Bhat is a DevOps Architect at AWS Professional Services, based in Germany. He helps customers get the full benefit of the cloud and achieve their business goals with the AWS Cloud. When not working, he likes to go swimming in alpine lakes, hiking, reading, or playing football.

Kumudhan Cherarajan is a DevOps Consultant at AWS Professional Services, based in Switzerland. He is passionate about helping customers adopt processes and services that increase their efficiency in the cloud journey. When not working, he likes to play cricket and music.

Battle.net Leaps Into the Cloud With GeForce NOW

GFN Thursday celebrates this leap day with the addition of a popular game store to the cloud.

Stream the first titles from Blizzard Entertainment’s Battle.net, including Diablo IV, Overwatch 2, Call of Duty HQ and Hearthstone, now playable across more devices than ever.

They’re all part of the 30 new games coming to GeForce NOW in March, with eight available this week.

Plus, Day Passes, announced at CES, are coming to the cloud next week, enabling gamers to experience the benefits of GeForce NOW Ultimate and Priority memberships for 24 hours at a time.

Welcome to the Cloud

Diablo IV on GeForce NOW — *More cloud gaming friends.*

Battle.net is Blizzard’s digital storefront, a gateway to adventures in the Blizzard universe and home to a vibrant gaming community.

Members who own Diablo IV, Overwatch 2, Call of Duty HQ and Hearthstone on Battle.net can now stream these triple-A titles from NVIDIA GeForce RTX-powered servers in the cloud without worrying about hardware specs or long download times.

Hearthstone on GeForce NOW — *Cloud gamers have heart.*

Battle the forces of evil in the dark, treacherous world of Diablo IV’s Sanctuary at up to 4K resolution and 120 frames per second with an Ultimate membership, even on under-powered devices. Assemble a deck to cast legendary spells in Hearthstone, and engage in epic firefights in Overwatch 2 and Call of Duty HQ at ultra-low latency thanks to the power of NVIDIA Reflex technology. Read this article and search for Hearthstone for more details on supported devices for this title.

Get ready to play Blizzard and Activision’s top-quality games anytime, anywhere. Battle.net joins supported platforms on GeForce NOW, including Steam, Epic Games Store, Xbox, Ubisoft Connect and GOG.com.

Not Mad at March

Welcome to ParadiZe on GeForce NOW — *“A wasteland I like to call my home.”*

Imagine a paradise … infested with zombies! In Welcome to ParadiZe — now available for members to stream — capture, control and teach zombies to farm or fight in the beautiful country of ParadiZe. Explore the world’s unique flora and fauna while using the zombies to defend the camp and do the dirty work.

In addition, members can look for the following this week:

STAR WARS: Dark Forces Remaster (New release on Steam, Feb. 28)
Space Engineers (New release on Xbox, available on PC Game Pass, Feb. 29)
Welcome to ParadiZe (New release on Steam, Feb. 29)
Call of Duty HQ (Battle.net)
Diablo IV (Battle.net)
Fort Solis (Steam)
Hearthstone (Battle.net)
Overwatch 2 (Battle.net)

Plus, check out what the rest of March looks like:

The Thaumaturge (New release on Steam, Mar. 4)
Classified: France ’44 (New release on Steam, Mar. 5)
Expeditions: A MudRunner Game (New release on Steam, Mar. 5)
Warhammer 40,000: Boltgun (New release on Xbox, available on PC Game Pass, Mar. 5)
Winter Survival (New release on Steam, Mar. 6)
Taxi Life: A City Driving Simulator (New release on Steam, Mar. 7)
Hellbreach: Vegas (New release on Steam, Mar. 11)
Crown Wars: The Black Prince (New release on Steam, Mar. 14)
Outcast – A New Beginning (New release on Steam, Mar. 15)
Alone in the Dark (New release on Steam, Mar. 20)
Breachway (New release on Steam, Mar. 22)
Palia (New release on Steam, Mar. 25)
Bulwark: Falconeer Chronicles (New release on Steam, Mar. 26)
Millennia (New release on Steam, Mar. 26)
Outpost: Infinity Siege (New release on Steam, Mar. 26)
SOUTH PARK: SNOW DAY! (New release on Steam, Mar. 26)
Balatro (Steam)
PARANORMASIGHT: The Seven Mysteries of Honjo (Steam)
Portal: Revolution (Steam)
STAR OCEAN THE SECOND STORY R (Steam)
STAR OCEAN THE SECOND STORY R – DEMO (Steam)
Undisputed (Steam)

Fantastic February

In addition to the 27 games announced last month, five more joined the GeForce NOW library:

Deep Rock Galactic: Survivor (New release on Steam, Feb. 14)
Goat Simulator 3 (New release on Steam, Feb. 15)
Le Mans Ultimate (New release on Steam, Feb. 20)
art of rally (Xbox, available on Microsoft Store)
Halo Infinite (Steam and Xbox, available on PC Game Pass)

The Thaumaturge didn’t make it in February due to a shift in its launch date, and is included in the March games list.

What are you planning to play this weekend? Let us know on X or in the comments below.


Abstracts: February 29, 2024


Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Senior Behavioral Science Researcher Lev Tankelevitch joins host Gretchen Huizinga to discuss “The Metacognitive Demands and Opportunities of Generative AI.” In their paper, Tankelevitch and his coauthors propose using the scientific study of how people monitor, understand, and adapt their thinking to address common challenges of incorporating generative AI into life and work—from crafting effective prompts to determining the value of AI-generated outputs. 

To learn more about the paper and related topics, register for Microsoft Research Forum, a series of panel discussions and lightning talks around science and technology research in the era of general AI.


Transcript

[MUSIC PLAYS]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

Today, I’m talking to Dr. Lev Tankelevitch, a senior behavioral science researcher from Microsoft Research. Dr. Tankelevitch is coauthor of a paper called “The Metacognitive Demands and Opportunities of Generative AI,” and you can read this paper now on arXiv. Lev, thanks for joining us on Abstracts!

LEV TANKELEVITCH: Thanks for having me.

HUIZINGA: So in just a couple sentences—a metacognitive elevator pitch, if you will—tell us about the issue or problem your paper addresses and, more importantly, why we should care about it.

TANKELEVITCH: Sure. So as generative AI has, sort of, rolled out over the last year or two, we’ve seen some user studies come out, and as we read these studies, we noticed there are a lot of challenges that people face with these tools. So people really struggle with, you know, writing prompts for systems like Copilot or ChatGPT. For example, they don’t even know really where to start, or they don’t know how to convert an idea they have in their head into, like, clear instructions for these systems. If they’re, sort of, working in a field that maybe they’re less familiar with, like a new programming language, and they get an output from these systems, they’re not really sure if it’s right or not. And then, sort of, more broadly, they don’t really know how to fit these systems into their workflows. And so we’ve noticed all these challenges, sort of, arise, and some of them relate to, sort of, the unique features of generative AI, and some relate to the design of these systems. But basically, we started to, sort of, look at these challenges, and try to understand what’s going on—how can we make sense of them in a more coherent way and actually build systems that really augment people and their capabilities rather than, sort of, posing these challenges?

HUIZINGA: Right. So let’s talk a little bit about the related research that you’re building on here and what unique insights or directions your paper adds to the literature.

TANKELEVITCH: So as I mentioned, we were reading all these different user studies that were, sort of, testing different prototypes or existing systems like ChatGPT or GitHub Copilot, and we noticed different patterns emerging, and we noticed that the same kinds of challenges were cropping up. But there weren’t any, sort of, clear coherent explanations that tied all these things together. And in general, I’d say that human-computer interaction research, which is where a lot of these papers are coming out from, it’s really about building prototypes, testing them quickly, exploring things in an open-ended way. And so we thought that there was an opportunity to step back and to try to see how we can understand these patterns from a more theory-driven perspective. And so, with that in mind, one perspective that became clearly relevant to this problem is that of metacognition, which is this idea of “thinking about thinking” or how we, sort of, monitor our cognition or our thinking and then control our cognition and thinking. And so we thought there was really an opportunity here to take this set of theories and research findings from psychology and cognitive science on metacognition and see how they can apply to understanding these usability challenges of generative AI systems.

HUIZINGA: Yeah. Well, this paper isn’t a traditional report on empirical research as many of the papers on this podcast are. So how would you characterize the approach you chose and why?

TANKELEVITCH: So the way that we got into this, working on this project, it was, it was quite organic. So we were looking at these user studies, and we noticed these challenges emerging, and we really tried to figure out how we can make sense of them. And so it occurred to us that metacognition is really quite relevant. And so what we did was we then dove into the metacognition research from psychology and cognitive science to really understand what are the latest theories, what are the latest research findings, how could we understand what’s known about that from that perspective, from that, sort of, fundamental research, and then go back to the user studies that we saw in human-computer interaction and see how those ideas can apply there. And so we did this, sort of, in an iterative way until we realized that we really have something to work with here. We can really apply a somewhat coherent framework onto these, sort of, disparate set of findings not only to understand these usability challenges but then also to actually propose directions for new design and research explorations to build better systems that support people’s metacognition.

HUIZINGA: So, Lev, given the purpose of your paper, what are the major takeaways for your readers, and how did you present them in the paper?

TANKELEVITCH: So I think the key, sort of, fundamental point is that the perspective of metacognition is really valuable for understanding the usability challenges of generative AI and potentially designing new systems that support metacognition. And so one analogy that we thought was really useful here is of a manager delegating tasks to a team. And so a manager has to determine, you know, what is their goal in their work? What are the different subgoals that that goal breaks down into? How can you communicate those goals clearly to a team, right? Then how do you assess your team’s outputs? And then how do you actually adjust your strategy accordingly as the team works in an iterative fashion? And then at a higher level, you have to really know how to—actually what to delegate to your team and how you might want to delegate that. And so we realized that working with generative AI really parallels these different aspects of what a manager does, right. So when people have to write a prompt initially, they really have to have self-awareness of their task goals. What are you actually trying to achieve? How does that translate into different subtasks? And how do you verbalize that to a system in a way that system understands? You might then get an output and you need to iterate on that output. So then you need to really think about, what is your level of confidence in your prompting ability? So is your prompting the main reason why the output isn’t maybe as satisfactory as you want, or is it something to do with the system? Then you actually might get the output [you’re] happy with, but you’re not really sure if you should fully rely on it because maybe it’s an area that is outside of your domain of expertise. And so then you need to maintain an appropriate level of confidence, right? Either to verify that output further or decide not to rely on it, for example. And then at a, sort of, broader level, this is about the question of task delegation. 
So this requires having self-awareness of the applicability of generative AI to your workflows and maintaining an appropriate level of confidence in completing tasks manually or relying on generative AI. For example, whether it’s worth it for you to actually learn how to work with generative AI more effectively. And then finally, it requires, sort of, metacognitive flexibility to adapt your workflows as you work with these tools. So are there some tasks where the way that you’re working with them is, sort of, slowing you down in specific ways? So being able to recognize that and then change your strategies as necessary really requires metacognitive flexibility. So that was, sort of, one key half of our findings.

And then beyond that we really thought about how we can use this perspective of metacognition to design better systems. And so one, sort of, general direction is really about supporting people’s metacognition. So we know from research from cognitive science and psychology that we can actually design interventions to improve people’s metacognition in a lasting and effective way. And so similarly, we can design systems that support people’s metacognition. For example, systems that support people in planning their tasks as they actually craft prompts. We can support people in actually reflecting on their confidence in their prompting ability or in assessing the output that they see. And so this relates a little bit to AI acting as a coach for you, which is an idea that the Microsoft Research New York City team came up with. So this is Jake Hofman, David Rothschild, and Dan Goldstein. And so, in this way, generative AI systems can really help you reflect as a coach and understand whether you have the right level of confidence in assessing output or crafting prompts and so on. And then similarly, at a higher level, they can help you manage your workflows, so helping you reflect on whether generative AI is really working for you in certain tasks or whether you can adapt your strategy in certain ways. And likewise, this relates also to explanations about AI, so how you can actually design systems that are explainable to users in a way that helps them achieve their goals? And explainability can be thought about as a way to actually reduce the metacognitive demand because you’re, sort of, explaining things in a way to people that they don’t have to keep in their mind and have to think about, and that, sort of, improves their confidence. It can help them improve their confidence or calibrate their confidence in their ability to assess outputs.

HUIZINGA: Talk for a minute about real-world impact of this research. And by that, I mean, who does it help most and how? Who’s your main audience for this right now?

TANKELEVITCH: In a sense, this is very broadly applicable. It’s really about designing systems that people can interact with in any domain and in any context. But I think, given how generative AI has rolled out in the world today, I mean, a lot of the focus has been on productivity and workflows. And so this is a really well-defined, clear area where there is an opportunity to actually help people achieve more and stay in control and actually be more intentional and be more aligned with their goals. And so this is an approach where not only can we go beyond, sort of, automating specific tasks but actually use these systems to help people clarify their goals and track with them in a more effective way. And so knowledge workers are an obvious, sort of, use case or an obvious area where this is really relevant because they work in a complex system where a lot of the work is, sort of, diffused and spread across collaborations and artifacts and software and different ways of working. And so a lot of things are, sort of, lost or made difficult by that complexity. And so systems that are flexible and help people actually reflect on what they want to achieve can really have a big impact here.

HUIZINGA: Mm-hmm. Are you a little bit upstream of that even now, in the sense that this is a “research direction” kind of paper? I noticed that as I read it, I felt like this was how researchers can begin to think about what they’re doing and how that will help downstream from that.

TANKELEVITCH: Yes. That’s exactly right. So this is really about, we hope, unlocking a new direction of research and design where we take this perspective of metacognition—of how we can help people think more clearly and, sort of, monitor and control their own cognition—and design systems to help them do that. And in the paper, there’s a whole list of different questions, both fundamental research questions to understand in more depth how metacognition plays a role in human-AI interaction when people work with generative AI systems but also how we can then actually design new interventions or new systems that actually support people’s metacognition. And so there’s a lot of work to do in this, and we hope that, sort of, inspires a lot of further research, and we’re certainly planning to do a lot more follow-up research.

HUIZINGA: Yeah. So I always ask, if there was just one thing that you wanted our listeners to take away from this work, a sort of golden nugget, what would it be?

TANKELEVITCH: I mean, I’d say that if we really want generative AI to be about augmenting human agency, then I think we need to focus on understanding how people think and behave in their real-world context and design for that. And so I think specifically, the real potential of generative AI here, as I was saying, is not just to automate a bunch of tasks but really to help people clarify their intentions and goals and act in line with them. And so, in a way, it’s kind of about building tools for thought, which was the real vision of the early pioneers of computing. And so I hope that this, kind of, goes back to that original idea.

HUIZINGA: You mentioned this short list of open research questions in the field, along with a list of suggested interventions. You’ve, sort of, curated that for your readers at the end of the paper. But give our audience a little overview of that and how those questions inform your own research agenda coming up next.

TANKELEVITCH: Sure. So on the, sort of, fundamental research side of things, there are a lot of questions around how, for example, self-confidence that people have plays a role in their interactions with generative AI systems. So this could be self-confidence in their ability to prompt these systems. And so that is one interesting research question. What is the role of confidence and calibrating one’s confidence in prompting? And then similarly, on the, sort of, output evaluation side, when you get an output from generative AI, how do you calibrate your confidence in assessing that output, right, especially if it’s in an area that maybe you’re less familiar with? And so there are these interesting, nuanced questions around self-confidence, and we’re actually exploring this in a new study. This is part of the AI, Cognition, and [the] Economy pilot project. So this is a collaboration that we’re running with Dr. Clara Colombatto, who’s a researcher at the University of Waterloo and University College London, and we’re essentially designing a study where we’re trying to understand people’s confidence in themselves, in their planning ability, and in working with AI systems to do planning together, and how that influences their reliance on the output of generative AI systems.

[MUSIC PLAYS]

HUIZINGA: Well, Lev Tankelevitch, thank you for joining us today, and to our listeners, thanks for tuning in. If you want to read the full paper on metacognition and generative AI, you can find a link at aka.ms/abstracts, or you can read it on arXiv. Also, Lev will be speaking about this work at the upcoming Microsoft Research Forum, and you can register for this series of events at researchforum.microsoft.com. See you next time on Abstracts!

[MUSIC FADES]

The post Abstracts: February 29, 2024 appeared first on Microsoft Research.

What Is Sovereign AI?

Nations have long invested in domestic infrastructure to advance their economies, control their own data and take advantage of technology opportunities in areas such as transportation, communications, commerce, entertainment and healthcare.

AI, the most important technology of our time, is turbocharging innovation across every facet of society. It’s expected to generate trillions of dollars in economic dividends and productivity gains.

Countries are investing in sovereign AI to develop and harness such benefits on their own. Sovereign AI refers to a nation’s capabilities to produce artificial intelligence using its own infrastructure, data, workforce and business networks.

Why Sovereign AI Is Important

The global imperative for nations to invest in sovereign AI capabilities has grown since the rise of generative AI, which is reshaping markets, challenging governance models, inspiring new industries and transforming others — from gaming to biopharma. It’s also rewriting the nature of work, as people in many fields start using AI-powered “copilots.”

Sovereign AI encompasses both physical and data infrastructures. The latter includes sovereign foundation models, such as large language models, developed by local teams and trained on local datasets to promote inclusiveness with specific dialects, cultures and practices.

For example, speech AI models can help preserve, promote and revitalize indigenous languages. And LLMs aren’t just for teaching AIs human languages, but for writing software code, protecting consumers from financial fraud, teaching robots physical skills and much more.

In addition, as artificial intelligence and accelerated computing become increasingly critical tools for combating climate change, boosting energy efficiency and protecting against cybersecurity threats, sovereign AI has a pivotal role to play in equipping every nation to bolster its sustainability efforts.

Factoring In AI Factories

“AI factories,” where data comes in and intelligence comes out, form new, essential infrastructure for AI production. These are next-generation data centers that host advanced, full-stack accelerated computing platforms for the most computationally intensive tasks.

Nations are building up domestic computing capacity through various models. Some are procuring and operating sovereign AI clouds in collaboration with state-owned telecommunications providers or utilities. Others are sponsoring local cloud partners to provide a shared AI computing platform for public- and private-sector use.

“The AI factory will become the bedrock of modern economies across the world,” NVIDIA founder and CEO Jensen Huang said in a recent media Q&A.

Sovereign AI Efforts Underway

Nations around the world are already investing in sovereign AI.

Since 2019, NVIDIA’s AI Nations initiative has helped countries spanning every region of the globe to build sovereign AI capabilities, including ecosystem enablement and workforce development, creating the conditions for engineers, developers, scientists, entrepreneurs, creators and public sector officials to pursue their AI ambitions at home.

France-based Scaleway, a subsidiary of the iliad Group, is building Europe’s most powerful cloud-native AI supercomputer. The NVIDIA DGX SuperPOD comprises 127 DGX H100 systems, representing 1,016 NVIDIA H100 Tensor Core GPUs interconnected by NVIDIA NVLink technology and the NVIDIA Quantum-2 InfiniBand platform. NVIDIA DGX systems also include NVIDIA AI Enterprise software for secure, supported and stable AI development and deployment.

Swisscom Group, majority-owned by the Swiss government, recently announced its Italian subsidiary, Fastweb, will build Italy’s first and most powerful NVIDIA DGX-powered supercomputer — also using NVIDIA AI Enterprise software — to develop the first LLM natively trained in the Italian language.

With these NVIDIA technologies and its own cloud and cybersecurity infrastructures, Fastweb plans to launch an end-to-end system with which Italian companies, public-administration organizations and startups can develop generative AI applications for any industry.

The government of India has also announced sovereign AI initiatives promoting workforce development, sustainable computing and private-sector investment in domestic compute capacity. India-based Tata Group, for example, is building a large-scale AI infrastructure powered by the NVIDIA GH200 Grace Hopper Superchip, while Reliance Industries will develop a foundation LLM tailored for generative AI and trained on the diverse languages of the world’s most populous nation. NVIDIA is also working with India’s top universities to support and expand local researcher and developer communities.

Japan is going all in with sovereign AI, collaborating with NVIDIA to upskill its workforce, support Japanese language model development, and expand AI adoption for natural disaster response and climate resilience. These efforts include public-private partnerships that are incentivizing leaders like SoftBank Corp. to collaborate with NVIDIA on building a generative AI platform for 5G and 6G applications as well as a network of distributed AI factories.

Finally, Singapore is fostering a range of sovereign AI programs, including by partnering with NVIDIA to upgrade its National Super Computer Center, or NSCC, with NVIDIA H100 GPUs. In addition, Singtel, a leading communications services provider, is building energy-efficient AI factories across Southeast Asia, accelerated by NVIDIA Hopper architecture GPUs and NVIDIA AI reference architectures.

Copyright © 2019-2023 Vedere AI. All Rights Reserved.