May 2023 – Page 19

Latest NVIDIA Graphics Research Advances Generative AI’s Next Frontier

NVIDIA today introduced a wave of cutting-edge AI research that will enable developers and artists to bring their ideas to life — whether still or moving, in 2D or 3D, hyperrealistic or fantastical.

Around 20 NVIDIA Research papers advancing generative AI and neural graphics — including collaborations with over a dozen universities in the U.S., Europe and Israel — are headed to SIGGRAPH 2023, the premier computer graphics conference, taking place Aug. 6-10 in Los Angeles.

The papers include generative AI models that turn text into personalized images; inverse rendering tools that transform still images into 3D objects; neural physics models that use AI to simulate complex 3D elements with stunning realism; and neural rendering models that unlock new capabilities for generating real-time, AI-powered visual details.

Innovations by NVIDIA researchers are regularly shared with developers on GitHub and incorporated into products, including the NVIDIA Omniverse platform for building and operating metaverse applications and NVIDIA Picasso, a recently announced foundry for custom generative AI models for visual design. Years of NVIDIA graphics research helped bring film-style rendering to games, like the recently released Cyberpunk 2077 Ray Tracing: Overdrive Mode, the world’s first path-traced AAA title.

The research advancements presented this year at SIGGRAPH will help developers and enterprises rapidly generate synthetic data to populate virtual worlds for robotics and autonomous vehicle training. They’ll also enable creators in art, architecture, graphic design, game development and film to more quickly produce high-quality visuals for storyboarding, previsualization and even production.

AI With a Personal Touch: Customized Text-to-Image Models

Generative AI models that transform text into images are powerful tools to create concept art or storyboards for films, video games and 3D virtual worlds. Text-to-image AI tools can turn a prompt like “children’s toys” into nearly infinite visuals a creator can use for inspiration — generating images of stuffed animals, blocks or puzzles.

However, artists may have a particular subject in mind. A creative director for a toy brand, for example, could be planning an ad campaign around a new teddy bear and want to visualize the toy in different situations, such as a teddy bear tea party. To enable this level of specificity in the output of a generative AI model, researchers from Tel Aviv University and NVIDIA have two SIGGRAPH papers that enable users to provide image examples that the model quickly learns from.

One paper describes a technique that needs a single example image to customize its output, accelerating the personalization process from minutes to roughly 11 seconds on a single NVIDIA A100 Tensor Core GPU, more than 60x faster than previous personalization approaches.

A second paper introduces a highly compact model called Perfusion, which takes a handful of concept images to allow users to combine multiple personalized elements — such as a specific teddy bear and teapot — into a single AI-generated visual:

Examples of generative AI model personalizing text-to-image output based on user-provided images

Serving in 3D: Advances in Inverse Rendering and Character Creation

Once a creator comes up with concept art for a virtual world, the next step is to render the environment and populate it with 3D objects and characters. NVIDIA Research is inventing AI techniques to accelerate this time-consuming process by automatically transforming 2D images and videos into 3D representations that creators can import into graphics applications for further editing.

A third paper created with researchers at the University of California, San Diego, discusses tech that can generate and render a photorealistic 3D head-and-shoulders model based on a single 2D portrait — a major breakthrough that makes 3D avatar creation and 3D video conferencing accessible with AI. The method runs in real time on a consumer desktop, and can generate a photorealistic or stylized 3D telepresence using only conventional webcams or smartphone cameras.

A fourth project, a collaboration with Stanford University, brings lifelike motion to 3D characters. The researchers created an AI system that can learn a range of tennis skills from 2D video recordings of real tennis matches and apply this motion to 3D characters. The simulated tennis players can accurately hit the ball to target positions on a virtual court, and even play extended rallies with other characters.

Beyond the test case of tennis, this SIGGRAPH paper addresses the difficult challenge of producing 3D characters that can perform diverse skills with realistic movement — without the use of expensive motion-capture data.

Not a Hair Out of Place: Neural Physics Enables Realistic Simulations

Once a 3D character is generated, artists can layer in realistic details such as hair — a complex, computationally expensive challenge for animators.

Humans have an average of 100,000 hairs on their heads, with each reacting dynamically to an individual’s motion and the surrounding environment. Traditionally, creators have used physics formulas to calculate hair movement, simplifying or approximating its motion based on the resources available. That’s why virtual characters in a big-budget film sport much more detailed heads of hair than real-time video game avatars.

A fifth paper showcases a method that can simulate tens of thousands of hairs in high resolution and in real time using neural physics, an AI technique that teaches a neural network to predict how an object would move in the real world.

The team’s novel approach for accurate simulation of full-scale hair is specifically optimized for modern GPUs. It offers significant performance leaps compared to state-of-the-art, CPU-based solvers, reducing simulation times from multiple days to merely hours — while also boosting the quality of hair simulations possible in real time. This technique finally enables both accurate and interactive physically based hair grooming.

Neural Rendering Brings Film-Quality Detail to Real-Time Graphics

After an environment is filled with animated 3D objects and characters, real-time rendering simulates the physics of light reflecting through the virtual scene. Recent NVIDIA research shows how AI models for textures, materials and volumes can deliver film-quality, photorealistic visuals in real time for video games and digital twins.

NVIDIA invented programmable shading over two decades ago, enabling developers to customize the graphics pipeline. In these latest neural rendering inventions, researchers extend programmable shading code with AI models that run deep inside NVIDIA’s real-time graphics pipelines.

In a sixth SIGGRAPH paper, NVIDIA will present neural texture compression that delivers up to 16x more texture detail without taking additional GPU memory. Neural texture compression can substantially increase the realism of 3D scenes, as seen in the image below, which demonstrates how neural-compressed textures (right) capture sharper detail than previous formats, where the text remains blurry (center).

Three-pane image showing a page of text, a zoomed-in version with blurred text, and a zoomed-in version with clear text. — Neural texture compression (right) provides up to 16x more texture detail than previous texture formats without using additional GPU memory.

A related paper announced last year is now available in early access as NeuralVDB, an AI-enabled data compression technique that decreases by 100x the memory needed to represent volumetric data — like smoke, fire, clouds and water.

NVIDIA also released today more details about neural materials research that was shown in the most recent NVIDIA GTC keynote. The paper describes an AI system that learns how light reflects from photoreal, many-layered materials, reducing the complexity of these assets down to small neural networks that run in real time, enabling up to 10x faster shading.

The level of realism can be seen in this neural-rendered teapot, which accurately represents the ceramic, the imperfect clear-coat glaze, fingerprints, smudges and even dust.

Rendered close-up images of a ceramic blue teapot with gold handle — The neural material model learns how light reflects from the many-layered, photoreal reference materials.

More Generative AI and Graphics Research

These are just the highlights — read more about all the NVIDIA papers at SIGGRAPH. NVIDIA will also present six courses, four talks and two Emerging Technology demos at the conference, with topics including path tracing, telepresence and diffusion models for generative AI.

NVIDIA Research has hundreds of scientists and engineers worldwide, with teams focused on topics including AI, computer graphics, computer vision, self-driving cars and robotics.

AI self-play for algorithm design

A flow chart demonstrating the five steps in a self-play pipeline for a language model to improve itself automatically.A self-play pipeline for a language model (LM) to improve itself in a fully automatic manner. First, the LM generates novel puzzles based on a training set of handwritten puzzles. Then, the LM attempts to solve each of these puzzles 100 times. In Step 3, the computer (specifically a Python interpreter) filters the candidate solutions for correctness. Finally, the LM is improved by further training on these verified correct solutions to synthetic puzzles, and the process repeats. This process leads to significant improvements as measured on held-out test puzzles that were also handwritten. — A self-play pipeline for a language model (LM) to improve itself in a fully automatic manner. First, the LM generates novel puzzles based on a training set of handwritten puzzles. Then, the LM attempts to solve each of these puzzles 100 times. In Step 3, the computer (specifically a Python interpreter) filters the candidate solutions for correctness. Finally, the LM is improved by further training on these verified correct solutions to synthetic puzzles, and the process repeats. This process leads to significant improvements as measured on held-out test puzzles, which were also handwritten.

Efficient algorithms are crucial for many purposes, including reducing energy consumption in digital devices. While humans outperform AI systems at designing such algorithms, we show how to improve AI programming abilities using self-play, a technique that has helped AI systems dominate in games such as chess and Go.

Designing fast and accurate algorithms requires high-level abstract reasoning, which remains difficult for AI systems. Our approach involves having the AI design and solve its own programming challenges, enabling practice on millions of artificial challenges and exploration of problem types not found in public repositories. We detail our work in a new paper, “Language Models Can Teach Themselves to Program Better,” which we’re presenting at the 2023 International Conference on Learning Representations (ICLR).

The key challenge and our solution

How can an AI system generate novel algorithmic programming problems without knowing the solution?

Our approach uses programming puzzles introduced by Microsoft Research in 2021. These puzzles—known in complexity theory as the class of “NP” decision problems—are easy to check for correctness (no hidden answer key) but often difficult to solve. In this way, they’re like a Rubik’s cube, where it’s trivial to recognize a solution but hard to find one. Three examples are illustrated below: a novel string challenge and the classic Towers of Hanoi and factoring problems. Programming puzzles can range from trivial to major open problems in algorithms and mathematics, and solving them requires all the major algorithmic techniques, such as dynamic programming and greedy algorithms. However, each puzzle just checks a single input as opposed to standard problems in algorithms, which require a solution that scales efficiently for all inputs, which is much harder to test.

Programming puzzle examples

Can computers generate valuable, novel challenges?

Surprisingly, language models such as Codex and GPT-Neo can indeed create novel puzzles when prompted to generate “more like these” on a set of example puzzles without solutions. You may wonder what makes a challenge good. Instead of focusing on interesting, we prioritize useful challenges. Our evaluation has the language model generate, solve, and train on its own puzzles; then we assess whether the training improved its performance on a hidden test set of puzzles. (By now, solutions to our puzzles may have leaked into AI training sets, but with the help of champion competitive programmers, we have created a secret test set that remains unpublished, which can be used for uncontaminated evaluation.) In our experiments with small- to medium-sized language models—with a few billion parameters, much fewer than the latest GPT models—self-training more than doubled success rates.

Risks and limitations

This research was conducted prior to GPT-4’s release. While we believe similar techniques may help GPT-4 self-improve in programming, this is an active area of research as we better understand the capabilities and limitations of these models. One key limitation of puzzles is that solutions might only work for the specific instance provided. However, this limitation also serves as an advantage in terms of human-AI alignment. Unlike other AI challenges with inherent ambiguities that could lead to unintended consequences if objectives are imprecisely defined (for example, an AI-designed math-tutor app that may become addicting unintendedly), our programming puzzles encompass exactly those standalone problems that can be perfectly verified for meeting a precise objective. As there remains a risk that any work that substantially advances AI programming capabilities can be used in other systems and with unintended consequences, we continue to encourage taking great care before deploying systems with artificially generated code.

Examples of programming puzzles for AI self-play

Each puzzle is specified by a short Python program that checks a possible answer. Each solution is a Python program that outputs an answer in a limited amount of time.

Example 1: Towers of Hanoi

A Towers of Hanoi puzzle in three steps: the first a picture with the puzzle’s seven disks on the first tower, the second a picture with the disks split among the three towers, and the third a picture of all the disks on the last tower.

The goal of the well-known Towers of Hanoi puzzle is to move all the disks from the first tower to the last tower, one by one, without ever putting a bigger disk on top of a smaller disk. It’s easy to check that a solution is correct but hard to find a correct solution. Even though the number of steps required to solve it is exponential in the number of disks, there’s a solution in the form of a short program that is often used to teach recursion. The clever solution program that outputs the moves is easier to find than the sequence of moves itself. Here are the programming puzzle and solution:

Example 2: String challenge

This concise puzzle perplexes AI systems, although humans find it simple. The puzzle requires a string with 1,000 “A” characters but no two consecutive A’s. Most programmers devise solutions like “ABABAB …” (1,000 times), generated by the compact Python solution above. In contrast, AI systems usually need multiple attempts. Fortunately, AI systems can easily verify their attempts by running the checking program. This puzzle exemplifies a straightforward, unique problem specifically created for our dataset.

Example 3: Integer factorization

Another classic example is integer factorization. The puzzle above requires a factor of a relatively small number so it can be solved quickly by a simple loop. However, our dataset also contains factoring challenges like the 309-digit RSA Factoring Challenge number, which was published in 1991 along with a $100,000 prize. The 309-digit number was never factored, and the challenge has since ended.

The post AI self-play for algorithm design appeared first on Microsoft Research.

Bring your own ML model into Amazon SageMaker Canvas and generate accurate predictions

Machine learning (ML) helps organizations generate revenue, reduce costs, mitigate risk, drive efficiencies, and improve quality by optimizing core business functions across multiple business units such as marketing, manufacturing, operations, sales, finance, and customer service. With AWS ML, organizations can accelerate the value creation from months to days. Amazon SageMaker Canvas is a visual, point-and-click service that allows business analysts to generate accurate ML predictions without writing a single line of code or requiring ML expertise. You can use models to make predictions interactively and for batch scoring on bulk datasets.

In this post, we showcase architectural patterns on how business teams can use ML models built anywhere by generating predictions in Canvas and achieve effective business outcomes.

This integration of model development and sharing creates a tighter collaboration between business and data science teams and lowers time to value. Business teams can use existing models built by their data scientists or other departments to solve a business problem instead of rebuilding new models in outside environments.

Finally, business analysts can import shared models into Canvas and generate predictions before deploying to production with just a few clicks.

Solution overview

The following figure describes three different architecture patterns to demonstrate how data scientists can share models with business analysts, who can then directly generate predictions from those models in the visual interface of Canvas:

Use Amazon SageMaker Autopilot and Canvas
Use Amazon SageMaker JumpStart and Canvas
Use SageMaker model registry and Canvas

Prerequisites

To train and build your model using SageMaker and bring your model into Canvas, complete the following prerequisites:

If you don’t already have a SageMaker domain and Studio user, set up and onboard a Studio user to a SageMaker domain.
Enable and set up Canvas base permissions for your users and grant users permissions to collaborate with Studio.
You must have a trained model from Autopilot, JumpStart, or the model registry. For any model that you’ve built outside of SageMaker, you must register your model in the model registry before importing it into Canvas.

Now let’s assume the role of a data scientist who is looking to train, build, deploy, and share ML models with a business analyst for each of these three architectural patterns.

Use Autopilot and Canvas

Autopilot automates key tasks of an automatic ML (AutoML) process like exploring data, selecting the relevant algorithm for the problem type, and then training and tuning it. All of this can be achieved while allowing you to maintain full control and visibility on the dataset. Autopilot automatically explores different solutions to find the best model, and users can either iterate on the ML model or directly deploy the model to production with one click.

In this example, we use a customer churn synthetic dataset from the telecom domain and are tasked with identifying customers that are potentially at risk of churning. Complete the following steps to use Autopilot AutoML to build, train, deploy, and share an ML model with a business analyst:

Download the dataset, upload it to an Amazon S3 (Amazon Simple Storage Service) bucket, and make a note of the S3 URI.
On the Studio console, choose AutoML in the navigation pane.
Choose Create AutoML experiment.
Specify the experiment name (for this post, Telecom-Customer-Churn-AutoPilot), S3 data input, and output location.
Set the target column as churn.
In the deployment settings, you can enable the auto deploy option to create an endpoint that deploys your best model and runs inference on the endpoint.

For more information, refer to Create an Amazon SageMaker Autopilot experiment.

Choose your experiment, then select your best model and choose Share model.
Add a Canvas user and choose Share to share the model.

(Note: You can’t share model with the same Canvas user as used for Studio login. For example, Studio user-A can’t share model with Canvas User-A. But user-A can share model with user-B, hence choose different uses for model-sharing)

For more information, refer to Studio users: Share a model to SageMaker Canvas.

Use JumpStart and Canvas

JumpStart is an ML hub that provides pre-trained, open-source models for a wide range of ML use cases like fraud detection, credit risk prediction, and product defect detection. You can deploy more than 300 pre-trained models for tabular, vision, text, and audio data.

For this post, we use a LightGBM regression pre-trained model from JumpStart. We train the model on a custom dataset and share the model with a Canvas user (business analyst). The pre-trained model can be deployed to an endpoint for inference. JumpStart provides an example notebook to access the model after it is deployed.

In this example, we use the abalone dataset. The dataset contains examples of eight physical measurements such as length, diameter, and height to predict the age of abalone (a regression problem).

Download the abalone dataset from Kaggle.
Create an S3 bucket and upload the train, validation, and custom header datasets.
On the Studio console, under SageMaker JumpStart in the navigation pane, choose Models, notebooks, solutions.
Under Tabular Models, choose LightGBM Regression.
Under Train Model, specify the S3 URIs for the training, validation, and column header datasets.
Choose Train.
In the navigation pane, choose Launched JumpStart assets.
On the Training jobs tab, choose your training job.
On the Share menu, choose Share to Canvas.
Choose the Canvas users to share with, specify the model details, and choose Share.

For more information, refer to Studio users: Share a model to SageMaker Canvas.

Use SageMaker model registry and Canvas

With SageMaker model registry, you can catalog models for production, manage model versions, associate metadata, manage the approval status of a model, deploy models to production, and automate model deployment with CI/CD.

Let’s assume the role of a data scientist. For this example, you’re building an end-to-end ML project that includes data preparation, model training, model hosting, model registry, and model sharing with a business analyst. Optionally, for data preparation and preprocessing or postprocessing steps, you can use Amazon SageMaker Data Wrangler and an Amazon SageMaker Processing job. In this example, we use the abalone dataset downloaded from LIBSVM. The target variable is the age of abalone.

In Studio, clone the GitHub repo.
Complete the steps listed in the README file.
On the Studio console, under Models in the navigation pane, choose Model registry.
Choose the model sklearn-reg-ablone.
Share model version 1 from the model registry to Canvas.
Choose the Canvas users to share with, specify the model details, and choose Share.

For instructions, refer to the Model Registry section in Studio users: Share a model to SageMaker Canvas.

Manage shared models

After you share the model using any of the preceding methods, you can go to the Models section in Studio and review all shared models. In the following screenshot, we see 3 different models shared by a Studio user (data scientist) with different Canvas users (business teams).

Import a shared model and make predictions with Canvas

Let’s assume the role of business analyst and log in to Canvas with your Canvas user.

When a data scientist or Studio user shares a model with a Canvas user, you receive a notification within the Canvas application that a Studio user has shared a model with you. In the Canvas application, the notification is similar to the following screenshot.

You can choose View update to see the shared model, or you can go to the Models page in the Canvas application to discover all the models that have been shared with you. The model import from Studio can take up to 20 minutes.

After importing the model, you can view its metrics and generate real-time predictions with what-if analysis or batch predictions.

Considerations

Keep in mind the following when sharing models with Canvas:

You store training and validation datasets in Amazon S3, and the S3 URIs are passed to Canvas with AWS Identity and Access Management (IAM) permissions.
Provide the target column to Canvas or use the first column as default.
For a Canvas container to parse inference data, the Canvas endpoint accepts either text (CSV) or application (JSON).
Canvas doesn’t support multiple container or inference pipelines.
A data schema is provided to Canvas if no headers are provided in the training and validation datasets. By default, the JumpStart platform doesn’t provide headers in the training and validation datasets.
With Jumpstart, the training job needs to be complete before you can share it with Canvas.

Refer to Limitations and troubleshooting to help you troubleshoot any issues you encounter when sharing models.

Clean up

To avoid incurring future charges, delete or shut down the resources you created while following this post. Refer to Logging out of Amazon SageMaker Canvas for more details. Shut down the individual resources, including notebooks, terminal, kernels, apps and instances. For more information, refer to Shut Down Resources. Delete the model version, SageMaker endpoint and resources, Autopilot experiment resources, and S3 bucket.

Conclusion

Studio allows data scientists to share ML models with business analysts in a few simple steps. Business analysts can benefit from ML models already built by data scientists to solve business problems instead of creating a new model in Canvas. However, it might be difficult to use these models outside the environments in which they are built due to technical requirements and manual processes to import models. This often forces users to rebuild ML models, resulting in the duplication of effort and additional time and resources. Canvas removes these limitations so you can generate predictions in Canvas with models that you have trained anywhere. By using the three patterns illustrated in this post, you can register ML models in the SageMaker model registry, which is a metadata store for ML models, and import them into Canvas. Business analysts can then analyze and generate predictions from any model in Canvas.

To learn more about using SageMaker services, check out the following resources:

If you have questions or suggestions, leave a comment.

About the authors

Aman Sharma is a Senior Solutions Architect With AWS. He works with start-ups, small and medium businesses, and enterprise customers across the APJ region, more than 19 years of experience in consulting, architecting, and solutioning. He is passionate about democratizing AI and ML and helping customers in designing their data and ML strategies. Outside work, he likes to explore nature and wildlife.

Zichen Nie is the Senior Software Engineer at AWS SageMaker leading the project Bring Your Own Model to SageMaker Canvas last year. She has been working in Amazon for more than 7 years and has experience in both Amazon Supply Chain Optimization and AWS AI services. She enjoys Barre workouts and music after work.

Self-Supervised Temporal Analysis of Spatiotemporal Data

*=Equal Contributors
There exists a correlation between geospatial activity temporal patterns and type of land use. A novel self-supervised approach is proposed to survey landscape based on activity time series, where time series signal is transformed to frequency domain and compressed into embeddings by a contractive autoencoder, which preserve cyclic temporal patterns observed in time series. The embeddings are input to segmentation neural network for binary classification. Experiments show that the temporal embeddings are effective in classifying residential area and commercial area.Apple Machine Learning Research

Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart

Today, we announce the availability of sample notebooks that demonstrate question answering tasks using a Retrieval Augmented Generation (RAG)-based approach with large language models (LLMs) in Amazon SageMaker JumpStart. Text generation using RAG with LLMs enables you to generate domain-specific text outputs by supplying specific external data as part of the context fed to LLMs.

JumpStart is a machine learning (ML) hub that can help you accelerate your ML journey. JumpStart provides many pre-trained language models called foundation models that can help you perform tasks such as article summarization, question answering, and conversation generation and image generation.

In this post, we describe RAG and its advantages, and demonstrate how to quickly get started by using a sample notebook to solve a question answering task using RAG implementation with LLMs in Jumpstart. We demonstrate two approaches:

How to solve the problem with the open-sourced LangChain library and Amazon SageMaker endpoints in a few lines of code
How to use the SageMaker KNN algorithm to perform semantic searching for large-scale data using SageMaker endpoints

LLMS and constraints

LLMs are trained on large amounts of unstructured data and are great at general text generation. LLMs can store factual knowledge by training their parameters on a large corpus of natural language data.

There are a few limitations of using off-the-shelf pre-trained LLMs:

They’re usually trained offline, making the model agnostic to the latest information (for example, a chatbot trained from 2011–2018 has no information about COVID-19).
They make predictions by only looking up information stored in its parameters, leading to inferior interpretability.
They’re mostly trained on general domain corpora, making them less effective on domain-specific tasks. There are scenarios when you want models to generate text based on specific data rather than generic data. For example, a health insurance company may want their question answering bot to answer questions using the latest information stored in their enterprise document repository or database, so the answers are accurate and reflect their unique business rules.

Currently, there are two popular ways to reference specific data in LLMs:

Insert data as context in the model prompt as a way to provide the information that the model can use while creating the result
Fine-tine the model by providing a file with prompt and completion pairs

The challenge of the context-based approach is that models come with limited context size, and including all the documents as context may not fit into the allowed context size of the model. Depending on the model used, there may also be additional cost for larger context.

For the approach of fine-tuning, generating the right formatted information is time consuming and involves cost. In addition, if external data used for fine-tuning changes frequently, it would imply frequent fine-tunings and retraining are needed to create accurate results. Frequent training impacts speed to market and adds to the overall solution cost.

To demonstrate these constraints, we used an LLM Flan T5 XXL model and asked the following question:

question = "Which instances can I use with Managed Spot Training in SageMaker?"

We get the following response:

"""For model: huggingface-text2text-flan-t5-xxl, the generated output is: 
the Managed Spot Training is a subscriptions product available for the following instances: Data Science Virtual Machine (DSVM), DSVM High, and DSVM Low.
"""

As you can see, the response is not accurate. The correct answer should be all SageMaker instances support Managed Spot Training.

We tried the same question but with additional context passed along with the question:

question + context + prompt = """
Answer based on context:

Managed Spot Training can be used with all instances supported in Amazon SageMaker. Managed Spot Training is supported in all AWS Regions where Amazon SageMaker is currently available.

Which instances can I use with Managed Spot Training in SageMaker?
"""

We got the following response this time:

"""For model: huggingface-text2text-flan-t5-xxl, the generated output is: 
instances supported in Amazon SageMaker
"""

The response is better but still not accurate. However, in real production use cases, users may send various queries, and to provide accurate responses, you may want to include all or most of the available information as part of the static context to create accurate responses. Therefore, with this approach, we may hit the context size limitation constraint because even non-relevant information for the question asked is sent as part of the context. This is where you can use the RAG-based approach to create scalable and accurate responses for a user’s queries.

Retrieval Augmented Generation

To solve the constraints we discussed, we can use Retrieval Augmented Generation (RAG) with LLMs. RAG retrieves data from outside the language model (non-parametric) and augments the prompts by adding the relevant retrieved data in context. RAG models were introduced by Lewis et al. in 2020 as a model where parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever.

In RAG, the external data can come from multiple data sources, such as a document repository, databases, or APIs. The first step is to convert the documents and the user query in the format so they can be compared and relevancy search can be performed. To make the formats comparable for doing relevancy search, a document collection (knowledge library) and the user-submitted query are converted to numerical representation using embedding language models. The embeddings are essentially numerical representations of concept in text. Next, based on the embedding of user query, its relevant text is identified in the document collection by a similarity search in the embedding space. Then the prompt provided by the user is appended with relevant text that was searched and it’s added to the context. The prompt is now sent to the LLM and because the context has relevant external data along with the original prompt, the model output is relevant and accurate.

To maintain up-to-date information for the reference documents, you can asynchronously update the documents and update embedding representation of the documents. This way, the updated documents will be used to generate answers for future questions to provide accurate responses.

The following diagram shows the conceptual flow of using RAG with LLMs.

In this post, we demonstrate how to implement a question answering application with the following steps:

Generate embedding for each of document in the knowledge library with a SageMaker GPT-J-6B embedding model.
Identify the top K most relevant documents based on the user query.
1. For your query, generate the embedding of the query using the same embedding model.
2. Search the indexes of the top K most relevant documents in the embedding space using an in-memory FAISS search.
3. Use the indexes to retrieve the corresponding documents.
Use the retrieved relevant documents as context with the prompt and question, and send them to the SageMaker LLM to generate the response.

We demonstrate the following approaches:

How to solve a question answering task with SageMaker LLMs and embedding endpoints and the open-sourced library LangChain in a few lines of code. In particular, we use two SageMaker endpoints for the LLM (Flan T5 XXL) and embedding model (GPT-J 6B), and the vector database used is in-memory FAISS. For more details, see the GitHub repo.
If the in-memory FAISS doesn’t fit into your large dataset, we provide you with a SageMaker KNN algorithm to perform the semantic search, which also uses FAISS as the underlying searching algorithm. For details, see the GitHub repo.

The following diagram depicts the solution architecture.

JumpStart RAG-based implementation notebook with LangChain

LangChain is an open-source framework for developing applications powered by language models. LangChain provides a generic interface for many different LLMs. It also makes it easier for developers to chain various LLMs together and build powerful applications. LangChain provides a standard interface for memory and a collection of memory implementations to persist the state between calls of agents or chains.

LangChain has many other utility features that can add to developer productivity. These features include a prompt template that helps customize prompts using variables in the prompt template, agents to build end-to-end applications, indexes for search and retrieval steps of the chain, and much more. To further explore LangChain capabilities, refer to the LangChain documentation.

Create LLM Model

As a first step, deploy the JumpStart LLM model of your choice. In this demo, we use a Jumpstart Flan T5 XXL model endpoint. For deployment instructions, refer to Zero-shot prompting for the Flan-T5 foundation model in Amazon SageMaker JumpStart. Based on your use case, you can also deploy other instruction-tuned models like Flan T5 UL2 or BloomZ 7B1. For details, see the example notebook.

To use the SageMaker LLM endpoint with LangChain, we use langchain.llms.sagemaker_endpoint.SagemakerEndpoint, which abstracts the SageMaker LLM endpoint. We need to perform a transformation for the request and response payload as shown in the following code for the LangChain SageMaker integration. Note that you may need to adjust the code in ContentHandler based on the content_type and accepts format of the LLM model that you choose to use.

from langchain.llms.sagemaker_endpoint import SagemakerEndpoint

class ContentHandler(ContentHandlerBase):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs={}) -> bytes:
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json["generated_texts"][0]

content_handler = ContentHandler()

sm_llm = SagemakerEndpoint(
    endpoint_name=_MODEL_CONFIG_["huggingface-text2text-flan-t5-xxl"]["endpoint_name"],
    region_name=aws_region,
    model_kwargs=parameters,
    content_handler=content_handler,
)

Create the embedding model

Next, we need to get our embedded model ready. We deploy the GPT-J 6B model as the embedding model. If you’re using a JumpStart embedding model, you need to customize the LangChain SageMaker endpoint embedding class and transform the model request and response to integrate with LangChain. For a detailed implementation, refer to the GitHub repo.

embeddings = SagemakerEndpointEmbeddingsJumpStart(
    endpoint_name=_MODEL_CONFIG_["huggingface-textembedding-gpt-j-6b"]["endpoint_name"],
    region_name=aws_region,
    content_handler=content_handler,
)

Load domain-specific documents using the LangChain document loader and create an index

We use the CSVLoader package in LangChain to load CSV-formatted documents into the document loader:

loader = CSVLoader(file_path="rag_data/processed_data.csv")
documents = loader.load()

Next, we use TextSplitter to preprocess data for embedding purposes and use the SageMaker embedding model GPT-J -6B to create the embedding. We store embedding in a FAISS vector store to create an index. We use this index to find relevant documents that are semantically similar to the user’s query.

The following code shows how all these steps are done by the VectorstoreIndexCreator class in just few lines of code in LangChain to create a concise implementation of question answering with RAG:

index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter=CharacterTextSplitter(chunk_size=300, chunk_overlap=0),
)
index = index_creator.from_loaders([loader])

Use the index to search for relevant context and pass it to the LLM model

Next, use the query method on the created index and pass the user’s question and SageMaker endpoint LLM. LangChain selects the top four closest documents (K=4) and passes the relevant context extracted from the documents to generate an accurate response. See the following code:

index.query(question=question, llm=sm_llm)

We get the following response for the query using the RAG-based approach with Flan T5 XXL:

"""For model: huggingface-text2text-flan-t5-xxl, the generated output is: 
Managed Spot Training can be used with all instances supported in Amazon SageMaker
"""

The response looks more accurate compared to the response we got with other approaches that we demonstrated earlier that have no context or static context that may not be always relevant.

Alternate approach to implement RAG with more customization using SageMaker and LangChain

In this section, we show you another approach to implement RAG using SageMaker and LangChain. This approach offers the flexibility to configure top K parameters for a relevancy search in the documents. It also allows you to use the LangChain feature of prompt templates, which allow you to easily parameterize the prompt creation instead of hard coding the prompts.

In the following code, we explicitly use FAISS to generate embedding for each of the document in the knowledge library with the SageMaker GPT-J-6B embedding model. Then we identify the top K (K=3) most relevant documents based on the user query.

docsearch = FAISS.from_documents(documents, embeddings)
docs = docsearch.similarity_search(question, k=3)

Next, we use a prompt template and chain it with the SageMaker LLM:

prompt_template = """Answer based on context:nn{context}nn{question}"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
chain = load_qa_chain(llm=sm_llm, prompt=PROMPT)

We send the top three (K=3) relevant documents we found as context to the prompt by using a LangChain chain:

result = chain({"input_documents": docs, "question": question}, return_only_outputs=True)["output_text"]

With this approach of RAG implementation, we were able to take advantage of the additional flexibility of LangChain prompt templates and customize the number of documents searched for a relevancy match using the top K hyperparameter.

JumpStart RAG-based implementation notebook with SageMaker KNN

In this section, we implement the RAG-based approach using the KNN algorithm for finding relevant documents to create enhanced context. In this approach, we’re not using LangChain, but we use same dataset Amazon SageMaker FAQs as knowledge documents, embedding the models GPT-J-6B and LLM Flan T5 XXL just as we did in the previous LangChain approach.

If you have a large dataset, the SageMaker KNN algorithm may provide you with an effective semantic search. The SageMaker KNN algorithm also uses FAISS as the underlying search algorithm. The notebook for this solution can be found on GitHub.

First, we deploy the LLM Flan T5 XXL and GPT-J 6B embedding models in the same way as in the previous section. For each record in the knowledge database, we generate an embedding vector using the GPT-J embedding model.

Next, we use a SageMaker KNN training job to index the embedding of the knowledge data. The underlying algorithm used to index the data is FAISS. We want to find the top five most relevant documents, so we set the TOP_K variable to 5. We create the estimator for the KNN algorithm, run the training job, and deploy the KNN model to find indexes of the top five documents matching the query. See the following code:

from sagemaker.amazon.amazon_estimator import get_image_uri

def trained_estimator_from_hyperparams(s3_train_data, hyperparams, output_path):
    """
    Create an Estimator from the given hyperparams, fit to training data,
    and return a deployed predictor

    """
    # set up the estimator
    knn = sagemaker.estimator.Estimator(
        get_image_uri(boto3.Session().region_name, "knn"),
        aws_role,
        instance_count=1,
        instance_type="ml.m5.2xlarge",
        output_path=output_path,
        sagemaker_session=sess,
    )
    knn.set_hyperparameters(**hyperparams)

    # train a model. fit_input contains the locations of the train data
    fit_input = {"train": s3_train_data}
    knn.fit(fit_input)
    return knn

hyperparams = {"feature_dim": train_features.shape[1], "k": TOP_K,"sample_size": train_features.shape[0], "predictor_type": "classifier"}
output_path = f"s3://{bucket}/{prefix}/default_example/output"
knn_estimator = trained_estimator_from_hyperparams(
    s3_train_data, hyperparams, output_path)

Next, we create an embedding representation of the query using the GPT-J-6B embedding model that we used for creating an embedding of the knowledge library documents:

query_response = query_endpoint_with_json_payload(question, endpoint_name_embed, content_type="application/x-text")
question_embedding = parse_response_text_embed(query_response)

Then we use the KNN endpoint and pass the embedding of the query to the KNN endpoint to get the indexes of the top K most relevant documents. We use the indexes to retrieve the corresponded textual documents. Next, we concatenate the documents, ensuring the maximum allowed length of context is not exceeded. See the following code:

"""With maximum sequence length 500, selected top 4 document sections: 
  Managed Spot Training can be used with all instances supported in Amazon SageMaker.
  Managed Spot Training is supported in all AWS Regions where Amazon SageMaker is currently available.
  The difference between Savings Plans for Amazon SageMaker and Savings Plans for EC2 is in the services they 
  include. 
  SageMaker Savings Plans apply only to SageMaker ML Instance usage.
  There are no fixed limits to the size of the dataset you can use for training models with Amazon SageMaker.
"""

Now we come to our final step in which we combine the query, prompt, and the context containing text from relevant documents and pass it to the text generation LLM Flan T5 XXL model to generate the answer.

We get the following response for the query using a RAG-based approach with Flan T5 XXL:

"""
For model: huggingface-text2text-flan-t5-xxl, the generated output is: 

Managed Spot Training can be used with all instances supported in Amazon SageMaker
"""

Clean up

Make sure to delete the endpoints that we created in this notebook when not using them to avoid reoccurring cost.

Conclusion

In this post, we demonstrated the implementation of a RAG-based approach with LLMs for question answering tasks using two approaches: LangChain and the built-in KNN algorithm. The RAG-based approach optimizes the accuracy of the text generation using Flan T5 XXL by dynamically providing relevant context that was created by searching a list of documents.

You can use this these notebooks in SageMaker as is or you may customize them to your needs. To customize, you can use your own set of documents in the knowledge library, use other relevancy search implementations like OpenSearch, and use other embedding models and text generation LLMs available on JumpStart.

We look forward to seeing what you build on JumpStart using a RAG-based approach!

About the authors

Dr. Xin Huang is a Senior Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A.

Rachna Chadha is a Principal Solution Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that ethical and responsible use of AI can improve society in future and bring economical and social prosperity. In her spare time, Rachna likes spending time with her family, hiking and listening to music.

Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker built-in algorithms team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.

Hemant Singh is a Machine Learning Engineer with experience in Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He got his masters from Courant Institute of Mathematical Sciences and B.Tech from IIT Delhi. He had experience in working on a diverse range of Machine Learning problems within the domain of natural language processing, computer vision, and time-series analysis.

Manas Dadarkar is a Software Development Manager owning the engineering of the Amazon Forecast service. He is passionate about the applications of machine learning and making ML technologies easily available for everyone to adopt and deploy to production. Outside of work, he has multiple interests including travelling, reading and spending time with friends and family.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Considerations for Distribution Shift Robustness in Health

*=Equal Contributors
This paper was accepted at the workshop “Trustworthy Machine Learning for Healthcare Workshop” at the conference ICLR 2023.
When analyzing robustness of predictive models under distribution shift, many works focus on tackling generalization in the presence of spurious correlations. In this case, one typically makes use of covariates or environment indicators to enforce independencies in learned models to guarantee generalization under various distribution shifts. In this work, we analyze a class of distribution shifts, where such independencies are not desirable, as…Apple Machine Learning Research

AutoFocusFormer: Image Segmentation off the Grid

Real world images often have highly imbalanced content density. Some areas are very uniform, e.g., large patches of blue sky, while other areas are scattered with many small objects. Yet, the commonly used successive grid downsampling strategy in convolutional deep networks treats all areas equally. Hence, small objects are represented in very few spatial locations, leading to worse results in tasks such as segmentation. Intuitively, retaining more pixels representing small objects during downsampling helps to preserve important information. To achieve this, we propose AutoFocusFormer (AFF), a…Apple Machine Learning Research

Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations

This work investigates pre-trained audio representations for few shot Sound Event Detection. We specifically address the task of few shot detection of novel acoustic sequences, or sound events with semantically meaningful temporal structure, without assuming access to non-target audio. We develop procedures for pre-training suitable representations, and methods which transfer them to our few shot learning scenario. Our experiments evaluate the general purpose utility of our pre-trained representations on AudioSet, and the utility of proposed few shot methods via tasks constructed from…Apple Machine Learning Research

Amazon-sponsored workshop advances deep learning for code

ICLR workshop sponsored by Amazon CodeWhisperer features Amazon papers on a novel contrastive-learning framework for causal language models and a way to gauge the robustness of code generation models.Read More

Renders and Dragons Rule Creative Kingdoms This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

Content creator Grant Abbitt embodies selflessness, one of the best qualities that a creative can possess. Passionate about giving back to the creative community, Abbitt offers inspiration, guidance and free education for others in his field through YouTube tutorials.

He designed Dragon, the 3D scene featured this week In the NVIDIA Studio, specifically to help new Blender users easily understand the steps in the creative process of using the software.

“Dragons can be extremely tough to make,” said Abbitt. While he could have spent more time refining the details, he said, “That wasn’t the point of the project. It’s all about the learning journey for the student.”

Abbitt understands the importance of early education. Providing actionable, straightforward instructions enables prospective 3D modelers to make gradual progress, he said. When encouraged, 3D artists keep morale high while gaining confidence and learning more advanced skills, Abbitt has noticed over his 30+ years of industry experience.

His own early days of learning 3D workflows presented unique obstacles, like software programs costing as much as the hardware, or super-slow internet, which required Abbitt to learn 3D through instructional VHS tapes.

Learning 3D modeling and animation on VHS tapes.

Undeterred by such challenges, Abbitt earned a media studies degree and populated films with his own 3D content.

Now a full-time 3D artist and content creator, Abbitt does what he loves while helping aspiring content creators realize their creative ambitions. In this tutorial, for example, Abbitt teaches viewers how to create a video game character in just 20 minutes.

Dragon Wheel

Abbitt described a different dynasty in this realm — how he created his Dragon piece.

“Reference images are a must,” stressed Abbitt. “Deviation from the intended vision is part of the creative process, but without a direction or foundation, things can quickly go off track.” This is especially important with freelance work and creative briefs provided by clients, he added.

Abbitt looked to Pinterest and ArtStation for creative inspiration and reference material, and sketched in the Krita app on his tablet. The remainder of the project was completed in Blender — the popular 3D creation suite — which is free and open source.

Reference imagery set a solid foundation for the project.

He began with the initial blockout, a 3D rough-draft level built using simple 3D shapes without details or polished art assets. The goal of the blockout was to prototype, test and adjust the foundational shapes of the dragon. Abbitt then combined block shapes into a single mesh model, the structural build of a 3D model, consisting of polygons.

More sculpting was followed by retopologizing the mesh, the process of simplifying the topology of a mesh to make it cleaner and easier to work with. This is a necessary step for images that will undergo more advanced editing and distortions.

Adding Blender’s multiresolution modifier enabled Abbitt to subdivide a mesh, especially useful for re-projecting details from another sculpt with a Shrinkwrap modifier, which allows an object to “shrink” to the surface of another object. It can be applied to meshes, lattices, curves, surfaces and texts.

At this stage, the power of Abbitt’s GeForce RTX 4090 GPU really started to shine. He sculpted fine details faster with Blender Cycles RTX-accelerated OptiX ray tracing in the viewport for fluid, interactive modeling with photorealistic detail. Baking and applying textures were done with buttery smooth ease.

Astonishing details for a single 3D model.

The RTX 4090 GPU also accelerated the animation phase, where the artist rigged and posed his model. “Modern content creators require GPU technology to see their creative visions fully realized at an efficient pace,” Abbitt said.

For the texturing, painting and rendering process, Abbitt said he found it “extremely useful to be able to see the finished results without a huge render time, thanks to NVIDIA OptiX.”

Rendering final files in popular 3D creative apps — like Blender, Autodesk Maya with Autodesk Arnold, OTOY’s OctaneRender and Maxon’s Redshift — is made 70-200% faster with an RTX 4090 GPU, compared to previous-generation cards. This results in invaluable time saved for a freelancer with a deadline or a student working on a group project.

Abbitt’s RTX GPU enabled OptiX ray tracing in Blender Cycles for the fastest final frame render.

“NVIDIA GeForce RTX graphics cards are really the only choice at the moment for Blender users, because they offer so much more speed during render times,” said Abbitt. “You can quickly see results and make the necessary changes.”

Check out Abbitt’s YouTube channel with livestreams every Friday at 9 a.m. PT.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.

AI With a Personal Touch: Customized Text-to-Image Models

Serving in 3D: Advances in Inverse Rendering and Character Creation

Not a Hair Out of Place: Neural Physics Enables Realistic Simulations

Neural Rendering Brings Film-Quality Detail to Real-Time Graphics

More Generative AI and Graphics Research

AI Frontiers: AI for health and the future of research with Peter Lee

The key challenge and our solution

Programming puzzle examples

Can computers generate valuable, novel challenges?

Risks and limitations

Examples of programming puzzles for AI self-play

Example 1: Towers of Hanoi

Example 2: String challenge

Example 3: Integer factorization

Solution overview

Prerequisites

Use Autopilot and Canvas

Use JumpStart and Canvas

Use SageMaker model registry and Canvas

Manage shared models

Import a shared model and make predictions with Canvas

Considerations

Clean up

Conclusion

About the authors

LLMS and constraints

Retrieval Augmented Generation

JumpStart RAG-based implementation notebook with LangChain

Create LLM Model

Create the embedding model

Load domain-specific documents using the LangChain document loader and create an index

Use the index to search for relevant context and pass it to the LLM model

Alternate approach to implement RAG with more customization using SageMaker and LangChain

JumpStart RAG-based implementation notebook with SageMaker KNN

Clean up

Conclusion

About the authors

Dragon Wheel

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.