Putting the AI in Retail: Survey Reveals Latest Trends Driving Technological Advancements in the Industry

The retail industry is in the midst of a major technology transformation, fueled by the rise in AI.

With the highest potential for AI and analytics among all industries, the retail and consumer packaged goods (CPG) sectors are poised to harness the power of AI to enhance operational efficiency, elevate customer and employee experiences and drive growth. As such, it’s crucial to stay ahead of the curve by anticipating the trends set to change the retail game.

NVIDIA’s first annual “State of AI in Retail and CPG” survey — conducted among industry professionals — provides insights into the state of AI adoption in retail, its impact on revenue and costs and the emerging trends shaping the future of the industry.

With more than 400 respondents globally, including C-suite leaders and other executives, general managers and individual contributors, the survey consisted of questions covering a range of AI topics, top use cases, biggest challenges, infrastructure investment initiatives and deployment models.

Improving Operational Efficiencies Is a Top Priority

To stay ahead in a highly dynamic market, retailers are actively considering how AI can help them meet evolving customer preferences, address labor shortages and drive sustainability efforts. AI has already proved to be a game-changer for retailers, with 69% reporting an increase in annual revenue attributed to AI adoption. Additionally, 72% of retailers using AI experienced a decrease in operating costs.

AI is enhancing operational efficiency, elevating customer experiences and driving growth. The top five current AI use cases were as follows:

  1. Personalized customer recommendations
  2. Store analytics and insights
  3. Loss prevention and asset protection
  4. Augmented reality experiences
  5. Automated marketing content generation

While retailers are actively implementing AI, there are still areas they plan to explore:

  • Further investing in AI infrastructure to overcome challenges related to inadequate technology and a lack of AI talent
  • Exploring the potential of the metaverse for consumer engagement and operational efficiency
  • Leveraging AI in brick-and-mortar stores to provide convenience and personalized customer experiences
  • Transforming customer experiences using generative AI
  • Ensuring data privacy and protection in generative AI adoption

Generative AI Is Changing Customer Experiences

Generative AI prominently emerged in several of the top AI use cases for retail. These use cases included multimodal shopping advisors for personalized product recommendations; adaptive advertising, promotions and pricing; product tagging and cataloging; identification of similar and complementary products; and deployment of brand avatars for automated customer service.

Retailers recognized the transformative potential of generative AI, with 86% expressing a desire to use it to enhance customer experiences. Respondents acknowledged that incorporating AI into business practices and solutions could revolutionize customer engagement, optimize marketing strategies and streamline operational processes.

Staying Ahead With an Omnichannel Approach

To stay competitive, the survey indicated the importance of an omnichannel approach that integrates numerous online and offline channels to provide consumers with a consistent experience.

The results showed that ecommerce was the most used channel, with 79% of retailers actively participating. Mobile applications also gained traction, with over half of retailers using them to bridge the gap between digital and physical shopping experiences.

Despite the rise in digital shopping, 30% of respondents say physical stores have the biggest revenue growth opportunity (ranked second behind ecommerce) and remain the channel with the most AI use cases for retailers. Given the emphasis on intelligent stores and their central role in the omnichannel experience, use cases such as store analytics and loss prevention will continue to be critical investments.

Investing in AI Infrastructure

While AI adoption is still in its early stages, retailers are committed to increasing their AI infrastructure investments. Over 60% of respondents plan to boost their AI investments in the next 18 months. This commitment reflects the industry’s recognition of the technology’s potential to enhance operational efficiency, reduce costs, elevate customer experiences and drive growth.

Download the “State of AI in Retail and CPG: 2024 Trends” report for in-depth results and insights.

Explore NVIDIA’s AI solutions and enterprise-level AI platforms for retail at www.nvidia.com/retail.

Read More

Deploy a Slack gateway for Amazon Q, your business expert

Amazon Q is a new generative AI-powered application that helps users get work done. Amazon Q can become your tailored business expert and let you discover content, brainstorm ideas, or create summaries using your company’s data safely and securely. You can use Amazon Q to have conversations, solve problems, generate content, gain insights, and take action by connecting to your company’s information repositories, code, data, and enterprise systems. For more information, see Introducing Amazon Q, a new generative AI-powered assistant (preview).

In this post, we show you how to bring Amazon Q, your business expert, to users in Slack.

You’ll be able to converse with Amazon Q using Slack direct messages (DMs) to ask questions and get answers based on company data, get help creating new content such as email drafts, summarize attached files, and perform tasks.

You can also invite Amazon Q to participate in your team channels. In a channel, users can ask it questions in a new message, or tag it in an existing thread at any point, to provide additional data points, resolve a debate, or summarize the conversation and capture the next steps.

Solution overview

Amazon Q is amazingly powerful. Check out the following demo—seeing is believing!

In the demo, our Amazon Q application is populated with a set of AWS whitepapers. You can populate your own Amazon Q business expert application with your own company’s documents and knowledge base articles, so it will be able to answer your questions!

Everything you need is provided as open source in our GitHub repo.

In this post, we walk you through the process to deploy Amazon Q in your AWS account and add it to your Slack workspace. When you’re done, you’ll wonder how you ever managed without it!

The following are some of the things it can do:

  • Respond to messages – In DMs, it responds to all messages. In channels, it responds only to @mentions and responds in a conversation thread.
  • Render answers containing markdown – This includes headings, lists, bold, italics, tables, and more.
  • Track sentiment – It provides thumbs up and thumbs down buttons to track user sentiment.
  • Provide source attribution – It provides references and hyperlinks to sources used by Amazon Q.
  • Understand conversation context – It tracks the conversation and responds based on the context.
  • Stay aware of multiple users – When it’s tagged in a thread, it knows who said what, and when, so it can contribute in context and accurately summarize the thread when asked.
  • Process attached files – It can process up to five attached files for document question answering, summaries, and more.
  • Start new conversations – You can reset and start new conversations in DM channels by using /new_conversation.
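
The deployed gateway (provided in the GitHub repo) implements all of these features for you, but if you are curious how the pieces fit together, the following is a minimal, illustrative sketch of the core flow: receive a Slack message event, forward the text to Amazon Q, and post the answer back into the thread. It assumes the Slack Bolt for Python framework and the boto3 qbusiness chat_sync API, and it omits the signature verification, markdown rendering, attachment handling, and conversation caching that the real solution provides.

import os
import boto3
from slack_bolt import App

# Slack credentials come from the same secrets you store in AWS Secrets Manager
app = App(
    token=os.environ["SLACK_BOT_TOKEN"],
    signing_secret=os.environ["SLACK_SIGNING_SECRET"],
)

q_client = boto3.client("qbusiness")
Q_APP_ID = os.environ["AMAZON_Q_APP_ID"]


@app.event("message")
def handle_dm(event, say):
    """Forward a Slack DM to Amazon Q and reply in the same thread."""
    user_text = event.get("text", "")
    response = q_client.chat_sync(          # assumption: qbusiness ChatSync API
        applicationId=Q_APP_ID,
        userId=event["user"],               # or map the Slack user email to a Q user ID
        userMessage=user_text,
    )
    say(text=response["systemMessage"], thread_ts=event.get("ts"))


if __name__ == "__main__":
    app.start(port=3000)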

Slack example

In the following sections, we show how to deploy the project to your own AWS account and Slack workspace, and start experimenting!

Prerequisites

You need to have an AWS account and an AWS Identity and Access Management (IAM) role and user with permissions to create and manage the necessary resources and components for this application. If you don’t have an AWS account, see How do I create and activate a new Amazon Web Services account?

You also need to have an existing, working Amazon Q business expert application. If you haven’t set one up yet, see Creating an Amazon Q application.

Lastly, you need a Slack account and access to create and publish apps to your Slack organization. If you don’t have one, see if your company can create a Slack sandbox organization for you to experiment, or go to slack.com to create a free Slack account and workspace.

Deploy the solution resources

We’ve provided pre-built AWS CloudFormation templates that deploy everything you need in your AWS account.

If you’re a developer and you want to build, deploy, or publish the solution from code, refer to the Developer README.

Complete the following steps to launch the CloudFormation stack:

  1. Log in to the AWS Management Console.
  2. Choose one of the following Launch Stack buttons for your desired AWS Region to open the AWS CloudFormation console and create a new stack.
Region Launch Stack
N. Virginia (us-east-1)
Oregon (us-west-2)
  3. For Stack name, enter a name for your app (for example, AMAZON-Q-SLACK-GATEWAY).
  4. For AmazonQAppId, enter your existing Amazon Q application ID (for example, 80xxxxx9-7xx3-4xx0-bxx4-5baxxxxx2af5). You can copy it from the Amazon Q console.
  5. For AmazonQRegion, choose the Region where you created your Amazon Q application (us-east-1 or us-west-2).
  6. For AmazonQUserId, enter an Amazon Q user ID email address (leave blank to use a Slack user email as the user ID).
  7. For ContextDaysToLive, enter the length of time to keep conversation metadata cached in Amazon DynamoDB (you can leave this as the default).

When your CloudFormation stack status is CREATE_COMPLETE, choose the Outputs tab, and keep it open—you’ll need it in later steps.
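
If you prefer to grab the outputs programmatically rather than keeping the console tab open, a quick way (shown here with boto3, which the stack itself does not require) is to describe the stack and print its outputs:

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")  # use the Region you deployed to
stack = cfn.describe_stacks(StackName="AMAZON-Q-SLACK-GATEWAY")["Stacks"][0]

# Prints every output, including SlackAppManifest and SlackSecretConsoleUrl
for output in stack.get("Outputs", []):
    print(f"{output['OutputKey']}: {output['OutputValue']}")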

Create your app

Now you can create your app in Slack. Complete the following steps:

  1. Create a Slack app in https://api.slack.com/apps from the generated manifest—copy and paste from the stack output: SlackAppManifest.
  2. Choose App Home in the navigation pane and scroll down to the section Show Tabs.
  3. Enable Messages Tab.
  4. Select Allow users to send Slash commands and messages from the messages tab.

This step is required to enable users to send messages to your app.

Slack enable messages

Add your app in your workspace

Now you can add your app in your workspace. This is required to generate the bot user OAuth token value that is needed in the next step.

  1. Go to OAuth & Permissions (in https://api.slack.com) and choose Install to Workspace to generate the OAuth token.
  2. In Slack, go to your workspace.
  3. Choose your workspace name, Settings & administration, and Manage apps.
  4. Choose your newly created app.
  5. In the right pane, choose Open in App Directory.
  6. Choose Open in Slack.

Configure Slack secrets in AWS Secrets Manager

Let’s configure your Slack secrets in order to verify the signature of each request and post on behalf of your Amazon Q bot.

In this example, we are not enabling Slack token rotation. You can enable it for a production app by implementing rotation via AWS Secrets Manager. Create an issue (or, better yet, a pull request) in the GitHub repo if you want this feature added to a future version.

Complete the following steps to configure a secret in Secrets Manager:

  1. On the AWS CloudFormation console, navigate to your stack Outputs tab and choose the link for SlackSecretConsoleUrl to be redirected to the Secrets Manager console.
  2. Choose Retrieve secret value.
  3. Choose Edit.
  4. Replace the values of SlackSigningSecret and SlackBotUserOAuthToken using the values in the Slack application configuration under Basic Information and OAuth & Permissions.

Be careful you don’t accidentally copy Client Secret instead of Signing Secret.
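
If you would rather update the secret from code than from the console, the same edit can be made with the Secrets Manager API. The sketch below uses a placeholder secret name; in practice, use the secret that the SlackSecretConsoleUrl output points to, and keep the two JSON keys exactly as shown so the gateway can find them.

import json
import boto3

secrets = boto3.client("secretsmanager")

secrets.put_secret_value(
    SecretId="your-slack-gateway-secret-name",  # placeholder: use the secret from the stack outputs
    SecretString=json.dumps(
        {
            "SlackSigningSecret": "<value from Slack Basic Information>",
            "SlackBotUserOAuthToken": "<value from Slack OAuth & Permissions>",
        }
    ),
)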

Edit secrets

Start using Amazon Q

Complete the following steps to start using Amazon Q in Slack:

  1. Open your Slack workspace.
  2. Under Apps, Manage, add your new Amazon Q app.
  3. Optionally, add your Amazon Q app to team channels.
  4. In the app DM channel, enter Hello.

Say hello

You have now deployed a powerful new AI assistant into your sandbox Slack environment.

Play with it, try all the features discussed in this post, and copy the things you saw in the demo video. Most importantly, you can ask about topics related to the documents that you have ingested into your own Amazon Q business expert application. But don’t stop there. You can find additional ways to make it useful, and when you do, let us know by posting a comment.

Once you are convinced how useful it is, talk to your Slack admins (and show them this post) and work with them to deploy it in your company’s Slack workspaces. Your fellow employees will thank you!

Clean up

When you’re finished experimenting with this solution, delete your app in Slack (https://api.slack.com/apps) and clean up your AWS resources by opening the AWS CloudFormation console and deleting the AMAZON-Q-SLACK-GATEWAY stack that you deployed. This deletes the resources that you created by deploying the solution.

Conclusions

The sample Amazon Q Slack application discussed in this post is provided as open source—you can use it as a starting point for your own solution, and help us make it better by contributing back fixes and features via GitHub pull requests. Explore the code, choose Watch in the GitHub repo to be notified of new releases, and check back for the latest updates. We’d also love to hear your suggestions for improvements and features.

For more information on Amazon Q, refer to What is Amazon Q (For Business Use)?


About the Authors

Gary Benattar is a Senior Software Development Manager in AWS HR. Gary started at Amazon in 2012 as an intern, focusing on building scalable, real-time outlier detection systems. He worked in Seattle and Luxembourg and is now based in Tel Aviv, Israel, where he dedicates his time to building software to revolutionize the future of Human Resources. He co-founded a startup, Zengo, with a focus on making digital wallets secure through multi-party computation. He received his MSc in Software Engineering from Sorbonne University in Paris.


Bob Strahan

Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.

Read More

NVIDIA and Loss Prevention Retail Council Introduce AI Solution to Address Organized Retail Crime

NVIDIA and the Loss Prevention Research Council (LPRC) are collaborating with several AI companies to showcase a real-time solution for combating and preventing organized retail crime (ORC).

The integrated offering provides advance notifications of suspicious behavior inside and outside stores so that authorities can intervene early.

The LPRC includes asset-protection executives from more than 85 major retail chains, with hundreds of thousands of stores worldwide, as well as law enforcement, consumer packaged goods companies and technology solutions partners. It’s focused on collaborating with the retail industry to reduce shrink — the loss of products for reasons other than sales — and increase safety and security at stores and shopping malls.

Flash mobs and smash-and-grab thefts are a growing concern, costing retailers billions of dollars in lost revenue and causing safety concerns among customers and employees. Crime syndicates have committed brazen, large-scale thefts, often selling stolen merchandise on the black market.

A National Retail Federation survey found that shrink accounted for $112 billion in losses in 2022, with an estimated two-thirds due to theft.

Increasingly, this involves violence. According to the survey, 67% of respondents said they were seeing more violence and aggression associated with organized-crime theft than a year ago.

The AI-based solution, which helps retailers get a jump on often-evasive, fast-moving organized crime groups, uses technology from several leading AI firms that have built their high-performance AI applications on the NVIDIA Metropolis application framework and microservices.

The solution includes product recognition and tracking, as well as anomaly detection, from AiFi, vehicle license plate and model recognition from BriefCam, and physical security management from SureView to provide advance and real-time notifications to retailer command centers.

The three are among over 500 software companies and startups that have developed retail, safety and security AI applications on NVIDIA Metropolis software development kits for vision AI — and that have been certified as NVIDIA Metropolis partners.

“The proposed AI-based ORC solution combines LPRC’s deep expertise in loss prevention from over 23 years of collaboration with asset protection executives with NVIDIA’s deep AI expertise,” said Read Hayes, who leads the LPRC and is a University of Florida research scientist and criminologist. “We believe this type of cross-industry collaboration will help retailers fight back against organized retail crime.”

Developing Integrated AI for Securing Stores 

AiFi, based in Silicon Valley, develops computer vision solutions, including autonomous retail capabilities built on the NVIDIA Metropolis application framework. Its solution detects anomalies in shopper behavior, tracks items removed from shelves and notifies retailers if shoppers bypass checkout lanes.

BriefCam, based in Newton, Mass., provides deep learning-based video analytics technology for insightful decision-making. Enabling the forensic search, alerting on and visualization of objects in video, the BriefCam Platform includes integrated license plate recognition and cross-camera object tracking, alongside other capabilities that support effective asset protection and real-time response to theft attempts.

SureView, based in Tampa, Fla., offers a software platform for managing multiple security systems with a single view. The company’s physical security management system receives signals from the AiFi and BriefCam applications, helping teams coordinate a quick and consistent response and providing notifications to store security operations and law enforcement based on the retailer’s business rules.

For more information about AI solutions for mitigating organized retail crime, connect with the NVIDIA team at NRF: Retail’s Big Show, the world’s largest retail expo, taking place Jan. 14-16 at the Javits Convention Center in New York.

Attend the Big Ideas session on Organized Retail Crime on Jan. 14 at 2 p.m. ET, moderated by the LPRC, to discover how Kroger and Jacksons Food are using AI in their stores to tackle crime.

The ORC solution will be showcased at NRF — visit NVIDIA experts in Lenovo’s booth (3665) and Dell’s booth (4957) to learn more about it from NVIDIA’s software partners.

Read More

Accelerate AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe, saving up to 75% on inference costs

Multi-model endpoints (MMEs) are a powerful feature of Amazon SageMaker designed to simplify the deployment and operation of machine learning (ML) models. With MMEs, you can host multiple models on a single serving container and serve all the models behind a single endpoint. The SageMaker platform automatically manages the loading and unloading of models and scales resources based on traffic patterns, reducing the operational burden of managing a large quantity of models. This feature is particularly beneficial for deep learning and generative AI models that require accelerated compute. The cost savings achieved through resource sharing and simplified model management make SageMaker MMEs an excellent choice for hosting models at scale on AWS.

Recently, generative AI applications have captured widespread attention and imagination. Customers want to deploy generative AI models on GPUs but at the same time are conscious of costs. SageMaker MMEs support GPU instances and are a great option for these types of applications. Today, we are excited to announce TorchServe support for SageMaker MMEs. This new model server support gives you all the benefits of MMEs while still using the serving stack that TorchServe customers are most familiar with. In this post, we demonstrate how to host generative AI models, such as Stable Diffusion and Segment Anything Model, on SageMaker MMEs using TorchServe and build a language-guided editing solution that can help artists and content creators develop and iterate their artwork faster.

Solution overview

Language-guided editing is a common cross-industry generative AI use case. It can help artists and content creators work more efficiently to meet content demand by automating repetitive tasks, optimizing campaigns, and providing a hyper-personalized experience for the end customer. Businesses can benefit from increased content output, cost savings, improved personalization, and enhanced customer experience. In this post, we demonstrate how you can build language-assisted editing features using MME TorchServe that allow you to erase any unwanted object from an image and modify or replace any object in an image by supplying a text instruction.

The user experience flow for each use case is as follows:

  • To remove an unwanted object, select the object in the image to highlight it. This action sends the pixel coordinates and the original image to a generative AI model, which generates a segmentation mask for the object. After confirming the correct object selection, you can send the original and mask images to a second model for removal. The detailed illustration of this user flow is demonstrated below.

Step 1: Select an object (“dog”) from the image (a dog on a bench, with the mouse pointer clicking the dog)
Step 2: Confirm the correct object is highlighted (the dog on the bench highlighted)
Step 3: Erase the object from the image (the bench without the dog)
  • To modify or replace an object, select and highlight the desired object, following the same process as described above. Once you confirm the correct object selection, you can modify the object by supplying the original image, the mask, and a text prompt. The model will then change the highlighted object based on the provided instructions. A detailed illustration of this second user flow is as follows.

Step 1: Select an object (“vase”) from the image (a vase with a cactus, with the mouse pointer)
Step 2: Confirm the correct object is highlighted (the vase highlighted)
Step 3: Provide a text prompt (“futuristic vase”) to modify the object (a rounded vase with a cactus)

To power this solution, we use three generative AI models: Segment Anything Model (SAM), Large Mask Inpainting Model (LaMa), and Stable Diffusion Inpaint (SD). Here is how these models are utilized in the user experience workflow:

Flow diagrams: one for removing an unwanted object and one for modifying or replacing an object

  1. Segment Anything Model (SAM) is used to generate a segment mask of the object of interest. Developed by Meta Research, SAM is an open-source model that can segment any object in an image. This model has been trained on a massive dataset known as SA-1B, which comprises over 11 million images and 1.1 billion segmentation masks. For more information on SAM, refer to their website and research paper.
  2. LaMa is used to remove any undesired objects from an image. LaMa is a Generative Adversarial Network (GAN) model that specializes in filling in missing parts of images using irregular masks. The model architecture incorporates image-wide global context and a single-step architecture that uses Fourier convolutions, enabling it to achieve state-of-the-art results at a faster speed. For more details on LaMa, visit their website and research paper.
  3. SD 2 inpaint model from Stability AI is used to modify or replace objects in an image. This model allows us to edit the object in the mask area by providing a text prompt. The inpaint model is based on the text-to-image SD model, which can create high-quality images with a simple text prompt. It provides additional arguments such as original and mask images, allowing for quick modification and restoration of existing content. To learn more about Stable Diffusion models on AWS, refer to Create high-quality images with Stable Diffusion models and deploy them cost-efficiently with Amazon SageMaker.

All three models are hosted on SageMaker MMEs, which reduces the operational burden of managing multiple endpoints. In addition, using MMEs eliminates concerns about certain models being underutilized because resources are shared. You can observe the benefit of improved instance saturation, which ultimately leads to cost savings. The following architecture diagram illustrates how all three models are served using SageMaker MMEs with TorchServe.

flow diagram

We have published the code to implement this solution architecture in our GitHub repository. To follow along with the rest of the post, use the notebook file. It is recommended to run this example on a SageMaker notebook instance using the conda_python3 (Python 3.10.10) kernel.

Extend the TorchServe container

The first step is to prepare the model hosting container. SageMaker provides a managed PyTorch Deep Learning Container (DLC) that you can retrieve using the following code snippet:

# Use SageMaker PyTorch DLC as base image
baseimage = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=region,
    py_version="py310",
    image_scope="inference",
    version="2.0.0",
    instance_type="ml.g5.2xlarge",
)
print(baseimage)

Because the models require resources and additional packages that are not on the base PyTorch DLC, you need to build a Docker image. This image is then uploaded to Amazon Elastic Container Registry (Amazon ECR) so it can be accessed directly from SageMaker. The custom installed libraries are listed in the Docker file:

ARG BASE_IMAGE

FROM $BASE_IMAGE

#Install any additional libraries
RUN pip install segment-anything-py==1.0
RUN pip install opencv-python-headless==4.7.0.68
RUN pip install matplotlib==3.6.3
RUN pip install diffusers
RUN pip install tqdm
RUN pip install easydict
RUN pip install scikit-image
RUN pip install xformers
RUN pip install tensorflow
RUN pip install joblib
RUN pip install matplotlib
RUN pip install albumentations==0.5.2
RUN pip install hydra-core==1.1.0
RUN pip install pytorch-lightning
RUN pip install tabulate
RUN pip install kornia==0.5.0
RUN pip install webdataset
RUN pip install omegaconf==2.1.2
RUN pip install transformers==4.28.1
RUN pip install accelerate
RUN pip install ftfy

Run the shell command file to build the custom image locally and push it to Amazon ECR:

%%capture build_output

reponame = "torchserve-mme-demo"
versiontag = "genai-0.1"

# Build our own docker image
!cd workspace/docker && ./build_and_push.sh {reponame} {versiontag} {baseimage} {region} {account}

Prepare the model artifacts

The main difference for the new MMEs with TorchServe support is how you prepare your model artifacts. The code repo provides a skeleton folder for each model (models folder) to house the required files for TorchServe. We follow the same four-step process to prepare each model .tar file. The following code is an example of the skeleton folder for the SD model:

workspace
|--sd
   |-- custom_handler.py
   |-- model-config.yaml

The first step is to download the pre-trained model checkpoints in the models folder:

import diffusers
import torch
import transformers

pipeline = diffusers.StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
)

sd_dir = "workspace/sd/model"
pipeline.save_pretrained(sd_dir)

The next step is to define a custom_handler.py file. This is required to define the behavior of the model when it receives a request, such as loading the model, preprocessing the input, and postprocessing the output. The handle method is the main entry point for requests, and it accepts a request object and returns a response object. It loads the pre-trained model checkpoints and applies the preprocess and postprocess methods to the input and output data. The following code snippet illustrates a simple structure of the custom_handler.py file. For more detail, refer to the TorchServe handler API.

def initialize(self, ctx: Context):
    # Load the pre-trained model checkpoints when the worker starts
    ...

def preprocess(self, data):
    # Decode and prepare the incoming request payload
    ...

def inference(self, data):
    # Run the model on the preprocessed input
    ...

def handle(self, data, context):
    requests = self.preprocess(data)
    responses = self.inference(requests)

    return responses

The last required file for TorchServe is model-config.yaml. The file defines the configuration of the model server, such as number of workers and batch size. The configuration is at a per-model level, and an example config file is shown in the following code. For a complete list of parameters, refer to the GitHub repo.

minWorkers: 1
maxWorkers: 1
batchSize: 1
maxBatchDelay: 200
responseTimeout: 300

The final step is to package all the model artifacts into a single .tar.gz file using the torch-model-archiver module:

!torch-model-archiver --model-name sd --version 1.0 --handler workspace/sd/custom_handler.py --extra-files workspace/sd/model --config-file workspace/sd/model-config.yaml --archive-format no-archive
!cd sd && tar cvzf sd.tar.gz .

Create the multi-model endpoint

The steps to create a SageMaker MME are the same as before. In this particular example, you spin up an endpoint using the SageMaker SDK. Start by defining an Amazon Simple Storage Service (Amazon S3) location and the hosting container. This S3 location is where SageMaker will dynamically load the models from, based on invocation patterns. The hosting container is the custom container you built and pushed to Amazon ECR in the earlier step. See the following code:

# This is where our MME will read models from on S3.
multi_model_s3uri = output_path

Then you define a MultiDataModel that captures attributes such as the model location, hosting container, and access permissions:

print(multi_model_s3uri)
model = Model(
    model_data=f"{multi_model_s3uri}/sam.tar.gz",
    image_uri=container,
    role=role,
    sagemaker_session=smsess,
    env={"TF_ENABLE_ONEDNN_OPTS": "0"},
)

mme = MultiDataModel(
    name="torchserve-mme-genai-" + datetime.now().strftime("%Y-%m-%d-%H-%M-%S"),
    model_data_prefix=multi_model_s3uri,
    model=model,
    sagemaker_session=smsess,
)
print(mme)

The deploy() function creates an endpoint configuration and hosts the endpoint:

mme.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
)

In the example we provided, we also show how you can list models and dynamically add new models using the SDK. The add_model() function copies your local model .tar files into the MME S3 location:

# Only sam.tar.gz visible!
list(mme.list_models())

models = ["sd/sd.tar.gz", "lama/lama.tar.gz"]
for model in models:
    mme.add_model(model_data_source=model)

Invoke the models

Now that we have all three models hosted on an MME, we can invoke each model in sequence to build our language-assisted editing features. To invoke each model, provide a target_model parameter in the predictor.predict() function. The model name is just the name of the model .tar file we uploaded. The following is an example code snippet for the SAM model that takes in a pixel coordinate, a point label, and dilate kernel size, and generates a segmentation mask of the object in the pixel location:

img_file = "workspace/test_data/sample1.png"
img_bytes = None

with Image.open(img_file) as f:
    img_bytes = encode_image(f)

gen_args = json.dumps(dict(point_coords=[750, 500], point_labels=1, dilate_kernel_size=15))

payload = json.dumps({"image": img_bytes, "gen_args": gen_args}).encode("utf-8")

response = predictor.predict(data=payload, target_model="/sam.tar.gz")
encoded_masks_string = json.loads(response.decode("utf-8"))["generated_image"]
base64_bytes_masks = base64.b64decode(encoded_masks_string)

with Image.open(io.BytesIO(base64_bytes_masks)) as f:
    generated_image_rgb = f.convert("RGB")
    generated_image_rgb.show()

To remove an unwanted object from an image, take the segmentation mask generated from SAM and feed that into the LaMa model with the original image. The following images show an example.

Sample image: a dog on a bench
Segmentation mask from SAM: a white mask of the dog on a black background
Erase the dog using LaMa: just the bench

To modify or replace any object in an image with a text prompt, take the segmentation mask from SAM and feed it into the SD model with the original image and text prompt, as shown in the following example.

Sample image: a dog on a bench
Segmentation mask from SAM: a white mask of the dog on a black background
Replace using the SD model with the text prompt “a hamster on a bench”: a hamster on a bench
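
For reference, invoking the SD inpainting model follows the same pattern as the SAM call shown earlier: encode the inputs, pick the target model, and decode the returned image. The exact payload keys and response fields are defined by the custom handlers in the GitHub repo; the mask and prompt field names below mirror the SAM example and are assumptions for illustration only.

import base64
import io
import json

from PIL import Image

# Reuse the original image bytes (img_bytes) and the SAM mask from the previous step
mask_bytes = encode_image(generated_image_rgb)  # assumption: the handler accepts a base64-encoded mask

# Assumed gen_args fields; check the SD custom handler in the repo for the exact schema
gen_args = json.dumps(dict(prompt="a hamster on a bench"))

payload = json.dumps(
    {"image": img_bytes, "mask_image": mask_bytes, "gen_args": gen_args}
).encode("utf-8")

response = predictor.predict(data=payload, target_model="/sd.tar.gz")
encoded_image_string = json.loads(response.decode("utf-8"))["generated_image"]

with Image.open(io.BytesIO(base64.b64decode(encoded_image_string))) as f:
    f.convert("RGB").show()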

Cost savings

The benefits of SageMaker MMEs increase based on the scale of model consolidation. The following table shows the GPU memory usage of the three models in this post. They are deployed on one g5.2xlarge instance by using one SageMaker MME.

Model                       GPU Memory (MiB)
Segment Anything Model      3,362
Stable Diffusion Inpaint    3,910
LaMa                        852

You can see cost savings when hosting the three models with one endpoint, and for use cases with hundreds or thousands of models, the savings are much greater.

For example, consider 100 Stable Diffusion models. Each model (requiring roughly 4 GiB of GPU memory) could be served by its own ml.g5.2xlarge endpoint, costing $1.52 per instance hour in the US East (N. Virginia) Region. Providing all 100 models through their own endpoints would cost $218,880 per month. With a SageMaker MME, a single endpoint using ml.g5.2xlarge instances can host four models simultaneously. This reduces production inference costs by 75% to only $54,720 per month. The following table summarizes the differences between single-model and multi-model endpoints for this example. Given an endpoint configuration with sufficient memory for your target models, the steady-state invocation latency after all models have been loaded will be similar to that of a single-model endpoint.
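
The arithmetic behind these figures is straightforward (assuming a 30-day month and the two instances per endpoint shown in the table that follows):

price_per_instance_hour = 1.52          # ml.g5.2xlarge, US East (N. Virginia)
instances_per_endpoint = 2
hours_per_month = 24 * 30

endpoint_cost = price_per_instance_hour * instances_per_endpoint * hours_per_month  # $2,188.80

single_model_total = endpoint_cost * 100   # 100 endpoints, one model each -> $218,880
mme_total = endpoint_cost * 25             # 25 endpoints, four models each -> $54,720

print(single_model_total, mme_total, 1 - mme_total / single_model_total)  # 218880.0 54720.0 0.75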

                                    Single-model endpoint    Multi-model endpoint
Total endpoint price per month      $218,880                 $54,720
Endpoint instance type              ml.g5.2xlarge            ml.g5.2xlarge
CPU memory capacity (GiB)           32                       32
GPU memory capacity (GiB)           24                       24
Endpoint price per hour             $1.52                    $1.52
Number of instances per endpoint    2                        2
Endpoints needed for 100 models     100                      25

Clean up

After you are done, please follow the instructions in the cleanup section of the notebook to delete the resources provisioned in this post to avoid unnecessary charges. Refer to Amazon SageMaker Pricing for details on the cost of the inference instances.

Conclusion

This post demonstrates the language-assisted editing capabilities made possible through the use of generative AI models hosted on SageMaker MMEs with TorchServe. The example we shared illustrates how we can use resource sharing and simplified model management with SageMaker MMEs while still utilizing TorchServe as our model serving stack. We utilized three deep learning foundation models: SAM, SD 2 Inpainting, and LaMa. These models enable us to build powerful capabilities, such as erasing any unwanted object from an image and modifying or replacing any object in an image by supplying a text instruction. These features can help artists and content creators work more efficiently and meet their content demands by automating repetitive tasks, optimizing campaigns, and providing a hyper-personalized experience. We invite you to explore the example provided in this post and build your own UI experience using TorchServe on a SageMaker MME.

To get started, see Supported algorithms, frameworks, and instances for multi-model endpoints using GPU backed instances.


About the authors

James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.
Li Ning

Li Ning is a senior software engineer at AWS with a specialization in building large-scale AI solutions. As a tech lead for TorchServe, a project jointly developed by AWS and Meta, her passion lies in leveraging PyTorch and AWS SageMaker to help customers embrace AI for the greater good. Outside of her professional endeavors, Li enjoys swimming, traveling, following the latest advancements in technology, and spending quality time with her family.

Ankith Gunapal is an AI Partner Engineer at Meta (PyTorch). He is passionate about model optimization and model serving, with experience ranging from RTL verification, embedded software, and computer vision to PyTorch. He holds a Master’s in Data Science and a Master’s in Telecommunications. Outside of work, Ankith is also an electronic dance music producer.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch and spending time with his family.

Subhash Talluri is a Lead AI/ML solutions architect of the Telecom Industry business unit at Amazon Web Services. He’s been leading development of innovative AI/ML solutions for Telecom customers and partners worldwide. He brings interdisciplinary expertise in engineering and computer science to help build scalable, secure, and compliant AI/ML solutions via cloud-optimized architectures on AWS.

Read More

Responsible AI at Google Research: User Experience Team

Google’s Responsible AI User Experience (Responsible AI UX) team is a product-minded team embedded within Google Research. This unique positioning requires us to apply responsible AI development practices to our user-centered user experience (UX) design process. In this post, we describe the importance of UX design and responsible AI in product development, and share a few examples of how our team’s capabilities and cross-functional collaborations have led to responsible development across Google.

First, the UX part. We are a multi-disciplinary team of product design experts: designers, engineers, researchers, and strategists who manage the user-centered UX design process from early-phase ideation and problem framing to later-phase user-interface (UI) design, prototyping and refinement. We believe that effective product development occurs when there is clear alignment between significant unmet user needs and a product’s primary value proposition, and that this alignment is reliably achieved via a thorough user-centered UX design process.

And second, recognizing generative AI’s (GenAI) potential to significantly impact society, we embrace our role as the primary user advocate as we continue to evolve our UX design process to meet the unique challenges AI poses, maximizing the benefits and minimizing the risks. As we navigate through each stage of an AI-powered product design process, we place a heightened emphasis on the ethical, societal, and long-term impact of our decisions. We contribute to the ongoing development of comprehensive safety and inclusivity protocols that define design and deployment guardrails around key issues like content curation, security, privacy, model capabilities, model access, equitability, and fairness that help mitigate GenAI risks.

Responsible AI UX is constantly evolving its user-centered product design process to meet the needs of a GenAI-powered product landscape with greater sensitivity to the needs of users and society and an emphasis on ethical, societal, and long-term impact.

Responsibility in product design is also reflected in the user and societal problems we choose to address and the programs we resource. Thus, we encourage the prioritization of user problems with significant scale and severity to help maximize the positive impact of GenAI technology.

Communication across teams and disciplines is essential to responsible product design. The seamless flow of information and insight from user research teams to product design and engineering teams, and vice versa, is essential to good product development. One of our team’s core objectives is to ensure the practical application of deep user-insight into AI-powered product design decisions at Google by bridging the communication gap between the vast technological expertise of our engineers and the user/societal expertise of our academics, research scientists, and user-centered design research experts. We’ve built a multidisciplinary team with expertise in these areas, deepening our empathy for the communication needs of our audience, and enabling us to better interface between our user & society experts and our technical experts. We create frameworks, guidebooks, prototypes, cheatsheets, and multimedia tools to help bring insights to life for the right people at the right time.

Facilitating responsible GenAI prototyping and development

During collaborations between Responsible AI UX, the People + AI Research (PAIR) initiative and Labs, we identified that prototyping can afford a creative opportunity to engage with large language models (LLM), and is often the first step in GenAI product development. To address the need to introduce LLMs into the prototyping process, we explored a range of different prompting designs. Then, we went out into the field, employing various external, first-person UX design research methodologies to draw out insight and gain empathy for the user’s perspective. Through user/designer co-creation sessions, iteration, and prototyping, we were able to bring internal stakeholders, product managers, engineers, writers, sales, and marketing teams along to ensure that the user point of view was well understood and to reinforce alignment across teams.

The result of this work was MakerSuite, a generative AI platform launched at Google I/O 2023 that enables people, even those without any ML experience, to prototype creatively using LLMs. The team’s first-hand experience with users and understanding of the challenges they face allowed us to incorporate our AI Principles into the MakerSuite product design. Product features like safety filters, for example, enable users to manage outcomes, leading to easier and more responsible product development with MakerSuite.

Because of our close collaboration with product teams, we were able to adapt text-only prototyping to support multimodal interaction with Google AI Studio, an evolution of MakerSuite. Now, Google AI Studio enables developers and non-developers alike to seamlessly leverage Google’s latest Gemini model to merge multiple modality inputs, like text and image, in product explorations. Facilitating product development in this way provides us with the opportunity to better use AI to identify appropriateness of outcomes and unlocks opportunities for developers and non-developers to play with AI sandboxes. Together with our partners, we continue to actively push this effort in the products we support.

Google AI Studio enables developers and non-developers to leverage Google Cloud infrastructure and merge multiple modality inputs in their product explorations.

Equitable speech recognition

Multiple external studies, as well as Google’s own research, have identified an unfortunate deficiency in the ability of current speech recognition technology to understand Black speakers on average, relative to White speakers. As multimodal AI tools begin to rely more heavily on speech prompts, this problem will grow and continue to alienate users. To address this problem, the Responsible AI UX team is partnering with world-renowned linguists and scientists at Howard University, a prominent HBCU, to build a high quality African-American English dataset to improve the design of our speech technology products to make them more accessible. Called Project Elevate Black Voices, this effort will allow Howard University to share the dataset with those looking to improve speech technology while establishing a framework for responsible data collection, ensuring the data benefits Black communities. Howard University will retain the ownership and licensing of the dataset and serve as stewards for its responsible use. At Google, we’re providing funding support and collaborating closely with our partners at Howard University to ensure the success of this program.

Equitable computer vision

The Gender Shades project highlighted that computer vision systems struggle to detect people with darker skin tones, and performed particularly poorly for women with darker skin tones. This is largely due to the fact that the datasets used to train these models were not inclusive to a wide range of skin tones. To address this limitation, the Responsible AI UX team has been partnering with sociologist Dr. Ellis Monk to release the Monk Skin Tone Scale (MST), a skin tone scale designed to be more inclusive of the spectrum of skin tones around the world. It provides a tool to assess the inclusivity of datasets and model performance across an inclusive range of skin tones, resulting in features and products that work better for everyone.

We have integrated MST into a range of Google products, such as Search, Google Photos, and others. We also open sourced MST, published our research, described our annotation practices, and shared an example dataset to encourage others to easily integrate it into their products. The Responsible AI UX team continues to collaborate with Dr. Monk, utilizing the MST across multiple product applications and continuing to do international research to ensure that it is globally inclusive.

Consulting & guidance

As teams across Google continue to develop products that leverage the capabilities of GenAI models, our team recognizes that the challenges they face are varied and that market competition is significant. To support teams, we develop actionable assets to facilitate a more streamlined and responsible product design process that considers available resources. We act as a product-focused design consultancy, identifying ways to scale services, share expertise, and apply our design principles more broadly. Our goal is to help all product teams at Google connect significant unmet user needs with technology benefits via great responsible product design.

One way we have been doing this is with the creation of the People + AI Guidebook, an evolving summative resource of many of the responsible design lessons we’ve learned and recommendations we’ve made for internal and external stakeholders. With its forthcoming, rolling updates focusing specifically on how to best design and consider user needs with GenAI, we hope that our internal teams, external stakeholders, and larger community will have useful and actionable guidance at the most critical milestones in the product development journey.

The People + AI Guidebook has six chapters, designed to cover different aspects of the product life cycle.

If you are interested in reading more about Responsible AI UX and how we are specifically thinking about designing responsibly with Generative AI, please check out this Q&A piece.

Acknowledgements

Shout out to the Responsible AI UX team members: Aaron Donsbach, Alejandra Molina, Courtney Heldreth, Diana Akrong, Ellis Monk, Femi Olanubi, Hope Neveux, Kafayat Abdul, Key Lee, Mahima Pushkarna, Sally Limb, Sarah Post, Sures Kumar Thoddu Srinivasan, Tesh Goyal, Ursula Lauriston, and Zion Mengesha. Special thanks to Michelle Cohn for her contributions to this work.

Read More

Create a document lake using large-scale text extraction from documents with Amazon Textract

AWS customers in healthcare, financial services, the public sector, and other industries store billions of documents as images or PDFs in Amazon Simple Storage Service (Amazon S3). However, they’re unable to gain insights such as using the information locked in the documents for large language models (LLMs) or search until they extract the text, forms, tables, and other structured data. With AWS intelligent document processing (IDP) using AI services such as Amazon Textract, you can take advantage of industry-leading machine learning (ML) technology to quickly and accurately process data from PDFs or document images (TIFF, JPEG, PNG). After the text is extracted from the documents, you can use it to fine-tune a foundation model, summarize the data using a foundation model, or send it to a database.

In this post, we focus on processing a large collection of documents into raw text files and storing them in Amazon S3. We provide you with two different solutions for this use case. The first allows you to run a Python script from any server or instance including a Jupyter notebook; this is the quickest way to get started. The second approach is a turnkey deployment of various infrastructure components using AWS Cloud Development Kit (AWS CDK) constructs. The AWS CDK construct provides a resilient and flexible framework to process your documents and build an end-to-end IDP pipeline. Through the use of the AWS CDK, you can extend its functionality to include redaction, store the output in Amazon OpenSearch, or add a custom AWS Lambda function with your own business logic.

Both of these solutions allow you to quickly process many millions of pages. Before running either of these solutions at scale, we recommend testing with a subset of your documents to make sure the results meet your expectations. In the following sections, we first describe the script solution, followed by the AWS CDK construct solution.

Solution 1: Use a Python script

This solution processes documents for raw text through Amazon Textract as quickly as the service will allow with the expectation that if there is a failure in the script, the process will pick up from where it left off. The solution utilizes three different services: Amazon S3, Amazon DynamoDB, and Amazon Textract.

The following diagram illustrates the sequence of events within the script. When the script ends, a completion status along with the time taken will be returned to the SageMaker Studio console.

diagram

We have packaged this solution in a .ipynb script and .py script. You can use any of the deployable solutions as per your requirements.

Prerequisites

To run this script from a Jupyter notebook, the AWS Identity and Access Management (IAM) role assigned to the notebook must have permissions that allow it to interact with DynamoDB, Amazon S3, and Amazon Textract. The general guidance is to provide least-privilege permissions for each of these services to your AmazonSageMaker-ExecutionRole role. To learn more, refer to Get started with AWS managed policies and move toward least-privilege permissions.

Alternatively, you can run this script from other environments such as an Amazon Elastic Compute Cloud (Amazon EC2) instance or a container that you manage, provided that Python, Pip3, and the AWS SDK for Python (Boto3) are installed. Again, the same IAM policies need to be applied to allow the script to interact with the various managed services.

Walkthrough

To implement this solution, you first need to clone the repository GitHub.

You need to set the following variables in the script before you can run it:

  • tracking_table – This is the name of the DynamoDB table that will be created.
  • input_bucket – This is your source location in Amazon S3 that contains the documents that you want to send to Amazon Textract for text detection. For this variable, provide the name of the bucket, such as mybucket.
  • output_bucket – This is the location in Amazon S3 where you want Amazon Textract to write the results. For this variable, provide the name of the bucket, such as myoutputbucket.
  • _input_prefix (optional) – If you want to select certain files from within a folder in your S3 bucket, you can specify this folder name as the input prefix. Otherwise, leave the default as empty to select all.

The script is as follows:

_tracking_table = "Table_Name_for_storing_s3ObjectNames"
_input_bucket = "your_files_are_here"
_output_bucket = "Amazon Textract_writes_JSON_containing_raw_text_to_here"

The following DynamoDB table schema gets created when the script is run:

Table              Table_Name_for_storing_s3ObjectNames
Partition Key       objectName (String)
                    bucketName (String)
                    createdDate (Decimal)
                    outputbucketName (String)
                    txJobId (String)

When the script is run for the first time, it will check to see if the DynamoDB table exists and will automatically create it if needed. After the table is created, we need to populate it with a list of document object references from Amazon S3 that we want to process. The script by design will enumerate over objects in the specified input_bucket and automatically populate our table with their names when run. It takes approximately 10 minutes to enumerate over 100,000 documents and populate those names into the DynamoDB table from the script. If you have millions of objects in a bucket, you could instead use the Amazon S3 inventory feature, which generates a CSV file of object names, populate the DynamoDB table from that list with your own script in advance, and skip the automatic population by commenting out the function called fetchAllObjectsInBucketandStoreName. To learn more, refer to Configuring Amazon S3 Inventory.
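
If you go the inventory route, pre-populating the table yourself only requires the two fields described later in this section (objectName and bucketName). The following is a minimal sketch, assuming a local CSV file with one object key per line and the table and bucket names used in this example:

import csv
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Table_Name_for_storing_s3ObjectNames")

input_bucket = "your_files_are_here"

# Assumes a CSV derived from the S3 inventory report with the object key in the first column
with open("inventory.csv", newline="") as f:
    with table.batch_writer() as batch:
        for row in csv.reader(f):
            batch.put_item(Item={"objectName": row[0], "bucketName": input_bucket})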

As mentioned earlier, there is both a notebook version and a Python script version. The notebook is the most straightforward way to get started; simply run each cell from start to finish.

If you decide to run the Python script from a CLI, it is recommended that you use a terminal multiplexer such as tmux. This is to prevent the script from stopping should your SSH session finish. For example: tmux new -d 'python3 textractFeeder.py'.

The following is the script’s entry point; from here you can comment out methods not needed:

"""Main entry point into script --- Start Here"""
if __name__ == "__main__":    
    now = time.perf_counter()
    print("started")

The following fields are set when the script is populating the DynamoDB table:

  • objectName – The name of the document located in Amazon S3 that will be sent to Amazon Textract
  • bucketName – The bucket where the document object is stored

These two fields must be populated if you decide to use a CSV file from the S3 inventory report and skip the auto populating that happens within the script.

Now that the table is created and populated with the document object references, the script is ready to start calling the Amazon Textract StartDocumentTextDetection API. Amazon Textract, similar to other managed services, has a default limit on the rate of API calls, known as transactions per second (TPS). If required, you can request a quota increase from the Amazon Textract console. The code is designed to use multiple threads concurrently when calling Amazon Textract to maximize the throughput with the service. You can change this within the code by modifying the threadCountforTextractAPICall variable. By default, this is set to 20 threads. The script will initially read 200 rows from the DynamoDB table and store these in an in-memory list that is wrapped with a class for thread safety. Each caller thread is then started and runs within its own swim lane. Basically, the Amazon Textract caller thread will retrieve an item from the in-memory list that contains our object reference. It will then call the asynchronous start_document_text_detection API and wait for the acknowledgement with the job ID. The job ID is then updated back to the DynamoDB row for that object, and the thread will repeat by retrieving the next item from the list.

The following is the main orchestration code script:

while len(results) > 0:
    for record in results:  # put these records into our thread-safe list
        fileList.append(record)
    """create our threads for processing Amazon Textract"""
    for i in range(threadCountforTextractAPICall):
        threadsforTextractAPI = threading.Thread(
            name="Thread - " + str(i),
            target=procestTextractFunction,
            args=(fileList,))
        threadsforTextractAPI.start()

The caller threads continue until there are no more items in the list, at which point they each stop. When all the threads operating in their swim lanes have stopped, the next 200 rows are retrieved from DynamoDB and a new set of 20 threads is started; the whole process repeats until every row without a job ID has been retrieved from DynamoDB and updated. Should the script crash due to an unexpected problem, it can be run again from the orchestrate() method, which makes sure the threads continue processing rows with empty job IDs. Note that when rerunning the orchestrate() method after the script has stopped, a few documents may be sent to Amazon Textract again; this number will be equal to or less than the number of threads that were running at the time of the crash.
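
When rerunning from orchestrate(), the rows that still need work are simply those without a job ID. One way to fetch such a batch is a filtered scan, shown here as a rough sketch; the actual script may page through the table differently:

from boto3.dynamodb.conditions import Attr

def next_batch(table, batch_size=200):
    """Return rows that don't have a Textract job ID yet."""
    response = table.scan(
        FilterExpression=Attr("txJobId").not_exists() | Attr("txJobId").eq(""),
        Limit=batch_size,  # DynamoDB applies Limit before filtering, so fewer items may come back
    )
    return response["Items"]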

When there are no more rows with a blank job ID in the DynamoDB table, the script stops. By default, the JSON output from Amazon Textract for all the objects is written to the output_bucket under the textract_output folder. Each subfolder within textract_output is named with the job ID that was stored in the DynamoDB table for that object. Within the job ID folder you will find the JSON output, numerically named starting at 1 and potentially spanning additional JSON files labeled 2, 3, and so on. Spanning JSON files is the result of dense or multi-page documents, where the amount of extracted content exceeds the Amazon Textract default JSON size of 1,000 blocks. Refer to Block for more information on blocks. These JSON files contain all the Amazon Textract metadata, including the text extracted from the documents.
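
Downstream, you can reassemble the extracted text for a single document by reading the numbered JSON parts under its job ID folder. The following is a rough sketch that assumes the default output_bucket/textract_output layout described above:

import json
import boto3

s3 = boto3.client("s3")

def get_text_for_job(output_bucket, job_id):
    """Concatenate the LINE blocks from every numbered JSON part of one Textract job."""
    prefix = f"textract_output/{job_id}/"
    parts = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=output_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            name = obj["Key"].rsplit("/", 1)[-1]
            if name.isdigit():                 # parts are named 1, 2, 3, ...
                parts.append((int(name), obj["Key"]))
    lines = []
    for _, key in sorted(parts):               # numeric order, so 2 comes before 10
        body = json.loads(s3.get_object(Bucket=output_bucket, Key=key)["Body"].read())
        lines += [b["Text"] for b in body.get("Blocks", []) if b["BlockType"] == "LINE"]
    return "\n".join(lines)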

You can find the Python code notebook version and script for this solution in GitHub.

Clean up

When the Python script is complete, you can save costs by shutting down or stopping the Amazon SageMaker Studio notebook or container that you spun up.

Now on to our second solution for documents at scale.

Solution 2: Use a serverless AWS CDK construct

This solution uses AWS Step Functions and Lambda functions to orchestrate the IDP pipeline. We use the IDP AWS CDK constructs, which make it straightforward to work with Amazon Textract at scale. Additionally, we use a Step Functions distributed map to iterate over all the files in the S3 bucket and initiate processing. The first Lambda function determines how many pages your document has. This enables the pipeline to automatically use either the synchronous (for single-page documents) or asynchronous (for multi-page documents) API. When the asynchronous API is used, an additional Lambda function combines all the JSON files that Amazon Textract produces for your pages into one JSON file, making it straightforward for your downstream applications to work with the information.
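
To illustrate just the routing decision (this is not the construct's actual code), a page-counting Lambda could look roughly like the following sketch, assuming pypdf is packaged with the function and that the event carries hypothetical bucket and key fields:

import boto3
from pypdf import PdfReader

s3 = boto3.client("s3")

def handler(event, context):
    """Choose the synchronous or asynchronous Textract API based on page count."""
    bucket, key = event["bucket"], event["key"]
    if not key.lower().endswith(".pdf"):
        return {"bucket": bucket, "key": key, "route": "sync"}  # single-image documents
    s3.download_file(bucket, key, "/tmp/doc.pdf")
    pages = len(PdfReader("/tmp/doc.pdf").pages)
    return {"bucket": bucket, "key": key, "pages": pages,
            "route": "sync" if pages == 1 else "async"}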

This solution also contains two additional Lambda functions. The first parses the text from the JSON and saves it as a text file in Amazon S3. The second analyzes the JSON and stores metrics about the workload.

The following diagram illustrates the Step Functions workflow.

Diagram

Prerequisites

This code base uses the AWS CDK and requires Docker. You can deploy this from an AWS Cloud9 instance, which has the AWS CDK and Docker already set up.

Walkthrough

To implement this solution, you first need to clone the repository.

After you clone the repository, install the dependencies:

pip install -r requirements.txt

Then use the following code to deploy the AWS CDK stack:

cdk bootstrap
cdk deploy --parameters SourceBucket=<Source Bucket> SourcePrefix=<Source Prefix>

You must provide both the source bucket and source prefix (the location of the files you want to process) for this solution.
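
For example, to process everything under a given prefix (the bucket and prefix names here are placeholders; depending on your AWS CDK version, you may need to repeat the --parameters flag for each value):

cdk deploy --parameters SourceBucket=my-idp-source-bucket --parameters SourcePrefix=incoming/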

When the deployment is complete, navigate to the Step Functions console, where you should see the state machine ServerlessIDPArchivePipeline.

Diagram

Open the state machine details page and on the Executions tab, choose Start execution.

Diagram

Choose Start execution again to run the state machine.

Diagram

After you start the state machine, you can monitor the pipeline by looking at the map run. You will see an Item processing status section like the following screenshot, which tracks which items succeeded and which failed. The map run continues until all documents have been processed.

Diagram

With this solution, you should be able to process millions of files in your AWS account without having to determine which files to send to which API or worrying about corrupt files failing your pipeline. Through the Step Functions console, you can watch and monitor your files in real time.

Clean up

After your pipeline is finished running, to clean up, you can go back into your project and enter the following command:

cdk destroy

This will delete any services that were deployed for this project.

Conclusion

In this post, we presented a solution that makes it straightforward to convert your document images and PDFs to text files. This is a key prerequisite to using your documents for generative AI and search. To learn more about using text to train or fine-tune your foundation models, refer to Fine-tune Llama 2 for text generation on Amazon SageMaker JumpStart. To use with search, refer to Implement smart document search index with Amazon Textract and Amazon OpenSearch. To learn more about advanced document processing capabilities offered by AWS AI services, refer to Guidance for Intelligent Document Processing on AWS.


About the Authors

Tim Condello is a senior artificial intelligence (AI) and machine learning (ML) specialist solutions architect at Amazon Web Services (AWS). His focus is natural language processing and computer vision. Tim enjoys taking customer ideas and turning them into scalable solutions.

David Girling is a senior AI/ML solutions architect with over twenty years of experience in designing, leading and developing enterprise systems. David is part of a specialist team that focuses on helping customers learn, innovate and utilize these highly capable services with their data for their use cases.

Amgen to Build Generative AI Models for Novel Human Data Insights and Drug Discovery

Generative AI is transforming drug research and development, enabling new discoveries faster than ever — and Amgen, one of the world’s leading biotechnology companies, is tapping the technology to power its research.

Amgen will build AI models trained to analyze one of the world’s largest human datasets on an NVIDIA DGX SuperPOD, a full-stack data center platform that will be installed at the Reykjavik, Iceland, headquarters of deCODE genetics, an Amgen subsidiary. The system will be named Freyja in honor of the powerful, life-giving Norse goddess associated with the ability to predict the future.

Freyja will be used to build a human diversity atlas for drug target and disease-specific biomarker discovery, providing vital diagnostics for monitoring disease progression and regression. The system will also help develop AI-driven precision medicine models, potentially enabling individualized therapies for patients with serious diseases.

Amgen plans to integrate the DGX SuperPOD, which will feature 31 NVIDIA DGX H100 nodes totaling 248 H100 Tensor Core GPUs, to train state-of-the-art AI models in days rather than months, enabling researchers to more efficiently analyze and learn from data in their search for novel health and therapeutics insights.

“For more than a decade, Amgen has been preparing for this hinge moment we are seeing in the industry, powered by the union of technology and biotechnology,” said David M. Reese, executive vice president and chief technology officer at Amgen. “We look forward to combining the breadth and maturity of our world-class human data capabilities at Amgen with NVIDIA’s technologies.”

The goal of deCODE founder and CEO Kári Stefánsson in starting the company was to understand human disease by looking at the diversity of the human genome. He predicted in a recent Amgen podcast that within the next 10 years, doctors will routinely use genetics to explore uncommon diseases in patients.

“This SuperPOD has the potential to accelerate our research by training models more quickly and helping us generate questions we might not have otherwise thought to ask,” said Stefánsson.

Putting the Tech in Biotechnology

Since its founding in 1996, deCODE has curated more than 200 petabytes of de-identified human data from nearly 3 million individuals.

The company started by collecting de-identified data from Icelanders, who have a rich heritage in genealogies that stretch back for centuries. This population-scale data from research volunteers provides unique insights into human diversity as it applies to disease.

deCODE has also helped sequence more than half a million human genomes from volunteers in the UK Biobank.

But drawing insights from this much data requires powerful AI systems.

By integrating powerful new technology, Amgen has an opportunity to accelerate the discovery and development of life-changing medicines. In March 2023, NVIDIA announced that Amgen became one of the first companies to employ NVIDIA BioNeMo, which researchers have used to build generative AI models to accelerate drug discovery and development. Amgen researchers have also been accessing BioNeMo via NVIDIA DGX Cloud, an AI supercomputing service.

“Models trained in BioNeMo can advance drug discovery on multiple fronts,” said Marti Head, executive director of computational and data sciences at Amgen. “In addition to helping develop drugs that are more effective, they can also help avoid unwanted effects like immune responses, and new biologics can be made in volume.”

By adopting DGX SuperPOD, Amgen is poised to gain unprecedented data insights with the potential to change the pace and scope of drug discovery.

“The fusion of advanced AI, groundbreaking developments in biology and molecular engineering and vast quantities of human data are not just reshaping how we discover and develop new medicines — they’re redefining medicine,” Reese said.

Learn about NVIDIA’s AI platform for healthcare and life sciences.

NVIDIA Generative AI Is Opening the Next Era of Drug Discovery and Design

In perhaps the healthcare industry’s most dramatic transformation since the advent of computing, digital biology and generative AI are helping to reinvent drug discovery, surgery, medical imaging and wearable devices.

NVIDIA has been preparing for this moment for over a decade, building deep domain expertise, creating the NVIDIA Clara healthcare-specific computing platform and expanding its work with a rich ecosystem of partners. Healthcare customers and partners already consume well over a billion dollars in NVIDIA GPU computing each year — directly and indirectly through cloud partners.

In the $250 billion field of drug discovery, these efforts are meeting an inflection point: R&D teams can now represent drugs inside a computer.

By harnessing emerging generative AI tools, drug discovery teams observe foundational building blocks of molecular sequence, structure, function and meaning — allowing them to generate or design novel molecules likely to possess desired properties. With these capabilities, researchers can curate a more precise field of drug candidates to investigate, reducing the need for expensive, time-consuming physical experiments.

Accelerating this shift is NVIDIA BioNeMo, a generative AI platform that provides services to develop, customize and deploy foundation models for drug discovery.

Used by pharmaceutical, techbio and software companies, BioNeMo offers a new class of computational methods for drug research and development, enabling scientists to integrate generative AI to reduce experiments and, in some cases, replace them altogether.

In addition to developing, optimizing and hosting AI models through BioNeMo, NVIDIA has boosted the computer-aided drug discovery ecosystem with investments in innovative techbio companies — such as biopharmaceutical company Recursion, which is offering one of its foundation models for BioNeMo users, and biotech company Terray Therapeutics, which is using BioNeMo for AI model development.

BioNeMo Brings Precision to AI-Accelerated Drug Discovery 

BioNeMo features a growing collection of pretrained biomolecular AI models for protein structure prediction, protein sequence generation, molecular optimization, generative chemistry, docking prediction and more. It also enables computer-aided drug discovery companies to make their models available to a broad audience through easy-to-access APIs for inference and customization.

Drug discovery teams use BioNeMo to invent or customize generative AI models with proprietary data — and drug discovery software companies, techbios and large pharmas are integrating BioNeMo cloud APIs, which will be released in beta this month, into platforms that deliver computer-aided drug discovery workflows.

The cloud APIs will now include foundation models from three sources: models invented by NVIDIA, such as the MolMIM generative chemistry model for small molecule generation; open-source models pioneered by global research teams, curated and optimized by NVIDIA, such as the OpenFold protein prediction AI; and proprietary models developed by NVIDIA partners, such as Recursion’s Phenom-Beta for embedding cellular microscopy images.

MolMIM generates small molecules while giving users finer control over the AI generation process — identifying new molecules that possess desired properties and follow constraints specified by users. For example, researchers could direct the model to generate molecules that have similar structures and properties to a given reference molecule.

Phenomenal AI for Pharma: Recursion Brings Phenom-Beta Model to BioNeMo

Recursion is the first hosting partner offering an AI model through BioNeMo cloud APIs: Phenom-Beta, a vision transformer model that extracts biologically meaningful features from cellular microscopy images.

This capability can provide researchers with insights about cell function and help them learn how cells respond to drug candidates or genetic engineering.

Phenom-Beta performed well on image reconstruction tasks, a training metric to evaluate model performance. Read the NeurIPS workshop paper to learn more.

Phenom-Beta was trained on Recursion’s publicly available RxRx3 dataset of biological images using the company’s BioHive-1 supercomputer, based on the NVIDIA DGX SuperPOD reference architecture.

To further its foundation model development, Recursion is expanding its supercomputer with more than 500 NVIDIA H100 Tensor Core GPUs. This will boost its computational capacity by 4x to create what’s expected to be the most powerful supercomputer owned and operated by any biopharma company.

How Companies Are Adopting NVIDIA BioNeMo

A growing group of scientists, biotech and pharma companies, and AI software vendors are using NVIDIA BioNeMo to support biology, chemistry and genomics research.

Biotech leader Terray Therapeutics is integrating BioNeMo cloud APIs into its development of a generalized, multi-target structural binding model. The company also uses NVIDIA DGX Cloud to train chemistry foundation models to power generative AI for small molecule design.

Protein engineering and molecular design companies Innophore and Insilico Medicine are bringing BioNeMo into their computational drug discovery applications. Innophore is integrating BioNeMo cloud APIs into its Catalophore platform for protein design and drug discovery. And Insilico, a premier member of the NVIDIA Inception program for startups, has adopted BioNeMo in its generative AI pipeline for early drug discovery.

Biotech software company OneAngstrom and systems integrator Deloitte are using BioNeMo cloud APIs to build AI solutions for their clients.

OneAngstrom is integrating BioNeMo cloud APIs into its SAMSON platform for molecular design used by academics, biotechs and pharmas. Deloitte is transforming scientific research by integrating BioNeMo on NVIDIA DGX Cloud with the Quartz Atlas AI platform. This combination provides biopharma researchers with unparalleled data connectivity and cutting-edge generative AI, propelling them into a new era of accelerated drug discovery.

Learn more about NVIDIA BioNeMo and subscribe to NVIDIA healthcare news.

NVIDIA Reveals Gaming, Creating, Generative AI, Robotics Innovations at CES

The AI revolution returned to where it started this week, putting powerful new tools into the hands of gamers and content creators.

Generative AI models that will bring lifelike characters to games and applications and new GPUs for gamers and creators were among the highlights of a news-packed address Monday ahead of this week’s CES trade show in Las Vegas.

“Today, NVIDIA is at the center of the latest technology transformation: generative AI,” said Jeff Fisher, senior vice president for GeForce at NVIDIA, who was joined by leaders across the company to introduce products and partnerships across gaming, content creation, and robotics.

A Launching Pad for Generative AI

As AI shifts into the mainstream, Fisher said NVIDIA’s RTX GPUs, with more than 100 million units shipped, are pivotal in the burgeoning field of generative AI, exemplified by innovations like ChatGPT and Stable Diffusion.

In October, NVIDIA released the TensorRT-LLM library for Windows, accelerating large language models, or LLMs, like Llama 2 and Mistral up to 5x on RTX PCs.

And with our new Chat with RTX playground, releasing later this month, enthusiasts can connect an RTX-accelerated LLM to their own data, from locally stored documents to YouTube videos, using retrieval-augmented generation, or RAG, a technique for enhancing the accuracy and reliability of generative AI models.

Fisher also introduced TensorRT acceleration for Stable Diffusion XL and SDXL Turbo in the popular Automatic1111 text-to-image app, providing up to a 60% boost in performance.

NVIDIA Avatar Cloud Engine (ACE) Microservices Debut With Generative AI Models for Digital Avatars

NVIDIA ACE is a technology platform that brings digital avatars to life with generative AI. ACE AI models are designed to run in the cloud or locally on the PC.

In an ACE demo featuring Convai’s new technologies, NVIDIA’s Senior Product Manager Seth Schneider showed how it works.

 

First, a player’s voice input is passed to NVIDIA’s automatic speech recognition model, which translates speech to text. Then, the text is put into an LLM to generate the character’s response.

After that, the text response is vocalized using a text-to-speech model, which is passed to an animation model to create a realistic lip sync. Finally, the dynamic character is rendered into the game scene.

At CES, NVIDIA is announcing ACE Production Microservices for NVIDIA Audio2Face and NVIDIA Riva Automatic Speech Recognition. Available now, each model can be incorporated by developers individually into their pipelines.

NVIDIA is also announcing that game and interactive avatar developers are pioneering ways ACE and generative AI technologies can be used to transform interactions between players and non-playable characters in games and applications. Developers embracing ACE include Convai, Charisma.AI, Inworld, miHoYo, NetEase Games, Ourpalm, Tencent, Ubisoft and UneeQ.

Getty Images Releases Generative AI by iStock and AI Image Generation Tools Powered by NVIDIA Picasso

Generative AI empowers designers and marketers to create concept imagery, social media content and more. Today, iStock by Getty Images is releasing a genAI service built on NVIDIA Picasso, an AI foundry for visual design, Fisher announced.

The iStock service allows anyone to create 4K imagery from text using an AI model trained on Getty Images’ extensive catalog of licensed, commercially safe creative content. New editing application programming interfaces that give customers powerful control over their generated images are also coming soon.

The generative AI service is available today at istock.com, with advanced editing features releasing via API.

NVIDIA Introduces GeForce RTX 40 SUPER Series

Fisher announced a new series of GeForce RTX 40 SUPER GPUs with more gaming and generative AI performance.

Fisher said that the GeForce RTX 4080 SUPER can power fully ray-traced games at 4K. It’s 1.4x faster than the RTX 3080 Ti without frame gen in the most graphically intensive games. With 836 AI TOPS, NVIDIA DLSS Frame Generation delivers an extra performance boost, making the RTX 4080 SUPER twice as fast as an RTX 3080 Ti.

Creators can generate video with Stable Video Diffusion 1.5x faster and images with Stable Diffusion XL 1.7x faster. The RTX 4080 SUPER features more cores and faster memory, giving it a performance edge at a great new price of $999. It will be available starting Jan. 31.

Next up is the RTX 4070 Ti SUPER. NVIDIA has added more cores and increased the frame buffer to 16GB and the memory bus to 256 bits. It’s 1.6x faster than a 3070 Ti and 2.5x faster with DLSS 3, Fisher said. The RTX 4070 Ti SUPER will be available starting Jan. 24 for $799.

Fisher also introduced the RTX 4070 SUPER. NVIDIA has added 20% more cores, making it faster than the RTX 3090 while using a fraction of the power. And with DLSS 3, it’s 1.5x faster in the most demanding games. It will be available for $599 starting Jan. 17.

NVIDIA RTX Remix Open Beta Launches This Month

There are over 10 billion game mods downloaded each year. With RTX Remix, modders can remaster classic games with full ray tracing, DLSS, NVIDIA Reflex and generative AI texture tools that transform low-resolution textures into 4K, physically accurate materials. The RTX Remix app will be released in open beta on Jan. 22.

RTX Remix has already delivered stunning remasters in NVIDIA’s Portal with RTX and the modder-made Portal: Prelude RTX. Now, Orbifold Studios is using RTX Remix to develop Half-Life 2 RTX: An RTX Remix Project, a community remaster of one of the highest-rated games of all time.

Check out this new Half-Life 2 RTX gameplay trailer:

 

Twitch and NVIDIA to Release Multi-Encode Livestreaming

Twitch is one of the most popular platforms for content creators, with over 7 million streamers going live each month to 35 million daily viewers. Fisher explained that these viewers are on all kinds of devices and internet services.

Yet many Twitch streamers are limited to broadcasting at a single resolution and quality level. As a result, they must broadcast at lower quality to reach more viewers.

To address this, Twitch, OBS and NVIDIA announced Enhanced Broadcasting, supported by all RTX GPUs. This new feature allows streamers to transmit up to three concurrent streams to Twitch at different resolutions and quality so each viewer gets the optimal experience.

Beta signups start today and will go live later this month. Twitch will also experiment with 4K and AV1 on the GeForce RTX 40 Series GPUs to deliver even better quality and higher resolution streaming.

‘New Wave’ of AI-Ready RTX Laptops

RTX is the fastest-growing laptop platform, having grown 5x in the last four years, with over 50 million devices in the hands of gamers and creators across the globe.

More’s coming. Fisher announced “a new wave” of RTX laptops launching from every major manufacturer. “Thanks to powerful RT and Tensor Cores, every RTX laptop is AI-ready for the best gaming and AI experiences,” Fisher said.

With an installed base of 100 million GPUs and 500 RTX games and apps, GeForce RTX is the world’s largest platform for gamers, creators and, now, generative AI.

Activision and Blizzard Games Embrace RTX

More than 500 games and apps now take advantage of NVIDIA RTX technology, NVIDIA’s Senior Consumer Marketing Manager Kristina Bartz said, including Alan Wake 2, which won three awards at this year’s Game Awards.


It’s a list that keeps growing with 14 new RTX titles announced at CES.

Horizon Forbidden West, the critically acclaimed sequel to Horizon Zero Dawn, will come to PC early this year with the Burning Shores expansion, accelerated by DLSS 3.

Pax Dei is a social sandbox massively multiplayer online game inspired by the legends of the medieval era. Developed by Mainframe Industries with veterans from CCP Games, Blizzard and Remedy Entertainment, Pax Dei will launch in early access on PC with AI-accelerated DLSS 3 this spring.

Last summer, Diablo IV launched with DLSS 3 and immediately became Blizzard’s fastest-selling game. RTX ray tracing will now be coming to Diablo IV in March.


Day Passes and G-SYNC Technology Coming to GeForce NOW

NVIDIA’s partnership with Activision also extends to the cloud with GeForce NOW, Bartz said. In November, NVIDIA welcomed the first Activision and Blizzard game, Call of Duty: Modern Warfare 3. Diablo IV and Overwatch 2 are coming soon.

GeForce NOW will get Day Pass membership options starting in February. Priority and Ultimate Day Passes will give gamers a full day of gaming with the fastest access to servers and all the same benefits as members, including NVIDIA DLSS 3.5 and NVIDIA Reflex for Ultimate Day Pass purchasers.

NVIDIA also announced that Cloud G-SYNC technology is coming to GeForce NOW. The technology varies the display refresh rate to match the frame rate on G-SYNC monitors, giving members the smoothest, tear-free gaming experience from the cloud.

Generative AI Powers Smarter Robots With NVIDIA Isaac


Closing out the special address, NVIDIA Vice President of Robotics and Edge Computing Deepu Talla shared how the infusion of generative AI into robotics is speeding up the ability to bring robots from proof of concept to real-world deployment.

Talla gave a peek into the growing use of generative AI in the NVIDIA robotics ecosystem, where robotics innovators like Boston Dynamics and Collaborative Robots are changing the landscape of human-robot interaction.

NVIDIA Drives AI Forward With Automotive Innovation on Display at CES

Amid explosive interest in generative AI, the auto industry is racing to embrace the power of AI across a range of critical activities, from vehicle design, engineering and manufacturing, to marketing and sales.

The adoption of generative AI — along with the growing importance of software-defined computing — will continue to transform the automotive market in 2024.

NVIDIA today announced that Li Auto, a pioneer in extended-range electric vehicles (EVs), has selected the NVIDIA DRIVE Thor centralized car computer to power its next-generation fleets. Also, EV makers GWM (Great Wall Motor), ZEEKR and Xiaomi have adopted the NVIDIA DRIVE Orin platform to power their intelligent automated-driving systems.

In addition, a powerful lineup of technology is on display from NVIDIA’s automotive partners on the CES trade show floor in Las Vegas.

  • Mercedes-Benz is kicking off CES with a press conference to announce a range of exciting software-driven features and the latest developments in the Mercedes-Benz MB.OS story, each showcased across a lineup of cars, including the Concept CLA Class, which uses NVIDIA DRIVE Orin for the automated driving domain.

    Mercedes-Benz is also using digital twins for production with help from NVIDIA Omniverse, a platform for developing applications to design, collaborate, plan and operate manufacturing and assembly facilities. (West Hall – 4941)

  • Luminar will host a fireside chat with NVIDIA on Jan. 9 at 2 p.m. PT to discuss the state of the art of sensor processing and ongoing collaborations between the companies. In addition, Luminar will showcase the work it’s doing with NVIDIA partners Volvo Cars, Polestar, Plus and Kodiak. (West Hall – 5917 and West Plaza – WP10)
  • Ansys is demonstrating how it leverages NVIDIA Omniverse to accelerate autonomous vehicle development. Ansys AVxcelerate Sensors will be accessible within NVIDIA DRIVE Sim. (West Hall – 6500)
  • Cerence is introducing CaLLM, an automotive-specific large language model that serves as the foundation for the company’s next-gen in-car computing platform, running on NVIDIA DRIVE. (West Hall – 6627)
  • Cipia is showcasing its embedded software version of Cabin Sense, which includes both driver and occupancy monitoring and is expected to go into serial production this year. NVIDIA DRIVE is the first platform on which Cabin Sense will run commercially. (North Hall – 11022)
  • Kodiak is exhibiting an autonomous truck, which relies on NVIDIA GPUs for high-performance compute to process the enormous quantities of data it collects from its cameras, radar and lidar sensors. (West Plaza – WP10, with Luminar)
  • Lenovo is displaying its vehicle computing roadmap, featuring new products based on NVIDIA DRIVE Thor, including: Lenovo XH1, a central compute unit for advanced driver-assistance systems and smart cockpit; Lenovo AH1, a level 2++ ADAS domain controller unit; and Lenovo AD1, a level 4 autonomous driving domain controller unit. (Estiatorio Milos, Venetian Hotel)
  • Pebble, a recreational vehicle startup, is presenting its flagship product Pebble Flow, the electric semi-autonomous travel trailer powered by NVIDIA DRIVE Orin, with production starting before the end of 2024. (West Hall – 7023)
  • Polestar is showcasing Polestar 3, which is powered by the NVIDIA DRIVE Orin central core computer. (West Hall – 5917 with Luminar and Central Plaza – CP1 with Google)
  • Zoox is showcasing the latest generation of its purpose-built robotaxi, which leverages NVIDIA technology, and is offering CES attendees the opportunity to join its early-bird waitlist for its autonomous ride-hailing service. (West Hall – 7228)

Explore to Win

Visit select NVIDIA partner booths for a chance to win GTC 2024 conference passes with hotel accommodations.

Event Lineup

Check out NVIDIA’s CES event page for a summary of all of the company’s automotive-related events. Learn about NVIDIA’s other announcements at CES by viewing the company’s special address on demand.
