Spring Into Action With 11 New Games on GeForce NOW

As the days grow longer and the flowers bloom, GFN Thursday brings a fresh lineup of games to brighten the week.

Dive into thrilling hunts and dark fantasy adventures with the arrivals of titles like Hunt: Showdown 1896, now available on Xbox and PC Game Pass, and Mandragora: Whispers of the Witch Tree on GeForce NOW. Whether chasing bounties in the Colorado Rockies or battling chaos in a cursed land, players will find unforgettable experiences with these games in the cloud.

Plus, roll with the punches in Capcom’s MARVEL vs. CAPCOM Fighting Collection: Arcade Classics, part of 11 games GeForce NOW is adding to its cloud gaming library — featuring over 2,000 titles playable with GeForce RTX 4080 performance.

Spring Into Gaming Anywhere

With the arrivals of Hunt: Showdown 1896 and Mandragora: Whispers of the Witch Tree in the cloud, GeForce NOW members can take their gaming journeys anywhere, from the wild frontiers of the American West to the shadowy forests of a dark fantasy realm.

Hunt Showdown 1896 on GeForce NOW
It’s the wild, wild west.

Hunt: Showdown 1896 transports players to the untamed Rockies, where danger lurks behind every pine and in every abandoned mine. PC Game Pass members — and those who own the game on Xbox — can stream the action instantly. Whether players are tracking monstrous bounties solo or teaming with friends, the game’s tense player vs. player vs. environment action and new map, Mammon’s Gulch, are ideal for springtime exploration.

Jump into the hunt from the living room, in the backyard or even on the go — no high-end PC required with GeForce NOW.

Mandragora on GeForce NOW
Every whisper is a warning.

Step into a beautifully hand-painted world teetering on the edge of chaos in Mandragora: Whispers of the Witch Tree. As an Inquisitor, battle nightmarish creatures and uncover secrets beneath the budding canopies of Faelduum. With deep role-playing game mechanics and challenging combat, Mandragora is ideal for players seeking a fresh adventure this season. GeForce NOW members can continue their quest wherever spring takes them — including on their laptops, tablets and smartphones.

Time for New Games

Marvel VS. Capcom on GeForce NOW
Everyone’s shouting from the excitement of being in the cloud.

Catch MARVEL vs. CAPCOM Fighting Collection: Arcade Classics in the cloud this week. In this legendary collection of arcade classics from the fan-favorite Marvel and Capcom crossover games, dive into an action-packed lineup of seven titles, including heavy hitters X-MEN vs. STREET FIGHTER and MARVEL vs. CAPCOM 2 New Age of Heroes, as well as THE PUNISHER. 

Each game in the collection can be played online or in co-op mode. Whether new to the series or returning from their arcade days, players of all levels can enjoy these timeless classics together in the cloud.

Look for the following games available to stream in the cloud this week:

  • Forever Skies (New release on Steam, available April 14)
  • Night Is Coming (New release on Steam, available April 14)
  • Hunt: Showdown 1896 (New release on Xbox, available on PC Game Pass April 15)
  • Crime Scene Cleaner (New release on Xbox, available on PC Game Pass April 17)
  • Mandragora: Whispers of the Witch Tree (New release on Steam, available April 17)
  • Tempest Rising (New release on Steam, Advanced Access starts April 17)
  • Aimlabs (Steam)
  • Blue Prince (Steam, Xbox)
  • ContractVille (Steam)
  • Gedonia 2 (Steam) 
  • MARVEL vs. CAPCOM Fighting Collection: Arcade Classics (Steam)
  • Path of Exile 2 (Epic Games Store)

What are you planning to play this weekend? Let us know on X or in the comments below.

ACM Human-Computer Interaction Conference (CHI) 2025

Apple is presenting new research at the ACM annual conference on Human-Computer Interaction (CHI), which takes place in person in Yokohama, Japan, from April 26 to May 1. We are proud to again sponsor the conference, which brings together the scientific and industrial research communities focused on interactive technology. Below is an overview of Apple’s participation at CHI 2025.

Schedule
Stop by the Apple booth (304 & 305) in the Yokohama PACIFICO during exhibition hours. All times listed in GMT +9 (Japan Time):

Tuesday, April 29: 10:00 – 17:00
Wednesday, April 30: 10:00 – …

Apple Machine Learning Research

Disentangled Representational Learning with the Gromov-Monge Gap

Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches can be enhanced by leveraging geometrical considerations, e.g., by learning representations that preserve geometric features of the data, such as distances or angles between points. However, matching the prior…

Apple Machine Learning Research

Automate Amazon EKS troubleshooting using an Amazon Bedrock agentic workflow

As organizations scale their Amazon Elastic Kubernetes Service (Amazon EKS) deployments, platform administrators face increasing challenges in efficiently managing multi-tenant clusters. Tasks such as investigating pod failures, addressing resource constraints, and resolving misconfiguration can consume significant time and effort. Instead of spending valuable engineering hours manually parsing logs, tracking metrics, and implementing fixes, teams should focus on driving innovation. Now, with the power of generative AI, you can transform your Kubernetes operations. By implementing intelligent cluster monitoring, pattern analysis, and automated remediation, you can dramatically reduce both mean time to identify (MTTI) and mean time to resolve (MTTR) for common cluster issues.

At AWS re:Invent 2024, we announced the multi-agent collaboration capability for Amazon Bedrock (preview). With multi-agent collaboration, you can build, deploy, and manage multiple AI agents working together on complex multistep tasks that require specialized skills. Because troubleshooting an EKS cluster involves deriving insights from multiple observability signals and applying fixes using a continuous integration and deployment (CI/CD) pipeline, a multi-agent workflow can help an operations team streamline the management of EKS clusters. The workflow manager agent can integrate with individual agents that interface with individual observability signals and a CI/CD workflow to orchestrate and perform tasks based on user prompt.

In this post, we demonstrate how to orchestrate multiple Amazon Bedrock agents to create a sophisticated Amazon EKS troubleshooting system. By enabling collaboration between specialized agents—deriving insights from K8sGPT and performing actions through the ArgoCD framework—you can build a comprehensive automation that identifies, analyzes, and resolves cluster issues with minimal human intervention.

Solution overview

The architecture consists of the following core components:

  • Amazon Bedrock collaborator agent – Orchestrates the workflow and maintains context while routing user prompts to specialized agents, managing multistep operations and agent interactions
  • Amazon Bedrock agent for K8sGPT – Evaluates cluster and pod events through K8sGPT’s Analyze API for security issues, misconfigurations, and performance problems, providing remediation suggestions in natural language
  • Amazon Bedrock agent for ArgoCD – Manages GitOps-based remediation through ArgoCD, handling rollbacks, resource optimization, and configuration updates

The following diagram illustrates the solution architecture.

Architecture Diagram

Prerequisites

You need to have the following prerequisites in place:

  • An AWS account with access to Amazon Bedrock and the required Anthropic Claude models enabled
  • An existing EKS cluster (the examples in this post use a cluster named PetSite)
  • The AWS CLI, kubectl, Helm, and eksctl installed and configured

Set up the Amazon EKS cluster with K8sGPT and ArgoCD

We start with installing and configuring the K8sGPT operator and ArgoCD controller on the EKS cluster.

The K8sGPT operator will help with enabling AI-powered analysis and troubleshooting of cluster issues. For example, it can automatically detect and suggest fixes for misconfigured deployments, such as identifying and resolving resource constraint problems in pods.

ArgoCD is a declarative GitOps continuous delivery tool for Kubernetes that automates the deployment of applications by keeping the desired application state in sync with what’s defined in a Git repository.

The Amazon Bedrock agent serves as the intelligent decision-maker in our architecture, analyzing cluster issues detected by K8sGPT. After the root cause is identified, the agent orchestrates corrective actions through ArgoCD’s GitOps engine. This powerful integration means that when problems are detected (whether it’s a misconfigured deployment, resource constraints, or scaling issue), the agent can automatically integrate with ArgoCD to provide the necessary fixes. ArgoCD then picks up these changes and synchronizes them with your EKS cluster, creating a truly self-healing infrastructure.

  1. Create the necessary namespaces in Amazon EKS:
    kubectl create ns helm-guestbook
    kubectl create ns k8sgpt-operator-system
  2. Add the k8sgpt Helm repository and install the operator:
    helm repo add k8sgpt https://charts.k8sgpt.ai/
    helm repo update
    helm install k8sgpt-operator k8sgpt/k8sgpt-operator \
      --namespace k8sgpt-operator-system
  3. You can verify the installation by entering the following command:
    kubectl get pods -n k8sgpt-operator-system
    
    NAME                                                          READY   STATUS    RESTARTS  AGE
    release-k8sgpt-operator-controller-manager-5b749ffd7f-7sgnd   2/2     Running   0         1d
    

After the operator is deployed, you can configure a K8sGPT resource. This Custom Resource Definition (CRD) will hold the large language model (LLM) configuration that aids in AI-powered analysis and troubleshooting of cluster issues. K8sGPT supports various backends for AI-powered analysis. For this post, we use Amazon Bedrock as the backend and Anthropic’s Claude 3 as the LLM.

  1. Create the pod identity association to provide the EKS cluster with access to Amazon Bedrock:
    eksctl create podidentityassociation \
      --cluster PetSite \
      --namespace k8sgpt-operator-system \
      --service-account-name k8sgpt \
      --role-name k8sgpt-app-eks-pod-identity-role \
      --permission-policy-arns arn:aws:iam::aws:policy/AmazonBedrockFullAccess \
      --region $AWS_REGION
  2. Configure the K8sGPT CRD:
    cat << EOF > k8sgpt.yaml
    apiVersion: core.k8sgpt.ai/v1alpha1
    kind: K8sGPT
    metadata:
      name: k8sgpt-bedrock
      namespace: k8sgpt-operator-system
    spec:
      ai:
        enabled: true
        model: anthropic.claude-v3
        backend: amazonbedrock
        region: us-east-1
        credentials:
          secretRef:
            name: k8sgpt-secret
            namespace: k8sgpt-operator-system
      noCache: false
      repository: ghcr.io/k8sgpt-ai/k8sgpt
      version: v0.3.48
    EOF
    
    kubectl apply -f k8sgpt.yaml
    
  3. Validate the settings to confirm the k8sgpt-bedrock pod is running successfully:
    kubectl get pods -n k8sgpt-operator-system
    NAME                                                          READY   STATUS    RESTARTS      AGE
    k8sgpt-bedrock-5b655cbb9b-sn897                               1/1     Running   9 (22d ago)   22d
    release-k8sgpt-operator-controller-manager-5b749ffd7f-7sgnd   2/2     Running   3 (10h ago)   22d
    
  4. Now you can configure the ArgoCD controller:
    helm repo add argo https://argoproj.github.io/argo-helm
    helm repo update
    kubectl create namespace argocd
    helm install argocd argo/argo-cd \
      --namespace argocd \
      --create-namespace
  5. Verify the ArgoCD installation:
    kubectl get pods -n argocd
    NAME                                                READY   STATUS    RESTARTS   AGE
    argocd-application-controller-0                     1/1     Running   0          43d
    argocd-applicationset-controller-5c787df94f-7jpvp   1/1     Running   0          43d
    argocd-dex-server-55d5769f46-58dwx                  1/1     Running   0          43d
    argocd-notifications-controller-7ccbd7fb6-9pptz     1/1     Running   0          43d
    argocd-redis-587d59bbc-rndkp                        1/1     Running   0          43d
    argocd-repo-server-76f6c7686b-rhjkg                 1/1     Running   0          43d
    argocd-server-64fcc786c-bd2t8                       1/1     Running   0          43d
  6. Patch the argocd service to have an external load balancer:
    kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'
  7. You can now access the ArgoCD UI with the following load balancer endpoint and the credentials for the admin user:
    kubectl get svc argocd-server -n argocd
    NAME            TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)                      AGE
    argocd-server   LoadBalancer   10.100.168.229   a91a6fd4292ed420d92a1a5c748f43bc-653186012.us-east-1.elb.amazonaws.com   80:32334/TCP,443:32261/TCP   43d
  8. Retrieve the credentials for the ArgoCD UI:
    export argocdpassword=`kubectl -n argocd get secret argocd-initial-admin-secret \
      -o jsonpath="{.data.password}" | base64 -d`
    
    echo ArgoCD admin password - $argocdpassword
  9. Push the credentials to AWS Secrets Manager:
    aws secretsmanager create-secret \
      --name argocdcreds \
      --description "Credentials for argocd" \
      --secret-string "{\"USERNAME\":\"admin\",\"PASSWORD\":\"$argocdpassword\"}"
  10. Configure a sample application in ArgoCD:
    cat << EOF > argocd-application.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: helm-guestbook
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/awsvikram/argocd-example-apps
        targetRevision: HEAD
        path: helm-guestbook
      destination:
        server: https://kubernetes.default.svc
        namespace: helm-guestbook
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
    EOF
  11. Apply the configuration and verify it from the ArgoCD UI by logging in as the admin user:
    kubectl apply -f argocd-application.yaml

    ArgoCD Application

  12. It takes some time for K8sGPT to analyze the newly created pods. To make that immediate, restart the pods created in the k8sgpt-operator-system namespace. The pods can be restarted by entering the following command:
    kubectl -n k8sgpt-operator-system rollout restart deploy
    
    deployment.apps/k8sgpt-bedrock restarted
    deployment.apps/k8sgpt-operator-controller-manager restarted
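
After the restart, K8sGPT publishes its findings as Result custom resources in the operator namespace. The following is a minimal sketch for inspecting them from the command line; the result names in your cluster will differ:

# List the analysis results produced by K8sGPT.
kubectl get results -n k8sgpt-operator-system

# Inspect the details and suggested remediation for a single finding
# (replace <result-name> with one of the names returned above).
kubectl get result <result-name> -n k8sgpt-operator-system -o yaml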

Set up the Amazon Bedrock agents for K8sGPT and ArgoCD

We use a CloudFormation stack to deploy the individual agents into the US East (N. Virginia) Region. The CloudFormation template deploys several resources (costs will be incurred for the AWS resources used).

Use the following parameters for the CloudFormation template:

  • EnvironmentName: The name for the deployment (EKSBlogSetup)
  • ArgoCD_LoadBalancer_URL: The ArgoCD load balancer URL, which you can retrieve with the following command:
    kubectl get service argocd-server -n argocd -ojsonpath="{.status.loadBalancer.ingress[0].hostname}"
  • AWSSecretName: The Secrets Manager secret name that was created to store ArgoCD credentials

The stack creates the following AWS Lambda functions:

  • <Stack name>-LambdaK8sGPTAgent-<auto-generated>
  • <Stack name>-RestartRollBackApplicationArgoCD-<auto-generated>
  • <Stack name>-ArgocdIncreaseMemory-<auto-generated>

The stack creates the following Amazon Bedrock agents:

  • ArgoCDAgent, with the following action groups:
    1. argocd-rollback
    2. argocd-restart
    3. argocd-memory-management
  • K8sGPTAgent, with the following action group:
    1. k8s-cluster-operations
  • CollaboratorAgent, with the following agents associated to it:
    1. ArgoCDAgent
    2. K8sGPTAgent

The stack outputs the following:

  • LambdaK8sGPTAgentRole, the AWS Identity and Access Management (IAM) role Amazon Resource Name (ARN) associated with the Lambda function handling interactions with the K8sGPT agent on the EKS cluster. This role ARN will be needed at a later stage of the configuration process.
  • K8sGPTAgentAliasId, ID of the K8sGPT Amazon Bedrock agent alias
  • ArgoCDAgentAliasId, ID of the ArgoCD Amazon Bedrock agent alias
  • CollaboratorAgentAliasId, ID of the collaborator Amazon Bedrock agent alias

Assign appropriate permissions to enable K8sGPT Amazon Bedrock agent to access the EKS cluster

To enable the K8sGPT Amazon Bedrock agent to access the EKS cluster, you need to configure the appropriate IAM permissions using Amazon EKS access management APIs. This is a two-step process: first, you create an access entry for the Lambda function’s execution role (which you can find in the CloudFormation template output section), and then you associate the AmazonEKSViewPolicy to grant read-only access to the cluster. This configuration makes sure that the K8sGPT agent has the necessary permissions to monitor and analyze the EKS cluster resources while maintaining the principle of least privilege.

  1. Create an access entry for the Lambda function’s execution role:
    export CFN_STACK_NAME=EKS-Troubleshooter
    export EKS_CLUSTER=PetSite
    
    export K8SGPT_LAMBDA_ROLE=`aws cloudformation describe-stacks --stack-name $CFN_STACK_NAME --query "Stacks[0].Outputs[?OutputKey=='LambdaK8sGPTAgentRole'].OutputValue" --output text`
    
    aws eks create-access-entry \
        --cluster-name $EKS_CLUSTER \
        --principal-arn $K8SGPT_LAMBDA_ROLE
  2. Associate the EKS cluster access policy with the access entry:
    aws eks associate-access-policy \
        --cluster-name $EKS_CLUSTER \
        --principal-arn $K8SGPT_LAMBDA_ROLE \
        --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
        --access-scope type=cluster
  3. Verify the Amazon Bedrock agents. The CloudFormation template adds all three required agents. To view the agents, on the Amazon Bedrock console, under Builder tools in the navigation pane, select Agents, as shown in the following screenshot.

Bedrock agents

Perform Amazon EKS troubleshooting using the Amazon Bedrock agentic workflow

Now, test the solution. We explore the following two scenarios:

  1. The agent coordinates with the K8sGPT agent to provide insights into the root cause of a pod failure
  2. The collaborator agent coordinates with the ArgoCD agent to provide a response

Agent coordinates with K8sGPT agent to provide insights into the root cause of a pod failure

In this section, we examine a down alert for a sample application called memory-demo. We’re interested in the root cause of the issue. We use the following prompt: “We got a down alert for the memory-demo app. Help us with the root cause of the issue.”

The agent not only identified the root cause, but went one step further and suggested a potential fix, which in this case is increasing the memory resources allocated to the application.
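
You can send the same prompt from the command line instead of the Amazon Bedrock console test window. The following is a minimal sketch using the AWS CLI: the collaborator agent ID is visible on the Amazon Bedrock console, the alias ID comes from the CollaboratorAgentAliasId stack output, and the session ID is an arbitrary value you choose:

aws bedrock-agent-runtime invoke-agent \
    --agent-id <collaborator-agent-id> \
    --agent-alias-id <collaborator-agent-alias-id> \
    --session-id eks-troubleshooting-demo-01 \
    --input-text "We got a down alert for the memory-demo app. Help us with the root cause of the issue." \
    response.json

# The agent's answer is written to response.json as a stream of completion chunks.
cat response.json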

K8sgpt agent finding

Collaborator agent coordinates with ArgoCD agent to provide a response

For this scenario, we continue from the previous prompt. We feel the application wasn’t provided enough memory, and it should be increased to permanently fix the issue. We can also tell the application is in an unhealthy state in the ArgoCD UI, as shown in the following screenshot.

ArgoUI

Let’s now proceed to increase the memory, as shown in the following screenshot.

Interacting with agent to increase memory

The agent interacted with the argocd_operations Amazon Bedrock agent and was able to successfully increase the memory. The same can be inferred in the ArgoCD UI.
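
You can also confirm the change directly in the cluster. The following is a hedged sketch that assumes the application runs as a Deployment named memory-demo in a namespace of the same name; adjust both names to match your setup:

# Show the container resource requests and limits after the agent's update
# (deployment and namespace names are assumptions).
kubectl -n memory-demo get deploy memory-demo \
    -o jsonpath='{.spec.template.spec.containers[0].resources}'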

ArgoUI showing memory increase

Cleanup

If you decide to stop using the solution, complete the following steps:

  1. To delete the associated resources deployed using AWS CloudFormation:
    1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
    2. Locate the stack you created during the deployment process (you assigned a name to it).
    3. Select the stack and choose Delete.
  2. Delete the EKS cluster if you created one specifically for this implementation.

Conclusion

By orchestrating multiple Amazon Bedrock agents, we’ve demonstrated how to build an AI-powered Amazon EKS troubleshooting system that simplifies Kubernetes operations. This integration of K8sGPT analysis and ArgoCD deployment automation showcases the powerful possibilities when combining specialized AI agents with existing DevOps tools. Although this solution represents an advancement in automated Kubernetes operations, it’s important to remember that human oversight remains valuable, particularly for complex scenarios and strategic decisions.

As Amazon Bedrock and its agent capabilities continue to evolve, we can expect even more sophisticated orchestration possibilities. You can extend this solution to incorporate additional tools, metrics, and automation workflows to meet your organization’s specific needs.

To learn more about Amazon Bedrock, refer to the following resources:


About the authors

Vikram Venkataraman is a Principal Specialist Solutions Architect at Amazon Web Services (AWS). He helps customers modernize, scale, and adopt best practices for their containerized workloads. With the emergence of Generative AI, Vikram has been actively working with customers to leverage AWS’s AI/ML services to solve complex operational challenges, streamline monitoring workflows, and enhance incident response through intelligent automation.

Puneeth Ranjan Komaragiri is a Principal Technical Account Manager at Amazon Web Services (AWS). He is particularly passionate about monitoring and observability, cloud financial management, and generative AI domains. In his current role, Puneeth enjoys collaborating closely with customers, leveraging his expertise to help them design and architect their cloud workloads for optimal scale and resilience.

Sudheer Sangunni is a Senior Technical Account Manager at AWS Enterprise Support. With his extensive expertise in the AWS Cloud and big data, Sudheer plays a pivotal role in assisting customers with enhancing their monitoring and observability capabilities within AWS offerings.

Vikrant Choudhary is a Senior Technical Account Manager at Amazon Web Services (AWS), specializing in healthcare and life sciences. With over 15 years of experience in cloud solutions and enterprise architecture, he helps businesses accelerate their digital transformation initiatives. In his current role, Vikrant partners with customers to architect and implement innovative solutions, from cloud migrations and application modernization to emerging technologies such as generative AI, driving successful business outcomes through cloud adoption.

Host concurrent LLMs with LoRAX

Businesses are increasingly seeking domain-adapted and specialized foundation models (FMs) to meet specific needs in areas such as document summarization, industry-specific adaptations, and technical code generation and advisory. The increased usage of generative AI models has offered tailored experiences with minimal technical expertise, and organizations are increasingly using these powerful models to drive innovation and enhance their services across various domains, from natural language processing (NLP) to content generation.

However, using generative AI models in enterprise environments presents unique challenges. Out-of-the-box models often lack the specific knowledge required for certain domains or organizational terminologies. To address this, businesses are turning to custom fine-tuned models, also known as domain-specific large language models (LLMs). These models are tailored to perform specialized tasks within specific domains or micro-domains. Similarly, organizations are fine-tuning generative AI models for domains such as finance, sales, marketing, travel, IT, human resources (HR), procurement, healthcare and life sciences, and customer service. Independent software vendors (ISVs) are also building secure, managed, multi-tenant generative AI platforms.

As the demand for personalized and specialized AI solutions grows, businesses face the challenge of efficiently managing and serving a multitude of fine-tuned models across diverse use cases and customer segments. From résumé parsing and job skill matching to domain-specific email generation and natural language understanding, companies often grapple with managing hundreds of fine-tuned models tailored to specific needs. This challenge is further compounded by concerns over scalability and cost-effectiveness. Traditional model serving approaches can become unwieldy and resource-intensive, leading to increased infrastructure costs, operational overhead, and potential performance bottlenecks, due to the size and hardware requirements to maintain a high-performing FM. The following diagram represents a traditional approach to serving multiple LLMs.

Fine-tuning LLMs is prohibitively expensive due to the hardware requirements and the costs associated with hosting separate instances for different tasks.

In this post, we explore how Low-Rank Adaptation (LoRA) can be used to address these challenges effectively. Specifically, we discuss using LoRA serving with LoRA eXchange (LoRAX) and Amazon Elastic Compute Cloud (Amazon EC2) GPU instances, allowing organizations to efficiently manage and serve their growing portfolio of fine-tuned models, optimize costs, and provide seamless performance for their customers.

LoRA is a technique for efficiently adapting large pre-trained language models to new tasks or domains by introducing small trainable weight matrices, called adapters, within each linear layer of the pre-trained model. This approach enables efficient adaptation with a significantly reduced number of trainable parameters compared to full model fine-tuning. Although LoRA allows for efficient adaptation, typical hosting of fine-tuned models merges the fine-tuned layers and base model weights together, so organizations with multiple fine-tuned variants normally must host each on separate instances. Because the resultant adapters are relatively small compared to the base model and are the last few layers of inference, this traditional custom model-serving approach is inefficient toward both resource and cost optimization.
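
As a rough sketch of the underlying idea (standard LoRA notation, not specific to this post), LoRA freezes each pre-trained weight matrix and learns a small low-rank update for it, so only two small matrices per adapted layer need to be stored and swapped for each variant:

W' = W + \frac{\alpha}{r} B A, \qquad W \in \mathbb{R}^{d \times k},\ B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k)

Here W is the frozen base weight, A and B are the trainable adapter matrices, r is the adapter rank, and \alpha is a scaling factor. Because r is small, an adapter adds only a tiny fraction of the base model's parameters, which is what makes storing and hot-swapping many adapters on one instance practical.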

An open source software tool called LoRAX addresses this by providing weight-swapping mechanisms at inference time to serve multiple variants of a base FM. LoRAX removes the need to manually attach and detach adapters on the pre-trained FM as you swap between fine-tuned models for different domain or instruction use cases.

With LoRAX, you can fine-tune a base FM for a variety of tasks, including SQL query generation, industry domain adaptations, entity extraction, and instruction responses. You can then host the different variants on a single EC2 instance instead of a fleet of model endpoints, saving costs without impacting performance.

Why LoRAX for LoRA deployment on AWS?

The surge in popularity of fine-tuning LLMs has given rise to multiple inference container methods for deploying LoRA adapters on AWS. Two prominent approaches among our customers are LoRAX and vLLM.

vLLM offers rapid inference speeds and high-performance capabilities, making it well suited for applications that demand high serving throughput at low cost, especially when running multiple fine-tuned models that share the same base model. You can run vLLM inference containers using Amazon SageMaker, as demonstrated in Efficient and cost-effective multi-tenant LoRA serving with Amazon SageMaker in the AWS Machine Learning Blog. However, the complexity of vLLM currently limits the ease of implementing custom integrations for applications. vLLM also has limited quantization support.

For those seeking methods to build applications with strong community support and custom integrations, LoRAX presents an alternative. LoRAX is built upon Hugging Face’s Text Generation Inference (TGI) container, which is optimized for memory and resource efficiency when working with transformer-based models. Furthermore, LoRAX supports quantization methods such as Activation-aware Weight Quantization (AWQ) and Half-Quadratic Quantization (HQQ).

Solution overview

The LoRAX inference container can be deployed on a single EC2 G6 instance, and models and adapters can be loaded in using Amazon Simple Storage Service (Amazon S3) or Hugging Face. The following diagram is the solution architecture.

Prerequisites

For this guide, you need access to the following prerequisites:

  • An AWS account
  • Proper permissions to deploy EC2 G6 instances. LoRAX is built with the intention of using NVIDIA CUDA technology, and the G6 family of EC2 instances is among the most cost-efficient instance types with recent NVIDIA CUDA accelerators. Specifically, the g6.xlarge is the most cost-efficient option for the purposes of this tutorial at the time of this writing. Make sure that quota increases are active prior to deployment.
  • (Optional) A Jupyter notebook within Amazon SageMaker Studio or SageMaker Notebook Instances. After your requested quotas are applied to your account, you can use the default Studio Python 3 (Data Science) image with an ml.t3.medium instance to run the optional notebook code snippets. For the full list of available kernels, refer to available Amazon SageMaker kernels.

Walkthrough

This post walks you through creating an EC2 instance, downloading and deploying the container image, and hosting a pre-trained language model and custom adapters from Amazon S3. Follow the prerequisite checklist to make sure that you can properly implement this solution.

Configure server details

In this section, we show how to configure and create an EC2 instance to host the LLM. This guide uses the EC2 G6 instance class, and we deploy a 15 GB Llama2 7B model. It’s recommended to have about 1.5x the GPU memory capacity of the model to swiftly run inference on a language model. GPU memory specifications can be found at Amazon ECS task definitions for GPU workloads.

You have the option to quantize the model. Quantizing a language model reduces the precision of the model weights to a size of your choosing. For example, the LLM we use is Meta’s Llama2 7B, which by default has a weight size of fp16, or 16-bit floating point. We can convert the model weights to int8 or int4 (8- or 4-bit integers) to shrink the memory footprint of the model to roughly 50% and 25% of its original size, respectively. In this guide, we use the default fp16 representation of Meta’s Llama2 7B, so we require an instance type with at least 22 GB of GPU memory, or VRAM.
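
With LoRAX, quantization is applied when the serving container is launched later in this guide rather than during instance setup. The following is a hedged sketch only: the --quantize launcher option is inherited from TGI, and the accepted values (for example, bitsandbytes variants or AWQ) vary by LoRAX release, so verify them against the container’s --help output before relying on this:

# Example launch with on-the-fly 4-bit quantization (verify the --quantize
# values supported by your LoRAX image before using this in practice).
docker run -d --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data \
    -e HUGGING_FACE_HUB_TOKEN=<huggingface-access-token> \
    ghcr.io/predibase/lorax:main --model-id meta-llama/Llama-2-7b-hf --quantize bitsandbytes-nf4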

Depending on the language model specifications, we need to adjust the amount of Amazon Elastic Block Store (Amazon EBS) storage to properly store the base model and adapter weights.

To set up your inference server, follow these steps:

  1. On the Amazon EC2 console, choose Launch instances, as shown in the following screenshot.
  2. For Name, enter LoRAX - Inference Server.
  3. To open AWS CloudShell, on the bottom left of the AWS Management Console choose CloudShell, as shown in the following screenshot.
  4. Paste the following command into CloudShell and copy the resulting text, as shown in the screenshot that follows. This is the Amazon Machine Image (AMI) ID you will use.
    aws ec2 describe-images --filters 'Name=name,Values=Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.5*(Ubuntu*' 'Name=state,Values=available' --query 'sort_by(Images, &CreationDate)[-1].ImageId' --output text

  5. In the Application and OS Images (Amazon Machine Image) search bar, enter the AMI ID that you copied from the CloudShell command and press Enter on your keyboard.
  6. Under Community AMIs, select the Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.5.1 (Ubuntu 22.04) AMI.
  7. Choose Select, as shown in the following screenshot.
  8. Specify the Instance type as g6.xlarge. Depending on the size of the model, you can increase the size of the instance to accommodate your model. For information on GPU memory per instance type, visit Amazon ECS task definitions for GPU workloads.
  9. (Optional) Under Key pair (login), create a new key pair or select an existing key pair if you want to use one to connect to the instance using Secure Shell (SSH).
  10. In Network settings, choose Edit, as shown in the following screenshot.
  11. Leave default settings for VPC, Subnet, and Auto-assign public IP.
  12. Under Firewall (security groups), for Security group name, enter Inference Server Security Group.
  13. For Description, enter Security Group for Inference Server.
  14. Under Inbound Security Group Rules, edit Security group rule 1 to limit SSH traffic to your IP address by changing Source type to My IP.
  15. Choose Add security group rule.
  16. Configure Security group rule 2 by changing Type to All ICMP-IPv4 and Source Type to My IP. This is to make sure the server is only reachable to your IP address and not bad actors.
  17. Under Configure storage, set Root volume size to 128 GiB to allow enough space for storing base model and adapter weights. For larger models and more adapters, you might need to increase this value accordingly. The model card available with most open source models details the size of the model weights and other usage information. We suggest 128 GB for the starting storage size here because downloading multiple adapters along with the model weights can add up very quickly. Factoring the operating system space, downloaded drivers and dependencies, and various project files, 128 GB is a safer storage size to start off with before adjusting up or down. After setting the desired storage space, select the Advanced details dropdown menu.
  18. Under IAM instance profile, either select or create an IAM instance profile that has S3 read access enabled.
  19. Choose Launch instance.
  20. When the instance finishes launching, select either SSH or Instance connect to connect to your instance and enter the following commands:
    sudo apt update
    sudo systemctl start docker 
    sudo nvidia-ctk runtime configure --runtime=docker 
    sudo systemctl restart docker

Install container and launch server

The server is now properly configured to load and run the serving software.

Enter the following commands to download and deploy the LoRAX Docker container image. For more information, refer to Run container with base LLM. Specify a model from Hugging Face or the storage volume and load the model for inference. Replace the parameters in the commands to suit your requirements (for example, <huggingface-access-token>).

Adding the -d flag as shown will run the download and installation process in the background. It can take up to 30 minutes until properly configured. Using the docker commands docker ps and docker logs <container-name>, you can view the progress of the Docker container and observe when the container is finished setting up. docker logs <container-name> --follow will continue streaming new output from the container for continuous monitoring.

model=meta-llama/Llama-2-7b-hf
volume=$PWD/data
token=<huggingface-access-token>

docker run -d --gpus all --shm-size 1g -p 8080:80 -v $volume:/data -e HUGGING_FACE_HUB_TOKEN=$token ghcr.io/predibase/lorax:main --model-id $model
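
Once the container logs indicate the model has finished loading, a quick liveness check is useful before sending prompts. This is a minimal sketch that assumes LoRAX keeps the /health route it inherits from TGI on the mapped port:

# Returns HTTP 200 once the model server is ready to accept requests.
curl -i 127.0.0.1:8080/health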

Test server and adapters

By running the container as a background process using the -d flag, you can prompt the server with incoming requests. By specifying the model-id as a Hugging Face model ID, LoRAX loads the model into memory directly from Hugging Face.

This isn’t recommended for production because relying on Hugging Face introduces yet another point of failure in case the model or adapter is unavailable. It’s recommended that models be stored locally either in Amazon S3, Amazon EBS, or Amazon Elastic File System (Amazon EFS) for consistent deployments. Later in this post, we discuss a way to load models and adapters from Amazon S3 as you go.

LoRAX also can pull adapter files from Hugging Face at runtime. You can use this capability by adding adapter_id and adapter_source within the body of the request. The first time a new adapter is requested, it can take some time to load into the server, but requests afterwards will load from memory.

  1. Enter the following command to prompt the base model:
    curl 127.0.0.1:8080/generate \
      -X POST \
      -d '{
        "inputs": "why is the sky blue",
        "parameters": {
          "max_new_tokens": 6
        }
      }' \
      -H 'Content-Type: application/json'

  2. Enter the following command to prompt the base model with the specified adapter:
    curl 127.0.0.1:8080/generate \
      -X POST \
      -d '{
        "inputs": "why is the sky blue",
        "parameters": {
          "max_new_tokens": 64,
          "adapter_id": "vineetsharma/qlora-adapter-Llama-2-7b-hf-databricks-dolly-15k",
          "adapter_source": "hub"
        }
      }' \
      -H 'Content-Type: application/json'

[Optional] Create custom adapters with SageMaker training and PEFT

Typical fine-tuning jobs for LLMs merge the adapter weights with the original base model, but using software such as Hugging Face’s PEFT library allows for fine-tuning with adapter separation.

Follow the steps outlined in this AWS Machine Learning blog post to fine-tune Meta’s Llama 2 and get the separated LoRA adapter in Amazon S3.
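
After the fine-tuning job completes, copy the separated adapter artifacts (with PEFT, typically an adapter_config.json and an adapter weights file) to your bucket so LoRAX can pull them at runtime. The following is a minimal sketch with placeholder local and S3 paths:

# Copy the PEFT adapter directory to S3 (both paths are placeholders).
aws s3 cp ./llama2-7b-custom-adapter s3://<your-adapter-s3-bucket-path>/ --recursive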

[Optional] Use adapters from Amazon S3

LoRAX can pull adapter files from Amazon S3 at runtime. You can use this capability by adding adapter_id and adapter_source within the body of the request. The first time a new adapter is requested, it can take some time to load into the server, but requests afterwards will load from server memory. This is the optimal method when running LoRAX in production environments compared to importing from Hugging Face because it doesn’t involve runtime dependencies.

curl 127.0.0.1:8080/generate \
-X POST \
-d '{
  "inputs": "What is process mining?",
  "parameters": {
    "max_new_tokens": 64,
    "adapter_id": "<your-adapter-s3-bucket-path>",
    "adapter_source": "s3"
  }
}' \
-H 'Content-Type: application/json'

[Optional] Use custom models from Amazon S3

LoRAX also can load custom language models from Amazon S3. If the model architecture is supported in the LoRAX documentation, you can specify a bucket name to pull the weights from, as shown in the following code example. Refer to the previous optional section on separating adapter weights from base model weights to customize your own language model.

volume=$PWD/data
bucket_name=<s3-bucket-name>
model=<model-directory-name>

docker run --gpus all --shm-size 1g -e PREDIBASE_MODEL_BUCKET=$bucket_name -p 8080:80 -v $volume:/data ghcr.io/predibase/lorax:latest --model-id $model

Reliable deployments using Amazon S3 for model and adapter storage

Storing models and adapters in Amazon S3 offers a more dependable solution for consistent deployments compared to relying on third-party services such as Hugging Face. By managing your own storage, you can implement robust protocols so your models and adapters remain accessible when needed. Additionally, you can use this approach to maintain version control and isolate your assets from external sources, which is crucial for regulatory compliance and governance.

For even greater flexibility, you can use virtual file systems such as Amazon EFS or Amazon FSx for Lustre. You can use these services to mount the same models and adapters across multiple instances, facilitating seamless access in environments with auto scaling setups. This means that all instances, whether scaling up or down, have uninterrupted access to the necessary resources, enhancing the overall reliability and scalability of your deployments.

Cost comparison and advisory on scaling

Using the LoRAX inference containers on EC2 instances means that you can drastically reduce the costs of hosting multiple fine-tuned versions of language models by storing all adapters in memory and swapping dynamically at runtime. Because LLM adapters are typically a fraction of the size of the base model, you can efficiently scale your infrastructure according to server usage and not by individual variant utilization. LoRA adapters are usually anywhere from 1/10th to 1/4th the size of the base model. But, again, it depends on the implementation and complexity of the task that the adapter is being trained on or for. Regular adapters can be as large as the base model.

In the preceding example, the model adapters resultant from the training methods were 5 MB.

Though this storage amount depends on the specific model architecture, you can dynamically swap up to thousands of fine-tuned variants on a single instance with little to no change to inference speed. It’s recommended to use instances with GPU memory of around 150% of the combined model and variant size to account for model, adapter, and KV cache (or attention cache) storage in VRAM. For GPU memory specifications, refer to Amazon ECS task definitions for GPU workloads.
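
As a rough worked example under that guideline: Meta’s Llama 2 7B in fp16 occupies roughly 14 GB, so 150% of the model size is about 21 GB, which fits within the 24 GB of GPU memory on a g6.xlarge. A handful of small LoRA adapters, typically a few MB to a few hundred MB each, would still leave headroom for the KV cache. These numbers are illustrative only; validate them against your actual model and adapter sizes.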

Depending on the chosen base model and the number of fine-tuned adapters, you can train and deploy hundreds or thousands of customized language models sharing the same base model using LoRAX to dynamically swap out adapters. With adapter swapping mechanisms, if you have five fine-tuned variants, you can save 80% on hosting costs because all the custom adapters can be used in the same instance.

Launch templates in Amazon EC2 can be used to deploy multiple instances, with options for load balancing or auto scaling. You can additionally use AWS Systems Manager to deploy patches or changes. As discussed previously, a shared file system can be used across all deployed EC2 resources to store the LLM weights for multiple adapters, resulting in faster loading on the instances compared to Amazon S3. The difference between using a shared file system such as Amazon EFS and direct Amazon S3 access is the number of steps needed to load the model weights and adapters into memory. With Amazon S3, the adapter and weights need to be transferred to the local file system of the instance before being loaded, whereas files on a shared file system can be loaded directly. There are implementation tradeoffs that should be taken into consideration. You can also use Amazon API Gateway as an API endpoint for REST-based applications.

Host LoRAX servers for multiple models in production

If you intend to use multiple custom FMs for specific tasks with LoRAX, follow this guide for hosting multiple variants of models. Follow this AWS blog on hosting text classification with BERT to perform task routing between the expert models. For an example implementation of efficient model hosting using adapter swapping, refer to LoRA Land, which was released by Predibase, the organization responsible for LoRAX. LoRA Land is a collection of 25 fine-tuned variants of Mistral.ai’s Mistral-7b LLM that collectively outperforms top-performing LLMs hosted behind a single endpoint. The following diagram is the architecture.

Cleanup

In this guide, we created security groups, an S3 bucket, an optional SageMaker notebook instance, and an EC2 inference server. It’s important to terminate resources created during this walkthrough to avoid incurring additional costs:

  1. Delete the S3 bucket
  2. Terminate the EC2 inference server
  3. Terminate the SageMaker notebook instance

Conclusion

By following this guide, you can set up an EC2 instance with LoRAX for language model hosting and serving, store and access custom model weights and adapters in Amazon S3, and manage pre-trained and custom models and variants using SageMaker. LoRAX allows for a cost-efficient approach for those who want to host multiple language models at scale. For more information on working with generative AI on AWS, refer to Announcing New Tools for Building with Generative AI on AWS.


About the Authors

John Kitaoka is a Solutions Architect at Amazon Web Services, working with government entities, universities, nonprofits, and other public sector organizations to design and scale artificial intelligence solutions. With a background in mathematics and computer science, John’s work covers a broad range of ML use cases, with a primary interest in inference, AI responsibility, and security. In his spare time, he loves woodworking and snowboarding.

Varun Jasti is a Solutions Architect at Amazon Web Services, working with AWS Partners to design and scale artificial intelligence solutions for public sector use cases to meet compliance standards. With a background in Computer Science, his work covers broad range of ML use cases primarily focusing on LLM training/inferencing and computer vision. In his spare time, he loves playing tennis and swimming.

Baladithya Balamurugan is a Solutions Architect at AWS focused on ML deployments for inference and utilizing AWS Neuron to accelerate training and inference. He works with customers to enable and accelerate their ML deployments on services such as AWS Sagemaker and AWS EC2. Based out of San Francisco, Baladithya enjoys tinkering, developing applications and his homelab in his free time.

Build a computer vision-based asset inventory application with low or no training

Keeping an up-to-date asset inventory with real devices deployed in the field can be a challenging and time-consuming task. Many electricity providers use manufacturer’s labels as key information to link their physical assets within asset inventory systems. Computer vision can be a viable solution to speed up operator inspections and reduce human errors by automatically extracting relevant data from the label. However, building a standard computer vision application capable of managing hundreds of different types of labels can be a complex and time-consuming endeavor.

In this post, we present a solution using generative AI and large language models (LLMs) to alleviate the time-consuming and labor-intensive tasks required to build a computer vision application, enabling you to immediately start taking pictures of your asset labels and extract the necessary information to update the inventory using AWS services like AWS Lambda, Amazon Bedrock, Amazon Titan, Anthropic’s Claude 3 on Amazon Bedrock, Amazon API Gateway, AWS Amplify, Amazon Simple Storage Service (Amazon S3), and Amazon DynamoDB.

LLMs are large deep learning models that are pre-trained on vast amounts of data. They are capable of understanding and generating human-like text, making them incredibly versatile tools with a wide range of applications. This approach harnesses the image understanding capabilities of Anthropic’s Claude 3 model to extract information directly from photographs taken on-site, by analyzing the labels present in those field images.

Solution overview

The AI-powered asset inventory labeling solution aims to streamline the process of updating inventory databases by automatically extracting relevant information from asset labels through computer vision and generative AI capabilities. The solution uses various AWS services to create an end-to-end system that enables field technicians to capture label images, extract data using AI models, verify the accuracy, and seamlessly update the inventory database.

The following diagram illustrates the solution architecture.

Architecture diagram

The workflow consists of the following steps:

  1. The process starts when an operator takes and uploads a picture of the assets using the mobile app.
  2. The operator submits a request to extract data from the asset image.
  3. A Lambda function retrieves the uploaded asset image from the uploaded images data store.
  4. The function generates the asset image embeddings (vector representations of data) invoking the Amazon Titan Multimodal Embeddings G1 model.
  5. The function performs a similarity search in the knowledge base to retrieve similar asset labels. The most relevant results will augment the prompt as similar examples to improve the response accuracy, and are sent with the instructions to the LLM to extract data from the asset image.
  6. The function invokes Anthropic’s Claude 3 Sonnet on Amazon Bedrock to extract data (serial number, vendor name, and so on) using the augmented prompt and the related instructions.
  7. The function sends the response to the mobile app with the extracted data.
  8. The mobile app verifies the extracted data and assigns a confidence level. It invokes the API to process the data. Data with high confidence will be directly ingested into the system.
  9. A Lambda function is invoked to update the asset inventory database with the extracted data if the confidence level has been indicated as high by the mobile app.
  10. The function sends data with low confidence to Amazon Augmented AI (Amazon A2I) for further processing.
  11. The human reviewers from Amazon A2I validate or correct the low-confidence data.
  12. Human reviewers, such as subject matter experts, validate the extracted data, flag it, and store it in an S3 bucket.
  13. A rule in Amazon EventBridge is defined to trigger a Lambda function to get the information from the S3 bucket when the Amazon A2I workflow processing is complete.
  14. A Lambda function processes the output of the Amazon A2I workflow by loading data from the JSON file that stored the backend operator-validated information.
  15. The function updates the asset inventory database with the new extracted data.
  16. The function sends the extracted data marked as new by human reviewers to an Amazon Simple Queue Service (Amazon SQS) queue to be further processed.
  17. Another Lambda function fetches messages from the queue and serializes the updates to the knowledge base database.
  18. The function generates the asset image embeddings by invoking the Amazon Titan Multimodal Embeddings G1 model.
  19. The function updates the knowledge base with the generated embeddings and notifies other functions that the database has been updated.

Let’s look at the key components of the solution in more detail.

Mobile app

The mobile app component plays a crucial role in this AI-powered asset inventory labeling solution. It serves as the primary interface for field technicians on their tablets or mobile devices to capture and upload images of asset labels using the device’s camera. The implementation of the mobile app includes an authentication mechanism that will allow access only to authenticated users. It’s also built using a serverless approach to minimize recurring costs and have a highly scalable and robust solution.

The mobile app has been built using the following services:

  • AWS Amplify – This provides a development framework and hosting for the static content of the mobile app. By using Amplify, the mobile app component benefits from features like seamless integration with other AWS services, offline capabilities, secure authentication, and scalable hosting.
  • Amazon Cognito – This handles user authentication and authorization for the mobile app.

AI data extraction service

The AI data extraction service is designed to extract critical information, such as manufacturer name, model number, and serial number from images of asset labels.

To enhance the accuracy and efficiency of the data extraction process, the service employs a knowledge base comprising sample label images and their corresponding data fields. This knowledge base serves as a reference guide for the AI model, enabling it to learn and generalize from labeled examples to new label formats effectively. The knowledge base is stored as vector embeddings in a high-performance vector database: Meta’s FAISS (Facebook AI Similarity Search), hosted on Amazon S3.

Embeddings are dense numerical representations that capture the essence of complex data like text or images in a vector space. Each data point is mapped to a vector or ordered list of numbers, where similar data points are positioned closer together. This embedding space allows for efficient similarity calculations by measuring the distance between vectors. Embeddings enable machine learning (ML) models to effectively process and understand relationships within complex data, leading to improved performance on various tasks like natural language processing and computer vision.

The following diagram illustrates an example workflow.

Vector embeddings

The vector embeddings are generated using Amazon Titan, a powerful embedding generation service, which converts the labeled examples into numerical representations suitable for efficient similarity searches. The workflow consists of the following steps:

  1. When a new asset label image is submitted for processing, the AI data extraction service, through a Lambda function, retrieves the uploaded image from the bucket where it was uploaded.
  2. The Lambda function performs a similarity search using Meta’s FAISS vector search engine. This search compares the new image against the vector embeddings in the knowledge base generated by Amazon Titan Multimodal Embeddings invoked through Amazon Bedrock, identifying the most relevant labeled examples.
  3. Using the augmented prompt with context information from the similarity search, the Lambda function invokes Amazon Bedrock, specifically Anthropic’s Claude 3, a state-of-the-art generative AI model, for image understanding and optical character recognition (OCR) tasks. By using the similar examples, the AI model can more accurately extract and interpret the critical information from the new asset label image.
  4. The response is then sent to the mobile app to be confirmed by the field technician.
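
To get a feel for the embedding step itself, you can call the same model directly with the AWS CLI. This is a hedged sketch: the model ID is assumed to be amazon.titan-embed-image-v1 (confirm it in the Amazon Bedrock model catalog for your Region), and asset-label.jpg is a placeholder file name:

# Base64-encode the label photo (-w0 disables line wrapping on GNU base64).
IMAGE_B64=$(base64 -w0 asset-label.jpg)

# Invoke the Amazon Titan Multimodal Embeddings G1 model through Amazon Bedrock.
aws bedrock-runtime invoke-model \
    --model-id amazon.titan-embed-image-v1 \
    --content-type application/json \
    --accept application/json \
    --cli-binary-format raw-in-base64-out \
    --body "{\"inputImage\": \"$IMAGE_B64\"}" \
    embedding.json

# The response body contains an "embedding" array of floats for the image.
jq '.embedding | length' embedding.json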

In this phase, the AWS services used are:

  • Amazon Bedrock – A fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities.
  • AWS Lambda – A serverless computing service that allows you to run your code without the need to provision or manage physical servers or virtual machines. A Lambda function runs the data extraction logic and orchestrates the overall data extraction process.
  • Amazon S3 – A storage service offering industry-leading durability, availability, performance, security, and virtually unlimited scalability at low costs. It’s used to store the asset images uploaded by the field technicians.

Data verification

Data verification plays a crucial role in maintaining the accuracy and reliability of the extracted data before updating the asset inventory database and is included in the mobile app.

The workflow consists of the following steps:

  1. The extracted data is shown to the field operator.
  2. If the field operator determines that the extracted data is accurate and matches an existing asset label in the knowledge base, they can confirm the correctness of the extraction; if not, they can update the values directly using the app.
  3. When the field technician confirms the data is correct, that information is automatically forwarded to the backend review component.

Data verification uses the following AWS services:

  • Amazon API Gateway – A secure and scalable API gateway that exposes the data verification component’s functionality to the mobile app and other components.
  • AWS Lambda – Serverless functions for implementing the verification logic and routing data based on confidence levels.

Backend review

This component compares the data automatically identified by the AI data extraction service with the final data approved by the field operator and computes the difference. If the difference is below a configured threshold, the data is sent to update the inventory database; otherwise, a human review process is engaged:

  1. Subject matter experts asynchronously review flagged data entries on the Amazon A2I console.
  2. Significant discrepancies are marked to update the generative AI’s knowledge base.
  3. Minor OCR errors are corrected without updating the AI model’s knowledge base.

The backend review component uses the following AWS services:

  • Amazon A2I – A service that provides a web-based interface for human reviewers to inspect and correct the extracted data and asset label images.
  • Amazon EventBridge – A serverless service that uses events to connect application components together. When the Amazon A2I human workflow is complete, EventBridge is used to detect this event and trigger a Lambda function to process the output data.
  • Amazon S3 – Object storage used to save the information marked by reviewers in the Amazon A2I workflow, along with the asset label images.

Inventory database

The inventory database component plays a crucial role in storing and managing the verified asset data in a scalable and efficient manner. Amazon DynamoDB, a fully managed NoSQL database service from AWS, is used for this purpose. DynamoDB is a serverless, scalable, and highly available key-value and document database service. It’s designed to handle massive amounts of data and high traffic workloads, making it well-suited for storing and retrieving large-scale inventory data.

The verified data from the AI extraction and human verification processes is ingested into the DynamoDB table. This includes data with high confidence from the initial extraction, as well as data that has been reviewed and corrected by human reviewers.

Knowledge base update

The knowledge base update component enables continuous improvement and adaptation of the generative AI models used for asset label data extraction:

  1. During the backend review process, human reviewers from Amazon A2I validate and correct the data extracted from asset labels by the AI model.
  2. The corrected and verified data, along with the corresponding asset label images, is marked as new label examples if not already present in the knowledge base.
  3. A Lambda function is triggered to update the asset inventory and send the new labels to the FIFO (First-In-First-Out) queue.
  4. A Lambda function processes the messages in the queue, updating the knowledge base vector store (S3 bucket) with the new label examples.
  5. The update process generates the vector embeddings by invoking the Amazon Titan Multimodal Embeddings G1 model exposed by Amazon Bedrock and storing the embeddings in a Meta’s FAISS database in Amazon S3.

The knowledge base update process makes sure that the solution remains adaptive and continuously improves its performance over time, reducing the likelihood of encountering unseen label examples and the need for subject matter experts to correct the extracted data.

This component uses the following AWS services:

  • Amazon Titan Multimodal Embeddings G1 model – This model generates the embeddings (vector representations) for the new asset images and their associated data.
  • AWS Lambda – Lambda functions are used to update the asset inventory database, to send and process the extracted data to the FIFO queue, and to update the knowledge base in case of new unseen labels.
  • Amazon SQS – Amazon SQS offers fully managed message queuing for microservices, distributed systems, and serverless applications. The extracted data marked as new by human reviewers is sent to an SQS FIFO (First-In-First-Out) queue. FIFO queues preserve the order in which messages are sent and received, so the producing Lambda function doesn’t have to add its own sequencing information to the messages (see the sketch after this list).
  • Amazon S3 – The knowledge base is stored in an S3 bucket, with the newly generated embeddings. This allows the AI system to improve its accuracy for future asset label recognition tasks.
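
As a minimal sketch, the Lambda function could enqueue a knowledge base update like the following. The queue URL is a hypothetical placeholder; MessageGroupId and MessageDeduplicationId are required for FIFO queues unless content-based deduplication is enabled.

import json
import uuid

import boto3

sqs = boto3.client("sqs")

# Hypothetical queue URL; in this solution the queue is created by the CloudFormation stack.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/asset-kb-updates.fifo"


def enqueue_new_label(image_s3_uri: str, fields: dict) -> None:
    """Send a knowledge base update request to the FIFO queue."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"image": image_s3_uri, "fields": fields}),
        MessageGroupId="knowledge-base-updates",   # one group so updates are processed in order
        MessageDeduplicationId=str(uuid.uuid4()),  # or enable content-based deduplication on the queue
    )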

Navigation flow

This section explains how users interact with the system and how data flows between different components of the solution. We’ll examine each key component’s role in the process, from initial user access through data verification and storage.

Mobile app

The end user accesses the mobile app using the browser included in the handheld device. The application URL to access the mobile app is available after you have deployed the frontend application. Using the browser on a handheld device or your PC, browse to the application URL, where a login window will appear. Because this is a demo environment, you can register on the application by following the automated registration workflow implemented through Amazon Cognito and choosing Create Account, as shown in the following screenshot.

During the registration process, you must provide a valid email address that will be used to verify your identity, and define a password. After you’re registered, you can log in with your credentials.

After authentication is complete, the mobile app appears, as shown in the following screenshot.

The process to use the app is the following:

  1. Use the camera button to capture a label image.
  2. The app facilitates the upload of the captured image to a private S3 bucket specifically designated for storing asset images. Amazon S3 Transfer Acceleration, a bucket-level feature of Amazon S3, is used to improve the transfer speed of uploads and downloads. It works by routing transfers through AWS edge locations, which are globally distributed and closer to the client applications. This reduces latency and improves the overall transfer speed, especially for clients that are geographically distant from the S3 bucket’s AWS Region (see the sketch after this list).
  3. After the image is uploaded, the app sends a request to the AI data extraction service, triggering the subsequent process of data extraction and analysis. The extracted data returned by the service is displayed and editable within the form, as described later in this post. This allows for data verification.
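
As a minimal illustration, the following Python snippet shows how a client can opt into the accelerate endpoint with boto3. The demo app itself uploads through the Amplify JavaScript libraries, and the bucket and key names here are placeholders; the bucket must have Transfer Acceleration enabled.

import boto3
from botocore.config import Config

# Route requests through the S3 accelerate endpoint (edge locations).
s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))

# Hypothetical bucket and key names for illustration.
s3.upload_file(
    Filename="label.jpg",
    Bucket="asset-inventory-images",
    Key="uploads/label.jpg",
)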

AI data extraction service

This module uses Anthropic’s Claude 3 FM, a multimodal system capable of processing both images and text. To extract relevant data, we employ a prompt technique that uses samples to guide the model’s output. Our prompt includes two sample images along with their corresponding extracted text. The model identifies which sample image most closely resembles the one we want to analyze and uses that sample’s extracted text as a reference to determine the relevant information in the target image.

We use the following prompt to achieve this result:

{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "first_sample_image:",
        },
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": first_sample_encoded_image,
            },
        },
        {
            "type": "text",
            "text": "target_image:",
        },
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": encoded_image,
            },
        },
        {
            "type": "text",
            "text": f"""
answer the question using the following example as reference.
match exactly the same set of fields and information as in the provided example.

<example>
analyze first_sample_image and answer with a json file with the following information: Model, SerialN, ZOD.
answer only with json.

Answer:
{first_sample_answer}
</example>

<question>
analyze target_image and answer with a json file with the following information: Model, SerialN, ZOD.
answer only with json.

Answer:
</question>
""",
        },
    ],
}

In the preceding code, first_sample_encoded_image and first_sample_answer are the reference image and expected output, respectively, and encoded_image contains the new image that has to be analyzed.
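
For completeness, the following is a minimal sketch of how this message could be sent to Claude 3 through the Amazon Bedrock Messages API. The model ID shown (Claude 3 Sonnet) and the max_tokens value are assumptions, not values prescribed by the solution; use whichever Claude 3 model is enabled in your account.

import json

import boto3

bedrock = boto3.client("bedrock-runtime")

# Assumed model ID (Claude 3 Sonnet).
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"


def extract_label_fields(message: dict) -> dict:
    """Send the few-shot message above to Claude 3 on Amazon Bedrock and parse the JSON answer."""
    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,  # assumed limit; the answer is a short JSON document
            "messages": [message],
        }),
    )
    payload = json.loads(response["body"].read())
    # The model answers with a single text block containing only JSON (Model, SerialN, ZOD).
    return json.loads(payload["content"][0]["text"])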

Data verification

After the image is processed by the AI data extraction service, the control goes back to the mobile app:

  1. The mobile app receives the extracted data from the AI data extraction service, which has processed the uploaded asset label image and extracted relevant information using computer vision and ML models.
  2. Upon receiving the extracted data, the mobile app presents it to the field operator, allowing them to review and confirm the accuracy of the information (see the following screenshot). If the extracted data is correct and matches the physical asset label, the technician can submit a confirmation through the app, indicating that the data is valid and ready to be inserted into the asset inventory database.
  3. If the field operator sees any discrepancies or errors in the extracted data compared to the actual asset label, they have the option to correct those values.
  4. The values returned by the AI data extraction service and the final values validated by the field operators are sent to the backend review service.

Backend review

This process is implemented using Amazon A2I:

  1. A distance metric is computed to evaluate the difference between what the data extraction service has identified and the correction performed by the on-site operator (see the sketch after this list).
  2. If the difference is larger than a predefined threshold, the image and the operator-modified data are submitted to an Amazon A2I workflow, creating a human-in-the-loop request.
  3. When a backend operator becomes available, the new request is assigned.
  4. The operator uses the Amazon A2I provided web interface, as depicted in the following screenshot, to check what the on-site operator has done and, if it’s found that this type of label is not included in the knowledge base, can decide to add it by entering Yes in the Add to Knowledge Base field.
  5. When the A2I process is complete, a Lambda function is triggered.
  6. This Lambda function stores the information in the inventory database and verifies whether this image also needs to be used to update the knowledge base.
  7. If this is the case, the Lambda function files the request with the relevant data in an SQS FIFO queue.
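
The post doesn’t prescribe a specific distance metric, so the sketch below uses a simple field-mismatch ratio as one possible choice. The flow definition ARN, threshold, and field structure are hypothetical placeholders; in this solution the flow definition is created by the CloudFormation stack.

import json
import uuid

import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

# Hypothetical values for illustration.
FLOW_DEFINITION_ARN = "arn:aws:sagemaker:us-east-1:123456789012:flow-definition/asset-inventory-review"
THRESHOLD = 0.3


def field_mismatch_ratio(extracted: dict, confirmed: dict) -> float:
    """One possible distance metric: the fraction of fields the operator had to change."""
    fields = set(extracted) | set(confirmed)
    changed = sum(1 for f in fields if extracted.get(f) != confirmed.get(f))
    return changed / len(fields) if fields else 0.0


def route_for_review(extracted: dict, confirmed: dict, image_s3_uri: str) -> bool:
    """Start an Amazon A2I human loop when the discrepancy exceeds the threshold."""
    if field_mismatch_ratio(extracted, confirmed) <= THRESHOLD:
        return False  # small difference: write directly to the inventory database
    a2i.start_human_loop(
        HumanLoopName=f"asset-review-{uuid.uuid4()}",
        FlowDefinitionArn=FLOW_DEFINITION_ARN,
        HumanLoopInput={"InputContent": json.dumps({
            "image": image_s3_uri,
            "extracted": extracted,
            "confirmed": confirmed,
        })},
    )
    return True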

Inventory database

To keep this solution as simple as possible while covering the required capability, we selected DynamoDB as our inventory database. DynamoDB is a NoSQL database, and we store the data in a table with the following information:

  • The manufacturer, model ID, and serial number, with the serial number serving as the table key
  • A link to the picture containing the label used during the on-site inspection

DynamoDB offers an on-demand pricing model that allows costs to directly depend on actual database usage.
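
A minimal sketch of writing a verified record to the table follows. The table name and attribute names are hypothetical placeholders chosen for illustration; only the overall shape (serial number as the key, plus a link to the label image) comes from the description above.

import boto3

dynamodb = boto3.resource("dynamodb")

# Hypothetical table name; the table is created by the CloudFormation stack.
table = dynamodb.Table("asset-inventory")


def save_asset(serial_number: str, manufacturer: str, model_id: str, image_s3_uri: str) -> None:
    """Store a verified asset record in the inventory table."""
    table.put_item(Item={
        "SerialN": serial_number,      # partition key
        "Manufacturer": manufacturer,
        "Model": model_id,
        "LabelImage": image_s3_uri,    # link to the label picture taken on site
    })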

Knowledge base database

The knowledge base database is stored as two files in an S3 bucket:

  • The first file is a JSON array containing the metadata (manufacturer, serial number, model ID, and link to reference image) for each of the knowledge base entries
  • The second file is a FAISS database containing an index with the embedding for each of the images included in the first file

To minimize race conditions when updating these files, a single Lambda function is configured as the consumer of the SQS queue. The Lambda function extracts the link to the reference image and the metadata certified by the back-office operator, updates both files, and stores the new versions in the S3 bucket (see the sketch below).
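
The following is a minimal sketch of what the consumer Lambda function could do for each queued message. The bucket name, object keys, and the Titan Multimodal Embeddings model ID are assumptions for illustration, and the request and response field names follow the Titan multimodal interface as an assumption rather than code from the solution repository.

import base64
import json

import boto3
import faiss
import numpy as np

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")

# Hypothetical bucket and keys, matching the two-file layout described above.
KB_BUCKET = "asset-inventory-knowledge-base"
INDEX_KEY = "kb/index.faiss"
METADATA_KEY = "kb/metadata.json"
TITAN_MODEL_ID = "amazon.titan-embed-image-v1"  # assumed model ID


def add_label_to_knowledge_base(image_bytes: bytes, entry: dict) -> None:
    """Append one new label example to the metadata file and the FAISS index."""
    # Download the current knowledge base files.
    s3.download_file(KB_BUCKET, INDEX_KEY, "/tmp/index.faiss")
    s3.download_file(KB_BUCKET, METADATA_KEY, "/tmp/metadata.json")
    index = faiss.read_index("/tmp/index.faiss")
    with open("/tmp/metadata.json") as f:
        metadata = json.load(f)

    # Embed the new reference image with Titan Multimodal Embeddings via Amazon Bedrock.
    body = json.dumps({"inputImage": base64.b64encode(image_bytes).decode("utf-8")})
    response = bedrock.invoke_model(modelId=TITAN_MODEL_ID, body=body)
    embedding = json.loads(response["body"].read())["embedding"]

    # Append to both files and write the new versions back to Amazon S3.
    index.add(np.array([embedding], dtype="float32"))
    metadata.append(entry)  # manufacturer, serial number, model ID, link to reference image
    faiss.write_index(index, "/tmp/index.faiss")
    with open("/tmp/metadata.json", "w") as f:
        json.dump(metadata, f)
    s3.upload_file("/tmp/index.faiss", KB_BUCKET, INDEX_KEY)
    s3.upload_file("/tmp/metadata.json", KB_BUCKET, METADATA_KEY)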

In the following sections, we create a seamless workflow for field data collection, AI-powered extraction, human validation, and inventory updates.

Prerequisites

You need the following prerequisites before you can proceed with the solution. For this post, we use the us-east-1 Region. You will also need an AWS Identity and Access Management (IAM) user with administrative privileges to deploy the required components and a development environment with access to AWS resources already configured.

For the development environment, you can use an Amazon Elastic Compute Cloud (Amazon EC2) instance (select at least a t3.small instance type to be able to build the web application) or a development environment of your own choice. Install Python 3.9, then install and configure the AWS Command Line Interface (AWS CLI).

You will also need to install the Amplify CLI. Refer to Set up Amplify CLI for more information.

The next step is to enable the models used in this workshop in Amazon Bedrock. To do this, complete the following steps:

  1. On the Amazon Bedrock console, choose Model access in the navigation pane.
  2. Choose Enable specific models.
  3. Select all Anthropic and Amazon models and choose Next.

A new window will list the requested models.

  4. Confirm that the Amazon Titan models and Anthropic Claude models are on this list and choose Submit.

The next step is to create an Amazon SageMaker Ground Truth private labeling workforce that will be used to perform back-office activities. If you don’t already have a private labeling workforce in your account, you can create one following these steps:

  1. On the SageMaker console, under Ground Truth in the navigation pane, choose Labeling workforces.
  2. On the Private tab, choose Create private team.
  3. Provide a name for the team and your organization, and enter your email address (it must be a valid one) for both Email addresses and Contact email.
  4. Leave all the other options as default.
  5. Choose Create private team.
  6. After your workforce is created, copy your workforce Amazon Resource Name (ARN) on the Private tab and save it for later use.

Lastly, build a Lambda layer that includes two Python libraries. To build this layer, connect to your development environment and issue the following commands:
git clone https://github.com/aws-samples/Build_a_computer_vision_based_asset_inventory_app_with_low_no_training
cd Build_a_computer_vision_based_asset_inventory_app_with_low_no_training
bash build_lambda_layer.sh

You should get an output similar to the following screenshot.


Save the LAMBDA_LAYER_VERSION_ARN for later use.

You are now ready to deploy the backend infrastructure and frontend application.

Deploy the backend infrastructure

The backend is deployed using AWS CloudFormation to build the following components:

      • An API Gateway to act as an integration layer between the frontend application and the backend
      • An S3 bucket to store the uploaded images and the knowledge base
      • Amazon Cognito to allow end-user authentication
      • A set of Lambda functions to implement backend services
      • An Amazon A2I workflow to support the back-office activities
      • An SQS queue to store knowledge base update requests
      • An EventBridge rule to trigger a Lambda function as soon as an Amazon A2I workflow is complete
      • A DynamoDB table to store inventory data
      • IAM roles and policies to allow access to the different components to interact with each other and also access Amazon Bedrock for generative AI-related tasks

Download the CloudFormation template, then complete the following steps:

      1. On the AWS CloudFormation console, choose Create stack.
      2. Choose Upload a template file and choose Choose file to upload the downloaded template.
      3. Choose Next.
      4. For Stack name, enter a name (for example, asset-inventory).
      5. For A2IWorkforceARN, enter the ARN of the labeling workforce you identified.
      6. For LambdaLayerARN, enter the ARN of the Lambda layer version you uploaded.
      7. Choose Next and Next again.
      8. Acknowledge that AWS CloudFormation is going to create IAM resources and choose Submit.


Wait until the CloudFormation stack creation process is complete; it will take about 15–20 minutes. You can then view the stack details.


Note the values on the Outputs tab. You will use the output data later to complete the configuration of the frontend application.

Deploy the frontend application

In this section, you will build the web application that is used by the on-site operator to collect a picture of the labels, submit it to the backend services to extract relevant information, validate or correct returned information, and submit the validated or corrected information to be stored in the asset inventory.

The web application uses React and will use the Amplify JavaScript Library.

Amplify provides several products to build full stack applications:

      • Amplify CLI – A simple command line interface to set up the needed services
      • Amplify Libraries – Use case-centric client libraries to integrate the frontend code with the backend
      • Amplify UI Components – UI libraries for React, React Native, Angular, Vue, and Flutter

In this example, you have already created the needed services with the CloudFormation template, so the Amplify CLI will deploy the application on the Amplify provided hosting service.

      1. Log in to your development environment and download the client code from the GitHub repository using the following command:
git clone https://github.com/aws-samples/Build_a_computer_vision_based_asset_inventory_app_with_low_no_training
cd Build_a_computer_vision_based_asset_inventory_app_with_low_no_training
cd webapp
      2. If you’re running on AWS Cloud9 as a development environment, issue the following command to let the Amplify CLI use AWS Cloud9 managed credentials:
ln -s $HOME/.aws/credentials $HOME/.aws/config
      3. Now you can initialize the Amplify application using the CLI:
amplify init

After issuing this command, the Amplify CLI will ask you for some parameters.

      4. Accept the default values by pressing Enter for each question.
      5. The next step is to modify amplifyconfiguration.js.template (you can find it in the folder webapp/src) with the information collected from the output of the CloudFormation stack and save it as amplifyconfiguration.js. This file tells Amplify which endpoints to use to interact with the backend resources created for this application. The information required is as follows:
        1. aws_project_region and aws_cognito_region – To be filled in with the Region in which you ran the CloudFormation template (for example, us-east-1).
        2. aws_cognito_identity_pool_id, aws_user_pools_id, aws_user_pools_web_client_id – The values from the Outputs tab of the CloudFormation stack.
        3. Endpoint – In the API section, update the endpoint with the API Gateway URL listed on the Outputs tab of the CloudFormation stack.
      6. You now need to add a hosting option for the single-page application. You can use Amplify to configure and host the web application by issuing the following command:
amplify hosting add

The Amplify CLI will ask you which type of hosting service you prefer and what type of deployment.

      7. Answer both questions by pressing the Enter key to accept the default options.
      8. You now need to install the JavaScript libraries used by this application using npm:
npm install
      9. Deploy the application using the following command:
amplify publish
      10. Confirm you want to proceed by entering Y.

At the end of the deployment phase, Amplify will return the public URL of the web application, similar to the following:

...
Find out more about deployment here:

https://cra.link/deployment

 Zipping artifacts completed.
 Deployment complete!
https://dev.xxx.amplifyapp.com

Now you can use your browser to connect to the application using the provided URL.

Clean up

To delete the resources used to build this solution, complete the following steps:

      1. Delete the Amplify application:
        1. Issue the following command:
amplify delete
        2. Confirm that you are willing to delete the application.
      2. Remove the backend resources:
        1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
        2. Select the stack and choose Delete.
        3. Choose Delete to confirm.

At the end of the deletion process, you should not see the entry related to asset-inventory on the list of stacks.

      3. Remove the Lambda layer by issuing the following command in the development environment:
aws lambda delete-layer-version --layer-name asset-inventory-blog --version-number 1
      4. If you created a new labeling workforce, remove it by using the following command:
aws sagemaker delete-workteam --workteam-name <the name you defined when you created the workteam>

Conclusion

In this post, we presented a solution that incorporates various AWS services to handle image storage (Amazon S3), mobile app development (Amplify), AI model hosting (Amazon Bedrock using Anthropic’s Claude), data verification (Amazon A2I), database (DynamoDB), and vector embeddings (Amazon Bedrock using Amazon Titan Multimodal Embeddings). It creates a seamless workflow for field data collection, AI-powered extraction, human validation, and inventory updates.

By taking advantage of the breadth of AWS services and integrating generative AI capabilities, this solution dramatically improves the efficiency and accuracy of asset inventory management processes. It reduces manual labor, accelerates data entry, and maintains high-quality inventory records, enabling organizations to optimize asset tracking and maintenance operations.

You can deploy this solution and immediately start collecting images of your assets to build or update your asset inventory.


About the authors


Federico D’Alessio is an AWS Solutions Architect and joined AWS in 2018. He is currently working in the Power and Utility and Transportation market. Federico is a cloud addict and, when not at work, he tries to reach the clouds with his hang glider.


Leonardo Fenu is a Solutions Architect, who has been helping AWS customers align their technology with their business goals since 2018. When he is not hiking in the mountains or spending time with his family, he enjoys tinkering with hardware and software, exploring the latest cloud technologies, and finding creative ways to solve complex problems.


Elisabetta Castellano is an AWS Solutions Architect focused on empowering customers to maximize their cloud computing potential, with expertise in machine learning and generative AI. She enjoys immersing herself in cinema, live music performances, and books.


Carmela Gambardella has been an AWS Solutions Architect since April 2018. Before AWS, Carmela held various roles in large IT companies, such as software engineer, security consultant, and solutions architect. She has been using her experience in security, compliance, and cloud operations to help public sector organizations in their transformation journey to the cloud. In her spare time, she is a passionate reader and enjoys hiking, traveling, and practicing yoga.

Read More

Isomorphic Labs Rethinks Drug Discovery With AI

Isomorphic Labs Rethinks Drug Discovery With AI

Isomorphic Labs is reimagining the drug discovery process with an AI-first approach. At the heart of this work is a new way of thinking about biology.

Max Jaderberg, chief AI officer, and Sergei Yakneen, chief technology officer at Isomorphic Labs, joined the AI Podcast to explain why they look at biology as an information processing system.

“We’re building generalizable AI models capable of learning from the entire universe of protein and chemical interactions,” Jaderberg said. “This fundamentally breaks from the target-specific, siloed approach of conventional drug development.”

Isomorphic isn’t just working to optimize existing drug design workflows but completely rethinking how drugs are discovered — moving away from traditional methods that have historically been slow and inefficient.

By modeling cellular processes with AI, Isomorphic’s teams can predict molecular interactions with exceptional accuracy. Their advanced AI models enable scientists to computationally simulate how potential therapeutics interact with their targets in complex biological systems. Using AI to reduce dependence on wet lab experiments accelerates the drug discovery pipeline and creates possibilities for addressing previously untreatable conditions.

And that’s just the beginning.

Isomorphic Labs envisions a future of precision medicine, where treatments are tailored to an individual’s unique molecular and genetic makeup. While regulatory hurdles and technical challenges remain, Jaderberg and Yakneen are optimistic and devoted to balancing ambitious innovation with scientific rigor.

“We’re committed to proving our technology through real-world pharmaceutical breakthroughs,” said Jaderberg.

Time Stamps

1:14 – How AI is boosting the drug discovery process.

17:25 – Biology as a computational system.

19:50 – Applications of AlphaFold 3 in pharmaceutical research.

23:05 – The future of precision and preventative medicine.

You Might Also Like… 

NVIDIA’s Jacob Liberman on Bringing Agentic AI to Enterprises

Agentic AI enables developers to create intelligent multi-agent systems that reason, act and execute complex tasks with a degree of autonomy. Jacob Liberman, director of product management at NVIDIA, joined the NVIDIA AI Podcast to explain how agentic AI bridges the gap between powerful AI models and practical enterprise applications.

Roboflow Helps Unlock Computer Vision for Every Kind of AI Builder

Roboflow’s mission is to make the world programmable through computer vision. By simplifying computer vision development, the company helps bridge the gap between AI and the people looking to harness it. Cofounder and CEO Joseph Nelson discusses how Roboflow empowers users in manufacturing, healthcare and automotive to solve complex problems with visual AI.

How World Foundation Models Will Advance Physical AI With NVIDIA’s Ming-Yu Liu

AI models that can accurately simulate and predict outcomes in physical, real-world environments will enable the next generation of physical AI systems. Ming-Yu Liu, vice president of research at NVIDIA and an IEEE Fellow, explains the significance of world foundation models — powerful neural networks that can simulate physical environments.

Read More

Into the Omniverse: How Digital Twins Are Scaling Industrial AI

Into the Omniverse: How Digital Twins Are Scaling Industrial AI

Editor’s note: This post is part of Into the Omniverse, a series focused on how developers, 3D practitioners, and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

As industrial and physical AI streamline workflows, businesses are looking for ways to most effectively harness these technologies.

Scaling AI in industrial settings — like factories and other manufacturing facilities — presents unique challenges, such as fragmented data pipelines, siloed tools and the need for real-time, high-fidelity simulations.

The Mega NVIDIA Omniverse Blueprint — available in preview on build.nvidia.com — helps address these challenges by providing a scalable reference workflow for simulating multi-robot fleets in industrial facility digital twins, including those built with the NVIDIA Omniverse platform.

Industrial AI leaders — including Accenture, Foxconn, Kenmec, KION and Pegatron — are now using the blueprint to accelerate physical AI adoption and build autonomous systems that efficiently perform actions in industrial settings.

Built on the Universal Scene Description (OpenUSD) framework, the blueprint enables seamless data interoperability, real-time collaboration and AI-driven decision-making by unifying diverse data sources and improving simulation fidelity.

Industrial Leaders Adopt the Mega Blueprint

At Hannover Messe, the world’s largest industrial trade show that took place in Germany earlier this month, Accenture and Schaeffler, a leading motion technology company, showcased the adoption of the Mega blueprint to simulate Digit, a humanoid robot from Agility Robotics, performing material handling in kitting and commissioning areas.

Video courtesy of Schaeffler, Accenture, Agility Robotics

KION, a supply chain solutions company, and Accenture are now using Mega to optimize warehouse and distribution processes.

At the NVIDIA GTC global AI conference in March, Accenture and Foxconn representatives discussed the impacts of introducing Mega into their industrial AI workflows.

Accelerating Industrial AI With Mega 

Mega NVIDIA Omniverse Blueprint architecture diagram

With the Mega blueprint, developers can accelerate physical AI workflows through:

  • Robot Fleet Simulation: Test and train diverse robot fleets in a safe, virtual environment to ensure they work seamlessly together.
  • Digital Twins: Use digital twins to simulate and optimize autonomous systems before physical deployment.
  • Sensor Simulation and Synthetic Data Generation: Generate realistic sensor data to ensure robots can accurately perceive and respond to their real-world environment.
  • Facility and Fleet Management Systems Integration: Connect robot fleets with management systems for efficient coordination and optimization.
  • Robot Brains as Containers: Use portable, plug-and-play modules for consistent robot performance and easier management.
  • World Simulator With OpenUSD: Simulate industrial facilities in highly realistic virtual environments using NVIDIA Omniverse and OpenUSD.
  • Omniverse Cloud Sensor RTX APIs: Ensure accurate sensor simulation with NVIDIA Omniverse Cloud application programming interfaces to create detailed virtual replicas of industrial facilities.
  • Scheduler: Manage complex tasks and data dependencies with a built-in scheduler for smooth and efficient operations.
  • Video Analytics AI Agents: Integrate AI agents built with the NVIDIA AI Blueprint for video search and summarization (VSS), leveraging NVIDIA Metropolis, to enhance operational insights.

Dive deeper into the Mega blueprint architecture on the NVIDIA Technical Blog.

Industrial AI is also being accelerated by the latest Omniverse Kit SDK 107 release, including major updates for robotics application development and enhanced simulation capabilities such as RTX Real-Time 2.0.

Get Plugged Into the World of OpenUSD

Learn more about OpenUSD and industrial AI by watching sessions from GTC, now available on demand, and by watching how ecosystem partners like Pegatron and others are pushing their industrial automation further, faster.

Join NVIDIA at COMPUTEX, running May 19-23 in Taipei, to discover the latest breakthroughs in AI. Watch NVIDIA founder and CEO Jensen Huang’s keynote on Sunday, May 18, at 8:00 p.m. PT.

Discover why developers and 3D practitioners are using OpenUSD and learn how to optimize 3D workflows with the new self-paced “Learn OpenUSD” curriculum for 3D developers and practitioners, available for free through the NVIDIA Deep Learning Institute.

For more resources on OpenUSD, explore the Alliance for OpenUSD forum and the AOUSD website.

Plus, tune in to the “OpenUSD Insiders” livestream taking place today at 11:00 a.m. PT to hear more about the Mega NVIDIA Omniverse Blueprint. Additionally, don’t miss next week’s livestream on April 26 at 11:00 a.m. PT, to hear Accenture discuss how they’re using the blueprint to build Omniverse digital twins for training and testing industrial AI’s robot brains.

Stay up to date by subscribing to NVIDIA news, joining the community and following NVIDIA Omniverse on Instagram, LinkedIn, Medium and X.

Featured image courtesy of:

Left and Top Right: Accenture, KION Group

Middle: Accenture, Agility Robotics, Schaeffler

Bottom Right: Foxconn

Read More

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to their AR counterparts and lack fair comparison on language modeling benchmarks. Additionally, training diffusion models from scratch at scale remains challenging. Given the prevalence of open-source AR language models, we propose adapting these models to build text diffusion models. We demonstrate connections between AR and diffusion modeling objectives and…

Apple Machine Learning Research