Agentic AI Leaders to Showcase Latest Advancements at NVIDIA GTC

From improving customer experiences to boosting operational efficiency, agentic AI — advanced AI systems designed to autonomously reason, plan and execute complex tasks based on high-level goals — is changing the way people live and work. Across industries, AI agents serve as intelligent personal assistants, helping automate repetitive and time-consuming tasks to free up individuals and organizations to focus on higher-value work.

At NVIDIA GTC, a global AI conference taking place March 17-21 in San Jose, California, industry experts will share how they’re using AI agents to drive innovation in customer service, cybersecurity, retail, automotive, financial services, healthcare, telecommunications and more. In addition, engineers and developers can learn how to build AI agents and multi-agent systems using cutting-edge tools and frameworks.

Don’t miss NVIDIA founder and CEO Jensen Huang’s GTC keynote on Tuesday, March 18, at 10 a.m. PT.

Browse the agentic AI sessions coming to GTC, as well as the full conference catalog.

Agentic AI Sessions for Business Leaders

GTC will feature many sessions where executives and business leaders can learn how generative and agentic AI are transforming enterprises.

Agentic AI Sessions for Developers and Engineers

Developers, engineers, architects, data scientists and all technical professionals attending GTC can stop by the AI Platforms Pavilion and the Generative and Agentic AI Pavilion to meet industry leaders streamlining AI development with end-to-end workflows using resources from the NVIDIA AI Enterprise software platform.

Browse the full list of technical agentic AI sessions.

Agentic AI Sessions for Industry Applications

GTC features sessions on how AI can be implemented in nearly every industry, from financial services to healthcare, retail, automotive and more.

Browse the full list of industry-specific sessions, including:

  • Unlock the Power of AI Agents in Automotive: Bryan Goodman, executive director of AI at Ford Motor Company, will discuss the benefits and challenges of deploying AI agents in the automotive industry, as well as practical solutions for doing so.
  • AI at Scale: Lessons from Capital One’s Agentic AI Adoption: In this fireside chat, Prem Natarajan, executive vice president and head of enterprise AI at Capital One, will talk about the bank’s development and adoption of generative and agentic AI technologies, highlighting key initiatives that balance innovation in customer experience with well-governed and risk-centered approaches.
  • AI Agents and Digital Humans Shaping the Future of Interaction in Telecoms: Leaders from Amdocs, AT&T, Indiana University and ServiceNow will delve into the cutting-edge developments in and challenges of deploying digital human AI agents and AI-driven solutions across telecommunications.
  • MOMA: Zero-Shot Multi-Objective Antibody Evolution Using Multi-Agents: Wei Lu, senior director of AI at Aureka Bio, will discuss how MOMA (Multi-Objective Multi-Agents) — a novel framework that orchestrates agentic models — accelerates the discovery of antibodies with optimal stability, target affinity and reduced side effects, enhancing high-valued therapeutic antibody design.

Register now for NVIDIA GTC to explore the latest technological advancements, including agentic and physical AI. 

Read More

Telenor Builds Norway’s First AI Factory, Offering Sustainable and Sovereign Data Processing

Norway’s first sustainable and secure AI cloud service demonstrates how countries can maintain data sovereignty while advancing green computing initiatives.

Building on 170 years as a telecommunications provider, Telenor opened Norway’s first AI factory in November. The facility enables organizations to process sensitive data securely on Norwegian soil while prioritizing environmental responsibility.

Telenor’s Chief Innovation Officer and Head of the AI Factory Kaaren Hilsen joined the NVIDIA AI Podcast to discuss the AI factory’s rapid development, as it went from concept to reality in under a year. It now serves customers across industries including logistics and public services, and delivers sustainable and secure sovereign AI — where nations own the production of their own intelligence.

Telenor’s commitment to sustainability is evident in its plans to build a data center in Oslo that will run on renewable energy and repurpose excess heat for district heating in nearby residential buildings.

Sovereign AI infrastructure helps nations manage sensitive data and advance sustainability goals. As Hilsen said, the AI factory isn’t just about building infrastructure — it’s about “helping move society forward,” responsibly and sustainably.

Hilsen will share more insights at NVIDIA GTC, the leading global AI conference, in a session titled, “Accelerating Sovereign AI Factories: Insights from Telco Case Studies.”

Explore the lineup of telecom sessions at GTC and learn more about telecom companies becoming sovereign AI factories.

In NVIDIA’s third annual survey of telecommunications professionals, 84% of respondents intend to offer generative AI solutions to their customers, primarily as software-as-a-service (52%) and as a developer platform, including compute services (35%). Learn more in the full report.

Time Stamps

01:39 – Hilsen details her 25-year telco industry background and recent focus on sustainable innovation.

07:39 – How the AI factory went from concept to reality in less than a year.

13:09 – Exploration of how customers are using the AI factory for sensitive data processing.

16:50 – The importance of security and building trust with customers.

You Might Also Like… 

Technovation Empowers Girls in AI, Making AI Education More Inclusive and Engaging

Tara Chklovski, founder and CEO of tech education organization Technovation, and Anshita Saini, a Technovation alumna and member of the technical staff at OpenAI, explore how the nonprofit empowers girls worldwide through technology education. They discuss how Technovation is preparing the next generation of female leaders in AI and technology, and Saini talks about Wiser AI, her new initiative to support underrepresented voices in artificial intelligence.

NVIDIA’s Josh Parker on How AI and Accelerated Computing Drive Sustainability

AI isn’t just about building smarter machines. It’s about building a greener world. AI and accelerated computing are helping industries tackle some of the world’s toughest environmental challenges. Joshua Parker, senior director of corporate sustainability at NVIDIA, explains how these technologies are powering a new era of energy efficiency.

Snowflake’s Baris Gultekin on Unlocking the Value of Data With Large Language Models

Snowflake is using AI to help enterprises transform data into insights and applications. Baris Gultekin, head of AI at Snowflake, explains how the company’s AI Data Cloud platform separates the storage of data from compute, enabling organizations across the world to connect via cloud technology and work on a unified platform.

Read More

March Into Gaming With GeForce NOW’s 14 Must-Play Titles for Spring

GeForce NOW is blooming further with an array of 14 new titles in March.

A garden of gaming delights will have members marching straight into action and adventure this spring, with Ubisoft’s Assassin’s Creed Shadows, Tripwire Interactive’s Killing Floor 3 and Hazelight Studios’ Split Fiction coming to the cloud next week at launch.

Start off with six games coming to the cloud this week, including Halo: The Master Chief Collection. And don’t miss out on the latest update for miHoYo’s hit game Honkai: Star Rail.

GeForce NOW one-month and six-month Performance and Ultimate memberships, as well as the free tier, are once again available to new members in the US, Canada and Europe. Stay tuned for updates as more membership options become available.

Split Adventure

Split Fiction on GeForce NOW, coming soon.
It takes communication … lots of it!

Hazelight Studios, creators of the acclaimed It Takes Two, are returning with a new and imaginative co-op game: Split Fiction. The narrative-driven adventure follows Mio, a writer of science fiction, and Zoe, a fantasy author, who find themselves trapped in a simulation stealing their stories. Adventure across wildly shifting worlds, ranging from the dazzling, neon-drenched landscapes of futuristic cyberpunk cities to the enchanted, dragon-populated depths of ancient forests.

Each level introduces fresh abilities, mini games and chaotic scenarios. Tame dragons, snowboard through explosions, master laser swords, solve gravity-defying puzzles, fight robotic parking attendants, outdance monkeys and flee supernovas while building trust between strangers. Play online or in couch co-op mode, using innovative split-screen mechanics that demand genuine teamwork and optimal communication between players to succeed. The Friend’s Pass feature lets one player host the full game while their partner joins for free.

Adventure through the genre-bending worlds of Split Fiction across devices with a GeForce NOW membership. With cloud saves and instant access, members can dive in whenever their gaming buddy is available, without missing a moment.

The Legend Lives On

Halo: The Master Chief Collection on GeForce NOW
Suit up, Spartan.

Halo: The Master Chief Collection comprises six legendary games for members to play — Halo: Reach, Halo: Combat Evolved Anniversary, Halo 2: Anniversary, Halo 3, Halo 3: ODST and Halo 4. Dive into more than 60 adrenaline-fueled campaign missions, from the explosive beginnings of the Halo saga to the high-stakes battles that shaped its epic legacy.

With breathtaking visuals, fast gameplay and stunning multiplayer arenas, this collection delivers heart-pounding action and unforgettable thrills. Whether a seasoned Spartan or new to the fight, Halo: The Master Chief Collection will keep players on the edges of their seats.

Gear up for an exceptional gaming experience with these upcoming titles and more. Ultimate members can stream in stunning 4K resolution at over 120 frames per second, powered by NVIDIA DLSS and Reflex technologies — delivering seamless gameplay on virtually any device.

New to Town

Honkai: Star Rail V3.1
New heroes, old flames and a dash of chaos.

The newest Honkai: Star Rail update, “Light Slips the Gate, Shadow Greets the Throne,” is now available for members to stream in the cloud. Version 3.1 introduces a new chapter in the Flame-Chase Journey, playable characters and more. Explore the “Grove of Epiphany,” a new map ravaged by the black tide, where players must rescue survivors and reclaim Cerces’ Coreflame.

Two new playable characters are Tribbie, a five-star Quantum character on the Path of Harmony, and Mydei, a five-star Imaginary character on the Path of Destruction. The update also includes the return of limited five-star characters Yunli and Huohuo, a new season of the Divergent Universe game mode with a “Day and Night System,” as well as the Awooo Firm event where players manage a chimera squad in Okhema.

Look for the following games available to stream in the cloud this week:

  • Balatro (New release on Xbox, available on PC Game Pass)
  • The Dark Crystal: Age of Resistance Tactics (Xbox, available on the Microsoft Store)
  • The Evil Within 2 (Epic Games Store, Steam and Xbox, available on PC Game Pass)
  • Halo: The Master Chief Collection (Steam and Xbox, available on PC Game Pass)
  • Murky Divers (Steam)
  • Somerville (Xbox, available on the Microsoft Store)

Here’s what to expect for March:

  • Dragonkin: The Banished (New release on Steam, Mar. 6)
  • Split Fiction (New release on EA App and Steam, Mar. 6)
  • Split Fiction: Friend’s Pass (New release on EA App and Steam, Mar. 6)
  • Assassin’s Creed Shadows (New release on Steam and Ubisoft Connect, Mar. 20)
  • Wreckfest 2 (New release on Steam, Mar. 20)
  • Killing Floor 3 (New release on Steam, Mar. 25)
  • Atomfall (New release on Steam and Xbox, available on PC Game Pass, Mar. 27)
  • The First Berserker: Khazan (New release on Steam, Mar. 27)
  • inZOI (New release on Steam, Mar. 28)
  • City Transport Simulator: Tram (Steam)
  • Dave the Diver (Steam)
  • The Legend of Heroes: Trails through Daybreak II (Steam)
  • Motor Town: Behind The Wheel (Steam)
  • Potion Craft: Alchemist Simulator (Steam)

Full of Games in February

In addition to the 17 games announced last month, six more joined the GeForce NOW library, including Halo: The Master Chief Collection being added this week:

  • Far Cry: New Dawn (New release on PC Game Pass, Feb. 4)
  • F1 Manager 2024 (New release on Epic Games Store, free, Feb. 13)
  • Warhammer 40,000: Rogue Trader (New release on Xbox, available on PC Game Pass, Feb. 20)
  • Batman: Arkham Asylum – Game of the Year Edition (Steam and Epic Games Store)
  • Batman: Arkham City – Game of the Year Edition (Steam and Epic Games Store)
  • Batman: Arkham Knight (Steam and Epic Games Store)

UNDER NIGHT IN-BIRTH II Sys:Celes didn’t make it to the cloud this month. Stay tuned to GFN Thursday for updates.

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

How Pattern PXM’s Content Brief is driving conversion on ecommerce marketplaces using AI

Brands today are juggling a million things, and keeping product content up-to-date is at the top of the list. Between decoding the endless requirements of different marketplaces, wrangling inventory across channels, adjusting product listings to catch a customer’s eye, and trying to outpace shifting trends and fierce competition, it’s a lot. And let’s face it—staying ahead of the ecommerce game can feel like running on a treadmill that just keeps speeding up. For many, it results in missed opportunities and revenue that doesn’t quite hit the mark.

“Managing a diverse range of products and retailers is so challenging due to the varying content requirements, imagery, different languages for different regions, formatting and even the target audiences that they serve.”

– Martin Ruiz, Content Specialist, Kanto

Pattern is a leader in ecommerce acceleration, helping brands navigate the complexities of selling on marketplaces and achieve profitable growth through a combination of proprietary technology and on-demand expertise. Pattern was founded in 2013 and has expanded to over 1,700 team members in 22 global locations, addressing the growing need for specialized ecommerce expertise.

Pattern has over 38 trillion proprietary ecommerce data points, 12 tech patents and patents pending, and deep marketplace expertise. Pattern partners with hundreds of brands, like Nestle and Philips, to drive revenue growth. As the top third-party seller on Amazon, Pattern uses this expertise to optimize product listings, manage inventory, and boost brand presence across multiple services simultaneously.

In this post, we share how Pattern uses AWS services to process trillions of data points to deliver actionable insights, optimizing product listings across multiple services.

Content Brief: Data-backed content optimization for product listings

Pattern’s latest innovation, Content Brief, is a powerful AI-driven tool designed to help brands optimize their product listings and accelerate growth across online marketplaces. Using Pattern’s dataset of over 38 trillion ecommerce data points, Content Brief provides actionable insights and recommendations to create standout product content that drives traffic and conversions.

Content Brief analyzes consumer demographics, discovery behavior, and content performance to give brands a comprehensive understanding of their product’s position in the marketplace. What would normally require months of research and work is now done in minutes. Content Brief takes the guesswork out of product strategy with tools that do the heavy lifting. Its attribute importance ranking shows you which product features deserve the spotlight, and the image archetype analysis makes sure your visuals engage customers.

As shown in the following screenshot, the image archetype feature shows attributes that are driving sales in a given category, allowing brands to highlight the most impactful features in the image block and A+ image content.

Content Brief incorporates review and feedback analysis capabilities. It uses sentiment analysis to process customer reviews, identifying recurring themes in both positive and negative feedback, and highlights areas for potential improvement.

Content Brief’s Search Family analysis groups similar search terms together, helping brands understand distinct customer intent and tailor their content accordingly. This feature combined with detailed persona insights helps marketers create highly targeted content for specific segments. It also offers competitive analysis, providing side-by-side comparisons with competing products, highlighting areas where a brand’s product stands out or needs improvement.

“This is the thing we need the most as a business. We have all of the listening tools, review sentiment, keyword things, but nothing is in a single place like this and able to be optimized to my listing. And the thought of writing all those changes back to my PIM and then syndicating to all of my retailers, this is giving me goosebumps.”

– Marketing executive, Fortune 500 brand

Brands using Content Brief can more quickly identify opportunities for growth, adapt to change, and maintain a competitive edge in the digital marketplace. From search optimization and review analysis to competitive benchmarking and persona targeting, Content Brief empowers brands to create compelling, data-driven content that drives both traffic and conversions.

Select Brands looked to improve their Amazon performance and partnered with Pattern. Content Brief’s insights led to a transformation of their Triple Buffet Server listing’s image stack. The old image stack had been created simply to meet marketplace requirements, whereas the new one was optimized using category and sales data to highlight the most impactful product attributes. The updated image stack featured bold product highlights and captured shoppers with lifestyle imagery. The results were a 21% month-over-month revenue surge, 14.5% more traffic, and a 21 bps conversion lift.

“Content Brief is a perfect example of why we chose to partner with Pattern. After just one month of testing, we see how impactful it can be for driving incremental growth—even on products that are already performing well. We have a product that, together with Pattern, we were able to grow into a top performer in its category in less than 2 years, and it’s exciting to see how adding this additional layer can grow revenue even for that product, which we already considered to be strong.”

– Eric Endres, President, Select Brands

To discover how Content Brief helped Select Brands boost their Amazon performance, refer to the full case study.

The AWS backbone of Content Brief

At the heart of Pattern’s architecture lies a carefully orchestrated suite of AWS services. Amazon Simple Storage Service (Amazon S3) serves as the cornerstone for storing product images, crucial for comprehensive ecommerce analysis. Amazon Textract is employed to extract and analyze text from these images, providing valuable insights into product presentation and enabling comparisons with competitor listings. Meanwhile, Amazon DynamoDB acts as the powerhouse behind Content Brief’s rapid data retrieval and processing capabilities, storing both structured and unstructured data, including content brief object blobs.

Pattern’s approach to data management is both innovative and efficient. As data is processed and analyzed, they create a shell in DynamoDB for each content brief, progressively injecting data as it’s processed and refined. This method allows for rapid access to partial results and enables further data transformations as needed, making sure that brands have access to the most up-to-date insights.
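
This shell-then-fill pattern is straightforward to reproduce. The following is a minimal boto3 sketch under assumed names: the content-briefs table, its brief_id key, and the attribute names are hypothetical illustrations, not Pattern's actual schema.

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("content-briefs")  # hypothetical table name and key schema

def create_brief_shell(brief_id: str, asin: str) -> None:
    """Create an empty shell item that later pipeline stages will fill in."""
    table.put_item(Item={"brief_id": brief_id, "asin": asin, "status": "PENDING"})

def add_section(brief_id: str, section: str, payload: dict) -> None:
    """Progressively inject a processed section (for example, review themes) into the shell."""
    table.update_item(
        Key={"brief_id": brief_id},
        UpdateExpression="SET #sec = :payload, #st = :status",
        ExpressionAttributeNames={"#sec": section, "#st": "status"},
        ExpressionAttributeValues={":payload": payload, ":status": "IN_PROGRESS"},
    )

# The shell is readable immediately, even before every section has landed.
create_brief_shell("brief-123", "B000EXAMPLE")
add_section("brief-123", "review_themes", {"positive": ["easy cleanup"], "negative": ["lid fit"]})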

The following diagram illustrates the pipeline workflow and architecture.

Scaling to handle 38 trillion data points

Processing over 38 trillion data points is no small feat, but Pattern has risen to the challenge with a sophisticated scaling strategy. At the core of this strategy is Amazon Elastic Container Service (Amazon ECS) with GPU support, which handles the computationally intensive tasks of natural language processing and data science. This setup allows Pattern to dynamically scale resources based on demand, providing optimal performance even during peak processing times.

To manage the complex flow of data between various AWS services, Pattern employs Apache Airflow. This orchestration tool manages the intricate dance of data with a primary DAG, creating and managing numerous sub-DAGs as needed. This innovative use of Airflow allows Pattern to efficiently manage complex, interdependent data processing tasks at scale.
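
As a rough illustration of that orchestration pattern, a primary Airflow DAG can fan work out to per-batch child runs. This is an Airflow 2.x-style sketch, not Pattern's actual DAG code: the DAG IDs and batch list are placeholders, and TriggerDagRunOperator stands in for the sub-DAG mechanics described above.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

BRIEF_BATCHES = ["brief-batch-a", "brief-batch-b", "brief-batch-c"]  # placeholder batch IDs

with DAG(
    dag_id="content_brief_primary",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as primary_dag:

    def collect_pending_briefs():
        # Placeholder: in practice, query upstream storage for briefs awaiting processing.
        print(f"Queueing {len(BRIEF_BATCHES)} batches")

    collect = PythonOperator(task_id="collect_pending_briefs", python_callable=collect_pending_briefs)

    # Fan out: one child DAG run per batch; "content_brief_worker" is a separate, hypothetical DAG.
    for batch in BRIEF_BATCHES:
        collect >> TriggerDagRunOperator(
            task_id=f"trigger_{batch}",
            trigger_dag_id="content_brief_worker",
            conf={"batch_id": batch},
        )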

But scaling isn’t just about processing power—it’s also about efficiency. Pattern has implemented batching techniques in their AI model calls, resulting in up to 50% cost reduction for batched processing while maintaining high throughput. They’ve also implemented cross-region inference to improve scalability and reliability across different geographical areas.

To keep a watchful eye on their system’s performance, Pattern employs LLM observability techniques. They monitor AI model performance and behavior, enabling continuous system optimization and making sure that Content Brief is operating at peak efficiency.
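
A lightweight version of that observability can be as simple as wrapping each model call and emitting latency and token counts as structured logs. The sketch below uses the Bedrock Converse API and is illustrative only; it is not a description of Pattern's monitoring stack.

import json
import logging
import time

import boto3

logger = logging.getLogger("llm_observability")
bedrock = boto3.client("bedrock-runtime")

def observed_converse(model_id: str, prompt: str) -> str:
    """Call a Bedrock model and log latency plus token usage for later analysis."""
    start = time.perf_counter()
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    latency_ms = (time.perf_counter() - start) * 1000
    usage = response.get("usage", {})
    logger.info(json.dumps({
        "model_id": model_id,
        "latency_ms": round(latency_ms, 1),
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": usage.get("outputTokens"),
    }))
    return response["output"]["message"]["content"][0]["text"]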

Using Amazon Bedrock for AI-powered insights

A key component of Pattern’s Content Brief solution is Amazon Bedrock, which plays a pivotal role in their AI and machine learning (ML) capabilities. Pattern uses Amazon Bedrock to implement a flexible and secure large language model (LLM) strategy.

Model flexibility and optimization

Amazon Bedrock offers support for multiple foundation models (FMs), which allows Pattern to dynamically select the most appropriate model for each specific task. This flexibility is crucial for optimizing performance across various aspects of Content Brief:

  • Natural language processing – For analyzing product descriptions, Pattern uses models optimized for language understanding and generation.
  • Sentiment analysis – When processing customer reviews, Amazon Bedrock enables the use of models fine-tuned for sentiment classification.
  • Image analysis – Pattern currently uses Amazon Textract for extracting text from product images. However, Amazon Bedrock also offers advanced vision-language models that could potentially enhance image analysis capabilities in the future, such as detailed object recognition or visual sentiment analysis.

The ability to rapidly prototype on different LLMs is a key component of Pattern’s AI strategy. Amazon Bedrock offers quick access to a variety of cutting-edge models to facilitate this process, allowing Pattern to continuously evolve Content Brief and use the latest advancements in AI technology. Today, this allows the team to seamlessly integrate and use various state-of-the-art language models tailored to different tasks, including the new, cost-effective Amazon Nova models.
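
In practice, this kind of task-based model selection can be expressed as a small routing table over the Bedrock Converse API. The sketch below is illustrative: the task names and the task-to-model mapping are assumptions, and model ID availability varies by Region.

import boto3

bedrock = boto3.client("bedrock-runtime")

# Illustrative task-to-model routing table; actual choices and model IDs
# (and their Region availability) would be tuned per workload.
TASK_MODELS = {
    "listing_copy": "amazon.nova-pro-v1:0",
    "review_sentiment": "amazon.nova-lite-v1:0",
}

def run_task(task: str, prompt: str) -> str:
    """Pick the model registered for a task and invoke it through one unified API."""
    response = bedrock.converse(
        modelId=TASK_MODELS[task],
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

print(run_task("review_sentiment",
               "Classify the sentiment of this review as positive, negative, or mixed: ..."))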

Prompt engineering and efficiency

Pattern’s team has developed a sophisticated prompt engineering process, continually refining their prompts to optimize both quality and efficiency. Amazon Bedrock offers support for custom prompts, which allows Pattern to tailor the model’s behavior precisely to their needs, improving the accuracy and relevance of AI-generated insights.

Moreover, Amazon Bedrock offers efficient inference capabilities that help Pattern optimize token usage, reducing costs while maintaining high-quality outputs. This efficiency is crucial when processing the vast amounts of data required for comprehensive ecommerce analysis.

Security and data privacy

Pattern uses the built-in security features of Amazon Bedrock to uphold data protection and compliance. By employing AWS PrivateLink, data transfers between Pattern’s virtual private cloud (VPC) and Amazon Bedrock occur over private IP addresses, never traversing the public internet. This approach significantly enhances security by reducing exposure to potential threats.
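
For reference, an interface VPC endpoint for the Bedrock runtime can be created with a call like the following sketch; the VPC, subnet, and security group IDs are placeholders, and the endpoint service name should be verified for your Region.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder network identifiers; substitute your own VPC resources.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.bedrock-runtime",  # Bedrock runtime PrivateLink service
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)
print(response["VpcEndpoint"]["VpcEndpointId"])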

Furthermore, the Amazon Bedrock architecture makes sure that Pattern’s data remains within their AWS account throughout the inference process. This data isolation provides an additional layer of security and helps maintain compliance with data protection regulations.

“Amazon Bedrock’s flexibility is crucial in the ever-evolving landscape of AI, enabling Pattern to utilize the most effective and efficient models for their diverse ecommerce analysis needs. The service’s robust security features and data isolation capabilities give us peace of mind, knowing that our data and our clients’ information are protected throughout the AI inference process.”

– Jason Wells, CTO, Pattern

Building on Amazon Bedrock, Pattern has created a secure, flexible, and efficient AI-powered solution that continuously evolves to meet the dynamic needs of ecommerce optimization.

Conclusion

Pattern’s Content Brief demonstrates the power of AWS in revolutionizing data-driven solutions. By using services like Amazon Bedrock, DynamoDB, and Amazon ECS, Pattern processes over 38 trillion data points to deliver actionable insights, optimizing product listings across multiple services.

Inspired to build your own innovative, high-performance solution? Explore AWS’s suite of services at aws.amazon.com and discover how you can harness the cloud to bring your ideas to life. To learn more about how Content Brief could help your brand optimize its ecommerce presence, visit pattern.com.


About the Author

Parker Bradshaw is an Enterprise SA at AWS who focuses on storage and data technologies. He helps retail companies manage large data sets to boost customer experience and product quality. Parker is passionate about innovation and building technical communities. In his free time, he enjoys family activities and playing pickleball.

Read More

How to configure cross-account model deployment using Amazon Bedrock Custom Model Import

In enterprise environments, organizations often divide their AI operations into two specialized teams: an AI research team and a model hosting team. The research team is dedicated to developing and enhancing AI models using model training and fine-tuning techniques. Meanwhile, a separate hosting team is responsible for deploying these models across their own development, staging, and production environments.

With Amazon Bedrock Custom Model Import, the hosting team can import and serve custom models built on supported architectures such as Meta Llama 2, Llama 3, and Mistral, with On-Demand pricing. Teams can import models with weights in Hugging Face safetensors format from Amazon SageMaker or from Amazon Simple Storage Service (Amazon S3). These imported custom models work alongside existing Amazon Bedrock foundation models (FMs) through a single, unified API in a serverless manner, alleviating the need to manage model deployment and scaling.

However, in such enterprise environments, these teams often work in separate AWS accounts for security and operational reasons. The model development team’s training results, known as model artifacts, for example model weights, are typically stored in S3 buckets within the research team’s AWS account, but the hosting team needs to access these artifacts from another account to deploy models. This creates a challenge: how do you securely share model artifacts between accounts?

This is where cross-account access becomes important. With Amazon Bedrock Custom Model Import cross-account support, we can help you configure direct access between the S3 buckets storing model artifacts and the hosting account. This streamlines your operational workflow while maintaining security boundaries between teams. As one of our customers notes:

Bedrock Custom Model Import cross-account support helped AI Platform team to simplify the configuration, reduce operational overhead and secure models in the original location.

– Scott Chang, Principal Engineer, AI Platform at Salesforce

In this guide, we walk you through step-by-step instructions for configuring cross-account access for Amazon Bedrock Custom Model Import, covering both non-encrypted and AWS Key Management Service (AWS KMS) based encrypted scenarios.

Example scenario

For this walkthrough, consider two AWS accounts:

  • Model Development account (111122223333):
    • Stores model artifacts (custom weights and configurations) in an S3 bucket called model-artifacts-111122223333
    • Optionally encrypts artifacts using AWS KMS customer managed key kms-cmk-111122223333
  • Model Hosting account (777788889999):
    • Hosts models using Amazon Bedrock Custom Model Import
    • Uses a new AWS Identity and Access Management (IAM) execution role BedrockCMIExecutionRole-777788889999
    • Can optionally encrypt artifacts using AWS KMS key kms-cmk-777788889999

The following figure illustrates this setup, showing how the cross-account access is configured between the S3 bucket, KMS keys, and Amazon Bedrock Custom Model Import.

To successfully implement the described scenario while adhering to the principle of least privilege access, the following steps must be executed:

  1. The Model Development account must provide access to the Model Hosting account’s IAM role BedrockCMIExecutionRole-777788889999, allowing it to utilize their S3 bucket and, if applicable, the encryption key, using resource-based policies.
  2. The Model Hosting account should establish an IAM role, such as BedrockCMIExecutionRole-777788889999. The identity-based policies needed would be for the Model Development S3 bucket and customer managed keys for decrypting model artifacts, like using kms-cmk-111122223333.
  3. The Model Hosting account must enable the Amazon Bedrock service to assume the IAM role BedrockCMIExecutionRole-777788889999, created in step 2, by including the Amazon Bedrock service as a trusted entity. This IAM role will be utilized by the Model Hosting account to initiate the custom model import job.

Prerequisites

Before you can start a custom model import job, you need to fulfill the following prerequisites:

  1. If you’re importing your model from an S3 bucket, prepare your model files in the Hugging Face weights format. For more information refer to Import source.
  2. (Optional) Set up extra security configurations.

Step-by-step execution

The following section provides the step-by-step execution of the previously outlined high-level process, from the perspective of an administrator managing both accounts:

Step 1: Set up the S3 bucket policy (in the Model Development account) to enable access for the Model Hosting account’s IAM role:

  1. Sign in to the AWS Management Console for account 111122223333, then access the Amazon S3 console.
  2. On the General purpose buckets view, locate model-artifacts-111122223333, the bucket used by the model development team to store their model artifacts.
  3. On the Permissions tab, select Edit in the Bucket policy section, and insert the following IAM resource-based policy. Be sure to update the AWS account IDs (shown in red) in the policy with your information.
    {
        "Version": "2012-10-17",
        "Id": "AllowCrossAccountS3Access",
        "Statement": [
            {
                "Sid": "cross-account-list-get",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::777788889999:root"
                },
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::model-artifacts-111122223333",
                    "arn:aws:s3:::model-artifacts-111122223333/*"
                ],
                "Condition": {
                    "ArnLike": {
                        "aws:PrincipalArn": "arn:aws:iam::777788889999:role/BedrockCMIExecutionRole-777788889999*"
                    }
                }
            }
        ]
    }

Step 2: Establish an IAM role (in the Model Hosting account) and authorize Amazon Bedrock to assume this role:

  1. Sign in to the AWS console for account 777788889999 and launch the IAM console.
  2. In the left navigation pane, select Policies and then choose Create policy. Within the Policy Editor, switch to the JSON tab and insert the following identity-based policy. This policy is designed for read-only access, enabling users or a role to list and download objects from a specified S3 bucket, but only if the bucket is owned by account 111122223333. Customize the AWS account ID and S3 bucket name/prefix (shown in red) with your information.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "1",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::model-artifacts-111122223333",
                    "arn:aws:s3:::model-artifacts-111122223333/*"
                ],
                "Condition": {
                    "StringEquals": {
                        "aws:ResourceAccount": "111122223333"
                    }
                }
            }
        ]
    }

  3. Choose Next, assign the policy name as BedrockCMIExecutionPolicy-777788889999, and finalize by choosing Create policy.
  4. In the left navigation pane, choose Roles and select Custom trust policy as the Trusted entity type. Insert the following trusted entity policy, which restricts the role assumption to the Amazon Bedrock service, specifically for model import jobs in account 777788889999 located in the US East (N. Virginia) us-east-1 Region. Modify the AWS account ID and Region (shown in red) with your information.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "1",
                "Effect": "Allow",
                "Principal": {
                    "Service": "bedrock.amazonaws.com"
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {
                        "aws:SourceAccount": "777788889999"
                    },
                    "ArnEquals": {
                        "aws:SourceArn": "arn:aws:bedrock:us-east-1:777788889999:model-import-job/*"
                    }
                }
            }
        ]
    }

  5. Choose Next and in the Add permissions section, search for the policy created in the previous step BedrockCMIExecutionPolicy-777788889999, select the checkbox, and proceed by choosing Next.
  6. Assign the Role name as BedrockCMIExecutionRole-777788889999, provide a Description as “IAM execution role to be used by CMI jobs,” and finalize by choosing Create role.

Important: If you’re using an AWS KMS encryption key for model artifacts in the Model Development account or for imported model artifacts with the Amazon Bedrock managed AWS account, proceed with steps 3 through 5. If not, skip to step 6.

Step 3: Adjust the AWS KMS key policy (in the Model Development account) to allow the Amazon Bedrock CMI execution IAM role to decrypt model artifacts:

  1. Transition back to the Model Development account and find the AWS KMS key named kms-cmk-111122223333 in the AWS KMS console. Note the AWS KMS key Amazon Resource Name (ARN).
  2. On the Key policy tab, switch to the Policy view, and incorporate the following resource-based policy statement to enable the Model Hosting account’s IAM role BedrockCMIExecutionRole-777788889999 to decrypt model artifacts. Revise items in red with your information.
    {
        "Sid": "Allow use of the key by the destination account",
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::777788889999:role/BedrockCMIExecutionRole-777788889999"
        },
        "Action": [
            "kms:Decrypt",
            "kms:DescribeKey"
        ],
        "Resource": "*"
    }

Step 4: Set the AWS KMS key policy (in the Model Hosting account) for the CMI execution IAM role to encrypt and decrypt model artifacts to securely store in the Amazon Bedrock AWS account:

  1. Return to the Model Hosting account and locate the AWS KMS key named kms-cmk-777788889999 in the AWS KMS console. Note the AWS KMS key ARN.
  2. Insert the following statement into the AWS KMS key’s resource-based policy to enable the BedrockCMIExecutionRole-777788889999 IAM role to encrypt and decrypt model artifacts at rest in the Amazon Bedrock managed AWS account. Revise items in red with your information.
    {
        "Sid": "Allow use of the key",
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::777788889999:role/BedrockCMIExecutionRole-777788889999"
        },
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey"
        ],
        "Resource": "*"
    }

Step 5: Modify the CMI execution role’s permissions (in the Model Hosting account) to provide access to encryption keys:

Access the IAM console and find the IAM policy BedrockCMIExecutionPolicy-777788889999. Append the following statements to the existing identity-based policy (replace the ARNs in red with the key ARNs noted in steps 3 and 4):

{
    "Effect": "Allow",
    "Action": [
        "kms:Decrypt",
        "kms:DescribeKey"
    ],
    "Resource": "arn:aws:kms:us-east-1:111122223333:key/b5b6e052-fb27-4dbb-bf0d-daf3375a9fda"
},
{
    "Effect": "Allow",
    "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
    ],
    "Resource": "arn:aws:kms:us-east-1:777788889999:key/6cd5d3bf-3d9b-4d1c-83d5-8df6284435a1"
}

Step 6: Initiate the Model import job (in the Model Hosting account)

In this step, we execute the model import job using the AWS Command Line Interface (AWS CLI) command. You can also use AWS SDKs or APIs for the same purpose. Run the following command from your terminal session with an IAM user or role that has the necessary privileges to create a custom model import job. You don’t need to explicitly provide an ARN or details of the CMK used by the Model Development team.

aws bedrock create-model-import-job \
    --job-name "cmi-job-777788889999-01" \
    --imported-model-name "mistral-777788889999-01" \
    --role-arn "arn:aws:iam::777788889999:role/BedrockCMIExecutionRole-777788889999" \
    --model-data-source 's3DataSource={s3Uri=s3://model-artifacts-111122223333/mistral-model-weights/}'
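
If you prefer the SDK route mentioned above, the equivalent boto3 call looks roughly like the following sketch; the values mirror the example scenario, and parameter details should be confirmed against the current boto3 documentation.

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# boto3 equivalent of the CLI call above; values mirror the example accounts and bucket.
response = bedrock.create_model_import_job(
    jobName="cmi-job-777788889999-01",
    importedModelName="mistral-777788889999-01",
    roleArn="arn:aws:iam::777788889999:role/BedrockCMIExecutionRole-777788889999",
    modelDataSource={
        "s3DataSource": {
            "s3Uri": "s3://model-artifacts-111122223333/mistral-model-weights/"
        }
    },
)
print(response["jobArn"])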

When encrypting model artifacts with Amazon Bedrock Custom Model Import, use the --imported-model-kms-key-id flag and specify the ARN of the Model Hosting account’s customer managed key.

aws bedrock create-model-import-job \
    --job-name "cmi-job-777788889999-04" \
    --imported-model-name "mistral-777788889999-01" \
    --role-arn "arn:aws:iam::777788889999:role/BedrockCMIExecutionRole-777788889999" \
    --model-data-source 's3DataSource={s3Uri=s3://model-artifacts-111122223333/mistral-model-weights/}' \
    --imported-model-kms-key-id "arn:aws:kms:us-east-1:777788889999:key/6cd5d3bf-3d9b-4d1c-83d5-8df6284435a1"

Cross-account access to the S3 bucket using the custom model import job is only supported through AWS CLI, AWS SDKs, or APIs. Console support is not yet available.

Troubleshooting

When IAM policy misconfigurations prevent a custom model import job, you might encounter an error like:

Amazon Bedrock does not have access to the S3 location (s3://model-artifacts-111122223333/mistral-model-weights). Update the permissions and try again.

To resolve this, manually verify access to Model Development’s S3 bucket from the Model Hosting account by assuming the BedrockCMIExecutionRole-777788889999. Follow these steps:

Step 1: Identify the current IAM role or user in the CLI with the following and copy the ARN from the output:

aws sts get-caller-identity

Step 2: Update trust relationships. Append the following statement to the trust policy of BedrockCMIExecutionRole-777788889999 to allow the current user or IAM role to assume this role:

{
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:sts::777788889999:role/current-user-role"
    },
    "Action": "sts:AssumeRole"
}

Step 3: List or copy the S3 bucket contents assuming the Amazon Bedrock Custom Model Import execution role

  1. Assume the CMI execution role (replace the ARN with your information):
    aws sts assume-role \
        --role-arn "arn:aws:iam::777788889999:role/BedrockCMIExecutionRole-777788889999" \
        --role-session-name "BedrockCMISession"

  2. Export the returned temporary credentials as environment variables:
    export AWS_ACCESS_KEY_ID="ASIA..."
    export AWS_SECRET_ACCESS_KEY="..."
    export AWS_SESSION_TOKEN="..."

  3. Run commands to troubleshoot permission issues:
    aws s3 ls s3://model-artifacts-111122223333/mistral-model-weights/
    aws s3 cp s3://model-artifacts-111122223333/mistral-model-weights/config.json . 

If errors persist, consider using Amazon Q Developer or refer to additional resources outlined in the IAM User Guide.

Cleanup

There is no additional charge to import a custom model to Amazon Bedrock (refer to step 6 in the Step-by-step execution section). However, if your model isn’t in use for inference, and you want to avoid paying storage costs (refer to Amazon Bedrock pricing), delete the imported model using the AWS console or AWS CLI reference or API Reference. For example (replace the text in red with your imported model name):

aws bedrock delete-imported-model \
    --model-identifier "mistral-777788889999-01"

Conclusion

By using cross-account access in Amazon Bedrock Custom Model Import, organizations can significantly streamline their AI model deployment workflows.

Amazon Bedrock Custom Model Import is generally available today in Amazon Bedrock in the US East (N. Virginia) us-east-1 and US West (Oregon) us-west-2 AWS Regions. Refer to the full Region list for future updates. To learn more, refer to the Amazon Bedrock Custom Model Import product page and Amazon Bedrock pricing page. Give Amazon Bedrock Custom Model Import a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

Thank you to our contributors Scott Chang (Salesforce), Raghav Tanaji (Salesforce), Rupinder Grewal (AWS), Ishan Singh (AWS), and Dharinee Gupta (AWS).


About the Authors

Hrushikesh Gangur is a Principal Solutions Architect at AWS. Based in San Francisco, California, Hrushikesh is an expert in AWS machine learning. As a thought leader in the field of generative AI, Hrushikesh has contributed to AWS’s efforts in helping startups and ISVs build and deploy AI applications. His expertise extends to various AWS services, including Amazon SageMaker, Amazon Bedrock, and accelerated computing which are crucial for building AI applications.

Sai Darahas Akkineni is a Software Development Engineer at AWS. He holds a master’s degree in Computer Engineering from Cornell University, where he worked in the Autonomous Systems Lab with a specialization in computer vision and robot perception. Currently, he helps deploy large language models to optimize throughput and latency.

Prashant Patel is a Senior Software Development Engineer in AWS. He’s passionate about scaling large language models for enterprise applications. Prior to joining AWS, he worked at IBM on productionizing large-scale AI/ML workloads on Kubernetes. Prashant has a master’s degree from NYU Tandon School of Engineering. While not at work, he enjoys traveling and playing with his dogs.

Read More

ByteDance processes billions of daily videos using their multimodal video understanding models on AWS Inferentia2

This is a guest post authored by the team at ByteDance.

ByteDance is a technology company that operates a range of content platforms to inform, educate, entertain, and inspire people across languages, cultures, and geographies. Users trust and enjoy our content platforms because of the rich, intuitive, and safe experiences they provide. These experiences are made possible by our machine learning (ML) backend engine, with ML models built for video understanding, search, recommendation, advertising, and novel visual effects.

In support of its mission to “Inspire Creativity and Enrich Life,” we’ve made it straightforward and fun for people to engage with, create, and consume content. People can also discover and transact with a suite of more than a dozen products and services, such as CapCut, e-Shop, Lark, Pico, and Mobile Legends: Bang Bang.

At ByteDance, we collaborated with Amazon Web Services (AWS) to deploy multimodal large language models (LLMs) for video understanding using AWS Inferentia2 across multiple AWS Regions around the world. By using sophisticated ML algorithms, the platform efficiently scans billions of videos each day. We use this process to identify and flag content that violates community guidelines, enabling a better experience for all users. By using Amazon EC2 Inf2 instances for these video understanding workloads, we were able to cut the inference cost by half.

In this post, we discuss the use of multimodal LLMs for video understanding, the solution architecture, and techniques for performance optimization.

Overcoming video understanding hurdles with multimodal LLMs

Multimodal LLMs enable a richer understanding of the world by accepting various forms of digital content as inputs, greatly increasing the range of useful applications we can now build. The need for AI systems capable of processing various content forms has become increasingly apparent. Multimodal LLMs have risen to meet this challenge by taking in multiple data modalities, including text, images, audio, and video (refer to the following diagram), which allows for a fuller understanding of content, mimicking human perception of and interaction with the world. The enhanced capabilities of these models are evident in their performance, which far surpasses that of traditional models in tasks ranging from sophisticated virtual assistants to advanced content creation. By expanding the boundaries of AI capabilities and paving the way for more natural and intuitive interactions with technology, these models aren’t just improving existing applications but opening doors to entirely new possibilities in the realm of AI and user experience.

In our operations, the implementation of multimodal LLMs for video understanding represents a significant shift in thinking about AI-driven content analysis. This innovation addresses the daily challenge of processing billions of videos, overcoming the efficiency limits of traditional AI models. We’ve developed our own multimodal LLM architecture, designed to achieve state-of-the-art performance across single-image, multi-image, and video applications. Unlike traditional ML models, this new generative AI–enabled system integrates multiple input streams into a unified representational space. Cross-modal attention mechanisms facilitate information exchange between modalities, and fusion layers combine representations from different modalities. The decoder then generates output based on the fused multimodal representation, enabling a more nuanced and context-aware analysis of content.
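
To make the cross-modal attention and fusion idea concrete, here is a toy PyTorch sketch of that pattern; it is purely illustrative, with arbitrary dimensions, and is not our production architecture.

import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Toy cross-modal attention plus fusion block: text queries attend over video frames."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Sequential(
            nn.Linear(2 * d_model, d_model),
            nn.GELU(),
            nn.LayerNorm(d_model),
        )

    def forward(self, text_tokens: torch.Tensor, frame_tokens: torch.Tensor) -> torch.Tensor:
        # Text tokens query the visual stream; output stays aligned to the text sequence.
        attended, _ = self.cross_attn(query=text_tokens, key=frame_tokens, value=frame_tokens)
        # Fuse the original text representation with what it gathered from the video.
        return self.fuse(torch.cat([text_tokens, attended], dim=-1))

# Example: batch of 2, 16 text tokens and 64 frame patches, all projected to 512 dimensions.
text = torch.randn(2, 16, 512)
frames = torch.randn(2, 64, 512)
fused = CrossModalFusion()(text, frames)  # shape: (2, 16, 512)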

Solution overview

We’ve collaborated with AWS since the first generation of Inferentia chips. Our video understanding department has been committed to finding more cost-efficient solutions that deliver higher performance to better meet ever-growing business needs. During this period, we found that AWS has been continually inventing and adding features and capabilities to its AWS Neuron software development kit (SDK), the software enabling high-performance workloads on the Inferentia chips. The popular Meta Llama and Mistral models were well supported with high performance on Inferentia2 shortly after their open source release. Therefore, we began to evaluate the Inferentia2 based solution, illustrated in the following diagram.

We made the strategic decision to deploy a fine-tuned, mid-sized LLM on Inferentia2 to provide a performant and cost-effective solution capable of processing billions of videos daily. The process was a comprehensive effort aimed at optimizing end-to-end response time for our video understanding workload. The team explored a wide range of parameters, including tensor parallel sizes, compile configurations, sequence lengths, and batch sizes. We employed various parallelization techniques, such as multi-threading and model replication (for non-LLM models) across multiple NeuronCores. Through these optimizations, which included parallelizing sequence steps, reusing devices, and using auto-benchmark and profiling tools, we achieved a substantial performance boost, maintaining our position at the forefront of industry performance standards.

We used tensor parallelism to effectively distribute and scale the model across multiple accelerators in an Inf2 instance. We used static batching, which improved the latency and throughput of our models by making sure that data is processed in uniform, fixed-size batches during inference. Using repeated n-grams filtering significantly improved the quality of automatically generated text and reduced inference time. Quantizing the weights of the multimodal model from FP16/BF16 to INT8 format allowed it to run more efficiently on Inferentia2 with less device memory usage, without compromising on accuracy. Using these techniques and model serialization, we optimized throughput on the inf2.48xlarge instance by maximizing the batch size while keeping each model small enough to fit on a single accelerator, so we could deploy multiple model replicas on the same instance. This comprehensive optimization strategy helped us meet our latency requirements while providing optimal throughput and cost reduction. Notably, the Inferentia2-based solution cut the cost by half compared to comparable Amazon Elastic Compute Cloud (Amazon EC2) instances, highlighting the significant economic advantages of using Inferentia2 chips for large-scale video understanding tasks.
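
The non-LLM replication piece can be sketched with the Neuron SDK's torch_neuronx tracing API: compile a model once against an example input, then load multiple copies at serving time. This is a generic illustration of the workflow with a stand-in encoder, not our deployment code; NeuronCore placement is governed by the Neuron runtime (for example, via NEURON_RT environment variables).

import torch
import torch_neuronx
import torchvision.models as models

# Compile a stand-in vision encoder for Inferentia2 against an example input.
model = models.resnet50(weights=None).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch_neuronx.trace(model, example)
traced.save("encoder_neuron.pt")

# At serving time, load one replica per NeuronCore and spread requests across them
# (for example, one worker thread per replica).
replicas = [torch.jit.load("encoder_neuron.pt") for _ in range(2)]
with torch.no_grad():
    output = replicas[0](example)
print(output.shape)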

The following diagram shows how we deploy our LLM container on Amazon EC2 Inf2 instances using Neuron.

In summary, our collaboration with AWS has revolutionized video understanding, setting new industry standards for efficiency and accuracy. The multimodal LLM’s ability to adapt to global market demands and its scalable performance on Inferentia2 chips underscore the profound impact of this technology in safeguarding the platform’s global community.

Future plans

Looking further ahead, the development of a unified multimodal LLM represents an important shift in video understanding technology. This ambitious project aims to create a universal content tokenizer capable of processing all content types and aligning them within a common semantic space. After it’s tokenized, the content will be analyzed by advanced large models, generating appropriate content understanding outputs regardless of the original format (as shown in the following diagram). This unified approach can streamline the content understanding process, potentially improving both efficiency and consistency across diverse content types.

For additional learning, refer to the paper The Evolution of Multimodal Model Architectures.

The implementation of this comprehensive strategy sets new benchmarks in video understanding technology, striking a balance between accuracy, speed, and cultural sensitivity in an increasingly complex digital ecosystem. This forward-looking approach not only addresses current challenges in video understanding but also positions the system at the forefront of AI-driven content analysis and management for the foreseeable future.

By using cutting-edge AI techniques and a holistic approach to content understanding, this next-generation content understanding system aims to set new industry standards, providing safer and more inclusive online environments while adapting to the ever-evolving landscape of digital communication. At the same time, AWS is investing in next-generation AI chips such as AWS Trainium2, which will continue to push the performance boundaries while keeping costs under control. At ByteDance, we’re planning to test out this new generation of AWS AI chips and adopt them appropriately as the models and workloads continue to evolve.

Conclusion

The collaboration between ByteDance and AWS has revolutionized video understanding through the deployment of multimodal LLMs on Inferentia2 chips. This partnership has yielded remarkable results: the ability to process billions of videos daily, significant cost reductions, and higher performance than comparable EC2 instances.

As ByteDance continues to innovate with projects such as the unified multimodal large model, we remain committed to pushing the boundaries of AI-driven content analysis. Our goal is to make sure our platforms remain safe, inclusive, and creative spaces for our global community, setting new industry standards for efficient video understanding.

To learn more about Inf2 instances, refer to Amazon EC2 Inf2 Architecture.


About the Authors

Wangpeng An, Principal Algorithm Engineer at TikTok, specializes in multimodal LLMs for video understanding, advertising, and recommendations. He has led key projects in model acceleration, content moderation, and Ads LLM pipelines, enhancing TikTok’s real-time machine learning systems.

Haotian Zhang is a Tech Lead MLE at TikTok, specializing in content understanding, search, and recommendation. He received an ML PhD from University of Waterloo. At TikTok, he leads a group of engineers to improve the efficiency, robustness, and effectiveness of training and inference for LLMs and multimodal LLMs, especially for large distributed ML systems.

Xiaojie Ding is a senior engineer at TikTok, focusing on content moderation system development, model resource and deployment optimization, and algorithm engineering stability construction. In his free time, he likes to play single-player games.

Nachuan Yang is a senior engineer at TikTok, focusing on content security and moderation. He has successively been engaged in the construction of moderation systems, model applications, and deployment and performance optimization.

Kairong Sun is a Senior SRE on the AML Team at ByteDance. His role focuses on maintaining the seamless operation and efficient allocation of resources within the cluster, specializing in cluster machine maintenance and resource optimization.

The authors would like to thank other ByteDance and AWS team members for their contributions: Xi Dai, Kaili Zhao, Zhixin Zhang, Jin Ye, and Yann Xia from ByteDance; Jia Dong, Bingyang Huang, Kamran Khan, Shruti Koparkar, and Diwakar Bansal from AWS.

Read More

Explore How RTX AI PCs and Workstations Supercharge AI Development at NVIDIA GTC 2025

Generative AI is redefining computing, unlocking new ways to build, train and optimize AI models on PCs and workstations. From content creation and large and small language models to software development, AI-powered PCs and workstations are transforming workflows and enhancing productivity.

At GTC 2025, running March 17–21 in the San Jose Convention Center, experts from across the AI ecosystem will share insights on deploying AI locally, optimizing models and harnessing cutting-edge hardware and software to enhance AI workloads — highlighting key advancements in RTX AI PCs and workstations.

Develop and Deploy on RTX

RTX GPUs are built with specialized AI hardware called Tensor Cores that provide the compute performance needed to run the latest and most demanding AI models. These high-performance GPUs can help build digital humans, chatbots, AI-generated podcasts and more.

With more than 100 million GeForce RTX and NVIDIA RTX™ GPU users, developers have a large audience to target when new AI apps and features are deployed. In the session “Build Digital Humans, Chatbots, and AI-Generated Podcasts for RTX PCs and Workstations,” Annamalai Chockalingam, senior product manager at NVIDIA, will showcase the end-to-end suite of tools developers can use to streamline development and deploy incredibly fast AI-enabled applications.

Model Behavior

Large language models (LLMs) can be used for an abundance of use cases — and scale to tackle complex tasks like writing code or translating Japanese into Greek. But since they’re typically trained with a wide spectrum of knowledge for broad applications, they may not be the right fit for specific tasks, like nonplayer character dialog generation in a video game. In contrast, small language models balance capability with a reduced footprint, maintaining accuracy while running locally on more devices.

In the session “Watch Your Language: Create Small Language Models That Run On-Device,” Oluwatobi Olabiyi, senior engineering manager at NVIDIA, will present tools and techniques that developers and enthusiasts can use to generate, curate and distill a dataset — then train a small language model that can perform tasks designed for it.

Maximizing AI Performance on Windows Workstations

Optimizing AI inference and model execution on Windows-based workstations requires strategic software and hardware tuning due to diverse hardware configurations and software environments. The session “Optimizing AI Workloads on Windows Workstations: Strategies and Best Practices” will explore best practices for AI optimization, including model quantization, inference pipeline enhancements and hardware-aware tuning.

A team of NVIDIA software engineers will also cover hardware-aware optimizations for ONNX Runtime, NVIDIA TensorRT and llama.cpp, helping developers maximize AI efficiency across GPUs, CPUs and NPUs.
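To make those techniques more concrete, below is a minimal sketch, assuming a model already exported to ONNX at model.onnx, of dynamic INT8 quantization and hardware-aware execution-provider selection with ONNX Runtime. The file names and provider order are illustrative choices, not details taken from the session.

```python
# Minimal sketch: dynamic INT8 quantization plus hardware-aware provider
# selection with ONNX Runtime. File names and provider order are
# illustrative assumptions, not details from the GTC session.
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize weights to INT8 to shrink the model and speed up CPU inference.
quantize_dynamic("model.onnx", "model.int8.onnx", weight_type=QuantType.QInt8)

# Prefer the GPU execution provider when available, falling back to CPU.
session = ort.InferenceSession(
    "model.int8.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which providers were actually loaded
```

The same pattern extends to other backends: swapping the provider list is how ONNX Runtime targets different accelerators without changing the model code.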

Advancing Local AI Development

Building, testing and deploying AI models on local infrastructure ensures security and performance even without a connection to cloud-based services. Accelerated with NVIDIA RTX GPUs, Z by HP’s AI solutions provide the tools needed to develop AI on premises while maintaining control over data and IP.

Learn more by attending the following sessions:

  • Dell Pro Max and NVIDIA: Unleashing the Future of AI Development: This session introduces Dell Pro Max PCs, performance laptops and desktops for professionals, powered by NVIDIA RTX GPUs. Discover how this powerful duo can help jumpstart AI initiatives and transform the way AI developers, data scientists, creators and power users innovate.
  • Develop and Observe Gen AI On-Prem With Z by HP GenAI Lab and AI Studio: This session demonstrates how Z by HP solutions simplify local model training and deployment, harnessing models in the NVIDIA NGC catalog and Galileo evaluation technology to refine generative AI projects securely and efficiently.
  • Supercharge Gen AI Development With Z by HP GenAI Lab and AI Studio: This session explores how Z by HP’s GenAI Lab and AI Studio enable on-premises LLM development while maintaining complete data security and control. Learn how these tools streamline the entire AI lifecycle, from experimentation to deployment, while integrating models available in the NVIDIA NGC catalog for collaboration and workflow efficiency.

Developers and enthusiasts can get started with AI development on RTX AI PCs and workstations using NVIDIA NIM microservices. Rolling out today, the initial public beta release includes the Llama 3.1 LLM, NVIDIA Riva Parakeet for automatic speech recognition (ASR), and YOLOX for computer vision.

NIM microservices are optimized, prepackaged models for generative AI. They span modalities important for PC development, and are easy to download and connect to via industry-standard application programming interfaces.
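As a hedged illustration of what “industry-standard application programming interfaces” means in practice, the sketch below calls a locally running NIM LLM microservice through an OpenAI-compatible endpoint. The base URL, port and model identifier are assumptions for illustration rather than documented values.

```python
# Hypothetical sketch of calling a locally running NIM LLM microservice
# through its OpenAI-compatible endpoint. The base URL, port, and model
# identifier below are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize what a NIM microservice is."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```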

Attend GTC 2025

From the keynote by NVIDIA founder and CEO Jensen Huang to over 1,000 inspiring sessions, 300+ exhibits, technical hands-on training and tons of unique networking events — GTC is set to put a spotlight on AI and all its benefits.

Follow NVIDIA AI PC on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX AI PC newsletter.

Read More

Magma: A foundation model for multimodal AI agents across digital and physical worlds

Magma: A foundation model for multimodal AI agents across digital and physical worlds


Imagine an AI system capable of guiding a robot to manipulate physical objects as effortlessly as it navigates software menus. Such seamless integration of digital and physical tasks has long been the stuff of science fiction.  

Today, Microsoft researchers are bringing that vision closer to reality with Magma, a multimodal AI foundation model designed to process information and generate action proposals across both digital and physical environments. Magma enables AI agents to interpret user interfaces and suggest actions like button clicks, while also orchestrating robotic movements and interactions in the physical world.

Built on the foundation model paradigm, Magma is pretrained on an expansive and diverse dataset, allowing it to generalize better across tasks and environments than smaller, task-specific models. As illustrated in Figure 1, Magma synthesizes visual and textual inputs to generate meaningful actions—whether executing a command in software or grabbing a tool in the physical world. This new model represents a significant step toward AI agents that can serve as versatile, general-purpose assistants. 

Figure 1: Magma is one of the first foundation models that is capable of interpreting and grounding multimodal inputs within both digital and physical environments. Given a described goal, Magma can formulate plans and execute actions to achieve it. By effectively transferring knowledge from freely available visual and language data, Magma bridges verbal, spatial and temporal intelligence to navigate complex tasks and settings.

Vision-Language-Action (VLA) models integrate visual perception, language comprehension, and action reasoning to enable AI systems to interpret images, process textual instructions, and propose actions. These models bridge the gap between multimodal understanding and real-world interaction. Typically pretrained on large amounts of VLA data, they acquire the ability to understand visual content, process language, and perceive and interact with the spatial world, allowing them to perform a wide range of tasks. However, because digital and physical environments differ dramatically, separate VLA models are trained and used for each environment. As a result, these models struggle to generalize to new tasks and environments outside of their training data. Moreover, most of these models do not leverage pretrained vision-language (VL) models or diverse VL datasets, which hampers their understanding of VL relations and their generalizability.

Magma, to the best of our knowledge, is one of the first VLA foundation models that can adapt to new tasks in both digital and physical environments, which helps AI-powered assistants or robots understand their surroundings and suggest appropriate actions. For example, it could enable a home assistant robot to learn how to organize a new type of object it has never encountered or help a virtual assistant generate step-by-step user interface navigation instructions for an unfamiliar task. Through Magma, we demonstrate the advantages of pretraining a single VLA model for AI agents across multiple environments while still achieving state-of-the-art results on user interface navigation and robotic manipulation tasks, outperforming previous models that are tailored to these specific domains. On VL tasks, Magma also compares favorably to popular VL models that are trained on much larger datasets.

Building a foundation model that spans such different modalities has required us to rethink how we train and supervise AI agents. Magma introduces a novel training paradigm centered on two key innovations: Set-of-Mark (SoM) and Trace-of-Mark (ToM) annotations. These techniques, developed by Microsoft Research, imbue the model with a structured understanding of tasks in both user interface navigation and robotic manipulation domains.

  • Set-of-Mark (SoM): SoM is an annotated set of key objects or interface elements relevant to achieving a given goal. For example, if the task is to navigate a web page, the SoM includes the bounding boxes for all clickable user interface elements. In a physical task like setting a table, the SoM could include the plate, the cup, and the position of each item on the table. By providing SoM, we give Magma a high-level hint of “what needs attention”—the essential elements of the task—without yet specifying the order or method.
Figure 2: Set-of-Mark (SoM) for Action Grounding. Set-of-Mark prompting enables effective action grounding in images for UI screenshots (left), robot manipulation (middle), and human video (right) by having the model predict numeric marks for clickable buttons or robot arms in image space. These marks give Magma a high-level hint of “what needs attention” – the essential elements of the task.
  • Trace-of-Mark (ToM): With ToM, we extend the strategy of “overlaying marks” from static images to dynamic videos by incorporating tracing lines that follow object movements over time. While SoM highlights key objects or interface elements relevant to a task, ToM captures how these elements change or move throughout an interaction. For example, in a physical task like moving an object on a table, ToM might illustrate the motion of a hand placing the object and adjusting its position. By providing these temporal traces, ToM offers Magma a richer understanding of how actions unfold, complementing SoM’s focus on what needs attention. A hypothetical sketch of how SoM and ToM annotations might be represented in code follows Figure 3 below.
Figure 3: Trace-of-Mark (ToM) for Action Planning. Trace-of-Mark supervisions for robot manipulation (left) and human action (right). It compels the model to comprehend temporal video dynamics and anticipate future states before acting, while using fewer tokens than next-frame prediction to capture longer temporal horizons and action-related dynamics without ambient distractions. 
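To make the annotation scheme concrete, here is a minimal, hypothetical sketch of how SoM marks and ToM traces might be represented in code. The data structures and field names are illustrative only, not the actual Magma training format.

```python
# Hypothetical illustration of Set-of-Mark (SoM) and Trace-of-Mark (ToM)
# annotations. Field names and structures are invented for clarity; they
# are not the actual Magma training format.
from dataclasses import dataclass

@dataclass
class Mark:
    mark_id: int                              # numeric mark the model predicts
    label: str                                # e.g. "Submit button" or "robot gripper"
    bbox: tuple[float, float, float, float]   # (x0, y0, x1, y1) in image space

@dataclass
class Trace:
    mark_id: int                              # which SoM mark this trace follows
    points: list[tuple[float, float]]         # (x, y) positions over future frames

# SoM: "what needs attention" in a single frame (UI screenshot or robot view).
som = [
    Mark(1, "Search box", (120.0, 40.0, 480.0, 80.0)),
    Mark(2, "Submit button", (500.0, 40.0, 580.0, 80.0)),
]

# ToM: how a marked element moves across subsequent video frames.
tom = [Trace(2, [(540.0, 60.0), (538.0, 61.0), (535.0, 63.0)])]
```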

Performance and evaluation

Zero-shot agentic intelligence

Table 1: Zero-shot evaluation on agentic intelligence. We report the results for pretrained Magma without any domain-specific finetuning. In this experiment, Magma is the only model that can conduct the full task spectrum.
Figure 4: Zero-shot evaluation on Google Robots and Bridge with SimplerEnv. Magma shows strong zero-shot cross-domain robustness and demonstrates impressive results in cross-embodiment manipulation simulation tasks.

Efficient finetuning

Table 2: Efficient finetuning on Mind2Web for web UI navigation.
Figure 5: Few-shot finetuning on Widow-X robot (left) and LIBERO (right). Magma achieves a significantly higher average success rate in all task suites. Additionally, removing SoM and ToM during pretraining has a negative impact on model performance.
Table 3: Without task-specific data, Magma performs competitively and even outperforms some state-of-the-art approaches such as Video-Llama2 and ShareGPT4Video on most benchmarks, despite using far less video instruction tuning data.

Relation to broader research

Magma is one component of a much larger vision within Microsoft Research for the future of agentic AI systems. Across various teams and projects at Microsoft, we are collectively exploring how AI systems can detect, analyze, and respond in the world to amplify human capabilities.

Earlier this month, we announced AutoGen v0.4, a fully reimagined open-source library for building advanced agentic AI systems. While AutoGen focuses on the structure and management of AI agents, Magma enhances those agents by empowering them with a new level of capability. Developers can already use AutoGen to set up an AI assistant that relies on a conventional LLM for planning and dialogue. Now, with Magma, developers who want to build agents that execute physical or user interface/browser tasks can have that same assistant call upon Magma to understand the environment, perform reasoning, and take a sequence of actions to complete the task. A conceptual sketch of this division of labor appears below.
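The following purely conceptual sketch illustrates that division of labor: a planning LLM decomposes a goal into steps, and a Magma-style action model grounds each step in the current observation. Both helper functions are hypothetical stand-ins, not the AutoGen or Magma APIs.

```python
# Purely conceptual sketch of the division of labor described above:
# a planning LLM (as one might wire up in AutoGen) decomposes the goal,
# and a Magma-style model grounds each step in the environment and acts.
# Both helper functions below are hypothetical stand-ins, not real APIs.

def plan_with_llm(goal: str) -> list[str]:
    """Pretend planner: in practice, an AutoGen agent backed by an LLM."""
    return [f"step 1 toward: {goal}", f"step 2 toward: {goal}"]

def act_with_magma(step: str, observation: bytes) -> str:
    """Pretend action model: in practice, a call to Magma with the current
    screenshot or camera frame, returning a grounded action proposal."""
    return f"proposed action for '{step}'"

def run_agent(goal: str, get_observation) -> None:
    for step in plan_with_llm(goal):
        action = act_with_magma(step, get_observation())
        print(action)  # an executor would carry out the action here

run_agent("book a meeting room", get_observation=lambda: b"<screenshot bytes>")
```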

The reasoning ability of Magma can be further developed by incorporating test-time search and reinforcement learning, as described in ExACT. ExACT shows an approach for teaching AI agents to explore more effectively, enabling them to intelligently navigate their environments, gather valuable information, evaluate options, and identify optimal decision-making and planning strategies.

At the application level, we are also exploring new user experiences (UX) powered by foundation models for the next generation of agentic AI systems. Data Formulator is a prime example. Announced late last year, Data Formulator is an AI-driven visualization tool developed by Microsoft Research that translates high-level analytical intents into rich visual representations by handling complex data transformations behind the scenes.

Looking ahead, the integration of reasoning, exploration and action capabilities will pave the way for highly capable, robust agentic AI systems.

Magma is available on Azure AI Foundry Labs as well as on HuggingFace with an MIT license. Please refer to the Magma project page for more technical details. We invite you to test and explore these cutting-edge agentic model innovations from Microsoft Research.
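For readers who want to experiment, here is a minimal sketch of loading the released checkpoint with the Hugging Face transformers library. The repository ID and the bfloat16 setting are assumptions; check the model card on the project page for the exact identifiers and usage.

```python
# Hedged sketch of loading the Magma checkpoint from Hugging Face with
# transformers. The repository ID and dtype are assumptions; consult the
# model card for the exact identifiers and prompt format.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Magma-8B"  # assumed repository ID
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```

From there, the processor prepares image-plus-text prompts and the model generates text or action proposals, per the usage described on the model card.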


Read More