Amazon Nova Lite enables Bito to offer a free tier option for its AI-powered code reviews

This post is co-written by Amar Goel, co-founder and CEO of Bito.

Meticulous code review is a critical step in the software development process, one that helps deliver high-quality code that’s ready for enterprise use. However, it can be a time-consuming process at scale, where experts must review thousands of lines of code, looking for bugs or other issues. Traditional reviews can be subjective and inconsistent, because humans are human. As generative AI becomes more and more integrated into the software development process, a significant number of new AI-powered code review tools have entered the marketplace, helping software development teams raise quality and ship clean code faster—while boosting productivity and job satisfaction among developers.

Bito is an innovative startup that creates AI agents for a broad range of software developers. It emerged as a pioneer in its field in 2023, when it launched its first AI-powered developer agents, which brought large language models (LLMs) to code-writing environments. Bito’s executive team sees developers and AI as key elements of the future and works to bring them together in powerful new ways with its expanding portfolio of AI-powered agents. Its flagship product, AI Code Review Agent, speeds pull request (PR) time-to-merge by up to 89%—accelerating development times and allowing developers to focus on writing code and innovating. Intended to assist, not replace, review by senior software engineers, Bito’s AI Code Review Agent focuses on summarizing, organizing, and then suggesting line-level fixes with code base context, freeing engineers to focus on the more strategic aspects of code review, such as evaluating business logic.

With more than 100,000 active developers using its products globally, Bito has proven successful at using generative AI to streamline code review—delivering impressive results and business value to its customers, which include Gainsight, Privado, PubMatic, On-Board Data Systems (OBDS), and leading software enterprises. Bito’s customers have found that AI Code Review Agent provides the following benefits:

  • Accelerates PR merges by 89%
  • Reduces regressions by 34%
  • Delivers 87% of a PR’s feedback necessary for review
  • Has a 2.33 signal-to-noise ratio
  • Works in over 50 programming languages

Although these results are compelling, Bito needed to overcome inherent wariness among developers evaluating a flood of new AI-powered tools. To accelerate and expand adoption of AI Code Review Agent, Bito leaders wanted to launch a free tier option, which would let these prospective users experience the capabilities and value that AI Code Review Agent offered—and encourage them to upgrade to the full, for-pay Teams Plan. Its Free Plan would offer AI-generated PR summaries to provide an overview of changes, and its Teams Plan would provide more advanced features, such as downstream impact analysis and one-click acceptance of line-level code suggestions.

In this post, we share how Bito is able to offer a free tier option for its AI-powered code reviews using Amazon Nova.

Choosing a cost-effective model for a free tier offering

To offer a free tier option for AI Code Review Agent, Bito needed a foundation model (FM) that would provide the right level of performance and results at a reasonable cost. Offering code review for free to its potential customers would not be free for Bito, of course, because it would be paying inference costs. To identify a model for its Free Plan, Bito carried out a 2-week evaluation process across a broad range of models, including the high-performing FMs on Amazon Bedrock, as well as OpenAI GPT-4o mini. The Amazon Nova models—fast, cost-effective models that were recently introduced on Amazon Bedrock—were particularly interesting to the team.

At the end of its evaluation, Bito determined that Amazon Nova Lite delivered the right mix of performance and cost-effectiveness for its use cases. Its speed provided fast creation of code review summaries. However, cost—a key consideration for Bito’s Free Plan—proved to be the deciding factor. Ultimately, Amazon Nova Lite met Bito’s criteria for speed, cost, and quality. The combination of Amazon Nova Lite and Amazon Bedrock also makes it possible for Bito to offer the reliability and security that its customers need when entrusting Bito with their code. After all, careful control of code is one of Bito’s core promises to its customers: it doesn’t store code or use it for model training, and its products are SOC 2 Type 2-certified to provide data security, processing integrity, privacy, and confidentiality.

Implementing the right models for different tiers of offerings

Bito has now adopted Amazon Bedrock as its standardized platform to explore, add, and run models. Bito uses Amazon Nova Lite as the primary model for its Free Plan, and Anthropic’s Claude 3.7 Sonnet powers its for-pay Teams Plan, all accessed and integrated through the unified Amazon Bedrock API and controls. Amazon Bedrock provides seamless shifting from Amazon Nova Lite to Anthropic’s Claude 3.7 Sonnet when customers upgrade, with minimal code changes. Bito leaders are quick to point out that Amazon Nova Lite doesn’t just power its Free Plan—it inspired it. Without the very low cost of Amazon Nova Lite, Bito wouldn’t have been able to offer a free tier of AI Code Review Agent, a strategic move that has enabled it to expand its enterprise customer base. This strategy quickly generated results, attracting three times more prospective customers to its Free Plan than anticipated. At the end of the 14-day trial period, a significant number of users convert to the full AI Code Review Agent to access its full array of capabilities.
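
To illustrate what this kind of tier-based model switching can look like in practice, the following minimal sketch selects an Amazon Bedrock model ID per plan and calls the unified Converse API. This is an illustration only, not Bito’s actual implementation; the plan-to-model mapping, prompt, and function name are assumptions.

import boto3

bedrock = boto3.client("bedrock-runtime")

# Illustrative plan-to-model mapping; Bito's actual routing logic is not public
MODEL_BY_PLAN = {
    "free": "amazon.nova-lite-v1:0",
    "teams": "anthropic.claude-3-7-sonnet-20250219-v1:0",  # may require an inference profile ID in some Regions
}

def summarize_pull_request(diff_text, plan="free"):
    # The same Converse call works for both models; only the model ID changes
    response = bedrock.converse(
        modelId=MODEL_BY_PLAN[plan],
        messages=[{"role": "user", "content": [{"text": f"Summarize this pull request diff:\n{diff_text}"}]}],
    )
    return response["output"]["message"]["content"][0]["text"]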

Encouraged by the success of AI Code Review Agent, Bito is now using Amazon Nova Lite to power the chat capabilities of Bito Wingman, its latest agentic AI offering—a full-featured developer assistant in the integrated development environment (IDE) that combines code generation, error handling, architectural advice, and more. Again, the combination of quality and low cost made Amazon Nova Lite the right choice for Bito.

Conclusion

In this post, we shared how Bito—an innovative startup that offers a growing portfolio of AI-powered developer agents—chose Amazon Nova Lite to power its free tier offering of AI Code Review Agent, its flagship product. Its AI-powered agents are designed specifically to make developers’ lives easier and their work more impactful:

  • Amazon Nova Lite enabled Bito to meet one of its core business challenges—attracting enterprise customers. By introducing a free tier, Bito attracted three times more prospective new customers to its generative AI-driven flagship product—AI Code Review Agent.
  • Amazon Nova Lite outperformed other models during rigorous internal testing, providing the right level of performance at the very low cost Bito needed to launch a free tier of AI Code Review Agent.
  • Amazon Bedrock empowers Bito to seamlessly switch between models as needed for each tier of AI Code Review Agent—Amazon Nova Lite for its Free Plan and Anthropic’s Claude 3.7 Sonnet for its for-pay Teams Plan. Amazon Bedrock also provided security and privacy, critical considerations for Bito customers.
  • Bito shows how innovative organizations can use the combination of quality, cost-effectiveness, and speed in Amazon Nova Lite to deliver value to their customers—and to their business.

“Our challenge is to push the capabilities of AI to deliver new value to developers, but at a reasonable cost,” shares Amar Goel, co-founder and CEO of Bito. “Amazon Nova Lite gives us the very fast, low-cost model we needed to power the free offering of our AI Code Review Agent—and attract new customers.”

Get started with Amazon Nova on the Amazon Bedrock console. Learn more about Amazon Nova Lite at the Amazon Nova product page.


About the authors

Eshan Bhatnagar is the Director of Product Management for Amazon AGI at Amazon Web Services.

Amar Goel is Co-Founder and CEO of Bito. A serial entrepreneur, Amar previously founded PubMatic (went public in 2020), and formerly worked at Microsoft, McKinsey, and was a software engineer at Netscape, the original browser company. Amar attended Harvard University. He is excited about using GenAI to power the next generation of how software gets built!

How Gardenia Technologies helps customers create ESG disclosure reports 75% faster using agentic generative AI on Amazon Bedrock

This post was co-written with Federico Thibaud, Neil Holloway, Fraser Price, Christian Dunn, and Frederica Schrager from Gardenia Technologies

“What gets measured gets managed” has become a guiding principle for organizations worldwide as they begin their sustainability and environmental, social, and governance (ESG) journeys. Companies are establishing baselines to track their progress, supported by an expanding framework of reporting standards, some mandatory and some voluntary. However, ESG reporting has evolved into a significant operational burden. A recent survey shows that 55% of sustainability leaders cite excessive administrative work in report preparation, while 70% indicate that reporting demands inhibit their ability to execute strategic initiatives. This environment presents a clear opportunity for generative AI to automate routine reporting tasks, allowing organizations to redirect resources toward more impactful ESG programs.

Gardenia Technologies, a data analytics company, partnered with the AWS Prototyping and Cloud Engineering (PACE) team to develop Report GenAI, a fully automated ESG reporting solution powered by the latest generative AI models on Amazon Bedrock. This post dives deep into the technology behind an agentic search solution using tooling with Retrieval Augmented Generation (RAG) and text-to-SQL capabilities to help customers reduce ESG reporting time by up to 75%.

In this post, we demonstrate how AWS serverless technology, combined with agents in Amazon Bedrock, is used to build scalable and highly flexible agent-based document assistant applications.

Scoping the challenge: Growing ESG reporting requirements and complexity

Sustainability disclosures are now a standard part of corporate reporting, with 96% of the 250 largest companies reporting on their sustainability progress based on government and regulatory frameworks. To meet reporting mandates, organizations must overcome many data collection and process-based barriers. Data for a single report includes thousands of data points from a multitude of sources including official documentation, databases, unstructured document stores, utility bills, and emails. The EU Corporate Sustainability Reporting Directive (CSRD) framework, for example, comprises 1,200 individual data points that need to be collected across an enterprise. Even voluntary disclosures like the CDP, which comprises approximately 150 questions, cover a wide range of topics related to climate risk and impact, water stewardship, land use, and energy consumption. Collecting this information across an organization is time consuming.

A secondary challenge is that many organizations with established ESG programs need to report to multiple disclosure frameworks, such as SASB, GRI, and TCFD, each using different reporting and disclosure standards. To complicate matters, reporting requirements are continually evolving, leaving organizations struggling just to keep up with the latest changes. Today, much of this work is highly manual, leaving sustainability teams spending more time on managing data collection and answering questionnaires than on developing impactful sustainability strategies.

Solution overview: Automating undifferentiated heavy lifting with AI agents

Gardenia’s approach to strengthening ESG data collection for enterprises is Report GenAI, an agentic framework that uses generative AI models on Amazon Bedrock to automate large chunks of the ESG reporting process. Report GenAI pre-fills reports by drawing on existing databases, document stores, and web searches. The agent then works collaboratively with ESG professionals to review and fine-tune responses. This workflow has five steps to help automate ESG data collection and assist in curating responses: setup, batch-fill, review, edit, and repeat. Let’s explore each step in more detail.

  • Setup: The Report GenAI agent is configured and authorized to access an ESG and emissions database, client document stores (emails, previous reports, data sheets), and document searches over the public internet. Client data is stored within specified AWS Regions using encrypted Amazon Simple Storage Service (Amazon S3) buckets with VPC endpoints for secure access, while relational data is hosted in Amazon Relational Database Service (Amazon RDS) instances deployed within Gardenia’s virtual private cloud (VPC). This architecture helps make sure data residency requirements can be fulfilled, while maintaining strict access controls through private network connectivity. The agent also has access to the relevant ESG disclosure questionnaire including questions and expected response format (we refer to this as a report specification). The following figure is an example of the Report GenAI user interface at the agent configuration step. As shown in the figure, the user can choose which databases, documents, or other tools the agent will use to answer a given question.

The user can choose which databases, documents or other tools the agent will use to answer a given question

  • Batch-fill: The agent iterates through each question and data point to be disclosed, retrieving relevant data from the client document stores and document searches. This information is processed to produce a response in the expected format, depending on the disclosure report requirements.
  • Review: Each response includes cited sources and—if the response is quantitative—calculation methodology. This enables users to maintain a clear audit trail and verify the accuracy of batch-filled responses quickly.
  • Edit: While the agentic workflow is automated, our approach allows for a human-in-the-loop to review, validate, and iterate on batch-filled facts and figures. In the following figure, we show how users can chat with the AI assistant to request updates or manually refine responses. When the user is satisfied, the final answer is recorded. The agent will show references from which responses were sourced and allow the user to modify answers either directly or by providing an additional prompt.

The agent will show references from which responses were sourced and allow the user to modify answers either directly or by providing additional prompt

  • Repeat: Users can batch-fill multiple reporting frameworks to simplify and expand their ESG disclosure scope while avoiding extra effort to manually complete multiple questionnaires. After a report has been completed, it can then be added to the client document store so future reports can draw on it for knowledge. Report GenAI also supports bring your own report, which allows users to develop their own reporting specification (question and response model), which can then be imported into the application, as shown in the following figure.

The user can submit their own list of questions and configure the agent to pre-fill all responses in a single batch

Now that you have a description of the Report GenAI workflow, let’s explore how the architecture is built.

Architecture deep-dive: A serverless generative AI agent

The Report GenAI architecture consists of six components, as illustrated in the following figure: a user interface (UI), the generative AI executor, the web search endpoint, a text-to-SQL tool, the RAG tool, and an embedding generation pipeline. The UI, generative AI executor, and generation pipeline components help orchestrate the workflow. The remaining three components generate responses by performing the following actions:

  • Web search tool: Uses an internet search engine to retrieve content from public web pages.
  • Text-to-SQL tool: Generates and executes SQL queries against the company’s emissions database hosted by Gardenia Technologies. The tool takes natural language requests, such as “What were our Scope 2 emissions in 2024?”, as input and returns the results from the emissions database.
  • Retrieval Augmented Generation (RAG) tool: Accesses information from the corporate document store (such as procedures, emails, and internal reports) and uses it as a knowledge base. This component acts as a retriever: given a plain text query, it returns the relevant text from the document store.

Architecture Diagram

Let’s take a look at each of the components.

1: Lightweight UI hosted on auto-scaled Amazon ECS Fargate

Users access Report GenAI through the containerized Streamlit frontend. Streamlit offers an appealing UI for data and chat apps, allowing data scientists and ML engineers to build convincing user experiences with relatively limited effort. While not typically used for large-scale deployments, Streamlit proved to be a suitable choice for the initial iteration of Report GenAI.

The frontend is hosted on a load-balanced and auto-scaled Amazon Elastic Container Service (Amazon ECS) with the Fargate launch type. This implementation of the frontend not only reduces the management overhead but also suits the expected intermittent usage pattern of Report GenAI, which is anticipated to be spiky, with high-usage periods around the times when new reports must be generated (typically quarterly or yearly) and lower usage outside these windows. User authentication and authorization are handled by Amazon Cognito.
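
As a rough sketch of how such a frontend can hand prompts to the backend, the following Streamlit snippet invokes an agent executor Lambda function. The function name and payload shape are hypothetical; they simply show the chat-to-backend handoff.

import json

import boto3
import streamlit as st

lambda_client = boto3.client("lambda")

st.title("Report GenAI")
if prompt := st.chat_input("Ask about your ESG report"):
    st.chat_message("user").write(prompt)
    # Forward the prompt to the agent executor (hypothetical function name and payload)
    response = lambda_client.invoke(
        FunctionName="report-genai-agent-executor",
        Payload=json.dumps({"question": prompt}),
    )
    answer = json.loads(response["Payload"].read())
    st.chat_message("assistant").write(answer)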

2: Central agent executor

The executor is an agent that uses reasoning capabilities of leading text-based foundation models (FMs) (for example, Anthropic’s Claude Sonnet 3.5 and Haiku 3.5) to break down user requests, gather information from document stores, and efficiently orchestrate tasks. The agent uses Reason and Act (ReAct), a prompt-based technique that enables large language models (LLMs) to generate both reasoning traces and task-specific actions in an interleaved manner. Reasoning traces help the model develop, track, and update action plans, while actions allow it to interface with a set of tools and information sources (also known as knowledge bases) that it can use to fulfill the task. The agent is prompted to think about an optimal sequence of actions to complete a given task with the tools at its disposal, observe the outcome, and iterate and improve until satisfied with the answer.
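
To make the pattern concrete, the following is a minimal, prompt-based ReAct loop against the Amazon Bedrock Converse API. The tool, prompt format, and model ID are illustrative assumptions, not Report GenAI’s actual prompts.

import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # any capable Bedrock chat model

def emissions_lookup(query: str) -> str:
    """Stand-in for a real tool, such as the text-to-SQL tool described below."""
    return "Scope 2 emissions in 2024: 1,234 tCO2e"  # hypothetical observation

TOOLS = {"emissions_lookup": emissions_lookup}

SYSTEM = (
    "Answer the question. Think in a 'Thought:' line, then either call a tool with\n"
    "'Action: <tool_name>[<input>]' or finish with 'Final Answer: <answer>'.\n"
    f"Available tools: {list(TOOLS)}"
)

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        response = bedrock.converse(
            modelId=MODEL_ID,
            system=[{"text": SYSTEM}],
            messages=[{"role": "user", "content": [{"text": transcript}]}],
        )
        text = response["output"]["message"]["content"][0]["text"]
        transcript += "\n" + text
        if "Final Answer:" in text:
            return text.split("Final Answer:", 1)[1].strip()
        if "Action:" in text:
            # Parse 'Action: tool[input]', run the tool, and append the observation
            call = text.split("Action:", 1)[1].strip().splitlines()[0]
            name, arg = call.split("[", 1)
            observation = TOOLS[name.strip()](arg.rstrip("]"))
            transcript += f"\nObservation: {observation}"
    return "No final answer within the step budget"

print(react("What were our Scope 2 emissions in 2024?"))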

In combination, these tools provide the agent with capabilities to iteratively complete complex ESG reporting templates. The expected questions and response format for each questionnaire are captured by a report specification (ReportSpec) using Pydantic to enforce the desired output format for each reporting standard (for example, CDP or TCFD). This ReportSpec definition is inserted into the task prompt. The first iteration of Report GenAI used Claude Sonnet 3.5 on Amazon Bedrock. As more capable and more cost-effective LLMs become available on Amazon Bedrock (such as the recently released Amazon Nova models), the foundation models in Report GenAI can be swapped to remain up to date with the latest models.
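
A ReportSpec of this kind might look like the following sketch. The field names and example question are hypothetical, but they show how Pydantic can pin down the expected output format that gets inserted into the task prompt.

from typing import List

from pydantic import BaseModel

class QuestionSpec(BaseModel):
    question_id: str
    question_text: str
    response_format: str  # e.g., "numeric, metric tons CO2e" or "free text, max 200 words"

class ReportSpec(BaseModel):
    framework: str  # e.g., "CDP" or "TCFD"
    questions: List[QuestionSpec]

spec = ReportSpec(
    framework="CDP",
    questions=[QuestionSpec(
        question_id="C6.1",
        question_text="What were your organization's gross global Scope 1 emissions?",
        response_format="numeric, metric tons CO2e",
    )],
)
# Serialize the spec into the agent's task prompt
task_prompt = f"Complete this disclosure report:\n{spec.model_dump_json(indent=2)}"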

The agent executor is hosted on AWS Lambda and uses the open-source LangChain framework to implement the ReAct orchestration logic and the needed integrations with memory, LLMs, tools, and knowledge bases. LangChain offers deep integration with AWS through the first-party langchain-aws module, which provides useful one-line wrappers to call tools using AWS Lambda, draw from a chat memory backed by Amazon DynamoDB, and call LLMs on Amazon Bedrock. LangChain also provides fine-grained visibility into each step of the ReAct decision-making process to provide decision transparency.

LangChain provides fine-grained visibility into each step of the ReAct decision making process to provide decision transparency
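
To give a flavor of these one-line wrappers, the following sketch pairs a Bedrock-backed chat model with DynamoDB-backed chat history. The table name, session ID, and model ID are assumptions for illustration.

from langchain_aws import ChatBedrock
from langchain_community.chat_message_histories import DynamoDBChatMessageHistory

llm = ChatBedrock(model_id="anthropic.claude-3-5-sonnet-20240620-v1:0")
history = DynamoDBChatMessageHistory(table_name="ReportGenAISessions", session_id="report-123")

history.add_user_message("Summarize our 2024 Scope 1 emissions.")
reply = llm.invoke(history.messages)  # pass the accumulated chat history to the model
history.add_ai_message(reply.content)
print(reply.content)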

3: Web-search tool

The web search tool is hosted on Lambda and calls an internet search engine through an API. The agent executor retrieves the information returned from the search engine to formulate a response. Web searches can be used in combination with the RAG tool to retrieve public context needed to formulate responses for certain generic questions, such as providing a short description of the reporting company or entity.

4: Text-to-SQL tool

A large portion of ESG reporting requirements is analytical in nature and requires processing of large amounts of numerical or tabular data. For example, a reporting standard might ask for total emissions in a certain year or quarter. LLMs alone are ill-equipped for questions of this nature. The Lambda-hosted text-to-SQL tool provides the agent with the required analytical capabilities. The tool uses a separate LLM to generate a valid SQL query given a natural language question along with the schema of an emissions database hosted by Gardenia Technologies. The generated query is then executed against this database and the results are passed back to the agent executor. SQL linters and error-correction loops are used for added robustness.
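
A stripped-down version of such a tool might look like the following, with a local SQLite database standing in for Gardenia’s emissions database and an illustrative schema and model ID. The error-correction loop feeds SQL failures back to the LLM for another attempt.

import json
import sqlite3

import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # illustrative model choice
SCHEMA = "CREATE TABLE emissions (year INTEGER, scope INTEGER, tonnes_co2e REAL);"  # illustrative schema

def text_to_sql(question: str, max_retries: int = 2):
    prompt = (f"Given this schema:\n{SCHEMA}\n"
              f"Write one SQL query that answers: {question}\nReturn only the SQL.")
    feedback = ""
    for _ in range(max_retries + 1):
        response = bedrock.converse(
            modelId=MODEL_ID,
            messages=[{"role": "user", "content": [{"text": prompt + feedback}]}],
        )
        sql = response["output"]["message"]["content"][0]["text"].strip()
        try:
            # Execute the generated query; results go back to the agent executor
            return sqlite3.connect("emissions.db").execute(sql).fetchall()
        except sqlite3.Error as e:
            # Error-correction loop: tell the model what went wrong and retry
            feedback = f"\nThe previous query failed with: {e}. Fix it."
    raise RuntimeError("Could not generate a valid SQL query")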

5: Retrieval Augmented Generation (RAG) tool

Much of the information required to complete ESG reporting resides in internal, unstructured document stores and can consist of PDF or Word documents, Excel spreadsheets, and even emails. Given the size of these document stores, a common approach is to use knowledge bases with vector embeddings for semantic search. The RAG tool enables the agent executor to retrieve only the relevant parts to answer questions from the document store. The RAG tool is hosted on Lambda and uses an in-memory Faiss index as a vector store. The index is persisted on Amazon S3 and loaded on demand whenever required. This workflow is advantageous for the given workload because of the intermittent usage of Report GenAI. The RAG tool accepts a plain text query from the agent executor as input, uses an embedding model on Amazon Bedrock to perform a vector search against the vector database, and returns the retrieved text to the agent executor.
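
The core retrieval path can be sketched as follows: download the persisted index from Amazon S3, embed the query with a Bedrock embedding model, and run a vector search. Bucket, key, and model names are illustrative, and mapping result IDs back to chunk text is left to a metadata store.

import json

import boto3
import faiss
import numpy as np

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")

# Load the persisted index on demand (illustrative bucket and key)
s3.download_file("report-genai-indexes", "faiss/corporate-docs.index", "/tmp/docs.index")
index = faiss.read_index("/tmp/docs.index")

def embed(text: str) -> np.ndarray:
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(response["body"].read())["embedding"], dtype="float32")

def retrieve(query: str, k: int = 5):
    distances, ids = index.search(embed(query).reshape(1, -1), k)
    return ids[0]  # map these IDs back to chunk text via your metadata store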

6: Asynchronous embedding generation pipeline

To make text searchable, it must be indexed in a vector database. AWS Step Functions provides a straightforward orchestration framework to manage this process: extracting plain text from the various document types, chunking it into manageable pieces, embedding the text, and then loading the embeddings into a vector DB. Amazon Textract can be used as the first step for extracting text from visual-heavy documents like presentations or PDFs. An embedding model such as Amazon Titan Text Embeddings can then be used to embed the text and store it in a vector DB such as LanceDB. Note that Amazon Bedrock Knowledge Bases provides an end-to-end retrieval service automating most of the steps just described. However, for this application, Gardenia Technologies opted for a fully flexible implementation to retain full control over each design choice of the RAG pipeline (text extraction approach, embedding model choice, and vector database choice) at the expense of higher management and development overhead.
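
The chunk-and-embed step that Step Functions orchestrates could be sketched like this; the chunk sizes are arbitrary, and writing the resulting records to LanceDB is left out.

import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list:
    """Split plain text (for example, Amazon Textract output) into overlapping chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed_chunks(chunks: list) -> list:
    records = []
    for chunk in chunks:
        response = bedrock.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            body=json.dumps({"inputText": chunk}),
        )
        embedding = json.loads(response["body"].read())["embedding"]
        records.append({"text": chunk, "vector": embedding})
    return records  # these records would then be loaded into the vector DB (LanceDB here)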

Evaluating agent performance

Making sure of accuracy and reliability in ESG reporting is paramount, given the regulatory and business implications of these disclosures. Report GenAI implements a sophisticated dual-layer evaluation framework that combines human expertise with advanced AI validation capabilities.

Validation is done both at a high level (such as evaluating full question responses) and sub-component level (such as breaking down to RAG, SQL search, and agent trajectory modules). Each of these has separate evaluation sets in addition to specific metrics of interest.

Human expert validation

The solution’s human-in-the-loop approach allows ESG experts to review and validate the AI-generated responses. This expert oversight serves as the primary quality control mechanism, making sure that generated reports align with both regulatory requirements and organization-specific context. The interactive chat interface enables experts to:

  • Verify factual accuracy of automated responses
  • Validate calculation methodologies
  • Verify proper context interpretation
  • Confirm regulatory compliance
  • Flag potential discrepancies or areas requiring additional review

A key feature in this process is the AI reasoning module, which displays the agent’s decision-making process, providing transparency into not only what answers were generated but how the agent arrived at those conclusions.

The user can review the steps in the AI agent’s reasoning to validate answers

These expert reviews provide valuable training data that can be used to enhance system performance through refinements to RAG implementations, agent prompts, or underlying language models.

AI-powered quality assessment

Complementing human oversight, Report GenAI uses state-of-the-art LLMs on Amazon Bedrock as LLM judges. These models are prompted to evaluate:

  • Response accuracy relative to source documentation
  • Completeness of answers against question requirements
  • Consistency with provided context
  • Alignment with reporting framework guidelines
  • Mathematical accuracy of calculations

The LLM judge operates by performing the following steps (a minimal sketch follows this list):

  • Analyzing the original question and context
  • Reviewing the generated response and its supporting evidence
  • Comparing the response against retrieved data from structured and unstructured sources
  • Providing a confidence score and detailed assessment of the response quality
  • Flagging potential issues or areas requiring human review
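
A minimal LLM-judge call along these lines is sketched below; the prompt, rubric, and model ID are assumptions rather than Report GenAI’s actual evaluation prompts.

import json

import boto3

bedrock = boto3.client("bedrock-runtime")

JUDGE_PROMPT = """You are evaluating an ESG report answer.
Question: {question}
Retrieved evidence: {evidence}
Generated answer: {answer}
Rate accuracy, completeness, and consistency with the evidence.
Respond with JSON only: {{"score": <1-5>, "issues": [<strings>], "needs_human_review": <true|false>}}"""

def judge(question: str, evidence: str, answer: str) -> dict:
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative judge model
        messages=[{"role": "user", "content": [{"text": JUDGE_PROMPT.format(
            question=question, evidence=evidence, answer=answer)}]}],
    )
    # Assumes the model returns valid JSON; production code should handle parse failures
    return json.loads(response["output"]["message"]["content"][0]["text"])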

This dual-validation approach creates a robust quality assurance framework that combines the pattern recognition capabilities of AI with human domain expertise. The system continuously improves through feedback loops, where human corrections and validations help refine the AI’s understanding and response generation capabilities.

How Omni Helicopters International cuts its reporting time by 75%

Omni Helicopters International cut their CDP reporting time by 75% using Gardenia’s Report GenAI solution. In previous years, OHI’s CDP reporting required one month of dedicated effort from their sustainability team. Using Report GenAI, OHI tracked their GHG inventory and relevant KPIs in real time and then prepared their 2024 CDP submission in just one week. Read the full story in Preparing Annual CDP Reports 75% Faster.

“In previous years we needed one month to complete the report, this year it took just one week,” said Renato Souza, Executive Manager QSEU at OTA. “The ‘Ask the Agent’ feature made it easy to draft our own answers. The tool was a great support and made things much easier compared to previous years.”

Conclusion

In this post, we stepped through how AWS and Gardenia collaborated to build Report GenAI, an automated ESG reporting solution that relieves ESG experts of the undifferentiated heavy lifting of data gathering and analysis associated with a growing ESG reporting burden. This frees up time for more impactful, strategic sustainability initiatives. Report GenAI is available on the AWS Marketplace today. To dive deeper and start developing your own generative AI app to fit your use case, explore this workshop on building an Agentic LLM assistant on AWS.


About the Authors

Federico Thibaud is the CTO and Co-Founder of Gardenia Technologies, where he leads the data and engineering teams, working on everything from data acquisition and transformation to algorithm design and product development. Before co-founding Gardenia, Federico worked at the intersection of finance and tech — building a trade finance platform as lead developer and developing quantitative strategies at a hedge fund.

Neil Holloway is Head of Data Science at Gardenia Technologies, where he is focused on leveraging AI and machine learning to build and enhance software products. Neil holds a master’s degree in Theoretical Physics, where he designed and built programs to simulate high-energy collisions in particle physics.

Fraser Price is a GenAI-focused Software Engineer at Gardenia Technologies in London, where he focuses on researching, prototyping and developing novel approaches to automation in the carbon accounting space using GenAI and machine learning. He received his MEng in Computing: AI from Imperial College London.

Christian Dunn is a Software Engineer based in London building ETL pipelines, web-apps, and other business solutions at Gardenia Technologies.

Frederica Schrager is a Marketing Analyst at Gardenia Technologies.

Karsten Schroer is a Senior ML Prototyping Architect at AWS. He supports customers in leveraging data and technology to drive sustainability of their IT infrastructure and build cloud-native data-driven solutions that enable sustainable operations in their respective verticals. Karsten joined AWS following his PhD studies in applied machine learning & operations management. He is truly passionate about technology-enabled solutions to societal challenges and loves to dive deep into the methods and application architectures that underlie these solutions.

Mohamed Ali Jamaoui is a Senior ML Prototyping Architect with over 10 years of experience in production machine learning. He enjoys solving business problems with machine learning and software engineering, and helping customers extract business value with ML. As part of AWS EMEA Prototyping and Cloud Engineering, he helps customers build business solutions that leverage innovations in MLOPs, NLP, CV and LLMs.

Marco Masciola is a Senior Sustainability Scientist at AWS. In his role, Marco leads the development of IT tools and technical products to support AWS’s sustainability mission. He’s held various roles in the renewable energy industry, and leans on this experience to build tooling to support sustainable data center operations.

Hin Yee Liu is a Senior Prototyping Engagement Manager at Amazon Web Services. She helps AWS customers to bring their big ideas to life and accelerate the adoption of emerging technologies. Hin Yee works closely with customer stakeholders to identify, shape and deliver impactful use cases leveraging Generative AI, AI/ML, Big Data, and Serverless technologies using agile methodologies.

NVIDIA Nemotron Super 49B and Nano 8B reasoning models now available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart

This post is co-written with Eliuth Triana Isaza, Abhishek Sawarkar, and Abdullahi Olaoye from NVIDIA.

Today, we are excited to announce that the Llama 3.3 Nemotron Super 49B V1 and Llama 3.1 Nemotron Nano 8B V1 are available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart. With this launch, you can now deploy NVIDIA’s newest reasoning models to build, experiment, and responsibly scale your generative AI ideas on AWS.

In this post, we demonstrate how to get started with these models on Amazon Bedrock Marketplace and SageMaker JumpStart.

About NVIDIA NIMs on AWS

NVIDIA NIM inference microservices integrate closely with AWS managed services such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon SageMaker AI, to enable the deployment of generative AI models at scale. As part of NVIDIA AI Enterprise, available in the AWS Marketplace, NVIDIA NIM is a set of easy-to-use microservices designed to accelerate the deployment of generative AI. These prebuilt containers support a broad spectrum of generative AI models from open source community models to NVIDIA AI Foundation and custom models. NIM microservices are deployed with a single command for easy integration into generative AI applications using industry-standard APIs and just a few lines of code, or with a few actions in the SageMaker JumpStart console. Engineered to facilitate seamless generative AI inferencing at scale, NIM ensures generative AI applications can be deployed anywhere.

Overview of NVIDIA Nemotron models

In this section, we provide an overview of the NVIDIA Nemotron Super and Nano NIM microservices discussed in this post.

Llama 3.3 Nemotron Super 49B V1

Llama-3.3-Nemotron-Super-49B-v1 is an LLM that is a derivative of Meta Llama-3.3-70B-Instruct (the reference model). It is a reasoning model that is post-trained for reasoning, human chat preferences, and task execution, such as Retrieval-Augmented Generation (RAG) and tool calling. The model supports a context length of 128K tokens. Using a novel Neural Architecture Search (NAS) approach, we greatly reduced the model’s memory footprint and increased efficiency to support larger workloads and to fit onto a single Hopper GPU (H200, in the P5 instance family) at high workloads.

Llama 3.1 Nemotron Nano 8B V1

Llama-3.1-Nemotron-Nano-8B-v1 is an LLM that is a derivative of Meta Llama-3.1-8B-Instruct (the reference model). It is a reasoning model that is post-trained for reasoning, human chat preferences, and task execution, such as RAG and tool calling. The model supports a context length of 128K tokens. It is created from Llama 3.1 8B Instruct and offers improvements in model accuracy. The model fits on a single H100 or A100 GPU (P5 or P4 instances) and can be used locally.

About Amazon Bedrock Marketplace

Amazon Bedrock Marketplace plays a pivotal role in democratizing access to advanced AI capabilities through several key advantages:

  • Comprehensive model selection – Amazon Bedrock Marketplace offers an exceptional range of models, from proprietary to publicly available options, allowing organizations to find the perfect fit for their specific use cases.
  • Unified and secure experience – By providing a single access point for all models through the Amazon Bedrock APIs, Bedrock Marketplace significantly simplifies the integration process. Organizations can use these models securely, and for models that are compatible with the Amazon Bedrock Converse API, you can use the robust toolkit of Amazon Bedrock, including Amazon Bedrock Agents, Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, and Amazon Bedrock Flows.
  • Scalable infrastructure – Amazon Bedrock Marketplace offers configurable scalability through managed endpoints, allowing organizations to select their desired number of instances, choose appropriate instance types, define custom auto scaling policies that dynamically adjust to workload demands, and optimize costs while maintaining performance.

Deploy NVIDIA Llama Nemotron models in Amazon Bedrock Marketplace

Amazon Bedrock Marketplace gives you access to over 100 popular, emerging, and specialized foundation models (FMs) through Amazon Bedrock. To access the Nemotron reasoning models in Amazon Bedrock, complete the following steps:

  1. On the Amazon Bedrock console, in the navigation pane under Foundation models, choose Model catalog.
    You can also use the InvokeModel API to invoke the model. Note that the InvokeModel API doesn’t support the Converse API or other Amazon Bedrock tooling.
  2. On the Model catalog page, filter for NVIDIA as a provider and choose the Llama 3.3 Nemotron Super 49B V1 model.

The Model detail page provides essential information about the model’s capabilities, pricing structure, and implementation guidelines. You can find detailed usage instructions, including sample API calls and code snippets for integration.

  1. To begin using the Llama 3.3 Nemotron Super 49B V1 model, choose Subscribe to subscribe to the marketplace offer.
  2. On the model detail page, choose Deploy.

You will be prompted to configure the deployment details for the model. The model ID will be pre-populated.

  1. For Endpoint name, enter an endpoint name (between 1–50 alphanumeric characters).
  2. For Number of instances, enter a number of instances (between 1–100).
  3. For Instance type, choose your instance type. For optimal performance with Nemotron Super, a GPU-based instance type like ml.g6e.12xlarge is recommended.
    Optionally, you can configure advanced security and infrastructure settings, including virtual private cloud (VPC) networking, service role permissions, and encryption settings. For most use cases, the default settings will work well. However, for production deployments, you should review these settings to align with your organization’s security and compliance requirements.
  4. Choose Deploy to begin using the model.

When the deployment is complete, you can test its capabilities directly in the Amazon Bedrock playground. This is an excellent way to explore the model’s reasoning and text generation abilities before integrating it into your applications. The playground provides immediate feedback, helping you understand how the model responds to various inputs and letting you fine-tune your prompts for optimal results. A similar process can be followed for deploying the Llama 3.1 Nemotron Nano 8B V1 model as well.

Run inference with the deployed Nemotron endpoint

The following code example demonstrates how to perform inference with a deployed model through Amazon Bedrock using the InvokeModel API. The script initializes the bedrock_runtime client, configures inference parameters, and sends a request to generate text based on a user prompt. With the Nemotron Super and Nano models, we can use a soft switch to toggle reasoning on and off: in the system message’s content field, set detailed thinking on or detailed thinking off.

Request

import boto3
import json

# Initialize Bedrock client
bedrock_runtime = boto3.client("bedrock-runtime")

# Configuration
MODEL_ID = ""  # Replace with Bedrock Marketplace endpoint arn

def invoke_model(prompt, max_tokens=1000, temperature=0.6, top_p=0.9):
    """
    Simple Bedrock model invocation
    """
    # Prepare model input
    body = {
        "model": "nvidia/llama-3.3-nemotron-super-49b-v1",
        "messages": [
            {   
                "role": "system",
                "content": "detailed thinking on" # to turn thinking off, simply toggle this
            },
            {
                "role": "user",
                "content": prompt  # Use the prompt parameter
            }
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p
    }
    
    # Invoke model
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(body)
    )

    # Parse and return the response body
    model_output = json.loads(response['body'].read())
    return model_output
    
# Example usage
if __name__ == "__main__":
    prompt = "Explain how a transformer neural network works."
    result = invoke_model(prompt)
    print(json.dumps(result, indent=2))

Response body

{
  "id": "chat-43c35c87d5a646c9b3dcaba461eefeaa",
  "object": "chat.completion",
  "created": 1749566119,
  "model": "nvidia/llama-3.3-nemotron-super-49b-v1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>nOkay, I need to explain how a transformer neural network works. Let me start by recalling what I know about transformers. They were introduced in the paper "Attention Is All You Need" by Vaswani et al. in 2017, right? Before transformers, sequence models like RNNs and LSTMs were common, but they had issues with parallelization because of their sequential processing. Transformers changed that by using self-attention mechanisms, which allow parallel processing of the entire input sequence at once.nnSo, the main components of a transformer are the encoder and the decoder. Each encoder layer has two main sub-layers: multi-head self-attention and a fully connected feed-forward network. The decoder has these two plus an additional multi-head attention layer to interact with the encoder's output.nnLet me break down the encoder first. The self-attention mechanism is key here. Self-attention allows the model to attend to all positions in the input sequence simultaneously and weigh their importance. The multi-head part means that instead of just one attention mechanism, you have multiple (like 8 or 12) running in parallel, each with a different linear projection. This helps capture different types of relationships between tokens.nnEach head computes attention by using Query (Q), Key (K), and Value (V) matrices. These are derived from the input by linear transformations. The attention weights are computed as the dot product of Q and K transposed, divided by the square root of the key's dimensionality (for stability). Then, these weights are used to compute a weighted sum of the Value vectors. This gives the output for one attention head. Since there are multiple heads, their outputs are concatenated and then linearly transformed.nnAfter the multi-head attention, there's a feed-forward network (FFN) applied individually to each position. It's a simple neural network with a ReLU activation in the middle. Both the attention and FFN layers use residual connections followed by layer normalization.nnThe decoder is similar but has an extra attention layer that attends to the encoder's output. Also, in the decoder's self-attention, they use masking to prevent looking ahead in the sequence, which is important for tasks like translation where you generate one token at a time.nnPositional encodings are added to the input embeddings because transformers don't have recurrence or convolution, so they need a way to preserve sequence order. The encodings are fixed and based on sine and cosine functions, which allows the model to handle sequences of varying lengths.nnI should also mention the overall architecture: the input goes through an embedding layer plus positional encoding, then through a stack of encoder layers. The output from the encoder goes into the decoder's stack of layers, and the final output goes through a linear layer and softmax for tasks like machine translation.nnWait, maybe I should explain each part step by step. Start with the overall structure, then dive into encoder and decoder layers, then the self-attention mechanism, positional encodings, and how it's different from RNNs.nnPotential pitfalls: Confusing encoder and decoder layers, not explaining why multi-head attention is useful, or omitting the role of layer normalization and residual connections. Also, ensuring that the explanation of attention weights calculation is clear.nnLet me check if I missed anything. Oh, the decoder's structure: after self-attention, there's encoder-decoder attention, then FFN. 
And the output linear layer. Also, the training process involves teacher forcing, but maybe that's beyond the basic explanation.nnI need to make sure the explanation is clear for someone who knows basics of neural networks but not transformers. Use analogies if possible, like how attention is like selectively focusing on different parts of the input when processing each element.nnAnother point: the efficiency. Since attention is O(n^2) with sequence length n, but it's parallelizable, which is better than O(n) for RNNs which can't be parallelized. But for very long sequences, transformers can be computationally expensive.nnAlso, the difference between BERT-style transformers (encoder-only) and sequence-to-sequence models (both encoder and decoder). But the question is about transformer neural networks in general, so covering the basic encoder-decoder structure is sufficient.nnOkay, putting it all together in a structured way: introduction, architecture overview, encoder details, decoder details, key components (self-attention, positional encodings), advantages, and maybe an example application.n</think>nnA transformer neural network is a revolutionary architecture introduced in 2017 for sequence-to-sequence tasks, replacing recurrent neural networks (RNNs) with self-attention mechanisms for parallel processing. Here's a structured explanation:nn### **1. Architecture Overview**n- **Encoder**: Takes in a sequence (e.g., text) and outputs a continuous representation.n- **Decoder**: Generates an output sequence (e.g., translated text) one token at a time, conditioned on the"
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "total_tokens": 1029,
    "completion_tokens": 1000
  },
  "prompt_logprobs": null
}

Amazon SageMaker JumpStart overview

SageMaker JumpStart is a fully managed service that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly, accelerating the development and deployment of ML applications. One of the key components of SageMaker JumpStart is model hubs, which offer a vast catalog of pre-trained models, such as Mistral, for a variety of tasks. You can now discover and deploy Llama 3.3 Nemotron Super 49B V1 and Llama-3.1-Nemotron-Nano-8B-v1 in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, so you can derive model performance and MLOps controls with Amazon SageMaker AI features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in a secure AWS environment and in your VPC, helping to support data security for enterprise security needs.

Prerequisites

Before getting started with deployment, make sure your AWS Identity and Access Management (IAM) service role for Amazon SageMaker has the AmazonSageMakerFullAccess permission policy attached. To deploy the NVIDIA Llama Nemotron models successfully, confirm one of the following:

  • Make sure your IAM role has the following permissions and you have the authority to make AWS Marketplace subscriptions in the AWS account used:
    • aws-marketplace:ViewSubscriptions
    • aws-marketplace:Unsubscribe
    • aws-marketplace:Subscribe
  • If your account is already subscribed to the model, you can skip to the Deploy section below. Otherwise, please start by subscribing to the model package and then move to the Deploy section.

Subscribe to the model package

To subscribe to the model package, complete the following steps:

  1. Open the model package listing page and choose Llama 3.3 Nemotron Super 49B V1 or Llama 3.1 Nemotron Nano 8B V1.
  2. On the AWS Marketplace listing, choose Continue to subscribe.
  3. On the Subscribe to this software page, review and choose Accept Offer if you and your organization agree with EULA, pricing, and support terms.
  4. Choose Continue with the configuration and then choose an AWS Region where you have the service quota for the desired instance type.

A product ARN will be displayed. This is the model package ARN that you need to specify while creating a deployable model using Boto3.

(Option-1) Deploy NVIDIA Llama Nemotron Super and Nano models on SageMaker JumpStart

For those new to SageMaker JumpStart, we use SageMaker Studio to access models on SageMaker JumpStart. The Llama 3.3 Nemotron Super 49B V1 and Llama 3.1 Nemotron Nano 8B V1 models are available on SageMaker JumpStart. Deployment starts when you choose the Deploy option. You may be prompted to subscribe to this model on the Marketplace; if you are already subscribed, you can move forward by choosing the second Deploy button. After deployment finishes, you will see that an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK.

(Option-2) Deploy NVIDIA Llama Nemotron using the SageMaker SDK

In this section, we walk through deploying the Llama 3.3 Nemotron Super 49B V1 model through the SageMaker SDK. A similar process can be followed for deploying the Llama 3.1 Nemotron Nano 8B V1 model.

Define the SageMaker model using the Model Package ARN

To deploy the model using the SDK, copy the product ARN from the previous step and specify it in the model_package_arn in the following code:

sm_model_name  "nim-llama-3-3-nemotron-super-49b-v1"
create_model_response  smcreate_model(
    ModelNamesm_model_name,
    PrimaryContainer{
        'ModelPackageName': model_package_arn
    },
    ExecutionRoleArnrole,
    EnableNetworkIsolation
)
print("Model Arn: "  create_model_response["ModelArn"])

Create the endpoint configuration

Next, we create the endpoint configuration by specifying the instance type, in this case ml.g6e.12xlarge. Make sure you have an account-level service quota for one or more ml.g6e.12xlarge instances for endpoint usage. NVIDIA also provides a list of instance types that support deployment; refer to the AWS Marketplace listing for both of these models to see supported instance types. To request a service quota increase, see AWS service quotas.

endpoint_config_name = sm_model_name

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            'VariantName': 'AllTraffic',
            'ModelName': sm_model_name,
            'InitialInstanceCount': 1,
            'InstanceType': 'ml.g6e.12xlarge',
            'InferenceAmiVersion': 'al2-ami-sagemaker-inference-gpu-2',
            'RoutingConfig': {'RoutingStrategy': 'LEAST_OUTSTANDING_REQUESTS'},
            'ModelDataDownloadTimeoutInSeconds': 3600,
            'ContainerStartupHealthCheckTimeoutInSeconds': 3600,
        }
    ]
)
print("Endpoint Config Arn: " + create_endpoint_config_response["EndpointConfigArn"])

Create the endpoint

Using the previous endpoint configuration, we create a new SageMaker endpoint and poll in a loop, as shown below, until the deployment finishes. This typically takes around 5-10 minutes. The status changes to InService once the deployment is successful.

endpoint_name = endpoint_config_name
create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)
print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])

Deploy the endpoint

Let’s now track the status of the endpoint deployment.

import time

resp = sm.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)

while status == "Creating":
    time.sleep(60)
    resp = sm.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)
    
print("Arn: " + resp["EndpointArn"])
print("Status: " + status)

Run Inference with Llama 3.3 Nemotron Super 49B V1

Once the endpoint is in service, we can use sample text to run an inference request. NIM on SageMaker supports the OpenAI API inference protocol request format. For an explanation of the supported parameters, see Creates a model in the NVIDIA documentation.

Real-Time inference example

The following code examples illustrate how to perform real-time inference using the Llama 3.3 Nemotron Super 49B V1 model in non-reasoning and reasoning mode.

Non-reasoning mode

Perform real-time inference in non-reasoning mode:

payload_model  "nvidia/llama-3.3-nemotron-super-49b-v1"
messages  [
    {
    "role": "system",
    "content": "detailed thinking off"
    },
    {
    "role":"user",
    "content":"Explain how a transformer neural network works."
    }
    ]
    
payload  {
    "model": payload_model,
    "messages": messages,
    "max_tokens": 3000
}

response  clientinvoke_endpoint(
    EndpointNameendpoint_name, ContentType"application/json", Bodyjsondumps(payload)
)

output  jsonloads(response["Body"]read()decode("utf8"))
print(jsondumps(output, indent2))

Reasoning mode

Perform real-time inference in reasoning mode:

payload_model  "nvidia/llama-3.3-nemotron-super-49b-v1"
messages  [
    {
    "role": "system",
    "content": "detailed thinking on"
    },
    {
    "role":"user",
    "content":"Explain how a transformer neural network works."
    }
    ]
payload  {
    "model": payload_model,
    "messages": messages,
    "max_tokens": 3000
}

response  clientinvoke_endpoint(
    EndpointNameendpoint_name, ContentType"application/json", Bodyjsondumps(payload)
)

output  jsonloads(response["Body"]read()decode("utf8"))
print(jsondumps(output, indent2))

Streaming inference

NIM on SageMaker also supports streaming inference, which you can enable by setting stream to True in the payload and using the invoke_endpoint_with_response_stream method.

Streaming inference:

payload_model = "nvidia/llama-3.3-nemotron-super-49b-v1"
messages = [
    {   
      "role": "system",
      "content": "detailed thinking on"# this can be toggled off to disable reasoning
    },
    {
      "role":"user",
      "content":"Explain how a transformer neural network works."
    }
  ]

payload = {
  "model": payload_model,
  "messages": messages,
  "max_tokens": 3000,
  "stream": True
}

response = client.invoke_endpoint_with_response_stream(
    EndpointName=endpoint_name,
    Body=json.dumps(payload),
    ContentType="application/json",
    Accept="application/jsonlines",
)

We can use some post-processing code for the streaming output that reads the byte-chunks coming from the endpoint, pieces them into full JSON messages, extracts any new text the model produced, and immediately prints that text to output.

event_stream = response['Body']
accumulated_data = ""
start_marker = 'data:'
end_marker = '"finish_reason":null}]}'

for event in event_stream:
    try:
        payload = event.get('PayloadPart', {}).get('Bytes', b'')
        if payload:
            data_str = payload.decode('utf-8')

            accumulated_data += data_str

            # Process accumulated data when a complete response is detected
            while start_marker in accumulated_data and end_marker in accumulated_data:
                start_idx = accumulated_data.find(start_marker)
                end_idx = accumulated_data.find(end_marker) + len(end_marker)
                full_response = accumulated_data[start_idx + len(start_marker):end_idx]
                accumulated_data = accumulated_data[end_idx:]

                try:
                    data = json.loads(full_response)
                    content = data.get('choices', [{}])[0].get('delta', {}).get('content', "")
                    if content:
                        print(content, end='', flush=True)
                except json.JSONDecodeError:
                    continue
    except Exception as e:
        print(f"nError processing event: {e}", flush=True)
        continue

Clean up

To avoid unwanted charges, complete the steps in this section to clean up your resources.

Delete the Amazon Bedrock Marketplace deployment

If you deployed the model using Amazon Bedrock Marketplace, complete the following steps:

  1. On the Amazon Bedrock console, in the navigation pane in the Foundation models section, choose Marketplace deployments.
  2. In the Managed deployments section, locate the endpoint you want to delete.
  3. Select the endpoint, and on the Actions menu, choose Delete.
  4. Verify the endpoint details to make sure you’re deleting the correct deployment:
    1. Endpoint name
    2. Model name
    3. Endpoint status
  5. Choose Delete to delete the endpoint.
  6. In the Delete endpoint confirmation dialog, review the warning message, enter confirm, and choose Delete to permanently remove the endpoint.

Delete the SageMaker JumpStart Endpoint

The SageMaker JumpStart model you deployed will incur costs if you leave it running. Use the following code to delete the endpoint if you want to stop incurring charges. For more details, see Delete Endpoints and Resources.

sm.delete_model(ModelName=sm_model_name)
sm.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm.delete_endpoint(EndpointName=endpoint_name)

Conclusion

NVIDIA’s Nemotron Llama3 models deliver optimized AI reasoning capabilities and are now available on AWS through Amazon Bedrock Marketplace and Amazon SageMaker JumpStart. The Llama 3.3 Nemotron Super 49B V1, derived from Meta’s 70B model, uses Neural Architecture Search (NAS) to achieve a reduced 49B parameter count while maintaining high accuracy, enabling deployment on a single H200 GPU despite its sophisticated capabilities. Meanwhile, the compact Llama 3.1 Nemotron Nano 8B V1 fits on a single H100 or A100 GPU (P5 or P4 instances) while improving on Meta’s reference model accuracy, making it ideal for efficiency-conscious applications. Both models support extensive 128K token context windows and are post-trained for enhanced reasoning, RAG capabilities, and tool calling, offering organizations flexible options to balance performance and computational requirements for enterprise AI applications.

With this launch, organizations can now leverage the advanced reasoning capabilities of these models while benefiting from the scalable infrastructure of AWS. Through either the intuitive UI or just a few lines of code, you can quickly deploy these powerful language models to transform your AI applications with minimal effort. These complementary platforms provide straightforward access to NVIDIA’s robust technologies, allowing teams to immediately begin exploring and implementing sophisticated reasoning capabilities in their enterprise solutions.


About the authors

Niithiyn Vijeaswaran is a Generative AI Specialist Solutions Architect with the Third-Party Model Science team at AWS. His area of focus is AWS AI accelerators (AWS Neuron). He holds a Bachelor’s in Computer Science and Bioinformatics.

Chase Pinkerton is a Startups Solutions Architect at Amazon Web Services. He holds a Bachelor’s in Computer Science with a minor in Economics from Tufts University. He’s passionate about helping startups grow and scale their businesses. When not working, he enjoys road cycling, hiking, playing volleyball, and photography.

Varun Morishetty is a Software Engineer with Amazon SageMaker JumpStart and Bedrock Marketplace. Varun received his Bachelor’s degree in Computer Science from Northeastern University. In his free time, he enjoys cooking, baking and exploring New York City.

Brian Kreitzer is a Partner Solutions Architect at Amazon Web Services (AWS). He works with partners to define business requirements, provide architectural guidance, and design solutions for the Amazon Marketplace.

Eliuth Triana Isaza is a Developer Relations Manager at NVIDIA, empowering Amazon’s AI MLOps, DevOps, scientists, and AWS technical experts to master the NVIDIA computing stack for accelerating and optimizing generative AI foundation models, spanning data curation, GPU training, model inference, and production deployment on AWS GPU instances. In addition, Eliuth is a passionate mountain biker, skier, and tennis and poker player.

Abhishek Sawarkar is a product manager in the NVIDIA AI Enterprise team working on integrating NVIDIA AI Software in Cloud MLOps platforms. He focuses on integrating the NVIDIA AI end-to-end stack within cloud platforms and enhancing user experience on accelerated computing.

Abdullahi Olaoye is a Senior AI Solutions Architect at NVIDIA, specializing in integrating NVIDIA AI libraries, frameworks, and products with cloud AI services and open source tools to optimize AI model deployment, inference, and generative AI workflows. He collaborates with AWS to enhance AI workload performance and drive adoption of NVIDIA-powered AI and generative AI solutions.

Read More

NVIDIA CEO Drops the Blueprint for Europe’s AI Boom

At GTC Paris — held alongside VivaTech, Europe’s largest tech event — NVIDIA founder and CEO Jensen Huang delivered a clear message: Europe isn’t just adopting AI — it’s building it.

“We now have a new industry, an AI industry, and it’s now part of the new infrastructure, called intelligence infrastructure, that will be used by every country, every society,” Huang said, addressing an audience gathered online and at the iconic Dôme de Paris.

From exponential inference growth to quantum breakthroughs, and from infrastructure to industry, agentic AI to robotics, Huang outlined how the region is laying the groundwork for an AI-powered future.

A New Industrial Revolution

At the heart of this transformation, Huang explained, are systems like GB200 NVL72 — “one giant GPU” and NVIDIA’s most powerful AI platform yet — now in full production and powering everything from sovereign models to quantum computing.

“This machine was designed to be a thinking machine, a thinking machine, in the sense that it reasons, it plans, it spends a lot of time talking to itself,” Huang said, walking the audience through the size and scale of these machines and their performance.

At GTC Paris, NVIDIA CEO Jensen Huang shows audience members the innards of some of NVIDIA’s latest hardware.

There’s more coming: Huang said NVIDIA’s partners are now producing 1,000 GB200 systems a week, “and this is just the beginning,” before walking the audience through the range of available systems, from the tiny DGX Spark to rack-mounted RTX PRO Servers.

Huang explained that NVIDIA is working to help countries use technologies like these to build both AI infrastructure — services built for third parties to use and innovate on — and AI factories, which companies build for their own use, to generate revenue.

NVIDIA is partnering with European governments, telcos and cloud providers to deploy NVIDIA technologies across the region. NVIDIA is also expanding its network of technology centers across Europe — including new hubs in Finland, Germany, Spain, Italy and the U.K. — to accelerate skills development and quantum growth.

Quantum Meets Classical

Europe’s quantum ambitions just got a boost.

The NVIDIA CUDA-Q platform is live on Denmark’s Gefion supercomputer, opening new possibilities for hybrid AI and quantum engineering, and Huang announced that CUDA-Q is now available on NVIDIA Grace Blackwell systems.

Across the continent, NVIDIA is partnering with supercomputing centers and quantum hardware builders to advance hybrid quantum-AI research and accelerate quantum error correction, Huang said.

“Quantum computing is reaching an inflection point,” Huang said. “We are within reach of being able to apply quantum computing, quantum classical computing, in areas that can solve some interesting problems in the coming years.”

Sovereign Models, Smarter Agents

European developers want more control over their models. Enter NVIDIA Nemotron, designed to help build large language models tuned to local needs.

“And so now you know that you have access to an enhanced open model that is still open, that is top of the leader chart,” Huang said.

These models will be coming to Perplexity, a reasoning search engine, enabling secure, multilingual AI deployment across Europe.

“You can now ask and get questions answered in the language, in the culture, in the sensibility of your country,” Huang said.

More’s coming. Every company will build its own agents, Huang said. To help create those agents, Huang introduced a suite of agentic AI blueprints, including an Agentic AI Safety blueprint for enterprises and governments.

The new NVIDIA NeMo Agent toolkit and NVIDIA AI Blueprint for building data flywheels further accelerate the development of safe, high-performing AI agents.

To help deploy these agents, NVIDIA is partnering with European governments, telcos and cloud providers to deploy the DGX Cloud Lepton platform across the region, providing instant access to accelerated computing capacity.

“One model architecture, one deployment, and you can run it anywhere,” Huang said, adding that Lepton is now integrated with Hugging Face, giving developers direct access to global compute.

The Industrial Cloud Goes Live

AI isn’t just virtual. It’s powering physical systems, too, sparking a “new industrial revolution.”

“We’re working on industrial AI with one company after another,” Huang said, describing work to build digital twins based on NVIDIA Omniverse with companies across the continent.

During his keynote, Huang noted that everything he showed was “computer simulation, not animation,” and that it looks beautiful because “it turns out the world is beautiful, and it turns out math is beautiful.”

To further this work, Huang announced NVIDIA is launching the world’s first industrial AI cloud — to be built in Germany — to help Europe’s manufacturers simulate, automate and optimize at scale.

More’s coming. “Soon, everything that moves will be robotic,” Huang said. “And the car is the next one.”

NVIDIA DRIVE, NVIDIA’s full-stack AV platform, is now in production to accelerate the large-scale deployment of safe, intelligent transportation.

And to show what’s coming next, Huang was joined on stage by Grek, a pint-sized robot, as Huang talked about how NVIDIA partnered with DeepMind and Disney to build Newton, the world’s most advanced physics training engine for robotics.

The Next Wave

The next wave of AI has begun — and it’s exponential, Huang explained.

“We have physical robots, and we have information robots. We call them agents,” Huang said. “The technology necessary to teach a robot to manipulate, to simulate — and of course, the manifestation of an incredible robot — is now right in front of us.”

This new era of AI is being driven by a surge in inference workloads. “The number of people using inference has gone from 8 million to 800 million — 100 times in just a couple of years,” Huang said.

To meet this demand, Huang emphasized the need for a new kind of computer: “We need a special computer designed for thinking, designed for reasoning. And that’s what Blackwell is — a thinking machine.”

These Blackwell-powered systems will live in a new class of data centers — AI factories — built to generate tokens, the raw material of modern intelligence.

“These AI factories are going to generate tokens,” Huang said, turning to Grek with a smile. “And these tokens are going to become your food, little Grek.”

With that, the keynote closed on a bold vision: a future powered by sovereign infrastructure, agentic AI, robotics — and exponential inference — all built in partnership with Europe.

Read More

European Broadcasting Union and NVIDIA Partner on Sovereign AI to Support Public Broadcasters

In a new effort to advance sovereign AI for European public service media, NVIDIA and the European Broadcasting Union (EBU) are working together to give the media industry access to high-quality and trusted cloud and AI technologies.

Announced at NVIDIA GTC Paris at VivaTech, NVIDIA’s collaboration with the EBU — the world’s leading alliance of public service media with more than 110 member organizations in 50+ countries, reaching an audience of over 1 billion — focuses on helping build sovereign AI and cloud frameworks, driving workforce development and cultivating an AI ecosystem to create a more equitable, accessible and resilient European media landscape.

The work will create better foundations for public service media to benefit from European cloud infrastructure and AI services that are exclusively governed by European policy, comply with European data protection and privacy rules, and embody European values.

Sovereign AI ensures nations can develop and deploy artificial intelligence using local infrastructure, datasets and expertise. By investing in it, European countries can preserve their cultural identity, enhance public trust and support innovation specific to their needs.

“We are proud to collaborate with NVIDIA to drive the development of sovereign AI and cloud services,” said Michael Eberhard, chief technology officer of public broadcaster ARD/SWR, and chair of the EBU Technical Committee. “By advancing these capabilities together, we’re helping ensure that powerful, compliant and accessible media services are made available to all EBU members — powering innovation, resilience and strategic autonomy across the board.”

Empowering Media Innovation in Europe

To support the development of sovereign AI technologies, NVIDIA and the EBU will establish frameworks that prioritize independence and public trust, helping ensure that AI serves the interests of Europeans while preserving the autonomy of media organizations.

Through this collaboration, NVIDIA and the EBU will develop hybrid cloud architectures designed to meet the highest standards of European public service media. The EBU will contribute its Dynamic Media Facility (DMF) and Media eXchange Layer (MXL) architecture, aiming to enable interoperability and scalability for workflows, as well as cost- and energy-efficient AI training and inference. Following open-source principles, this work aims to create an accessible, dynamic technology ecosystem.

The collaboration will also provide public service media companies with the tools to deliver personalized, contextually relevant services and content recommendation systems, with a focus on transparency, accountability and cultural identity. This will be realized through investment in sovereign cloud and AI infrastructure and software platforms such as NVIDIA AI Enterprise, custom foundation models, large language models trained with local data, and retrieval-augmented generation technologies.

As part of the collaboration, NVIDIA is also making available resources from its Deep Learning Institute, offering European media organizations comprehensive training programs to create an AI-ready workforce. This will support the EBU’s efforts to help ensure news integrity in the age of AI.

In addition, the EBU and its partners are investing in local data centers and cloud platforms that support sovereign technologies, such as NVIDIA GB200 Grace Blackwell Superchip, NVIDIA RTX PRO Servers, NVIDIA DGX Cloud and NVIDIA Holoscan for Media — helping members of the union achieve secure and cost- and energy-efficient AI training, while promoting AI research and development.

Partnering With Public Service Media for Sovereign Cloud and AI

Collaboration within the media sector is essential for the development and application of comprehensive standards and best practices that ensure the creation and deployment of sovereign European cloud and AI.

By engaging with independent software vendors, data center providers, cloud service providers and original equipment manufacturers, NVIDIA and the EBU aim to create a unified approach to sovereign cloud and AI.

This work will also facilitate discussions between the cloud and AI industry and European regulators, helping ensure the development of practical solutions that benefit both the general public and media organizations.

“Building sovereign cloud and AI capabilities based on EBU’s Dynamic Media Facility and Media eXchange Layer architecture requires strong cross-industry collaboration,” said Antonio Arcidiacono, chief technology and innovation officer at the EBU. “By collaborating with NVIDIA, as well as a broad ecosystem of media technology partners, we are fostering a shared foundation for trust, innovation and resilience that supports the growth of European media.”

Learn more about the EBU.

Watch the NVIDIA GTC Paris keynote from NVIDIA founder and CEO Jensen Huang at VivaTech, and explore GTC Paris sessions.

Read More

Calling on LLMs: New NVIDIA AI Blueprint Helps Automate Telco Network Configuration

Telecom companies last year spent nearly $295 billion in capital expenditures and over $1 trillion in operating expenditures.

These large expenses are due in part to laborious manual processes that telcos face when operating networks that require continuous optimizations.

For example, telcos must constantly tune network parameters for tasks — such as transferring calls from one network to another or distributing network traffic across multiple servers — based on the time of day, user behavior, mobility and traffic type.

These factors directly affect network performance, user experience and energy consumption.

To automate these optimization processes and save costs for telcos across the globe, NVIDIA today unveiled at GTC Paris its first AI Blueprint for telco network configuration.

At the blueprint’s core are customized large language models trained specifically on telco network data — as well as the full technical and operational architecture for turning the LLMs into an autonomous, goal-driven AI agent for telcos.

Automate Network Configuration With the AI Blueprint

NVIDIA AI Blueprints — available on build.nvidia.com — are customizable AI workflow examples. They include reference code, documentation and deployment tools that show enterprise developers how to deliver business value with NVIDIA NIM microservices.

The AI Blueprint for telco network configuration — built with BubbleRAN 5G solutions and datasets — enables developers, network engineers and telecom providers to automatically optimize the configuration of network parameters using agentic AI.

This can streamline operations, reduce costs and significantly improve service quality by embedding continuous learning and adaptability directly into network infrastructures.

Traditionally, network configurations required manual intervention or followed rigid rules to adapt to dynamic network conditions. These approaches limited adaptability and increased operational complexities, costs and inefficiencies.

The new blueprint helps shift telco operations from relying on static, rules-based systems to operations based on dynamic, AI-driven automation. It enables developers to build advanced, telco-specific AI agents that make real-time, intelligent decisions and autonomously balance trade-offs — such as network speed versus interference, or energy savings versus utilization — without human input.

Powered and Deployed by Industry Leaders

Trained on 5G data generated by BubbleRAN, and deployed on the BubbleRAN 5G O-RAN platform, the blueprint provides telcos with insight on how to set various parameters to reach performance goals, like achieving a certain bitrate while choosing an acceptable signal-to-noise ratio — a measure that impacts voice quality and thus user experience.

With the new AI Blueprint, network engineers can confidently set initial parameter values and update them as demanded by continuous network changes.

Norway-based Telenor Group, which serves over 200 million customers globally, is the first telco to integrate the AI Blueprint for telco network configuration as part of its initiative to deploy intelligent, autonomous networks that meet the performance and agility demands of 5G and beyond.

“The blueprint is helping us address configuration challenges and enhance quality of service during network installation,” said Knut Fjellheim, chief technology innovation officer at Telenor Maritime. “Implementing it is part of our push toward network automation and follows the successful deployment of agentic AI for real-time network slicing in a private 5G maritime use case.”

Industry Partners Deploy Other NVIDIA-Powered Autonomous Network Technologies

The AI Blueprint for telco network configuration is just one of many announcements at NVIDIA GTC Paris showcasing how the telecom industry is using agentic AI to make autonomous networks a reality.

Beyond the blueprint, leading telecom companies and solutions providers are tapping into NVIDIA accelerated computing, software and microservices to provide breakthrough innovations poised to vastly improve networks and communications services — accelerating the progress to autonomous networks and improving customer experiences.

NTT DATA is powering its agentic platform for telcos with NVIDIA accelerated compute and the NVIDIA AI Enterprise software platform. Its first agentic use case is focused on network alarms management, where NVIDIA NIM microservices help automate and power observability, troubleshooting, anomaly detection and resolution with closed loop ticketing.

Tata Consultancy Services is delivering agentic AI solutions for telcos built on NVIDIA DGX Cloud and using NVIDIA AI Enterprise to develop, fine-tune and integrate large telco models into AI agent workflows. These range from billing and revenue assurance and autonomous network management to hybrid edge-cloud distributed inference.

For example, the company’s anomaly management agentic AI model includes real-time detection and resolution of network anomalies and service performance optimization. This increases business agility and improves operational efficiency by up to 40% by eliminating labor-intensive toil, overhead and cross-departmental silos.

Prodapt has introduced an autonomous operations workflow for networks, powered by NVIDIA AI Enterprise, that offers agentic AI capabilities to support autonomous telecom networks. AI agents can autonomously monitor networks, detect anomalies in real time, initiate diagnostics, analyze root causes of issues using historical data and correlation techniques, automatically execute corrective actions, and generate, enrich and assign incident tickets through integrated ticketing systems.

Accenture announced its new portfolio of agentic AI solutions for telecommunications through its AI Refinery platform, built on NVIDIA AI Enterprise software and accelerated computing.

The first available solution, the NOC Agentic App, boosts network operations center tasks by using a generative AI-driven, nonlinear agentic framework to automate processes such as incident and fault management, root cause analysis and configuration planning. Using the Llama 3.1 70B NVIDIA NIM microservice and the AI Refinery Distiller Framework, the NOC Agentic App orchestrates networks of intelligent agents for faster, more efficient decision-making.

Infosys is announcing its agentic autonomous operations platform, called Infosys Smart Network Assurance (ISNA), designed to accelerate telecom operators’ journeys toward fully autonomous network operations.

ISNA helps address long-standing operational challenges for telcos — such as limited automation and high average time to repair — with an integrated, AI-driven platform that reduces operational costs by up to 40% and shortens fault resolution times by up to 30%. NVIDIA NIM and NeMo microservices enhance the platform’s reasoning and hallucination-detection capabilities, reduce latency and increase accuracy.

Get started with the new blueprint today.

Learn more about the latest AI advancements for telecom and other industries at NVIDIA GTC Paris, running through Thursday, June 12, at VivaTech, including a keynote from NVIDIA founder and CEO Jensen Huang and a special address from Ronnie Vasishta, senior vice president of telecom at NVIDIA. Plus, hear from industry leaders in a panel session with Orange, Swisscom, Telenor and NVIDIA.

Read More

NVIDIA Brings Physical AI to European Cities With New Blueprint for Smart City AI

Urban populations are expected to double by 2050, which means around 2.5 billion people could be added to urban areas by the middle of the century, driving the need for more sustainable urban planning and public services. Cities across the globe are turning to digital twins and AI agents for urban planning scenario analysis and data-driven operational decisions.

Building a digital twin of a city and testing smart city AI agents within it, however, is a complex and resource-intensive endeavor, fraught with technical and operational challenges.

To address those challenges, NVIDIA today announced the NVIDIA Omniverse Blueprint for smart city AI, a reference framework that combines the NVIDIA Omniverse, Cosmos, NeMo and Metropolis platforms to bring the benefits of physical AI to entire cities and their critical infrastructure.

Using the blueprint, developers can create simulation-ready, or SimReady, photorealistic digital twins of cities in which to build and test AI agents that can help monitor and optimize city operations.

Leading companies including XXII, AVES Reality, Akila, Blyncsy, Bentley, Cesium, K2K, Linker Vision, Milestone Systems, Nebius, SNCF Gares&Connexions, Trimble and Younite AI are among the first to use the new blueprint.

NVIDIA Omniverse Blueprint for Smart City AI 

The NVIDIA Omniverse Blueprint for smart city AI provides the complete software stack needed to accelerate the development and testing of AI agents in physically accurate digital twins of cities, combining the Omniverse, Cosmos, NeMo and Metropolis platforms.

The blueprint workflow comprises three key steps. First, developers create a SimReady digital twin of locations and facilities using aerial, satellite or map data with Omniverse and Cosmos. Second, they can train and fine-tune AI models, like computer vision models and VLMs, using NVIDIA TAO and NeMo Curator to improve accuracy for vision AI use cases. Finally, real-time AI agents powered by these customized models are deployed to alert, summarize and query camera and sensor data using the Metropolis VSS blueprint.
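To make the flow concrete, the three stages can be outlined as plain data, as in the sketch below. All stage names and fields are illustrative only; the blueprint defines its own interfaces, and this is not its actual schema:

# Illustrative outline of the three-step blueprint workflow described above.
# All names here are hypothetical; consult the blueprint for its real interfaces.
smart_city_workflow = [
    {
        "stage": "build_digital_twin",
        "inputs": ["aerial imagery", "satellite data", "map data"],
        "tools": ["NVIDIA Omniverse", "NVIDIA Cosmos"],
        "output": "SimReady digital twin of the target location",
    },
    {
        "stage": "train_and_finetune",
        "inputs": ["digital twin scenes", "real and synthetic video"],
        "tools": ["NVIDIA TAO", "NeMo Curator"],
        "output": "customized computer vision models and VLMs",
    },
    {
        "stage": "deploy_agents",
        "inputs": ["customized models", "live camera and sensor feeds"],
        "tools": ["Metropolis VSS blueprint"],
        "output": "real-time agents that alert, summarize and answer queries",
    },
]

for step in smart_city_workflow:
    print(f"{step['stage']}: {' + '.join(step['tools'])} -> {step['output']}")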

NVIDIA Partner Ecosystem Powers Smart Cities Worldwide

The blueprint for smart city AI enables a large ecosystem of partners to use a single workflow to build and activate digital twins for smart city use cases, tapping into a combination of NVIDIA’s technologies and their own.

SNCF Gares&Connexions, which operates a network of 3,000 train stations across France and Monaco, has deployed a digital twin and AI agents to enable real-time operational monitoring, emergency response simulations and infrastructure upgrade planning.

This helps each station analyze operational data such as energy and water use, and enables predictive maintenance capabilities, automated reporting and GDPR-compliant video analytics for incident detection and crowd management.

Powered by Omniverse, Metropolis and solutions from ecosystem partners Akila and XXII, SNCF Gares&Connexions’ physical AI deployment at the Monaco-Monte-Carlo and Marseille stations has helped the operator achieve a 100% on-time preventive maintenance completion rate, a 50% reduction in downtime and issue response time, and a 20% reduction in energy consumption.

The city of Palermo in Sicily is using AI agents and digital twins from its partner K2K to improve public health and safety by helping city operators process and analyze footage from over 1,000 public video streams at a rate of nearly 50 billion pixels per second.

Tapped by the city, K2K’s AI agents — built with the NVIDIA AI Blueprint for VSS and cloud solutions from Nebius — can interpret and act on video data to provide real-time alerts on public events.

To accurately predict and resolve traffic incidents, K2K is generating synthetic data with Cosmos world foundation models to simulate different driving conditions. Then, K2K uses the data to fine-tune the VLMs powering the AI agents with NeMo Curator. These simulations enable K2K’s AI agents to create over 100,000 predictions per second.

Milestone Systems — in collaboration with NVIDIA and European cities — has launched Project Hafnia, an initiative to build an anonymized, ethically sourced video data platform for cities to develop and train AI models and applications while maintaining regulatory compliance.

Using a combination of Cosmos and NeMo Curator on NVIDIA DGX Cloud and Nebius’ sovereign European cloud infrastructure, Project Hafnia scales up and enables European-compliant training and fine-tuning of video-centric AI models, including VLMs, for a variety of smart city use cases.

The project’s initial rollout, taking place in Genoa, Italy, features one of the world’s first VLM models for intelligent transportation systems.

Linker Vision was among the first to partner with NVIDIA to deploy smart city digital twins and AI agents for Kaohsiung City, Taiwan — powered by Omniverse, Cosmos and Metropolis. Linker Vision worked with AVES Reality, a digital twin company, to bring aerial imagery of cities and infrastructure into 3D geometry and ultimately into SimReady Omniverse digital twins.

Linker Vision’s AI-powered application then built, trained and tested visual AI agents in a digital twin before deployment in the physical city. Now, it’s scaling to analyze 50,000 video streams in real time with generative AI to understand and narrate complex urban events like floods and traffic accidents. Linker Vision delivers timely insights to a dozen city departments through a single integrated AI-powered platform, breaking silos and reducing incident response times by up to 80%.

Bentley Systems is joining the effort to bring physical AI to cities with the NVIDIA blueprint. Cesium, Bentley’s open 3D geospatial platform, provides the foundation for visualizing, analyzing and managing infrastructure projects, and ports digital twins to Omniverse. Bentley’s AI platform Blyncsy uses synthetic data generation and Metropolis to analyze road conditions and improve maintenance.

Trimble, a global technology company that enables essential industries including construction, geospatial and transportation, is exploring ways to integrate components of the Omniverse blueprint into its reality capture workflows and Trimble Connect digital twin platform for surveying and mapping applications for smart cities.

Younite AI, a developer of AI and 3D digital twin solutions, is adopting the blueprint to accelerate its development pipeline, enabling the company to quickly move from operational digital twins to large-scale urban simulations, improve synthetic data generation, integrate real-time IoT sensor data and deploy AI agents.

Learn more about the NVIDIA Omniverse Blueprint for smart city AI by attending this GTC Paris session or watching the on-demand video after the event. Sign up to be notified when the blueprint is available.

Watch the NVIDIA GTC Paris keynote from NVIDIA founder and CEO Jensen Huang at VivaTech, and explore GTC Paris sessions.

Read More

Retail Reboot: Major Global Brands Transform End-to-End Operations With NVIDIA

AI is packing and shipping efficiency for the retail and consumer packaged goods (CPG) industries, with a majority of surveyed companies in the space reporting the technology is increasing revenue and reducing operational costs.

Global brands are reimagining every facet of their businesses with AI, from how products are designed and manufactured to how they’re marketed, shipped and experienced in-store and online.

At NVIDIA GTC Paris at VivaTech, industry leaders including L’Oréal, LVMH and Nestlé shared how they’re using tools like AI agents and physical AI — powered by NVIDIA AI and simulation technologies — across every step of the product lifecycle to enhance operations and experiences for partners, customers and employees.

3D Digital Twins and AI Transform Marketing, Advertising and Product Design

The meeting of generative AI and 3D product digital twins results in unlimited creative potential.

Nestlé, the world’s largest food and beverage company, today announced a collaboration with NVIDIA and Accenture to launch a new, AI-powered in-house service that will create high-quality product content at scale for e-commerce and digital media channels.

The new content service, based on digital twins powered by the NVIDIA Omniverse platform, creates exact 3D virtual replicas of physical products. Product packaging can be adjusted or localized digitally, enabling seamless integration into various environments, such as seasonal campaigns or channel-specific formats. This means that new creative content can be generated without having to constantly reshoot from scratch.

The service is developed in partnership with Accenture Song, using Accenture AI Refinery built on NVIDIA Omniverse for advanced digital twin creation. It uses NVIDIA AI Enterprise for generative AI, hosted on Microsoft Azure for robust cloud infrastructure.

Nestlé already has a baseline of 4,000 3D digital products — mainly for global brands — with the ambition to convert a total of 10,000 products into digital twins in the next two years across global and local brands.

LVMH, the world’s leading luxury goods company, home to 75 distinguished maisons, is bringing 3D digital twins to its content production processes through its wine and spirits division, Moët Hennessy.

The group partnered with content configuration engine Grip to develop a solution using the NVIDIA Omniverse platform, which enables the creation of 3D digital twins that power content variation production. With Grip’s solution, Moët Hennessy teams can quickly generate digital marketing assets and experiences to promote luxury products at scale.

The initiative, led by Capucine Lafarge and Chloé Fournier, has been recognized by LVMH as a leading approach to scaling content creation.

L’Oréal Gives Marketing and Online Shopping an AI Makeover

Innovation starts at the drawing board. Today, that board is digital — and it’s powered by AI.

L’Oréal Groupe, the world’s leading beauty player, announced its collaboration with NVIDIA today. Through this collaboration, L’Oréal and its partner ecosystem will leverage the NVIDIA AI Enterprise platform to transform its consumer beauty experiences, marketing and advertising content pipelines.

“AI doesn’t think with the same constraints as a human being. That opens new avenues for creativity,” said Anne Machet, global head of content and entertainment at L’Oréal. “Generative AI enables our teams and partner agencies to explore creative possibilities.”

CreAItech, L’Oréal’s generative AI content platform, is augmenting the creativity of marketing and content teams. Combining a modular ecosystem of models, expertise, technologies and partners — including NVIDIA — CreAItech empowers marketers to generate thousands of unique, on-brand images, videos and lines of text for diverse platforms and global audiences.

The solution empowers L’Oréal’s marketing teams to quickly iterate on campaigns that improve consumer engagement across social media, e-commerce content and influencer marketing — driving higher conversion rates.

Noli.com, the first AI-powered multi-brand marketplace startup founded and backed by the L’Oréal Groupe, is reinventing how people discover and shop for beauty products.

Noli’s AI Beauty Matchmaker experience uses L’Oréal Groupe’s century-long expertise in beauty, including its extensive knowledge of beauty science, beauty tech and consumer insights, built from over 1 million skin data points and analysis of thousands of product formulations. It gives users a BeautyDNA profile with expert-level guidance and personalized product recommendations for skincare and haircare.

“Beauty shoppers are often overwhelmed by choice and struggling to find the products that are right for them,” said Amos Susskind, founder and CEO of Noli. “By applying the latest AI models accelerated by NVIDIA and Accenture to the unparalleled knowledge base and expertise of the L’Oréal Groupe, we can provide hyper-personalized, explainable recommendations to our users.” 

The Accenture AI Refinery, powered by NVIDIA AI Enterprise, will provide the platform for Noli to experiment and scale. Noli’s new agent models will use NVIDIA NIM and NVIDIA NeMo microservices, including NeMo Retriever, running on Microsoft Azure.

Rapid Innovation With the NVIDIA Partner Ecosystem

NVIDIA’s ecosystem of solution provider partners empowers retail and CPG companies to innovate faster, personalize customer experiences, and optimize operations with NVIDIA accelerated computing and AI.

Global digital agency Monks is reshaping the landscape of AI-driven marketing, creative production and enterprise transformation. At the heart of its innovation lies Monks.Flow, a platform that enhances both the speed and sophistication of creative workflows through NVIDIA Omniverse, NVIDIA NIM microservices and Triton Inference Server for lightning-fast inference.

AI image solutions provider Bria is helping retail giants like Lidl and L’Oréal enhance marketing asset creation. Bria AI transforms static product images into compelling, dynamic advertisements that can be quickly scaled for use across any marketing need.

The company’s generative AI platform uses NVIDIA Triton Inference Server software and the NVIDIA TensorRT software development kit for accelerated inference, as well as NVIDIA NIM and NeMo microservices for quick image generation at scale.

Physical AI Brings Acceleration to Supply Chain and Logistics

AI’s impact extends far beyond the digital world. Physical AI-powered warehousing robots, for example, are helping maximize efficiency in retail supply chain operations. Four in five retail companies have reported that AI has helped reduce supply chain operational costs, with 25% reporting cost reductions of at least 10%.

Technology providers Lyric, KoiReader Technologies and Exotec are tackling the challenges of integrating AI into complex warehouse environments.

Lyric is using the NVIDIA cuOpt GPU-accelerated solver for warehouse network planning and route optimization, and is collaborating with NVIDIA to apply the technology to broader supply chain decision-making problems. KoiReader Technologies is tapping the NVIDIA Metropolis stack for its computer vision solutions within logistics, supply chain and manufacturing environments using the KoiVision Platform. And Exotec is using NVIDIA CUDA libraries and the NVIDIA JetPack software development kit for embedded robotic systems in warehouse and distribution centers.

From real-time robotics orchestration to predictive maintenance, these solutions are delivering impact on uptime, throughput and cost savings for supply chain operations.

Learn more by joining a follow-up discussion on digital twins and AI-powered creativity with Microsoft, Nestlé, Accenture and NVIDIA at Cannes Lions on Monday, June 16.

Watch the NVIDIA GTC Paris keynote from NVIDIA founder and CEO Jensen Huang at VivaTech, and explore GTC Paris sessions.

Read More

European Robot Makers Adopt NVIDIA Isaac, Omniverse and Halos to Develop Safe, Physical AI-Driven Robot Fleets

In the face of growing labor shortages and need for sustainability, European manufacturers are racing to reinvent their processes to become software-defined and AI-driven.

To achieve this, robot developers and industrial digitalization solution providers are working with NVIDIA to build safe, AI-driven robots and industrial technologies to drive modern, sustainable manufacturing.

At NVIDIA GTC Paris at VivaTech, Europe’s leading robotics companies, including Agile Robots, Extend Robotics, Humanoid, idealworks, Neura Robotics, SICK, Universal Robots, Vorwerk and Wandelbots, are showcasing their latest AI-driven robots and automation breakthroughs, all accelerated by NVIDIA technologies. In addition, NVIDIA is releasing new models and tools to support the entire robotics ecosystem.

NVIDIA Releases Tools for Accelerating Robot Development and Safety

NVIDIA Isaac GR00T N1.5, an open foundation model for humanoid robot reasoning and skills, is now available for download on Hugging Face. This update enhances the model’s adaptability and ability to follow instructions, significantly improving its performance in material handling and manufacturing tasks. The NVIDIA Isaac Sim 5.0 and Isaac Lab 2.2 open-source robotics simulation and learning frameworks, optimized for NVIDIA RTX PRO 6000 workstations, are available on GitHub for developer preview.

In addition, NVIDIA announced that NVIDIA Halos — a full-stack, comprehensive safety system that unifies hardware architecture, AI models, software, tools and services — now expands to robotics, promoting safety across the entire development lifecycle of AI-driven robots.

The NVIDIA Halos AI Systems Inspection Lab has earned accreditation from the ANSI National Accreditation Board (ANAB) to perform inspections across functional safety for robotics, in addition to automotive vehicles.

“NVIDIA’s latest evaluation with ANAB verifies the demonstration of competence and compliance with internationally recognized standards, helping ensure that developers of autonomous machines — from automotive to robotics — can meet the highest benchmarks for functional safety,” said R. Douglas Leonard Jr., executive director of ANAB.

Arcbest, Advantech, Bluewhite, Boston Dynamics, FORT, Inxpect, KION, NexCobot — a NEXCOM company, SICK and Synapticon are among the first robotics companies to join the Halos Inspection Lab, ensuring their products meet NVIDIA safety and cybersecurity requirements.

To support robotics leaders in strengthening safety across the entire development lifecycle of AI-driven robots, Halos will now provide:

  • Safety extension packages for the NVIDIA IGX platform, enabling manufacturers to easily program safety functions into their robots, supported by TÜV Rheinland’s inspection of NVIDIA IGX.
  • A robotic safety platform, which includes IGX and NVIDIA Holoscan Sensor Bridge for a unified approach to designing sensor-to-compute architecture with built-in AI safety.
  • An outside-in safety AI inspector — an AI-powered agent for monitoring robot operations, helping improve worker safety.

Europe’s Robotics Ecosystem Builds on NVIDIA’s Three Computers

Europe’s leading robotics developers and solution providers are integrating the NVIDIA Isaac robotics platform to train, simulate and deploy robots across different embodiments.

Agile Robots is post-training the GR00T N1 model in Isaac Lab to train its dual-arm manipulator robots, which run on NVIDIA Jetson hardware, to execute a variety of tasks in industrial environments.

Meanwhile, idealworks has adopted the Mega NVIDIA Omniverse Blueprint for robotic fleet simulation to extend the blueprint’s capabilities to humanoids. Building on the VDA 5050 framework, idealworks contributes to the development of guidance that supports tasks uniquely enabled by humanoid robots, such as picking, moving and placing objects.

Neura Robotics is integrating NVIDIA Isaac to further enhance its robot development workflows. The company is using GR00T-Mimic to post-train the Isaac GR00T N1 robot foundation model for its service robot MiPA. Neura is also collaborating with SAP and NVIDIA to integrate SAP’s Joule agents with its robots, using the Mega NVIDIA Omniverse Blueprint to simulate and refine robot behavior in complex, realistic operational scenarios before deployment.

Vorwerk is using NVIDIA technologies to power its AI-driven collaborative robots. The company is post-training GR00T N1 models in Isaac Lab with its custom synthetic data pipeline, which is built on Isaac GR00T-Mimic and powered by the NVIDIA Omniverse platform. The enhanced models are then deployed on NVIDIA Jetson AGX, Jetson Orin or Jetson Thor modules for advanced, real-time home robotics.

Humanoid is using NVIDIA’s full robotics stack, including Isaac Sim and Isaac Lab, to cut its prototyping time down by six weeks. The company is training its vision language action models on NVIDIA DGX B200 systems to boost the cognitive abilities of its robots, allowing them to operate autonomously in complex environments using Jetson Thor onboard computing.

Universal Robots is introducing UR15, its fastest collaborative robot yet, to the European market. Using UR’s AI Accelerator — developed on NVIDIA Isaac’s CUDA-accelerated libraries and AI models, as well as NVIDIA Jetson AGX Orin — manufacturers can build AI applications to embed intelligence into the company’s new cobots.

Wandelbots is showcasing its NOVA Operating System, now integrated with Omniverse, to simulate, validate and optimize robotic behaviors virtually before deploying them to physical robots. Wandelbots also announced a collaboration with EY and EDAG to offer manufacturers a scalable automation platform on Omniverse that speeds up the transition from proof of concept to full-scale deployment.

Extend Robotics is using the Isaac GR00T platform to enable customers to control and train robots for industrial tasks like visual inspection and handling radioactive materials. The company’s Advanced Mechanics Assistance System lets users collect demonstration data and generate diverse synthetic datasets with NVIDIA GR00T-Mimic and GR00T-Gen to train the GR00T N1 foundation model.

SICK is enhancing its autonomous perception solutions by integrating new certified sensor models — as well as 2D and 3D lidars, safety scanners and cameras — into NVIDIA Isaac Sim. This enables engineers to virtually design, test and validate machines using SICK’s sensing models within Omniverse, supporting processes spanning product development to large-scale robotic fleet management.

Toyota Material Handling is working with SoftServe to simulate its autonomous mobile robots working alongside human workers, using the Mega NVIDIA Omniverse Blueprint. The company is testing and simulating a multitude of traffic scenarios — allowing it to refine its AI algorithms before real-world deployment.

NVIDIA’s partner ecosystem is enabling European industries to tap into intelligent, AI-powered robotics. By harnessing advanced simulation, digital twins and generative AI, manufacturers are rapidly developing and deploying safe, adaptable robot fleets that address labor shortages, boost sustainability and drive operational efficiency.

Watch the NVIDIA GTC Paris keynote from NVIDIA founder and CEO Jensen Huang at VivaTech, and explore GTC Paris sessions.

See notice regarding software product information.

Read More