Responsible AI at Google Research: User Experience Team

Google’s Responsible AI User Experience (Responsible AI UX) team is a product-minded team embedded within Google Research. This unique positioning requires us to apply responsible AI development practices to our user-centered user experience (UX) design process. In this post, we describe the importance of UX design and responsible AI in product development, and share a few examples of how our team’s capabilities and cross-functional collaborations have led to responsible development across Google.

First, the UX part. We are a multi-disciplinary team of product design experts: designers, engineers, researchers, and strategists who manage the user-centered UX design process from early-phase ideation and problem framing to later-phase user-interface (UI) design, prototyping and refinement. We believe that effective product development occurs when there is clear alignment between significant unmet user needs and a product’s primary value proposition, and that this alignment is reliably achieved via a thorough user-centered UX design process.

And second, recognizing generative AI’s (GenAI) potential to significantly impact society, we embrace our role as the primary user advocate as we continue to evolve our UX design process to meet the unique challenges AI poses, maximizing the benefits and minimizing the risks. As we navigate through each stage of an AI-powered product design process, we place a heightened emphasis on the ethical, societal, and long-term impact of our decisions. We contribute to the ongoing development of comprehensive safety and inclusivity protocols that define design and deployment guardrails around key issues like content curation, security, privacy, model capabilities, model access, equitability, and fairness that help mitigate GenAI risks.

Responsible AI UX is constantly evolving its user-centered product design process to meet the needs of a GenAI-powered product landscape with greater sensitivity to the needs of users and society and an emphasis on ethical, societal, and long-term impact.

Responsibility in product design is also reflected in the user and societal problems we choose to address and the programs we resource. Thus, we encourage the prioritization of user problems with significant scale and severity to help maximize the positive impact of GenAI technology.

Communication across teams and disciplines is essential to responsible product design. The seamless flow of information and insight from user research teams to product design and engineering teams, and vice versa, is essential to good product development. One of our team’s core objectives is to ensure the practical application of deep user-insight into AI-powered product design decisions at Google by bridging the communication gap between the vast technological expertise of our engineers and the user/societal expertise of our academics, research scientists, and user-centered design research experts. We’ve built a multidisciplinary team with expertise in these areas, deepening our empathy for the communication needs of our audience, and enabling us to better interface between our user & society experts and our technical experts. We create frameworks, guidebooks, prototypes, cheatsheets, and multimedia tools to help bring insights to life for the right people at the right time.

Facilitating responsible GenAI prototyping and development

During collaborations between Responsible AI UX, the People + AI Research (PAIR) initiative and Labs, we identified that prototyping can afford a creative opportunity to engage with large language models (LLMs), and is often the first step in GenAI product development. To address the need to introduce LLMs into the prototyping process, we explored a range of different prompting designs. Then, we went out into the field, employing various external, first-person UX design research methodologies to draw out insight and gain empathy for the user’s perspective. Through user/designer co-creation sessions, iteration, and prototyping, we were able to bring internal stakeholders, product managers, engineers, writers, sales, and marketing teams along to ensure that the user point of view was well understood and to reinforce alignment across teams.

The result of this work was MakerSuite, a generative AI platform launched at Google I/O 2023 that enables people, even those without any ML experience, to prototype creatively using LLMs. The team’s first-hand experience with users and understanding of the challenges they face allowed us to incorporate our AI Principles into the MakerSuite product design. Product features like safety filters, for example, enable users to manage outcomes, leading to easier and more responsible product development with MakerSuite.

Because of our close collaboration with product teams, we were able to adapt text-only prototyping to support multimodal interaction with Google AI Studio, an evolution of MakerSuite. Now, Google AI Studio enables developers and non-developers alike to seamlessly leverage Google’s latest Gemini model to merge multiple modality inputs, like text and image, in product explorations. Facilitating product development in this way provides us with the opportunity to better use AI to identify appropriateness of outcomes and unlocks opportunities for developers and non-developers to play with AI sandboxes. Together with our partners, we continue to actively push this effort in the products we support.

Google AI Studio enables developers and non-developers to leverage Google Cloud infrastructure and merge multiple modality inputs in their product explorations.

Equitable speech recognition

Multiple external studies, as well as Google’s own research, have identified an unfortunate deficiency in the ability of current speech recognition technology to understand Black speakers on average, relative to White speakers. As multimodal AI tools begin to rely more heavily on speech prompts, this problem will grow and continue to alienate users. To address this problem, the Responsible AI UX team is partnering with world-renowned linguists and scientists at Howard University, a prominent HBCU, to build a high quality African-American English dataset to improve the design of our speech technology products to make them more accessible. Called Project Elevate Black Voices, this effort will allow Howard University to share the dataset with those looking to improve speech technology while establishing a framework for responsible data collection, ensuring the data benefits Black communities. Howard University will retain the ownership and licensing of the dataset and serve as stewards for its responsible use. At Google, we’re providing funding support and collaborating closely with our partners at Howard University to ensure the success of this program.

Equitable computer vision

The Gender Shades project highlighted that computer vision systems struggled to detect people with darker skin tones and performed particularly poorly for women with darker skin tones. This is largely because the datasets used to train these models were not inclusive of a wide range of skin tones. To address this limitation, the Responsible AI UX team has been partnering with sociologist Dr. Ellis Monk to release the Monk Skin Tone (MST) Scale, a skin tone scale designed to be more inclusive of the spectrum of skin tones around the world. It provides a tool to assess the inclusivity of datasets and model performance across an inclusive range of skin tones, resulting in features and products that work better for everyone.
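As an illustration of how a scale like MST can be used to assess model performance across skin tones, the following is a minimal, hypothetical Python sketch that groups evaluation examples by their annotated MST tone (the published scale has 10 tones) and reports per-tone detection accuracy. The record fields and example data are assumptions for illustration only, not part of Google's tooling.

from collections import defaultdict

# Hypothetical evaluation records: each example carries an MST annotation (1-10)
# and a flag indicating whether the system detected the person correctly.
examples = [
    {"mst_tone": 2, "detected": True},
    {"mst_tone": 9, "detected": False},
    {"mst_tone": 9, "detected": True},
]

def accuracy_by_mst_tone(records):
    """Group detection outcomes by Monk Skin Tone annotation and compute per-tone accuracy."""
    totals, correct = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["mst_tone"]] += 1
        correct[r["mst_tone"]] += int(r["detected"])
    return {tone: correct[tone] / totals[tone] for tone in sorted(totals)}

for tone, accuracy in accuracy_by_mst_tone(examples).items():
    print(f"MST tone {tone}: accuracy {accuracy:.0%}")

Disaggregating metrics this way makes it easy to spot tones where a dataset is underrepresented or a model underperforms.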

We have integrated MST into a range of Google products, such as Search, Google Photos, and others. We also open sourced MST, published our research, described our annotation practices, and shared an example dataset to encourage others to easily integrate it into their products. The Responsible AI UX team continues to collaborate with Dr. Monk, utilizing the MST across multiple product applications and continuing to do international research to ensure that it is globally inclusive.

Consulting & guidance

As teams across Google continue to develop products that leverage the capabilities of GenAI models, our team recognizes that the challenges they face are varied and that market competition is significant. To support teams, we develop actionable assets to facilitate a more streamlined and responsible product design process that considers available resources. We act as a product-focused design consultancy, identifying ways to scale services, share expertise, and apply our design principles more broadly. Our goal is to help all product teams at Google connect significant unmet user needs with technology benefits via great responsible product design.

One way we have been doing this is with the creation of the People + AI Guidebook, an evolving summative resource of many of the responsible design lessons we’ve learned and recommendations we’ve made for internal and external stakeholders. With its forthcoming, rolling updates focusing specifically on how to best design and consider user needs with GenAI, we hope that our internal teams, external stakeholders, and larger community will have useful and actionable guidance at the most critical milestones in the product development journey.

The People + AI Guidebook has six chapters, designed to cover different aspects of the product life cycle.

If you are interested in reading more about Responsible AI UX and how we are specifically thinking about designing responsibly with Generative AI, please check out this Q&A piece.

Acknowledgements

Shout out to the Responsible AI UX team members: Aaron Donsbach, Alejandra Molina, Courtney Heldreth, Diana Akrong, Ellis Monk, Femi Olanubi, Hope Neveux, Kafayat Abdul, Key Lee, Mahima Pushkarna, Sally Limb, Sarah Post, Sures Kumar Thoddu Srinivasan, Tesh Goyal, Ursula Lauriston, and Zion Mengesha. Special thanks to Michelle Cohn for her contributions to this work.

Read More

Create a document lake using large-scale text extraction from documents with Amazon Textract

AWS customers in healthcare, financial services, the public sector, and other industries store billions of documents as images or PDFs in Amazon Simple Storage Service (Amazon S3). However, they’re unable to gain insights such as using the information locked in the documents for large language models (LLMs) or search until they extract the text, forms, tables, and other structured data. With AWS intelligent document processing (IDP) using AI services such as Amazon Textract, you can take advantage of industry-leading machine learning (ML) technology to quickly and accurately process data from PDFs or document images (TIFF, JPEG, PNG). After the text is extracted from the documents, you can use it to fine-tune a foundation model, summarize the data using a foundation model, or send it to a database.

In this post, we focus on processing a large collection of documents into raw text files and storing them in Amazon S3. We provide you with two different solutions for this use case. The first allows you to run a Python script from any server or instance including a Jupyter notebook; this is the quickest way to get started. The second approach is a turnkey deployment of various infrastructure components using AWS Cloud Development Kit (AWS CDK) constructs. The AWS CDK construct provides a resilient and flexible framework to process your documents and build an end-to-end IDP pipeline. Through the use of the AWS CDK, you can extend its functionality to include redaction, store the output in Amazon OpenSearch, or add a custom AWS Lambda function with your own business logic.

Both of these solutions allow you to quickly process many millions of pages. Before running either of these solutions at scale, we recommend testing with a subset of your documents to make sure the results meet your expectations. In the following sections, we first describe the script solution, followed by the AWS CDK construct solution.

Solution 1: Use a Python script

This solution processes documents for raw text through Amazon Textract as quickly as the service will allow with the expectation that if there is a failure in the script, the process will pick up from where it left off. The solution utilizes three different services: Amazon S3, Amazon DynamoDB, and Amazon Textract.

The following diagram illustrates the sequence of events within the script. When the script ends, a completion status, along with the time taken, will be returned to the SageMaker Studio console.

diagram

We have packaged this solution as both a Jupyter notebook (.ipynb) and a Python script (.py). You can use whichever version fits your requirements.

Prerequisites

To run this script from a Jupyter notebook, the AWS Identity and Access Management (IAM) role assigned to the notebook must have permissions that allow it to interact with DynamoDB, Amazon S3, and Amazon Textract. The general guidance is to provide least-privilege permissions for each of these services to your AmazonSageMaker-ExecutionRole role. To learn more, refer to Get started with AWS managed policies and move toward least-privilege permissions.

Alternatively, you can run this script from other environments, such as an Amazon Elastic Compute Cloud (Amazon EC2) instance or a container that you manage, provided that Python, pip3, and the AWS SDK for Python (Boto3) are installed. Again, the same IAM policies need to be applied that allow the script to interact with the various managed services.

Walkthrough

To implement this solution, you first need to clone the GitHub repository.

You need to set the following variables in the script before you can run it:

  • tracking_table – This is the name of the DynamoDB table that will be created.
  • input_bucket – This is your source location in Amazon S3 that contains the documents that you want to send to Amazon Textract for text detection. For this variable, provide the name of the bucket, such as mybucket.
  • output_bucket – This is the location in Amazon S3 where you want Amazon Textract to write the results. For this variable, provide the name of the bucket, such as myoutputbucket.
  • _input_prefix (optional) – If you want to select certain files from within a folder in your S3 bucket, you can specify this folder name as the input prefix. Otherwise, leave the default as empty to select all.

The script is as follows:

_tracking_table = "Table_Name_for_storing_s3ObjectNames"
_input_bucket = "your_files_are_here"
_output_bucket = "Amazon Textract_writes_JSON_containing_raw_text_to_here"

The following DynamoDB table schema gets created when the script is run:

Table            Table_Name_for_storing_s3ObjectNames
Partition Key    objectName (String)
Attributes       bucketName (String)
                 createdDate (Decimal)
                 outputbucketName (String)
                 txJobId (String)
When the script is run for the first time, it will check whether the DynamoDB table exists and automatically create it if needed. After the table is created, we need to populate it with a list of document object references from Amazon S3 that we want to process. By design, the script enumerates the objects in the specified input_bucket and automatically populates the table with their names when run. It takes approximately 10 minutes to enumerate 100,000 documents and populate their names into the DynamoDB table. If you have millions of objects in a bucket, you could instead use the Amazon S3 Inventory feature, which generates a CSV file of object names, populate the DynamoDB table from this list with your own script in advance, and comment out the call to the fetchAllObjectsInBucketandStoreName function. To learn more, refer to Configuring Amazon S3 Inventory.
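For readers who want a sense of what this enumeration step looks like, the following is a minimal sketch (not the repository's actual fetchAllObjectsInBucketandStoreName implementation) that lists the objects in the input bucket with a boto3 paginator and batch-writes their names into the tracking table. The bucket and table names are the placeholders used earlier.

import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")

def populate_tracking_table(input_bucket, tracking_table, input_prefix=""):
    """List every object under the prefix and store one row per document in DynamoDB."""
    table = dynamodb.Table(tracking_table)
    paginator = s3.get_paginator("list_objects_v2")
    with table.batch_writer() as batch:  # batches writes to reduce round trips
        for page in paginator.paginate(Bucket=input_bucket, Prefix=input_prefix):
            for obj in page.get("Contents", []):
                batch.put_item(Item={"objectName": obj["Key"], "bucketName": input_bucket})

populate_tracking_table("your_files_are_here", "Table_Name_for_storing_s3ObjectNames")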

As mentioned earlier, there is both a notebook version and a Python script version. The notebook is the most straightforward way to get started; simply run each cell from start to finish.

If you decide to run the Python script from a CLI, it is recommended that you use a terminal multiplexer such as tmux. This is to prevent the script from stopping should your SSH session finish. For example: tmux new -d 'python3 textractFeeder.py'.

The following is the script’s entry point; from here you can comment out methods not needed:

"""Main entry point into script --- Start Here"""
if __name__ == "__main__":    
    now = time.perf_counter()
    print("started")

The following fields are set when the script is populating the DynamoDB table:

  • objectName – The name of the document located in Amazon S3 that will be sent to Amazon Textract
  • bucketName – The bucket where the document object is stored

These two fields must be populated if you decide to use a CSV file from the S3 inventory report and skip the auto populating that happens within the script.

Now that the table is created and populated with the document object references, the script is ready to start calling the Amazon Textract StartDocumentTextDetection API. Amazon Textract, like other managed services, has default quotas on API calls, measured in transactions per second (TPS). If required, you can request a quota increase from the Amazon Textract console. The code is designed to use multiple threads concurrently when calling Amazon Textract to maximize throughput with the service. You can change this within the code by modifying the threadCountforTextractAPICall variable; by default, it is set to 20 threads. The script initially reads 200 rows from the DynamoDB table and stores them in an in-memory list that is wrapped with a class for thread safety. Each caller thread is then started and runs within its own swim lane: it retrieves an item containing our object reference from the in-memory list, calls the asynchronous start_document_text_detection API, and waits for the acknowledgement with the job ID. The job ID is then written back to the DynamoDB row for that object, and the thread repeats by retrieving the next item from the list.
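As a rough illustration of what each caller thread does (simplified from the actual script, which wraps its list in a custom thread-safe class), the sketch below pulls work from a standard thread-safe queue.Queue, calls the asynchronous Textract API, and writes the returned job ID back to DynamoDB. The table and output bucket names are the placeholders used earlier.

import queue
import boto3

textract = boto3.client("textract")
table = boto3.resource("dynamodb").Table("Table_Name_for_storing_s3ObjectNames")

def process_textract_items(work_queue):
    """Worker loop: submit each queued document to Textract and record its job ID."""
    while True:
        try:
            item = work_queue.get_nowait()  # thread-safe handoff of the next document reference
        except queue.Empty:
            break  # no more work; this thread stops
        response = textract.start_document_text_detection(
            DocumentLocation={"S3Object": {"Bucket": item["bucketName"], "Name": item["objectName"]}},
            OutputConfig={"S3Bucket": "myoutputbucket", "S3Prefix": "textract_output"},
        )
        table.update_item(
            Key={"objectName": item["objectName"]},
            UpdateExpression="SET txJobId = :job",
            ExpressionAttributeValues={":job": response["JobId"]},
        )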

The following is the main orchestration code script:

while len(results) > 0:
    for record in results:  # put these records into our thread-safe list
        fileList.append(record)
    """create our threads for processing Amazon Textract"""
    threadsforTextractAPI = threading.Thread(
        name="Thread - " + str(i),
        target=procestTextractFunction,
        args=(fileList,),
    )

The caller threads will continue repeating until there are no longer any items within the list, at which point the threads will each stop. When all threads operating within their swim lanes have stopped, the next 200 rows from DynamoDB are retrieved and a new set of 20 threads are started, and the whole process repeats until every row that doesn’t contain a job ID is retrieved from DynamoDB and updated. Should the script crash due to an unexpected problem, it can be run again from the orchestrate() method, which makes sure that the threads continue processing rows that contain empty job IDs. Note that when rerunning the orchestrate() method after the script has stopped, a few documents could potentially be sent to Amazon Textract again. This number will be equal to or less than the number of threads that were running at the time of the crash.

When there are no more rows containing a blank job ID in the DynamoDB table, the script will stop. All the JSON output from Amazon Textract for all the objects will be found in the output_bucket, by default under the textract_output folder. Each subfolder within textract_output is named with the job ID that was stored in the DynamoDB table for that object. Within the job ID folder, you will find the JSON output, numerically named starting at 1 and potentially spanning additional JSON files labeled 2, 3, and so on. Spanning JSON files is a result of dense or multi-page documents, where the amount of content extracted exceeds the Amazon Textract default JSON size of 1,000 blocks. Refer to Block for more information on blocks. These JSON files contain all the Amazon Textract metadata, including the text that was extracted from the documents.
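To give a sense of how downstream code might consume this output, here is a minimal sketch (not part of the solution's repository) that reads the numbered JSON files for one job ID from the output bucket and concatenates the text of the LINE blocks. The bucket name, prefix, and job ID are placeholders.

import json
import boto3

s3 = boto3.client("s3")

def read_textract_output(output_bucket, job_id):
    """Concatenate the text of all LINE blocks across a job's numbered JSON output files."""
    prefix = f"textract_output/{job_id}/"
    lines = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=output_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            payload = json.loads(s3.get_object(Bucket=output_bucket, Key=obj["Key"])["Body"].read())
            lines += [block["Text"] for block in payload.get("Blocks", []) if block["BlockType"] == "LINE"]
    return "\n".join(lines)

print(read_textract_output("myoutputbucket", "example-job-id"))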

You can find the Python code notebook version and script for this solution in GitHub.

Clean up

When the Python script is complete, you can save costs by shutting down or stopping the Amazon SageMaker Studio notebook or container that you spun up.

Now on to our second solution for documents at scale.

Solution 2: Use a serverless AWS CDK construct

This solution uses AWS Step Functions and Lambda functions to orchestrate the IDP pipeline. We use the IDP AWS CDK constructs, which make it straightforward to work with Amazon Textract at scale. Additionally, we use a Step Functions distributed map to iterate over all the files in the S3 bucket and initiate processing. The first Lambda function determines how many pages each document has. This enables the pipeline to automatically use either the synchronous API (for single-page documents) or the asynchronous API (for multi-page documents). When using the asynchronous API, an additional Lambda function is called to combine all the JSON files that Amazon Textract produces for your pages into one JSON file, making it straightforward for your downstream applications to work with the information.
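As a simplified illustration of that routing decision (not the construct's actual Lambda code), the sketch below counts the pages of a downloaded PDF and chooses between the synchronous and asynchronous Textract text-detection APIs. The pypdf dependency and the helper's name are assumptions made for this example.

import boto3
from pypdf import PdfReader  # assumed here as one way to count PDF pages

textract = boto3.client("textract")

def route_document(local_path, bucket, key):
    """Send single-page documents to the sync API and multi-page documents to the async API."""
    num_pages = len(PdfReader(local_path).pages)
    if num_pages == 1:
        # The synchronous API returns the extracted blocks immediately.
        with open(local_path, "rb") as f:
            return textract.detect_document_text(Document={"Bytes": f.read()})
    # Multi-page documents require the asynchronous API; completion is signaled via polling or SNS.
    return textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )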

This solution also contains two additional Lambda functions. The first function parses the text from the JSON and saves it as a text file in Amazon S3. The second function analyzes the JSON and stores that for metrics on the workload.

The following diagram illustrates the Step Functions workflow.

Diagram

Prerequisites

This code base uses the AWS CDK and requires Docker. You can deploy this from an AWS Cloud9 instance, which has the AWS CDK and Docker already set up.

Walkthrough

To implement this solution, you first need to clone the repository.

After you clone the repository, install the dependencies:

pip install -r requirements.txt

Then use the following code to deploy the AWS CDK stack:

cdk bootstrap
cdk deploy --parameters SourceBucket=<Source Bucket> --parameters SourcePrefix=<Source Prefix>

You must provide both the source bucket and source prefix (the location of the files you want to process) for this solution.

When the deployment is complete, navigate to the Step Functions console, where you should see the state machine ServerlessIDPArchivePipeline.

Diagram

Open the state machine details page and on the Executions tab, choose Start execution.

Diagram

Choose Start execution again to run the state machine.

Diagram

After you start the state machine, you can monitor the pipeline by looking at the map run. You will see an Item processing status section like the following screenshot. As you can see, this is built to run and track what was successful and what failed. This process will continue to run until all documents have been read.

Diagram

With this solution, you should be able to process millions of files in your AWS account without worrying about how to properly determine which files to send to which API or corrupt files failing your pipeline. Through the Step Functions console, you will be able to watch and monitor your files in real time.

Clean up

After your pipeline is finished running, to clean up, you can go back into your project and enter the following command:

cdk destroy

This will delete any services that were deployed for this project.

Conclusion

In this post, we presented a solution that makes it straightforward to convert your document images and PDFs to text files. This is a key prerequisite to using your documents for generative AI and search. To learn more about using text to train or fine-tune your foundation models, refer to Fine-tune Llama 2 for text generation on Amazon SageMaker JumpStart. To use with search, refer to Implement smart document search index with Amazon Textract and Amazon OpenSearch. To learn more about advanced document processing capabilities offered by AWS AI services, refer to Guidance for Intelligent Document Processing on AWS.


About the Authors

Tim Condello is a senior artificial intelligence (AI) and machine learning (ML) specialist solutions architect at Amazon Web Services (AWS). His focus is natural language processing and computer vision. Tim enjoys taking customer ideas and turning them into scalable solutions.

David Girling is a senior AI/ML solutions architect with over twenty years of experience in designing, leading and developing enterprise systems. David is part of a specialist team that focuses on helping customers learn, innovate and utilize AWS AI services with their data for their use cases.

Read More

Amgen to Build Generative AI Models for Novel Human Data Insights and Drug Discovery

Generative AI is transforming drug research and development, enabling new discoveries faster than ever — and Amgen, one of the world’s leading biotechnology companies, is tapping the technology to power its research.

Amgen will build AI models trained to analyze one of the world’s largest human datasets on an NVIDIA DGX SuperPOD, a full-stack data center platform, which will be installed at the Reykjavik, Iceland, headquarters of deCODE genetics, an Amgen subsidiary. The system will be named Freyja in honor of the powerful, life-giving Norse goddess associated with the ability to predict the future.

Freyja will be used to build a human diversity atlas for drug target and disease-specific biomarker discovery, providing vital diagnostics for monitoring disease progression and regression. The system will also help develop AI-driven precision medicine models, potentially enabling individualized therapies for patients with serious diseases.

Amgen plans to integrate the DGX SuperPOD, which will feature 31 NVIDIA DGX H100 nodes totaling 248 H100 Tensor Core GPUs, to train state-of-the-art AI models in days rather than months, enabling researchers to more efficiently analyze and learn from data in their search for novel health and therapeutics insights.

“For more than a decade, Amgen has been preparing for this hinge moment we are seeing in the industry, powered by the union of technology and biotechnology,” said David M. Reese, executive vice president and chief technology officer at Amgen. “We look forward to combining the breadth and maturity of our world-class human data capabilities at Amgen with NVIDIA’s technologies.”

The goal of deCODE founder and CEO Kári Stefánsson in starting the company was to understand human disease by looking at the diversity of the human genome. He predicted in a recent Amgen podcast that within the next 10 years, doctors will routinely use genetics to explore uncommon diseases in patients.

“This SuperPOD has the potential to accelerate our research by training models more quickly and helping us generate questions we might not have otherwise thought to ask,” said Stefánsson.

Putting the Tech in Biotechnology

Since its founding in 1996, deCODE has curated more than 200 petabytes of de-identified human data from nearly 3 million individuals.

The company started by collecting de-identified data from Icelanders, who have a rich heritage in genealogies that stretch back for centuries. This population-scale data from research volunteers provides unique insights into human diversity as it applies to disease.

deCODE has also helped sequence more than half a million human genomes from volunteers in the UK Biobank.

But drawing insights from this much data requires powerful AI systems.

By integrating powerful new technology, Amgen has an opportunity to accelerate the discovery and development of life-changing medicines. In March 2023, NVIDIA announced that Amgen became one of the first companies to employ NVIDIA BioNeMo, which researchers have used to build generative AI models to accelerate drug discovery and development. Amgen researchers have also been accessing BioNeMo via NVIDIA DGX Cloud, an AI supercomputing service.

“Models trained in BioNeMo can advance drug discovery on multiple fronts,” said Marti Head, executive director of computational and data sciences at Amgen. “In addition to helping develop drugs that are more effective, they can also help avoid unwanted effects like immune responses, and new biologics can be made in volume.”

By adopting DGX SuperPOD, Amgen is poised to gain unprecedented data insights with the potential to change the pace and scope of drug discovery.

“The fusion of advanced AI, groundbreaking developments in biology and molecular engineering and vast quantities of human data are not just reshaping how we discover and develop new medicines — they’re redefining medicine,” Reese said.

Learn about NVIDIA’s AI platform for healthcare and life sciences.

Read More

NVIDIA Generative AI Is Opening the Next Era of Drug Discovery and Design

In perhaps the healthcare industry’s most dramatic transformation since the advent of computing, digital biology and generative AI are helping to reinvent drug discovery, surgery, medical imaging and wearable devices.

NVIDIA has been preparing for this moment for over a decade, building deep domain expertise, creating the NVIDIA Clara healthcare-specific computing platform and expanding its work with a rich ecosystem of partners. Healthcare customers and partners already consume well over a billion dollars in NVIDIA GPU computing each year — directly and indirectly through cloud partners.

In the $250 billion field of drug discovery, these efforts are meeting an inflection point: R&D teams can now represent drugs inside a computer.

By harnessing emerging generative AI tools, drug discovery teams observe foundational building blocks of molecular sequence, structure, function and meaning — allowing them to generate or design novel molecules likely to possess desired properties. With these capabilities, researchers can curate a more precise field of drug candidates to investigate, reducing the need for expensive, time-consuming physical experiments.

Accelerating this shift is NVIDIA BioNeMo, a generative AI platform that provides services to develop, customize and deploy foundation models for drug discovery.

Used by pharmaceutical, techbio and software companies, BioNeMo offers a new class of computational methods for drug research and development, enabling scientists to integrate generative AI to reduce experiments and, in some cases, replace them altogether.

In addition to developing, optimizing and hosting AI models through BioNeMo, NVIDIA has boosted the computer-aided drug discovery ecosystem with investments in innovative techbio companies — such as biopharmaceutical company Recursion, which is offering one of its foundation models for BioNeMo users, and biotech company Terray Therapeutics, which is using BioNeMo for AI model development.

BioNeMo Brings Precision to AI-Accelerated Drug Discovery 

BioNeMo features a growing collection of pretrained biomolecular AI models for protein structure prediction, protein sequence generation, molecular optimization, generative chemistry, docking prediction and more. It also enables computer-aided drug discovery companies to make their models available to a broad audience through easy-to-access APIs for inference and customization.

Drug discovery teams use BioNeMo to invent or customize generative AI models with proprietary data — and drug discovery software companies, techbios and large pharmas are integrating BioNeMo cloud APIs, which will be released in beta this month, into platforms that deliver computer-aided drug discovery workflows.

The cloud APIs will now include foundation models from three sources: models invented by NVIDIA, such as the MolMIM generative chemistry model for small molecule generation; open-source models pioneered by global research teams, curated and optimized by NVIDIA, such as the OpenFold protein prediction AI; and proprietary models developed by NVIDIA partners, such as Recursion’s Phenom-Beta for embedding cellular microscopy images.

MolMIM generates small molecules while giving users finer control over the AI generation process — identifying new molecules that possess desired properties and follow constraints specified by users. For example, researchers could direct the model to generate molecules that have similar structures and properties to a given reference molecule.

Phenomenal AI for Pharma: Recursion Brings Phenom-Beta Model to BioNeMo

Recursion is the first hosting partner offering an AI model through BioNeMo cloud APIs: Phenom-Beta, a vision transformer model that extracts biologically meaningful features from cellular microscopy images.

This capability can provide researchers with insights about cell function and help them learn how cells respond to drug candidates or genetic engineering.

Phenom-Beta performed well on image reconstruction tasks, a training metric used to evaluate model performance. Read the NeurIPS workshop paper to learn more.

Phenom-Beta was trained on Recursion’s publicly available RxRx3 dataset of biological images using the company’s BioHive-1 supercomputer, based on the NVIDIA DGX SuperPOD reference architecture.

To further its foundation model development, Recursion is expanding its supercomputer with more than 500 NVIDIA H100 Tensor Core GPUs. This will boost its computational capacity by 4x to create what’s expected to be the most powerful supercomputer owned and operated by any biopharma company.

How Companies Are Adopting NVIDIA BioNeMo

A growing group of scientists, biotech and pharma companies, and AI software vendors are using NVIDIA BioNeMo to support biology, chemistry and genomics research.

Biotech leader Terray Therapeutics is integrating BioNeMo cloud APIs into its development of a generalized, multi-target structural binding model. The company also uses NVIDIA DGX Cloud to train chemistry foundation models to power generative AI for small molecule design.

Protein engineering and molecular design companies Innophore and Insilico Medicine are bringing BioNeMo into their computational drug discovery applications. Innophore is integrating BioNeMo cloud APIs into its Catalophore platform for protein design and drug discovery. And Insilico, a premier member of the NVIDIA Inception program for startups, has adopted BioNeMo in its generative AI pipeline for early drug discovery.

Biotech software company OneAngstrom and systems integrator Deloitte are using BioNeMo cloud APIs to build AI solutions for their clients.

OneAngstrom is integrating BioNeMo cloud APIs into its SAMSON platform for molecular design used by academics, biotechs and pharmas. Deloitte is transforming scientific research by integrating BioNeMo on NVIDIA DGX Cloud with the Quartz Atlas AI platform. This combination enables biopharma researchers with unparalleled data connectivity and cutting-edge generative AI, propelling them into a new era of accelerated drug discovery.

Learn more about NVIDIA BioNeMo and subscribe to NVIDIA healthcare news.

Read More

NVIDIA Reveals Gaming, Creating, Generative AI, Robotics Innovations at CES

The AI revolution returned to where it started this week, putting powerful new tools into the hands of gamers and content creators.

Generative AI models that will bring lifelike characters to games and applications and new GPUs for gamers and creators were among the highlights of a news-packed address Monday ahead of this week’s CES trade show in Las Vegas.

“Today, NVIDIA is at the center of the latest technology transformation: generative AI,” said Jeff Fisher, senior vice president for GeForce at NVIDIA, who was joined by leaders across the company to introduce products and partnerships across gaming, content creation, and robotics.

A Launching Pad for Generative AI

As AI shifts into the mainstream, Fisher said NVIDIA’s RTX GPUs, with more than 100 million units shipped, are pivotal in the burgeoning field of generative AI, exemplified by innovations like ChatGPT and Stable Diffusion.

In October, NVIDIA released the TensorRT-LLM library for Windows, accelerating large language models, or LLMs, like Llama 2 and Mistral up to 5x on RTX PCs.

And with our new Chat with RTX playground, releasing later this month, enthusiasts can connect an RTX-accelerated LLM to their own data, from locally stored documents to YouTube videos, using retrieval-augmented generation, or RAG, a technique for enhancing the accuracy and reliability of generative AI models.

Fisher also introduced TensorRT acceleration for Stable Diffusion XL and SDXL Turbo in the popular Automatic1111 text-to-image app, providing up to a 60% boost in performance.

NVIDIA Avatar Cloud Engine (ACE) Microservices Debut With Generative AI Models for Digital Avatars

NVIDIA ACE is a technology platform that brings digital avatars to life with generative AI. ACE AI models are designed to run in the cloud or locally on the PC.

In an ACE demo featuring Convai’s new technologies, NVIDIA’s Senior Product Manager Seth Schneider showed how it works.

 

First, a player’s voice input is passed to NVIDIA’s automatic speech recognition model, which translates speech to text. Then, the text is put into an LLM to generate the character’s response.

After that, the text response is vocalized using a text-to-speech model, which is passed to an animation model to create a realistic lip sync. Finally, the dynamic character is rendered into the game scene.
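To make that flow concrete, here is a purely illustrative, pseudocode-style sketch of one conversational turn; every function below is a hypothetical stand-in, not an actual NVIDIA ACE or Convai API.

# Hypothetical stand-ins for the stages described in the demo; none of these are real ACE calls.
def speech_to_text(audio): ...        # automatic speech recognition translates speech to text
def generate_response(prompt): ...    # an LLM generates the character's reply
def text_to_speech(text): ...         # a text-to-speech model vocalizes the response
def animate_face(audio): ...          # an animation model produces a realistic lip sync
def render_character(scene, audio, animation): ...  # the dynamic character is rendered into the scene

def avatar_turn(player_audio, game_scene):
    """One turn of the avatar pipeline: ASR -> LLM -> TTS -> facial animation -> render."""
    text = speech_to_text(player_audio)
    reply = generate_response(text)
    reply_audio = text_to_speech(reply)
    animation = animate_face(reply_audio)
    return render_character(game_scene, reply_audio, animation)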

At CES, NVIDIA is announcing ACE Production Microservices for NVIDIA Audio2Face and NVIDIA Riva Automatic Speech Recognition. Available now, each model can be incorporated by developers individually into their pipelines.

NVIDIA is also announcing that game and interactive avatar developers are pioneering ways ACE and generative AI technologies can be used to transform interactions between players and non-playable characters in games and applications. Developers embracing ACE include Convai, Charisma.AI, Inworld, miHoYo, NetEase Games, Ourpalm, Tencent, Ubisoft and UneeQ.

Getty Images Releases Generative AI by iStock and AI Image Generation Tools Powered by NVIDIA Picasso

Generative AI empowers designers and marketers to create concept imagery, social media content and more. Today, iStock by Getty Images is releasing a genAI service built on NVIDIA Picasso, an AI foundry for visual design, Fisher announced.

The iStock service allows anyone to create 4K imagery from text using an AI model trained on Getty Images’ extensive catalog of licensed, commercially safe creative content. New editing application programming interfaces that give customers powerful control over their generated images are also coming soon.

The generative AI service is available today at istock.com, with advanced editing features releasing via API.

NVIDIA Introduces GeForce RTX 40 SUPER Series

Fisher announced a new series of GeForce RTX 40 SUPER GPUs with more gaming and generative AI performance.

Fisher said that the GeForce RTX 4080 SUPER can power fully ray-traced games at 4K. It’s 1.4x faster than the RTX 3080 Ti without frame gen in the most graphically intensive games. With 836 AI TOPS, NVIDIA DLSS Frame Generation delivers an extra performance boost, making the RTX 4080 SUPER twice as fast as an RTX 3080 Ti.

Creators can generate video with Stable Video Diffusion 1.5x faster and images with Stable Diffusion XL 1.7x faster. The RTX 4080 SUPER features more cores and faster memory, giving it a performance edge at a great new price of $999. It will be available starting Jan. 31.

Next up is the RTX 4070 Ti SUPER. NVIDIA has added more cores and increased the frame buffer to 16GB and the memory bus to 256 bits. It’s 1.6x faster than a 3070 Ti and 2.5x faster with DLSS 3, Fisher said. The RTX 4070 Ti SUPER will be available starting Jan. 24 for $799.

Fisher also introduced the RTX 4070 SUPER. NVIDIA has added 20% more cores, making it faster than the RTX 3090 while using a fraction of the power. And with DLSS 3, it’s 1.5x faster in the most demanding games. It will be available for $599 starting Jan. 17.

NVIDIA RTX Remix Open Beta Launches This Month

There are over 10 billion game mods downloaded each year. With RTX Remix, modders can remaster classic games with full ray tracing, DLSS, NVIDIA Reflex and generative AI texture tools that transform low-resolution textures into 4K, physically accurate materials. The RTX Remix app will be released in open beta on Jan. 22.

RTX Remix has already delivered stunning remasters in NVIDIA’s Portal with RTX and the modder-made Portal: Prelude RTX. Now, Orbifold Studios is using RTX Remix to develop Half-Life 2 RTX: An RTX Remix Project, a community remaster of one of the highest-rated games of all time.

Check out this new Half-Life 2 RTX gameplay trailer:

 

Twitch and NVIDIA to Release Multi-Encode Livestreaming

Twitch is one of the most popular platforms for content creators, with over 7 million streamers going live each month to 35 million daily viewers. Fisher explained that these viewers are on all kinds of devices and internet services.

Yet many Twitch streamers are limited to broadcasting at a single resolution and quality level. As a result, they must broadcast at lower quality to reach more viewers.

To address this, Twitch, OBS and NVIDIA announced Enhanced Broadcasting, supported by all RTX GPUs. This new feature allows streamers to transmit up to three concurrent streams to Twitch at different resolutions and quality so each viewer gets the optimal experience.

Beta signups start today and will go live later this month. Twitch will also experiment with 4K and AV1 on the GeForce RTX 40 Series GPUs to deliver even better quality and higher resolution streaming.

‘New Wave’ of AI-Ready RTX Laptops

RTX is the fastest-growing laptop platform, having grown 5x in the last four years. Over 50 million devices are enjoyed by gamers and creators across the globe.

More’s coming. Fisher announced “a new wave” of RTX laptops launching from every major manufacturer. “Thanks to powerful RT and Tensor Cores, every RTX laptop is AI-ready for the best gaming and AI experiences,” Fisher said.

With an installed base of 100 million GPUs and 500 RTX games and apps, GeForce RTX is the world’s largest platform for gamers, creators and, now, generative AI.

Activision and Blizzard Games Embrace RTX

More than 500 games and apps now take advantage of NVIDIA RTX technology, NVIDIA’s Senior Consumer Marketing Manager Kristina Bartz said, including Alan Wake 2, which won three awards at this year’s Game Awards.

NVIDIA Consumer Marketing Manager Kristina Bartz spoke about how NVIDIA technologies are being integrated into popular games.

It’s a list that keeps growing with 14 new RTX titles announced at CES.

Horizon Forbidden West, the critically acclaimed sequel to Horizon Zero Dawn, will come to PC early this year with the Burning Shores expansion, accelerated by DLSS 3.

Pax Dei is a social sandbox massively multiplayer online game inspired by the legends of the medieval era. Developed by Mainframe Industries with veterans from CCP Games, Blizzard and Remedy Entertainment, Pax Dei will launch in early access on PC with AI-accelerated DLSS 3 this spring.

Last summer, Diablo IV launched with DLSS 3 and immediately became Blizzard’s fastest-selling game. RTX ray tracing will now be coming to Diablo IV in March.

More than 500 games and apps now take advantage of NVIDIA RTX technology, with more coming.

Day Passes and G-SYNC Technology Coming to GeForce NOW

NVIDIA’s partnership with Activision also extends to the cloud with GeForce NOW, Bartz said. In November, NVIDIA welcomed the first Activision and Blizzard game, Call of Duty: Modern Warfare 3. Diablo IV and Overwatch 2 are coming soon.

GeForce NOW will get Day Pass membership options starting in February. Priority and Ultimate Day Passes will give gamers a full day of gaming with the fastest access to servers, with all the same benefits as members, including NVIDIA DLSS 3.5 and NVIDIA Reflex for Ultimate Day Pass purchasers.

NVIDIA also announced Cloud G-SYNC technology is coming to GeForce NOW, which varies the display refresh rate to match the frame rate on G-SYNC monitors, giving members the smoothest, tear-free gaming experience from the cloud.

Generative AI Powers Smarter Robots With NVIDIA Isaac

NVIDIA Vice President of Robotics and Edge Computing Deepu Talla addressed the intersection of AI and robotics.

Closing out the special address, NVIDIA Vice President of Robotics and Edge Computing Deepu Talla shared how the infusion of generative AI into robotics is speeding up the ability to bring robots from proof of concept to real-world deployment.

Talla gave a peek into the growing use of generative AI in the NVIDIA robotics ecosystem, where robotics innovators like Boston Dynamics and Collaborative Robots are changing the landscape of human-robot interaction.

Read More

NVIDIA Drives AI Forward With Automotive Innovation on Display at CES

Amid explosive interest in generative AI, the auto industry is racing to embrace the power of AI across a range of critical activities, from vehicle design, engineering and manufacturing, to marketing and sales.

The adoption of generative AI — along with the growing importance of software-defined computing — will continue to transform the automotive market in 2024.

NVIDIA today announced that Li Auto, a pioneer in extended-range electric vehicles (EVs), has selected the NVIDIA DRIVE Thor centralized car computer to power its next-generation fleets. Also, EV makers GWM (Great Wall Motor), ZEEKR and Xiaomi have adopted the NVIDIA DRIVE Orin platform to power their intelligent automated-driving systems.

In addition, a powerful lineup of technology is on display from NVIDIA’s automotive partners on the CES trade show floor in Las Vegas.

  • Mercedes-Benz is kicking off CES with a press conference to announce a range of exciting software-driven features and the latest developments in the Mercedes-Benz MB.OS story, each one showcased in a range of cars, including the Concept CLA Class, which is using NVIDIA DRIVE Orin for the automated driving domain. 

    Mercedes-Benz is also using digital twins for production with help from NVIDIA Omniverse, a platform for developing applications to design, collaborate, plan and operate manufacturing and assembly facilities. (West Hall – 4941)

  • Luminar will host a fireside chat with NVIDIA on Jan. 9 at 2 p.m. PT to discuss the state of the art of sensor processing and ongoing collaborations between the companies. In addition, Luminar will showcase the work it’s doing with NVIDIA partners Volvo Cars, Polestar, Plus and Kodiak. (West Hall – 5917 and West Plaza – WP10)
  • Ansys is demonstrating how it leverages NVIDIA Omniverse to accelerate autonomous vehicle development. Ansys AVxcelerate Sensors will be accessible within NVIDIA DRIVE Sim. (West Hall – 6500)
  • Cerence is introducing CaLLM, an automotive-specific large language model that serves as the foundation for the company’s next-gen in-car computing platform, running on NVIDIA DRIVE. (West Hall – 6627)
  • Cipia is showcasing its embedded software version of Cabin Sense, which includes both driver and occupancy monitoring and is expected to go into serial production this year. NVIDIA DRIVE is the first platform on which Cabin Sense will run commercially. (North Hall – 11022)
  • Kodiak is exhibiting an autonomous truck, which relies on NVIDIA GPUs for high-performance compute to process the enormous quantities of data it collects from its cameras, radar and lidar sensors. (West Plaza – WP10, with Luminar)
  • Lenovo is displaying its vehicle computing roadmap, featuring new products based on NVIDIA DRIVE Thor, including: Lenovo XH1, a central compute unit for advanced driver-assistance systems and smart cockpit; Lenovo AH1, a level 2++ ADAS domain controller unit; and Lenovo AD1, a level 4 autonomous driving domain controller unit. (Estiatorio Milos, Venetian Hotel)
  • Pebble, a recreational vehicle startup, is presenting its flagship product Pebble Flow, the electric semi-autonomous travel trailer powered by NVIDIA DRIVE Orin, with production starting before the end of 2024. (West Hall – 7023)
  • Polestar is showcasing Polestar 3, which is powered by the NVIDIA DRIVE Orin central core computer. (West Hall – 5917 with Luminar and Central Plaza – CP1 with Google)
  • Zoox is showcasing the latest generation of its purpose-built robotaxi, which leverages NVIDIA technology, and is offering CES attendees the opportunity to join its early-bird waitlist for its autonomous ride-hailing service. (West Hall – 7228)

Explore to Win

Visit select NVIDIA partner booths for a chance to win GTC 2024 conference passes with hotel accommodations.

Event Lineup

Check out NVIDIA’s CES event page for a summary of all of the company’s automotive-related events. Learn about NVIDIA’s other announcements at CES by viewing the company’s special address on demand.

Read More

The Creative AI: NVIDIA Studio Unveils New RTX- and AI-Accelerated Tools and Systems for Creators

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

NVIDIA Studio is debuting at CES powerful new software and hardware upgrades to elevate content creation.

It brings the release of powerful NVIDIA Studio laptops and desktops from Acer, ASUS, Dell, HP, Lenovo, MSI and Samsung, as well as the launch of the new GeForce RTX 40 SUPER Series GPUs — including the GeForce RTX 4080 SUPER, GeForce RTX 4070 Ti SUPER and GeForce RTX 4070 SUPER — to supercharge creating, gaming and AI tasks.

Generative AI by iStock from Getty Images is a new generative AI tool trained by NVIDIA Picasso that uses licensed artwork and the NVIDIA Edify architecture model to ensure that generated assets are commercially safe.

RTX Video HDR, coming Jan. 24, transforms standard dynamic range video playing in internet browsers into stunning high dynamic range (HDR). By pairing it with RTX Video Super Resolution, NVIDIA RTX and GeForce RTX GPU owners can achieve dramatic video quality improvements on their HDR10 displays.

Twitch, OBS and NVIDIA are enhancing livestreaming technology with the new Twitch Enhanced Broadcasting beta, powered by GeForce RTX GPUs. Available later this month, the beta will enable users to stream multiple encodes concurrently, providing optimal viewing experiences for a broad range of device types and connections.

And NVIDIA RTX Remix — a free modding platform for quickly remastering classic games with RTX — releases in open beta later this month. It provides full ray tracing, NVIDIA DLSS, NVIDIA Reflex and generative AI texture tools.

This week’s In the NVIDIA Studio installment also features NVIDIA artists Ashlee Martino-Tarr, a 3D content specialist, and Daniela Flamm Jackson, a technical product marketer, who transform 2D illustrations into dynamic 3D scenes using AI and Adobe Firefly — powered by NVIDIA in the cloud and natively with GeForce RTX GPUs.

New Year, New NVIDIA Studio Laptops

The new NVIDIA Studio laptops and desktops level up power and efficiency with exclusive software like Studio Drivers preinstalled — enhancing creative features, reducing time-consuming tasks and speeding workflows.

The Acer Predator Triton Neo 16 features several 16-inch screen options with up to a 3.2K resolution at a 165Hz refresh rate and 16:10 aspect ratio. It provides DCI-P3 100% color gamut and support for NVIDIA Optimus and NVIDIA G-SYNC technology for sharp color hues and tear-free frames. It’s expected to be released in March.

The Acer Predator Triton Neo 16, with up to the GeForce RTX 4070 Laptop GPU.

The ASUS ROG Zephyrus G14 features a Nebula Display with an OLED panel and a G-SYNC OLED display running at 240Hz. It’s expected to release on Feb. 6.

The ASUS ROG Zephyrus G14 with up to the GeForce RTX 4070 Laptop GPU.

The XPS 16 is Dell’s most powerful laptop, featuring a large 16.3-inch InfinityEdge display available with a 4K+ OLED touch display and true-to-life color, delivering up to 80W of sustained performance, all with tone-on-tone finishes for an elegant, minimalistic design. Stay tuned for an update on release timing.

Dell’s XPS 16 with up to the GeForce RTX 4070 Laptop GPU.

Lenovo’s Yoga Pro 9i sports a 16-inch 3.2K PureSight Pro display, delivering a grid of over 1,600 mini-LED dimming zones, expertly calibrated colors accurate to Delta E < 1, and up to a 165Hz refresh rate. With Microsoft’s Auto Color Management feature, its display toggles automatically between 100% P3, 100% sRGB and 100% Adobe RGB color to ensure the highest-quality color. It’s expected to be released in April.

Lenovo Yoga Pro 9i with up to the GeForce RTX 4070 Laptop GPU.

HP’s OMEN 14 Transcend features a 14-inch 4K OLED WQXGA screen, micro-edge, edge-to-edge glass and 100% DCI-P3 with a 240Hz refresh rate. NVIDIA DLSS 3 technology helps unlock more efficient content creation and gaming sessions using only one-third of the expected battery power. It’s targeting a Jan. 19 release.

HP’s OMEN 14 Transcend with up to GeForce RTX 4070 Laptop GPU.

Samsung’s Galaxy Book4 Ultra includes an upgraded Dynamic AMOLED 2X display for high contrast and vivid color, as well as a convenient touchscreen. Its Vision Booster feature uses an Intelligent Outdoor Algorithm to automatically enhance visibility and color reproduction in bright conditions.

Samsung’s Galaxy Book4 Ultra with up to the GeForce RTX 4070 Laptop GPU.

Check back for more information on the new line of Studio systems, including updates to release dates.

A SUPER Debut for New GeForce RTX 40 Series Graphics Cards

The GeForce RTX 40 Series has been supercharged with the new GeForce RTX 4080 SUPER, GeForce RTX 4070 Ti SUPER and GeForce RTX 4070 SUPER graphics cards. This trio is faster than its predecessors, with RTX platform superpowers that enhance creating, gaming and AI tasks.

The GeForce RTX 4080 SUPER sports more CUDA cores than the GeForce RTX 4080 and includes the world’s fastest GDDR6X video memory at 23 Gbps. In 3D apps like Blender, it can run up to 70% faster than previous generations. In generative AI apps like Stable Diffusion XL or Stable Video Diffusion, it can produce 1,024×1,024 images 1.7x faster and video 1.5x faster. Or play fully ray-traced games, including Alan Wake 2, Cyberpunk 2077: Phantom Liberty and Portal with RTX, in stunning 4K. The RTX 4080 SUPER will be available Jan. 31 as a Founders Edition and as custom boards for partners starting at $999.

The GeForce RTX 4070 Ti SUPER is equipped with more CUDA cores than the RTX 4070, a frame buffer increased to 16GB, and a 256-bit bus. It’s suited for video editing and rendering large 3D scenes and runs up to 1.6x faster than the RTX 3070 Ti and 2.5x faster with DLSS 3 in the most graphics-intensive games. Gamers can max out high-refresh 1440p panels or even game at 4K. The RTX 4070 Ti SUPER will be available Jan. 24 from custom board partners in stock-clocked and factory-overclocked configurations starting at $799.

The GeForce RTX 4070 SUPER has 20% more CUDA cores than the GeForce RTX 4070 and is great for 1440p creating. With DLSS 3, it’s 1.5x faster than a GeForce RTX 3090 while using a fraction of the power.

Read more on the GeForce article.

Creative Vision Meets Reality With Getty Images and NVIDIA

Content creators using the new Generative AI by iStock from Getty Images tool powered by NVIDIA Picasso can now safely, affordably use AI-generated images with full protection.

Generative AI by iStock is trained on Getty Images’ vast creative library of high-quality licensed content, including millions of exclusive photos, illustrations and videos. Users can enter prompts to generate photo-quality images at up to 4K for social media promotion, digital advertisements and more.

Getty Images is also making advanced inpainting and outpainting features available via application programming interfaces. Developers can seamlessly integrate the new APIs with creative applications to add people and objects to images, replace specific elements and expand images to a wide range of aspect ratios.

Customers can use Generative AI by iStock online today. Advanced editing features are coming soon to the iStock website.

RTX Video HDR Brings AI Video Upgrades

RTX Video HDR is a new AI-enhanced feature that instantly converts any standard dynamic range (SDR) video playing in internet browsers into vibrant HDR.

HDR delivers stunning video quality, but HDR content is not widely available because of the production effort and hardware it requires.

RTX Video HDR allows NVIDIA RTX and GeForce RTX GPU owners to maximize their HDR panel’s ability to display more vivid, dynamic colors, helping preserve intricate details that may be lost in standard dynamic range.

The feature requires an HDR10-compatible display or TV connected to an RTX-powered PC and works with Chromium-based browsers such as Google Chrome or Microsoft Edge.

RTX Video HDR and RTX Video Super Resolution can be used together to produce the clearest livestreamed video.

RTX Video HDR is coming to all NVIDIA RTX and GeForce RTX GPUs as part of a driver update later this month. Once the driver is installed, the feature can be switched on in the NVIDIA Control Panel.

Enhanced Broadcasting Beta Enables Multi-Encode Livestreaming

With Twitch Enhanced Broadcasting beta, GeForce RTX GPU owners will be able to broadcast up to three resolutions simultaneously at up to 1080p. In the coming months, Twitch plans to roll out support for up to five concurrent encodes to further optimize viewer experiences.

As part of the beta, Twitch will test higher input bit rates as well as new codecs, which are expected to further improve visual quality. The new codecs include the latest-generation AV1 for GeForce RTX 40 Series GPUs, which provides 40% more encoding efficiency than H.264, and HEVC for previous-generation GeForce GPUs.

To simplify the setup process, Enhanced Broadcasting will automatically configure all Open Broadcaster Software (OBS) encoder settings, including resolution, bit rate and encoding parameters.

Sign up for the Twitch Enhanced Broadcasting beta today.

A Righteous RTX Remix

Built on NVIDIA Omniverse, RTX Remix allows modders to easily capture game assets, automatically enhance materials with generative AI tools, reimagine assets via Omniverse-connected apps and Universal Scene Description (OpenUSD), and quickly create stunning RTX remasters of classic games with full ray tracing and NVIDIA DLSS technology.

The RTX Remix open beta releases later this month.

RTX Remix has already delivered stunning remasters in Portal with RTX and the modder-made Portal: Prelude RTX. Now, Orbifold Studios is using RTX Remix to develop Half-Life 2 RTX: An RTX Remix Project, a community remaster of one of the highest-rated games of all time. Check out the new Half-Life 2 RTX gameplay trailer, showcasing Orbifold Studios’ latest updates to Ravenholm.

AI and RTX Bring Illustrations to Life

NVIDIA artists Ashlee Martino-Tarr and Daniela Flamm Jackson, featured in this week’s In the NVIDIA Studio installment, are passionate about illustration, whether at work or at play.

They used Adobe Firefly’s generative AI features, powered by NVIDIA GPUs in the cloud and accelerated with Tensor Cores in GeForce RTX GPUs, to animate a 2D illustration with special effects.

To begin, the pair separated the 2D image into multiple layers and expanded the canvas. Firefly’s Generative Expand feature automatically filled the added space with AI-generated content.


Next, the team separated select elements — starting with the character — and used the AI Object Select feature to automatically mask the layer. The Generative Fill feature then created new content to fill in the background, saving even more time.


This process continued until all distinct layers were separated and imported into Adobe After Effects. Next, they used the Mercury 3D Engine on local RTX GPUs to accelerate playback, unlocking smoother movement in the viewport. Previews and adjustments like camera shake and depth of field were also GPU-accelerated.


Firefly’s Style Match feature then took the existing illustration and created new imagery in its likeness — in this case, a vibrant butterfly sporting similar colors and tones. The duo also used Adobe Illustrator’s Generative Recolor feature, which enables artists to explore a wide variety of colors and themes without having to manually recolor their work.


Martino-Tarr and Jackson then chose their preferred assets and animated them in Adobe After Effects. Firefly’s powerful AI effects helped speed up or entirely eliminate tedious tasks such as patching holes, hand-painting set extensions and caching animation playbacks.

A variety of high-quality images to choose from.

The artists concluded post-production work by putting the finishing touches on their AI animation in After Effects.


Firefly’s powerful AI capabilities were developed with the creative community in mind — guided by AI ethics principles of content and data transparency — to ensure morally responsible output. NVIDIA technology continues to power these features from the cloud for photographers, illustrators, designers, video editors, 3D artists and more.

NVIDIA artists Ashlee Martino-Tarr and Daniela Flamm Jackson.

Check out Martino-Tarr’s portfolio on ArtStation and Jackson’s on IMDb.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 


Twitch, OBS and NVIDIA to Release Multi-Encode Livestreaming


Twitch, OBS and NVIDIA are leveling up livestreaming technology with the new Twitch Enhanced Broadcasting beta, powered by GeForce RTX GPUs. Available in a few days, the feature will let streamers broadcast multiple encodes concurrently, providing optimal viewing experiences for all viewers.

Twitch Enhanced Broadcasting

Today, many streamers must choose between higher resolution and reliable streaming. High-quality video provides more enjoyable viewing experiences but causes streams to buffer for viewers with low bandwidth or older viewing devices. Streaming lower-bitrate video allows more people to watch the content seamlessly, but introduces artifacts.

Twitch — the interactive livestreaming platform — provides server-side transcoding for top-performing channels, meaning it will create different versions of the same stream for different bandwidth levels, improving the viewing experience. But the audiences of many channels are left with a single stream option.

Twitch, OBS and NVIDIA have collaborated on a new feature to address this — Twitch Enhanced Broadcasting, releasing in beta later this month. Using the high-quality dedicated encoder (NVENC) in modern GeForce RTX and GTX GPUs, streamers will be able to broadcast up to three resolutions simultaneously at up to 1080p.

In the coming months, Enhanced Broadcasting beta testers will be able to experiment with higher input bit rates, resolutions up to 4K and up to five concurrent streams, as well as new codecs. The new codecs include the latest-generation AV1 for GeForce RTX 40 Series GPUs, which provides 40% more encoding efficiency than H.264, and HEVC for previous-generation GeForce GPUs.

To simplify setup, Enhanced Broadcasting will automatically configure all OBS encoder settings, including resolution, bit rate and encoding parameters. A server-side algorithm will return the best possible configuration for OBS Studio based on the streamer’s setup, taking the headaches out of tuning settings for the best viewer experiences.

Using the dedicated NVENC hardware encoder, streamers can achieve the highest quality video across streaming bitrates, with minimal impact to app and game performance.
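For readers curious what multi-encode looks like in practice, the sketch below is a rough, hypothetical illustration rather than the Twitch or OBS implementation: it uses Python to drive ffmpeg’s NVENC encoder (h264_nvenc) and produce three renditions of a single source in one pass, assuming ffmpeg is built with NVENC support and a placeholder input file named gameplay.mp4.

```python
# Illustrative sketch only: this is NOT the Twitch/OBS Enhanced Broadcasting
# implementation. It drives ffmpeg's NVENC encoder (h264_nvenc) from Python to
# produce three renditions of one hypothetical input file in a single pass,
# which is the general idea behind multi-encode streaming.
import subprocess

SOURCE = "gameplay.mp4"  # hypothetical recording standing in for a live capture

# Target renditions: (output height, video bitrate)
RENDITIONS = [(1080, "6000k"), (720, "3500k"), (480, "1500k")]


def build_ffmpeg_command(source, renditions):
    """Build one ffmpeg command that splits the decoded video into several
    branches and encodes each branch on the GPU's NVENC encoder."""
    n = len(renditions)
    # Split the decoded frames into n branches, then scale each branch.
    split = f"[0:v]split={n}" + "".join(f"[v{i}]" for i in range(n)) + ";"
    scales = ";".join(
        f"[v{i}]scale=-2:{height}[v{i}out]"
        for i, (height, _) in enumerate(renditions)
    )
    cmd = ["ffmpeg", "-y", "-i", source, "-filter_complex", split + scales]
    for i, (height, bitrate) in enumerate(renditions):
        cmd += [
            "-map", f"[v{i}out]", "-map", "0:a?",   # video branch plus optional audio
            "-c:v", "h264_nvenc", "-b:v", bitrate,  # hardware encode on NVENC
            "-c:a", "aac", "-b:a", "128k",
            f"out_{height}p.mp4",
        ]
    return cmd


if __name__ == "__main__":
    subprocess.run(build_ffmpeg_command(SOURCE, RENDITIONS), check=True)
```

In the real feature, OBS handles capture and the server-side algorithm picks the renditions automatically; the sketch only shows the parallel-encode idea.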

Sign up for the Twitch Enhanced Broadcasting beta today at twitch.tv/broadcast. Twitch will enroll participants on a first-come, first-served basis, starting later this month. Once a creator has been enrolled in the beta, they’ll receive an email with additional instructions.

To further elevate livestreams, download the NVIDIA Broadcast app, free for RTX GPU owners and powered by dedicated AI Tensor Cores, to augment broadcast capabilities for microphones and cameras.



Picture This: Getty Images Releases Generative AI By iStock Powered by NVIDIA Picasso


Getty Images, a global visual content creator and marketplace, today at CES released Generative AI by iStock, an affordable and commercially safe image generation service trained on the company’s creative library of licensed, proprietary data.

Built on NVIDIA Picasso, a foundry for custom AI models, Generative AI by iStock provides designers and businesses with a text-to-image generation tool to create ready-to-license visuals, with legal protection and usage rights for generated images included.

Alongside the release of the service on the iStock website, Getty Images is also making advanced inpainting and outpainting features available via application programming interfaces, launching on iStock.com and Gettyimages.com soon. Developers can seamlessly integrate the new APIs with creative applications to add people and objects to images, replace specific elements and expand images in a wide range of aspect ratios.

Create With Im-AI-gination

Generative AI by iStock is trained with NVIDIA Picasso on Getty Images’ vast creative library — including exclusive photos, illustrations and videos — providing users with a commercially safe way to generate visuals. Users can enter simple text prompts to generate photo-quality images at up to 4K resolution.

Generative AI by iStock Powered by Picasso Editing APIs
Inpainting and outpainting APIs, with a Replace feature coming soon.

New editing APIs give customers powerful control over their generated images.

The Inpainting feature allows users to mask a region of an image, then fill in the region with a person or object described via a text prompt.

Outpainting enables users to expand images to fit various aspect ratios, filling in new areas based on the context of the original image. This is a powerful tool to create assets with unique aspect ratios for advertising or social media promotion.

And coming soon, a Replace feature provides similar capabilities to Inpainting but with stricter adherence to the mask.
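As a rough illustration of how such an editing API might be called, the Python sketch below uses a placeholder endpoint, authentication scheme, field names and response format; these are assumptions for illustration, not Getty Images’ published interface. It only shows the general pattern of sending an image, a mask and a text prompt to an inpainting-style API.

```python
# Hypothetical sketch: the endpoint URL, auth header, form fields and response
# shape below are placeholder assumptions, not Getty Images' published API.
# It only illustrates the general pattern of an inpainting-style request.
import requests

API_KEY = "YOUR_ISTOCK_API_KEY"                   # assumed bearer-token auth
ENDPOINT = "https://api.example.com/v1/inpaint"   # placeholder URL


def inpaint(image_path: str, mask_path: str, prompt: str) -> bytes:
    """Send an image, a mask marking the region to regenerate, and a text
    prompt; return the edited image as raw bytes."""
    with open(image_path, "rb") as image, open(mask_path, "rb") as mask:
        response = requests.post(
            ENDPOINT,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"image": image, "mask": mask},
            data={"prompt": prompt},
            timeout=120,
        )
    response.raise_for_status()
    return response.content  # assumed to be the generated image bytes


if __name__ == "__main__":
    edited = inpaint(
        "storefront.png",
        "window_mask.png",
        "a person arranging a summer window display",
    )
    with open("storefront_edited.png", "wb") as out_file:
        out_file.write(edited)
```

An outpainting call would follow the same pattern, swapping the mask for a target aspect ratio so the service can fill in the expanded regions.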

Transforming Visual Design

The NVIDIA Picasso foundry enables developers and service providers to seamlessly train, fine-tune, optimize and deploy generative AI models tailored to their visual design requirements. Developers can use their own AI models or train new ones using the NVIDIA Edify model architecture to generate images, videos, 3D assets, 360-degree high-dynamic-range imaging and physically based rendering materials from simple text prompts.

Using NVIDIA Picasso, Getty Images trained a bespoke Edify image generator based on its catalog of licensed images and videos to power the Generative AI by iStock service.

Customers can use Generative AI by iStock online today. Advanced editing features are now available via APIs and coming soon to the iStock website.


NVIDIA Omniverse Adopted by Global Automotive-Configurator Developer Ecosystem


Whether building a super-capable truck or conjuring up a dream sports car, it’s easy to spend hours playing with online car configurators.

With auto industry insiders predicting that most new vehicle purchases will move online by 2030, these configurators are more than just toys.

They’re crucial to the future of the world’s automakers — essential in showing off what their brand is all about, boosting average selling prices and helping customers select and personalize their vehicles.

Configurators are also a natural use case for the sophisticated simulation capabilities of NVIDIA Omniverse, a software platform for developing and deploying advanced 3D applications and pipelines based on OpenUSD. The platform makes it possible to instantly visualize changes to a car’s color or customize its interior with luxurious finishes.

Studies show that 80% of shoppers are drawn to brands that give them a personal touch while shopping.

Aiming to meet these customer demands, a burgeoning ecosystem of partners and customers is putting elements of Omniverse to work.

Key creative partners and developers like BITONE, Brickland, Configit, Katana Studio Ltd. (serving Craft Detroit), WPP and ZeroLight are pioneering Omniverse-powered configurators. And leading automakers such as Lotus are adopting these advanced solutions.

That’s because traditional auto configurators, often limited by pre-rendered images, struggle to deliver personalization and dynamic environment representation.

They also juggle different kinds of data across various tools, such as static images of what users see on the website, lists of available options based on location, product codes and personal information.

These challenges extend from the consumer experience — often characterized by limited interactivity and realism — to back-end processes for original equipment manufacturers (OEMs) and agencies, where inflexibility and inefficiencies in updating configurators and repurposing assets are common.

Reconfiguring Configurators With NVIDIA Omniverse

Omniverse helps software developers and service providers streamline their work.

Service providers can now access the platform to craft state-of-the-art 3D experiences and showcase lifelike graphics and high-end, immersive environments with advanced lighting and textures.

And OEMs can benefit from a unified asset pipeline that simplifies the integration of design and engineering data for marketing purposes. Omniverse’s enhanced tools also allow them to quickly produce diverse marketing materials, boosting customer engagement through customized content.

Independent software vendors, or ISVs, can use the native OpenUSD platform as a foundation for creating scene construction tools — or to help develop tools for managing configuration variants.

With the NVIDIA Graphics Delivery Network (GDN) software development kit, high-quality, real-time NVIDIA RTX viewports can be embedded into web applications, ensuring seamless operation on nearly any device.

This, along with support for large-scale scenes and physically accurate graphics, allows developers to concentrate on enhancing the user experience without compromising quality on lower-spec machines.

Omniverse Cloud taps GDN, which uses NVIDIA’s global cloud-streaming infrastructure to deliver seamless access to high-fidelity 3D interactive experiences.

Configurators, when run on GDN, can be easily published at scale using the same GPU architecture on which they were developed and streamed to nearly any device.

All this means less redundancy in data prep, aggregated and accessible data, fewer manual pipeline updates and instant access for the entire intended audience.

Global Adoption by Innovators and Industry Leaders

Omniverse is powering a new era in automotive design and customer interaction, heralded by a vibrant ecosystem of partners and customers.

Lotus is at the forefront, developing an interactive dealership user experience using Omniverse and generative AI tools including NVIDIA Avatar Cloud Engine and NVIDIA Omniverse Audio2Face.

To dive deeper into the world of advanced car configurators, read more about Omniverse and GDN.
