Millions of new materials discovered with deep learning

We share the discovery of 2.2 million new crystals – equivalent to nearly 800 years’ worth of knowledge. We introduce Graph Networks for Materials Exploration (GNoME), our new deep learning tool that dramatically increases the speed and efficiency of discovery by predicting the stability of new materials.

PyTorch 2.1 Contains New Performance Features for AI Developers

We are excited to see the release of PyTorch 2.1. In this blog, we discuss the five features for which Intel made significant contributions to PyTorch 2.1:

  1. TorchInductor-CPU optimizations including Bfloat16 inference path for torch.compile
  2. CPU dynamic shape inference path for torch.compile
  3. C++ wrapper (prototype)
  4. Flash-attention-based scaled dot product algorithm for CPU
  5. PyTorch 2 export post-training quantization with an x86 back end through Inductor

At Intel, we are delighted to be part of the PyTorch community and appreciate the collaboration with and feedback from our colleagues at Meta* as we co-developed these features.

Let’s get started.

TorchInductor-CPU Optimizations

This feature optimizes bfloat16 inference performance for TorchInductor. The 3rd and 4th generation Intel® Xeon® Scalable processors have built-in hardware accelerators for speeding up dot-product computation with the bfloat16 data type. Figure 1 shows a code snippet of how to specify the BF16 inference path.

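import torch

user_model = ...  # your torch.nn.Module
x = ...           # an example input tensor

user_model.eval()
# On CPU, autocast runs eligible ops in bfloat16
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    compiled_model = torch.compile(user_model)
    y = compiled_model(x)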

Figure 1. Code snippet showing the use of BF16 inference with TorchInductor

We measured performance on three TorchInductor benchmark suites (TorchBench, Hugging Face, and TIMM); the results are shown in Table 1. Graph mode (TorchInductor) outperforms eager mode by factors ranging from 1.25x to 2.35x.

Table 1. Bfloat16 performance geometric mean speedup in graph mode, compared with eager mode (compiler: inductor)

Configuration                    torchbench   huggingface   timm_models
Single-socket, multithreaded     1.81x        1.25x         2.35x
Single-core, single-threaded     1.74x        1.28x         1.29x
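Each table in this post reports a geometric mean of per-model speedups. As a rough illustration, assuming speedup is defined per model as eager latency divided by compiled latency (the post does not spell this out), the aggregate can be computed as below; the per-model numbers are made up.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed in log space to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-model speedups (eager latency / compiled latency); not measured data.
per_model_speedup = [1.2, 2.0, 3.0, 1.5]
print(f"geometric mean speedup: {geometric_mean(per_model_speedup):.2f}x")
```

The geometric mean is the usual choice for averaging ratios: a 2x speedup and a 0.5x slowdown average to 1x rather than 1.25x.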

Developers can deploy their models on 4th generation Intel Xeon processors to take advantage of Intel® Advanced Matrix Extensions (Intel® AMX) and get peak performance from torch.compile. Intel AMX has two primary components: tiles and tiled matrix multiplication (TMUL). The tiles store large amounts of data in eight two-dimensional registers, each one kilobyte in size. TMUL is an accelerator engine attached to the tiles; its instructions compute larger matrices in a single operation.
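The tile/TMUL split is essentially blocked matrix multiplication in hardware. The sketch below is a conceptual illustration only: it is plain Python, it does not use AMX, and its block sizes assume a 1 KB tile laid out as 16 rows of 64 bytes (32 bfloat16 values per row).

```python
# Conceptual sketch, not AMX code: a blocked matrix multiply mirroring the
# "load tiles, multiply-accumulate tiles" pattern. Block sizes are assumptions.
TILE_ROWS = 16
TILE_ROW_BYTES = 64
BF16_BYTES = 2
TILE_BF16_COLS = TILE_ROW_BYTES // BF16_BYTES  # 32 bfloat16 values per tile row


def matmul_naive(a, b):
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def matmul_tiled(a, b, tm=TILE_ROWS, tk=TILE_BF16_COLS, tn=TILE_ROWS):
    """Compute a @ b block by block: each (tm x tk) block of A times a (tk x tn)
    block of B is accumulated into a (tm x tn) block of C."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tm):
        for j0 in range(0, m, tn):
            for p0 in range(0, k, tk):  # accumulate over the shared dimension
                for i in range(i0, min(i0 + tm, n)):
                    for j in range(j0, min(j0 + tn, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tk, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


# Sizes that are not multiples of the block sizes exercise the edge blocks.
A = [[(i * 7 + p * 3) % 5 for p in range(40)] for i in range(20)]
B = [[(p * 2 + j) % 7 for j in range(18)] for p in range(40)]
assert matmul_tiled(A, B) == matmul_naive(A, B)
```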

CPU Dynamic Shapes Inference Path for torch.compile

Dynamic shapes is one of the key features in PyTorch 2.0. By default, PyTorch 2.0 assumes all sizes are static. If a size changes and forces a recompile, the recompile instead tries to treat that size as dynamic, because a size that has changed once is likely to change again. Dynamic shapes support is required for popular models such as large language models (LLMs), and broader dynamic shapes coverage lets users get more benefit from torch.compile across a wide range of models. For dynamic shapes, we provide post-op fusion for conv/GEMM operators and vectorized code generation for non-conv/GEMM operators.
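The "static first, dynamic after a change" policy can be modeled with a toy cache. This is a hypothetical illustration: it does not call torch.compile or reproduce its guard logic; it only counts how many compilations such a policy would trigger for a stream of input shapes.

```python
class ToyShapeCache:
    """Toy model of the 'static by default, dynamic after a size change' policy."""

    def __init__(self):
        self.first_shape = None
        self.dynamic_dims = set()
        self.compiles = 0

    def call(self, shape):
        shape = tuple(shape)
        if self.first_shape is None:
            self.first_shape = shape  # first compile: every size is static
            self.compiles += 1
            return self.compiles
        if len(shape) != len(self.first_shape):
            raise ValueError("this toy only handles inputs of a fixed rank")
        changed = {
            d for d, (new, old) in enumerate(zip(shape, self.first_shape))
            if new != old and d not in self.dynamic_dims
        }
        if changed:
            self.dynamic_dims |= changed  # recompile, treating changed sizes as dynamic
            self.compiles += 1
        return self.compiles


cache = ToyShapeCache()
for shape in [(8, 128), (8, 256), (8, 512), (4, 512)]:
    print(shape, "->", cache.call(shape), "compile(s)")
```

The third call triggers no recompile because dimension 1 is already dynamic; the fourth recompiles once because dimension 0 changes for the first time.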

Dynamic shapes is supported by both the Inductor Triton back end for CUDA* and the C++ back end for CPU. This work covers improvements in both functionality (as measured by model passing rate) and performance (as measured by inference latency/throughput). Figure 2 shows a code snippet for the use of dynamic shape inference with TorchInductor.

import torch

user_model = ...  # your torch.nn.Module
x_size1 = ...     # an input tensor
x_size2 = ...     # an input tensor of the same rank but a different size

# Example without eval() or no_grad() (training mode)
compiled_model = torch.compile(user_model)
y = compiled_model(x_size1)
# The input size changed, so this call triggers a recompile
y = compiled_model(x_size2)

# Inference example
user_model.eval()
compiled_model = torch.compile(user_model)
with torch.no_grad():
    y = compiled_model(x_size1)
    # The input size changed, so this call triggers a recompile
    y = compiled_model(x_size2)

Figure 2. Code snippet showing the use of dynamic shape inference with TorchInductor

We again measured performance on the three TorchInductor benchmark suites (TorchBench, Hugging Face, and TIMM); the results are in Table 2. Graph mode outperforms eager mode by factors ranging from 1.15x to 1.79x.

Table 2. Dynamic shape geometric mean speedup compared with eager mode (compiler: inductor)

Configuration                    torchbench   huggingface   timm_models
Single-socket, multithreaded     1.35x        1.15x         1.79x
Single-core, single-threaded     1.48x        1.15x         1.48x

C++ Wrapper (Prototype)

The feature generates C++ code instead of Python* code to invoke the generated kernels and external kernels in TorchInductor to reduce Python overhead. It is also an intermediate step to support deployment in environments without Python.

To enable this feature, use the following configuration:

import torch
import torch._inductor.config as config

# Generate a C++ wrapper instead of the default Python wrapper (set before torch.compile)
config.cpp_wrapper = True

The C++ wrapper yields a larger relative speedup for light workloads, where the overhead of the Python wrapper accounts for a larger share of run time. We grouped the models in TorchBench, Hugging Face, and TIMM by the average inference time of one iteration into small, medium, and large categories. Table 3 shows the geometric mean speedups achieved by the C++ wrapper in comparison to the default Python wrapper.

Table 3. C++ wrapper geometric mean speedup compared with the default Python wrapper (compiler: inductor; t is the average inference time of one iteration)

Mode                                               Small (t <= 0.04s)   Medium (0.04s < t <= 1.5s)   Large (t > 1.5s)
FP32, static shape, single-socket multithreaded    1.06x                1.01x                        1.00x
FP32, static shape, single-core single-thread      1.13x                1.02x                        1.01x
FP32, dynamic shape, single-socket multithreaded   1.05x                1.01x                        1.00x
FP32, dynamic shape, single-core single-thread     1.14x                1.02x                        1.01x
BF16, static shape, single-socket multithreaded    1.09x                1.03x                        1.04x
BF16, static shape, single-core single-thread      1.17x                1.04x                        1.03x
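The small/medium/large grouping used in Table 3 amounts to a threshold function on the average time of one inference iteration. A minimal sketch, with hypothetical model names and timings:

```python
def size_category(avg_iteration_seconds):
    """Bucket a model by its average inference time per iteration (Table 3 thresholds)."""
    if avg_iteration_seconds <= 0.04:
        return "small"
    if avg_iteration_seconds <= 1.5:
        return "medium"
    return "large"


# Hypothetical timings in seconds, for illustration only.
timings = {"model_a": 0.01, "model_b": 0.3, "model_c": 2.4}
buckets = {name: size_category(t) for name, t in timings.items()}
print(buckets)
```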

Flash-Attention-Based Scaled Dot Product Algorithm for CPU

Scaled dot product attention (SDPA) is one of the flagship features of PyTorch 2.0 and helps speed up transformer models. It already had optimized CUDA kernels but lacked optimized CPU kernels. This flash-attention implementation targets both training and inference and supports both FP32 and bfloat16 data types. Users do not need to change any front-end code to benefit from this optimization: when SDPA is called, an implementation, including this new one, is selected automatically.
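The core idea behind flash-attention-style kernels is an online softmax: keys and values are processed block by block while a running maximum and a running denominator are rescaled, so the full score vector never has to be materialized. The pure-Python sketch below shows that idea for a single query. It is not PyTorch's CPU kernel (which also tiles queries, vectorizes, and handles batches and heads), and the block size is an arbitrary assumption; in PyTorch, the front-end entry point remains torch.nn.functional.scaled_dot_product_attention.

```python
import math


def _scores(q, keys):
    d = len(q)
    return [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]


def attention_reference(q, keys, values):
    """Unfused attention for one query: softmax(q.k / sqrt(d)) applied to the values."""
    scores = _scores(q, keys)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def attention_blockwise(q, keys, values, block=2):
    """Flash-attention-style pass over key/value blocks using an online softmax."""
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block):
        scores = _scores(q, keys[start:start + block])
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new running maximum.
        scale = math.exp(running_max - new_max)
        denom *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, values[start:start + block]):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


q = [0.1, -0.4, 0.3]
keys = [[0.2, 0.1, -0.5], [1.0, 0.0, 0.3], [-0.3, 0.8, 0.2], [0.5, 0.5, 0.5]]
values = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.0]]
print(attention_reference(q, keys, values))
print(attention_blockwise(q, keys, values, block=2))
```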

We measured the SDPA-related models in Hugging Face, and the optimization shows speedups compared with the unfused SDPA. Table 4 shows the geometric mean speedups for SDPA optimization.

Table 4. SDPA optimization performance geometric mean speedup (compiler: inductor)

Configuration                    FP32            BF16
Single-socket, multithreaded     1.15x, 20/20    1.07x, 20/20
Single-core, single-threaded     1.02x, 20/20    1.04x, 20/20

PyTorch 2 Export Post-Training Quantization with x86 Back End through Inductor

PyTorch 2 export provides a new quantization flow. This feature uses TorchInductor with an x86 CPU device as the back end for post-training static quantization in this new flow. An example code snippet is shown in Figure 3.

import copy

import torch
import torch._dynamo as torchdynamo
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq

model = ...           # your torch.nn.Module
example_inputs = ...  # tuple of example input tensors

model.eval()
with torch.no_grad():
    # Step 1: Trace the model into an FX graph of flattened ATen operators
    exported_graph_module, guards = torchdynamo.export(
        model,
        *copy.deepcopy(example_inputs),
        aten_graph=True,
    )

    # Step 2: Insert observers or fake quantize modules
    quantizer = xiq.X86InductorQuantizer()
    operator_config = xiq.get_default_x86_inductor_quantization_config()
    quantizer.set_global(operator_config)
    prepared_graph_module = prepare_pt2e(exported_graph_module, quantizer)

    # Calibration: run representative inputs through prepared_graph_module here
    # so the observers can record activation ranges.

    # Step 3: Quantize the model
    convert_graph_module = convert_pt2e(prepared_graph_module)

    # Step 4: Lower the quantized model into the back end
    compile_model = torch.compile(convert_graph_module)

Figure 3. Code snippet showing the use of Inductor as back end for PyTorch 2 export post-training quantization
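The calibration step in Figure 3 is where the inserted observers record value ranges, from which quantization parameters are derived. As a rough, framework-free illustration, the sketch below assumes a generic per-tensor asymmetric uint8 scheme with a min/max observer; the schemes actually selected by the default X86InductorQuantizer configuration are not reproduced here, and all helper names are hypothetical.

```python
import math


def calibrate_minmax(batches):
    """Record the running min/max of observed activation values (a min/max observer)."""
    lo, hi = math.inf, -math.inf
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


def affine_qparams(lo, hi, qmin=0, qmax=255):
    """Per-tensor asymmetric uint8 parameters; the range is widened to contain 0."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all values are 0
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    return scale, zero_point


def quantize(xs, scale, zero_point, qmin=0, qmax=255):
    return [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]


def dequantize(qs, scale, zero_point):
    return [(q - zero_point) * scale for q in qs]


# Hypothetical calibration batches standing in for representative inputs.
lo, hi = calibrate_minmax([[-1.0, 0.5], [2.0, 0.0]])
scale, zp = affine_qparams(lo, hi)
x = [-1.0, 0.0, 0.7, 2.0]
q = quantize(x, scale, zp)
x_hat = dequantize(q, scale, zp)
print(q, x_hat)
```

Values inside the calibrated range round-trip with an error of at most half a quantization step (scale / 2).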

All convolutional neural network (CNN) models from the TorchBench test suite were measured and show speedups compared with the Inductor FP32 inference path. Performance metrics are shown in Table 5.

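Table 5. Post-training quantization geometric mean speedup and relative accuracy loss compared with the Inductor FP32 inference path

Compiler    Geometric mean speedup    Geometric mean relative accuracy loss
inductor    3.25x, 12/12              0.44%, 12/12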

Next Steps

Get the Software

Try out PyTorch 2.1 and realize the performance benefits for yourself from these features contributed by Intel.

We encourage you to check out Intel’s other AI Tools and framework optimizations and learn about the open, standards-based oneAPI multiarchitecture, multivendor programming model that forms the foundation of Intel’s AI software portfolio.

For more details about the 4th generation Intel Xeon Scalable processor, visit the AI platform where you can learn how Intel is empowering developers to run high-performance, efficient end-to-end AI pipelines.

Product and Performance Information

1 Amazon EC2* m7i.16xlarge: 1-node, Intel Xeon Platinum 8488C processor with 256 GB memory (1 x 256 GB DDR5 4800 MT/s), microcode 0x2b000461, hyperthreading on, turbo on, Ubuntu* 22.04.3 LTS, kernel 6.2.0-1011-aws, GCC* 11.3.0, Amazon Elastic Block Store 200 GB, BIOS Amazon EC2 1.0 10/16/2017; Software: PyTorch 2.1.0_rc4, Intel® oneAPI Deep Neural Network Library (oneDNN) version 3.1.1, TorchBench, TorchVision, TorchText, TorchAudio, TorchData, TorchDynamo Benchmarks, tested by Intel on 9/12/2023.

2 Amazon EC2 c6i.16xlarge: 1-node, Intel Xeon Platinum 8375C processor with 128 GB memory (1 x 128 GB DDR4 3200 MT/s), microcode 0xd0003a5, hyperthreading on, turbo on, Ubuntu 22.04.2 LTS, kernel 6.2.0-1011-aws, GCC 11.3.0, Amazon Elastic Block Store 200 GB, BIOS Amazon EC2 1.0 10/16/2017; Software: PyTorch 2.1.0_rc4, oneDNN version 3.1.1, TorchBench, TorchVision, TorchText, TorchAudio, TorchData, TorchDynamo Benchmarks, TorchBench cpu userbenchmark, tested by Intel on 9/12/2023.

The Power of Prompting

Today, we published an exploration of the power of prompting strategies that demonstrates how the generalist GPT-4 model can perform as a specialist on medical challenge problem benchmarks. The study shows that GPT-4 outperforms, by a significant margin, a leading model fine-tuned specifically for medical applications on the same benchmarks. These results add to other recent studies showing that prompting strategies alone can effectively elicit this kind of domain-specific expertise from generalist foundation models.

A visual illustration of Medprompt performance on the MedQA benchmark. From left to right, the illustration shows how the Medprompt components and their additive contributions improve accuracy: zero-shot at 81.7% accuracy, random few-shot at 83.9%, random few-shot with chain-of-thought at 87.3%, kNN few-shot with chain-of-thought at 88.4%, and ensemble with choice shuffle at 90.2%.
Figure 1: Visual illustration of Medprompt components and additive contributions to performance on the MedQA benchmark. Prompting strategy combines kNN-based few-shot example selection, GPT-4–generated chain-of-thought prompting, and answer-choice shuffled ensembling.
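Two of the components named in the caption can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the Medprompt implementation: embeddings are taken as given, the model is a hypothetical `ask` callable that returns the index of its chosen option, and chain-of-thought generation is omitted.

```python
import math
import random
from collections import Counter


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_examples(query_vec, pool, k=2):
    """Return the k examples whose embeddings are most similar to the query.

    `pool` holds (embedding, example) pairs; the embeddings are assumed to come
    from some text-embedding model, which is not shown here.
    """
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [example for _, example in ranked[:k]]


def choice_shuffle_ensemble(ask, question, options, runs=5, seed=0):
    """Ask the same question several times with the options in shuffled order,
    map each answer back to the option text, and return the majority vote.

    `ask(question, options) -> index` is a hypothetical model call.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(runs):
        order = list(options)
        rng.shuffle(order)
        votes[order[ask(question, order)]] += 1
    return votes.most_common(1)[0][0]


pool = [([1.0, 0.0], "example 1"), ([0.0, 1.0], "example 2"), ([0.9, 0.1], "example 3")]
print(knn_examples([1.0, 0.0], pool))


def always_b(question, opts):  # hypothetical model stub
    return opts.index("B")


print(choice_shuffle_ensemble(always_b, "Which option?", ["A", "B", "C", "D"]))
```

Shuffling the options means a model that is biased toward a particular position (for example, always answering the first option) spreads its votes across different answers, while a model that tracks the answer content votes consistently.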

During early evaluations of the capabilities of GPT-4, we were excited to see glimmers of general problem-solving skills, with surprising polymathic capabilities of abstraction, generalization, and composition, including the ability to weave together concepts across disciplines. Beyond these general reasoning powers, we discovered that GPT-4 could be steered via prompting to serve as a domain-specific specialist in numerous areas. Previously, eliciting these capabilities required fine-tuning the language models with specially curated data to achieve top performance in specific domains. This raises the question of whether more extensive training of generalist foundation models might reduce the need for fine-tuning.

In a study shared in March, we demonstrated how very simple prompting strategies revealed GPT-4’s strengths in medical knowledge without special fine-tuning. The results showed how the “out-of-the-box” model could ace a battery of medical challenge problems with basic prompts. In our more recent study, we show how the composition of several prompting strategies into a method that we refer to as “Medprompt” can efficiently steer GPT-4 to achieve top performance. In particular, we find that GPT-4 with Medprompt: 

  • Surpasses 90% on MedQA dataset for the first time
  • Achieves top reported results on all nine benchmark datasets in the MultiMedQA suite
  • Reduces error rate on MedQA by 27% over that reported by MedPaLM 2 
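A relative error-rate reduction compares error rates (1 minus accuracy) rather than accuracies. The sketch below uses the 90.2% MedQA accuracy from Figure 1; the 86.5% baseline is an assumption taken from the MedQA accuracy reported for Med-PaLM 2 elsewhere, not a number stated in this post.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Fraction by which the error rate (1 - accuracy) drops from baseline to new."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


# 0.902: Medprompt on MedQA (Figure 1). 0.865: assumed baseline accuracy.
reduction = relative_error_reduction(0.865, 0.902)
print(f"{reduction:.0%}")
```

Under that assumption, the error rate falls from 13.5% to 9.8%, a reduction of roughly 27%, even though accuracy rises by only 3.7 percentage points.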

Many AI practitioners assume that specialty-centric fine-tuning is required to extend generalist foundation models to perform well on specific domains. While fine-tuning can boost performance, the process can be expensive. Fine-tuning often requires experts or professionally labeled datasets (e.g., via top clinicians in the MedPaLM project) and then computing model parameter updates. The process can be resource-intensive and cost-prohibitive, putting the approach out of reach for many small and medium-sized organizations. The Medprompt study shows the value of more deeply exploring prompting possibilities for transforming generalist models into specialists and extending the benefits of these models to new domains and applications. In an intriguing finding, the prompting methods we present appear to be valuable, without any domain-specific updates to the prompting strategy, across professional competency exams in a diversity of domains, including electrical engineering, machine learning, philosophy, accounting, law, and psychology.

At Microsoft, we’ve been working on the best ways to harness the latest advances in large language models across our products and services while keeping a careful focus on understanding and addressing potential issues with the reliability, safety, and usability of applications. It’s been inspirational to see all the creativity, and the careful integration and testing of prototypes, as we continue the journey to share new AI developments with our partners and customers.

A chart shows GPT-4 performance using three different prompting strategies on out-of-domain datasets. GPT-4 with Medprompt outperforms the zero-shot and five-shot approaches across MMLU Machine Learning, MMLU Professional Psychology, MMLU Electrical Engineering, MMLU Philosophy, MMLU Professional Law, MMLU Accounting, NCLEX RegisteredNursing.com, and NCLEX Nurselabs.
Figure 3: GPT-4 performance with three different prompting strategies on out-of-domain datasets. Zero-shot and five-shot approaches represent baselines.

Learn how to assess the risk of AI systems

Artificial intelligence (AI) is a rapidly evolving field with the potential to improve and transform many aspects of society. In 2023, the pace of adoption of AI technologies has accelerated further with the development of powerful foundation models (FMs) and a resulting advancement in generative AI capabilities.

At Amazon, we have launched multiple generative AI services, such as Amazon Bedrock and Amazon CodeWhisperer, and have made a range of highly capable generative models available through Amazon SageMaker JumpStart. These services are designed to support our customers in unlocking the emerging capabilities of generative AI, including enhanced creativity, personalized and dynamic content creation, and innovative design. They can also enable AI practitioners to make sense of the world as never before—addressing language barriers, climate change, accelerating scientific discoveries, and more.

To realize the full potential of generative AI, however, it’s important to carefully reflect on any potential risks. First and foremost, this benefits the stakeholders of the AI system by promoting responsible and safe development and deployment, and by encouraging the adoption of proactive measures to address potential impact. Consequently, establishing mechanisms to assess and manage risk is an important process for AI practitioners to consider and has become a core component of many emerging AI industry standards (for example, ISO 42001, ISO 23894, and NIST RMF) and legislation (such as the EU AI Act).

In this post, we discuss how to assess the potential risk of your AI system.

What are the different levels of risk?

While it might be easier to start looking at an individual machine learning (ML) model and the associated risks in isolation, it’s important to consider the details of the specific application of such a model and the corresponding use case as part of a complete AI system. In fact, a typical AI system is likely to be based on multiple different ML models working together, and an organization might be looking to build multiple different AI systems. Consequently, risks can be evaluated for each use case and at different levels, namely model risk, AI system risk, and enterprise risk.

Enterprise risk encompasses the broad spectrum of risks that an organization may face, including financial, operational, and strategic risks. AI system risk focuses on the impact associated with the implementation and operation of AI systems, whereas ML model risk pertains specifically to the vulnerabilities and uncertainties inherent in ML models.

In this post, we focus primarily on AI system risk. However, it’s important to note that all levels of risk management within an organization should be considered and aligned.

How is AI system risk defined?

Risk management in the context of an AI system can be a path to minimize the effect of uncertainty or potential negative impacts, while also providing opportunities to maximize positive impacts. Risk itself is not a potential harm but the effect of uncertainty on objectives. According to the NIST Risk Management Framework (NIST RMF), risk can be estimated by multiplying the probability of an event occurring by the magnitude of its consequences.
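As a minimal sketch of that multiplicative estimate, assume probability is expressed in [0, 1] and the magnitude of consequences on an arbitrary, hypothetical 0 to 10 scale; the events and numbers below are invented for illustration.

```python
def risk_score(probability, magnitude):
    """Risk as the product of an event's likelihood and the magnitude of its consequences."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if magnitude < 0:
        raise ValueError("magnitude must be non-negative")
    return probability * magnitude


# Hypothetical events for one AI use case: (probability, magnitude on a 0-10 scale).
events = {
    "minor output disruption": (0.30, 2),
    "systematically different outputs across groups": (0.05, 8),
}
scores = {name: risk_score(p, m) for name, (p, m) in events.items()}
print(scores)
```

A frequent but mild event can score higher than a rare but severe one, which is why both factors need to be estimated rather than either alone.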

There are two aspects to risk: inherent risk and residual risk. Inherent risk represents the amount of risk the AI system exhibits in absence of mitigations or controls. Residual risk captures the remaining risks after factoring in mitigation strategies.

Always keep in mind that risk assessment is a human-centric activity that requires organization-wide efforts; these efforts range from ensuring all relevant stakeholders are included in the assessment process (such as product, engineering, science, sales, and security teams) to assessing how social perspectives and norms influence the perceived likelihood and consequences of certain events.

Why should your organization care about risk evaluation?

Establishing risk management frameworks for AI systems can benefit society at large by promoting the safe and responsible design, development and operation of AI systems. Risk management frameworks can also benefit organizations through the following:

  • Improved decision-making – By understanding the risks associated with AI systems, organizations can make better decisions about how to mitigate those risks and use AI systems in a safe and responsible manner
  • Increased compliance planning – A risk assessment framework can help organizations prepare for risk assessment requirements in relevant laws and regulations
  • Building trust – By demonstrating that they are taking steps to mitigate the risks of AI systems, organizations can show their customers and stakeholders that they are committed to using AI in a safe and responsible manner

How to assess risk?

As a first step, an organization should consider describing the AI use case that needs to be assessed and identify all relevant stakeholders. A use case is a specific scenario or situation that describes how users interact with an AI system to achieve a particular goal. When creating a use case description, it can be helpful to specify the business problem being solved, list the stakeholders involved, characterize the workflow, and provide details regarding key inputs and outputs of the system.

When it comes to stakeholders, it’s easy to overlook some. The following figure is a good starting point to map out AI stakeholder roles.

Source: “Information technology – Artificial intelligence – Artificial intelligence concepts and terminology”.

An important next step of the AI system risk assessment is to identify potentially harmful events associated with the use case. In considering these events, it can be helpful to reflect on different dimensions of responsible AI, such as fairness and robustness, for example. Different stakeholders might be affected to different degrees along different dimensions. For example, a low robustness risk for an end-user could be the result of an AI system exhibiting minor disruptions, whereas a low fairness risk could be caused by an AI system producing negligibly different outputs for different demographic groups.

To estimate the risk of an event, you can use a likelihood scale in combination with a severity scale to measure the probability of occurrence as well as the degree of consequences. A helpful starting point when developing these scales might be the NIST RMF, which suggests using qualitative nonnumerical categories ranging from very low to very high risk, or semi-quantitative assessment principles, such as scales (for example, 1–10), bins, or otherwise representative numbers. After you have defined the likelihood and severity scales for all relevant dimensions, you can use a risk matrix scheme to quantify the overall risk per stakeholder along each relevant dimension. The following figure shows an example risk matrix.
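A risk matrix is a lookup from a (likelihood, severity) pair to a risk level. The sketch below uses five-level qualitative scales; apart from "rare" and "low" (used in the text) and the "very low" to "very high" levels, the scale labels and the banding rule that fills the matrix are hypothetical choices, not a NIST RMF prescription.

```python
LIKELIHOOD = ["rare", "unlikely", "possible", "likely", "frequent"]
SEVERITY = ["low", "moderate", "significant", "major", "severe"]
RISK_LEVELS = ["very low", "low", "medium", "high", "very high"]

# Hypothetical banding: add the two ordinal positions (0..8) and map onto 5 levels.
RISK_MATRIX = [
    [RISK_LEVELS[(i + j + 1) // 2] for j in range(len(SEVERITY))]
    for i in range(len(LIKELIHOOD))
]


def risk_level(likelihood, severity):
    """Look up the risk level for one event (rows: likelihood, columns: severity)."""
    return RISK_MATRIX[LIKELIHOOD.index(likelihood)][SEVERITY.index(severity)]


for row_label, row in zip(LIKELIHOOD, RISK_MATRIX):
    print(f"{row_label:>9}: {row}")
```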

Using this risk matrix, we can consider an event with low severity and rare likelihood of occurring as very low risk. Keep in mind that the initial assessment will be an estimate of inherent risk, and risk mitigation strategies can help lower the risk levels further. The process can then be repeated to generate a rating for any remaining residual risk per event. If there are multiple events identified along the same dimension, it can be helpful to pick the highest risk level among all to create a final assessment summary.
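Building the final assessment summary described above (inherent risk first, residual risk after mitigations, and the highest level per dimension) can be sketched as follows; the dimensions, events, and levels are hypothetical.

```python
RISK_LEVELS = ["very low", "low", "medium", "high", "very high"]


def highest(levels):
    """Return the most severe level in an iterable of qualitative risk levels."""
    return max(levels, key=RISK_LEVELS.index)


def summarize(assessments):
    """Per dimension, keep the highest inherent and highest residual risk level.

    `assessments` is a list of (dimension, event, inherent_level, residual_level).
    """
    by_dimension = {}
    for dimension, _event, inherent, residual in assessments:
        inh, res = by_dimension.setdefault(dimension, ([], []))
        inh.append(inherent)
        res.append(residual)
    return {dim: (highest(inh), highest(res)) for dim, (inh, res) in by_dimension.items()}


# Hypothetical events for one stakeholder; levels are illustrative only.
assessments = [
    ("fairness", "different outputs across groups", "high", "low"),
    ("fairness", "biased ranking of results", "medium", "medium"),
    ("robustness", "minor output disruptions", "low", "very low"),
]
print(summarize(assessments))
```

Note that the highest residual level for a dimension can come from a different event than the highest inherent level, as in the fairness example here.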

Using the final assessment summary, organizations will have to define what risk levels are acceptable for their AI systems as well as consider relevant regulations and policies.

AWS commitment

Through engagements with the White House and UN, among others, we are committed to sharing our knowledge and expertise to advance the responsible and secure use of AI. Along these lines, Amazon’s Adam Selipsky recently represented AWS at the AI Safety Summit with heads of state and industry leaders in attendance, further demonstrating our dedication to collaborating on the responsible advancement of artificial intelligence.

Conclusion

As AI continues to advance, risk assessment is becoming increasingly important and useful for organizations looking to build and deploy AI responsibly. By establishing a risk assessment framework and risk mitigation plan, organizations can reduce the risk of potential AI-related incidents and earn trust with their customers, as well as reap benefits such as improved reliability, improved fairness for different demographics, and more.

Go ahead and get started on your journey of developing a risk assessment framework in your organization and share your thoughts in the comments.

Also check out an overview of generative AI risks published on Amazon Science: Responsible AI in the generative era, and explore the range of AWS services that can support you on your risk assessment and mitigation journey: Amazon SageMaker Clarify, Amazon SageMaker Model Monitor, AWS CloudTrail, as well as the model governance framework.


About the Authors

Mia C. Mayer is an Applied Scientist and ML educator at AWS Machine Learning University, where she researches and teaches safety, explainability and fairness of Machine Learning and AI systems. Throughout her career, Mia has established several university outreach programs, acted as a guest lecturer and keynote speaker, and presented at numerous large learning conferences. She also helps internal teams and AWS customers get started on their responsible AI journey.

Denis V. Batalov is a 17-year Amazon veteran with a PhD in Machine Learning. Denis worked on such exciting projects as Search Inside the Book, Amazon Mobile apps and Kindle Direct Publishing. Since 2013 he has helped AWS customers adopt AI/ML technology as a Solutions Architect. Currently, Denis is a Worldwide Tech Leader for AI/ML responsible for the functioning of AWS ML Specialist Solutions Architects globally. Denis is a frequent public speaker; you can follow him on Twitter @dbatalov.

Dr. Sara Liu is a Senior Technical Program Manager with the AWS Responsible AI team. She works with a team of scientists, dataset leads, ML engineers, researchers, as well as other cross-functional teams to raise the responsible AI bar across AWS AI services. Her current projects involve developing AI service cards, conducting risk assessments for responsible AI, creating high-quality evaluation datasets, and implementing quality programs. She also helps internal teams and customers meet evolving AI industry standards.

Embracing Transformation: AWS and NVIDIA Forge Ahead in Generative AI and Cloud Innovation

Amazon Web Services and NVIDIA will bring the latest generative AI technologies to enterprises worldwide.

Combining AI and cloud computing, NVIDIA founder and CEO Jensen Huang joined AWS CEO Adam Selipsky Tuesday on stage at AWS re:Invent 2023 at the Venetian Expo Center in Las Vegas.

Selipsky said he was “thrilled” to announce the expansion of the partnership between AWS and NVIDIA with more offerings that will deliver advanced graphics, machine learning and generative AI infrastructure.

The two announced that AWS will be the first cloud provider to adopt the latest NVIDIA GH200 NVL32 Grace Hopper Superchip with new multi-node NVLink technology, that AWS is bringing NVIDIA DGX Cloud to AWS, and that AWS has integrated some of NVIDIA’s most popular software libraries.

Huang started the conversation by highlighting the integration of key NVIDIA libraries with AWS, encompassing a range from NVIDIA AI Enterprise to cuQuantum to BioNeMo, catering to domains like data processing, quantum computing and digital biology.

The partnership opens AWS to millions of developers and the nearly 40,000 companies who are using these libraries, Huang said, adding that it’s great to see AWS expand its cloud instance offerings to include NVIDIA’s new L4, L40S and, soon, H200 GPUs.

Selipsky then introduced the AWS debut of the NVIDIA GH200 Grace Hopper Superchip, a significant advancement in cloud computing, and prompted Huang for further details.

“Grace Hopper, which is GH200, connects two revolutionary processors together in a really unique way,” Huang said. He explained that the GH200 connects NVIDIA’s Grace Arm CPU with its H200 GPU using a chip-to-chip interconnect called NVLink, at an astonishing one terabyte per second.

Each processor has direct access to the high-performance HBM and efficient LPDDR5X memory. This configuration results in 4 petaflops of processing power and 600GB of memory for each superchip.

AWS and NVIDIA connect 32 Grace Hopper Superchips in each rack using a new NVLink switch. Each node of 32 NVLink-connected GH200 Superchips can be a single Amazon EC2 instance. When these are integrated with AWS Nitro and EFA networking, customers can connect GH200 NVL32 instances to scale to thousands of GH200 Superchips.

“With AWS Nitro, that becomes basically one giant virtual GPU instance,” Huang said.

The combination of AWS expertise in highly scalable cloud computing plus NVIDIA innovation with Grace Hopper will make this an amazing platform that delivers the highest performance for complex generative AI workloads, Huang said.

“It’s great to see the infrastructure, but it extends to the software, the services and all the other workflows that they have,” Selipsky said, introducing NVIDIA DGX Cloud on AWS.

This partnership will bring about the first DGX Cloud AI supercomputer powered by the GH200 Superchips, demonstrating the power of AWS’s cloud infrastructure and NVIDIA’s AI expertise.

Following up, Huang announced that this new DGX Cloud supercomputer design in AWS, codenamed Project Ceiba, will serve as NVIDIA’s newest AI supercomputer as well, for its own AI research and development.


Named after the majestic Amazonian Ceiba tree, the Project Ceiba DGX Cloud cluster incorporates 16,384 GH200 Superchips to achieve 65 exaflops of AI processing power, Huang said.
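The headline figure is consistent with the per-superchip number given earlier. A quick check, assuming the 4-petaflop figure applies uniformly to every superchip and is measured at the same precision as the 65-exaflop total (the announcement does not specify the precision):

```python
SUPERCHIPS = 16_384          # GH200 Superchips in Project Ceiba (stated above)
PFLOPS_PER_SUPERCHIP = 4     # per-superchip AI processing power (stated above)
SUPERCHIPS_PER_NODE = 32     # GH200 NVL32: 32 NVLink-connected superchips per rack

total_exaflops = SUPERCHIPS * PFLOPS_PER_SUPERCHIP / 1_000   # 1 exaflop = 1,000 petaflops
nvl32_nodes = SUPERCHIPS // SUPERCHIPS_PER_NODE

print(f"{total_exaflops:.1f} exaflops across {nvl32_nodes} NVL32 nodes")
```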

Ceiba will be the world’s first GH200 NVL32 AI supercomputer built and the newest AI supercomputer in NVIDIA DGX Cloud, Huang said.

Huang described the Project Ceiba AI supercomputer as “utterly incredible,” saying it will be able to reduce the training time of the largest language models by half.

NVIDIA’s AI engineering teams will use this new supercomputer in DGX Cloud to advance AI for graphics, LLMs, image/video/3D generation, digital biology, robotics, self-driving cars, Earth-2 climate prediction and more, Huang said.

“DGX is NVIDIA’s cloud AI factory,” Huang said, noting that AI is now key to doing NVIDIA’s own work in everything from computer graphics to creating digital biology models to robotics to climate simulation and modeling.

“DGX Cloud is also our AI factory to work with enterprise customers to build custom AI models,” Huang said. “They bring data and domain expertise; we bring AI technology and infrastructure.”

Huang also announced that AWS will be bringing four Amazon EC2 instances based on the NVIDIA GH200 NVL, H200, L40S and L4 GPUs, coming to market early next year.

Selipsky wrapped up the conversation by announcing that GH200-based instances and DGX Cloud will be available on AWS in the coming year.

You can catch the discussion and Selipsky’s entire keynote on AWS’s YouTube channel. 

NVIDIA BioNeMo Enables Generative AI for Drug Discovery on AWS

Researchers and developers at leading pharmaceutical and techbio companies can now easily deploy NVIDIA Clara software and services for accelerated healthcare through Amazon Web Services.

Announced today at AWS re:Invent, the initiative gives healthcare and life sciences developers using AWS cloud resources the flexibility to integrate NVIDIA-accelerated offerings such as NVIDIA BioNeMo, a generative AI platform for drug discovery. BioNeMo is coming to NVIDIA DGX Cloud on AWS and is currently available through AWS ParallelCluster, a cluster management tool for high performance computing, and Amazon SageMaker, a machine learning service.

Thousands of healthcare and life sciences companies globally use AWS. They will now be able to access BioNeMo to build or customize digital biology foundation models with proprietary data, scaling up model training and deployment using NVIDIA GPU-accelerated cloud servers on AWS.

Techbio innovators including Alchemab Therapeutics, Basecamp Research, Character Biosciences, Evozyne, Etcembly and LabGenius are among the AWS users already using BioNeMo for generative AI-accelerated drug discovery and development. This collaboration gives them more ways to rapidly scale up cloud computing resources for developing generative AI models trained on biomolecular data.

This announcement extends NVIDIA’s existing healthcare-focused offerings available on AWS — NVIDIA MONAI for medical imaging workflows and NVIDIA Parabricks for accelerated genomics.

New to AWS: NVIDIA BioNeMo Advances Generative AI for Drug Discovery

BioNeMo is a domain-specific framework for digital biology generative AI, including pretrained large language models (LLMs), data loaders and optimized training recipes that can help advance computer-aided drug discovery by speeding target identification, protein structure prediction and drug candidate screening.

Drug discovery teams can use their proprietary data to build or optimize models with BioNeMo and run them on cloud-based high performance computing clusters.

One of these models, ESM-2, a powerful LLM that supports protein structure prediction, achieves almost linear scaling on 256 NVIDIA H100 Tensor Core GPUs. Researchers can scale to 512 H100 GPUs to complete training in a few days instead of the month reported in the original paper.

Developers can train ESM-2 at scale using checkpoints of 650 million or 3 billion parameters. Additional AI models supported in the BioNeMo training framework include small-molecule generative model MegaMolBART and protein sequence generation model ProtT5.

BioNeMo’s pretrained models and optimized training recipes — which are available using self-managed services like AWS ParallelCluster and Amazon ECS as well as integrated, managed services through NVIDIA DGX Cloud and Amazon SageMaker — can help R&D teams build foundation models that can explore more drug candidates, optimize wet lab experimentation and find promising clinical candidates faster.

Also Available on AWS: NVIDIA Clara for Medical Imaging and Genomics

Project MONAI, cofounded and enterprise-supported by NVIDIA to support medical imaging workflows, has been downloaded more than 1.8 million times and is available for deployment on AWS. Developers can harness their proprietary healthcare datasets already stored on AWS cloud resources to rapidly annotate and build AI models for medical imaging.

These models, trained on NVIDIA GPU-powered Amazon EC2 instances, can be used for interactive annotation and fine-tuning for segmentation, classification, registration and detection tasks in medical imaging. Developers can also harness MRI image synthesis models available in MONAI to augment training datasets.

To accelerate genomics pipelines, Parabricks enables variant calling on a whole human genome in around 15 minutes, compared to a day on a CPU-only system. On AWS, developers can quickly scale up to process large amounts of genomic data across multiple GPU nodes.

More than a dozen Parabricks workflows are available on AWS HealthOmics as Ready2Run workflows, which enable customers to easily run pre-built pipelines.

Get started with NVIDIA Clara on AWS to accelerate AI workflows for drug discovery, genomics and medical imaging.

Subscribe to NVIDIA healthcare news.

NVIDIA GPUs on AWS to Offer 2x Simulation Leap in Omniverse Isaac Sim, Accelerating Smarter Robots

Developing more intelligent robots in the cloud is about to get a speed multiplier.

NVIDIA Isaac Sim and NVIDIA L40S GPUs are coming to Amazon Web Services, enabling developers to build and deploy accelerated robotics applications in the cloud. Isaac Sim, an extensible simulator for AI-enabled robots, is built on the NVIDIA Omniverse development platform for building and connecting OpenUSD applications.

Combining powerful AI compute with graphics and media acceleration, the L40S GPU is built to power the next generation of data center workloads. Based on the Ada Lovelace architecture, the L40S enables ultrafast real-time rendering, delivering up to a 3.8x performance leap for Omniverse compared with the previous generation and giving engineering and robotics teams a boost.

This generational leap in acceleration delivers 2x faster performance than the A40 GPU across a broad set of robotics simulation tasks in Isaac Sim.

L40S GPUs can also be harnessed for generative AI workloads, from fine-tuning large language models in a matter of hours to real-time inference for text-to-image and chat applications.

New Amazon Machine Images (AMIs) on the NVIDIA L40S in AWS Marketplace will enable roboticists to easily access preconfigured virtual machines to operate Isaac Sim workloads.

Robotics development in simulation is speeding the process of deploying applications, turbocharging industries such as retail, food processing, manufacturing, logistics and more.

Revenue from mobile robots in warehouses worldwide is expected to explode, more than tripling from $11.6 billion in 2023 to $42.2 billion by 2030, according to ABI Research.
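For readers who want the implied annual rate, the projection can be converted into a compound annual growth rate (CAGR). This is a simple sketch that assumes steady compounding over the seven years from 2023 to 2030; the article doesn't give ABI Research's year-by-year forecast, which may not follow a smooth curve.

```python
# Implied growth from the ABI Research projection quoted above.
# Assumption: smooth compound growth between the two endpoints.
start_revenue = 11.6  # USD billions, 2023
end_revenue = 42.2    # USD billions, 2030
years = 2030 - 2023   # seven years of growth

growth_multiple = end_revenue / start_revenue
cagr = growth_multiple ** (1 / years) - 1  # compound annual growth rate
print(f"{growth_multiple:.2f}x overall, about {cagr:.1%} per year")
```

With the quoted figures, that is roughly a 3.64x increase, or about 20% growth per year.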

Robotics systems have played an important role across fulfillment centers to help meet the demands of online shoppers and provide a better workplace for employees. Amazon Robotics has deployed more than 750,000 robots in its warehouses around the world to improve the experience for employees supporting package fulfillment and its customers.

“Simulation technology plays a critical role in how we develop, test and deploy our robots,” said Brian Basile, head of virtual systems at Amazon Robotics. “At Amazon Robotics we continue to increase the scale and complexity of our simulations. With the new AWS L40S offering we will push the boundaries of simulation, rendering and model training even further.”

Accelerated Robotics Development With Isaac Sim

Robotics systems can demand large datasets for precision operation in deployed applications. Gathering these datasets and testing them in the real world is time-consuming, costly and impractical.

Robotics simulation drives the training and testing of AI-based robotic applications. With synthetic data, simulations are enabling virtual advances like never before. Simulations can help verify, validate and optimize robot designs, systems and their algorithms before operation. They can also be used to optimize facility designs for maximum efficiency before construction or remodeling starts, reducing costly manufacturing change orders.

Isaac Sim offers access to the latest robotics simulation tools and capabilities as well as cloud access, enabling teams to collaborate more effectively. Access to the Omniverse Replicator synthetic data generation engine in Isaac Sim allows machine learning engineers to build production-ready synthetic datasets for training robust deep learning perception models.

Customer Adoption of Isaac Sim on AWS

AWS early adopters tapping into the Isaac Sim platform include Amazon Robotics, Soft Robotics and Theory Studios.

Amazon Robotics has begun using Omniverse to build digital twins for automating, optimizing and planning its autonomous warehouses in virtual environments before deploying them into the real world.

Using Isaac Sim for sensor emulation, Amazon Robotics will accelerate development of its Proteus autonomous mobile robot, improving it to help the online retail giant efficiently manage fulfillment.

Learn more about Isaac Sim, powered by NVIDIA Omniverse.

NVIDIA Powers Training for Some of the Largest Amazon Titan Foundation Models

Everything about large language models is big — giant models train on massive datasets across thousands of NVIDIA GPUs.

That can pose big challenges for companies pursuing generative AI. NVIDIA NeMo, a framework for building, customizing and running LLMs, helps overcome them.

A team of experienced scientists and developers at Amazon Web Services creating Amazon Titan foundation models for Amazon Bedrock, a generative AI service for foundation models, has been using NVIDIA NeMo for the past several months.

“One key reason for us to work with NeMo is that it is extensible, comes with optimizations that allow us to run with high GPU utilization while also enabling us to scale to larger clusters so we can train and deliver models to our customers faster,” said Leonard Lausen, a senior applied scientist at AWS.

Think Big, Really Big

Parallelism techniques in NeMo enable efficient LLM training at scale. Coupled with the Elastic Fabric Adapter (EFA) from AWS, they allowed the team to spread its LLM across many GPUs to accelerate training.

EFA provides AWS customers with an UltraCluster Networking infrastructure that can directly connect more than 10,000 GPUs and bypass the operating system and CPU using NVIDIA GPUDirect.

The combination allowed the AWS scientists to deliver excellent model quality — something that’s not possible at scale when relying solely on data parallelism approaches.
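To make the contrast with data parallelism concrete, here is a minimal pure-Python sketch of the idea behind tensor (model) parallelism. It is a generic illustration, not NeMo's implementation, and the "devices" are simulated: a layer's weight matrix is split column-wise into shards, each shard is multiplied by the full input, and concatenating the partial outputs reproduces the full result. Under pure data parallelism, by contrast, every GPU holds the entire weight matrix, which stops scaling once a model no longer fits on a single device.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix multiply: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split a weight matrix column-wise into `parts` shards, one per simulated device."""
    cols = len(w[0])
    assert cols % parts == 0, "columns must divide evenly across devices"
    step = cols // parts
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(parts)]


def column_parallel_matmul(x: Matrix, w: Matrix, devices: int) -> Matrix:
    """Each simulated device multiplies the full input by its own weight shard;
    the partial outputs are then concatenated column-wise (an all-gather on real GPUs)."""
    shards = split_columns(w, devices)
    partials = [matmul(x, shard) for shard in shards]  # would run concurrently on real GPUs
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, -1.0], [0.5, 1.0, 0.0, 3.0]]
print(column_parallel_matmul(x, w, devices=2))  # [[2.0, 2.0, 2.0, 5.0], [5.0, 4.0, 6.0, 9.0]]
```

Each simulated device only ever stores a slice of `w`, which is what lets this style of parallelism spread a model too large for one GPU across many of them; the cost is the communication step that gathers the partial outputs, which is where a fast interconnect such as EFA matters.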

Framework Fits All Sizes

“The flexibility of NeMo,” Lausen said, “allowed AWS to tailor the training software for the specifics of the new Titan model, datasets and infrastructure.”

AWS’s innovations include efficient streaming from Amazon Simple Storage Service (Amazon S3) to the GPU cluster. “It was easy to incorporate these improvements because NeMo builds upon popular libraries like PyTorch Lightning that standardize LLM training pipeline components,” Lausen said.

AWS and NVIDIA aim to infuse products like NVIDIA NeMo and services like Amazon Titan with lessons learned from their collaboration for the benefit of customers.

3D Artist Nourhan Ismail Brings Isometric Innovation ‘In the NVIDIA Studio’ With Adobe After Effects and Blender

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. 

This week’s talented In the NVIDIA Studio artist, Nourhan Ismail, created a literal NVIDIA studio.

Her piece, called Creator by Day, Gamer by Night, was crafted with the isometric art style and impressive graphical fidelity Ismail’s known for, rich with vibrant colors and playful details. It also captures her “work hard, play hard” mentality as a 3D artist, interior designer and game level designer.

The same art style is featured in the NVIDIA Studio Sessions YouTube miniseries led by Ismail, which provides step-by-step tutorials on how to create a low-poly bedroom, from inception to final render.

Facial Animations Made Easier

Reallusion is the maker of Reallusion iClone, real-time 3D animation software built to produce professional animations for films and video games.

To expedite character animation workflows, the company recently launched its AccuFACE plug-in, which accurately captures facial expressions from webcams and conventional video files, without the need for expensive, specialized equipment.

The NVIDIA Maxine software development platform, the foundational technology behind the revolutionary NVIDIA Broadcast app, powers this incredible capability by weighing output and analyzing facial expressions and blendshapes to predict facial mesh animations.

From there, the AccuFACE plug-in converts this data into facial mesh assets for creators to apply seamlessly. It also fine-tunes lip and tongue articulation using proprietary AccuLIPS technology.

Download the plug-in today, available to creators with NVIDIA RTX GPUs.

Turning Pain Into Beauty

Ismail’s creative journey began at age four as a form of escape from the armed conflict occurring in Syria, her homeland. During that time, Ismail’s family faced many difficulties, including the loss of their home.

In the aftermath, she looked to her father, an accomplished artist and fashion designer, as a source of inspiration.

“His encouragement propelled me to showcase the pinnacle of my abilities, reminding me that art has the power to transform pain into beauty,” she said.

That encouragement has guided and fueled Ismail’s creative journey, eventually giving rise to her signature, single-room isometric style, an homage to the power of resilience and finding beauty in adversity.

“Starting with a single room, I delve into interior design, crafting spaces that reflect the comfort and joy I yearned for during challenging times,” she said. “To me, overcoming adversity proves that even from the harshest circumstances, beauty can emerge.”

Ismail started as a self-taught 3D artist, driven by a passion to learn the intricacies of creating digital masterpieces.

“Posting my works became a personal gauge of improvement — not for validation, but as a record of my learning curve,” she said.

Each of Ismail’s pieces is a testament to her evolving skills, dedication and love for sharing her craft, especially with her father.

In fact, she dedicated her first isometric house to her father. “That was the happiest moment, to create something inspiring and make someone happy,” she said.

Isometric Art

Ismail first collects reference material on Adobe Behance to gain inspiration on ways to mix different art styles.

She then opens Blender and starts sketching in 3D. Blender Cycles’ RTX-accelerated OptiX ray tracing, powered by her GeForce RTX 3080 Ti GPU, ensures smooth viewport movement.

While the models are still fairly rudimentary, Ismail calculates the angles from which light should enter the scene.

“Lighting is an emotional element,” she said. “The lighting of each piece evokes different emotions and a certain idiosyncratic introspectiveness, making the experience unique to each person.”

Her trick is to regularly switch between rich, colorful scenes and plain color models to measure the emotional weight and visual impact. She either creates the custom textures herself or downloads premade ones online when on a time crunch.

Next, she plays with camera angles to analyze depth, shadows and lighting, setting up animations and sequence shots in Blender, where Blender Cycles’ RTX-accelerated OptiX ray tracing delivers seamless viewport movement.

Final touch-ups are done in post-production in Adobe After Effects, where more than 30 GPU-accelerated effects sped up the process and allowed Ismail to complete the project with time to spare.

“There will always be hard times, so never give up and keep believing in yourself,” Ismail encourages content creators.

Check out Ismail’s Instagram for more spectacular isometric art.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 
