Scientists Building Digital Twins in NVIDIA Omniverse to Accelerate Clean Energy Research

As global climate change accelerates, finding and securing clean energy is a crucial challenge for many researchers, organizations and governments.

The U.K.’s Atomic Energy Authority (UKAEA), through an evaluation project at the University of Manchester, has been testing the NVIDIA Omniverse simulation platform to accelerate the design and development of a full-scale fusion powerplant that could put clean power on the grid in the coming years.

Over the past several decades, scientists have experimented with ways to create fusion energy, which produces zero carbon and low radioactivity. Such technology could provide virtually limitless clean, safe and affordable energy to meet the world’s growing demand.

Fusion is the process of releasing energy by combining atomic nuclei. But fusion energy has not yet been successfully scaled for production because of the high energy input required and the unpredictable behavior of the fusion reaction.

Fusion reactions power the sun, where massive gravitational pressure allows fusion to happen naturally at temperatures around 27 million degrees Fahrenheit. Earth doesn't have the same gravitational pressure as the sun, which means the temperatures needed to produce fusion must be much higher, above 180 million degrees Fahrenheit.

To replicate the power of the sun on Earth, researchers and engineers are using the latest advances in data science and extreme-scale computing to develop designs for fusion powerplants. With NVIDIA Omniverse, researchers could potentially build a fully functioning digital twin of a powerplant, helping ensure the most efficient designs are selected for construction.

Accelerating Design, Simulation and Real-Time Collaboration

Building a digital twin that accurately represents all powerplant components, the plasma, and the control and maintenance systems is a massive challenge — one that can benefit greatly from AI, exascale GPU computing, and physically accurate simulation software.

It starts with the design of a fusion powerplant, which requires a large number of parts and inputs from large teams of engineering, design and research experts throughout the process. “There are many different components, and we have to take into account lots of different areas of physics and engineering,” said Lee Margetts, UKAEA chair of digital engineering for nuclear fusion at the University of Manchester. “If we make a design change in one system, this has a knock-on effect on other systems.”

Experts from various domains are involved in the project. Each team member uses different computer-aided design applications or simulation tools, and an expert’s work in one domain depends on the data from others working in different domains.

The UKAEA team is exploring Omniverse to help them work together in a real-time simulation environment, so they can see the design of the whole machine rather than only individual subcomponents.

Omniverse has been critical in keeping all these moving parts in sync. By enabling all tools and applications to connect, Omniverse allows the engineers working on the powerplant design to simultaneously collaborate from a single source of truth.

“We can see three different engineers, from three different locations, working on three different components of a powerplant in three different packages,” said Muhammad Omer, a researcher on the project.

Omer explained that when experimenting in Omniverse, the team achieved photorealism in their powerplant designs using the platform’s core abilities to import full-fidelity 3D data. They could also visualize in real time with the RTX Renderer, which made it easy for them to compare different design options for components.

Simulating the fusion plasma is also a challenge. The team developed Python-based Omniverse Extensions with Omniverse Kit to connect to and ingest data from Geant4, an industry-standard Monte Carlo neutronics code. This allows them to simulate neutron transport in the powerplant core, the process that carries energy out of the powerplant.
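To give a flavor of what such an extension can look like, below is a minimal Kit extension sketch in Python that loads externally computed particle positions and places them on the USD stage. The extension class, the CSV file format and the load_neutron_tracks() helper are hypothetical illustrations rather than UKAEA's actual code; only the omni.ext, omni.usd and USD calls are standard Kit/USD APIs, and the script only runs inside an Omniverse Kit application.

```python
# Hypothetical sketch of a Kit extension that ingests Geant4-style track data.
import csv

import omni.ext
import omni.usd
from pxr import Gf, UsdGeom


def load_neutron_tracks(path):
    # Placeholder parser: expects one "x,y,z" position per row.
    with open(path) as f:
        for row in csv.reader(f):
            yield tuple(float(v) for v in row[:3])


class NeutronicsIngestExtension(omni.ext.IExt):
    def on_startup(self, ext_id):
        # Grab the USD stage currently open in the Kit application.
        stage = omni.usd.get_context().get_stage()
        # Place one small sphere per recorded particle position.
        for i, (x, y, z) in enumerate(load_neutron_tracks("geant4_tracks.csv")):
            sphere = UsdGeom.Sphere.Define(stage, f"/World/Neutrons/track_{i:05d}")
            sphere.AddTranslateOp().Set(Gf.Vec3d(x, y, z))

    def on_shutdown(self):
        pass
```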

They also built Omniverse Extensions to view output from the JOREK plasma simulation code, which simulates visible light emissions, giving the researchers insight into the plasma's state. The scientists will also begin exploring the NVIDIA Modulus AI-physics framework, using their existing simulation data to develop AI surrogate models that accelerate the fusion plasma simulations.

A Geant4 Monte Carlo neutronics simulation in Omniverse.

Using AI to Optimize Designs and Enhance Digital Twins

In addition to helping design, operate and control the powerplant, Omniverse can assist in training future AI-driven or AI-augmented robotic control and maintenance systems. These will be essential for maintaining the plant in its high-radiation environment.

Using Omniverse Replicator, a software development kit for building custom synthetic data-generation tools and datasets, researchers can generate large quantities of physically accurate synthetic data of the powerplant and plasma behavior to train robotic systems. By learning in simulation, the robots can correctly handle tasks more accurately in the real world, improving predictive maintenance and reducing downtime.
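As an illustration of that workflow, the snippet below sketches a Replicator script that randomizes the pose of a stand-in asset and writes annotated images to disk. The scene content, output settings and asset names are illustrative assumptions rather than UKAEA's actual pipeline; the calls follow the publicly documented omni.replicator.core API and run inside an Omniverse application.

```python
# Hypothetical Replicator sketch: randomize a stand-in part and write labeled frames.
import omni.replicator.core as rep

with rep.new_layer():
    # A primitive stands in for a powerplant component; a real setup would
    # reference the plant's USD assets instead.
    part = rep.create.cube(semantics=[("class", "component")])
    camera = rep.create.camera(position=(0, 0, 500))
    render_product = rep.create.render_product(camera, (1024, 1024))

    # Randomize the part's pose on every generated frame to diversify the dataset.
    with rep.trigger.on_frame(num_frames=1000):
        with part:
            rep.modify.pose(
                position=rep.distribution.uniform((-100, -100, -100), (100, 100, 100)),
                rotation=rep.distribution.uniform((0, 0, 0), (360, 360, 360)),
            )

    # Write RGB images plus ground-truth annotations for training.
    writer = rep.WriterRegistry.get("BasicWriter")
    writer.initialize(output_dir="_synthetic_out", rgb=True, bounding_box_2d_tight=True)
    writer.attach([render_product])
```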

In the future, sensor models could livestream observation data to the Omniverse digital twin, constantly keeping the virtual twin synchronized to the powerplant’s physical state. Researchers will be able to explore various hypothetical scenarios by first testing in the virtual twin before deploying changes to the physical powerplant.

Overall, Margetts and the team at UKAEA saw many unique opportunities and benefits in using Omniverse to build digital twins for fusion powerplants. Omniverse provides the possibility of a real-time platform that researchers can use to develop first-of-a-kind powerplant technology. The platform also allows engineers to seamlessly work together on powerplant designs. And teams can access integrated AI tools that will enable users to optimize future power plants.

“We’re delighted about what we’ve seen. We believe it’s got great potential as a platform for digital engineering,” said Margetts.

Watch the demo and learn more about NVIDIA Omniverse.

Featured image courtesy of Brigantium Engineering and Bentley Systems.



HPC Researchers Seed the Future of In-Network Computing With NVIDIA BlueField DPUs

Across Europe and the U.S., HPC developers are supercharging supercomputers with the power of Arm cores and accelerators inside NVIDIA BlueField-2 DPUs.

At Los Alamos National Laboratory (LANL), that work is one part of a broad, multiyear collaboration with NVIDIA that targets 30x speedups in computational multi-physics applications.

LANL researchers foresee significant performance gains using data processing units (DPUs) running on NVIDIA Quantum InfiniBand networks. They will pioneer techniques in computational storage, pattern matching and more using BlueField and its NVIDIA DOCA software framework.

An Open API for DPUs

The efforts also will help further define OpenSNAPI, an application interface anyone can use to harness DPUs. It's a project of the Unified Communication Framework, a consortium enabling heterogeneous computing for HPC apps, whose members include Arm, IBM, NVIDIA, U.S. national labs and U.S. universities.

LANL is already feeling the power of in-network computing, thanks to a DPU-powered storage system it created.

The Accelerated Box of Flash (ABoF, pictured below) combines solid-state storage with DPU and InfiniBand accelerators to speed up performance-critical parts of a Linux file system. It’s up to 30x faster than similar storage systems and set to become a key component in LANL’s infrastructure.

ABoF places computation near storage to minimize data movement and improve the efficiency of both simulation and data-analysis pipelines, a researcher said in a recent LANL blog.

Texas Rides a Cloud-Native Super

The Texas Advanced Computing Center (TACC) is the latest to adopt BlueField-2 in Dell PowerEdge servers. It will use the DPUs on an InfiniBand network to make its Lonestar6 system a development platform for cloud-native supercomputing.

TACC’s Lonestar6 serves a wide swath of HPC developers at Texas A&M University, Texas Tech University and the University of North Texas, as well as a number of research centers and faculty.

MPI Gets Accelerated

Twelve hundred miles to the northeast, researchers at Ohio State University showed how DPUs can make one of HPC’s most popular programming models run up to 26 percent faster.

By offloading critical parts of the message passing interface (MPI), they accelerated P3DFFT, a library used in many large-scale HPC simulations.
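The gain comes from progressing MPI's non-blocking operations on the DPU so the host CPU can keep computing. The generic mpi4py sketch below shows only the communication/computation overlap pattern that such offloading accelerates; it does not use MVAPICH's DPU offload interfaces, and the buffer sizes and FFT workload are illustrative.

```python
# Overlap pattern accelerated by MPI offload: post a non-blocking all-to-all,
# do independent work, then complete the exchange. Run with: mpirun -n 4 python overlap.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
n = comm.Get_size()

sendbuf = np.random.rand(n, 1024)      # one block per destination rank
recvbuf = np.empty_like(sendbuf)

req = comm.Ialltoall(sendbuf, recvbuf)  # post the non-blocking collective
local = np.fft.fft(np.random.rand(4096))  # independent compute overlaps the exchange
req.Wait()                              # complete the communication
```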

“DPUs are like assistants that handle work for busy executives, and they will go mainstream because they can make all workloads run faster,” said Dhabaleswar K. (DK) Panda, a professor of computer science and engineering at Ohio State who led the DPU work using his team’s MVAPICH open source software.

DPUs in HPC Centers, Clouds

Double-digit boosts are huge for supercomputers running HPC simulations like drug discovery or aircraft design. And cloud services can use such gains to increase their customers’ productivity, said Panda, who’s had requests from multiple HPC centers for his code.

Quantum InfiniBand networks with features like NVIDIA SHARP help make his work possible.

“Others are talking about in-network computing, but InfiniBand supports it today,” he said.

Durham Does Load Balancing

Multiple research teams in Europe are accelerating MPI and other HPC workloads with BlueField DPUs.

For example, Durham University, in northern England, is developing software for load balancing MPI jobs using BlueField DPUs on a 16-node Dell PowerEdge cluster. Its work will pave the way for more efficient processing of better algorithms for HPC facilities around the world, said Tobias Weinzierl, principal investigator for the project.

DPUs in Cambridge, Munich

Researchers in Cambridge, London and Munich are also using DPUs.

For its part, University College London is exploring how to schedule tasks for a host system on BlueField-2 DPUs. It’s a capability that could be used, for example, to move data between host processors so it’s there when they need it.

BlueField DPUs inside Dell PowerEdge servers in the Cambridge Service for Data Driven Discovery offload security policies, storage frameworks and other jobs from host CPUs, maximizing the system’s performance.

Meanwhile, researchers in the computer architecture and parallel systems group at the Technical University of Munich are seeking ways to offload both MPI and operating system tasks with DPUs as part of a EuroHPC project.

Back in the U.S., researchers at Georgia Tech are collaborating with Sandia National Laboratories to speed work in molecular dynamics using BlueField-2 DPUs. A paper describing their work so far shows algorithms can be accelerated by up to 20 percent with no loss in the accuracy of simulations.

An Expanding Network

Earlier this month, researchers in Japan announced a system using the latest NVIDIA H100 Tensor Core GPUs riding our fastest and smartest network ever, the NVIDIA Quantum-2 InfiniBand platform.

NEC will build the approximately 6 PFLOPS, H100-based supercomputer for the Center for Computational Sciences at the University of Tsukuba. Researchers will use it for climatology, astrophysics, big data, AI and more.

Meanwhile, researchers like Panda are already thinking about how they’ll use the cores in BlueField-3 DPUs.

“It will be like hiring executive assistants with college degrees instead of ones with high school diplomas, so I’m hopeful more and more offloading will get done,” he quipped.



Hyperscale Digital Twins to Give Us “Amazing Superpowers,” NVIDIA Exec Says at ISC 2022

Highly accurate digital representations of physical objects or systems, or “digital twins,” will enable the next era of industrial virtualization and AI, executives from NVIDIA and BMW said Tuesday.

Kicking off the ISC 2022 conference in Hamburg, Germany, NVIDIA’s Rev Lebaredian, vice president for Omniverse and simulation technology, was joined by Michele Melchiorre, senior vice president for product system, technical planning, and tool shop at BMW Group.

“If you can construct a virtual world that matches the real world in its complexity, in its scale and in its precision, then there are a great many things you can do with this,” Lebaredian said.

While Lebaredian outlined the broad trends and technological advancements driving the evolution of digital twin simulations, Melchiorre offered a detailed look at how BMW has put digital twins to work in its own factories.

Melchiorre explained BMW’s plans to use digital twins as a tool to become more “lean, green and digital,” describing real-time collaboration with digital twins and opportunities for training AIs as a “revolution in factory planning.”

The BMW iFACTORY initiative described by Melchiorre, which harnesses real-time data, simulation and machine learning, is an example of how swiftly digital twins have become workhorses for industrial companies such as Amazon Robotics, BMW and others.

These systems will link our representations of the world with data streaming in from the physical world in real time, Lebaredian explained.

“What we’re trying to introduce now is a mechanism by which we can link the two together, where we can detect all the changes in the physical version, and reflect them in the digital world,” Lebaredian said. “If we can establish that link we gain some amazing superpowers.”

Supercomputing Is Transforming Every Field of Discovery

Digital twins are another powerful example of how technologies from the supercomputing industry, particularly its focus on simulation and data-center-scale GPU computing, are spilling over into the broader world.

At the same time, converging technologies have transformed high-performance computing, Lebaredian said. GPU-accelerated systems have become a mainstay not just in scientific computing, but edge computing, data centers and cloud systems.

NVIDIA’s Rev Lebaredian, vice president for Omniverse and simulation technology, speaking at ISC 2022.

And AI-accelerated GPU computing has also become a cornerstone of modern high-performance computing. That’s positioned supercomputing to realize the original intent of computer graphics: simulation.

Computers, algorithms and AI have all matured enough that we can begin simulating worlds that are complex enough to be useful on an industrial scale, even using these simulations as training grounds for AI.

World Simulation at an Inflection Point

With digital twins, a new class of simulation is possible, Lebaredian said.

These require precision timing — the ability to simulate multiple autonomous systems at the same time.

They require physically accurate simulation.

And they require accurate ingestion of information from the “real twin,” and continuous synchronization.

These digital twin simulations will give us “superpowers.”

The first superpower Lebaredian dug into was teleportation. “Just like in a multiplayer video game, any human anywhere on Earth can teleport into that virtual world,” Lebaredian said.

The next: time travel.

“If you record the state of the world over time, you can recall it at any point; this allows time travel,” Lebaredian said.

“You can not only now teleport to that world, but you can scrub your timeline and go backwards to any point in time, and explore that space at any point in time,” he added.

And, finally, these simulations, if accurate enough, will let us understand what’s next.

“If you have a simulator that is extremely accurate and actually predictive of what will happen in the future, if you understand the laws of physics well enough, you essentially get time travel to the future,” Lebaredian said.

“You can compute not just one possible future, but many possible futures,” he added, outlining how this could let city planners see what could happen as they modify a city, plan the road and change the traffic systems to find “the best possible future.”

Modern supercomputing is unlocking these digital twins, which are extremely compute-intensive and require precision timing networking with extremely low latency.

“We need a new kind of supercomputer, one that can really accelerate artificial intelligence and run these massive simulations in true real-time,” Lebaredian said.

That will require GPU-accelerated systems that are optimized at every layer of the system to enable precision timing.

These systems will need to run not just in the data center, but also reach the edge of the network to bring data into virtual simulations with precision timing.

Such systems will be key to advances on scales both small, such as drug discovery, and large, such as climate simulation.

“We need to simulate our climate, we need to look really far out, we need to do so at a precision that’s never been done before, and we need to be able to trust our simulations are actually predictive and accurate. If we do that, we have some hope we can deal with this climate change situation,” Lebaredian said.

BMW’s iFACTORY: “Lean, Green and Digital”

BMW’s Melchiorre provided an example of how this broad vision is being put to work today at BMW, as the automaker seeks to become “lean, green and digital.”

Michele Melchiorre, senior vice president for product system, technical planning, and tool shop at BMW Group.

BMW has built exceptionally complex digital twins, simulating its factories with humans and robots interacting in the same space, at the same time.

It’s an effort that stretches from the factory floor to the company’s data center, to its entire supply chain. This digital twin involves millions of moving parts and pieces that are connected to an enormous supply chain.

Melchiorre walked his audience through a number of examples of how digital twins model various pieces of the plant, showing how industrial machinery, robots and people will move together.

Inside the digital twin of BMW’s assembly system, powered by Omniverse, an entire factory in simulation.

And he explained how they are leveraging NVIDIA technology to simulate entire factories before they’re even built.

Melchiorre showed an aerial image of the site where BMW is building a new factory in Hungary. While the real-world factory is still mostly open field, the digital factory is 80% complete.

“This will be the first plant where we will have a complete digital twin much before production starts,” Melchiorre said.

In the future, the iFACTORY will be realized across all of BMW’s plants, Melchiorre explained, from the company’s 100-year-old home plant in Munich to its forthcoming plant in Debrecen, Hungary.

“This is our production network, not just one factory. Each and every plant will go in this direction, every plant will develop into a BMW iFACTORY. This is our master plan for our future,” Melchiorre said.



Deep Attentive Variational Inference

Figure 1: Overview of a local variational layer (left) and an attentive variational layer (right) proposed in this post.

Generative models are a class of machine learning models that are able to generate novel data samples such as fictional celebrity faces, digital artwork, or scenery images. Currently, the most powerful generative models are deep probabilistic models. This class of models uses deep neural networks to express statistical hypotheses about the way in which the data have been generated. Latent variable models augment the set of the observed data with latent (unobserved) information in order to better characterize the procedure that generates the data of interest.

In spite of the successful results, deep generative modeling remains one of the most complex and expensive tasks in AI. Recent models rely on increased architectural depth to improve performance. However, as we show in our paper [1], the predictive gains diminish as depth increases. Keeping a Green-AI perspective in mind when designing such models could lead to their wider adoption in describing large-scale, complex phenomena.

A quick review of Deep Variational AutoEncoders

Latent variable models augment the set of the observed variables with auxiliary latent variables. They are characterized by a posterior distribution over the latent variables, one which is generally intractable and typically approximated by closed-form alternatives. Moreover, they provide an explicit parametric characterization of the joint distribution over the expanded random variable space. The generative and the inference portions of such a model are jointly trained. The Variational AutoEncoder (VAE) belongs to this model category. Figure 2 provides an overview of a VAE.

Figure 2: A Variational AutoEncoder consists of a generative model and an inference model. The generative model, or decoder, is defined by a joint distribution of latent and observed variables. The inference model, or encoder, approximates the true posterior of the latent variables given the observations. The two parts are jointly trained.

VAEs are trained by maximizing the Evidence Lower BOund (ELBO), which is a tractable lower bound of the marginal log-likelihood:

\[ \log p(x) \ge \mathbb{E}_{q(z\mid x)}\left[\log p(x\mid z)\right] - D_{KL}\left(q(z\mid x) \,\|\, p(z)\right). \]
Figure 3: Overview of a hierarchical VAE.
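To make the objective concrete, the following is a minimal PyTorch sketch of the negative ELBO for a single-layer VAE with a diagonal Gaussian posterior, a standard normal prior and a Bernoulli likelihood (a generic example, not the hierarchical model described below):

```python
import torch
import torch.nn.functional as F

def negative_elbo(x, x_logits, mu, logvar):
    """Negative ELBO for a single-layer VAE with q(z|x) = N(mu, diag(exp(logvar))),
    p(z) = N(0, I), and a Bernoulli likelihood p(x|z). A generic sketch."""
    # Reconstruction term: -E_q[log p(x|z)], estimated with one sample of z.
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    # KL(q(z|x) || p(z)) in closed form for a diagonal Gaussian vs. N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # minimizing this maximizes the ELBO
```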

The most powerful VAEs introduce large latent spaces \(z\) that are organized in blocks such that \(z = \{z_1, z_2, \dots, z_L\}\), with each block being generated by a layer in a hierarchy. Figure 3 illustrates a typical architecture of a hierarchical VAE. Most state-of-the-art VAEs correspond to a fully connected probabilistic graphical model. More formally, the prior distribution follows the factorization:

\[ p(z) = p(z_1) \prod_{l=2}^L p(z_l \mid z_{<l}). \tag{1} \]

In words, \(z_l\) depends on all previous latent factors \(z_{<l}\). Similarly, the posterior distribution is given by:

\[ q(z \mid x) = q(z_1 \mid x) \prod_{l=2}^L q(z_l \mid x, z_{<l}). \tag{2} \]

The long-range conditional dependencies are implicitly enforced via deterministic features that are mixed with the latent variables and are propagated through the hierarchy. Concretely, each layer \(l\) is responsible for providing the next layer with a latent sample \(z_l\) along with context information \(c_l\):

\[ c_l \leftarrow T_l \left( z_{l-1} \oplus c_{l-1} \right). \tag{3} \]

In a convolutional VAE, \(T_l\) is a non-linear transformation implemented by ResNet blocks, as shown in Figure 1. The operator \(\oplus\) combines two branches in the network. Due to its recursive definition, \(c_l\) is a function of \(z_{<l}\).
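The sketch below shows how Equation (3) and the conditional prior of Equation (1) might be realized in one top-down decoder layer. It is schematic: plain convolutions stand in for the ResNet blocks, and channel concatenation stands in for the \(\oplus\) operator.

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """One top-down layer: combines the previous latent sample with the running
    context (Eq. 3) and emits the prior parameters for z_l. A schematic sketch."""
    def __init__(self, channels, z_channels):
        super().__init__()
        self.transform = nn.Sequential(            # stands in for T_l
            nn.Conv2d(channels + z_channels, channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.prior_head = nn.Conv2d(channels, 2 * z_channels, 1)

    def forward(self, z_prev, c_prev):
        # The "⊕" operator is realized here as channel concatenation of z_{l-1} and c_{l-1}.
        c = self.transform(torch.cat([z_prev, c_prev], dim=1))
        mu, logvar = self.prior_head(c).chunk(2, dim=1)   # parameters of p(z_l | z_{<l})
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z, c
```

Stacking several such layers and feeding each layer's (z, c) into the next realizes the prior factorization of Equation (1).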

Deep Variational AutoEncoders are “overthinking”

Recent models such as NVAE [2] rely on increased depth to improve performance and deliver results comparable to those of purely autoregressive generative models, while permitting fast sampling that requires only a single network evaluation. However, as we show in our paper and Table 1, the predictive gains diminish as depth increases. Beyond a certain point, even doubling the number of layers yields only a slight increase in the marginal likelihood.

Depth (L)    bits/dim (↓)    Δ (%)
2            3.5
4            3.26            -6.8
8            3.06            -6.1
16           2.96            -3.2
30           2.91            -1.7

Table 1: Deep VAEs suffer from diminishing returns: \(-\log p(x)\) in bits per dimension and the relative decrease for a varying number of variational layers \(L\).

We argue that this may be because the effect of the latent variables of earlier layers diminishes as the context feature \(c_l\) traverses the hierarchy and is updated with latent information from subsequent layers. In turn, this means that in practice the network may no longer respect the factorization of the variational distributions of Equations (1) and (2), leading to sub-optimal performance. Formally, large portions of early blocks \(z_l\) collapse to their prior counterparts, and therefore, they no longer contribute to inference.

This phenomenon can be attributed to the local connectivity of the layers in the hierarchy, as shown in Figure 4.a. In fact, a layer is directly connected only with the adjacent layers in a deep VAE, limiting long-range conditional dependencies between \(z_l\) and \(z_{\ll l}\) as depth increases.

The flexibility of the prior \(p(z)\) and the posterior \(q(z \mid x)\) can be improved by designing more informative representations for the conditioning factors of the conditional distributions \(p(z_l \mid z_{<l})\) and \(q(z_l \mid x, z_{<l})\). This can be accomplished by designing a hierarchy of densely connected stochastic layers that dynamically learn to attend to latent and observed information most critical to inference. A high-level description of this idea is illustrated in Figure 4.b.

Figure 4: (a) Locally Connected Variational Layer. (b) Strongly Connected Variational Layer.

In the following sections, we describe the technical tool that allows our model to realize the strong couplings presented in Figure 4.b.

Problem: Handling long sequences of large 3D tensors

In deep convolutional architectures, we usually need to handle long sequences of large 3D context tensors. A typical sequence is shown in Figure 5. Effectively constructing strong couplings between the current and previous layers in a deep architecture can be formulated as follows:

Figure 5: Sequence of 3D tensors in a convolutional architecture.

Problem definition: Given a sequence \(c_{<l}=\{c_m\}_{m=1}^{l-1}\) of \(l-1\) contexts \(c_m\) with \(c_m \in \mathbb{R}^{H \times W \times C}\), we need to construct a single context \(\hat{c}_l \in \mathbb{R}^{H \times W \times C}\) that summarizes the information in \(c_{<l}\) that is most critical to the task.

In our framework, the task of interest is the construction of posterior and prior beliefs. Equivalently, the contexts \(\hat{c}^q_l\) and \(\hat{c}^p_l\) represent the conditioning factors of the posterior and prior distributions of layer \(l\).

There are two ways to view a long sequence of \(l-1\) large \(H \times W \times C\)-dimensional contexts:

  • Inter-Layer couplings: As \(H \times W\) independent pixel sequences of \(C\)-dimensional features of length \(l-1\). One such sequence is highlighted in Figure 5.
  • Intra-Layer couplings: As \(l-1\) independent pixel sequences of \(C\)-dimensional features of length \(H \times W\).

This observation leads to a factorized attention scheme that identifies important long-range, inter-layer and intra-layer dependencies separately. Such a decomposition of large and long pixel sequences requires significantly less compute.

Inter-Layer couplings: Depth-wise Attention

The network relies on a depth-wise attention scheme to discover inter-layer dependencies. The task is characterized by a query feature \(s\). During this phase, the pixel sequences correspond to instances of a pixel at the previous layers in the architecture. They are processed concurrently and independently from the rest. The contexts are represented by key features \(k\) of a lower dimension. The final context is computed as a weighted sum of the contexts according to an attention distribution. The mechanism is explained in Figure 6.

Figure 6: Explanation of depth-wise attention in convolutional architectures.

The layers in the variational hierarchy are augmented with two depth-wise attention blocks for constructing the contexts of the prior and posterior distributions. Figure 1 displays the computational block of an attentive variational layer. As shown in Figure 6, each layer also needs to emit attention-relevant features: the keys \(k_l\) and queries \(s_l\), along with the contexts \(c_l\). Equation (3) is revised for the attention-driven path in the decoder such that the context, its key, and the query are jointly learned:

\[ [c_l, s_l, k_l] \leftarrow T_l \left( z_{l-1} \oplus c_{l-1} \right). \tag{4} \]
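A minimal sketch of the depth-wise attention step follows, assuming scaled dot-product attention applied per pixel across the depth axis (the paper's normalization schemes are omitted):

```python
import math
import torch

def depthwise_attention(query, keys, contexts):
    """Per-pixel attention over previous layers (the depth axis).
    query:    [B, Dk, H, W]       query s_l emitted by layer l
    keys:     [B, L, Dk, H, W]    keys k_m from layers m < l
    contexts: [B, L, C,  H, W]    contexts c_m from layers m < l
    Returns the fused context c_hat_l of shape [B, C, H, W]. A schematic sketch."""
    d_k = query.shape[1]
    # Dot product between the query and each previous layer's key at every pixel.
    scores = (query.unsqueeze(1) * keys).sum(dim=2) / math.sqrt(d_k)   # [B, L, H, W]
    weights = torch.softmax(scores, dim=1)                             # softmax over depth
    # Weighted sum of the per-layer contexts at every pixel.
    return (weights.unsqueeze(2) * contexts).sum(dim=1)                # [B, C, H, W]
```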

A formal description, along with normalization schemes, is provided in our paper.

Intra-Layer couplings: Non-local blocks

Intra-layer dependencies are captured by interleaving non-local blocks [3] with the convolutions in the ResNet blocks of the architecture, as also shown in Figure 1.
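For reference, here is a compact sketch of a non-local block in the style of Wang et al. [3], in which every pixel attends to every other pixel of the same feature map (not necessarily the exact block used in the paper):

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Self-attention over spatial positions of a single feature map, with a
    residual connection. A compact sketch of a non-local block [3]."""
    def __init__(self, channels, reduction=2):
        super().__init__()
        inner = channels // reduction
        self.theta = nn.Conv2d(channels, inner, 1)   # query projection
        self.phi = nn.Conv2d(channels, inner, 1)     # key projection
        self.g = nn.Conv2d(channels, inner, 1)       # value projection
        self.out = nn.Conv2d(inner, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # [B, HW, inner]
        k = self.phi(x).flatten(2)                     # [B, inner, HW]
        v = self.g(x).flatten(2).transpose(1, 2)       # [B, HW, inner]
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)  # [B, HW, HW]
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection
```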

Experiments

We evaluate Attentive VAEs on several public benchmark datasets of both binary and natural images. In Table 2, we show the performance and training time of state-of-the-art deep VAEs on CIFAR-10, a dataset of 32×32 natural images. Attentive VAEs achieve state-of-the-art likelihoods compared to other deep VAEs. More importantly, they do so with significantly fewer layers, which means shorter training and sampling times.

Model                            Layers    Training Time (GPU hours)    -log p(x) (bits/dim)
Attentive VAE, 400 epochs [1]    16        272                          2.82
Attentive VAE, 500 epochs [1]    16        336                          2.81
Attentive VAE, 900 epochs [1]    16        608                          2.79
NVAE [2]                         30        440                          2.91
Very Deep VAE [4]                45        288                          2.87

Table 2: Comparison of the performance and computational requirements of deep state-of-the-art VAE models.

In Figures 8 and 9, we show reconstructed and novel images generated by an attentive VAE.

Figure 8: Original & Reconstructed CIFAR-10 images.
Figure 9: Uncurated fantasy CIFAR-10 images.

The reason behind this improvement is that the attention-driven, long-range connections between layers lead to better utilization of the latent space. In Figure 7, we visualize the KL divergence per layer during training. As we see in (b), the KL penalty is evenly distributed among the layers of an attentive VAE. In contrast, as shown in (a), the upper layers in a local, deep VAE are significantly less active. This confirms our hypothesis that the fully connected factorizations of Equations (1) and (2) may not be supported by local models. An attentive VAE, on the other hand, dynamically prioritizes the statistical dependencies between latent variables that are most critical to inference.

Figure 7: KL visualization in (a) a local and (b) an attentive VAE.
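A per-layer KL diagnostic like the one plotted in Figure 7 can be computed in closed form when posteriors and priors are diagonal Gaussians; the following is a small illustrative sketch:

```python
import torch

def kl_per_layer(posteriors, priors):
    """Closed-form KL(q || p) per variational layer for diagonal Gaussians,
    summed over latent dimensions and averaged over the batch. Each list entry
    is a (mu, logvar) pair for one layer; a small diagnostic sketch."""
    kls = []
    for (mu_q, logvar_q), (mu_p, logvar_p) in zip(posteriors, priors):
        kl = 0.5 * (
            logvar_p - logvar_q
            + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
            - 1.0
        )
        kls.append(kl.flatten(1).sum(dim=1).mean())
    return kls
```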

Finally, attention-guided VAEs close the performance gap between variational models and expensive autoregressive models. Comprehensive comparisons and both quantitative and qualitative results are provided in our paper.

Conclusion

The expressivity of current deep probabilistic models can be improved by selectively prioritizing statistical dependencies between latent variables that are potentially distant from each other. Attention mechanisms can be leveraged to build more expressive variational distributions in deep probabilistic models by explicitly modeling both nearby and distant interactions in the latent space. Attentive inference reduces the computational footprint by alleviating the need for deep hierarchies.

Acknowledgments

A special word of thanks is due to Christos Louizos for helpful pointers to prior works on VAEs, Katerina Fragkiadaki for helpful discussions on generative models and attention mechanisms for computer vision tasks, Andrej Risteski for insightful conversations on approximate inference, and Jeremy Cohen for his remarks on a late draft of this work. Moreover, we are very grateful to Radium Cloud for granting us access to computing infrastructure that enabled us to scale up our experiments. We also thank the International Society for Bayesian Analysis (ISBA) for the travel grant and the invitation to present our work as a contributed talk at the 2022 ISBA World Meeting. This material is based upon work supported by the Defense Advanced Research Projects Agency under award number FA8750-17-2-0130, and by the National Science Foundation under grant number 2038612. Moreover, the first author acknowledges support from the Alexander Onassis Foundation and from A. G. Leventis Foundation. The second author is supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE1745016 and DGE2140739.

DISCLAIMER: All opinions expressed in this post are those of the author and do not represent the views of CMU.

References

[1] Apostolopoulou I, Char I, Rosenfeld E, Dubrawski A. Deep Attentive Variational Inference. In International Conference on Learning Representations, 2022.

[2] Vahdat A, Kautz J. NVAE: A deep hierarchical variational autoencoder. Advances in Neural Information Processing Systems. 2020;33:19667-79.

[3] Wang X, Girshick R, Gupta A, He K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018 (pp. 7794-7803).

[4] Child R. Very deep VAEs generalize autoregressive models and can outperform them on images. arXiv preprint arXiv:2011.10650, 2020.
