3D Artist and Educator Hsin-Chien Huang Takes VR to the World Stage This Week ‘In the NVIDIA Studio’

3D Artist and Educator Hsin-Chien Huang Takes VR to the World Stage This Week ‘In the NVIDIA Studio’

3D artist, virtual reality expert, storyteller and educator Hsin-Chien Huang shares his unique creator journey and award-winning artwork Samsara this week In the NVIDIA Studio.

A Journey Unlike Any Other

Huang is a distinguished professor in the Department of Design at National Taiwan Normal University.

His creative journey included overcoming a number of obstacles, starting at age 4, when he lost sight in his right eye. His eyesight was impaired for over a decade before he regained it thanks to a Sri Lankan cornea donor.

This singular event proved inspirational, cementing virtual reality as his primary creative field, as it allows him to share with others the world as he uniquely sees it.

When he was getting his driver’s license, Huang registered himself as an organ donor, imagining that the cornea in his right eye will continue its journey to others who may receive it after his death.

Deep in the journey of ‘Samsara.’

In Samsara, one’s consciousness can travel within different individuals and animals.

Color, materials, sound and music are critical story elements that drive the narratives of his artwork, Huang said.

How did we get here?

“Discussing with musicians about adding sound to works always brings me new ideas to revise stories,” he said. “Elements may influence one another, and the process is like an upward spiral where each element develops and fosters each other simultaneously, slowly shaping the story.”

Cool, But How?

Working in VR can often be a nonlinear experience. Huang spends considerable time prototyping and iterating ideas to ensure that they’re feasible and can be shared.

He and his team will program and create multiple 3D animations and interactions. This helps them to examine if their works can convey the exact concept, invoking the body of emotions they hoped for.

Parametric modeling allows for faster, wide-scale edits.

The team makes use of various parametric modeling tools in Autodesk Maya, Houdini, iClone and Unity. The key in setting up 3D geometric objects is that shapes can be changed once parameters such as dimensions or curvatures are modified — removing the need to reshape the model from scratch.

This saves artists lots of time — especially in the conceptual stage — and is critical to the team’s workflow, Huang said.

“We use Unity for integration and interaction, and Xsens and Vicon for motion capture,” he said. Unity’s light baking and Autodesk Maya’s Arnold renderer both require powerful GPUs, and his GeForce RTX 3070 GPU was equal to the task.

The team’s photogrammetry software in RealityCapture also benefits greatly from NVIDIA CUDA acceleration.

Textures applied in Unity.

“Nowadays, a powerful GeForce RTX GPU is an indispensable tool for digital artists.” — Hsin-Chien Huang 

“Although the resolutions of these scanned models are low, it has the aesthetic of pixel art,” Huang said. He processed these models in Unity to give them a unique digital style. NVIDIA DLSS technology powered by his GeForce RTX GPU increases the interactivity of the viewport by using AI to upscale frames rendered at lower resolution while still retaining high-fidelity detail.

When it comes to creating textures, Huang recommends Adobe Substance 3D Painter, which can rapidly create quality, realistic textures for prototyping. RTX-accelerated light and ambient occlusion baking optimize his assets in mere seconds.

Photorealistic details made even more realistic with Topaz Labs Gigapixel AI.

Huang also uses Topaz Labs Gigapixel AI, which uses deep learning to offer better photo quality. Yet again his RTX GPU acccelerates AI for the sharpening of images while retaining high-fidelity details.

Huang is grateful for advancements in technology and their impact on creative possibilities.

“Nowadays, a powerful GeForce RTX GPU is an indispensable tool for digital artists,” he said.

Huang’s increasing popularity and extraordinary talent led him to Hollywood. In 2018, Huang performed a VR demo on hit TV show America’s Got Talent, which left an enormous impression on the judges and audience.

It was the first real-time motion capture and VR experience to be presented on a live stage. Huang said the pressure was intense during his performance as it was a live show and no mistakes could be tolerated.

“I could still sense the thrill and excitement on stage,” he recalled.

VR expert, storyteller and educator Hsin-Chien Huang.

Check out more of Huang’s artwork on his website.

Carry on, Carry on #WinterArtChallenge

Enter NVIDIA Studio’s #WinterArtChallenge, running through the end of the year, by sharing winter-themed art on Instagram, Twitter or Facebook for a chance to be featured on our social media channels.

Like @RippaSats and his fun celebration of penguins.

Be sure to tag #WinterArtChallenge to join.

Get creativity-inspiring updates directly to your inbox by subscribing to the NVIDIA Studio newsletter.

The post 3D Artist and Educator Hsin-Chien Huang Takes VR to the World Stage This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.

Read More

AWS Unveils New AI Service Features and Enhancements at re:Invent 2022

AWS Unveils New AI Service Features and Enhancements at re:Invent 2022

Over the last 5 years, artificial intelligence (AI) and machine learning (ML) have evolved from a niche activity to a rapidly growing mainstream endeavor. Today, more than 100,000 customers across numerous industries rely on AWS for ML and AI initiatives that infuse AI into a broad range of business use cases to automate repetitive and mundane tasks—from intelligent demand planning to document processing and content moderation. AWS AI services help customers create smoother, faster, and more efficient engagements with customers, driving greater efficiencies and lowering operational costs.

At AWS re:Invent, Amazon Web Services, Inc. has announced a series of features and enhancements across its portfolio of AI services, including purpose-built solutions to solve industry-specific challenges, representing a deeper integration of AI into everyday experiences. The new capabilities include Amazon Textract Analyze Lending to improve loan-document processing efficiency, Amazon Transcribe Call Analytics to analyze in-progress contact center calls, Amazon Kendra support for tabular search in HTML and seven new languages, Amazon HealthLake Imaging for medical image storing; Amazon HealthLake Analytics with multi-modal data querying capabilities, and broader programming languages support and easier administration in Amazon CodeWhisperer. These AI service innovations provide vertical markets and horizontal functions with deeper, real-time insights and cost-saving efficiencies to drive transformation across industries.

These new capabilities enhance AWS’s AI offerings at the top of its three-layer ML stack. The bottom layer includes foundational components (ML hardware and ML software libraries) to help customers build their own ML infrastructure, and the middle layer—Amazon SageMaker—is a fully managed ML development environment. The top layer of AI services brings ML to business use cases such as transcribing contact center calls, processing documents, and improving healthcare outcomes. Customers can use AWS AI services with no ML expertise required.

Customers from different industries rely on AWS AI services to improve efficiency and reduce operational costs. For example, WaFd Bank, a full-service US bank, improved its customer experience with Talkdesk (a global cloud contact center company) and AWS Contact Center Intelligence (CCI) solutions, reducing call times by up to 90%. And State Auto, a property and casualty insurance holding company, automated the property inspection process using Amazon Rekognition (a computer vision service), increasing the number of claims it reviews for potential fraud by 83%.

Amazon Textract Analyze Lending makes it easy to classify and extract mortgage loan data

Today, mortgage companies process large volumes of documents to extract business-critical data and make decisions on loan applications. For example, a typical US mortgage application can encompass 500 or more pages of diverse document types, including W2 forms, paystubs, bank statements, Form 1040, 1003, and many more. The lender’s loan processing application has to first understand and classify each document type to ensure that it is processed the right way. After that, the loan processing application has to extract all the data on each page of the document. The data in these documents exists in different formats and structures, and the same data element can have different names on different documents—for example, “SSN,” or “Social Security Number,” which can lead to inaccurate data extraction. So far, the classification and extraction of data from mortgage application packages have been primarily manual tasks. Furthermore, mortgage companies have to manage demand for mortgages that can fluctuate substantially during a year, so lenders are unable to plan effectively and must often allocate resources to process documents on an ad hoc basis. Overall, mortgage loan processing is still manual, slow, error-prone, and expensive.

Amazon Textract (AWS’s AI service to automatically extract text, handwriting, and data from scanned documents) now offers Amazon Textract Analyze Lending to make loan document processing more automated, faster, and cost-effective at scale. Amazon Textract Analyze Lending pulls together multiple ML models to classify various documents that commonly occur in mortgage packages, and then extracts critical information from these documents with high accuracy to improve loan document processing workflows. For example, it can now perform signature detection to identify whether documents have required signatures. It also provides a summary of the documents in a mortgage application package and identifies any missing documents. For instance, PennyMac, a financial services firm specializing in the production and servicing of US mortgage loans, uses Amazon Textract Analyze Lending to process a 3,000-page mortgage application in less than 5 minutes. Previously, PennyMac’s mortgage document processing required several hours of reviewing and preparing a loan package for approval.

Amazon Transcribe Call Analytics for improved end-user experiences

In most customer-facing industries such as telecom, finance, healthcare, and retail, customer experiences with call centers can profoundly impact perceptions of the company. Lengthy call-resolution times or the inability to deal with issues during live interactions can lead to poor customer experiences or customer churn. Contact centers need real-time insights into customer-experience issues (e.g., a product defect) while calls are in progress. Typically, developers use multiple AI services to generate live call transcriptions, extract relevant real-time insights, and manage sensitive customer information (e.g. identify and redact sensitive customer details) during live calls. However, this process adds unnecessary complexity, time, and cost.

Amazon Transcribe, an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capabilities to their applications, now supports call analytics to provide real-time conversation insights. Amazon Transcribe Call Analytics now provides real-time conversation insights that help analyze thousands of in-progress calls, identify call sentiment (e.g. calls that ended with a negative customer sentiment score), detect the potential reason for the call, and spot issues such as repeated requests to speak to a manager. Amazon Transcribe Call Analytics combines powerful automatic speech NLP models that are trained specifically to improve overall customer experience. With Amazon Transcribe Call Analytics, developers can build a real-time system that provides contact center agents with relevant information to solve customer issues or alert supervisors about potential issues. Amazon Transcribe Call Analytics also generates call summaries automatically, eliminating the need for agents to take notes and allowing them to focus on customer needs. Furthermore, Amazon Transcribe Call Analytics protects sensitive customer data by identifying and redacting personal information during live calls.

Amazon Kendra adds new search capabilities

Today, in the face of rapid growth in the volume and variety of data, enterprise search tools struggle to examine and uncover key insights stored across enterprise systems in heterogenous data formats and in different languages. Conventional enterprise search solutions are unable to find knowledge stored in unstructured datasets like HTML tables because it requires extracting information from two-dimensional formats (rows and columns). Sometimes, the information a customer may be seeking could exist in different languages, making the search even more challenging. As a result, enterprise employees waste time searching for information or are unable to perform their duties.

Amazon Kendra (AWS’s intelligent search service powered by ML) offers a new capability that supports tabular search in HTML. Customers can find more precise answers faster in HTML documents, whether they’re in the narrative body or tabular form, by using natural language questions. Amazon Kendra can find and extract precise answers from HTML tables by performing deeper analyses of HTML pages and using new specialized deep learning models that intelligently interpret columns and rows to pinpoint relevant data. Amazon Kendra is also adding semantic support for seven new languages (in addition to English): French, Spanish, German, Portuguese, Japanese, Korean, and Chinese. Customers can now ask natural language questions and get exact answers in any of the supported languages. One of AWS’s biopharmaceutical customers, Gilead Sciences Inc., increased staff productivity by cutting internal search times by roughly 50% using Amazon Kendra.

Amazon HealthLake offers next-generation imaging solutions and precision health analytics

Healthcare providers face a myriad of challenges as the scale and complexity of medical imaging data continues to increase. Medical imaging is a critical tool to diagnose patients, and there are billions of medical images scanned globally each year. Imaging data accounts for about 90% 1 of all healthcare data, and analyzing these complex images has largely been a manual task performed by experts and specialists. It often takes data scientists and researchers weeks or months to derive important insights from medical images, slowing down decision-making processes for healthcare providers and impacting patient-care delivery. To address these challenges, Amazon HealthLake (a HIPAA-eligible service to store, transform, query, and analyze large-scale health data) is adding two new capabilities for medical imaging and analytics:

  • Amazon HealthLake Imaging is a new HIPAA-eligible capability that enables healthcare providers and their software partners to easily store, access, and analyze medical images at petabyte scale. The new capability is designed for fast, subsecond image retrieval in clinical workflows that healthcare providers can access securely from anywhere (e.g., web, desktop, or phone) and with high availability. Typically, health systems store multiple copies of the same imaging data in clinical and research systems, leading to increased storage costs and complexity. Amazon HealthLake Imaging extracts and stores just one copy of the same image to the cloud. Customers can now access existing medical records and run analysis applications from a single encrypted copy of the same data in the cloud with normalized metadata and advanced compression. As a result, Amazon HealthLake Imaging can help providers reduce the total cost of medical imaging storage by up to 40%.
  • Amazon HealthLake Analytics is a new HIPAA-eligible capability that makes it easy to query and derive insights from multi-modal health data (e.g., imaging, text, or genetics), at the individual or population levels, with the ability to share data securely across the enterprise. It removes the need for healthcare providers to execute complex data exports and data transformations. Amazon HealthLake Analytics automatically normalizes raw health data from disparate sources (e.g., medical records, health insurance claims, EHRs, or medical devices) into an analytics and interoperable format in minutes. The new capability reduces what would otherwise take months of engineering effort to allow providers to focus on what they do best—delivering patient care.

Amazon CodeWhisperer offers broader support and easier administration

While the cloud has democratized application development through on-demand access to compute, storage, database, analytics, and ML, the traditional process of building software applications in any industry remains time-intensive. Developers must still spend significant time writing repetitive code not directly related to the core problems they want to solve. Even highly experienced developers find it difficult to keep up with multiple programming languages, frameworks, and software libraries, while ensuring they follow correct programming syntax and coding best practices.

Amazon CodeWhisperer (an ML-powered service that generates code recommendations) now supports AWS Builder ID so any developer can sign up securely with just an email address and enable Amazon CodeWhisperer for their IDE within the AWS Toolkit. In addition to Python, Java, and JavaScript, Amazon CodeWhisperer adds support for TypeScript and C# languages to accelerate code development. Also, Amazon CodeWhisperer now makes code recommendations for AWS application programming interfaces (APIs) across its most popular services, including Amazon Elastic Compute Cloud (Amazon EC2), AWS Lambda, and Amazon Simple Storage Service (Amazon S3). Finally, Amazon CodeWhisperer is now available on the AWS Management Console, so any authorized AWS administrator can enable Amazon CodeWhisperer for their organization.

Conclusion

With these new features and capabilities, AWS continues to expand its portfolio of the broadest and deepest set of AI services. AWS also recognizes that as AI-powered use cases become pervasive, it is important that these capabilities are built in a responsible way. AWS is committed to building its services in a responsible manner and supporting customers to help them deploy AI responsibly. By enabling customers to more easily and responsibly add new and expanded AI capabilities to their applications and workflows, AWS is unleashing even greater innovation and helping businesses reimagine how they approach and solve some of their most pressing challenges. To learn more about AWS’s comprehensive approach to responsible AI, visit Responsible use of artificial intelligence and machine learning.

References

1S. K. Zhou et al., “A Review of Deep Learning in Medical Imaging: Imaging Traits, Technology Trends, Case Studies With Progress Highlights, and Future Promises,” in Proceedings of the IEEE, vol. 109, no. 5, pp. 820-838, May 2021, doi: 10.1109/JPROC.2021.3054390.


About the Author

Bratin Saha is the Vice President of Artificial Intelligence and Machine Learning at AWS.

Read More

How Hugging Face improved Text Generation performance with XLA

How Hugging Face improved Text Generation performance with XLA

Posted by The Hugging Face Team 🤗

Language models have bloomed in the past few years thanks to the advent of the Transformer architecture. Although Transformers can be used in many NLP applications, one is particularly alluring: text generation. It caters to the practical goals of automating verbal tasks and to our dreams of future interactions with chatbots.

Text generation can significantly impact user experiences. So, optimizing the generation process for throughput and latency is crucial. On that end, XLA is a great choice for accelerating TensorFlow models. The caveat is that some tasks, like text generation, are not natively XLA-friendly.

The Hugging Face team recently added support for XLA-powered text generation in 🤗 transformers for the TensorFlow models. This post dives deeper into the design choices that had to be made in order to make the text generation models TensorFlow XLA-compatible. Through these changes to incorporate XLA compatibility, we were able to significantly improve the speed of the text generation models ~ 100x faster than before.

A Deeper Dive into Text Generation

To understand why XLA is non-trivial to implement for text generation, we need to understand text generation in more detail and identify the areas that would benefit the most from XLA.

Popular models based on the Transformer architecture (such as GPT2) rely on autoregressive text generation to produce their outputs. Autoregressive text generation (also known as language modeling) is when a model is iteratively called to predict the next token, given the tokens generated so far, until some stopping criterion is reached. Below is a schematic of a typical text generation loop:

Flow diagram of a typical text generation loop

Any autoregressive text generation pipeline usually contains two main stages in addition to the model forward pass: logits processing and next token selection.

Next token selection

Next token selection is, as the name suggests, the process of selecting the token for the current iteration of text generation. There are a couple of strategies to perform next token selection:

  • Greedy decoding. The simplest strategy, known as greedy decoding, simply picks the token with the highest probability as predicted by the underlying text generation model.
  • Beam search. The quality of greedy decoding can be improved with beam search, where a predetermined number of best partial solutions are kept as candidates at the cost of additional resources. Beam search is particularly promising to obtain factual information from the language model, but it struggles with creative outputs.
  • Sampling. For tasks that require creativity, a third strategy known as sampling is the most effective, where each subsequent input token is sampled from the probability distribution of the predicted tokens.

You can read more about these strategies in this blog post.

Logit preprocessing

Perhaps the least discussed step of text generation is what happens between the model forward pass and the next token selection. When performing a forward pass with a text generation model, you will obtain the unnormalized log probabilities for each token (also known as logits). At this stage, you can freely manipulate the logits to impart the desired behavior to text generation. Here are some examples:

  • You can prevent certain tokens from being generated if you set their logits to a very large negative value;
  • Token repetition can be reduced if you add a penalty to all tokens that have been previously generated;
  • You can nudge sampling towards the most likely tokens if you multiply all logits by a constant smaller than one, also known as temperature scaling.

Before you move on to the XLA section of this blog post, there is one more technical aspect of autoregressive text generation that you should know about. The input to a language model is the sequence of tokens generated so far. So, if the input has N tokens, the current forward pass will repeat some attention-related computations from the previous N-1 tokens. The actual details behind these repeated computations deserve (and have) a blog post of their own, the illustrated GPT-2. In summary, you can (and should) cache the keys and the values from the masked self-attention layers where the size of the cache equals the number of input tokens obtained in the previous generation iteration.

Here we identified three keys areas that could benefit from XLA:

  • Control flow
  • Data structures
  • Utilities accepting dynamically shaped inputs

Adjusting Text Generation for XLA

As a TensorFlow user, the first thing you must do if you want to compile your function with XLA is to ensure that it can be wrapped with a tf.function and handled with AutoGraph. There are many different paths you can follow to get it done for autoregressive text generation – this section will cover the design decisions made at Hugging Face 🤗, and is by no means prescriptive.

Switching between eager execution and XLA-enabled graph mode should come with as few surprises as possible. This design decision is paramount to the transformers library team. Eager execution provides an easy interface to the TensorFlow users for better interaction, greatly improving the user experience. To maintain a similar level of user experience, it is important for us to reduce the friction of XLA conversion.

Control flow

As mentioned earlier, text generation is an iterative process. You condition the inputs based on what has been generated, where the first generation is usually “seeded” with a start token. But, this continuity is not infinite – the generation process terminates with a stopping criterion.

For dealing with such a continuous process, we resort to while statements. AutoGraph can automatically handle most while statements with no changes, but if the while condition is a tensor, then it will be converted to a tf.while_loop in the function created by tf.function. With tf.while_loop, you can specify which variables will be used across iterations and if they are shape-invariant or not (which you can’t do with regular Python while statements, more on this later).

# This will have an implicit conversion to a `tf.while_loop` in a `tf.function`
x = tf.constant([10.0, 20.0])
while tf.reduce_sum(x) > 1.0:
  x = x / 2

# This will give you no surprises and a finer control over the loop.
x = tf.constant([10.0, 20.0])
x = tf.while_loop(
  cond=lambda x: tf.reduce_sum(x) > 1.0,
  body=lambda x: [x / 2],
  loop_vars=[x]
)[0]

An advantage of using tf.while_loop for the text generation autoregressive loop is that the stopping conditions become clearly identifiable – they are the termination condition of the loop, corresponding to its cond argument. Here are two examples we resorted to tf.while_loop with explicit conditioning:

Sometimes a for loop repeats the same operation for an array of inputs, such as in the processing of candidates for beam search. AutoGraph’s strategy will greatly depend on the type of the condition variable, but there are further alternatives that do not rely on AutoGraph. For instance, vectorization can be a powerful strategy – instead of applying a set of operations for each data point/slice, you apply the same operations across a dimension of your data. However, it has some drawbacks. Skipping operations is not desirable with vectorized operations, so it is a trade-off you should consider.

# Certain `for` loops might skip some unneeded computations …
x = tf.range(10) – 2
x_2 = []
for value in x:
  if value > 0:
      value = value / 2
  x_2.append(tf.cast(value, tf.float64))
y = tf.maximum(tf.stack(x_2), 0)
# … but the benefit might be small for loss in readability compared to a
# vectorized operation, especially if the performance gains from a simpler
# control flow are factored in.
x = tf.range(10) – 2
x_2 = x / 2
y = tf.maximum(x_2, 0)

In the beam search candidate loop, some of the iterations can be skipped because you can tell in advance that the result will not be used. The ratio of skipped iterations was low and the readability benefits of vectorization were considerable, so we adopted a vectorization strategy to execute the candidate processing in beam search. Here is one example of logit processing, benefitting from this type of vectorization.

The last type of control flow that must be addressed for text generation is the if/else branches. Similarly to while statements, AutoGraph will convert if statements into tf.cond if the condition is a tensor.

# If statements can look trivial like this one.
x = tf.constant(1.0)
if x > 0.0:
  x = x – 1.0

# However, they should be treated with care inside a `tf.function`
x = tf.constant(1.0)
x = tf.cond(
  tf.greater(x, 0.0),
  lambda: x – 1.0,
  lambda: x
)

This conversion places some constraints on your design: the branches of your if statement must now be converted to function calls, and both branches must return the same number and type of outputs. This change impacts complex logit processors, such as the one that prevents specific tokens from being generated. Here is one example that shows our XLA port to filter undesirable tokens as a part of logit processing.

Data structures

In text generation, many data structures don’t have a static dimension that depends on how many tokens were generated up to that point. This includes:

  • generated tokens themselves,
  • attention masks for the tokens,
  • and cached attention data as mentioned in the previous section,

among others. Although tf.while_loop allows you to use variables with varying shapes across iterations, this process will trigger re-tracing, which should be avoided whenever possible since it’s computationally expensive. You can refer to the official commentary on tracing in case you want to delve deeper.

The summary here is that if you constantly call your tf.function wrapped function with the same input tensor shape and type (even if they have different data), and do not use new non-tensor inputs, you will not incur tracing-related penalties.

At this point, you might have anticipated why loops with dynamic shapes are not desirable for text generation. In particular, the model forward pass would have to be retraced as more and more generated tokens are used as part of its input, which would be undesirable. As an alternative, our implementation of autoregressive text generation uses static shapes obtained from the maximum possible generation length. Those structures can be padded and easily ignored thanks to the attention masking mechanisms in the Transformer architecture. Similarly, tracing is also a problem when your function itself has different possible input shapes. For text generation, this problem is handled the same way: you can (and should) pad your input prompt to reduce the possible input lengths.

# You have to run each section separately, commenting out the other.
import time
import tensorflow as tf

# Same function being called with different input shapes. Notice how the
# compilation times change — most of the weight lifting is done on the
# first call.

@tf.function(jit_compile=True)
def reduce_fn_1(vector):
  return tf.reduce_sum(vector)

for i in range(10, 13):
  start = time.time_ns()
  reduce_fn_1(tf.range(i))
  end = time.time_ns()
  print(f”Execution time — {(end – start) / 1e6:.1f} ms”)
# > Execution time — 520.4 ms
# > Execution time — 26.1 ms
# > Execution time — 25.9 ms

# Now with a padded structure. Despite padding being much larger than the
# actual data, the execution time is much lower because there is no retracing.

@tf.function(jit_compile=True)
def reduce_fn_2(vector):
  return tf.reduce_sum(vector)

padded_length = 512
for i in range(10, 13):
  start = time.time_ns()
  reduce_fn_2(tf.pad(tf.range(i), [[0, padded_length – i]]))
  end = time.time_ns()
  print(f”Execution time — {(end – start) / 1e6:.1f} ms”)
# > Execution time — 511.8 ms
# > Execution time — 0.7 ms
# > Execution time — 0.4 ms

Positional embeddings

Transformer-based language models rely on positional embeddings for the input tokens since the Transformer architecture is permutation invariant. These positional embeddings are often derived from the size of the structures. With padded structures, that is no longer possible, as the length of the input sequences no longer matches the number of generated tokens. In fact, because different models have different ways of retrieving these positional embeddings given the position index, the most straightforward solution was to use explicit positional indexes for the tokens while generating and to perform some ad-hoc model surgery to handle them.

Here are a couple of example model surgeries that we made to make the underlying models XLA-compatible:

Finally, to make our users aware of the potential failure cases and limitations of XLA, we ensured adding informative in-code exceptions (an example).

To summarize, our journey from a naive TensorFlow text generation implementation to an XLA-powered one consisted of:

  1. Replacing for/while Python loops conditional on tensors with tf.while_loop or vectorization;
  2. Replacing if/else operations conditioned on tensors with tf.cond;
  3. Creating fixed-size tensors for all tensors that had dynamic size;
  4. Stopping relying on tensor shapes to obtain the positional embedding;
  5. Documenting proper use of the XLA-enabled text generation.

What’s next?

The journey to XLA-accelerated TensorFlow text generation by Hugging Face 🤗 was full of learning opportunities. But more importantly, the results speak for themselves: with these changes, TensorFlow text generation can execute 100x faster than before! You can try it yourself in this Colab and can check out some benchmarks here.

Bringing XLA into your mission-critical application can greatly impact driving down costs and latency. The key to accessing these benefits lies in understanding how AutoGraph and tracing work to bring the most out of them. Have a look at the resources shared in this blog post and give it a go!


Acknowledgements

Thanks to the TensorFlow team for bringing support for XLA. Thanks to Joao Gante (Hugging Face) for spearheading the development of XLA-enabled text generation models for TensorFlow in 🤗 Transformers.

Read More

Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning

This paper was accepted at the workshop at “Human-in-the-Loop Learning Workshop” at NeurIPS 2022.
Preference-based reinforcement learning (RL) algorithms help avoid the pitfalls of hand-crafted reward functions by distilling them from human preference feedback, but they remain impractical due to the burdensome number of labels required from the human, even for relatively simple tasks. In this work, we demonstrate that encoding environment dynamics in the reward function (REED) dramatically reduces the number of preference labels required in state-of-the-art preference-based RL frameworks. We…Apple Machine Learning Research

Deploy an MLOps solution that hosts your model endpoints in AWS Lambda

Deploy an MLOps solution that hosts your model endpoints in AWS Lambda

In 2019, Amazon co-founded the climate pledge. The pledge’s goal is to achieve net zero carbon by 2040. This is 10 years earlier than the Paris agreement outlines. Companies who sign up are committed to regular reporting, carbon elimination, and credible offsets. At the time of this writing, 377 companies have signed the climate pledge, and the number is still growing.

Because AWS is committed to helping you achieve your net zero goal through cloud solutions and machine learning (ML), many projects have already been developed and deployed that reduce carbon emissions. Manufacturing is one of the industries that can benefit greatly from such projects. Through optimized energy management of machines in manufacturing factories, such as compressors or chillers, companies can reduce their carbon footprint with ML.

Effectively transitioning from an ML experimentation phase to production is challenging. Automating model training and retraining, having a model registry, and tracking experiments and deployment are some of the key challenges. For manufacturing companies, there is another layer of complexity, namely how these deployed models can run at the edge.

In this post, we address these challenges by providing a machine learning operations (MLOps) template that hosts a sustainable energy management solution. The solution is agnostic to use cases, which means you can adapt it for your use cases by changing the model and data. We show you how to integrate models in Amazon SageMaker Pipelines, a native workflow orchestration tool for building ML pipelines, which runs a training job and optionally a processing job with a Monte Carlo Simulation. Experiments are tracked in Amazon SageMaker Experiments. Models are tracked and registered in the Amazon SageMaker model registry. Finally, we provide code for deployment of your final model in an AWS Lambda function.

Lambda is a compute service that lets you run code without managing or provisioning servers. Lambda’s automatic scaling, pay-per-request billing, and ease of use make it a common deployment choice for data science teams. With this post, data scientists can turn their model into a cost-effective and scalable Lambda function. Furthermore, Lambda allows for integration with AWS IoT Greengrass, which helps you build software that enables your devices to act at the edge on the data that they generate, as would be the case for a sustainable energy management solution.

Solution overview

The architecture we deploy (see the following figure) is a fully CI/CD-driven approach to machine learning. Elements are decoupled to avoid having one monolithic solution.

Architecture Diagram

Let’s start with the top left of the diagram. The Processing – Image build component is a CI/CD-driven AWS CodeCommit repository that helps build and push a Docker container to Amazon Elastic Container Registry (Amazon ECR). This processing container serves as the first step in our ML pipeline, but it’s also reused for postprocessing steps. In our case, we apply a Monte Carlo Simulation as postprocessing. The Training – Image build repository outlined on the bottom left has the same mechanism as the Processing block above it. The main difference is that it builds the container for model training.

The main pipeline, Model building (Pipeline), is another CodeCommit repository that automates running your SageMaker pipelines. This pipeline automates and connects the data preprocessing, model training, model metrics tracking in SageMaker Experiments, data postprocessing, and, model cataloging in SageMaker model registry.

The final component is on the bottom right: Model deployment. If you follow the examples in Amazon SageMaker Projects, you get a template that hosts your model using a SageMaker endpoint. Our deployment repository instead hosts the model in a Lambda function. We show an approach for deploying the Lambda function that can run real-time predictions.

Prerequisites

To deploy our solution successfully, you need the following:

Download the GitHub repository

As a first step, clone the GitHub repository to your local machine. It contains the following folder structure:

  • deployment – Contains code relevant for deployment
  • mllib — Contains ML code for preprocessing, training, serving, and simulating
  • tests — Contains unit and integration tests

The key file for deployment is the shell script deployment/deploy.sh. You use this file to deploy the resources in your account. Before we can run the shell script, complete the following steps:

  1. Open the deployment/app.py and change the bucket_name under SageMakerPipelineSourceCodeStack. The bucket_name needs to be globally unique (for example, add your full name).
  2. In deployment/pipeline/assets/modelbuild/pipelines/energy_management/pipeline.py, change the default_bucket under get_pipeline to the same name as specified in step 1.

Deploy solution with the AWS CDK

First, configure your AWS CLI with the account and Region that you want to deploy in. Then run the following commands to change to the deployment directory, create a virtual environment, activate it, install the required pip packages specified in setup.py, and run the deploy.sh:

cd deployment
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pre-commit install
chmod u+x deploy.sh
./deploy.sh

deploy.sh performs the following actions:

  1. Creates a virtual environment in Python.
  2. Sources the virtual environment activation script.
  3. Installs the AWS CDK and the requirements outlined in setup.py.
  4. Bootstraps the environment.
  5. Zips and copies the necessary files that you developed, such as your mllib files, into the corresponding folders where these assets are needed.
  6. Runs cdk deploy —require-approval never.
  7. Creates an AWS CloudFormation stack through the AWS CDK.

The initial stage of the deployment should take less than 5 minutes. You should now have four repositories in CodeCommit in the Region you specified through the AWS CLI, as outlined in the architecture diagram. The AWS CodePipeline pipelines are run simultaneously. The modelbuild and modeldeploy pipelines depend on a successful run of the processing and training image build. The modeldeploy pipeline depends on a successful model build. The model deployment should be complete in less than 1.5 hours.

Clone the model repositories in Studio

To customize the SageMaker pipelines created through the AWS CDK deployment in the Studio UI, you first need to clone the repositories into Studio. Launch the system terminal in Studio and run the following commands after providing the project name and ID:

git clone https://git-codecommit.REGION.amazonaws.com/v1/repos/sagemaker-PROJECT_NAME-PROJECT_ID-modelbuild
git clone https://git-codecommit.REGION.amazonaws.com/v1/repos/sagemaker-PROJECT_NAME-PROJECT_ID-modeldeploy
git clone https://git-codecommit.REGION.amazonaws.com/v1/repos/sagemaker-PROJECT_NAME-PROJECT_ID-processing-imagebuild
git clone https://git-codecommit.REGION.amazonaws.com/v1/repos/sagemaker-PROJECT_NAME-PROJECT_ID-training-imagebuild

After cloning the repositories, you can push a commit to the repositories. These commits trigger a CodePipeline run for the related pipelines.

You can also adapt the solution on your local machine and work on your preferred IDE.

Navigate the SageMaker Pipelines and SageMaker Experiments UI

A SageMaker pipeline is a series of interconnected steps that are defined using the Amazon SageMaker Python SDK. This pipeline definition encodes a pipeline using a Directed Acyclic Graph (DAG) that can be exported as a JSON definition. To learn more about the structure of such pipelines, refer to SageMaker Pipelines Overview.

Navigate to SageMaker resources pane and choose the Pipelines resource to view. Under Name, you should see PROJECT_NAME-PROJECT_ID. In the run UI, there should be a successful run that is expected to take a little over 1 hour. The pipeline should look as shown in the following screenshot.

Amazon SageMaker Pipeline

The run was automatically triggered after the AWS CDK stack was deployed. You can manually invoke a run by choosing Create execution. From there you can choose your own pipeline parameters such as the instance type and number of instances for the processing and training steps. Furthermore, you can give the run a name and description. The pipeline is highly configurable through pipeline parameters that you can reference and define throughout your pipeline definition.

Feel free to start another pipeline run with your parameters as desired. Afterwards, navigate to the SageMaker resources pane again and choose Experiments and trials. There you should again see a line with a name such as PROJECT_NAME-PROJECT_ID. Navigate to the experiment and choose the only run with a random ID. From there, choose the SageMaker training job to explore the metrics related to the training Job.

The goal of SageMaker Experiments is to make it as simple as possible to create experiments, populate them with trials, and run analytics across trials and experiments. SageMaker Pipelines are closely integrated with SageMaker Experiments, and by default for each run create an experiment, trial and trial components in case they do not exist.

Approve Lambda deployment in the model registry

As a next step, navigate to the model registry under SageMaker resources. Here you can find again a line with a name such as PROJECT_NAME-PROJECT_ID. Navigate to the only model that exists and approve it. This automatically deploys the model artifact in a container in Lambda.

After you approve your model in the model registry, an Amazon EventBridge event rule is triggered. This rule runs the CodePipeline pipeline with the ending *-modeldeploy. In this section, we discuss how this solution uses the approved model and hosts it in a Lambda function. CodePipeline takes the existing CodeCommit repository also ending with *-modeldeploy and uses that code to run in CodeBuild. The main entry for CodeBuild is the buildspec.yml file. Let’s look at this first:

version: 0.2

env:
  shell: bash

phases:
  install:
    runtime_versions:
      python: 3.8
    commands:
      - python3 -m ensurepip --upgrade
      - python3 -m pip install --upgrade pip
      - python3 -m pip install --upgrade virtualenv
      - python3 -m venv .venv
      - source .venv/bin/activate
      - npm install -g aws-cdk@2.26.0
      - pip install -r requirements.txt
      - cdk bootstrap
  build:
    commands:
      - python build.py --model-package-group-name "$SOURCE_MODEL_PACKAGE_GROUP_NAME"
      - tar -xf model.tar.gz
      - cp model.joblib lambda/digital_twin
      - rm model.tar.gz
      - rm model.joblib
      - cdk deploy --require-approval never

During the installation phase, we ensure that the Python libraries are up to date, create a virtual environment, install AWS CDK v2.26.0, and install the aws-cdk Python library along with others using the requirements file. We also bootstrap the AWS account. In the build phase, we run build.py, which we discuss next. That file downloads the latest approved SageMaker model artifact from Amazon Simple Storage Service (Amazon S3) to your local CodeBuild instance. This .tar.gz file is unzipped and its contents copied into the folder that also contains our main Lambda code. The Lambda function is deployed using the AWS CDK, and code runs out of a Docker container from Amazon ECR. This is done automatically by AWS CDK.

The build.py file is a Python file that mostly uses the AWS SDK for Python (Boto3) to list the model packages available.

The function get_approved_package returns the Amazon S3 URI of the artifact that is then downloaded, as described earlier.

After successfully deploying your model, you can test it directly on the Lambda console in the Region you chose to deploy in. The name of the function should contain DigitalTwinStack-DigitalTwin*. Open the function and navigate to the Test tab. You can use the following event to run a test call:

{
  "flow": "[280, 300]",
  "pressure": "[69, 70]",
  "simulations": "10",
  "no_of_trials": "10",
  "train_error_weight": "1.0"
}

After running the test event, you get a response similar to that shown in the following screenshot.

Test AWS Lambda function

If you want to run more simulations or trials, you can increase the Lambda timeout limit and experiment with the code! Or you might want to pick up the data generated and visualize the same in Amazon QuickSight. Below is an example. It’s your turn now!

Amazon QuickSight

Clean up

To avoid further charges, complete the following steps:

  • On the AWS CloudFormation console, delete the EnergyOptimization stack.
    This deletes the entire solution.
  • Delete the stack DigitalTwinStack, which deployed your Lambda function.

Conclusion

In this post, we showed you a CI/CD-driven MLOps pipeline of an energy management solution where we keep each step decoupled. You can track your ML pipelines and experiments in the Studio UI. We also demonstrated a different deployment approach: upon approval of a model in the model registry, a Lambda function hosting the approved model is built automatically through CodePipeline.

If you’re interested in exploring either the MLOps pipeline on AWS or the sustainable energy management solution, check out the GitHub repository and deploy the stack in your own AWS environment!


About the Authors

Laurens van der Maas is a Data Scientist at AWS Professional Services. He works closely with customers building their machine learning solutions on AWS, and is passionate about how machine learning is changing the world as we know it.

Kangkang Wang is an AI/ML consultant with AWS Professional Services. She has extensive experience of deploying AI/ML solutions in healthcare and life sciences vertical. She also enjoys helping enterprise customers to build scalable AI/ML platforms to accelerate the cloud journey of their data scientists.

Selena Tabbara is a Data Scientist at AWS Professional Services. She works everyday with her customers to achieve their business outcomes by innovating on AWS platforms. In her spare time, Selena enjoys playing the piano, hiking and watching basketball.

Michael WallnerMichael Wallner is a Senior Consultant with focus on AI/ML with AWS Professional Services. Michael is passionate about enabling customers on their cloud journey to become AWSome. He is excited about manufacturing and enjoys helping transform the manufacturing space through data.

Read More

Causal Confounds in Sequential Decision Making

Causal Confounds in Sequential Decision Making

A standard assumption in sequential decision making is that we observe everything required to make good decisions. In practice however, this isn’t always the case. We discuss two specific examples (temporally correlated noise (a) and unobserved contexts (c)) that have stymied the use of IL/RL algorithms (in autonomous helicopters (b) and self-driving (d)). We derive provably correct algorithms for both of these problems that scale to continuous control problems.

Reinforcement Learning (RL) and Imitation Learning (IL) methods have achieved impressive results in recent years like beating the world champion at Go or controlling stratospheric balloons. Usually, these results are on problems where we either a) observe the full state or b) are able to faithfully execute our intended actions on the system. However, we frequently have to contend with situations where this isn’t the case: our self-driving car might miss a person’s hand gestures or persistent wind might make it difficult to fly our quadcopter perfectly straight. These sorts of situations can cause standard IL approaches to perform poorly ([1], [2]). In causal inference, we call a random variable that we don’t observe that influences a relationship we’d like to model a confounder. Using techniques from causal inference, we derive provably correct and scalable algorithms for sequential decision making in these sorts of confounded settings.

We’re going to be focused mostly on imitation learning (see our last blog post for more details). In IL, we observe trajectories (sequences of states and actions) generated by some expert policy (pi_E) and want to recover the policy that generated them. The expert could be a) an experienced driver if we’re trying to build a self-driving car, b) a user interacting with a recommender system we’re attempting to model, or c) scraped internet text used to train a large language model. We’re now going to discuss two issues of causation that are difficult for standard IL algorithms to handle. I’ll be drawing upon material from our ICML ’22 paper Causal Imitation under Temporally Correlated Noise and our NeurIPS ’22 paper Sequence Model Imitation Learning with Unobserved Contexts.

Issue 1: Temporally Correlated Noise

The first problem we’ll consider is when there’s temporally correlated noise (TCN) affecting pairs of expert actions. For example, if our expert was a human quadcopter pilot, the TCN could be persistent wind [1]. Let’s assume that the expert intended to fly completely straight but was unable to do so because of the persistent wind, producing swerving trajectories. If the learner ignores the fact that the wind was the root cause of these swerves, they might attempt to reproduce the swerving behavior of the expert, causing them to deviate even further at test time due to the continued influence of the wind.

In effect, adding correlated noise to both (X=s_t) and (Y=a_t) changes the apparent slope of the line we’re trying to fit, making standard regression-based approaches no longer consistent.

What’s happened is that the TCN has created a spurious correlation (e.g. being a little on the left is often followed by going more to the left) which the learner has latched onto. What we’d like is to instead learn a policy that can fly as straight as the expert did in the demonstrations.

Let’s try and formalize what’s happening in the above example via graphical models. In a Markov Decision Process (MDP), the learner (pi) responds to the observed state (s_t) via taking an action (a_t) and the MDP transitions via dynamics (mathcal{T}: s times a rightarrow s ) to the next state (s_{t+1}).

Adding in the TCN corresponds to adding in a common cause of a pair of actions that’s not part of the observed state ((u_t) below).

Now, let’s dig into what goes wrong with imitation learning under TCN. Let’s say we apply a standard algorithm like behavioral cloning: direct regression from states ((X = s_t)) to actions ((Y=a_t)). The TCN travels through the dynamics to influence both the inputs and outputs of our regression procedure, creating spurious correlations that our learner might latch onto, leading to poor test-time performance. We visualize this effect in red below.

In effect, the TCN makes some elements of state (s_2) look more correlated with action (a_2) than they would otherwise. Attempting to minimize training error, the learner will use this correlation to their advantage but will end up swerving unnecessarily at test time as a result.

Filtering out TCN with Instrumental Variable Regression

To learn well under TCN, we’re going to utilize the idea of a natural experiment from causal inference, in which we’re able to use variation in the observational data to simulate the effect of an intervention / randomized control trial (RCT). Interestingly enough, this idea was one of the central contributions of the winners of the Nobel Prize in Economics in 2021. Let’s first take a look at this idea in the single shot setting. Assume that we’re trying to predict from (X) to (Y), both of which are affected by an unobserved confounder (U):

Notice the random variable (Z) both (1) affects (X) and (2) is independent from (U). (Z) is therefore an independent source of variation in (X). We call such variables instruments. Under some assumptions (see our paper for details), one can condition on an instrument to de-noise the inputs to our regression procedure and recover a causal relationship. So, instead of regressing from $$X rightarrow Y,$$ one regresses from $$X|Z rightarrow Y|Z.$$

We’ll skip the derivation here but intuitively, conditioning on an independent source of variation, (Z), “washes out” the effect of the confounder, (U). The variation in (Z) therefore acts as a sort of “natural experiment,” simulating an RCT without requiring a true intervention.

Now, you might well be wondering what a valid instrument in the sequential setting is? Well, given the past is independent of future confounding, we can use past states as an instrument! Graphically,

Many econometrics papers will spend quite a lot of text arguing something is a valid instrument (i.e. satisfies (1) and (2)) but here, the arrow of time gives us an instrument for free. Algorithmically, instrumental variable regression for imitation learning under TCN corresponds to solving $$ s_t | s_{t-1} rightarrow a_t | s_{t-1}.$$

So, one first fits models of the left and right-hand sides of the above expression and then regresses between them. We give two algorithms for doing so efficiently in our paper. On simulated control tasks, we observe that our approaches (in teal and orange) are able to recover a policy that matches expert performance, while regression-based behavioral cloning struggles to do so.

So, putting it all together, temporally correlated noise between expert actions can create spurious correlations between recorded states and actions, causing standard regression-based imitation learning approaches to produce inaccurate predictors. One can use the “natural experiment” caused by the variation in past states to de-noise the inputs to their regression procedure and recover causally correct predictors that do not suffer from spurious correlations.

Issue 2: Unobserved Contexts

We’re now going to consider a different kind of confounder. Let’s assume we’re trying to imitate an expert who observes some context, (c), which the learner does not. For example, if we were trying to imitate an expert driver but our sensors don’t pick up a stop sign on the side of the road, (c) could refer to this side information. Now, this problem is in general impossible to solve: if we don’t see the sign, how would we know to stop? However, if we pay attention to our surroundings and accumulate information over time, we might be able to eventually filter out what we missed: if we observe all the other cars around us slowing down for a few timesteps, we might rationally conclude that we should as well, even though we never saw the stop sign.

More formally, we’re in a Partially Observed Markov Decision Process (a POMDP), which we can visualize as follows:

So, the effect of the stop sign ((c)) is reflected in the states ((s_1, s_2)). There’s two key differences between this graphical model and the TCN graphical model. First, the confounder directly affects the state. Second, the confounder is constant, rather than time-varying. These two characteristics make it plausible that, given enough state observations, we can figure out what (c) is and act as the expert does.

To enable us to accumulate evidence over time, we’re going to allow our policy to condition its actions on the entire history up to this point. Denoting ( h_t = (s_1, a_1, dots, s_{t}) ), our policy now takes the form $$pi: h_t rightarrow a_t .$$ This means our policy is now a sequence model (e.g. an RNN or a transformer).

Ok, we use sequence models as our policy class and apply behavioral cloning (prediction of the next token from the prefix) and call it a day right? Unfortunately, this often produces policies that naively repeat their own past actions. We call this the latching effect. For example, in the self-driving domain, this can lead to learning a driving policy that begins to turn and then just keeps turning, driving in circles [2]. This is clearly less than ideal.

Modeling Unobserved Contexts with Sequence Models + On-Policy Training

Let’s dig into why the latching effect stymies off-policy imitation learners. The model we learn during training looks like $$pi(a_t |h_t) approx p_(a_t^E| s_1^E, a_1^E, dots, s_t^E),$$ where the superscript (E) denotes that these are expert states and actions. Now at test time, the past actions we pass into our sequence model policy are our own rather than the expert’s. This means we’re trying to sample actions from $$p_(a_t^E| s_1, a_1, dots, s_t).$$ Unless we know that the elements to the right of the conditioning bar are exactly the same (i.e. we’ve perfectly matched the expert policy), we have no reason to believe that the model we’ve learned will generalize to our own induced state distribution. Put differently, we have no reason to believe we’ve learned a policy which will perform well at test-time.

In machine learning terms, we’re facing the problem of policy-induced covariate shift (PICS). Here, our covariates are the history of states and actions the policy uses to predict the next action. PICS is a well known problem in imitation learning and sequential prediction writ large (see last blog post for more details). What’s special about the unobserved context setting is the degree to which it causes problems. Early on in an episode, no learner can do well as they haven’t accumulated enough information to perform well. This means that we have an unavoidable source of covariate shift from preliminary mistakes.

Off-policy training (i.e. training only on the green expert states) might not ensure we generalize well to our own induced state distribution (the orange dashed line). The effect of unobserved contexts is to ensure we don’t do well at first because we don’t have enough information to act properly, making it quite likely that we go off the rails.

Now, let’s assume that the expert’s actions were relatively similar between adjacent timesteps (which is frequently true in practice). This means rather than learning a complex mapping from states to actions, our learner might have instead learned a policy that mostly copies the last action. This was fine when the previous actions were the expert’s. But when they are the learner’s extremely suboptimal initial actions, naive copying is a recipe for disaster. This is what leads to us learning a driving policy that is only capable of turning in circles. A different way of phrasing this point is because the learner was trained only on trajectories from the expert, it treats its own test-time past actions as though they were produced by the expert, leading to unfortunate downstream consequences.

On-policy training (i.e. performing rollouts in the environment and comparing them to expert rollouts) doesn’t suffer from these issues. This is because we’re instead trying to satisfy $$ p_(a_t^E| s_1^E, a_1^E, dots, s_t^E) approx p_(a_t| s_1, a_1, dots, s_t).$$ In words, rather than matching expert actions on expert histories, we’re instead trying to match expert and learner trajectory distributions. We’re making sure we predict well based on our own histories and therefore can’t be surprised at test-time by states we didn’t expect to end up in. We prove in our paper that under certain identifiability conditions, on-policy training of sequence model policies is both necessary and sufficient for the learner to generalize well to their own induced state distribution and successfully imitate the expert.

On-policy training involves rolling out the learner’s current policy (in orange) and minimizing some notion of divergence (in red) to expert trajectories (in green). Because we observe our own induced state distribution, we can’t fool ourselves into thinking our own past actions are like those of the expert.

We also conduct experiments on continuous control tasks with unobserved contexts and show that equipping an off-policy learner (in grey) with access to history can actually hurt its performance, while we see no such effect for an on-policy learner (in orange).

Discussion

Confounding commonly occurs in real world situations, including those in which we want to make a sequence of decisions. When we don’t filter out or model the effects of what we don’t observe, we can end up making extremely suboptimal decisions. We describe two situations in which there’s clean algorithms for handling confounding in sequential decision making (temporally correlated noise can be filtered out via IVR, unobserved contexts can be modeled via on-policy training of a sequence model). Moving forward, we’re interested in other causal structures (e.g. negative controls or proxy shifts) and applying our techniques to problems where confounders abound (e.g. recommender systems).

If you’re interested in learning more, I recommend you look at the full papers or the attached code:

DISCLAIMER: All opinions expressed in this post are those of the author and do not represent the views of CMU.

Read More

Google at NeurIPS 2022

Google at NeurIPS 2022

This week marks the beginning of the 36th annual Conference on Neural Information Processing Systems (NeurIPS 2022), the biggest machine learning conference of the year, which is being held in New Orleans, LA. NeurIPS 2022 will be held in person with additional options for virtual attendees, and includes invited talks, demonstrations and presentations of some of the latest in machine learning research. This year, NeurIPS is also offering a new track, called Spotlight Papers, which will provide opportunities to highlight papers presented in prestigious journals that would otherwise not have been eligible for submission.

Google is proud to be a Diamond level sponsor of NeurIPS this year and will have a significant presence year with more than 175 accepted papers, additionally contributing to and learning from the broader academic research community through numerous talks, posters, workshops, and tutorials. You can learn more about our work being presented in the list below (Google affiliations highlighted in bold).

Organizing Committee

General Chairs includes: Sanmi Koyejo

Program Chairs include: Alekh Agarwal

Workshop Chairs include: Hanie Sedghi

Tutorial Chairs include: Adji Bousso Dieng, Jessica Schrouff

Affinity Workshop Chair: Adji Bousso Dieng, Jessica Schrouff

Program Committee, Senior Area Chairs include: Corinna Cortes, Claudio Gentile, Mohammad Ghavamzadeh, Amir Globerson, Elad Hazan, Katherine Heller, Satyen Kale, Been Kim, Sanjiv Kumar, Hugo Larochelle, Sergey Levine, Yishay Mansour, Mehryar Mohri, Tara Sainath, Dale Schuurmans, Daniel Tarlow

NeurIPS Foundation Board Secretary: Michael Mozer

NeurIPS Foundation Board Members include: Corinna Cortes, Isabelle Guyon, Sanmi Koyejo, Hugo Larochelle

NeurIPS Foundation Advisory Board include: Peter Bartlett, Zoubin Ghahramani, John C. Platt, Fernando Pereira, Dale Schuurmans

Keynote Speakers

The Data-Centric Era: How ML is Becoming an Experimental Science
Isabelle Guyon

The Forward-Forward Algorithm for Training Deep Neural Networks
Geoffrey Hinton

Outstanding Paper Award

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi

EXPO Day Workshops

Graph Neural Networks in Tensorflow: A Practical Guide
Workshop Organizers include: Bryan Perozzi, Sami Abu-el-Haija

A Hands-On Introduction to Tensorflow and Jax
Workshop Organizers include: Josh Gordon

Affinity Workshops

LatinX in AI (LXAI)
Platinum Sponsor
Networking & Social Chairs include: Andres Muñoz Medina
Program Committee includes: Johan Obando Ceron

Queer in AI
Panelists include: Sara Beery, Talia Ringer

Women in Machine Learning (WiML)
Platinum Sponsor
Workshop Organizers and Mentorship Chairs include: Beliz Gunel
Mentors include: Adam Roberts, Eleni Triantafillou, Zelda Mariet, Clara Hu, Rosanne Liu, Alekh Agarwal, Vinod Prabhakaran, Rose Yu, Katherine Heller

Workshops

New in ML
Workshop Organizers include: Isabelle Guyon

AI for Accelerated Materials Design (AI4Mat)
Workshop Organizers include: Benjamin Sanchez-Lengeling

All Things Attention: Bridging Different Perspectives on Attention
Invited Speakers and Panelists include: Vidhya Navalpakkam

Efficient Natural Language and Speech Processing (ENLSP-II): The Future of Pre-trained Models
Invited Speakers include: Tara Sainath, Anna Huang
Invited Panelists include: Mohammad Norouzi
Program Committee includes: Wenhu Chen

Federated Learning: Recent Advances and New Challenges
Program Committee includes: Kallista Bonawitz, Zachary Charles, Wenshuo Guo, Peter Kairouz, Zhaozhuo Xu, Zheng Xu

Gaussian Processes, Spatiotemporal Modeling, and Decision-Making Systems
Workshop Organizers include: Zi Wang
Invited Speakers include: Jasper Snoek, Carolina Osorio
Advisory Board includes: Zoubin Ghahramani

Has it Trained Yet? A Workshop for Algorithmic Efficiency in Practical Neural Network Training
Workshop Organizers include: Zachary Nado, George Dahl, Naman Agarwal, Aakanksha Chowdhery
Invited Speakers include: Aakanksha Chowdhery, Priya Goyal

Human in the Loop Learning (HiLL)
Workshop Organizers include: Fisher Yu, Vittorio Ferrari
Invited Speakers include: Dorsa Singh, Igor Mordatch, Ding Zhao

INTERPOLATE — First Workshop on Interpolation Regularizers and Beyond
Workshop Organizers include: Yann Dauphin
Invited Speakers include: Chelsea Finn
Panelists include: Chelsea Finn, Dustin Tran
Program Committee includes: Wang Chen, Kimin Lee

LaReL: Language and Reinforcement Learning
Invited Speakers include: Dorsa Singh, Igor Mordatch

Medical Imaging Meets NeurIPS
Program Committee includes: Chenyu You

Memory in Artificial and Real Intelligence (MemARI)
Program Committee includes: Benjamin Eysenbach, Otilia Stretcu

Meta-Learning
Workshop Organizers include: Eleni Triantafillou
Invited Speakers include: Lucas Byer, Chelsea Finn
Program Committee includes: Ishita Dasgupta, Praneet Dutta, Benjamin Eysenbach, Maximilian Igl, Louis Kirsch, Parsa Mahmoudieh, Marc Pickett, Eleni Triantafillou

New Frontiers in Graph Learning (GLFrontiers)
Workshop Organizers include: Hanjun Dai

Offline Reinforcement Learning Workshop: Offline RL as a “Launchpad”
Workshop Organizers include: Rishabh Agarwal, Aviral Kumar, George Tucker
Invited Speakers include: Dorsa Sadigh

Score-Based Methods
Invited Speakers include: Mohammad Norouzi
Invited Panelists include: Jascha Sohl-Dickstein

Synthetic Data for Empowering ML Research
Invited Speakers include: Mehryar Mohri
Invited Panelists include: Katrina Ligett
Program Committee includes: Jinsung Yoon

Table Representation Learning
Workshop Organizers include: Pengcheng Yin
Invited Speakers include: Xinyun Chen, Carsten Binnig
Panelists include: Julian Eisenschlos
Program Committee includes: Wenhu Chen, Xinyun Chen, Beliz Gunel

A Causal View on Dynamical Systems
Program Committee includes: Rose Yu

Algorithmic Fairness Through the Lens of Causality and Privacy
Workshop Organizers include: Awa Dieng
Invited Speakers include: Nicolas Papernot
Roundtable Leads include: David Madras, Negar Rostamzadeh, Nyalleng Moroosi
Program Committee includes: Matt Kusner

Broadening Research Collaborations in ML
Workshop Organizers include: Rosanne Liu, Pablo Samuel Castro, Sunipa Dev

Decentralization and Trustworthy Machine Learning in Web3: Methodologies, Platforms, and Applications
Invited Speakers include: Peter Kairouz

Distribution Shifts (DistShift): Connecting Methods and Applications
Workshop Organizers include: Becca Roelofs, Chelsea Finn, Jacob Eisenstein, Pang Wei Koh
Invited Speakers include: Sarah Beery

Foundation Models for Decision Making
Workshop Organizers include: Sherry Yang, Yilun Du, Igor Mordatch, Shixiang Shane Gu,Ofir Nachum
Invited Speakers include: Dorsa Sadigh, Dale Schuurmans, Machel Reid
Program Committee includes: Bo Dai, Aleksandra Faust, Hiroki Furuta, Kati Goshvadi, Izzeddin Gur, Austin Huang, Kimin Lee, Kuang-Huei Lee, Lisa Lee, Yingjie Miao, Jordi Orbay, Ted Xiao

Gaze Meets ML
Program Committee includes: Peter Mattson, Mehdi Moradi

I Can’t Believe It’s Not Better: Understanding Deep Learning Through Empirical Falsification
Workshop Organizers include: Javier Antorán
Panelists include: Kevin Murphy

Interactive Learning for Natural Language Processing
Invited Speakers include: Anca Dragan
Program Committees include: Julia Kreutzer, Shunyu Yao

Machine Learning and the Physical Sciences
Workshop Organizers include: Adji Bousso Dieng
Invited Speakers include: Ekin Doğuş Çubuk

Machine Learning for Systems
Workshop Organizers include: Martin Maas, Azade Nova, Dan Zhang
Invited Speakers include: Jeff Dean
Program Committee includes: Milad Hashemi, Kevin Swersky

Machine Learning in Structural Biology
Invited Speakers include: David Fleet

MATH-AI: Toward Human-Level Mathematical Reasoning
Workshop Organizers include: Swaroop Mishra, Yuhuai Wu
Invited Speakers include: Talia Ringer

OPT 2022: Optimization for Machine Learning
Workshop Organizers include: Courtney Paquette

Reinforcement Learning for Real Life (RL4RealLife)
Workshop Organizers include: Minmin Chen
Invited Panelists include: Pablo Samuel Castro
Program Committee includes: Victor Carbune, Bo Chang, Yinlam Chow, Konstantina Christakopoulou, Bo Dai, Hanjun Dai, Aleksandra Faust, Joshua Greaves‎, Chih-wei Hsu, Rahul Kidambi, Srivatsan Krishnan, Iou-Jen Liu, Cong Lu, Jincheng Mei, Chao Qin

Self-Supervised Learning – Theory and Practice
Invited Speakers include: Mathilde Caron

Symmetry and Geometry in Neural Representations (NeurReps)
Invited Speakers include: Noah Shutty
Program Committee includes: Ondrej Biza, Noah Shutty

Temporal Graph Learning Workshop
Invited Speakers include: Mehran Kazemi

Transfer Learning for Natural Language Processing
Workshop Organizers include: Deepak Ramachandran, Sebastian Ruder
Invited Speakers include: Jonas Pfeiffer
Invited Debaters include: Ellie Pavlick
Program Committee includes: Patrick Fernandes, Jonas Pfeiffer, Jiao Sun, Tu Vu, Xinyi Wang, Xin Xu

Cultures of AI and AI for Culture
Workshop Organizers include: Rida Qadri, Fernando Diaz

Deep Reinforcement Learning Workshop
Workshop Organizers include: Karol Hausman, Ted Xiao, Zeyu Zheng
Invited Speakers include: Igor Mordatch
Advisory Board includes: Chelsea Finn

Empowering Communities: A Participatory Approach to AI for Mental Health
Program Committee includes: Diana Mincu, Subhrajit Roy, Martin Seneviratne

HCAI@NeurIPS 2022, Human Centered AI
Keynote Speaker includes: Fernanda Viegas

Learning Meaningful Representations of Life
Workshop Organizers include: Adji Bousso Dieng

Machine Learning for Creativity and Design
Workshop Organizers include: Yingtao Tian

Machine Learning Safety
Workshop Organizers include: Nicholas Carlini
Invited Speakers include: Dorsa Sadigh

Neuro Causal and Symbolic AI (nCSI)
Workshop Organizers include: Thomas Kipf

Robot Learning Workshop: Trustworthy Robotics
Workshop Organizers include: Alex Bewley, Jonathan Tompson
Invited Speakers include: Karol Hausman, Brian Ichter, Been Kim, Leila Takayama, Andy Zeng
Program Committee includes: Vincent Vanhoucke

The Symbiosis of Deep Learning and Differential Equations II
Workshop Organizers include: Winnie Xu
Invited Speakers include: Rose Yu

Tackling Climate Change with Machine Learning
Workshop Organizers include: Emma Strubell

Trustworthy and Socially Responsible Machine Learning
Invited Speakers include: Been Kim, Dorsa Sadigh, Milind Tambe

Vision Transformers: Theory and Applications
Invited Speakers include: Cordelia Schmid, Ming Hsuan Yang

Tutorials

Advances in Bayesian Optimization
Tutorial Organizers include: Virginia Aglietti

Creative Culture and Machine Learning
Tutorial Organizers include: Negar Rostamzadeh

Fair and Socially Responsible ML for Recommendations: Challenges and Perspectives
Invited Panelists include: Fernando Diaz

Lifelong Learning Machines
Invited Panelists include: Christopher Summerfield

The Role of Meta-learning for Few-Shot Learning
Tutorial Organizers include: Eleni Triantafillou
Invited Panelists include: Neil Houlsby, Priyanka Agrawal

Competitions

NeurIPS 2022 Competition Track: Overview & Results
Invited Speakers include: Isabelle Guyon

Causal Insights for Learning Paths in Education
Competition Organizers include: Zichao (Jack) Wang

IGLU: Interactive Grounded Language Understanding in a Collaborative Environment
Competition Organizers include: Negar Arabzadeh

Cross-Domain MetaDL: Any-Way Any-Shot Learning Competition with Novel Datasets from Practical Domains
Competition Organizers include: Isabelle Guyon

Reconnaissance Blind Chess: An Unsolved Challenge for Multi-Agent Decision Making Under Uncertainty
Competition Organizers include: Bo Li

VisDA 2022 Challenge: Sim2Real Domain Adaptation for Industrial Recycling
Competition Organizers include: Dina Bashkirova

Spotlight Papers

CoPur: Certifiably Robust Collaborative Inference via Feature Purification
Jing Liu, Chulin Xie, Oluwasanmi O Koyejo, Bo Li

Machine Learning on Graphs: A Model and Comprehensive Taxonomy
Ines Chami*, Sami Abu-El-Haija, Bryan Perozzi, Christopher Ré, Kevin Murphy

Sparse Winning Tickets are Data-Efficient Image Recognizers
Mukund Varma T, Xuxi Chen, Zhenyu Zhang, Tianlong Chen, Subhashini Venugopalan, Zhangyang Wang

Federated Learning from Pre-trained Models: A Contrastive Learning Approach
Yue Tan, Guodong Long, Jie Ma, Lu Liu, Tianyi Zhou, Jing Jiang

Improving Multi-task Generalization via Regularizing Spurious Correlation
Ziniu Hu*, Zhe Zhao, Xinyang Yi, Tiansheng Yao, Lichan Hong, Yizhou Sun, Ed H. Chi

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning
Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare

Residual Multiplicative Filter Networks for Multiscale Reconstruction
Shayan Shekarforoush, David B. Lindell, David J. Fleet, Marcus A Brubaker

Differentially Private Learning with Margin Guarantees
Raef Bassily, Mehryar Mohri, Ananda Theertha Suresh

Optimal Query Complexities for Dynamic Trace Estimation
David P. Woodruff*, Fred Zhang*, Qiuyi Zhang

Papers

From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent
Ayush Sekhari, Satyen Kale, Jason D. Lee, Chris De Sa, Karthik Sridharan

On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games
Runyu Zhang, Jincheng Mei, Bo Dai, Dale Schuurmans, Na Li

Matryoshka Representation Learning
Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi

Efficient Risk-Averse Reinforcement Learning
Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

Operator Splitting Value Iteration
Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, Amir-massoud Farahmand

Cluster Randomized Designs for One-Sided Bipartite Experiments
Jennifer Brennan*, Vahab Mirrokni, Jean Pouget-Abadie

A Unified Sequence Interface for Vision Tasks
Ting Chen, Saurabh Saxena, Lala Li, Tsung-Yi Lin*, David J. Fleet, Geoffrey Hinton

Cryptographic Hardness of Learning Halfspaces with Massart Noise
Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi, Lisheng Ren

Better Best of Both Worlds Bounds for Bandits with Switching Costs
Idan Amir, Guy Azov, Tomer Koren, Roi Livni

Fast Neural Kernel Embeddings for General Activations
Insu Han, Amir Zandieh, Jaehoon Lee, Roman Novak, Lechao Xiao, Amin Karbasi

Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth
Laxman Dhulipala, David Eisenstat, Jakub Łącki, Vahab Mirronki, Jessica Shi

Improving Zero-Shot Generalization in Offline Reinforcement Learning Using Generalized Similarity Functions
Bogdan Mazoure*, Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples
Maura Pintor, Luca Demetrio, Angelo Sotgiu, Ambra Demontis, Nicholas Carlini, Battista Biggio, Fabio Roli

Learning Energy Networks with Generalized Fenchel-Young Losses
Mathieu Blondel, Felipe Llinares-López, Robert Dadashi, Léonard Hussenot, Matthieu Geist

Learning Robust Dynamics Through Variational Sparse Gating
Arnav Kumar Jain, Shiva Kanth Sujit, Shruti Joshi, Vincent Michalski, Danijar Hafner, Samira Ebrahimi Kahou

Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures
Arnav Kumar Jain, Shiva Kanth Sujit, Shruti Joshi, Vincent Michalski, Danijar Hafner, Samira Ebrahimi Kahou

So3krates: Equivariant Attention for Interactions on Arbitrary Length-Scales in Molecular Systems
J. Thorben Frank, Oliver T. Unke, Klaus-Robert Müller

Spectral Bias in Practice: The Role of Function Frequency in Generalization
Sara Fridovich-Keil*, Raphael Gontijo-Lopes, Rebecca Roelofs

Delving into Out-of-Distribution Detection with Vision-Language Representations
Yifei Ming, Ziyang Cai, Jiuxiang Gu, Yiyou Sun, Wei Li, Yixuan Li

Path Independent Equilibrium Models Can Better Exploit Test-Time Computation
Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, J. Zico Kolter, Roger Grosse

On Optimal Learning Under Targeted Data Poisoning
Steve Hanneke, Amin Karbasi, Mohammad Mahmoody, Idan Mehalel, Shay Moran

Learning With Little Mixing
Ingvar Ziemann, Stephen Tu

Block-Recurrent Transformers
DeLesley Hutchins, Imanol Schlag*, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur

TabNAS: Rejection Sampling for Neural Architecture Search on Tabular Datasets
Chengrun Yang, Gabriel Bender, Hanxiao Liu, Pieter-Jan Kindermans, Madeleine Udell, Yifeng Lu, Quoc Le, Da Huang

Regret Bounds for Multilabel Classification in Sparse Label Regimes
Robert Busa-Fekete, Heejin Choi, Krzysztof Dembczynski, Claudio Gentile, Henry William Reeve, Balazs Szorenyi

Robust Reinforcement Learning Using Offline Data
Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

Contrastive Learning as Goal-Conditioned Reinforcement Learning
Benjamin Eysenbach, Tianjun Zhang, Sergey Levine, Ruslan Salakhutdinov

Beyond Rewards: A Hierarchical Perspective on Offline Multiagent Behavioral Analysis
Shayegan Omidshafiei, Andrei Kapishnikov, Yannick Assogba, Lucas Dixon, Been Kim

Revisiting Neural Scaling Laws in Language and Vision
Ibrahim Alabdulmohsin, Behnam Neyshabur, Xiaohua Zhai

Polynomial Neural Fields for Subband Decomposition and Manipulation
Guandao Yang*, Sagie Benaim, Varun Jampani, Kyle Genova, Jonathan T. Barron, Thomas Funkhouser, Bharath Hariharan, Serge Belongie

First Is Better Than Last for Language Data Influence
Chih-Kuan Yeh, Ankur Taly, Mukund Sundararajan, Frederick Liu, Pradeep Ravikumar

The Privacy Onion Effect: Memorization Is Relative
Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, Florian Tramer

Deep Hierarchical Planning from Pixels (see blog post)
Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel

Discovered Policy Optimisation
Chris Lu, Jakub Grudzien Kuba, Alistair Letcher, Luke Metz, Christian Schroeder de Witt, Jakob Foerster

Semi-supervised Active Linear Regression
Fnu Devvrit, Nived Rajaraman, Pranjal Awasthi

Pruning’s Effect on Generalization Through the Lens of Training and Regularization
Tian Jin, Daniel M. Roy, Michael Carbin, Jonathan Frankle, Gintare Karolina Dziugaite

Exploring Length Generalization in Large Language Models
Cem Anil*, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm Under Parallelization
Benjamin Dubois-Taine, Francis Bach, Quentin Berthet, Adrien Taylor

Global Normalization for Streaming Speech Recognition in a Modular Framework
Ehsan Variani, Ke Wu, Michael Riley, David Rybach, Matt Shannon, Cyril Allauzen

Learning Predictions for Algorithms with Predictions
Mikhail Khodak, Maria-Florina Balcan, Ameet Talwalkar, Sergei Vassilvitskii

Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts (see blog post)
Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, Neil Houlsby

Incrementality Bidding via Reinforcement Learning Under Mixed and Delayed Rewards
Ashwinkumar Badanidiyuru, Zhe Feng, Tianxi Li, Haifeng Xu*

Solving Quantitative Reasoning Problems with Language Models (see blog post)
Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra

Anonymized Histograms in Intermediate Privacy Models
Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi

Efficient and Stable Fully Dynamic Facility Location
Sayan Bhattacharya, Nikos Parotsidis, Silvio Lattanzi

Are All Losses Created Equal: A Neural Collapse Perspective
Jinxin Zhou, Chong You, Xiao Li, Kangning Liu, Sheng Liu, Qing Qu, Zhihui Zhu

Universal Rates for Interactive Learning
Steve Hanneke, Amin Karbasi, Shay Moran, Grigoris Velegkas

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

Multiclass Learnability Beyond the PAC Framework: Universal Rates and Partial Concept Classes
Alkis Kalavasis, Grigoris Velegkas, Amin Karbasi

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Cenk Baykal, Nishanth Dikkala, Rina Panigrahy, Cyrus Rashtchian, Xin Wang

Pre-trained Language Models for Interactive Decision-Making
Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

Polynomial Neural Fields for Subband Decomposition and Manipulation
Guandao Yang*, Sagie Benaim, Varun Jampani, Kyle Genova, Jonathan T. Barron, Thomas Funkhouser, Bharath Hariharan, Serge Belongie

Submodular Maximization in Clean Linear Time
Wenxin Li, Moran Feldman, Ehsan Kazemi, Amin Karbasi

Reinforcement Learning with Logarithmic Regret and Policy Switches
Grigoris Velegkas, Zhuoran Yang, Amin Karbasi

Algorithms with Prediction Portfolios
Michael Dinitz, Sungjin Im, Thomas Lavastida, Benjamin Moseley, Sergei Vassilvitskii

Understanding and Improving Robustness of Vision Transformers Through Patch-Based Negative Augmentation
Yao Qin, Chiyuan Zhang, Ting Chen, Balaji Lakshminarayanan, Alex Beutel, Xuezhi Wang

Best of Both Worlds Model Selection
Aldo Pacchiano, Christoph Dann, Claudio Gentile

Fair Wrapping for Black-Box Predictions
Alexander Soen, Ibrahim Alabdulmohsin, Sanmi Koyejo, Yishay Mansour, Nyalleng Moorosi, Richard Nock, Ke Sun, Lexing Xie

A Reduction to Binary Approach for Debiasing Multiclass Datasets
Ibrahim Alabdulmohsin, Jessica Schrouff, Oluwasanmi Koyejo

Weighted Distillation with Unlabeled Examples
Fotis Iliopoulos, Vasilis Kontonis, Cenk Baykal, Gaurav Menghani, Khoa Trihn,Erik Vee

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases
James Harrison, Luke Metz, Jascha Sohl-Dickstein

Post-hoc Estimators for Learning to Defer to an Expert
Harikrishna Narasimhan, Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Model-Based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity
Alekh Agarwal, Tong Zhang

On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL
Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal

Towards Learning Universal Hyperparameter Optimizers with Transformers (see blog post)
Yutian Chen, Xingyou Song, Chansoo Lee, Zi Wang, Qiuyi Zhang, David Dohan, Kazuya Kawakami, Greg Kochanski, Arnaud Doucet, Marc’aurelio Ranzato, Sagi Perel, Nando de Freitas

Reproducibility in Optimization: Theoretical Framework and Limits
Kwangjun Ahn*, Prateek Jain, Ziwei Ji, Satyen Kale, Praneeth Netrapalli, Gil I. Shamir

Confident Adaptive Language Modeling
Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler

Reinforcement Learning with Neural Radiance Fields
Danny Driess, Ingmar Schubert, Pete Florence, Yunzhu Li, Marc Toussaint

Invariant and Transportable Representations for Anti-Causal Domain Shifts
Yibo Jiang, Victor Veitch

Simple Mechanisms for Welfare Maximization in Rich Advertising Auctions
Gagan Aggarwal, Kshipra Bhawalkar, Aranyak Mehta, Divyarthi Mohan, Alexandros Psomas

STaR: Bootstrapping Reasoning with Reasoning
Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman

Stochastic Online Learning with Feedback Graphs: Finite-Time and Asymptotic Optimality
Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

The Curse of Unrolling: Rate of Differentiating Through Optimization
Damien Scieur, Quentin Bertrand, Gauthier Gidel, Fabian Pedregosa

Visual Prompting via Image Inpainting
Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A Efros

Multi-Class H-Consistency Bounds
Pranjal Awasthi, Anqi Mao, Mehryar Mohri, Yutao Zhong

Anonymous Bandits for Multi-User Systems
Hossein Esfandiari, Vahab Mirrokni, Jon Schneider

Understanding the Eluder Dimension
Gene Li, Pritish Kamath, Dylan J. Foster, Nathan Srebro

Why So Pessimistic? Estimating Uncertainties for Offline RL Through Ensembles, and Why Their Independence Matters
Seyed Kamyar Seyed Ghasemipour, Shixiang Shane Gu, Ofir Nachum

A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback
Saeed Masoudian, Julian Zimmert, Yevgeny Seldin

A Theoretical View on Sparsely Activated Networks
Cenk Baykal, Nishanth Dikkala, Rina Panigrahy, Cyrus Rashtchian, Xin Wang

Chain of Thought Prompting Elicits Reasoning in Large Language Models (see blog post)
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

Decoupled Context Processing for Context Augmented Language Modeling
Zonglin Li, Ruiqi Guo, Sanjiv Kumar

Exploring Through Random Curiosity with General Value Functions
Aditya Ramesh, Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber

Object Scene Representation Transformer
Mehdi S. M. Sajjadi, Daniel Duckworth, Aravindh Mahendran, Sjoerd van Steenkiste, Filip Pavetić, Mario Lučić, Leonidas J. Guibas, Klaus Greff, Thomas Kipf

Joint Model-Policy Optimization of a Lower Bound for Model-Based RL
Benjamin Eysenbach, Alexander Khazatsky, Sergey Levine, Ruslan Salakhutdinov

A Fourier Approach to Mixture Learning
Mingda Qiao*, Guru Guruganesh, Ankit Singh Rawat, Avinava Dubey, Manzil Zaheer

Why Neural Networks Find Simple Solutions: The Many Regularizers of Geometric Complexity
Benoit Dherin, Michael Munn, Mihaela Rosca, David Barrett

Do Current Multi-task Optimization Methods in Deep Learning Even Help?
Derrick Xin, Behrooz Ghorbani, Ankush Garg, Orhan Firat, Justin Gilmer

Associating Objects and Their Effects in Video Through Coordination Games
Erika Lu, Forrester Cole, Weidi Xie, Tali Dekel, William Freeman, Andrew Zisserman, Michael Rubinstein

Increasing Confidence in Adversarial Robustness Evaluations
Roland S. Zimmermann*, Wieland Brendel, Florian Tramèr, Nicholas Carlini

The Role of Baselines in Policy Gradient Optimization
Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Scaling Multimodal Pre-training via Cross-Modality Gradient Harmonization
Junru Wu, Yi Liang, Feng Han, Hassan Akbari, Zhangyang Wang, Cong Yu*

S3GC: Scalable Self-Supervised Graph Clustering
Fnu Devvrit*, Aditya Sinha, Inderjit Dhillon, Prateek Jain

Algorithms and Hardness for Learning Linear Thresholds from Label Proportions
Rishi Saket

ALMA: Hierarchical Learning for Composite Multi-Agent Tasks
Shariq Iqbal, Robby Costales, Fei Sha

DC-BENCH: Dataset Condensation Benchmark
Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh

Does GNN Pre-training Help Molecular Representation?
Ruoxi Sun, Hanjun Dai, Adams Yu

Drawing Out of Distribution with Neuro-Symbolic Generative Models
Yichao Liang, Joshua B. Tenenbaum, Tuan Anh Le, N. Siddharth

Mixture-of-Experts with Expert Choice Routing (see blog post)
Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew Dai, Zhifeng Chen, Quoc Le, James Laudon

Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, Aviv Rosenberg

Precise Learning Curves and Higher-Order Scalings for Dot-Product Kernel Regression
Lechao Xiao, Jeffrey Pennington, Theodor Misiakiewicz, Hong Hu, Yue Lu

Rate-Optimal Online Convex Optimization in Adaptive Linear Control
Asaf Cassel, Alon Cohen, Tomer Koren

Why Neural Networks Find Simple Solutions: The Many Regularizers of Geometric Complexity
Benoit Dherin, Michael Munn, Mihaela Rosca, David G.T. Barrett

Private Isotonic Regression
Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi

Sketching Based Representations for Robust Image Classification with Provable Guarantees
Nishanth Dikkala, Sankeerth Rao Karingula, Raghu Meka, Jelani Nelson, Rina Panigrahy, Xin Wang

The Role of Baselines in Policy Gradient Optimization
Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
Elad Ben Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

Near-Optimal Private and Scalable k-Clustering
Vincent Cohen-Addad, Alessandro Epasto, Vahab Mirrokni, Shyam Narayanan*, Peilin Zhong

When Does Differentially Private Learning Not Suffer in High Dimensions?
Xuechen Li, Daogao Liu, Tatsunori Hashimoto, Huseyin A Inan, Janardhan Kulkarni, YinTat Lee, Abhradeep Guha Thakurta

End-to-End Learning to Index and Search in Large Output Spaces
Nilesh Gupta, Patrick H. Chen, Hsiang-Fu, Yu, Cho-Jui Hsieh, Inderjit S. Dhillon

A Boosting Approach to Reinforcement Learning
Nataly Brukhim, Elad Hazan, Karan Singh

FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction
Samiul Alam, Luyang Liu, Ming Yan, Mi Zhang

Non-Convex Online Learning via Algorithmic Equivalence
Udaya Ghai, Zhou Lu, Elad Hazan

Is this the Right Neighborhood? Accurate and Query Efficient Model Agnostic Explanations
Amit Dhurandhar, Karthikeyan Natesan Ramamurthy, Karthikeyan Shanmugam

SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos
Gamaleldin F. Elsayed, Aravindh Mahendran, Sjoerd van Steenkiste, Klaus Greff, Michael C. Mozer, Thomas Kipf

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
Alexander Kolesnikov, André Susano Pinto, Lucas Beyer, Xiaohua Zhai, Jeremiah Harmsen, Neil Houlsby

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions
Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

Multi-game Decision Transformers (see blog post)
Kuang-Huei Lee, Ofir Nachum, Mengjiao Yang, Lisa Lee, Daniel Freeman, Winnie Xu, Sergio Guadarrama, Ian Fischer, Eric Jang, Henryk Michalewski, Igor Mordatch

Subsidiary Prototype Alignment for Universal Domain Adaptation
Jogendra Nath Kundu, Suvaansh Bhambri, Akshay Ravindra Kulkarni, Hiran Sarkar, Varun Jampani, Venkatesh Babu Radhakrishnan

SAMURAI: Shape And Material from Unconstrained Real-world Arbitrary Image collections
Mark Boss*, Andreas Engelhardt*, Abhishek Kar, Yuanzhen Li, Deqing Sun, Jonathan T. Barron, Hendrik P. A. Lensch, Varun Jampani

Chefs’ Random Tables: Non-Trigonometric Random Features
Valerii Likhosherstov, Krzysztof Marcin Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks
Mansheej Paul, Brett W Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite

DP-PCA: Statistically Optimal and Differentially Private PCA
Xiyang Liu, Weihao Kong, Prateek Jain, Sewoong Oh

Emergent Communication: Generalization and Overfitting in Lewis Games
Mathieu Rita, Corentin Tallec, Paul Michel, Jean-Bastien Grill, Olivier Pietquin, Emmanuel Dupoux, Florian Strub

Handcrafted Backdoors in Deep Neural Networks
Sanghyun Hong, Nicholas Carlini, Alexey Kurakin

I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification
Muhammad Ferjad Naeem, Yongqin Xian, Luc Van Gool, Federico Tombari

Improved Differential Privacy for SGD via Optimal Private Linear Operators on Adaptive Streams
Sergey Denisov, Brendan McMahan, Keith Rush, Adam Smith, Abhradeep Guha Thakurta

Optimal Scaling for Locally Balanced Proposals in Discrete Spaces
Haoran Sun*, Hanjun Dai, Dale Schuurmans

Near-Optimal Correlation Clustering with Privacy
Vincent Cohen-Addad, Chenglin Fan, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski

Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
Albert Q. Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, Mateja Jamnik

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s
Felix Chern, Blake Hechtman, Andy Davis, Ruiqi Guo, David Majnemer, Sanjiv Kumar

When Does Dough Become a Bagel? Analyzing the Remaining Mistakes on ImageNet
Vijay Vasudevan, Benjamin Caine, Raphael Gontijo-Lopes, Sara Fridovich-Keil, Rebecca Roelofs

DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning
Quan Vuong, Aviral Kumar, Sergey Levine, Yevgen Chebotar

A Characterization of Semi-Supervised Adversarially Robust PAC Learnability
Idan Attias, Steve Hanneke, Yishay Mansour

Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation
Ziyu Jiang, Xuxi Chen, Xueqin Huang, Xianzhi Du, Denny Zhou, Zhangyang Wang

Subquadratic Kronecker Regression with Applications to Tensor Decomposition
Matthew Fahrbach, Gang Fu, Mehrdad Ghadiri

Zero-Shot Transfer Learning Within a Heterogeneous Graph via Knowledge Transfer Networks
Minji Yoon*, John Palowitch, Dustin Zelle, Ziniu Hu*, Ruslan Salakhutdinov, Bryan Perozzi

Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank
Alessandro Epasto, Vahab Mirrokni, Bryan Perozzi, Anton Tsitsulin, Peilin Zhong

Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress (see blog post)
Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

Private and Communication-Efficient Algorithms for Entropy Estimation
Gecia Bravo-Hermsdorff, Robert Busa-Fekete, Mohammad Ghavamzadeh, Andres Munoz Medina, Umar Syed

Oracle Inequalities for Model Selection in Offline Reinforcement Learning
Jonathan Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill

Diagnosing Failures of Fairness Transfer Across Distribution Shift in Real-World Medical Settings
Jessica Schrouff*, Natalie Harris, Oluwasanmi O Koyejo, Ibrahim Alabdulmohsin, Eva Schnider*, Krista Opsahl-Ong, Alexander Brown, Subhrajit Roy, Diana Mincu, Christina Chen, Awa Dieng, Yuan Liu, Vivek Natarajan, Alan Karthikesalingam, Katherine A Heller, Silvia Chiappa, Alexander D’Amour

LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery
Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani

Patching Open-Vocabulary Models by Interpolating Weights
Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt

TUSK: Task-Agnostic Unsupervised Keypoints
Yuhe Jin, Weiwei Sun, Jan Hosang, Eduard Trulls, Kwang Moo Yi

Active Learning of Classifiers with Label and Seed Queries
Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice, Maximilian Thiessen

Autoformalization with Large Language Models
Yuhuai Wu, Albert Q. Jiang, Wenda Li, Markus N. Rabe, Charles Staats, Mateja Jamnik, Christian Szegedy

Benign Underfitting of Stochastic Gradient Descent
Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

Chain of Thought Imitation with Procedure Cloning
Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

Efficient and Modular Implicit Differentiation
Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert

Insights into Pre-training via Simpler Synthetic Tasks
Yuhuai Wu, Felix Li, Percy Liang

Self-Supervised Learning with an Information Maximization Criterion
Serdar Ozsoy, Shadi Hamdan, Sercan Ö. Arik, Deniz Yuret, Alper T. Erdogan

Trimmed Maximum Likelihood Estimation for Robust Generalized Linear Model
Weihao Kong, Rajat Sen, Pranjal Awasthi, Abhimanyu Das

Using Embeddings for Causal Estimation of Peer Influence in Social Networks
Irina Cristali, Victor Veitch

VCT: A Video Compression Transformer
Fabian Mentzer, George Toderici, David Minnen, Sung-Jin Hwang, Sergi Caelles, Mario Lucic, Eirikur Agustsson

Video Diffusion Models
Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet

Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa

Improved Coresets for Euclidean k-Means
Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, Omar Ali Sheikh-Omar

On the Adversarial Robustness of Mixture of Experts
Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme Ruiz, Pranjal Awasthi, Srinadh Bhojanapalli

Stars: Tera-Scale Graph Building for Clustering and Learning
CJ Carey, Jonathan Halcrow, Rajesh Jayaram, Vahab Mirrokni, Warren Schudy, Peilin Zhong

VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement
Erik Wijmans, Irfan Essa, Dhruv Batra

TaSIL: Taylor Series Imitation Learning
Daniel Pfrommer, Thomas TCK Zhang, Stephen Tu, Nikolai Matni

RNNs of RNNs: Recursive Construction of Stable Assemblies of Recurrent Neural Networks
Leo Kozachkov, Michaela M Ennis, Jean-Jacques Slotine

Integral Probability Metrics PAC-Bayes Bounds
Ron Amit, Baruch Epstein, Shay Moran, Ron Meir

D2NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video
Tianhao Wu, Fangcheng Zhong, Andrea Tagliasacchi, Forrester Cole, Cengiz Oztireli

Posted Pricing and Dynamic Prior-Independent Mechanisms with Value Maximizers
Yuan Deng, Vahab Mirrokni, Hanrui Zhang

Transformer Memory as a Differentiable Search Index
Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler



*Work done while at Google.  

Read More