Leveraging Large Language Models for Exploiting ASR Uncertainty

With the help of creative prompt engineering and in-context learning, large language models (LLMs) are known to generalize well on a variety of text-based natural language processing (NLP) tasks. However, for performing well on spoken language understanding (SLU) tasks, LLMs either need to be equipped with in-built speech modality or they need to rely on speech-to-text conversion from an off-the-shelf automation speech recognition (ASR) system. In this work, we focus on the latter setup where the accuracy of LLM on SLU tasks is constrained by the accuracy of a frozen ASR system on the given…Apple Machine Learning Research

Training Large-Vocabulary Neural Language Model by Private Federated Learning for Resource-Constrained Devices

*= Equal Contributors
Federated Learning (FL) is a technique to train models using data distributed across devices. Differential Privacy (DP) provides a formal privacy guarantee for sensitive data. Our goal is to train a large neural network language model (NNLM) on compute-constrained devices while preserving privacy using FL and DP. However, the DP-noise introduced to the model increases as the model size grows, which often prevents convergence. We propose Partial Embedding Updates (PEU), a novel technique to decrease noise by decreasing payload size. Furthermore, we adopt Low Rank…Apple Machine Learning Research

Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR

In this paper, we start by training End-to-End Automatic Speech Recognition (ASR) models using Federated Learning (FL) and examining the fundamental considerations that can be pivotal in minimizing the performance gap in terms of word error rate between models trained using FL versus their centralized counterpart. Specifically, we study the effect of (i) adaptive optimizers, (ii) loss characteristics via altering Connectionist Temporal Classification (CTC) weight, (iii) model initialization through seed start, (iv) carrying over modeling setup from experiences in centralized training to FL…Apple Machine Learning Research

Bootstrap Your Own Variance

This paper was accepted at the workshop Self-Supervised Learning – Theory and Practice at NeurIPS 2023.
*=Equal Contributors
Understanding model uncertainty is important for many applications. We propose Bootstrap Your Own Variance (BYOV), combining Bootstrap Your Own Latent (BYOL), a negative-free Self-Supervised Learning (SSL) algorithm, with Bayes by Backprop (BBB), a Bayesian method for estimating model posteriors. We find that the learned predictive std of BYOV vs. a supervised BBB model is well captured by a Gaussian distribution, providing preliminary evidence that the learned parameter…Apple Machine Learning Research

DataComp: In Search of the Next Generation of Multimodal Datasets

*=Equal Contributors
Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training…Apple Machine Learning Research

Advancements in machine learning for machine learning

Advancements in machine learning for machine learning

With the recent and accelerated advances in machine learning (ML), machines can understand natural language, engage in conversations, draw images, create videos and more. Modern ML models are programmed and trained using ML programming frameworks, such as TensorFlow, JAX, PyTorch, among many others. These libraries provide high-level instructions to ML practitioners, such as linear algebra operations (e.g., matrix multiplication, convolution, etc.) and neural network layers (e.g., 2D convolution layers, transformer layers). Importantly, practitioners need not worry about how to make their models run efficiently on hardware because an ML framework will automatically optimize the user’s model through an underlying compiler. The efficiency of the ML workload, thus, depends on how good the compiler is. A compiler typically relies on heuristics to solve complex optimization problems, often resulting in suboptimal performance.

In this blog post, we present exciting advancements in ML for ML. In particular, we show how we use ML to improve efficiency of ML workloads! Prior works, both internal and external, have shown that we can use ML to improve performance of ML programs by selecting better ML compiler decisions. Although there exist a few datasets for program performance prediction, they target small sub-programs, such as basic blocks or kernels. We introduce “TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs” (presented at NeurIPS 2023), which we recently released to fuel more research in ML for program optimization. We hosted a Kaggle competition on the dataset, which recently completed with 792 participants on 616 teams from 66 countries. Furthermore, in “Learning Large Graph Property Prediction via Graph Segment Training”, we cover a novel method to scale graph neural network (GNN) training to handle large programs represented as graphs. The technique both enables training arbitrarily large graphs on a device with limited memory capacity and improves generalization of the model.

ML compilers

ML compilers are software routines that convert user-written programs (here, mathematical instructions provided by libraries such as TensorFlow) to executables (instructions to execute on the actual hardware). An ML program can be represented as a computation graph, where a node represents a tensor operation (such as matrix multiplication), and an edge represents a tensor flowing from one node to another. ML compilers have to solve many complex optimization problems, including graph-level and kernel-level optimizations. A graph-level optimization requires the context of the entire graph to make optimal decisions and transforms the entire graph accordingly. A kernel-level optimization transforms one kernel (a fused subgraph) at a time, independently of other kernels.

Important optimizations in ML compilers include graph-level and kernel-level optimizations.

To provide a concrete example, imagine a matrix (2D tensor):

It can be stored in computer memory as [A B C a b c] or [A a B b C c], known as row- and column-major memory layout, respectively. One important ML compiler optimization is to assign memory layouts to all intermediate tensors in the program. The figure below shows two different layout configurations for the same program. Let’s assume that on the left-hand side, the assigned layouts (in red) are the most efficient option for each individual operator. However, this layout configuration requires the compiler to insert a copy operation to transform the memory layout between the add and convolution operations. On the other hand, the right-hand side configuration might be less efficient for each individual operator, but it doesn’t require the additional memory transformation. The layout assignment optimization has to trade off between local computation efficiency and layout transformation overhead.

A node represents a tensor operator, annotated with its output tensor shape [n0, n1, …], where ni is the size of dimension i. Layout {d0, d1, …} represents minor-to-major ordering in memory. Applied configurations are highlighted in red, and other valid configurations are highlighted in blue. A layout configuration specifies the layouts of inputs and outputs of influential operators (i.e., convolution and reshape). A copy operator is inserted when there is a layout mismatch.

If the compiler makes optimal choices, significant speedups can be made. For example, we have seen up to a 32% speedup when choosing an optimal layout configuration over the default compiler’s configuration in the XLA benchmark suite.

TpuGraphs dataset

Given the above, we aim to improve ML model efficiency by improving the ML compiler. Specifically, it can be very effective to equip the compiler with a learned cost model that takes in an input program and compiler configuration and then outputs the predicted runtime of the program.

With this motivation, we release TpuGraphs, a dataset for learning cost models for programs running on Google’s custom Tensor Processing Units (TPUs). The dataset targets two XLA compiler configurations: layout (generalization of row- and column-major ordering, from matrices, to higher dimension tensors) and tiling (configurations of tile sizes). We provide download instructions and starter code on the TpuGraphs GitHub. Each example in the dataset contains a computational graph of an ML workload, a compilation configuration, and the execution time of the graph when compiled with the configuration. The graphs in the dataset are collected from open-source ML programs, featuring popular model architectures, e.g., ResNet, EfficientNet, Mask R-CNN, and Transformer. The dataset provides 25× more graphs than the largest (earlier) graph property prediction dataset (with comparable graph sizes), and graph size is 770× larger on average compared to existing performance prediction datasets on ML programs. With this greatly expanded scale, for the first time we can explore the graph-level prediction task on large graphs, which is subject to challenges such as scalability, training efficiency, and model quality.

Scale of TpuGraphs compared to other graph property prediction datasets.

We provide baseline learned cost models with our dataset (architecture shown below). Our baseline models are based on a GNN since the input program is represented as a graph. Node features, shown in blue below, consist of two parts. The first part is an opcode id, the most important information of a node, which indicates the type of tensor operation. Our baseline models, thus, map an opcode id to an opcode embedding via an embedding lookup table. The opcode embedding is then concatenated with the second part, the rest of the node features, as inputs to a GNN. We combine the node embeddings produced by the GNN to create the fixed-size embedding of the graph using a simple graph pooling reduction (i.e., sum and mean). The resulting graph embedding is then linearly transformed into the final scalar output by a feedforward layer.

Our baseline learned cost model employs a GNN since programs can be naturally represented as graphs.

Furthermore we present Graph Segment Training (GST), a method for scaling GNN training to handle large graphs on a device with limited memory capacity in cases where the prediction task is on the entire-graph (i.e., graph-level prediction). Unlike scaling training for node- or edge-level prediction, scaling for graph-level prediction is understudied but crucial to our domain, as computation graphs can contain hundreds of thousands of nodes. In a typical GNN training (“Full Graph Training”, on the left below), a GNN model is trained using an entire graph, meaning all nodes and edges of the graph are used to compute gradients. For large graphs, this might be computationally infeasible. In GST, each large graph is partitioned into smaller segments, and a random subset of segments is selected to update the model; embeddings for the remaining segments are produced without saving their intermediate activations (to avoid consuming memory). The embeddings of all segments are then combined to generate an embedding for the original large graph, which is then used for prediction. In addition, we introduce the historical embedding table to efficiently obtain graph segments’ embeddings and segment dropout to mitigate the staleness from historical embeddings. Together, our complete method speeds up the end-to-end training time by 3×.

Comparing Full Graph Training (typical method) vs Graph Segment Training (our proposed method).

Kaggle competition

Finally, we ran the “Fast or Slow? Predict AI Model Runtime” competition over the TpuGraph dataset. This competition ended with 792 participants on 616 teams. We had 10507 submissions from 66 countries. For 153 users (including 47 in the top 100), this was their first competition. We learned many interesting new techniques employed by the participating teams, such as:

  • Graph pruning / compression: Instead of using the GST method, many teams experimented with different ways to compress large graphs (e.g., keeping only subgraphs that include the configurable nodes and their immediate neighbors).
  • Feature padding value: Some teams observed that the default padding value of 0 is problematic because 0 clashes with a valid feature value, so using a padding value of -1 can improve the model accuracy significantly.
  • Node features: Some teams observed that additional node features (such as dot general’s contracting dimensions) are important. A few teams found that different encodings of node features also matter.
  • Cross-configuration attention: A winning team designed a simple layer that allows the model to explicitly “compare” configs against each other. This technique is shown to be much better than letting the model infer for each config individually.

We will debrief the competition and preview the winning solutions at the competition session at the ML for Systems workshop at NeurIPS on December 16, 2023. Finally, congratulations to all the winners and thank you for your contributions to advancing research in ML for systems!

NeurIPS expo

If you are interested in more research about structured data and artificial intelligence, we hosted the NeurIPS Expo panel Graph Learning Meets Artificial Intelligence on December 9, which covered advancing learned cost models and more!

Acknowledgements

Sami Abu-el-Haija (Google Research) contributed significantly to this work and write-up. The research in this post describes joint work with many additional collaborators including Mike Burrows, Kaidi Cao, Bahare Fatemi, Jure Leskovec, Charith Mendis, Dustin Zelle, and Yanqi Zhou.

Read More

StyleDrop: Text-to-image generation in any style

StyleDrop: Text-to-image generation in any style

Text-to-image models trained on large volumes of image-text pairs have enabled the creation of rich and diverse images encompassing many genres and themes. Moreover, popular styles such as “anime” or “steampunk”, when added to the input text prompt, may translate to specific visual outputs. While many efforts have been put into prompt engineering, a wide range of styles are simply hard to describe in text form due to the nuances of color schemes, illumination, and other characteristics. As an example, “watercolor painting” may refer to various styles, and using a text prompt that simply says “watercolor painting style” may either result in one specific style or an unpredictable mix of several.

When we refer to “watercolor painting style,” which do we mean? Instead of specifying the style in natural language, StyleDrop allows the generation of images that are consistent in style by referring to a style reference image*.

In this blog we introduce “StyleDrop: Text-to-Image Generation in Any Style”, a tool that allows a significantly higher level of stylized text-to-image synthesis. Instead of seeking text prompts to describe the style, StyleDrop uses one or more style reference images that describe the style for text-to-image generation. By doing so, StyleDrop enables the generation of images in a style consistent with the reference, while effectively circumventing the burden of text prompt engineering. This is done by efficiently fine-tuning the pre-trained text-to-image generation models via adapter tuning on a few style reference images. Moreover, by iteratively fine-tuning the StyleDrop on a set of images it generated, it achieves the style-consistent image generation from text prompts.

Method overview

StyleDrop is a text-to-image generation model that allows generation of images whose visual styles are consistent with the user-provided style reference images. This is achieved by a couple of iterations of parameter-efficient fine-tuning of pre-trained text-to-image generation models. Specifically, we build StyleDrop on Muse, a text-to-image generative vision transformer.

Muse: text-to-image generative vision transformer

Muse is a state-of-the-art text-to-image generation model based on the masked generative image transformer (MaskGIT). Unlike diffusion models, such as Imagen or Stable Diffusion, Muse represents an image as a sequence of discrete tokens and models their distribution using a transformer architecture. Compared to diffusion models, Muse is known to be faster while achieving competitive generation quality.

Parameter-efficient adapter tuning

StyleDrop is built by fine-tuning the pre-trained Muse model on a few style reference images and their corresponding text prompts. There have been many works on parameter-efficient fine-tuning of transformers, including prompt tuning and Low-Rank Adaptation (LoRA) of large language models. Among those, we opt for adapter tuning, which is shown to be effective at fine-tuning a large transformer network for language and image generation tasks in a parameter-efficient manner. For example, it introduces less than one million trainable parameters to fine-tune a Muse model of 3B parameters, and it requires only 1000 training steps to converge.

Parameter-efficient adapter tuning of Muse.

Iterative training with feedback

While StyleDrop is effective at learning styles from a few style reference images, it is still challenging to learn from a single style reference image. This is because the model may not effectively disentangle the content (i.e., what is in the image) and the style (i.e., how it is being presented), leading to reduced text controllability in generation. For example, as shown below in Step 1 and 2, a generated image of a chihuahua from StyleDrop trained from a single style reference image shows a leakage of content (i.e., the house) from the style reference image. Furthermore, a generated image of a temple looks too similar to the house in the reference image (concept collapse).

We address this issue by training a new StyleDrop model on a subset of synthetic images, chosen by the user or by image-text alignment models (e.g., CLIP), whose images are generated by the first round of the StyleDrop model trained on a single image. By training on multiple synthetic image-text aligned images, the model can easily disentangle the style from the content, thus achieving improved image-text alignment.

Iterative training with feedback*. The first round of StyleDrop may result in reduced text controllability, such as a content leakage or concept collapse, due to the difficulty of content-style disentanglement. Iterative training using synthetic images, generated by the previous rounds of StyleDrop models and chosen by human or image-text alignment models, improves the text adherence of stylized text-to-image generation.

Experiments

StyleDrop gallery

We show the effectiveness of StyleDrop by running experiments on 24 distinct style reference images. As shown below, the images generated by StyleDrop are highly consistent in style with each other and with the style reference image, while depicting various contexts, such as a baby penguin, banana, piano, etc. Moreover, the model can render alphabet images with a consistent style.

Stylized text-to-image generation. Style reference images* are on the left inside the yellow box.
Text prompts used are:

First row: a baby penguin, a banana, a bench.
Second row: a butterfly, an F1 race car, a Christmas tree.
Third row: a coffee maker, a hat, a moose.
Fourth row: a robot, a towel, a wood cabin.
Stylized visual character generation. Style reference images* are on the left inside the yellow box.
Text prompts used are: (first row) letter ‘A’, letter ‘B’, letter ‘C’, (second row) letter ‘E’, letter ‘F’, letter ‘G’.

Generating images of my object in my style

Below we show generated images by sampling from two personalized generation distributions, one for an object and another for the style.

Images at the top in the blue border are object reference images from the DreamBooth dataset (teapot, vase, dog and cat), and the image on the left at the bottom in the red border is the style reference image*. Images in the purple border (i.e. the four lower right images) are generated from the style image of the specific object.

Quantitative results

For the quantitative evaluation, we synthesize images from a subset of Parti prompts and measure the image-to-image CLIP score for style consistency and image-to-text CLIP score for text consistency. We study non–fine-tuned models of Muse and Imagen. Among fine-tuned models, we make a comparison to DreamBooth on Imagen, state-of-the-art personalized text-to-image method for subjects. We show two versions of StyleDrop, one trained from a single style reference image, and another, “StyleDrop (HF)”, that is trained iteratively using synthetic images with human feedback as described above. As shown below, StyleDrop (HF) shows significantly improved style consistency score over its non–fine-tuned counterpart (0.694 vs. 0.556), as well as DreamBooth on Imagen (0.694 vs. 0.644). We observe an improved text consistency score with StyleDrop (HF) over StyleDrop (0.322 vs. 0.313). In addition, in a human preference study between DreamBooth on Imagen and StyleDrop on Muse, we found that 86% of the human raters preferred StyleDrop on Muse over DreamBooth on Imagen in terms of consistency to the style reference image.

Conclusion

StyleDrop achieves style consistency at text-to-image generation using a few style reference images. Google’s AI Principles guided our development of Style Drop, and we urge the responsible use of the technology. StyleDrop was adapted to create a custom style model in Vertex AI, and we believe it could be a helpful tool for art directors and graphic designers — who might want to brainstorm or prototype visual assets in their own styles, to improve their productivity and boost their creativity — or businesses that want to generate new media assets that reflect a particular brand. As with other generative AI capabilities, we recommend that practitioners ensure they align with copyrights of any media assets they use. More results are found on our project website and YouTube video.

Acknowledgements

This research was conducted by Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, and Dilip Krishnan. We thank owners of images used in our experiments (links for attribution) for sharing their valuable assets.


*See image sources 

Read More

Use Amazon DocumentDB to build no-code machine learning solutions in Amazon SageMaker Canvas

Use Amazon DocumentDB to build no-code machine learning solutions in Amazon SageMaker Canvas

We are excited to announce the launch of Amazon DocumentDB (with MongoDB compatibility) integration with Amazon SageMaker Canvas, allowing Amazon DocumentDB customers to build and use generative AI and machine learning (ML) solutions without writing code. Amazon DocumentDB is a fully managed native JSON document database that makes it straightforward and cost-effective to operate critical document workloads at virtually any scale without managing infrastructure. Amazon SageMaker Canvas is a no-code ML workspace offering ready-to-use models, including foundation models, and the ability to prepare data and build and deploy custom models.

In this post, we discuss how to bring data stored in Amazon DocumentDB into SageMaker Canvas and use that data to build ML models for predictive analytics. Without creating and maintaining data pipelines, you will be able to power ML models with your unstructured data stored in Amazon DocumentDB.

Solution overview

Let’s assume the role of a business analyst for a food delivery company. Your mobile app stores information about restaurants in Amazon DocumentDB because of its scalability and flexible schema capabilities. You want to gather insights on this data and build an ML model to predict how new restaurants will be rated, but find it challenging to perform analytics on unstructured data. You encounter bottlenecks because you need to rely on data engineering and data science teams to accomplish these goals.

This new integration solves these problems by making it simple to bring Amazon DocumentDB data into SageMaker Canvas and immediately start preparing and analyzing data for ML. Additionally, SageMaker Canvas removes the dependency on ML expertise to build high-quality models and generate predictions.

We demonstrate how to use Amazon DocumentDB data to build ML models in SageMaker Canvas in the following steps:

  1. Create an Amazon DocumentDB connector in SageMaker Canvas.
  2. Analyze data using generative AI.
  3. Prepare data for machine learning.
  4. Build a model and generate predictions.

Prerequisites

To implement this solution, complete the following prerequisites:

  1. Have AWS Cloud admin access with an AWS Identity and Access Management (IAM) user with permissions required to complete the integration.
  2. Complete the environment setup using AWS CloudFormation through either of the following options:
    1. Deploy a CloudFormation template into a new VPC – This option builds a new AWS environment that consists of the VPC, private subnets, security groups, IAM execution roles, Amazon Cloud9, required VPC endpoints, and SageMaker domain. It then deploys Amazon DocumentDB into this new VPC. Download the template or quick launch the CloudFormation stack by choosing Launch Stack:
      Launch CloudFormation stack
    2. Deploy a CloudFormation template into an existing VPC – This option creates the required VPC endpoints, IAM execution roles, and SageMaker domain in an existing VPC with private subnets. Download the template or quick launch the CloudFormation stack by choosing Launch Stack:
      Launch CloudFormation stack

Note that if you’re creating a new SageMaker domain, you must configure the domain to be in a private VPC without internet access to be able to add the connector to Amazon DocumentDB. To learn more, refer to Configure Amazon SageMaker Canvas in a VPC without internet access.

  1. Follow the tutorial to load sample restaurant data into Amazon DocumentDB.
  2. Add access to Amazon Bedrock and the Anthropic Claude model within it. For more information, see Add model access.

Create an Amazon DocumentDB connector in SageMaker Canvas

After you create your SageMaker domain, complete the following steps:

  1. On the Amazon DocumentDB console, choose No-code machine learning in the navigation pane.
  2. Under Choose a domain and profile¸ choose your SageMaker domain and user profile.
  3. Choose Launch Canvas to launch SageMaker Canvas in a new tab.

When SageMaker Canvas finishes loading, you will land on the Data flows tab.

  1. Choose Create to create a new data flow.
  2. Enter a name for your data flow and choose Create.
  3. Add a new Amazon DocumentDB connection by choosing Import data, then choose Tabular for Dataset type.
  4. On the Import data page, for Data Source, choose DocumentDB and Add Connection.
  5. Enter a connection name such as demo and choose your desired Amazon DocumentDB cluster.

Note that SageMaker Canvas will prepopulate the drop-down menu with clusters in the same VPC as your SageMaker domain.

  1. Enter a user name, password, and database name.
  2. Finally, select your read preference.

To protect the performance of primary instances, SageMaker Canvas defaults to Secondary, meaning that it will only read from secondary instances. When read preference is Secondary preferred, SageMaker Canvas reads from available secondary instances, but will read from the primary instance if a secondary instance is not available. For more information on how to configure an Amazon DocumentDB connection, see the Connect to a database stored in AWS.

  1. Choose Add connection.

If the connection is successful, you will see collections in your Amazon DocumentDB database shown as tables.

  1. Drag your table of choice to the blank canvas. For this post, we add our restaurant data.

The first 100 rows are displayed as a preview.

  1. To start analyzing and preparing your data, choose Import data.
  2. Enter a dataset name and choose Import data.

Analyze data using generative AI

Next, we want to get some insights on our data and look for patterns. SageMaker Canvas provides a natural language interface to analyze and prepare data. When the Data tab loads, you can start chatting with your data with the following steps:

  1. Choose Chat for data prep.
  2. Gather insights about your data by asking questions like the samples shown in the following screenshots.

To learn more about how to use natural language to explore and prepare data, refer to Use natural language to explore and prepare data with a new capability of Amazon SageMaker Canvas.

Let’s get a deeper sense of our data quality by using the SageMaker Canvas Data Quality and Insights Report, which automatically evaluates data quality and detects abnormalities.

  1. On the Analyses tab, choose Data Quality and Insights Report.
  2. Choose rating as the target column and Regression as the problem type, then choose Create.

This will simulate model training and provide insights on how we can improve our data for machine learning. The complete report is generated in a few minutes.

Our report shows that 2.47% of rows in our target have missing values—we’ll address that in the next step. Additionally, the analysis shows that the address line 2, name, and type_of_food features have the most prediction power in our data. This indicates basic restaurant information like location and cuisine may have an outsized impact on ratings.

Prepare data for machine learning

SageMaker Canvas offers over 300 built-in transformations to prepare your imported data. For more information on transformation features of SageMaker Canvas, refer to Prepare data with advanced transformations. Let’s add some transformations to get our data ready for training an ML model.

  1. Navigate back to the Data flow page by choosing the name of your data flow at the top of the page.
  2. Choose the plus sign next to Data types and choose Add transform.
  3. Choose Add step.
  4. Let’s rename the address line 2 column to cities.
    1. Choose Manage columns.
    2. Choose Rename column for Transform.
    3. Choose address line 2 for Input column, enter cities for New name, and choose Add.
  5. Additionally, lets drop some unnecessary columns.
    1. Add a new transform.
    2. For Transform, choose Drop column.
    3. For Columns to drop, choose URL and restaurant_id.
    4. Choose Add.
      [
  6. Our rating feature column has some missing values, so let’s fill in those rows with the average value of this column.
    1. Add a new transform.
    2. For Transform, choose Impute.
    3. For Column type, choose Numeric.
    4. For Input columns, choose the rating column.
    5. For Imputing strategy, choose Mean.
    6. For Output column, enter rating_avg_filled.
    7. Choose Add.
  7. We can drop the rating column because we have a new column with filled values.
  8. Because type_of_food is categorical in nature, we’ll want to numerically encode it. Let’s encode this feature using the one-hot encoding technique.
    1. Add a new transform.
    2. For Transform, choose One-hot encode.
    3. For Input columns, choose type_of_food.
    4. For Invalid handling strategy¸ choose Keep.
    5. For Output style¸ choose Columns.
    6. For Output column, enter encoded.
    7. Choose Add.

Build a model and generate predictions

Now that we have transformed our data, let’s train a numeric ML model to predict the ratings for restaurants.

  1. Choose Create model.
  2. For Dataset name, enter a name for the dataset export.
  3. Choose Export and wait for the transformed data to be exported.
  4. Choose the Create model link at the bottom left corner of the page.

You can also select the dataset from the Data Wrangler feature on the left of the page.

  1. Enter a model name.
  2. Choose Predictive analysis, then choose Create.
  3. Choose rating_avg_filled as the target column.

SageMaker Canvas automatically selects a suitable model type.

  1. Choose Preview model to ensure there are no data quality issues.
  2. Choose Quick build to build the model.

The model creation will take approximately 2–15 minutes to complete.

You can view the model status after the model finishes training. Our model has an RSME of 0.422, which means the model often predicts the rating of a restaurant within +/- 0.422 of the actual value, a solid approximation for the rating scale of 1–6.

  1. Finally, you can generate sample predictions by navigating to the Predict tab.

Clean up

To avoid incurring future charges, delete the resources you created while following this post. SageMaker Canvas bills you for the duration of the session, and we recommend logging out of SageMaker Canvas when you’re not using it. Refer to Logging out of Amazon SageMaker Canvas for more details.

Conclusion

In this post, we discussed how you can use SageMaker Canvas for generative AI and ML with data stored in Amazon DocumentDB. In our example, we showed how an analyst can quickly build a high-quality ML model using a sample restaurant dataset.

We showed the steps to implement the solution, from importing data from Amazon DocumentDB to building an ML model in SageMaker Canvas. The entire process was completed through a visual interface without writing a single line of code.

To start your low-code/no-code ML journey, refer to Amazon SageMaker Canvas.


About the authors

Adeleke Coker is a Global Solutions Architect with AWS. He works with customers globally to provide guidance and technical assistance in deploying production workloads at scale on AWS. In his spare time, he enjoys learning, reading, gaming and watching sport events.

Gururaj S Bayari is a Senior DocumentDB Specialist Solutions Architect at AWS. He enjoys helping customers adopt Amazon’s purpose-built databases. He helps customers design, evaluate, and optimize their internet scale and high performance workloads powered by NoSQL and/or Relational databases.

Tim Pusateri is a Senior Product Manager at AWS where he works on Amazon SageMaker Canvas. His goal is to help customers quickly derive value from AI/ML. Outside of work, he loves to be outdoors, play guitar, see live music, and spend time with family and friends.

Pratik Das is a Product Manager at AWS. He enjoys working with customers looking to build resilient workloads and strong data foundations in the cloud. He brings expertise working with enterprises on modernization, analytical and data transformation initiatives.

Varma Gottumukkala is a Senior Database Specialist Solutions Architect at AWS based out of Dallas Fort Worth. Varma works with the customers on their database strategy and architect their workloads using AWS purpose built databases. Before joining AWS, he worked extensively with relational databases, NOSQL databases and multiple programming languages for the last 22 years.

Read More

Empowering Models with Performance: The Art of Generalized Model Transformation Approach

Empowering Models with Performance: The Art of Generalized Model Transformation Approach

Introduction

PyTorch 2.0 (PT2) offers a compiled execution mode which rewrites Python bytecode to extract sequences of PyTorch operations, translating them into a Graph IR. The IR is then just-in-time compiled through a customizable back end, improving training performance without user interference. Often, production models may go through multiple stages of optimization/lowering to hit performance targets. Therefore, having a compiled mode is desirable as it can separate the work of improving model performance from direct modification of the PyTorch model implementation. Thus, the compiled mode becomes more important, enabling Pytorch users to enhance model performance without modifying the PyTorch code implementation. This feature is particularly valuable for optimizing complex models, including large-scale and production-ready ones.

In our previous blog post , we outlined how heuristic model transformation rules are employed to optimize intricate production models. While these rules enabled substantial performance gains for some pilot models, they lacked universal adaptability; they don’t consistently perform well across different models or sometimes even within different sections of a single model.

Fig.1 PT1 Graph mode vs PT2 Compile mode.

Fig. 1: PT1 Graph mode vs PT2 Compile mode.

In this blog post, we propose a more generalized model transformation solution, serving as a plugin to the PT2 compiler as shown in Fig.1 which is more general, performant and user-friendly, bringing performance improvements to both model training and inference without manual efforts. As illustrated in Fig.2, by incorporating the previously user-defined transformations into the compiler, we have streamlined the production stack. These changes bring advantages to a broader range of PyTorch models, extending beyond just Meta models, which has already been incorporated in PT2 and is ready for use to benefit all Pytorch models.

Fig.2 Simplified stack with PT2 compile mode.

Fig. 2: Simplified stack with PT2 compile mode.

Guiding Principle: Atomic Rules

Traditionally, people might use predefined heuristic rules to replace a model subgraph with another more performant subgraph toreduce launch overhead, minimize memory bw, and fully occupy SMs. However, this approach doesn’t scale well as it is hard to craft a set of rules that fits all models perfectly.

Instead of grappling with bulky, complex rules, we can actually break them down into smaller, more digestible pieces – what we call ‘atomic rules’. These tiny powerhouses of efficiency target the transformation of individual operators, to conduct one step of the fusion/transformation. This makes them easy to handle and apply, offering a straightforward path to optimizing models. So, with these atomic rules in hand, optimizing any model for top-tier performance becomes a breeze!

We will walk through some simple examples to demonstrate how we use a chain of atomic rules to replace complicated heuristic rules.

Case 1: Horizontal fusion of computation chains started with accesses to embedding tables

Horizontal fusion means fusing parallel operators into one so as to reduce the number of kernels to be launched and improve performance. In our previous blog (Section 3.2), we described model transformations that fused layernorm and activation functions after embedding bags, as shown in the figure provided. However, this method, had limitations:

  1. It only worked with layernorm and activation functions after embedding.
  2. It was restricted to models with specific architecture rules, causing various issues in our production stack, including parameter changes and inference disruptions.

To improve, we can use three atomic rules as shown in Fig.3 to replace the complicated heuristic rule:

  • Fuse layernorms that follow the same split nodes horizontally.
  • Then, fuse tanh functions following the same split nodes horizontally.
  • Lastly, fuse vertical split-cat nodes.

These atomic rules offer a clean and streamlined way for model simplification and optimization.

Fig.3 Before, we optimized the model in one go by replacing subgraphs. Now, with atomic rules, we optimize step-by-step, covering more cases.

Fig. 3: Before, we optimized the model in one go by replacing subgraphs. Now, with atomic rules, we optimize step-by-step, covering more cases.

Case 2: Fuse horizontal MLP

MLPs (Multilayer Perceptrons) are fundamental components of deep neural networks, often consisting of linear, normalization, and activation functions. In complex models, there’s often a need to fuse many horizontal MLPs. Traditional methods find and replace parallel MLPs with a fused module as shown in Fig.4, but this isn’t always straightforward. Some models might not have normalization, or they might use different activation functions, making it hard to apply a one-size-fits-all rule.

This is where our atomic rules come in handy. These simplified rules target individual operators one at a time, making the process easier and more manageable. We use the following atomic rules for horizontal MLP fusion:

  • Fusing horizontal linear operators
  • Fusing horizontal layernorms.
  • Fusing horizontal activation functions.

Fig.4 Pseudocode for fusing MLP. Traditional optimizations need manual Python code changes.

Fig. 4: Pseudocode for fusing MLP. Traditional optimizations need manual Python code changes.

The beauty of these rules is that they’re not limited to one case. They can be applied broadly. Since PyTorch models are built with torch operators, focusing on a smaller set of operators simplifies the process. This approach is not only more manageable but also more general compared to writing a specific large pattern replacement rule, making it easier to optimize various models efficiently.

Compile-time Graph Search

Our principle is to use chained atomic rules to replace heuristic rules. While this approach covers a wider range of cases, it does entail a longer time for graph search and pattern matching. The next question is: how can we minimize compilation time while performing compile-time graph searches efficiently?

We design a two-step greedy algorithm as illustrated in Fig. 5. The first step in this process is to identify the target nodes, which we follow certain rules, e.g., identifying all linear operations with the same input shapes. Once identified, we use a Breadth-First Search (BFS) strategy to separate these nodes into different sets, so that nodes within a set don’t have data dependency. The nodes within each of these sets are independent and can be fused horizontally.

Fig.5 Process of model transformation with graph IR.

Fig. 5: Process of model transformation with graph IR.

With our approach, the search time is roughly 60 seconds for one of our largest internal models. It is manageable for on-the-fly tasks and

In the End

In our tests with internal ranking models, we observed approximately 5% to 15% training performance improvement across five models on top of the performance gain brought by torch.compile. We have enabled the optimization in PT2 compiler stack and landed it as default when users choose Inductor as the backend (config). We expect our generalized transformation approach could benefit models beyond Meta, and look forward to more discussion and improvement through this compiler level transformation framework.

Acknowledgements

Many thanks to Mark Saroufim, Gregory Chanan, Adnan Aziz, and Rocky Liu for their detailed and insightful reviews.

Read More