ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization

The field of large language models is shifting toward lower-precision computation. This shift necessitates a rethinking of scaling laws to account for how quantization affects the performance of the resulting quantized models. In this work, we demonstrate that previous conclusions on low-bit scaling laws can be significantly sharpened by better quantization scheme design and training improvements.

We propose ParetoQ, the first algorithm that unifies binary, ternary, and 2-to-4 bit quantization-aware training. ParetoQ demonstrates its robustness by yielding state-of-the-art (SOTA) models at all bit widths, surpassing prior works tailored for individual bit levels. We’ve released the MobileLLM low-bit model collection on Hugging Face, featuring models quantized with our ParetoQ method. The smallest model is an ultra-efficient 1-bit 125M variant, with just ~16MB equivalent storage size.

These SOTA points in the Pareto chart ensure that our scaling law comparisons are both reliable and consistent, as they derive from homogeneous settings. Our scaling laws reveal that binary quantization significantly compromises accuracy, while ternary, 2-bit, and 3-bit quantization are tied in performance, often surpassing 4-bit. 

ParetoQ is built on PyTorch models, including LLaMA and MobileLLM. We used the Hugging Face Transformers library for the accuracy experiments. For the latency experiments, we used the low-bit quantization kernels on CPU via ExecuTorch and compared their speed with that of 4-bit quantization. Additionally, we implemented state-of-the-art 2-bit GPU kernels, which showed up to a 4.14x speedup compared to FP16 and a 1.24x speedup over the Machete 4-bit kernel on TritonBench.

ParetoQ has been integrated into torchao [pull]. This integration enables users to leverage ParetoQ by specifying “paretoq” as the quantization method within torchao’s codebase. Once set, users can utilize torchao’s ParetoQ workflow to optimize quantization parameters, balance accuracy and compression trade-offs, and compare different quantization bit-widths apples-to-apples using Pareto frontier analysis. This allows for the efficient deployment of models on edge devices without requiring manual tuning of quantization settings.

To obtain the ParetoQ-quantized models, simply navigate to the torchao/prototype/paretoq directory and execute the training script:

cd torchao/prototype/paretoq && bash 1_run_train.sh $w_bit

Here, $w_bit specifies the target weight bit-width for quantization.

ParetoQ code is available at: https://github.com/facebookresearch/ParetoQ

Paper link: https://arxiv.org/abs/2502.02631 

1 A Better QAT Scheduling Strategy for Extreme Low-Bit LLMs

1.1 Training Budget Allocation

Given a fixed training budget B_train = B_FPT +B_QAT, how should the budget be optimally allocated between full-precision training (B_FPT) and quantization-aware training/fine-tuning (B_QAT) to maximize the accuracy of the quantized model?

Figure 1: Optimal allocation between full-precision pretraining and QAT fine-tuning.

Finding-1 QAT finetuning consistently surpasses both PTQ with B_FPT = B_train and QAT from scratch with B_QAT = B_train. Optimal performance is nearly achieved by dedicating the majority of the training budget to full precision (FP) training and approximately 10% to QAT.
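
As a concrete reading of Finding-1, splitting a fixed token budget can be expressed as a small helper. The ~10% QAT fraction below reflects the rough optimum reported here and is a tunable assumption, not a universal constant (a minimal sketch in Python):

def split_training_budget(total_tokens: int, qat_fraction: float = 0.1) -> tuple[int, int]:
    """Split a fixed token budget B_train into (B_FPT, B_QAT).

    Finding-1 suggests ~90% full-precision training and ~10% QAT fine-tuning;
    qat_fraction is an assumption to tune, not a hard rule.
    """
    qat_tokens = int(total_tokens * qat_fraction)
    return total_tokens - qat_tokens, qat_tokens

# Example: a 1T-token budget -> ~900B tokens for FP pretraining, ~100B for QAT.
b_fpt, b_qat = split_training_budget(1_000_000_000_000)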

1.2 Fine-tuning Characteristics

Figure 2: Analysis of training token requirements for quantization-aware fine-tuning and training from scratch

Finding-2 While fine-tuning enhances performance across all bit-widths, even binary and ternary, the optimal fine-tuning effort inversely correlates with bit-width. For 3-bit and 4-bit weights, fine-tuning adjusts within a nearby grid to mitigate accuracy loss and requires fewer fine-tuning tokens. In contrast, binary and ternary weights break the grid, creating new semantic representations to maintain performance, and therefore require longer fine-tuning.

Figure 3: L1 norm difference between QAT-finetuned weights and full-precision initialization (||W_finetune −W_init||_l1 /||W_init||_l1).
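
The relative weight-change metric reported in Figure 3 can be computed directly from two checkpoints. Below is a minimal sketch, assuming both state dicts contain the same keys:

import torch

def relative_l1_change(w_init: dict, w_finetune: dict) -> float:
    # ||W_finetune - W_init||_1 / ||W_init||_1, summed over all weight tensors.
    num = sum((w_finetune[k].float() - w_init[k].float()).abs().sum() for k in w_init)
    den = sum(w_init[k].float().abs().sum() for k in w_init)
    return (num / den).item()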

2 A Hitchhiker’s Guide to Quantization Method Choices

In sub-4-bit quantization, the choice of quantization function is highly sensitive and can drastically alter scaling law outcomes.

Figure 4: Impact of quantization grid choice across bit widths.

2.1.1 Range clipping

Compared to statistics-based quantization (e.g., min-max quantization), learnable scales, which optimize quantization ranges as network parameters and balance outlier suppression with precision, yield more stable and superior performance. As shown in Figure 4(b)-(e), learnable policies consistently outperform stats-based methods across all bit widths.

2.1.2 Quantization grids

Level symmetry in quantization grids is vital for lower-bit quantization but often overlooked. Including “0” in even-level quantization (e.g., 2-bit, 3-bit, 4-bit) can cause imbalance. For instance, 2-bit quantization options like (-2, -1, 0, 1) limit positive representation to only one level, while (-1.5, -0.5, 0.5, 1.5) offers more balanced representation. We propose Stretched Elastic Quant (SEQ) to address this in lower-bit scenarios.

SEQ balances the quantized levels and evenly divides the full-precision weight span, which is crucial for extremely low-bit quantization. The figures show SEQ’s advantage in ternary and 2-bit quantization, while LSQ including “0” slightly excels in the 3-bit and 4-bit cases.
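
To make the grid difference concrete, below is a minimal, illustrative 2-bit fake-quantizer that snaps weights onto the balanced grid (-1.5, -0.5, 0.5, 1.5) scaled by a learnable parameter, using a straight-through estimator for the rounding step. It sketches the idea of a stretched, symmetric grid only and is not the exact SEQ formulation:

import torch

def ste_round(x: torch.Tensor) -> torch.Tensor:
    # Round in the forward pass; pass gradients straight through in the backward pass.
    return x + (x.round() - x).detach()

def fake_quant_2bit_balanced(w: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    # Clamp into the representable range, then snap onto {-1.5, -0.5, 0.5, 1.5} * alpha.
    w_scaled = (w / alpha).clamp(-2, 2)
    levels = (ste_round(w_scaled - 0.5) + 0.5).clamp(-1.5, 1.5)
    return levels * alpha

# alpha is trained jointly with the weights, e.g. initialized from weight statistics:
# alpha = torch.nn.Parameter(2 * w.abs().mean())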

Figure 5: Comparison of quantization methods across different bit-widths

2.2 Quantization Function

Based on our analysis, we combine the optimal quantization functions identified for each bit-width into one formula, denoted as ParetoQ. This includes Elastic Binarization [1] for 1-bit quantization, LSQ [2] for 3 and 4-bit quantization, and the proposed SEQ for 1.58 and 2-bit quantization.

Here, k equals 3 in the ternary case and 2^N_bit otherwise; n = -2^(N_bit - 1) and p = 2^(N_bit - 1) - 1. In the backward pass, the gradients with respect to the weights and the scaling factor can be easily calculated using a straight-through estimator.
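
As a complement to the 2-bit sketch above, a generic LSQ-style fake-quantizer for the 3-bit and 4-bit cases follows directly from the definitions of n and p. Again, this is a hedged sketch rather than the exact ParetoQ implementation:

import torch

def fake_quant_lsq(w: torch.Tensor, alpha: torch.Tensor, n_bit: int) -> torch.Tensor:
    # Integer grid bounds from the text: n = -2^(N_bit - 1), p = 2^(N_bit - 1) - 1.
    n, p = -(2 ** (n_bit - 1)), 2 ** (n_bit - 1) - 1
    w_scaled = (w / alpha).clamp(n, p)
    # Straight-through estimator: round in the forward pass, identity gradient in the backward pass.
    w_int = w_scaled + (w_scaled.round() - w_scaled).detach()
    return w_int * alpha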

With ParetoQ, we present a robust comparison framework across five bit-widths (1-bit, 1.58-bit, 2-bit, 3-bit, 4-bit), each achieving state-of-the-art accuracy. This facilitates direct, apples-to-apples comparisons to identify the most effective bit-width selection.

3 Comparison with SoTA

3.1 Comparisons on 1.58-bit quantization

The figure below illustrates that ParetoQ consistently outperforms previous methods targeting ternary quantization-aware training, including Spectra [3] and 1-bit Era [4]. Given that a full-precision LLaMA-3 3B model achieves 69.9 accuracy, it is remarkable that the ParetoQ ternary 3B-parameter model narrows the gap to just 4.1 points, while previous methods experience drops exceeding 11.7 points.

Figure 6: Ternary quantization accuracy averaged across six tasks: ARC-e, ARC-c, BoolQ, PIQA, HellaSwag, and WinoGrande. ParetoQ consistently outperforms all prior methods in ternary quantization-aware training.

3.2 Comparisons on 2-bit / 3-bit / 4-bit quantization

As evidenced by Figure 7, compared to previous state-of-the-art PTQ and QAT methods in 2-, 3-, and 4-bit quantization settings, our approach consistently resides on the Pareto front, with a particularly pronounced advantage in lower-bit settings. These results confirm that our bit-accuracy trade-off conclusions are benchmarked against SoTA results across all bit settings, ensuring their reliability.

Figure 7: Accuracy comparison on 8 models. ParetoQ outperforms all state-of-the-art PTQ and QAT methods in 2, 3, and 4-bit settings.

4 Pareto Curve

4-bit quantization-aware training (QAT) achieves near-lossless compression in many scenarios. With ParetoQ, we are able to further improve the trade-off curve. Figure 8(a) demonstrates that sub-4-bit quantization, including binary, ternary, 2-bit, and 3-bit, often surpasses 4-bit. Notably, 2-bit and ternary models reside on the Pareto frontier.

To evaluate potential speedup benefits beyond memory reduction, we utilize the High-Performance Low-Bit Operators for 2-bit quantization and compare the latency with 4-bit quantization. The curves in Figure 8(c) demonstrate that, within our experimental range, 2-bit quantized models consistently outperform 4-bit models in the accuracy-speed trade-off, positioning 2-bit quantization as a superior choice for on-device applications where both latency and storage are critical.

Figure 8: (a) (b) In sub-4-bit regime, 1.58-bit, 2-bit, and 3-bit quantization outperform 4-bit in terms of the accuracy-model size trade-off. (c) Under hardware constraints, 2-bit quantization demonstrates superior accuracy-speed trade-offs compared to higher-bit schemes.

5 GPU Latency

We measured the latency of LLaMA 3.2 models (1B, 3B, 8B) on an H100 NVL GPU (94GB memory). The W4A16 kernel used the Machete kernel from vLLM, while the W2A16 kernel was implemented on top of the CUTLASS mixed-precision backbone kernel. All tests were performed on a single GPU with a context length of 2048 tokens. For kernel-level latency, we compared the 2-bit kernel to the 4-bit Machete kernel across three weight shapes: (4096 x 4096), (8192 x 8192), and (16384 x 16384) on TritonBench. For larger kernel sizes, the 2-bit kernel achieves a ~24% speedup over the 4-bit Machete kernel.
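
For readers who want to reproduce kernel-level timing comparisons on their own hardware, a generic CUDA timing harness along the following lines can be used at the same weight shapes. This is a sketch only, not the TritonBench setup or the Machete/CUTLASS kernels themselves:

import torch

def bench_ms(fn, iters=50, warmup=10):
    # Time a CUDA-side callable with CUDA events, returning average milliseconds per call.
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Example: an FP16 GEMM baseline at one of the weight shapes above (4096 x 4096).
if torch.cuda.is_available():
    x = torch.randn(2048, 4096, dtype=torch.float16, device="cuda")
    w = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
    print(f"fp16 matmul: {bench_ms(lambda: x @ w):.3f} ms")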

Conclusion

In this study, we propose ParetoQ, an advanced quantization framework that achieves state-of-the-art performance across all bit-width levels. This framework uniquely enables a direct, consistent comparison across different bit-widths, ensuring an equitable evaluation of performance metrics. Our empirical analysis indicates that quantization at 1.58-bit, 2-bit, and 3-bit offers a superior trade-off between accuracy and effective quantized model size compared to 4-bit, highlighting their potential for optimized model deployment.

Feel free to try running ParetoQ from torchao/prototype/paretoq, following the steps in that repo. If you have any questions, reach out to Zechun Liu <zechunliu@meta.com>, Changsheng Zhao <cszhao@meta.com>, or Andrew Or <andrewor@meta.com>.

References

[1] BiT: Robustly Binarized Multi-Distilled Transformer.

[2] Learned Step Size Quantization.

[3] Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models.

[4] The Era of 1-bit LLMs: All Large Language Models Are in 1.58 Bits.

HuggingFace Safetensors Support in PyTorch Distributed Checkpointing

Summary 

PyTorch Distributed Checkpointing (DCP) is investing in addressing interoperability blockers to ensure that popular formats, like HuggingFace safetensors, work well with PyTorch’s ecosystem. Since HuggingFace has become a leading format in inference and fine-tuning, DCP is beginning to support HuggingFace safetensors. The first customer of these changes is torchtune, which has seen an improved user experience now that it can cleanly read and write directly to HuggingFace with DCP APIs.

Problem

Since HuggingFace is used widely, with over 5 million users, many ML engineers would like to save and load their checkpoints in safetensors format to work easily with its ecosystem. By supporting the safetensors format natively in DCP, checkpointing is simplified for our users in the following ways:

  • DCP has its own custom format, so users who wanted to work with HuggingFace models while leveraging DCP’s performance wins and features had to build custom converters and components to bridge the two systems.
  • Instead of users having to download and upload their checkpoints to local storage every time, HuggingFace models can now be saved and loaded directly into the fsspec-supported storage of their choosing.

How to Use

From a user’s perspective, the only change needed to use safetensors is to call load with the new load planner and storage reader, and similarly save with the new save planner and storage writer.

The load and save APIs are called as follows:


# The load/save entry points and the HuggingFace storage components are part of
# torch.distributed.checkpoint (DCP); exact import paths may vary by PyTorch version.
load(
    state_dict=state_dict,
    storage_reader=HuggingFaceStorageReader(path=path),
)

save(
    state_dict=state_dict,
    storage_writer=HuggingFaceStorageWriter(
        path=path,
        fqn_to_index_mapping=mapping,
    ),
)

The HuggingFaceStorageReader and HuggingFaceStorageWriter can take any fsspec-based path, so they can read and write the HF safetensors format to any fsspec-supported back-end, including local storage and HF storage. Since HuggingFace safetensors metadata doesn’t natively provide the same level of information as DCP metadata, distributed checkpoints are currently not well supported in these APIs, but DCP plans on supporting this natively in the future.

torchtune

Our first customer of HuggingFace DCP support is torchtune – a post-training library written in native PyTorch. The primary way torchtune users retrieve model weights is from the Hugging Face Hub. Before, users had to download the model weights and upload the trained checkpoints via extra CLI commands; the new DCP APIs allow them to directly read and write to HuggingFace, resulting in a much better user experience. 

In addition, support for safetensors serialization in DCP greatly simplifies the checkpointing code in torchtune. Format-specific checkpointing solutions are no longer needed, increasing developer efficiency in the project.

Future Work

DCP plans to handle the distributed loading and saving of HuggingFace safetensors checkpoints with resharding. DCP also plans to support the ability to produce a consolidated final checkpoint to a single file for publishing.

Introducing the PyTorch Ecosystem Working Group and Project Spotlights

The PyTorch Ecosystem goes back several years, with some of its earliest projects like Hugging Face, Fast.ai, and PyTorch Lightning going on to grow incredible communities of their own. The goal from the beginning was to bring together innovative open source AI projects that extend, integrate with, or build upon PyTorch. Key aspects we looked at included, for example, whether projects were well tested and maintained (including CI), easy to onboard as a user, and backed by a growing community. Fast forward several years, and the ecosystem continues to thrive with a vibrant landscape of dozens of projects spanning domains from privacy and computer vision to reinforcement learning. Enter the PyTorch Ecosystem Working Group.

In early 2025, the PyTorch Foundation created the PyTorch Ecosystem Working Group to showcase projects that could be of interest to the community and represent projects that are mature and healthy, standing out in their respective domains. The working group, composed of members from across the ecosystem, was tasked with defining a clear bar, including functional requirements (e.g., CI, licensing…), measurable requirements (e.g., commits and contributors), and the implementation of best practices for how projects structure their repos. The working group also implemented a streamlined submission and review process and a transparent lifecycle. It’s still very early, but the reception from the community has been great, with 21 submissions so far and a strong pipeline of projects in review. You can learn more about this working group’s goals here, including the requirements and application process.

As part of this new blog series, every quarter we will update the community on new entries in the PyTorch Ecosystem, as well as highlight up-and-coming projects under consideration that would benefit from more eyes and contributors.

Ecosystem Project Spotlights

We’re happy to welcome SGLang and docTR to the PyTorch Ecosystem. Here’s a short intro to both.

SGLang

SGLang is a fast serving engine for large language models and vision language models. It makes interaction with models faster and more controllable by co-designing the backend runtime and the frontend language.

The core features include:

  • Fast Backend Runtime: Provides efficient serving with RadixAttention for prefix caching, zero-overhead CPU scheduler, continuous batching, token attention (paged attention), speculative decoding, tensor parallelism, chunked prefill, structured outputs, and quantization (FP8/INT4/AWQ/GPTQ).
  • Flexible Frontend Language: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
  • Extensive Model Support: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse) and reward models (Skywork), with easy extensibility for integrating new models.
  • Active Community: SGLang is open source and backed by an active community with industry adoption.

SGLang is known for its speed: it often significantly outperforms other state-of-the-art frameworks in serving throughput and latency. Learn more.

docTR

docTR is an Apache 2.0 project developed and distributed by Mindee to help developers integrate OCR capabilities into applications with no prior knowledge required.

To quickly and efficiently extract text information, docTR uses a two-stage approach:

  1. First, it performs text detection to localize words.
  2. Then, it conducts text recognition to identify all characters in a word.

Detection and recognition are performed by state-of-the-art models written in PyTorch. Learn more.

Up and Coming Project Spotlights

As part of this series, we highlight projects that are in consideration for the PyTorch Ecosystem, and that we believe will benefit from more eyes and contributors. This time it’s the turn of EIR and torchcvnn.

EIR

EIR is a comprehensive deep learning framework built on PyTorch that enables researchers and developers to perform supervised modeling, sequence generation, image/array generation, and survival analysis across multiple data modalities. EIR specializes in handling complex data types, including genotype, tabular, sequence, image, array, and binary inputs. While it has particular strengths in genomics and biomedical applications, its versatile handling of these diverse data types allows for broader applications across various sectors. For example, EIR’s multi-modal approach can enhance tasks such as detecting manufacturing defects by linking images with equipment readings (e.g., for an imperfect silicon wafer), monitoring infrastructure by analyzing site photos along with operational logs (e.g., to identify cracks in a pipeline), or improving retail insights by combining product images with their descriptions and sales figures. This demonstrates how EIR’s multi-modal capabilities can bring value to a wide range of industries.

The framework provides a high-level, yet modular API that reduces the amount of boilerplate code and pre-processing required to train models, allowing users to focus on their end goals rather than implementation details. To learn more and explore practical examples, please refer to the documentation. 

Key features include:

  • Multi-modal inputs: Seamless integration of genotype, tabular, sequence, image, array, and binary data.
  • Varied modeling options: Use any of the input modalities above for supervised learning, sequence generation, image/array generation, and survival analysis.
  • Scaling: Custom data streaming capabilities for model training.
  • Explainability: Built-in explainability functionality for supervised learning and survival analysis.
  • Model Deployment: Serve any of your trained models with just one command, allowing you or others to interact with your models via web services.

To explore EIR and see how it might enhance your work with multi-modal data, start with the documentation and examples referenced above.

torchcvnn

torchcvnn is a library that helps researchers, developers, and organizations easily experiment with complex-valued neural networks (CVNNs). In several domains, such as remote sensing and MRI, data are naturally represented in real-imaginary form. These domains benefit from direct complex-valued computation, which exposes critical physical characteristics to the neural networks during the learning process.

torchcvnn gives you easy access to:

  • Standard datasets for both remote sensing (SLC and ALOS2 formats) and MRI, covering different tasks (classification, segmentation, reconstruction, super-resolution)
  • Various activation functions, either operating independently on the real/imaginary components or fully exploiting the complex nature of the representations
  • Normalization layers, including the complex-valued BatchNorm of Trabelsi et al. (2018), LayerNorm, and RMSNorm
  • A complex-valued attention layer, as introduced in Eilers et al. (2023)

PyTorch already supports optimization of complex-valued neural networks by implementing Wirtinger calculus. However, several complex-valued building blocks are still missing before the capabilities of complex-valued neural networks can be fully explored. The objective of torchcvnn is to fill this gap and provide a library that helps PyTorch users dig into the realm of complex-valued neural networks.
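
As a small illustration of the existing support mentioned above, PyTorch’s autograd already differentiates through complex-valued parameters (via Wirtinger calculus) as long as the final loss is real-valued. The snippet below is a generic PyTorch example, not a torchcvnn API:

import torch

# A complex-valued "layer" with a trainable complex weight matrix.
w = torch.randn(4, 4, dtype=torch.cfloat, requires_grad=True)
x = torch.randn(8, 4, dtype=torch.cfloat)

y = x @ w                      # complex-valued linear map
loss = y.abs().pow(2).mean()   # real-valued loss, as required for backward()
loss.backward()                # w.grad holds the complex (conjugate Wirtinger) gradients

print(w.grad.dtype)            # torch.complex64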

torchcvnn warmly welcomes contributions to both the core torchcvnn library and the examples’ repository, whether you are reporting a bug, suggesting improvements, or contributing to the source code. All the components are described in the project’s documentation. The torchcvnn team will be present at IJCNN 2025 in July in Rome during the special session on “Complex- and Hypercomplex-valued Neural Networks.”

How to Join the PyTorch Ecosystem

If you’re developing a project that supports the PyTorch community, you’re welcome to apply for inclusion in the Ecosystem. Please review the PyTorch Ecosystem review process to ensure that you meet the minimum expectations before applying.

Cheers!

The PyTorch Ecosystem Working Group

Open Source AI is Transforming the Economy—Here’s What the Data Shows

The Economic and Workforce Impacts of Open Source AI

Blog cross-posted on the Linux Foundation blog.

As we approach the midpoint of 2025, the potential of AI to transform businesses, economies, and industries is widely anticipated and well documented. In a project commissioned by Meta, LF Research set out to capture existing evidence on this topic, with the specific aim of understanding how open source is playing a role in this transformation.

In its latest publication, The Economic and Workforce Impacts of Open Source AI, LF Research describes the nuances of how and to what extent open source AI (OSAI) is impacting the global economy and workforce. By examining existing evidence from industry, academic, and open source research, the authors found important insights on OSAI’s adoption rates, cost effectiveness, innovation-boosting potential, and more. Here are the big takeaways.

First, the adoption of open source AI is already widespread. Nearly all software developers have experimented with open models, and about 63% of companies are actively using them. In fact, among organizations that have embraced AI in any form, a striking 89% incorporate open source AI somewhere in their infrastructure. It’s no longer a fringe approach—it’s becoming the standard.

89 percent of orgs incorporate open source AI somewhere in their infrastructure

Why? Cost is a huge factor. Open source tools often come with significantly lower price tags than their proprietary counterparts. My prior research with Manuel Hoffmann and Yanuo Zhou has shown that if open source didn’t exist, companies would spend 3.5 times more on software than they currently do. The new LF report shows that two-thirds of organizations say OSAI is cheaper to deploy, and nearly half cite cost savings as a primary reason for choosing open source. Combine that with studies showing AI’s ability to cut business unit costs by over 50%, while still being user friendly and maintaining high performance, and it’s clear that OSAI represents a strategic advantage for boosting margins and scaling innovation.

two-thirds of organizations say Open Source AI is cheaper to deploy

Innovation and entrepreneurship are other major benefits of open source. In research with Nataliya Langburd Wright and Shane Greenstein, we found that when open source contributions increase at the country level, so do new startups; at the company level, there is a positive relationship between contributing to open source and startup growth. Open source encourages collaboration, inviting contributions from a global pool of developers and researchers. This external input helps accelerate the development of high-quality models. As Daniel Yue and I found when Meta donated the machine learning library PyTorch to the Linux Foundation, there was a notable increase in corporate contributions, especially from chip manufacturers.

Open Source AI encourages collaboration and accelerates the development of high-quality models

AI’s cost-cutting capabilities are not only linked to the increased productivity that comes from freed-up resources, but also from a re-orienting of the way people work—similar to how the full impact of the steam engine led to the industrial revolution, but only after factories re-oriented their entire work flow around it. Manuel Hoffmann, Sam Boysel, Kevin Xu, Sida Peng, and I found this to be the case with software developers. When GitHub rolled out their GenAI coding tool Copilot, developers changed the way that they worked by spending more time writing code and substantially less time doing project management. However, according to existing research identified in the LF study, this has not translated to substantial layoffs: 95% of surveyed hiring managers over the past two years said they do not plan to reduce headcount due to AI. What’s more, being able to use AI tools effectively may actually increase wages by over 20%.

Looking ahead, open source AI is likely to become foundational in areas like edge computing, where smaller, privacy-preserving models need to run efficiently on local devices. OSAI is also making big inroads in industry-specific applications. In manufacturing, for instance, open models offer the flexibility required to integrate AI into complex operational workflows. And in healthcare—a traditionally conservative and risk-averse field—open models are already matching proprietary ones in performance, giving institutions confidence to adopt without compromising on quality. OSAI is an important avenue to level the playing field, no matter your organization’s size or financial resources—as the report found, small businesses are adopting OSAI at higher rates than their larger counterparts.

small businesses are adopting open source AI at higher rates than their larger counterparts

OSAI is an economic force. It’s reducing costs, accelerating innovation, and empowering a wider range of players to shape the future of technology.

Read the Report

What’s Next for OSAI? Five Areas Ripe for Research

While the impact of OSAI is starting to take shape, the full scope of its influence is just beginning to unfold. To better understand and harness the potential of OSAI, the report outlines five key areas for future research, each crucial to shaping smart policy, business strategy, and innovation ecosystems.

  1. Tracking the Bigger Picture: OSAI’s Role in Market Growth
    One pressing question is how open models are influencing the overall AI market. Beyond the tools themselves, OSAI may be driving complementary innovation, spurring growth in services, applications, and platforms built on top of open infrastructure. Understanding this broader ripple effect is essential for grasping the true economic footprint of open AI.
  2. Making the Case for Investment
    To help make informed decisions, researchers are encouraged to analyze the return on investment in OSAI infrastructure at both country and company levels. Quantifying the long-term value of these open components, from datasets and compute to developer tooling, can guide resource allocation and policy decisions in a fast-moving field.
  3. Connecting Openness to Innovation
    Does OSAI directly foster more startups, patents, or efficient R&D? Future studies should explore how open access to models and tools correlates with concrete innovation metrics. This could provide evidence for how openness accelerates not just adoption, but invention.
  4. Crunching the Cost Numbers
    A detailed comparison of costs between open and proprietary AI solutions across sectors, company sizes, and global regions would shed light on who benefits most from going open. These insights would be invaluable for organizations navigating tight budgets and evaluating technology strategies.
  5. Understanding Workforce Impacts
    Finally, the human side matters. As AI tools reshape work, it’s vital to measure how open models affect worker productivity, satisfaction, and work patterns. Do open tools empower workers in certain tasks or industries more than others? Do they lead to more flexible, fulfilling roles? Answers to these questions will help ensure that AI benefits not just business, but people.

By exploring these future research areas, we can unlock a deeper understanding of how open source AI is transforming the global economy and workforce. The era of open source AI is here—and it’s time to study its impact with depth and rigor.

Build Responsible AI Products with your own Yellow Teaming LLM

The tools we use to build AI are evolving fast, with PyTorch at the heart of many advances. But unless we evolve the way we approach building AI systems, we risk amplifying harm as fast as we’re scaling up performance. Building AI responsibly means designing systems that not only perform well but do so fairly, safely, and transparently—like making sure an AI hiring tool doesn’t favor one demographic over another.

One useful approach to developing responsible AI systems is Yellow Teaming: a proactive exercise that surfaces potential unintended consequences before deployment. Yellow Teaming helps companies stand out in a crowded market by making more thoughtful, impact-aware design choices that lead to an overall better product.

In this blog, we show how you can quickly create a PyTorch-based LLM Yellow Teaming assistant running on AWS Graviton4 with a reusable system prompt. We also give you an example to show you how to use your new assistant to explore unintended business-critical consequences of feature design and ultimately build better products.

Let’s get started.

What is Yellow Teaming?

You may already be aware of the more popular term Red Teaming in cybersecurity, which involves simulating how adversaries might attack your product and fixing vulnerabilities before launch. Other color-coded approaches exist (like Blue Teams that defend against attacks), but Yellow Teaming is distinct in focusing on thoughtful design and implementation from the start of the product’s lifecycle. Red Teaming practices have already been adapted to the AI domain. Yellow Teaming principles are now becoming an important part of AI development as well.

The practice of Yellow Teaming asks a set of probing questions to help reveal the broader, unintended impacts of your product on your business, your users, and society at large. This application of Yellow Teaming, and the rationale behind it, are explained eloquently in the Development in Progress essay by The Consilience Project. A closely related practice is also offered in the module, Minimizing Harmful Consequences, in the Center for Humane Technology free course.

Why Does Yellow Teaming Matter?

The central idea is that by analyzing the consequences of your product decisions with a wide view, you can design better products that create positive feedback loops for your company’s bottom line and your users’ well-being. For example, it helps you avoid building a chatbot that unintentionally reinforces bias.

Traditional product development practices often solve for narrowly defined success metrics. Creating specific product measurables is good for focus and accountability, but can lead to over-optimization on metrics while ignoring other signals that matter to your company. For instance, building an app with AI-driven recommendations that boosts engagement in the short term but makes people feel worse and fails to retain users over time.

Narrow product optimization tends to cause unmeasured negative effects. These include users getting burnt out or frustrated when using your product, reputational harm or less overall engagement with your company, and society fracturing from lack of trust and meaningful communication.

In many cases, what looks like product success on paper is actually harming your users, your company, and your long-term goals.

How to Implement Yellow Teaming Practices

Yellow Teaming is straightforward and powerful. Pick a product you are building, and systematically evaluate the various consequences for your users, your business, and society when it is adopted at scale. Start with direct consequences, then move to second- and third-order consequences by asking ‘what happens as a result of the previous effects?’ You should think through these consequences across multiple axes:

  1. Good and bad
  2. Short-term and long-term
  3. Intended and unintended
  4. Your company and your users
  5. A single user and groups of users

These types of questions help foster productive brainstorming:

  • What kinds of behaviors will this feature incentivize in users?
  • What affordances does this technology provide (what can users now do that they couldn’t before, even if unintended)?
  • Will this improve or degrade trust in our platform?
  • What social groups might benefit—or be left behind?

Yellow Teaming is based on complex systems thinking and externality analysis—fields that have traditionally felt far removed from engineering workflows. But by incorporating a lightweight Yellow Teaming assistant into your ideation process, it can become an intuitive, high-ROI part of product development.

Building Your PyTorch YellowTeamGPT

The good news is that you don’t need a PhD in philosophy or a panel of advisors to Yellow Team your AI project. You just need to be willing to act and, in this implementation of Yellow Teaming, use a good LLM with the right prompt. There are several advantages to running your LLM locally. The biggest is that you can safely feed in confidential product plans without worrying about your data being leaked. Another benefit is that the smaller model is not perfect and makes mistakes, forcing us as users to apply critical thinking to every output, and putting us in the right headspace to analyze non-obvious product consequences.

Here is how you can set up a PyTorch-based 8-billion parameter Llama3 model on your Graviton instance. First, create a r8g.4xlarge instance running Ubuntu 24.04 with at least 50 GB of storage, then follow these three steps:

1. Set up your machine with the torchchat repo and other requirements:

sudo apt-get update && sudo apt install gcc g++ build-essential python3-pip python3-venv google-perftools -y

git clone https://github.com/pytorch/torchchat.git && cd torchchat

python3 -m venv .venv && source .venv/bin/activate

./install/install_requirements.sh

2. Download the model from Hugging Face (HF) by entering your HF access token (note the max sequence length parameter, which you can increase to enable longer conversations with a linear increase in memory usage):

pip install -U "huggingface_hub[cli]"

huggingface-cli login

python torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.so --device cpu --max-seq-length 8192

3. Run the model with Arm CPU optimizations and 700 max token length per response:

LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 TORCHINDUCTOR_CPP_WRAPPER=1 TORCHINDUCTOR_FREEZING=1 OMP_NUM_THREADS=16 python torchchat.py generate llama3.1 --dso-path exportedModels/llama3.1.so --device cpu --max-new-tokens 700 --chat

For more details on these commands and additional code snippets to add a UI to this chatbot, review this Arm Learning Path.

You can then enter a custom system prompt. Below is a simple prompt that turns your local LLM into a Yellow Teaming assistant. Feel free to review and tweak it to get the most out of it for your specific needs. Here’s what it does:
  1. Gathers key product details: What you’re building, how it makes money, who your users are.
  2. Analyzes direct and indirect consequences: YellowTeamGPT presents one at a time, considering non-obvious impacts to your business, users, and beyond (you’ll likely start to think of more impacts on your own).
  3. Iterates with you: You are in control, telling YellowTeamGPT to continue listing general direct consequences, identifying specific company risks, moving to 2nd-order effects, and even brainstorming features to make your product better.

Here is the YellowTeamGPT system prompt for you to copy. If directly copying, make sure to copy as one line into your terminal or the new lines may cause issues.

You are an expert in complex systems thinking and AI product design, called YellowTeamGPT. You help technologists build better products that users love, and lower company risk. You do this by helping the user evaluate their product design decisions via the Yellow Teaming methodology, which identifies the consequences of design decisions on their business, their users, and society.

You will request from the user information about their product under development. Once you have enough information, you will analyze the product’s consequences that arise if deployed at scale. Structure your thinking to first review direct consequences, then 2nd order consequences that follow from the identified direct effects (by asking ‘what might happen next as a result?’). Consider consequences that impact the company, users, and society; are short and long term; are across categories like truth and understanding, human well-being, capability growth, economics, and more.

You are here to constructively challenge users, not reinforce their existing ideas. Play devil’s advocate to help users think in ways they are not currently.

You will output in this format: For each identified consequence, tie the impact to product quality, and prompt the user with a question that helps them design the product better to mitigate that consequence (or turn a negative impact into a positive one). List one consequence at a time and ask the user to continue listing them or explore that consequence further.

Example Yellow Teaming

Give your LLM the provided system prompt and hit enter. Next, your YellowTeamGPT assistant will ask for some product details. Here is a hypothetical example product I used:

I’m building an app that turns a group chat conversation into a catchy pop song. Targeting any user, like WhatsApp users. Key functionality is importing a group chat conversation and outputting a tune with lyrics and beat to match. It is an app on any smartphone. Ideally, millions of users. Would make money by targeted advertising of the users.

You’ll notice, as YellowTeamGPT thinks and generates its reply, that it is notably slower than ChatGPT or other popular GPTs. Like the model’s occasional inaccuracy, its slower speed can be perceived as a benefit. The point of this exercise is to slow down, think through non-obvious product impacts, and brainstorm enhancements that create positive value across the systems your product touches. While your YellowTeamGPT is ‘thinking,’ you should be too.

And below are snippets of my conversation. First, it starts with one direct consequence:

I then instruct it to continue to another consequence:

I ask to explore the second-order effects of having misinformation spread at scale from this app:

Finally, I ask for help brainstorming product features to mitigate this harm. It generates a few interesting concepts that are not product-ready, but easily spark further ideation:

Using YellowTeamGPT for this use case, we were able to rapidly explore product impacts we may not have considered. We could then brainstorm features solving previously unconsidered problems, leading to an improved product experience that also mitigates the risk of reputational harm to our hypothetical company.

Integrating Yellow Teaming Into Your Practices

Anywhere you’re making decisions that shape your product’s features and the user experience, Yellow Teaming fits. Here are a few examples of where you can leverage your new YellowTeamGPT:

  • New product ideation sessions to expand your thinking.
  • Feature planning docs to stress-test your specs.
  • Code review workflows for flagging potential misuse.
  • Sprint retrospectives to reflect on design choices at scale.
  • Product pitch decks to show responsible AI due diligence.

It can be as formal or informal as you want. The more you and your team think about unintended, Nth-order product consequences across multiple axes, the better your product will be. By incorporating Yellow Teaming into your work, you don’t just do the right thing, you build products that:

  • Users engage with and trust more
  • Mitigate harmful impacts
  • Minimize company risk
  • Create lasting business value

Let’s stop thinking of responsible AI practices as something to check off a list and start seeing them for what they really are: a competitive edge that creates positive outcomes for your company, for your users, and for our shared society.

PyTorch Hangzhou Meetup Recap: Exploring the AI Open Source Ecosystem and Cutting-Edge Technology Practices

On May 17, the PyTorch Meetup was successfully held in Hangzhou, drawing nearly 60 developers and industry experts from companies including Huawei, Tencent, Ant Group, and ByteDance. The event focused on the development of the PyTorch ecosystem, AI acceleration technologies, and industry practices. Through keynote speeches and technical sessions, in-depth discussions were held with participants, providing a valuable platform for exchange and collaboration.

Session Highlights:

Latest Developments in the PyTorch Community and Ecosystem Outlook

Yikun Jiang, a member of the PyTorch Technical Advisory Council (TAC), shared the latest updates from the PyTorch community. Topics included the general progress of PyTorch, PyTorch Foundation Expands to an Umbrella Foundation, the Ambassador Program, and PyTorch Conference planning. He emphasized how PyTorch continues to drive innovation and real-world adoption of AI open source technologies through technical iteration, ecosystem expansion, and global collaboration. He called on developers to actively engage in community building and help shape the future of the AI open source ecosystem.

Torchair: A torch.compile Backend Optimized for Ascend NPU

Peng Xue, Senior Engineer at Huawei, presented technical practices around graph mode optimization on Ascend NPUs. He introduced the two Torchair modes—Reduce-overhead and Max-autotune—and detailed deep optimizations in memory management, dynamic shapes, multi-stream parallelism, and compile-time caching. These improvements aim to enhance model training and inference performance while maintaining ease of use.

PyTorch Ecosystem on Ascend

Yuanhao Ji, Software Engineer at Huawei, discussed support for PyTorch ecosystem projects on Ascend NPUs. Focusing on model training, fine-tuning, and inference, he introduced TorchTitan, TorchTune, and vLLM as case studies. He explained their core features and adaptation strategies for Ascend, offering practical guidance for deploying PyTorch projects on this hardware.

Production Prefill/Decode Disaggregation Based on vLLM at Tencent

Chao Zhang, Senior Engineer at Tencent, presented the practice of Prefill/Decode (PD) separation in large model inference. This technique decouples the compute-intensive prefill stage from the memory-intensive decode stage, significantly improving system throughput and resource utilization. His talk covered key technical implementations such as KV cache transmission optimization, intelligent load balancing, and multi-turn dialogue caching. Real-world deployments on both homogeneous GPUs and heterogeneous setups like Ascend A2 + H20 showed performance improvements of 20%–50%. Tencent has further optimized the vLLM framework for CPUs and GPUs, and uses pipeline decomposition, low-precision KV caches, and graph compilers to enhance adaptability and performance across hardware platforms.

Key Reinforcement Learning (RL) Acceleration Techniques and Training Practices

Chenyi Pan, Senior Engineer at Huawei, shared Ascend’s breakthroughs in reinforcement learning and ecosystem development. Addressing the challenge of low resource utilization in RL systems, he introduced a training-inference co-card solution that allows for efficient switching between the two tasks. This approach not only saves 50% in compute resources but also doubles single-card throughput and improves inference memory availability by 80%. To enrich the technical ecosystem, Ascend also launched TransferDock, a streaming data engine that employs dynamic load-balancing strategies to improve task efficiency by over 10% compared to traditional caching mechanisms.

On the framework side, MindSpeed-RL combines the MindSpeed training backend with the vLLM inference engine, supporting dynamic weight partitioning and time-sharing of cluster resources while maintaining compatibility with mainstream open source ecosystems. Benchmarks using the Qwen2.5-32B model showed that this setup outperformed the SimpleRL-Zoo baseline on evaluations such as MATH500, demonstrating its technical leadership.

Ray’s Practice and Exploration in Ant Group’s AI Infra Ecosystem

Senlin Zhu, Senior Technical Expert at Ant Group and Head of Ant Ray, shared the practice and exploration of Ray within Ant’s AI Infra ecosystem. He outlined Ray’s architectural design and programming paradigm. Over time, Ray has evolved into critical infrastructure for AI systems, supporting training, inference, hyperparameter tuning, and reinforcement learning.

Since 2017, Ant Group has continuously invested in Ray, which now supports applications at the scale of 2 million cores. Ant has also contributed key features to the community, such as multi-tenancy support and the Flow Insight visual debugging tool. Flow Insight, in particular, has alleviated “black box” issues in complex AI systems and significantly improved observability and deployment efficiency at scale.

Challenges and Standardization in PyTorch Ecosystem Accelerator Development

Zesheng Zong, a community developer from Huawei, provided a systematic overview of the challenges, solutions, and case studies in developing accelerators for the PyTorch ecosystem. Developers integrating out-of-tree hardware face version compatibility issues and a lack of standardized quality benchmarks, making it hard to quantify new device support. In early 2025, a new exploration group was formed in the PyTorch community to tackle these challenges.

Key improvements include:

  • Establishing a standardized testing framework using the public repository pytorch-fdn/oota for daily plugin testing.
  • Developing the OpenReg module to simulate backend behavior and validate it with test cases.
  • Optimizing the PrivateUse1 plugin mechanism to reduce integration complexity.
  • Supporting automatic plugin loading to simplify device access.
  • Improving the torch.accelerator device-agnostic API for broader compatibility.

Intel’s community developer Chuanqi Wang followed up with a case study on integrating and running CI infrastructure using Intel Gaudi. He described how to leverage CI from code compilation and unit testing to TorchBench automated benchmarking, ensuring quality for new backend integrations. He also noted plans to reduce testing time, clarify required test items, and define quality standards to improve ecosystem compatibility and development efficiency.

This PyTorch Meetup served as a technical bridge for in-depth developer exchanges and demonstrated the vibrant energy of the PyTorch ecosystem in AI’s cutting-edge domains. Through diverse perspectives, the attendees sketched a picture of how open source collaboration drives technological progress. We look forward to more developers joining this open and thriving wave of innovation, where each exchange can spark new ideas in the age of intelligence.

The Open Source Legacy and AI’s Licensing Challenge

Open source licensing revolutionized software development, creating a thriving ecosystem built on shared innovation and collaboration. Licenses like MIT and Apache-2.0 gave developers a standard, legally robust way to share code, reducing friction and accelerating adoption.

Today, we stand at a similar inflection point with open AI models. These models, increasingly foundational to research and industry, lack an equivalent licensing standard. Existing open source software licenses weren’t designed with AI models in mind, while most model-specific licenses are either too complex, overly restrictive, or legally ambiguous.

To fully unlock the potential of open AI, we need a license purpose-built for the realities of machine learning. That’s where OpenMDW comes in.

Why AI Models Need a New License

AI models differ fundamentally from traditional software. They are:

  • Composites of multiple types of components: including code, architecture, training data, weights, documentation, and evaluation protocols.
  • Subject to overlapping IP regimes: such as copyright, database rights, and trade secrets, which vary across jurisdictions.
  • Distributed without a consistent definition of “open”: resulting in a fragmented licensing landscape.

This complexity has led to a proliferation of bespoke, incompatible licenses that often:

  • Limit redistribution, reuse, or modification.
  • Fail to address legal nuances unique to models.
  • Create uncertainty for developers and adopters alike.

The result? Friction in open ecosystems, legal ambiguity, and a significant barrier to collaboration and innovation.

The Origins of OpenMDW

OpenMDW, short for the Open Model, Data and Weights License, was born out of the effort to implement the Model Openness Framework (MOF). The MOF is a three-tier classification system that defines what it means for a model to be truly “open”: not just available with limitations or use restrictions, but licensed openly across its code, architecture, parameters, training data, and documentation.

To make MOF practical, model developers needed a simple, standard license they could drop into any repository, just as Apache-2.0 or MIT is used in software: something purpose-built for many types of content, including models, not just code.

What Makes OpenMDW Different

OpenMDW is the first truly permissive license designed from the ground up for machine learning models. Here’s what sets it apart:

Covers the Entire Model Stack

It’s designed to apply to all components of a model release:

  • Model architecture
  • Parameters and checkpoints
  • Training and inference code
  • Preprocessing and evaluation data
  • Documentation (e.g., model cards, data cards)

Importantly, OpenMDW does not require inclusion of all components. It applies only to what is distributed, while remaining compatible with many other licenses that may govern certain parts of the repository.

(OpenMDW users will of course have to continue to comply with any other third-party licenses that apply to other pre-existing materials in their repos, such as by providing license text and notices, source code where applicable, etc.)

Comprehensive and Legally Grounded

OpenMDW grants expansive permissions under copyright, patent, database, and trade secret law, covering a broad legal spectrum of rights relevant to AI artifacts.

It also includes:

  • Patent litigation termination clauses to deter patent assertions by users of the model’s materials
  • Attribution requirements to maintain provenance and trust

Compatible with Policy and Open Source Principles

  • Intended to be fully aligned with the EU AI Act’s references to “free and open-source licenses”
  • Supports the Open Source Initiative (OSI) 10 principles, including free redistribution, source availability, derived works and no discrimination against persons or groups

Designed for Simplicity

  • One license, one file, one place: a LICENSE file at the root of your repo
  • No complex licensing matrix: no confusion for downstream users
  • Easy integration into any repo: just like MIT or Apache-2.0.

Understanding the OpenMDW License

Definitions and Scope

Model Materials under OpenMDW include:

  • Model architecture and trained parameters; and
  • all other related materials provided under OpenMDW, which can include:
    • Preprocessing, training and inference code
    • Datasets and evaluation scripts
    • Documentation, metadata, and tools

This comprehensive scope maps directly to the Model Openness Framework (MOF), ensuring that all critical elements of a model are covered if they are included with the distribution.

The definition of Model Materials is not intended to dictate what must be included in a distribution. It only specifies that what is included in the distribution is covered by the license, and excludes anything covered by other licenses in the distribution.

Grant of Rights

OpenMDW grants broad rights to “deal in the Model Materials without restriction,” including for example:

  • Use, modify and distribute the Model Materials
  • Operate under copyright, patent, database, and trade secret laws

These rights are granted free of charge, with no field-of-use restrictions, removing ambiguity for developers and enterprises alike.

Attribution, Not Copyleft

OpenMDW imposes only minimal obligations:

  • Retain the license text
  • Preserve original copyright and attribution notices

There are no copyleft or share-alike conditions, meaning derivative models and integrations can remain fully permissive. This allows for maximum reuse and interoperability.

Patent Protection

To prevent misuse of the commons, OpenMDW includes a patent-litigation termination clause: if a licensee initiates offensive patent litigation over the Model Materials, their license is revoked.

This mirrors best practices in open source software and helps preserve a collaborative ecosystem.

Outputs Are Unrestricted

A major innovation: outputs generated by using a model under OpenMDW are completely free of licensing restrictions imposed by the provider of the Model Materials.

This eliminates confusion over whether generated text, images, code, or predictions are encumbered by the model provider—a common point of uncertainty in existing licenses.

How to Adopt OpenMDW

Adopting OpenMDW is straightforward:

  1. Add the OpenMDW-1.0 license file to your repository: LICENSE
  2. Clearly indicate that your release is under OpenMDW-1.0 in the README
  3. Ensure all components of the model package are covered and disclosed, including prominently highlighting any components that are subject to other licenses

Why This Matters Now

The AI community is reaching an inflection point. Open models, from AI2’s Molmo to Mistral, open reasoning models like DeepSeek’s R1, and multimodal agents are reshaping what’s possible in the open. But their licensing status remains hard to characterize, since software licenses may not map cleanly onto AI models.

Some open-weights models that use restrictive licenses have gradually become more permissive; but without a strong legal framework available for licensing, model producers have been forced to err on the side of caution in designing their own licenses.

In his recent post, Nathan Lambert of AI2 rightly notes: “One of the long standing todo items for open-source AI is better licenses.” OpenMDW helps to fill that need.

Just as Apache-2.0 and MIT became foundational licenses for open source software, OpenMDW is positioned to become the standard for open models. Its clarity, scope, and permissiveness lower barriers for developers and create certainty for companies and researchers looking to build responsibly on open foundations.

This isn’t just about legal clarity; it’s about enabling an innovation-rich, open source AI ecosystem.

Visit openmdw.ai for more details including the FAQ.

Featured Sessions: Exploring Innovation at PyTorch Day China 2025

PyTorch Day China 2025, proudly hosted by the PyTorch Foundation, will take place on June 7 in Beijing, China, co-located with the BAAI Conference. This will be the second event in the new PyTorch Day series, following the inaugural PyTorch Day France last month in Paris. PyTorch Days focus on regional communities and provide a forum for sharing technical advances, project updates, and tutorials, and for showcasing impactful innovations across research and industry.

PyTorch Day China will highlight cutting-edge tools, frameworks, and practices across the PyTorch ecosystem. The full-day event will feature insightful talks across a multitude of domains, along with technical discussions of the most pressing challenges and projects in the open source AI lifecycle.

PyTorch Day China Featured Sessions:

Running Large Models on Any AI Chip: PyTorch + Open-Source Stack (FlagOS)
Yonghua Lin, VP and Chief Engineer, BAAI
A deep dive into architecture-free deployment of large models using FlagOS and PyTorch—part of BAAI’s open-source stack for cross-hardware model execution.

torch.accelerator: A Unified Runtime API for Accelerators
Yu Guangye, AI Framework Engineer, Intel
Learn how Intel is helping unify PyTorch’s runtime interface across diverse hardware accelerators, streamlining portable and scalable AI workloads.
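
As a rough, hedged illustration of the idea behind this talk (not code from the session itself), recent PyTorch releases expose a torch.accelerator namespace for writing device-agnostic code; the sketch below assumes such a build is installed:

import torch

# Minimal device-agnostic setup: the same code path works whether the
# machine has a CUDA GPU, another supported accelerator, or only a CPU.
if torch.accelerator.is_available():
    device = torch.accelerator.current_accelerator()
    print(f"{torch.accelerator.device_count()} accelerator(s), using: {device}")
else:
    device = torch.device("cpu")

x = torch.randn(1024, 1024, device=device)
y = x @ x.T  # runs on whichever device was selected above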

vLLM: Easy, Fast, and Cheap LLM Serving for Everyone
Kaichao You, Tsinghua University
Explore the design and performance of vLLM, a popular open-source project for efficient inference and serving of large language models.
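
Purely as an illustrative sketch (not material from the talk), vLLM’s offline inference API can be exercised in a few lines; the model id and sampling settings below are placeholder choices:

from vllm import LLM, SamplingParams

# Load any Hugging Face causal LM supported by vLLM (placeholder model id).
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What is PyTorch?"], params)
for out in outputs:
    print(out.outputs[0].text)  # generated completion for each prompt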

PyTorch in Production: Boosting LLM Performance on Ascend NPU
Jiawei Li, Huawei
A look at how PyTorch is being deployed in Huawei’s large-scale heterogeneous environments, with a focus on performance tuning and production readiness.

This is just a sample of what PyTorch Day China will offer. To explore the full agenda, visit the BAAI Conference event page.

Whether you’re contributing to the PyTorch ecosystem or deploying it at scale, PyTorch Day China is an opportunity to connect with a growing community and shape the future of AI development.

Accelerating GPU Performance with Triton: April 30th PyTorch ATX Event

The PyTorch ATX Triton event, sponsored by Red Hat, was held on April 30, 2025, at the University of Texas. The gathering focused on the Triton framework and its role in optimizing and democratizing GPU performance. A key purpose of the event was to highlight the Triton contributors based in Austin, working at organizations such as Red Hat, Intel, AMD, IBM Research, and the University of Texas; bringing them together helped share insights and foster a stronger community.

More than 50 attendees gathered to hear experts from these organizations discuss the growing importance of Triton in optimizing GPU efficiency for various algorithms. Key topics included understanding how to write, optimize, and troubleshoot Triton kernels to maximize GPU utilization and kernel portability.
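
For readers new to Triton, the sketch below shows the basic shape of a kernel of the kind discussed at the event: a minimal element-wise add written against Triton’s public Python API. The block size and wrapper function are illustrative choices, not material from the talks.

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide tile of the input,
    # with a mask guarding the ragged tail of the array.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # x and y must live on the GPU; the grid launches one program per tile.
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out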

Presentations covered a range of subjects: an introduction to Triton and its significance in vendor-neutral hardware acceleration, new sub-projects exploring increased developer productivity and runtime performance, and specific use cases such as Triton for vLLM and the Triton implementations from AMD and Intel. All session videos can be found here (YouTube). Speakers also examined the Triton framework itself, along with its release process, giving attendees a comprehensive overview of the technology and its applications.

This event aimed to equip the PyTorch ATX community with the knowledge and skills necessary to leverage Triton effectively and foster a deeper understanding of Triton’s capabilities by introducing and connecting local contributors. And guess what? This event worked out so well that we’re going to be hosting another large PyTorch ATX event focused on vLLM and the future of inferencing, coming up in August! Sign up here.

How OpenSynth Uses PyTorch to Accelerate Compute for Energy Modelling Applications

OpenSynth has recently leveraged PyTorch to improve the experience of its users and community. OpenSynth is an open source community hosted by LF Energy that is democratising access to synthetic energy demand data. 

Access to smart meter data is essential to rapid and successful energy transitions. Researchers, modelers and policymakers need to understand how energy demand profiles are changing, in a system that requires greater real time optimization of demand and supply on the grid. Yet current global energy modeling and policymaking is still largely based on static and highly aggregated data from the past – when energy flowed in one direction, consumer profiles were relatively predictable, and power generation was highly controllable.

The major challenge is that access to demand data is highly restrictive, as a result of privacy protections. Rather than joining industry calls to unlock raw smart meter data through existing mechanisms, by tackling current data regulations and smart meter legislation, OpenSynth believes generating synthetic data is the fastest way to achieve widespread, global access to smart meter datasets.

The community empowers holders of raw smart meter (i.e. demand) data to generate and share synthetic data and models that can be used by researchers, industry innovators and policy-makers. 

PyTorch allows the OpenSynth community to use GPU compute and distributed training to speed up model fitting. End users with access to multiple GPUs can split a dataset into smaller shards and train on them in parallel, further reducing training time. This makes it possible to scale training to much larger datasets than before.

The Business Challenge

Centre for Net Zero, the non-profit that originally developed OpenSynth before it was contributed to LF Energy, has also developed an algorithm called Faraday, available to users via OpenSynth, that can generate synthetic smart meter data. The Faraday algorithm consists of two components: an autoencoder module and a Gaussian Mixture Model (GMM) module.

The Gaussian Mixture Model (GMM) in Faraday was originally built on scikit-learn’s implementation. scikit-learn is a popular library among data scientists for training many different machine learning algorithms. However, that implementation does not scale well to large datasets, as it only supports CPUs (Central Processing Units) and does not allow accelerated computation on GPUs (Graphics Processing Units). GPUs are more powerful chips that perform mathematical operations much faster and are commonly used to train deep learning models.
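
For context, the CPU-bound baseline looks roughly like the standard scikit-learn usage below; the data shapes and hyperparameters are illustrative placeholders, not OpenSynth’s actual code:

import numpy as np
from sklearn.mixture import GaussianMixture

# `profiles` stands in for a matrix of smart meter demand profiles
# (one row per household, one column per half-hourly reading); illustrative only.
profiles = np.random.rand(10_000, 48)

gmm = GaussianMixture(n_components=10, covariance_type="full", max_iter=100)
gmm.fit(profiles)                  # EM runs on the CPU only
synthetic, _ = gmm.sample(1_000)   # draw synthetic demand profiles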

Furthermore, it does not allow any parallelisation. Parallelising the computation means splitting the original dataset into multiple independent, smaller datasets, training a smaller model on each of them, and then combining those models into a single model.
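
The sketch below illustrates this idea in its simplest form, assuming a naive strategy of fitting one GMM per shard and pooling the components weighted by shard size; it illustrates the concept only and is not OpenSynth’s actual merging logic.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_sharded_gmm(shards, n_components=5):
    """Fit an independent GMM on each shard (shards could be processed by
    separate workers), then pool the components into one larger mixture,
    scaling each shard's component weights by its share of the data."""
    total = sum(len(shard) for shard in shards)
    weights, means, covariances = [], [], []
    for shard in shards:
        gmm = GaussianMixture(n_components=n_components).fit(shard)
        weights.append(gmm.weights_ * (len(shard) / total))
        means.append(gmm.means_)
        covariances.append(gmm.covariances_)
    return np.concatenate(weights), np.vstack(means), np.concatenate(covariances)

# Usage with two illustrative shards of 48-dimensional demand profiles
shards = [np.random.rand(5_000, 48), np.random.rand(5_000, 48)]
pooled_weights, pooled_means, pooled_covs = fit_sharded_gmm(shards)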

A different implementation was needed that supports both parallel computation and GPU acceleration. 

How OpenSynth Used PyTorch

The OpenSynth community recently ported Faraday’s GMM module, originally implemented with scikit-learn, to PyTorch. The reimplementation enables the use of GPUs for training GMMs, significantly accelerating computational performance.

By leveraging PyTorch’s GPU capabilities, the new GMM module can handle much larger datasets with much faster computation, making it a valuable tool for practitioners working with datasets that cannot fit into memory. This update allows users to scale their models and pipelines more efficiently, leading to faster insights and improved results in energy modelling applications.
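
To give a flavour of why this helps, the sketch below implements one EM iteration of a diagonal-covariance GMM purely with tensor operations, so the same code runs on CPU or GPU; it is a simplified illustration, not the Faraday implementation.

import math
import torch

def em_step(x, weights, means, variances):
    """One EM iteration of a diagonal-covariance GMM as pure tensor ops.
    Shapes: x is (N, D); weights is (K,); means and variances are (K, D).
    Simplified sketch, not the Faraday code."""
    # E-step: responsibilities of each component for each data point
    log_prob = -0.5 * (
        ((x[:, None, :] - means[None]) ** 2 / variances[None]).sum(-1)
        + torch.log(variances[None]).sum(-1)
        + means.shape[1] * math.log(2 * math.pi)
    )
    resp = torch.log_softmax(log_prob + torch.log(weights)[None], dim=1).exp()

    # M-step: re-estimate mixture weights, means, and variances
    nk = resp.sum(0) + 1e-10                     # effective count per component
    new_weights = nk / x.shape[0]
    new_means = (resp.T @ x) / nk[:, None]
    new_vars = (resp.T @ x ** 2) / nk[:, None] - new_means ** 2 + 1e-6
    return new_weights, new_means, new_vars

# Usage: put the data on a GPU if one is available and iterate.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.rand(100_000, 48, device=device)       # illustrative demand profiles
K = 10
weights = torch.full((K,), 1.0 / K, device=device)
means = x[torch.randperm(x.shape[0], device=device)[:K]].clone()
variances = torch.ones(K, x.shape[1], device=device)
for _ in range(50):
    weights, means, variances = em_step(x, weights, means, variances)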

A Word from OpenSynth

“Open source is a powerful catalyst for change. Our open data community, OpenSynth, is democratising global access to synthetic energy demand data – unlocking a diversity of downstream applications that can accelerate the decarbonisation of energy systems. PyTorch has an incredible open source ecosystem that enables us to significantly speed up computation for OpenSynth’s users, using distributed GPUs. Without this open source ecosystem, it would have been impossible to implement this change – and slowed down the efforts of those seeking to affect net zero action.” – Sheng Chai, Senior Data Scientist, Centre for Net Zero

Learn More

For more information, visit the LF Energy OpenSynth website.
