Introducing Hidet: A Deep Learning Compiler for Efficient Model Serving


Hidet is a powerful deep learning compiler that simplifies the process of implementing high-performing deep learning operators on modern accelerators (e.g., NVIDIA GPUs). With the new torch.compile(...) feature in PyTorch 2.0, integrating a novel compiler into PyTorch is easier than ever: Hidet can now be used as a torch.compile(...) backend to accelerate PyTorch models. This makes it an attractive option for PyTorch users who want to improve the inference performance of their models, especially those who also need to implement highly optimized custom operators.

Using Hidet to Compile A PyTorch Model

To use Hidet in PyTorch, you need to first install the hidet package via pip:

pip install hidet

Hidet is integrated with PyTorch as a torch.compile(...) backend following the Custom Backends tutorial. You can specify hidet as the backend when you compile a model. (Note: requires PyTorch version 2.0+):

torch.compile(..., backend='hidet')

Hidet converts the given PyTorch model in the torch.fx.Graph format into its internal graph representation and conducts a series of optimizations. Hidet provides a few options to configure these optimizations. For example, we can use hidet.torch.dynamo_config.use_tensor_core(True) to allow Hidet to generate CUDA kernels that leverage the Tensor Cores on NVIDIA GPUs, and use hidet.torch.dynamo_config.search_space(2) to allow Hidet to search for the best operator schedule specific to your hardware and input sizes. More configurations can be found in Hidet’s documentation.

Here’s a complete example of how to use Hidet to compile and optimize a pre-trained ResNet50 model from torchvision:

import hidet
import torch

# Prepare a random input and load a pre-trained ResNet50 model
x = torch.randn(1, 3, 224, 224, device='cuda').half()
model = torch.hub.load(
    'pytorch/vision:v0.6.0', 'resnet50', pretrained=True
).cuda().half().eval()

# Configure hidet to use tensor core and enable tuning
hidet.torch.dynamo_config.use_tensor_core(True)
hidet.torch.dynamo_config.search_space(2) 

# Compile the model using Hidet
model_opt = torch.compile(model, backend='hidet')

# Check correctness
torch.testing.assert_close(actual=model_opt(x), expected=model(x), rtol=1e-2, atol=1e-2)

# Benchmark
from hidet.utils import benchmark_func
print('eager: {:.2f}'.format(benchmark_func(lambda: model(x))))
print('hidet: {:.2f}'.format(benchmark_func(lambda: model_opt(x))))

We encourage you to try out the above script on your own NVIDIA GPU(s)! If you run this script on an AWS g5.2xlarge instance, you would get the result shown in the following figure. Hidet achieves the speedup because it automatically fuses multiple operators, tunes operator schedules, and uses CUDA Graphs to reduce framework-level overhead. More results can be found in the ASPLOS’23 publication of Hidet (vs. PyTorch 1.11) and our performance tracking (vs. PyTorch 2.0).

Eager vs Hidet latency
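
As an aside, the CUDA Graph mechanism mentioned above is also exposed directly in PyTorch. The snippet below is a minimal, hedged sketch (not Hidet’s internal code) of capturing one forward pass and replaying it, which is what removes the per-kernel launch overhead; the linear layer and shapes are arbitrary illustrations:

import torch

model = torch.nn.Linear(1024, 1024).cuda().eval()
static_x = torch.randn(8, 1024, device='cuda')

# Warm up on a side stream before capture, as recommended for CUDA Graphs
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    with torch.no_grad():
        for _ in range(3):
            model(static_x)
torch.cuda.current_stream().wait_stream(s)

# Capture a single forward pass into a CUDA graph
g = torch.cuda.CUDAGraph()
with torch.no_grad(), torch.cuda.graph(g):
    static_y = model(static_x)

# Replay: copy new data into the captured input buffer and relaunch the whole graph at once
static_x.copy_(torch.randn(8, 1024, device='cuda'))
g.replay()
print(static_y.shape)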

Using Hidet Script to Write Custom Operators

Hidet Script is an approach to implementing tensor operators in Python. The following example shows how to implement a naive matrix multiplication using Hidet Script and integrate it as a PyTorch operator.

import torch
import hidet


def matmul(m_size, n_size, k_size):
    from hidet.lang import f32, attr
    from hidet.lang.cuda import threadIdx, blockIdx, blockDim

    with hidet.script_module() as script_module:
        @hidet.script
        def matmul(
            a: f32[m_size, k_size],
            b: f32[k_size, n_size],
            c: f32[m_size, n_size]
        ):
            attr.cuda_grid_dim = ((m_size + 31) // 32, (n_size + 31) // 32)
            attr.cuda_block_dim = (32, 32)
            i = threadIdx.x + blockIdx.x * blockDim.x
            j = threadIdx.y + blockIdx.y * blockDim.y
            if i < m_size and j < n_size:
                c[i, j] = 0.0
                for k in range(k_size):
                    c[i, j] += a[i, k] * b[k, j]

    ir_module = script_module.ir_module()
    func = hidet.driver.build_ir_module(ir_module)
    return func


class NaiveMatmul(torch.autograd.Function):
    @staticmethod
    def forward(ctx, a, b):
        m, k = a.shape
        k, n = b.shape
        c = torch.empty([m, n], dtype=a.dtype, device=a.device)
        func = matmul(m, n, k)
        func(a, b, c)
        return c


a = torch.randn([3, 4], device='cuda')
b = torch.randn([4, 5], device='cuda')
c = NaiveMatmul.apply(a, b)
cc = torch.matmul(a, b)
torch.testing.assert_close(c, cc)
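
To get a rough sense of how far this naive kernel is from a tuned library call, you can time it against torch.matmul. The snippet below is a hedged sketch (not part of the original example) that reuses the matmul builder and the func(a, b, c) calling convention from above; the sizes and iteration counts are arbitrary:

m, n, k = 1024, 1024, 1024
a = torch.randn([m, k], device='cuda')
b = torch.randn([k, n], device='cuda')
c = torch.empty([m, n], device='cuda')
func = matmul(m, n, k)  # build the Hidet kernel once, outside the timing loop

def time_fn(fn, warmup=10, iters=50):
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average milliseconds per call

print('naive hidet kernel: {:.3f} ms'.format(time_fn(lambda: func(a, b, c))))
print('torch.matmul: {:.3f} ms'.format(time_fn(lambda: torch.matmul(a, b))))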

More optimizations can be applied; see the example in our documentation to learn more.

Hidet Script vs. Triton: Triton greatly simplifies CUDA programming by introducing a tile-based programming model where the parallel execution unit is the thread block instead of the thread. However, this simplification also prevents tensor program developers from manipulating fine-grained computation and memory resources (e.g., warps, shared memory) in their preferred ways. It would be challenging to implement an optimization that requires fine-grained control of these resources using Triton if it has not already been implemented by the Triton compiler itself. Hidet Script, on the other hand, simplifies tensor programming while still enabling users to implement their own optimizations with extensive flexibility. It’s worth noting that the more granular control of Hidet Script also brings added complexity compared to Triton.
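
To make the comparison concrete, here is a minimal, hedged sketch of a Triton kernel written in the tile-based style (a simple vector addition, not one of the kernels discussed in this post). Each program instance handles a whole block of elements, and the mapping of that block onto threads, warps, and shared memory is left to the Triton compiler, in contrast to the explicit threadIdx/blockIdx indexing in the Hidet Script example above:

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # One program instance (roughly one thread block) processes BLOCK_SIZE elements
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(10000, device='cuda')
y = torch.randn(10000, device='cuda')
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta['BLOCK_SIZE']),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)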

More about Hidet

Hidet originates from a research project led by the EcoSystem lab at the University of Toronto (UofT) and AWS. The authors propose a new way to construct tensor programs, named the task-mapping programming paradigm, which aims to simplify tensor programming without sacrificing any optimization opportunities. Hidet is now an open-source project, jointly supported by CentML and the EcoSystem lab, that aims to provide an efficient solution for end-to-end inference on modern accelerators (e.g., NVIDIA GPUs).

Additional Resources

Acknowledgement

We would like to thank Jerry Park, Mark Saroufim, Jason Liang and Helen Suk for their valuable help in preparing the blog post and their feedback on the text. We would also like to thank Nikita Shulga, Jason Ansel, and Dmytro Dzhulgakov for reviewing and improving our PyTorch PR73873 on third-party dynamo backend registration.

Read More

DeepMind’s latest research at ICLR 2023

Next week marks the start of the 11th International Conference on Learning Representations (ICLR), taking place 1-5 May in Kigali, Rwanda. This will be the first major artificial intelligence (AI) conference to be hosted in Africa and the first in-person event since the start of the pandemic. Researchers from around the world will gather to share their cutting-edge work in deep learning spanning the fields of AI, statistics and data science, and applications including machine vision, gaming and robotics. We’re proud to support the conference as a Diamond sponsor and DEI champion.

Read More


Robust and efficient medical imaging with self-supervision


Despite recent progress in the field of medical artificial intelligence (AI), most existing models are narrow, single-task systems that require large quantities of labeled data to train. Moreover, these models cannot be easily reused in new clinical contexts as they often require the collection, de-identification and annotation of site-specific data for every new deployment environment, which is both laborious and expensive. This problem of data-efficient generalization (a model’s ability to generalize to new settings using minimal new data) continues to be a key translational challenge for medical machine learning (ML) models and has in turn, prevented their broad uptake in real world healthcare settings.

The emergence of foundation models offers a significant opportunity to rethink development of medical AI to make it more performant, safer, and equitable. These models are trained using data at scale, often by self-supervised learning. This process results in generalist models that can rapidly be adapted to new tasks and environments with less need for supervised data. With foundation models, it may be possible to safely and efficiently deploy models across various clinical contexts and environments.

In “Robust and Efficient MEDical Imaging with Self-supervision” (REMEDIS), to be published in Nature Biomedical Engineering, we introduce a unified large-scale self-supervised learning framework for building foundation medical imaging models. This strategy combines large scale supervised transfer learning with self-supervised learning and requires minimal task-specific customization. REMEDIS shows significant improvement in data-efficient generalization across medical imaging tasks and modalities with a 3–100x reduction in site-specific data for adapting models to new clinical contexts and environments. Building on this, we are excited to announce Medical AI Research Foundations (hosted by PhysioNet), an expansion of the public release of chest X-ray Foundations in 2022. Medical AI Research Foundations is a collection of open-source non-diagnostic models (starting with REMEDIS models), APIs, and resources to help researchers and developers accelerate medical AI research.

Large scale self-supervision for medical imaging

REMEDIS uses a combination of natural (non-medical) images and unlabeled medical images to develop strong medical imaging foundation models. Its pre-training strategy consists of two steps. The first involves supervised representation learning on a large-scale dataset of labeled natural images (pulled from Imagenet 21k or JFT) using the Big Transfer (BiT) method.

The second step involves intermediate self-supervised learning, which does not require any labels and instead, trains a model to learn medical data representations independently of labels. The specific approach used for pre-training and learning representations is SimCLR. The method works by maximizing agreement between differently augmented views of the same training example via a contrastive loss in a hidden layer of a feed-forward neural network with multilayer perceptron (MLP) outputs. However, REMEDIS is equally compatible with other contrastive self-supervised learning methods. This training method is applicable for healthcare environments as many hospitals acquire raw data (images) as a routine practice. While processes would have to be implemented to make this data usable within models (i.e., patient consent prior to gathering the data, de-identification, etc.), the costly, time-consuming, and difficult task of labeling that data could be avoided using REMEDIS.

REMEDIS leverages large-scale supervised learning using natural images and self-supervised learning using unlabeled medical data to create strong foundation models for medical imaging.
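
For readers unfamiliar with the contrastive objective mentioned above, the following is a minimal, hedged sketch of the NT-Xent loss at the heart of SimCLR, written in PyTorch for illustration only (it is not the REMEDIS implementation, and the temperature and tensor sizes are arbitrary):

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    # z1, z2: [N, D] projections of two augmented views of the same N images
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # [2N, D] unit vectors
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))                   # a view is never its own positive
    n = z1.shape[0]
    # the positive for example i is its other augmented view at index i + n (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

loss = nt_xent_loss(torch.randn(32, 128), torch.randn(32, 128))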

Given ML model parameter constraints, it is important that our proposed approach works when using both small and large model architecture sizes. To study this in detail, we considered two ResNet architectures with commonly used depth and width multipliers, ResNet-50 (1×) and ResNet-152 (2×) as the backbone encoder networks.

After pre-training, the model was fine-tuned using labeled task-specific medical data and evaluated for in-distribution task performance. In addition, to evaluate the data-efficient generalization, the model was also optionally fine-tuned using small amounts of out-of-distribution (OOD) data.

REMEDIS starts with representations initialized using large-scale natural image pretraining following the Big Transfer (BiT) method. We then adapt the model to the medical domain using intermediate contrastive self-supervised learning without using any labeled medical data. Finally, we fine-tune the model to specific downstream medical imaging tasks. We evaluate the ML model both in an in-distribution (ID) setting and in an out-of-distribution (OOD) setting to establish the data-efficient generalization performance of the model.

Evaluation and results

To evaluate the REMEDIS model’s performance, we simulate realistic scenarios using retrospective de-identified data across a broad range of medical imaging tasks and modalities, including dermatology, retinal imaging, chest X-ray interpretation, pathology and mammography. We further introduce the notion of data-efficient generalization, capturing the model’s ability to generalize to new deployment distributions with a significantly reduced need for expert annotated data from the new clinical setting. This is measured as (1) improvement in zero-shot generalization to OOD settings (assessing performance in an OOD evaluation set, with zero access to training data from the OOD dataset) and (2) a significant reduction in the need for annotated data from the OOD settings to reach performance equivalent to that of clinical experts (or a threshold demonstrating clinical utility). REMEDIS also exhibits significantly improved in-distribution performance, with up to an 11.5% relative improvement in diagnostic accuracy over a strongly supervised baseline.

More importantly, our strategy leads to data-efficient generalization of medical imaging models, matching strong supervised baselines resulting in a 3–100x reduction in the need for retraining data. While SimCLR is the primary self-supervised learning approach used in the study, we also show that REMEDIS is compatible with other approaches, such as MoCo-V2, RELIC and Barlow Twins. Furthermore, the approach works across model architecture sizes.

REMEDIS outperformed the supervised baseline pre-trained on JFT-300M for various medical tasks and demonstrated improved data-efficient generalization, reducing data needs by 3–100x for adapting models to new clinical settings. This could potentially translate to significant reduction in clinician hours saved annotating data and cost of developing robust medical imaging systems.
REMEDIS is compatible with MoCo-V2, RELIC and Barlow Twins as alternate self-supervised learning strategies. All the REMEDIS variants lead to data-efficient generalization improvements over the strong supervised baseline for dermatology condition classification (T1), diabetic macular edema classification (T2), and chest X-ray condition classification (T3). The gray shaded area indicates the performance of the strong supervised baseline pre-trained on JFT.

Medical AI Research Foundations

Building on REMEDIS, we are excited to announce Medical AI Research Foundations, an expansion of the public release of chest X-ray Foundations in 2022. Medical AI Research Foundations is a repository of open-source medical foundation models hosted by PhysioNet. This expands the previous API-based approach to also encompass non-diagnostic models, to help researchers and developers accelerate their medical AI research. We believe that REMEDIS and the release of the Medical AI Research Foundations are a step toward building medical models that can generalize across healthcare settings and tasks.

We are seeding Medical AI Research Foundations with REMEDIS models for chest X-ray and pathology (with related code). Whereas the existing chest X-ray Foundation approach focuses on providing frozen embeddings for application-specific fine tuning from a model trained on several large private datasets, the REMEDIS models (trained on public datasets) enable users to fine-tune end-to-end for their application, and to run on local devices. We recommend users test different approaches based on their unique needs for their desired application. We expect to add more models and resources for training medical foundation models such as datasets and benchmarks in the future. We also welcome the medical AI research community to contribute to this.

Conclusion

These results suggest that REMEDIS has the potential to significantly accelerate the development of ML systems for medical imaging, which can preserve their strong performance when deployed in a variety of changing contexts. We believe this is an important step forward for medical imaging AI to deliver a broad impact. Beyond the experimental results presented, the approach and insights described here have been integrated into several of Google’s medical imaging research projects, such as dermatology, mammography and radiology among others. We’re using a similar self-supervised learning approach with our non-imaging foundation model efforts, such as Med-PaLM and Med-PaLM 2.

With REMEDIS, we demonstrated the potential of foundation models for medical imaging applications. Such models hold exciting possibilities in medical applications with the opportunity of multimodal representation learning. The practice of medicine is inherently multimodal and incorporates information from images, electronic health records, sensors, wearables, genomics and more. We believe ML systems that leverage these data at scale using self-supervised learning with careful consideration of privacy, safety, fairness and ethics will help lay the groundwork for the next generation of learning health systems that scale world-class healthcare to everyone.

Acknowledgements

This work involved extensive collaborative efforts from a multidisciplinary team of researchers, software engineers, clinicians, and cross-functional contributors across Google Health AI and Google Brain. In particular, we would like to thank our first co-author Jan Freyberg and our lead senior authors of these projects, Vivek Natarajan, Alan Karthikesalingam, Mohammad Norouzi and Neil Houlsby for their invaluable contributions and support. We also thank Lauren Winer, Sami Lachgar, Yun Liu and Karan Singhal for their feedback on this post and Tom Small for support in creating the visuals. Finally, we also thank the PhysioNet team for their support on hosting Medical AI Research Foundations. Users with questions can reach out to medical-ai-research-foundations at google.com.

Read More

Deliver your first ML use case in 8–12 weeks


Do you need help to move your organization’s Machine Learning (ML) journey from pilot to production? You’re not alone. Most executives think ML can apply to any business decision, but on average only half of the ML projects make it to production.

This post describes how to implement your first ML use case using Amazon SageMaker in just 8–12 weeks by leveraging a methodology called Experience-based Acceleration (EBA).

Challenges

Customers may face several challenges when implementing machine learning (ML) solutions.

  • You may struggle to connect your ML technology efforts to your business value proposition, making it difficult for IT and business leadership to justify the investment it requires to operationalize models.
  • You may often select low-value use cases as proof of concept rather than solving a meaningful business or customer problem.
  • You may have gaps in skills and technologies, including operationalizing ML solutions, implementing ML services, and managing ML projects for rapid iterations.
  • Ensuring data quality, governance, and security may slow down or stall ML projects.

Solution overview: Machine Learning Experience-based Acceleration (ML EBA)

Machine learning EBA is a 3-day, sprint-based, interactive workshop (called a party) that uses SageMaker to accelerate business outcomes by guiding you through an accelerated and prescriptive ML lifecycle. It starts with identifying business goals and ML problem framing, and takes you through data processing, model development, production deployment, and monitoring.

The following visual illustrates a sample ML lifecycle.

Sample Machine Learning Lifecycle

Two primary customer scenarios apply. The first is by using low-code or no-code ML services such as Amazon SageMaker Canvas, Amazon SageMaker Data Wrangler, Amazon SageMaker Autopilot, and Amazon SageMaker JumpStart to help data analysts prepare data, build models, and generate predictions. The second is by using SageMaker to help data scientists and ML engineers build, train, and deploy custom ML models.

We recognize that customers have different starting points. If you’re starting from scratch, it’s often simpler to begin with low-code or no-code solutions and gradually transition to developing custom models. In contrast, if you have an existing on-premises ML infrastructure, you can begin directly by using SageMaker to alleviate challenges with your current solution.

Through ML EBA, experienced AWS ML subject matter experts work side by side with your cross-functional team to provide prescriptive guidance, remove blockers, and build organizational capability for a continued ML adoption. This party steers you to solve a compelling business problem as opposed to thinking in terms of data and ML technology environments. Additionally, the party gets you started on driving material business value from untapped data.

ML EBA helps you to think big, start small, and scale fast. Although it creates a minimum viable ML model in 3 days, there are 4–6 weeks of preparation leading up to the EBA. Furthermore, you spend 4–6 weeks post-EBA to fine-tune the model with additional feature engineering and hyperparameter optimization before production deployment.

Let’s dive into what the whole process looks like and how you can use the ML EBA methodology to address the common blockers.

EBA prep (4–6 weeks)

In this section, we detail the 4–6 weeks of preparation leading up to the EBA.

6 weeks before the party: Problem framing and qualification

The first step is to frame and qualify the ML problem, which includes the following:

  • Identify the right business outcome – You must have a clear understanding of the problem you are trying to solve and the desired outcome you hope to achieve through the use of ML. You must be able to measure the business value gained against specific objectives and success criteria. Furthermore, you must be able to identify what should be observed, and what should be predicted. AWS works with you to help answer the following important questions before embarking on the ML EBA:
    • Does the ML use case solve a meaningful business problem?
    • Is it important enough to get the attention of business leadership?
    • Do you already have data to solve the ML use case?
    • Can the use case eventually be operationalized into production?
    • Does it really require ML?
    • Are there organizational processes in place for the business to use the model’s output?

The AI Use Case Explorer is a good starting point to explore the right use cases by industry, business function, or desired business outcome and discover relevant customer success stories.

  • Executive sponsorship – To help you move faster than you would have organically, AWS meets with the executive sponsor to confirm buy-in, remove internal obstacles, and commit resources. Additionally, AWS can offer financial incentives to help offset the costs for your first ML use case.
  • Meeting you where you are in your ML journey – AWS assesses your current state—people, process, and technology. We help you detail requirements and dependencies; specifically, what teams and data are required to begin the journey successfully. Additionally, we provide recommendations on the technical path: starting with low-code or no-code services, or building a custom model using SageMaker.

5 weeks before the party: Workstream configuration and transition into action

The next step is to identify the teams needed to support the EBA effort. Commonly, the work is split up between the following workstreams:

  • Cloud engineering (infrastructure and security) – Focuses on verifying that the AWS accounts and infrastructure are set up and secure ahead of EBA. This includes AWS Identity and Access Management (IAM) or single sign-on (SSO) access, security guardrails, Amazon SageMaker Studio provisioning, automated stop/start to save costs, and Amazon Simple Storage Service (Amazon S3) set up.
  • Data engineering – Identifies the data sources, sets up data ingestion and pipelines, and prepares data using Data Wrangler.
  • Data science – The heart of ML EBA and focuses on feature engineering, model training, hyperparameter tuning, and model validation.
  • MLOps engineering – Focuses on automating the DevOps pipelines for operationalizing the ML use case. This may often be the same team as cloud engineering.
  • Leadership team – Responsible for orchestrating the effort, removing blockers, aligning with the executive sponsors, and is ultimately accountable for delivering the expected outcomes.

After these efforts have been completed, we must transition into action. A standard baseline 4-week timeline should be strictly adhered to in order to make sure the EBA stays on track. Experienced AWS subject matter experts will guide and coach you through this preparation leading up to the EBA party.

4 weeks before the party: Inspire builders and curate a technical plan

Every customer is different; AWS helps you curate a technical plan of activities to be completed in the next 4 weeks leading up to the party.

AWS conducts Immersion Days to inspire your builders and build momentum for the party. An Immersion Day is a half or full day workshop with the right mix of presentation, hands-on labs, and Q&A to introduce AWS services or solutions. AWS will help you select the right Immersion Days from the AI/ML Workshops catalog.

We recognize that every builder in your organization is at a different level. We recommend that your builders use the ML ramp-up guide resources or digital or classroom training to start where they are at and build the necessary skills for the party.

3 weeks before the party: Tech prep focused on cloud and data engineering

Your cloud and data engineering teams should work on the following with guidance from AWS:

  • Create AWS accounts with network and security set up
  • Set up Amazon SageMaker Studio
  • Create Amazon S3 buckets to store data
  • Identify data sources (or producers)
  • Integrate external sources to dump data into S3 buckets

2 weeks before the party: Tech prep focused on data science

Your data science team should work on the following with guidance from AWS:

1 week before the party: Assess readiness (go/no-go)

AWS works with you to assess go/no-go readiness for technical activities, skills, and momentum for the party. Then we solidify the scope for the 3-day party, prioritizing progress over perfection.

EBA (3-day party)

Although the EBA party itself is customized for your organization, the recommended agenda for the 3 days is summarized below by workstream. You will learn by doing during the EBA with guidance from AWS subject matter experts.

Data Science

  • Day 1 – AM: Try AutoPilot or JumpStart models. PM: Pick 1–2 models based on the AutoPilot outcomes to experiment with further.
  • Day 2 – Improve model accuracy through in-depth feature engineering (for example, PCA) and hyperparameter optimization (HPO).
  • Day 3 – Quality assurance and validation with test data, deployment to production (inference endpoint), and monitoring setup (model and data drift).

Data Engineering

  • Explore using a feature store for future ML use cases, and create a backlog of items for data governance and associated guardrails.

Cloud/MLOps Engineering

  • Evaluate the MLOps framework solution library and assess whether it can be used for a repeatable MLOps framework.
  • Identify gaps and create a backlog of items to enhance the solution library or create your own MLOps framework.
  • Implement the backlog items to create a repeatable MLOps framework, continuing throughout the party.

Post-EBA

ML involves extensive experimentation, and it’s common to not reach your desired model accuracy during the 3-day EBA. Therefore, creating a well-defined backlog or path to production is essential, including improving model accuracy through experimentation, feature engineering, hyperparameter optimization, and production deployment. AWS will continue to assist you through production deployment.

Conclusion

By complementing ML EBA methodology with SageMaker, you can achieve the following results:

  • Move from pilot to production value in 8-12 weeks – Bring together business and technology teams to deploy the first ML use case to production in 8-12 weeks.
  • Build the organizational capability to speed up and scale ML across lines of business – The ML EBA inspires and up-skills builders with real work experience. It establishes a successful working model (a collaboration and iteration model) to sustain and scale ML initiatives across lines of business. It also creates reusable assets to speed up and scale ML in a repeatable way.
  • Reduce technical debt, pain points, and cost from existing on-premises ML models – The on-premises solutions may have challenges related to higher costs, inability to scale infrastructure, undifferentiated infrastructure management, and lack of advanced feature sets such as hyperparameter optimization, explainability for predictions, and more. Adoption of AWS ML services such as SageMaker reduces these issues.

Contact your AWS account team (Account Manager or Customer Solutions Manager) to learn more and get started.


About the Authors

Ritesh Shah is Senior Customer Solutions Manager at Amazon Web Services. He helps large US-Central enterprises accelerate their cloud-enabled transformation and build modern cloud-native solutions. He is passionate about accelerating customers’ ML journeys. In his free time, Ritesh enjoys spending time with his daughter, cooking, and learning something new, while also evangelizing cloud and ML. Connect with him on LinkedIn.

Nicholaus Lawson is a Solution Architect at AWS and part of the AIML specialty group. He has a background in software engineering and AI research. Outside of work, Nicholaus is often coding, learning something new, or woodworking. Connect with him on LinkedIn.

Read More

Collaborative Machine Learning Model Building with Families Using Co-ML

Existing novice-friendly machine learning (ML) modeling tools center around a solo user experience, where a single user collects only their own data to build a model. However, solo modeling experiences limit valuable opportunities for encountering alternative ideas and approaches that can arise when learners work together; consequently, it often precludes encountering critical issues in ML around data representation and diversity that can surface when different perspectives are manifested in a group-constructed data set. To address this issue, we created Co-ML – a tablet-based app for learners…

Apple Machine Learning Research

Research Focus: Week of April 24, 2023


Microsoft Research Focus 14 edition, week of April 24, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

AWARD

Microsoft researcher Kalai awarded 2022 ACM Prize in Computing

Yael Tauman Kalai, a senior principal researcher at Microsoft Research, has been awarded the 2022 ACM Prize in Computing. Kalai was recognized for breakthroughs in verifiable delegation of computation and fundamental contributions to cryptography. According to the award announcement, “Kalai’s contributions have helped shape modern cryptographic practices and provided a strong foundation for further advancements.”

The ACM Prize in Computing recognizes early-to-mid-career computer scientists whose research contributions have fundamental impact and broad implications.

Spotlight: On-demand video

AI Explainer: Foundation models ​and the next era of AI

Explore how the transformer architecture, larger models and more data, and in-context learning have helped advance AI from perception to creation.

Among the multiple accomplishments cited for the award, Kalai has developed methods for producing succinct proofs that certify the correctness of any computation. This method enables a weak device to offload any computation to a stronger device in a way that enables the results to be efficiently checked for correctness. Such succinct proofs have been used by blockchain companies to certify transaction validity, thereby overcoming key obstacles in blockchain scalability and enabling faster and more reliable transactions.

Kalai was also cited for her breakthrough work on the security of the “Fiat-Shamir paradigm,” a general technique for eliminating interaction from interactive protocols. This paradigm is extensively utilized in real-world applications, including the most prevalent digital signature scheme (ECDSA), which is used by all iOS and Android mobile devices.


NEW RESEARCH

Empowering Azure Storage with RDMA

High performance and highly reliable storage are fundamental requirements of public clouds. Given the wide adoption of disaggregated storage in the cloud, networking is essential for enabling high performance and high reliability. Microsoft’s Azure cloud service uses remote direct memory access (RDMA) as its transport and aims to enable it for both storage frontend traffic (between compute virtual machines and storage clusters) and backend traffic (within a storage cluster) to fully realize its benefits. As compute and storage clusters may be located in different datacenters within an Azure region, RDMA needs to be supported at regional scale.

In a new paper: Empowering Azure Storage with RDMA, Microsoft Azure and Microsoft Research report on their intra-region RDMA deployment to support storage workloads in Azure. The high complexity and heterogeneity of Azure infrastructure creates challenges, such as the problem of interoperability between different types of RDMA network interface cards. Several changes were made to the network infrastructure to address these challenges. Today, around 70% of traffic in Azure is RDMA and intra-region RDMA is supported in all Azure public regions. This helps achieve significant disk I/O performance improvements and CPU core savings.


NEW RESEARCH

LIDA: Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models

Systems that support users in the automatic creation of visualizations must address several subtasks—understand the semantics of data; enumerate relevant visualization goals; and generate visualization specifications. In a new paper: LIDA: Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models, researchers from Microsoft pose visualization generation as a multi-stage generation problem and argue that well-orchestrated pipelines based on large language models (LLMs) and image generation models (IGMs) are suitable to addressing these tasks.

LIDA is a novel tool for generating grammar-agnostic visualizations and infographics. It is comprised of four modules—a summarizer that converts data into a rich but compact natural language summary; a goal explorer that enumerates visualization goals given the data; a visgenerator that generates, evaluates, refines, executes, and filters visualization code; and an infographer module that yields data-faithful stylized graphics using IGMs. LIDA provides a python API and a hybrid user interface (direct manipulation and multilingual natural language) for interactive chart, infographics and data story generation.


NEW RELEASE

Announcing DeepSpeed-Chat: Easy, fast, affordable RLHF Training of ChatGPT-like models at all scales

Microsoft’s AI at Scale initiative has released DeepSpeed-Chat, an easy, fast, and low-cost open-source solution for reinforcement learning from human feedback (RLHF) training that can create high-quality ChatGPT-like models ranging in size from a few to hundreds of billions of parameters. DeepSpeed-Chat provides complete RLHF training experience with a single click. It combines the prowess of DeepSpeed-Inference and DeepSpeed-Training to offer 15x faster throughput than the previous state of the art, while also supporting model sizes that are up to 8x larger on the same hardware. With DeepSpeed-Chat, practitioners can train an OPT-13B ChatGPT-like model in under 1.5 hours or a massive 175B model in a day on a modest GPU cluster. For those who don’t have a GPU cluster handy, DeepSpeed-Chat enables practitioners to train up to a 13B model on a single GPU, or at $300 to train on Azure Cloud. 


NEWS

Gov4git: Decentralized community governance to fuel open-source projects

Communal open-source projects have helped build countless applications for sourcing and sharing information like bug details and scientific data, as well as decentralized planning, design and policymaking. 

But the lack of a standardized and secure governance solution prevents many open-source projects from getting started—and holds them back when they get too big to be managed through ad-hoc methods. These small communities often resort to external mechanisms to manage their projects and protect them from malicious actors.

Microsoft Research and Protocol Labs, an open-source R&D company, are collaborating to develop Gov4git, a decentralized, git-native protocol with configurable governance rules to help launch more open-source projects and communities and support their growth.

Gov4git comes with many of the transparency, decentralization, and security benefits of blockchains while also harnessing the power of formal governance to avoid costly approaches to validation and dispute resolution. 

Git is the worldwide standard for version control and management of collaborative software development projects. Gov4git is designed as a secure and cost-effective framework solution which can be tailored to the specific needs of any one community and deployed by non-technical users anywhere where access to git is present. Gov4git can strengthen the security of such communities against the risks of malicious actors posing as collaborators with the intent to negatively impact community maintenance.

The post Research Focus: Week of April 24, 2023 appeared first on Microsoft Research.

Read More

Viral NVIDIA Broadcast Demo Drops Hammer on Imperfect Audio This Week ‘In the NVIDIA Studio’


Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows.

Content creators in all fields can benefit from free, AI-powered technology available from NVIDIA Studio.

The Studio platform delivers RTX acceleration in over 110 popular creative apps plus an exclusive suite of AI-powered Studio software. NVIDIA Omniverse interconnects 3D workflows, Canvas turns simple brushstrokes into realistic landscape images and RTX Remix helps modders create stunning RTX remasters of classic PC games.

Spotlighted by this week’s In the NVIDIA Studio featured artist Unmesh Dinda, NVIDIA Broadcast transforms the homes, apartments and dorm rooms of content creators, livestreamers and people working from home through the power of AI — all without the need for specialized equipment.

Host of the widely watched YouTube channel PiXimperfect, Dinda takes the noise-canceling and echo-removal AI features in Broadcast to extremes. He turned the perfect demo into a viral hit, editing it faster thanks to RTX acceleration in his go-to video-editing software, Adobe Premiere Pro.

It’s Hammer Time

NVIDIA Broadcast has several popular features, including virtual background, autoframing, video noise removal, eye contact and vignette effects.

Two of the most frequently used features, noise and echo removal, caught the attention of Dinda, who saw Broadcast’s potential and wanted to show creators how to instantly improve their content.

The foundation of Dinda’s tutorial style came from his childhood. “My father would sit with me every day to help me with schoolwork,” he said. “He always used to explain with examples which were crystal clear to me, so now I do the same with my channel.”

Dinda contemplated how to demonstrate this incredible technology in a quick, relatable way.

“Think of a crazy idea that grabs attention instantly,” said Dinda. “Concepts like holding a drill in both hands or having a friend play drums right next to me.”

Dinda took the advice of famed British novelist William Golding, who once said, “The greatest ideas are the simplest.” Dinda’s final concept ended up as a scene of a hammer hitting a helmet on his head.

It turns out that seeing — and hearing — is believing.

Even with an electric fan whirring directly into his microphone and intense hammering on his helmet, Dinda can be heard crystal clear with Broadcast’s noise-removal feature turned on. To help emphasize the sorcery, Dinda briefly turns the feature off in the demo to reveal the painful sound his viewers would hear without it.

The demo launched on Instagram a few months ago and went viral overnight. Across social media platforms, the video now has over 12 million views and counting.

Dinda wasn’t harmed in the making of this video.

Views are fantastic, but the real gratification of Dinda’s work comes from a genuine desire to improve his followers’ skillsets, he said.

“The biggest inspiration comes from viewers,” said Dinda. “When they comment, message or meet me at an event to say how much the content has helped their career, it inspires me to create more and reach more creatives.”

 

Learn more and download Broadcast, free for all GeForce RTX GPU owners.

Hammer Out the Details

Dinda uses Adobe Premiere Pro to edit his videos, and his GeForce RTX 3080 Ti plays a major part in accelerating his creative workflow.

“I work with and render high-resolution videos on a daily basis, especially with Adobe Premiere Pro. Having a GPU like the GeForce RTX 3080 Ti helps me render and publish in time.” — Unmesh Dinda

He uses the GPU-accelerated decoder, called NVDEC, to unlock smooth playback and scrubbing of the high-resolution video footage he often works in.

As his hammer-filled Broadcast demo launched on several social media platforms, Dinda had the option to deploy the AI-powered, RTX-accelerated auto reframe feature. It automatically and intelligently tracks objects, and crops landscape video to social-media-friendly aspect ratios, saving even more time.

Dinda also used Adobe Photoshop to add graphical overlays to the video. With more than 30 GPU-accelerated features at his disposal — such as super resolution, blur gallery, object selection, smart sharpen and perspective warp — he can improve and adjust footage, quickly and easily.

 

Dinda used the GPU-accelerated NVIDIA encoder, aka NVENC, to export video up to 5x faster with his RTX GPU, leading to more time saved on the project.

Though he’s a full-time, successful video creator, Dinda stressed, “I have a normal life outside Adobe Photoshop, I promise!”

Streamer Unmesh Dinda.

Check out Dinda’s PiXimperfect channel, a free resource for learning Adobe Photoshop — another RTX-accelerated Studio app.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.

Read More

The Future of Intelligent Vehicle Interiors: Building Trust with HMI & AI


Imagine a future where your vehicle’s interior offers personalized experiences and builds trust through human-machine interfaces (HMI) and AI. In this episode of the NVIDIA AI Podcast, Andreas Binner, chief technology officer at Rightware, delves into this fascinating topic with host Katie Burke Washabaugh.

Rightware is a Helsinki-based company at the forefront of developing in-vehicle HMI. Its platform, Kanzi, works in tandem with NVIDIA DRIVE IX to provide a complete toolchain for designing personalized vehicle interiors for the next generation of transportation, including detailed visualizations of the car’s AI.

Binner touches on his journey into automotive technology and HMI, the evolution of infotainment in the automotive industry over the past decade, and surprising trends in HMI. They explore the influence of AI on HMI, novel AI-enabled features and the importance of trust in new technologies.

Other topics include the role of HMI in fostering trust between vehicle occupants and the vehicle, the implications of autonomous vehicle visualization, balancing larger in-vehicle screens with driver distraction risks, additional features for trust-building between autonomous vehicles and passengers, and predictions for intelligent cockpits in the next decade.

Tune in to learn about the innovations that Rightware’s Kanzi platform and NVIDIA DRIVE IX bring to the automotive industry and how they contribute to developing intelligent vehicle interiors.

Read more on the NVIDIA Blog:  NVIDIA DRIVE Ecosystem Creates Pioneering In-Cabin Features With NVIDIA DRIVE IX

You Might Also Like

Driver’s Ed: How Waabi Uses AI, Simulation to Teach Autonomous Vehicles to Drive

Teaching the AI brains of autonomous vehicles to understand the world as humans do requires billions of miles of driving experience. The road to achieving this astronomical level of driving leads to the virtual world. Learn how Waabi uses powerful high-fidelity simulations to train and develop production-level autonomous vehicles.

Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans

Driving enjoyment and autonomous driving capabilities can complement one another in intelligent, sustainable vehicles. Learn about the automaker’s plans to unveil its third vehicle, the Polestar 3, the tech inside it, and what the company’s racing heritage brings to the intersection of smarts and sustainability.

GANTheftAuto: Harrison Kinsley on AI-Generated Gaming Environments

Humans playing games against machines is nothing new, but now computers can develop their own games for people to play. Programming enthusiast and social media influencer Harrison Kinsley created GANTheftAuto, an AI-based neural network that generates a playable chunk of the classic video game Grand Theft Auto V.

Subscribe to the AI Podcast: Now Available on Amazon Music

The AI Podcast is now available through Amazon Music.

In addition, get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

Read More

Run your local machine learning code as Amazon SageMaker Training jobs with minimal code changes


We recently introduced a new capability in the Amazon SageMaker Python SDK that lets data scientists run their machine learning (ML) code authored in their preferred integrated development environment (IDE) and notebooks along with the associated runtime dependencies as Amazon SageMaker training jobs with minimal code changes to the experimentation done locally. Data scientists typically carry out several iterations of experimentation in data processing and training models while working on any ML problem. They want to run this ML code and carry out the experimentation with ease of use and minimal code change. Amazon SageMaker Model Training helps data scientists run fully managed large-scale training jobs on AWS’s compute infrastructure. SageMaker Training also helps data scientists with advanced tools such as Amazon SageMaker Debugger and Profiler to debug and analyze their large-scale training jobs.

For customers with small budgets, small teams, and tight timelines, every single new concept and line of code rewritten to run on SageMaker makes them less productive towards their core tasks, namely data processing and training ML models. They want to write code once in the framework of their choice and be able to move seamlessly from running code in their notebooks or laptops to running code at scale using SageMaker capabilities.

With this new capability of the SageMaker Python SDK, data scientists can onboard their ML code to the SageMaker Training platform in a few minutes. You just need to add a single line of code to your ML code, and SageMaker intelligently comprehends your code along with the datasets and workspace environment setup and runs it as a SageMaker Training job. You can then take advantage of the key capabilities of the SageMaker Training platform, like the ability to scale jobs easily, and other associated tools like Debugger and Profiler. In this release, you can run your local ML Python code as a single-node Amazon SageMaker training job or as multiple parallel jobs. Distributed training jobs (across multiple nodes) are not supported by remote functions.

In this post, we show you how to use this new capability to run local ML code as a SageMaker Training job.

Solution overview

You can now run your ML code written in your IDE or notebook as a SageMaker Training job by annotating the function, which acts as an entry point to the user’s code base, with a simple decorator. Upon invocation, this capability automatically takes a snapshot of all the associated variables, functions, packages, environment variables, and other runtime requirements from your ML code, serializes them, and submits them as a SageMaker Training job. It integrates with the recently announced SageMaker Python SDK feature for setting default values for parameters. This capability simplifies the SageMaker constructs that you need to learn to be able to run code using SageMaker Training. Data scientists can write, debug, and iterate their code in any preferred IDE (such as Amazon SageMaker Studio, notebooks, VS Code, or PyCharm). When ready, you can annotate your Python function with the @remote decorator and run it as a SageMaker job at scale.

This capability takes familiar open-source Python objects as arguments and outputs. Furthermore, you don’t need to understand container lifecycle management and can simply run your workloads across different compute contexts (such as a local IDE, Studio, or training jobs) with minimal configuration overheads. To run any local code as a SageMaker Training job, this capability infers the configurations required to run jobs, such as the AWS Identity and Access Management (IAM) role, encryption key, and network configuration, from the Studio or IDE settings (which can be the default settings) and passes them to the platform by default. You have the flexibility to customize your runtime in the SageMaker managed infrastructure using the inferred configuration or override them at the SDK-level by passing them as arguments to the decorator.

This new capability of the SageMaker Python SDK transforms your ML code in an existing workspace environment, along with any associated data processing code and datasets, into a SageMaker Training job. It looks for ML code wrapped inside a @remote decorator, whether it is authored in Studio or in a local IDE such as PyCharm, and automatically translates it into a job that runs on the SageMaker Training platform.

In the following sections, we walk through the features of this new capability and how to launch Python functions as SageMaker Training jobs.

Prerequisites

To use this new SageMaker Python SDK capability and run the code associated with this post, you need the following prerequisites:

  • An AWS account that will contain all your AWS resources
  • An IAM role to access SageMaker
  • Access to Studio or a SageMaker notebook instance or an IDE such as PyCharm

Use the SDK from Studio and SageMaker notebooks

You can use this capability from Studio by launching a notebook and wrapping your code with a @remote decorator inside the notebook. You first need to import the remote function using the following code:

from sagemaker.remote_function import remote

When you apply the decorator, this capability automatically interprets the decorated function in your code and runs it as a SageMaker Training job.

You can also use this capability from a SageMaker notebook instance. You first need to start a notebook instance, open Jupyter or Jupyter Lab on it, and launch a notebook. Then import the remote function as shown in the preceding code and wrap your code with the @remote decorator. We include an example of how to use the decorator function and the associated settings later in this post.

Use the SDK from your local environment

You can also use this capability from your local IDE. As a prerequisite, you must have the AWS Command Line Interface (AWS CLI), SageMaker Python SDK, and AWS SDK for Python (Boto3) installed in your local environment. You need to import these libraries in your code, set the SageMaker session, specify settings, and decorate your function with the @remote decorator. In the following example code, we run a simple divide function as a SageMaker Training job:

import boto3
import sagemaker
from sagemaker.remote_function import remote

sm_session = sagemaker.Session(boto_session=boto3.session.Session(region_name="us-west-2"))
settings = dict(
    sagemaker_session=sm_session,
    role="<IAM_ROLE_NAME>",  # replace with your SageMaker execution role
    instance_type="ml.m5.xlarge",
)
@remote(**settings)
def divide(x, y):
    return x / y
if __name__ == "__main__":
    print(divide(2, 3.0))

We can use a similar methodology to run advanced functions as training jobs, as shown in the next section.

Launch Python functions as SageMaker jobs

The new SageMaker Python SDK feature allows you to run Python functions as SageMaker Training jobs. Any Python code, ML training code developed by data scientists using their preferred local IDEs (PyCharm, VS Code), SageMaker notebooks, or Studio notebooks can be launched as a managed SageMaker job.

In ML workloads using this capability, the associated datasets, dependencies, and workspace environment setup are serialized along with the ML code and run as a SageMaker job, either synchronously or asynchronously.

You can add a @remote decorator annotation to any Python code, including a local ML processing or training function, to launch it as a managed SageMaker Training job, thereby taking advantage of the scale, performance, and cost benefits of SageMaker. This can be achieved with minimal code changes by adding a decorator to the Python function code. Invocations of the decorated function run synchronously, and the function call waits until the SageMaker job is complete.

In the following example, we use the @remote decorator to launch SageMaker jobs in decorator mode using an ml.m5.large instance. SageMaker uses training jobs to launch this function as a managed job.

from sagemaker.remote_function import remote
import numpy as np

@remote(instance_type="ml.m5.large")
def matrix_multiply(a, b):
    return np.matmul(a, b)

a = np.array([[1, 0], [0, 1]])
b = np.array([1, 2])

assert np.array_equal(matrix_multiply(a, b), np.array([1, 2]))

You can also use decorator mode to launch SageMaker jobs with additional Python packages and dependencies. You can include environment variables such as VPC, subnets, and security groups to launch SageMaker training jobs in the environment.yml file. This allows ML engineers and admins to configure these environment variables so data scientists can focus on ML model building and iterate faster. See the following code:

import os

# os and AutoModelForSequenceClassification are used inside the (truncated) training function below
from transformers import AutoModelForSequenceClassification
from sagemaker.remote_function import remote

@remote(instance_type="ml.g4dn.xlarge",dependencies = "./environment.yml")
def train_hf_model(
    train_input_path,test_input_path,s3_output_path = None,
    *,epochs = 1, train_batch_size = 32, eval_batch_size = 64,
    warmup_steps = 500,learning_rate = 5e-5
    ):  
    model_name = "distilbert-base-uncased"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    ... <TRUNCATED>
    return os.path.join(s3_output_path, model_dir), eval_result

You can use RemoteExecutor to launch Python functions as SageMaker jobs asynchronously. The executor asynchronously polls SageMaker Training jobs to update the status of the job. The RemoteExecutor class is an implementation of the concurrent.futures.Executor, which is used to submit SageMaker Training jobs asynchronously. See the following code:

from sagemaker.remote_function import RemoteExecutor

def train_hf_model(
    train_input_path, test_input_path, s3_output_path=None,
    *, epochs=1, train_batch_size=32, eval_batch_size=64,
    warmup_steps=500, learning_rate=5e-5
):
    model_name = "distilbert-base-uncased"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    ...<TRUNCATED>
    return os.path.join(s3_output_path, model_dir), eval_result


with RemoteExecutor(instance_type="ml.g4dn.xlarge", dependencies="./requirements.txt") as e:
    # Submit the training function; the executor launches it as a SageMaker Training job.
    future = e.submit(train_hf_model, train_input_path, test_input_path, s3_output_path,
                      epochs=epochs, train_batch_size=train_batch_size,
                      eval_batch_size=eval_batch_size, warmup_steps=warmup_steps,
                      learning_rate=learning_rate)
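
The submit call returns a future object. If you need the return value of train_hf_model, you can block on the future with its result() method, either inside or after the with block. The following is a minimal sketch (the variable names are illustrative):

# Block until the SageMaker Training job finishes and retrieve the function's return value.
model_s3_path, eval_result = future.result()
print(f"Model artifacts: {model_s3_path}")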

Customize the runtime environment

Decorator mode and RemoteExecutor allow you to define and customize the runtime environment for the SageMaker job. The runtime dependencies, including Python packages and environment variables, can be specified to customize the runtime. To run local Python code as SageMaker managed jobs, the Python packages and dependencies need to be made available to SageMaker. ML engineers or data science administrators can configure networking and security settings such as the VPC, subnets, and security groups for SageMaker jobs, so data scientists can use these centrally managed configurations when launching SageMaker jobs. You can declare dependencies with either a requirements.txt file or a Conda environment.yml file.

When dependencies are defined with requirements.txt, the packages are installed using pip in the job runtime. If the image used for running the job comes with Conda environments, packages are installed in the Conda environment declared for use by jobs. The following code shows an example requirements.txt file:

datasets
transformers
torch
scikit-learn
s3fs==0.4.2
sagemaker>=2.148.0

You can pass a Conda environment.yml file to create the Conda environment you would like your code to run in during the training job. If the image used for running the job declares a Conda environment to run the code under, the SDK updates the declared Conda environment with the given specification. The following code is an example of a Conda environment.yml file:

name: sagemaker_example
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pandas
  - pip:
      - sagemaker

Alternatively, you can set dependencies="auto_capture" to let the SageMaker Python SDK capture the installed dependencies in the active Conda environment. An active Conda environment and a few other prerequisites are required for auto_capture to work; in general, we recommend passing your dependencies as a requirements.txt or Conda environment.yml file as described earlier.
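
For illustration, automatic dependency capture might look like the following minimal sketch (the function and instance type are placeholders, not part of the example repository):

from sagemaker.remote_function import remote

# Capture the packages installed in the currently active Conda environment and
# recreate them in the job runtime (requires an active Conda environment; this
# sketch assumes NumPy is installed locally).
@remote(instance_type="ml.m5.xlarge", dependencies="auto_capture")
def normalize(values):
    import numpy as np
    arr = np.asarray(values, dtype=float)
    return (arr - arr.mean()) / arr.std()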

For more details, refer to Run your local code as a SageMaker Training job.

Configurations for SageMaker jobs

Infrastructure-related settings can be offloaded to a configuration file that admin users can help set up; you only need to set it up one time. These settings cover the network configuration, IAM roles, the Amazon Simple Storage Service (Amazon S3) folder for input and output data, and tags. Refer to Configuring and using defaults with the SageMaker Python SDK for more details.

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: path/to/requirements.txt
        EnvironmentVariables: {"EnvVarKey": "EnvVarValue"}
        ImageUri: 366666666666.dkr.ecr.us-west-2.amazonaws.com/my-image:latest
        InstanceType: ml.m5.large
        RoleArn: arn:aws:iam::366666666666:role/MyRole
        S3KmsKeyId: somekmskeyid
        S3RootUri: s3://my-bucket/my-project
        SecurityGroupIds:
          - sg123
        Subnets:
          - subnet-1234
        Tags:
          - {"Key": "someTagKey", "Value": "someTagValue"}
        VolumeKmsKeyId: somekmskeyid
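
With a configuration file like this in place, the decorator itself can stay free of infrastructure arguments, because the SDK resolves those defaults from the file. The following is a minimal sketch (the function is illustrative):

from sagemaker.remote_function import remote

# Instance type, role, network settings, and dependencies are resolved from the
# admin-provided configuration file, so no arguments are needed on the decorator.
@remote
def double(x):
    return 2 * x

print(double(21))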

Implementation

Deep learning training code written in frameworks like PyTorch or TensorFlow can also be run within Studio by launching the code as a training job from the notebook. To showcase this capability in Studio, clone the accompanying GitHub repository into Studio and run the notebook it contains.

This example demonstrates an end-to-end binary text classification use case. We use the Hugging Face transformers and datasets libraries to fine-tune a pre-trained transformer for binary text classification. In particular, the pre-trained model is fine-tuned using the IMDb dataset.

When you clone the repository, you should locate the following files:

  • config.yaml – Most of the decorator arguments can be offloaded to the configuration file in order to separate out the infrastructure-related settings from the code base
  • huggingface.ipynb – This contains the code to fine-tune a pre-trained Hugging Face model on the IMDb dataset
  • requirements.txt – This file contains all the dependencies needed by the function used in this notebook, which runs the training remotely on a GPU instance as a training job

When you open the notebook, you will be prompted to set up the notebook environment. You can select the Data Science 3.0 image with the Python 3 kernel and ml.m5.large as the fast launch instance type for running the notebook code; fast launch instance types spin up an environment significantly faster.

The training job will be run in an ml.g4dn.xlarge instance as defined in the config.yaml file:

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        # role arn is not required if in SageMaker Notebook instance or SageMaker Studio
        # Uncomment the following line and replace with the right execution role if in a local IDE
        # RoleArn: <IAM_ROLE_ARN>
        InstanceType: ml.g4dn.xlarge
        Dependencies: ./requirements.txt

The requirements.txt file dependencies to run the function for training the Hugging Face model include the following:

datasets
transformers
torch
scikit-learn
# lock s3fs to this specific version as more recent ones introduce dependency on aiobotocore, which is not compatible with botocore
s3fs==0.4.2
sagemaker>=2.148.0,<3

The Hugging Face notebook showcases how to run the training remotely via the @remote function, which runs synchronously. Therefore, the function call for training the model waits until the SageMaker Training job is complete. The training runs remotely on a GPU instance, with the instance type defined in the preceding configuration file.

from sagemaker.remote_function import remote

# s3_root_folder is defined earlier in the notebook (the S3 location where the SDK stores job artifacts)
@remote(s3_root_uri=s3_root_folder, keep_alive_period_in_seconds=600)
def train_hf_model(
    train_input_path,
    test_input_path,
    s3_output_path = None,
    *,
    epochs = 1,
    train_batch_size = 32,
    eval_batch_size = 64,
    warmup_steps = 500,
    learning_rate = 5e-5
):  
    model_dir = 'model'

    train_dataset = load_from_disk(train_input_path, keep_in_memory=True)
    test_dataset = load_from_disk(test_input_path, keep_in_memory=True)
    
    model_name = 'distilbert-base-uncased'
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    
    training_args = TrainingArguments(
        output_dir=model_dir,
        num_train_epochs=epochs,
        per_device_train_batch_size=train_batch_size,
        per_device_eval_batch_size=eval_batch_size,
        warmup_steps=warmup_steps,
        evaluation_strategy="epoch",
        logging_dir="logs/",
        learning_rate=float(learning_rate),
    )

    # create Trainer instance
    # (tokenizer and compute_metrics are defined earlier in the notebook)
    trainer = Trainer(
        model=model,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        tokenizer=tokenizer,
    )
    
    print("Starting model training..")
    trainer.train()
        
    trainer.save_model(model_dir)

After you run the training job, you can run the rest of the cells in the notebook to inspect the evaluation metrics and classify text with the trained model.
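
For illustration only (this is not the exact notebook code), classifying text with the fine-tuned model could look like the following, assuming the saved artifacts have been copied from the returned S3 path to a local ./model directory:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned weights from the local copy of the saved artifacts and run inference.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("./model")
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("A surprisingly touching film with great performances."))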

You can also navigate back to the SageMaker console to view the status of the training job that was triggered remotely on the GPU instance.

As soon as the training job is complete, the notebook continues with the instructions for evaluation and classification. Similar jobs can be launched asynchronously via the RemoteExecutor class embedded within Studio notebooks.

Integration with SageMaker Experiments inside a @remote function

You can pass your experiment name, run name, and other parameters into your remote function to create a SageMaker Experiments run. The following code example passes in the experiment name, the run name, and the parameters to log for each run:

from sagemaker.remote_function import remote
from sagemaker.experiments.run import Run
# Define your remote function
@remote
def train(value_1, value_2, exp_name, run_name):
    ...
    # Create the experiment run
    # (sagemaker_session is assumed to be defined earlier)
    with Run(
        experiment_name=exp_name,
        run_name=run_name,
        sagemaker_session=sagemaker_session
    ) as run:
        ...
        # Log values for the parameters
        run.log_parameter("param_1", value_1)
        run.log_parameter("param_2", value_2)
        ...
        # Log metrics
        run.log_metric("metric_a", 0.5)
        run.log_metric("metric_b", 0.1)

# Invoke your remote function
train(1.0, 2.0, "my-exp-name", "my-run-name")

In the preceding example, the parameters param_1 and param_2 are logged over time inside a training loop. Common parameters may include batch size or epochs. Likewise, the metrics metric_a and metric_b are logged for a run over time inside a training loop. Common metrics may include accuracy or loss. For more information, see Create an Amazon SageMaker Experiment.

Conclusion

In this post, we introduced a new SageMaker Python SDK capability that enables data scientists to run their ML code in their preferred IDE as SageMaker Training jobs. We discussed the prerequisites needed to use this capability along with its features. We also showed how to use this capability in Studio, SageMaker notebook instances, and your local IDE. In addition, we provided sample code examples to demonstrate how to use this capability. As a next step, we recommend trying this capability in your IDE or SageMaker by following the code examples referenced in this post.


About the Authors

Dipankar Patro is a Software Development Engineer at AWS SageMaker, innovating and building MLOps solutions to help customers adopt AI/ML solutions at scale. He has an MS in Computer Science and his areas of interest are Computer Security, Distributed Systems and AI/ML.

Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from the University of Texas at Austin and an MS in Computer Science from Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and go on long road trips.

Manoj Ravi is a Senior Product Manager for Amazon SageMaker. He is passionate about building next-gen AI products and works on software and tools to make large-scale machine learning easier for customers. He holds an MBA from Haas School of Business and a Masters in Information Systems Management from Carnegie Mellon University. In his spare time, Manoj enjoys playing tennis and pursuing landscape photography.

Shikhar Kwatra is an AI/ML Specialist Solutions Architect at Amazon Web Services, working with a leading Global System Integrator. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.

Vikram Elango is a Sr. AI/ML Specialist Solutions Architect at AWS, based in Virginia, US. He is currently focused on generative AI, LLMs, prompt engineering, large model inference optimization, and scaling ML across enterprises. Vikram helps financial and insurance industry customers with design and thought leadership to build and deploy machine learning applications at scale. In his spare time, he enjoys traveling, hiking, cooking, and camping.
