Distributional Graphormer: Toward equilibrium distribution prediction for molecular systems

Structure prediction is a fundamental problem in molecular science because the structure of a molecule determines its properties and functions. In recent years, deep learning methods have made remarkable progress and impact on predicting molecular structures, especially for protein molecules. Deep learning methods, such as AlphaFold and RoseTTAFold, have achieved unprecedented accuracy in predicting the most probable structures for proteins from their amino acid sequences and have been hailed as a game changer in molecular science. However, these methods provide only a single snapshot of a protein structure, and structure prediction alone cannot tell the complete story of how a molecule works.

Proteins are not rigid objects; they are dynamic molecules that can adopt different structures with specific probabilities at equilibrium. Identifying these structures and their probabilities is essential in understanding protein properties and functions, how they interact with other proteins, and the statistical mechanics and thermodynamics of molecular systems. Traditional methods for obtaining these equilibrium distributions, such as molecular dynamics simulations or Monte Carlo sampling (which uses repeated random sampling from a distribution to achieve numerical statistical results), are often computationally expensive and may even become intractable for complex molecules. Therefore, there is a pressing need for novel computational approaches that can accurately and efficiently predict the equilibrium distributions of molecular structures from basic descriptors.

A schematic diagram illustrating the goal of Distributional Graphormer (DiG). A molecular system is represented by a basic descriptor D, such as the amino acid sequence for a protein. DiG transforms D into a structural ensemble S, which consists of multiple possible conformations and their probabilities. S is expected to follow the equilibrium distribution of the molecular system. A legend shows an example of D and S for the adenylate kinase protein.
Figure 1. The goal of Distributional Graphormer (DiG). DiG takes the basic descriptor, D, of a molecular system, such as the amino acid sequence for a protein, as input to predict the structures and their probabilities following equilibrium distribution.

In this blog post, we introduce Distributional Graphormer (DiG), a new deep learning framework for predicting protein structures according to their equilibrium distribution. It aims to address this fundamental challenge and open new opportunities for molecular science. DiG is a significant advancement from single structure prediction to structure ensemble modeling with equilibrium distributions. Its distribution prediction capability bridges the gap between the microscopic structures and the macroscopic properties of molecular systems, which are governed by statistical mechanics and thermodynamics. Nevertheless, this is a tremendous challenge, as it requires modeling complex distributions in high-dimensional space to capture the probabilities of different molecular states.

DiG offers a novel solution for distribution prediction by building on our previous work, Graphormer, a general-purpose graph transformer that can effectively model molecular structures. Graphormer has shown excellent performance in molecular science research, demonstrated by applications in quantum chemistry and molecular dynamics simulations, as reported in our previous blog posts. Now, we have advanced Graphormer to create DiG, which has a new and powerful capability: using deep neural networks to directly predict the target distribution from basic descriptors of molecules.

DiG tackles this challenging problem. It is based on the idea of simulated annealing, a classic method in thermodynamics and optimization, which has also motivated the recent development of diffusion models that achieved remarkable breakthroughs in AI-generated content (AIGC). Simulated annealing produces a complex distribution by gradually refining a simple distribution through the simulation of an annealing process, allowing it to explore and settle in the most probable states. DiG mimics this process in a deep learning framework for molecular systems. AIGC models are often based on the idea of diffusion models, which are inspired by statistical mechanics and thermodynamics.

DiG is also based on the idea of diffusion models, but we bring this idea back to thermodynamics research, creating a closed loop of inspiration and innovation. We imagine scientists someday will be able to use DiG like an AIGC model for drawing, inputting a simple description, such as an amino acid sequence, and then using DiG to quickly generate realistic and diverse protein structures that follow equilibrium distribution. This will greatly enhance scientists’ productivity and creativity, enabling novel discoveries and applications in fields such as drug design, materials science, and catalysis.

How does DiG work?

A schematic diagram illustrating the design and backbone architecture of DiG. The diagram shows a molecular system with two possible conformations as an example. The top row shows the energy function of the molecular system as a curve, with two local minima corresponding to the two conformations. The bottom row shows the probability distribution of the molecular system as a bar chart, with two peaks corresponding to the two conformations. The diagram also shows a diffusion process that transforms the probability distribution from a simple uniform one to the equilibrium one that matches the energy function. The diffusion process consists of several intermediate time steps, labeled as i=0,1,…,T. At each time step, a deep-learning model, Graphormer, is used to construct a forward diffusion step that converts the distribution at the previous time step to the next one, indicated by blue arrows. The Graphormer model is trained to match the distribution at each time step to a predefined backward diffusion step that converts the equilibrium distribution to the simple one, indicated by orange arrows. The backward diffusion step is computed by adding Gaussian noise to the equilibrium distribution and normalizing it. The learning of the Graphormer model is supervised by both the samples and the energy function of the molecular system. The samples are obtained from a large-scale molecular simulation dataset that provides the initial samples and the corresponding energy labels. The energy function is used to calculate the energy scores for the generated samples and guide the diffusion process towards the equilibrium distribution. The diagram also shows a physics-informed diffusion pre-training (PIDP) method that is developed to pre-train DiG with only energy functions as inputs, without the data dependency. The PIDP method uses a contrastive loss function to minimize the distance between the energy scores and the probabilities of the generated samples at each time step. The PIDP method can enhance the generalization of DiG to molecular systems that are not in the dataset.
Figure 2. DiG’s design and backbone architecture.

DiG is based on the idea of diffusion by transforming a simple distribution to a complex distribution using Graphormer. The simple distribution can be a standard Gaussian, and the complex distribution can be the equilibrium distribution of molecular structures. The transformation is done step-by-step, where the whole process mimics the simulated annealing process.

DiG can be trained using different types of data or information. For example, DiG can use energy functions of molecular systems to guide transformation, and it can also use simulated structure data, such as molecular dynamics trajectories, to learn the distribution. More concretely, DiG can use energy functions of molecular systems to guide transformation by minimizing the discrepancy between the energy-based probabilities and the probabilities predicted by DiG. This approach can leverage the prior knowledge of the system and train DiG without stringent dependency on data. Alternatively, DiG can also use simulation data, such as molecular dynamics trajectories, to learn the distribution by maximizing the likelihood of the data under the DiG model.
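To make "energy-based probabilities" concrete, the following is a minimal sketch for a system with a handful of discrete states. It assumes the standard Boltzmann relation, p_i proportional to exp(-E_i / (k_B T)), and uses a KL divergence as a stand-in for the discrepancy; the energies, the predicted probabilities, and the choice of KL are illustrative assumptions, not DiG's actual training objective.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def boltzmann_probabilities(energies_ev, temperature_k=300.0):
    """Return p_i proportional to exp(-E_i / (k_B T)) for discrete states."""
    beta = 1.0 / (K_B_EV_PER_K * temperature_k)
    e_min = min(energies_ev)  # shift energies for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies_ev]
    partition = sum(weights)
    return [w / partition for w in weights]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), used here as one possible discrepancy measure."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Hypothetical state energies (eV) and hypothetical model probabilities.
energies = [0.00, 0.03, 0.10]
target = boltzmann_probabilities(energies)
predicted = [0.70, 0.25, 0.05]
print(target, kl_divergence(target, predicted))
```

Lower-energy states receive higher probability, and the discrepancy shrinks to zero as the predicted probabilities approach the energy-based ones.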

DiG generalizes across many molecular systems about as well as deep learning-based structure prediction methods do. This is because DiG inherits the advantages of advanced deep-learning architectures like Graphormer and applies them to the new and challenging task of distribution prediction. Once trained, DiG can generate molecular structures by reversing the transformation process, starting from a simple distribution and applying neural networks in reverse order. DiG can also provide the probability estimation for each generated structure by computing the change of probability along the transformation process. DiG is a flexible and general framework that can handle different types of molecular systems and descriptors.

Results

We demonstrate DiG’s performance and potential through several molecular sampling tasks covering a broad range of molecular systems, such as proteins, protein-ligand complexes, and catalyst-adsorbate systems. Our results show that DiG not only generates realistic and diverse molecular structures with high efficiency and low computational costs, but it also provides estimations of state densities, which are crucial for computing macroscopic properties using statistical mechanics. Accordingly, DiG presents a significant advancement in statistically understanding microscopic molecules and predicting their macroscopic properties, creating many exciting research opportunities in molecular science.

One major application of DiG is to sample protein conformations, which are indispensable to understanding their properties and functions. Proteins are dynamic molecules that can adopt diverse structures with different probabilities at equilibrium, and these structures are often related to their biological functions and interactions with other molecules. However, predicting the equilibrium distribution of protein conformations is a long-standing and challenging problem due to the complex and high-dimensional energy landscape that governs probability distribution in the conformation space. In contrast to expensive and inefficient molecular dynamics simulations or Monte Carlo sampling methods, DiG generates diverse and functionally relevant protein structures from amino acid sequences at a high speed and a significantly reduced cost.

Figure 3. This illustration shows DiG’s performance when generating multiple conformations of proteins. On the left, DiG-generated structures of the main protease of the SARS-CoV-2 virus are projected into a 2D space spanned by two TICA coordinates. On the right, structures generated by DiG (thin ribbons) are compared with experimentally determined structures (cylindrical figures) in each case.

DiG can generate multiple conformations from the same protein sequence. The left side of Figure 3 shows DiG-generated structures of the main protease of SARS-CoV-2 virus compared with MD simulations and AlphaFold prediction results. The contours (shown as lines) in the 2D space reveal three clusters sampled by extensive MD simulations. DiG generates highly similar structures in clusters II and III, while structures in cluster I are undersampled. In the right panel, DiG-generated structures are aligned to experimental structures for four proteins, each with two distinguishable conformations corresponding to unique functional states. In the upper left, the Adenylate kinase protein has open and closed states, both well sampled by DiG. Similarly, for the drug transport protein LmrP, DiG also generates structures for both states. Here, note that the closed state is experimentally determined (in the lower-right corner, with PDB ID 6t1z), while the other is the AlphaFold predicted model that is consistent with experimental data. In the case of human B-Raf kinase, the major structural difference is localized in the A-loop region and a nearby helix, which are well captured by DiG. The D-ribose binding protein has two separated domains, which can be packed into two distinct conformations. DiG perfectly generated the straight-up conformation, but it is less accurate in predicting the twisted conformation. Nonetheless, besides the straight-up conformation, DiG generated some conformations that appear to be intermediate states.

Another application of DiG is to sample catalyst-adsorbate systems, which are central to heterogeneous catalysis. Identifying active adsorption sites and stable adsorbate configurations is crucial for understanding and designing catalysts, but it is also quite challenging due to the complex surface-molecular interactions. Traditional methods, such as density functional theory (DFT) calculations and molecular dynamics simulations, are time-consuming and costly, especially for large and complex surfaces. DiG predicts adsorption sites and configurations, as well as their probabilities, from the substrate and adsorbate descriptors. DiG can handle various types of adsorbates, such as single atoms or molecules being adsorbed onto different types of substrates, such as metals or alloys.

Figure 4. Adsorption prediction results of single C, H, and O atoms on catalyst surfaces. The predicted probability distribution on catalyst surface is compared to the interaction energy between the adsorbate molecules and the catalyst in the middle and bottom rows.

Applying DiG, we predicted the adsorption sites for a variety of catalyst-adsorbate systems and compared these predicted probabilities with energies obtained from DFT calculations. We found that DiG could find all the stable adsorption sites and generate adsorbate configurations that are similar to the DFT results with high efficiency and at a low cost. DiG estimates the probabilities of different adsorption configurations, in good agreement with DFT energies.

Conclusion

In this blog, we introduced DiG, a deep learning framework that aims to predict the distribution of molecular structures. DiG is a significant advancement from single structure prediction toward ensemble modeling with equilibrium distributions, setting a cornerstone for connecting microscopic structures to macroscopic properties under deep learning frameworks.

DiG involves key ML innovations that lead to expressive generative models, which have been shown to have the capacity to sample multimodal distribution within a given class of molecules. We have demonstrated the flexibility of this approach on different classes of molecules (including proteins, etc.), and we have shown that individual structures generated in this way are chemically realistic. Consequently, DiG enables the development of ML systems that can sample equilibrium distributions of molecules given appropriate training data.

However, we acknowledge that considerably more research is needed to obtain efficient and reliable predictions of equilibrium distributions for arbitrary molecules. We hope that DiG inspires additional research and innovation in this direction, and we look forward to more exciting results and impact from DiG and other related methods in the future.

The post Distributional Graphormer: Toward equilibrium distribution prediction for molecular systems appeared first on Microsoft Research.

Pic2Word: Mapping pictures to words for zero-shot composed image retrieval

Image retrieval plays a crucial role in search engines. Typically, their users rely on either image or text as a query to retrieve a desired target image. However, text-based retrieval has its limitations, as describing the target image accurately using words can be challenging. For instance, when searching for a fashion item, users may want an item whose specific attribute, e.g., the color of a logo or the logo itself, is different from what they find on a website. Yet searching for the item in an existing search engine is not trivial, since precisely describing the fashion item by text is difficult. To address this, composed image retrieval (CIR) retrieves images based on a query that combines both an image and a text sample that provides instructions on how to modify the image to fit the intended retrieval target. Thus, CIR allows precise retrieval of the target image by combining image and text.

However, CIR methods require large amounts of labeled data, i.e., triplets of a 1) query image, 2) description, and 3) target image. Collecting such labeled data is costly, and models trained on this data are often tailored to a specific use case, limiting their ability to generalize to different datasets.

To address these challenges, in “Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval”, we propose a task called zero-shot CIR (ZS-CIR). In ZS-CIR, we aim to build a single CIR model that performs a variety of CIR tasks, such as object composition, attribute editing, or domain conversion, without requiring labeled triplet data. Instead, we propose to train a retrieval model using large-scale image-caption pairs and unlabeled images, which are considerably easier to collect than supervised CIR datasets at scale. To encourage reproducibility and further advance this space, we also release the code.

Description of existing composed image retrieval model.
We train a composed image retrieval model using image-caption data only. Our model retrieves images aligned with the composition of the query image and text.

Method overview

We propose to leverage the language capabilities of the language encoder in the contrastive language-image pre-trained model (CLIP), which excels at generating semantically meaningful language embeddings for a wide range of textual concepts and attributes. To that end, we use a lightweight mapping sub-module in CLIP that is designed to map an input picture (e.g., a photo of a cat) from the image embedding space to a word token (e.g., “cat”) in the textual input space. The whole network is optimized with the vision-language contrastive loss to ensure that the visual and text embedding spaces are as close as possible given a pair of an image and its textual description. Then, the query image can be treated as if it is a word. This enables the flexible and seamless composition of query image features and text descriptions by the language encoder. We call our method Pic2Word and provide an overview of its training process in the figure below. We want the mapped token s to represent the input image in the form of a word token. Then, we train the mapping network to reconstruct the image embedding in the language embedding, p. Specifically, we optimize the contrastive loss proposed in CLIP computed between the visual embedding v and the textual embedding p.
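As a rough illustration of the contrastive objective between visual embeddings v and textual embeddings p, here is a minimal, dependency-free sketch of a symmetric CLIP-style loss over a batch. The function names and the temperature value are assumptions made for illustration; this is not the released Pic2Word code, and a real implementation would use a tensor library and backpropagate only through the mapping network.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # log-sum-exp with max subtraction for stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(visual, textual, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss."""
    v = [l2_normalize(x) for x in visual]
    p = [l2_normalize(x) for x in textual]
    # Cosine similarity between every v_i and p_j, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(vi, pj)) / temperature for pj in p]
              for vi in v]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))
```

The loss is small when each v_i is most similar to its own p_i and large when pairs are mismatched, which is what pushes the mapped token to carry the image's content.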

Training of the mapping network (fM) using unlabeled images only. We optimize only the mapping network with a frozen visual and text encoder.

Given the trained mapping network, we can regard an image as a word token and pair it with the text description to flexibly compose the joint image-text query as shown in the figure below.

With the trained mapping network, we regard the image as a word token and pair it with the text description to flexibly compose the joint image-text query.

Evaluation

We conduct several experiments to evaluate Pic2Word’s performance on a variety of CIR tasks.

Domain conversion

We first evaluate the compositional capability of the proposed method on domain conversion: given an image and the desired new image domain (e.g., sculpture, origami, cartoon, toy), the output of the system should be an image with the same content but in the new desired image domain or style. As illustrated below, we evaluate the ability to compose the category information and domain description given as an image and text, respectively. We evaluate the conversion from real images to four domains using ImageNet and ImageNet-R.

To compare with approaches that do not require supervised training data, we pick three approaches: (i) image only performs retrieval only with visual embedding, (ii) text only employs only text embedding, and (iii) image + text averages the visual and text embedding to compose the query. The comparison with (iii) shows the importance of composing image and text using a language encoder. We also compare with Combiner, which trains the CIR model on Fashion-IQ or CIRR.

We aim to convert the domain of the input query image into the one described with text, e.g., origami.

As shown in the figure below, our proposed approach outperforms baselines by a large margin.

Results (recall@10, i.e., the percentage of relevant instances in the first 10 images retrieved) on composed image retrieval for domain conversion.
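The recall@10 metric in the caption above can be sketched as follows. This assumes each query comes with a set of relevant image IDs and a ranked list of retrieved IDs; the function names and toy IDs are hypothetical, and the paper's exact evaluation protocol may differ in detail.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of a query's relevant images that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(relevant.intersection(ranked_ids[:k]))
    return hits / len(relevant)


def mean_recall_at_k(all_ranked, all_relevant, k=10):
    """Average recall@k over all queries."""
    scores = [recall_at_k(r, rel, k) for r, rel in zip(all_ranked, all_relevant)]
    return sum(scores) / len(scores)


# Toy example with hypothetical image IDs and k=2.
ranked = [["img3", "img7", "img1"], ["img2", "img9", "img4"]]
relevant = [["img7"], ["img4"]]
print(mean_recall_at_k(ranked, relevant, k=2))
```

When every query has exactly one target image, this reduces to the share of queries whose target appears among the top k results.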

Fashion attribute composition

Next, we evaluate the composition of fashion attributes, such as the color of cloth, logo, and length of sleeve, using the Fashion-IQ dataset. The figure below illustrates the desired output given the query.

Overview of CIR for fashion attributes.

In the figure below, we present a comparison with baselines, including supervised baselines that utilized triplets for training the CIR model: (i) CB uses the same architecture as our approach, and (ii) CIRPLANT, ARTEMIS, and MAAF use a smaller backbone, such as ResNet50. Comparison to these approaches shows how well our zero-shot approach performs on this task.

Although CB outperforms our approach, our method performs better than supervised baselines with smaller backbones. This result suggests that by utilizing a robust CLIP model, we can train a highly effective CIR model without requiring annotated triplets.

Results (recall@10, i.e., the percentage of relevant instances in the first 10 images retrieved) on composed image retrieval for the Fashion-IQ dataset (higher is better). Light blue bars train the model using triplets. Note that our approach performs on par with these supervised baselines with shallow (smaller) backbones.

Qualitative results

We show several examples in the figure below. Compared to a baseline method that does not require supervised training data (text + image feature averaging), our approach does a better job of correctly retrieving the target image.

Qualitative results on diverse query images and text description.

Conclusion and future work

In this article, we introduce Pic2Word, a method for mapping pictures to words for ZS-CIR. We propose to convert the image into a word token to achieve a CIR model using only an image-caption dataset. Through a variety of experiments, we verify the effectiveness of the trained model on diverse CIR tasks, indicating that training on an image-caption dataset can build a powerful CIR model. One potential future research direction is utilizing caption data to train the mapping network, although we use only image data in the present work.

Acknowledgements

This research was conducted by Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, and Tomas Pfister. Also thanks to Zizhao Zhang and Sergey Ioffe for their valuable feedback.

Integrate SaaS platforms with Amazon SageMaker to enable ML-powered applications

Amazon SageMaker is an end-to-end machine learning (ML) platform with wide-ranging features to ingest, transform, and measure bias in data, and train, deploy, and manage models in production with best-in-class compute and services such as Amazon SageMaker Data Wrangler, Amazon SageMaker Studio, Amazon SageMaker Canvas, Amazon SageMaker Model Registry, Amazon SageMaker Feature Store, Amazon SageMaker Pipelines, Amazon SageMaker Model Monitor, and Amazon SageMaker Clarify. Many organizations choose SageMaker as their ML platform because it provides a common set of tools for developers and data scientists. A number of AWS independent software vendor (ISV) partners have already built integrations for users of their software as a service (SaaS) platforms to utilize SageMaker and its various features, including training, deployment, and the model registry.

In this post, we cover the benefits for SaaS platforms to integrate with SageMaker, the range of possible integrations, and the process for developing these integrations. We also deep dive into the most common architectures and AWS resources to facilitate these integrations. This is intended to accelerate time-to-market for ISV partners and other SaaS providers building similar integrations and inspire customers who are users of SaaS platforms to partner with SaaS providers on these integrations.

Benefits of integrating with SageMaker

There are a number of benefits for SaaS providers to integrate their SaaS platforms with SageMaker:

  • Users of the SaaS platform can take advantage of a comprehensive ML platform in SageMaker
  • Users can build ML models with data that is in or outside of the SaaS platform and exploit these ML models
  • It provides users with a seamless experience between the SaaS platform and SageMaker
  • Users can utilize foundation models available in Amazon SageMaker JumpStart to build generative AI applications
  • Organizations can standardize on SageMaker
  • SaaS providers can focus on their core functionality and offer SageMaker for ML model development
  • It equips SaaS providers with a basis to build joint solutions and go to market with AWS

SageMaker overview and integration options

SageMaker has tools for every step of the ML lifecycle. SaaS platforms can integrate with SageMaker across the ML lifecycle from data labeling and preparation to model training, hosting, monitoring, and managing models with various components, as shown in the following figure. Depending on the needs, any and all parts of the ML lifecycle can be run in either the customer AWS account or SaaS AWS account, and data and models can be shared across accounts using AWS Identity and Access Management (IAM) policies or third-party user-based access tools. This flexibility in the integration makes SageMaker an ideal platform for customers and SaaS providers to standardize on.

SageMaker overview

Integration process and architectures

In this section, we break the integration process into four main stages and cover the common architectures. Note that there can be other integration points in addition to these, but those are less common.

  • Data access – How data that is in the SaaS platform is accessed from SageMaker
  • Model training – How the model is trained
  • Model deployment and artifacts – Where the model is deployed and what artifacts are produced
  • Model inference – How the inference happens in the SaaS platform

The diagrams in the following sections assume SageMaker is running in the customer AWS account. Most of the options explained are also applicable if SageMaker is running in the SaaS AWS account. In some cases, an ISV may deploy their software in the customer AWS account. This is usually in a dedicated customer AWS account, meaning there still needs to be cross-account access to the customer AWS account where SageMaker is running.

There are a few different ways in which authentication across AWS accounts can be achieved when data in the SaaS platform is accessed from SageMaker and when the ML model is invoked from the SaaS platform. The recommended method is to use IAM roles. An alternative is to use AWS access keys consisting of an access key ID and secret access key.
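As a rough sketch of the role-based approach, the SaaS side can call AWS STS `assume_role` to obtain temporary credentials for a role in the customer account. The helper name, session name, and role ARN below are hypothetical; the STS client is passed in so the helper can be exercised without AWS access.

```python
def assume_cross_account_role(sts_client, role_arn,
                              session_name="saas-sagemaker-session"):
    """Return temporary credentials as keyword arguments for boto3.client().

    `sts_client` is expected to behave like boto3.client("sts").
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Usage (requires boto3 and a role that trusts the calling account;
# the ARN is a placeholder):
#   import boto3
#   kwargs = assume_cross_account_role(
#       boto3.client("sts"),
#       "arn:aws:iam::111122223333:role/ExampleSageMakerAccess",
#   )
#   runtime = boto3.client("sagemaker-runtime", **kwargs)
```

Temporary credentials expire on their own, which is one reason roles are generally preferred over long-lived access keys.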

Data access

There are multiple options for how data that is in the SaaS platform can be accessed from SageMaker. Data can be accessed from a SageMaker notebook, from SageMaker Data Wrangler, where users can prepare data for ML, or from SageMaker Canvas. The most common data access options are:

  • SageMaker Data Wrangler built-in connector – The SageMaker Data Wrangler connector enables data to be imported from a SaaS platform to be prepared for ML model training. The connector is developed jointly by AWS and the SaaS provider. Current SaaS platform connectors include Databricks and Snowflake.
  • Amazon Athena Federated Query for the SaaS platform – Federated queries enable users to query the platform from a SageMaker notebook via Amazon Athena using a custom connector that is developed by the SaaS provider.
  • Amazon AppFlow – With Amazon AppFlow, you can use a custom connector to extract data into Amazon Simple Storage Service (Amazon S3) which subsequently can be accessed from SageMaker. The connector for a SaaS platform can be developed by AWS or the SaaS provider. The open-source Custom Connector SDK enables the development of a private, shared, or public connector using Python or Java.
  • SaaS platform SDK – If the SaaS platform has an SDK (Software Development Kit), such as a Python SDK, this can be used to access data directly from a SageMaker notebook.
  • Other options – In addition to these, there can be other options depending on whether the SaaS provider exposes their data via APIs, files or an agent. The agent can be installed on Amazon Elastic Compute Cloud (Amazon EC2) or AWS Lambda. Alternatively, a service such as AWS Glue or a third-party extract, transform, and load (ETL) tool can be used for data transfer.

The following diagram illustrates the architecture for data access options.

Data access

Model training

The model can be trained in SageMaker Studio by a data scientist, using Amazon SageMaker Autopilot by a non-data scientist, or in SageMaker Canvas by a business analyst. SageMaker Autopilot takes away the heavy lifting of building ML models, including feature engineering, algorithm selection, and hyperparameter settings, and it is also relatively straightforward to integrate directly into a SaaS platform. SageMaker Canvas provides a no-code visual interface for training ML models.

In addition, data scientists can use pre-trained models available in SageMaker JumpStart, including foundation models from sources such as Alexa, AI21 Labs, Hugging Face, and Stability AI, and tune them for their own generative AI use cases.

Alternatively, the model can be trained in a third-party or partner-provided tool, service, and infrastructure, including on-premises resources, provided the model artifacts are accessible and readable.

The following diagram illustrates these options.

Model training

Model deployment and artifacts

After you have trained and tested the model, you can either deploy it to a SageMaker model endpoint in the customer account, or export it from SageMaker and import it into the SaaS platform storage. The model can be stored and imported in standard formats supported by the common ML frameworks, such as pickle, joblib, and ONNX (Open Neural Network Exchange).

If the ML model is deployed to a SageMaker model endpoint, additional model metadata can be stored in the SageMaker Model Registry, SageMaker Model Cards, or in a file in an S3 bucket. This can be the model version, model inputs and outputs, model metrics, model creation date, inference specification, data lineage information, and more. Where there isn’t a property available in the model package, the data can be stored as custom metadata or in an S3 file.

Creating such metadata can help SaaS providers manage the end-to-end lifecycle of the ML model more effectively. This information can be synced to the model log in the SaaS platform and used to track changes and updates to the ML model. Subsequently, this log can be used to determine whether to refresh downstream data and applications that use that ML model in the SaaS platform.
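One way to picture this sync is a small metadata record plus a version check against the model log held in the SaaS platform. The following is a minimal sketch under stated assumptions: the field names, the `build_metadata` and `needs_refresh` helpers, and the "refresh when a newer version appears" rule are all hypothetical and are not a SageMaker or SaaS platform schema.

```python
import json


def build_metadata(name, version, created_at, metrics):
    """Build a hypothetical metadata record; the field names are illustrative."""
    return {
        "model_name": name,
        "model_version": version,
        "created_at": created_at,  # ISO 8601 creation date
        "metrics": metrics,        # for example {"auc": 0.91}
    }


def needs_refresh(model_log, latest):
    """Return True when the log has no entry for this model at this version or newer."""
    known_versions = [
        entry["model_version"]
        for entry in model_log
        if entry["model_name"] == latest["model_name"]
    ]
    return not known_versions or max(known_versions) < latest["model_version"]


# Existing model log in the SaaS platform (assumed shape).
model_log = [{"model_name": "churn-model", "model_version": 2}]

latest = build_metadata("churn-model", 3, "2023-07-01T00:00:00Z", {"auc": 0.91})

# The same record could be written as a JSON metadata file to an S3 bucket.
metadata_json = json.dumps(latest, sort_keys=True)

if needs_refresh(model_log, latest):
    # Record the new version; downstream refresh jobs would be triggered here.
    model_log.append(json.loads(metadata_json))
```

Keeping the comparison in one small function makes it straightforward to swap in a different refresh rule, such as one based on model metrics or creation date.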

The following diagram illustrates this architecture.

Model deployment and artifacts

Model inference

SageMaker offers four options for ML model inference: real-time inference, serverless inference, asynchronous inference, and batch transform. For the first three, the model is deployed to a SageMaker model endpoint and the SaaS platform invokes the model using the AWS SDKs. The recommended option is to use the Python SDK. The inference pattern for each of these is similar in that the predictor’s predict() or predict_async() methods are used. Cross-account access can be achieved using role-based access.
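As an illustration of the endpoint invocation pattern, the sketch below uses the lower-level `invoke_endpoint` call that the boto3 `sagemaker-runtime` client provides, rather than the predictor classes of the SageMaker Python SDK. The endpoint name, the `{"instances": ...}` payload shape, and the `FakeRuntimeClient` stand-in are assumptions made so the example runs without AWS credentials; the real request and response format depends on the model container.

```python
import io
import json


def invoke_model(runtime_client, endpoint_name, features):
    """Send one JSON request to a real-time endpoint and decode the JSON reply.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps({"instances": [features]}),
    )
    # The response body is a stream; read it once and parse it.
    return json.loads(response["Body"].read())


class FakeRuntimeClient:
    """Local stand-in (hypothetical) that mimics the shape of the response."""

    def invoke_endpoint(self, **kwargs):
        payload = json.loads(kwargs["Body"])
        scores = [sum(row) for row in payload["instances"]]
        body = json.dumps({"predictions": scores}).encode("utf-8")
        return {"Body": io.BytesIO(body)}


result = invoke_model(FakeRuntimeClient(), "example-endpoint", [1.0, 2.0])
print(result)
```

In a real integration, the SaaS platform would pass a client created with `boto3.client("sagemaker-runtime")`, using credentials obtained by assuming the cross-account role.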

It’s also possible to seal the backend with Amazon API Gateway, which calls the endpoint via a Lambda function that runs in a protected private network.

For batch transform, data from the SaaS platform first needs to be exported in batch into an S3 bucket in the customer AWS account, then the inference is done on this data in batch. The inference is done by first creating a transformer job or object, and then calling the transform() method with the S3 location of the data. Results are imported back into the SaaS platform in batch as a dataset, and joined to other datasets in the platform as part of a batch pipeline job.
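The batch flow can also be expressed as a request to the boto3 SageMaker `create_transform_job` call, which is the lower-level counterpart of creating a transformer object and calling `transform()`. The sketch below only builds the request; the job name, model name, bucket paths, CSV content type, and instance type are placeholder assumptions.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri,
                                instance_type="ml.m5.xlarge", instance_count=1):
    """Build keyword arguments for the boto3 SageMaker create_transform_job call."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",  # assumes the export is CSV
            "SplitType": "Line",        # one record per line
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


request = build_transform_job_request(
    job_name="example-batch-job",
    model_name="example-model",
    input_s3_uri="s3://example-bucket/exports/",
    output_s3_uri="s3://example-bucket/predictions/",
)
# With AWS credentials configured, the job could then be started with:
#   boto3.client("sagemaker").create_transform_job(**request)
```

Once the job completes, the results written under the output path can be imported back into the SaaS platform as a dataset.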

Another option for inference is to do it directly in the SaaS account compute cluster. This would be the case when the model has been imported into the SaaS platform. In this case, SaaS providers can choose from a range of EC2 instances that are optimized for ML inference.

The following diagram illustrates these options.

Model inference

Example integrations

Several ISVs have built integrations between their SaaS platforms and SageMaker. To learn more about some example integrations, refer to the following:

Conclusion

In this post, we explained why and how SaaS providers should integrate SageMaker with their SaaS platforms by breaking the process into four parts and covering the common integration architectures. SaaS providers looking to build an integration with SageMaker can utilize these architectures. If there are any custom requirements beyond what has been covered in this post, including with other SageMaker components, get in touch with your AWS account teams. Once the integration has been built and validated, ISV partners can join the AWS Service Ready Program for SageMaker and unlock a variety of benefits.

We also ask customers who use SaaS platforms to register their interest in an integration with Amazon SageMaker with their AWS account teams, as this can help inspire and advance development by SaaS providers.


About the Authors

Mehmet Bakkaloglu is a Principal Solutions Architect at AWS, focusing on Data Analytics, AI/ML and ISV partners.

Raj Kadiyala is a Principal AI/ML Evangelist at AWS.

‘My Favorite 3D App’: Blender Fanatic Shares His Japanese-Inspired Scene This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

A diverse range of artists, fashionistas, musicians and the cinematic arts inspired the creative journey of Pedro Soares, aka Blendeered, and helped him fall in love with using 3D to create art.

Now, the Porto, Portugal-based artist uses his own life experiences and interactions with people, regardless of their artistic background, to realize his artistic vision.

Enamored by Japanese culture, Blendeered sought to make a representation of an old Japanese temple, dedicated to an animal that has consistently delivered artistic inspiration — the mighty wolf. The result is Japanese Temple Set, a short animation that is the subject of this week’s edition of In the NVIDIA Studio, built with Blender and Blackmagic Design’s DaVinci Resolve.

 

In addition, get a glimpse of two cloud-based AI apps, Wondershare Filmora and Trimble SketchUp Go, powered by NVIDIA RTX GPUs, and learn how they can elevate and automate content creation.

Finally, the #SetTheScene challenge has come to an end. Check out highlights from some of the many incredible submissions.

An AI on the Future

Week by week, AI becomes more ubiquitous within the content creation workflows of aspiring artists and creative professionals. Those who own NVIDIA or GeForce RTX GPUs can take advantage of Tensor Cores that utilize AI to accelerate over 100 apps. For others yet to upgrade and seeking more versatility, NVIDIA is working with the top creative app publishers to accelerate their apps on RTX GPUs from the cloud.

Take the Wondershare Filmora app, which creators can use to capture and touch up video on their mobile devices. They can, for example, add photos and transform them into an animated video with the app’s AI Image feature. Those with a PC powered by RTX GPUs can send files to the Filmora desktop app and continue to edit with local RTX acceleration, such as by exporting video at double the speed with dual encoders on RTX 4070 Ti or above GPUs.

Data is based on tests carried out by Filmora technical experts. To test the export of 4K footage in H.265 and AV1 formats, Wondershare Filmora 12 was run on computers with RTX 3090 and RTX 4090 graphics cards, respectively.

With the Trimble SketchUp Go app, architects can design structures on any device, such as an iPad, without loss in performance thanks to RTX acceleration in the cloud. Projects can be synced in the cloud using Trimble Connect, allowing users to refine projects on their RTX-powered PC using the SketchUp Pro app. There’s even an Omniverse Connector for Trimble, enabling SketchUp Pro compatibility with all apps on NVIDIA Omniverse, a development platform for connecting and building 3D tools and applications.

Cloud-based AI support with Trimble SketchUp Go.

Hungry Like the Wolf 

To gather inspiration and reference materials for his Japanese temple project, Blendeered browsed Google, Pinterest and PureRef, a stand-alone app for creating mood boards. He sought wolf-inspired and Japanese open-source 3D assets to enrich the scene he envisioned before modeling.

“Little details in the scene are a celebration of wolves, for example, the paintings on the ceiling and the statues,” said Blendeered. “I did this scene with the goal of invoking calm and relaxing emotions, giving people a moment to breathe and catch their breath.”

Work began in Blender with the block-out phase — creating a rough-draft level built using simple 3D shapes, without details or polished art assets. This helped to keep base meshes clean, eliminating the need to create new meshes in the next round, which required only minor edits.

Experimenting with camera angles.

Blender is the most popular open-source 3D app in the world as it supports the entirety of the 3D pipeline. Blendeered uses it to apply textures, adjust lighting and animate the scene with ease.

“Blender, my ultimate 3D app, captivates with its friendly interface, speed, power, real-time rendering, diverse addons, vibrant community, and the best part — it’s free!” said Blendeered, whose moniker underscores his enthusiasm.

Aided by his NVIDIA Studio laptop powered by GeForce RTX graphics, Blendeered used RTX-accelerated OptiX ray tracing in the viewport for interactive, photoreal rendering for his modeling and animation needs.

Blendeered can view footage through an Instagram aspect ratio.

“GPU acceleration for real-time rendering in Blender helps a lot by allowing instant feedback on how scenes look and what needs to be changed and improved,” said the artist.

With the scene ready for final rendering, Blendeered used the Blender Cycles renderer with OptiX ray tracing to export final frames quickly, then imported the project into DaVinci Resolve for post-production.

Here, his RTX card was put to work again, refining the scene with GPU-accelerated color grading, video editing and color scopes.

Node application in DaVinci Resolve.

The GPU-accelerated decoder (NVDEC) unlocked smoother playback and scrubbing of high-resolution and multistream videos, saving Blendeered massive amounts of time.

Blendeered had numerous RTX-accelerated AI effects at his disposal, including Cut Scene Detection for automatically tagging clips and tracking of effects, SpeedWarp for smooth slow motion, and seamless video Super Resolution. Even non-GPU-powered effects such as Neural Engine text-based editing can prove to be tremendously useful.

Once satisfied with the animation, Blendeered used the GPU-accelerated encoder (NVENC) to speed up the exporting of his video.

Reflecting on the role his GPU had to play, Blendeered was matter-of-fact: “I chose a GeForce RTX-powered system because of the processing power and compatibility with the software I use.”

3D artist Blendeered.

View Blendeered’s portfolio on blendeered.com.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 

Here’s the Deal: Steam Summer Sale Games Streaming on GeForce NOW

GFN Thursday arrives alongside the sweet Steam Summer Sale — with hundreds of PC games playable on GeForce NOW available during Valve’s special event for PC gamers.

Also on sale, OCTOPATH TRAVELER and OCTOPATH TRAVELER II join the GeForce NOW library as part of the five new games coming to the service this week.

Saved by the Sale

Get great games at great deals to stream across your devices during the Steam Summer Sale. In total, more than 1,000 titles can be found at discounts of up to 90% through July 13.

Grow your game collection with some top picks.

Steam Summer Sale Row on GeForce NOW
Get to the gaming while also saving.

Enjoy iconic Xbox Game Studios hits from the Age of Empires series, even on Mac, thanks to the cloud. Control an empire with the goal of expanding to become a flourishing civilization. Age of Empires II arrives in the GeForce NOW library this week, joining Age of Empires, Age of Empires III and Age of Empires IV.

Stream Square Enix games at beautiful quality, even on underpowered PCs. Experience heart-wrenching single-player stories like Life is Strange 2 and Life is Strange: True Colors, or battle it out with a squad in the dark sci-fi universe of Outriders.

Become a Viking, sail the open sea and fight monsters in the world of Valheim, or play a spooky game of hide-and-seek as a Ghost or a Hunter in Midnight Ghost Hunt. Take these titles from publisher Coffee Stain Studios on the go by playing on nearly any Android or iOS mobile device.

Tune in on the big screen with NVIDIA SHIELD TVs for THQ Nordic favorites. Return to an apocalyptic Earth in the hack-n-slash adventure Darksiders III, or terrorize the people of 1950s Earth as an evil alien in Destroy All Humans!

Experience these titles and 1,600+ other games on GeForce NOW with all of the perks of an Ultimate membership, including RTX 4080 quality, support for 4K 120 frames per second gameplay and ultrawide resolutions, and the longest gaming sessions on the cloud.

Priority and Ultimate members can also experience DLSS 3 and RTX ON for real-time cinematic lighting in supported games.

Choose Your Path

OCTOPATH TRAVELER II on GeForce NOW
Eight travelers, eight stories, one very powerful cloud to stream from.

Visit faraway realms playing OCTOPATH TRAVELER and OCTOPATH TRAVELER II from Square Enix — also on sale on Steam. Members can even start their traveler journey with the OCTOPATH TRAVELER II Prologue demo.

Explore the story of eight travelers hailing from different regions who are set on vastly different ventures. Step into their shoes and use their unique talents to make decisions that will shape your path and aid you along your journey in these two award-winning role-playing games.

In OCTOPATH TRAVELER, engage in side quests and thrilling battles where every choice made by players shapes the storylines and destinies of these remarkable characters. Continue the adventure with OCTOPATH TRAVELER II and a fresh new set of eight travelers in the land of Solistia, a land comprising eastern and western continents divided by the sea.

It’s Good to Be the King

And that’s not all — five new games are joining the GeForce NOW library this week.

Age of Empires II on GeForce NOW
The (Industrial) Revolution will be streaming from the cloud.

Age of Empires II: Definitive Edition celebrates the 20th anniversary of one of the world’s most popular real-time strategy games. Explore all the original campaigns like never before, spanning over 200 hours of gameplay and 1,000 years of human history. Rise to the challenge of leading four new civilizations, exclusive to the Definitive Edition, and head online to challenge other players in a bid for world domination throughout the ages.

Catch the full list of new titles available to find your next adventure:

  • The Legend of Heroes: Trails into Reverie (New release on Steam, July 7)
  • OCTOPATH TRAVELER (Steam)
  • OCTOPATH TRAVELER II (Steam)
  • OCTOPATH TRAVELER II Prologue Demo (Steam)
  • Age of Empires II: Definitive Edition (Steam)

Members can also experience Major League Baseball’s new Virtual Ballpark and be a part of MLB’s interactive All-Star Celebrity Softball Watch Party on Saturday, July 8. Supported by NVIDIA’s global cloud streaming infrastructure, fans will have frictionless access to high-fidelity interactive experiences. Register for the event.

What are you planning to play this weekend? Let us know on Twitter or in the comments below.

Collaborators: Holoportation™ communication technology with Spencer Fowers and Kwame Darko

black and white photos of Dr. Spencer Fowers, a member of the Special Projects Technical Staff at Microsoft Research and Dr. Kwame Darko, a plastic surgeon in the reconstructive plastic surgery and burns center in Ghana’s Korle Bu Teaching Hospital, next to the Microsoft Research Podcast

Episode 142 | July 6, 2023 

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a new Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with.

In this episode, host Dr. Gretchen Huizinga welcomes Dr. Spencer Fowers, a member of the Special Projects Technical Staff at Microsoft Research, and Dr. Kwame Darko, a plastic surgeon in the reconstructive plastic surgery and burns center in Ghana’s Korle Bu Teaching Hospital. The two are part of an intercontinental research project to study Holoportation, a Microsoft 3D capture and communication technology, in the medical setting. The goal of the study—led by Korle Bu, NHS Scotland West of Scotland Innovation Hub, and Canniesburn Plastic Surgery and Burns Unit—is to make specialized healthcare more widely available, especially to those in remote or underserved communities. Fowers and Darko break down how the technology behind Holoportation and the telecommunication device being built around it brings patients and doctors together when being in the same room isn’t an easy option and discuss the potential impact of the work.

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

SPENCER FOWERS: I work with a team that does moonshots for a living, so I’m always looking for, what can we shoot for? And our goal really is like, gosh, where can’t we apply this technology? I mean, just anywhere that it is at all difficult to get, you know, medical expertise, we can ease the burden of doctors by making it so they don’t have to travel to provide this specialized care and increase the access to healthcare to these people that normally wouldn’t be able to get access to it.

KWAME DARKO: So yeah, the scope is as far as the mind can imagine it.

GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research Podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.

[MUSIC ENDS]


On this episode, I’m talking to Dr. Spencer Fowers, a Principal Member of the Technical Staff at Microsoft Research, and Dr. Kwame Darko, a plastic surgeon at the National Reconstructive Plastic Surgery and Burns Centre at the Korle Bu Teaching Hospital in Accra, Ghana. Spencer and Kwame are working on 3D telemedicine, a project they hope will increase access to specialized healthcare in rural and underserved communities by using live 3D communication, or Holoportation. We’ll learn much more about that in this episode. But first, let’s meet our collaborators.

Spencer, I’ll start with you. Tell us about the technical staff in the Special Projects division of Microsoft Research. What kind of work do you do, what’s the research model, and what’s your particular role there?

SPENCER FOWERS: Hi, Gretchen. Thanks for having me on here. Yeah, um, so our group at Special Projects was kind of patterned after the Lockheed Martin Skunk Works methodology. You know, we are very much a sort of “try big moonshot projects” type group. Our goal is sort of to focus on any sort of pie-in-the-sky idea that has some sort of a business application. So you can imagine we’ve done things like build the world’s first underwater datacenter or do post-quantum cryptography, things like that. Anything that, uh, is a very ambitious project that we can try to iterate on and see what type of an application we can find for it in the real world. And I’m one of the, as you said, a principal member of the technical staff. That means I’m one of the primary researchers, so I wear a lot of different hats. My job is everything from, you know, managing the project, meeting with people like Kwame and the other surgeons that we’ve worked with, and then interfacing there and finding ways that we can take theoretical research and turn it into applied research, actually find a way that we can bring that theory into reality.

HUIZINGA: You know, that’s a really interesting characterization because normally you think of those things in two different buckets, right? The big moonshot research has got a horizon, a time horizon, that’s pretty far out, and the applied research is get it going so it’s productizable or monetizable fairly quickly, and you’re marrying those two kinds of research models?

FOWERS: Yeah. I mean, we fit kind of a really interesting niche here at Microsoft because we get to operate sort of like a startup, but we have the backing of a very large company, so we get to sort of, yeah, take on these moonshot projects that a, a smaller company might not be able to handle and really attack it with the full resources of, of a company like Microsoft.

HUIZINGA: So it’s a moonshot project, but hurry up, let’s get ’er done. [LAUGHS]

FOWERS: Right, yeah.

HUIZINGA: Well, listen, Kwame, you’re a plastic surgeon at Korle Bu in Ghana. In many circles, that term is associated with nonessential cosmetic surgery, but that’s not what we’re talking about here. What constitutes the bulk of your work, and what prompted you to pursue it in the first place?

KWAME DARKO: All right, thanks, Gretchen, and also thank you for having me on the show. So, um, just as you said, my name is Kwame Darko, and I am a plastic surgeon. I think a lot of my passion to become a plastic surgeon came from the fact that at the time, we didn’t have too many plastic surgeons in Ghana. I mean, at the time that I qualified as a plastic surgeon, I was the eighth person in the country. And at the time, there was a population of 20-something million. Currently, we’re around 33 million, 34 million, and we have … we’re still not up to 30 plastic surgeons. So there was quite a bit of work to be done, and my work scopes from all the way to what everybody tends to [associate] plastic surgery with, the cosmetic stuff, across to burns, across to trauma from people with serious accidents that need some parts of their body reconstructed, to tumors of all sorts. Um, one of my fortes is breast cancer and breast reconstruction, but not limiting to that. We also [do] tumors of the leg. And we also help other surgeons to cover up spaces or defects that may have been created when they’ve taken off some sort of cancer or tumor or whatever it may be. So it’s a wide scope, as well as burn surgery and burn care, as well. So that’s the scope of the kind of work that I do.

HUIZINGA: You know, this wasn’t on my list to ask you, but I’m curious, um, both of you … Spencer, where did you get your training? What … what’s your background?

FOWERS: I actually got my PhD at university in computer engineering, uh, focused on computer vision. So a lot of my academic research was in, uh, you know, embedded systems and low-power systems and how we can get a vision-based stuff to work without using a lot of processing. And it actually fits really well for this application here where we’re trying to find low-cost ways that we can bring really high-end vision stuff, you know, and put it inside a hospital.

HUIZINGA: Yeah. So, Kwame, what about you? Where did you get your training, and did you start out in plastic surgery thinking, “Hey, that’s what I want to be”? Or did you start elsewhere and say, “This is cool”?

DARKO: So my, my background is that I did my medical school training here in Ghana at the medical school in Korle Bu and then started my postgraduate training in surgery. Um, over here, you need to do a number of years in just surgery before you can branch out and do a specific type of surgery. So, after my three, four years in that, I decided to do plastic surgery once again here in Ghana. You spend another up to three years—minimum of three years—training in that, which I did. And then you become a plastic surgeon. But then I went on for a bit of extra training and more exposure from different places around the world. I spent some time in Cape Town in South Africa working in a hospital called Groote Schuur. Um, I’ve also had the opportunity to work in, in Glasgow, where this idea originated from, and various courses in different parts of the world from India, the US, and stuff like that.

HUIZINGA: Wow. You know, I could spend a whole podcast asking you what you’ve seen in your lifetime in terms of trauma and burns and all of that. But I won’t, uh, because let’s talk about how this particular project came about, and I’d like both of your perspectives on it. This is a sort of “how I met your mother” story. As I understand it, there were a lot of people and more than two countries involved. Spencer, how do you remember the meet-up?

FOWERS: Yeah, I mean, Holoportation has been around since 2015, but it was around 2018 that, uh, Steven Lo—he’s a plastic surgeon in Glasgow—he approached us with this idea, saying, “Hey, we want to, we want to use this Holoportation technology in a hospital setting.” At that point, he was already working with Kwame and, and other doctors. They have a partnership between the Canniesburn Plastic Surgery unit in Glasgow and the Korle Bu Teaching Hospital in Accra. And he, he approached us with this idea of saying we want to build a clinic remotely so that people can come and see this. There is, like Kwame mentioned, right, a very drastic lack of surgeons in Ghana for the amount of the population, and so he wanted to find a way that he could provide reconstructive plastic surgery consultation to patients even though they’re very far away. Currently, the, you know, the Canniesburn unit, they do these trips every year, every couple of years, where they fly down to Ghana, perform surgeries. And the way it works is basically the surgeons get on an airplane, they fly down to Ghana, and then, you know, the next day they’re in the hospital, all day long, meeting these people that they’re going to operate on the next day, right? And trying to decide the day before the surgery what they’re going to operate on and what they’re going to do and get the consent from these patients. Is there a better way? Could we actually talk to these patients ahead of time? And 2D video calls just didn’t cut it. It wasn’t good enough, and Kwame can talk more about that. But his idea was, can we use something like this to make a 3D model of a patient, have a live conversation with them in 3D so that the surgeon can evaluate them before they go to Ghana and get an idea of what they’re going to do and be able to explain to the patient what they want to do before the surgery has to happen.

HUIZINGA: Yeah. So Microsoft Research, how did that group get involved?

FOWERS: Well, so we started with this technology back in, you know, 2015. And when he approached us with this idea, we were looking for ways that we could apply Holoportation to different areas, different markets. This came up as like one of those perfect fits for the technology, where we wanted to be able to use the system to image someone, it needed to be a live conversation, not a recording, and so that was, right there … was where we started working with them and designing the cameras that would go into the system they’re using today.

HUIZINGA: Right, right, right. Well, Kwame, in light of, uh, the increase, as we’ve just referred to, in 2D telemedicine, especially during COVID and, and post-COVID, people have gotten pretty used to talking to doctors over a screen as opposed to going in person. But there are drawbacks and shortcomings of 2D in your world. So how does 3D fill in those gaps, and, and what was attractive to you in this particular technology for the application you need?

DARKO: OK. So great, um, just as you’re saying, COVID really did spark the, uh, the spread of 2D telemedicine all over the world. But for myself, as a surgeon and particularly so as a plastic surgeon, we’re trying to think about, how is 2D video going to help me solve my problem or plan towards solving my problem for a patient? And you realize there is a significant shortfall when we’re not just dealing with the human being as a 2D object, uh, but 3D perspective is so important. So one of the most common things we’ve used this system to help us with is when we’re assessing a patient to decide which part of the body we’re going to move and use it to fit in the space that’s going to be created by taking out some form of tumor. And not only taking it out in 3D for us to know that it’s going to fit and be big enough but also demonstrating to the patient so they have a deeper understanding of exactly what is going to go and be used to reconstruct whichever part of their body and what defect is going to be left behind. So as against when you’re just having a straightforward consultation back and forth, answer and response, question and response, in this situation, we get the opportunity and have the ability to actually turn the patient around and then measure out specific problem … parts of the body that we’re going to take off and then transpose that on a different part of the body to make sure that it’s also going to be big enough to switch around and transpose. And when I’m saying transpose, I’m talking about maybe sticking something off from the front part of your thigh and then filling that in with maybe massive parts of your back muscle.

FOWERS: To add on to what Kwame said, you know, for us, for Microsoft Research, when Steven approached us with this, I don’t think we really understood the impact that it could have. You know, we even asked him, why don’t you just use like a cell phone, or why don’t you just use a 2D telemedicine call? Like, why do you need all this technology to do this? And he explained it to us, and we said, OK, like we’re going to take your word for it. It wasn’t until I went over there the first time that it really clicked for me and we had set up the system and he brought in a patient that had had reconstructive plastic surgery. She had had a cancerous tumor that required the amputation of her entire shoulder. So she lost her arm and, you know, this is not something that we think of on a day-to-day basis, but you actually, you can’t wear a shirt if you don’t have a shoulder. And so he was actually taking her elbow and replacing the joint that he was removing with her elbow joint. So he did this entire transpose operation. The stuff that they can do is amazing. But …

HUIZINGA: Right.

FOWERS: … he had done this operation on her probably a year before. And so he was bringing her back in for just the postoperative consult to see how she was doing. He had her in the system, and while she’s sitting in the system, he’s able to rotate the 3D model of her around so that she can see her own back. And he drew on her: “OK, this is where your elbow is now, and this is where we took the material from and what we did.” And during the teleconference, she says, “Oh, that’s what you did. I never knew what you did!” Like … she had had this operation a year ago, never knew what happened to herself because she couldn’t see her own back that way and couldn’t understand it. And it finally clicked to us like, oh my gosh, like, this is why this is important. Like not just because it aids the doctors in planning for surgeries, but the tremendous impact that it has on patient satisfaction with their operation and patient understanding of what’s going to happen.

HUIZINGA: Wow. That’s amazing. Even as you describe that, it’s … ahh … we could go so deep into the strangeness of what they can do with plastic surgery. But let’s talk about technology for a minute. Um, this is a highly visual technology, and we’re just doing a podcast, and we will provide some links in the show notes for people to see this in action, I hope. But in the meantime, Spencer, can you give us a kind of word picture of 3D telemedicine and the technology behind Holoportation? How does it work?

FOWERS: Yeah, the idea behind this technology is, if we can take pictures of a person from multiple angles and we know where those cameras are very, very accurately, we can stitch all those images together to make like a 3D picture of a person. So we’re actually using, for the 3D telemedicine system, we’re using the Azure Kinect. So it’s like Version 3 of the Kinect sensor that was introduced back in the Xbox days. And what that gives us is it gives us not just a color picture like you’re seeing on your normal 2D phone call, but it’s also giving us a depth picture so it can tell how far away you are from the camera. And we take that depth and that color information from 10 different cameras spaced around the room and stitch them all together in real time. So while we’re talking at, you know, normal conversation speed, it’s creating this 3D image of a person that the doctor, in this case, can actually rotate, pan, zoom in, and zoom out and be able to see them from any angle that they want without requiring that patient to get up and move around.

HUIZINGA: Wow. And that speaks to what you just said. The patient can see it as well as the clinician.

FOWERS: Yeah, I mean, you also have this problem with a lot of these patients if they’d had, you know, a leg amputation or something, when we typically talk like we’re talking now on like a, you know, the viewer, the listeners can’t see it, but a typical 2D telemedicine call, you’re looking at me from like my shoulders up. Well, if that person has an amputation of their knee, how do you get it so that you can talk to them in a normal conversation and then look at their knee? You, you just can’t do that on a 2D call. But this system allows them to talk to them and turn and look at their knee and show them—if it’s on their back, wherever it is—what they’re going to do and explain it to them.

HUIZINGA: That’s amazing. Kwame, this project doesn’t just address geographical challenges for remote patients. It also addresses geographical challenges for remote experts. So tell us about the nature and makeup of what you call MDTs, or multidisciplinary teams, that you collaborate with and how 3D telemedicine impacts the care you’re able to provide because of that.

DARKO: All right. So with an MDT, or multidisciplinary team, just as you said, the focus on medicine these days is to take out individual bias in how we’re going to treat a particular patient, an individual knowledge base. So now what we tend to do is we try and get a group of doctors who would be treating a particular ailment—more often than not, it’s a cancer case—and everybody brings their view on what is best to holistically find a solution to the patient’s … the most ideal remedy for the patient. Now let’s take skin cancer, for example. You’re going to need a plastic surgeon if you’re going to cut it out. You’re going to need a dermatologist who is going to be able to manage it. If it’s that severe, you’re also going to need an oncologist. You may even need a radiologist and, of course, a psychologist and your nursing team, as well. So with an MDT, you’d ideally have members from each of these specialties in a room at a time discussing individual patients and deciding what’s best to do for them. What happens when I don’t have a particular specialty? And what happens when, even though I am the representative of my specialty on this group, I may not have as in-depth knowledge as is needed for this particular patient? What do we do? Do we have access to other brains around the world? Well, with this system, yes, we do. And just as we said earlier, that unlike where this is just a regular let’s say Teams meeting or whatever form of, uh, telemedicine meeting, in this one where we have the 3D edge, we can actually have the patient around in the rig. And as we’re discussing and talking about—and people are giving their ideas—we can swing the patient around and say, well, on this aspect, it would work because this is far away from the ear or closer to the ear, or no, the ear is going to have to go with this; it’s too close. So what do we do? Can we get somebody else to do an ear reconstruction in addition? 
If it’s, um, something on the back, if we’re taking it all out, is this going to involve the muscle, as well? If so, how are we going to replace the muscle? It’s beyond my scope. But oh! What do you know? We have an expert who’s done this kind of thing from, let’s say, Korea or Singapore. And then they would log on and be able to see everything and give their input, as well. So this is another application which just crosses boundary … um, borders and gives us so much more scope to the application of this, uh, this new device.

HUIZINGA: So, so when we’re talking about multidisciplinary teams and, and we look at it from an expert point of view of having all these different disciplines in the room from the medical side, uh, Spencer this collaboration includes technologists, as well as medical professionals, but it also includes patients. You, you talk about what you call a participatory development validation. What is the role of patients in developing this technology?

FOWERS: Well, similar to like that story I was mentioning, right, as we started using this system, the initial goal was to give doctors this better ability to be able to see patients in preparation for surgery. What we found as we started to show this to patients was that it drastically increased their satisfaction from the visits with the doctors because they were able to better understand the operation that was going to be performed. It’s surprising how many times like Kwame and Steven will talk to me and they’ll tell us stories about how like they explain a procedure to a patient about what they’re going to do, and the patient says, “Yeah, OK.” And then they get done and the patient’s like, “Wait, what did you do? Like that doesn’t … I didn’t realize you were going to do that,” you know, because it’s hard for them to understand when you’re just talking about them or whether you’re drawing on a piece of paper. But when you actually have a picture of yourself in front of you that’s live and the doctor’s indicating on you what’s going to happen and what the surgery is going to be, it drastically increases the patient satisfaction. And so that was actually the direction of the randomized controlled trial that we’re conducting in, in Scotland right now is, what kind of improvement in patient satisfaction does this type of a system provide?

HUIZINGA: Hmm. It’s kind of speaking UX to me, like a patient experience as opposed to a user experience. Um, has it—any of this—fed into sort of feedback loop on technology development, or is it more just on the user side of how I feel about it?

FOWERS: Um, as far as like technology that we use for the system, when we started with Holoportation, we were actually using kind of research-grade cameras and building our own depth cameras and stuff like that, which made for a very expensive system that wasn’t easy to use. That’s why we transitioned over to the Azure Kinect because it’s actually like the highest-resolution depth camera you can get on the market today for this type of information. And so, it’s, it’s really pushed us to find, what can we use that’s more of a compact, you know, all-in-one system so that we can get the data that we need?

HUIZINGA: Right, right, right. Well, Kwame, at about this time, I always ask what could possibly go wrong? But when we talked before, you, you kind of came at this from a cup-half-full outlook because of the nature of what’s already wrong in digital healthcare in general, but particularly for rural and underserved communities, um, and you’ve kind of said what’s wrong is why we’re doing this. So what are some of the problems that are already in the mix, and how does 3D telemedicine mitigate any of them? Things like privacy and connectivity and bandwidth and cost and access and hacking, consent—all of those things that we’re sort of like concerned about writ large?

DARKO: All right. So when I was talking about the cup being half full in terms of these, all of these issues, it’s because these problems already exist. So this technology doesn’t present itself and create a new problem. It’s just going to piggyback off the solutions of what is already in existence. All right? So you, you mentioned most of them anyway. I mean, talking about patient privacy, which is No. 1. Um, all of these things are done on a hospital server. They are not done on a public or an ad hoc server of any sort. So whatever fail-safes there are within the hospital in itself, whichever hospital network we’re using, whether here in Ghana, whether in Glasgow, whether somewhere remotely in India or in the US, doesn’t matter where, it would be piggybacking off a hospital server. So those fail-safes are there already. So if anybody can get into the network and observe or steal data from our system, then it’s because the hospital system isn’t secure, not because our system, in a manner of speaking, is not secure. All right? And then when I was saying that it’s half full, it’s because whatever lapses we have already in 2D telemedicine, this supersedes it. And not only does it supersede the 2D lapses, it goes again and gives significant patient feedback like we were saying earlier, what Spencer also alluded to, is that now you have the ability to show the patient exactly what’s going on. And so in previous aspects where, think about it, even if it’s an in-person consultation where I would draw on a piece of paper and explain to them, “Well, I’m going to do this, this, this, and that,” now I actually have the patient’s own body, which they’re watching at the same time, being spun around and indicating that this actually is the spot I was talking about and this is how big my cut is going to be, and this is what I’m going to move out from here and use to fill in this space.
So once again, my inclination on this is that, on our side, we can only get good, as opposed to looking for problems. The problems, I, I admit, will exist, but not as a separate entity from regular 2D medicine that’s … or 2D videography that we’re already encountering.

HUIZINGA: So you’re not introducing new risks with this. You’re just sort of serving on the other risks.

DARKO: We’re adding to the positives, basically.

HUIZINGA: Right. Yeah, Spencer, in the “what could go wrong” bucket on the other side of it, I’m looking at healthcare and the cost of it, uh, especially when you’re dealing with multiple specialists and complicated surgeries and so on. And I know healthcare policy is not on your research roadmap necessarily, but you have to be thinking about that as you’re, um, as you’re going on how this will ultimately be implemented across cultures and so on. So have you given any thought to how this might play out in different countries, or is this just sort of “we’re going to make the technology and let the policy people and the wonks work it out later?”

FOWERS: [LAUGHS] Yeah, it’s a good question, and I think it’s something that we’re really excited to see how it can benefit. Luckily enough, where we’re doing the tests right now, like in, uh, Glasgow and in Ghana, they already have partnerships and so there’s already standards in place for being able to share doctors and technology across that. But yeah, we’ve definitely looked into like, what kind of an impact does this have? And one of the benefits that we see is using something like 3D telemedicine even to provide greater access for specialty doctors in places like rural or remote United States, where they just don’t have access to those specialists that they need. I mean, you know, Washington state, where I am, has a great example where you’ve got people that live out in Eastern Washington, and if they have some need to go see like a pediatric specialist, they’re going to have to drive all the way into Seattle to go to Seattle Children’s to see that person. What if we can provide a clinic that allows them to, you know, virtually, through 3D telemedicine, interface with that doctor without having to make that drive and all that commute until they know what they need to do. And so we actually look at it as being beneficial because this provides greater access to these specialists, to other regions. So it’s actually improving, improving healthcare reach and accessibility for everyone.

HUIZINGA: Yeah. Kwame, can you speak to accessibility of these experts? I mean, you would want them all on your team for a 3D telemedicine call, but how hard is it to get them all on the same … I mean, it’s hard to get people to come to a meeting, let alone, you know, a big consultation. Does that enter the picture at all or, is that … ?

DARKO: It does. It does. And I think, um, COVID is something, is something else that’s really changed how we do everyday, routine stuff. So for example, we here in Ghana have a weekly departmental meeting and, um—within the plastic surgery department and also within the greater department of surgery, weekly meeting—everything became remote. So all of a sudden, people who may not be able to make the meeting for whatever reason are now logging on. So it’s actually made accessibility to them much, much easier and swifter. I mean, where they are, what they’re doing at the time, we have no idea, but it just means that now we have access to them. So extrapolating this on to us getting in touch with specialists, um, if we schedule our timing right, it actually makes it easier for the specialists to log on. Now earlier we spoke about international MDTs, not just local, but then, have we thought about what would have happened if we did not have this ability to have this online international MDT? We’re talking about somebody getting a plane ticket, sitting on a plane, waiting in airports, airport delays, etcetera, etcetera, and flying over just to see the patient for 30 minutes and make a decision that, “Well, I can or cannot do this operation.” So now this jumps over all of this and makes it much, much easier for us. And now when we move on to the next stage of consultation, after the procedure has been done, when I’m talking about the surgery, now the patient doesn’t need to travel great distances for individual specialist review. Now in the case of plastic surgery, this may cover not only the surgeon but also the physiotherapist. And so, it’s not just before the consultation but also after the consultation.

HUIZINGA: Wow. Spencer, what you’re doing with 3D telemedicine through Holoportation is a prime example of how a technology developed for one thing turned out to have incredible uses for another. So give us just a brief history of the application for 3D communication and how it evolved from where it started to where it is now.

FOWERS: Yeah, I mean, 3D communication, at least from what we’re doing, really started with things like the original Xbox Kinect, right? With a gaming console and a way to kind of interact in a different way with your gaming system. What happened was, Microsoft released that initial Kinect and suddenly found that people weren’t buying the Kinect to play games with it. They were buying it to put on robots and buying it to use, you know, for different kinds of robotics applications and research applications. And that’s why the second Kinect, when it was released, they had an Xbox version and they actually had a Kinect for Windows version because they were expecting people to buy this sensor to plug it in their computers. And if you look at the form factor now with the Azure Kinect that we have, it’s a much more compact unit. It’s meant specifically for using on computers, and it’s built for robotics and computer vision applications, and so it’s been really neat to see how this thing that was developed as kind of a toy has become something that we now use in industrial applications.

HUIZINGA: Right. Yeah. And this … sort of the thing, the serendipitous nature of research, especially with, you know, big moonshot projects, is like this is going to be great for gaming and it actually turns out to be great for plastic surgery! Who’d have thunk? Um, Kwame, speaking to where this lives in terms of use, where is it now on the spectrum from lab to life, as I like to say?

DARKO: So currently, um, we have a rig in our unit, the plastic surgery unit in the Korle Bu Teaching Hospital. There’s a rig in Glasgow, and there’s a rig over in the Microsoft office. So currently what we’ve been able to do is to run a few tests between Ghana, Seattle, and Glasgow. So basically, we’ve been able to run MDTs and we’ve been able to run patient assessments, pre-op assessments, as well as post-operative assessments. So that’s where we are at the moment. It takes quite a bit of logistics to initiate, but we believe once we’re on a steady roll, we’ll be able to increase our numbers that we’ve been able to do this on. I think currently those we’ve operated on, with a pre-op assessment and post-op assessment, have been about six or seven patients. And it was great, basically. We’ve done MDTs across with them, as well. So the full spectrum of use has been done: pre-op, MDT, and post-op assessments. So yeah, um, we have quite a bit more to do with numbers and to take out a few glitches, especially with remote access and stuff like that. But yeah, I think we’re, we’re, we’re making good progress.

HUIZINGA: Yeah. Spencer, do you see, or do you know of, hurdles that you’re going to have to jump through to get this into wider application?

FOWERS: For us, from a research setting, one of the things that we’ve been very clear about as we do this is that, while it’s being used in a medical setting, 3D telemedicine is actually just a communication technology, right? It’s a Teams call; it’s a communication device. We’re not actually performing surgery with the system, you know, or it’s not diagnosing or anything. So it’s not actually a medical device as much as it’s a telecommunication device that’s being used in a medical application.

HUIZINGA: Well, as we wrap up, I like to give each of you a chance to paint a picture of your preferred future. If your work is wildly successful, what does healthcare look like in five to 10 years? And maybe that isn’t the time horizon. It could be two to three; it could be 20 years. I don’t know. But how have you made a difference in specialized medicine with this communication tool?

FOWERS: Like going off of what Kwame was saying, right, back in November, when we flew down and were present for that first international MDT, it was really an eye-opening experience. I mean, these doctors, normally, they just get on an airplane, they fly down, and they meet these patients for the first time, probably the day before they’ve had surgery. And this time, they were able to meet them and then be able to spend time before they flew down preparing for the surgery. And then they did the surgeries. They flew back. And normally, they would fly back, they wouldn’t see that patient again. With 3D telemedicine, they jumped back on a phone call and there was the person in 3D, and they were able to talk to them, you know, turn them around, show them where the procedure was, ask them questions, and have this interaction that made it so much better of an experience for them and for the doctors involved. So when I look at kind of the future of where this goes, you know, our vision is, where else do we need this? Right now, it’s been showcased as this amazing way to bring international expertise to one operating theater, you know, with specialists from around the world, as needed. And I think that’s great. And I think we can apply that in so many different locations, right? Rural United States is a great example for us. We hope to expand out what we’re doing in Scotland, to rural areas of Scotland that, you know, it’s very hard for people in the Scottish isles to be able to get to their hospitals. You know, other possible applications … like can we make this system mobile? You can imagine like a clinical unit where this system drives out to remote villages and is able to allow people that can’t make it in to a hospital to get that initial consultation, to know whether they should make the trip or whether they need other work done before they can start surgery. So kind of the sky’s the limit, right? 
I mean, it’s always good to look at like, what’s … I work with a team that does moonshots for a living, so I’m always looking for what can we shoot for? And our goal really is like, gosh, where can’t we apply this technology? I mean, it’s just anywhere that it is at all difficult to get, you know, medical expertise, we can ease the burden of doctors by making it so they don’t have to travel to provide this specialized care and increase the access to healthcare to these people that normally wouldn’t be able to get access to it.

HUIZINGA: Kwame, what’s your, what’s your take?

DARKO: So to me, I just want to describe what the current situation is and what I believe the future situation will be. So, the current situation—and like, like, um, Spencer was saying, this just doesn’t apply to Ghana alone; it can apply in some parts of the US and some parts of the UK, as well—where a patient has a problem, is seen by a GP in the local area, has to travel close to 24 hours, sometimes sleep over somewhere, just to get access to a specialist to see what’s going on. The specialist now diagnoses, sees what’s happening, and then runs a barrage of tests and makes a decision, “Well, you’re going to have an operation, and the operation is going to be in, let’s say, four weeks, six weeks.” So what happens? The patient goes, spends another 24 hours-plus going all the way back home, waiting for the operation day or operation period, and then traveling all the way back. You can imagine the time and expense. And if this person can’t travel alone, that means somebody else needs to take a day off work to bring the person back and forth. So now what would happen in the future if everything goes the way we’re planning? We’d have a rig in every, let’s say, district or region. The person just needs to travel, assumedly, an hour or two to the rig. Gets the appointment. Everything is seen in 3D. All the local blood tests and stuff that can be done would be done locally, results sent across. Book a theater date. So the only time that the person really needs to travel is when they’re coming for the actual operation. And once again, if an MDT has to be run on this, on this patient, it will be done. And, um, they would be sitting in their rig remotely in the town or wherever it is. Those of us in the teaching hospitals across the country would also be in our places, and we’d run the MDT to be sure. Postoperatively, if it’s a review of the patient, we’d be able to do that, even if it’s an MDT review, as well, we could do that. 
And the extra application, which I didn’t highlight too much and I mentioned it, but I didn’t highlight it, is if this person needs to have physiotherapy and we need to make sure that they’re succeeding and doing it properly, we can actually do it through a 3D call and actually see the person walking in motion or wrist movement or hand extension or neck movements, whatever it is. We can do all this in 3D. So yeah, the, the scope is, is as far as the mind can imagine it!

HUIZINGA: You know, I’m even imagining it, and I hate to bring up The Jetsons as, you know, my, my anchor analogy, but, you know, at some point, way back, nobody thought they’d have the technology we have all in our rooms and on our bodies now. Maybe this is just like the beginning of everybody having 3D communication everywhere, and no one has to go to the doctor before they get the operation. [LAUGHS] I don’t know. Spencer Fowers, Kwame Darko. This is indeed a mind-blowing idea that has the potential to be a world-changing technology. Thanks for joining me today to talk about it.

DARKO: Thanks for having us, Gretchen.

FOWERS: Thanks.

The post Collaborators: Holoportation™ communication technology with Spencer Fowers and Kwame Darko appeared first on Microsoft Research.
