Model Teachers: Startups Make Schools Smarter With Machine Learning

Like two valedictorians, SimInsights and Photomath tell stories worth hearing about how AI is advancing education.

SimInsights in Irvine, Calif., uses NVIDIA conversational AI to make virtual and augmented reality classes lifelike for college students and employee training.

Photomath — founded in Zagreb, Croatia, and based in San Mateo, Calif. — created an app that uses computer vision and natural language processing to help students and their parents brush up on everything from arithmetic to calculus.

Both companies are a part of NVIDIA Inception, a free, global program that nurtures cutting-edge startups.

Surfing Sims in California

Rajesh Jha has loved simulations since he developed a physics simulation engine for mechanical parts in college, more than 25 years ago. “So, I put sim in the name when I started my own company in 2009,” he said.

SimInsights originally developed web and mobile training simulations. When AR and VR platforms became available, Jha secured a grant to develop HyperSkill. Now the company’s main product, it’s a cloud-based, AI-powered 3D simulation authoring and analytics tool that makes training immersive.

The software helped UCLA’s medical center build a virtual clinic to train students. But students complained about the low accuracy of its rules-based conversational AI, so Jha took data from the first class and trained a deep neural network using NVIDIA Riva, GPU-accelerated software for building speech AI applications.
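For a flavor of what this looks like in code, below is a minimal sketch of transcribing a student’s spoken question with the nvidia-riva-client Python package. The server address and audio file are placeholders, and SimInsights’ actual integration is not public.

# Hedged sketch: send audio to a running Riva server for offline
# transcription. The endpoint and file name are assumptions.
import riva.client

auth = riva.client.Auth(uri="localhost:50051")  # Riva server endpoint
asr = riva.client.ASRService(auth)

config = riva.client.RecognitionConfig(
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,
)

with open("student_question.wav", "rb") as f:   # placeholder audio clip
    response = asr.offline_recognize(f.read(), config)

print(response.results[0].alternatives[0].transcript)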

Riva Revs Up Speech AI

“There was a quick uptick in the quality, and they say it’s the most realistic training they’ve used,” said Jha.

Now, UCLA wants to apply the technology to train thousands of nurses on dealing with infectious diseases.

“There’s a huge role for conversational AI in education and training because it personalizes the experience,” he said. “And a lot of research shows if you can do that, people learn more and retain it longer.”

Access to New Technology

Because SimInsights is an NVIDIA Inception member, it got early access to Riva and NVIDIA TAO, a toolkit that accelerates evaluating and training AI models with transfer learning. They’ve become standard parts of the company’s workflow.

As for Riva, “it’s a powerful piece of software, and our team really appreciates working with NVIDIA to brainstorm our next steps,” Jha said.

Specifically, SimInsights aims to develop larger conversational AI models with more functions, such as question answering so students can point to objects in a scene and ask about them.

“As Riva gives us more capabilities, we’ll incorporate them into HyperSkill to make digital learning as good as working with an expert — it will take a while, but this is the way to get there,” he said.

Accelerating Math in Croatia

In Zagreb, Damir Sabol got stuck trying to help his eldest son understand a math problem in his homework. It sparked the idea for Photomath, an app that’s been downloaded more than 300 million times since its 2015 release.

The app detects an equation in a smartphone picture, then shows step-by-step solutions to it in formats that support different learning styles.
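To illustrate the solving half of that pipeline, here is a minimal sketch that takes an already-recognized equation string and solves it symbolically with SymPy. Photomath’s actual solver and step-generation engine are proprietary; this only shows the idea.

# Hypothetical post-OCR step: parse a transcribed equation and solve it.
import sympy as sp

def solve_recognized_equation(text: str):
    """Parse an OCR'd equation like '2*x + 3 = 11' and solve for x."""
    lhs, rhs = text.split("=")
    x = sp.Symbol("x")
    return sp.solve(sp.Eq(sp.sympify(lhs), sp.sympify(rhs)), x)

print(solve_recognized_equation("2*x + 3 = 11"))  # [4]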

“At peak times, we get thousands of requests a second, so we need to be really fast,” said Ivan Jurin, who leads the startup’s AI projects.

Some teachers have students open the app as an alternative to working on the blackboard. It’s the kind of anecdote that makes Jurin’s day.

“We want to make education more accessible,” he said. “The free version of Photomath can help people who lack resources understand math almost as well as someone who can afford a tutor.”

A Large Hybrid Model

Under the hood, one large neural network does most of the work, detecting and parsing equations. It’s a mix of a convolutional network and a transformer model that packs about 100 million parameters.
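As a rough illustration of such a hybrid architecture, the sketch below pairs a convolutional backbone with a transformer decoder that emits output tokens. Photomath’s real architecture is not public, and the layer sizes here are arbitrary placeholders far smaller than a 100-million-parameter model.

# Illustrative CNN + transformer image-to-sequence model for math
# recognition; all dimensions are toy placeholders.
import torch
import torch.nn as nn

class ImageToSequence(nn.Module):
    def __init__(self, vocab_size=500, d_model=256):
        super().__init__()
        # Convolutional backbone: turns the image into a feature map.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Transformer decoder: attends over the flattened feature map
        # and predicts output tokens (e.g., solution steps) one by one.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, image, tokens):
        memory = self.cnn(image).flatten(2).transpose(1, 2)  # (B, HW, d)
        return self.head(self.decoder(self.embed(tokens), memory))

model = ImageToSequence()
logits = model(torch.randn(2, 1, 64, 256), torch.randint(0, 500, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 500])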

It’s trained on local servers with NVIDIA RTX A6000 GPUs. For a cost-sensitive startup, “training in the cloud didn’t motivate us to experiment with larger datasets and more complex models, but with local servers we can queue up experiments as we see fit,” said Vedran Vekić, a senior machine learning engineer at the company.

Once trained, the service runs in the cloud on NVIDIA T4 Tensor Core GPUs, which he described as “very cost effective.”

NVIDIA Software Speeds Inference

The startup is migrating to a full stack of NVIDIA AI software to accelerate inference. It includes NVIDIA Triton Inference Server for maximum throughput, the TensorRT software development kit to minimize latency and NVIDIA DALI, a library for processing images fast.

“We were using the open-source TorchServe, but it wasn’t as efficient as we hoped,” Vekić said. NVIDIA software “gets 100% GPU utilization, so we’re using it on our smaller models and converting our large model to it, too.”

It’s a technical challenge that NVIDIA experts can help address, one of the benefits of being in Inception.
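As an illustration of the serving side, here is a hedged sketch of querying a Triton-hosted model with the tritonclient Python package. The model name, input name, and tensor shape are invented placeholders, not Photomath’s deployment.

# Query a model served by NVIDIA Triton Inference Server over HTTP.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# One grayscale image as a batch of size 1 (placeholder shape).
image = np.random.rand(1, 1, 64, 256).astype(np.float32)
inputs = [httpclient.InferInput("INPUT__0", list(image.shape), "FP32")]
inputs[0].set_data_from_numpy(image)

result = client.infer(model_name="equation_recognizer", inputs=inputs)
print(result.as_numpy("OUTPUT__0").shape)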

SimInsights and Photomath are among hundreds of startups — out of NVIDIA Inception’s total 10,000+ members — that are making education smarter with machine learning.

To learn more, check out these GTC sessions on NVIDIA Riva, NVIDIA TAO, and NVIDIA Triton and TensorRT.

Read More

Ridiculously Realistic Renders Rule This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology accelerates creative workflows. 

Viral creator turned NVIDIA 3D artist Lorenzo Drago takes viewers on a jaw-dropping journey through Toyama, Japan’s Etchū-Daimon Station this week In the NVIDIA Studio.

Drago’s photorealistic recreation of the train station has garnered over 2 million views in under four months, with audiences marveling at the remarkably accurate detail.

“Reality inspires me the most,” said Drago. “Mundane, everyday things always tell a story — they have nuances that are challenging to capture in fantasy.”


Drago started by camera matching in the fSpy open-source software. This process computed the approximate focal length, orientation and position of the camera in 3D space, based on the defined control points chosen from his Etchū-Daimon reference image.

The artist then moved to Blender software to begin the initial blockout, a 3D rough-draft level built with simple 3D shapes without details or polished art assets. The goal of the blockout was to prototype, test and adjust the foundational shapes of the level.

Drago tests and adjusts foundational shapes of the level in Blender during the blockout phase of the project.

From there, Drago measured the height of a staircase and extrapolated those proportions to the rest of the 3D scene, ensuring it fit the grid size. The scene could then be built modularly, one model at a time. He modeled with lightning speed using NVIDIA RTX-accelerated OptiX ray tracing in the Blender viewport.

Incredibly, the entire scene is a combination of custom-textured assets. Drago’s texturing technique elevated the sense of realism through tileable textures and trim sheets, which are textures that combine separate details into a single sheet. Mixing these techniques proved effective for creating original, more detailed textures while keeping good pixel density across the scene. Textures never exceeded 2048×2048 pixels.


Drago created his textures in Adobe Substance 3D Painter, taking advantage of NVIDIA Iray rendering for faster, interactive rendering. RTX acceleration enabled him to quickly bake ambient occlusion and other maps used in texturing, before exporting and applying the textures to the models inside Unreal Engine 5.

Final frame renders came quickly with Drago’s GeForce RTX 2080 SUPER GPU doing the heavy lifting. Drago stressed the necessity of his graphics card.

“For a 3D artist, probably the majority of the work, excluding the planning phases, requires GPU acceleration,” he said. “Being able to work on scenes and objects with materials and lighting rendered in real time saves a lot of time and headaches compared to wireframe or unshaded modes.”


Drago moved on to importing and assembling his textured models in Unreal Engine 5. Elated with the assembly, he began the lighting process. Unreal Engine 5’s Lumen technology enabled lighting iterations in real time, without Drago having to wait for baking or render times.

Unreal Engine’s virtual-reality framework allowed Drago to set up a virtual camera with motion tracking. This gave the animation its signature first-person vantage point, enabling the artist to move around his creative space as if he were holding a smartphone.

The GPU-accelerated NVDEC decoder in Premiere Pro enabled smooth playback and scrubbing of high-resolution video for Drago.

With the renders and animation in place, Drago exported the scene to Adobe Premiere Pro software, where he added sound effects. He also sharpened image details in the animation, one of the many GPU-accelerated features in the software. Drago then deployed GPU-accelerated encoding, NVENC, to speed up the exporting of final files.

Subtle modifications allowed Drago to create ultra-realistic renders. “Mimicking a contemporary smartphone camera — with a limited dynamic range, glare and sharpening artifacts — was a good way of selling the realism,” he said.

“RTX features, like DLSS and ray tracing, are amazing, both from a developer’s point of view and a gamer’s,” Drago stated.

NVIDIA 3D artist Lorenzo Drago.

Drago recently imported Etchū-Daimon Station into NVIDIA Omniverse, a 3D design platform for collaborative editing, which replaces linear pipelines with live-sync creation. Drago noted the massive potential within the Omniverse Create App, calling it “a powerful tool capable of achieving extremely high-fidelity results.”

NVIDIA GTC, a global AI conference running online Sept. 19-22, will feature Omniverse sessions with industry experts to demonstrate how the platform can elevate creative workflows. Take advantage of these free resources and register today.

Follow Lorenzo Drago and view his portfolio on ArtStation.

Continue Your Creative Journey

Drago is a self-taught 3D artist, proof that resilience and dedication can lead to incredible, thought-provoking, inspirational creative work.

In the spirit of learning, the NVIDIA Studio team is posing a challenge for the community to show off personal growth. Participate in the #CreatorsJourney challenge for a chance to be showcased on NVIDIA Studio social media channels.

Entering is easy. Post an older piece of artwork alongside a more recent one to showcase your growth as an artist. Follow and tag NVIDIA Studio on Instagram, Twitter or Facebook, and use the #CreatorsJourney tag to join.

There’s more than one way to create incredibly photorealistic visuals. Check out the Detailed World Building tutorial by material artist Javier Perez, as well as the three-part series, Create Impressive 360 Panoramic Concept Art, by concept design artist Vladimir Somov. The series showcases a complete workflow in modeling, world building and texturing, and post-processing.

Access free tutorials by industry-leading artists on the Studio YouTube channel. Get creativity-inspiring updates directly to your inbox by subscribing to the NVIDIA Studio newsletter.

Read More

Analyzing the potential of AlphaFold in drug discovery

Over the past few decades, very few new antibiotics have been developed, largely because current methods for screening potential drugs are prohibitively expensive and time-consuming. One promising new strategy is to use computational models, which offer a potentially faster and cheaper way to identify new drugs.

A new study from MIT reveals the potential and limitations of one such computational approach. Using protein structures generated by an artificial intelligence program called AlphaFold, the researchers explored whether existing models could accurately predict the interactions between bacterial proteins and antibacterial compounds. If so, then researchers could begin to use this type of modeling to do large-scale screens for new compounds that target previously untargeted proteins. This would enable the development of antibiotics with unprecedented mechanisms of action, a task essential to addressing the antibiotic resistance crisis.

However, the researchers, led by James Collins, the Termeer Professor of Medical Engineering and Science in MIT’s Institute for Medical Engineering and Science (IMES) and Department of Biological Engineering, found that these existing models did not perform well for this purpose. In fact, their predictions performed little better than chance.

“Breakthroughs such as AlphaFold are expanding the possibilities for in silico drug discovery efforts, but these developments need to be coupled with additional advances in other aspects of modeling that are part of drug discovery efforts,” Collins says. “Our study speaks to both the current abilities and the current limitations of computational platforms for drug discovery.”

In their new study, the researchers were able to improve the performance of these types of models, known as molecular docking simulations, by applying machine-learning techniques to refine the results. However, more improvement will be necessary to fully take advantage of the protein structures provided by AlphaFold, the researchers say.

Collins is the senior author of the study, which appears today in the journal Molecular Systems Biology. MIT postdocs Felix Wong and Aarti Krishnan are the lead authors of the paper.

Molecular interactions

The new study is part of an effort recently launched by Collins’ lab called the Antibiotics-AI Project, which has the goal of using artificial intelligence to discover and design new antibiotics.

AlphaFold, an AI software developed by DeepMind and Google, has accurately predicted protein structures from their amino acid sequences. This technology has generated excitement among researchers looking for new antibiotics, who hope that they could use the AlphaFold structures to find drugs that bind to specific bacterial proteins.

To test the feasibility of this strategy, Collins and his students decided to study the interactions of 296 essential proteins from E. coli with 218 antibacterial compounds, including antibiotics such as tetracyclines.

The researchers analyzed how these compounds interact with E. coli proteins using molecular docking simulations, which predict how strongly two molecules will bind together based on their shapes and physical properties.

This kind of simulation has been successfully used in studies that screen large numbers of compounds against a single protein target, to identify compounds that bind the best. But in this case, where the researchers were trying to screen many compounds against many potential targets, the predictions turned out to be much less accurate.

By comparing the predictions produced by the model with actual interactions for 12 essential proteins, obtained from lab experiments, the researchers found that the model had false positive rates similar to true positive rates. That suggests that the model was unable to consistently identify true interactions between existing drugs and their targets.

Using a measurement often used to evaluate computational models, known as auROC, the researchers also found poor performance. “Utilizing these standard molecular docking simulations, we obtained an auROC value of roughly 0.5, which basically says you’re doing no better than if you were randomly guessing,” Collins says.
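For readers unfamiliar with the metric, the snippet below illustrates why an auROC near 0.5 means chance-level performance; the labels and scores are synthetic, not the study’s data.

# auROC of an uninformative model on synthetic interaction labels.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
true_interactions = rng.integers(0, 2, size=1000)  # 1 = real binding
random_scores = rng.random(1000)                   # random model scores
print(roc_auc_score(true_interactions, random_scores))  # ~0.5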

The researchers found similar results when they used this modeling approach with protein structures that have been experimentally determined, instead of the structures predicted by AlphaFold.

“AlphaFold appears to do roughly as well as experimentally determined structures, but we need to do a better job with molecular docking models if we’re going to utilize AlphaFold effectively and extensively in drug discovery,” Collins says.

Better predictions

One possible reason for the model’s poor performance is that the protein structures fed into the model are static, while in biological systems, proteins are flexible and often shift their configurations.

To try to improve the success rate of their modeling approach, the researchers ran the predictions through four additional machine-learning models. These models are trained on data that describe how proteins and other molecules interact with each other, allowing them to incorporate more information into the predictions.

“The machine-learning models learn not just the shapes, but also chemical and physical properties of the known interactions, and then use that information to reassess the docking predictions,” Wong says. “We found that if you were to filter the interactions using those additional models, you can get a higher ratio of true positives to false positives.”
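The sketch below conveys the rescoring idea in miniature: train a classifier on known interactions, then keep only the docking hits it also considers likely binders. The features, data, and threshold are invented; the study’s actual models and training sets differ.

# Toy rescoring filter over docking predictions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X_train = rng.random((500, 8))          # invented pair descriptors
y_train = rng.integers(0, 2, size=500)  # invented interaction labels

rescorer = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

docking_hits = rng.random((50, 8))      # candidate pairs from docking
confidence = rescorer.predict_proba(docking_hits)[:, 1]
filtered_hits = docking_hits[confidence > 0.5]
print(len(filtered_hits))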

However, additional improvement is still needed before this type of modeling could be used to successfully identify new drugs, the researchers say. One way to do this would be to train the models on more data, including the biophysical and biochemical properties of proteins and their different conformations, and how those features influence their binding with potential drug compounds.

“This study both lets us understand just how far we are from realizing full machine-learning-based paradigms for drug development, and provides fantastic experimental and computational benchmarks to stimulate and direct and guide progress towards this future vision,” says Roy Kishony, a professor of biology and computer science at Technion (the Israel Institute of Technology), who was not involved in the study.

With further advances, scientists may be able to harness the power of AI-generated protein structures to discover not only new antibiotics but also drugs to treat a variety of diseases, including cancer, Collins says. “We’re optimistic that with improvements to the modeling approaches and expansion of computing power, these techniques will become increasingly important in drug discovery,” he says. “However, we have a long way to go to achieve the full potential of in silico drug discovery.”

The research was funded by the James S. McDonnell Foundation, the Swiss National Science Foundation, the National Institute of Allergy and Infectious Diseases, the National Institutes of Health, and the Broad Institute of MIT and Harvard. The Antibiotics-AI Project is supported by the Audacious Project, the Flu Lab, the Sea Grape Foundation, and the Wyss Foundation.

Read More

How The Chefz serves the perfect meal with Amazon Personalize

This is a guest post by Ramzi Alqrainy, Chief Technology Officer, The Chefz.

The Chefz is a Saudi-based online food delivery startup, founded in 2016. At the core of The Chefz’s business model is enabling its customers to order food and sweets from top elite restaurants, bakeries, and chocolate shops. In this post, we explain how The Chefz uses Amazon Personalize filters to apply business rules on recommendations to end-users, increasing revenue by 35%.

Food delivery is a growing industry but at the same time extremely competitive. The biggest challenge in the industry is maintaining customer loyalty. This requires a comprehensive understanding of the customer’s preferences, the ability to provide excellent response time in terms of on-time delivery, and good food quality. These three factors determine the most important metric for The Chefz’s customer satisfaction. Demand on The Chefz’s platform fluctuates, especially with spikes in order volumes at lunch and dinner times. Demand also fluctuates during special days such as Mother’s Day, the football final, the Ramadan pre-dawn (Suhoor) and sundown (Iftar) meal times, or Eid festive holidays. During these times, demand can increase by up to 300%, adding one more critical challenge: recommending the perfect meal based on the time of day, especially during Ramadan.

The perfect meal at the right time

To make the ordering process more deterministic and to cater to peak demand times, The Chefz team decided to divide the day into different periods. For example, during Ramadan season, days are divided into Iftar and Suhoor. On regular days, days consist of four periods: breakfast, lunch, dinner, and dessert. The technology that underpins this deterministic ordering process is Amazon Personalize, a powerful recommendation engine. Amazon Personalize takes these grouped periods along with the location of the customer to provide a perfect recommendation.

This ensures the customer receives restaurant and meal recommendations based on their preference and from a nearby location so that it arrives quickly at their doorstep.

This recommendation engine based on Amazon Personalize is the key ingredient in how The Chefz’s customers enjoy personalized restaurant meal recommendations, rather than random recommendations for categories of favorites.

The personalization journey

The Chefz started its personalization journey by offering restaurant recommendations for customers using Amazon Personalize based on previous interactions, user metadata (such as age, nationality, and diet), restaurant metadata like category and food types offered, along with live tracking for customer interactions on the Chefz mobile application and web portal. The initial deployment phases of Amazon Personalize led to a 10% increase in customer interactions with the portal.

Although that was a milestone step, delivery time was still a problem that many customers encountered. One of the main difficulties customers had was delivery time during rush hour. To address this, the data science team added location as an additional feature to user metadata so recommendations would take into consideration both user preference and location for improved delivery time.

The next step in the recommendation journey was to consider annual timing, especially Ramadan, and the time of day. These considerations ensured The Chefz could recommend heavy meals or restaurants that provide Iftar meals at Ramadan sundown, and lighter meals in the late evening. To solve this challenge, the data science team used Amazon Personalize filters updated by AWS Lambda functions, which were triggered by an Amazon CloudWatch cron job.

The following architecture shows the automated process for applying the filters:

  1. A CloudWatch event uses a cron expression to schedule when a Lambda function is invoked.
  2. When the Lambda function is triggered, it attaches the filter to the recommendation engine to apply business rules.
  3. Recommended meals and restaurants are delivered to end-users on the application.
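As a concrete illustration of step 2, here is a hedged sketch of such a scheduled Lambda handler. The filter ARNs, parameter name, and time windows are invented placeholders, and the serving layer is assumed to pass the stored ARN as filterArn when calling get_recommendations on the personalize-runtime client.

# Scheduled Lambda: activate the Amazon Personalize filter that matches
# the current meal period. All ARNs and names below are assumptions.
import datetime

import boto3

ssm = boto3.client("ssm")

FILTER_ARNS = {
    "breakfast": "arn:aws:personalize:us-east-1:111122223333:filter/breakfast",
    "lunch": "arn:aws:personalize:us-east-1:111122223333:filter/lunch",
    "dinner": "arn:aws:personalize:us-east-1:111122223333:filter/dinner",
    "dessert": "arn:aws:personalize:us-east-1:111122223333:filter/dessert",
}

def current_period(hour: int) -> str:
    """Map the hour of day to one of the four regular-day periods."""
    if hour < 11:
        return "breakfast"
    if hour < 16:
        return "lunch"
    if hour < 22:
        return "dinner"
    return "dessert"

def handler(event, context):
    """Invoked by the CloudWatch cron rule; activates the period's filter."""
    period = current_period(datetime.datetime.utcnow().hour)
    ssm.put_parameter(
        Name="/thechefz/active-filter-arn",  # read by the serving layer
        Value=FILTER_ARNS[period],
        Type="String",
        Overwrite=True,
    )
    return {"activePeriod": period}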

Conclusion

Amazon Personalize enabled The Chefz to apply context about individual customers and their circumstances, and deliver customized recommendations based on business rules such as special deals and offers through our mobile application. This increased revenue by 35% per month and doubled customer orders at recommended restaurants.

“The customer is at the heart of everything we do at The Chefz, and we’re working tirelessly to improve and enhance their experience. With Amazon Personalize, we are able to achieve personalization at scale across our entire customer base, which was previously impossible.”

-Ramzi Alqrainy, CTO at The Chefz.


About the authors

Ramzi Alqrainy is Chief Technology Officer at The Chefz. Ramzi is a contributor to Apache Solr and Slack, a technical reviewer, and has published many papers in IEEE focusing on search and data functions.

Mohamed Ezzat is a Senior Solutions Architect at AWS with a focus in machine learning. He works with customers to address their business challenges using cloud technologies. Outside of work, he enjoys playing table tennis.

Read More

Distributed training with Amazon EKS and Torch Distributed Elastic

Distributed deep learning model training is becoming increasingly important as data sizes are growing in many industries. Many applications in computer vision and natural language processing now require training of deep learning models, which are growing exponentially in complexity and are often trained with hundreds of terabytes of data. It then becomes important to use a vast cloud infrastructure to scale the training of such large models.

Developers can use open-source frameworks such as PyTorch to easily design intuitive model architectures. However, scaling the training of these models across multiple nodes can be challenging due to increased orchestration complexity.

Distributed model training mainly consists of two paradigms:

  • Model parallel – In model parallel training, the model itself is so large that it can’t fit in the memory of a single GPU, and multiple GPUs are needed to train the model. OpenAI’s GPT-3 model, with 175 billion trainable parameters (approximately 350 GB in size), is a good example of this.
  • Data parallel – In data parallel training, the model can reside in a single GPU, but because the data is so large, it can take days or weeks to train a model. Distributing the data across multiple GPU nodes can significantly reduce the training time.

In this post, we provide an example architecture to train PyTorch models using the Torch Distributed Elastic framework in a distributed data parallel fashion using Amazon Elastic Kubernetes Service (Amazon EKS).
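To ground the data parallel paradigm, here is a minimal DistributedDataParallel training loop; the model and data are toy stand-ins, and a launcher such as torchrun would start one copy per GPU (for example, torchrun --nnodes=3 --nproc_per_node=8 train.py).

# Minimal data-parallel training sketch with PyTorch DDP.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")             # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(10, 1).cuda(local_rank),
                device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(100):                        # each rank sees its own shard
        x = torch.randn(32, 10, device=local_rank)
        loss = model(x).square().mean()         # grads sync across all ranks
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()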

Prerequisites

To replicate the results reported in this post, the only prerequisite is an AWS account. In this account, we create an EKS cluster and an Amazon FSx for Lustre file system. We also push container images to an Amazon Elastic Container Registry (Amazon ECR) repository in the account. Instructions to set up these components are provided as needed throughout the post.

EKS clusters

Amazon EKS is a managed container service to run and scale Kubernetes applications on AWS. With Amazon EKS, you can efficiently run distributed training jobs using the latest Amazon Elastic Compute Cloud (Amazon EC2) instances without needing to install, operate, and maintain your own control plane or nodes. It is a popular orchestrator for machine learning (ML) and AI workflows. A typical EKS cluster in AWS looks like the following figure.

We have released an open-source project, AWS DevOps for EKS (aws-do-eks), which provides a large collection of easy-to-use and configurable scripts and tools to provision EKS clusters and run distributed training jobs. This project is built following the principles of the Do Framework: Simplicity, Flexibility, and Universality. You can configure your desired cluster by using the eks.conf file and then launch it by running the eks-create.sh script. Detailed instructions are provided in the GitHub repo.

Train PyTorch models using Torch Distributed Elastic

Torch Distributed Elastic (TDE) is a native PyTorch library for training large-scale deep learning models where it’s critical to scale compute resources dynamically based on availability. The TorchElastic Controller for Kubernetes is a native Kubernetes implementation for TDE that automatically manages the lifecycle of the pods and services required for TDE training. It allows for dynamically scaling compute resources during training as needed. It also provides fault-tolerant training by recovering jobs from node failure.

In this post, we discuss the steps to train PyTorch EfficientNet-B7 and ResNet50 models using ImageNet data in a distributed fashion with TDE. We use the PyTorch DistributedDataParallel API and the Kubernetes TorchElastic controller, and run our training jobs on an EKS cluster containing multiple GPU nodes. The following diagram shows the architecture diagram for this model training.

TorchElastic for Kubernetes consists mainly of two components: the TorchElastic Kubernetes Controller (TEC) and the parameter server (etcd). The controller is responsible for monitoring and managing the training jobs, and the parameter server keeps track of the worker nodes for distributed synchronization and peer discovery.

In order for the training pods to access the data, we need a shared data volume that can be mounted by each pod. Some options for shared volumes through Container Storage Interface (CSI) drivers included in AWS DevOps for EKS are Amazon Elastic File System (Amazon EFS) and FSx for Lustre.

Cluster setup

In our cluster configuration, we use one c5.2xlarge instance for system pods. We use three p4d.24xlarge instances as worker pods to train an EfficientNet model. For ResNet50 training, we use p3.8xlarge instances as worker pods. Additionally, we use an FSx shared file system to store our training data and model artifacts.

AWS p4d.24xlarge instances are equipped with Elastic Fabric Adapter (EFA) to provide networking between nodes. We discuss EFA more later in the post. To enable communication through EFA, we need to configure the cluster setup through a .yaml file. An example file is provided in the GitHub repository.

After this .yaml file is properly configured, we can launch the cluster using the script provided in the GitHub repo:

./eks-create.sh

Refer to the GitHub repo for detailed instructions.

There is practically no difference between running jobs on p4d.24xlarge and p3.8xlarge instances. The steps described in this post work for both. The only difference is the availability of EFA on p4d.24xlarge instances. For smaller models like ResNet50, standard networking has minimal impact on training speed compared to EFA networking.

FSx for Lustre file system

FSx is designed for high-performance computing workloads and provides sub-millisecond latency using solid-state drive storage volumes. We chose FSx because it provided better performance as we scaled to a large number of nodes. An important detail to note is that FSx can only exist in a single Availability Zone. Therefore, all nodes accessing the FSx file system should exist in the same Availability Zone as the FSx file system. One way to achieve this is to specify the relevant Availability Zone in the cluster .yaml file for the specific node groups before creating the cluster. Alternatively, we can modify the network part of the auto scaling group for these nodes after the cluster is set up, and limit it to using a single subnet. This can be easily done on the Amazon EC2 console.

Assuming that the EKS cluster is up and running, and the subnet ID for the Availability Zone is known, we can set up an FSx file system by providing the necessary information in the fsx.conf file as described in the readme and running the deploy.sh script in the fsx folder. This sets up the correct policy and security group for accessing the file system. The script also installs the CSI driver for FSx as a daemonset. Finally, we can create the FSx persistent volume claim in Kubernetes by applying a single .yaml file:

kubectl apply -f fsx-pvc-dynamic.yaml

This creates an FSx file system in the Availability Zone specified in the fsx.conf file, and also creates a persistent volume claim fsx-pvc, which can be mounted by any of the pods in the cluster in a read-write-many (RWX) fashion.

In our experiment, we used the complete ImageNet dataset, which contains more than 12 million training images divided into 1,000 classes. The data can be downloaded from the ImageNet website. The original TAR ball has several directories, but for our model training, we’re only interested in ILSVRC/Data/CLS-LOC/, which includes the train and val subdirectories. Before training, we need to rearrange the images in the val subdirectory to match the directory structure required by the PyTorch ImageFolder class. This can be done using a simple Python script after the data is copied to the persistent volume in the next step.
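A minimal sketch of that rearrangement is shown below, assuming a hypothetical val_labels.txt file that maps each validation image to its class; in the repo, the imagenet_data_prep.py script performs the actual reorganization.

# Move each val image into a per-class folder, as ImageFolder expects
# val/<class>/<image>.JPEG. The mapping file is an assumption.
import os
import shutil

val_dir = "ILSVRC/Data/CLS-LOC/val"
with open("val_labels.txt") as f:       # one "<filename> <class>" per line
    for line in f:
        filename, label = line.split()
        os.makedirs(os.path.join(val_dir, label), exist_ok=True)
        shutil.move(os.path.join(val_dir, filename),
                    os.path.join(val_dir, label, filename))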

To copy the data from an Amazon Simple Storage Service (Amazon S3) bucket to the FSx file system, we create a Docker image that includes scripts for this task. An example Dockerfile and a shell script are included in the csi folder within the GitHub repo. We can build the image using the build.sh script and then push it to Amazon ECR using the push.sh script. Before using these scripts, we need to provide the correct URI for the ECR repository in the .env file in the root folder of the GitHub repo. After we push the Docker image to Amazon ECR, we can launch a pod to copy the data by applying the relevant .yaml file:

kubectl apply -f fsx-data-prep-pod.yaml

The pod automatically runs the script data-prep.sh to copy the data from Amazon S3 to the shared volume. Because the ImageNet data has more than 12 million files, the copy process takes a couple of hours. The Python script imagenet_data_prep.py is also run to rearrange the val dataset as expected by PyTorch.

Network acceleration

We can use Elastic Fabric Adapter (EFA) in combination with supported EC2 instance types to accelerate network traffic between the GPU nodes in your cluster. This can be useful when running large distributed training jobs where standard network communication may be a bottleneck. Scripts to deploy and test the EFA device plugin in the EKS cluster that we use here are included in the efa-device-plugin folder in the GitHub repo. To enable a job with EFA in your EKS cluster, in addition to the cluster nodes having the necessary hardware and software, the EFA device plugin needs to be deployed to the cluster, and your job container needs to have compatible CUDA and NCCL versions installed.

To demonstrate running NCCL tests and evaluating the performance of EFA on p4d.24xlarge instances, we first must deploy the Kubeflow MPI operator by running the corresponding deploy.sh script in the mpi-operator folder. Then we run the deploy.sh script and update the test-efa-nccl.yaml manifest so limits and requests for the vpc.amazonaws.com/efa resource are set to 4. The four available EFA adapters in the p4d.24xlarge nodes are bundled together to provide maximum throughput.

Run kubectl apply -f ./test-efa-nccl.yaml to apply the test and then display the logs of the test pod. The following line in the log output confirms that EFA is being used:

NCCL INFO NET/OFI Selected Provider is efa

The test results should look similar to the following output:

[1,0]<stdout>:#                                                       out-of-place                       in-place
[1,0]<stdout>:#       size         count      type   redop     time   algbw   busbw  error     time   algbw   busbw  error
[1,0]<stdout>:#        (B)    (elements)                       (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
[1,0]<stdout>:           8             2     float     sum    629.7    0.00    0.00  2e-07    631.4    0.00    0.00  1e-07
[1,0]<stdout>:          16             4     float     sum    630.5    0.00    0.00  1e-07    628.1    0.00    0.00  1e-07
[1,0]<stdout>:          32             8     float     sum    627.6    0.00    0.00  1e-07    628.2    0.00    0.00  1e-07
[1,0]<stdout>:          64            16     float     sum    633.6    0.00    0.00  1e-07    628.4    0.00    0.00  6e-08
[1,0]<stdout>:         128            32     float     sum    627.5    0.00    0.00  6e-08    632.0    0.00    0.00  6e-08
[1,0]<stdout>:         256            64     float     sum    634.5    0.00    0.00  6e-08    636.5    0.00    0.00  6e-08
[1,0]<stdout>:         512           128     float     sum    634.8    0.00    0.00  6e-08    635.2    0.00    0.00  6e-08
[1,0]<stdout>:        1024           256     float     sum    646.6    0.00    0.00  2e-07    643.6    0.00    0.00  2e-07
[1,0]<stdout>:        2048           512     float     sum    745.0    0.00    0.01  5e-07    746.0    0.00    0.01  5e-07
[1,0]<stdout>:        4096          1024     float     sum    958.2    0.00    0.01  5e-07    955.8    0.00    0.01  5e-07
[1,0]<stdout>:        8192          2048     float     sum    963.0    0.01    0.02  5e-07    954.5    0.01    0.02  5e-07
[1,0]<stdout>:       16384          4096     float     sum    955.0    0.02    0.03  5e-07    955.5    0.02    0.03  5e-07
[1,0]<stdout>:       32768          8192     float     sum    975.5    0.03    0.06  5e-07   1009.0    0.03    0.06  5e-07
[1,0]<stdout>:       65536         16384     float     sum   1353.4    0.05    0.09  5e-07   1343.5    0.05    0.09  5e-07
[1,0]<stdout>:      131072         32768     float     sum   1395.9    0.09    0.18  5e-07   1392.6    0.09    0.18  5e-07
[1,0]<stdout>:      262144         65536     float     sum   1476.7    0.18    0.33  5e-07   1536.3    0.17    0.32  5e-07
[1,0]<stdout>:      524288        131072     float     sum   1560.3    0.34    0.63  5e-07   1568.3    0.33    0.63  5e-07
[1,0]<stdout>:     1048576        262144     float     sum   1599.2    0.66    1.23  5e-07   1595.3    0.66    1.23  5e-07
[1,0]<stdout>:     2097152        524288     float     sum   1671.1    1.25    2.35  5e-07   1672.5    1.25    2.35  5e-07
[1,0]<stdout>:     4194304       1048576     float     sum   1785.1    2.35    4.41  5e-07   1780.3    2.36    4.42  5e-07
[1,0]<stdout>:     8388608       2097152     float     sum   2133.6    3.93    7.37  5e-07   2135.0    3.93    7.37  5e-07
[1,0]<stdout>:    16777216       4194304     float     sum   2650.9    6.33   11.87  5e-07   2649.9    6.33   11.87  5e-07
[1,0]<stdout>:    33554432       8388608     float     sum   3422.0    9.81   18.39  5e-07   3478.7    9.65   18.09  5e-07
[1,0]<stdout>:    67108864      16777216     float     sum   4783.2   14.03   26.31  5e-07   4782.6   14.03   26.31  5e-07
[1,0]<stdout>:   134217728      33554432     float     sum   7216.9   18.60   34.87  5e-07   7240.9   18.54   34.75  5e-07
[1,0]<stdout>:   268435456      67108864     float     sum    12738   21.07   39.51  5e-07    12802   20.97   39.31  5e-07
[1,0]<stdout>:   536870912     134217728     float     sum    24375   22.03   41.30  5e-07    24403   22.00   41.25  5e-07
[1,0]<stdout>:  1073741824     268435456     float     sum    47904   22.41   42.03  5e-07    47893   22.42   42.04  5e-07
[1,4]<stdout>:test-efa-nccl-worker-0:33:33 [4] NCCL INFO comm 0x7fd4a0000f60 rank 4 nranks 16 cudaDev 4 busId 901c0 - Destroy COMPLETE
[1,0]<stdout>:# Out of bounds values : 0 OK
[1,0]<stdout>:# Avg bus bandwidth    : 8.23785

We can observe in the test results that the maximum throughput is about 42 GB/sec and the average bus bandwidth is approximately 8 GB/sec.

We also conducted experiments with a single EFA adapter enabled as well as no EFA adapters. All results are summarized in the following table.

Number of EFA Adapters | Net/OFI Selected Provider | Avg. Bandwidth (GB/s) | Max. Bandwidth (GB/s)
4 | efa | 8.24 | 42.04
1 | efa | 3.02 | 5.89
0 | socket | 0.97 | 2.38

We also found that for relatively small models like ResNet50, the use of accelerated networking reduces the training time per epoch by only 5–8% at a batch size of 64. For larger models and smaller batch sizes, when increased network communication of weights is needed, the use of accelerated networking has a greater impact. We observed a 15–18% decrease in epoch training time for EfficientNet-B7 with batch size 1. The actual impact of EFA on your training will depend on the size of your model.

GPU monitoring

Before running the training job, we can also set up Amazon CloudWatch metrics to visualize the GPU utilization during training. It can be helpful to know whether the resources are being used optimally or potentially identify resource starvation and bottlenecks in the training process.

The relevant scripts to set up CloudWatch are located in the gpu-metrics folder. First, we create a Docker image with amazon-cloudwatch-agent and nvidia-smi. We can use the Dockerfile in the gpu-metrics folder to create this image. Assuming that the ECR registry is already set in the .env file from the previous step, we can build and push the image using build.sh and push.sh. After this, running the deploy.sh script automatically completes the setup. It launches a daemonset with amazon-cloudwatch-agent and pushes various metrics to CloudWatch. The GPU metrics appear under the CWAgent namespace on the CloudWatch console. The rest of the cluster metrics show under the ContainerInsights namespace.

Model training

All the scripts needed for PyTorch training are located in the elasticjob folder in the GitHub repo. Before launching the training job, we need to run the etcd server, which is used by the TEC for worker discovery and parameter exchange. The deploy.sh script in the elasticjob folder does exactly that.

To take advantage of EFA in p4d.24xlarge instances, we need to use a specific Docker image available in the Amazon ECR Public Gallery that supports NCCL communication through EFA. We just need to copy our training code to this Docker image. The Dockerfile under the samples folder creates an image to be used when running training job on p4d instances. As always, we can use the build.sh and push.sh scripts in the folder to build and push the image.

The imagenet-efa.yaml file describes the training job. This .yaml file sets up the resources needed for running the training job and also mounts the persistent volume with the training data set up in the previous section.

A couple of things are worth pointing out here. The number of replicas should be set to the number of nodes available in the cluster. In our case, we set this to 3 because we had three p4d.24xlarge nodes. In the imagenet-efa.yaml file, the nvidia.com/gpu parameter under resources and nproc_per_node under args should be set to the number of GPUs per node, which in the case of p4d.24xlarge is 8. Also, the worker argument for the Python script sets the number of CPUs per process. We chose this to be 4 because, in our experiments, this provides optimal performance when running on p4d.24xlarge instances. These settings are necessary in order to maximize the use of all the hardware resources available in the cluster.

When the job is running, we can observe the GPU usage in CloudWatch for all the GPUs in the cluster. The following is an example from one of our training jobs with three p4d.24xlarge nodes in the cluster. Here we’ve selected one GPU from each node. With the settings mentioned earlier, the GPU usage is close to 100% during the training phase of the epoch for all of the nodes in the cluster.

For training a ResNet50 model using p3.8xlarge instances, we need exactly the same steps as described for the EfficientNet training using p4d.24xlarge. We can also use the same Docker image. As mentioned earlier, p3.8xlarge instances aren’t equipped with EFA. However, for the ResNet50 model, this is not a significant drawback. The imagenet-fsx.yaml script provided in the GitHub repository sets up the training job with appropriate resources for the p3.8xlarge node type. The job uses the same dataset from the FSx file system.

GPU scaling

We ran some experiments to observe how the training time scales for the EfficientNet-B7 model by increasing the number of GPUs. To do this, we changed the number of replicas from 1 to 3 in our training .yaml file for each training run. We only observed the time for a single epoch while using the complete ImageNet dataset. The following figure shows the results for our GPU scaling experiment. The red dotted line represents how the training time should go down from a run using 8 GPUs by increasing the number of GPUs. As we can see, the scaling is quite close to what is expected.

Similarly, we obtained the GPU scaling plot for ResNet50 training on p3.8xlarge instances. For this case, we changed the replicas in our .yaml file from 1 to 4. The results of this experiment are shown in the following figure.

Clean up

It’s important to spin down resources after model training in order to avoid costs associated with running idle instances. With each script that creates resources, the GitHub repo provides a matching script to delete them. To clean up our setup, we must delete the FSx file system before deleting the cluster because it’s associated with a subnet in the cluster’s VPC. To delete the FSx file system, we just need to run the following command (from inside the fsx folder):

kubectl delete -f fsx-pvc-dynamic.yaml
./delete.sh

Note that this will not only delete the persistent volume, it will also delete the FSx file system, and all the data on the file system will be lost. When this step is complete, we can delete the cluster by using the following script in the eks folder:

./eks-delete.sh

This will delete all the existing pods, remove the cluster, and delete the VPC created in the beginning.

Conclusion

In this post, we detailed the steps needed for running PyTorch distributed data parallel model training on EKS clusters. This task may seem daunting, but the AWS DevOps for EKS project created by the ML Frameworks team at AWS provides all the necessary scripts and tools to simplify the process and make distributed model training easily accessible.

For more information on the technologies used in this post, visit Amazon EKS and Torch Distributed Elastic. We encourage you to apply the approach described here to your own distributed training use cases.


About the authors

Imran Younus is a Principal Solutions Architect for the ML Frameworks team at AWS. He focuses on large-scale machine learning and deep learning workloads across AWS services like Amazon EKS and AWS ParallelCluster. He has extensive experience in applications of deep learning in computer vision and industrial IoT. Imran obtained his PhD in high energy particle physics, where he was involved in analyzing experimental data at petabyte scales.

Alex Iankoulski is a full-stack software and infrastructure architect who likes to do deep, hands-on work. He is currently a Principal Solutions Architect for Self-managed Machine Learning at AWS. In his role he focuses on helping customers with containerization and orchestration of ML and AI workloads on container-powered AWS services. He is also the author of the open source Do framework and a Docker captain who loves applying container technologies to accelerate the pace of innovation while solving the world’s biggest challenges. During the past 10 years, Alex has worked on combating climate change, democratizing AI and ML, making travel safer, healthcare better, and energy smarter.

Read More

Long-term Dynamics of Fairness Intervention in Connection Recommender Systems

Figure 1: Modeled recommendation cycle. We demonstrate how enforcing group fairness in every recommendation slate separately does not necessarily promote equity in second-order variables of interest like network size.

Connection recommendation is at the heart of user experience in many online social networks. Given a prompt such as ‘People you may know’, connection recommender systems suggest a list of users, and the recipient of the recommendation decides which of the users to connect with. In some instances, connection recommendations can account for more than 50% of the social network graph [1]. Depending on the platform, being connected to the right people is tied to important advantages such as job opportunities or increased visibility. While this makes it imperative to treat users fairly, it is far from obvious how fairness can be enforced or what it even means to have a ‘fair’ system in this scenario. In fact, when enforcing fairness in dynamic systems like this, we are often interested in second-order variables while interventions target equity in single steps [2]. In connection recommendation, this generally means enforcing some sort of parity condition in each recommendation slate, e.g. equal exposure of female and male users for every query, while simultaneously hoping that this will promote more equitable network sizes in the long run. In our work, we demonstrate how this approach can fail and common statistical notions of recommendation fairness do not ensure equity of network sizes in the long run. In fact, we see that fairness interventions are not even sufficient to mitigate the amplification of existing biases. We reach these conclusions based on an extensive simulation study supplemented by theoretical limit analyses. In the following, we focus on a discussion of empirical results and refer to the full paper for theoretical derivations.

Recommendation procedure

The assumed recommendation cycle is depicted in Figure 1. We briefly summarize the involved steps and give more detail on the most important components in the following sections. On the left in the flowchart, a user queries a connection recommendation, for example, by loading the ‘People You may Know’ page on the platform’s website. The system then computes relevance scores between the user who is seeking recommendations – which we will refer to as the source user – and other previously unconnected users who will be referred to as the destination users. Based on these relevance scores, we derive a ranking of destination users subject to fairness constraints and display a list of possible connections to the source user. After new connections have been made, the cycle repeats.

Fairness constrained probabilistic ranking framework

For each recommendation query, the system obtains a set of relevance scores and derives a probabilistic ranking. This probabilistic ranking takes the form of a matrix $P_s^q$ whose entry in the $d$th row and $r$th column denotes the probability of displaying member $d$ in slot $r$ of the recommendation list. The probabilities are selected to maximize the expected utility for the source user, which is common practice in the recommendation literature [3].

More formally, we denote the source member as $s$ and the destination member as $d$; $q$ denotes a query for an ordered list of $m$ destination members, and $u_{s,d}^q$ the relevance score of $s$ and $d$ under query $q$. We model a form of position bias by discounting the expected attention a recommended user receives based on how far down the list the recommendation appears, i.e. we define the exposure of slot $r$ as $v_r = 1/\log(r+1)$. Given these quantities, the optimization problem for query $q$ can be written as

$$\operatorname{argmax}_{P}\ (u_s^q)^T P_s^q v$$

$$\text{s.t.}\quad \sum_{r=1}^{m} P_s^q(d,r) \leq 1 \ \text{for all } d \in [D_q],$$

$$\sum_{d=1}^{D_q} P_s^q(d,r) = 1 \ \text{for all } r \in [m],$$

$$0 \leq P_s^q(d,r) \leq 1 \ \text{for all } d \in [D_q],\ r \in [m].$$

Here $[D_q]$ indexes the candidate destination users for query $q$ and $[m]$ the slots of the list. The constraints in this problem ensure that each slot displays a recommended user and that each user is recommended at most once in a given list. We explore two types of fairness constraints to add to this problem. First, we consider demographic parity of exposure, a commonly suggested form of fairness in recommendations. It requires that groups receive expected exposure proportional to their population shares, i.e. for groups $G_0$ and $G_1$,

$$\frac{1}{\vert G_0\vert}\sum_{d\in G_0}\sum_{r=1}^m P_s^q(d,r)\, v_r = \frac{1}{\vert G_1\vert}\sum_{d\in G_1}\sum_{r=1}^m P_s^q(d,r)\, v_r.$$

Assume a setting in which there is no position bias (i.e. $v_r$ is constant across slots), the population is split into a 60% majority group and a 40% minority group, and each recommendation list has 10 slots. In this setting, demographic parity of exposure requires that, in expectation, 6 of the recommended members in every list belong to the majority group while 4 belong to the minority group.
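To make the constrained ranking problem concrete, the following minimal sketch solves it as a linear program with SciPy. The problem sizes, randomly drawn relevance scores, and group assignment are illustrative placeholders rather than values from the paper; the demographic parity of exposure condition enters as the single equality row assembled in the loop.

```python
import numpy as np
from scipy.optimize import linprog

D, m = 20, 10                            # destination users, list slots
rng = np.random.default_rng(0)
u = rng.uniform(0.05, 0.15, size=D)      # relevance scores u_{s,d}^q
v = 1.0 / np.log2(np.arange(2, m + 2))   # exposure v_r = 1/log(r+1)
group = (np.arange(D) < 12).astype(int)  # 12 majority (1), 8 minority (0)

# Objective: maximize sum_{d,r} u_d * P[d,r] * v_r (linprog minimizes,
# hence the sign flip). P is flattened row-major to a vector of length D*m.
c = -np.outer(u, v).ravel()

# Equality constraints: each slot is filled exactly once.
A_eq = np.zeros((m, D * m))
for r in range(m):
    A_eq[r, r::m] = 1.0
b_eq = np.ones(m)

# Demographic parity of exposure: per-capita expected exposure must be
# equal across the two groups (one extra equality row).
parity = np.zeros(D * m)
for d in range(D):
    weight = 1.0 / (group == group[d]).sum()
    sign = 1.0 if group[d] == 1 else -1.0
    parity[d * m:(d + 1) * m] = sign * weight * v
A_eq = np.vstack([A_eq, parity])
b_eq = np.append(b_eq, 0.0)

# Inequality constraints: each user appears at most once per list.
A_ub = np.zeros((D, D * m))
for d in range(D):
    A_ub[d, d * m:(d + 1) * m] = 1.0
b_ub = np.ones(D)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
P = res.x.reshape(D, m)                  # probabilistic ranking matrix
```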

Second, we explore a dynamic parity of utility constraint. As opposed to the first constraint, this fairness measure considers not only the exposure of groups but also the total utility that can be expected from that exposure, i.e. for two groups $G_0$ and $G_1$,

$$\frac{1}{\vert G_0\vert}\sum_{d\in G_0} u_{s,d}^q \sum_{r=1}^m P_s^q(d,r)\, v_r = \frac{1}{\vert G_1\vert}\sum_{d\in G_1} u_{s,d}^q \sum_{r=1}^m P_s^q(d,r)\, v_r.$$

At a high level, the constraint requires that the expected utility accumulated by each group, discounted by the exposure limitations posed by position bias, is proportional to the group's population share. To understand what this means, let us consider the example setting from before with no position bias, a 60%/40% group split, and recommendation lists with 10 slots. Assume that all destination members of the majority group have a relevance score of 0.12, while all destination members of the minority group have a fixed relevance score of 0.08. The probabilistic ranking fulfills dynamic parity of utility if the expected group split in the recommendation list is 50%/50%, because $0.5 \cdot 10 \cdot 0.12 = 0.6$ and $0.5 \cdot 10 \cdot 0.08 = 0.4$, which are proportional to the 60%/40% population shares.
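As a quick sanity check on this arithmetic (our own verification snippet, not code from the paper):

```python
# No position bias (v_r = 1), 10 slots, 60%/40% population split,
# scores 0.12 (majority) and 0.08 (minority), 50%/50% slate.
m = 10
share_maj, share_min = 0.6, 0.4
u_maj, u_min = 0.12, 0.08

total_util_maj = 0.5 * m * u_maj   # 0.6 expected utility for the majority
total_util_min = 0.5 * m * u_min   # 0.4 expected utility for the minority

# Share-normalized utilities coincide (0.6/0.6 == 0.4/0.4), so the
# 50%/50% slate fulfills dynamic parity of utility.
assert abs(total_util_maj / share_maj - total_util_min / share_min) < 1e-9
```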

How do we model relevance scores?

Relevance scores usually model the probability of connection if recommended, a measure of downstream engagement, or some mixture of the two. The exact models employed by social media platforms are usually proprietary, so we opt for a synthetic model in the form of a logistic regression on three realistic features [4,5]. Assuming our prediction target is the probability of connection, we first use the source member's network size. This is based on the assumption that users with larger networks are more likely to be proactive in forming connections, as they are generally more active on the platform. Next, we assume that users with more common connections are more likely to connect. The common-connections feature is rooted in the social network literature, where it is commonly referred to as triadic closure [e.g. 6,7]. Lastly, we assume that users with similarities in demographics, interests, education, workplace, etc. are more likely to connect. This follows the observation that individuals like to be connected to similar individuals, commonly referred to as homophily in sociology and other social sciences [e.g. 8].
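A hypothetical version of such a scoring model is sketched below: a logistic regression on the three features just described, with similarity measured as the negative Euclidean distance between covariate vectors (the measure we use in the simulation). The coefficient values, intercept, and function name are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def relevance_score(src_network_size: int,
                    common_connections: int,
                    cov_src: np.ndarray,
                    cov_dst: np.ndarray,
                    coef=(0.02, 0.3, 0.5),
                    intercept=-4.0) -> float:
    """Synthetic probability that the source connects with the destination."""
    # Homophily feature: similarity as negative Euclidean distance.
    similarity = -np.linalg.norm(cov_src - cov_dst)
    z = (intercept
         + coef[0] * src_network_size    # activity of the source member
         + coef[1] * common_connections  # triadic closure
         + coef[2] * similarity)         # homophily
    return 1.0 / (1.0 + np.exp(-z))      # logistic link
```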

The main simulation procedure makes use of this relevance-scoring model for user pairs as follows. We assume a fixed-size graph of evolving connections with two groups of members; 65% of members belong to an initially more connected majority group. First, we obtain a ground-truth model of matching scores by imposing ground-truth parameter values on the logistic regression function presented above. We then use this function to simulate a data set of recommendations and formed connections, and we use the synthetic data to train the logistic regression matching-score model employed by the recommender.

Simulation procedure

We sample covariate vectors from group-dependent distributions for each member in the graph. These vectors are used to compute the similarity between members, which we define as the negative Euclidean distance. The connections in the graph are initialized with a stochastic block model in which user pairs in the majority group are slightly more likely to connect than user pairs in the minority group, and user pairs who belong to the same group are slightly more likely to connect than user pairs from different groups. This models the initial advantage we would expect majority group members to have in practice and accounts for homophily preferences. Given this initial configuration, we run the previously described recommendation cycle for 2,500 timesteps and keep track of key fairness and performance metrics. Following our reasoning from the scoring model, the frequency with which users seek out recommendations is governed by exponential waiting times whose mean depends on a user's current network size, as sketched below. The whole simulation procedure is repeated with each fairness intervention separately, and without intervention for comparison. Results are averaged over 10 runs.
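The condensed sketch below shows one way such a loop can be organized, with a stochastic block model initialization that favors majority and same-group pairs and exponential waiting times whose mean shrinks as a user's network grows. All numeric parameters, and the elided scoring-and-ranking step, are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 500, 2_500
majority = rng.random(N) < 0.65              # 65% majority group
X = np.where(majority[:, None],              # group-dependent covariates
             rng.normal(0.0, 1.0, (N, 2)),
             rng.normal(0.5, 1.0, (N, 2)))

# Stochastic block model: majority-majority pairs connect most often,
# same-group pairs more often than cross-group pairs.
p_block = {(True, True): 0.010, (False, False): 0.006,
           (True, False): 0.004, (False, True): 0.004}
adj = np.zeros((N, N), dtype=bool)
for i in range(N):
    for j in range(i + 1, N):
        if rng.random() < p_block[(bool(majority[i]), bool(majority[j]))]:
            adj[i, j] = adj[j, i] = True

next_query = rng.exponential(10.0, size=N)   # first query times
for t in range(T):
    for s in np.where(next_query <= t)[0]:
        # ... score unconnected users, solve the (fairness-constrained)
        # ranking LP, sample the displayed slate, and sample which
        # recommended connections are formed ...
        degree = adj[s].sum()
        mean_wait = 20.0 / (1.0 + degree)    # busier users return sooner
        next_query[s] = t + rng.exponential(mean_wait)
```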

Figure 2: Results of the simulation study averaged over 10 runs. Panel (a) shows the absolute difference in average network sizes between groups over the simulation time period. Panel (b) shows the share of network degrees that belong to the majority group. A connection essentially translates into two degrees, one for the source member of the connection and one for the destination member. In panel (c), we see the share of the majority group on the destination members of new connections, and in panel (d), we see the share of the majority group among all new degrees.

Rich-gets-richer in groups

Figure 2 depicts the results under all three settings. With no intervention, displayed in red in the graphs, we observe that the gap in average network sizes between groups increases drastically over time. In fact, our results reveal a group-wise rich-get-richer effect in which network sizes tend toward a power-law distribution with a lower mode in the minority population. Members whose networks end up in the tail of the distribution tend to be the members who had large networks, compared to their peers, to begin with. These observations are in line with previous findings on rich-get-richer phenomena under homophily.

Demographic parity of exposure intervention

We now move to the results for the demographic parity of exposure intervention. First, we can see that the majority group's share of exposure in recommendations is down to 66.3%, close to its 65% population share, which suggests that the intervention is working. However, as panels (a) and (b) show, the gap in average network sizes still increases over time. Why is this the case? First, majority group members seek out recommendations more frequently because of their already larger network sizes and activity levels. This leads to more connections formed with majority group members on the source side of the recommendation, while our intervention can only target the destination side. Second, majority group members have higher ranking scores and are thus more likely to be invited for connection even if both groups are exposed equally. Consider, for example, a recommendation list in which female and male users occupy the same number of slots, so that both groups receive the same exposure (we ignore position bias here).

In such a list, the probability of connection and its proxy, the matching score, are much lower for the women than for the men, essentially leading to fewer new connections for women than for men. While the intervention suggests recommendations are made fairly, this does not align with our intuitive goal of more equitable network sizes.

Dynamic parity of utility intervention

Some of the problems with the demographic parity of exposure intervention are addressed by the dynamic parity of utility intervention. We see that the majority group's share among destination members of new connections is down to 65.4%, as desired. Yet even with this type of constraint, panels (a) and (b) show that the gap in average network sizes is still increasing over time. As in the previous case, one of the reasons is that majority group members seek out recommendations more frequently. In addition, our analysis reveals that source members from the majority group generally form more connections per recommendation query, leading to the increased share in panel (d). To understand why, compare two queries: in the first, a majority group member (here, male) receives recommendations; in the second, a minority group member (here, female) does.

Both recommendation lists can fulfill the dynamic parity of utility constraint, and yet the male source member receives more overall utility from his query, because the constraint can only target fairness within a single list and not between different sets of recommendations.

Summary of findings

Let us summarize the key findings. Our study shows that unconstrained connection recommendation leads to a group-wise rich-get-richer effect. Enforcing demographic parity of exposure or dynamic parity of utility between groups, two commonly suggested remedies against demographic bias in recommender systems, reduces bias amplification but is not sufficient to prevent disparities in network sizes from growing over time. As shown in the full paper, a theoretical limit analysis indicates that dynamic parity of utility would be the optimal intervention if there were no source-side bias; in practice, however, that assumption is unrealistic.

Overall, the common practice of measuring fairness in recommender systems in a one-shot or time-aggregated static manner can create an illusion of fairness and lead to the deployment of fairness-enhancing algorithms with unforeseen consequences. Connection recommendation operates on a dynamical system, and that system needs to be taken into account to ensure equitable outcomes in the long run.

The full paper is published in the proceedings of the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES 2022). A preprint is available here.

References

[1] LinkedIn PYMK: https://engineering.linkedin.com/teams/data/artificial-intelligence/people-you-may-know [Online; accessed 7/6/22]

[2] Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, and Moritz Hardt. Delayed impact of fair machine learning. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), 2018.

[3] Deepak K. Agarwal and Bee-Chung Chen. Statistical Methods for Recommender Systems. Cambridge University Press, 2016.

[4] LinkedIn PYMK: https://engineering.linkedin.com/teams/data/artificial-intelligence/people-you-may-know [Online; accessed 7/6/22]

[5] Facebook PYMK: https://www.facebook.com/help/336320879782850 [Online; accessed 7/6/22]

[6] Gueorgi Kossinets and Duncan J. Watts. Empirical analysis of an evolving social network. Science 311(5757), 88–90, 2006.

[7] David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology 58(7), 1019–1031, 2007.

[8] Miller McPherson, Lynn Smith-Lovin, and James M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology 27(1), 415–444, 2001.
