Diverse representation in advertising: Q&A with Creative Shop Researcher Fernanda de Lima Alcantara

International Women’s Day celebrates the social, economic, cultural, and political achievements of women. To highlight the impactful work that women researchers are doing at Facebook, we reached out to Fernanda de Lima Alcantara, Marketing Science Researcher at Facebook’s Creative Shop.

The Creative Shop is an internal team of creative strategists, designers, writers, producers, and data experts who collaborate with advertisers to help them run effective campaigns on Facebook’s apps and services. Within this team, De Lima helps businesses succeed by providing marketing and advertising insights, currently focusing on representation in online ads.

In this Q&A, we ask De Lima about her journey at Facebook, her background, and her current research projects. She shares insights from her recent white paper, “Diverse and inclusive representation in online advertising: An exploration of the current landscape and people’s expectations,” and explains what marketers should take away from this research.

Q: Tell us about your experience in academia before joining Facebook.

Fernanda de Lima Alcantara: I first started my career in Brazil as a telecommunications technician, but soon I found my passion for data analysis and earned an undergraduate degree in computer science. For almost six years, I worked with data mining and decision science in the finance sector. I also obtained many certifications in analytical and statistical tools. To continue growing my skills in quantitative and qualitative analysis, I moved to Europe to pursue a master’s in machine learning at University College London.

What excited me about machine learning is that it can be applied to multiple domains (like neuroscience, bioinformatics, machine vision, and so on) to solve real-world problems using a data-driven approach. I learned to design, develop, and evaluate appropriate algorithms and methods for new applications, as well as some new techniques to analyze data. I felt the machine learning master’s program was strongly aligned with my business experience and my field of interest.

Q: What has your journey with Facebook been like so far?

FDLA: I joined Facebook in 2012 in the São Paulo office. In Brazil, I helped many businesses grow by transforming current marketing practices and developing new strategies, always grounded in our foundational measurement practices. Over the years, I worked on projects using simple aggregation, descriptive analysis, or more advanced analyses using data models and causal inference.

I officially joined the research team five years ago, when I moved to the United States to work from the Facebook Menlo Park office in California. In the first two years, I was dedicated to consumer insights and spent time studying the intersection of advertiser value and consumer behavior within ads products. I worked on a range of projects, some focused on the consumer journey and others focused on understanding how people feel about our products. It was very exciting to work with a breadth of methodologies like behavioral lab and consumer neuroscience, passive measurement in sales touchpoints, surveys, focus groups, and in-depth interviews.

For the last three years, I’ve been working in New York as a Marketing Science Researcher in Creative Shop. Every day, I’m provided with the unique opportunity to explore the creative potential of Facebook platforms and help businesses connect with people in meaningful ways and succeed. In my day-to-day, I use experimental design, online surveys, and Facebook data to build tools for statistical, qualitative, and quantitative analysis. My goal is to learn, share, and inspire business with new possibilities through data, creativity, and storytelling. And I love working at the intersection of art and science.

Q: What are you currently working on?

FDLA: Every half (Facebook’s six-month planning cycle), we’re presented with exciting challenges to advance the industry. Currently, I have two projects that are top of mind: The first one promotes diverse and inclusive representation in online advertising, and the second explores the creative opportunities in emerging platforms.

The first project is very close to my heart because it promotes social justice and business equality. The objective is to identify opportunities to better represent people in online ads, inspire more inclusive and authentic advertising content, and uncover the positive impact of inclusive portrayal — for people and businesses.

The second project investigates the new ways people are connecting online and the new creative potential for people and businesses. In this project, I explore creative ideas to help businesses succeed in AR, VR, and other immersive experiences.

Both projects bring me a sense of community and meaningfulness because they aim to create a positive social impact by improving people’s representation in ads and their experience with Facebook, and they support business growth.

Q: Social impact, diversity, and inclusion continue to play a big role in the advertising industry. What should marketers take away from this research?

FDLA: Our research on diverse and inclusive representation in online advertising showed that stereotypes and bias still exist within advertising, with some groups practically absent or portrayed in stereotypical ways. At the same time, people expect the advertising industry to ensure diverse voices and experiences are represented authentically, and they want to see ads that reflect their lived experiences and communities more accurately.

While there’s no single path to progress, part of this process involves getting more comfortable having conversations around inclusivity and ensuring diversity among the people both building and leading creative development. Another part of the challenge is supporting creative development with mechanisms to spot bias and to track progress with data.

Fundamentally, people expect brands to get involved and promote better representation and portrayal of people in advertising. Brands that do so may see a range of positive effects on business outcomes.

Advertising aims to tell stories, evoke emotions, and compel actions. But to improve the representation and portrayal of people in advertising, we must close the gap between what people want to see in advertising and what the ad creative — that is, characters and storyline — is actually showing them. This is how we can better reflect the full breadth of people we serve and make progress.

More details can be found in the white paper.

Q: Where can people learn more about your research?

FDLA: You can find an article about this research at fb.me/representationinads.

To learn more about how Facebook is celebrating the achievements of women during Women’s History Month, visit Newsroom.

In Genomics Breakthrough, Harvard, NVIDIA Researchers Use AI to Spot Active Areas in Cell DNA

Like a traveler who overpacks a suitcase with a closet’s worth of clothes, most cells in the body carry around a complete copy of a person’s DNA, with billions of base pairs crammed into the nucleus.

But an individual cell pulls out only the subsection of genetic apparel that it needs to function, with each cell type — such as liver, blood or skin cells — activating different genes. The regions of DNA that determine a cell’s unique function are opened up for easy access, while the rest remains wadded up around proteins.

Researchers from NVIDIA and Harvard University’s Department of Stem Cell and Regenerative Biology have developed a deep learning toolkit to help scientists study these accessible regions of DNA, even when sample data is noisy or limited — which is often the case in the early detection of cancer and other genetic diseases.

AtacWorks, featured today in Nature Communications, both denoises sequencing data and identifies areas with accessible DNA, and can run inference on a whole genome in just half an hour with NVIDIA Tensor Core GPUs. It’s available on NGC, NVIDIA’s hub of GPU-optimized software.

AtacWorks works with ATAC-seq, a popular method for finding open areas in the genome in both healthy and diseased cells, enabling critical insights for drug discovery.

ATAC-seq typically requires tens of thousands of cells to get a clean signal — making it very difficult to investigate rare cell types, like the stem cells that produce blood cells and platelets. By applying AtacWorks to ATAC-seq data, the same quality of results can be achieved with just tens of cells, enabling scientists to learn more about the sequences active in rare cell types, and to identify mutations that make people more vulnerable to diseases.

“With AtacWorks, we’re able to conduct single-cell experiments that would typically require 10 times as many cells,” says paper co-author Jason Buenrostro, assistant professor at Harvard and the developer of the ATAC-seq method. “Denoising low-quality sequencing coverage with GPU-accelerated deep learning has the potential to significantly advance our ability to study epigenetic changes associated with rare cell development and diseases.”

Needle in a Noisy Haystack

Buenrostro pioneered ATAC-seq in 2013 as a way to scan the epigenome for accessible sites within chromatin, the complex of DNA and protein that makes up chromosomes. The method, popular among leading genomics research labs and pharmaceutical companies, measures the intensity of a signal at every region across the genome. Peaks in the signal correspond to areas with open DNA.

The fewer the cells available, the noisier the data appears — making it difficult to identify which areas of the DNA are accessible.

AtacWorks, a PyTorch-based convolutional neural network, was trained on labeled pairs of matching ATAC-seq datasets: one high quality and one noisy. Given a downsampled copy of the data, the model learned to predict an accurate high-quality version and identify peaks in the signal.
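
To make that training setup concrete, here is a minimal PyTorch sketch of a 1D convolutional denoiser with two heads, one regressing the clean coverage signal and one classifying peaks. The layer sizes and names are illustrative assumptions for this post, not the actual AtacWorks architecture (the real model is available on NGC).

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Toy 1D CNN with a regression head (clean signal) and a
    classification head (peak calls), in the spirit of AtacWorks."""
    def __init__(self, channels=32, kernel_size=51):
        super().__init__()
        pad = kernel_size // 2
        self.trunk = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size, padding=pad), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad), nn.ReLU(),
        )
        self.signal_head = nn.Conv1d(channels, 1, 1)  # denoised coverage
        self.peak_head = nn.Conv1d(channels, 1, 1)    # per-position peak logits

    def forward(self, x):  # x: (batch, 1, window_length)
        h = self.trunk(x)
        return self.signal_head(h), self.peak_head(h)

model = ToyDenoiser()
noisy = torch.randn(8, 1, 4096)                    # downsampled coverage
clean = torch.randn(8, 1, 4096)                    # high-quality target
peaks = torch.randint(0, 2, (8, 1, 4096)).float()  # peak labels
signal, logits = model(noisy)
loss = nn.MSELoss()(signal, clean) + nn.BCEWithLogitsLoss()(logits, peaks)
loss.backward()
```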

The researchers found that using AtacWorks, they could identify accessible chromatin in a noisy sequence of 1 million reads nearly as well as traditional methods did with a clean dataset of 50 million reads. With this capability, scientists could conduct research with a smaller number of cells, significantly reducing the cost of sample collection and sequencing.

Analysis, too, becomes faster and cheaper with AtacWorks: Running on NVIDIA Tensor Core GPUs, the model took under 30 minutes for inference on a whole genome, a process that would take 15 hours on a system with 32 CPU cores.

“With very rare cell types, it’s not possible to study differences in their DNA using existing methods,” said NVIDIA researcher Avantika Lal, lead author on the paper. “AtacWorks can help not only drive down the cost of gathering chromatin accessibility data, but also open up new possibilities in drug discovery and diagnostics.”

Enabling Insights into Disease, Drug Discovery

Looking at accessible regions of DNA could help medical researchers identify specific mutations or biomarkers that make people more vulnerable to conditions including Alzheimer’s, heart disease or cancers. This knowledge could also inform drug discovery by giving researchers a better understanding of the mechanisms of disease.

In the Nature Communications paper, the Harvard researchers applied AtacWorks to a dataset of stem cells that produce red and white blood cells — rare subtypes that couldn’t be studied with traditional methods.

With a sample set of just 50 cells, the team was able to use AtacWorks to identify distinct regions of DNA associated with cells that develop into white blood cells, and separate sequences that correlate with red blood cells.

Learn more about NVIDIA’s work in healthcare at the GPU Technology Conference, April 12-16. Registration is free. The healthcare track includes 16 live webinars, 18 special events, and over 100 recorded sessions, including a talk by Lal titled Deep Learning and Accelerated Computing for Epigenomic Data.

The DOI for this Nature Communications paper is 10.1038/s41467-021-21765-5.

Multimodal deep learning approach for event detection in sports using Amazon SageMaker

Have you ever thought about how artificial intelligence could be used to detect events during live sports broadcasts? With machine learning (ML) techniques, we introduce a scalable multimodal solution for event detection on sports video data. Recent developments in deep learning show that event detection algorithms are performing well on sports data [1]; however, they’re dependent upon the quality and amount of data used in model development. This post explains a deep learning-based approach developed by the Amazon Machine Learning Solutions Lab for sports event detection using Amazon SageMaker. This approach minimizes the impact of low-quality data in terms of labeling and image quality while improving the performance of event detection. Our solution uses a multimodal architecture utilizing video, static images, audio, and optical flow data to develop and fine-tune a model, followed by boosting and a postprocessing algorithm.

The sports video data includes sequences of static 2D frames over time as well as audio, which enabled us to train separate models in parallel. The outlined approach also enhances event detection performance by consolidating the models’ outcomes into a single decision-maker using a boosting technique.

In this post, we first give an overview of the data. We then explain the preprocessing workflow, modeling strategy, postprocessing, and present the results.

Dataset

In this exploratory research study, we used the Sports-1M dataset [2], which includes 487 classes of short sports video clips. The videos include an audio channel, enabling us to extract audio samples for multimodal model development. Among the sports in the dataset, we selected the most frequently occurring ones based on their number of data samples, resulting in 89 sports.

We then consolidated similar sports into common categories, resulting in 25 overall classes. The final list of selected sports for modeling is:

['americanfootball', 'athletics', 'badminton', 'baseball', 'basketball', 'bowling', 'boxing', 'cricket', 'cycling', 'fieldhockey', 'football', 'formula1', 'golf', 'gymnastics', 'handball', 'icehockey', 'lacrosse', 'rugby', 'skiing', 'soccer', 'swimming', 'tabletennis', 'tennis', 'volleyball', 'wrestling']

The following graph shows the number of video samples per sports category. Each video is cut into 1-second intervals.

Data processing pipeline

The temporal modeling in this solution uses 1-second-long video clips. Therefore, we first extracted 1-second clips from each data example. The average video length in the dataset is around 20 seconds, resulting in approximately 190,000 1-second clips. We passed each second-level clip through a frame extraction pipeline and, depending on the frames per second (fps) of the clip, extracted the corresponding number of frames and stored them in an Amazon Simple Storage Service (Amazon S3) bucket. The total number of extracted frames was around 3.8 million. We performed multiprocessing on a SageMaker notebook backed by a 64-core Amazon EC2 instance to parallelize the I/O-heavy clip extraction, which reduced it from hours to minutes.
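
As a rough illustration of the parallel extraction step, the sketch below cuts each video into 1-second clips with ffmpeg and fans the work out over a process pool; the file list, segment command, and pool size are assumptions for this example (the S3 upload step is omitted).

```python
import subprocess
from multiprocessing import Pool

video_paths = ["sports_0001.mp4", "sports_0002.mp4"]  # hypothetical local files

def extract_clips(video_path: str) -> None:
    """Cut one video into 1-second clips with ffmpeg's segment muxer."""
    out_pattern = video_path.rsplit(".", 1)[0] + "_%04d.mp4"
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-f", "segment", "-segment_time", "1",
         "-reset_timestamps", "1", "-c", "copy", out_pattern],
        check=True,
    )

if __name__ == "__main__":
    # One worker per core; clip extraction is I/O bound, so this
    # parallelism is what cut the runtime from hours to minutes.
    with Pool(processes=64) as pool:
        pool.map(extract_clips, video_paths)
```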

To train the ML algorithms, we split the data using stratified sampling on the original clips, which prevented potential information leakage down the pipeline. In a classification setting, stratifying helps ensure that the training, validation, and test sets have approximately the same percentage of samples of each target class as the complete set. We split the data into 80/10/10 portions for training, validation, and test sets, respectively, and then propagated this split to the 1-second video clips and their corresponding extracted frames.
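
A minimal scikit-learn sketch of this splitting strategy, using toy stand-in arrays; the key point is that the stratified split happens at the original-video level and is only then propagated to clips and frames.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins: one entry per original video, 25 sports classes
video_ids = np.arange(1000)
labels = np.random.randint(0, 25, size=1000)

# 80% train, then split the remaining 20% evenly into validation and test
train_ids, rest_ids, train_y, rest_y = train_test_split(
    video_ids, labels, test_size=0.2, stratify=labels, random_state=0)
val_ids, test_ids, val_y, test_y = train_test_split(
    rest_ids, rest_y, test_size=0.5, stratify=rest_y, random_state=0)

# Every 1-second clip (and its frames) inherits the split of its parent
# video, which is what prevents information leakage across partitions.
```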

Next, we fine-tuned the ResNet50 architecture using the extracted frames. Additionally, we trained a ResNet50 architecture using dense optical flow features extracted from the frames of each 1-second clip. Finally, we extracted audio features from the 1-second clips and implemented an audio model. Each approach represents a modality in the final multimodal technique. The following diagram illustrates the architecture of the data processing pipeline.

The rest of this section details each modality.

Computer vision

We used two separate computer vision-based approaches to fit the data. First, we used the ResNet50 architecture to fine-tune a multi-class classification algorithm on RGB frames. Second, we applied the same fine-tuning strategy to a ResNet50 trained on optical flow frames. ResNet50 is one of the most widely used image classifiers and has been remarkably successful in business applications.

We used a two-step fine-tuning approach: we first unfroze the last layer, added two new layers to the top of the network, and fine-tuned for 10 epochs; we then saved the weights of this model, unfroze all the layers, and trained the entire network on the sports data for 30 epochs. We used TensorFlow with Horovod for training on AWS Deep Learning AMI (DLAMI) instances. You can also use SageMaker Pipe mode to set up Horovod.
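
The general freeze-then-unfreeze pattern looks roughly like the Keras sketch below; the added layer sizes, optimizers, and learning rates are illustrative assumptions, not the authors’ exact configuration.

```python
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False,
    input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # step 1: keep the pretrained trunk fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(512, activation="relu"),    # newly added layers
    tf.keras.layers.Dense(25, activation="softmax"),  # 25 sports classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)

base.trainable = True  # step 2: unfreeze everything, fine-tune end to end
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=30)
```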

Horovod, an open-source framework for distributed deep learning, is available for use with most popular deep learning toolkits, like TensorFlow, Keras, PyTorch, and Apache MXNet. It uses the all-reduce algorithm for fast distributed training rather than using a parameter server approach, and it includes multiple optimization methods to make distributed training faster.
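
Wiring Horovod into a Keras training script takes only a few lines. This is a generic sketch of the standard pattern (process pinning, learning-rate scaling, optimizer wrapping, and weight broadcast), not the authors’ training script; the placeholder model and learning rate are assumptions.

```python
import horovod.tensorflow.keras as hvd
import tensorflow as tf

hvd.init()  # one training process per GPU

# Pin each process to a single GPU
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

model = tf.keras.Sequential([tf.keras.layers.Dense(25, activation="softmax")])

# Scale the learning rate with the worker count and wrap the optimizer
# so gradients are averaged with ring all-reduce (no parameter server)
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(1e-3 * hvd.size()))
model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")

# Start all workers from identical weights
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
# model.fit(train_ds, epochs=30, callbacks=callbacks)
```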

Since completing this project, SageMaker has introduced a new data parallelism library optimized for AWS, which allows you to use existing Horovod APIs. For more information, see New – Managed Data Parallelism in Amazon SageMaker Simplifies Training on Large Datasets.

Optical flow

For the second modality, we used an optical flow approach. A classifier such as ResNet50 applied to individual images only addresses relationships among objects within the same frame, disregarding time information. A model trained this way assumes that frames are independent and unrelated.

To capture the relationships between consecutive frames, such as for recognizing human actions, we can use optical flow. Optical flow is the apparent motion of objects between consecutive frames of a sequence, caused by the relative movement between the object and the camera. We ran a dense optical flow algorithm on the images extracted from each 1-second video, using OpenCV’s implementation of Gunnar Farnebäck’s algorithm, which is explained in his 2003 article “Two-Frame Motion Estimation Based on Polynomial Expansion” [3].
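
For reference, the standard OpenCV recipe for dense Farnebäck flow is shown below, encoding flow direction as hue and magnitude as brightness so the result can be fed to an image classifier; the parameter values are the common defaults from the OpenCV tutorials, not necessarily what was used here.

```python
import cv2
import numpy as np

def dense_flow_images(frames):
    """frames: list of BGR uint8 arrays from one 1-second clip.
    Returns one flow visualization per consecutive frame pair."""
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    hsv = np.zeros_like(frames[0])
    hsv[..., 1] = 255  # full saturation
    outputs = []
    for frame in frames[1:]:
        nxt = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        hsv[..., 0] = ang * 180 / np.pi / 2              # direction -> hue
        hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
        outputs.append(cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
        prev = nxt
    return outputs
```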

Audio event detection

ML-based audio modeling formed the third stream of our multimodal event detection solution, where audio samples were extracted from 1-second videos, resulting in audio segments in M4A format.

To explore the performance of audio models, two types of features broadly used in digital signal processing were extracted from the audio samples: the Mel spectrogram (MelSpec) and Mel-frequency cepstral coefficients (MFCC). A modified version of MobileNet, a state-of-the-art architecture for audio data classification, was employed for model development [4].

The audio processing pipeline consists of three steps: MelSpec feature extraction, MFCC feature extraction, and MobileNetV2 model development (a feature-extraction sketch follows the list):

  • First, MelSpec is the spectrogram of an audio segment (obtained via the fast Fourier transform) mapped onto the Mel scale. Research has shown that the human auditory system distinguishes frequencies non-linearly, and the Mel scale spaces frequency bands so that equal distances on the scale sound equally far apart to a human listener. For our use case, MelSpec features with 128 points were calculated for model development.
  • Second, MFCC is a feature related to MelSpec in which a linear cosine transformation is applied to the MelSpec feature, as research has shown that such a transformation can improve classification performance for audible sound. MFCC features with 33 points were extracted from the audio data; however, a model based on this feature could not compete with MelSpec, consistent with reports that MFCC often performs better with sequence models.
  • Finally, the audio model MobileNetV2 was adopted for our data and trained for 100 epochs with preloaded ImageNet weights. MobileNetV2 [5] is a convolutional neural network architecture that seeks to perform well on mobile devices. It’s based on an inverted residual structure, where the residual connections occur between the bottleneck layers.
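
The feature extraction described above can be reproduced with librosa along these lines; the file name and sample rate are assumptions, and decoding M4A requires ffmpeg to be installed.

```python
import librosa
import numpy as np

# Load one 1-second audio segment (M4A decoding needs ffmpeg/audioread)
y, sr = librosa.load("segment_0001.m4a", sr=22050, duration=1.0)

# 128-point Mel spectrogram, converted to decibels for the CNN input
melspec = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
melspec_db = librosa.power_to_db(melspec, ref=np.max)

# 33-point MFCCs (the cosine-transformed variant that underperformed here)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=33)
```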

Postprocessing

The postprocessing step employs a boosting algorithm to do the following:

  • Obtain video-level performance from the frame level
  • Incorporate three models of output into the decision-making process
  • Enhance the model performance in prediction using a defined class-model strategy obtained from validation sets and applied to test sets

First, the postprocessing module generated 1-second-level predicted classes and their probabilities for the RGB, optical flow, and audio models. We then used a majority voting algorithm to assign the predicted class at the 1-second level during inference.

Next, the 1-second-level computer vision and audio labels were converted to video-level performance. The results on the validation sets were then compared to create a class-model table that guides multimodal prediction on the test sets.

In the final stage, testing sets were passed through the prediction module, resulting in three labels and probabilities.

In this work, the RGB models resulted in the highest performance for all classes except badminton, where the audio model gave the best performance. The optical flow models didn’t compete with the other two models, although some research has shown that optical flow-based models could generate better results for certain datasets. The final prediction was performed by incorporating all three labels based on the predefined table to output the most probable classes.

The boosting algorithm of the prediction module is described as follows (a minimal sketch follows the list):

  1. Split videos into 1-second segments.
  2. Extract frames and audio signals.
  3. Prepare RGB frames and MelSpec features.
  4. Pass RGB frames through the ResNet50 trained on RGB samples and obtain prediction labels per frame.
  5. Pass MelSpec features through the MobileNetV2 trained on audio samples and obtain prediction labels for each 1-second audio sample.
  6. Calculate 1-second-level RGB labels and probabilities.
  7. Use a predefined table (obtained from validation results).
  8. If the badminton class is found among the two labels associated with a 1-second sample, vote for the audio model (take the label and probability from the audio model). Otherwise, vote for the RGB model (take the label and probability from the RGB model).
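
Put together, the voting logic amounts to something like the following sketch; the helper name and data shapes are hypothetical, and the class-model table is reduced to the single badminton rule described above.

```python
from collections import Counter

AUDIO_WINS = frozenset({"badminton"})  # from the validation class-model table

def predict_segment(rgb_preds, audio_pred):
    """Fuse predictions for one 1-second segment.
    rgb_preds: list of (label, prob), one per frame; audio_pred: (label, prob)."""
    # Majority vote over the frames gives the segment-level RGB label
    rgb_label, _ = Counter(label for label, _ in rgb_preds).most_common(1)[0]
    rgb_prob = max(p for label, p in rgb_preds if label == rgb_label)
    # Defer to the audio model when badminton appears among the two labels
    if AUDIO_WINS & {rgb_label, audio_pred[0]}:
        return audio_pred
    return rgb_label, rgb_prob

print(predict_segment(
    [("tennis", 0.9), ("tennis", 0.7), ("golf", 0.6)], ("tennis", 0.8)))
# -> ('tennis', 0.9)
```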

Results

The following graph shows the averaged frame-level F1 scores of the three models against two validation datasets; the error bars represent the standard deviations.

Similarly, the following graph compares the F1 scores for three models per class measured for two testing datasets before postprocessing (average and standard deviation as error bars).

After we applied the multimodal prediction module to the testing datasets to convert frame-level and 1-second-level predictions into video-level ones, the postprocessed video-level metrics (see the following graph) showed a significant improvement from single-modality frame-level outputs to multimodal video-level outputs.

As previously mentioned, the class-model table was prepared by comparing the three models on the validation sets.

The analysis demonstrated that the multimodal approach improved the performance of multi-class event detection by 5.10%, 55.68%, and 34.2% over the single RGB, optical flow, and audio models, respectively. In addition, the confusion matrices for the postprocessed testing datasets, shown in the following figures, indicate that the multimodal approach can predict most classes in a challenging 25-class event detection task.

The following figure shows the video-level confusion matrix of the first testing dataset after postprocessing.

The following figure shows the video-level confusion matrix of the second testing dataset after postprocessing.

The modeling workflow explained in this post assumes that the data examples in the dataset are all relevant, all labeled correctly, and similarly distributed within each class. However, the authors’ manual inspection of the data sometimes found substantial differences in video footage from one sample to another in the same class. Therefore, one improvement that could have a great impact on model performance is to further prune the dataset to include only relevant training examples and to provide better labeling.

We used the multimodal model prediction against the testing dataset to generate the following demo for 25 sports, where the bars demonstrate the probability of each class per second (we called it 1-second-level prediction).

Conclusion

This post outlined a multimodal event detection approach using a combination of RGB, optical flow, and audio models through robust ResNet50 and MobileNet architectures implemented on SageMaker. The results of this study demonstrated that, by using a parallel model development, multimodal event detection improved the performance of a challenging 25-class event detection task in sports.

A dynamic postprocessing module enables you to update predictions after new training to enhance the model’s performance against new data.

About Amazon ML Solutions Lab

The Amazon ML Solutions Lab pairs your team with ML experts to help you identify and implement your organization’s highest value ML opportunities. If you’d like help accelerating your use of ML in your products and processes, please contact the Amazon ML Solutions Lab.

Disclaimer

Editor’s note: The dataset used in this post is for non-commercial demonstration and exploratory research.

References

[1] Vats, Kanav, Mehrnaz Fani, Pascale Walters, David A. Clausi, and John Zelek. “Event Detection in Coarsely Annotated Sports Videos via Parallel Multi-Receptive Field 1D Convolutions.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 882-883. 2020.

[2] Karpathy, Andrej, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. “Large-scale video classification with convolutional neural networks.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725-1732. 2014.

[3] Farnebäck, Gunnar. “Two-frame motion estimation based on polynomial expansion.” In Scandinavian Conference on Image Analysis, pp. 363-370. Springer, Berlin, Heidelberg, 2003.

[4] Adapa, Sainath. “Urban sound tagging using convolutional neural networks.” arXiv preprint arXiv:1909.12699 (2019).

[5] Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. “Mobilenetv2: Inverted residuals and linear bottlenecks.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520. 2018.


About the Authors

Saman Sarraf is a Data Scientist at the Amazon ML Solutions Lab. His background is in applied machine learning including deep learning, computer vision, and time series data prediction.

Mehdi Noori is a Data Scientist at the Amazon ML Solutions Lab, where he works with customers across various verticals, and helps them to accelerate their cloud migration journey and solve their ML problems using state-of-the-art solutions and technologies.

PAIRED: A New Multi-agent Approach for Adversarial Environment Generation

Posted by Natasha Jaques, Google Research and Michael Dennis, UC Berkeley

The effectiveness of any machine learning method is critically dependent on its training data. In the case of reinforcement learning (RL), one can rely either on limited data collected by an agent interacting with the real world, or on a simulated training environment that can be used to collect as much data as needed. This latter method of training in simulation is increasingly popular, but it has a problem — the RL agent can learn what is built into the simulator, but tends to be bad at generalizing to tasks that are even slightly different from the ones simulated. And obviously, building a simulator that covers all the complexity of the real world is extremely challenging.

An approach to address this is to automatically create more diverse training environments by randomizing all the parameters of the simulator, a process called domain randomization (DR). However, DR can fail even in very simple environments. For example, in the animation below, the blue agent is trying to navigate to the green goal. The left panel shows an environment created with DR where the positions of the obstacles and goal have been randomized. Many of these DR environments were used to train the agent, which was then transferred to the simple Four Rooms environment in the middle panel. Notice that the agent can’t find the goal. This is because it has not learned to walk around walls. Even though the wall configuration from the Four Rooms example could have been generated randomly in the DR training phase, it’s unlikely. As a result, the agent has not spent enough time training on walls similar to the Four Rooms structure, and is unable to reach the goal.

Domain randomization (left) does not effectively prepare an agent to transfer to previously unseen environments, such as the Four Rooms scenario (middle). To address this, a minimax adversary is used to construct previously unseen environments (right), but can result in creating situations that are impossible to solve.

Instead of just randomizing the environment parameters, one could train a second RL agent to learn how to set the environment parameters. This minimax adversary can be trained to minimize the performance of the first RL agent by finding and exploiting weaknesses in its policy, e.g. building wall configurations it has not encountered before. But again there is a problem. The right panel shows an environment built by a minimax adversary in which it is actually impossible for the agent to reach the goal. While the minimax adversary has succeeded in its task — it has minimized the performance of the original agent — it provides no opportunity for the agent to learn. Using a purely adversarial objective is not well suited to generating training environments, either.

In collaboration with UC Berkeley, we propose a new multi-agent approach for training the adversary in “Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design”, a publication recently presented at NeurIPS 2020. In this work we present an algorithm, Protagonist Antagonist Induced Regret Environment Design (PAIRED), that is based on minimax regret and prevents the adversary from creating impossible environments, while still enabling it to correct weaknesses in the agent’s policy. PAIRED incentivizes the adversary to tune the difficulty of the generated environments to be just outside the agent’s current abilities, leading to an automatic curriculum of increasingly challenging training tasks. We show that agents trained with PAIRED learn more complex behavior and generalize better to unknown test tasks. We have released open-source code for PAIRED on our GitHub repo.

PAIRED
To flexibly constrain the adversary, PAIRED introduces a third RL agent, which we call the antagonist agent, because it is allied with the adversarial agent, i.e., the one designing the environment. We rename our initial agent, the one navigating the environment, the protagonist. Once the adversary generates an environment, both the protagonist and antagonist play through that environment.

The adversary’s job is to maximize the antagonist’s reward while minimizing the protagonist’s reward. This means it must create environments that are feasible (because the antagonist can solve them and get a high score), but challenging to the protagonist (exploit weaknesses in its current policy). The gap between the two rewards is the regret — the adversary tries to maximize the regret, while the protagonist competes to minimize it.

The methods discussed above (domain randomization, minimax regret and PAIRED) can be analyzed using the same theoretical framework, unsupervised environment design (UED), which we describe in detail in the paper. UED draws a connection between environment design and decision theory, enabling us to show that domain randomization is equivalent to the Principle of Insufficient Reason, the minimax adversary follows the Maximin Principle, and PAIRED is optimizing minimax regret. This formalism enables us to use tools from decision theory to understand the benefits and drawbacks of each method. Below, we show how each of these ideas works for environment design:

Domain randomization (a) generates unstructured environments that aren’t tailored to the agent’s learning progress. The minimax adversary (b) may create impossible environments. PAIRED (c) can generate challenging, structured environments, which are still possible for the agent to complete.

Curriculum Generation
What’s interesting about minimax regret is that it incentivizes the adversary to generate a curriculum of initially easy, then increasingly challenging environments. In most RL environments, the reward function will give a higher score for completing the task more efficiently, or in fewer timesteps. When this is true, we can show that regret incentivizes the adversary to create the easiest possible environment the protagonist can’t solve yet. To see this, let’s assume the antagonist is perfect, and always gets the highest score that it possibly can. Meanwhile, the protagonist is terrible, and gets a score of zero on everything. In that case, the regret just depends on the difficulty of the environment. Since easier environments can be completed in fewer timesteps, they allow the antagonist to get a higher score. Therefore, the regret of failing at an easy environment is greater than the regret of failing on a hard environment:

So, by maximizing regret, the adversary searches for easy environments that the protagonist fails to solve. Once the protagonist learns to solve each environment, the adversary must move on to finding a slightly harder environment that the protagonist can’t solve. Thus, the adversary generates a curriculum of increasingly difficult tasks.

Results
We can see the curriculum emerging in the learning curves below, which plot the shortest path length of a maze the agents have successfully solved. Unlike minimax or domain randomization, the PAIRED adversary creates a curriculum of increasingly longer, but possible, mazes, enabling PAIRED agents to learn more complex behavior.

But can these different training schemes help an agent generalize better to unknown test tasks? Below, we see the zero-shot transfer performance of each algorithm on a series of challenging test tasks. As the complexity of the transfer environment increases, the performance gap between PAIRED and the baselines widens. For extremely difficult tasks like Labyrinth and Maze, PAIRED is the only method that can occasionally solve the task. These results provide promising evidence that PAIRED can be used to improve generalization for deep RL.

Admittedly, these simple gridworlds do not reflect the complexities of the real world tasks that many RL methods are attempting to solve. We address this in “Adversarial Environment Generation for Learning to Navigate the Web”, which examines the performance of PAIRED when applied to more complex problems, such as teaching RL agents to navigate web pages. We propose an improved version of PAIRED, and show how it can be used to train an adversary to generate a curriculum of increasingly challenging websites:

Above, you can see websites built by the adversary in the early, middle, and late training stages, which progress from using very few elements per page to many simultaneous elements, making the tasks progressively harder. We test whether agents trained on this curriculum can generalize to standardized web navigation tasks, and achieve a 75% success rate, with a 4x improvement over the strongest curriculum learning baseline:

Conclusions
Deep RL is very good at fitting a simulated training environment, but how can we build simulations that cover the complexity of the real world? One solution is to automate this process. We propose Unsupervised Environment Design (UED) as a framework that describes different methods for automatically creating a distribution of training environments, and show that UED subsumes prior work like domain randomization and minimax adversarial training. We think PAIRED is a good approach for UED, because regret maximization leads to a curriculum of increasingly challenging tasks, and prepares agents to transfer successfully to unknown test tasks.

Acknowledgements
We would like to recognize the co-authors of “Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design”: Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, and Sergey Levine, as well as the co-authors of Adversarial Environment Generation for Learning to Navigate the Web: Izzeddin Gur, Natasha Jaques, Yingjie Miao, Jongwook Choi, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust. In addition, we thank Michael Chang, Marvin Zhang, Dale Schuurmans, Aleksandra Faust, Chase Kew, Jie Tan, Dennis Lee, Kelvin Xu, Abhishek Gupta, Adam Gleave, Rohin Shah, Daniel Filan, Lawrence Chan, Sam Toyer, Tyler Westenbroek, Igor Mordatch, Shane Gu, DJ Strouse, and Max Kleiman-Weiner for discussions that contributed to this work.

Inside Magma: A look at the team behind the software

Facebook Connectivity’s mission is to bring more people online to a faster internet. Together with partners around the world, we’re developing programs and technologies that increase the availability, affordability, and awareness of high-quality internet access. In this blog, we take a closer look at Magma, an open source software platform that enables operators and internet service providers to deploy mobile networks in hard-to-reach areas.

To help tell the story of Magma, we reached out to two members of the Facebook Magma team, Brian Barritt (Software Engineering Manager) and Ulas Kozat (Software Engineer). As experts in this space, they provide more details about Magma, its use cases, as well as its growing community of academics, developers, and industry partners.

About Magma

Every mobile network needs a high-performance packet core at the center of its network. But the market has made it difficult for communications providers to buy, deploy, and maintain the latest technologies at a reasonable cost. According to Kozat, Magma is an open-source, enhanced packet core solution that delivers flexibility, openness, and lower costs to communications service providers. This ultimately means people can experience better connectivity, whether through 4G, 5G, Wi-Fi, or other wireless access technologies.

Kozat names the following potential use cases for Magma:

  • Providing connectivity solutions for smaller populations (such as remote locations, enterprises, and factories) that need more localized, self-managed networking
  • Providing regional or national operators with a solution to fill gaps in coverage or capacity in both rural and urban areas
  • Providing low-latency, high-bandwidth access to edge cloud (like AR/VR applications), to proliferate the next generation of applications and services

Facebook Connectivity’s work by nature is highly collaborative and spans several fields of expertise, and Magma is no exception. “Facebook Connectivity open-sourced Magma in 2019, and we continue to be major contributors to the code base,” says Kozat. “Our partner engineers, marketing team, and management team build partnerships with vendors, system integrators, academics, and service providers to accelerate market adoption and bring millions of real users online powered by Magma.”

Partnerships, collaborations, and community

The Magma team actively solicits researchers to join its advanced research arm through the Magma Academic Partnership Program, which was launched in 2020. “The program aims to foster strong participation from academic researchers to advance edge connectivity over open wireless research testbeds and platforms. The program also supports research projects that more directly explore advanced use cases using the Magma platform,” says Kozat.

In line with this vision, Magma and others within Facebook Connectivity have been part of the organizing committee for the academic program, and were speakers at the first OpenWireless Workshop, which was organized as part of ACM MobiSys in June 2020.

The Magma team further fosters collaboration and community among industry and academic partners through events like the Magma Developers Conference, which took place this year on February 3, 2021. The annual event brings together developers, communications service providers, field experts, academia, and technology leaders to discuss opportunities, challenges, and new ways to improve and expand global connectivity. “The major theme of the event this year echoes Facebook Connectivity’s mission and underscores Magma’s role in connecting people to a faster internet,” says Barritt.

This year’s conference featured three talks led by academic collaborators: Sylvia Ratnasamy (University of California, Berkeley), Kurtis Heimerl (University of Washington), and Rahman Doost-Mohammady (Rice University). For those interested in learning more about the event, all the sessions are on the Open Infra Foundation’s YouTube channel.

Magma is expanding its community of developers through an open-source industry collaboration with the Linux Foundation. The Linux Foundation will provide a neutral governance framework for Magma, and is joined by other open-source communities including the Open Infrastructure Foundation and the OpenAirInterface Software Alliance. Many other partner companies of varying sizes have also joined the project.

“The sustainability of open-source projects depends on a healthy ecosystem. For Magma, many partners are actively contributing to the codebase and actively deploying it. Their business success is intertwined with the success of Magma,” Kozat says. More information about the collaboration is available on the Linux Foundation blog.

Next steps

For 2021, the Magma team will continue to emphasize the importance of collaboration. “Our efforts to include the research community in the Magma ecosystem will continue in 2021 with full thrust,” says Kozat. “New funding opportunities and support mechanisms for universities will be offered to push the envelope further than the near-term industry needs.”

To get involved with the Magma developer community, check out Magma’s GitHub page. Here you will find information about Slack channels and mailing lists. For marketing and industry-related news and announcements, visit the Magma website and subscribe to receive updates. For general updates from the Facebook research community, follow our Facebook page.

Carnegie Mellon University at the Conference on Fairness, Accountability, and Transparency (ACM FAccT 2021)

This week, researchers from across computer science, the social sciences, and the humanities are gathering for the flagship conference of the emerging field of Fairness, Accountability and Transparency in algorithmic systems: FAccT. FAccT (previously FAT*) is dedicated to studying the inherent risks that come with the increasing adoption of data-driven algorithmic decision-making systems in socially consequential domains such as policing, criminal justice, health care, and education. The conference was formed in 2018 as a venue for the increasing volume of work in this area and has since become one of the top venues for the study of societal impacts of machine learning – submissions have more than quadrupled since the inaugural conference!

Number of submitted and accepted papers at FAccT since inaugural conference.

Now in its 4th year, the fully-virtual event spans 82 paper presentations from 15 different countries across 14 time zones as well as 13 tutorials, a doctoral consortium and 10 CRAFT sessions aimed at Critiquing and Rethinking Accountability, Fairness and Transparency. Complementing paper presentations and tutorials, the CRAFT sessions aim for interaction between participants with different backgrounds including academics, journalists, advocates, activists, educators and artists with the idea of reflection and discussion of the field from a more holistic perspective.

Many influential papers have been published at FAccT even within these first few years of the conference. Examples include Joy Buolamwini and Timnit Gebru’s 2018 Gender Shades study, in which the authors uncovered significantly higher error rates in commercial gender classification for darker-skinned females, which led companies to adjust their algorithms and sparked a wider discussion of similar problems in computer vision. Leading up to this year’s conference, the paper ‘On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?’ coming out of Google and the University of Washington has gotten much attention in the wider field, as it led to the firing of Timnit Gebru as co-lead of Google’s Ethical AI team, both leaving room for speculation and sparking a discussion on the future of AI ethics research in private companies.

As one of the main contributing institutions, Carnegie Mellon University is proud to present 10 papers and one tutorial at this year’s conference. Contributions are made from all across campus with authors from the Machine Learning Department, the Department of Statistics and Data Science, the Institute for Software Research, the Computer Science Department, Heinz College of Public Policy, and the Philosophy Department. Several of the studies focus on auditing existing systems in the context of predictive policing [4], image representations learned in an unsupervised manner [5], or the use of mobility data for Covid-19 policy [6]. Others propose new algorithmic solutions to analyze the allocation of opportunities for intergenerational mobility [1], post-process predictions in risk assessment [2], examine the equity of cash bail decisions [3], or understand the fairness implications of leave-one-out training data [8]. The authors of [7] focus on disparity amplification avoidance under different world views and fairness notions, while [9] introduce Value Cards, an educational toolkit for teaching the societal impacts of machine learning. Finally, the authors of [10] provide counternarratives on data sharing in Africa using a storytelling approach based on a series of interviews. We give a short description of each of the papers along with the session times at the conference and links to the preprints below.

Papers

[1] Allocating Opportunities in a Dynamic Model of Intergenerational Mobility
Hoda Heidari (Carnegie Mellon University), Jon Kleinberg (Cornell University)
Session: March 8, 22:00 – 23:45 UTC 
Tags: Algorithm Development, Fairness
Summary: The authors develop a model for analyzing the allocation of opportunities for intergenerational mobility such as higher education and find that purely payoff-maximizing objectives can still lead to a form of socioeconomic affirmative action in the optimal allocation.

[2] Fairness in Risk Assessment Instruments: Post-Processing to Achieve Counterfactual Equalized Odds
Alan Mishler (Carnegie Mellon University), Edward Kennedy (Carnegie Mellon University), Alexandra Chouldechova (Carnegie Mellon University)
Session: March 10, 20:00 – 21:30 UTC
Tags: Algorithm Development, Causality, Evaluation, Fairness
Summary: The authors develop a method to post-process existing binary predictors used in risk assessment, e.g. for recidivism prediction, to satisfy approximate counterfactual equalized odds. They discuss the convergence rate to an optimal fair predictor and propose doubly robust estimation of the risk and fairness properties of a fixed post-processed predictor.

[3] A Bayesian Model of Cash Bail Decisions
Joshua Williams (Carnegie Mellon University), Zico Kolter (Carnegie Mellon University)
Session: March 8, 20:00 – 21:45 UTC
Tags: Algorithm Development, Data, Fairness, Law & Policy
Summary: The authors create a hierarchical Bayesian model of cash bail assignments to analyze fairness between racial groups while overcoming the problem of infra-marginality. Results on 50 judges uniformly show that they are more likely to assign cash bail to black defendants than to white defendants given the same likelihood of skipping a court appearance.

[4] The effect of differential victim crime reporting on predictive policing systems
Nil-Jana Akpinar (Carnegie Mellon University), Maria De-Arteaga (University of Texas at Austin), Alexandra Chouldechova (Carnegie Mellon University)
Session: March 8, 20:00 – 21:45 UTC
Tags: Auditing, Data, Evaluation
Summary: The authors audit place-based predictive policing algorithms trained on victim crime reporting data and find that geographical bias arises when victim crime reporting rates vary within a city. This result requires no use of arrest data or data from police initiated contact.

[5] Image Representations Learned With Unsupervised Pre-Training Contain Human-like Biases
Ryan Steed (Carnegie Mellon University), Aylin Caliskan (George Washington University)
Session: March 8, 12:00 – 13:45 UTC
Tags: Computer Vision, Data, Evaluation, Fairness, Humanistic Theory & Critique
Summary: The authors develop a method for quantifying biased associations between representations of social concepts and attributes in images using image representations learned in an unsupervised manner. The results closely match hypotheses about intersectional bias from social psychology and suggest that machine learning models can automatically learn bias from the way people are stereotypically portrayed on the web.

[6] Leveraging Administrative Data for Bias Audits: Assessing Disparate Coverage with Mobility Data for COVID-19 Policy
Amanda Coston (Carnegie Mellon University), Neel Guha (Stanford University), Derek Ouyang (Stanford University), Lisa Lu (Stanford University), Alexandra Chouldechova (Carnegie Mellon University), Daniel E. Ho (Stanford University)
Session: March 9, 14:00 – 15:45 UTC
Tags: Auditing, Data, Evaluation
Summary: The authors audit the use of smartphone-based mobility data for COVID-19 policy by leveraging administrative voter roll data in the absence of demographic information. Their results suggest that older and non-white voters are less likely to be captured by mobility data, which can disproportionately harm these groups if the allocation of public health resources is based on such data sets.

[7] Avoiding Disparity Amplification under Different Worldviews
Samuel Yeom (Carnegie Mellon University), Michael Carl Tschantz (International Computer Science Institute)
Session: March 10, 14:00 – 15:45 UTC
Tags: Data, Evaluation, Metrics
Summary: The authors mathematically compare competing definitions of group-level fairness and their properties under various worldviews which are assumptions about how, if at all, the observed data is biased. They discuss the criterion of disparity amplification and introduce a new world view with a corresponding notion of fairness as a more realistic perspective.

[8] Leave-one-out Unfairness
Emily Black (Carnegie Mellon University), Matt Fredrikson (Carnegie Mellon University)
Session: March 9, 22:00 – 23:45 UTC
Tags: Algorithm Development, Data, Evaluation, Fairness, Metrics 
Summary: The authors introduce leave-one-out unfairness which focuses on the change of prediction for an individual due to inclusion or exclusion of a single other individual from the training data. They discuss the relation of this concept to robustness, memorization and individual fairness in deep models.

[9] Value Cards: An Educational Toolkit for Teaching Social Impacts of Machine Learning through Deliberation
Hong Shen (Carnegie Mellon University), Wesley Deng (UC Berkeley), Aditi Chattopadhyay (Carnegie Mellon University), Steven Wu (Carnegie Mellon University), Xu Wang (University of Michigan), Haiyi Zhu (Carnegie Mellon University)
Session: March 8, 22:00 – 23:45 UTC
Tags: Accountability, Education, Human Factors
Summary: The authors introduce Value Cards, an educational toolkit with topics related to Fairness, Accountability, and Ethics, and present an early use of the approach in a college-level computer science course. Results suggest that the use of the toolkit can improve students’ understanding of both technical definitions and trade-offs of performance metrics and apply them in real-world contexts.

[10] Narratives and Counternarratives on Data Sharing in Africa
Rediet Abebe (UC Berkeley), Kehinde Aruleba (University of Witwatersrand), Abeba Birhane (University College Dublin), Sara Kingsley (Carnegie Mellon University), George Obaido (University of Witwatersrand), Sekou L. Remy (IBM Research Africa), Swathi Sadagopan (Deloitte)
Session: March 9, 12:00 – 13:50 UTC
Tags: Data, Ethics, Humanistic Theory & Critique 
Summary: The authors use storytelling via fictional personas built from a series of interviews with African data experts to complicate dominant narratives and provide counternarratives on data sharing in Africa. They discuss issues arising from power imbalances and Western-centric policies in the context of open data initiatives centered around data extracted from African communities and discuss avenues for addressing these issues.

Tutorials

Sociocultural diversity in machine learning: Lessons from philosophy, psychology, and organizational science
Sina Fazelpour (Carnegie Mellon University) and Maria De-Arteaga (University of Texas at Austin)
Session: March 4, 14:00 – 15:30 UTC
Summary: The current discussion of sociocultural diversity in machine learning research leaves a gap between the conversation about measures and benefits and the philosophical, psychological and organizational research on the underlying concepts. This tutorial addresses the concepts and consequences of sociocultural diversity and situates this understanding and its implications for the discussion of sociocultural diversity in machine learning.

Juicing AI: University of Florida Taps Computer Vision to Combat Citrus Disease

Florida orange juice is getting a taste of AI.

With the Sunshine State’s $9 billion annual citrus crops plagued by a fruit-souring disease, researchers and businesses are tapping AI to help rescue the nation’s largest producer of orange juice.

University of Florida researchers are developing AI applications for agriculture. And the technology — computer vision for smart sprayers — is now being licensed and deployed in pilot tests by CCI, an agricultural equipment company.

The efforts promise to help farmers combat what’s known as “citrus greening,” a disease caused by bacteria that the Asian citrus psyllid insect spreads to farms worldwide.

Citrus greening causes patchy leaves and green fruit and can quickly decimate orchards.

The agricultural equipment supplier has seen farmers lose one-third of Florida’s orchard acreage to the onslaught of citrus greening.

“It’s having a huge impact on the state of Florida, California, Brazil, China, Mexico — the entire world is battling a citrus crisis,” said Yiannis Ampatzidis, assistant professor at UF’s Department of Agricultural and Biological Engineering.

Fertilizing Precision Agriculture

Ampatzidis works with a team of researchers focused on automation in agriculture. They develop AI applications to forecast crop yields and reduce pesticide use. The team’s image recognition models are run on the Jetson AI platform in the field for inference.

“The goal is to use Jetson Xavier to detect the size of the tree and the leaf density to instantly optimize the flow of the nozzles on sprayers for farming,” said Ampatzidis. “It also allows us to count fruit density, predict yield, and study water usage and pH levels.”

The growing popularity of organic produce and the adoption of more sustainable farming practices have drawn a field of startups plowing AI for benefits to businesses and the planet. John Deere-owned Blue River, FarmWise, SeeTree and Smart Ag are just some of the agriculture companies adopting NVIDIA GPUs for training and inference.

Like many, UF and CCI are developing applications for deployment on the NVIDIA Jetson edge AI platform. And UF has wider ambitions for fostering AI development that benefits the state.

Last July, UF and NVIDIA hatched plans to build one of the world’s fastest AI supercomputers in academia, delivering 700 petaflops of processing power. Built with NVIDIA DGX systems and NVIDIA Mellanox networking, HiPerGator AI is now online to power UF’s precision agriculture research. The new supercomputer was made possible by a $25 million donation from alumnus and NVIDIA founder Chris Malachowsky and $25 million in hardware, software, training, and services from NVIDIA.

UF is a member of the NVIDIA Applied Research Accelerator Program, which supports applied research in coordination with businesses relying on NVIDIA platforms for GPU-accelerated application deployments.

Deploying Robotic Sprayers

Citrus greening has required farmers to act quickly, removing diseased trees to prevent its advance. Many orchards now have gaps in their rows of trees. As a result, conventional sprayers that apply agrochemicals uniformly along entire rows will often overspray, wasting resources and creating unnecessary environmental contamination.

UF researchers developed a sensor system of lidar and cameras for sprayers used in orchards. These sensors feed into the NVIDIA Jetson AGX Xavier, which can process split-second inference on whether the sprayer is facing a tree to spray or not, enabling autonomous spraying.

The system can adjust in real time to turn the application of crop protection products or fertilizers on or off, and to adjust the amount sprayed based on the plant’s size, said Kieth Hollingsworth, a CCI sales specialist.

“It cuts down on overspray and on wasted material that ultimately gets washed into the groundwater. We can also predict yield based on the oranges we see on the tree,” said Hollingsworth.
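
To make that behavior concrete, here is a toy sketch of such a per-frame decision rule; the Tree fields, constants, and function names are hypothetical stand-ins, not the UF/CCI implementation.

    from dataclasses import dataclass

    @dataclass
    class Tree:
        height_m: float       # canopy height, e.g., from lidar
        leaf_density: float   # 0..1, e.g., from the vision model

    def flow_rate(tree, base_lpm=1.0):
        # Scale flow with canopy size; constants are illustrative only.
        return base_lpm * tree.height_m * (0.5 + tree.leaf_density)

    def nozzle_commands(detections):
        # Map one frame's detections to per-nozzle flow rates.
        if not detections:
            return []         # gap in the row: shut the nozzles off
        return [flow_rate(t) for t in detections]

    print(nozzle_commands([Tree(2.5, 0.8), Tree(1.2, 0.4)]))  # two trees
    print(nozzle_commands([]))                                # missing tree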

Commercializing AgTech AI

CCI began working with UF eight years ago. In the past couple of years, the company has been working with the university to upgrade its infrared laser-based spraying system to one with AI.

And customers are coming to CCI for novel ways to attack the problem, said Hollingsworth.

Working with NVIDIA’s Applied Research Accelerator Program, CCI has received technical guidance on Jetson Xavier that has sped its development.

Citrus industry veteran Hollingsworth says AI is a useful tool to wield in the field against the crop disease that has taken some of the sweetness out of orange juice over the years.

“People have no idea how complex of a crop oranges are to grow and what it takes to produce and squeeze the juice that goes into a glass of orange juice,” said Hollingsworth.

Academic researchers can apply now for the Applied Research Accelerator Program.

Photo credit: Samuel Branch on Unsplash

The post Juicing AI: University of Florida Taps Computer Vision to Combat Citrus Disease appeared first on The Official NVIDIA Blog.

Read More

What Is a Cluster? What Is a Pod?

Everything we do on the internet — which is just about everything we do these days — depends on the work of clusters, which are also called pods.

When we stream a hot new TV show, order a pair of jeans or Zoom with grandma, we use clusters. You’re reading this story thanks to pods.

So, What Is a Cluster? What Is a Pod?

A cluster or a pod is simply a set of computers linked by high-speed networks into a single unit.

Computer architects must have reached, at least unconsciously, for terms rooted in nature. Pea pods and dolphin superpods, like today’s computer clusters, show the power of many individuals working as a team.

The Roots of Pods and Superpods

The links go deeper. Botanists say pods not only protect and nourish individual peas but can also reallocate resources from damaged seeds to thriving ones. Similarly, a load balancer moves jobs off a failed compute node to a functioning one.
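
In computing terms, that failover step can be sketched in a few lines; the node names and data structures below are purely illustrative.

    def rebalance(jobs_by_node, healthy):
        # Move jobs off failed nodes onto the least-loaded healthy node.
        alive = [n for n in jobs_by_node if n in healthy]
        for node, jobs in jobs_by_node.items():
            if node not in healthy and jobs:
                target = min(alive, key=lambda n: len(jobs_by_node[n]))
                jobs_by_node[target] = jobs_by_node[target] + jobs
                jobs_by_node[node] = []
        return jobs_by_node

    # n1 has failed, so its jobs land on n3, the least-loaded healthy node.
    print(rebalance({"n1": ["a", "b"], "n2": ["c"], "n3": []},
                    healthy={"n2", "n3"}))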

The dynamics aren’t much different for dolphins.

Working off the coast of the Bahamas, veteran marine biologist Denise Herzing often sees the same pods every day, family groups of perhaps 20 dolphins. And once she encountered a vastly larger group.

“Years ago, off the Baja peninsula, I saw a superpod. It was very exciting and a little overwhelming because as a researcher I want to observe a small group closely, not a thousand animals spread over a large area,” said the founder of the Wild Dolphin Project.

For dolphins, superpods are vital. “They protect the travelers by creating a huge sensory system, a thousand sets of ears that listen for predators like one super-sensor,” she said, noting the parallels with the clusters used in cloud computing today.

A data center with multiple clusters or pods can span multiple buildings and run as a single system.

Pods Sprout in Early Data Centers

As companies began computerizing their accounting systems in the early 1960s, they instinctively ganged multiple computers together so they would have backups in case one failed, according to Greg Pfister, a former IBM technologist and an expert on clusters.

“I’m pretty sure NCR, MetLife and a lot of people did that kind of thing,” said Pfister, author of In Search of Clusters, considered by some the bible of the field.

In May 1983, Digital Equipment Corp. packed several of its popular 32-bit VAX minicomputers into what it called a VAXcluster. Each computer ran its own operating system, but they shared other resources, providing IT users with a single system image.

Diagram of an early PC-based cluster.

By the late 1990s, the advent of low-cost PC processors, Ethernet networks and Linux inspired at least eight major research projects that built clusters. NASA designed one with 16 PC motherboards on two 10 Mbit/second networks and dubbed it Beowulf, imagining it slaying the giant mainframes and massively parallel systems of the day.

Cluster Networks Need Speed

Researchers found clusters could be assembled quickly and offered high performance at low cost, as long as they used high-speed networks to eliminate bottlenecks.

Another late-’90s project, Berkeley’s Network of Workstations (NoW), linked dozens of Sparc workstations on the fastest interconnects of the day. The researchers illustrated their work with an image of a pod of small fish eating a larger fish.

Researchers behind Berkeley’s NoW project envisioned clusters of many small systems outperforming a single larger computer.

One researcher, Eric Brewer, saw that clusters were ideal for emerging internet apps, so he used the 100-server NoW system as a search engine.

“For a while we had the best search engine in the world running on the Berkeley campus,” said David Patterson, a veteran of NoW and many computer research projects at Berkeley.

The work was so successful that Brewer co-founded Inktomi, an early search engine built on a NoW-inspired cluster of 1,000 systems. It had many rivals, including a startup called Google, with roots at Stanford.

“They built their network clusters out of PCs and defined a business model that let them grow and really improve search quality — the rest was history,” said Patterson, co-author of a popular textbook on computing.

Today, clusters or pods are the basis of most of the world’s TOP500 supercomputers as well as virtually all cloud computing services. And most use NVIDIA GPUs, but we’re getting ahead of the story.

Pods vs. Clusters: A War of Words

While computer architects called these systems clusters, some networking specialists preferred the term pod. Turning the biological term into a tech acronym, they said POD stood for a “point of delivery” of computing services.

The term pod gained traction in the early days of cloud computing. Service providers raced to build ever larger, warehouse-sized systems often ordering entire shipping containers, aka pods, of pre-configured systems they could plug together like Lego blocks.

An early prototype container delivered to a cloud service provider.

More recently, the Kubernetes group adopted the term pod. They define a software pod as “a single container or a small number of containers that are tightly coupled and that share resources.”
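
As a concrete example, a minimal two-container pod might be declared like this with the Kubernetes Python client; the pod name and images are placeholders, not a recommended configuration.

    from kubernetes import client

    # Two tightly coupled containers that share the pod's network and lifecycle.
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="example-pod"),
        spec=client.V1PodSpec(containers=[
            client.V1Container(name="app", image="nginx:1.25"),
            client.V1Container(name="log-shipper", image="busybox:1.36",
                               command=["sh", "-c", "sleep infinity"]),
        ]),
    )
    # client.CoreV1Api().create_namespaced_pod("default", pod) would submit it.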

Industries like aerospace and consumer electronics adopted the term pod, too, perhaps to give their concepts an organic warmth. Among the most iconic examples are the iPod, the forerunner of the iPhone, and the single-astronaut vehicle from the movie 2001: A Space Odyssey.

When AI Met Clusters

In 2012, cloud computing services heard the Big Bang of AI, the genesis of a powerful new form of computing. They raced to build giant clusters of GPUs that, thanks to their internal clusters of accelerator cores, could process huge datasets to train and run neural networks.

To help spread AI to any enterprise data center, NVIDIA packs GPU clusters on InfiniBand networks into NVIDIA DGX Systems. A reference architecture lets users easily scale from a single DGX system to an NVIDIA DGX POD or even a supercomputer-class NVIDIA DGX SuperPOD.

For example, Cambridge-1, in the United Kingdom, is an AI supercomputer based on a DGX SuperPOD, dedicated to advancing life sciences and healthcare. It’s one of many AI-ready clusters and pods spreading like never before. They’re sprouting like AI itself, in many shapes and sizes in every industry and business.

The post What Is a Cluster? What Is a Pod? appeared first on The Official NVIDIA Blog.

Read More