Announcing the winners of the Building Tools to Enhance Transparency in Fairness and Privacy RFP

In August, Meta, formerly known as Facebook, launched the Building Tools to Enhance Transparency in Fairness and Privacy request for proposals (RFP). Today, we’re announcing the winners of this award.
VIEW RFP

Through this RFP, we hope to support academics in building trusted tools to more effectively monitor systems and spot concerns in fairness, privacy, and safety.

“Improving fairness and privacy across the internet is an ambitious goal, and one that requires a consistent investment in new ideas and researchers who can bring them to life,” said Will Bullock, Meta Statistics and Privacy Director. “We’re excited to support these leading scholars, and eagerly anticipate their breakthroughs in the years to come.”

The RFP attracted 50 proposals from 40 universities and institutions around the world. Thank you to everyone who took the time to submit a proposal, and congratulations to the winners.

Research award winners

Principal investigators are listed first unless otherwise noted.

A tool to study the efficacy of fairness algorithms on specific bias types
Hoda Heidari, Haiyi Zhu, Steven Wu (Carnegie Mellon University)

Analyzing the accuracy, transparency and privacy of profiling algorithms
Ruben Cuevas Rumin, Angel Cuevas Rumin, Patricia Callejo Pinardo, Pelayo Vallina Rodriguez (University Carlos III de Madrid)

Comprehensive privacy auditing in machine learning
Reza Shokri, Vincent Bindschaedler (National University of Singapore)

Galaxy: a library for safeguarding deep neural networks against unknowns
Sharon Li (University of Wisconsin–Madison)

High-confidence long-term safety and fairness guarantees
Philip Thomas, Yuriy Brun (University of Massachusetts Amherst)

Towards ML governance with accountability and auditing
Nicolas Papernot (University of Toronto)


Improving experiment precision with machine learning

The challenge of noise in experiments

Experimentation is a central part of data-driven product development, yet in practice the results from experiments may be too imprecise to be of much help in improving decision-making. One possible response is to reduce statistical noise by simply running larger experiments. However, this is not always desirable, or even feasible. This raises the question of how we can make better use of the data we have and get sharper, more precise experimental estimates without having to enroll more people in the test.

In a collaboration between Meta’s Core Data Science and Experimentation Platform teams, we developed a new methodology for making progress on this problem, which both has formal statistical guarantees and is scalable enough to implement in practice. The work, described in detail in our NeurIPS paper, allows for general machine learning (ML) techniques to be used in conjunction with experimental data to substantially increase the precision of experimental estimates, relative to other existing methods.

How it works

Our algorithm, MLRATE (machine learning regression-adjusted average treatment effects), involves two main steps. First, we train a model predicting the experimental outcome of interest, given a set of pre-experiment covariates. Second, we use these predictions as a control variable in a linear regression. The coefficient on the treatment indicator in this regression is our variance-reduced average treatment effect estimator.

In the first step, we use sample splitting, so that the predicted outcome for each observation is generated by a model trained on data that does not include that observation. This allows us to use a broad class of ML methods in the first step and gives us the flexibility to choose whichever model does the best job of predicting outcomes. The ML method in question may even be asymptotically biased and fail to converge to the truth in large samples without affecting the validity of our estimator.

In the second step, we treat the predictions from the first step as a control variable in a linear regression. This form of linear regression adjustment is relatively common in the analysis of experimental data (e.g., Lin [2013], Deng et al. [2013]). The contribution of our paper is to show how this methodology can be generalized to accommodate control variables that are themselves the output of a potentially complex ML algorithm.
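To make the two steps concrete, here is a minimal sketch of the procedure in Python. The column names, the choice of gradient-boosted trees for the prediction step, and the use of scikit-learn and statsmodels are illustrative assumptions rather than the production implementation; see the NeurIPS paper for the exact estimator and its guarantees.

```python
# A minimal sketch of the two-step MLRATE procedure, assuming a pandas
# DataFrame with a binary treatment column, an outcome column, and
# pre-experiment covariate columns. Names and model choices are illustrative.
import numpy as np
import statsmodels.formula.api as smf
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

def mlrate(df, covariates, outcome="y", treatment="treatment", n_splits=2):
    # Step 1: sample splitting -- each observation's prediction comes from a
    # model trained on folds that do not contain that observation.
    df = df.copy()
    df["g"] = np.nan
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True,
                                     random_state=0).split(df):
        model = GradientBoostingRegressor()
        model.fit(df.iloc[train_idx][covariates], df.iloc[train_idx][outcome])
        df.iloc[test_idx, df.columns.get_loc("g")] = model.predict(
            df.iloc[test_idx][covariates])

    # Step 2: regress the outcome on treatment, the demeaned prediction, and
    # their interaction; the coefficient on treatment is the variance-reduced
    # average treatment effect estimate.
    df["g_c"] = df["g"] - df["g"].mean()
    fit = smf.ols(f"{outcome} ~ {treatment} + g_c + {treatment}:g_c",
                  data=df).fit(cov_type="HC1")  # robust standard errors
    return fit.params[treatment], fit.bse[treatment]
```

The robust standard error on the treatment coefficient then feeds into confidence intervals in the usual way.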

Empirical results

To quantify the variance reduction gains one might expect from MLRATE in practice, we implemented it in A/A tests for a set of 48 outcome metrics commonly monitored in Meta experiments. Using either gradient-boosted decision trees or elastic net regression for the ML prediction step, we find that MLRATE has, on average, over 70 percent lower variance than the simple difference-in-means estimator for these metrics, and about 19 percent lower variance than the common univariate procedure, which adjusts only for pre-experiment values of the outcome.

Alternatively, to achieve the same precision as MLRATE, the conventional difference-in-means estimator would require sample sizes over five times as large on average across metrics, and the univariate linear regression procedure would require sample sizes about 1.6 times as large. The figure above displays the metric-level distribution of confidence interval widths relative to the univariate adjustment case. There is substantial heterogeneity in performance across metrics: For some, ML regression adjustment delivers only modest gains relative to univariate adjustment; for others, it drastically shrinks confidence intervals. This is natural given the variety of metrics in the analysis: Some, especially binary or discrete outcomes, may benefit more from more sophisticated predictive modeling, whereas for others, simple linear models may perform well.

Why MLRATE matters in practice

A few features of this methodology make it relatively straightforward to implement in practice. First, the formulas for calculating treatment effect estimators and confidence intervals are no more complex than they are in the case of conventional linear regression adjustment. Second, most common off-the-shelf ML methods can be used for the prediction stage, as long as the covariates used are measured pre-experiment. Finally, MLRATE does not require an investment in ML modeling for each individual experiment to work well. Once predictive models have been trained for an outcome of interest, they can be reused across many experiments, so the cost of ML training does not scale with the number of experiments.

If you’re dealing with the problem of excessive noise in your experiments and you can construct good predictors of the outcome of interest, MLRATE may be a helpful new tool for variance reduction. Depending on the metric, it may even be the difference between experimentation being feasible or not. For more details, check out our NeurIPS paper.


Announcing the winners of the City-Scale 3D Map Making with Mapillary Metropolis request for proposals

In July 2021, Meta launched the Benchmarking City-Scale 3D Map Making with Mapillary Metropolis request for proposals (RFP). Today, we’re announcing the winners of this award.

VIEW RFP

Earlier this year, we introduced a novel, city-scale data set called Mapillary Metropolis, designed to create a new and comprehensive benchmarking paradigm for training and testing computer vision algorithms in the context of semantic 3D map making.

For this RFP, we sought research proposals that leverage Mapillary Metropolis to improve core computer vision algorithms, using one or, preferably, multiple data modalities from our data set for semantic 3D map building. We were particularly interested in the following areas:

  • City-scale 3D modeling from heterogeneous data sources
  • ML for object recognition, tracking, and dense labeling
  • Image-based matching, relocalization, and retrieval

The RFP attracted 29 proposals from 27 universities and institutions around the world. Thank you to everyone who took the time to submit a proposal, and congratulations to the winners.

Research award winners

Principal investigators are listed first unless otherwise noted.

Factorized, object-centric implicit representations for city-scale scenes
Jiajun Wu, Hong-Xing (Koven) Yu (Stanford University)

Multi-modal 6DOF visual relocalization in Mapillary Metropolis
Torsten Sattler, Zuzana Kukelova (Czech Technical University in Prague)

Neural feature fields for photorealistic scene synthesis
Andreas Geiger (University of Tübingen, Germany)


Shops on Facebook and Instagram: Understanding relationships between products to improve buyer and seller experience

This Research in Brief summarizes various projects carried out by co-authors Yaniv Sheena and Oren Sar Shalom, along with their colleagues on the Relevance Foundations team at Meta.

What the research is:

In 2020, we launched Shops on Facebook and Instagram to make it easy for businesses to set up a digital storefront and sell online. Currently, Shops holds a massive inventory of products from different verticals and diverse sellers, and the data provided tend to be unstructured, multilingual, and in some cases missing crucial information.

Understanding these products’ core characteristics and encoding their relationships can help to unlock a variety of e-commerce experiences, whether that’s recommending similar or complementary products on the product page or diversifying shopping feeds to avoid showing the same product multiple times. To unlock these opportunities, we have established a team of researchers and engineers in Tel-Aviv with the goal of creating a product graph that accommodates different product relations. The team has already launched capabilities that are integrated in various products across Meta.

Our research is focused on capturing and embedding different notions of relationships between products. These methods are based on signals from the products’ content (text, image, etc.) as well as past user interactions (e.g., collaborative filtering).

First, we tackle the problem of product deduplication, where we cluster together duplicates or variants of the same product. Finding duplicates or near-duplicate products among billions of items is like finding a needle in a haystack. For instance, if a local store in Israel and a big brand in Australia sell the exact same shirt or variants of the same shirt (e.g., different colors), we cluster these products together. This is challenging at a scale of billions of products with different images (some of low quality), descriptions, and languages.

Next, we introduce Frequently Bought Together (FBT), an approach for product recommendation based on products people tend to jointly buy or interact with.

How it works:

Product clustering

We developed a clustering platform that clusters similar items in real time. For every new item listed in the Shops catalog, our algorithm either assigns it to an existing cluster or creates a new cluster for it.

This process takes the following steps:

  • Product retrieval: We use an image index based on GrokNet visual embeddings as well as text retrieval based on an internal search back end powered by Unicorn. We retrieve up to 100 similar products from an index of representative items, which can be thought of as cluster centroids.
  • Pairwise similarity: We compare the new item with each representative item using a pairwise model that, given two products, predicts a similarity score.
  • Item-to-cluster assignment: We choose the most similar product and apply a static threshold. If the threshold is met, we assign the item to that product’s cluster. Otherwise, we create a new singleton cluster. (A simplified sketch of this assignment flow follows the list.)
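The sketch below illustrates the assignment flow in simplified form. The retrieval and pairwise-scoring callables and the threshold value are hypothetical placeholders for the GrokNet/Unicorn-backed retrieval and the pairwise similarity model described here; only the control flow is meant to mirror the description.

```python
# A minimal sketch of online item-to-cluster assignment. The retrieval and
# scoring functions are hypothetical placeholders; only the flow is modeled.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ClusterIndex:
    retrieve: Callable          # item -> list of (cluster_id, representative)
    pairwise_score: Callable    # (item, representative) -> similarity in [0, 1]
    threshold: float = 0.9      # assumed static threshold
    clusters: Dict[str, List] = field(default_factory=dict)

    def assign(self, item_id, item) -> str:
        # Retrieve up to 100 candidate representatives (cluster centroids).
        candidates = self.retrieve(item)[:100]
        best_cluster, best_score = None, 0.0
        for cluster_id, representative in candidates:
            score = self.pairwise_score(item, representative)
            if score > best_score:
                best_cluster, best_score = cluster_id, score
        if best_cluster is not None and best_score >= self.threshold:
            self.clusters[best_cluster].append(item_id)  # join existing cluster
            return best_cluster
        new_cluster = f"cluster_{item_id}"               # create singleton cluster
        self.clusters[new_cluster] = [item_id]
        return new_cluster
```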

We specify two types of clustering spaces, based on business objectives:

  • Exact duplicates: Grouping instances of the exact same product
  • Product variants: Grouping variants of the same product (such as shirts in different colors or iPhones with differing amounts of storage)

For each clustering type, we train a model tailored to the specific task. The model is based on gradient-boosted decision trees (GBDT) with a binary loss and uses both dense and sparse features. Among the features, we use the GrokNet embedding cosine distance (image distance), the LASER embedding distance (cross-language textual representation), textual features like the Jaccard index, and a tree-based distance between products’ taxonomies. This allows us to capture both visual and textual similarities while also leveraging signals like brand and category. We also experimented with SparseNN, a deep model originally developed at Meta for personalization, designed to combine dense and sparse features and train a network end to end by learning semantic representations for the sparse features. However, it did not outperform the GBDT model, which is much lighter in terms of training time and resources.
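As a rough illustration of the pairwise model's inputs, the sketch below computes a few of the features named above and fits a GBDT on human-rated pairs. The feature set, dictionary keys, and hyperparameters are assumptions for illustration; the production model uses richer dense and sparse features.

```python
# A sketch of pairwise feature computation and GBDT training, assuming
# precomputed image and text embeddings are available as NumPy arrays.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def jaccard_distance(tokens_a: set, tokens_b: set) -> float:
    union = tokens_a | tokens_b
    return 1.0 - len(tokens_a & tokens_b) / len(union) if union else 0.0

def pair_features(p, q) -> list:
    # p and q are dicts with hypothetical keys: image_emb, text_emb, tokens, taxonomy.
    return [
        cosine_distance(p["image_emb"], q["image_emb"]),  # image distance
        cosine_distance(p["text_emb"], q["text_emb"]),    # cross-language text distance
        jaccard_distance(p["tokens"], q["tokens"]),       # title/description overlap
        float(p["taxonomy"] != q["taxonomy"]),            # crude stand-in for tree distance
    ]

def train_pairwise_model(rated_pairs):
    # rated_pairs: iterable of (product_a, product_b, binary_label).
    X = np.array([pair_features(p, q) for p, q, _ in rated_pairs])
    y = np.array([label for _, _, label in rated_pairs])
    return GradientBoostingClassifier(n_estimators=200).fit(X, y)
```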

Our models require training data sets for both clustering tasks: We send pairs of products to human raters to compose sets for training, validation, and evaluation. In addition, to obtain more relevant pairs with hard negatives, we utilize an active learning approach based on our existing retrieval mechanisms, followed by sampling by uncertainty and density (SUD).

To evaluate our approach, we formed a set of ~100K pairs of products from the Clothing & Accessories, Health & Beauty, and Home verticals. Each pair was annotated by humans, who marked whether the two products were different, exact duplicates, or variants. We then measure precision and recall by inferring whether the products would reside in the same cluster, based on the steps above. Final results are broken down by vertical, since verticals tend to have different traits.
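Concretely, the pair-level evaluation can be thought of as follows, where the cluster-assignment function is a hypothetical placeholder for the pipeline described above:

```python
# Pair-level precision/recall: a pair counts as a positive prediction when
# both products land in the same cluster.
def pairwise_precision_recall(annotated_pairs, cluster_of):
    # annotated_pairs: iterable of (product_a, product_b, is_same), where
    # is_same is True for exact duplicates / variants, False for different.
    tp = fp = fn = 0
    for a, b, is_same in annotated_pairs:
        predicted_same = cluster_of(a) == cluster_of(b)
        if predicted_same and is_same:
            tp += 1
        elif predicted_same and not is_same:
            fp += 1
        elif not predicted_same and is_same:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```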

Pairwise similarity model performance: GBDT vs. SparseNN

Clustering system-level performance by vertical

Since grouping together different products may cause an unsatisfactory user experience, we tuned our models to be precision-oriented. The results suggest that we can solve a large portion of the problem, but we still need to focus on improving recall. We also found that Health & Beauty products were more challenging and required better text understanding.

Frequently Bought Together (FBT)

Analysis of past purchases shows that customers often look for multiple items in a short period of time, such that together they have a synergistic utility. A notable example is a pair of jeans, together with a belt and possibly a matching shirt. When a customer is viewing a certain product (dubbed the seed product), our task is to help them find complementary products.

Arguably, the most standard method to find products that go together is to simply count co-purchases. That is, we observe the (normalized) number of customers who purchased the seed item and, shortly afterward, another candidate product. If this amount exceeds some threshold, we say that the candidate product makes a good FBT recommendation for the seed product. However, with the ever-increasing variety of products available on Shops on Facebook and Instagram, there is always an abundance of new products that haven’t yet been purchased in large numbers. Lowering the recommendation threshold results in an overwhelming amount of noise, in particular substitute items tangled with complementary ones.

To remedy this, we apply a two-step solution. First, we work at the category level (rather than the product level) to identify pairs of categories that go together. This aggregation solves the problem of purchase sparsity, and its output was further verified by expert taxonomists. It then allows us to resort to a simple count-based approach, setting a low threshold but considering only pairs of products whose categories go together.
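A minimal sketch of this category-filtered counting is shown below. The category-pair allowlist, the co-purchase window, and the threshold are illustrative assumptions.

```python
# Count co-purchases, but keep only pairs whose categories are known to go
# together, so that a low count threshold is enough.
from collections import Counter
from itertools import combinations

def fbt_candidates(purchases_by_user, category_of, allowed_category_pairs,
                   min_copurchases=5):
    # purchases_by_user: {user_id: [product_id, ...]} within a short time window.
    copurchase_counts = Counter()
    for products in purchases_by_user.values():
        for a, b in combinations(set(products), 2):
            # Keep only pairs whose categories go together (in either order).
            if (category_of(a), category_of(b)) in allowed_category_pairs or \
               (category_of(b), category_of(a)) in allowed_category_pairs:
                copurchase_counts[frozenset((a, b))] += 1
    # The category filter removes most noise, so a low threshold suffices.
    return {pair: n for pair, n in copurchase_counts.items()
            if n >= min_copurchases}
```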

Yet, even with a low threshold, there are many products that aren’t covered by this method. To increase coverage, we apply the following steps:

  • First, we utilize the variants model and copy a product’s recommendations to its variants as well.
  • Second, we employ a model that predicts the extent to which a pair of items is complementary, based on their visual appearance.

As a training set for this model, we need a list of products that go together. To this end, we go over fashion images and extract the products that appear in them, assuming that products appearing in the same image make good FBT recommendations for one another.

To assess the performance of our approach, we conducted an A/B test in which we suggested a set of complementary items to buyers who were viewing a product page. We compared our approach with a baseline (control) consisting of suggestions that were hand-picked by sellers. FBT recommendations led to a 12 percent relative improvement in click-through rate, demonstrating the viability and effectiveness of the approach.

Why it matters:

Our methods for incorporating product similarities have improved various consumer-facing applications in Shops. First, we launched clustering-based post-ranking logic, which diversifies product search results. We also showed that similarities based on intentful user actions lead to better recommendations than suggestions chosen by sellers. Finally, we constantly collaborate with different teams across Shops to leverage our signals and improve relevance. Through intensive A/B testing, we learned that capturing relationships between products is a significant step toward unlocking better user experiences.

What’s next:

We’re currently developing a holistic model that simultaneously considers behavioral data, such as co-views and co-purchases (distinct users viewing or buying the same product) and the preferences of the users who interacted with each item, together with product information like image, textual description, price, and brand. These two modalities, buyer engagement and product information, are learned in a mutually reinforcing manner, where one modality acts as the label for the other. Concretely, given a seed product, the behavioral modality allows us to find two products such that one makes a better recommendation than the other, thereby allowing the side information to be learned using a triplet loss. Likewise, the side-information modality generates triplets that allow us to improve the behavioral features.
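As a rough sketch of the triplet idea, behavioral signals pick which of two candidates is the better recommendation for a seed product, and the content-based ("side information") embeddings are pushed to agree. The embedding inputs and the margin below are illustrative assumptions.

```python
# Triplet loss on content embeddings, with the "better vs. worse" candidate
# ranking supplied by the behavioral modality.
import numpy as np

def triplet_loss(seed_emb, better_emb, worse_emb, margin=0.2):
    d_pos = np.linalg.norm(seed_emb - better_emb)  # distance to the better recommendation
    d_neg = np.linalg.norm(seed_emb - worse_emb)   # distance to the worse recommendation
    # Loss is zero once the better candidate is closer by at least the margin.
    return max(0.0, d_pos - d_neg + margin)
```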


Improving the performance of large-scale applications via basic block reordering

What the research is:

At Meta, we develop compilers to optimize the performance of our large-scale applications running in data center environments. Profile-guided optimization (PGO) is an important step in modern compilers for improving the performance of large-scale applications based on their runtime behavior. The technique leverages program execution profiles, such as the execution frequencies of binary functions or individual instructions, to guide compilers to optimize critical paths of a program more selectively and effectively.

Basic block reordering is among the most impactful PGO techniques. As part of the compilation process, a binary is broken up into smaller blocks. Many of these basic blocks are executed one after the other, but some contain conditional branches (control-flow instructions, like if-then-else, while, and switch statements) where the execution can jump to two or more blocks. Depending on the relative frequency of these jumps, some orderings of basic blocks can lead to fewer CPU instruction cache misses and therefore faster executions.

The source code on the left is transformed into the control-flow graph in the middle. The code blocks that make up this graph are laid out in memory on the right.

Profiling is used to collect information about the typical execution of an application. It allows us to learn how many times each basic block has been executed and how many times each branch has been taken. Given this information, a compiler’s job is to produce the most “CPU-friendly” ordering of basic blocks that leads to the best binary performance.

Traditional compiler approaches for basic block reordering optimize a specific dimension of CPU performance, such as instruction cache line utilization or branch prediction. However, we found that such orderings may yield suboptimal results. To overcome these shortcomings, we proposed a new model for block reordering that combines multiple effects and does a better job of predicting the performance of an application. Our model is based on a new optimization problem that we call the extended traveling salesman problem, or Ext-TSP for short.

How it works:

Given a collection of cities and the distances between every pair of them, the classical traveling salesman problem (TSP) asks for an order in which to visit the cities that minimizes the total distance traveled. There are many variations of this problem, such as MAX-TSP, where we want to maximize the total distance traveled. Ext-TSP generalizes the latter: the objective counts the distance not only between adjacent cities in the order but also between cities that are close enough together in the order, say, no more than a fixed number of positions apart.

In the context of basic block reordering, the blocks play the role of the cities, and the jump counts play the role of the distances between two cities. The ordering corresponds to how the basic blocks are laid out in memory. If two basic blocks are laid out close together, then there is a good chance that a jump from one to another will not incur an instruction cache miss. In a sense, the Ext-TSP objective aims to optimize the utilization of the instruction cache and thus the performance of the application.
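As a rough illustration, the sketch below scores a given block layout under a simplified Ext-TSP-style objective: fall-through jumps receive full credit and short forward jumps receive partial credit. The window size and weights are illustrative assumptions, not the exact coefficients from the paper or from BOLT.

```python
# A simplified Ext-TSP-style score for a block layout. `order` is a list of
# block IDs; `jumps` maps (src, dst) to the profiled jump count.
def ext_tsp_score(order, jumps, window=8, near_weight=0.1):
    position = {block: i for i, block in enumerate(order)}
    score = 0.0
    for (src, dst), count in jumps.items():
        if src not in position or dst not in position:
            continue
        gap = position[dst] - position[src]
        if gap == 1:                # fall-through: dst immediately follows src
            score += count
        elif 1 < gap <= window:     # short forward jump, likely nearby in the cache
            score += near_weight * count
    return score
```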

Our paper “Improved basic block reordering” introduces the new optimization problem. It shows that finding the best ordering is NP-hard, but also that there is an efficient greedy heuristic that produces good solutions for the instances typically arising from real-world binaries. In addition, we describe a mixed integer programming formulation for this optimization problem that is capable of finding optimal solutions on small functions. Our experiments with the exact method demonstrate that the new heuristic finds an optimal ordering of basic blocks in over 98 percent of real-world instances. From a practical point of view, the new basic block reordering has been implemented in BOLT, an open source binary optimization and layout tool developed at Meta. An extensive evaluation on a variety of real-world data center applications indicates that the new method outperforms existing block reordering techniques, improving the performance of applications with large code size.
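To give a flavor of a greedy approach (not the exact heuristic from the paper or from BOLT), the toy sketch below starts with one chain per basic block and repeatedly concatenates the pair of chains whose merge most increases a simplified Ext-TSP-style score; window size and weights are again illustrative assumptions.

```python
# A toy greedy chain-merging heuristic in the spirit of the approach above.
def layout_blocks(blocks, jumps, window=8, near_weight=0.1):
    def score(order):
        position = {b: i for i, b in enumerate(order)}
        total = 0.0
        for (src, dst), count in jumps.items():
            gap = position.get(dst, -1) - position.get(src, 10**9)
            if gap == 1:                 # fall-through jump
                total += count
            elif 1 < gap <= window:      # short forward jump
                total += near_weight * count
        return total

    chains = [[b] for b in blocks]       # start with one chain per block
    while len(chains) > 1:
        best_gain, best_pair = 0.0, None
        for i, a in enumerate(chains):
            for j, b in enumerate(chains):
                if i == j:
                    continue
                # Gain from placing chain b immediately after chain a.
                gain = score(a + b) - score(a) - score(b)
                if gain > best_gain:
                    best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break                        # no merge improves the score
        i, j = best_pair
        merged = chains[i] + chains[j]
        chains = [c for k, c in enumerate(chains) if k not in (i, j)] + [merged]
    # Concatenate the remaining chains into the final layout.
    return [block for chain in chains for block in chain]
```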

This image shows a control-flow graph and two block layouts, one maximizing the number of fall-through jumps and another maximizing the Ext-TSP objective.

Theory behind extended TSP

Given the notoriety of the original TSP, we wanted to understand how much more difficult Ext-TSP is than the classical TSP. In our paper “On the extended TSP problem,” we study Ext-TSP from a mathematical perspective and prove both negative and positive results about the problem.

On the negative side, it turns out that Ext-TSP is much harder than classical TSP. We prove that conditional on the exponential time hypothesis, it is unlikely that there exists an efficient algorithm for solving the problem optimally, even for very simple treelike instances, like those arising from simple programs without loops. This is very surprising, as most optimization problems (including the classical TSP) admit such efficient algorithms on trees.

On the positive side, we design so-called approximation algorithms that are efficient and return a solution that is guaranteed to be at most a given fixed factor worse than the optimal solution. Given the previous impossibility of having efficient optimal algorithms, such approximation algorithms are the best we can hope for.

Why it matters:

Developing new compiler technology to optimize the performance of our servers is an impactful area of research at Meta. Faster applications mean less computing power is needed to serve our users, which directly translates into less electricity usage in our data centers and a smaller environmental footprint for our operations.
