Announcing the winners of the Building Tools to Enhance Transparency in Fairness and Privacy RFP

In August, Meta, formerly known as Facebook, launched the Building Tools to Enhance Transparency in Fairness and Privacy request for proposals (RFP). Today, we’re announcing the winners of this award.
VIEW RFP

Through this RFP, we hope to support academics in building trusted tools to more effectively monitor systems and spot concerns in fairness, privacy, and safety.

“Improving fairness and privacy across the internet is an ambitious goal, and one that requires a consistent investment in new ideas and researchers who can bring them to life,” said Will Bullock, Meta Statistics and Privacy Director. “We’re excited to support these leading scholars, and eagerly anticipate their breakthroughs in the years to come.”

The RFP attracted 50 proposals from 40 universities and institutions around the world. Thank you to everyone who took the time to submit a proposal, and congratulations to the winners.

Research award winners

Principal investigators are listed first unless otherwise noted.

A tool to study the efficacy of fairness algorithms on specific bias types
Hoda Heidari, Haiyi Zhu, Steven Wu (Carnegie Mellon University)

Analyzing the accuracy, transparency and privacy of profiling algorithms
Ruben Cuevas Rumin, Angel Cuevas Rumin, Patricia Callejo Pinardo, Pelayo Vallina Rodriguez (University Carlos III de Madrid)

Comprehensive privacy auditing in machine learning
Reza Shokri, Vincent Bindschaedler (National University of Singapore)

Galaxy: a library for safeguarding deep neural networks against unknowns
Sharon Li (University of Wisconsin–Madison)

High-confidence long-term safety and fairness guarantees
Philip Thomas, Yuriy Brun (University of Massachusetts Amherst)

Towards ML governance with accountability and auditing
Nicolas Papernot (University of Toronto)


Improving experiment precision with machine learning

The challenge of noise in experiments

Experimentation is a central part of data-driven product development, yet in practice the results from experiments may be too imprecise to be of much help in improving decision-making. One possible response is to reduce statistical noise by simply running larger experiments. However, this is not always desirable, or even feasible. This raises the question of how we can make better use of the data we have and get sharper, more precise experimental estimates without having to enroll more people in the test.

In a collaboration between Meta’s Core Data Science and Experimentation Platform teams, we developed a new methodology for making progress on this problem, which both has formal statistical guarantees and is scalable enough to implement in practice. The work, described in detail in our NeurIPS paper, allows for general machine learning (ML) techniques to be used in conjunction with experimental data to substantially increase the precision of experimental estimates, relative to other existing methods.

How it works

Our algorithm, MLRATE (machine learning regression-adjusted average treatment effects), involves two main steps. First, we train a model predicting the experimental outcome of interest, given a set of pre-experiment covariates. Second, we use these predictions as a control variable in a linear regression. The coefficient on the treatment indicator in this regression is our variance-reduced average treatment effect estimator.

In the first step, we use sample splitting, so that the predicted outcome for each observation is generated by a model trained on data that does not include that observation. This allows us to use a broad class of ML methods in the first step and gives us the flexibility to choose whichever model does the best job of predicting outcomes. The ML method in question may even be asymptotically biased and fail to converge to the truth in large samples without affecting the validity of our estimator.

In the second step, we treat the predictions from the first step as a control variable in a linear regression. This form of linear regression adjustment is relatively common in the analysis of experimental data (e.g., Lin [2013], Deng et al. [2013]). The contribution of our paper is to show how this methodology can be generalized to accommodate control variables that are themselves the output of a potentially complex ML algorithm.
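To make the two steps concrete, here is a minimal sketch of the procedure in Python. The column names, the choice of gradient-boosted trees for the prediction step, and the use of scikit-learn and statsmodels are illustrative assumptions rather than the production implementation; see the NeurIPS paper for the exact estimator and its guarantees.

```python
# A minimal sketch of the two-step MLRATE procedure, assuming a pandas
# DataFrame with a binary treatment column, an outcome column, and
# pre-experiment covariate columns. Names and model choices are illustrative.
import numpy as np
import statsmodels.formula.api as smf
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

def mlrate(df, covariates, outcome="y", treatment="treatment", n_splits=2):
    # Step 1: sample splitting -- each observation's prediction comes from a
    # model trained on folds that do not contain that observation.
    df = df.copy()
    df["g"] = np.nan
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True,
                                     random_state=0).split(df):
        model = GradientBoostingRegressor()
        model.fit(df.iloc[train_idx][covariates], df.iloc[train_idx][outcome])
        df.iloc[test_idx, df.columns.get_loc("g")] = model.predict(
            df.iloc[test_idx][covariates])

    # Step 2: regress the outcome on treatment, the demeaned prediction, and
    # their interaction; the coefficient on treatment is the variance-reduced
    # average treatment effect estimate.
    df["g_c"] = df["g"] - df["g"].mean()
    fit = smf.ols(f"{outcome} ~ {treatment} + g_c + {treatment}:g_c",
                  data=df).fit(cov_type="HC1")  # robust standard errors
    return fit.params[treatment], fit.bse[treatment]
```

The robust standard error on the treatment coefficient then feeds into confidence intervals in the usual way.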

Empirical results

To quantify the variance reduction gains one might expect from MLRATE in practice, we implemented it in A/A tests for a set of 48 outcome metrics commonly monitored in Meta experiments. Using either gradient-boosted decision trees or elastic net regression for the ML prediction step, we find that MLRATE has, on average, over 70 percent lower variance than the simple difference-in-means estimator for these metrics, and about 19 percent lower variance than the common univariate procedure, which adjusts only for pre-experiment values of the outcome.

Alternatively, to achieve the same precision as MLRATE, the conventional difference-in-means estimator would require sample sizes over five times as large on average across metrics, and the univariate linear regression procedure would require sample sizes about 1.6 times as large. The figure above displays the metric-level distribution of confidence interval widths relative to the univariate adjustment case. There is substantial heterogeneity in performance across metrics: For some, ML regression adjustment delivers only modest gains relative to univariate adjustment; for others, it drastically shrinks confidence intervals. This is natural given the variety of metrics in the analysis: Some, especially binary or discrete outcomes, may benefit more from more sophisticated predictive modeling, whereas for others, simple linear models may perform well.

Why MLRATE matters in practice

A few features of this methodology make it relatively straightforward to implement in practice. First, the formulas for calculating treatment effect estimators and confidence intervals are no more complex than they are in the case of conventional linear regression adjustment. Second, most common off-the-shelf ML methods can be used for the prediction stage, as long as the covariates used are measured pre-experiment. Finally, MLRATE does not require an investment in ML modeling for each individual experiment to work well. Once predictive models have been trained for an outcome of interest, they can be reused across many experiments, so the cost of ML training does not scale with the number of experiments.

If you’re dealing with the problem of excessive noise in your experiments and you can construct good predictors of the outcome of interest, MLRATE may be a helpful new tool for variance reduction. Depending on the metric, it may even be the difference between experimentation being feasible or not. For more details, check out our NeurIPS paper.


Announcing the winners of the City-Scale 3D Map Making with Mapillary Metropolis request for proposals

In July 2021, Meta launched the Benchmarking City-Scale 3D Map Making with Mapillary Metropolis request for proposals (RFP). Today, we’re announcing the winners of this award.

VIEW RFP

Earlier this year, we introduced a novel, city-scale data set called Mapillary Metropolis, designed to create a new and comprehensive benchmarking paradigm for training and testing computer vision algorithms in the context of semantic 3D map making.

For this RFP, we sought research proposals that leverage Mapillary Metropolis to improve core computer vision algorithms, using one or, preferably, multiple data modalities from our data set for semantic 3D map building. We were particularly interested in the following areas:

  • City-scale 3D modeling from heterogeneous data sources
  • ML for object recognition, tracking, and dense labeling
  • Image-based matching, relocalization, and retrieval

The RFP attracted 29 proposals from 27 universities and institutions around the world. Thank you to everyone who took the time to submit a proposal, and congratulations to the winners.

Research award winners

Principal investigators are listed first unless otherwise noted.

Factorized, object-centric implicit representations for city-scale scenes
Jiajun Wu, Hong-Xing (Koven) Yu (Stanford University)

Multi-modal 6DOF visual relocalization in Mapillary Metropolis
Torsten Sattler, Zuzana Kukelova (Czech Technical University in Prague)

Neural feature fields for photorealistic scene synthesis
Andreas Geiger (University of Tübingen, Germany)


Shops on Facebook and Instagram: Understanding relationships between products to improve buyer and seller experience

This Research in Brief summarizes various projects carried out by co-authors Yaniv Sheena and Oren Sar Shalom, along with their colleagues on the Relevance Foundations team at Meta.

What the research is:

In 2020, we launched Shops on Facebook and Instagram to make it easy for businesses to set up a digital storefront and sell online. Currently, Shops holds a massive inventory of products from different verticals and diverse sellers, and the data provided tend to be unstructured, multilingual, and in some cases missing crucial information.

Understanding these products’ core characteristics and encoding their relationships can help to unlock a variety of e-commerce experiences, whether that’s recommending similar or complementary products on the product page or diversifying shopping feeds to avoid showing the same product multiple times. To unlock these opportunities, we have established a team of researchers and engineers in Tel-Aviv with the goal of creating a product graph that accommodates different product relations. The team has already launched capabilities that are integrated in various products across Meta.

Our research is focused on capturing and embedding different notions of relationships between products. These methods are based on signals from the products’ content (text, image, etc.) as well as past user interactions (e.g., collaborative filtering).

First, we tackle the problem of product deduplication, where we cluster together duplicates or variants of the same product. Finding duplicates or near-duplicate products among billions of items is like finding a needle in a haystack. For instance, if a local store in Israel and a big brand in Australia sell the exact same shirt or variants of the same shirt (e.g., different colors), we cluster these products together. This is challenging at a scale of billions of products with different images (some of low quality), descriptions, and languages.

Next, we introduce Frequently Bought Together (FBT), an approach for product recommendation based on products people tend to jointly buy or interact with.

How it works:

Product clustering

We developed a clustering platform that clusters similar items in real time. For every new item listed in the Shops catalog, our algorithm either assigns it to an existing cluster or creates a new cluster for it.

This process takes the following steps:

  • Product retrieval: We use an image index based on GrokNet visual embeddings as well as text retrieval based on an internal search back end powered by Unicorn. We retrieve up to 100 similar products from an index of representative items, which can be thought of as cluster centroids.
  • Pairwise similarity: We compare the new item with each representative item using a pairwise model that, given two products, predicts a similarity score.
  • Item-to-cluster assignment: We choose the most similar product and apply a static threshold. If the threshold is met, we assign the item to that product’s cluster. Otherwise, we create a new singleton cluster. (A simplified sketch of this assignment flow follows the list.)
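The sketch below illustrates the assignment flow in simplified form. The retrieval and pairwise-scoring callables and the threshold value are hypothetical placeholders for the GrokNet/Unicorn-backed retrieval and the pairwise similarity model described here; only the control flow is meant to mirror the description.

```python
# A minimal sketch of online item-to-cluster assignment. The retrieval and
# scoring functions are hypothetical placeholders; only the flow is modeled.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ClusterIndex:
    retrieve: Callable          # item -> list of (cluster_id, representative)
    pairwise_score: Callable    # (item, representative) -> similarity in [0, 1]
    threshold: float = 0.9      # assumed static threshold
    clusters: Dict[str, List] = field(default_factory=dict)

    def assign(self, item_id, item) -> str:
        # Retrieve up to 100 candidate representatives (cluster centroids).
        candidates = self.retrieve(item)[:100]
        best_cluster, best_score = None, 0.0
        for cluster_id, representative in candidates:
            score = self.pairwise_score(item, representative)
            if score > best_score:
                best_cluster, best_score = cluster_id, score
        if best_cluster is not None and best_score >= self.threshold:
            self.clusters[best_cluster].append(item_id)  # join existing cluster
            return best_cluster
        new_cluster = f"cluster_{item_id}"               # create singleton cluster
        self.clusters[new_cluster] = [item_id]
        return new_cluster
```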

We specify two types of clustering spaces, based on business objectives:

  • Exact duplicates: Grouping instances of the exact same product
  • Product variants: Grouping variants of the same product (such as shirts in different colors or iPhones with differing amounts of storage)

For each clustering type, we train a model tailored to the specific task. The model is based on gradient-boosted decision trees (GBDT) with a binary loss and uses both dense and sparse features. Among the features, we use the GrokNet embedding cosine distance (image distance), the LASER embedding distance (cross-language textual representation), textual features like the Jaccard index, and a tree-based distance between products’ taxonomies. This allows us to capture both visual and textual similarities while also leveraging signals like brand and category. We also experimented with SparseNN, a deep model originally developed at Meta for personalization, designed to combine dense and sparse features and train a network end to end by learning semantic representations for the sparse features. However, it did not outperform the GBDT model, which is much lighter in terms of training time and resources.
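As a rough illustration of the pairwise model's inputs, the sketch below computes a few of the features named above and fits a GBDT on human-rated pairs. The feature set, dictionary keys, and hyperparameters are assumptions for illustration; the production model uses richer dense and sparse features.

```python
# A sketch of pairwise feature computation and GBDT training, assuming
# precomputed image and text embeddings are available as NumPy arrays.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def jaccard_distance(tokens_a: set, tokens_b: set) -> float:
    union = tokens_a | tokens_b
    return 1.0 - len(tokens_a & tokens_b) / len(union) if union else 0.0

def pair_features(p, q) -> list:
    # p and q are dicts with hypothetical keys: image_emb, text_emb, tokens, taxonomy.
    return [
        cosine_distance(p["image_emb"], q["image_emb"]),  # image distance
        cosine_distance(p["text_emb"], q["text_emb"]),    # cross-language text distance
        jaccard_distance(p["tokens"], q["tokens"]),       # title/description overlap
        float(p["taxonomy"] != q["taxonomy"]),            # crude stand-in for tree distance
    ]

def train_pairwise_model(rated_pairs):
    # rated_pairs: iterable of (product_a, product_b, binary_label).
    X = np.array([pair_features(p, q) for p, q, _ in rated_pairs])
    y = np.array([label for _, _, label in rated_pairs])
    return GradientBoostingClassifier(n_estimators=200).fit(X, y)
```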

Our models require training data sets for both clustering tasks: We send pairs of products to human raters to compose sets for training, validation, and evaluation. In addition, to obtain more relevant pairs with hard negatives, we utilize an active learning approach based on our existing retrieval mechanisms, followed by sampling by uncertainty and density (SUD).

To evaluate our approach, we formed a set of ~100K pairs of products from the Clothing & Accessories, Health & Beauty, and Home verticals. Each pair was annotated by humans, who marked whether the two products were different, exact duplicates, or variants. We then measure precision and recall by inferring whether the products would reside in the same cluster, based on the steps above. Final results are broken down by vertical, since verticals tend to have different traits.
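Concretely, the pair-level evaluation can be thought of as follows, where the cluster-assignment function is a hypothetical placeholder for the pipeline described above:

```python
# Pair-level precision/recall: a pair counts as a positive prediction when
# both products land in the same cluster.
def pairwise_precision_recall(annotated_pairs, cluster_of):
    # annotated_pairs: iterable of (product_a, product_b, is_same), where
    # is_same is True for exact duplicates / variants, False for different.
    tp = fp = fn = 0
    for a, b, is_same in annotated_pairs:
        predicted_same = cluster_of(a) == cluster_of(b)
        if predicted_same and is_same:
            tp += 1
        elif predicted_same and not is_same:
            fp += 1
        elif not predicted_same and is_same:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```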

Pairwise similarity model performance: GBDT vs. SparseNN

Clustering system-level performance by vertical

Since grouping together different products may cause an unsatisfactory user experience, we tuned our models to be precision-oriented. The results suggest that we can solve a large portion of the problem, but we still need to focus on improving recall. We also found that Health & Beauty products were more challenging and required better text understanding.

Frequently Bought Together (FBT)

Analysis of past purchases shows that customers often look for multiple items in a short period of time, such that together they have a synergistic utility. A notable example is a pair of jeans, together with a belt and possibly a matching shirt. When a customer is viewing a certain product (dubbed the seed product), our task is to help them find complementary products.

Arguably, the most standard method to find products that go together is to simply count co-purchases. That is, we observe the (normalized) number of customers who purchased the seed item and, shortly afterward, another candidate product. If this amount exceeds some threshold, we say that the candidate product makes a good FBT recommendation for the seed product. However, with the ever-increasing variety of products available on Shops on Facebook and Instagram, there is always an abundance of new products that haven’t yet been purchased in large numbers. Lowering the recommendation threshold results in an overwhelming amount of noise, in particular substitute items tangled with complementary ones.

To remedy this, we apply a two-step solution. First, we work at the category level (rather than the product level) to identify pairs of categories that go together. This aggregation solves the problem of purchase sparsity, and its output was further verified by expert taxonomists. It then allows us to resort to a simple count-based approach, setting a low threshold but considering only pairs of products whose categories go together.
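A minimal sketch of this category-filtered counting is shown below. The category-pair allowlist, the co-purchase window, and the threshold are illustrative assumptions.

```python
# Count co-purchases, but keep only pairs whose categories are known to go
# together, so that a low count threshold is enough.
from collections import Counter
from itertools import combinations

def fbt_candidates(purchases_by_user, category_of, allowed_category_pairs,
                   min_copurchases=5):
    # purchases_by_user: {user_id: [product_id, ...]} within a short time window.
    copurchase_counts = Counter()
    for products in purchases_by_user.values():
        for a, b in combinations(set(products), 2):
            # Keep only pairs whose categories go together (in either order).
            if (category_of(a), category_of(b)) in allowed_category_pairs or \
               (category_of(b), category_of(a)) in allowed_category_pairs:
                copurchase_counts[frozenset((a, b))] += 1
    # The category filter removes most noise, so a low threshold suffices.
    return {pair: n for pair, n in copurchase_counts.items()
            if n >= min_copurchases}
```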

Yet, even with a low threshold, there are many products that aren’t covered by this method. To increase coverage, we apply the following steps:

  • First, we utilize the variants model and copy a product’s recommendations to its variants as well.
  • Second, we employ a model that predicts the extent to which a pair of items is complementary, based on their visual appearance.

As a training set for this model, we need a list of products that go together. To this end, we go over fashion images and extract the products that appear in them, assuming that products appearing in the same image make good FBT recommendations for one another.

To assess the performance of our approach, we conducted an A/B test in which we suggested a set of complementary items to buyers who were viewing a product page. We compared our approach with a baseline (control) consisting of suggestions that were hand-picked by sellers. FBT recommendations led to a 12 percent relative improvement in click-through rate, demonstrating the viability and effectiveness of the approach.

Why it matters:

Our methods for incorporating product similarities have improved various consumer-facing applications in Shops. First, we launched clustering-based post-ranking logic, which diversifies product search results. We also showed that similarities based on intentful user actions lead to better recommendations than suggestions chosen by sellers. Finally, we constantly collaborate with different teams across Shops to leverage our signals and improve relevance. Through intensive A/B testing, we learned that capturing relationships between products is a significant step toward unlocking better user experiences.

What’s next:

We’re currently developing a holistic model that simultaneously considers behavioral data, such as co-views and co-purchases (distinct users viewing or buying the same product) and the preferences of the users who interacted with each item, together with product information like image, textual description, price, and brand. These two modalities, buyer engagement and product information, are learned in a mutually reinforcing manner, where one modality acts as the label for the other. Concretely, given a seed product, the behavioral modality allows us to find two products such that one makes a better recommendation than the other, thereby allowing the side information to be learned using a triplet loss. Likewise, the side-information modality generates triplets that allow us to improve the behavioral features.
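As a rough sketch of the triplet idea, behavioral signals pick which of two candidates is the better recommendation for a seed product, and the content-based ("side information") embeddings are pushed to agree. The embedding inputs and the margin below are illustrative assumptions.

```python
# Triplet loss on content embeddings, with the "better vs. worse" candidate
# ranking supplied by the behavioral modality.
import numpy as np

def triplet_loss(seed_emb, better_emb, worse_emb, margin=0.2):
    d_pos = np.linalg.norm(seed_emb - better_emb)  # distance to the better recommendation
    d_neg = np.linalg.norm(seed_emb - worse_emb)   # distance to the worse recommendation
    # Loss is zero once the better candidate is closer by at least the margin.
    return max(0.0, d_pos - d_neg + margin)
```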


Improving the performance of large-scale applications via basic block reordering

What the research is:

At Meta, we develop compilers to optimize the performance of our large-scale applications running in data center environments. Profile-guided optimization (PGO) is an important step in modern compilers for improving the performance of large-scale applications based on their runtime behavior. The technique leverages program execution profiles, such as the execution frequencies of binary functions or individual instructions, to guide compilers to optimize critical paths of a program more selectively and effectively.

Basic block reordering is among the most impactful PGO techniques. As part of the compilation process, a binary is broken up into smaller blocks. Many of these basic blocks are executed one after the other, but some contain conditional branches (control-flow instructions, like if-then-else, while, and switch statements) where the execution can jump to two or more blocks. Depending on the relative frequency of these jumps, some orderings of basic blocks can lead to fewer CPU instruction cache misses and therefore faster executions.

The source code on the left is transformed into the control-flow graph in the middle. The code blocks that make up this graph are laid out in memory on the right.

Profiling is used to collect information about the typical execution of an application. It allows us to learn how many times each basic block has been executed and how many times each branch has been taken. Given this information, a compiler’s job is to produce the most “CPU-friendly” ordering of basic blocks that leads to the best binary performance.

Traditional compiler approaches for basic block reordering optimize a specific dimension of CPU performance, such as instruction cache line utilization or branch prediction. However, we found that such orderings may yield suboptimal results. To overcome these shortcomings, we proposed a new model for block reordering that combines multiple effects and does a better job of predicting the performance of an application. Our model is based on a new optimization problem that we call the extended traveling salesman problem, or Ext-TSP for short.

How it works:

Given a collection of cities and the distances between every pair of them, the classical traveling salesman problem (TSP) asks for an order in which to visit the cities that minimizes the total distance traveled. There are many variations of this problem, such as MAX-TSP, where we want to maximize the total distance traveled. Ext-TSP generalizes the latter: the objective counts the distance not only between adjacent cities in the order but also between cities that are close enough together in the order, say, no more than a fixed number of positions apart.

In the context of basic block reordering, the blocks play the role of the cities, and the jump counts play the role of the distances between two cities. The ordering corresponds to how the basic blocks are laid out in memory. If two basic blocks are laid out close together, then there is a good chance that a jump from one to another will not incur an instruction cache miss. In a sense, the Ext-TSP objective aims to optimize the utilization of the instruction cache and thus the performance of the application.
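As a rough illustration, the sketch below scores a given block layout under a simplified Ext-TSP-style objective: fall-through jumps receive full credit and short forward jumps receive partial credit. The window size and weights are illustrative assumptions, not the exact coefficients from the paper or from BOLT.

```python
# A simplified Ext-TSP-style score for a block layout. `order` is a list of
# block IDs; `jumps` maps (src, dst) to the profiled jump count.
def ext_tsp_score(order, jumps, window=8, near_weight=0.1):
    position = {block: i for i, block in enumerate(order)}
    score = 0.0
    for (src, dst), count in jumps.items():
        if src not in position or dst not in position:
            continue
        gap = position[dst] - position[src]
        if gap == 1:                # fall-through: dst immediately follows src
            score += count
        elif 1 < gap <= window:     # short forward jump, likely nearby in the cache
            score += near_weight * count
    return score
```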

Our paper “Improved basic block reordering” introduces the new optimization problem. It shows that finding the best ordering is NP-hard, but also that there is an efficient greedy heuristic that produces good solutions for the instances typically arising from real-world binaries. In addition, we describe a mixed integer programming formulation for this optimization problem that is capable of finding optimal solutions on small functions. Our experiments with the exact method demonstrate that the new heuristic finds an optimal ordering of basic blocks in over 98 percent of real-world instances. From a practical point of view, the new basic block reordering has been implemented in BOLT, an open source binary optimization and layout tool developed at Meta. An extensive evaluation on a variety of real-world data center applications indicates that the new method outperforms existing block reordering techniques, improving the performance of applications with large code size.
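To give a flavor of a greedy approach (not the exact heuristic from the paper or from BOLT), the toy sketch below starts with one chain per basic block and repeatedly concatenates the pair of chains whose merge most increases a simplified Ext-TSP-style score; window size and weights are again illustrative assumptions.

```python
# A toy greedy chain-merging heuristic in the spirit of the approach above.
def layout_blocks(blocks, jumps, window=8, near_weight=0.1):
    def score(order):
        position = {b: i for i, b in enumerate(order)}
        total = 0.0
        for (src, dst), count in jumps.items():
            gap = position.get(dst, -1) - position.get(src, 10**9)
            if gap == 1:                 # fall-through jump
                total += count
            elif 1 < gap <= window:      # short forward jump
                total += near_weight * count
        return total

    chains = [[b] for b in blocks]       # start with one chain per block
    while len(chains) > 1:
        best_gain, best_pair = 0.0, None
        for i, a in enumerate(chains):
            for j, b in enumerate(chains):
                if i == j:
                    continue
                # Gain from placing chain b immediately after chain a.
                gain = score(a + b) - score(a) - score(b)
                if gain > best_gain:
                    best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break                        # no merge improves the score
        i, j = best_pair
        merged = chains[i] + chains[j]
        chains = [c for k, c in enumerate(chains) if k not in (i, j)] + [merged]
    # Concatenate the remaining chains into the final layout.
    return [block for chain in chains for block in chain]
```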

This image shows a control-flow graph and two block layouts, one maximizing the number of fall-through jumps and another maximizing the Ext-TSP objective.

Theory behind extended TSP

Given the notoriety of the original TSP, we wanted to understand how much more difficult Ext-TSP is than the classical TSP. In our paper “On the extended TSP problem,” we study Ext-TSP from a mathematical perspective and prove both negative and positive results about the problem.

On the negative side, it turns out that Ext-TSP is much harder than classical TSP. We prove that conditional on the exponential time hypothesis, it is unlikely that there exists an efficient algorithm for solving the problem optimally, even for very simple treelike instances, like those arising from simple programs without loops. This is very surprising, as most optimization problems (including the classical TSP) admit such efficient algorithms on trees.

On the positive side, we design so-called approximation algorithms that are efficient and return a solution that is guaranteed to be at most a given fixed factor worse than the optimal solution. Given the previous impossibility of having efficient optimal algorithms, such approximation algorithms are the best we can hope for.

Why it matters:

Developing new compiler technology to optimize the performance of our servers is an impactful area of research at Meta. Faster applications mean less computing power is needed to serve our users, which directly translates into less electricity usage in our data centers and a smaller environmental footprint for our operations.
