Q&A with Sameer Hinduja and Justin Patchin of the Cyberbullying Research Center

In this monthly interview series, we turn the spotlight on members of the academic community and the important research they do — as partners, collaborators, consultants, and independent contributors.

For April, we nominated an academic duo: Sameer Hinduja (Florida Atlantic University) and Justin Patchin (University of Wisconsin–Eau Claire) of the Cyberbullying Research Center, which they created as a one-stop shop for the most recent research on cyberbullying among adolescents. Patchin and Hinduja are top industry consultants, and they provide valuable insights that better inform our content policies. In this Q&A, they share more about their background, the formation of the Cyberbullying Research Center, their contributions to Facebook and Instagram, and their current academic research.

Q: Tell us about your backgrounds in academia.

A: We both earned master’s and doctoral degrees in criminal justice from Michigan State University. Upon entering graduate school, Sameer was interested in emerging crime issues related to technology and Justin was interested in school violence and juvenile delinquency. When we observed adolescent behaviors online, we noticed bullying occurring among this population. We started systematically studying cyberbullying and quickly learned that it was affecting young people. Using data from thousands of youths over the last two decades, we’ve been able to contribute evidence-based insights about the experiences of youths online. Thankfully, it isn’t all bad! But we use results from our research to advise teens, parents, and others about safe online practices.

Q: How did the Cyberbullying Research Center form?

A: The Cyberbullying Research Center grew out of our interest in studying cyberbullying behaviors, but also in more quickly disseminating information from our research to those who could benefit from it (parents, educators, youths). We wanted a platform where we could post timely results from our studies in the form of blog posts, research briefs, and fact sheets. We still write academic journal articles and books, but we also want to produce resources that are more easily accessible to everyone. We wanted to create a one-stop shop people could turn to for reliable information on youth cyberbullying and other online problems.

Q: How have you contributed your expertise to Facebook and Instagram?

A: Part of our mission as action researchers is to help people prevent and more adequately respond to cyberbullying and other problematic online behaviors among adolescents. This includes working with industry partners, like Facebook, to keep them up-to-date on the latest research and help inform their policies and practices concerning inappropriate behaviors. We are also trusted partners for Facebook and Instagram, so we are able to help flag abusive content on these platforms more quickly. We also routinely walk people who use these platforms through how to deal with problematic content on the apps so that they can have positive experiences. Sometimes it can be challenging navigating all the settings and reporting features, and we know these pretty well.

Q: What have you been working on lately?

A: We recently completed a study of tween cyberbullying for Cartoon Network and are currently planning to collect more data very soon on teen cyberbullying in the United States to see whether behaviors have changed as a result of the COVID-19 pandemic. We continue to write academic articles and are in the early stages of our next book. Finally, we continue to discuss various current events and issues at the intersection of youth and social media on our blog, and we regularly create new resources for youths and youth-serving adults to use.

Q: Where can people learn more about your work?

A: You can read about our research on our website or follow us on Facebook and Instagram @cyberbullyingresearch.

The post Q&A with Sameer Hinduja and Justin Patchin of the Cyberbullying Research Center appeared first on Facebook Research.

Auto-placement of ad campaigns using multi-armed bandits

What the research is:

We study the problem of optimally allocating an advertiser's budget across multiple surfaces when both the demand and the value are unknown. Consider an advertiser who uses the Facebook platform to advertise a product. They have a daily budget that they would like to spend on our platform. Advertisers want to reach users where they spend time, so they spread their budget over multiple platforms, like Facebook, Instagram, and others. They want an algorithm to bid on their behalf on the different platforms and are increasingly relying on automation products to do so.

In this research, we model the problem of placement optimization as a stochastic bandit problem. The algorithm participates in k different auctions, one for each platform, and needs to decide the correct bid for each auction. The algorithm is given a total budget B (e.g., the daily budget) and a time horizon T over which this budget should be spent. At each time-step, the algorithm decides the bid to associate with each of the k platforms, which is input into the auctions for the next set of requests on each platform. At the end of a round (i.e., a sequence of requests), the algorithm sees the total reward it obtained (e.g., number of clicks) and the total budget consumed in the process on each of the platforms. Based on just this history, the algorithm decides the next set of bid multipliers to place. The goal of the algorithm is to maximize the total advertiser value with the given budget across the k platforms.
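To make the interaction protocol concrete, here is a minimal sketch of the loop described above. The names `policy` and `simulate_round` are illustrative stand-ins for the bandit algorithm and the auction environment; neither is from the paper's actual implementation.

```python
def run_bandit(policy, budget, horizon, simulate_round):
    """Skeleton of the budgeted multiplatform bandit interaction loop.

    `policy` maps (history, remaining budget, remaining rounds) to a list
    of bids, one per platform; `simulate_round` stands in for the k
    auctions and returns per-platform (rewards, spends). The algorithm
    only ever sees this aggregate feedback, never individual requests.
    """
    remaining = budget
    history = []
    total_reward = 0.0
    for t in range(horizon):
        if remaining <= 0:  # budget exhausted: stop bidding
            break
        bids = policy(history, remaining, horizon - t)
        rewards, spends = simulate_round(bids)  # aggregate round feedback
        total_reward += sum(rewards)
        remaining -= sum(spends)
        history.append((bids, rewards, spends))
    return total_reward
```

The goal is then to choose `policy` so that `total_reward` is maximized subject to the budget never being exceeded over the horizon.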

How it works:

This problem can be tackled using a model of bandits called bandits with budgets. In this paper, we propose a modified algorithm that works optimally in the regime where the number of platforms k is large and the total possible value is small relative to the total number of plays. Online advertising data exhibits exactly this behavior: the budget spent and the total value the advertiser receives are much smaller than the total number of auctions they participate in, because of the scale of the competition pool. Thus, our algorithm is a significant improvement over prior works, which tend to focus on the regime where the total optimal value possible is comparable to the number of time-steps.

The key idea of the algorithm is to modify a primal-dual approach from prior work [1] so that it can handle multiple platforms. In particular, at each time-step we derive a new optimization program whose optimal solution gives the bid multipliers to place. Prior work [2] that solves an optimization program usually also relies on a rounding step. That rounding step works well only when the optimal value possible is at least √T, which is why the assumption of the optimal value being comparable to the number of time-steps has been unavoidable. In this work, by contrast, we rely on a structural property of this linear program [3] and show that for the special case of multiplatform bid optimization, the optimal solution is already integral, so no rounding step is needed. This is the key idea that leads to the optimal regret guarantee.
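To give a flavor of the primal-dual idea, here is a toy adaptive-pacing sketch. The function names and the simple value / (1 + λ) bidding rule are illustrative stand-ins, not the paper's actual per-step optimization program:

```python
def primal_dual_bids(values, lam):
    """Primal step (sketch): given per-platform value estimates and the
    current dual price `lam` on the budget constraint, bid
    value / (1 + lam) on each platform. A higher shadow price on budget
    makes bidding more conservative across all platforms."""
    return [v / (1.0 + lam) for v in values]


def dual_update(lam, spend, budget, horizon, step=0.05):
    """Dual step (sketch): a gradient update on the budget constraint,
    raising the shadow price when realized spend this round exceeds the
    per-round budget rate and lowering it (toward zero) otherwise."""
    return max(0.0, lam + step * (spend - budget / horizon))
```

Iterating these two steps over rounds paces spend toward uniform budget consumption, which is one of the desirable empirical properties noted below.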

We use logged data to show that this algorithm works well in practice, with desirable properties such as uniform budget consumption and small total regret. We compare it with prior works as well as other heuristics commonly used in industry, and show that the proposed algorithm outperforms all of them.

Why it matters:

On the business side, this research has potential benefits for advertisers, users, and platforms. Automated products that perform targeting, placement, and creative optimization on advertisers’ behalf are increasing in number, and their adoption is rising rapidly across the industry. The key challenges with these automated products are scalability and budget management: the number of possible combinations explodes exponentially while the total budget provided by the advertiser stays roughly the same. This research provides simple, scalable algorithms that can power such automated solutions by adjusting bids in the auction mechanism in real time. The bid is one of the primary levers advertisers use to steer the delivery of their ads toward their desired behavior, but they usually do so in a black-box fashion because they lack the data required to make optimal bidding choices. With the proposed algorithm, by contrast, the bidding is near-optimal, so advertisers get the most value for their spend. This has benefits for both the individual advertiser and the overall ecosystem.

On the research side, “Bandits with Budgets” has primarily been studied in the theoretical computer science and operations research communities purely as a mathematical problem. This research bridges the gap between the theory and practice of these algorithms by applying them to an important large-scale problem. En route to this application, we also create a new, simpler algorithm that is optimal in the parameter ranges relevant to the application.

Going forward, we hope that our paper opens the door for newer applications, both within online advertising and outside of it, for this extremely general and versatile model. We believe that this line of work has huge research potential for creating new algorithms as well as for affecting core business problems.

Read the full paper:

Stochastic bandits for multiplatform budget optimization in online advertising

References:

[1] – Badanidiyuru, Ashwinkumar, Robert Kleinberg, and Aleksandrs Slivkins. “Bandits with knapsacks.” 2013 IEEE 54th Annual Symposium on Foundations of Computer Science. IEEE, 2013.
[2] – Sankararaman, Karthik Abinav, and Aleksandrs Slivkins. “Combinatorial semi-bandits with knapsacks.” International Conference on Artificial Intelligence and Statistics. PMLR, 2018.
[3] – Avadhanula, Vashist, et al. “On the tightness of an LP relaxation for rational optimization and its applications.” Operations Research Letters 44.5 (2016): 612-617.

Introducing causal network motifs: A new approach to identifying heterogeneous spillover effects

This project is joint work with Yuan Yuan, PhD candidate at MIT, and the Facebook Core Data Science (CDS) team. Learn more about CDS on the CDS team page.

What the research is:

Randomized experiments, or A/B tests, remain the gold standard for evaluating the causal effect of a policy intervention or product change. However, experimental settings, such as social networks, where users are interacting with and influencing one another, may violate conventional assumptions of no interference for credible causal inference. Existing solutions to the network setting include accounting for the fraction or count of treated neighbors in a user’s network, yet most current methods do not account for the local network structure beyond simply counting the number of neighbors.

Our study provides an approach that accounts for both the local structure in a user’s social network via motifs as well as the treatment assignment conditions of neighbors. We propose a two-part approach. We first introduce and employ causal network motifs, which are network motifs that characterize the assignment conditions in local ego networks. Then, we propose a tree-based algorithm for identifying different network interference conditions and estimating their average potential outcomes. Our approach can account for social network theories, such as structural diversity and echo chambers, and also can help specify network interference conditions that are suitable for each experiment. We test our method on a synthetic network setting and on a real-world experiment on a large-scale network, which highlight how accounting for local structures can better account for different interference patterns in networks.

As an example, Figure 1 illustrates four network interference conditions that could be captured by local network structure and treatment assignment. The first two are simply the cases where all neighbors are treated or untreated, followed by the network interference conditions suggested by structural diversity and complex contagion, respectively. In the structural diversity and echo chamber settings, the ego nodes in (c) and (d) each have half of their neighbors treated but exhibit very different local structures, and the ego's outcome may differ between these settings. We do not know in advance which factor drives most of the variance in the outcome.


Figure 1: Examples of network interference conditions across different local network structures. The star indicates a user and the circles represent the user's friends. Solid circles indicate that a friend is in treatment and hollow circles indicate that a friend is in control. For stars, shading indicates that the ego could be either treated or in control.

Given the large number of researcher degrees of freedom in existing approaches for network interference, such as choosing the threshold for an exposure condition, our approach provides a simple way to automatically specify exposure conditions. In this way, researchers no longer need to define exposure conditions a priori, and the exposure conditions generated by the algorithm are suitable for the given data and experiment. We believe that methodological innovation for addressing network interference concerns in A/B tests on networks will continue to be an important area for development, and accounting for network motifs with treatment assignment conditions provides a useful way to detect heterogeneous network interference effects.

How it works:

Our study provides a two-step solution that automatically identifies different exposure conditions while overcoming selection bias concerns, as explained in more detail after Figure 2. First, for an A/B test on a network, we construct network motif features with treatment assignment conditions (i.e., causal network motifs) to provide a fine-grained characterization of the local network structure and potential interference conditions. Second, using the network motif characterization as input, we develop a tree-based algorithm that performs clustering and defines the set of exposure conditions automatically, rather than leaving practitioners to choose them by hand.

We introduce causal network motifs, which differ from conventional network motifs in two primary aspects. First, we focus on (1-hop) ego networks that include the ego node, with the methods generalizing to higher 𝑛-hop ego networks for 𝑛>1. Second, we consider the treatment assignment conditions of the user and their 𝑛-hop connections. We use the term “network motifs” to refer to conventional motifs without treatment assignment labels (or assignment conditions) and “causal network motifs” to refer to those with assignment conditions. Examples of network motifs are illustrated in Figure 2. We use these counts on an 𝑛-hop ego network to characterize the exposure condition of each observation.


Figure 2: Examples of causal network motifs. Stars represent egos and circles represent alters. Solid indicates that a node is treated, hollow indicates control, and shaded indicates that it could be either treated or control. The first pattern in each row is a conventional network motif without assignment conditions (or simply a network motif), followed by the corresponding causal network motifs. Our interference vector is constructed by dividing the count of each causal network motif by the count of the corresponding network motif. The labels below each motif indicate the naming convention: for example, an open triad where one neighbor is treated is named 3o-1.

After counting causal network motifs for each ego node in our network, our next step is to convert the counts to features, which are used in the next section. Let X𝑖 denote an 𝑚-dimensional random vector, referred to as the interference vector. The interference vector has an important requirement: Each element of the random vector must be intervenable — that is, the random treatment assignment affects the value of each element of the vector. This requirement addresses the selection bias issue when we estimate the average potential outcomes.

We construct the interference vector in the following way. For each observation, we take the count of each causal network motif (e.g., 2-1, 2-0, …, 3o-2, 3o-1, …) and normalize it by the count of the corresponding network motif (e.g., dyads, open triads, closed triads, …). In this way, each element of X𝑖 is intervenable, and the support of each element is in [0, 1]. Note that when considering a network motif with many nodes, some observations may not have certain network motifs, and normalization cannot be performed. In these scenarios, we can either exclude that network motif from the interference vector or drop those observations if they make up only a small proportion of the sample. Please refer to Figure 3 for an illustration of constructing the interference vector.
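As a rough illustration of this normalization, the following sketch computes a dyad- and triad-based interference vector for a single 1-hop ego network. The function name, inputs, and exact feature set are assumptions for illustration, not the paper's implementation.

```python
from itertools import combinations


def interference_vector(neighbors, treated, edges):
    """Sketch of the interference-vector construction for one ego.

    `neighbors` is a list of neighbor ids, `treated` a set of treated
    neighbor ids, and `edges` a set of neighbor-neighbor edges
    (frozensets). Each entry divides a causal-motif count (motif plus
    assignment labels) by the count of the corresponding plain motif,
    so every entry lies in [0, 1]."""
    vec = {}
    # Dyads: ego-neighbor edges, keyed by whether the neighbor is treated.
    n = len(neighbors)
    if n:
        t = sum(1 for v in neighbors if v in treated)
        vec["2-1"] = t / n
        vec["2-0"] = (n - t) / n
    # Triads: ego plus a pair of neighbors; "open" if the pair is not
    # directly connected, "closed" if it is. Keyed by treated count 0-2.
    open_counts, closed_counts = {0: 0, 1: 0, 2: 0}, {0: 0, 1: 0, 2: 0}
    for u, v in combinations(neighbors, 2):
        k = (u in treated) + (v in treated)
        if frozenset((u, v)) in edges:
            closed_counts[k] += 1
        else:
            open_counts[k] += 1
    n_open, n_closed = sum(open_counts.values()), sum(closed_counts.values())
    for k in range(3):
        if n_open:
            vec[f"3o-{k}"] = open_counts[k] / n_open
        if n_closed:
            vec[f"3c-{k}"] = closed_counts[k] / n_closed
    return vec
```

For an ego with three neighbors, two of them treated and one closed triad, this yields entries such as 2-1 = 2/3 and 3c-2 = 1, mirroring the normalization described above.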


Figure 3: An example of ego network with treatment assignments and the corresponding interference vector. Stars represent egos and circles represent alters. Solid indicates the node being treated, hollow indicates control, and shaded indicates that it could be treated or control.

Then, our approach partitions [0, 1]^(m+1) and determines exposure conditions based on a decision tree regression. Decision trees can be used for clustering [1] and typically offer good interpretability in the decision-making process [2], making them an appropriate machine learning approach for the partitioning problem. Each leaf of the decision tree corresponds to a unique exposure condition (partition). Compared with conventional decision tree regression, we make several revisions to accommodate honest splitting, positivity, and so on.
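The core step such a tree-based partitioner repeats, choosing the feature and threshold that best separate outcomes, can be sketched as follows (a toy version without the honest-splitting and positivity refinements the paper adds on top):

```python
def best_split(X, y):
    """Find the single (feature, threshold) split of interference
    vectors X (equal-length lists with entries in [0, 1]) that minimizes
    the squared error around per-leaf mean outcomes y. Recursing on this
    step on each side of the split grows the exposure-condition tree.

    Returns (error, feature index, threshold), or None if no valid split
    exists."""
    def sse(vals):
        # Sum of squared deviations from the leaf mean.
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [y[i] for i, x in enumerate(X) if x[j] <= thr]
            right = [y[i] for i, x in enumerate(X) if x[j] > thr]
            if not left or not right:  # skip degenerate splits
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, thr)
    return best
```

Each leaf reached by recursively applying such splits corresponds to one exposure condition, and the leaf mean estimates its average potential outcome.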

Why it matters:

Network interference is much more complicated than simply being described as the indirect effect. To examine and analyze heterogeneity of indirect effects in experimental data sets, we provide a two-step solution. We first propose and employ the causal network motifs to characterize the network interference conditions, and then develop a tree-based algorithm for partitioning. Our tree-based algorithm is interpretable in terms of highlighting which exposure conditions are important for defining potential outcomes, it addresses selection bias and positivity issues, and it avoids incorrect standard error concerns via honest splitting.

Practitioners using our approach may obtain important insights. For example, they could understand how to utilize social contagion for product promotion when they have constraints on the number of promos. Researchers may identify important network interference conditions that are not theorized in certain experimental settings.

Read the full paper:

Causal network motifs: Identifying heterogeneous spillover effects in A/B tests

Learn more:

Check out our open source implementation on GitHub.

Watch our presentation at the Web Conference 2021.

[1] Bing Liu, Yiyuan Xia, and Philip S Yu. 2000. Clustering through decision tree construction. In CIKM. 20–29.
[2] J. Ross Quinlan. 1986. Induction of decision trees. Machine Learning 1 (1986), 81–106.


Navigating industry after academia: Q&A with Software Engineering Manager Hyojeong Kim

Academics come to work at Facebook at various stages of their career. Some join right after graduating; others join after long and established careers as professors. Somewhere in between is Hyojeong Kim, who joined Facebook after spending some time in industry.

Kim is a Software Engineering Manager within network routing at Facebook. She owns routing protocols (BGP and Open/R) and related software services running in Facebook’s production network, which consists of data center, backbone, and edge networks. Her team focuses on tackling routing challenges to support next-generation production networks while ensuring production reliability.

We reached out to Kim to learn more about her academic background, her journey at Facebook so far, and her recent projects on the Networking team. She also offers advice for PhDs looking to follow a similar path.

Q: Tell us about your experience in academia before joining Facebook.

Hyojeong Kim: During my undergraduate career in South Korea in the late ’90s, I saw how the internet connected people around the world. This inspired me to want to study the core technologies that enable the internet and contribute to it, so I came to the United States and joined the PhD program in computer science at Purdue University.

During my time at Purdue, I had the opportunity to learn about Border Gateway Protocol (BGP)-based internet routing and its problems. I built an internet-scale BGP simulation test bed over a distributed Linux cluster. I was very excited to learn so many new things. As I tried to experiment with the latest measurement-based internet topologies, I encountered many distributed systems challenges due to the scale of the input, and I spent lots of time and effort creating software tools to support the distributed simulation.

These were interesting problems in themselves, but they were also very challenging. As a graduate student, I didn’t have the context for how my work could be applied to real-world problems. However, my learnings from my PhD study turned out to be very useful for solving real-world problems throughout my career: they helped me extend BGP to support fast response to internet distributed denial-of-service attacks, build a system to improve how Facebook serves user traffic, and build and run Facebook’s data center network routing software stack. In retrospect, all my learnings and experiences during my PhD study were very valuable. I just did not have that perspective at the time.

Q: What brought you to Facebook?

HK: I started my career at Cisco, working on BGP routing software running on core routers used by internet service providers. I felt proud contributing to the foundation of the internet. After gaining some industry experience, when I was thinking about the next chapter of my career, I had a chance to attend the Grace Hopper Conference. I was inspired by meeting so many women at various stages of their careers and getting their advice on building successful careers in tech. I also met engineers from Facebook and heard about their experiences. This eventually led me to join Facebook.

Q: What has your journey with Facebook been like so far?

HK: I first joined Facebook as a software engineer. Coming to Facebook made it possible to pursue big research questions and be creative, which I was very excited about. I learned how Facebook’s production network is connected to the internet, and I had an exciting opportunity to build and run a software-defined networking controller called Edge Fabric. This was a research collaboration with a PhD intern and his advisers. We enhanced the system significantly and shared our operational experience with the academic community at SIGCOMM 2017.

On the Facebook Networking team, we study our own production network, identify problems, build solutions, deploy them to production, and keep iterating on solutions, receiving signals from operations. I really have enjoyed the opportunity of owning the full problem-solving cycle. At Facebook, engineers are empowered to innovate and to be bold and creative in their solutions. This encouraged me to take ownership of big challenges.

Within Facebook, changing teams or trying out different job roles is common and very much encouraged. This keeps the work exciting and challenging, and it ensures that we’re always learning new things. As a software engineer, I had the opportunity to lead a team of engineers for a couple of major initiatives. Then, I became interested in learning how to grow other engineers and how to support a team to solve multiple challenging projects. Eventually, I became a software engineering manager, and now I lead a team of software engineers within network routing.

Q: What are some of your most recent projects?

HK: I changed my focus to data center network routing a few years ago. This was the time when the team was scaling Facebook’s in-house network switch and software stack, FBOSS, with the goal of transitioning the data center network to FBOSS. During this time, I learned and improved the BGP-based data center routing design. I led the development of scalable, performant BGP software and its testing and deployment pipeline, which allow us to treat BGP like any other software component and enable fast incremental updates.

Using what I’ve learned over the years, I co-authored the NSDI 2021 paper “Running BGP in data centers at scale.” BGP was designed for the internet, but big web-scale companies often use it in data centers. This paper describes how we build, test, deploy, and use BGP in Facebook’s data centers, which has never been thoroughly discussed in academia before. This paper was a collaboration with our past PhD interns, Anubhavnidhi Abhashkumar and Kausik Subramanian from the University of Wisconsin, and their adviser, Aditya Akella. They helped capture our operational experience from an academic point of view.

Q: What advice would you give to current PhD candidates looking to transition to industry?

HK: If you’re a PhD candidate who’s having a similar experience as I did, where you feel unsure about how your current work would make an impact on real-world problems, I recommend looking for internship opportunities in the industry you’re interested in. When you have only academic experience, it’s difficult to know how research is applied in industry without actually having industry experience. Internships can help you contextualize your research and give you a new perspective on it, which will help you think about it in relation to solving practical problems. Additionally, you’ll make connections that could potentially result in future research collaborations. Internships also allow you to experience and explore different company cultures, which may help you find the right place to work after graduation.

Also, I recommend that PhDs attend as many networking events as possible. Attending Grace Hopper was a pivotal moment in my career, and it opened my eyes to all the places I could work.

Q: Where can people learn more about what the Facebook Networking team is up to?

HK: Check out the Networking team page for all our most recent publications, news, programs, and job openings. We are also launching an RFP at NSDI ’21. Sign up here for email notifications about new RFPs.

New Analytics API for researchers studying Facebook Page data

Today we are launching the Facebook Open Research & Transparency (FORT) Analytics API for researchers. This release includes a collection of API endpoints that help academics identify trends on Facebook Pages and how those trends have evolved over time. You can leverage these insights to focus on specific Pages that are of interest.

We designed this API specifically for the academic community to conduct longitudinal analyses with time series data. With the launch of FORT Pages API in 2020 and now the Analytics API, we will continue to develop a product roadmap focused on sharing new types of Facebook and Instagram data with the academic research community, as well as additional analytics endpoints.

Analytics API features

With the FORT Analytics API, we are offering three new endpoints. Each returns aggregated time-series data, captured at daily intervals:

  1. Lifetime follower count (number of users who have ever followed a Page) by country per Page
  2. Page admin Post count (applies only to Posts made by Page admins, not to user posts)
  3. Page engagement count, where engagement is defined as the total number of likes, comments, clicks, and shares on Posts created on that Page

Unlike other research-related APIs, these endpoints facilitate queries across the entirety of public Facebook Pages. (You can learn more about Facebook data sets that we make available to researchers here.)

For technical documentation, click here.

Important considerations

Current Analytics data: These endpoints provide aggregate metrics on Facebook Pages. They are designed to help researchers observe and analyze engagement patterns of Pages and leverage that analysis to decide which Pages to focus on, saving time and effort. Our systems take a snapshot of Page activity once per day and provide that information to you via the Pages API. Consequently, these aggregations are directionally accurate (as opposed to data which is dynamically generated for each query).

Historical Page data: When using these endpoints, you may encounter small historical inaccuracies in the counts. For instance, the Centers for Disease Control Page seems to have lost a few hundred followers in January, most likely due to user account deletions. For now, we consider this an acceptable amount of inaccuracy for delivering high-quality research, but we continue to monitor these inaccuracies to incorporate into future updates. Note that a user deleting a Post, unliking a Post, or deleting a Comment will not be reflected in the analytics. In other words, the data does not account for content deletions.

How to access the FORT Analytics API

If you are a grantee of Social Science One, you will have default access to this API through the FORT platform. If you are not an SS1 researcher but are interested in using this API, please apply to join the Social Science One community here.

Technical limitations

While we won’t apply any formal limit on query size, we recommend the following for best performance:

  • No more than 250 page ids in a single API call
  • Request no more than 10,000 records in a single API call
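Callers with large Page lists can stay within these recommendations by batching their IDs before issuing requests. A minimal helper (hypothetical, not part of any official API client) might look like:

```python
def chunk_page_ids(page_ids, max_per_call=250):
    """Split a list of Page IDs into batches that respect the
    recommended limit of 250 IDs per API call, preserving order."""
    return [page_ids[i:i + max_per_call]
            for i in range(0, len(page_ids), max_per_call)]
```

Each returned batch can then be passed to a single API call, keeping both the ID count and the record count per request within the recommended bounds.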

See this documentation for more details.

About the Facebook Open Research and Transparency Platform

The Facebook Open Research and Transparency (FORT) Platform facilitates responsible research by providing flexible access to valuable data. The platform is built with validated privacy and security protections, such as data access controls, and has been penetration-tested by internal and external experts.

The FORT platform runs on an opinionated version of JupyterHub, an open source tool that is widely used in the academic community. The platform supports multiple standard languages, including SQL, Python, and R, as well as a specialized bridge to specific Facebook Graph APIs.

Publication guidelines

Researchers may publish research conducted using this data without Facebook’s permission. As with other research conducted under the Research Data Agreement, Facebook is entitled to review (though not approve or reject) research prior to publication and to remove any confidential or personally identifiable information.

Core Data Science researchers discuss research award opportunity in adaptive experimentation

On February 24, Facebook launched a request for proposals (RFP) on sample-efficient sequential Bayesian decision making, which closes on April 21. With this RFP, the Facebook Core Data Science (CDS) team hopes to deepen its ties to the academic research community by seeking out innovative ideas and applications of Bayesian optimization that further advance the field. To provide an inside look from the team behind the RFP, we reached out to Eytan Bakshy and Max Balandat, who are leading the effort within CDS.
Bakshy leads the Adaptive Experimentation team, which seeks to improve the throughput of experimentation with the help of machine learning and statistical learning. Balandat supports the team’s efforts on modeling and optimization, with a primary focus on probabilistic models and Bayesian optimization. In this Q&A, Bakshy and Balandat contextualize the RFP by sharing more about how their team’s work relates to the call’s areas of interest.

Q: What’s the goal of this RFP?

A: Primarily, we are keen to learn more about all the great research that is going on in this area. At the same time, we are able to share a number of really interesting real-world use cases that we hope can inspire additional applied research and increase interest and research activity in sample-efficient sequential Bayesian decision making. Lastly, we aim to further strengthen our ties to academia and our collaborations with academics at the forefront of this area.

We are both excited to dive in and learn about creative applications and approaches to Bayesian optimization that researchers come up with in their proposals.

Q: What inspired you to launch this RFP?

A: We publish quite a bit in top-tier AI/ML venues, and all our papers are informed by very practical problems we face every day in our work. The need to explore large design spaces via experiments with a limited budget is widespread across Facebook, Instagram, and Facebook Reality Labs. Much of our team’s work focuses on applied problems that support the company, alongside use-inspired basic research, but it is clear that there are plenty of ideas out there that can advance sample-efficient sequential decision making, including Bayesian optimization and related techniques.

In academia, it can sometimes be challenging to understand what exactly the most relevant and impactful “real-world” problems are. Conversely, academics may have an easier time taking a step back, looking at the bigger picture, and doing more exploratory research. With this RFP, we hope to help bridge this gap and foster increased collaboration and cross-pollination between industry and academia.

Q: What is Bayesian optimization and how is it applied at Facebook?

A: Bayesian optimization is a set of methodologies for exploring large design spaces on a limited budget. While Bayesian optimization is frequently used for hyperparameter optimization in machine learning (AutoML), our team’s work was originally motivated by the use of online experiments (A/B tests) for optimizing software and recommender systems.

Since then, the applications for Bayesian optimization have expanded tremendously in scope, with applications ranging from the design of next-generation AR/VR hardware, to bridging the gap between simulations and real-world experiments, to efforts around providing affordable connectivity solutions to developing countries.

The main idea behind Bayesian optimization is to fit a probabilistic surrogate model to the “black-box” function one is trying to optimize, and then use this model to decide which parameters to evaluate next. Doing so allows for a principled way of trading off reducing uncertainty (exploration) against focusing on promising regions of the parameter space (exploitation). As described earlier, we apply this approach to a large variety of problems in different domains at Facebook.
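As a concrete illustration of that loop, here is a self-contained toy sketch in plain NumPy (deliberately not using BoTorch or Ax): a Gaussian process surrogate with a fixed RBF kernel and an upper confidence bound (UCB) acquisition rule, maximizing a 1-D black-box function over a grid. All function names, kernel choices, and parameter values here are illustrative, not a production recipe.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and std of a zero-mean GP surrogate at x_query."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_query, x_train)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s @ alpha
    # Prior variance is 1 for this kernel; subtract the explained part.
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def bayes_opt(f, n_iter=10, beta=2.0):
    """Maximize a black-box f on [0, 1] via a GP surrogate and UCB."""
    grid = np.linspace(0.0, 1.0, 201)
    x = np.array([0.1, 0.9])              # initial design points
    y = np.array([f(v) for v in x])
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, grid)
        x_next = grid[np.argmax(mean + beta * std)]  # UCB acquisition
        x = np.append(x, x_next)
        y = np.append(y, f(x_next))       # one (expensive) evaluation
    best = np.argmax(y)
    return x[best], y[best]
```

The `beta` parameter controls the exploration/exploitation trade-off: larger values weight the model's uncertainty more heavily when choosing the next evaluation.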

Q: What is BoTorch, and how does it relate to the RFP?

A: The Adaptive Experimentation team has been investing in methodological development and tooling for Bayesian optimization for over five years. A few years ago, we found that our tooling at the time was slowing down researchers’ ability to generate new ideas and engineers’ ability to scale out Bayesian optimization use cases.

To address these problems, we developed BoTorch, a framework for Bayesian optimization research, and Ax, a robust platform for adaptive experimentation. BoTorch follows the same modular design philosophy as PyTorch, which makes it very easy for users to swap out or rearrange individual components in order to customize all aspects of their algorithm, thereby empowering researchers to do state-of-the-art research on modern Bayesian optimization methods. By exploiting modern parallel computing paradigms on both CPUs and GPUs, it is also fast.

BoTorch has really changed the way we approach Bayesian optimization research and accelerates our ability to tackle new problems. With the RFP, we hope to attract more widespread interest in this area and raise awareness of our open source tools.

Q: Where can people stay updated and learn more?

A: We actively engage with researchers on Twitter, so follow @eytan and @maxbalandat for the latest research, and always feel free to reach out to us via Twitter, email, or GitHub Issues if you have any questions or ideas.

You can find the latest and greatest of what we are working on in our open source projects, BoTorch and Ax. It also helps to keep an eye out for our papers in machine learning conferences, such as NeurIPS, ICML, and AISTATS.


Applications for the RFP on sample-efficient sequential Bayesian decision making close on April 21, 2021, and winners will be announced the following month. To receive updates about new research award opportunities and deadline notifications, subscribe to our RFP email list.


Q&A with Brown University’s Anna Lysyanskaya, two-time winner of Facebook research awards in cryptography

In this monthly interview series, we turn the spotlight on members of the academic community and the important research they do — as partners, collaborators, consultants, and independent contributors.

For March, we nominated Anna Lysyanskaya, a professor at Brown University. Lysyanskaya is a two-time Facebook research award recipient in cryptography and is most known for her work in digital signatures and anonymous credentials. In this Q&A, Lysyanskaya shares more about her background, her two winning research proposals, her recent talk at the Real World Cryptography Symposium, and the topics she’s currently focusing on.

Q: Tell us about your role at Brown and the type of research you specialize in.

Anna Lysyanskaya: I am a professor of computer science, and my area of expertise is cryptography, specifically privacy-preserving authentication and anonymous credentials. I’ve had a long career in academia; I finished my PhD 19 years ago, and I’ve been working in this particular area essentially since I started doing research as a PhD student.

I got into this field mostly by chance, and honestly, I could have ended up anywhere. Everything was new and interesting to me at the time, but I remember a chance encounter with the person who would eventually become my adviser. He had a couple of papers he wanted to take a closer look at, so I started reading them and meeting with him to discuss them.

At the beginning, I was attracted to cryptography because I was interested in the math aspect, as well as the social aspect of solving math problems with interesting people who made everything fun. That initial fascination, paired with being in a great place to study it, led me to where I am today.

Eventually, I learned that it’s not just fun and math, and that there are actually interesting applications of what I’m working on. This is actually why I’m still working on it all this time later, because I just haven’t run out of interesting places to apply this stuff.

Q: You were a winner of two Facebook requests for proposals: the Role of Applied Cryptography in a Privacy-Focused Advertising Ecosystem RFP and the Privacy Preserving Technologies RFP. What were your winning proposals about?

AL: My ads-focused proposal was entitled “Know your anonymous customer.” Let’s start with how a website — say, yourfavoritenewspaper.com — turns content into money: by showing ads. When you click on an ad and buy something, the website that sent you there gets a small payment. At scale, these payments are what pays for the content you find online. The main issue here is that the websites you visit track your activities, and by tracking what you do, they are able to reward the sites that successfully showed you an ad.

My project is about finding a privacy-preserving approach to reward ad publishers — an approach that would not involve tracking a user’s activities but would still allow reliable accountability when it comes to rewarding a website responsible for sending a customer to, say, a retailer that closed a sale with that customer. The idea is to use anonymous credentials: When you purchase something, your browser obtains a credential from the retailer that just received money from you. Your browser then communicates this credential, transformed in a special way, to whichever website sent you to the original retailer. The crux of the matter is that the transformed credential cannot be linked to the credential issued by the retailer, so even if the website and retailer collude, they cannot tell that it was the same user.
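The unlinkability at the heart of this design can be illustrated with a much older and simpler primitive: the Chaum-style RSA blind signature. To be clear, this is not the construction described in the proposal (anonymous credentials are built from more sophisticated signatures), and the parameters below are toy-sized and completely insecure, but it shows the core trick of a signer signing something it cannot later link back to the user.

```python
# Toy Chaum-style RSA blind signature. Illustrates unlinkability only;
# the key is tiny and insecure, and real anonymous credentials use
# different, far more sophisticated constructions.
import math

# Signer's toy RSA key (public: n, e; private: d).
p, q = 61, 53
n = p * q                          # 3233
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

def blind(m, r):
    """User blinds message m with a secret random factor r."""
    assert math.gcd(r, n) == 1
    return (m * pow(r, e, n)) % n

def sign(blinded):
    """Signer signs the blinded value without ever learning m."""
    return pow(blinded, d, n)

def unblind(s_blinded, r):
    """User strips the blinding factor, leaving a signature on m."""
    return (s_blinded * pow(r, -1, n)) % n

def verify(m, s):
    return pow(s, e, n) == m % n

m = 42    # the value the user wants signed
r = 7     # user's secret blinding factor, coprime to n
sig = unblind(sign(blind(m, r)), r)
assert verify(m, sig)
# The signer only ever saw blind(m, r), which it cannot link to m or sig.
```

Because the signer sees only the blinded value, even a colluding signer and verifier cannot connect the signing session to the later showing, which is the same property the transformed credential provides in the advertising setting.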

My other proposal, which I coauthored with Foteini Baldimtsi from George Mason University, was about private user authentication and anonymous credentials on Facebook’s Libra blockchain. The nature of a blockchain is that it’s very public, but you also want to protect everyone’s privacy, so our goal was to build cryptographic tools for maintaining privacy on the blockchain. Having the opportunity to work with Libra researchers on this project is very exciting.

The tools for both research projects are very similar in spirit, but the stories are different. Because the applications are different enough, you still need to do some original research to solve the problems. The motivations for both projects are achieving user privacy and protecting users.

Q: You recently spoke at Real World Cryptography (RWC). What was your presentation about?

AL: Anonymous credentials have been central to my entire research career. They are what I am most known for, and they were the subject of my talk. An anonymous credential allows you to provide proof that you’re a credentialed user without disclosing any other information. In the aforementioned advertising example, a retail website you visit gives an anonymous credential to your browser that allows you to prove that you have purchased something at this retailer, without revealing who you are or any information that would allow anyone to infer who you are or what you purchased.

Of course, anonymous credentials can be used much more broadly. An especially timely potential application would be vaccination credentials. Suppose that everyone who receives a vaccination also receives a credential attesting to their vaccination status. Then, once you’re vaccinated, you can return to pre-pandemic activities, such as attending concerts and sporting events, traveling by air, and even taking vacation cruises. To gain access to such venues, you’d have to show your vaccination credential. But unless anonymous credentials are used, this is a potentially privacy-invasive proposition, so anonymous credentials are a better approach.

Q: What are some of the biggest research questions you’re interested in?

AL: The talk I gave at RWC is partly about this. In a technical field, it’s hard to communicate what you’re doing to the people who could actually apply it, mostly because mathematical concepts aren’t easy to explain. Anonymous credentials are especially hard to explain to somebody who hasn’t studied cryptography for at least a few years.

Right now, my focus is to recast this problem in a way that’s a little bit more intuitive. My current attempt is to have an intermediate primitive called a mercurial signature. This is just like a digital signature, but it’s mercurial in the sense that you can transform it so that it still meaningfully signs a statement, just in a way that can’t be linked to what it looked like when it was first issued.

There are several reasons why I think mercurial signatures are a good building block to study:

  • First, we actually have a candidate construction, so it’s not far-fetched: we know it can be done. That construction has some shortcomings, but it’s a realistic starting point.
  • Second, mercurial signatures are an accessible concept to somebody who has just a basic undergraduate understanding of cryptography. You can actually explain what a mercurial signature is to somebody who knows what a digital signature is in just a few minutes.
  • Also, mercurial signatures have very rich applications, and they allow us to build anonymous credentials that have some nice features. One example is delegation. Let’s say I anonymously give a credential to you and then you give a credential to someone else. When they use their credential, it doesn’t reveal what the chain of command is — just that they’re authorized.

This is actually the bulk of my RWC talk, and it’s what I think is the next thing to do.

Q: Where can people learn more about your research?

AL: People can learn more about my research on my Google Scholar profile.
