Introducing new election-related ad data sets for researchers

We previously announced that starting February 1, 2021, we would make targeting information for more than 1.3 million social issues, electoral, and political Facebook ads available to academic researchers for the first time. This data package includes ads that ran during the three-month period prior to Election Day in the United States, from August 3 to November 3, 2020, and is accessible through the Facebook Open Research and Transparency (FORT) platform.

As part of this launch, we are sharing access to two new data sets:

  • Ad Targeting data set: Includes the targeting logic of social issues, election, and political ads that ran between August 3, 2020, and November 3, 2020. We exclude ads that had fewer than 100 impressions, which is one of several steps we take to protect users’ privacy.
  • Ad Library data set: Includes social issues, election, and political ads that are part of the Ad Library product. It is included so that researchers can analyze the ads and targeting information in the same environment. In other words, this data set is a copy of corresponding Ad Library data made available in the FORT platform and is different from the Ad Library API product.

We created this tool to enable academic researchers to better study the impact of Facebook’s products on elections, and we included measures to protect people’s privacy and keep the platform secure.

How to access the data sets

To apply for access to these data sets, please fill out this form.

After you provide your information, Facebook will contact you with details on next steps, including the Facebook Research Data Agreement (RDA) and the ID verification process. Once you and your university have signed the RDA and your ID has been verified, you will gain access to the Facebook Open Research and Transparency platform and these two new data sets.

More information about the election-related ad data sets

Ad Targeting data set

This includes the targeting options selected by advertisers when creating an ad. You can learn more about Facebook ads here.

Overview of targeting options

Learn more about targeting options here.

  • Location: Cities, communities, and countries
  • Demographics: Age, gender, education, job title, and more
  • Interests: Interests and hobbies of the people advertisers want to reach — these help make ads more relevant
  • Behaviors: Consumer behavior, such as prior purchases and device usage
  • Connections: Audiences based on people who are connected to the advertiser’s Facebook Page, app, or event
  • Custom audiences: Options that enable an advertiser to find their existing audiences among people who are on Facebook, e.g., through customer lists, website or app traffic, or engagement on Facebook. Learn more about the different types of custom audiences here. In the data set, we indicate whether the custom audience used is based specifically on a customer list.
  • Lookalike audiences: Help advertisers reach new people potentially interested in their business because they’re similar to their existing customers. Learn more about how advertisers set up a lookalike audience.

Wherever applicable, we indicate whether the targeting options were selected for inclusion or exclusion targeting.

Key considerations for the Ad Targeting data set

Location data

This data set includes the location targeting chosen by an advertiser for the ad. Advertisers can input location targeting in a number of ways, such as by selecting zip codes, countries, designated market areas (DMAs), or pindrops/addresses/places with a specified radius.

When an advertiser selects an address, a place, or a location pindrop, we redact the exact selection and instead note the type of selection, the city it falls in, and the radius specified by the advertiser. Larger geographic areas, such as zip codes, cities, or countries, are included in the data set unchanged.

The following examples help illustrate the transformations:

On the left, the targeting selection by the advertiser; on the right, the transformation found in the data set. You will see that addresses, pindrops (longitude and latitude), and places are replaced by the redacted text <address>, <location>, and <place>, respectively.

  1. Seattle + 5 miles —> Seattle (+5 miles)
  2. 1 North Almaden Blvd, San Jose + 5 miles —> <address> San Jose (+5 miles)
  3. 95110, San Jose —> 95110, San Jose (so no change)
  4. 95110, San Jose + 5 miles —> 95110, San Jose (+5 miles)
  5. 37.335080; -121.895480 + 5 miles —> <location> San Jose (+5 miles)
  6. Acme Park, San Jose (+ 1 mile) —> <place> San Jose (+1 mile)
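
As a rough illustration of how a researcher might work with the transformed values, the following Python sketch splits a location string into its redaction tag (if any), the remaining area text, and the radius. It assumes the values are stored as plain strings shaped exactly like the right-hand side of the examples above; the actual column layout in the data set may differ, and the names LocationTarget and parse_location are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class LocationTarget(NamedTuple):
    redaction: Optional[str]  # "address", "location", "place", or None
    area: str                 # remaining text, e.g. "Seattle" or "95110, San Jose"
    radius: Optional[float]   # numeric radius, if one was specified
    unit: Optional[str]       # unit as written, e.g. "miles" or "mile"


# Assumed shape: optional "<tag>", then the area, then an optional "(+N unit)".
_PATTERN = re.compile(
    r"^(?:<(?P<redaction>address|location|place)>\s*)?"
    r"(?P<area>.*?)"
    r"(?:\s*\(\+\s*(?P<radius>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)\))?$"
)


def parse_location(value: str) -> LocationTarget:
    """Split one transformed location string into its parts."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"Unrecognized location format: {value!r}")
    radius = match.group("radius")
    return LocationTarget(
        redaction=match.group("redaction"),
        area=match.group("area").strip(),
        radius=float(radius) if radius is not None else None,
        unit=match.group("unit"),
    )


# The six transformed values from the examples above.
examples = [
    "Seattle (+5 miles)",
    "<address> San Jose (+5 miles)",
    "95110, San Jose",
    "95110, San Jose (+5 miles)",
    "<location> San Jose (+5 miles)",
    "<place> San Jose (+1 mile)",
]
for text in examples:
    print(parse_location(text))
```

Keeping the redaction tag as its own field makes it easy to count, for instance, how many ads targeted a radius around a redacted address versus a whole city.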

Joining targeting data with Ad Library data

If researchers want more information about an ad (its creative, spend, etc.) and want to map the ad to its targeting information, they can perform a join between the ad_archive_id column of the ad_archive_api table (Ad Library data) and the archive_id column of the ad_library_targeting table (Ad Targeting data).
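
The join described above might look like the following minimal sketch. It uses Python's built-in sqlite3 module with small in-memory stand-ins for the two tables, so it can be run anywhere; how tables are actually queried on the FORT platform may differ. The table names and the two key columns come from the text above, page_name and spend are Ad Library fields, and the location column and all row values are made up for illustration.

```python
import sqlite3

# In-memory stand-ins for the two tables (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ad_archive_api (
        ad_archive_id TEXT PRIMARY KEY,
        page_name     TEXT,
        spend         TEXT
    );
    -- "location" is a hypothetical targeting column used for this sketch.
    CREATE TABLE ad_library_targeting (
        archive_id TEXT,
        location   TEXT
    );
    INSERT INTO ad_archive_api VALUES
        ('1001', 'Example Page A', '100-499'),
        ('1002', 'Example Page B', '1K-5K');
    INSERT INTO ad_library_targeting VALUES
        ('1001', 'Seattle (+5 miles)'),
        ('1002', '95110, San Jose');
    """
)

# Inner join: one row per ad that appears in both tables.
joined_rows = conn.execute(
    """
    SELECT a.ad_archive_id, a.page_name, a.spend, t.location
    FROM ad_archive_api AS a
    JOIN ad_library_targeting AS t
      ON a.ad_archive_id = t.archive_id
    ORDER BY a.ad_archive_id
    """
).fetchall()

for row in joined_rows:
    print(row)
```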

However, you may see inconsistencies between ad_archive_api and ad_library_targeting tables:

If you perform a join between the two tables, you will see that there are ads in the targeting table that do not have a corresponding entry in the ad library table (or vice versa).

This happens because an ad can be classified as political or nonpolitical long after it has run. When this happens, the Ad Library is updated. But because the targeting data set is a one-time data release that occurred on January 22, 2021, the change is not reflected in the targeting data.

This doesn’t happen often. For example, when we made the one-time data release of targeting data on January 22, 2021, we found nine ads (out of roughly 1.3 million) that were in the targeting data set but not in the Ad Library because they were later found to have been incorrectly labeled as political ads and were removed from the library. Because the targeting data set had already been generated by then, these nine ads remain in it.
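
One way to check for such mismatches is an anti-join in each direction, sketched below with Python's built-in sqlite3 module and small in-memory stand-ins for the two tables. Only the table names and key columns come from the text above; the helper find_unmatched and the sample IDs are hypothetical, and the query mechanics on the FORT platform may differ.

```python
import sqlite3


def find_unmatched(conn: sqlite3.Connection):
    """Return (IDs only in the targeting table, IDs only in the Ad Library table)."""
    targeting_only = [
        row[0]
        for row in conn.execute(
            """
            SELECT t.archive_id
            FROM ad_library_targeting AS t
            LEFT JOIN ad_archive_api AS a
              ON a.ad_archive_id = t.archive_id
            WHERE a.ad_archive_id IS NULL
            ORDER BY t.archive_id
            """
        )
    ]
    library_only = [
        row[0]
        for row in conn.execute(
            """
            SELECT a.ad_archive_id
            FROM ad_archive_api AS a
            LEFT JOIN ad_library_targeting AS t
              ON t.archive_id = a.ad_archive_id
            WHERE t.archive_id IS NULL
            ORDER BY a.ad_archive_id
            """
        )
    ]
    return targeting_only, library_only


# Illustrative data: ad 1003 exists only in the targeting table,
# and ad 1004 exists only in the Ad Library table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ad_archive_api (ad_archive_id TEXT PRIMARY KEY);
    CREATE TABLE ad_library_targeting (archive_id TEXT);
    INSERT INTO ad_archive_api VALUES ('1001'), ('1002'), ('1004');
    INSERT INTO ad_library_targeting VALUES ('1001'), ('1002'), ('1003');
    """
)

targeting_only, library_only = find_unmatched(conn)
print("Only in targeting data:", targeting_only)
print("Only in Ad Library data:", library_only)
```

Reporting the size of both lists alongside any analysis makes it clear how many ads were dropped by the join.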

Over the next few weeks, we will monitor this situation, and if we notice a large volume of ads where such an issue exists, then we will evaluate whether to update this data set with these new ads.

Ad Library data set

The Ad Library data set contains the following fields:

  • ad_archive_id: ID for the archived ad object
  • ad_creation_time: The UTC date and time when someone created the ad. This is not the same time as when the ad ran. Includes date and time separated by T. Example: 2019-01-24T19:02:04+0000, where +0000 is the UTC offset.
  • ad_creative_body: The text that displays in the ad. Typically 90 characters. See Reference, Ad Creative.
  • ad_creative_link_caption: If an ad contains a link, the text that appears in the link
  • ad_creative_link_description: If an ad contains a link, any text description that appears next to the link, such as a caption or description
  • ad_creative_link_title: If an ad contains a link, any title provided
  • ad_delivery_start_time: Date and time when an advertiser wants Facebook to start delivering any of the ads. Provided in UTC as in ad_creation_time
  • ad_delivery_stop_time: The time when an advertiser wants to stop delivery of their ad. If this is blank, Facebook runs the ad until the advertiser stops it or they spend their entire campaign budget. In UTC.
  • ad_snapshot_url: String with URL link which displays the archived ad
  • currency: The currency used to pay for the ad, as an ISO currency code
  • impressions: A string containing the number of times the ad created an impression. In ranges of <1000, 1K-5K, 5K-10K, 10K-50K, 50K-100K, 100K-200K, 200K-500K, >1M.
  • demographic_distribution: The demographic distribution of people reached by the ad. Provided as age ranges and gender:
    • Age ranges can be one of the following: 18-24, 25-34, 35-44, 45-54, 55-64, 65+
    • Gender can be any of the following strings: “Male”, “Female”, “Unknown”
  • funding_entity: A string containing the name of the person, company, or entity that provided funding for the ad. Provided by the purchaser of the ad.
  • page_id: ID of the Facebook Page that ran the ad
  • page_name: Name of the Facebook Page that ran the ad
  • region_distribution: Regional distribution of people reached by the ad. Provided as a percentage and where regions are at a subcountry level.
  • spend: A string showing the amount of money spent running the ad as specified in currency. This is reported in ranges: <100, 100-499, 500-999, 1K-5K, 5K-10K, 10K-50K, 50K-100K, 100K-200K, 200K-500K, >1M.
  • is_active: Binary; describes whether an ad is active
  • reached_countries: Facebook delivered the ads in these countries. Provided as ISO country codes.
  • publisher_platforms: The platforms on which the ad appeared, such as Instagram or Facebook. May contain one platform or a comma-separated list of platforms.
  • potential_reach: This is an estimate of the size of the audience that’s eligible to see this ad. It’s based on targeting criteria, ad placements, and how many people were shown ads on Facebook apps and services in the past 30 days. This is not an estimate of how many people will actually see this ad, and the number may change over time. It isn’t designed to match population or census estimates.
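
Several of the fields above are strings that usually need parsing before analysis. The following Python sketch shows one possible approach for the timestamp format given for ad_creation_time and for the range strings used by impressions and spend. The helper names parse_ad_time and parse_range are hypothetical, and whether each range bound is inclusive or exclusive is not stated above, so the bounds are returned as written.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def parse_ad_time(value: str) -> datetime:
    """Parse a timestamp such as 2019-01-24T19:02:04+0000 into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")


_SUFFIX = {"K": 1_000, "M": 1_000_000}


def _to_number(token: str) -> int:
    """Convert tokens like '1000', '5K', or '1M' to an integer."""
    token = token.strip().upper()
    if token and token[-1] in _SUFFIX:
        return int(float(token[:-1]) * _SUFFIX[token[-1]])
    return int(token)


def parse_range(value: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (lower, upper) for a range string; None marks an open end.

    Assumption: bounds are returned exactly as written, without deciding
    whether they are inclusive or exclusive.
    """
    value = value.strip()
    if value.startswith("<"):
        return (None, _to_number(value[1:]))
    if value.startswith(">"):
        return (_to_number(value[1:]), None)
    low, high = value.split("-", 1)
    return (_to_number(low), _to_number(high))


created = parse_ad_time("2019-01-24T19:02:04+0000")
print(created.astimezone(timezone.utc).isoformat())

for text in ["<1000", "1K-5K", "100-499", ">1M"]:
    print(text, parse_range(text))
```

Returning open-ended ranges as None rather than as an arbitrary large number keeps later aggregation from silently treating ">1M" as a fixed value.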

About the Facebook Open Research and Transparency platform

The Facebook Open Research and Transparency (FORT) platform facilitates responsible research by providing flexible access to valuable data. The platform is built with validated privacy and security protections, such as data access controls, and has been penetration-tested by internal and external experts.

The FORT platform runs on a configured version of JupyterHub, an open source tool that is widely used by the academic community. Hosted on Amazon Web Services on servers in Ireland, the FORT platform supports multiple standard programs, including SQL, Python, and R, and a specialized bridge to specific Facebook Graph APIs.

Publication guidelines

Researchers may publish research conducted using this data without Facebook’s permission. Note that the terms of the Facebook Research Data Agreement require researchers to submit publications of any kind to Facebook for a privacy review at least 30 days prior to publication.

The post Introducing new election-related ad data sets for researchers appeared first on Facebook Research.
