Named entity disambiguation (NED) is the process of mapping “strings” to “things” in a knowledge base. You have likely already used a system that relies on NED several times today. Every time you ask your personal assistant a question or issue a search query in your favorite browser, these systems use NED to understand which people, places, and things (entities) are being talked about.
Take the example shown above. Suppose you ask your personal assistant, “What is the average gas mileage of a Lincoln?” The assistant needs NED to know that “Lincoln” refers to Lincoln Motors (the car company), not the former president or the city in Nebraska. This ambiguity of mentions in text is what makes NED so challenging: resolving it requires picking up on subtle cues.
NED gets more interesting when we examine the full spectrum of entities shown above, specifically the rarer tail and unseen entities. These are entities that occur infrequently or not at all in training data. Performance over the tail is critical because the majority of entities are rare. In Wikidata, only 13% of entities even have Wikipedia pages as a source of textual information.
Prior approaches to NED use BERT-based systems to memorize textual patterns associated with an entity (e.g., Abraham Lincoln is associated with “president”). As shown above, the SotA BERT-based baseline from Févry does a great job at memorizing patterns over popular entities (it achieves 86 F1 points over all entities). For the rare entities, it does much worse (58 F1 points lower on the tail). One possible solution to better tail performance is to simply train over more data, but this would likely require training over data 1,500x the size of Wikipedia for the model to achieve 60 F1 points over all entities!
In this blog post, we present Bootleg, a self-supervised approach to NED that is better able to handle rare entities.
Tail Disambiguation through NED Reasoning Patterns
The question we are left with is how to disambiguate these rare entities. Our insight is that humans disambiguate entities, including rare ones, by using signals from text as well as from entity relations and types. For example, the sentence “What is the gas mileage of a Lincoln?” requires reasoning that cars, not people or locations, have a gas mileage. The same pattern tells us that the mention of “Bluebird” in “What is the average gas mileage of a Bluebird?” refers to the car, a Nissan Bluebird, not the animal. Our goal in Bootleg is to train a model to reason over entity types and relations and better identify these tail entities.
Through empirical analysis, we found four reasoning patterns for NED, shown and defined in the figure below.
These patterns rely on signals from entities, types, and relations. Luckily, a rare entity’s types and relations are usually not rare themselves: a car that is mentioned only once still shares the “car” type with many frequently mentioned cars. This means we should be able to learn type and relation patterns from our data that can apply to tail entities.
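To make the point about types being less rare than entities concrete, here is a small sketch that counts how often an entity and its type appear in a toy set of labeled mentions. The mention counts, entity names, and the `support` helper are invented for illustration and are not taken from the Bootleg data.

```python
from collections import Counter

# Toy labeled mentions as (entity_id, type) pairs. All counts are made up.
mentions = (
    [("Ford_Mustang", "car")] * 50
    + [("Toyota_Corolla", "car")] * 40
    + [("Nissan_Bluebird", "car")] * 1      # a tail entity: seen once
    + [("Abraham_Lincoln", "person")] * 60
)

entity_counts = Counter(entity for entity, _ in mentions)
type_counts = Counter(entity_type for _, entity_type in mentions)


def support(entity_id, entity_type):
    """Return (examples seen for the entity, examples seen for its type)."""
    return entity_counts[entity_id], type_counts[entity_type]


print(support("Nissan_Bluebird", "car"))  # (1, 91)
```

In this toy data the rare car has a single training example of its own, but its type is backed by 91 examples, which is the kind of signal a type-aware model can draw on.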
Bootleg: A Model for Tail NED
Bootleg takes as input a sentence, determines the possible entity candidates that could be mentioned in the sentence, and outputs the most likely candidates. The core insight that enables Bootleg to better identify rare entities is in how it internally represents entities.
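The candidate step can be pictured as a lookup from a mention string to the entities it might refer to. The sketch below assumes a hand-written alias table with prior counts; `ALIAS_TABLE`, `get_candidates`, `top_k`, and the entity ids are hypothetical names chosen for this example, not Bootleg’s actual data structures or API.

```python
# Hypothetical alias table: lowercased mention -> list of (entity_id, prior count).
# The entries and counts are invented for illustration.
ALIAS_TABLE = {
    "lincoln": [
        ("Abraham_Lincoln", 900),
        ("Lincoln_Nebraska", 300),
        ("Lincoln_Motor_Company", 120),
    ],
    "bluebird": [
        ("Bluebird_(bird)", 400),
        ("Nissan_Bluebird", 15),
    ],
}


def get_candidates(mention, top_k=3):
    """Return up to top_k candidate entity ids, most frequent first."""
    entries = ALIAS_TABLE.get(mention.lower(), [])
    ranked = sorted(entries, key=lambda pair: pair[1], reverse=True)
    return [entity_id for entity_id, _ in ranked[:top_k]]


print(get_candidates("Lincoln"))
print(get_candidates("Bluebird"))
```

Note that a frequency-ranked list puts the popular reading first, so the rare car entity only wins if a later scoring step uses context to overrule the prior.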
Similar to how words are often represented by continuous word embeddings (e.g., BERT or ELMo), Bootleg represents entity candidates as a combination of a unique entity embedding, a type embedding, and a relation embedding, as shown above. For example, each car entity will get the same car type embedding (likewise for relations) which will encode patterns learned over all cars in the training data. A rare car can then use this global “car type” knowledge for disambiguation, as it will have the car embedding as part of its representation.
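As a rough illustration of this representation, the sketch below builds each entity vector by concatenating an entity-specific vector with shared type and relation vectors. Plain concatenation, the random initial values, the dimension `DIM`, and every name in the code are assumptions made for this example; the post only says the three embeddings are combined.

```python
import random

DIM = 4                      # toy embedding size (assumption)
rng = random.Random(0)       # fixed seed so the sketch is repeatable


def new_vector():
    """A stand-in for a learned embedding: DIM random floats."""
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


# Hypothetical metadata: one type and one relation per entity.
ENTITY_TYPE = {"Ford_Mustang": "car", "Nissan_Bluebird": "car", "Abraham_Lincoln": "person"}
ENTITY_RELATION = {"Ford_Mustang": "manufacturer", "Nissan_Bluebird": "manufacturer",
                   "Abraham_Lincoln": "position_held"}

# One vector per entity, but only one per distinct type or relation.
entity_emb = {e: new_vector() for e in ENTITY_TYPE}
type_emb = {t: new_vector() for t in set(ENTITY_TYPE.values())}
relation_emb = {r: new_vector() for r in set(ENTITY_RELATION.values())}


def represent(entity_id):
    """Concatenate the entity, type, and relation vectors for one candidate."""
    return (entity_emb[entity_id]
            + type_emb[ENTITY_TYPE[entity_id]]
            + relation_emb[ENTITY_RELATION[entity_id]])


popular = represent("Ford_Mustang")
rare = represent("Nissan_Bluebird")
# The type slice is identical for both cars, even though the entity slices differ.
print(popular[DIM:2 * DIM] == rare[DIM:2 * DIM])  # True
```

Because the type and relation vectors are shared, anything learned about them from popular cars is automatically part of the rare car’s representation.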
To output the correct entities, Bootleg uses these representations in a stacked Transformer module to allow the model to naturally learn the useful patterns for disambiguation without hard-coded rules. Bootleg then scores the output candidate representations and returns the most likely candidates.
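The final “score and return the most likely candidates” step can be sketched as a softmax over per-candidate scores. This example deliberately replaces the stacked Transformer with a single dot product between a context vector and each candidate vector, so it illustrates only the ranking step; the vectors and the `rank_candidates` helper are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_candidates(context_vec, candidates):
    """Score each candidate by dot product with the context, then normalize.

    candidates maps entity_id -> vector; returns (entity_id, probability)
    pairs sorted from most to least likely.
    """
    ids = list(candidates)
    scores = [sum(c * x for c, x in zip(context_vec, candidates[i])) for i in ids]
    probs = softmax(scores)
    return sorted(zip(ids, probs), key=lambda pair: pair[1], reverse=True)


# Toy vectors: the first dimension loosely stands for "vehicle-like" context.
context = [1.0, 0.0]
candidates = {
    "Nissan_Bluebird": [2.0, 0.1],
    "Bluebird_(bird)": [0.2, 1.5],
}
for entity_id, prob in rank_candidates(context, candidates):
    print(entity_id, round(prob, 3))
```

In the real model the candidate scores come out of learned Transformer layers rather than a fixed dot product, but the last step has the same shape: turn scores into a ranking and return the top entries.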
There are other exciting techniques we present in our paper regarding regularization and weak labeling to improve tail performance.
Bootleg Improves Tail Performance and Allows for Knowledge Transfer
Our simple insight of training a model to reason over types and relations provides state-of-the-art performance on three standard NED benchmarks – matching or exceeding SotA by up to 5.6 F1 points – and outperforms a BERT-based NED baseline by 5.4 F1 points over all entities and 40 F1 points over tail entities (see F1 versus entity occurrence plot above).
We’ll now show how the entity knowledge encoded in Bootleg’s entity representations can transfer to non-NED tasks. We extract our entity representations and use them in both a production task at a major technology company and a relation extraction task. We find that using Bootleg embeddings in the production task provides an 8% lift in performance and even improves quality on Spanish, French, and German. We repeat this experiment by adding Bootleg representations to a SotA model for the TACRED relation extraction task (see tutorial). We find this Bootleg-enhanced model sets a new SotA by 1 F1 point.
These results suggest that Bootleg entity representations can transfer entity knowledge to other language tasks!
To recap, we described the problem of the tail in NED and showed that existing NED systems fall short at disambiguating these rare yet important entities. We then introduced four reasoning patterns for NED and described how we trained Bootleg to learn these patterns through the use of embeddings and Transformer modules. Finally, we showed that Bootleg is a SotA NED system that disambiguates rare entities better than prior methods. Further, Bootleg learns representations that can transfer entity knowledge to non-NED tasks.