Discovering types for entity disambiguation

OpenAI Blog Papers

Summary

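OpenAI researchers present an approach to entity disambiguation based on type discovery: a neural network predicts whether an ambiguous mention belongs to each category in a pre-chosen set of types, and those predictions are used to resolve which entity is meant. The method achieves state-of-the-art results on several entity disambiguation datasets and runs in O(N) time in document length, because candidate entities are found by trie lookup and ranked by Wikipedia link frequency weighted by the type classifier, instead of by an O(N^2) pairwise coherence metric.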

We’ve built a system for automatically figuring out which object is meant by a word by having a neural network decide whether the word belongs to each of about 100 automatically discovered “types” (non-exclusive categories).

Cached at: 04/20/26, 02:56 PM

# Discovering types for entity disambiguation

Source: [https://openai.com/index/discovering-types-for-entity-disambiguation/](https://openai.com/index/discovering-types-for-entity-disambiguation/)

For example, given a sentence like “the prey saw the jaguar cross the jungle”, rather than trying to reason directly about whether jaguar means the car, the animal, or something else, the system plays “20 questions” with a pre-chosen set of categories. This approach substantially improves on the state of the art on several entity disambiguation datasets.

One of our type systems, discovered by beam search, includes types such as `Aviation`, `Clothing`, and `Games` (as well as surprisingly specific ones like `1754 in Canada`, indicating that 1754 was an exciting year in the dataset of 1,000 Wikipedia articles it was trained on); you can also view the [full](https://cdn.openai.com/discovering-types-for-entity-disambiguation/greedy.txt) type system.

Using the top solution from our type system optimization, we can now label data from Wikipedia with labels generated by the type system. Using this data (in our experiments, 400M tokens each for English and French), we train a [bidirectional LSTM](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) to independently predict all the type memberships for each word. On the Wikipedia source text we only have supervision on intra-wiki links; however, this is sufficient to train a deep neural network to predict type membership with an [F1](https://en.wikipedia.org/wiki/F1_score) of over 0.91.

Predicting entities in a document usually relies on a “coherence” metric between different entities, e.g., measuring how well each pair of entities fits together, which is `O(N^2)` in the length of the document. Instead, our runtime is `O(N)`, as we need only look up each phrase in a trie that maps phrases to their possible meanings.
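The linear-time lookup can be illustrated with a minimal sketch, assuming whitespace tokenization and greedy longest-match scanning (the article does not say how overlapping phrases are resolved). The class `PhraseTrie`, its method `find_mentions`, and the entity names are hypothetical. Each start position walks at most as many trie steps as the longest stored phrase, so the scan is linear in document length when phrase length is bounded.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a token-level trie; `candidates` is non-empty where a phrase ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.candidates: List[str] = []


class PhraseTrie:
    """Hypothetical phrase -> candidate-entity index, for illustration only."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, phrase: str, entities: List[str]) -> None:
        node = self.root
        for token in phrase.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.candidates.extend(entities)

    def find_mentions(self, tokens: List[str]) -> List[Tuple[int, int, List[str]]]:
        """Greedy longest-match scan returning (start, end, candidates) spans."""
        lowered = [t.lower() for t in tokens]
        mentions: List[Tuple[int, int, List[str]]] = []
        i = 0
        while i < len(lowered):
            node = self.root
            best: Optional[Tuple[int, int, List[str]]] = None
            j = i
            # Walk the trie as far as the tokens allow, remembering the longest phrase end.
            while j < len(lowered) and lowered[j] in node.children:
                node = node.children[lowered[j]]
                j += 1
                if node.candidates:
                    best = (i, j, list(node.candidates))
            if best is not None:
                mentions.append(best)
                i = best[1]  # resume scanning after the matched phrase
            else:
                i += 1
        return mentions


trie = PhraseTrie()
trie.add("jaguar", ["Jaguar (animal)", "Jaguar Cars"])
trie.add("jaguar cars", ["Jaguar Cars"])
print(trie.find_mentions("the prey saw the jaguar cross the jungle".split()))
# [(4, 5, ['Jaguar (animal)', 'Jaguar Cars'])]
```

The trie only proposes candidate meanings; choosing among them is left to the ranking step.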
We rank each of the possible entities according to the link frequency seen in Wikipedia, refined by weighting each entity by its likelihood under the type classifier. New entities can be added just by specifying their type memberships (person, animal, country of origin, time period, etc.).
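The ranking step can be sketched as follows. As our own simplification, not the article's exact formula, this assumes the classifier's per-type probabilities are independent, so an entity's type likelihood is the product of `p` for each type it belongs to and `1 - p` for each type it does not; that likelihood multiplies an add-one-smoothed link-frequency prior. The functions `type_likelihood` and `rank_candidates`, along with the type names, counts, and probabilities, are all hypothetical. The last candidate shows a new entity with no link history, added just by listing its types.

```python
from typing import Dict, List, Set


def type_likelihood(entity_types: Set[str], type_probs: Dict[str, float]) -> float:
    """Likelihood of an entity's type memberships under per-type probabilities.

    Assumes independent Bernoulli outputs per type (an illustrative
    simplification, not the article's exact scoring rule).
    """
    likelihood = 1.0
    for type_name, p in type_probs.items():
        likelihood *= p if type_name in entity_types else 1.0 - p
    return likelihood


def rank_candidates(
    candidates: List[str],
    link_counts: Dict[str, int],
    entity_types: Dict[str, Set[str]],
    type_probs: Dict[str, float],
) -> List[str]:
    """Rank candidates by smoothed link-frequency prior times type likelihood."""
    total = sum(link_counts.get(c, 0) for c in candidates) + len(candidates)
    scores = {}
    for c in candidates:
        # Add-one smoothing so an entity with no Wikipedia links keeps a nonzero prior.
        prior = (link_counts.get(c, 0) + 1) / total
        scores[c] = prior * type_likelihood(entity_types[c], type_probs)
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


# Hypothetical data for the mention "jaguar" in "the prey saw the jaguar cross the jungle".
link_counts = {"Jaguar Cars": 700, "Jaguar (animal)": 300}
entity_types = {
    "Jaguar Cars": {"vehicle"},
    "Jaguar (animal)": {"animal"},
    # A new entity is added only by listing its type memberships (no link history).
    "Jaguar (band)": {"music"},
}
# Hypothetical classifier output for this mention in this context.
type_probs = {"animal": 0.9, "vehicle": 0.05, "music": 0.02}

ranking = rank_candidates(list(entity_types), link_counts, entity_types, type_probs)
print(ranking)
# ['Jaguar (animal)', 'Jaguar Cars', 'Jaguar (band)']
```

In this toy example the link prior alone favors the car, so it is the type evidence from context (a high `animal` probability) that flips the ranking.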
