An empirical analysis of compute-optimal large language model training

We ask the question: “What is the optimal model size and number of training tokens for a given compute budget?” To answer this question, we train models of various sizes and with various numbers of tokens, and estimate this trade-off empirically. Our main finding is that the current large language models are far too large for their compute budget and are not being trained on enough data.Read More

An empirical analysis of compute-optimal large language model training

We ask the question: “What is the optimal model size and number of training tokens for a given compute budget?” To answer this question, we train models of various sizes and with various numbers of tokens, and estimate this trade-off empirically. Our main finding is that the current large language models are far too large for their compute budget and are not being trained on enough data.Read More

GopherCite: Teaching language models to support answers with verified quotes

Language models like Gopher can “hallucinate” facts that appear plausible but are actually fake. Those who are familiar with this problem know to do their own fact-checking, rather than trusting what language models say. Those who are not, may end up believing something that isn’t true. This paper describes GopherCite, a model which aims to address the problem of language model hallucination. GopherCite attempts to back up all of its factual claims with evidence from the web.Read More

GopherCite: Teaching language models to support answers with verified quotes

Language models like Gopher can “hallucinate” facts that appear plausible but are actually fake. Those who are familiar with this problem know to do their own fact-checking, rather than trusting what language models say. Those who are not, may end up believing something that isn’t true. This paper describes GopherCite, a model which aims to address the problem of language model hallucination. GopherCite attempts to back up all of its factual claims with evidence from the web.Read More

Predicting the past with Ithaca

The birth of human writing marked the dawn of History and is crucial to our understanding of past civilisations and the world we live in today. For example, more than 2,500 years ago, the Greeks began writing on stone, pottery, and metal to document everything from leases and laws to calendars and oracles, giving a detailed insight into the Mediterranean region. Unfortunately, it’s an incomplete record. Many of the surviving inscriptions have been damaged over the centuries or moved from their original location. In addition, modern dating techniques, such as radiocarbon dating, cannot be used on these materials, making inscriptions difficult and time-consuming to interpret.Read More