OpenAI presents a two-stage approach for improving language understanding: pretraining a transformer model on large unsupervised datasets using language modeling, then fine-tuning on smaller supervised datasets for specific tasks. The method achieves state-of-the-art results across diverse tasks including commonsense reasoning, semantic similarity, and reading comprehension with minimal hyperparameter tuning.
We’ve obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we’re also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training. These results provide a convincing example that pairing supervised learning methods with unsupervised pre-training works very well; this is an idea that many have explored in the past, and we hope our result motivates further research into applying this idea on larger and more diverse datasets.
# Improving language understanding with unsupervised learning
Source: [https://openai.com/index/language-unsupervised/](https://openai.com/index/language-unsupervised/)
Our system works in two stages; first we train a transformer model on a very large amount of data in an unsupervised manner, using language modeling as a training signal, then we fine-tune this model on much smaller supervised datasets to help it solve specific tasks. We developed this approach following our [sentiment neuron](https://openai.com/index/unsupervised-sentiment-neuron/) work, in which we noted that unsupervised learning techniques can yield surprisingly discriminative features when trained on enough data. Here, we wanted to further explore this idea: can we develop one model, train it in an unsupervised way on a large amount of data, and then fine-tune the model to achieve good performance on many different tasks? Our results indicate that this approach works surprisingly well; the same core model can be fine-tuned for very different tasks with minimal adaptation.
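To make the two training signals concrete, here is a minimal sketch in plain Python. It is not the released code: the function names, the list-of-logits inputs, and the use of a plain cross-entropy for the supervised stage are assumptions made for illustration. Stage one scores how well the model predicts each next token; stage two scores how well it predicts a task label.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def language_modeling_loss(logits_per_position, token_ids):
    """Stage 1 signal (hypothetical helper): mean negative log-likelihood
    of each next token.

    logits_per_position[i] holds the model's scores over the vocabulary
    for the token that follows token_ids[i], so its target is token_ids[i + 1].
    """
    targets = token_ids[1:]
    losses = [
        -log_softmax(logits)[target]
        for logits, target in zip(logits_per_position, targets)
    ]
    return sum(losses) / len(losses)


def classification_loss(class_logits, label):
    """Stage 2 signal (hypothetical helper): negative log-likelihood
    of the gold label for one supervised example."""
    return -log_softmax(class_logits)[label]


# A model that knows nothing (uniform scores over a 4-token vocabulary)
# pays log(4) per predicted token.
uniform = [[0.0] * 4, [0.0] * 4]
untrained_loss = language_modeling_loss(uniform, [0, 1, 2])
```

In a real system both losses would presumably be computed from the outputs of the same transformer, with only a small task-specific output layer added for the second stage; the sketch leaves the model itself out and shows only what is being minimized.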
This work builds on the approach introduced in [Semi-supervised Sequence Learning](https://arxiv.org/abs/1511.01432), which showed how to improve document classification performance by using unsupervised pre-training of an LSTM followed by supervised fine-tuning. It also extends [ULMFiT](https://arxiv.org/abs/1801.06146), research that shows how a single dataset-agnostic LSTM language model can be fine-tuned to get state-of-the-art performance on a variety of document classification datasets; our work shows how a Transformer-based model can be used in this approach to succeed at a broader range of tasks beyond document classification, such as commonsense reasoning, semantic similarity, and reading comprehension. It is also similar to but more task-agnostic than [ELMo](https://allennlp.org/elmo), which incorporates pre-training but uses task-customized architectures to get state-of-the-art results on a broad suite of tasks.
Very little tuning was used to achieve our results. All datasets use a single forward language model, without any ensembling, and the majority of the reported results use the exact same hyperparameter settings.
A result we are particularly excited about is the performance of our approach on three datasets designed to test commonsense reasoning and reading comprehension: [COPA](http://people.ict.usc.edu/~gordon/copa.html), [RACE](https://arxiv.org/abs/1704.04683), and [ROCStories](http://cs.rochester.edu/nlp/rocstories/). Our model obtains new state-of-the-art results on these datasets by a wide margin. These datasets are thought to require multi-sentence reasoning and significant world knowledge to solve, suggesting that our model improves these skills predominantly via unsupervised learning. This suggests there’s hope for developing complex language understanding capabilities via unsupervised techniques.