Language models are few-shot learners
Summary
OpenAI introduces GPT-3, a 175-billion-parameter autoregressive language model that shows strong few-shot performance across diverse NLP tasks without gradient updates or fine-tuning. Tasks and demonstrations are specified purely through text interaction with the model, which changes how a language model can be applied to new tasks.
Similar Articles
Better language models and their implications
OpenAI introduces GPT-2, a 1.5-billion-parameter transformer-based language model trained on 40GB of internet text that achieves state-of-the-art performance on language modeling benchmarks and demonstrates zero-shot capabilities in reading comprehension, translation, question answering, and summarization. Due to safety concerns, only a smaller model and a technical paper are released publicly rather than the full trained model.
First look at GPT-5
OpenAI provides a first look at GPT-5, representing a major advancement in large language models with potential paradigm-shifting capabilities.
Aligning language models to follow instructions
OpenAI introduces InstructGPT, a GPT-3 variant fine-tuned using reinforcement learning from human feedback (RLHF) to better follow instructions and reduce harmful outputs. Human evaluators prefer outputs from a 1.3B InstructGPT model over those from a 175B GPT-3 model, and InstructGPT models are now the default on OpenAI's API.
GPT-4
OpenAI releases GPT-4, a large multimodal model that accepts image and text inputs and demonstrates human-level performance on professional and academic benchmarks, significantly outperforming GPT-3.5 across various evaluation metrics.
Meta-Tool: Efficient Few-Shot Tool Adaptation for Small Language Models
An independent study shows that a 227M-parameter hypernetwork adds zero gain over well-crafted few-shot prompts for tool use in a 3B Llama model, achieving 79.7% of GPT-5 performance at 10× lower latency.