Cached at:
05/08/26, 09:47 AM
# T5Gemma: A new collection of encoder-decoder Gemma models
Source: [https://developers.googleblog.com/en/t5gemma/](https://developers.googleblog.com/en/t5gemma/)
In the rapidly evolving landscape of large language models \(LLMs\), the spotlight has largely focused on the decoder\-only architecture\. While these models have shown impressive capabilities across a wide range of generation tasks, the classic encoder\-decoder architecture, such as T5 \(The Text\-to\-Text Transfer Transformer\), remains a popular choice for many real\-world applications\. Encoder\-decoder models often excel at summarization, translation, QA, and more due to their high inference efficiency, design flexibility, and richer encoder representation for understanding input\. Nevertheless, the powerful encoder\-decoder architecture has received little relative attention\.
Today, we revisit this architecture and introduce[T5Gemma](https://arxiv.org/abs/2504.06225), a new collection of encoder\-decoder LLMs developed by converting pretrained decoder\-only models into the encoder\-decoder architecture through a technique called adaptation\. T5Gemma is based on the Gemma 2 framework, including adapted Gemma 2 2B and 9B models as well as a set of newly trained T5\-sized models \(Small, Base, Large and XL\)\. We are excited to release pretrained and instruction\-tuned T5Gemma models to the community to unlock new opportunities for research and development\.
## From decoder\-only to encoder\-decoder
In T5Gemma, we ask the following question:*can we build top\-tier encoder\-decoder models based on pretrained decoder\-only models?*We answer this question by exploring a technique called*model adaptation*\. The core idea is to initialize the parameters of an encoder\-decoder model using the weights of an already pretrained decoder\-only model, and then further adapt them via UL2 or PrefixLM\-based pre\-training\.

An overview of our approach, showing how we initialize a new encoder\-decoder model using the parameters from a pretrained, decoder\-only model\.
This adaptation method is highly flexible, allowing for creative combinations of model sizes\. For instance, we can pair a large encoder with a small decoder \(e\.g\., a 9B encoder with a 2B decoder\) to create an "unbalanced" model\. This allows us to fine\-tune the quality\-efficiency trade\-off for specific tasks, such as summarization, where a deep understanding of the input is more critical than the complexity of the generated output\.
## Towards better quality\-efficiency trade\-off
*How does T5Gemma perform?*
In our experiments, T5Gemma models achieve comparable or better performance than their decoder\-only Gemma counterparts, nearly dominating the quality\-inference efficiency pareto frontier across several benchmarks, such as SuperGLUE which measures the quality of the learned representation\.

Encoder\-decoder models consistently offer better performance for a given level of inference compute, leading the quality\-efficiency frontier across a range of benchmarks\.
This performance advantage isn't just theoretical; it translates to real\-world quality and speed too\. When measuring the actual latency for GSM8K \(math reasoning\), T5Gemma provided a clear win\. For example, T5Gemma 9B\-9B achieves higher accuracy than Gemma 2 9B but with a similar latency\. Even more impressively, T5Gemma 9B\-2B delivers a significant accuracy boost over the 2B\-2B model, yet its latency is nearly identical to the much smaller Gemma 2 2B model\. Ultimately, these experiments showcase that encoder\-decoder adaptation offers a flexible, powerful way to balance across quality and inference speed\.
## Unlocking foundational and fine\-tuned capabilities
*Could encoder\-decoder LLMs have similar capabilities to decoder\-only models?*
Yes, T5Gemma shows promising capabilities both before and after instruction tuning\.
After pre\-training, T5Gemma achieves impressive gains on complex tasks that require reasoning\. For instance, T5Gemma 9B\-9B scores over 9 points higher on GSM8K \(math reasoning\) and 4 points higher on DROP \(reading comprehension\) than the original Gemma 2 9B model\. This pattern demonstrates that the encoder\-decoder architecture, when initialized via adaptation, has the potential to create a more capable, performant foundational model\.

Detailed results for pretrained models, illustrating how adapted models have significant gains on several reasoning\-intensive benchmarks compared to decoder\-only Gemma 2\.
These foundational improvements from pre\-training set the stage for even more dramatic gains after instruction tuning\. For example, comparing Gemma 2 IT to T5Gemma IT, the performance gap widens significantly across the board\. T5Gemma 2B\-2B IT sees its MMLU score jump by nearly 12 points over the Gemma 2 2B, and its GSM8K score increases from 58\.0% to 70\.7%\. The adapted architecture not only potentially provides a better starting point but also responds more effectively to instruction\-tuning, ultimately leading to a substantially more capable and helpful final model\.

Detailed results for fine\-tuned \+ RLHFed models, illustrating the capabilities of post\-training to significantly amplify the performance advantages of the encoder\-decoder architecture\.
## Explore our models: Releasing T5Gemma checkpoints
We’re very excited to present this new method of building powerful, general purpose encoder\-decoder models by adapting from pretrained decoder\-only LLMs like Gemma 2\. To help accelerate further research and allow the community to build on this work, we are excited to release a suite of our T5Gemma checkpoints\.
The release includes:
- **Multiple Sizes:**Checkpoints for T5\-sized models \(Small, Base, Large, and XL\), the Gemma 2\-based models \(2B and 9B\), as well as an additional model in between T5 Large and T5 XL\.
- **Multiple Variants**: Pretrained and instruction\-tuned models\.
- **Flexible Configurations:**A powerful and efficient unbalanced 9B\-2B checkpoint to explore the trade\-offs between encoder and decoder size\.
- **Different Training Objectives:**Models trained with either PrefixLM or UL2 objectives to provide either state\-of\-the\-art generative performance or representation quality\.
We hope these checkpoints will provide a valuable resource for investigating model architecture, efficiency, and performance\.
## Getting started with T5Gemma
We can't wait to see what you build with T5Gemma\. Please see the following links for more information:
- Learn about the research behind this project by reading[the paper](https://arxiv.org/abs/2504.06225)\.
- Download the models: Find the model weights on[Hugging Face](https://huggingface.co/collections/google/t5gemma-686ba262fe290b881d21ec86)and[Kaggle](https://www.kaggle.com/models/google/t5gemma)\.
- Explore the models capabilities or fine\-tune them for your own use cases with the[Colab notebook](https://github.com/google-gemini/gemma-cookbook/blob/main/Research/%5BT5Gemma%5DExample.ipynb)\.
- Run inference with the models on[Vertex AI](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/t5gemma)\.