A thread from Ai2 compares a transformer (Olmo 3) with a hybrid model (Olmo Hybrid) and finds that transformers excel at copying, while RNNs better model meaning-bearing words. The thread presents this as evidence of the growing viability of hybrid architectures.
A token-level study comparing Olmo Hybrid with the Olmo 3 transformer shows that the hybrid model better predicts meaning-bearing tokens such as nouns and verbs, while the transformer is better at copying tokens from the input.
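
A token-level comparison of this kind can be pictured as grouping per-token losses by token category and averaging the gap between the two models. The sketch below is only an illustration of that idea, not the study's actual method: the function name, the category labels ("content", "copied"), and all loss values are hypothetical and made up for the example.

```python
from collections import defaultdict


def mean_loss_gap_by_category(tokens):
    """Average (transformer_loss - hybrid_loss) per token category.

    `tokens` is an iterable of (category, transformer_loss, hybrid_loss)
    tuples. A positive gap means the hybrid model had lower loss on that
    category; a negative gap means the transformer did.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, transformer_loss, hybrid_loss in tokens:
        sums[category] += transformer_loss - hybrid_loss
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


# Hypothetical per-token losses, invented purely for illustration.
example = [
    ("content", 2.0, 1.5),   # e.g. a noun or verb
    ("content", 3.0, 2.5),
    ("copied", 0.5, 1.0),    # e.g. a token repeated from the input
    ("copied", 0.25, 0.75),
]

gaps = mean_loss_gap_by_category(example)
for category, gap in sorted(gaps.items()):
    print(f"{category}: {gap:+.2f}")
```

With real data, the categories would come from whatever labeling the analysis uses (for instance part-of-speech tags or a copy detector), and the losses from evaluating both models on the same text.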