cross-architecture

Pattern Selectivity is Not Task-Causal Structure: A Cross-Architecture Mechanistic Study of Composed-Task Circuits in 1B-Class Language Models

arXiv cs.LG · 2026-06-05

This paper tests whether the standard recipe for identifying attention-head circuits, task-pattern selectivity combined with causal ablation, yields consistent mechanistic claims across different 1B-class language model families (Pythia, OLMo, OLMoE). It finds that no two (task, model) cells share the same primary causal screen, and it introduces a five-category taxonomy of screen outcomes. The MoE model shows a distinct previous-token positional substrate.

Cross-Tokenizer LLM Distillation through a Byte-Level Interface

Hugging Face Daily Papers · 2026-04-13

This paper proposes Byte-Level Distillation (BLD), a simple method for cross-tokenizer knowledge transfer in language models by operating at a shared byte-level interface, achieving competitive or superior performance compared to more complex existing approaches across 1B-8B parameter models.
