Tag
This theoretical paper analyzes the expressivity of padded transformers. It shows that attention type, width, and uniformity have little impact on expressive power, whereas numeric precision and model depth matter more. It establishes equivalences between transformer variants and circuit complexity classes such as AC0 and TC0, giving a characterization of expressivity that is robust to these architectural choices.