Tag
Introduces Picotron, a clean-room rewrite of Nanotron that removes the mandatory GPU-specific dependencies, so LLM training can run on older GPUs such as the T4 and V100. It defaults to PyTorch's standard scaled dot-product attention (SDPA) and can optionally use FlashAttention-2, selected at runtime.
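
A runtime choice between SDPA and FlashAttention-2 might look roughly like the sketch below. This is not Picotron's actual code: the function names (`pick_attention_backend`, `attention`), the `prefer_flash` flag, and the selection rules are all hypothetical. The sketch assumes FlashAttention-2 needs a CUDA device with compute capability 8.0 or higher and fp16/bf16 inputs, which is why a T4 (7.5) or V100 (7.0) falls back to SDPA.

```python
import importlib.util


def flash_attn_available() -> bool:
    """Return True if the optional flash_attn package is installed."""
    return importlib.util.find_spec("flash_attn") is not None


def pick_attention_backend(prefer_flash, on_cuda, half_precision, capability):
    """Hypothetical selection rule: "flash_attn" only when requested and usable.

    capability is a (major, minor) CUDA compute capability tuple,
    e.g. (7, 5) for a T4 or (7, 0) for a V100.
    """
    usable = on_cuda and half_precision and capability >= (8, 0)
    if prefer_flash and usable and flash_attn_available():
        return "flash_attn"
    return "sdpa"


def attention(q, k, v, causal=True, prefer_flash=False):
    """Causal attention over q, k, v shaped (batch, heads, seq_len, head_dim)."""
    # torch is imported lazily so the selection logic above stays importable
    # on machines without a GPU stack.
    import torch
    import torch.nn.functional as F

    on_cuda = q.is_cuda
    capability = torch.cuda.get_device_capability(q.device) if on_cuda else (0, 0)
    half = q.dtype in (torch.float16, torch.bfloat16)

    if pick_attention_backend(prefer_flash, on_cuda, half, capability) == "flash_attn":
        from flash_attn import flash_attn_func

        # flash_attn_func expects (batch, seq_len, heads, head_dim).
        out = flash_attn_func(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), causal=causal
        )
        return out.transpose(1, 2)

    # Default path: PyTorch's built-in scaled dot-product attention.
    return F.scaled_dot_product_attention(q, k, v, is_causal=causal)
```

With this shape, `flash_attn` is never imported unless it is both requested and usable, so it stays an optional dependency and the same training script runs unchanged on older cards.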