Tag
RASFT is a novel supervised fine-tuning framework for large language models that adapts expert supervision to the model's own reasoning capabilities. It outperforms standard SFT and reinforcement learning methods on mathematical and code reasoning benchmarks.
LongAttnComp adapts AttnComp to long-context reasoning by fine-tuning lightweight cross-attention layers and introducing token-level chunking, a top-p algorithm, positional reordering, and a query parser. It achieves strong performance on long-context tasks such as code debugging and transfers across multiple model families.
Researchers from the University of Edinburgh propose a self-play framework that uses Liquid Haskell for formal verification to train LLMs on semantic equivalence reasoning. They release the OpInstruct-HSx dataset (28k programs) and report accuracy gains of 13.3 pp on EquiBench.