Trained transformer-based chess models to play like humans (including thinking time) [P]
Summary
Trained transformer-based chess models for rating buckets from 800 to 2500+, predicting moves, thinking time, and game outcome. The models achieve strong move-matching accuracy with only 9M parameters and include a novel thinking-time prediction component.
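The multi-task setup described above (one trunk feeding separate heads for move, thinking time, and outcome) can be sketched roughly as follows. This is a toy illustration, not the project's actual architecture: the hidden size, move-vocabulary size, and random weights are all placeholders for a trained transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_MOVES = 64, 1968  # hidden size and move-vocabulary size (illustrative)

# Frozen random weights standing in for trained prediction heads.
W_move = rng.normal(0, 0.02, (D, N_MOVES))  # move-prediction head
W_time = rng.normal(0, 0.02, (D, 1))        # thinking-time regression head
W_out = rng.normal(0, 0.02, (D, 3))         # win/draw/loss head

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def heads(h):
    """Map a trunk representation h (shape [D]) to the three predictions."""
    move_probs = softmax(h @ W_move)          # distribution over legal-move vocab
    think_time = float(np.exp(h @ W_time))    # seconds; exp keeps it positive
    outcome_probs = softmax(h @ W_out)        # P(win), P(draw), P(loss)
    return move_probs, think_time, outcome_probs

h = rng.normal(0, 1, D)  # stand-in for the transformer's position representation
moves, t, outcome = heads(h)
```

Each head shares the trunk's representation, so the model learns move choice, time spent, and expected result jointly, which is what lets one small network imitate a whole rating bucket's behavior.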
Similar Articles
Transformer Math Explorer [P]
This interactive tool visualizes the mathematical underpinnings of transformer models through dataflow graphs, covering architectures from GPT-2 to Qwen 3.6 and various attention mechanisms.
What to expect from AlphaZero's value predictions [D]
The article analyzes how AlphaZero's value predictions are shaped by self-play training data and noise, questioning whether they reliably estimate win chances against opponents with different play styles despite AlphaZero's strong empirical performance.
tencent/HY-Embodied-0.5
Tencent releases HY-Embodied-0.5, a suite of foundation models for embodied AI agents, built on a Mixture-of-Transformers (MoT) architecture with an efficient 2B variant and a more powerful 32B variant for real-world robot control and spatio-temporal reasoning.
Optimizing Transformer model size & inference beyond FP16 + ONNX (pruning/graph opt didn’t help much) [P]
Author shares experience hitting diminishing returns with FP16, ONNX, and pruning on a 162 MB transformer, and seeks advice on the next best step among quantization, distillation, low-rank factorization, or hardware-specific tricks.
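Of the candidate next steps listed, low-rank factorization is the easiest to sketch in isolation. This toy example (matrix size and rank are illustrative, not taken from the post) shows the parameter saving from replacing one dense weight with a truncated-SVD pair of smaller matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(768, 768))  # a dense layer weight (illustrative size)

def low_rank(W, rank):
    # Truncated SVD: W ≈ A @ B, replacing one d×d matmul with two rank-r ones.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # shape (d, r)
    B = Vt[:rank, :]            # shape (r, d)
    return A, B

A, B = low_rank(W, rank=64)
params_before = W.size              # 768 * 768 = 589,824
params_after = A.size + B.size      # 2 * 768 * 64 = 98,304
```

Whether the accuracy loss is acceptable depends on how low the rank can go for each layer, which is why it is usually applied selectively (e.g. to feed-forward weights) and followed by a short fine-tune.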
@berryxia: Small model, big wisdom? It's now real! A 7B small model now acts as the boss of top large models like GPT-5, Claude Sonnet 4, Gemini 2.5 Pro. A new paper shows an RL-trained 7B model learned to write natural language subtasks, assign them to different models, precisely...
A new paper proposes training a 7B small model via reinforcement learning as a task scheduler, automatically decomposing subtasks and assigning them to top models like GPT-5 and Claude. It surpasses individual frontier models on several hard benchmarks, demonstrating that end-to-end reward learning can effectively replace manual prompt engineering and multi-agent pipeline design.