Trained transformer-based chess models to play like humans (including thinking time) [P]

Reddit r/MachineLearning Models

Summary

Trained transformer-based chess models for rating buckets from 800 to 2500+ that predict moves, thinking time, and game outcome. The models achieve strong accuracy with only 9M parameters and include a novel thinking-time prediction component.

I trained a set of deep learning (transformer-based) chess models to play like humans (inspired by MAIA and Grandmaster Chess Without Search). There's a separate model for each 100-point rating bucket from ~800 to 2500+. I started by training a mid-strength model from scratch on an 8xH100 cluster, then fine-tuned models for the other rating ranges on my local 5090 GPU. The total training set was nearly a year of Lichess data, about 1B games.

Each rating range actually has 3 models: a move model, a thinking-time model, and a white win / draw / black win model. Despite being quite small (only 9M parameters!), the move models achieve better accuracy than MAIA-2 and are approximately on par with MAIA-3 (see [here](https://github.com/thomasj02/1e4_ai/blob/master/experiments/maia2_benchmark/RESULTS.md) for the MAIA-2 comparison). AFAIK this is the only attempt to train on thinking times in chess, so I don't have a benchmark to compare against for that.

Likely because of the network size, at high ratings the models aren't quite as good as they could be. They see short tactical motifs but can't do deep calculation - probably a bigger model would help here.

The move and win models take into account player ratings and clock times. For instance, under extreme time pressure a much stronger player has a lower win probability even if their opponent is weaker. The models blunder more under time pressure as well.

The data pipeline is C++ via nanobind, then training with PyTorch. Getting this right was actually the thing I spent the most time on. Pre-shuffling the dataset and then being able to read the shuffled dataset sequentially at training time kept GPU utilization high. Without this, a huge percentage of time went to I/O while the GPU sat idle.

Happy to answer questions about the rating-conditioning, the clock model, or the data pipeline. Code (including training code and model weights) is at [https://github.com/thomasj02/1e4_ai/](https://github.com/thomasj02/1e4_ai/).
A demo is at [https://1e4.ai/](https://1e4.ai/) but all the frontend code is also in the repo if you want to self-host.
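The pre-shuffle-then-read-sequentially trick described above can be illustrated with a minimal Python sketch. This is not the repo's actual pipeline (which is C++ via nanobind); the record format and helper names here are hypothetical. The idea is just that the expensive random permutation happens once offline, so the training loop only ever issues large sequential reads that OS readahead can service without stalling the GPU:

```python
import os
import random
import struct
import tempfile

RECORD_SIZE = 8  # bytes per training record in this toy example


def preshuffle(records, path, seed=0):
    """Shuffle records once, offline, and write them back-to-back to disk.

    Training can then stream the file front-to-back instead of seeking
    randomly, which is what keeps I/O from bottlenecking the GPU.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    with open(path, "wb") as f:
        for r in shuffled:
            f.write(struct.pack("<q", r))
    return shuffled


def sequential_reader(path):
    """Yield records in on-disk order; purely sequential reads."""
    with open(path, "rb") as f:
        while chunk := f.read(RECORD_SIZE):
            yield struct.unpack("<q", chunk)[0]


path = os.path.join(tempfile.mkdtemp(), "shuffled.bin")
order = preshuffle(range(10), path, seed=42)
streamed = list(sequential_reader(path))
assert streamed == order  # the stream replays the pre-shuffled order
```

In a real PyTorch setup this reader would typically sit behind an `IterableDataset`, with each epoch either reusing the same pre-shuffled file or picking from several files shuffled with different seeds.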
Original Article

Similar Articles

Transformer Math Explorer [P]

Reddit r/MachineLearning

This interactive tool visualizes the mathematical underpinnings of transformer models through dataflow graphs, covering architectures from GPT-2 to Qwen 3.6 and various attention mechanisms.

What to expect from AlphaZero's value predictions [D]

Reddit r/MachineLearning

The article analyzes how AlphaZero's value predictions are shaped by self-play training data and noise, questioning whether they reliably estimate win chances against opponents with different play styles despite AlphaZero's strong empirical performance.

tencent/HY-Embodied-0.5

Hugging Face Models Trending

Tencent releases HY-Embodied-0.5, a suite of foundation models designed for embodied AI agents featuring a Mixture-of-Transformers (MoT) architecture with efficient 2B and powerful 32B variants for real-world robot control and spatial-temporal reasoning.

@berryxia: Small model, big wisdom? It's now real! A 7B model now orchestrates top large models like GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro. A new paper shows an RL-trained 7B model learned to write natural-language subtasks, assign them to different models, precisely...

X AI KOLs Timeline

A new paper proposes training a 7B small model via reinforcement learning as a task scheduler, automatically decomposing subtasks and assigning them to top models like GPT-5 and Claude. It surpasses individual frontier models on several hard benchmarks, demonstrating that end-to-end reward learning can effectively replace manual prompt engineering and multi-agent pipeline design.