@FrancoisChauba1: If you train on (unsorted list, bubble sort procedure, sorted list) traces, you will never test time compute (TTC) your…

X AI KOLs Following 05/26/26, 06:37 PM News

llm training search test-time-compute generalization agi alpha-zero

Summary

A critique arguing that training LLMs on human-generated data limits their ability to discover novel solutions via test-time compute, and that true AGI requires models that can explore hypothesis spaces more broadly, similar to AlphaZero.


Original Article


Cached at: 05/26/26, 08:56 PM

If you train on (unsorted list, bubble sort procedure, sorted list) traces, you will never test time compute (TTC) your way to mergesort.
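To make the contrast concrete: bubble sort only ever compares and swaps adjacent elements, so its traces never contain the split-and-merge step that defines mergesort. Below is a minimal Python sketch of both procedures that also counts comparisons. The function names are illustrative and do not come from the post.

```python
def bubble_sort(xs):
    """Sort by repeated adjacent compare-and-swap passes; return (result, comparisons)."""
    a = list(xs)
    n = len(a)
    comparisons = 0
    for i in range(n - 1):
        swapped = False
        # After pass i, the last i + 1 positions hold their final values.
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break  # a pass with no swaps means the list is already sorted
    return a, comparisons


def merge_sort(xs):
    """Split in half, sort each half recursively, merge; return (result, comparisons)."""
    if len(xs) <= 1:
        return list(xs), 0
    mid = len(xs) // 2
    left, c_left = merge_sort(xs[:mid])
    right, c_right = merge_sort(xs[mid:])
    merged = []
    comparisons = c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; the rest of the other side is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


if __name__ == "__main__":
    data = list(range(64, 0, -1))  # reversed input: bubble sort's worst case
    print(bubble_sort(data)[1], merge_sort(data)[1])  # 2016 192
```

On a reversed 64-element list the sketch counts 2,016 comparisons for bubble sort (n(n−1)/2) and 192 for mergesort. That O(n²) vs O(n log n) gap comes from a different procedure, not from a tweak to the bubble sort trace.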

So frontier lab people say, “Well, we don’t just train on 1 algo, we train on many classes of sort algos, so it should be able to explore the function space of sort.”

You are still limited then.

Let’s say, for example, that we don’t know about non-comparison sorts (e.g. radix sort), but we train on all comparison sort algos. Same issue: it won’t sample non-comparison sort algos! How could it? It doesn’t think orthogonally. But people do!
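For contrast, here is a minimal LSD radix sort sketch in Python. It assumes non-negative integer keys, and base 10 is an arbitrary choice. It never compares two elements against each other. Instead it distributes them into buckets one digit at a time. That is why it is not bound by the Ω(n log n) lower bound on comparison sorts and runs in O(d·(n + b)) for d digits in base b.

```python
def radix_sort(xs, base=10):
    """LSD radix sort for non-negative integers.

    Elements are distributed into buckets by one digit at a time;
    no two elements are ever compared with each other.
    """
    a = list(xs)
    if any(x < 0 for x in a):
        raise ValueError("this sketch only handles non-negative integers")
    exp = 1
    # Continue while some element is >= exp, i.e. still has digits to process.
    while any(x // exp for x in a):
        buckets = [[] for _ in range(base)]
        for x in a:
            # Appending keeps the previous pass's order (stability), which LSD relies on.
            buckets[(x // exp) % base].append(x)
        a = [x for bucket in buckets for x in bucket]
        exp *= base
    return a


if __name__ == "__main__":
    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

Each pass is a stable bucket distribution on one digit, least significant first. After the pass on the most significant digit the list is fully ordered, without any element-to-element comparison.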

OAI STILL thinks this is the path to AGI?!

It can’t be.

The modern LLM stack is essentially imitation learning + a small amount of search via TTC (test-time compute), leveraging the generator-verifier gap to self-distill back into the weights.

This will always confine your search to the training manifold of function space.

This makes novel programs that are much better, but far outside the human manifold, almost impossible to find via TTC.
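Here is a deliberately tiny toy model of that loop. It is not how any lab actually trains. The "policy" is a probability table over a few program names (all hypothetical), the verifier is a stub, and "self-distillation" is a simple mixture update in the spirit of rejection-sampling fine-tuning. It shows the point in miniature: filtering and reweighting sharpen the policy within its existing support, but a program that starts with zero probability is never sampled, never verified, and so never gains mass.

```python
import random


def self_distill_round(policy, verifier, n_samples, lr, rng):
    """One toy sample -> verify -> distill round.

    policy:   dict mapping a (hypothetical) program name to its sampling probability
    verifier: returns True if a sampled program passes the check
    lr:       how far to move the policy toward the verified samples (0..1)
    """
    programs = list(policy)
    samples = rng.choices(programs, weights=[policy[p] for p in programs], k=n_samples)
    accepted = [s for s in samples if verifier(s)]
    if not accepted:
        return dict(policy)  # nothing verified, nothing to distill
    counts = {p: 0 for p in programs}
    for s in accepted:
        counts[s] += 1
    # Mix the old policy with the empirical distribution of verified samples.
    return {p: (1 - lr) * policy[p] + lr * counts[p] / len(accepted) for p in programs}


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical starting policy: mergesort never appeared in the training traces.
    policy = {"bubble_sort": 0.7, "insertion_sort": 0.3, "merge_sort": 0.0}
    # Hypothetical verifier: a time budget bubble sort misses; mergesort would pass.
    passes = {"insertion_sort", "merge_sort"}
    for _ in range(20):
        policy = self_distill_round(policy, lambda p: p in passes, 64, 0.5, rng)
    print(policy)  # insertion_sort close to 1.0, merge_sort still exactly 0.0
```

Real models rarely assign exactly zero probability, so the realistic version of the argument concerns programs whose probability is too small to be sampled within any practical TTC budget.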

We need to teach the model a more general search procedure that explores the full hypothesis space without such a heavy bias toward human thinking (e.g. AlphaZero). People have given up on this because DQN+MCTS-style methods collapse in large action spaces. The idea shouldn’t be thrown out just because its current implementation doesn’t scale, but that seems to be what everyone has done.
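For reference, AlphaZero's tree search selects actions with a variant of the PUCT rule: argmax over a of Q(s, a) + c_puct · P(s, a) · √(Σ_b N(s, b)) / (1 + N(s, a)). Below is a minimal Python sketch of that selection step. The function and parameter names are ours, and c_puct is treated as a fixed constant for simplicity, which is an assumption. The hypothetical helper `unvisited_bonus` shows one intuition for why this kind of search struggles as the action space grows. Under a near-uniform prior the exploration bonus of any untried action shrinks like 1/|A|, and a fixed simulation budget cannot even visit every action once.

```python
import math


def puct_select(priors, visits, q_values, c_puct=1.5):
    """Return the action maximizing Q(s, a) + U(s, a) (PUCT-style selection).

    priors[a]:   P(s, a), the policy network's prior for action a
    visits[a]:   N(s, a), simulations that have taken action a from s
    q_values[a]: Q(s, a), mean value of those simulations
    c_puct is a fixed constant here (a simplifying assumption).
    """
    sqrt_total = math.sqrt(sum(visits.values()))

    def score(a):
        # Exploration bonus: large for high-prior, rarely visited actions.
        u = c_puct * priors[a] * sqrt_total / (1 + visits[a])
        return q_values[a] + u

    return max(priors, key=score)


def unvisited_bonus(num_actions, total_visits, c_puct=1.5):
    """Hypothetical helper: U(s, a) for an unvisited action under a uniform prior."""
    return c_puct * (1.0 / num_actions) * math.sqrt(total_visits)


if __name__ == "__main__":
    print(puct_select({"a": 0.2, "b": 0.8}, {"a": 10, "b": 10}, {"a": 1.0, "b": 0.0}))
    # Same simulation budget, very different pull toward untried actions.
    for n_actions in (10, 100_000):
        print(n_actions, unvisited_bonus(n_actions, total_visits=1000))
```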

If we want true AGI, we need models that can think from first principles, branching and exploring in a clever way to go the rest of the distance: essentially, mimicking the scientific method.

That means asking the RIGHT question, or conducting a CLEVER experiment, to reduce the hypothesis space.
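One simple way to formalize "a clever experiment" is the following, which is our framing and not the post's. Keep a set of candidate hypotheses, choose the experiment whose worst-case result eliminates the most of them, run it, and discard every hypothesis that predicted the wrong outcome. This greedy minimax query choice is in the spirit of version-space active learning. The hypotheses and experiments below are made up for illustration.

```python
from collections import Counter


def best_experiment(hypotheses, experiments):
    """Pick the input whose worst-case outcome leaves the fewest hypotheses alive."""
    def worst_case_survivors(x):
        # Group hypotheses by the output they predict for x; the largest group
        # is what survives if the observation is the least informative one.
        predictions = Counter(h(x) for h in hypotheses.values())
        return max(predictions.values())

    return min(experiments, key=worst_case_survivors)


def eliminate(hypotheses, x, observed):
    """Keep only the hypotheses that predicted `observed` for input x."""
    return {name: h for name, h in hypotheses.items() if h(x) == observed}


if __name__ == "__main__":
    # Hypothetical candidate explanations for an unknown function f(n).
    hypotheses = {
        "double": lambda n: 2 * n,
        "square": lambda n: n * n,
        "add_two": lambda n: n + 2,
        "identity": lambda n: n,
    }
    x = best_experiment(hypotheses, experiments=[0, 1, 2, 3])
    secret = hypotheses["square"]  # stands in for the real system under study
    print(x, list(eliminate(hypotheses, x, secret(x))))  # 3 ['square']
```

Input 3 is chosen because all four hypotheses predict different outputs there, so a single observation settles the question. Inputs 0 and 2 would leave three hypotheses standing in the worst case.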

Why don’t frontier labs get this yet? Or is this a psyop on us all?


Similar Articles

@tunguz: Here is one big reason why this matters. Time spent on non-LLM inference tasks is only going to increase. However, tool…

@polynoamial: https://x.com/polynoamial/status/2064210146558136827

@jeremyphoward: I feel that the trend towards training models to autonomously go off and try to do everything themselves is anti-human.…

@askalphaxiv: A fascinating paper supervised by Yoshua Bengio "Generative Recursive Reasoning" Test time compute should scale not jus…

Test-Time Training Undermines Safety Guardrails
