@auroter: Frontier AI is BRAINDEAD. GPT5.5 xHigh in Codex thinks I should use Tensor Parallelism to deploy Qwen 3.6 27B on my sys…


Summary

The author criticizes a frontier AI model (GPT5.5 xHigh in Codex) for recommending Tensor Parallelism to deploy a model that fits on a single GPU, and announces a planned shootout comparing several AI models (GPT5.5, Opus 4.8, Qwen variants, Nemotron) on a real-world problem.


Cached at: 06/08/26, 11:29 PM

Frontier AI is BRAINDEAD.

GPT5.5 xHigh in Codex thinks I should use Tensor Parallelism to deploy Qwen 3.6 27B on my system which has 4x RTX 6000 Pro Blackwell cards.

Why, you ask? Its reasoning is that without Tensor Parallelism, “we would be forced to serve the model across 4 separate ports, which would confuse OpenCode.”

Yes, it’s suggesting I run Tensor Parallelism to deploy a model which easily fits in BF16 on a single card. Because ports are scary, and you couldn’t possibly listen on one of them and direct traffic accordingly.
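To illustrate the "listen on one port and direct traffic" alternative, here is a minimal sketch of a round-robin proxy using only the Python standard library. Everything specific in it is an assumption for illustration: the four backend ports (8001 to 8004), the front port (8000), and the idea that each GPU runs its own independent copy of the model behind an HTTP API that accepts POST requests. It buffers whole responses, so streaming replies are not handled, and it does no error handling or header forwarding beyond `Content-Type`.

```python
import itertools
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical backends: one independent model server per GPU, each on its own port.
BACKENDS = [f"http://127.0.0.1:{port}" for port in (8001, 8002, 8003, 8004)]


class RoundRobin:
    """Thread-safe round-robin picker over a fixed list of backends."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)
        self._lock = threading.Lock()

    def next_backend(self):
        with self._lock:
            return next(self._cycle)


picker = RoundRobin(BACKENDS)


class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the client's request body in full.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)

        # Forward it to the next backend, keeping the original path.
        target = picker.next_backend() + self.path
        req = urllib.request.Request(
            target,
            data=body,
            headers={"Content-Type": self.headers.get("Content-Type", "application/json")},
            method="POST",
        )

        # Buffer the backend's reply and relay it; urlopen raises on 4xx/5xx,
        # which this sketch does not catch.
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
            self.send_response(resp.status)
            self.send_header("Content-Type", resp.headers.get("Content-Type", "application/json"))
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)


def serve(port=8000):
    """Listen on a single front port and spread requests across BACKENDS."""
    ThreadingHTTPServer(("127.0.0.1", port), ProxyHandler).serve_forever()
```

Calling `serve()` would give a client such as OpenCode a single endpoint on port 8000 while the four single-GPU servers sit behind it. In practice an off-the-shelf reverse proxy or load balancer would do the same job with streaming and health checks included; the sketch is only meant to show how little is involved.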

In other news, I am doing a shootout this morning, giving the same problem to GPT 5.5 xHigh, Opus 4.8 Max, Qwen3.5 397b-a17b, Qwen3.6 27B and Nemotron 3 Ultra. The larger models will be quantized to NVFP4, and the 27B will be run in BF16.
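As a rough sanity check on those precision choices, the weight-only memory footprint can be estimated from parameter count and bits per weight. The sketch below rests on assumptions not stated in the post: 96 GB of memory per card, and roughly 4.5 effective bits per weight for NVFP4 (4-bit values plus an 8-bit scale per 16-weight block). It ignores KV cache, activations, and runtime overhead, and it reads the parameter counts off the model names.

```python
GB = 1e9


def weight_gb(params_billion, bits_per_param):
    """Weight-only footprint in GB for a given parameter count and precision."""
    return params_billion * 1e9 * bits_per_param / 8 / GB


CARD_GB = 96    # assumption: memory per RTX 6000 Pro Blackwell card
NUM_CARDS = 4   # from the post: 4 cards in the system

# Qwen3.6 27B in BF16: 16 bits per weight.
qwen_27b_bf16 = weight_gb(27, 16)

# Qwen3.5 397b-a17b in NVFP4: ~4.5 effective bits per weight (assumption).
qwen_397b_nvfp4 = weight_gb(397, 4.5)

print(f"27B  BF16  : {qwen_27b_bf16:6.1f} GB (one card: {CARD_GB} GB)")
print(f"397B NVFP4 : {qwen_397b_nvfp4:6.1f} GB (all cards: {CARD_GB * NUM_CARDS} GB)")
```

Under these assumptions the 27B model's weights come to about 54 GB, which leaves headroom on a single card, while the 397B model's weights exceed one card but fit within the four-card total, so only the larger model needs to be split across GPUs.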

As you may have noticed, we are not off to a good start with GPT5.5. It’s struggling to figure out how to set up the shootout without my explicit guidance. So far I am seeing that its superiority over Opus 4.8 is marginal at best.

Stay tuned. This topic of open-source models vs. frontier models has come up a few times in recent conversations with people on X, so I want to do a real-life comparison of these models and their ability to problem-solve real scenarios on my ongoing project.
