Visuals v/s Description. Splitting a task into different models works better than expected.

Reddit r/ArtificialInteligence 05/16/26, 07:13 PM News

multi-model claude gemini image-to-code workflow cost-optimization

Summary

A user shares how splitting a visual coding task between Gemini (to produce an XML description from an image) and Claude (to generate Next.js/Tailwind code) improved accuracy and reduced token cost compared to using Claude alone.

So about an hour ago, I was coding with Claude Projects. I was building my site, and I thought of generating the prototype from Image 2 and then using Claude. I asked it to give me the component in Next.js and Tailwind according to the image shown. And it generated gibberish. The image captured all the visuals perfectly, but Claude couldn't code it and generated something... *shameful*. So even though Claude excels at coding and creative writing, it falls apart when it comes to analyzing an image.

Then, after a bit of research, I used Gemini to describe the visual properly in XML, and pasted the XML plus the visuals into Claude. It created the component exactly as shown (but with a black background and some tweaks).

Before, it cost me 250K tokens, all fluff. With this approach, it cost 140K tokens. The cost was lower, and the results were actually different too. What's your opinion on this?

P.S. By the way, I'm thinking of creating a documentary about building a powerful SaaS with LLMs. I'll discuss my failures and realizations. Just saying ^_________^. Downvote me if I don't fit this subreddit, and comment.
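To make the workflow concrete, here is a rough Python sketch of the handoff step: take the XML that the vision model produced, check that it parses, and wrap it in a prompt for the coding model. Everything named here is an assumption for illustration. The post does not show the XML it used, so the tag and attribute names, the `build_code_prompt` helper, and the prompt wording are all hypothetical. No model API is called, and in the workflow described the image would still be attached next to this text. The last function only redoes the token arithmetic from the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout description, standing in for what the vision model
# might return. The tag and attribute names are invented for this sketch.
LAYOUT_XML = """
<component name="HeroSection">
  <container layout="flex-col" align="center" background="black">
    <heading level="1" color="white">Build faster</heading>
    <paragraph color="gray">Short supporting tagline.</paragraph>
    <button variant="primary">Get started</button>
  </container>
</component>
"""


def build_code_prompt(xml_text: str) -> str:
    """Wrap an XML layout description in a prompt for the coding model.

    Raises xml.etree.ElementTree.ParseError if the XML is malformed, so a
    broken description is caught before any tokens are spent on it.
    """
    cleaned = xml_text.strip()
    root = ET.fromstring(cleaned)  # validation only; the raw XML is sent as-is
    name = root.get("name", "Component")
    return (
        f"Generate a Next.js component named {name}, styled with Tailwind CSS.\n"
        "Follow this XML description of the layout exactly:\n\n"
        f"{cleaned}\n"
    )


def token_savings_percent(before: int, after: int) -> float:
    """Percentage reduction in tokens going from `before` to `after`."""
    return 100 * (before - after) / before


if __name__ == "__main__":
    print(build_code_prompt(LAYOUT_XML))
    # Figures from the post: 250K tokens before, 140K tokens after.
    print(f"{token_savings_percent(250_000, 140_000):.0f}% fewer tokens")
```

With the numbers from the post, the drop from 250K to 140K tokens works out to a 44% reduction. Parsing the XML before sending it is a cheap guard: if the description comes back truncated or malformed, the error shows up locally instead of as another round of bad output.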

Similar Articles

Claude vision v/s Gemini vision (Gemini is much better in vision and world knowledge)

Those of you who like Gemma4 models - how are you guys using them?

Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?

Using Claude Code: The Unreasonable Effectiveness of HTML

Claude Code vs OpenCode: I ran the same agent tasks in both. Here’s where each one broke.
