The article argues that AI-generated UIs need evidence gates like design tokens, screenshots, and visual QA to ensure quality, and introduces Superloopy, a CLI tool that enforces these checks.
I think frontend work exposes a weird weakness in AI coding agents.

For backend tasks, failure is often obvious: tests fail, types fail, the API returns the wrong thing. For UI work, an agent can make the app compile and still leave you with something that feels generated:

- inconsistent spacing and shadows
- default typography
- random gradients
- components that do not share a design language
- no browser screenshots proving the result actually looks right

The useful bar, at least for me, is not “the agent edited the React files.” It is closer to an evidence gate:

1. Define the visual contract before coding. A `DESIGN.md` or token file should say what colors, type scales, spacing, radii, shadows, and motion are allowed.
2. Block generic AI defaults before implementation. If the result drifts into the same purple-gradient / three-card / random-shadow SaaS pattern, that should fail before “done.”
3. Verify in a real browser, not just with a build. Capture screenshots at mobile/tablet/desktop widths, check empty/loading/error states, and verify interactions instead of trusting a static code diff.
4. If there is a reference target, use visual diff as a map, not a verdict. Hotspots should tell the reviewer where to inspect; a high similarity score should not override clipped text, broken layout, or fake parity.
5. Make the final answer cite evidence. “Done” should point to screenshots, logs, test output, or a visual QA artifact, and it should say what is still uncertain.

I’m building this into a small MIT Codex plugin/CLI called Superloopy. I’m the developer, so this is partly a project post, but the underlying idea is the part I’d like feedback on.

Recent work added a `superloopy-frontend` skill that tries to make frontend work better by requiring a design-token contract, anti-slop checks, a 92-entry brand/style reference library, design-system compliance checks, screenshot evidence, and visual QA before the agent can claim the UI is done.
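To make the idea of a visual contract concrete, here is a minimal sketch of a token-compliance check in Python. It is not how Superloopy implements its checks. The token values, the `ALLOWED_TOKENS` table, and the `find_token_violations` helper are all hypothetical, and the regex only handles simple `property: value;` declarations.

```python
import re

# Hypothetical token contract. In practice this might be derived from a
# DESIGN.md or token file; every name and value here is an assumption.
ALLOWED_TOKENS = {
    "color": {"#0f172a", "#f8fafc", "#2563eb"},
    "padding": {"4px", "8px", "16px", "24px"},
    "margin": {"0", "8px", "16px", "24px"},
    "border-radius": {"4px", "8px"},
}

# Matches simple "property: value;" declarations. This is not a full CSS parser.
DECLARATION = re.compile(r"([a-z-]+)\s*:\s*([^;{}]+);")


def find_token_violations(css: str, allowed: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (property, value) pairs whose value is not in the contract.

    Properties that the contract does not mention are ignored, so the check
    only covers what the contract explicitly constrains.
    """
    violations = []
    for prop, value in DECLARATION.findall(css):
        value = value.strip().lower()
        if prop in allowed and value not in allowed[prop]:
            violations.append((prop, value))
    return violations


SAMPLE_CSS = """
.card { color: #0f172a; padding: 16px; border-radius: 12px; }
.hero { color: #a855f7; margin: 24px; }
"""

violations = find_token_violations(SAMPLE_CSS, ALLOWED_TOKENS)
for prop, value in violations:
    print(f"off-contract {prop}: {value}")
```

A check like this only catches literal values that fall outside the contract. It says nothing about whether the page looks right, which is why the browser screenshots and visual QA steps remain separate gates.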
The same pattern also shows up in the research and clone skills:

- research: cited synthesis, expansion waves, claim ledger, verification artifacts
- authorized website rebuilds: screenshots, DOM/topology, computed styles, assets, component specs, build output, visual QA

Repo for context: https://github.com/beefiker/superloopy

Question: if you use AI agents for product/frontend work, what evidence would actually make you trust the final answer? Screenshots? Design-token compliance? Visual diffs? Lighthouse? A human checklist? Something else?
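As one possible shape for the "done must cite evidence" idea, here is a rough Python sketch of a gate that rejects a completion claim unless each required artifact exists on disk and the claim states what is still uncertain. The evidence kinds, the `DoneClaim` structure, and `check_done_claim` are hypothetical names for illustration, not part of Superloopy.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical evidence kinds; a real workflow would choose its own list.
REQUIRED_EVIDENCE = ("screenshot_mobile", "screenshot_desktop", "test_output")


@dataclass
class DoneClaim:
    """A hypothetical 'done' report: evidence kind -> artifact path, plus caveats."""
    artifacts: dict[str, str] = field(default_factory=dict)
    uncertainties: list[str] = field(default_factory=list)


def check_done_claim(claim: DoneClaim, required=REQUIRED_EVIDENCE) -> list[str]:
    """Return reasons to reject the claim; an empty list means it passes."""
    problems = []
    for kind in required:
        path = claim.artifacts.get(kind)
        if path is None:
            problems.append(f"missing evidence: {kind}")
        elif not Path(path).is_file() or Path(path).stat().st_size == 0:
            problems.append(f"artifact not found or empty: {path}")
    if not claim.uncertainties:
        problems.append("no statement of what is still uncertain")
    return problems


# Demo: a claim that cites only one real artifact and admits no uncertainty.
with tempfile.TemporaryDirectory() as tmp:
    shot = Path(tmp) / "desktop.png"
    shot.write_bytes(b"placeholder bytes, not a real image")
    claim = DoneClaim(artifacts={"screenshot_desktop": str(shot)})
    problems = check_done_claim(claim)

for problem in problems:
    print(problem)
```

The gate only verifies that cited artifacts exist and are non-empty. Whether a screenshot actually shows a good UI is still a judgment for visual QA or a human reviewer.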
Argues that ugly AI app UIs stem from a missing structured design system, not from AI itself. Recommends using Moonchild to create token-based components that Codex can read to produce consistent interfaces.
The author discovers that instructing AI agents to generate web frontends in Qt style greatly reduces the typical 'slop' in AI-generated UIs, sharing results and calling for further experimentation.
A developer explores why AI coding agents produce inconsistent UI across sessions and compares solutions like natural-language rules files, Tailwind config, and structured token specs like Google Labs' design.md, seeking feedback on what works in practice.
The article analyzes a PocketOS incident where an AI agent deleted a production database, arguing for 'hard gates' like validator independence and reversibility checks instead of relying solely on prompts.
Impeccable is a suite of 18 CLI commands, a Chrome extension and library that embeds design-quality checks into AI coding workflows to detect and fix common UI anti-patterns without needing an LLM.