@quanruzhuoxiu: My favorite design in Midscene.js is actually not the AI part, but the HTML replay report. Every time a script runs, it automatically generates a single-file HTML report containing: - Screenshots of each step - Full prompt input to the model - JSON output from the model (...

X AI KOLs Timeline 05/22/26, 02:00 PM Tools

midscene-js html-report ai-automation debugging playwright screenshot prompt

Summary

Midscene.js's HTML replay report helps developers quickly find the cause of AI automation failures by putting screenshots, the prompt, and the model's output together in one file.

My favorite design in Midscene.js is actually not the AI part, but the HTML replay report. Every time a script runs, it automatically generates a single-file HTML report containing:
- Screenshots of each step
- The full prompt sent to the model
- The model's JSON output (including the coordinates of located elements)
- Bounding boxes drawn on the screenshots
- The time taken by each step

Why is this important? The hardest part of AI automation is not 'making it work' but knowing why it didn't work. Traditional Playwright gives you a one-line error; AI automation gives you a 'can't find element' message, which is as good as nothing. With screenshots, prompt, and model output side by side, you can pinpoint whether the issue is 'poor prompt writing', 'screenshot misalignment', or 'model misunderstanding'. The fixes for these three types of problems are completely different.

Here's a public example you can open directly, a complete replay of Midscene automatically liking a tweet on @midscene_ai: http://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/1.0-showcases/x.html…

Original Article


Cached at: 05/23/26, 04:03 AM

One of my favorite design decisions in Midscene.js isn’t actually the AI part — it’s the HTML playback report.

Every time a script finishes, it automatically generates a single-file HTML report that includes:

Screenshots at each step
The full prompt text fed to the model
The model’s JSON output (including the coordinates of located elements)
Bounding boxes drawn on the screenshots
Time taken for each step
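To make the list concrete, here is a minimal sketch of what one step record might hold and how the raw JSON output turns into a drawable bounding box. The `StepRecord` fields and the `{"bbox": [x1, y1, x2, y2]}` output format are assumptions for illustration only, not Midscene's real report schema. The point is that keeping the raw model output next to the screenshot size lets you check whether a box is even drawable before blaming anything else.

```typescript
// Hypothetical shape of one step in a replay report. Field names are
// illustrative assumptions, not Midscene's actual report schema.
interface StepRecord {
  name: string;
  screenshot: { width: number; height: number }; // pixel size of the captured image
  prompt: string; // full text sent to the model
  modelOutput: string; // raw JSON string returned by the model
  durationMs: number; // time taken by this step
}

// Assumed model output format: {"bbox": [x1, y1, x2, y2]} in screenshot pixels.
type BBox = [number, number, number, number];

// Parse the raw model output; return null if it is malformed or has no usable box.
function parseBBox(raw: string): BBox | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return null;
    const bbox = (parsed as { bbox?: unknown }).bbox;
    if (
      Array.isArray(bbox) &&
      bbox.length === 4 &&
      bbox.every((v) => typeof v === "number" && Number.isFinite(v))
    ) {
      return bbox as BBox;
    }
    return null;
  } catch {
    return null; // the model returned text that is not valid JSON
  }
}

// A box can be drawn only if its corners are ordered and it lies inside the screenshot.
function bboxFitsScreenshot(b: BBox, size: { width: number; height: number }): boolean {
  const [x1, y1, x2, y2] = b;
  return x1 >= 0 && y1 >= 0 && x1 < x2 && y1 < y2 && x2 <= size.width && y2 <= size.height;
}

// The point a click would typically target: the center of the box.
function bboxCenter(b: BBox): { x: number; y: number } {
  return { x: (b[0] + b[2]) / 2, y: (b[1] + b[3]) / 2 };
}

// One summary row per step: timing, parsed box, drawability, and click point.
function summarize(steps: StepRecord[]) {
  return steps.map((s) => {
    const bbox = parseBBox(s.modelOutput);
    return {
      name: s.name,
      durationMs: s.durationMs,
      bbox,
      drawable: bbox !== null && bboxFitsScreenshot(bbox, s.screenshot),
      click: bbox !== null ? bboxCenter(bbox) : null,
    };
  });
}
```

Running `summarize` over a run's steps gives one row per step. A row with `drawable: false` usually means the output was malformed or the coordinates fall outside the captured image, the kind of problem that drawn bounding boxes make visible at a glance.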

Why this matters: the hardest part of AI automation isn’t “making it work”; it’s knowing why it failed when it doesn’t. Traditional Playwright gives you a one-line error. When AI automation fails, you get “element not found,” which is basically useless.

With the screenshot, prompt, and model output side by side, you can pinpoint whether the failure came from a badly written prompt, a wrongly captured screenshot, or a model misunderstanding. Each of these calls for a completely different fix.
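The three-way triage described above can be sketched as a rule-of-thumb function. Everything here is a hypothetical illustration, not part of Midscene: the `FailedStep` fields, the 0.01 aspect-ratio tolerance, and the 0.5 threshold are assumptions, and `expected` stands for a box a human would mark on the screenshot while reviewing the report. The checks run in order: a screenshot whose aspect ratio does not match the viewport points to a capture problem, a step where the model located nothing points to the prompt, and a located box that barely overlaps the expected one (measured by intersection-over-union) points to the model.

```typescript
// [x1, y1, x2, y2] in screenshot pixels (assumed coordinate space).
type BBox = [number, number, number, number];
// "none" means this heuristic found no suspect.
type Suspect = "screenshot" | "prompt" | "model" | "none";

// Hypothetical data a reviewer could collect for one failed step.
interface FailedStep {
  screenshotSize: { width: number; height: number }; // image the model saw
  viewportSize: { width: number; height: number }; // page area the action targets
  located: BBox | null; // box returned by the model, null if it found nothing
  expected?: BBox; // box a human marked during review, if any
}

// Intersection-over-union of two axis-aligned boxes, in [0, 1].
function iou(a: BBox, b: BBox): number {
  const ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const area = (r: BBox) => (r[2] - r[0]) * (r[3] - r[1]);
  const union = area(a) + area(b) - inter;
  return union > 0 ? inter / union : 0;
}

function triage(step: FailedStep, iouThreshold = 0.5): Suspect {
  const { screenshotSize: s, viewportSize: v } = step;
  // Device-pixel-ratio scaling keeps the aspect ratio; a crop or the wrong window does not.
  if (Math.abs(s.width / s.height - v.width / v.height) > 0.01) return "screenshot";
  // The capture looks consistent but the model matched nothing: reword the prompt first.
  if (step.located === null) return "prompt";
  // Without a human-marked box there is nothing to compare against.
  if (step.expected === undefined) return "none";
  // The model returned a box, but not on the element the reviewer expected.
  return iou(step.located, step.expected) < iouThreshold ? "model" : "none";
}
```

A heuristic like this only ranks where to look first; an ambiguous prompt can also make the model pick the wrong element, so the report itself remains the final word.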

Here’s a public example you can open directly, a full playback of Midscene automatically liking a tweet on @midscene_ai: http://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/1.0-showcases/x.html

Similar Articles

@quanruzhuoxiu: Over the two years of developing Midscene.js, we made a belated but critical decision: UI automation will sooner or later shift from 'understanding the DOM' to 'looking at the screen'. So in the December 1.0 release, we directly cut the DOM compatibility path. In the early days, like everyone else, we followed a DOM + visual hybrid approach...

X AI KOLs Timeline

The Midscene.js team decided to completely shift from a DOM + visual hybrid approach to pure visual UI automation, believing that future UI automation must be based on screenshots rather than the DOM. This change reduced token consumption and simplified cross-platform adaptation.

@geekbb: AI-generated technical docs are often thousands of lines long, scrolling in the terminal — nobody wants to read them. md2html lets AI automatically convert those Markdown docs into HTML pages with sidebar table of contents, diagrams, timelines, cards, and callouts, all in a single file to share with the team. https://github.c…

X AI KOLs Timeline

md2html is a tool that converts AI-generated Markdown documents into polished, self-contained HTML pages with sidebar table of contents, diagrams, timelines, and callouts, making them easier to read and share.

@Saccc_c: To provide a more intuitive overview of my valuable AI outputs, I've pinned this long post for my followers and clients to view. It will be updated regularly. Current key outputs include: 1) 360-degree panoramic images created with Image 2.0 + Three.js; 2) Videos created with Image 2.0 + Seedance…

X AI KOLs Following

This post introduces a collection of AI outputs created by the author using tools such as Image 2.0, Three.js, Seedance 2.0, and Codex/Claude Code, aiming to showcase their current primary work.

@AYi_AInotes: Claude's engineers have completely abandoned Markdown. It's not that Markdown doesn't work well—it's that AI has evolved too fast for it to keep up. Back when AI wrote 10 lines of notes, Markdown was perfect. Now AI can output 1000 lines of plans, complex flowcharts, and complete code reviews all at once—who has the patience to read through a wall of plain text?

X AI KOLs Timeline

Claude's engineers are ditching Markdown for HTML because AI output has grown from 10 lines to 1000 lines, making plain text formats impractical. HTML enables colored tables, SVG flowcharts, and interactive prototypes—significantly improving human-AI collaboration, albeit with 2-4x longer generation times.

@quanruzhuoxiu: Often asked: What's the difference between Midscene and Browser-Use? Both are open-source, both use vision, both solve their respective problems. Here's an honest comparison, not to bash Browser-Use. Browser-Use is a web agent, positioned as "open the browser, get this done…"

X AI KOLs Timeline

A comparison of Midscene and Browser-Use, two open-source tools with different focuses: Browser-Use is a web agent for one-time tasks, while Midscene is a vision SDK designed for reliable multi-platform repeated execution.
