@atomic_chat_hq: Open-weight MiniMax M3 filled out a US customs form from a driver's license photo For this test we deployed MiniMax M3 …
Summary
A test of the open-weight MiniMax M3 model using MLX-VLM on a Mac Studio shows it can autonomously fill out a US customs form from a driver's license photo and a scanned document, using tool calls for fields, checkboxes, and signature.
Cached at: 06/15/26, 09:09 PM
Open-weight MiniMax M3 filled out a US customs form from a driver’s license photo
For this test, we deployed MiniMax M3 Q4 using MLX-VLM on a Mac Studio with an M3 Ultra and 512 GB of RAM. The model was tasked with reading a scanned document and an ID card photo, then completing a declaration form.
Output: 736 tokens · Input: 1,847 tokens · Time: ~31s
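As a rough sanity check on those numbers, the reported output length and wall-clock time can be turned into an end-to-end generation rate. This is only a back-of-the-envelope sketch: the post gives the time as approximate (~31s) and does not say whether it includes processing the 1,847 input tokens, so the result is best read as an overall rate for the run, not a measured decode speed. The function name `tokens_per_second` is made up for this illustration.

```python
def tokens_per_second(output_tokens: int, elapsed_seconds: float) -> float:
    """Return the average number of generated tokens per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return output_tokens / elapsed_seconds


# Figures taken from the post: 736 output tokens in roughly 31 seconds.
rate = tokens_per_second(736, 31.0)
print(f"{rate:.1f} tokens/s (end-to-end, approximate)")
```

That works out to roughly 24 tokens per second for the whole run.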
The model analyzed both inputs, streamed its reasoning, and then called three tools: write_field for text fields, mark for Yes/No checkboxes, and sign for the signature and date. It extracted the required information, mapped it to the correct fields, and completed the form without any manual input.
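To make the tool-calling flow concrete, here is a minimal sketch of how the three tools could be wired to a form on the host side. Only the tool names (write_field, mark, sign) come from the post. Everything else is an assumption for illustration: the `CustomsForm` class, the argument names, the shape of the tool-call dictionaries, and the sample values are all hypothetical, and this is not the code used in the test.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class CustomsForm:
    """In-memory stand-in for the declaration form (hypothetical layout)."""
    fields: Dict[str, str] = field(default_factory=dict)
    checkboxes: Dict[str, bool] = field(default_factory=dict)
    signature: str = ""
    date: str = ""


def write_field(form: CustomsForm, field_name: str, value: str) -> str:
    """Fill a text field such as a name or address."""
    form.fields[field_name] = value
    return f"wrote {field_name}"


def mark(form: CustomsForm, question: str, answer: str) -> str:
    """Tick the Yes or No checkbox for a question."""
    if answer not in ("Yes", "No"):
        raise ValueError(f"answer must be 'Yes' or 'No', got {answer!r}")
    form.checkboxes[question] = answer == "Yes"
    return f"marked {question} = {answer}"


def sign(form: CustomsForm, signer: str, date: str) -> str:
    """Record the signature and the signing date."""
    form.signature = signer
    form.date = date
    return f"signed by {signer}"


# Map tool names (from the post) to handlers (argument names are assumed).
TOOLS: Dict[str, Callable[..., str]] = {
    "write_field": write_field,
    "mark": mark,
    "sign": sign,
}


def apply_tool_calls(form: CustomsForm, calls: List[Dict[str, Any]]) -> List[str]:
    """Run each model-issued tool call against the form, in order."""
    results = []
    for call in calls:
        handler = TOOLS.get(call["name"])
        if handler is None:
            raise KeyError(f"unknown tool: {call['name']}")
        results.append(handler(form, **call["arguments"]))
    return results


# Made-up tool calls, standing in for what a model might emit.
sample_calls = [
    {"name": "write_field", "arguments": {"field_name": "family_name", "value": "Doe"}},
    {"name": "mark", "arguments": {"question": "carrying_fruits", "answer": "No"}},
    {"name": "sign", "arguments": {"signer": "Jane Doe", "date": "2026-06-15"}},
]

demo_form = CustomsForm()
for line in apply_tool_calls(demo_form, sample_calls):
    print(line)
```

Keeping each tool narrow (one for text, one for checkboxes, one for the signature) means a malformed call, such as an answer other than Yes or No, can be rejected before it touches the form.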