TinyFish Bigset turns text prompts into live datasets (3 minute read)

TLDR AI Tools

Summary

TinyFish Bigset is an open-source multi-agent system that turns natural language prompts into structured datasets from the live web, with schema inference, autonomous research agents, and scheduled refresh. It runs self-hosted via Docker and is built on TinyFish's search infrastructure.


Cached at: 06/03/26, 03:35 PM

# TinyFish Bigset turns text prompts into live datasets

Source: [https://www.testingcatalog.com/tinyfish-bigset-turns-text-prompts-into-live-datasets-from-web/](https://www.testingcatalog.com/tinyfish-bigset-turns-text-prompts-into-live-datasets-from-web/)

[TinyFish](https://www.tinyfish.ai/?ref=testingcatalog.com) has launched [Bigset](https://github.com/tinyfish-io/bigset?ref=testingcatalog.com), an open-source multi-agent system that turns a plain-language sentence into a structured dataset pulled from the live web. You describe what you want, and Bigset infers the schema, sends autonomous agents to research it on real web pages, verifies their findings against sources, deduplicates, and hands back a clean table you can export as CSV or XLSX. Set a refresh cadence from 30 minutes to weekly, and have the agents rerun on schedule so the dataset stays current without anyone needing to touch a script.

![University Application Tracker](https://storage.ghost.io/c/2a/1b/2a1b1782-8506-4d7d-bf53-ad3fb52e2a0f/content/media/2026/06/tfbig2_thumb.jpg)

The work is split across two agent roles. An orchestrator agent does breadth-first discovery, identifying which rows belong in the dataset and where on the web to find them, then dispatches sub-agents to fill each one. The orchestrator holds no write access of its own. Each sub-agent researches a single entity under a tight budget of 6 tool calls, pulls real data via TinyFish Search and Fetch, and inserts one verified row with its source URLs and a record of how the data was found.

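
The split between the two roles can be illustrated with a minimal Python sketch. This is not Bigset's actual code: every name here (`Row`, `Dataset`, `SubAgent`, `orchestrate`, `fake_search`, `fake_fetch`) is hypothetical, the real TinyFish Search and Fetch services are replaced by in-memory stubs, and using the entity name as the primary key is an assumption. Only the budget of 6 tool calls, the one-row-per-sub-agent rule, and the orchestrator's lack of write access come from the article.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

MAX_TOOL_CALLS = 6  # per-sub-agent budget quoted in the article


@dataclass
class Row:
    key: str                          # primary key (assumed: the entity name)
    values: Dict[str, Optional[str]]  # None marks a field that was not confirmed
    source_urls: List[str]
    trace: List[str]                  # record of how the data was found


@dataclass
class Dataset:
    rows: Dict[str, Row] = field(default_factory=dict)

    def insert(self, row: Row) -> bool:
        """Insert a row, rejecting duplicate primary keys."""
        if row.key in self.rows:
            return False
        self.rows[row.key] = row
        return True


class SubAgent:
    """Researches one entity within a fixed tool-call budget, then writes one row."""

    def __init__(self, dataset: Dataset,
                 search: Callable[[str], List[str]],
                 fetch: Callable[[str], Dict[str, str]]) -> None:
        self.dataset, self.search, self.fetch = dataset, search, fetch
        self.calls = 0
        self.trace: List[str] = []

    def _call(self, tool, arg: str):
        # Every search or fetch counts against the budget and is logged.
        if self.calls >= MAX_TOOL_CALLS:
            raise RuntimeError("tool-call budget exhausted")
        self.calls += 1
        self.trace.append(f"{getattr(tool, '__name__', 'tool')}({arg!r})")
        return tool(arg)

    def research(self, entity: str, columns: List[str]) -> bool:
        values: Dict[str, Optional[str]] = {c: None for c in columns}
        sources: List[str] = []
        for url in self._call(self.search, entity):
            if self.calls >= MAX_TOOL_CALLS or None not in values.values():
                break  # out of budget, or every column already confirmed
            page = self._call(self.fetch, url)
            for c in columns:
                if values[c] is None and c in page:
                    values[c] = page[c]
                    if url not in sources:
                        sources.append(url)
        if not sources:  # nothing confirmed, so write nothing
            return False
        return self.dataset.insert(Row(entity, values, sources, self.trace))


def orchestrate(candidates: List[str], columns: List[str],
                spawn: Callable[[], SubAgent], row_target: int) -> int:
    """Dispatch one sub-agent per candidate; the orchestrator never writes rows."""
    inserted = 0
    for entity in candidates:
        if inserted >= row_target:
            break
        if spawn().research(entity, columns):
            inserted += 1
    return inserted


# Hypothetical stand-ins for a search index and fetched pages.
PAGES = {
    "https://example.com/acme": {"name": "Acme", "hq": "Berlin"},
    "https://example.com/bolt": {"name": "Bolt"},  # no "hq" on this page
}
INDEX = {"Acme": ["https://example.com/acme"], "Bolt": ["https://example.com/bolt"]}


def fake_search(query: str) -> List[str]:
    return INDEX.get(query, [])


def fake_fetch(url: str) -> Dict[str, str]:
    return PAGES.get(url, {})


dataset = Dataset()
inserted = orchestrate(["Acme", "Bolt", "Acme"], ["name", "hq"],
                       lambda: SubAgent(dataset, fake_search, fake_fetch),
                       row_target=10)
print(inserted, dataset.rows["Bolt"].values)  # 2 {'name': 'Bolt', 'hq': None}
```

In this sketch the orchestrator only receives a `spawn` factory and a row count, so it has no handle on the dataset itself. The repeated "Acme" candidate is rejected at insert time, and the "hq" field that the stub page never confirms stays `None` instead of being guessed.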

![TinyFish Bigset](https://storage.ghost.io/c/2a/1b/2a1b1782-8506-4d7d-bf53-ad3fb52e2a0f/content/images/2026/06/Screenshot-2026-06-02-113120.png)

Sub-agents are instructed never to fabricate values, to leave fields blank when they cannot be confirmed, and to reject duplicate primary keys automatically. The orchestrator runs until the dataset reaches its row target, building faster as it learns where the data lives.

![TinyFish Bigset Results](https://storage.ghost.io/c/2a/1b/2a1b1782-8506-4d7d-bf53-ad3fb52e2a0f/content/images/2026/06/Screenshot-2026-06-02-113036.png)

Bigset is licensed under AGPL-3.0 and runs self-hosted through Docker, with schema inference on Claude Sonnet 4.6 and the agent roles on Qwen3.7-max by default, all routed through OpenRouter and configurable per role. The team is candid that the project is experimental: a dataset takes 2 to 5 minutes to build, it works best on topics with public web data, and the free tier covers 2,500 row operations per month. It ships with 9 curated public datasets covering AI companies hiring, GPU prices, model pricing, and top open-source repositories, browsable without an account.

TinyFish is the Palo Alto-based company behind the platform, backed by $47 million in Series A funding led by ICONIQ, and counts Google, DoorDash, and Amazon among its enterprise clients, having processed more than 40 million agent operations. Bigset is built directly on TinyFish Search and Fetch, the same web infrastructure underneath the company's enterprise agent products, and arrives as the open-source answer to proprietary natural-language dataset tools, with no per-seat pricing, no domain restrictions, and full pipeline ownership for anyone who runs it themselves.

Star it on [GitHub](https://github.com/tinyfish-io/bigset?ref=testingcatalog.com) and grab an [API key](https://bit.ly/4x6SIyk?ref=testingcatalog.com)! 🔥

Similar Articles

Datasette Agent

Simon Willison's Blog

Datasette Agent is a new extensible AI assistant for Datasette that lets users query their data conversationally and generate charts via plugins. It supports local models and cloud APIs like Gemini 3.1 Flash-Lite.

Fish Audio S2 Technical Report

Papers with Code Trending

Fish Audio S2 is an open-source text-to-speech system featuring multi-speaker capabilities, multi-turn generation, and instruction-following control, backed by a production-ready inference engine with low latency.

666ghj/MiroFish

GitHub Trending (daily)

MiroFish is an open-source swarm intelligence engine that uses multi-agent technology to create a parallel digital world for predicting future outcomes. Users upload seed materials and receive detailed prediction reports and interactive simulations.

@gkxspace: Found a crazy open-source tool. You input a sentence describing what data you want, and it deploys a group of AI agents to research on various websites in parallel. After a few minutes, it compiles a structured table for you. In fact, the data is all on the internet, but turning it into a usable table has always been a labor-intensive task. In the past, this was an engineering project: combining searches, writing crawlers...

X AI KOLs Timeline

Bigset is an open-source tool. You input a sentence describing the data you need, and it deploys multiple AI agents to research the web in parallel, automatically inferring the schema, deduplicating, verifying, and generating a structured table. It supports scheduled refreshes.