data-preparation

#data-preparation

@tom_doerr: Converts documents and media into structured JSON for LLMs https://github.com/adithya-s-k/omniparse…

X AI KOLs Timeline · 2026-05-25

OmniParse is a local platform that ingests and parses unstructured data (documents, images, video, audio, web) into structured JSON optimized for LLM applications like RAG and fine-tuning.

One AI agent use case that’s actually been useful for me at work

Reddit r/AI_Agents · 2026-05-18

The author finds AI agents useful for repetitive data-preparation work, specifically using Pandada to clean and standardize raw files, which reduces manual effort and errors.

@tom_doerr: Generates LLM-ready datasets from raw data https://github.com/OpenDCAI/DataFlow…

X AI KOLs Timeline · 2026-05-16

DataFlow is an open-source tool with visual, low-code pipelines for generating, cleaning, and preparing high-quality LLM training datasets from raw data. The project is accompanied by a technical report on arXiv.

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Papers with Code Trending · 2025-12-18

DataFlow is an LLM-driven framework for automated data preparation and workflow engineering, featuring nearly 200 reusable operators and six domain-general pipelines that improve LLM performance across tasks like math, code, and Text-to-SQL.
