DataFlow is an open-source tool that uses visual, low-code pipelines to generate, clean, and prepare high-quality LLM training datasets from raw data. It is accompanied by a technical report on arXiv.