dataset-curation

#dataset-curation

Sharing "cull": my open-source dataset tool for an image scraping, classification, and captioning pipeline

Reddit r/LocalLLaMA · 2026-05-10

Cull is an open-source machine curation engine for AI image datasets that automates scraping, classification, and captioning to prepare data for training LoRAs or fine-tuning models.

OpenAI Data Partnerships

OpenAI Blog · 2023-11-09

OpenAI announces its Data Partnerships program, a collaboration with organizations to create public and private datasets for training AI models. Existing partners include the Icelandic Government, for language improvement, and the Free Law Project, for legal document integration.

Improving language model behavior by training on a curated dataset

OpenAI Blog · 2021-06-10

OpenAI research demonstrates that language model behavior can be significantly improved by fine-tuning on a small, curated dataset (<100 examples) that targets specific behavioral values, and that the effect grows with model scale. The approach gives users a tool for aligning models with Charter-compatible values in their specific applications.
