Tag
Indian workers paid 250 rupees per hour strap phones to their heads and record themselves doing household chores, producing training data for AI robots. They film more than 90 different scenes and angles of actions every day, a practice that highlights the labor issues behind AI training.
Scale AI CEO Alexandr Wang explains how Paul Graham's "Schlep Blindness" essay inspired the company's focus on an unglamorous but critical problem: building high-quality datasets for machine learning.
This paper presents the construction of a Korean evaluation-annotated corpus (EVAD) for fine-grained aspect-based sentiment analysis in e-commerce reviews using Semi-Automatic Symbolic Propagation. It evaluates KoBERT and KcBERT models on the dataset, achieving high F1 scores in aspect-value pair recognition.
This paper from Apple introduces Annotator Policy Models (APMs), which use interpretability techniques to infer annotators' internal safety policies from their labeling behavior without requiring additional annotation effort. The authors demonstrate that APMs can accurately model these policies and distinguish between sources of annotation disagreement, such as operational failures, policy ambiguity, and value pluralism.
This paper presents a large-scale analysis of four harmful language detection datasets, examining how annotator characteristics and linguistic features interact to influence annotation variation. It highlights intersectional effects and warns against generalizing findings across different datasets.
Tendem by Toloka is a platform that connects AI developers with human experts for data annotation and training.