This paper evaluates three LLM-based strategies (embedding-based, prompt-based, and hybrid) against classical tabular models on an industrial car retrofit prediction dataset whose categorical features are hashed. Tree ensembles outperform the LLM-based approaches overall. Embedding and hybrid approaches remain useful, but direct prompting fails when the features carry no semantic cues.