Fine-tuning small LLMs (3B to 7B parameters) with QLoRA for biomedical claim verification yields higher F1 than GPT-4o and GPT-5 at 44.5× lower cost, and exposes a structural artifact in the SciFact dataset. The study also shows robust cross-domain transfer when the models are trained on structurally sound data.
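The summary above reports results as F1. The minimal Python sketch below shows one common way to compute a macro-averaged F1 over claim-verification labels. The label set (`SUPPORT`, `CONTRADICT`, `NEI`), the choice of macro averaging, and the toy predictions are assumptions for illustration only. They are not the study's metric definition or data.

```python
from collections import Counter


def macro_f1(gold: list[str], pred: list[str], labels: list[str]) -> float:
    """Unweighted mean of per-label F1 scores (macro-F1)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[g] += 1          # missed the true label g
    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical label set and toy predictions, not data from the study.
labels = ["SUPPORT", "CONTRADICT", "NEI"]
gold = ["SUPPORT", "SUPPORT", "CONTRADICT", "NEI", "NEI", "CONTRADICT"]
pred = ["SUPPORT", "NEI", "CONTRADICT", "NEI", "SUPPORT", "CONTRADICT"]
print(f"macro-F1 = {macro_f1(gold, pred, labels):.3f}")  # prints "macro-F1 = 0.667"
```

Macro averaging weights each label equally. A model that does well only on the majority label therefore cannot reach a high score. Whether the study uses macro, micro, or per-label F1 is not stated in this summary.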
A detailed benchmark of 20 small LLMs quantized to fit a 6 GB GPU, measuring inference speed and VRAM usage at several context lengths, with qualitative probing of tool use and instruction following. The report aims to help users with modest hardware choose models for local, private automation tasks.
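A back-of-the-envelope estimate shows why VRAM usage depends on context length. Quantized weights take a fixed amount of memory, while the key/value (KV) cache grows linearly with the number of tokens in context. The Python sketch below makes several assumptions: a hypothetical 7B-class model with grouped-query attention (32 layers, 8 KV heads, head dimension 128), roughly 4.5 bits per weight, and an fp16 KV cache. It ignores activations and runtime overhead, so real measurements like those in the report will differ.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    # Quantized weights: parameter count times bits per weight, converted to bytes.
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    # Keys and values (factor 2), per layer, per KV head, per token in context.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical 7B-class configuration; not a model taken from the report.
N_PARAMS = 7.0e9
BITS = 4.5  # assumed average bits/weight for a 4-bit scheme including scales
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128

for ctx in (2048, 8192, 32768):
    w = weight_bytes(N_PARAMS, BITS)
    kv = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx)
    print(f"ctx={ctx:>6}: weights {w / GIB:.2f} GiB + KV {kv / GIB:.2f} GiB "
          f"= {(w + kv) / GIB:.2f} GiB")
```

Under these assumptions the KV cache costs 128 KiB per token. It grows from 0.25 GiB at 2,048 tokens to 4 GiB at 32,768 tokens, which pushes the estimated total past what a 6 GB card can hold even though the weights alone fit.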