Tag
The article argues that tool-calling reliability often does not scale with model capability: smaller models can outperform larger ones on schema adherence and format discipline, which suggests that raw capability is not the sole factor in choosing a model for tool use.