This paper shows that layer-local training methods such as Forward-Forward (FF) do not scale to realistic image resolutions and datasets, and that synthetic benchmarks overstate their performance. The authors introduce a strong FF variant, DTG-FF, and show that on real data (e.g., ImageNet-100 at 224×224) FF reaches only 49.4% accuracy, whereas backpropagation (BP) typically exceeds 75%. On synthetic tasks, by contrast, the gap narrows or even reverses.
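To make "layer-local" concrete, here is a minimal sketch of the original Forward-Forward recipe for a single ReLU layer. It is not the DTG-FF variant, whose details are not given here. The sketch assumes goodness is the sum of squared activations and that the layer uses a logistic loss against a threshold `theta`, pushing goodness above `theta` on positive data and below it on negative data. The class and function names (`FFLayer`, `train_step`, `normalize`), the hyperparameters, and the toy inputs are all hypothetical. The key property is that the weight update uses only this layer's input and output, so no gradient flows back from later layers.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class FFLayer:
    """One fully connected ReLU layer trained only with its own local loss (hypothetical sketch)."""

    def __init__(self, n_in, n_out, theta=2.0, lr=0.03, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.1] * n_out  # small positive bias so most units start active
        self.theta = theta      # assumed goodness threshold
        self.lr = lr

    def forward(self, x):
        z = [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(self.w, self.b)]
        return [max(0.0, zj) for zj in z]

    def goodness(self, x):
        # Goodness = sum of squared activations of this layer.
        return sum(h * h for h in self.forward(x))

    def train_step(self, x, positive):
        h = self.forward(x)
        g = sum(v * v for v in h)
        # Positive: loss = softplus(theta - g), so dL/dg = -sigmoid(theta - g).
        # Negative: loss = softplus(g - theta), so dL/dg = +sigmoid(g - theta).
        dl_dg = -sigmoid(self.theta - g) if positive else sigmoid(g - self.theta)
        for j, hj in enumerate(h):
            # dg/dz_j = 2 * h_j for ReLU; it is zero when the unit is inactive.
            dz = dl_dg * 2.0 * hj
            if dz == 0.0:
                continue
            row = self.w[j]
            for i, xi in enumerate(x):
                row[i] -= self.lr * dz * xi  # purely local gradient step
            self.b[j] -= self.lr * dz
        return g


def normalize(h, eps=1e-8):
    # FF feeds a length-normalized vector to the next layer, so the next
    # layer cannot simply read off this layer's goodness (vector length).
    norm = math.sqrt(sum(v * v for v in h))
    return [v / (norm + eps) for v in h]


# Toy usage: one "positive" and one "negative" pattern (hypothetical data).
pos = [1.0, 0.0, 1.0, 0.0]
neg = [0.0, 1.0, 0.0, 1.0]
layer = FFLayer(n_in=4, n_out=8)
for _ in range(300):
    layer.train_step(pos, positive=True)
    layer.train_step(neg, positive=False)
print(layer.goodness(pos), layer.goodness(neg))
# A second layer would be trained the same way on normalize(layer.forward(x)).
```

Stacking layers this way avoids a global backward pass. It also means each layer optimizes only its own objective, which is one plausible reason such methods can struggle on large, realistic inputs.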