Why our #1 LightGBM feature by importance made predictions worse [D]

Reddit r/MachineLearning News

Summary

A blog post from Flyback shows how a target-encoded LightGBM feature that ranked #1 in importance worsened hold-out predictions, because its splits fit irreducible label variance rather than generalizable signal. It highlights the danger of relying on feature importance alone.

We recently hit a classic gradient boosting trap with our pricing engine (Flyback), and I wanted to share the ablation data.

We run LightGBM quantile regression to forecast secondary market watch prices. We engineered a variant-conditioned Bayesian target encoder to isolate within-reference pricing dynamics. LightGBM absolutely loved it. It ranked #1 in feature importance at q90 by a wide margin, with gain several times that of the next-highest feature, across all our multi-seed runs.

But when we ran a strict 4-seed × 3-variant ablation on the hold-out set, the results inverted. Test MAPE regressed by 0.28pp, and the between-variant delta was 7x the within-variant standard deviation, so the regression was well outside seed-to-seed noise.

The encoder was finding splits that were effective on the training data but completely failed to generalize, because the signal it was learning was driven by irreducible label variance: unobserved factors like condition nuance, seller behavior, and timing that no feature can capture.

I wrote a full post breaking down the architecture, the ablation methodology, and the mechanism behind the divergence. Happy to discuss LightGBM split mechanics, target encoding leakage, or the ablation setup.

Full post and ablation results: [https://flyback.ai/engineering/target-encoding-divergence](https://flyback.ai/engineering/target-encoding-divergence)
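
For anyone who wants to run the same kind of noise check on their own ablation, here is a minimal sketch of the comparison between the between-variant delta and the within-variant (seed-to-seed) standard deviation. Everything in it is an assumption made for illustration: the function name `ablation_summary`, the variant names, and the MAPE values are placeholders, not Flyback's code or results, and averaging the per-variant standard deviations is just one reasonable way to define the noise floor.

```python
from statistics import mean, stdev


def ablation_summary(mape_by_variant, baseline):
    """Compare each variant's mean test MAPE to the baseline across seeds.

    mape_by_variant maps a variant name to a list of test MAPE values
    (in percent), one per random seed. Returns the per-variant deltas
    and the average within-variant standard deviation.
    """
    means = {name: mean(runs) for name, runs in mape_by_variant.items()}

    # Noise floor: average of the per-variant sample standard deviations
    # across seeds (an assumed definition, chosen for simplicity).
    within_std = mean(stdev(runs) for runs in mape_by_variant.values())

    summary = {}
    for name, variant_mean in means.items():
        if name == baseline:
            continue
        delta = variant_mean - means[baseline]  # positive = worse MAPE
        summary[name] = {
            "delta_pp": delta,
            "delta_over_noise": delta / within_std,
        }
    return summary, within_std


# Placeholder numbers for a 4-seed x 3-variant grid. These are invented
# for illustration and are NOT the results from the post.
runs = {
    "no_encoder": [10.00, 10.04, 9.96, 10.00],
    "encoder_a": [10.30, 10.34, 10.26, 10.30],
    "encoder_b": [10.10, 10.14, 10.06, 10.10],
}

summary, within_std = ablation_summary(runs, baseline="no_encoder")
for name, stats in summary.items():
    print(
        f"{name}: delta={stats['delta_pp']:+.2f}pp, "
        f"{stats['delta_over_noise']:.1f}x within-variant std"
    )
```

A delta that is several times the within-variant standard deviation is hard to attribute to seed noise, which is the argument the ablation above relies on. With only four seeds per variant the standard deviation estimate is itself rough, so the ratio is best read as a sanity check rather than a formal significance test.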