Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model

arXiv cs.LG Papers

Summary

This paper demonstrates that two-layer neural networks trained with gradient-based methods can achieve the optimal computational-statistical tradeoff for learning Gaussian single-index models, matching the SQ lower bound up to polylogarithmic factors for all generative exponents and extending to sparse settings with a novel weight perturbation technique.

arXiv:2606.15219v1 Announce Type: new Abstract: In this work, we tackle the following question: Can neural networks trained with gradient-based methods achieve the optimal computational-statistical tradeoff in learning Gaussian single-index models? Prior research has shown that any polynomial-time algorithm under the statistical query (SQ) framework requires $\Omega(d^{s^\star/2}\lor d)$ samples, where $s^\star$ is the generative exponent representing the intrinsic difficulty of learning the underlying model. However, it remains unknown whether neural networks can achieve this sample complexity. Inspired by prior techniques such as label transformation and landscape smoothing for learning single-index models, we propose a unified gradient-based algorithm for training a two-layer neural network in polynomial time. Our method is adaptable to a variety of loss and activation functions, covering a broad class of existing approaches. We show that our algorithm learns a feature representation that strongly aligns with the unknown signal $\theta^\star$, with sample complexity $\widetilde{O} (d^{s^\star/2} \lor d)$, matching the SQ lower bound up to a polylogarithmic factor for all generative exponents $s^\star\geq 1$. Furthermore, we extend our approach to the setting where $\theta^\star$ is $k$-sparse for $k = o(\sqrt{d})$ by introducing a novel weight perturbation technique that leverages the sparsity structure. We derive a corresponding SQ lower bound of order $\widetilde{\Omega}(k^{s^\star})$, matched by our method up to a polylogarithmic factor. Our framework, especially the weight perturbation technique, is of independent interest, and suggests potential gradient-based solutions to other problems such as sparse tensor PCA.
Original Article
View Cached Full Text

Cached at: 06/16/26, 11:38 AM

# Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model
Source: [https://arxiv.org/abs/2606.15219](https://arxiv.org/abs/2606.15219)
[View PDF](https://arxiv.org/pdf/2606.15219)

> Abstract:In this work, we tackle the following question: Can neural networks trained with gradient\-based methods achieve the optimal computational\-statistical tradeoff in learning Gaussian single\-index models? Prior research has shown that any polynomial\-time algorithm under the statistical query \(SQ\) framework requires $\\Omega\(d^\{s^\\star/2\}\\lor d\)$ samples, where $s^\\star$ is the generative exponent representing the intrinsic difficulty of learning the underlying model\. However, it remains unknown whether neural networks can achieve this sample complexity\. Inspired by prior techniques such as label transformation and landscape smoothing for learning single\-index models, we propose a unified gradient\-based algorithm for training a two\-layer neural network in polynomial time\. Our method is adaptable to a variety of loss and activation functions, covering a broad class of existing approaches\. We show that our algorithm learns a feature representation that strongly aligns with the unknown signal $\\theta^\\star$, with sample complexity $\\widetilde\{O\} \(d^\{s^\\star/2\} \\lor d\)$, matching the SQ lower bound up to a polylogarithmic factor for all generative exponents $s^\\star\\geq 1$\. Furthermore, we extend our approach to the setting where $\\theta^\\star$ is $k$\-sparse for $k = o\(\\sqrt\{d\}\)$ by introducing a novel weight perturbation technique that leverages the sparsity structure\. We derive a corresponding SQ lower bound of order $\\widetilde\{\\Omega\}\(k^\{s^\\star\}\)$, matched by our method up to a polylogarithmic factor\. Our framework, especially the weight perturbation technique, is of independent interest, and suggests potential gradient\-based solutions to other problems such as sparse tensor PCA\.

## Submission history

From: Siyu Chen \[[view email](https://arxiv.org/show-email/6b7b9978/2606.15219)\] **\[v1\]**Sat, 13 Jun 2026 09:34:39 UTC \(1,738 KB\)

Similar Articles

The Implicit Bias of Depth: From Neural Collapse to Softmax Codes

arXiv cs.LG

This paper studies how depth alone induces an implicit low-rank bias in deep unconstrained feature models trained without regularization, shifting the optimal solution from neural collapse to softmax codes, and provides the first asymptotic and dynamic characterization of this bias under gradient descent with cross-entropy loss.

QuIDE: Mastering the Quantized Intelligence Trade-off via Active Optimization

arXiv cs.LG

This paper introduces QuIDE, a framework featuring an Intelligence Index to evaluate the trade-offs between compression, accuracy, and latency in quantized neural networks. It demonstrates that optimal bit-widths vary by task, with 4-bit being ideal for LLMs and simple tasks, while 8-bit is better for complex CNNs.

State-Space NTK Collapse Near Bifurcations

arXiv cs.LG

This paper develops a local theory of gradient descent near bifurcations in dynamical models, showing that the state-space neural tangent kernel collapses to a rank-one operator that dominates learning dynamics, making optimization effectively low-dimensional and predictable from normal forms.