Moonshine: An Autonomous Mathematical Research Agent Centered on Conjecture Generation

arXiv cs.AI Papers

Summary

This paper presents Moonshine, an autonomous mathematical research agent that generates conjectures, exemplified by deriving the Neural Jacobian Conjecture from the classical Jacobian conjecture and proving a special case using LLMs.

arXiv:2606.10806v1 Announce Type: new

Abstract: Moonshine is an autonomous agent whose central objective is to generate mathematical conjectures. Its core capability is to extract structure from classical problems, distill new concepts, and formulate conjectures of mathematical significance. Rather than treating the solution of a single proposition as its endpoint, Moonshine builds an extensible theoretical framework through conjecture generation, bridge building, and obstacle identification. This article uses Moonshine's exploration of the Jacobian conjecture as an example. It shows how the central question, whether local nondegeneracy can force global injectivity, is transferred to one-hidden-layer affine-ridge sigmoid networks. This leads to the formulation of the *Neural Jacobian Conjecture* (NJC): if such a network has strictly positive Jacobian determinant on the whole space, then it must be globally injective. By invoking GPT-5.5-pro and DeepSeek-V4-pro separately, Moonshine obtained independent complete proofs for the case $N=n+1$. In addition, a geometric-topological proof was developed with the assistance of ChatGPT, using GPT-5.5-pro interactively through its web interface. These results provide preliminary evidence for the plausibility of the conjecture. The general higher-width case $N\geq n+2$, however, remains unresolved and is left for further investigation. This work illustrates Moonshine's ability to autonomously generate meaningful mathematical problems and make rigorous progress on them.

Cached at: 06/10/26, 06:17 AM

# Moonshine: An Autonomous Mathematical Research Agent Centered on Conjecture Generation
Source: [https://arxiv.org/html/2606.10806](https://arxiv.org/html/2606.10806)
Xiaoyang Chen, Xiang Jiang; School of Mathematical Science, Tongji University; Moonshine Technology


## 1 Moonshine as an Autonomous Mathematical Research Agent Centered on Conjecture Generation

Moonshine is an autonomous mathematical research agent framework. It differs from question-answering systems and numerical computation tools in that its design goal is to *autonomously generate valuable mathematical conjectures* and to verify or refute them through structured exploration. Moonshine’s behavior revolves around the following components:

- **Structural recognition and conjecture distillation.** It identifies core structural features in classical problems or mathematical objects, distills new concepts, and formulates precise, testable conjectures based on them.
- **Deep exploration and bridge building.** It connects conjectures with existing theories, explores interactions with other mathematical areas, and derives conditional conclusions.
- **Obstacle identification and boundary delineation.** Through proofs and counterexample construction, it clarifies sufficient conditions under which a conjecture holds and identifies barriers that cannot be crossed, thereby determining the conjecture’s true range of applicability.
- **Research logs and memory.** It maintains long-term structured logs recording the evolution of conjectures, proof attempts, failed paths, and open subproblems, thereby forming an extensible theoretical framework.
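As a rough illustration of the research-log component described above, the sketch below models one structured conjecture record in Python. The class and field names (`Status`, `Subcase`, `ConjectureRecord`, `open_subproblems`) are hypothetical and are not Moonshine’s actual schema; the three subcases only restate the status of the NJC reported in this article.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    PROVED = "proved"
    REFUTED = "refuted"


@dataclass
class Subcase:
    """One regime of a conjecture, with notes on proof attempts (hypothetical schema)."""
    description: str
    status: Status
    notes: list = field(default_factory=list)


@dataclass
class ConjectureRecord:
    """Hypothetical log entry tracking a conjecture and its subcases."""
    name: str
    statement: str
    subcases: list = field(default_factory=list)

    def open_subproblems(self):
        """Descriptions of subcases that are still unresolved."""
        return [s.description for s in self.subcases if s.status is Status.OPEN]


njc = ConjectureRecord(
    name="Neural Jacobian Conjecture",
    statement="det DF(x) > 0 for all x implies F is globally injective",
    subcases=[
        Subcase("N = n", Status.PROVED, ["composition of three injective maps"]),
        Subcase("N = n + 1", Status.PROVED,
                ["proof via GPT-5.5-pro", "proof via DeepSeek-V4-pro", "geometric-topological proof"]),
        Subcase("N >= n + 2", Status.OPEN),
    ],
)
print(njc.open_subproblems())  # ['N >= n + 2']
```

Keeping unresolved subcases queryable is one simple way such a log could surface the next subproblem to work on.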

Moonshine is organized around a runtime home directory (by default `~/.moonshine`), which contains configuration files, project folders, session logs, knowledge bases, skills, tools, and MCP server definitions. The agent operates in a research mode, in which it can iterate autonomously, search its historical memory, invoke verification tools, and gradually enrich its understanding of a given conjecture.

##### Moonshine’s exploration of the Jacobian conjecture and the formulation of the Neural Jacobian Conjecture.

Inspired by the classical complex Jacobian conjecture, Moonshine did not attempt to prove or disprove the original conjecture directly. Instead, it extracted the core logic: whether local nondegeneracy, expressed by a nonzero Jacobian determinant, can force global injectivity. This logic was then transferred to a restricted but structurally transparent function class, namely one-hidden-layer affine-ridge sigmoid networks. By analyzing the special algebraic and geometric structure of this network class, Moonshine distilled a new conjecture, the *Neural Jacobian Conjecture* (NJC). The conjecture asserts that if the Jacobian determinant of such a network is everywhere positive, then the network must be globally injective. The conjecture is both an analogue of the classical problem and an independently meaningful statement, because it attributes the passage from local diffeomorphism to global injectivity to the special affine-ridge structure.

The following sections present Moonshine’s exploration of the NJC.

## 2 The Neural Jacobian Conjecture (NJC)

### 2.1 Function class and notation

Let $\sigma:\mathbb{R}\to(0,1)$ be the logistic sigmoid function

$$\sigma(t)=\frac{1}{1+e^{-t}}.$$

It is strictly increasing and satisfies $\sigma'(t)=\sigma(t)\bigl(1-\sigma(t)\bigr)>0$ for all $t\in\mathbb{R}$.
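The derivative identity can be sanity-checked numerically. The minimal sketch below uses only the Python standard library, and its helper names are illustrative. It compares the closed form $\sigma(t)(1-\sigma(t))$ with a central finite difference at a few points.

```python
import math


def sigmoid(t: float) -> float:
    """Logistic sigmoid sigma(t) = 1 / (1 + e^{-t})."""
    return 1.0 / (1.0 + math.exp(-t))


def sigmoid_prime(t: float) -> float:
    """Closed form sigma'(t) = sigma(t) * (1 - sigma(t))."""
    s = sigmoid(t)
    return s * (1.0 - s)


def central_difference(f, t: float, h: float = 1e-5) -> float:
    """Symmetric finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


for t in [-6.0, -1.5, 0.0, 0.7, 4.0]:
    exact = sigmoid_prime(t)
    approx = central_difference(sigmoid, t)
    assert exact > 0.0                      # sigma is strictly increasing
    assert abs(exact - approx) < 1e-8
    print(f"t={t:+.1f}  sigma'={exact:.6f}  finite difference={approx:.6f}")
```

The maximum value $\sigma'(0)=1/4$ also shows that every hidden-unit derivative lies in $(0,1/4]$.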

###### Definition 2.1 (Affine-ridge sigmoid networks).

For $n,N\geq 1$, a map $F:\mathbb{R}^{n}\to\mathbb{R}^{n}$ is called a one-hidden-layer affine-ridge sigmoid network of width $N$ if

$$F(x)=W^{(2)}\sigma\bigl(W^{(1)}x+b^{(1)}\bigr)+b^{(2)},$$

where $W^{(1)}\in\mathbb{R}^{N\times n}$, $W^{(2)}\in\mathbb{R}^{n\times N}$, $b^{(1)}\in\mathbb{R}^{N}$, $b^{(2)}\in\mathbb{R}^{n}$, and $\sigma$ acts componentwise. This class is denoted by $\mathcal{N}_{n,N}$. Since the output bias $b^{(2)}$ affects neither injectivity nor the Jacobian, we drop it and write

$$F(x)=A\sigma(Bx+c),\qquad A=W^{(2)},\quad B=W^{(1)},\quad c=b^{(1)}.$$
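The chain rule gives $DF(x)=A\,\operatorname{diag}\bigl(\sigma'(Bx+c)\bigr)\,B$, the formula used throughout the proofs below. The following NumPy sketch implements $F$ and this Jacobian and compares the Jacobian with finite differences. The weights are random illustrative draws, not parameters from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def network(A, B, c, x):
    """F(x) = A sigma(Bx + c), with the output bias dropped."""
    return A @ sigmoid(B @ x + c)


def jacobian(A, B, c, x):
    """Chain rule: DF(x) = A diag(sigma'(Bx + c)) B."""
    s = sigmoid(B @ x + c)
    return A @ np.diag(s * (1.0 - s)) @ B


def fd_jacobian(A, B, c, x, step=1e-6):
    """Central-difference approximation of DF(x), one column per input coordinate."""
    cols = []
    for e in np.eye(x.size):
        cols.append((network(A, B, c, x + step * e) - network(A, B, c, x - step * e)) / (2 * step))
    return np.column_stack(cols)


rng = np.random.default_rng(0)
n, N = 3, 5
A = rng.normal(size=(n, N))
B = rng.normal(size=(N, n))
c = rng.normal(size=N)
x = rng.normal(size=n)
err = np.max(np.abs(jacobian(A, B, c, x) - fd_jacobian(A, B, c, x)))
print(f"max |DF - finite difference| = {err:.2e}")
```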

###### Definition 2.2 (Positive-Jacobian subclass).

Define

$$\mathcal{N}_{n,N}^{+}=\{F\in\mathcal{N}_{n,N}:\det DF(x)>0\text{ for all }x\in\mathbb{R}^{n}\}.$$
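Membership in $\mathcal{N}_{n,N}^{+}$ is a condition at every point of $\mathbb{R}^n$, so finite sampling can only refute it and can never certify it. The sketch below is a heuristic screen under that caveat. Both examples are illustrative choices, not taken from the paper:

- **Example 1** is a hand-built width-$(n+1)$ network with $A=[I\mid a]$, $B=[I;\,a^{T}]$, and $c=(0,\ldots,0,\beta)$. For this network, $DF(x)=\operatorname{diag}(\sigma'(x_i))+\sigma'(a^{T}x+\beta)\,aa^{T}$ is positive definite, so it lies in $\mathcal{N}_{n,n+1}^{+}$.
- **Example 2** is $F(x)=\sigma(x)-\sigma(x+1)$, whose derivative changes sign.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def jacobian_det(A, B, c, x):
    """det DF(x) with DF(x) = A diag(sigma'(Bx + c)) B."""
    s = sigmoid(B @ x + c)
    return np.linalg.det(A @ np.diag(s * (1.0 - s)) @ B)


def min_det_on_samples(A, B, c, n_samples=2000, scale=5.0, seed=0):
    """Smallest sampled det DF on [-scale, scale]^n: a screen, not a proof."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-scale, scale, size=(n_samples, B.shape[1]))
    return min(jacobian_det(A, B, c, x) for x in X)


# Example 1 (N = n + 1): A = [I | a], B = [I ; a^T], c = (0, ..., 0, beta).
n, beta = 3, 0.4
a = np.array([1.0, -2.0, 0.5])
A1 = np.hstack([np.eye(n), a[:, None]])
B1 = np.vstack([np.eye(n), a[None, :]])
c1 = np.append(np.zeros(n), beta)

# Example 2 (n = 1, N = 2): F(x) = sigma(x) - sigma(x + 1).
A2 = np.array([[1.0, -1.0]])
B2 = np.array([[1.0], [1.0]])
c2 = np.array([0.0, 1.0])

print("example 1, smallest sampled det DF:", min_det_on_samples(A1, B1, c1))
print("example 2, smallest sampled det DF:", min_det_on_samples(A2, B2, c2))
```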

### 2.2 The conjecture proposed by Moonshine

###### Conjecture 2.3 (Neural Jacobian Conjecture).

For every $F\in\mathcal{N}_{n,N}^{+}$, the map $F$ is globally injective.

The motivation behind the conjecture is that a local diffeomorphism can in general form multiple sheets, whereas the special structure of affine-ridge sigmoid networks (in particular, the interaction between the kernel of the output weight matrix and the image of the hidden layer) may enforce uniqueness. If true, the NJC would provide a rigidity theorem in the neural-network setting, showing that local nondegeneracy implies global injectivity under this structural constraint, and it would form an interesting parallel with the classical Jacobian conjecture.

### 2.3 A geometric reformulation

Let $h(x)=\sigma(Bx+c)$, and let

$$X_{1}=h(\mathbb{R}^{n})\subset(0,1)^{N},\qquad X_{2}=\ker A\subset\mathbb{R}^{N}.$$

Here $X_{1}$ is the hidden-layer submanifold and $X_{2}$ is the output kernel. Since $F=A\circ h$, we have $F(x)=F(y)$ if and only if $h(y)\in h(x)+X_{2}$. If $\det DF>0$, then $B$ has full column rank, since otherwise $DF(x)=A\,\operatorname{diag}\bigl(\sigma'(Bx+c)\bigr)B$ would be singular. Hence $h$ is injective, and $F$ is injective if and only if

$$(p+X_{2})\cap X_{1}=\{p\},\qquad\forall p\in X_{1}.$$

The positive-Jacobian condition $\det DF>0$ is equivalent to $X_{1}$ and $p+X_{2}$ intersecting transversally at every intersection point, with local intersection index $+1$. Hence the NJC can be reformulated as follows: do transversality and positive local index force every affine fiber $p+X_{2}$ to meet $X_{1}$ exactly once?
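A small example illustrates the reformulation. The NumPy sketch below uses the illustrative one-dimensional network $F(x)=\sigma(x)-\sigma(x+1)$, so $n=1$, $N=2$, and $A=[1,-1]$. This network is *not* in $\mathcal{N}_{1,2}^{+}$, because $F'(-1/2)=0$, so it is not a counterexample to the NJC. It satisfies $F(0)=F(-1)$, and the sketch confirms that $h(-1)-h(0)$ lies on the line $X_2=\ker A$, as the fiber criterion predicts.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# F(x) = A sigma(Bx + c) with n = 1, N = 2, i.e. F(x) = sigma(x) - sigma(x + 1).
A = np.array([[1.0, -1.0]])
B = np.array([[1.0], [1.0]])
c = np.array([0.0, 1.0])


def h(x):
    """Hidden-layer map h(x) = sigma(Bx + c); its image is X_1."""
    return sigmoid(B @ np.atleast_1d(x) + c)


def F(x):
    return A @ h(x)


# X_2 = ker A is spanned by the last right-singular vector of A.
k = np.linalg.svd(A)[2][-1]

x, y = 0.0, -1.0
diff = h(y) - h(x)
print("F(x), F(y):", F(x), F(y))
print("h(y) - h(x):", diff, " kernel direction k:", k)
assert np.allclose(A @ diff, 0.0)                      # h(y) lies on h(x) + X_2
assert abs(diff[0] * k[1] - diff[1] * k[0]) < 1e-12    # diff is parallel to k
```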

## 3 Partial Verification of the NJC in Low Width

Moonshine does not claim to have proved the NJC in full. It first analyzes the most accessible cases, namely when the hidden-layer width $N$ equals the input dimension $n$ or exceeds it by one. These cases provide initial evidence for the conjecture.

### 3.1 The case $N=n$

###### Proposition 3.1.

If $N=n$ and $F\in\mathcal{N}_{n,n}^{+}$, then $F$ is injective.

###### Proof.

When $N=n$, the matrices $A,B\in\mathbb{R}^{n\times n}$ are square. Define

$$D(x)=\operatorname{diag}\bigl(\sigma'((Bx+c)_{1}),\ldots,\sigma'((Bx+c)_{n})\bigr).$$

Since

$$\det DF(x)=\det A\cdot\det D(x)\cdot\det B>0$$

for all $x$, and $\det D(x)>0$, both $\det A$ and $\det B$ are nonzero. Thus $A$ and $B$ are invertible. The map $F(x)=A\sigma(Bx+c)$ is a composition of three injective maps: the invertible affine map $x\mapsto Bx+c$, the componentwise strictly increasing map $z\mapsto\sigma(z)$, and the invertible linear map $w\mapsto Aw$. Hence $F$ is injective. ∎
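The proof is constructive. Undoing the three layers gives the explicit inverse $F^{-1}(y)=B^{-1}\bigl(\sigma^{-1}(A^{-1}y)-c\bigr)$ on the image $F(\mathbb{R}^n)$. The minimal NumPy sketch below uses illustrative integer matrices chosen so that $\det A=\det B=1$, hence $F\in\mathcal{N}_{3,3}^{+}$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    return np.log(p / (1.0 - p))


# Illustrative N = n = 3 example with det A = det B = 1, so det DF > 0 everywhere.
A = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, -1.0], [1.0, 0.0, 1.0]])
B = np.array([[1.0, 0.0, 2.0], [0.0, -1.0, 1.0], [1.0, 1.0, 0.0]])
c = np.array([0.5, -1.0, 0.2])


def F(x):
    return A @ sigmoid(B @ x + c)


def F_inverse(y):
    """Invert the three layers in reverse order; valid for y in F(R^n)."""
    hidden = np.linalg.solve(A, y)        # w = A^{-1} y, which lies in (0,1)^n
    pre = logit(hidden)                   # z = sigma^{-1}(w), componentwise
    return np.linalg.solve(B, pre - c)    # x = B^{-1}(z - c)


x = np.array([0.3, -0.7, 1.1])
print("det A, det B:", np.linalg.det(A), np.linalg.det(B))
print("x recovered:", F_inverse(F(x)))
```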

### 3.2 The case $N=n+1$

This is the smallest width in which $A$ has a nontrivial kernel. The main result of this section is the following.

###### Theorem 3.2.

Let $F:\mathbb{R}^{n}\to\mathbb{R}^{n}$ be a one-hidden-layer affine-ridge sigmoid network with $N=n+1$ hidden units. If

$$\det DF(x)>0,\qquad\forall x\in\mathbb{R}^{n},$$

then $F$ is injective.

The first proof below was obtained by Moonshine using GPT-5.5-pro.

#### 3.2.1 First algebraic proof: an injectivity lemma on convex sets

For Theorem [3.2](https://arxiv.org/html/2606.10806#S3.Thmtheorem2), we give a proof whose core is the following injectivity lemma for a linear map applied to the graph of a scalar function.

###### Lemma 3.3 (Injectivity lemma on convex sets).

Let $\Omega\subset\mathbb{R}^{n}$ be a nonempty open convex set, let $h\in C^{1}(\Omega)$, and let $L:\mathbb{R}^{n+1}\to\mathbb{R}^{n}$ be linear. Define

$$T(y)=L\bigl(y,h(y)\bigr).$$

If $\det DT(y)\neq 0$ for all $y\in\Omega$, then $T$ is injective.

###### Proof.

Since $\det DT(y)\neq 0$, we have $\operatorname{rank}DT(y)=n$. Hence $\operatorname{rank}L=n$, because the image of $DT(y)$ is contained in the image of $L$. Thus $\dim\ker L=1$. Let $k=(p,r)\in\mathbb{R}^{n}\times\mathbb{R}$ be a nonzero vector spanning $\ker L$. There are two cases.

*Case 1: $r\neq 0$.* Rescale $k$ so that $r=-1$, hence $k=(p,-1)$. Define $Q(y,z)=y+pz$. Then $\ker Q=\mathbb{R}(p,-1)=\ker L$. Since $L$ and $Q$ are full-rank maps from $\mathbb{R}^{n+1}$ to $\mathbb{R}^{n}$ with the same kernel, there exists $A_{0}\in GL_{n}(\mathbb{R})$ such that $A_{0}L=Q$. Acting on the target by $A_{0}$ does not affect injectivity, so we may assume

$$T(y)=y+p\,h(y).$$

Suppose $T(y_{1})=T(y_{2})$. Then

$$y_{2}-y_{1}=p\bigl(h(y_{1})-h(y_{2})\bigr).$$

Thus $y_{2}-y_{1}$ is parallel to $p$. If $p=0$, then $T(y)=y$, and there is nothing to prove. Assume $p\neq 0$ and, for contradiction, $y_{1}\neq y_{2}$. Then $y_{2}=y_{1}+sp$ for some $s\neq 0$. By convexity of $\Omega$, the segment $\{y_{1}+tp:t\in[\min(0,s),\max(0,s)]\}$ is contained in $\Omega$. The equality above gives

$$s+h(y_{1}+sp)-h(y_{1})=0.$$

Set $\psi(t)=t+h(y_{1}+tp)$. Then $\psi(s)=\psi(0)$. Moreover,

$$\psi'(t)=1+\nabla h(y_{1}+tp)\cdot p.$$

By the matrix determinant lemma,

$$\det DT(y_{1}+tp)=\det\bigl(I+p\,(\nabla h)^{T}\bigr)=1+\nabla h(y_{1}+tp)\cdot p.$$

Hence $\psi'(t)=\det DT(y_{1}+tp)$. By assumption this never vanishes. Since it is continuous, its sign is constant on the segment, and $\psi$ is strictly monotone there. This contradicts $\psi(s)=\psi(0)$ with $s\neq 0$. Therefore $y_{1}=y_{2}$.

*Case 2: $r=0$.* Then $\ker L=\mathbb{R}(p,0)$ with $p\neq 0$. Choose a linear isomorphism

$$P:\mathbb{R}^{n-1}\times\mathbb{R}\to\mathbb{R}^{n}$$

such that $P(0,1)=p$. Let

$$\widetilde{\Omega}=P^{-1}(\Omega),\qquad\widetilde{h}(u,s)=h\bigl(P(u,s)\bigr).$$

In coordinates $(u,s,z)\in\mathbb{R}^{n-1}\times\mathbb{R}\times\mathbb{R}$, define

$$\widetilde{L}(u,s,z)=L\bigl(P(u,s),z\bigr).$$

Then $\ker\widetilde{L}=\mathbb{R}(0,1,0)$. Consider

$$H=\mathbb{R}^{n-1}\times\{0\}\times\mathbb{R}\cong\mathbb{R}^{n}.$$

The restriction $\widetilde{L}|_{H}:H\to\mathbb{R}^{n}$ is a linear isomorphism. Let

$$J:H\to\mathbb{R}^{n},\qquad J(u,0,z)=(u,z),$$

and define

$$M=J\circ(\widetilde{L}|_{H})^{-1}\in GL_{n}(\mathbb{R}).$$

Since $(0,s,0)\in\ker\widetilde{L}$, we have

$$M\widetilde{L}(u,s,z)=J(u,0,z)=(u,z).$$

After the source coordinate change $y=P(u,s)$ and the target linear change $M$, the map becomes

$$\widetilde{T}(u,s)=MT\bigl(P(u,s)\bigr)=M\widetilde{L}\bigl(u,s,\widetilde{h}(u,s)\bigr)=\bigl(u,\widetilde{h}(u,s)\bigr).$$

These transformations are invertible, so they do not affect injectivity. The Jacobian determinant is only multiplied by a nonzero constant.

The Jacobian of $\widetilde{T}$ is

$$D\widetilde{T}(u,s)=\begin{pmatrix}I_{n-1}&0\\ \partial_{u}\widetilde{h}(u,s)&\partial_{s}\widetilde{h}(u,s)\end{pmatrix},$$

and hence $\det D\widetilde{T}(u,s)=\partial_{s}\widetilde{h}(u,s)$. Therefore

$$\partial_{s}\widetilde{h}(u,s)\neq 0,\qquad(u,s)\in\widetilde{\Omega}.$$

For fixed $u$, the fiber

$$I_{u}=\{s\in\mathbb{R}:(u,s)\in\widetilde{\Omega}\}$$

is an interval, since $\widetilde{\Omega}$ is convex. The function $s\mapsto\widetilde{h}(u,s)$ has a continuous derivative that never vanishes on $I_{u}$, so it is strictly monotone. If $\widetilde{T}(u_{1},s_{1})=\widetilde{T}(u_{2},s_{2})$, the first $n-1$ components give $u_{1}=u_{2}$, and the last component gives $\widetilde{h}(u_{1},s_{1})=\widetilde{h}(u_{1},s_{2})$. Strict monotonicity then yields $s_{1}=s_{2}$. Hence $\widetilde{T}$, and therefore $T$, is injective. ∎

###### Proof of Theorem [3.2](https://arxiv.org/html/2606.10806#S3.Thmtheorem2), first algebraic form.

Write $F(x)=A\sigma(Bx+c)$, where $A\in\mathbb{R}^{n\times(n+1)}$, $B\in\mathbb{R}^{(n+1)\times n}$, and $c\in\mathbb{R}^{n+1}$. Since $DF(x)=A\,\operatorname{diag}\bigl(\sigma'(Bx+c)\bigr)B$ is invertible, $\operatorname{rank}B=n$, so some $n$ rows of $B$ are linearly independent. After reordering the hidden units, assume that the first $n$ rows are independent. By an invertible affine change of input variables, the first $n$ preactivations can be normalized to $x_{1},\ldots,x_{n}$. Thus the network can be written as

$$F(x)=C\sigma(x)+w\,\sigma(a^{T}x+\beta),$$

where $C\in\mathbb{R}^{n\times n}$, $w\in\mathbb{R}^{n}$, $a\in\mathbb{R}^{n}$, and $\beta\in\mathbb{R}$. Equivalently,

$$B=\begin{pmatrix}I_{n}\\ a^{T}\end{pmatrix},\qquad A=\begin{pmatrix}C&w\end{pmatrix}.$$

Let $\Phi:\mathbb{R}^{n}\to(0,1)^{n}$ be $\Phi(x)=\sigma(x)$, and set

$$h(y)=\sigma\bigl(a^{T}\Phi^{-1}(y)+\beta\bigr),\qquad y\in(0,1)^{n}.$$

Then

$$F(x)=Cy+w\,h(y)=T(y),\qquad y=\Phi(x).$$

Since $\det DF(x)>0$ and $\Phi$ is a diffeomorphism, $\det DT(y)\neq 0$ for all $y\in(0,1)^{n}$. The domain $(0,1)^{n}$ is open and convex, and $T(y)=A\bigl(y,h(y)\bigr)$, so Lemma [3.3](https://arxiv.org/html/2606.10806#S3.Thmtheorem3) with $L=A$ implies that $T$ is injective. Hence $F$ is injective. ∎

This proof highlights the fundamental role of convexity and a one-dimensional kernel in the low-width case of the NJC.
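The reparametrization $y=\Phi(x)=\sigma(x)$ at the heart of this proof can be checked numerically. In the sketch below, which uses illustrative parameters not taken from the paper, $F(x)$ and $T(\sigma(x))$ agree. The chain rule $DT(y)=DF(x)\,D\Phi(x)^{-1}$ then shows up as $\det DT(y)=\det DF(x)/\prod_i\sigma'(x_i)$, so the two determinants vanish together. Jacobians are approximated by central differences.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    return np.log(p / (1.0 - p))


def fd_jacobian(f, x, step=1e-6):
    """Central-difference Jacobian of f at x, one column per coordinate."""
    return np.column_stack([(f(x + step * e) - f(x - step * e)) / (2 * step) for e in np.eye(x.size)])


# Illustrative parameters (not from the paper), n = 3.
C = np.array([[1.0, 0.3, -0.2], [0.1, 0.9, 0.4], [-0.5, 0.2, 1.1]])
w = np.array([0.4, -1.0, 0.7])
a = np.array([1.2, -0.7, 0.4])
beta = 0.3


def F(x):
    """Normalized network F(x) = C sigma(x) + w sigma(a^T x + beta)."""
    return C @ sigmoid(x) + w * sigmoid(a @ x + beta)


def T(y):
    """Same map in hidden coordinates: T(y) = C y + w h(y), h(y) = sigma(a^T logit(y) + beta)."""
    return C @ y + w * sigmoid(a @ logit(y) + beta)


x = np.array([0.2, -1.3, 0.8])
y = sigmoid(x)
det_DF = np.linalg.det(fd_jacobian(F, x))
det_DT = np.linalg.det(fd_jacobian(T, y))
print("F(x) - T(sigma(x)):", F(x) - T(y))
print("det DT(y) =", det_DT, " det DF(x) / prod sigma'(x_i) =", det_DF / np.prod(y * (1 - y)))
```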

#### 3.2.2 Second algebraic proof: one-dimensional monotonicity along the kernel direction

The following second proof was obtained by Moonshine using DeepSeek-V4-pro.

###### Proof.

Again write $F(x)=A\sigma(Bx+c)$, where $A\in\mathbb{R}^{n\times(n+1)}$, $B\in\mathbb{R}^{(n+1)\times n}$, and $c\in\mathbb{R}^{n+1}$. As above, after normalization the network can be written as

$$F(x)=C\sigma(x)+w\,\sigma(a^{T}x+\beta).$$

Its Jacobian is

$$DF(x)=CS(x)+s_{n+1}(x)\,wa^{T},$$

where

$$S(x)=\operatorname{diag}\bigl(\sigma'(x_{1}),\ldots,\sigma'(x_{n})\bigr),\qquad s_{n+1}(x)=\sigma'(a^{T}x+\beta).$$

The diagonal entries of $S(x)$ and the scalar $s_{n+1}(x)$ are all positive.

Suppose, for contradiction, that $F$ is not injective. Then there exist $p\neq q$ such that $F(p)=F(q)$. Let

$$u=\sigma(p)\in(0,1)^{n},\qquad v=\sigma(q)\in(0,1)^{n}.$$

Define

$$G(u)=\sigma\bigl(a^{T}\sigma^{-1}(u)+\beta\bigr),$$

where $\sigma^{-1}(s)=\log\bigl(s/(1-s)\bigr)$ acts componentwise. Then

$$F(x)=Cu+w\,G(u),\qquad u=\sigma(x).$$

Since $DF(x)=A\,\operatorname{diag}\bigl(\sigma'(Bx+c)\bigr)B$ is invertible, $A=[C,w]$ has rank $n$, so its kernel is one-dimensional. Choose

$$k=\begin{pmatrix}\widehat{k}\\ k_{n+1}\end{pmatrix}\in\ker A\setminus\{0\}.$$

Then

$$C\widehat{k}+k_{n+1}w=0.\tag{1}$$

The equality $F(p)=F(q)$ implies that the difference of the hidden-layer outputs lies in $\ker A$. Thus for some $\lambda\neq 0$,

$$\begin{pmatrix}v\\ G(v)\end{pmatrix}-\begin{pmatrix}u\\ G(u)\end{pmatrix}=\lambda k.$$

In particular,

$$v=u+\lambda\widehat{k},\qquad G(v)-G(u)=\lambda k_{n+1}.\tag{2}$$

Define

$$f(t)=G(u+t\widehat{k})-G(u)-tk_{n+1}$$

for those $t$ for which $u+t\widehat{k}\in(0,1)^{n}$. By (2), $f(0)=f(\lambda)=0$, with $\lambda\neq 0$. Let

$$x_{t}=\sigma^{-1}(u+t\widehat{k}).$$

Convexity of $(0,1)^{n}$ ensures that $x_{t}$ is well-defined for $t$ between $0$ and $\lambda$. Differentiating gives

$$f'(t)=\nabla G(u+t\widehat{k})\cdot\widehat{k}-k_{n+1}.$$

A direct computation yields

$$\nabla G(u+t\widehat{k})=s_{n+1}(x_{t})\,a^{T}S(x_{t})^{-1},$$

and hence

$$f'(t)=s_{n+1}(x_{t})\,a^{T}S(x_{t})^{-1}\widehat{k}-k_{n+1}.\tag{3}$$

We relate this expression to $\det DF(x_{t})$.
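Both the Jacobian formula for the normalized network and formula (3) follow from direct calculus, and a finite-difference check is a cheap way to catch sign or transpose slips. In this NumPy sketch, the parameters and the direction $(\widehat{k},k_{n+1})$ are illustrative assumptions. Formula (3) is a pure differentiation identity, so the sketch checks it without requiring $(\widehat{k},k_{n+1})\in\ker A$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def logit(p):
    return np.log(p / (1.0 - p))


# Illustrative parameters (not from the paper) for F(x) = C sigma(x) + w sigma(a^T x + beta).
C = np.array([[1.0, 0.3, -0.2], [0.1, 0.9, 0.4], [-0.5, 0.2, 1.1]])
w = np.array([0.4, -1.0, 0.7])
a = np.array([1.2, -0.7, 0.4])
beta = 0.5
STEP = 1e-6  # central-difference step


def F(x):
    return C @ sigmoid(x) + w * sigmoid(a @ x + beta)


def DF_closed(x):
    """DF(x) = C S(x) + s_{n+1}(x) w a^T."""
    return C @ np.diag(sigmoid_prime(x)) + sigmoid_prime(a @ x + beta) * np.outer(w, a)


def DF_fd(x):
    cols = [(F(x + STEP * e) - F(x - STEP * e)) / (2 * STEP) for e in np.eye(x.size)]
    return np.column_stack(cols)


# Direction data for f(t); u + t * k_hat stays inside (0,1)^3 for |t| <= 1.
u = np.array([0.3, 0.6, 0.45])
k_hat = np.array([0.1, -0.05, 0.08])
k_last = 0.7


def G(v):
    return sigmoid(a @ logit(v) + beta)


def f(t):
    return G(u + t * k_hat) - G(u) - t * k_last


def f_prime_formula(t):
    """Formula (3): s_{n+1}(x_t) a^T S(x_t)^{-1} k_hat - k_{n+1}, with x_t = logit(u + t k_hat)."""
    x_t = logit(u + t * k_hat)
    return sigmoid_prime(a @ x_t + beta) * (a / sigmoid_prime(x_t)) @ k_hat - k_last


x = np.array([0.2, -1.3, 0.8])
print("max |DF_closed - DF_fd|:", np.max(np.abs(DF_closed(x) - DF_fd(x))))
for t in [-0.8, 0.0, 0.5]:
    fd = (f(t + STEP) - f(t - STEP)) / (2 * STEP)
    print(f"t={t:+.1f}  formula (3)={f_prime_formula(t):.8f}  finite difference={fd:.8f}")
```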

*Case 1: $k_{n+1}\neq 0$.* Rescale $k$ so that $k_{n+1}=-1$. Then (1) gives $w=C\widehat{k}$. Therefore

$$DF(x_{t})=C\bigl(S(x_{t})+s_{n+1}(x_{t})\,\widehat{k}a^{T}\bigr).$$

Since $A=[C,w]$ has rank $n$ and $w=C\widehat{k}$, the matrix $C$ is invertible. The matrix determinant lemma gives

$$\det DF(x_{t})=\det C\,\det S(x_{t})\bigl(1+s_{n+1}(x_{t})\,a^{T}S(x_{t})^{-1}\widehat{k}\bigr).$$

By (3),

$$f'(t)=1+s_{n+1}(x_{t})\,a^{T}S(x_{t})^{-1}\widehat{k}=\frac{\det DF(x_{t})}{\det C\,\det S(x_{t})}.$$

Since $\det DF(x_{t})>0$, $\det S(x_{t})>0$, and $\det C\neq 0$ is constant, $f'$ has a fixed nonzero sign on the interval between $0$ and $\lambda$. Hence $f$ is strictly monotone there.
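The Case 1 identity $f'(t)=\det DF(x_t)/(\det C\,\det S(x_t))$ can be verified numerically once $w$ is tied to $\widehat{k}$ through (1). The sketch below uses illustrative values, not taken from the paper, with $k_{n+1}=-1$ and therefore $w=C\widehat{k}$. It compares the two sides at several values of $t$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    return np.log(p / (1.0 - p))


# Illustrative Case 1 data (not from the paper): with k_{n+1} = -1, equation (1) forces w = C k_hat.
C = np.array([[1.0, 0.3, -0.2], [0.1, 0.9, 0.4], [-0.5, 0.2, 1.1]])
a = np.array([1.2, -0.7, 0.4])
beta = 0.5
u = np.array([0.3, 0.6, 0.45])
k_hat = np.array([0.1, -0.05, 0.08])
w = C @ k_hat


def case1_sides(t):
    """Return (f'(t) from formula (3), det DF(x_t) / (det C det S(x_t)))."""
    x_t = logit(u + t * k_hat)
    S = sigmoid(x_t) * (1.0 - sigmoid(x_t))                 # diagonal of S(x_t)
    z = sigmoid(a @ x_t + beta)
    s_last = z * (1.0 - z)                                  # s_{n+1}(x_t)
    DF = C @ np.diag(S) + s_last * np.outer(w, a)           # DF(x_t)
    f_prime = 1.0 + s_last * (a / S) @ k_hat                # (3) with k_{n+1} = -1
    ratio = np.linalg.det(DF) / (np.linalg.det(C) * np.prod(S))
    return f_prime, ratio


for t in [-0.8, 0.0, 0.5]:
    print(t, case1_sides(t))
```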

*Case 2: $k_{n+1}=0$.* Then $C\widehat{k}=0$ with $\widehat{k}\neq 0$. Thus $\operatorname{rank}C\leq n-1$. Since $[C,w]$ has rank $n$, we have $\operatorname{rank}C=n-1$ and $w\notin\operatorname{Im}C$. Choose $v_{0}\neq 0$ spanning the left kernel $\ker C^{T}$. Then $v_{0}^{T}w\neq 0$. The adjugate matrix has rank one. Since $C\operatorname{adj}(C)=\operatorname{adj}(C)C=(\det C)I=0$, its columns lie in $\ker C=\mathbb{R}\widehat{k}$ and its rows in $\ker C^{T}=\mathbb{R}v_{0}$, so it can be written as

$$\operatorname{adj}(C)=\gamma\,\widehat{k}v_{0}^{T},\qquad\gamma\neq 0.$$

Writing

$$DF(x_{t})=\bigl(C+s_{n+1}(x_{t})\,wa^{T}S(x_{t})^{-1}\bigr)S(x_{t}),$$

and using the rank-one perturbation formula for a matrix $C$ of rank $n-1$, valid for arbitrary vectors $\xi,\eta$,

$$\det(C+\xi\eta^{T})=\eta^{T}\operatorname{adj}(C)\,\xi,$$

we obtain

$$\det DF(x_{t})=\det S(x_{t})\,s_{n+1}(x_{t})\,a^{T}S(x_{t})^{-1}\widehat{k}\,\gamma\,(v_{0}^{T}w).$$

Since $k_{n+1}=0$, (3) becomes

$$f'(t)=s_{n+1}(x_{t})\,a^{T}S(x_{t})^{-1}\widehat{k}.$$

Therefore

$$f'(t)=\frac{\det DF(x_{t})}{\det S(x_{t})\,\gamma\,(v_{0}^{T}w)}.$$

Again $\det DF(x_{t})>0$, $\det S(x_{t})>0$, and $\gamma(v_{0}^{T}w)\neq 0$, so $f'$ has a fixed nonzero sign. Thus $f$ is strictly monotone.

In both cases, $f$ is strictly monotone on the closed interval with endpoints $0$ and $\lambda$. But $f(0)=f(\lambda)=0$ and $\lambda\neq 0$, a contradiction. Hence no distinct $p,q$ with $F(p)=F(q)$ exist, and $F$ is injective. ∎
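Case 2 of the proof above uses two facts about a matrix $C$ of rank $n-1$:

- $\operatorname{adj}(C)=\gamma\,\widehat{k}v_0^{T}$;
- $\det(C+\xi\eta^{T})=\eta^{T}\operatorname{adj}(C)\,\xi$.

The NumPy sketch below builds an illustrative rank-$(n-1)$ matrix with $C\widehat{k}=0$. It computes the adjugate from cofactors, because the usual $\det(C)\,C^{-1}$ shortcut fails for singular $C$. It then checks both facts together with the resulting Case 2 formula for $\det DF(x_t)$. The positive diagonal $S$, the scalar $s_{n+1}$, and the vectors $w$ and $a$ are arbitrary test values, not derived from a particular $x_t$.

```python
import numpy as np


def adjugate(M):
    """Transpose of the cofactor matrix; valid for singular M, unlike det(M) * inv(M)."""
    n = M.shape[0]
    cof = np.empty_like(M)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T


rng = np.random.default_rng(6)
n = 4
k_hat = rng.normal(size=n)
M0 = rng.normal(size=(n, n))
C = M0 - np.outer(M0 @ k_hat, k_hat) / (k_hat @ k_hat)   # C k_hat = 0, rank n - 1
v0 = np.linalg.svd(C)[0][:, -1]                        # spans ker C^T
adjC = adjugate(C)
gamma = (k_hat @ adjC @ v0) / ((k_hat @ k_hat) * (v0 @ v0))

# Fact 1: adj(C) = gamma k_hat v0^T.
print("rank C =", np.linalg.matrix_rank(C), "| rank-one form:", np.allclose(adjC, gamma * np.outer(k_hat, v0)))

# Fact 2: det(C + xi eta^T) = eta^T adj(C) xi, since det C = 0.
xi, eta = rng.normal(size=n), rng.normal(size=n)
print(np.linalg.det(C + np.outer(xi, eta)), eta @ adjC @ xi)

# Case 2 formula with arbitrary test values standing in for S(x_t), s_{n+1}(x_t), w, a.
S = rng.uniform(0.05, 0.25, size=n)
s_last = 0.2
w, a = rng.normal(size=n), rng.normal(size=n)
DF = C @ np.diag(S) + s_last * np.outer(w, a)
predicted = np.prod(S) * s_last * ((a / S) @ k_hat) * gamma * (v0 @ w)
print(np.linalg.det(DF), predicted)
```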

#### 3.2.3 Geometric-topological proof: the hidden-layer submanifold and one-dimensional fibers

The following geometric-topological proof was developed with the assistance of ChatGPT, using GPT-5.5-pro interactively through its web interface.

Use the notation

$$F(x)=A\sigma(Bx+c),$$

where

$$A\in\mathbb{R}^{n\times(n+1)},\qquad B\in\mathbb{R}^{(n+1)\times n}.$$

Assume

$$\operatorname{rank}A=\operatorname{rank}B=n,\qquad\det DF(x)>0\quad\forall x\in\mathbb{R}^{n}.$$

Let

$$h(x)=\sigma(Bx+c),$$

and define

$$X_{1}=h(\mathbb{R}^{n})=\sigma(c+\operatorname{Im}B)\subset(0,1)^{n+1},\qquad X_{2}=\ker A.$$

Since $B$ has full column rank, $c+\operatorname{Im}B$ is an affine $n$-plane in $\mathbb{R}^{n+1}$. Since the componentwise sigmoid is a diffeomorphism from $\mathbb{R}^{n+1}$ to $(0,1)^{n+1}$, $X_{1}$ is a smoothly embedded $n$-dimensional submanifold of $(0,1)^{n+1}$. Moreover, $\dim X_{2}=1$.

By the hidden-layer intersection formulation, $F$ is injective if and only if

$$(p+X_{2})\cap X_{1}=\{p\},\qquad\forall p\in X_{1}.$$

It is therefore enough to prove that every affine line with direction $X_{2}$ meets $X_{1}$ at most once.

Fix $p\in X_{1}$, and choose $k\in X_{2}\setminus\{0\}$ with $X_{2}=\mathbb{R}k$. Define

$$I_{p}=\{t\in\mathbb{R}:p+tk\in(0,1)^{n+1}\}.$$

This is an open interval containing $0$.

We describe $X_{1}$ in logit coordinates. Let

$$\operatorname{logit}(z)=\left(\log\frac{z_{1}}{1-z_{1}},\ldots,\log\frac{z_{n+1}}{1-z_{n+1}}\right).$$

Then

$$z\in X_{1}\iff\operatorname{logit}(z)-c\in\operatorname{Im}B.$$

Choose $0\neq\lambda\in(\operatorname{Im}B)^{\perp}$, and define

$$\Phi:(0,1)^{n+1}\to\mathbb{R},\qquad\Phi(z)=\lambda\cdot\bigl(\operatorname{logit}(z)-c\bigr).$$

Since $\operatorname{Im}B$ is a hyperplane, it equals $\lambda^{\perp}$, and therefore

$$X_{1}=\{z\in(0,1)^{n+1}:\Phi(z)=0\}.$$

Restrict $\Phi$ to the affine line $p+X_{2}$, and define

$$g:I_{p}\to\mathbb{R},\qquad g(t)=\Phi(p+tk).$$

Then

$$p+tk\in X_{1}\iff g(t)=0.$$

Thus intersections of $p+X_{2}$ with $X_{1}$ correspond exactly to zeros of $g$, and $g(0)=0$.
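Zeros of $g$ can be counted numerically. The NumPy sketch below samples $g(t)=\Phi(p+tk)$ on a fine grid inside $I_p$ and counts sign changes. The parameter choices are illustrative. A grid count is only a heuristic, because it can miss pairs of zeros closer together than the grid spacing.

- **Hand-built network in $\mathcal{N}_{n,n+1}^{+}$.** Take $A=[I\mid a]$, $B=[I;\,a^{T}]$, and $c=(0,\ldots,0,\beta)$. Then $k\propto(a,-1)$, and one may take $\lambda=(a,-1)$. Theorem 3.2 predicts exactly one zero.
- **$F(x)=\sigma(x)-\sigma(x+1)$, which is not in $\mathcal{N}_{1,2}^{+}$.** The fiber through $h(0)$ also passes through $h(-1)$, so there are two zeros.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    return np.log(p / (1.0 - p))


def fiber_zero_count(A, B, c, x0, lam, n_grid=20001):
    """Count sign changes of g(t) = lam . (logit(p + t k) - c) on I_p, where p = h(x0).

    Assumes N = n + 1, so ker A is a line; lam must be orthogonal to Im B.
    """
    k = np.linalg.svd(A)[2][-1]          # last right-singular vector spans ker A
    p = sigmoid(B @ x0 + c)              # base point p = h(x0) on X_1
    lo, hi = -np.inf, np.inf             # I_p = {t : 0 < p_i + t k_i < 1 for all i}
    for p_i, k_i in zip(p, k):
        if abs(k_i) > 1e-14:
            t_a, t_b = -p_i / k_i, (1.0 - p_i) / k_i
            lo, hi = max(lo, min(t_a, t_b)), min(hi, max(t_a, t_b))
    ts = lo + (hi - lo) * np.linspace(1e-6, 1.0 - 1e-6, n_grid)
    g = np.array([lam @ (logit(p + t * k) - c) for t in ts])
    return int(np.sum(g[:-1] * g[1:] < 0))


# Hand-built member of N^+_{2,3}: A = [I | a], B = [I ; a^T], c = (0, 0, beta).
a, beta = np.array([1.0, -2.0]), 0.4
A1 = np.hstack([np.eye(2), a[:, None]])
B1 = np.vstack([np.eye(2), a[None, :]])
c1 = np.array([0.0, 0.0, beta])
lam1 = np.append(a, -1.0)                # (a, -1) is orthogonal to Im B1
print("zeros on the fiber, N^+ example:", fiber_zero_count(A1, B1, c1, np.array([0.3, -0.5]), lam1))

# F(x) = sigma(x) - sigma(x + 1), which is not in N^+_{1,2}.
A2, B2, c2 = np.array([[1.0, -1.0]]), np.array([[1.0], [1.0]]), np.array([0.0, 1.0])
lam2 = np.array([1.0, -1.0])             # orthogonal to Im B2 = span{(1, 1)}
print("zeros on the fiber, non-injective example:", fiber_zero_count(A2, B2, c2, np.array([0.0]), lam2))
```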

We first prove that if $t_{0}\in I_{p}$ is a zero of $g$, then

$$g'(t_{0})\neq 0.$$

This assertion concerns the derivative at zeros only. It does not say that $g'$ is nonzero on all of $I_{p}$.

Let $g(t_{0})=0$, and set $z_{0}=p+t_{0}k$. Then $z_{0}\in X_{1}$, so $z_{0}=h(x_{0})=\sigma(Bx_{0}+c)$ for some $x_{0}$. Define

$$D(x_{0})=\operatorname{diag}\bigl(\sigma'((Bx_{0}+c)_{1}),\ldots,\sigma'((Bx_{0}+c)_{n+1})\bigr).$$

Then

$$T_{z_{0}}X_{1}=D(x_{0})\operatorname{Im}B.$$

###### Lemma 3.4.

At every zero $t_{0}$ of $g$, one has

$$k\notin T_{z_{0}}X_{1}.$$

###### Proof.

If $k\in T_{z_{0}}X_{1}$, then $k=D(x_{0})Bv$ for some $v\in\mathbb{R}^{n}$. Since $k\in X_{2}=\ker A$,

$$0=Ak=AD(x_{0})Bv.$$

But $AD(x_{0})B=DF(x_{0})$ and $\det DF(x_{0})>0$, so $AD(x_{0})B$ is invertible. Hence $v=0$, and therefore $k=0$, a contradiction. ∎

Since $X_{1}$ is the zero level set of $\Phi$, and on $X_{1}$

$$T_{z_{0}}X_{1}=\ker d\Phi_{z_{0}},$$

Lemma [3.4](https://arxiv.org/html/2606.10806#S3.Thmtheorem4) implies

$$d\Phi_{z_{0}}(k)\neq 0.$$

By the chain rule, because $g(t)=\Phi(p+tk)$,

$$g'(t_{0})=d\Phi_{z_{0}}(k).$$

Thus every zero of $g$ is nondegenerate, in the sense that its derivative is nonzero.

We next show that $g'$ has the same sign at every zero. For this purpose we use a separate linear-algebra lemma.

###### Lemma 3.5.

Let

$$A:\mathbb{R}^{n+1}\to\mathbb{R}^{n},\qquad B:\mathbb{R}^{n}\to\mathbb{R}^{n+1}$$

be linear maps with $\operatorname{rank}A=\operatorname{rank}B=n$. Suppose

$$\ker A=\mathbb{R}k,\qquad 0\neq\lambda\in(\operatorname{Im}B)^{\perp}.$$

Then there exists a nonzero constant $C=C(A,B,k,\lambda)$ such that for every $L\in GL_{n+1}(\mathbb{R})$,

$$\det(ALB)=C\det L\,\lambda^{T}L^{-1}k.$$

In particular, if $\det L>0$, then there is a fixed sign $\varepsilon\in\{\pm 1\}$, depending only on $A,B,k,\lambda$, such that

$$\operatorname{sgn}\det(ALB)=\varepsilon\,\operatorname{sgn}(\lambda^{T}L^{-1}k).$$

###### Proof.

First, for any $M=[m_{1},\ldots,m_{n}]\in\mathbb{R}^{(n+1)\times n}$, since $A$ has rank $n$ and $\ker A=\mathbb{R}k$, there is a nonzero constant $c_{A}$, depending only on $A$ and $k$, such that

$$\det(AM)=c_{A}\det[M,k].$$

Indeed, choose a basis $u_{1},\ldots,u_{n}$ of a complement to $\mathbb{R}k$, and set $U=[u_{1},\ldots,u_{n}]$. Then $[U,k]$ is invertible and $AU$ is invertible. Every $M$ can be written uniquely as $M=UY+k\alpha^{T}$ with $Y\in\mathbb{R}^{n\times n}$ and $\alpha\in\mathbb{R}^{n}$. Then

$$\det(AM)=\det(AU)\det Y,\qquad\det[M,k]=\det[U,k]\det Y,$$

which gives the desired proportionality with $c_{A}=\det(AU)/\det[U,k]$.

Second, since $\operatorname{rank}B=n$, the image $\operatorname{Im}B$ is an $n$-dimensional hyperplane in $\mathbb{R}^{n+1}$. The map $q\mapsto\det[B,q]$ is a linear functional vanishing on $\operatorname{Im}B$, as is $q\mapsto\lambda^{T}q$. Since the space of such functionals is one-dimensional, there exists $c_{B}\neq 0$ such that

$$\det[B,q]=c_{B}\lambda^{T}q,\qquad\forall q\in\mathbb{R}^{n+1}.$$

Now take $M=LB$. Then

$$\det(ALB)=c_{A}\det[LB,k].$$

Since $[LB,k]=L[B,L^{-1}k]$,

$$\det[LB,k]=\det L\,\det[B,L^{-1}k]=\det L\,c_{B}\,\lambda^{T}L^{-1}k.$$

Thus

$$\det(ALB)=c_{A}c_{B}\det L\,\lambda^{T}L^{-1}k.$$

Set $C=c_{A}c_{B}\neq 0$. The sign statement follows immediately. ∎
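Lemma 3.5 states that the ratio $\det(ALB)/(\det L\,\lambda^{T}L^{-1}k)$ does not depend on $L$. The NumPy sketch below tests this with illustrative random $A$ and $B$. It evaluates the ratio for several random $L$ and for one positive diagonal $L$, as in the geometric proof. It then compares the ratio with $C=c_Ac_B$ computed from the constructions in the proof.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
A = rng.normal(size=(n, n + 1))           # illustrative, rank n almost surely
B = rng.normal(size=(n + 1, n))           # illustrative, rank n almost surely
_, _, Vt = np.linalg.svd(A)
k = Vt[-1]                                # spans ker A
lam = np.linalg.svd(B)[0][:, -1]          # spans (Im B)^perp

# Constants from the proof: c_A = det(AU) / det[U, k] with U spanning a complement
# of R k, and c_B = det[B, q] / (lam^T q), evaluated at q = lam.
U = Vt[:n].T
c_A = np.linalg.det(A @ U) / np.linalg.det(np.column_stack([U, k]))
c_B = np.linalg.det(np.column_stack([B, lam])) / (lam @ lam)
C_pred = c_A * c_B


def lemma_ratio(L):
    """det(A L B) / (det L * lam^T L^{-1} k); Lemma 3.5 says this equals C for every L."""
    return np.linalg.det(A @ L @ B) / (np.linalg.det(L) * (lam @ np.linalg.solve(L, k)))


ratios = [lemma_ratio(rng.normal(size=(n + 1, n + 1))) for _ in range(5)]
# The case used in the geometric proof: L = D(x_0), a positive diagonal matrix.
ratios.append(lemma_ratio(np.diag(rng.uniform(0.01, 0.25, size=n + 1))))
print("C from the proof:", C_pred)
print("ratios:", np.round(ratios, 10))
```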

Return to the geometric proof\. At a zerot0t\_\{0\}, write againz0=p\+t0​k=h​\(x0\)z\_\{0\}=p\+t\_\{0\}k=h\(x\_\{0\}\)\. Sincez0=σ​\(B​x0\+c\)z\_\{0\}=\\sigma\(Bx\_\{0\}\+c\),

$$D(x_0)=\operatorname{diag}\bigl(z_{0,1}(1-z_{0,1}),\ldots,z_{0,n+1}(1-z_{0,n+1})\bigr),$$

which is a positive diagonal matrix, because $\sigma'=\sigma(1-\sigma)$ and every coordinate of $z_0$ lies in $(0,1)$. Moreover,

$$d\Phi_{z_0}(w)=\lambda^{T}D(x_0)^{-1}w,\qquad\text{so}\qquad d\Phi_{z_0}(k)=\lambda^{T}D(x_0)^{-1}k.$$

Applying Lemma [3.5](https://arxiv.org/html/2606.10806#S3.Thmtheorem5) with $L=D(x_0)$, we get

$$\det(AD(x_0)B)=C\det D(x_0)\,d\Phi_{z_0}(k).$$

Since $\det D(x_0)>0$, there exists a fixed sign $\varepsilon$, independent of the intersection point, such that

$$\operatorname{sgn}\det(AD(x_0)B)=\varepsilon\,\operatorname{sgn}d\Phi_{z_0}(k).$$

But $\det(AD(x_0)B)=\det DF(x_0)>0$, so $d\Phi_{z_0}(k)$ is nonzero with sign $\varepsilon$ at every zero $t_0$, and hence all zeros $t_0$ have the same nonzero sign of $g'(t_0)$. In particular, every zero of $g$ is nondegenerate.
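The next sketch checks the two ingredients used at a zero numerically. It assumes, since this excerpt does not restate it, that the network has the form $F(x)=A\sigma(Bx+c)$ with the logistic sigmoid applied coordinatewise (an output bias would not change $DF$). It confirms by central finite differences that $DF(x_0)=AD(x_0)B$ with $D(x_0)=\operatorname{diag}(z_0(1-z_0))$ positive, and that $\det DF(x_0)$ agrees with $C\det D(x_0)\,\lambda^{T}D(x_0)^{-1}k$. The values of $A$, $B$, $c$ and $x_0$ are arbitrary illustrations; $k$ and $\lambda$ are the signed maximal minors of $A$ and $B$, a normalization for which $C=1$ by the Cauchy-Binet formula.

```python
import math


def det(m):
    """Determinant via Gaussian elimination with partial pivoting (floating point)."""
    a = [[float(x) for x in row] for row in m]
    size = len(a)
    result = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def network(A, B, c, x):
    """Assumed form F(x) = A sigma(Bx + c); also returns the hidden state z."""
    z = [sigmoid(sum(b * xj for b, xj in zip(row, x)) + ci) for row, ci in zip(B, c)]
    return [sum(a * zj for a, zj in zip(row, z)) for row in A], z


def minors(rows):
    """Signed maximal minors of a list of n+1 vectors of length n."""
    return [(-1) ** i * det(rows[:i] + rows[i + 1:]) for i in range(len(rows))]


A = [[1, 2, 0, -1], [0, 1, 3, 2], [2, -1, 1, 0]]
B = [[1, 0, 2], [1, 1, 0], [0, 2, 1], [1, -1, 1]]
c = [0.3, -0.2, 0.1, 0.5]
x0 = [0.4, -0.7, 0.2]
n = len(x0)

k = minors([list(col) for col in zip(*A)])  # spans ker A
lam = minors(B)                             # lam^T B = 0

_, z0 = network(A, B, c, x0)
D = [zi * (1.0 - zi) for zi in z0]          # sigma' = sigma (1 - sigma) > 0
ADB = [[sum(A[i][t] * D[t] * B[t][j] for t in range(n + 1)) for j in range(n)]
       for i in range(n)]

# Central finite differences of F reproduce the chain-rule Jacobian A D(x0) B.
h = 1e-6
max_err = 0.0
for j in range(n):
    xp, xm = list(x0), list(x0)
    xp[j] += h
    xm[j] -= h
    Fp, _ = network(A, B, c, xp)
    Fm, _ = network(A, B, c, xm)
    for i in range(n):
        max_err = max(max_err, abs((Fp[i] - Fm[i]) / (2 * h) - ADB[i][j]))

# Lemma 3.5 with L = D(x0); C = 1 for this normalization of k and lam.
rhs = math.prod(D) * sum(l * kk / d for l, kk, d in zip(lam, k, D))
print(max_err, det(ADB), rhs)
```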

Finally, we use the following elementary one-dimensional fact: if $g:I\to\mathbb{R}$ is $C^1$ and $t_1<t_2$ are two adjacent zeros with $g'(t_1)\neq0$ and $g'(t_2)\neq0$, then

$$\operatorname{sgn}g'(t_1)=-\operatorname{sgn}g'(t_2).$$

Indeed, if $g'(t_1)>0$, then $g$ is positive just to the right of $t_1$. Since $g$ has no zeros in $(t_1,t_2)$, the intermediate value theorem shows that it remains positive on that interval. Since $g(t_2)=0$, the left difference quotients $g(t)/(t-t_2)$, $t<t_2$, are negative, so $g'(t_2)\leq0$; together with $g'(t_2)\neq0$ this gives $g'(t_2)<0$. The other case is analogous.
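The one-dimensional fact is easy to observe numerically. The Python sketch below uses $g=\cos$ on $[0,10]$ as an arbitrary test function (not from the paper): it brackets sign changes of $g$ on a uniform grid, refines each zero by bisection, and records $\operatorname{sgn}g'$ at the zeros in increasing order.

```python
import math


def derivative_signs_at_zeros(g, dg, a, b, steps=1000):
    """Find sign changes of g on a uniform grid over [a, b], refine each by
    bisection, and return sgn dg at the located zeros in increasing order."""
    grid = [a + (b - a) * i / steps for i in range(steps + 1)]
    signs = []
    for lo, hi in zip(grid, grid[1:]):
        if g(lo) * g(hi) >= 0:
            continue  # no strict sign change in this cell (grid-point zeros are ignored)
        for _ in range(60):  # bisection keeps a sign change inside [lo, hi]
            mid = 0.5 * (lo + hi)
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        t0 = 0.5 * (lo + hi)
        signs.append(1 if dg(t0) > 0 else -1)
    return signs


signs = derivative_signs_at_zeros(math.cos, lambda t: -math.sin(t), 0.0, 10.0)
assert all(s == -s_next for s, s_next in zip(signs, signs[1:]))  # alternation
print(signs)
```

The zeros $\pi/2$, $3\pi/2$, $5\pi/2$ give derivative signs $-1,+1,-1$. Read contrapositively, this is the mechanism of the proof: on an interval, a $C^1$ function whose derivative has the same sign at every zero, all of them nondegenerate, has at most one zero.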

If there were another intersection point, then $g(\lambda_0)=0$ for some $\lambda_0\in I_p\setminus\{0\}$. Since every zero of $g$ is nondegenerate, the zeros are isolated; as the zero set is also closed, $g$ has only finitely many zeros on the compact interval with endpoints $0$ and $\lambda_0$. Thus two adjacent zeros can be chosen. The one-dimensional fact forces the signs of $g'$ at these two zeros to be opposite, whereas the argument above shows that all such signs are equal. This contradiction proves

$$g^{-1}(0)=\{0\}.$$

Equivalently,

$$(p+X_2)\cap X_1=\{p\}.$$

Since $p\in X_1$ was arbitrary, $F$ is globally injective.

This proof reveals the essence of the case $N=n+1$: the one-dimensional output kernel $X_2=\ker A$ reduces the intersection problem to a zero problem in one variable; adjacent nondegenerate zeros of a function of one variable have opposite derivative signs, while the positive Jacobian condition forces all local signs to agree. This mechanism does not carry over automatically to higher-dimensional fibers, which is the main source of difficulty in the case $N\geq n+2$.

### 3.3 The higher-width case $N\geq n+2$ remains open

For $N\geq n+2$, the kernel $X_2$ has dimension $N-n\geq2$. A positive local index at each intersection no longer excludes multiple intersections, because the one-dimensional sign-alternation argument is unavailable: in dimension two or more, a map can have several regular zeros, all of local index $+1$, and so fail to be injective. The argument for $N=n+1$ therefore does not extend, and the Neural Jacobian Conjecture remains open in this higher-width regime.
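A standard example makes this point concrete; it is a generic planar map, not a sigmoid network, and so says nothing about the conjecture itself. Identifying $\mathbb{C}$ with $\mathbb{R}^2$, the map $f(z)=e^{z}-1$, that is $f(x,y)=(e^{x}\cos y-1,\;e^{x}\sin y)$, has Jacobian determinant $e^{2x}>0$ everywhere, yet it vanishes at every point $(0,2\pi m)$, $m\in\mathbb{Z}$. The Python sketch below checks this at a few of these zeros.

```python
import math


def f(x, y):
    """Real form of z -> exp(z) - 1 on C = R^2."""
    return (math.exp(x) * math.cos(y) - 1.0, math.exp(x) * math.sin(y))


def jacobian_det(x, y):
    """det [[e^x cos y, -e^x sin y], [e^x sin y, e^x cos y]], which equals e^(2x)."""
    a, b = math.exp(x) * math.cos(y), -math.exp(x) * math.sin(y)
    c, d = math.exp(x) * math.sin(y), math.exp(x) * math.cos(y)
    return a * d - b * c


zeros = [(0.0, 2 * math.pi * m) for m in range(-2, 3)]
for x, y in zeros:
    u, v = f(x, y)
    assert abs(u) < 1e-12 and abs(v) < 1e-12      # a zero of f
    assert math.isclose(jacobian_det(x, y), 1.0)  # regular zero of local index +1
print(len(zeros), "distinct zeros, all with positive Jacobian determinant")
```

Five distinct points share the image $0$ although the Jacobian determinant is positive on the whole plane, so local positivity alone cannot force injectivity in dimension two.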

## 4 Conclusion and Outlook

By reflecting on the classical Jacobian conjecture, Moonshine extracted its core question, whether local nondegeneracy forces global injectivity, and transferred it to one-hidden-layer affine-ridge sigmoid networks, thereby proposing the Neural Jacobian Conjecture. If true in full generality, the conjecture would reveal an intrinsic rigidity of this special class of neural networks. Even if it is eventually disproved, its exploration helps clarify the boundary between local diffeomorphism and global injectivity.

Moonshine has proved the NJC in the lowest nontrivial widths $N=n$ and $N=n+1$, providing initial evidence for its plausibility. For the general higher-width case $N\geq n+2$, the conjecture is neither proved nor disproved and remains open. This exemplifies Moonshine's working mode as a conjecture-generating mathematical agent: it formulates precise conjectures, establishes rigorous partial results, and identifies the unresolved boundary that guides subsequent research.

The complete source code, research logs, and intermediate verification records are available in the project repository: [https://github.com/DeepMathLLM/Moonshine](https://github.com/DeepMathLLM/Moonshine).

## Appendix A Supplementary Remarks

