Predicting Channel Closures in the Lightning Network with Machine Learning
Summary
This paper explores predicting whether Lightning Network channels will close mutually or via forced closure using machine learning on gossip data. An MLP with temporal features outperforms graph-based models, and the dataset is publicly released.
Cached at: 05/14/26, 06:18 AM
# Predicting Channel Closures in the Lightning Network with Machine Learning
Source: [https://arxiv.org/html/2605.12759](https://arxiv.org/html/2605.12759)
Simone Antonelli¹\*, Vincent Davis², Harrison Rush², Anthony Potdevin², Jesse Shrader², Vikash Singh³, Emanuele Rossi²,⁴
###### Abstract
The Lightning Network (LN) is a second-layer protocol for Bitcoin designed to enable fast and cost-efficient off-chain transactions. Channels in the LN can be closed either by mutual agreement or unilaterally through a *forced closure*, which locks the involved capital for an extended period and degrades network reliability. In this paper, we study the problem of predicting channel closure types from publicly available gossip data, framing it as a temporal link classification task over the evolving channel graph. We construct a dataset spanning over two years of LN activity and benchmark a range of machine learning approaches, from MLPs to temporal graph neural networks and spectral encodings. Our experiments reveal that the dominant predictive signals are temporal and behavioural, namely how recently each endpoint was active and the per-node history of past closures, while the surrounding network topology provides no additional benefit. We find that a simple MLP operating on edge-level features, node-level event counts, and temporal patterns outperforms all graph-based approaches, and we discuss how the inherent privacy of the LN, where critical information such as channel balances and payment flows remains hidden, fundamentally limits the predictability of closures from gossip data alone. We publicly release the dataset and code at [AmbossTech/ln-channel-closure-prediction](https://github.com/AmbossTech/ln-channel-closure-prediction) to encourage further research on this practically relevant task.
## I Introduction
The Lightning Network (LN) \[[1](https://arxiv.org/html/2605.12759#bib.bib1)\] is a second-layer protocol on top of Bitcoin that moves most payments off-chain. Two users open a *payment channel* by jointly locking Bitcoin on-chain, route an arbitrary number of off-chain payments through it, and eventually settle back on-chain by closing the channel.
Figure 1: Overview of the channel closure prediction task. *Left*: the Lightning Network evolves over time as channels open and close, forming a temporal graph. *Right*: given the current graph state at time $t$, we predict whether each open channel will remain open, close cooperatively (mutual), or be force-closed within a window $\Delta_t$.

A *mutual closure* settles the channel cooperatively and releases the funds immediately, while a *forced closure* is initiated unilaterally, typically because one party is unresponsive or a dispute arises, and locks the initiating party’s funds for a timelock period of days to weeks. Forced closures are costly: they consume on-chain fees, freeze liquidity that could otherwise be routed, and temporarily reduce network capacity. Anticipating them is therefore of practical interest for node operators, routing algorithms, and liquidity tooling. More broadly, temporal modelling of the channel graph is a useful tool for operators optimising outcomes such as payment reliability and earned routing fees, which depend not on a static snapshot of the network but on how the graph evolves over time as channels open, close, and update their routing parameters (fees, timelocks, disabled flags).
The LN’s topology and channel metadata are partially observable through its *gossip protocol*, which broadcasts channel openings, closures, and periodic updates including fee policies, capacity, and disabled flags. This public information forms a temporal graph ([Figure 1](https://arxiv.org/html/2605.12759#S1.F1)) and raises the question: *can we predict, from gossip data alone, whether a channel will remain open, close mutually, or be force-closed?*
Prior work has studied the LN’s topology \[[2](https://arxiv.org/html/2605.12759#bib.bib2)\] and liquidity dynamics \[[3](https://arxiv.org/html/2605.12759#bib.bib3)\], and has applied Graph Neural Networks (GNNs) to snapshot-based LN tasks \[[4](https://arxiv.org/html/2605.12759#bib.bib4)\], but none has explicitly modeled the temporal evolution of channel closures. We formalize the question above as a *temporal link classification* task and conduct a systematic study of its predictability, benchmarking random baselines, gradient-boosted trees, an MLP, GNNs (static and temporal), and spectral graph encodings on a dataset of over two years of daily LN gossip snapshots that we release publicly. Our findings are: (i) the dominant predictive signals are temporal and behavioural, namely endpoint activity recency and per-node closure history, while static channel metadata is far less informative; (ii) graph topology, whether via message passing or spectral encodings, does not improve over a simple MLP using per-channel and per-node features; and (iii) the overall predictive performance remains moderate, reflecting a fundamental information gap, as the signals most relevant to closure decisions (balances, payment failures, node uptime) are private by design and not disclosed by gossip.
## II Problem statement
We adopt the definition of a *temporal graph* from \[[5](https://arxiv.org/html/2605.12759#bib.bib5)\]: a set of events occurring at various timestamps that together build the final graph structure:
$$\mathcal{G} = \{x(t_m) : t_{m-1} \le t_m \le t_{m+1},\ \text{for } m \in [1, 2, \dots]\}$$

Each event $x(t)$ belongs to one of two types: a *node-wise event* $\bm{v}_i(t)$, involving the addition, deletion, or feature update of a node; or an *interaction event* $\bm{e}_{ij}(t)$, representing the addition or removal of an edge (i.e., a payment channel) between two nodes $i$ and $j$. A graph at time $t$, denoted $\mathcal{G}(t)$, is defined by the pair $(\mathcal{V}(t), \mathcal{E}(t))$, where $\mathcal{V}(t) = \{i : \bm{v}_i(t_m) \in \mathcal{G} \text{ and } t_m \le t\}$ is the set of nodes present up to time $t$, and $\mathcal{E}(t) = \{(i,j) : \bm{e}_{ij}(t_m) \in \mathcal{G} \text{ and } t_m \le t\}$ is the set of directed edges up to time $t$. Since payments in the LN flow in both directions, if $\bm{e}_{ij}(t_m)$ represents an edge from $i$ to $j$, there also exists $\bm{e}_{ji}(t_m)$ in the opposite direction. We denote a channel opening at time $t_m$ as $\bm{e}_{ij}^{+}(t_m)$ and a channel closure as $\bm{e}_{ij}^{-}(t_m)$.
When dealing with temporal tasks, it is important to differentiate between *current time* and *query time*. At a given moment $t$ (the current time), a model forecasts some property at a future point $t + \Delta_t$ (the query time), where $\Delta_t$ is a configurable lookahead window. The task at hand can be formulated as a *temporal link classification* problem: for each edge that is *open* at the current timestamp $t_m$, the objective is to predict its state as `open`, `mutual`, or `forced` (see [Section III-A](https://arxiv.org/html/2605.12759#S3.SS1)) at the query time $t_m + \Delta_t$. In our experiments we set $\Delta_t = 180$ days. An edge is *open* at timestamp $t_m$ if its most recent interaction event up to $t_m$ is a channel opening. Formally, let $\mathcal{E}^{+}(t_m) = \{(i,j) \mid \exists\, \bm{e}_{ij}^{+}(t) \text{ with } t \le t_m\}$ and $\mathcal{E}^{-}(t_m) = \{(i,j) \mid \exists\, \bm{e}_{ij}^{-}(t) \text{ with } t \le t_m\}$. The set of open edges is then $\mathcal{E}_{\text{open}}(t_m) = \mathcal{E}^{+}(t_m) \setminus \mathcal{E}^{-}(t_m)$. An *open* edge is thus a persistent state of a channel, whereas interaction events are single occurrences that modify $\mathcal{G}(t_m)$.
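The set-difference definition of $\mathcal{E}_{\text{open}}(t_m)$ can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the released code: the `(timestamp, src, dst, kind)` event tuple is a hypothetical encoding, and edges are keyed by their directed endpoint pair exactly as in the formulas above.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]
# Hypothetical event encoding: (timestamp, src, dst, kind), kind in {"open", "close"}.
Event = Tuple[float, str, str, str]


def open_edges(events: Iterable[Event], t_m: float) -> Set[Edge]:
    """Return E_open(t_m): edges opened at or before t_m minus edges closed at or before t_m."""
    opened: Set[Edge] = set()  # E^+(t_m)
    closed: Set[Edge] = set()  # E^-(t_m)
    for t, i, j, kind in events:
        if t > t_m:
            continue  # events after the current time are not observable yet
        if kind == "open":
            opened.add((i, j))
        elif kind == "close":
            closed.add((i, j))
    return opened - closed


events = [
    (0, "a", "b", "open"), (0, "b", "a", "open"),
    (1, "a", "c", "open"), (1, "c", "a", "open"),
    (2, "a", "b", "close"), (2, "b", "a", "close"),
]
print(sorted(open_edges(events, t_m=3)))  # [('a', 'c'), ('c', 'a')]
```

Because membership is tested on endpoint pairs, a pair that closed a channel and later opened a new one would not count as open under the set difference, whereas it would under the "most recent event" reading; keying edges by a channel identifier instead of the endpoint pair removes that discrepancy.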
## III Dataset
We collect daily snapshots¹ of the LN from its gossip messages, covering the period from June 9, 2022, to October 14, 2024. The raw data comprise 693,277 directed events (channel openings and closures) recorded across 874 timestamps, involving 36,170 unique nodes.

¹ On some days, data were collected twice at different times.
**Handling the initial snapshot.** The first gossip snapshot (June 9, 2022) captures the entire state of the LN at that point, containing 358,994 events (over half the dataset) at a single timestamp. These events represent the accumulated history of the network rather than real-time activity. To handle this, we adopt a *warm-start* strategy: first-day events initialize the graph state (populating the set of open channels and node-level statistics) but are excluded from training, validation, and testing. This way, models learn from genuinely temporal activity while retaining access to the network’s structure at the start of the observation period.
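A minimal sketch of the warm-start separation, assuming events are `(timestamp, src, dst, kind)` tuples (a hypothetical layout) and that every event sharing the earliest timestamp belongs to the initial snapshot:

```python
from typing import List, Tuple

# Hypothetical event encoding: (timestamp, src, dst, kind).
Event = Tuple[float, str, str, str]


def warm_start_split(events: List[Event]) -> Tuple[List[Event], List[Event]]:
    """Split events into (warm, rest).

    `warm` holds every event at the earliest timestamp (the initial snapshot);
    it is replayed only to initialise open channels and node statistics.
    `rest` holds all later events, which feed training, validation, and testing.
    """
    if not events:
        return [], []
    ordered = sorted(events, key=lambda e: e[0])  # stable sort keeps same-time order
    t0 = ordered[0][0]
    warm = [e for e in ordered if e[0] == t0]
    rest = [e for e in ordered if e[0] > t0]
    return warm, rest
```

In a full pipeline, `warm` would be fed through the same state-update code as every other event, just without producing any training examples or labels.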
**Parallel channels.** The LN is naturally a multigraph, where two nodes can maintain multiple channels simultaneously. To reduce it to a simple graph, we remove all node pairs that have more than one channel between them. This affects approximately 3% of node pairs and reduces the total event count by roughly 20%, but eliminates the ambiguity of which channel’s properties to use when multiple channels connect the same pair of endpoints.
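The filtering rule can be sketched as below. The dict keys `src`, `dst`, and `channel_id` are hypothetical (a channel ID could, for instance, be built from `transaction_id` and `vout`), and the sketch takes the conservative reading that a pair is parallel if it is *ever* linked by more than one channel during the observation period.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def drop_parallel_channels(events: List[dict]) -> List[dict]:
    """Keep only events whose node pair is linked by exactly one channel.

    Each event is a dict with hypothetical keys "src", "dst" and "channel_id".
    A pair counts as parallel if it is ever associated with more than one
    channel ID over the whole period (an assumption, see the lead-in).
    """
    channels_per_pair: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for ev in events:
        pair = frozenset((ev["src"], ev["dst"]))  # direction-agnostic node pair
        channels_per_pair[pair].add(ev["channel_id"])
    return [
        ev for ev in events
        if len(channels_per_pair[frozenset((ev["src"], ev["dst"]))]) == 1
    ]
```

Using a `frozenset` for the pair makes `(a, b)` and `(b, a)` events count toward the same node pair, so both directions of a channel are kept or dropped together.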
**Chronological split.** To prevent information leakage, we split the remaining data chronologically into training, validation, and test sets using a 70%/15%/15% partition of the timeline. Each event contains a `channel_status` attribute indicating whether it represents a channel opening or closure; this information is derived from gossip messages and is available at the time of the event, so using it does not constitute leakage of future information.
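A sketch of a chronological 70/15/15 split. We assume the percentages refer to the observed time span rather than to event counts, and that timestamps are plain numbers (e.g., Unix seconds); both are assumptions about the released pipeline.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    timestamps: Sequence[float], train: float = 0.70, val: float = 0.15
) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, val, test) event indices for a 70/15/15 split of the time span.

    Events are assigned by timestamp only, never shuffled, so no validation or
    test event precedes a training event.
    """
    t_min, t_max = min(timestamps), max(timestamps)
    t_val = t_min + train * (t_max - t_min)           # start of validation period
    t_test = t_min + (train + val) * (t_max - t_min)  # start of test period
    tr: List[int] = []
    va: List[int] = []
    te: List[int] = []
    for idx, t in enumerate(timestamps):
        if t < t_val:
            tr.append(idx)
        elif t < t_test:
            va.append(idx)
        else:
            te.append(idx)
    return tr, va, te
```

Splitting on time rather than on a shuffled index is what prevents leakage: the model never trains on an event that happened after one it is evaluated on.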
**Labeling.** We assign labels to open edges based on their future status. For any edge that is open at time $t$, we check whether a closing event involving that same edge occurs within the next $\Delta_t = 180$ days. If a closure is found, the edge is labeled according to the corresponding closure type (`forced` or `mutual`); otherwise, it is labeled `open`.
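The labeling rule can be sketched as a single function. Times are assumed to be in days, and the window is assumed to be the half-open interval $(t, t + \Delta_t]$; the exact boundary convention of the released code is not stated here. Penalty closures are merged into `forced`, as described in Section III-A.

```python
from typing import Optional

DELTA_T_DAYS = 180.0


def label_edge(
    t: float,
    closure_time: Optional[float],
    closure_type: Optional[str],
    delta_t: float = DELTA_T_DAYS,
) -> str:
    """Label an edge that is open at time t (all times in days).

    closure_time / closure_type describe the edge's own closing event, if any.
    Closure types are "mutual", "forced", or "penalty"; penalty closures are
    merged into "forced".
    """
    if closure_time is None or not (t < closure_time <= t + delta_t):
        return "open"  # no closure inside the look-ahead window
    if closure_type == "mutual":
        return "mutual"
    if closure_type in ("forced", "penalty"):
        return "forced"
    raise ValueError(f"unknown closure type: {closure_type!r}")
```

Note that an edge whose window extends past the last available snapshot will always be labeled `open` by this rule, which is one reason very long windows are problematic.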
### III-A Classes
Figure 2: Distribution of label counts over time. The plot shows the temporal evolution of the three labels (`forced`, `mutual`, `open`) in the dataset, with vertical dashed lines indicating the starting points of the validation and test periods.

The task is a multi-class link classification problem, where the goal is to predict the channel’s status within a temporal window (from the current timestamp up to the query timestamp). Each channel can take one of the following classes, which reflect its state in the LN:
- `open`: The channel remains open throughout $\Delta_t$.
- `mutual`: The channel is closed by mutual agreement within $\Delta_t$.
- `forced`: The channel is unilaterally (force-)closed within $\Delta_t$.
A small fraction (<0.01%) of closures are classified as `penalty` closures, which we merge into the `forced` class.
Figure 3: Daily average distribution of the three classes (`open`, `forced`, `mutual`) for the train, validation, and test splits. Proportions are computed by first considering daily fractions, then averaging across all days within each split.

[Figure 2](https://arxiv.org/html/2605.12759#S3.F2) shows the distribution of event labels over time. Although the total number of events decreases as time progresses, the relative proportions remain fairly consistent. Across the post-warm-start data, `open` channels account for approximately 47%, `mutual` closures for 30%, and `forced` closures for 23% of events. However, at prediction time, when the model must classify *all currently open edges*, the distribution is heavily skewed: roughly 83% of open edges remain `open`, with about 9% closing within $\Delta_t$ as `mutual` and 8% as `forced`. [Figure 3](https://arxiv.org/html/2605.12759#S3.F3) shows the average class distribution within each of the three temporal splits.
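The averaging described in the Figure 3 caption (daily fractions first, then a mean over days) can be sketched as follows; the `(day, label)` input format is an assumption for illustration. Averaging per-day fractions gives every day the same weight, so busy days do not dominate the split-level proportions the way they would if all labels were pooled.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

CLASSES = ("open", "forced", "mutual")


def daily_average_distribution(labels: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Average of per-day class fractions; every day carries equal weight."""
    per_day: DefaultDict[str, Counter] = defaultdict(Counter)
    for day, cls in labels:
        per_day[day][cls] += 1
    if not per_day:
        raise ValueError("no labelled edges")
    avg = {c: 0.0 for c in CLASSES}
    for counts in per_day.values():
        total = sum(counts.values())
        for c in CLASSES:
            avg[c] += counts[c] / total  # Counter returns 0 for absent classes
    return {c: v / len(per_day) for c, v in avg.items()}
```

For example, a day with three `open` and one `forced` label plus a day with one `mutual` and one `open` label yields 0.625 for `open`, whereas pooling all six labels would give 4/6.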
### III-B Node and edge features
Each event carries features at both the channel (edge) and endpoint (node) levels, as reported in the corresponding gossip message. Per channel, we keep the on-chain timestamps (`ts`, `height`), the locked `capacity`, and the funding block’s `block_avg_fee_rate`; identifiers and labels (`transaction_id`/`vout`, `channel_status`, `event_label`, `gossip_ts`) are kept as metadata for bookkeeping and as the prediction target, but are not given to the model. Per endpoint we keep, separately for source and destination, the routing-policy parameters declared in gossip: base and proportional fees (`fee_base_msat`, `fee_rate_milli_msat`), HTLC bounds (`min_htlc`, `max_htlc_msat`), `time_lock_delta`, the `disabled` flag, the timestamp of the latest gossip update for that direction (`last_update`), and the node’s LN `implementation`. We preserve channel directionality and thus treat (`src`, `dst`) separately from (`dst`, `src`). All features are taken directly from gossip, and the dataset is sorted by `gossip_ts` to maintain chronological order.
## IV Methodology
We study a *temporal link classification* task in which the goal is to predict the status of an open channel at a user-defined query time, given its history up to the current time. Most existing temporal graph benchmarks \[[6](https://arxiv.org/html/2605.12759#bib.bib6),[7](https://arxiv.org/html/2605.12759#bib.bib7)\] focus on *link prediction*, i.e., predicting whether an edge will form between two nodes at a future time. In our setting, by contrast, we already know the channel exists and instead classify its state (`open`, `mutual`, or `forced`) at a future time. Class imbalance is a key challenge here, as most channels remain open at any given time, with relatively few closing within a prediction window.
Formally, given a model $M$, a temporal window $\Delta_t$, the current timestamp $t$, and the graph $\mathcal{G}(t)$ of open channels observed up to time $t$, the task is defined as follows:
For each open edge $e \in \mathcal{E}_{\text{open}}(t)$, let $t' = t + \Delta_t$ be the query time:

1. compute the model prediction $s = M(e, t')$ and apply a softmax over the possible classes;
2. evaluate $s$ using classification metrics (e.g., F1-score, accuracy).
In other words, for each open channel at time $t$, we want to predict its status at $t' = t + \Delta_t$ based on all events observed up to $t$. At each training step, the model predicts the status of *all* currently open edges, not just those involved in the current batch’s events. This differs from standard link prediction, where only a sampled subset of edges is typically considered per step, and makes the evaluation closely reflect how a deployed model would be used in practice.
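The two-step procedure (softmax over the model outputs, then a classification metric over every open edge) can be sketched for a single snapshot as below. `model` is a hypothetical callable returning three logits for an edge and a query time, and the class order `open`, `mutual`, `forced` is an assumption.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence

CLASSES = ["open", "mutual", "forced"]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int = 3) -> float:
    """Unweighted mean of per-class F1; a class with no true or predicted samples scores 0."""
    f1s = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n_classes


def evaluate_snapshot(
    model: Callable[[Hashable, float], Sequence[float]],
    open_edges: Sequence[Hashable],
    labels: Dict[Hashable, int],
    t: float,
    delta_t: float,
) -> float:
    """Score every currently open edge at query time t' = t + delta_t."""
    t_query = t + delta_t
    preds = []
    for e in open_edges:
        probs = softmax(model(e, t_query))  # step 1: s = softmax(M(e, t'))
        preds.append(max(range(len(probs)), key=probs.__getitem__))
    return macro_f1([labels[e] for e in open_edges], preds)  # step 2
```

As a sanity check, predicting `open` for every edge when 8 of 10 edges are truly `open` gives a macro F1 of (16/18)/3 ≈ 0.30, the same value the majority baseline reaches in the imbalance ablation. `sklearn.metrics.f1_score(..., average="macro")` is a library alternative, although its handling of absent classes depends on its `labels` and `zero_division` arguments.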
We build our temporal evaluation pipeline on the Temporal Graph Network (TGN) framework \[[5](https://arxiv.org/html/2605.12759#bib.bib5)\], adapting two of its components for our setting. First, we replace the original *neighbor loader*, which samples a fixed number of recent neighbors for queried nodes, with a variant that maintains the full set of currently open edges, inserting channels as they open and removing them as they close. At each step, this loader provides all open edges for prediction. Second, we replace TGN’s learned RNN-based memory module with a simpler, non-parametric *feature storage* that accumulates event counts (`open`, `forced`, `mutual`) per node as channels are opened and closed over time, providing a lightweight temporal summary of each node’s history. All learned models share this temporal infrastructure and differ only in how they produce predictions from the current graph state.
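A minimal sketch of the two replacement components follows. The class names are hypothetical, and we assume each update corresponds to one channel-level event (so both directions are inserted or removed together); if events arrive once per direction, the counters would need deduplication to avoid double counting.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def _zero_counts() -> Dict[str, int]:
    return {"open": 0, "forced": 0, "mutual": 0}


class OpenEdgeLoader:
    """Keeps every currently open directed edge (no neighbour sampling)."""

    def __init__(self) -> None:
        self.open: Set[Edge] = set()

    def update(self, src: str, dst: str, status: str) -> None:
        if status == "open":
            self.open.update({(src, dst), (dst, src)})  # channels are usable both ways
        else:  # "mutual" or "forced" closure removes both directions
            self.open.discard((src, dst))
            self.open.discard((dst, src))


class FeatureStorage:
    """Non-parametric per-node event counters used in place of a learned memory."""

    def __init__(self) -> None:
        self.counts: Dict[str, Dict[str, int]] = defaultdict(_zero_counts)

    def update(self, src: str, dst: str, status: str) -> None:
        for node in (src, dst):
            self.counts[node][status] += 1

    def node_features(self, node: str) -> Tuple[int, int, int]:
        c = self.counts.get(node, _zero_counts())  # .get avoids inserting unseen nodes
        return (c["open"], c["forced"], c["mutual"])
```

Replaying events in chronological order through both objects yields, at any step, the full prediction set (`loader.open`) and the three count features per endpoint, with no trainable parameters involved.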
### IV-A Baselines
**Random baselines.** We consider three non-learned baselines: (i) *uniform*, sampling labels uniformly at random; (ii) *stratified*, sampling labels based on observed class frequencies in the training set; (iii) *majority*, always predicting the most frequent class (`open`).
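The three baselines are simple enough to write directly with the standard library; the function names below are illustrative. scikit-learn's `DummyClassifier` provides equivalent behaviour through its `"uniform"`, `"stratified"`, and `"most_frequent"` strategies.

```python
import random
from collections import Counter
from typing import List, Sequence

CLASSES = ["open", "mutual", "forced"]


def uniform_baseline(n: int, rng: random.Random) -> List[str]:
    """(i) Sample each label uniformly at random."""
    return [rng.choice(CLASSES) for _ in range(n)]


def stratified_baseline(n: int, train_labels: Sequence[str], rng: random.Random) -> List[str]:
    """(ii) Sample labels with the class frequencies observed in training."""
    freq = Counter(train_labels)
    return rng.choices(CLASSES, weights=[freq[c] for c in CLASSES], k=n)


def majority_baseline(n: int, train_labels: Sequence[str]) -> List[str]:
    """(iii) Always predict the most frequent training class."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n
```

Passing an explicit `random.Random(seed)` keeps the random baselines reproducible across the repeated runs used for the mean and standard deviation.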
### IV-B MLP predictor
Our primary model is a multi-layer perceptron (MLP) that classifies each open edge independently, without any graph-based message passing. For each open edge $(i,j)$ at time $t$, the input feature vector is the concatenation of:
- *Edge features*: channel properties from the gossip protocol, including capacity, fee policies (base fee, fee rate), disabled flags, timelocks, min/max HTLC values, etc.
- *Node features*: for each endpoint $i$ and $j$, the running counts of events of each type (`count_open`, `count_forced`, `count_mutual`) accumulated from the feature storage. These summarise how many channels each node has opened and the closures it has been involved in up to time $t$ (3 dimensions per node).
- *Temporal encodings*: the channel age $t - t_{\text{open}}$ (`edge_age`), and the source and destination *recency* $t - t_{\text{last\_update},i}$ (`src_recency`) and $t - t_{\text{last\_update},j}$ (`dst_recency`), where $t_{\text{last\_update},k}$ is the timestamp of the most recent gossip event involving node $k$. Each of these three scalars is passed through a learnable time encoder ($3 \times d_{\text{time}}$ dimensions in total).
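The input vector and a one-hidden-layer forward pass can be sketched in NumPy as below. Several details are assumptions rather than facts from the text: the cosine form $\cos(\Delta t \cdot w + b)$ of the time encoder (common in TGN-style models), the geometric frequency initialisation, sharing one encoder across the three time deltas, and the 10 hypothetical edge features in the usage line. In training, `w` and `b` would be learned parameters.

```python
import numpy as np

D_TIME = 128  # temporal encoding dimension used in the experiments


def init_time_encoder(d: int = D_TIME):
    # Cosine time encoder phi(dt) = cos(dt * w + b). Here w and b are only
    # initialised (geometrically spaced frequencies); training would learn them.
    w = 1.0 / 10.0 ** np.linspace(0, 9, d)
    b = np.zeros(d)
    return w, b


def time_encode(dt: float, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    return np.cos(dt * w + b)


def mlp_input(edge_feats, src_counts, dst_counts,
              edge_age, src_recency, dst_recency, w, b) -> np.ndarray:
    """Concatenate edge features, 3+3 node counts and 3 encoded time deltas."""
    return np.concatenate([
        np.asarray(edge_feats, dtype=float),
        np.asarray(src_counts, dtype=float),  # count_open, count_forced, count_mutual
        np.asarray(dst_counts, dtype=float),
        time_encode(edge_age, w, b),          # t - t_open
        time_encode(src_recency, w, b),       # t - t_last_update,i
        time_encode(dst_recency, w, b),       # t - t_last_update,j
    ])


def mlp_logits(x, W1, b1, W2, b2) -> np.ndarray:
    h = np.maximum(0.0, x @ W1 + b1)  # one hidden layer with ReLU
    return h @ W2 + b2                 # three class logits


w, b = init_time_encoder()
x = mlp_input(np.zeros(10), [1, 0, 2], [0, 0, 0], 5.0, 1.0, 2.0, w, b)
print(x.shape)  # (400,)
```

With 10 edge features, the vector has 10 + 2 × 3 + 3 × 128 = 400 dimensions; the single hidden layer mirrors the depth that performs best in the depth ablation.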
### IV-C Gradient-boosted trees
As tabular baselines we also include two gradient-boosted decision tree classifiers, XGBoost \[[8](https://arxiv.org/html/2605.12759#bib.bib8)\] and LightGBM \[[9](https://arxiv.org/html/2605.12759#bib.bib9)\], both receiving exactly the same input vector as the MLP. We replay the training events to populate the neighbor loader and feature storage, then fit each model once on the snapshot of all currently open edges at the end of training, using the oracle (ground-truth) closure labels. At test time the model predicts at each timestamp from the features extracted from the current snapshot. Both models use 500 trees of depth 6, learning rate 0.1, and the same per-class loss weights $[1, 5, 5]$ as the neural baselines.
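Tree boosters take per-sample rather than per-class weights, so the class weights $[1, 5, 5]$ have to be mapped onto each training row. A sketch, assuming label index 0 is `open`; `X` would be the end-of-training snapshot of open edges with MLP-identical features and `y` the oracle labels. The `XGBClassifier` and `LGBMClassifier` estimators and their `sample_weight` argument to `fit` are the standard scikit-learn-style interfaces of both libraries.

```python
import numpy as np

# Per-class loss weights [1, 5, 5]; class index 0 is assumed to be "open".
CLASS_WEIGHTS = np.array([1.0, 5.0, 5.0])


def sample_weights(y, class_weights: np.ndarray = CLASS_WEIGHTS) -> np.ndarray:
    """Map each training label to its class weight (per-sample weights)."""
    return class_weights[np.asarray(y, dtype=int)]


def fit_boosters(X: np.ndarray, y: np.ndarray):
    """Fit XGBoost and LightGBM with 500 trees, depth 6, learning rate 0.1."""
    # Imported lazily so sample_weights() works without either package installed.
    from lightgbm import LGBMClassifier
    from xgboost import XGBClassifier

    w = sample_weights(y)
    xgb = XGBClassifier(n_estimators=500, max_depth=6, learning_rate=0.1)
    lgbm = LGBMClassifier(n_estimators=500, max_depth=6, learning_rate=0.1)
    xgb.fit(X, y, sample_weight=w)
    lgbm.fit(X, y, sample_weight=w)
    return xgb, lgbm
```

Because both closure classes share weight 5, the exact order of `mutual` and `forced` in the label encoding does not affect the weighting; only the position of `open` matters.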
### IV-D Graph-based models
We evaluate two GNN variants and a spectral baseline, covering different ways of injecting structure into the prediction\.
**Static GNN.** A GraphSAGE \[[10](https://arxiv.org/html/2605.12759#bib.bib10)\] network operating on the current graph of open edges, with node embeddings initialised from current degrees and aggregated through the GraphSAGE layers before being fed to a prediction MLP. It does not use edge features or temporal encodings, isolating the predictive value of graph structure.
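One GraphSAGE-style layer with mean aggregation, applied to degree-initialised node inputs, can be sketched in NumPy as follows. The dense adjacency matrix is for illustration only (a real LN snapshot would use a sparse representation, e.g. PyTorch Geometric's `SAGEConv`), and building the edge input by concatenating the two endpoint embeddings is an assumption about how the prediction MLP is fed.

```python
import numpy as np


def degree_features(adj: np.ndarray) -> np.ndarray:
    """Initial node inputs: the current degree of each node, shape (n, 1)."""
    return adj.sum(axis=1, keepdims=True).astype(float)


def sage_layer(H: np.ndarray, adj: np.ndarray,
               W_self: np.ndarray, W_neigh: np.ndarray) -> np.ndarray:
    """GraphSAGE-style layer with mean aggregation:
    h_i' = ReLU(h_i W_self + mean_{j in N(i)} h_j W_neigh)."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ H) / np.maximum(deg, 1.0)  # isolated nodes aggregate zeros
    return np.maximum(0.0, H @ W_self + neigh_mean @ W_neigh)


def edge_representation(H: np.ndarray, i: int, j: int) -> np.ndarray:
    """Input to the prediction MLP for channel (i, j): both endpoint embeddings."""
    return np.concatenate([H[i], H[j]])
```

On a three-node path with identity weights, the degrees `[1, 2, 1]` become `[3, 3, 3]` after one layer, which illustrates how quickly purely structural inputs can wash out differences between nodes.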
**TGN.** The TGN uses the same input features as the MLP and additionally computes node embeddings via attention-based message passing over the current graph of open edges, using edge features as attention inputs. These embeddings are concatenated with the edge features, node features, and temporal encodings in the prediction MLP, letting the model capture structural patterns that the edge-level MLP cannot access from the local features of a single channel alone.
**Spectral encodings.** As an alternative to message passing, we augment the MLP with the top-$k$ eigenvectors of the normalised Laplacian of the open-edges graph, concatenating each endpoint’s spectral position $[\phi_i, \phi_j] \in \mathbb{R}^{2k}$ to the input. These encodings capture each node’s structural role in the topology without iterative aggregation. We use $k = 16$, recomputed periodically as the graph evolves.
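A dense NumPy sketch of Laplacian positional encodings with $L = I - D^{-1/2} A D^{-1/2}$. We assume "top-$k$" means the $k$ eigenvectors with the smallest eigenvalues after the trivial first one, as is common for Laplacian encodings; the text does not confirm this. For a graph with tens of thousands of nodes, a sparse solver such as `scipy.sparse.linalg.eigsh` would replace the dense `np.linalg.eigh` call.

```python
import numpy as np


def laplacian_pe(adj: np.ndarray, k: int = 16) -> np.ndarray:
    """Return an (n_nodes, k) matrix of normalised-Laplacian eigenvectors.

    Takes the eigenvectors with the k smallest eigenvalues after the first
    (an assumption about "top-k"); tiny graphs are zero-padded to width k.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    pe = eigvecs[:, 1:k + 1]          # skip the trivial first eigenvector
    if pe.shape[1] < k:
        pe = np.pad(pe, ((0, 0), (0, k - pe.shape[1])))
    return pe


def edge_spectral_features(pe: np.ndarray, i: int, j: int) -> np.ndarray:
    return np.concatenate([pe[i], pe[j]])  # [phi_i, phi_j] in R^{2k}
```

Eigenvectors are only defined up to sign (and up to rotation for repeated eigenvalues), so encodings recomputed on successive snapshots are not directly comparable unless signs are aligned or randomly flipped during training.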
## V Experimental setup and results
We assess model performance using the macro-average F1-score, which averages the per-class F1-scores with equal weight and is therefore well suited to class imbalance. Edge features are preprocessed with a log-transform followed by min-max scaling fitted on the training set. All learned models are trained for 30 epochs with a weighted cross-entropy loss (weights $[1, 5, 5]$), optimized via Adam (lr $= 10^{-4}$, weight decay $= 10^{-5}$) with linear warmup over 1000 steps. We use hidden dimension 128 and temporal encoding dimension 128. We report means and standard deviations over 3 seeds.
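The preprocessing, loss, and learning-rate schedule can be sketched in NumPy as follows. The use of `log1p` (so zero-valued features stay finite) and the assumption of non-negative features are ours; the loss normalisation follows PyTorch's `CrossEntropyLoss(weight=...)` with mean reduction, which divides by the sum of the selected weights.

```python
import numpy as np


def fit_log_minmax(X_train: np.ndarray):
    """Fit log-transform + min-max scaling on the training set only."""
    Z = np.log1p(X_train)  # assumes non-negative features
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns map to 0
    return lambda X: (np.log1p(X) - lo) / span


def weighted_cross_entropy(logits: np.ndarray, y: np.ndarray,
                           class_weights=(1.0, 5.0, 5.0)) -> float:
    """Weighted mean of -log softmax(logits)[y], normalised by the sum of weights."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    w = np.asarray(class_weights)[y]
    nll = -log_probs[np.arange(len(y)), y]
    return float((w * nll).sum() / w.sum())


def warmup_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 1000) -> float:
    """Linear warmup from base_lr / warmup_steps up to base_lr, then constant."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)
```

Fitting the scaler on training data only, and then applying the returned function to validation and test features, keeps the preprocessing consistent with the chronological split.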
TABLE I: Performance on the test set. For the MLP and TGN we report the best architectural configuration; the other learned models use default hyperparameters. Per-class and macro-average F1-scores as mean ± std over 3 seeds.

Figure 4: (a) Normalized confusion matrix for the MLP (`open`, `forced`, `mutual`). The model achieves high recall on `open` but struggles to distinguish closure types. (b) Per-class F1 binned by channel age at query time. `open` F1 increases sharply with channel age, showing that long-lived channels are reliably predicted to remain open. Conversely, `forced` and `mutual` F1 generally decrease with age and approach zero in the oldest bin, where closures are rare and extremely difficult to predict.

### V-A Main results
[Table I](https://arxiv.org/html/2605.12759#S5.T1) compares all models on the test set. For the MLP and TGN we report the best architectural configuration identified through our layer ablation ([Figure 6](https://arxiv.org/html/2605.12759#S5.F6)); the other learned models use default hyperparameters. The MLP predictor achieves the best macro-average F1 of 0.38 ± 0.001. The TGN, which additionally computes node embeddings via GNN message passing, reaches 0.36 ± 0.007, still below the MLP. The static GNN and the MLP augmented with spectral positional encodings both achieve 0.35. All learned models improve over the stratified random baseline (0.32), but the margin remains modest.
Notably, the MLP uses no graph information whatsoever, yet outperforms all graph\-aware models and the gradient\-boosted tree baselines that share its input features\. The TGN’s GNN message passing does not improve over the MLP despite access to neighborhood structure, and neither spectral positional encodings nor the static GNN help\. This finding is robust across seeds and suggests that graph topology provides little additional signal beyond per\-channel and per\-node features, as we investigate in the following ablations\.
[Figure 4](https://arxiv.org/html/2605.12759#S5.F4)(a) shows the confusion matrix for the MLP. The model correctly identifies most `open` channels but frequently confuses `forced` and `mutual` closures with each other and with `open`, suggesting that the gossip features do not clearly distinguish the two closure types. [Figure 4](https://arxiv.org/html/2605.12759#S5.F4)(b) breaks down per-class F1 by channel age. The `open` F1 increases sharply with age, reaching 0.93 for channels older than a year, as the model learns that long-lived channels rarely close. Conversely, `mutual` F1 is highest for recently opened channels and declines steadily with age, while `forced` F1 peaks for medium-aged channels (90–180 days); both drop to near zero for the oldest bin, where closures are rare and difficult to detect.
### V-B Ablation studies
We now investigate which factors drive performance, through four complementary ablations: feature groups, model depth, prediction window, and class imbalance handling\.
**Feature groups.** To understand which components drive the MLP’s performance and why graph-based models underperform, we conduct a feature ablation study. All configurations share the same temporal pipeline (neighbor loader and feature storage) and the same prediction MLP, differing only in which feature groups are exposed to the prediction head and whether GNN message passing is enabled on top.
TABLE II: Feature ablation using the best MLP and TGN configurations. We progressively add feature groups and assess whether GNN message passing adds value on top of them. Mean ± std over 3 seeds.

The top half of [Table II](https://arxiv.org/html/2605.12759#S5.T2) progressively adds feature groups to the MLP. Starting from the time-only baseline, edge features alone do not help; the largest gain comes from adding the per-node event counts, which lift performance to 0.38, the score of the full MLP. The accumulated history of how each endpoint has behaved is thus the dominant signal, with edge features contributing only in combination with it.
The bottom half enables GNN message passing on top of these same features, yielding the TGN architecture. With only edge and time features, GNN message passing slightly helps (Edge + Time + GNN reaches 0.37, above 0.36 for the same features without the GNN), but once node-level event counts are added, the GNN no longer brings any benefit and performance drops back to 0.36. The MLP augmented with spectral encodings ([Table I](https://arxiv.org/html/2605.12759#S5.T1)) similarly fails to improve over the baseline MLP. Overall, graph aggregation can compensate when richer node features are missing, but it does not unlock performance beyond what the per-node history and edge features already provide on their own.
Figure 5: Feature importances for the trained MLP, computed as the mean absolute SHAP value over test edges. Behavioural and temporal signals (node recency, per-node closure counts, channel age) dominate over static channel metadata.

To complement the feature-group ablation, we also estimate the importance of *individual* input features for the trained MLP. We compute SHAP values via gradient-based attribution on a sample of currently open edges at the end of training, and report the mean absolute SHAP value per feature. [Figure 5](https://arxiv.org/html/2605.12759#S5.F5) shows the top fifteen features. The dominant signals are how recently each endpoint was active (`src_recency`, `dst_recency`) and the per-node history of past closures (`src/dst_count_mutual`, `src/dst_count_forced`), together with the channel’s age. Static channel metadata such as fee policies, capacity, disabled flags, and timelocks appear well below the top of the ranking. This is consistent with the gossip-protocol design: per-channel parameters are largely set at opening time and rarely revisited, while the only window into channel *behaviour* is what the endpoints have done in the past.
Figure 6: Effect of the prediction head depth, where 1 corresponds to a single linear layer (logistic regression). For the TGN, we fix the GNN depth to 1 and vary the prediction MLP head. Deeper heads perform worse in both cases.

**Model depth.** We also vary the depth of the prediction head, ranging from a single linear layer (logistic regression) to three layers, for both the MLP and the TGN ([Figure 6](https://arxiv.org/html/2605.12759#S5.F6)). For the TGN, we fix the GNN depth to its best value (1 layer) and vary only the head. A shallow MLP with one hidden layer achieves the best performance (0.38), only marginally above plain logistic regression (0.37), while deeper architectures degrade. The TGN follows a similar trend, peaking with a linear head (0.36) and degrading with depth. At every setting the MLP outperforms the TGN, confirming that neither additional model capacity nor GNN message passing helps on this task.
Figure 7: Effect of the prediction window $\Delta_t$ on the MLP compared to the stratified baseline. The MLP matches the baseline at $\Delta_t = 30$ days and outperforms it at all longer horizons, with the largest gap at $\Delta_t = 180$ days.

**Prediction window.** We also vary the prediction window $\Delta_t \in \{30, 90, 180, 365\}$ days ([Figure 7](https://arxiv.org/html/2605.12759#S5.F7)). The MLP matches the stratified baseline at $\Delta_t = 30$ days and outperforms it at all longer horizons, with the largest gap at $\Delta_t = 180$ days. At very short horizons very few channels close, leaving little signal to learn; at $\Delta_t = 365$ days the increased uncertainty over a full year dilutes the predictive value of current gossip features. The 180-day window is also comfortably above the typical closure timescale in our dataset: among channels that eventually close, the median lifetime is 73 days and roughly 76% close within 180 days, so the window captures the bulk of closure events without being so long that the look-ahead routinely runs past the end of the available data and forces channels to be labelled `open` simply because we ran out of observation history. We therefore adopt $\Delta_t = 180$ days for the main experiments, as a window that is both informative and well aligned with the natural closure timescale of the network.
TABLE III: MLP ablation over strategies for handling class imbalance. Class weights are the coefficients in the weighted cross-entropy loss; downsampling keeps $r$ `open` edges per closing edge in the loss. Mean ± std over 3 seeds.

**Class imbalance.** Finally, we ablate different strategies for handling the severe class imbalance ([Table III](https://arxiv.org/html/2605.12759#S5.T3)). Without any class weighting, the model collapses to predicting `open` for all edges (macro F1 = 0.30, the majority baseline). Mild reweighting ($[1, 2, 3]$ or $[1, 3, 6]$) only partially compensates and stays close to the stratified baseline. Symmetric moderate weights $[1, 5, 5]$ perform best. More aggressive weighting hurts performance: $[1, 10, 10]$ pulls the model below the moderate setting, and the inverse-frequency weights $[1, 8.5, 17]$ collapse below the majority baseline. Balanced downsampling ($r = 1$ with uniform weights) recovers most of the performance (0.37), and combining downsampling with the default class weights matches the weighted-loss baseline. Overall, a weighted cross-entropy with symmetric moderate weights $[1, 5, 5]$ is the most effective and simplest choice.
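Downsampling with ratio $r$ can be sketched as choosing which edges enter the loss at a given step: all closing edges plus $r$ randomly drawn `open` edges per closing edge. The function name, the `open_label = 0` convention, and capping at the number of available `open` edges are assumptions for illustration.

```python
import numpy as np


def downsample_open(y: np.ndarray, r: float = 1.0, open_label: int = 0,
                    rng=None) -> np.ndarray:
    """Indices of edges kept in the loss: every closing edge plus
    round(r * n_closing) randomly chosen open edges (capped at those available)."""
    rng = np.random.default_rng() if rng is None else rng
    closing = np.flatnonzero(y != open_label)
    open_idx = np.flatnonzero(y == open_label)
    n_keep = min(len(open_idx), int(round(r * len(closing))))
    kept_open = rng.choice(open_idx, size=n_keep, replace=False)
    return np.sort(np.concatenate([closing, kept_open]))
```

The returned indices would mask the per-edge loss for one step, and resampling at every step exposes the model to different `open` edges over training. Combining this mask with the class weights $[1, 5, 5]$ corresponds to the mixed setting in Table III.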
### V-C Discussion
The consistent finding across all experiments is that temporal and behavioural node\-level signals are the only features that meaningfully drive closure prediction, which has a natural interpretation in the context of the LN’s design\.
The gossip protocol broadcasts channel-level metadata (fee policies, capacity, disabled flags, timelocks) but reveals almost nothing about the *activity* flowing through channels. Channel balances, payment volumes, routing failures, and node uptime are all private. Yet these are precisely the factors most likely to trigger a forced closure: a node going offline, a payment dispute, or a depleted channel balance. Since this information is invisible in the gossip data, the model falls back on the closest observable proxies, namely how often each endpoint has been involved in past closures and how recently it has been active. The static channel parameters (fees, capacity, timelocks), by contrast, are largely set at opening time and rarely revisited, and our SHAP analysis confirms that the model relies on them only marginally.
The same logic explains why graph topology adds little. In networks where node behaviour is informative, such as social networks where a user’s friends reveal something about the user, GNNs can leverage neighbourhood structure. In the LN, by contrast, a node’s neighbours reveal little about whether *this specific channel* will be force-closed, because the determining information stays private to the two endpoints and is not propagated through gossip. The moderate overall performance (best macro F1 of 0.38) is therefore better understood as a *fundamental information gap* than as a failure of the models. Substantially improving beyond this level would likely require access to private node-side data, such as channel balances, payment histories, or per-node uptime, that the gossip protocol is intentionally designed not to expose.
## VI Related work
**Machine learning for the Lightning Network.** Since its inception, the LN has been the subject of extensive research, focusing on both its topology \[[2](https://arxiv.org/html/2605.12759#bib.bib2), [11](https://arxiv.org/html/2605.12759#bib.bib11)\] and its liquidity dynamics \[[3](https://arxiv.org/html/2605.12759#bib.bib3)\]. ML techniques have increasingly been applied to LN-specific problems: \[[12](https://arxiv.org/html/2605.12759#bib.bib12)\] explore different methods to predict channel balances, \[[13](https://arxiv.org/html/2605.12759#bib.bib13)\] employ reinforcement learning for joint node selection and resource allocation, and \[[14](https://arxiv.org/html/2605.12759#bib.bib14)\] leverage probabilistic modeling to optimize payment probing. Graph-based ML methods have also been applied to the LN: \[[4](https://arxiv.org/html/2605.12759#bib.bib4)\] benchmark GNNs on LN-specific tasks using snapshot-based datasets. However, prior studies have neither explicitly incorporated the temporal dimension of the LN nor addressed the specific problem of predicting channel closure types. Our work fills this gap by modeling the LN as a continuous-time dynamic graph and providing a systematic study of closure predictability using publicly available gossip data.
**Temporal graph neural networks.** Early approaches to temporal graphs modeled them as sequences of snapshots and applied GNNs to discrete representations \[[15](https://arxiv.org/html/2605.12759#bib.bib15), [16](https://arxiv.org/html/2605.12759#bib.bib16), [17](https://arxiv.org/html/2605.12759#bib.bib17), [18](https://arxiv.org/html/2605.12759#bib.bib18), [19](https://arxiv.org/html/2605.12759#bib.bib19)\], commonly referred to as Discrete-Time Dynamic Graphs (DTDGs). More recent approaches model temporal graphs continuously as event sequences, termed Continuous-Time Dynamic Graphs (CTDGs) \[[20](https://arxiv.org/html/2605.12759#bib.bib20), [21](https://arxiv.org/html/2605.12759#bib.bib21), [22](https://arxiv.org/html/2605.12759#bib.bib22), [5](https://arxiv.org/html/2605.12759#bib.bib5), [23](https://arxiv.org/html/2605.12759#bib.bib23), [24](https://arxiv.org/html/2605.12759#bib.bib24)\]. The Temporal Graph Network (TGN) \[[5](https://arxiv.org/html/2605.12759#bib.bib5)\] provides a general framework combining memory modules, message passing, and temporal encoding, and has become widely adopted. Standardized benchmarks \[[6](https://arxiv.org/html/2605.12759#bib.bib6), [7](https://arxiv.org/html/2605.12759#bib.bib7)\] have facilitated progress, primarily on link existence prediction and node classification tasks. Our work applies temporal graph methods to a different task, link *classification*, and provides evidence that, for this particular application, the temporal and edge-level components are more valuable than graph aggregation.
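The difference between the two representations can be made concrete with a toy conversion. The hypothetical helper below buckets a continuous-time stream of `(u, v, t)` interactions into fixed-width snapshots, the kind of discretisation a DTDG model consumes; it is an illustration of the distinction, not a step taken from any of the cited methods.

```python
from collections import defaultdict


def to_snapshots(events, window):
    """Group a continuous-time event stream into discrete snapshots.

    events: iterable of (u, v, t) interactions, t in seconds
    window: snapshot length in seconds
    Returns {snapshot_index: set of (u, v) edges seen in that window}.
    Ordering and exact timestamps inside a window are discarded: this is the
    information a CTDG model keeps and a DTDG model loses.
    """
    snapshots = defaultdict(set)
    for u, v, t in events:
        snapshots[int(t // window)].add((u, v))
    return dict(snapshots)


events = [("A", "B", 10), ("B", "C", 50), ("A", "B", 70), ("C", "D", 130)]
print(to_snapshots(events, 60))
```

With a 60-second window the first two interactions fall into the same snapshot and become indistinguishable in time, whereas a CTDG model would still see that the A–B event at $t = 10$ preceded the B–C event at $t = 50$.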
## VII Conclusion and Further Research
We studied predicting channel closure types in the Lightning Network from publicly available gossip data, formalising the task as a temporal link classification problem, constructing a dataset spanning two years of LN activity, and benchmarking a broad set of approaches, including random baselines, MLPs, gradient\-boosted trees, graph neural networks \(both static and temporal\), and spectral encodings\.
Our experiments revealed that the dominant predictive signals are temporal and behavioural (endpoint activity recency, per-node history of past closures, channel age), while static channel metadata and graph topology contribute much less. The best-performing model is a simple MLP operating on edge- and node-level features without any graph message passing, reaching a macro-average F1 of 0.38. The moderate overall performance reflects a fundamental information gap rather than a model limitation: the factors most likely to trigger forced closures, such as channel balance depletion, payment routing failures, and node downtime, are private by design and not disclosed by gossip, in line with the LN's privacy-preserving architecture.
Beyond closure prediction, we believe temporal modelling of the channel graph is a valuable tool for operators, whose outcomes, such as payment reliability and earned routing fees, are inherently functions of how the graph evolves over time\. We release the dataset publicly to support further research; promising directions include incorporating on\-chain signals \(e\.g\. fee market conditions, mempool congestion\), exploring node\-local prediction using private data such as local balance histories or payment failure logs that are available only to the channel endpoints, and studying how closure patterns evolve as the LN’s topology and usage shift over time\.
## References
- \[1\] J. Poon and T. Dryja, “The Bitcoin Lightning Network: Scalable off-chain instant payments,” 2016.
- \[2\] P. Zabka, K.-T. Förster, S. Schmid, and C. Decker, “Node classification and geographical analysis of the Lightning cryptocurrency network,” in *Proceedings of the 22nd International Conference on Distributed Computing and Networking*, ser. ICDCN ’21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 126–135. \[Online\]. Available: [https://doi.org/10.1145/3427796.3427837](https://doi.org/10.1145/3427796.3427837)
- \[3\] J. Herrera-Joancomartí, G. Navarro-Arribas, A. Ranchal-Pedrosa, C. Pérez-Solà, and J. Garcia-Alfaro, “On the difficulty of hiding the balance of Lightning Network channels,” in *Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security*, ser. Asia CCS ’19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 602–612. \[Online\]. Available: [https://doi.org/10.1145/3321705.3329812](https://doi.org/10.1145/3321705.3329812)
- \[4\] R. Feichtinger, F. Grötschla, L. Heimbach, and R. Wattenhofer, “Benchmarking GNNs using Lightning Network data,” 2024. \[Online\]. Available: [https://arxiv.org/abs/2407.07916](https://arxiv.org/abs/2407.07916)
- \[5\] E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, F. Monti, and M. M. Bronstein, “Temporal graph networks for deep learning on dynamic graphs,” *CoRR*, vol. abs/2006.10637, 2020. \[Online\]. Available: [https://arxiv.org/abs/2006.10637](https://arxiv.org/abs/2006.10637)
- \[6\] S. Huang, F. Poursafaei, J. Danovitch, M. Fey, W. Hu, E. Rossi, J. Leskovec, M. M. Bronstein, G. Rabusseau, and R. Rabbany, “Temporal graph benchmark for machine learning on temporal graphs,” in *Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track*, 2023. \[Online\]. Available: [https://openreview.net/forum?id=qG7IkQ7IBO](https://openreview.net/forum?id=qG7IkQ7IBO)
- \[7\] J. Gastinger, S. Huang, M. Galkin, E. Loghmani, A. Parviz, F. Poursafaei, J. Danovitch, E. Rossi, I. Koutis, H. Stuckenschmidt, R. Rabbany, and G. Rabusseau, “TGB 2.0: A benchmark for learning on temporal knowledge graphs and heterogeneous graphs,” in *The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track*, 2024. \[Online\]. Available: [https://openreview.net/forum?id=EADRzNJFn1](https://openreview.net/forum?id=EADRzNJFn1)
- \[8\] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in *Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, ser. KDD ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 785–794. \[Online\]. Available: [https://doi.org/10.1145/2939672.2939785](https://doi.org/10.1145/2939672.2939785)
- \[9\] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “LightGBM: A highly efficient gradient boosting decision tree,” in *Advances in Neural Information Processing Systems*, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. \[Online\]. Available: [https://proceedings.neurips.cc/paper_files/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf](https://proceedings.neurips.cc/paper_files/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf)
- \[10\] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in *Advances in Neural Information Processing Systems*, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. \[Online\]. Available: [https://proceedings.neurips.cc/paper_files/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf](https://proceedings.neurips.cc/paper_files/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf)
- \[11\] I. A. Seres, L. Gulyás, D. A. Nagy, and P. Burcsi, “Topological analysis of Bitcoin’s Lightning Network,” in *Mathematical Research for Blockchain Economy*, P. Pardalos, I. Kotsireas, Y. Guo, and W. Knottenbelt, Eds. Cham: Springer International Publishing, 2020, pp. 1–12.
- \[12\] E. Rossi, V. Singh *et al.*, “Channel balance interpolation in the Lightning Network via machine learning,” *arXiv preprint arXiv:2405.12087*, 2024.
- \[13\] M. Salahshour, A. Shafiee, and M. Tefagh, “Joint combinatorial node selection and resource allocations in the Lightning Network using attention-based reinforcement learning,” 2024. \[Online\]. Available: [https://arxiv.org/abs/2411.17353](https://arxiv.org/abs/2411.17353)
- \[14\] V. Singh, M. Khanzadeh, V. Davis, H. Rush, E. Rossi, J. Shrader, and P. Lio, “Bayesian binary search,” *arXiv preprint arXiv:2410.01771*, 2024.
- \[15\] A. Pareja, G. Domeniconi, J. Chen, T. Ma, T. Suzumura, H. Kanezashi, T. Kaler, T. Schardl, and C. Leiserson, “EvolveGCN: Evolving graph convolutional networks for dynamic graphs,” *Proceedings of the AAAI Conference on Artificial Intelligence*, vol. 34, no. 04, pp. 5363–5370, Apr. 2020. \[Online\]. Available: [https://ojs.aaai.org/index.php/AAAI/article/view/5984](https://ojs.aaai.org/index.php/AAAI/article/view/5984)
- \[16\] J. Chen, X. Wang, and X. Xu, “GC-LSTM: Graph convolution embedded LSTM for dynamic network link prediction,” *Applied Intelligence*, vol. 52, no. 7, pp. 7513–7528, May 2022. \[Online\]. Available: [https://doi.org/10.1007/s10489-021-02518-9](https://doi.org/10.1007/s10489-021-02518-9)
- \[17\] M. Yang, M. Zhou, M. Kalander, Z. Huang, and I. King, “Discrete-time temporal network embedding via implicit hierarchical learning in hyperbolic space,” in *Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining*, ser. KDD ’21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 1975–1985. \[Online\]. Available: [https://doi.org/10.1145/3447548.3467422](https://doi.org/10.1145/3447548.3467422)
- \[18\] J. You, T. Du, and J. Leskovec, “ROLAND: Graph learning framework for dynamic graphs,” in *Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining*, ser. KDD ’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 2358–2366. \[Online\]. Available: [https://doi.org/10.1145/3534678.3539300](https://doi.org/10.1145/3534678.3539300)
- \[19\] Y. Zhu, F. Cong, D. Zhang, W. Gong, Q. Lin, W. Feng, Y. Dong, and J. Tang, “WinGNN: Dynamic graph neural networks with random gradient aggregation window,” in *Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining*, ser. KDD ’23. New York, NY, USA: Association for Computing Machinery, 2023, pp. 3650–3662. \[Online\]. Available: [https://doi.org/10.1145/3580305.3599551](https://doi.org/10.1145/3580305.3599551)
- \[20\] R. Trivedi, M. Farajtabar, P. Biswal, and H. Zha, “DyRep: Learning representations over dynamic graphs,” in *International Conference on Learning Representations*, 2019. \[Online\]. Available: [https://openreview.net/forum?id=HyePrhR5KX](https://openreview.net/forum?id=HyePrhR5KX)
- \[21\] S. Kumar, X. Zhang, and J. Leskovec, “Predicting dynamic embedding trajectory in temporal interaction networks,” in *Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining*, ser. KDD ’19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 1269–1278. \[Online\]. Available: [https://doi.org/10.1145/3292500.3330895](https://doi.org/10.1145/3292500.3330895)
- \[22\] D. Xu, C. Ruan, E. Korpeoglu, S. Kumar, and K. Achan, “Inductive representation learning on temporal graphs,” in *International Conference on Learning Representations*, 2020. \[Online\]. Available: [https://openreview.net/forum?id=rJeW1yHYwH](https://openreview.net/forum?id=rJeW1yHYwH)
- \[23\] L. Yu, L. Sun, B. Du, and W. Lv, “Towards better dynamic graph learning: New architecture and unified library,” in *Thirty-seventh Conference on Neural Information Processing Systems*, 2023. \[Online\]. Available: [https://openreview.net/forum?id=xHNzWHbklj](https://openreview.net/forum?id=xHNzWHbklj)
- \[24\] W. Cong, S. Zhang, J. Kang, B. Yuan, H. Wu, X. Zhou, H. Tong, and M. Mahdavi, “Do we really need complicated model architectures for temporal networks?” in *The Eleventh International Conference on Learning Representations*, 2023. \[Online\]. Available: [https://openreview.net/forum?id=ayPPc0SyLv1](https://openreview.net/forum?id=ayPPc0SyLv1)