CHAM-net: A Contrastive Hierarchical Adaptive Meta-network for Robust Global Methane Flux Prediction

arXiv cs.LG 06/02/26, 04:00 AM Papers
methane climate flux-prediction meta-learning contrastive-learning hierarchical spatiotemporal
Summary
CHAM-net introduces a contrastive hierarchical adaptive meta-network that captures site-specific and cross-year dynamics for robust global methane flux prediction, outperforming baseline methods on simulation and observational datasets.
arXiv:2606.00338v1 Announce Type: new
Cached at: 06/02/26, 03:41 PM
# CHAM-net: A Contrastive Hierarchical Adaptive Meta-network for Robust Global Methane Flux Prediction
Source: [https://arxiv.org/html/2606.00338](https://arxiv.org/html/2606.00338)
Yiming Sun¹, Shuo Chen², Youmi Oh³,⁴, Licheng Liu⁵, Yiqun Xie⁶, Xiaowei Jia¹

¹University of Pittsburgh, ²Purdue University, ³University of Colorado Boulder, ⁴NOAA Global Monitoring Laboratory, ⁵University of Wisconsin–Madison, ⁶University of Maryland

{rongchaodong, yis108, xiaowei}@pitt.edu, chen4371@purdue.edu, youmi.oh@noaa.gov, licheng.liu@wisc.edu, xie@umd.edu

###### Abstract

Methane is a potent greenhouse gas that significantly contributes to global warming. However, accurately estimating global methane emissions and consumption remains challenging due to the complex interactions among environmental drivers that may vary across spatial and temporal scales. Prior data-driven methods often overlook the inherent spatiotemporal heterogeneity of ecosystems, failing to explicitly capture site-specific characteristics and cross-year evolutionary dynamics. To address these issues, we propose the Contrastive Hierarchical Adaptive Meta-network (CHAM-net), a novel framework that explicitly learns from historical context to capture site-specific dynamics. CHAM-net employs a hierarchical encoder–decoder architecture, in which the encoder captures site-specific characteristics from historical data and then dynamically conditions the decoder to generate the final prediction. Experimental results demonstrate that CHAM-net consistently outperforms all baseline methods on both simulation and observational datasets for methane emission and consumption, achieving nRMSE values as low as 0.43 and 0.88 with corresponding $R^2$ scores up to 0.97 and 0.68 for emission prediction.

## 1 Introduction

Methane ($\mathrm{CH}_4$) is the second most significant greenhouse gas contributing to global warming after carbon dioxide ($\mathrm{CO}_2$) and is responsible for about 30% of the increase in global temperature since the industrial revolution Stocker ([2014](https://arxiv.org/html/2606.00338#bib.bib21)). Unlike carbon dioxide, methane is chemically reactive in the atmosphere and therefore has a relatively short atmospheric lifetime of about 9 years Saunois et al. ([2025](https://arxiv.org/html/2606.00338#bib.bib43)). This short lifetime means that reducing methane emissions can deliver rapid climate benefits, including slower warming rates, reduced climate extremes, and improved air quality Mar et al. ([2022](https://arxiv.org/html/2606.00338#bib.bib44)), making methane mitigation one of the most effective near-term strategies for protecting human health, ecosystems, and vulnerable communities Ocko et al. ([2021](https://arxiv.org/html/2606.00338#bib.bib45)).

Traditional approaches primarily rely on process-based biogeochemistry models Zhuang et al. ([2004](https://arxiv.org/html/2606.00338#bib.bib24)); Tian et al. ([2015](https://arxiv.org/html/2606.00338#bib.bib22)); Zhang et al. ([2017](https://arxiv.org/html/2606.00338#bib.bib23)) to simulate and estimate the natural methane cycle. By incorporating the theoretical understanding of methane ecosystem dynamics with the key environmental drivers (e.g., soil features and temperatures), these models can be extended to enable global methane prediction and subsequent budget estimation. However, they are often limited by rigid and extensive parameterization, leading to biased prediction and substantial computational requirements when applied over large regions and long time periods. Recently, data-driven machine learning (ML) methods Kim et al. ([2020](https://arxiv.org/html/2606.00338#bib.bib27)); Irvin et al. ([2021](https://arxiv.org/html/2606.00338#bib.bib26)); Luo et al. ([2023](https://arxiv.org/html/2606.00338#bib.bib28)); Saha et al. ([2025](https://arxiv.org/html/2606.00338#bib.bib29)) have emerged as a promising alternative, demonstrating strong capability in capturing non-linear relationships between environmental drivers and methane flux. Existing works have further explored knowledge transfer (e.g., pretraining and fine-tuning) from simulated data to sparse real observations (e.g., collected from eddy covariance towers) to improve the prediction accuracy Sun et al. ([2025](https://arxiv.org/html/2606.00338#bib.bib20)).

However, existing ML methods typically assume that diverse methane sites across space are governed by a shared set of global parameters, and neglect the spatiotemporal heterogeneity. In particular, methane emission and consumption patterns are highly site-specific because different sites can respond differently to similar input drivers due to the underlying variations in microbial communities and soil properties. For example, Figure [1](https://arxiv.org/html/2606.00338#S1.F1)(a) compares annual emissions at three sites (US.Myb Matthes et al. ([2018](https://arxiv.org/html/2606.00338#bib.bib35)), DE.Hte Koebsch and Jurasinski ([2018](https://arxiv.org/html/2606.00338#bib.bib36)), and US.ORv Bohrer and Morin ([2015](https://arxiv.org/html/2606.00338#bib.bib37))) in 2013. These sites exhibit clearly different emission patterns and magnitudes. Without fully leveraging site-specific context, existing models tend to predict averaged values, which could underestimate high-valued regions and overestimate low-valued regions (e.g., the significant differences between sites DE.Hte and US.ORv in Figure [1](https://arxiv.org/html/2606.00338#S1.F1)(a)).

![Refer to caption](https://arxiv.org/html/2606.00338v1/x1.png)

Figure 1: Spatiotemporal heterogeneity within the datasets.

Additionally, current ML methods mostly utilize short-term data (e.g., data from the current year) and focus on capturing short-term temporal dynamics (e.g., seasonal changes of precipitation). However, they are not designed to capture the impact of many long-term processes (e.g., slow changes in plant cover, soil composition) that also affect methane dynamics over years. As shown in Figure [1](https://arxiv.org/html/2606.00338#S1.F1)(b), for site FI.Lom Lohila et al. ([2010](https://arxiv.org/html/2606.00338#bib.bib38)), we can observe a sustained decline in total methane emissions from 2006 to 2009, reflecting a site-specific cross-year temporal evolution pattern.

To address these limitations, we propose the Contrastive Hierarchical Adaptive Meta-network (CHAM-net), a novel framework that explicitly learns site-specific dynamics from multi-year historical data. While many site-specific characteristics (e.g., microbial communities and soil properties) are not directly observable, their influence is often manifested in the long-term trend shown in each site's historical methane record. CHAM-net therefore leverages each site's historical methane data to capture these latent characteristics and improve current-year prediction. Specifically, CHAM-net adopts a hybrid meta-learning mechanism Hospedales et al. ([2021](https://arxiv.org/html/2606.00338#bib.bib2)) in which the inner model encodes multi-year environmental and methane dynamics into learnable representations to summarize site-specific characteristics (e.g., temporal patterns and scales), while the outer model leverages these learned representations to condition the current-year prediction process. This design shifts prediction from a global model to a site-aware estimation. Additionally, optimizing the outer loop helps shape the inner-loop learning task, which enables effective extraction of site-specific information.

##### Connections to the social good\.

By applying our advanced AI architecture to resolve the spatiotemporal heterogeneity of methane ecosystems, this work significantly reduces uncertainties in natural methane budgets and improves process-level understanding, thereby enabling more effective and accurate methane mitigation strategies in support of the Global Methane Pledge Malley et al. ([2023](https://arxiv.org/html/2606.00338#bib.bib46)). These advances directly contribute to multiple UN Sustainable Development Goals (https://sdgs.un.org/goals), including Climate Action, Good Health and Well-Being, and Life on Land, by informing near-term climate mitigation pathways and air-quality prediction, and supporting sustainable wetland management.

This work is conducted in active collaboration with domain experts from NOAA Global Monitoring Laboratory, University of Wisconsin-Madison, University of Maryland, and Purdue University, who are internationally recognized experts in global methane observations, process understanding, and atmospheric data assimilation. These scientists contribute domain knowledge and observational constraints, and evaluate the real-world impact of the proposed methodology throughout model development, training, and interpretation.

##### Technical Contributions\.

- We identify inherent site heterogeneity as a key factor underlying the inaccurate predictions of current models, and show that historical data encodes critical site-specific information essential for effective site prediction.
- We propose CHAM-net, an encoder–decoder hybrid meta-learning framework that dynamically leverages historical data to calibrate site-specific predictions for the current year.
- We evaluate CHAM-net on extensive methane emission and consumption datasets, including both simulation and observational data. Experimental results show that CHAM-net consistently outperforms all baselines on all datasets, achieving an nRMSE of 0.88 and an $R^2$ of 0.68 on the FLUXNET emission dataset.

## 2 Problem Formulation

The task of global methane flux prediction can be formulated as a site-level time-series regression problem. For each site $i \in \{1, \dots, N\}$, we are given a sequence of environmental drivers (e.g., soil properties and temperature) over a time period, denoted as $X_i = \{x_1, x_2, \dots, x_T\}$, where each $x_t \in \mathbb{R}^{D}$ is a feature vector at time $t$ (e.g., a specific date), $D$ is the total number of input drivers, and $T$ is the length of each sequence. Following prior works in methane prediction and other environmental monitoring tasks Liu et al. ([2024a](https://arxiv.org/html/2606.00338#bib.bib33)); Sun et al. ([2025](https://arxiv.org/html/2606.00338#bib.bib20)), we cut the data into yearly sequences (i.e., $T = 365$) to facilitate modeling seasonal patterns. The goal is to predict the methane flux for the corresponding year, $Y_i = \{y_1, y_2, \dots, y_T\}$, where $y_t \in \mathbb{R}$ is the target label, e.g., methane emission or consumption. In our proposed method, we also leverage multi-year historical records. For each site $i$, we use $X_i^{(k)}$ and $Y_i^{(k)}$ to represent the environmental drivers and methane data from a historical year $k \in \{1, \dots, K\}$.
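As a minimal sketch of this data layout, the snippet below cuts one site's daily record into yearly sequences of length $T = 365$ and selects $K$ historical years for a given current year. The function names, the decision to drop an incomplete trailing year, and the use of the immediately preceding years as history are illustrative assumptions, not details stated in the paper.

```python
def to_yearly_sequences(records, T=365):
    """Cut a flat daily record into consecutive sequences of length T.

    Assumption: an incomplete trailing year is dropped.
    """
    n_years = len(records) // T
    return [records[k * T:(k + 1) * T] for k in range(n_years)]


def select_history(yearly, current, K):
    """Return the K years preceding index `current` (assumed choice of history)."""
    if not 0 < K <= current < len(yearly):
        raise ValueError("need K preceding years before the current year")
    return yearly[current - K:current]


# Toy example: three full years of daily scalar values plus a 10-day remainder.
daily = list(range(3 * 365 + 10))
years = to_yearly_sequences(daily)
history = select_history(years, current=2, K=2)
```

Here `years[2]` plays the role of the current-year sequence and `history` holds the two earlier years.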

The datasets used in this paper can be categorized into simulation and observational datasets, each with distinct characteristics as follows:

- Simulation Dataset. We use process-based simulation datasets, which incorporate biogeochemical processes to simulate methane fluxes by solving differential equations. Given globally available input drivers, they enable estimation of methane fluxes at the global scale. However, due to the complexity of model computations and uncertainties in multiple input drivers, the highest spatial resolution is limited to 0.5 degree, corresponding to approximately 3,000 km² per grid cell.
- Observational Dataset. In observational datasets, both input drivers and methane fluxes are directly measured at each site using eddy covariance techniques, providing real-world ground truth observations. However, these sites are spatially sparse and geographically discontinuous, with footprints on the order of hundreds of square meters. Moreover, observational datasets cover substantially shorter time spans than simulation datasets.

The details of the datasets used in this paper are provided in Section [4.1](https://arxiv.org/html/2606.00338#S4.SS1).

## 3 Design and Methodology

In this section, we introduce the main design of the CHAM-net (Contrastive Hierarchical Adaptive Meta-network) architecture, which addresses the aforementioned problems and leverages historical information to improve predictions.

![Refer to caption](https://arxiv.org/html/2606.00338v1/x2.png)

Figure 2: CHAM-net structure overview.

### 3.1 Model Overview

The proposed CHAM-net model explicitly incorporates historical information to learn site-specific characteristics and long-term dynamics. By employing a hybrid hierarchical meta-learning architecture, the model extracts the most informative historical trends and scales for each site. These learned site-specific representations are then injected into the decoder and combined with current-year inputs to improve the final prediction.

Figure [2](https://arxiv.org/html/2606.00338#S3.F2) illustrates the overall architecture of the CHAM-net model. For each site, a configurable number of historical years of data is used as the support set, while the current-year data forms the query set. When the historical year length exceeds one year, a cross-attention module is first applied to compute each year's relevance to the current-year inputs. The attention weights are then used to extract context-aware representations capturing local characteristics that influence current-year dynamics. These representations are then propagated to the decoder stage to guide the final prediction. The workflow consists of two hierarchical phases:

- Context Encoder. The context encoder is designed to extract representations for capturing site-specific characteristics that affect current-year dynamics. Although many of these characteristics are not directly observable, they can be inferred from the dynamic responses of methane fluxes $Y$ to environmental drivers $X$, i.e., $Y = f(X; \theta)$. The key idea of the context encoder is to *inversely* infer these characteristics from historical methane data, by embedding them in a site-specific representation that serves as the parameters $\theta$ for the function $f$. However, directly inferring $\theta$ as the full parameterization of the function $f$ may not precisely capture the site-specific characteristics, as the mapping is also influenced by many confounding factors. Hence, we reformulate the model $f$ as $Y = f(g(X); \theta)$, where $g$ serves as a global feature extractor (via a bidirectional GRU) that captures shared underlying processes while $\theta$ parameterizes the site-specific information. The extractor $g$ is learned end-to-end under supervised training, which helps better define and stabilize the inverse problem for the target task. A contrastive learning objective is further introduced to amplify inter-site differences, which ensures the discrimination of site-specific characteristics.
- Adaptive Decoder. The resulting site-specific representations are then transformed and injected into the hidden state of a decoder based on Long Short-Term Memory (LSTM) Hochreiter and Schmidhuber ([1997](https://arxiv.org/html/2606.00338#bib.bib11)). The decoder shares parameters across all sites, but conditions its temporal dynamics on the injected site context. The final output of methane fluxes is produced from this context-enhanced hidden state.

The model is trained through a bi-level optimization process. The inner-loop optimization aims to extract site-specific representations from historical data observations. In the outer loop, the model utilizes these representations to condition and enhance the prediction, while updating model parameters to optimize predictive performance and to shape the inner-loop objective. In the following, we provide details of the model components as well as the training process.

### 3.2 Context Encoder

The goal of the encoder is to convert the historical support set $\mathcal{S}^{(k)} = \{(X_i^{t,(k)}, Y_i^{t,(k)})\}_{t=1}^{T}$ into a compact context-aware representation $\mathbf{W}_i$ for each site $i$, and then use it to enhance the decoder through site-specific conditioning.

BiGRU Module. We first encode data from each historical year $k$ using a bidirectional GRU:

$$\mathbf{G}_i^{(k)} = \text{BiGRU}(\mathbf{X}_i^{(k)}) \in \mathbb{R}^{T \times 2H}, \tag{1}$$

where $H$ is the size of the hidden dimension.

Site-specific MLP. To capture the site-specific characteristics, we assign each site $i$ a dedicated MLP projection head parameterized by $\mathbf{W}_i^k$, which can be represented by:

$$\hat{\mathbf{Y}}_i^{(k)} = \mathbf{G}_i^{(k)} \mathbf{W}_i^k + b_i, \tag{2}$$

where $\mathbf{W}_i^k \in \mathbb{R}^{2H \times 1}$ and $b_i \in \mathbb{R}$. $\hat{\mathbf{Y}}_i^{(k)}$ represents the predicted labels of year $k$, which are later used to compute the inner-loop loss. In our proposed method, we use the learned $\mathbf{W}_i^k$ as the representation of site-specific behavior. The MLP weights effectively distill rich historical information from both the drivers and methane fluxes into a compact representation.
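To make the read-out in Eq. (2) concrete, the sketch below applies a per-site linear head to pre-computed features at every time step. It is an illustration only: the function name `site_head` and the toy values are assumptions, and the BiGRU itself is omitted, with its output $\mathbf{G}$ passed in directly as a list of feature vectors.

```python
def site_head(G, W, b):
    """Apply Y_hat[t] = G[t] . W + b for each time step t (Eq. 2).

    G: list of T feature vectors of length 2H (stand-in for the BiGRU output).
    W: list of 2H site-specific weights; b: scalar bias.
    """
    if any(len(g) != len(W) for g in G):
        raise ValueError("feature width must match the length of W")
    return [sum(g_j * w_j for g_j, w_j in zip(g, W)) + b for g in G]


# Toy example with T = 3 time steps and 2H = 2 features.
G = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [2.0, -1.0]  # hypothetical site-specific weights
y_hat = site_head(G, W, b=0.5)
```

Because the head is linear, the weight vector `W` has a fixed, small size regardless of sequence length, which is what allows it to double as a compact site representation.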
The dimensionality of $\mathbf{W}_i^k$ can be adjusted based on the output dimension of the previous GRU layer.

Year-wise Attention. When incorporating multiple historical years (i.e., $K > 1$), we weight each year based on its relevance to the current-year prediction. We therefore compute attention weights that measure the year-wise importance $\alpha_i^{(k)}$ of each historical year, as follows:

$$\alpha_i^{(k)} = \text{Attn}\big(\mathbf{X}_i^{(0)}, \mathbf{X}_i^{(k)}\big), \quad \text{s.t.} \; \sum_{k=1}^{K} \alpha_i^{(k)} = 1, \tag{3}$$

where $\mathbf{X}_i^{(0)}$ is the current-year inputs. We employ the standard multi-head attention mechanism to calculate the similarity between the current year and the historical years, and then normalize the attention weights across $K$ historical years via softmax. The attention weights are used in the outer loop for aggregating $\mathbf{W}_i^k$ from historical years, which will be discussed in Section [3.3](https://arxiv.org/html/2606.00338#S3.SS3).

Contrastive Learning. To better ensure site discrimination and encode persistent site characteristics (e.g., long-term scaling and trend patterns) into $\mathbf{W}_i^k$, we introduce a contrastive objective directly defined over the site-specific representation $\mathbf{W}_i^k$. We treat the representations from the same site but different years as positive pairs $(\mathbf{W}_i^k, \mathbf{W}_i^{+})$ and other sites in the batch as negatives.
The contrastive loss is the Info Noise-Contrastive Estimation (InfoNCE) loss, which is defined as:

$$\mathcal{L}_{\text{con}} = -\log \frac{\exp\big(\text{sim}(\mathbf{W}_i^k, \mathbf{W}_i^{+})/\tau\big)}{\sum_{j=1}^{B} \exp\big(\text{sim}(\mathbf{W}_i^k, \mathbf{W}_j)/\tau\big)}, \tag{4}$$

where $B$ is the batch size, $\text{sim}(\cdot)$ is the cosine similarity, and $\tau$ is the configurable temperature parameter. To improve the robustness of site-specific representations, we adopt a stochastic perturbation method to augment the generated $\mathbf{W}_i^k$ in contrastive learning:

$$\tilde{\mathbf{W}}_i^k = \mathbf{W}_i^k + \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}). \tag{5}$$

This augmentation encourages the embeddings to remain stable under small variations of site-specific parameters.

### 3.3 Adaptive Decoder

The decoder serves as the base-learner that leverages the outcome of the encoder to condition the actual flux prediction on the query set $\mathcal{Q}_i = \{X_{curr}^{t}\}_{t=1}^{T}$. In our implementation, the decoder is an LSTM-based model. We adopt LSTM because LSTM-based architectures have consistently demonstrated robust performance in prior methane flux prediction studies Luo et al. ([2023](https://arxiv.org/html/2606.00338#bib.bib28)); Chen et al. ([2024](https://arxiv.org/html/2606.00338#bib.bib30)); Sun et al. ([2025](https://arxiv.org/html/2606.00338#bib.bib20)).
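For reference, the contrastive objective of Section 3.2 (Eqs. 4 and 5) can be sketched in plain Python for a single anchor representation. This is an illustrative sketch rather than the released implementation: the helper names, the toy vectors, the temperature value, and the choice to include the positive among the denominator candidates are all assumptions.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def perturb(w, sigma, rng):
    """Eq. 5: add isotropic Gaussian noise to a representation."""
    return [x + rng.gauss(0.0, sigma) for x in w]


def info_nce(anchor, positive, candidates, tau=0.1):
    """Eq. 4 for one anchor: log-sum-exp over candidates minus the positive logit.

    Assumption: `candidates` is the batch of representations in the
    denominator and contains the positive itself.
    """
    pos = cosine(anchor, positive) / tau
    logits = [cosine(anchor, c) / tau for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos


rng = random.Random(0)
site_a_year1 = [1.0, 0.2, 0.0]
site_a_year2 = perturb([0.9, 0.3, 0.1], sigma=0.01, rng=rng)
site_b = [-1.0, 0.5, 0.3]

batch = [site_a_year2, site_b]
loss_matched = info_nce(site_a_year1, site_a_year2, batch)
loss_mismatched = info_nce(site_a_year1, site_b, batch)
```

In this toy setup the loss is small when the anchor is paired with another year of the same site and large when it is paired with a different site, which is the behavior the objective is meant to encourage.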
We first aggregate $\tilde{\mathbf{W}}_i^k$ from multiple historical years using attention weights $\alpha_i^{(k)}$, and then project the site-specific embedding into the hidden state space of the decoder:

$$\tilde{\mathbf{z}}_i = \phi\Big(\sum_{k=1}^{K} \alpha_i^{(k)} \, \tilde{\mathbf{W}}_i^k\Big), \tag{6}$$

where $\phi(\cdot)$ is a fully connected projection. $\tilde{\mathbf{z}}_i$ is then combined with the original hidden state $\mathbf{h}_i$ of the LSTM decoder and normalized, using a learnable weight $\beta_i$:

$$\tilde{\mathbf{h}}_i = \frac{\mathbf{h}_i + \beta_i \tilde{\mathbf{z}}_i}{1 + |\beta_i|}. \tag{7}$$

The final prediction is then obtained by feeding the current-year inputs and the enhanced hidden state into the LSTM decoder:

$$\hat{\mathbf{Y}}_i^{(0)} = \text{LSTM}_{\psi}\big(\mathbf{X}_i^{(0)}, \tilde{\mathbf{h}}_i\big), \tag{8}$$

where $\psi$ denotes the other parameters of the LSTM, and $\hat{\mathbf{Y}}_i^{(0)}$ is the predicted current-year labels.

### 3.4 Optimization and Loss Functions

The training of CHAM-net is formulated as a bilevel optimization problem, involving nested inner- and outer-loop loss computations and the corresponding bilevel backpropagation process. The second-order backpropagation gradient flow is illustrated in Figure [2](https://arxiv.org/html/2606.00338#S3.F2).

Inner-loop Reconstruction Loss. The inner-loop objective is designed to extract site representations using historical information.
It consists of a historical reconstruction loss and a contrastive enhancement loss:

$$\mathcal{L}_{\text{inner}}(\theta) = \mathcal{L}_{\text{hist}}(\theta) + \lambda_{\text{con}} \mathcal{L}_{\text{con}}(\theta), \tag{9}$$

where $\theta = \{\mathbf{W}_i^k\}_{k=1}^{K}$ denotes the parameters in the inner loop, $\mathcal{L}_{\text{con}}(\theta)$ is the contrastive loss defined in Equation [4](https://arxiv.org/html/2606.00338#S3.E4), and $\lambda_{\text{con}}$ is a configurable hyperparameter that controls the contribution of the contrastive loss. The historical reconstruction loss $\mathcal{L}_{\text{hist}}(\theta)$ is defined as:

$$\mathcal{L}_{\text{hist}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} \alpha_i^{(k)} \left\| \hat{\mathbf{Y}}_i^{(k)} - \mathbf{Y}_i^{(k)} \right\|_2^2, \tag{10}$$

where $N$ is the total number of sites in training. The encoder parameters are updated through a differentiable inner optimization step:

$$\theta' = \theta - \gamma \nabla_{\theta} \mathcal{L}_{\text{inner}}(\theta), \tag{11}$$

where $\gamma$ is the inner learning rate.

Outer-loop Objective.
The outer loop takes the learned representation and performs current-year prediction, and the loss is calculated as the mean squared error across all the training sites:

$$\mathcal{L}_{\text{outer}}(\psi, \theta^{*}) = \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{\mathbf{Y}}_i^{(0)}(\theta', \psi) - \mathbf{Y}_i^{(0)} \right\|_2^2. \tag{12}$$

The outer optimization updates the parameters $\psi$, covering both the encoder and decoder parameters, i.e., the BiGRU parameters in the encoder and the LSTM parameters in the decoder. $\theta^{*}$ represents the updated site-specific representations obtained from the inner loop (via Eq. [11](https://arxiv.org/html/2606.00338#S3.E11)). Given that $\theta^{*}$ also contributes to the final prediction, the outer-loop errors are backpropagated to the representations $\theta$, which forms a second-order gradient flow during the training process. The complete training objective is formulated as follows:

$$\begin{aligned} \min_{\psi} \quad & \mathcal{L}_{\text{outer}}(\psi, \theta^{*}) \\ \text{s.t.} \quad & \theta^{*} = \operatorname*{arg\,min}_{\theta} \mathcal{L}_{\text{inner}}(\theta), \end{aligned} \tag{13}$$

where $\psi$ denotes the parameters of the outer LSTM and $\theta$ the parameters of the inner loop. Note that the outer-loop parameters $\psi$ are shared across all sites while the inner-loop parameters $\theta$ are specific to each site.

Why meta-learning? Traditional methane flux prediction models either rely on globally shared parameters or fail to fully exploit historical information. In our setting, historical context must be explicitly encoded into the prediction process. However, since input drivers and labels vary dynamically across training and evaluation, efficiently integrating this information poses a challenge. Meta-learning provides a natural solution.
Each site can be viewed as a distinct and evolving task, characterized by unique long-term dynamics and temporal responses. Through the inner loop, the model learns site-specific historical information and dynamically adapts it to the outer-loop objectives, offering an effective mechanism for improving the prediction accuracy.

## 4 Evaluation

### 4.1 Experimental Setup

Baselines. We compare CHAM-net against nine competitive long-term time-series forecasting models. These include Transformer-based methods, such as the original Transformer Vaswani et al. ([2017](https://arxiv.org/html/2606.00338#bib.bib12)), iTransformer Liu et al. ([2024b](https://arxiv.org/html/2606.00338#bib.bib13)), Pyraformer Liu et al. ([2022](https://arxiv.org/html/2606.00338#bib.bib14)), DUET Qiu et al. ([2025](https://arxiv.org/html/2606.00338#bib.bib18)), and PatchTST Nie et al. ([2022](https://arxiv.org/html/2606.00338#bib.bib15)), as well as two MLP-based architectures, TSMixer Chen et al. ([2023](https://arxiv.org/html/2606.00338#bib.bib16)) and TimeMixer Wang et al. ([2024](https://arxiv.org/html/2606.00338#bib.bib17)). We further include two RNN-based approaches, namely LSTM Hochreiter and Schmidhuber ([1997](https://arxiv.org/html/2606.00338#bib.bib11)) and P-sLSTM Kong et al. ([2025](https://arxiv.org/html/2606.00338#bib.bib19)). Notably, CHAM-net can also be regarded as an RNN-based model, as it relies on BiGRU and LSTM components to extract historical information and condition the prediction.

Datasets. We evaluate CHAM-net and all baseline models using both simulation and observational datasets. We also consider both methane emission and consumption processes, which together constitute the natural methane cycle. Details of the datasets used in our experiments are provided below.

- Emission Datasets.
We leverage the methane emission datasets introduced in Sun et al. ([2025](https://arxiv.org/html/2606.00338#bib.bib20)), which include a global simulation dataset at 0.5 degree spatial resolution with daily granularity, TEM Zhuang et al. ([2004](https://arxiv.org/html/2606.00338#bib.bib24)) (TEM-E), and an observational dataset derived from eddy covariance measurements, FLUXNET-CH4 Delwiche et al. ([2021](https://arxiv.org/html/2606.00338#bib.bib41)) (FLUXNET-E). The TEM-E dataset provides 40 years of global data spanning from 1979 to 2018. The FLUXNET-E dataset consists of measurements from 30 wetland eddy covariance tower sites, with temporal coverage that varies by site but primarily ranges from 2006 to 2018. For each site in both TEM-E and FLUXNET-E, the datasets include 15 methane-related scalar input drivers at daily resolution, along with daily methane emission fluxes as target labels. Among these inputs, 10 are site-specific features that are static or evolve slowly over multiple years, including elevation (clelev), soil texture fractions (sand, silt, and clay) (clfaotxt), vegetation type (cltveg), soil pH value (phh2o), topsoil bulk density (topsoil\_bulk\_density), plant functional type (vegetation\_type\_11), wetland type (wetlandtype), climate type (climatetype), atmospheric carbon dioxide concentration (kco2), and atmospheric methane concentration (ch4). The remaining five inputs are dynamic climate-related variables, including precipitation (PREC), air temperature (TAIR), solar radiation (SOLR), vapor pressure (VAPR), and net primary productivity (NPP).
- Consumption Datasets.
We use two consumption simulation datasets, TEM\-C Zhuang et al\. \([2013](https://arxiv.org/html/2606.00338#bib.bib25)\) and MeMo Murguia\-Flores et al\. \([2018](https://arxiv.org/html/2606.00338#bib.bib51), [2021](https://arxiv.org/html/2606.00338#bib.bib50)\), which are generated by different physics\-based methane consumption models that incorporate different physical processes and geographic constraints\. We additionally include an observational dataset, FLUXNET\-C Delwiche et al\. \([2021](https://arxiv.org/html/2606.00338#bib.bib41)\), which consists of methane consumption measurements from 28 upland sites collected using eddy covariance techniques\. Both TEM\-C and MeMo provide global coverage at 0\.5 degree spatial resolution with monthly granularity\. The TEM\-C dataset spans 1979 to 2019, while MeMo covers 1990 to 2009, yielding 20 years of data\. The temporal coverage of FLUXNET\-C varies across sites but primarily ranges from 2006 to 2018\. Because real\-world data collection is difficult, FLUXNET\-C contains a limited number of missing observations \(e\.g\., NaN consumption values in certain months for some sites\)\. In our experiments, these missing entries are excluded from loss computation, so predictions corresponding to NaN labels do not contribute to the optimization objective\. Across all simulation and observational consumption datasets, we use a total of 18 input features\. These include all drivers described for the emission datasets, along with two additional soil texture variables \(sand and clay\) and leaf area index \(LAI\), which characterizes vegetation cover in upland ecosystems\.

Implementation Details\. All models are implemented in PyTorch 2\.5\.1 with CUDA 12\.4\. Across all models, we use a learning rate of 0\.001, a dropout rate of 0\.2, a hidden dimension of 8, three layers, a batch size of 128 for simulation datasets and 4 for observational datasets, and the Gaussian Error Linear Unit \(GELU\) activation function Hendrycks \([2016](https://arxiv.org/html/2606.00338#bib.bib42)\)\. For all Transformer\-based models, the number of attention heads is set to 4\. Other model\-specific hyperparameters follow the default settings provided in the original implementations or corresponding papers\. We also employ early stopping with a patience of five epochs\. Each experiment is repeated three times, and the reported results are averaged across runs\. All experiments use the Adam optimizer Kingma and Ba \([2017](https://arxiv.org/html/2606.00338#bib.bib52)\) and run on a single NVIDIA RTX 3080 GPU\. We provide our datasets and code in an open\-source Zenodo repository \(dataset and code link: https://zenodo\.org/records/20450697\)\.
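The exclusion of NaN labels from the loss, described above for FLUXNET-C, can be illustrated with a minimal pure-Python sketch. It is not the authors' PyTorch code: the function name `masked_mse`, the choice of mean squared error, and the normalization by the number of valid labels are all assumptions made for illustration.

```python
import math


def masked_mse(predictions, labels):
    """Mean squared error over entries whose label is not NaN (hypothetical helper)."""
    squared_errors = [
        (p - y) ** 2
        for p, y in zip(predictions, labels)
        if not math.isnan(y)  # NaN labels are skipped entirely
    ]
    if not squared_errors:
        return 0.0  # no valid labels, so nothing contributes to the objective
    return sum(squared_errors) / len(squared_errors)


# Toy monthly consumption values; the second label is missing.
preds = [0.5, 1.0, 2.0, 3.0]
labels = [1.0, float("nan"), 2.0, 1.0]
loss = masked_mse(preds, labels)
```

In a tensor framework the same idea is usually expressed with a boolean mask over the label tensor, so that masked positions produce no gradient.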

Data Splitting and Evaluation Task\. We focus on temporal extrapolation for both methane emission and consumption tasks\. Following Sun et al\. \([2025](https://arxiv.org/html/2606.00338#bib.bib20)\), simulation datasets are split chronologically into two equal halves, with the earlier period used for training and the later period used for testing\. For example, in both the TEM\-E and TEM\-C datasets, data from 1979 to 1998 are used for training, while data from 1999 to 2018 are used for testing\. For the observational datasets FLUXNET\-E and FLUXNET\-C, data are split on a per\-site basis, where six\-sevenths of the temporal records are used for training and the remaining one\-seventh for testing\. This 6/7–1/7 split provides sufficient training data for effective model learning while mitigating overfitting risks\.
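The two splitting schemes above can be sketched as follows. The function names are hypothetical, and the sketch assumes that each site's records are already in time order and that the earliest six-sevenths form the training portion, which matches the temporal extrapolation setting but is not stated explicitly for the observational data.

```python
def split_simulation_years(years):
    """Split years into two equal chronological halves (earlier half for training)."""
    ordered = sorted(years)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def split_site_records(records, train_parts=6, total_parts=7):
    """Split one site's time-ordered records into train and test portions.

    Integer arithmetic avoids floating-point rounding at the cut point.
    """
    cut = len(records) * train_parts // total_parts
    return records[:cut], records[cut:]


# Simulation example: 1979-2018 gives 1979-1998 for training, 1999-2018 for testing.
train_years, test_years = split_simulation_years(range(1979, 2019))

# Observational example: a toy site with 70 time-ordered records.
site_train, site_test = split_site_records(list(range(70)))
```

Splitting each site separately keeps every site represented in both portions, even when sites cover different years.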

Evaluation Metrics\. We use normalized root mean squared error \(nRMSE\) and the coefficient of determination \(R²\) to evaluate model performance\. nRMSE measures the magnitude of prediction errors normalized by the scale of the observations, enabling fair comparison across sites with substantially different magnitudes\. Lower nRMSE indicates better performance\. This metric is particularly well suited to our setting, where methane fluxes vary widely across locations\. R² quantifies the proportion of variance in the observations explained by the model, reflecting its ability to capture temporal patterns\. Higher R² values correspond to a better model fit\. Together, nRMSE and R² provide complementary perspectives on predictive accuracy and pattern fidelity\.
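As a worked illustration of the two metrics, the sketch below computes both on a toy series. The text does not specify the nRMSE normalizer, so dividing the RMSE by the mean absolute observation is an assumption here; other common choices are the standard deviation or the range of the observations. The function names are hypothetical.

```python
import math


def nrmse(observed, predicted):
    """RMSE divided by the mean absolute observation (assumed normalizer)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    scale = sum(abs(o) for o in observed) / n
    return rmse / scale


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 2.0]  # only the last value is wrong

error = nrmse(observed, predicted)   # RMSE 1.0 over scale 2.5
fit = r2_score(observed, predicted)  # 1 - 4 / 5
```

With this normalizer the two metrics are not simple transforms of each other, so a model can improve one without improving the other.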

### 4\.2 Results

#### 4\.2\.1 Predictive performance

Table [1](https://arxiv.org/html/2606.00338#S4.T1) reports the predictive performance on all simulation and observational datasets\. For each column, models are trained and evaluated solely on the corresponding dataset, without pretraining or fine\-tuning across datasets\. CHAM\-net consistently outperforms all baselines in terms of both nRMSE and R²\. Specifically, on the emission datasets TEM\-E and FLUXNET\-E, CHAM\-net achieves nRMSE values of 0\.43 and 0\.88, respectively\. These nRMSEs are at least 0\.1 lower than those of competing methods, indicating a superior ability to capture the magnitude of methane emissions over time\. CHAM\-net also attains the highest R² value on FLUXNET\-E, showing more accurate modeling of the emission patterns\. The combination of lower nRMSE and higher R² demonstrates that CHAM\-net effectively captures both emission scales and temporal dynamics, which is critical for reliable future methane budget estimation\. Moreover, the improvement suggests that the learned site\-specific representations capture localized emission behavior, which can help inform targeted methane mitigation strategies\.

For the consumption datasets, CHAM\-net demonstrates substantial improvements on TEM\-C, reducing nRMSE by more than 0\.4 and increasing R² by over 0\.2\. For the MeMo dataset, which is generated using relatively simple biogeochemical equations and has more regular patterns, nearly all models achieve lower nRMSE and higher R² than on the TEM\-C dataset\. Results on both TEM\-C and MeMo demonstrate that explicitly leveraging historical information substantially enhances predictive performance for current\-year consumption\. For the observational dataset FLUXNET\-C, model performance is generally limited due to the small magnitude of consumption fluxes and high levels of environmental noise\. Nevertheless, CHAM\-net still achieves the highest R² value of 0\.31 and the lowest nRMSE of 1\.27 among all methods\.

Table 1: Main results across all datasets and models\. For each dataset, models are trained and evaluated from scratch\. Lower nRMSE and higher R² indicate better performance\.

#### 4\.2\.2 Case Analysis

Figure [3](https://arxiv.org/html/2606.00338#S4.F3) presents two representative site examples from the FLUXNET\-E dataset\. We compare CHAM\-net with the two best baseline models, Pyraformer and TSMixer\. As shown in Figure [3](https://arxiv.org/html/2606.00338#S4.F3)\(a\), CHAM\-net achieves a closer match to both the scale and temporal evolution of methane emissions at site FI\-Lom, particularly after day 200\. While all three models capture the overall temporal pattern, CHAM\-net more accurately estimates the peak value\. In contrast, Pyraformer overestimates the peak value, whereas TSMixer underestimates it\. Figure [3](https://arxiv.org/html/2606.00338#S4.F3)\(b\) illustrates results for site DE\-SfN Schmid and Klatt \([2014](https://arxiv.org/html/2606.00338#bib.bib53)\), which represents a particularly challenging case due to the presence of negative flux values in the ground truth\. Even under this difficult setting, CHAM\-net captures the site\-specific emission scale more effectively than the baseline models, resulting in higher R², lower nRMSE, and a total emission estimate that is closer to the observed values\.

![Refer to caption](https://arxiv.org/html/2606.00338v1/x3.png)

Figure 3: Predictions on representative FLUXNET\-E sites\.

Table 2: Performance of models pretrained on simulation datasets and fine\-tuned on observational datasets\.

Transfer Learning Analysis\. We also evaluate transfer learning via pretraining and fine\-tuning\. Table [2](https://arxiv.org/html/2606.00338#S4.T2) reports the performance on observational datasets when models are first pretrained on the simulation datasets and then fine\-tuned on observational data\. In the table, each column corresponds to the simulation dataset used for pretraining; the model is subsequently fine\-tuned on the corresponding observational emission or consumption dataset\. For example, TEM\-E denotes pretraining on TEM\-E followed by fine\-tuning on FLUXNET\-E\.

First, CHAM\-net consistently outperforms all baseline methods under the pretraining and fine\-tuning setting, demonstrating its strong adaptability across data domains\. Second, pretraining proves particularly beneficial for methane consumption tasks\. As shown in Table [2](https://arxiv.org/html/2606.00338#S4.T2), pretraining on either TEM\-C or MeMo improves both metrics\. Notably, pretraining on TEM\-C increases the R² value of CHAM\-net from 0\.31 to 0\.43, indicating enhanced pattern learning from simulation datasets\. Finally, performance varies across pretraining sources, suggesting that the quality of the simulation dataset directly affects fine\-tuning effectiveness\. Because TEM\-C incorporates stronger physical constraints and has more realistic consumption dynamics and scales, it provides a more informative prior and leads to higher fine\-tuned performance on the FLUXNET\-C dataset\.

#### 4\.2\.3 Model Analysis

Historical Year Length\. We first investigate the optimal length of historical data used in the inner loop of CHAM\-net\. Figure [4](https://arxiv.org/html/2606.00338#S4.F4) reports R² for different historical window lengths\. We evaluate historical year lengths of 2, 4, 6, and 8 years to determine the optimal setting for the simulation datasets TEM\-E, TEM\-C, and MeMo\. Due to the sparsity and varying temporal spans of the observational datasets \(e\.g\., some sites contain only three years of data\), we use a single historical year length for the FLUXNET\-E and FLUXNET\-C datasets in our experiments\. This setting is sufficient to yield clear performance improvements, as demonstrated in Table [1](https://arxiv.org/html/2606.00338#S4.T1)\. For the simulation datasets, Figure [4](https://arxiv.org/html/2606.00338#S4.F4) shows that performance varies only marginally across historical window lengths\. Nevertheless, a four\-year historical window consistently achieves the best performance\.
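To make the notion of a historical window concrete, the sketch below pairs each target year with the years immediately preceding it. This is an illustration under stated assumptions rather than the authors' data pipeline: the function name `build_history_pairs` is hypothetical, the window is assumed to consist of contiguous preceding years, and target years without a full window are simply skipped.

```python
def build_history_pairs(yearly_data, history_length):
    """Pair each target year with its preceding `history_length` years.

    yearly_data maps year -> that year's data for a single site.
    Returns a list of (history_years, target_year) tuples.
    """
    years = sorted(yearly_data)
    pairs = []
    for index in range(history_length, len(years)):
        history_years = years[index - history_length:index]
        pairs.append((history_years, years[index]))
    return pairs


# Toy site with ten years of placeholder data and a four-year window.
yearly = {year: f"data_{year}" for year in range(1979, 1989)}
pairs = build_history_pairs(yearly, history_length=4)
first_history, first_target = pairs[0]
```

A longer window leaves fewer usable target years per site, which is one practical reason short windows suit sites with only a few years of records.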

![Refer to caption](https://arxiv.org/html/2606.00338v1/x4.png)

Figure 4: Sensitivity study of CHAM\-net with respect to historical year length\.

Learned Representations Analysis\. To interpret the learned representations and identify the most influential inputs, we analyze the correlation between the learned representations and the input variables\. Using FLUXNET\-E as an example, we extract the site\-specific representations for all sites after training and apply principal component analysis \(PCA\) to identify the top three dominant components\. We then compute the Pearson product–moment correlation coefficients between these components and the site\-wise input features\.

As shown in the heatmap in Figure [5](https://arxiv.org/html/2606.00338#S4.F5), the learned representations capture meaningful physical relationships when encoding historical information\. In the figure, the x\-axis represents the 15 input features, while the y\-axis corresponds to the three most important components identified by PCA, namely Weight\_1 to Weight\_3\. Cells shown in red indicate strong associations\. The results show that topsoil bulk density, vegetation types \(cltveg and vegetation\_type\), climate type, solar radiation \(SOLR\), and air temperature \(TAIR\) are among the most influential features\. These findings are consistent with the established understanding of methane processes in natural ecosystems and suggest that the learned representations can provide useful insights for improved methane forecasting and mitigation strategies\.
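The PCA-then-correlation procedure described above can be sketched in plain Python. The embeddings and the feature below are made-up toy values, all function names are hypothetical, and the eigenvectors are approximated by power iteration with deflation purely to keep the example dependency-free. In practice a numerical library would normally be used, and this says nothing about the authors' actual tooling.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so PCA operates on centered data."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance (d x d) of centered rows (n x d)."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def top_components(cov, k, iterations=500):
    """Approximate the k leading eigenvectors by power iteration with deflation."""
    d = len(cov)
    matrix = [row[:] for row in cov]
    components = []
    for _component in range(k):
        vec = [float(i + 1) for i in range(d)]  # arbitrary non-zero start
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _step in range(iterations):
            nxt = mat_vec(matrix, vec)
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
        components.append(vec)
        # Deflate: remove the found direction before searching for the next one.
        matrix = [[matrix[i][j] - eigenvalue * vec[i] * vec[j] for j in range(d)]
                  for i in range(d)]
    return components


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data: five hypothetical sites with 3-dimensional learned embeddings,
# and one site-wise input feature that drives the first embedding dimension.
site_embeddings = [
    [2.0, 0.1, 0.0],
    [4.0, -0.1, 0.1],
    [6.0, 0.1, -0.1],
    [8.0, -0.1, 0.0],
    [10.0, 0.1, 0.1],
]
feature = [1.0, 2.0, 3.0, 4.0, 5.0]

centered = center_columns(site_embeddings)
components = top_components(covariance_matrix(centered), k=2)
# Project every site onto each component, then correlate with the feature.
scores = [[sum(v * c for v, c in zip(row, comp)) for row in centered]
          for comp in components]
correlations = [pearson(s, feature) for s in scores]
```

The sign of a principal component is arbitrary, so the magnitude of each correlation is the meaningful quantity when reading such a heatmap.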

![Refer to caption](https://arxiv.org/html/2606.00338v1/x5.png)

Figure 5: Correlation between learned weights and input features\.

#### 4\.2\.4 Ablation Studies

We also conduct ablation experiments to examine the contribution of each component in our framework\. Using the emission datasets as an example, we evaluate three variants of CHAM\-net: \(i\) a mean\-pooling\-based CHAM\-net, \(ii\) CHAM\-net with cross\-attention, and \(iii\) CHAM\-net with both cross\-attention and contrastive learning\. For reference, we also include an LSTM as a baseline\.

As shown in Figure [6](https://arxiv.org/html/2606.00338#S4.F6), the incorporation of historical information accounts for the largest performance improvement\. Contrastive learning provides the second\-largest gain, as it enhances the model’s ability to distinguish site\-specific characteristics and extract informative features from historical data, leading to more expressive embeddings\. Cross\-attention yields a relatively smaller improvement in this setting\. This is because input features evolve slowly over time, resulting in high similarity across historical years; cross\-attention therefore offers limited benefit over mean pooling\. Nevertheless, cross\-attention remains a configurable component and may offer greater benefits in other domains where input diversity across years is more pronounced\.
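The argument that cross-attention adds little when historical years look alike can be checked numerically. The sketch below uses single-query scaled dot-product attention with no learned projections, which is a deliberate simplification and not CHAM-net's actual attention module. All names and toy embeddings are hypothetical.

```python
import math


def mean_pool(year_embeddings):
    """Average the per-year embeddings dimension by dimension."""
    n = len(year_embeddings)
    return [sum(col) / n for col in zip(*year_embeddings)]


def cross_attention_pool(query, year_embeddings):
    """Single-query scaled dot-product attention; years act as keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, emb)) / scale
              for emb in year_embeddings]
    max_score = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * emb[i] for w, emb in zip(weights, year_embeddings))
              for i in range(len(query))]
    return pooled, weights


# Nearly identical historical years: attention weights are close to uniform.
similar_years = [[1.0, 0.5], [1.01, 0.5], [0.99, 0.51]]
pooled_similar, weights_similar = cross_attention_pool([0.3, -0.2], similar_years)
mean_similar = mean_pool(similar_years)
gap_similar = max(abs(a - b) for a, b in zip(pooled_similar, mean_similar))

# Diverse historical years: attention concentrates on the best-matching year.
diverse_years = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
pooled_diverse, weights_diverse = cross_attention_pool([2.0, 0.0], diverse_years)
```

When the keys are almost equal, the softmax scores are almost equal as well, so the attention output collapses toward the mean of the values.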

![Refer to caption](https://arxiv.org/html/2606.00338v1/x6.png)

Figure 6: Ablation study of different model variants \(in R²\)\.

## 5 Related Work

##### Methane Prediction\.

Previous methane flux prediction studies Kim et al\. \([2020](https://arxiv.org/html/2606.00338#bib.bib27)\); Irvin et al\. \([2021](https://arxiv.org/html/2606.00338#bib.bib26)\); Luo et al\. \([2023](https://arxiv.org/html/2606.00338#bib.bib28)\); Chen et al\. \([2024](https://arxiv.org/html/2606.00338#bib.bib30)\); Saha et al\. \([2025](https://arxiv.org/html/2606.00338#bib.bib29)\) primarily rely on Random Forest, decision tree, extreme gradient boosting \(XGB\), and Artificial Neural Network \(ANN\) approaches trained on relatively small observational datasets, which often focus on specific regions and coarse spatial resolutions\. The work of Sun et al\. \([2025](https://arxiv.org/html/2606.00338#bib.bib20)\) introduced the first global wetland methane emission dataset that integrates both simulation and observational data, enabling large\-scale machine learning studies\. To the best of our knowledge, this work is the first to jointly model methane emission and consumption in natural ecosystems using machine learning, achieving state\-of\-the\-art performance and providing new insights for global methane budget estimation and analysis\.

##### Knowledge\-guided machine learning\.

Knowledge\-guided machine learning \(KGML\) Willard et al\. \([2022](https://arxiv.org/html/2606.00338#bib.bib31)\); Karpatne et al\. \([2024](https://arxiv.org/html/2606.00338#bib.bib32)\); Yu et al\. \([2025](https://arxiv.org/html/2606.00338#bib.bib55)\) has been successfully applied in various environmental studies, including carbon dioxide modeling Liu et al\. \([2024a](https://arxiv.org/html/2606.00338#bib.bib33)\) and lake temperature profiling Jia et al\. \([2021](https://arxiv.org/html/2606.00338#bib.bib34)\); Yu et al\. \([2024](https://arxiv.org/html/2606.00338#bib.bib54)\), by embedding physical knowledge into the loss functions of ML models to enhance performance\. While directly incorporating biogeochemical equations into methane prediction models is beyond the scope of this work, it represents a promising direction for future research\. In the supplementary material, we leverage pretraining and fine\-tuning to transfer knowledge from simulation datasets to observational datasets, which can also be viewed as a form of knowledge guidance\.

## 6 Conclusion

In this paper, we propose a contrastive hierarchical adaptive meta\-learning framework that explicitly leverages site\-specific historical information to capture both spatial heterogeneity and temporal dynamics in methane prediction\. By learning site\-aware representations, our model improves prediction accuracy for both methane emission and consumption across simulation and observational datasets\. These improvements support more reliable estimation of the global methane budget and provide insights that may inform effective strategies for natural methane mitigation\. Experimental results demonstrate that our approach consistently outperforms all baselines, achieving the lowest nRMSE and the highest R² on both methane emission and consumption datasets\.

## Acknowledgements

Rongchao Dong and Xiaowei Jia were partially supported by the National Science Foundation \(NSF\) grants 2203581, 2239175, 2316305, 2147195, 2425845, and 2530609; the USGS award G22AC00266; and the NASA grants 80NSSC24K1061 and 80NSSC25K0013\. Licheng Liu and Youmi Oh are supported by the Department of Energy \(DOE\) grant DE\-SC0024360, NSF ESIIL 2153040\. Yiqun Xie is supported in part by the NSF under Grant No\. 2126474, 2147195, 2425844, and 2530610; NASA under grant 80NSSC25K0013 and 80NSSC25K7221; Google’s AI for Social Good Impact Scholars program\. We also sincerely thank all reviewers for their thoughtful comments and feedback, and all our collaborators for their insightful contributions\.

## References

- G\. Bohrer and T\. Morin \(2015\) FLUXNET\-CH4 US\-ORv Olentangy River Wetland Research Park\. Dataset, FLUXNET\. Time range: 2011\-2015\. [Link](https://doi.org/10.18140/FLX/1669689)\.
- S\. Chen, L\. Liu, Y\. Ma, Q\. Zhuang, and N\. J\. Shurpali \(2024\) Quantifying global wetland methane emissions with in situ methane flux data and machine learning approaches\. Earth’s Future 12\(11\), pp\. e2023EF004330\.
- S\. Chen, C\. Li, N\. Yoder, S\. Ö\. Arik, and T\. Pfister \(2023\) TSMixer: an all\-MLP architecture for time series forecasting\. CoRR abs/2303\.06053\. [Link](https://doi.org/10.48550/arXiv.2303.06053)\.
- K\. B\. Delwiche, S\. H\. Knox, A\. Malhotra, E\. Fluet\-Chouinard, G\. McNicol, S\. Feron, Z\. Ouyang, D\. Papale, C\. Trotta, E\. Canfora, et al\. \(2021\) FLUXNET\-CH4: a global, multi\-ecosystem dataset and analysis of methane seasonality from freshwater wetlands\. Earth System Science Data Discussions 2021, pp\. 1–111\.
- D\. Hendrycks \(2016\) Gaussian error linear units \(GELUs\)\. arXiv preprint arXiv:1606\.08415\.
- S\. Hochreiter and J\. Schmidhuber \(1997\) Long short\-term memory\. Neural Computation 9\(8\), pp\. 1735–1780\.
- T\. Hospedales, A\. Antoniou, P\. Micaelli, and A\. Storkey \(2021\) Meta\-learning in neural networks: a survey\. IEEE Transactions on Pattern Analysis and Machine Intelligence 44\(9\), pp\. 5149–5169\.
- J\. Irvin, S\. Zhou, G\. McNicol, F\. Lu, V\. Liu, E\. Fluet\-Chouinard, Z\. Ouyang, S\. H\. Knox, A\. Lucas\-Moffat, C\. Trotta, et al\. \(2021\) Gap\-filling eddy covariance methane fluxes: comparison of machine learning model predictions and uncertainties at FLUXNET\-CH4 wetlands\. Agricultural and Forest Meteorology 308, pp\. 108528\.
- X\. Jia, J\. Willard, A\. Karpatne, J\. S\. Read, J\. A\. Zwart, M\. Steinbach, and V\. Kumar \(2021\) Physics\-guided machine learning for scientific discovery: an application in simulating lake temperature profiles\. ACM/IMS Transactions on Data Science 2\(3\), pp\. 1–26\.
- A\. Karpatne, X\. Jia, and V\. Kumar \(2024\) Knowledge\-guided machine learning: current trends and future prospects\. arXiv preprint arXiv:2403\.15989\.
- Y\. Kim, M\. S\. Johnson, S\. H\. Knox, T\. A\. Black, H\. J\. Dalmagro, M\. Kang, J\. Kim, and D\. Baldocchi \(2020\) Gap\-filling approaches for eddy covariance methane fluxes: a comparison of three machine learning algorithms and a traditional method with principal component analysis\. Global Change Biology 26\(3\), pp\. 1499–1518\.
- D\. P\. Kingma and J\. Ba \(2017\) Adam: a method for stochastic optimization\. arXiv:1412\.6980\. [Link](https://arxiv.org/abs/1412.6980)\.
- F\. Koebsch and G\. Jurasinski \(2018\) FLUXNET\-CH4 DE\-Hte Huetelmoor\. Dataset, FLUXNET\. Time range: 2011\-2018\. [Link](https://doi.org/10.18140/FLX/1669634)\.
- Y\. Kong, Z\. Wang, Y\. Nie, T\. Zhou, S\. Zohren, Y\. Liang, P\. Sun, and Q\. Wen \(2025\) Unlocking the power of LSTM for long term time series forecasting\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 39, pp\. 11968–11976\.
- L\. Liu, W\. Zhou, K\. Guan, B\. Peng, S\. Xu, J\. Tang, Q\. Zhu, J\. Till, X\. Jia, C\. Jiang, et al\. \(2024a\) Knowledge\-guided machine learning can improve carbon cycle quantification in agroecosystems\. Nature Communications 15\(1\), pp\. 357\.
- S\. Liu, H\. Yu, C\. Liao, J\. Li, W\. Lin, A\. X\. Liu, and S\. Dustdar \(2022\) Pyraformer: low\-complexity pyramidal attention for long\-range time series modeling and forecasting\. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25\-29, 2022\. [Link](https://openreview.net/forum?id=0EXmFzUn5I)\.
- Y\. Liu, T\. Hu, H\. Zhang, H\. Wu, S\. Wang, L\. Ma, and M\. Long \(2024b\) iTransformer: inverted transformers are effective for time series forecasting\. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7\-11, 2024\. [Link](https://openreview.net/forum?id=JePfAI8fah)\.
- A\. Lohila, M\. Aurela, J\. Tuovinen, T\. Laurila, J\. Hatakka, J\. Rainne, and T\. Mäkelä \(2010\) FLUXNET\-CH4 FI\-Lom Lompolojankka\. Dataset, FLUXNET\. Time range: 2006\-2010\. [Link](https://doi.org/10.18140/FLX/1669638)\.
- R\. Luo, J\. Wang, and I\. Gates \(2023\) Machine learning for accurate methane concentration predictions: short\-term training, long\-term results\. Environmental Research Communications 5\(8\), pp\. 081003\.
- C\. S\. Malley, N\. Borgford\-Parnell, S\. Haeussling, I\. C\. Howard, E\. N\. Lefèvre, and J\. C\. Kuylenstierna \(2023\) A roadmap to achieve the global methane pledge\. Environmental Research: Climate 2\(1\), pp\. 011003\.
- K\. A\. Mar, C\. Unger, L\. Walderdorff, and T\. Butler \(2022\) Beyond CO2 equivalence: the impacts of methane on climate, ecosystems, and health\. Environmental Science & Policy 134, pp\. 127–136\.
- J\. Matthes, C\. Sturtevant, P\. Oikawa, S\. Chamberlain, D\. Szutu, A\. Ortiz, J\. Verfaillie, and D\. Baldocchi \(2018\) FLUXNET\-CH4 US\-Myb Mayberry Wetland\. Dataset, FLUXNET\. Time range: 2010\-2018\. [Link](https://doi.org/10.18140/FLX/1669685)\.
- F\. Murguia\-Flores, S\. Arndt, A\. L\. Ganesan, G\. Murray\-Tortarolo, and E\. R\. Hornibrook \(2018\) Soil methanotrophy model \(MeMo v1\.0\): a process\-based model to quantify global uptake of atmospheric methane by soil\. Geoscientific Model Development 11\(6\), pp\. 2009–2032\.
- F\. Murguia\-Flores, A\. L\. Ganesan, S\. Arndt, and E\. R\. Hornibrook \(2021\) Global uptake of atmospheric methane by soil from 1900 to 2100\. Global Biogeochemical Cycles 35\(7\), pp\. e2020GB006774\.
- Y\. Nie, N\. H\. Nguyen, P\. Sinthong, and J\. Kalagnanam \(2022\) A time series is worth 64 words: long\-term forecasting with transformers\. CoRR abs/2211\.14730\. [Link](https://doi.org/10.48550/arXiv.2211.14730)\.
- I\. B\. Ocko, T\. Sun, D\. Shindell, M\. Oppenheimer, A\. N\. Hristov, S\. W\. Pacala, D\. L\. Mauzerall, Y\. Xu, and S\. P\. Hamburg \(2021\) Acting rapidly to deploy readily available methane mitigation measures by sector can immediately slow global warming\. Environmental Research Letters 16\(5\), pp\. 054042\.
- X\. Qiu, X\. Wu, Y\. Lin, C\. Guo, J\. Hu, and B\. Yang \(2025\) Duet: dual clustering enhanced multivariate time series forecasting\. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V\. 1, pp\. 1185–1196\.
- E\. Saha, O\. Wang, A\. K\. Chakraborty, P\. V\. Garcia, R\. Milne, and H\. Wang \(2025\) Dispersion based recurrent neural network model for methane monitoring in Albertan tailings ponds\. Journal of Environmental Management 395, pp\. 127748\.
- M\. Saunois, A\. Martinez, B\. Poulter, Z\. Zhang, P\. A\. Raymond, P\. Regnier, J\. G\. Canadell, R\. B\. Jackson, P\. K\. Patra, P\. Bousquet, et al\. \(2025\) Global methane budget 2000–2020\. Earth System Science Data 17\(5\), pp\. 1873–1958\.
- H\. Schmid and J\. Klatt \(2014\) FLUXNET\-CH4 DE\-SfN Schechenfilz Nord\. Dataset, FLUXNET\. Time range: 2012\-2014\. [Link](https://doi.org/10.18140/FLX/1669635)\.
- T\. Stocker \(2014\) Climate change 2013: the physical science basis: Working Group I contribution to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change\. Cambridge University Press\.
- Y\. Sun, S\. Chen, S\. Chen, C\. Qiu, L\. Liu, Y\. Oh, S\. L\. Malone, G\. McNicol, Q\. Zhuang, C\. Smith, et al\. \(2025\) X\-MethaneWet: a cross\-scale global wetland methane emission benchmark dataset for advancing science discovery with AI\. arXiv preprint arXiv:2505\.18355\.
- H\. Tian, G\. Chen, C\. Lu, X\. Xu, W\. Ren, B\. Zhang, K\. Banger, B\. Tao, S\. Pan, M\. Liu, et al\. \(2015\) Global methane and nitrous oxide emissions from terrestrial ecosystems due to multiple environmental changes\. Ecosystem Health and Sustainability 1\(1\), pp\. 1–20\.
- A\. Vaswani, N\. Shazeer, N\. Parmar, J\. Uszkoreit, L\. Jones, A\. N\. Gomez, Ł\. Kaiser, and I\. Polosukhin \(2017\) Attention is all you need\. Advances in Neural Information Processing Systems 30\.
- S\. Wang, H\. Wu, X\. Shi, T\. Hu, H\. Luo, L\. Ma, J\. Y\. Zhang, and J\. Zhou \(2024\) TimeMixer: decomposable multiscale mixing for time series forecasting\. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7\-11, 2024\. [Link](https://openreview.net/forum?id=7oLshfEIC2)\.
- J\. Willard, X\. Jia, S\. Xu, M\. Steinbach, and V\. Kumar \(2022\) Integrating scientific knowledge with machine learning for engineering and environmental systems\. ACM Computing Surveys 55\(4\), pp\. 1–37\.
- R\. Yu, C\. Qiu, R\. Ladwig, P\. C\. Hanson, Y\. Xie, Y\. Li, and X\. Jia \(2024\) Adaptive process\-guided learning: an application in predicting lake DO concentrations\. In 2024 IEEE International Conference on Data Mining \(ICDM\), pp\. 580–589\.
- R\. Yu, C\. Qiu, R\. Ladwig, P\. Hanson, Y\. Xie, and X\. Jia \(2025\) Physics\-guided foundation model for scientific discovery: an application to aquatic science\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 39, pp\. 28548–28556\.
- Z\. Zhang, N\. E\. Zimmermann, A\. Stenke, X\. Li, E\. L\. Hodson, G\. Zhu, C\. Huang, and B\. Poulter \(2017\) Emerging role of wetland methane emissions in driving 21st century climate change\. Proceedings of the National Academy of Sciences 114\(36\), pp\. 9647–9652\.
- Q\. Zhuang, M\. Chen, K\. Xu, J\. Tang, E\. Saikawa, Y\. Lu, J\. M\. Melillo, R\. G\. Prinn, and A\. D\. McGuire \(2013\) Response of global soil consumption of atmospheric methane to changes in atmospheric climate and nitrogen deposition\. Global Biogeochemical Cycles 27\(3\), pp\. 650–663\.
- Q\. Zhuang, J\. M\. Melillo, D\. W\. Kicklighter, R\. G\. Prinn, A\. D\. McGuire, P\. A\. Steudler, B\. S\. Felzer, and S\. Hu \(2004\) Methane fluxes between terrestrial ecosystems and the atmosphere at northern high latitudes during the past century: a retrospective analysis with a process\-based biogeochemistry model\. Global Biogeochemical Cycles 18\(3\)\.