$\textit{BlockFormer}$ : Transformer-based inference from interaction maps
Summary
BlockFormer introduces a transformer architecture for solving inverse problems from block-structured interaction maps, such as centromere identification from Hi-C data, using a custom simulator for synthetic training data.
Cached at: 05/22/26, 08:50 AM
# Transformer-based inference from interaction maps
Source: [https://arxiv.org/html/2605.21617](https://arxiv.org/html/2605.21617)
Eloïse Touron (Univ. Grenoble Alpes, Inria CNRS, Grenoble INP, LJK, France, eloise.touron@inria.fr); Pedro L. C. Rodrigues (Univ. Grenoble Alpes, Inria CNRS, Grenoble INP, LJK, France, pedro.rodrigues@inria.fr); Julyan Arbel (Univ. Grenoble Alpes, Inria CNRS, Grenoble INP, LJK, France, julyan.arbel@inria.fr); Nelle Varoquaux (TIMC, Univ. Grenoble Alpes CNRS, Grenoble INP, France, nelle.varoquaux@univ-grenoble-alpes.fr); Michael Arbel (Univ. Grenoble Alpes, Inria CNRS, Grenoble INP, LJK, France, michael.arbel@inria.fr)
###### Abstract
Inference from interaction maps, such as centromere identification from genome-wide chromosome conformation capture techniques (notably Hi-C), can be formulated as a generic inverse problem: infer a set of parameters given a map summarizing pairwise interactions between entities through blocks of variable numbers and sizes. In this work, we introduce a data-driven approach that leverages shared structure between these maps, such as global alignment between localized patterns, while handling the variability in number and size of entities arising in real-world data. Our approach relies on a transformer architecture capable of handling such variability and a custom simulator to generate abundant, yet computationally cheap, synthetic data for training. Applied to the problem of centromere localization, the method accurately recovers their genomic positions across a wide range of species of various genome sizes.
## 1 Introduction
Interaction maps summarize pairwise relationships between entities within a system. Because they inherently encode the underlying system's structure, these maps can support inference of entity-level properties. For example, protein-protein interaction maps are used to identify key regulatory proteins at the origin of abnormal gene expression [protein](https://arxiv.org/html/2605.21617#bib.bib11), and modularity in species-species interaction maps helps identify critical species for ecosystem stability [ecology](https://arxiv.org/html/2605.21617#bib.bib26). The goal is then to solve the following inverse problem: infer per-entity parameters from a given interaction map.
In many such problems, these maps exhibit a block-wise structure, where each block represents interactions between two entities, with block number and size varying across maps. The maps can exhibit both localized patterns within each block and non-local information across multiple blocks, for instance in the form of an alignment between per-block patterns. Several biological applications fit this framework, including Hi-C maps, where the interacting entities are chromosomes. Hi-C maps summarize physical contact counts between genomic loci across a population of cells into a block-wise matrix and have become a central tool for studying DNA folding and some associated genetic diseases, notably through the identification of chromatin loops and topologically associated domains (TADs) [tads](https://arxiv.org/html/2605.21617#bib.bib9); [loops](https://arxiv.org/html/2605.21617#bib.bib33). Beyond these features, centromeres are key elements of genome organization due to their essential role in chromosome segregation and genome stability [kinetochore](https://arxiv.org/html/2605.21617#bib.bib5). While they have traditionally been annotated experimentally [FISH](https://arxiv.org/html/2605.21617#bib.bib25); [CHIP](https://arxiv.org/html/2605.21617#bib.bib22), these approaches can be imprecise or fail for some species [inferfail](https://arxiv.org/html/2605.21617#bib.bib13). Methods such as Centurion instead infer centromere positions directly from Hi-C data by fitting Gaussian profiles to interaction peaks [nelle](https://arxiv.org/html/2605.21617#bib.bib32). This procedure, however, is non-amortized and computationally costly because it requires solving a non-convex optimization problem; see Appendix [B](https://arxiv.org/html/2605.21617#A2).
As a large number of Hi-C maps have recently become available, there is a clear interest in leveraging learning-based approaches to automate the inference of properties from a given interaction map, such as centromere locations. This raises methodological challenges, such as how to effectively design and learn models capable of handling maps with various block numbers and shapes while capturing consistent structural patterns across blocks. Off-the-shelf techniques, such as supervised deep learning, could be applied, but they would require manually annotating data, which is costly. Bayesian inference approaches have been proposed to estimate some DNA properties such as chromatin compaction and persistence length [arbona](https://arxiv.org/html/2605.21617#bib.bib3), but they require defining a tractable likelihood model of these interaction maps, which can be challenging given the complexity and rich structure of these interactions.
Alternatively, leveraging simulated data is a promising way to bypass those limitations when learning an inference model. Simulation-based inference (SBI) has been particularly useful in physics applications [sbi_cosmo](https://arxiv.org/html/2605.21617#bib.bib1); [sbi_cosmo_1](https://arxiv.org/html/2605.21617#bib.bib2) thanks to realistic simulators [cosmo_simu](https://arxiv.org/html/2605.21617#bib.bib10); [cosmo_simu_1](https://arxiv.org/html/2605.21617#bib.bib16). Prior-fitted networks have shown excellent results on real-world tabular data [tabfn](https://arxiv.org/html/2605.21617#bib.bib17) even though training is performed on purely synthetic data. In both settings, the ability to handle data of various sizes is still an active research topic.
In this work (based on an earlier version [etouron](https://arxiv.org/html/2605.21617#bib.bib31) presented at a workshop without official proceedings), we propose BlockFormer, a transformer-based model to infer per-entity properties from interaction maps with variable block-wise structure. Our architecture employs a three-dimensional positional encoding that allows handling variable block sizes and numbers while capturing per-block patterns and aggregating non-local information across multiple blocks. We designed a simple simulator that reproduces the patterns necessary for inference, enabling fast generation of interaction maps. Pre-trained on synthetic data of variable structures generated via this simulator, BlockFormer can perform fast and accurate inference for applications including centromere localization from Hi-C maps of a wide range of species of various genome sizes (see Figure [1](https://arxiv.org/html/2605.21617#S1.F1)).
Figure 1: Inference from interaction maps using BlockFormer (see architecture in Figure [2](https://arxiv.org/html/2605.21617#S3.F2)). The input is $C_k$, any sequence of blocks of interactions between entity $k$ and others, and the output is the parameter estimate $\hat{\theta}_k$.
## 2 Related work
Deep learning approaches in structural biology. Prior works have leveraged deep learning methods to model biological structure and function at different scales. [akita](https://arxiv.org/html/2605.21617#bib.bib21) and [enformer](https://arxiv.org/html/2605.21617#bib.bib4) proposed CNN- and attention-based architectures to predict chromatin folding or gene expression directly from DNA sequence. However, these methods are not directly applicable to our setting, as they operate on single blocks and do not model interactions across multiple blocks. In the context of chromatin structure, Hi-C-based graph approaches such as [hicgnn](https://arxiv.org/html/2605.21617#bib.bib12); [hicoex](https://arxiv.org/html/2605.21617#bib.bib34) model contact maps as graphs and apply graph neural networks (GNNs) or graph attention networks (GATs) to reconstruct 3D genome organization or infer functional relationships. These methods, however, typically operate on localized regions or treat the genome as a single homogeneous graph, relying on global message passing without explicitly capturing block-wise organization. Transformer-based architectures have also been successfully applied in structural biology, most notably in AlphaFold [alphafold](https://arxiv.org/html/2605.21617#bib.bib20), which predicts the 3D structure of individual proteins from amino acid sequences using evolutionary constraints such as multiple sequence alignments (MSAs). However, such approaches are designed for single-molecule folding and do not extend to genome-scale chromatin organization. In contrast, our method operates directly on genome-wide Hi-C contact maps and explicitly models their block-wise structure. By leveraging a block-aware transformer-based architecture, it enables the identification of large-scale structural features such as centromeres that are not addressed by prior sequence-based or graph-based methods.
Inspired by Vision Transformers (ViT) [transf_image](https://arxiv.org/html/2605.21617#bib.bib8), which tokenize the input into fixed-length patches to remove the image size constraint, we adopt a token-based mechanism, with key adaptations to block-wise structures (see Section [3.1](https://arxiv.org/html/2605.21617#S3.SS1) and Figure [2](https://arxiv.org/html/2605.21617#S3.F2)).
Bio-physical simulators of DNA. In the biological context, highly complex simulators are typically used [simu_bio](https://arxiv.org/html/2605.21617#bib.bib15); [simu_folding](https://arxiv.org/html/2605.21617#bib.bib6). Based on molecular dynamics, they model DNA fragments as polymers or chains of beads and attempt to mimic chromosome folding in the cell by solving biophysical equations. As many variables are involved and must satisfy a set of constraints, such simulators are extremely slow, even to produce a single folding configuration. However, training our model requires many contact maps, each of which is a summary statistic over a population of chromosome foldings, not just one. Consequently, using a biological simulator to construct thousands of contact maps is computationally impractical within a reasonable runtime. Moreover, Hi-C structure already encodes sufficient information to localize centromeres, making explicit reconstruction of full 3D chromatin folding unnecessary. We therefore build a simplified and lightweight contact map simulator that directly generates the map $C$ from the centromere positions $\theta$ without simulating any DNA folding (see Section [4](https://arxiv.org/html/2605.21617#S4)).
## 3 BlockFormer: transformer-based inference from interaction maps
Figure 2: Architecture of BlockFormer. The input is any sequence of blocks of interactions between entity $i$ and others, and the output is the parameter estimate $\theta_i$.

The goal is to give a point estimate of the per-entity parameter $\theta$ given any interaction map $C$ that shows an enrichment of interactions at the location of the parameters. A pair of entities $(i,j)$ of sizes $l_i$ and $l_j$ defines an interaction block in $C$ of size $(l_i, l_j)$. Depending on the studied entities, both the number and the size of interacting entities vary, leading to interaction blocks with heterogeneous sizes and numbers. Consequently, a model applicable to any set of entities should generalize to any number and dimension of blocks. Transformers provide a natural architecture to handle these challenges: by mapping any input image into a sequence of tokens, they eliminate the image size constraint. However, in contrast to standard settings, our problem involves block-wise maps, for which no existing architecture is directly designed. We therefore introduce a block-aware transformer-based architecture tailored to interaction maps: BlockFormer. Combined with a suitable training strategy, our architecture adapts to a wide range of interaction maps. We provide comparisons of BlockFormer with other architectures, as well as ablation studies of per-block modeling and the training strategy, in Appendix [E](https://arxiv.org/html/2605.21617#A5).
### 3.1 A block-aware transformer-based architecture
Inferring the full parameter $\theta$ from an interaction map $C$ can be challenging because the dimensionality of $\theta$ varies with the number of entities. Assuming $I$ entities, we therefore decompose the inference problem into $I$ subproblems, where the goal is to infer the $i^{\text{th}}$ component $\theta_i$ from a corresponding sub-map $C_i$ that describes interactions of entity $i$ with the others (see Appendix [F.2](https://arxiv.org/html/2605.21617#A6.SS2) for a justification of this decomposition). This reformulation reduces the task to learning a shared architecture, BlockFormer, that maps $C_i$ to $\theta_i$. The same model can then be applied independently across entities, enabling parameter inference for arbitrary numbers of entities (see Figure [2](https://arxiv.org/html/2605.21617#S3.F2) for architecture details).
Per-block patching. The input interaction map $C_i$ is first cut into square patches of size $(P,P)$. The partitioning process must respect the block-wise structure of the map: to avoid patches overlapping multiple blocks, we first apply an asymmetric zero-padding on the right and bottom of each block of $C_i$ so that its dimensions become multiples of the patch size. After padding, the map is of size $(H_p, W_p)$, resulting in $N = H_p W_p / P^2$ patches. We define a constant latent vector size $D$ used across all transformer layers and referred to as the embedding dimension. All the patches are then projected to $D$ dimensions, resulting in patch embeddings of size $(N, D)$. We prepend a learnable embedding to this sequence (a class token of size $(1, D)$) that will encode the estimate of the parameter $\theta_i$.
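The padding and patching steps above can be illustrated with a minimal NumPy sketch. It assumes that $C_i$ is given as a list of 2D trans-blocks sharing the same number of rows; the function names (`pad_block`, `patchify_block`, `patchify_map`) are hypothetical and the learned projection to $D$ dimensions is left out.

```python
import numpy as np

def pad_block(block, P):
    """Zero-pad a 2D block on the bottom and right up to multiples of P."""
    h, w = block.shape
    return np.pad(block, ((0, (-h) % P), (0, (-w) % P)))

def patchify_block(block, P):
    """Cut one padded block into flattened P*P patches with (row, col) grid indices."""
    padded = pad_block(block, P)
    nr, nc = padded.shape[0] // P, padded.shape[1] // P
    patches = (padded.reshape(nr, P, nc, P)
                     .transpose(0, 2, 1, 3)
                     .reshape(nr * nc, P * P))
    coords = np.array([(j, k) for j in range(nr) for k in range(nc)])
    return patches, coords

def patchify_map(blocks, P):
    """Patchify every block separately so that no patch straddles two blocks."""
    all_patches, all_pos = [], []
    for i, block in enumerate(blocks):
        patches, coords = patchify_block(block, P)
        all_patches.append(patches)
        # 3D position: (block index, patch row in block, patch column in block)
        all_pos.append(np.column_stack([np.full(len(coords), i), coords]))
    return np.concatenate(all_patches), np.concatenate(all_pos)

rng = np.random.default_rng(0)
blocks = [rng.random((7, 10)), rng.random((7, 5))]  # two toy trans-blocks
patches, positions = patchify_map(blocks, P=4)
```

In this toy case the blocks are padded to $(8, 12)$ and $(8, 8)$, so $H_p = 8$, $W_p = 20$ and $N = 8 \times 20 / 16 = 10$ patches, in agreement with the formula above.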
Per-block positional encoding. To retain the position of each patch in the original map, position embeddings of size $(N+1, D)$ are added to the patch embeddings. To enable the network to handle varying numbers of blocks, and thus varying numbers of interacting entities, this information must be incorporated into the positional encoding. For each patch, we define a 3D position vector $(i,j,k)$, where $i$ is the block index and $(j,k)$ is the relative position of the patch within block $i$. This vector is then projected to $D$ dimensions via a fixed per-coordinate sine-cosine positional encoding. The resulting sequence of embedding vectors (tokens) serves as input to the transformer. The transformer adopts the classical series of $B$ blocks alternating multi-head self-attention, LayerNorm, and MLP layers presented in the Vision Transformer (ViT) [transf_image](https://arxiv.org/html/2605.21617#bib.bib8).
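As an illustration, a fixed per-coordinate sine-cosine encoding of the triplets $(i,j,k)$ could look like the sketch below. The even split of $D$ into three chunks, the frequency base of 10000, and the function names are assumptions made for the example and are not specified in the text; the handling of the class token position is also left out.

```python
import numpy as np

def sincos_1d(values, dim, base=10000.0):
    """Fixed sine-cosine encoding of integer positions into `dim` features."""
    assert dim % 2 == 0
    half = dim // 2
    freqs = 1.0 / base ** (np.arange(half) / half)  # assumed frequency schedule
    angles = np.asarray(values, dtype=float)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def positional_encoding_3d(positions, D):
    """Encode (block, row, col) triplets; each coordinate gets D // 3 features."""
    assert D % 6 == 0, "D must split into three even-sized chunks"
    d = D // 3
    return np.concatenate(
        [sincos_1d(positions[:, c], d) for c in range(3)], axis=1
    )

# Three patches: two in block 0, one in block 1 at the same in-block position.
positions = np.array([[0, 0, 0], [0, 1, 2], [1, 0, 0]])
pe = positional_encoding_3d(positions, D=24)
```

With this layout, two patches at the same relative position in different blocks share the last $2D/3$ features and differ only in the block-index chunk, which is one way for the network to align per-block patterns across blocks.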
Class token projection. The target parameter $\theta_i$ often represents a relative position within each entity and is therefore a non-negative real scalar. Consequently, at the final layer output, the class token is extracted and projected to a real scalar using a linear layer followed by a sigmoid activation, ensuring an output between $0$ and $1$. This normalized estimate is then rescaled to a real entity position to recover the actual parameter $\theta_i$.
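A minimal sketch of this output head is given below. It assumes that the rescaling multiplies the normalized estimate by the entity length in base pairs; the weights, the length value and the function name are placeholders chosen for illustration.

```python
import numpy as np

def class_token_head(cls_token, w, b, entity_length_bp):
    """Linear layer + sigmoid on the class token, then rescaling to base pairs."""
    z = cls_token @ w + b                   # linear projection to a scalar
    theta_norm = 1.0 / (1.0 + np.exp(-z))   # sigmoid, strictly inside (0, 1)
    # Assumption: the real position is the normalized one times the entity length.
    return theta_norm, theta_norm * entity_length_bp

D = 24
theta_norm, theta_bp = class_token_head(
    np.zeros(D), np.zeros(D), 0.0, entity_length_bp=230_000
)
```

With zero weights the sigmoid returns $0.5$, so the toy call places the estimate at the middle of the entity.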
### 3.2 Training strategy for flexibility to various sizes and numbers of blocks
For computational efficiency, we propose training the model on small interaction maps while allowing it to generalize to larger ones, which have more blocks (i.e. varying amounts of parameter information) or larger blocks (i.e. different map resolutions or entity sizes). Specifically, we rely on simulated training data consisting of map/parameter pairs $(C_i, \theta_i)$, indexed by their corresponding entity $i$. To enable the transformer to generalize to various numbers and sizes of blocks, we ensure that training maps are as broadly representative as possible by choosing, per batch, a number of blocks, the size of each block, and the entity to test. Each map is normalized between $0$ and $1$ to account for differences in value scales, ensuring more consistent comparisons across maps. Since BlockFormer ($\mathcal{BF}_{\phi}$) acts as a projection of $C_i$ onto $\theta_i$, the natural training loss is the squared-error regression loss:
$$\mathcal{L}(\phi)=\frac{1}{M}\sum_{1\leq m\leq M}\left\|\mathcal{BF}_{\phi}(C_{i}^{m})-\tilde{\theta}_{i}^{m}\right\|^{2}_{2},\quad\text{where }\tilde{\theta}_{i}\text{ is any normalized parameter.}\tag{1}$$
## 4 Genomic application: inferring DNA-fragment locations from contact maps
We validate BlockFormer's performance through a biological application where entities are chromosomes interacting within the cell nucleus. The resulting interaction map $C$ is a contact map.
Generic contact map structure. The genome-wide contact map $C$ projects the information contained in a population of 3D chromatin foldings into a 2D square and symmetric matrix made of cis- (or intra-chromosomal) and trans- (or inter-chromosomal) blocks of interactions between pairs of chromosomes. To construct it, we cut each chromosome into genomic windows of a given length (called the resolution, e.g. $32$ kilobases (kb)); each matrix entry is called a contact count and represents the number of times a given window was in contact with another one over the population (see Appendix [A.1](https://arxiv.org/html/2605.21617#A1.SS1) and Figure [6](https://arxiv.org/html/2605.21617#A1.F6)). In our setting, a cis-block mainly contains a diagonal of enrichment (i.e. a region of high contact frequency), whereas a trans-block between chromosomes $i$ and $j$ shows an enrichment of interactions at the location of both centromeres $(\theta_i, \theta_j)$. Thus, the information about centromeres lies mainly in the trans-blocks. To infer the centromere $\theta_i$, only the $i^{\text{th}}$ row of trans-blocks of $C$ (denoted $C_i$) is considered, reflecting the interactions of chromosome $i$ with all the others. As the number of chromosomes can vary across species, from now on $C_i$ denotes any sequence of multiple trans-blocks.
Hi-C map specificity. During inference, we use a reference Hi-C map $C_{\text{ref}}$ and simulate synthetic contact maps $C$. Hi-C contact maps have many biases due to sequencing and mapping errors or to the inherent structure of the chromatin [hicnorm](https://arxiv.org/html/2605.21617#bib.bib18). Therefore, $C_{\text{ref}}$ is actually a normalized Hi-C map, where the normalization corrects those biases by iteratively forcing all rows and columns to sum to one [hicnorm](https://arxiv.org/html/2605.21617#bib.bib18) (see Appendix [A.2](https://arxiv.org/html/2605.21617#A1.SS2)). Contact map quality depends on both resolution and sequencing depth. Greater sequencing depth increases the number of detected chromatin contacts, improving the signal-to-noise ratio and enabling analysis at finer resolutions. The resolution affects the precision of the parameter inference: each pixel represents a DNA fragment whose length equals the resolution, and centromere positions appear as brighter pixels in each trans-block of the map. Very often, the resolution (e.g. $40$ kb) is much larger than the centromere length (e.g. $100$ bp). Achieving base-pair precision is therefore challenging, and a reasonable goal is to estimate the centromere with sub-resolution precision.
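The idea of iteratively forcing rows and columns to sum to one can be sketched as a symmetric matrix balancing loop. This is only an illustration under the assumption of a dense symmetric non-negative matrix; it is not the exact procedure of [hicnorm](https://arxiv.org/html/2605.21617#bib.bib18) or Appendix A.2, and `iterative_balance` is a hypothetical name.

```python
import numpy as np

def iterative_balance(C, n_iter=200, tol=1e-12):
    """Rescale a symmetric non-negative matrix until every row sums to one."""
    B = np.array(C, dtype=float)
    for _ in range(n_iter):
        s = B.sum(axis=1)
        s[s == 0] = 1.0                      # leave empty rows/columns untouched
        if np.max(np.abs(s - 1.0)) < tol:
            break
        scale = np.sqrt(s)
        B /= np.outer(scale, scale)          # symmetric update keeps B symmetric
    return B

rng = np.random.default_rng(0)
A = rng.random((6, 6))
A = A + A.T                                   # toy symmetric "contact map"
B = iterative_balance(A)
```

Because the matrix stays symmetric, equal row sums imply equal column sums, so a single scaling vector per iteration is enough.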
Simulator. We exploit the structure of yeast contact maps to design a very efficient simulator that directly creates the upper trans-blocks given the centromere positions $\theta$. The simulated maps are simplified compared to real biological ones but still capture the minimal structure (spot-like interactions) required for inference (see Appendix [C.2](https://arxiv.org/html/2605.21617#A3.SS2) for a misspecification analysis). At the centromere positions, the chromatin has a brush-like organization: chromosomal regions near the centromeres often come into contact over the population, whereas contacts become rarer the further we move away from the centromeres. To mimic this effect, we simulate a Gaussian spot at the position $(\theta_i, \theta_j)$ for each trans-contact block. The depletion of contacts between centromeres and other loci is then simulated via a cross of non-interaction passing through $(\theta_i, \theta_j)$. Between chromosomes, we also observe rare interactions over the population, which we reproduce by adding Gaussian noise to all the trans-blocks, up to $10\%$ of the maximal contact count (see Appendix [C.1](https://arxiv.org/html/2605.21617#A3.SS1)).
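A toy version of such a trans-block simulator is sketched below. The way the spot, the cross and the noise are combined, the spot width `sigma`, and the function name are all assumptions made for illustration; the actual simulator is specified in Appendix C.1.

```python
import numpy as np

def simulate_trans_block(l_i, l_j, theta_i, theta_j, sigma=2.0,
                         noise_level=0.1, rng=None):
    """Toy trans-block: Gaussian spot at (theta_i, theta_j), depleted cross, noise.

    Positions are expressed in bins (pixels), not in base pairs.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows = np.arange(l_i)[:, None]
    cols = np.arange(l_j)[None, :]
    spot = np.exp(-((rows - theta_i) ** 2 + (cols - theta_j) ** 2)
                  / (2 * sigma ** 2))
    # Background: non-negative Gaussian noise capped at noise_level * spot maximum.
    noise = np.abs(rng.normal(size=(l_i, l_j)))
    noise *= noise_level * spot.max() / noise.max()
    # Cross of non-interaction: no background on the centromere row and column.
    noise[int(round(theta_i)), :] = 0.0
    noise[:, int(round(theta_j))] = 0.0
    block = spot + noise
    # Normalize to [0, 1], as done for every map before training.
    return (block - block.min()) / (block.max() - block.min())

block = simulate_trans_block(20, 30, theta_i=7, theta_j=12,
                             rng=np.random.default_rng(0))
```

Each call costs a few array operations, which is what makes generating tens of thousands of training maps cheap compared with a molecular dynamics simulation.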
Training procedure. We aim to train BlockFormer on small contact maps so that it can generalize to larger ones while limiting the training data budget. Accordingly, we adopt a small transformer with $B=4$ blocks, $4$ attention heads, a patch size $P=4$ (smaller than the typical size of interaction blocks, which can be $8$), and an embedding dimension of $D=24$ (a multiple of the patch size). We design a training set of $50\,000$ examples, grouped into batches of size $200$. Each synthetic map is a sequence of trans-blocks at resolution $32$ kb where the number and size of blocks vary. To avoid overfitting to any fixed simulator parameter, each simulated trans-block has a Gaussian spot that varies in size and location. Training data generation details are provided in Appendix [D](https://arxiv.org/html/2605.21617#A4). The split between training and validation sets is $90\%$-$10\%$ and the learning rate is fixed to $5\times 10^{-4}$. The transformer was trained for over $200$ epochs on an NVIDIA TITAN X (Pascal) GPU for $15$ h, and the model with the lowest validation loss over those epochs was retained. We did not find any accuracy improvement when using a scheduler. The efficiency of this strategy is shown in Appendices [E.3.1](https://arxiv.org/html/2605.21617#A5.SS3.SSS1) and [E.3.2](https://arxiv.org/html/2605.21617#A5.SS3.SSS2).
## 5 Results on real-world contact maps
We evaluate the performance of BlockFormer on two real-world tasks. First, we assess its accuracy on the centromere prediction task, demonstrating its generalization to a wide range of species. We also provide a thorough analysis on the reference organism S. cerevisiae. We further show that our model extends to other tasks characterized by spot-like interaction patterns, such as loop localization. In all cases, the parameter estimate $\hat{\theta}$ is computed as in Appendix [J.4.1](https://arxiv.org/html/2605.21617#A10.SS4.SSS1).
##### Centromere identification across diverse genome sizes and heterogeneous Hi\-C maps\.
We estimate centromere positions across species where the ground truth is known. Our analysis includes datasets from seven yeast species with varying numbers of chromosomes, the parasite P. falciparum at three distinct lifecycle stages, and the plant A. thaliana (see Tables [1](https://arxiv.org/html/2605.21617#S5.T1), [2](https://arxiv.org/html/2605.21617#S5.T2) and Appendix [J.4.2](https://arxiv.org/html/2605.21617#A10.SS4.SSS2) for results). The results are obtained by averaging the predictions of the model over several sub-samples of $2$ trans-blocks for fast runtime (using a larger number of blocks did not substantially affect the performance; see Table [3](https://arxiv.org/html/2605.21617#S5.T3)). Full species names and genome details are given in Appendix [J.4.2](https://arxiv.org/html/2605.21617#A10.SS4.SSS2).
Table 1: Comparison of normalized error (the smaller the better) and time across species.

| Stage | Method | S.C. Err | S.C. Time | L.K. Err | L.K. Time | L.T. Err | L.T. Time | S.M. Err | S.M. Time |
|---|---|---|---|---|---|---|---|---|---|
| Init. | BlockFormer | 0.30 | 0.30 | 1.22 | 0.28 | 1.25 | 0.23 | 0.58 | 0.30 |
| Init. | Centurion | 0.58 | 2.04 | 0.48 | 0.98 | 1.35 | 1.08 | 0.60 | 3.78 |
| Fitting | BlockFormer | 0.08 | 3.24 | 0.17 | 3.31 | 0.28 | 2.48 | 0.18 | 6.38 |
| Fitting | Centurion | 0.08 | 10.8 | 0.17 | 3.48 | 0.28 | 8.96 | 0.18 | 15.09 |

Table 2: Species with out-of-training chromosome sizes, too noisy maps or high resolution: we apply the refinement step as in Appendix [C.3.1](https://arxiv.org/html/2605.21617#A3.SS3.SSS1). P.F.r., P.F.s. and P.F.t. stand for the $3$ stages (rings, schizonts and trophozoites) of the parasite. Comparison of normalized error (the smaller the better) and time.

| Stage | Method | S.K. Err | S.K. Time | S.C. Err | S.C. Time | K.L. Err | K.L. Time | S.P. Err | S.P. Time | A.T. Err | A.T. Time | P.F.r. Err | P.F.r. Time | P.F.s. Err | P.F.s. Time | P.F.t. Err | P.F.t. Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Init. | BlockFormer | 2.14 | 0.69 | 1.75 | 0.33 | 1.60 | 0.11 | 4.30 | 0.18 | 3.70 | 0.38 | 2.77 | 0.83 | 1.25 | 0.73 | 2.24 | 0.84 |
| Init. | Centurion | 0.53 | 3.11 | 8.12 | 1.15 | 0.71 | 0.92 | 26.10 | 0.05 | 173.02 | 78.19 | 1.56 | 10.57 | 0.97 | 11.1 | 1.08 | 10.91 |
| Fitting | BlockFormer | 0.48 | 620.99 | 0.20 | 30.43 | 0.18 | 1.08 | 0.94 | 1.15 | 4.07 | 65.2 | 0.18 | 25.3 | 0.27 | 24.6 | 0.28 | 23.1 |
| Fitting | Centurion | 0.47 | 612.5 | 0.20 | 132.3 | 0.18 | 3.30 | 11.62 | 7.79 | 36.3 | 20446.6 | 0.18 | 1001.2 | 0.27 | 185.5 | 0.28 | 51.24 |
Across Table [1](https://arxiv.org/html/2605.21617#S5.T1), the network consistently achieves near-resolution accuracy (e.g. $0.58$ for S.M. (16 chr.), $1.22$ for L.K. (8 chr.)), showing robustness to varying block sizes and numbers. Across Tables [1](https://arxiv.org/html/2605.21617#S5.T1) and [2](https://arxiv.org/html/2605.21617#S5.T2), BlockFormer is consistently faster than Centurion in both initialization and fitting (e.g. $0.27$ error in $24.6$ s versus $185.5$ s for P.F.s.). Unlike Centurion, which must be rerun for each species, our model is amortized: trained once for $15$ h on GPU, it requires only forward passes for inference. For species with out-of-training chromosome sizes, Centurion is slow and often less accurate (e.g. for A.T., $36.3$ error in $\approx 5.7$ h versus $4.07$ in $65.2$ s). Overall, even when accounting for the training cost, our model becomes more cost-efficient after processing three such species.
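The break-even claim can be checked with a short calculation using only the figures quoted above for A.T. (15 h of training, about 5.7 h per Centurion fit, 65.2 s per BlockFormer fit). The helper name `break_even` is hypothetical, and the sketch ignores initialization times and hardware differences between GPU training and CPU inference.

```python
import math

TRAIN_S = 15 * 3600        # one-off BlockFormer training cost (15 h)
CENTURION_S = 5.7 * 3600   # approximate Centurion fitting time for A.T.
BLOCKFORMER_S = 65.2       # BlockFormer fitting time for A.T.

def break_even(train_s, baseline_s, amortized_s):
    """Smallest n such that train_s + n * amortized_s < n * baseline_s."""
    return math.floor(train_s / (baseline_s - amortized_s)) + 1

n_species = break_even(TRAIN_S, CENTURION_S, BLOCKFORMER_S)
print(n_species)  # 3
```

The ratio $54000 / (20520 - 65.2) \approx 2.64$ means that the amortized approach is cheaper in total from the third A.T.-like species onwards.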
##### Focus on a reference case: Saccharomyces cerevisiae (S.C.).
We analyze the performance of our model on a reference case: the yeast S. cerevisiae. This organism has known centromere positions, and deep sequencing produced a low-noise genome-wide contact matrix at resolution $32$ kb.
Point estimate. We compute a point estimate of each centromere using BlockFormer. The parameter can be further refined using the fitting procedure from [nelle](https://arxiv.org/html/2605.21617#bib.bib32), which enforces both horizontal and vertical alignment between centromere positions. Table [3](https://arxiv.org/html/2605.21617#S5.T3) reports the mean absolute error over the chromosomes, normalized by the resolution, as well as the runtime on a CPU, for various block numbers $k$, and compares them with Centurion.
Table 3: Comparison for S. cerevisiae at $32$ kb resolution. We report the mean absolute error divided by the resolution (the smaller the better). The number of blocks is indicated in brackets.

Pre-localization

| Method | Norm. error | Time (s) |
|---|---|---|
| BlockFormer (1) | 0.35 | 0.85 |
| BlockFormer (3) | 0.34 | 0.81 |
| BlockFormer (5) | 0.48 | 1.19 |
| BlockFormer (10) | 0.49 | 1.87 |
| BlockFormer (15) | 0.62 | 2.58 |
| Initialization Centurion | 0.58 | 4.30 |

Full approach

| Method | Norm. error | Time (s) |
|---|---|---|
| BlockFormer (1) + fitting | 0.08 | 2.64 |
| BlockFormer (3) + fitting | 0.08 | 2.71 |
| BlockFormer (5) + fitting | 0.08 | 2.67 |
| BlockFormer (10) + fitting | 0.08 | 2.89 |
| BlockFormer (15) + fitting | 0.08 | 3.67 |
| Centurion | 0.08 | 12.37 |
Overall, the number of chosen blocks has little impact on performance, as errors remain below the resolution. Moreover, the network outputs a more accurate initial candidate ($0.34$ for $3$ blocks versus $0.58$ for Centurion) in significantly less time. Applying the fitting step further improves accuracy across all methods, yielding highly precise estimates ($0.08$ error or approximately $2$ kb), well below the data resolution ($32$ kb), while achieving a substantial speedup with BlockFormer compared to Centurion ($2.64$ s versus $12.37$ s).
Figure 3: (a) Inference using ABC-Pearson (see Appendix [G.1](https://arxiv.org/html/2605.21617#A7.SS1)), ABC-CNN, ABC-Transf (see Appendix [G.2](https://arxiv.org/html/2605.21617#A7.SS2)), NPE-CNN, and NPE-Transf (see Appendix [H](https://arxiv.org/html/2605.21617#A8)). Color shades increase from lightest to darkest across rounds. Densities are estimated with the $5\%$ best $\theta$ according to the ABC criterion or sampled from the flow. In some dimensions, only the BlockFormer-based approaches ABC-Transf and NPE-Transf are accurate (e.g. the densities for the centromeres of chromosomes $1$ and $12$), while the CNN-based methods lead to biased densities. (b) Mean absolute error between $\theta$ and $\theta_{\text{ref}}$, computed over the $5\%$ best-performing samples. The horizontal gray dashed line stands for the resolution of the contact map $C_{\text{ref}}$ (in bp).

Posterior estimation. BlockFormer and Centurion output only a point estimate of each centromere, whereas a centromere is actually a whole chromosome segment. To quantify the uncertainty of parameter estimates, we instead target the posterior density $p(\theta \mid C_{\text{ref}})$ using a Bayesian approach. To reduce the dimensionality of the input map, the pre-trained BlockFormer model serves as a summary statistic within the inference approaches. Since our model infers $\theta_i$ from $C_i$, we actually target each marginal in parallel. The inference framework is detailed in Appendix [F.1](https://arxiv.org/html/2605.21617#A6.SS1). Our choice of summary statistic is motivated by the fact that $\mathcal{BF}_{\phi}(C_i)$ trained with Equation [1](https://arxiv.org/html/2605.21617#S3.E1) approximates the conditional expectation $\mathbb{E}\left[\theta_i \mid C_i\right]$, preserving first-order information.
We consider two simulation-based inference approaches: approximate Bayesian computation (referred to as ABC-Transf, see Appendix [G.2](https://arxiv.org/html/2605.21617#A7.SS2)) and neural posterior estimation (referred to as NPE-Transf, see Appendix [H](https://arxiv.org/html/2605.21617#A8)). Figure [3](https://arxiv.org/html/2605.21617#S5.F3)(a) shows the marginal densities for two components of $\theta$. BlockFormer-based methods produce sharper and better-calibrated posteriors, while CNN-based approaches exhibit bias. The benchmark of metrics in Figure [3](https://arxiv.org/html/2605.21617#S5.F3)(b) and Figure [12](https://arxiv.org/html/2605.21617#A9.F12) in Appendix [I](https://arxiv.org/html/2605.21617#A9), as well as calibration diagnostics, confirm this trend. For instance, the Wasserstein-2 distance between inferred posteriors and the posterior ground truth is lowest for BlockFormer-based approaches, indicating an error of twice the resolution between samples and ground truth.
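To make the role of the summary statistic concrete, the sketch below shows a single round of rejection ABC that keeps the $5\%$ of simulated parameters whose summaries are closest to the observed one. It is a toy illustration only: the uniform prior, the one-dimensional simulator and the identity summary (standing in for the pre-trained BlockFormer) are placeholders, and the multi-round procedure of Appendix G.2 is not reproduced.

```python
import numpy as np

def rejection_abc(summary_obs, prior_sampler, simulator, summary_fn,
                  n_sim=2000, keep=0.05, rng=None):
    """Keep the fraction `keep` of prior draws with the closest summaries."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = np.array([prior_sampler(rng) for _ in range(n_sim)])
    summaries = np.array([summary_fn(simulator(t, rng)) for t in thetas])
    dist = np.abs(summaries - summary_obs)
    n_keep = max(1, int(round(keep * n_sim)))
    return thetas[np.argsort(dist)[:n_keep]]

# Toy stand-ins (assumptions): normalized position prior and a noisy observation.
prior = lambda rng: rng.uniform(0.0, 1.0)
simulate = lambda theta, rng: theta + rng.normal(scale=0.02)
summary = lambda x: x  # in the paper's setting this would be the trained network

posterior_samples = rejection_abc(0.3, prior, simulate, summary,
                                  rng=np.random.default_rng(0))
```

Because the summary is a scalar per entity, the distance is a simple absolute difference and each marginal can be processed independently, mirroring the per-entity decomposition used by the model.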
##### Loop localization\.
BlockFormer is applicable to general inference using spot-like patterns in interaction maps. Among real applications, loop localization is of particular interest. Loops often connect two functional elements through the DNA-binding regulatory protein CTCF [loop_protein](https://arxiv.org/html/2605.21617#bib.bib27). In Hi-C maps from eukaryotes, they appear as multiple bright and isolated dots away from the main diagonal in cis-blocks, their total number and positions being unknown. We evaluate our approach using Hi-C data from the human cell line IMR90 at resolution $5$ kb. However, because BlockFormer is designed to produce a single parameter estimate per entity using multiple trans-blocks as input, the standard multi-loop detection framework is not directly compatible. To ensure consistency with the training setup, we restrict the analysis to chromosomal regions containing a single loop and report results in Appendix [J.4.3](https://arxiv.org/html/2605.21617#A10.SS4.SSS3). In particular, we achieve better performance than Centurion, which is not well adapted to inference from a single block.
## 6Ablations and synthetic experiments
##### Ablations of block\-aware modeling\.
One of our main contributions is the design of a block\-aware architecture consisting of per\-block padding and per\-block 3D positional encoding, which preserves the block\-wise structure of the map and cannot be naturally included in CNN\-based architectures\. We consider several positional encoding strategies, including no positional encoding and a few 2D encoding alternatives\. All the transformers are trained in identical conditions on 1 to 9 blocks \(see Appendix [D](https://arxiv.org/html/2605.21617#A4)\)\. To assess the impact of block\-aware modeling, we consider synthetic maps that are sequences of 1 to 14 trans\-blocks \(see details in Appendix [E\.2](https://arxiv.org/html/2605.21617#A5.SS2)\)\. According to Tables [4](https://arxiv.org/html/2605.21617#S6.T4) and [9](https://arxiv.org/html/2605.21617#A5.T9), the per\-block 3D positional encoding combined with per\-block padding \(3D pos\. per block\) is the best method overall: it most consistently has the lowest or near\-lowest mean error, with competitive or best median performance, and is in general more stable \(lower std in most settings\)\.
Table 4: Normalized absolute error comparison across methods for the low and extreme regimes \(1 and 14 blocks\)\. 3D pos\. per block outperforms the other methods and is the most stable across the number of blocks\.

| Method | 1 block: Mean ± Std | 1 block: Median | 1 block: 95% CI | 14 blocks: Mean ± Std | 14 blocks: Median | 14 blocks: 95% CI |
| --- | --- | --- | --- | --- | --- | --- |
| 3D pos\. per block | 0\.43 ± 0\.32 | 0\.39 | \[0\.02, 1\.22\] | 0\.36 ± 0\.29 | 0\.29 | \[0\.01, 0\.93\] |
| 2D pos\. per block | 0\.73 ± 2\.21 | 0\.40 | \[0\.01, 1\.99\] | 0\.38 ± 0\.29 | 0\.38 | \[0\.02, 1\.17\] |
| 2D pos\. | 0\.65 ± 2\.01 | 0\.39 | \[0\.04, 1\.19\] | 0\.55 ± 0\.46 | 0\.43 | \[0\.02, 1\.82\] |
| 2D pos\. pad\. | 0\.64 ± 2\.41 | 0\.33 | \[0\.02, 1\.20\] | 0\.53 ± 0\.45 | 0\.41 | \[0\.02, 1\.66\] |
| 1D pos\. | 1\.04 ± 0\.92 | 0\.73 | \[0\.04, 3\.41\] | 2\.21 ± 5\.09 | 0\.59 | \[0\.03, 19\.61\] |
| no pos\. | 4\.16 ± 4\.75 | 2\.11 | \[0\.17, 16\.83\] | 3\.49 ± 4\.82 | 1\.06 | \[0\.03, 17\.39\] |

We now evaluate the robustness of BlockFormer, trained on contact maps at 32 kb resolution, to varying numbers and sizes of blocks, different sequencing depths, and diverse spot\-like patterns\.
Robustness to various sizes of blocks\. The size of each block in a Hi\-C map depends on the species or the selected resolution\. For instance, in S\.C\., the largest chromosome is 1\.5 Mbp long, while it is nearly 30 times larger in the plant A\.T\. At 40 kb resolution, this translates respectively to blocks of height 38 and 760\. When the resolution is doubled \(20 kb\), the heights are also doubled\. We show in Figure [14](https://arxiv.org/html/2605.21617#A10.F14) that BlockFormer generalizes to various sizes of blocks: most of the errors remain below the resolution, regardless of the number of blocks\. However, high dispersion occurs in some cases \(e\.g\., 1, 7 blocks\), with estimate accuracy ranging from 100 bp to 0\.5 Mbp\. We then show in Figure [4](https://arxiv.org/html/2605.21617#S6.F4) that BlockFormer generalizes to different resolutions\. For this, we simulate synthetic maps from the S\.C\. genome and downsample the maps to resolutions from 20 kb to 70 kb \(see Appendix [C\.3\.2](https://arxiv.org/html/2605.21617#A3.SS3.SSS2) for the downsampling procedure\)\. BlockFormer has issues for some chromosomes at the high resolution of 20 kb because block sizes can be far from the training distribution\.
Robustness to various numbers of blocks\. The number of chromosomes varies across species: e\.g\., S\.C\. has 16 chromosomes, whereas A\.T\. only has 5\. Since a trans\-block in a Hi\-C map represents interactions between 2 chromosomes, the total number of blocks differs across species\. We show in Figure [4](https://arxiv.org/html/2605.21617#S6.F4) that BlockFormer generalizes to varying numbers of blocks\. The parameter estimate $\hat{\theta}_{i}$ is constructed as in Appendix [J\.4\.1](https://arxiv.org/html/2605.21617#A10.SS4.SSS1)\.
Figure 4: Absolute error per centromere over 100 synthetic maps generated from the S\.C\. genome\. For each number of blocks $k$ and each chromosome $i$, we report the absolute error $\mathrm{err}_{i}^{k}$ \(see details in [J\.2](https://arxiv.org/html/2605.21617#A10.SS2)\)\. Target chromosomes $i$ on the x\-axis are sorted by length \(bp\)\. Color shades range from blue to red as the number of blocks $k$ increases from 1 to 15\. Across resolutions, the centromeres are estimated with an error below the resolution\. Neither the size of the chromosome nor the number of blocks deteriorates BlockFormer’s accuracy\.
Neither the size of the chromosome nor the number of blocks seems to impact accuracy: most of the estimation errors are below the resolution, and there are no large performance changes when the number of blocks increases\.
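As a rough check on the block sizes and block counts discussed above, the following Python sketch computes the height of a block from a chromosome length and a resolution, and the number of trans\-blocks for a given number of chromosomes\. Rounding up to a whole bin is an assumption, the 30\.4 Mbp length used in the comments is back\-computed from the reported height of 760, and the helper names are hypothetical\.

```python
import math


def block_height(chrom_length_bp: int, resolution_bp: int) -> int:
    """Number of bins covering one chromosome (rounding up is an assumption)."""
    return math.ceil(chrom_length_bp / resolution_bp)


def trans_blocks_per_chromosome(n_chromosomes: int) -> int:
    """Trans-blocks in one row of blocks: one per other chromosome."""
    return n_chromosomes - 1


def upper_trans_blocks(n_chromosomes: int) -> int:
    """Distinct chromosome pairs, i.e. trans-blocks above the diagonal."""
    return n_chromosomes * (n_chromosomes - 1) // 2


# Largest S.C. chromosome (1.5 Mbp) and an assumed 30.4 Mbp chromosome at 40 kb.
sc_height = block_height(1_500_000, 40_000)
at_height = block_height(30_400_000, 40_000)
```

Under these assumptions, a 1\.5 Mbp chromosome at 40 kb spans 38 bins, and the 16 chromosomes of S\.C\. give 15 trans\-blocks per target chromosome, which matches the range of $k$ in Figure 4\.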
Robustness to various sequencing depths\. The quality of a Hi\-C map is directly influenced by sequencing depth: a higher sequencing depth yields higher\-resolution maps and reduces noise, but at an increased cost\. For example, the S\.C\. map at 30 kb is remarkably clean in contrast to the S\.K\. map\. We show in Appendix [J\.2](https://arxiv.org/html/2605.21617#A10.SS2) and Figure [15](https://arxiv.org/html/2605.21617#A10.F15) that BlockFormer generalizes well to different sequencing depths\. We provide results on both synthetic maps and the reference map of S\.C\., keeping between 10% and 50% of the sequencing depth\. Most of the errors remain below the resolution in both settings\. The lowest sequencing depth \(10%\) is more difficult since the maps become very noisy\.
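The text above does not specify how reads are discarded to reach a given fraction of the sequencing depth\. One common option, shown here purely as an assumption, is binomial thinning of the raw integer counts: each read is kept independently with probability `keep_fraction`\. The function name and its arguments are hypothetical\.

```python
import numpy as np


def downsample_counts(counts, keep_fraction, rng=None):
    """Binomial thinning of a symmetric integer contact matrix (illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts)
    # Thin the upper triangle (diagonal included) so each contact is drawn once.
    upper = np.triu(counts).astype(np.int64)
    thinned = rng.binomial(upper, keep_fraction)
    # Mirror the strict upper triangle to restore symmetry.
    return thinned + np.triu(thinned, k=1).T
```

Thinning is meant to be applied to raw counts, before any normalization, since it requires integer inputs\.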
Robustness to diverse spot pattern structures\. We evaluate the robustness of BlockFormer on contact maps with spots of varying shapes, comparing it to Centurion\. The estimate $\hat{\theta}$ is constructed as in Appendix [J\.4\.1](https://arxiv.org/html/2605.21617#A10.SS4.SSS1)\.
Spots corrupted by noise\. The basic simulated map presents only one visible spot per block, but in real data, because other genetic elements cluster together, the map can present multiple spots per interaction block, or the spot can be affected by noise\. Those confounders can affect the initialization of Centurion, leading to inaccurate estimation\. To test BlockFormer’s robustness, we simulate maps with additional random bright pixels in each trans\-block \(see Fig\. [23](https://arxiv.org/html/2605.21617#A10.F23) and [24](https://arxiv.org/html/2605.21617#A10.F24) in Appendix [J\.3\.6](https://arxiv.org/html/2605.21617#A10.SS3.SSS6)\) or with an additional Gaussian spot randomly positioned in each trans\-block \(see Fig\. [5](https://arxiv.org/html/2605.21617#S6.F5)a and Fig\. [22](https://arxiv.org/html/2605.21617#A10.F22) in Appendix [J\.3\.5](https://arxiv.org/html/2605.21617#A10.SS3.SSS5)\)\. In both settings, BlockFormer is more accurate and faster than Centurion\.
Square spots\. If large regions of DNA interact, leading to the aggregation of multiple contacts \(e\.g\. centromere clustering in Drosophila, but also enhancer–promoter contacts close together\), the spots can be square\. For such non\-Gaussian spots, the optimization process of Centurion, which is based on Gaussian fitting, can be challenging\. To investigate this situation, we generate maps with square spots \(see Fig\. [5](https://arxiv.org/html/2605.21617#S6.F5)b and Fig\. [17](https://arxiv.org/html/2605.21617#A10.F17) in Appendix [J\.3\.2](https://arxiv.org/html/2605.21617#A10.SS3.SSS2)\)\. Centurion is less accurate and slower than BlockFormer\. Moreover, our model estimates the parameter with sub\-resolution precision for any number of blocks\.
Other tests on different spot patterns inspired by real\-world data, such as Gaussian, ring, or elliptical spots, are provided in Appendix [J\.3\.1](https://arxiv.org/html/2605.21617#A10.SS3.SSS1), [J\.3\.4](https://arxiv.org/html/2605.21617#A10.SS3.SSS4), and [J\.3\.3](https://arxiv.org/html/2605.21617#A10.SS3.SSS3)\. In nearly all settings, BlockFormer achieves sub\-resolution precision with faster runtimes than Centurion\.
Figure 5: Mean absolute error \(in bp, 1st and 3rd panels\) and runtime \(in seconds, 2nd and 4th panels\) over 100 synthetic maps generated from the S\.C\. genome at resolution 30 kb\. a: per trans\-block, one major Gaussian spot and one auxiliary Gaussian spot, smaller and less bright\. b: one square spot in each trans\-block\. The black dotted line stands for the resolution\.
## 7 Conclusion
We presented BlockFormer, a novel architecture to infer per\-entity parameters given an interaction map with localized patterns\. We designed a block\-aware transformer\-based model able to handle various sizes and numbers of interaction blocks, making inference flexible across many sets of entities\. The network enables accurate, amortized point estimation of the parameter in many synthetic and real\-world scenarios with various spot patterns\. It can also serve as an informative summary statistic within a Bayesian framework, enabling a principled quantification of uncertainty in parameter estimates\. We evaluate our approach in a biological setting where entities are chromosomes, interaction maps are Hi\-C contact maps, and parameters are centromere positions\. On this task, our method is robust as it does not rely on any initialization or pre\-localization: it uses an uninformative prior, randomly placing each centromere within the range of its chromosome\. Despite its generality to various spot patterns, BlockFormer matches or outperforms the state\-of\-the\-art Centurion in centromere identification\. However, our model always outputs a single parameter per entity, which limits its applicability when the number of parameters is unknown or when some maps contain no parameter information\. In particular, loop localization using BlockFormer relies on pre\-filtered single\-loop regions\. Our entire inference pipeline is based on a large number of simulations: to mitigate computing bottlenecks, we designed a simplified but efficient contact map simulator\. While it introduces a mismatch between real and synthetic data, it still yields very convincing results for inference on real experimental data, requiring minimal preprocessing\. Future work could consider applying BlockFormer to other inference tasks from interaction maps, such as inferring the 3D configuration of the genome, trained accordingly with other biologically\-inspired simulators\.
## References
- \[1\] Justin Alsing, Tom Charnock, Stephen Feeney, and Benjamin Wandelt\. Fast likelihood\-free cosmology with neural density estimators and active learning\. Monthly Notices of the Royal Astronomical Society, 488\(3\):4440–4458, 2019\.
- \[2\] Justin Alsing and Benjamin Wandelt\. Massive optimal data compression and density estimation for scalable, likelihood\-free inference in cosmology\. Monthly Notices of the Royal Astronomical Society, 477\(3\):2874–2885, 2018\.
- \[3\] Jean\-Michel Arbona, Sébastien Herbert, Emmanuelle Fabre, and Christophe Zimmer\. Inferring the physical properties of yeast chromatin through Bayesian analysis of whole nucleus simulations\. In Genome Biology, 2017\.
- \[4\] Žiga Avsec, Vikram Agarwal, Daniel Visentin, Joseph R\. Ledsam, Agnieszka Grabska\-Barwinska, Kyle R\. Taylor, Yannis Assael, John Jumper, Pushmeet Kohli, and David R\. Kelley\. Effective gene expression prediction from sequence by integrating long\-range interactions\. Nature Methods, 2021\.
- \[5\] Kerry S\. Bloom\. Centromeric heterochromatin: the primordial segregation machine\. Annu\. Rev\. Genet\., 48:457–484, 2014\.
- \[6\] Christopher A\. Brackley, James Johnson, Steven Kelly, Peter R\. Cook, and Davide Marenduzzo\. Simulated binding of transcription factors to active and inactive regions folds human chromosomes into loops, rosettes and topological domains\. Nucleic Acids Research, 2016\.
- \[7\] Guillaume Cottarel, James H\. Shero, Philip Hieter, and Johannes H\. Hegemann\. A 125\-base\-pair CEN6 DNA fragment is sufficient for complete meiotic and mitotic centromere functions in Saccharomyces cerevisiae\. Mol Cell Biol, 9\(8\):3342–3349, 1989\.
- \[8\] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al\. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale\. arXiv preprint arXiv:2010\.11929, 2021\.
- \[9\] Kyle P Eagen\. Principles of chromosome architecture revealed by Hi\-C\. In Trends Biochem Sci\., 2018\.
- \[10\] Yu Feng, Man\-Yat Chu, and Uroš Seljak\. Fastpm: a new scheme for fast simulations of dark matter and halos\. arXiv preprint arXiv:1603\.00476, 2016\.
- \[11\] Andrew D\. Fox, Benjamin J\. Hescott, Anselm C\. Blumer, and Donna K\. Slonim\. Connectedness of PPI network neighborhoods identifies regulatory hub proteins\. Bioinformatics, 2011\.
- \[12\] Jian Gao, Wei Zhang, Yuhan Li, Xinyu Wang, Yu Chen, et al\. Hic\-gnn: A generalizable model for 3d chromosome reconstruction using graph convolutional neural networks\. Computational and Structural Biotechnology Journal, 2023\.
- \[13\] Jonathan L\. Gordon, Kevin P\. Byrne, and Kenneth H\. Wolfe\. Mechanisms of Chromosome Number Evolution in Yeast\. PLoS Genet, 7, 2011\.
- \[14\] David S\. Greenberg, Marcel Nonnenmacher, and Jakob H\. Macke\. Automatic posterior transformation for likelihood\-free inference\. In ICML, 2019\.
- \[15\] F\. Tosti Guerra, E\. Poppleton, P\. Šulc, and L\. Rovigatti\. Annamo: Coarse\-grained modelling for folding and assembly of rna and dna system\. Journal of Chemical Physics, 2024\.
- \[16\] ChangHoon Hahn, Michael Eickenberg, Shirley Ho, Jiamin Hou, Pablo Lemos, Elena Massara, Chirag Modi, Azadeh Moradinezhad Dizgah, Bruno Régaldo\-Saint Blancard, and Muntazir M Abidi\. A forward modeling approach to analyzing galaxy clustering with simbig\. Proceedings of the National Academy of Sciences, 120\(42\):e2218810120, 2023\.
- \[17\] Noah Hollmann et al\. Accurate predictions on small data with a tabular foundation model\. Nature, 2025\.
- \[18\] Maxim Imakaev, Geoffrey Fudenberg, Rachel P\. McCord, Natalia Naumova, Anton Goloborodko, Bryan Lajoie, Job Dekker, and Leonid Mirny\. Iterative correction of Hi\-C data reveals hallmarks of chromosome organization\. Nature Methods, 9:999–1003, 2012\.
- \[19\] Bai Jiang, Tung yu Wu, Charles Zheng, and Wing H\. Wong\. Learning summary statistic for approximate Bayesian computation via deep neural network\. In Statistica Sinica, 2018\.
- \[20\] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al\. Highly accurate protein structure prediction with alphafold\. Nature, 2021\.
- \[21\] David R\. Kelley\. Predicting 3d genome folding from dna sequence\. Nature Methods, 2020\.
- \[22\] Philippe Lefrançois, Ghia M\. Euskirchen, Raymond K\. Auerbach, Joel Rozowsky, Theodore Gibson, Christopher M\. Yellman, Mark Gerstein, and Michael Snyder\. Efficient yeast ChIP\-Seq using multiplex short\-read DNA sequencing\. BMC Genomics, 10\(37\), 2009\.
- \[23\] Hervé Marie\-Nelly, Martial Marbouty, Axel Cournac, Gianni Liti, Gilles Fischer, Christophe Zimmer, and Romain Koszul\. Filling annotation gaps in yeast genomes using genome\-wide contact maps\. In Bioinformatics, 2014\.
- \[24\] Cyril Matthey\-Doret, Lyam Baudry, Axel Breuer, Rémi Montagne, Nadège Guiglielmoni, Vittore Scolari, Etienne Jean, et al\. Computer vision for pattern detection in chromosome contact maps\. Nature Communications, 2020\.
- \[25\] Angela Nietzel, Mariano Rocchi, Heike Starke, Anita Heller, Wolfgang Fiedler, Iwona Wlodarska, Ivan Loncarevic, Volkmar Beensen, Uwe Claussen, and Thomas Liehr\. A new multicolor\-FISH approach for the characterization of marker chromosomes: centromere\-specific multicolor\-FISH \(cenM\-FISH\)\. Human Genetics, 2001\.
- \[26\] Jens M\. Olesen, Jordi Bascompte, Yoko L\. Dupont, and Pedro Jordano\. The modularity of pollination networks\. Proceedings of the National Academy of Sciences, 2007\.
- \[27\] Cheng\-Tong Ong and Victor G\. Corces\. Ctcf: an architectural protein bridging genome topology and function\. Nature Reviews Genetics, 2014\.
- \[28\] George Papamakarios and Iain Murray\. Fast $\epsilon$\-free inference of simulation models with Bayesian conditional density estimation\. In NeurIPS, 2016\.
- \[29\] Harianto Tjong, Ke Gong, Lin Chen, and Frank Alber\. Physical tethering and volume exclusion determine higher\-order genome organization in budding yeast\. In Genome Research, 2012\.
- \[30\] Tina Toni, David Welch, Natalja Strelkowa, Andreas Ipsen, and Michael P\.H Stumpf\. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems\. In Journal of The Royal Society Interface, 2008\.
- \[31\] Eloïse Touron, Pedro Luiz Coelho Rodrigues, Julyan Arbel, Nelle Varoquaux, and Michael Arbel\. Simulation\-based inference of yeast centromeres\. In NeurIPS 2025 Workshop on Imageomics: Discovering Biological Knowledge from Images Using AI, 2025\.
- \[32\] Nelle Varoquaux, Ivan Liachko, Ferhat Ay, Joshua N Burton, Jay Shendure, Maitreya J Dunham, Jean\-Philippe Vert, and William S Noble\. Accurate identification of centromere locations in yeast genomes using Hi\-C\. In Nucleic Acids Research, 2015\.
- \[33\] Joachim Wolff, Rolf Backofen, and Björn Grüning\. Loop detection using Hi\-C data with HiCExplorer\. In GigaScience, volume 11, 2022\.
- \[34\] Ke Zhang, Chenxi Wang, Liping Sun, and Jie Zheng\. Prediction of gene co\-expression from chromatin contacts with graph attention network\. Bioinformatics, 2022\.
###### Contents
1. [1 Introduction](https://arxiv.org/html/2605.21617#S1)
2. [2 Related work](https://arxiv.org/html/2605.21617#S2)
3. [3 BlockFormer: transformer\-based inference from interaction maps](https://arxiv.org/html/2605.21617#S3)
    1. [3\.1 A block\-aware transformer\-based architecture](https://arxiv.org/html/2605.21617#S3.SS1)
    2. [3\.2 Training strategy for flexibility to various size and number of blocks\.](https://arxiv.org/html/2605.21617#S3.SS2)
4. [4 Genomic application: inferring DNA\-fragment locations from contact maps](https://arxiv.org/html/2605.21617#S4)
5. [5 Results on real\-world contact maps](https://arxiv.org/html/2605.21617#S5)
6. [6 Ablations and synthetic experiments](https://arxiv.org/html/2605.21617#S6)
7. [7 Conclusion](https://arxiv.org/html/2605.21617#S7)
8. [References](https://arxiv.org/html/2605.21617#bib)
9. [A Contact maps](https://arxiv.org/html/2605.21617#A1)
    1. [A\.1 Generic structure\.](https://arxiv.org/html/2605.21617#A1.SS1)
    2. [A\.2 Hi\-C map normalization\.](https://arxiv.org/html/2605.21617#A1.SS2)
10. [B A state\-of\-the\-art method for centromere identification: Centurion](https://arxiv.org/html/2605.21617#A2)
11. [C The simulator](https://arxiv.org/html/2605.21617#A3)
    1. [C\.1 Simulation process](https://arxiv.org/html/2605.21617#A3.SS1)
    2. [C\.2 Misspecification analysis](https://arxiv.org/html/2605.21617#A3.SS2)
    3. [C\.3 Failure cases and solutions](https://arxiv.org/html/2605.21617#A3.SS3)
        1. [C\.3\.1 Too large chromosomes/high resolution: refinement step](https://arxiv.org/html/2605.21617#A3.SS3.SSS1)
        2. [C\.3\.2 Map too noisy: downsampling procedures](https://arxiv.org/html/2605.21617#A3.SS3.SSS2)
12. [D Training data generation](https://arxiv.org/html/2605.21617#A4)
13. [E Ablations](https://arxiv.org/html/2605.21617#A5)
    1. [E\.1 Comparison of BlockFormer to others architectures](https://arxiv.org/html/2605.21617#A5.SS1)
    2. [E\.2 Ablations: importance of block\-aware architecture](https://arxiv.org/html/2605.21617#A5.SS2)
    3. [E\.3 Ablations: importance of variety in training strategy](https://arxiv.org/html/2605.21617#A5.SS3)
        1. [E\.3\.1 Necessity of various block numbers during training](https://arxiv.org/html/2605.21617#A5.SS3.SSS1)
        2. [E\.3\.2 Sensitivity to simulator parameters](https://arxiv.org/html/2605.21617#A5.SS3.SSS2)
14. [F Simulation\-based inference for parameter uncertainty quantification](https://arxiv.org/html/2605.21617#A6)
    1. [F\.1 Generic framework](https://arxiv.org/html/2605.21617#A6.SS1)
    2. [F\.2 Inference of variable size parameter](https://arxiv.org/html/2605.21617#A6.SS2)
15. [G SMC\-ABC](https://arxiv.org/html/2605.21617#A7)
    1. [G\.1 With the metric Pearson correlation – ABC\-Pearson](https://arxiv.org/html/2605.21617#A7.SS1)
    2. [G\.2 With a summary statistic and the classical $l^{2}$\-norm – ABC\-Transf or ABC\-CNN](https://arxiv.org/html/2605.21617#A7.SS2)
16. [H SNPE – NPE\-Transf or NPE\-CNN](https://arxiv.org/html/2605.21617#A8)
17. [I Application: centromeres inference for Saccharomyces cerevisiae – posterior estimation\.](https://arxiv.org/html/2605.21617#A9)
18. [J Experiments](https://arxiv.org/html/2605.21617#A10)
    1. [J\.1 A model flexible to various sizes of blocks](https://arxiv.org/html/2605.21617#A10.SS1)
    2. [J\.2 A model flexible to various sequencing depths](https://arxiv.org/html/2605.21617#A10.SS2)
    3. [J\.3 A model flexible to various spot patterns settings](https://arxiv.org/html/2605.21617#A10.SS3)
        1. [J\.3\.1 Gaussian spots](https://arxiv.org/html/2605.21617#A10.SS3.SSS1)
        2. [J\.3\.2 Square spots](https://arxiv.org/html/2605.21617#A10.SS3.SSS2)
        3. [J\.3\.3 Elliptical spots](https://arxiv.org/html/2605.21617#A10.SS3.SSS3)
        4. [J\.3\.4 Ring spots](https://arxiv.org/html/2605.21617#A10.SS3.SSS4)
        5. [J\.3\.5 Multiple spots](https://arxiv.org/html/2605.21617#A10.SS3.SSS5)
        6. [J\.3\.6 Noisy map](https://arxiv.org/html/2605.21617#A10.SS3.SSS6)
    4. [J\.4 Inference from real\-world contact maps](https://arxiv.org/html/2605.21617#A10.SS4)
        1. [J\.4\.1 Strategy to construct the parameter estimation $\hat{\theta}$](https://arxiv.org/html/2605.21617#A10.SS4.SSS1)
        2. [J\.4\.2 Centromere identification](https://arxiv.org/html/2605.21617#A10.SS4.SSS2)
        3. [J\.4\.3 Loop localization](https://arxiv.org/html/2605.21617#A10.SS4.SSS3)
## Appendix A Contact maps
### A\.1 Generic structure\.
A genome\-wide contact map summarizes all the chromatin contacts observed over a population of DNA configurations\. To construct it, we define the resolution of the map, which is the length of the chromosome fragment represented by one pixel in the map\. Each chromosome is cut into fragments, and each entry of the map represents the contact count between two fragments over the population of DNA\. This creates a block matrix of interactions between chromosomes: for instance, the $i^{\text{th}}$ row of blocks in the map summarizes the interactions of chromosome $i$ with all the other chromosomes\. Usually, we represent it as a heatmap, as in Figure [6](https://arxiv.org/html/2605.21617#A1.F6)\.
Figure 6: Process to construct a contact map in the case of 2 chromosomes\.
### A\.2 Hi\-C map normalization\.
To correct biases in reference maps, we use ICE normalization via the Python library `iced`, with the function `ICE_normalization` from the module `normalization`\.
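To make the idea behind ICE normalization concrete, here is a minimal NumPy sketch of iterative correction in the spirit of \[18\]: rows and columns are repeatedly rescaled until all rows have the same total coverage\. This is not the `iced` implementation, which offers more options \(filtering, sparse inputs\); the function name, arguments, and stopping rule are assumptions made for illustration\.

```python
import numpy as np


def ice_normalize(counts, max_iter=100, tol=1e-6):
    """Simplified iterative correction of a symmetric contact matrix."""
    mat = np.array(counts, dtype=float)
    bias = np.ones(mat.shape[0])
    for _ in range(max_iter):
        row_sums = mat.sum(axis=1)
        nonzero = row_sums > 0
        if not nonzero.any():
            break
        # Relative coverage of each bin; empty bins are left untouched.
        scale = row_sums / row_sums[nonzero].mean()
        scale[~nonzero] = 1.0
        # Divide entry (i, j) by scale[i] * scale[j] and accumulate the bias.
        mat /= scale[:, None] * scale[None, :]
        bias *= scale
        if np.abs(scale - 1.0).max() < tol:
            break
    return mat, bias
```

The returned `bias` vector satisfies `counts ≈ mat * outer(bias, bias)`, so the correction can be inspected or undone\.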
## Appendix B A state\-of\-the\-art method for centromere identification: Centurion
\[[32](https://arxiv.org/html/2605.21617#bib.bib32)\] tackled the problem of centromere identification based on Hi\-C data with an algorithm called Centurion\. It starts with a normalized Hi\-C map and identifies peaks of interactions in the trans\- \(or inter\-chromosomal\) blocks as initial centromere candidates\. After heuristic filtering, one candidate is chosen for each chromosome, and the set of candidates serves as the initialization of a joint optimization procedure that refines the estimated positions\. The interaction between any chromosomes $k$ and $l$ in a window is modeled by a 2D Gaussian centered at $(\theta_{k},\theta_{l})$\. To refine centromere positions, it performs a least\-squares fit over all pairs of windows under the constraint that each centromere lies in its chromosome range\. Centurion’s strength lies in this optimization step, which forces the alignment of all the Gaussian spots in the map\. However, its accuracy relies heavily on a good pre\-localization of the candidates\. The method is deterministic, as it outputs only the mean position of each centromere, and the non\-convex optimization process can be time\-consuming for large matrices\. Because the optimization relies on a specific Hi\-C map, changing the species requires restarting the entire procedure from scratch\. In contrast, BlockFormer is amortized: once trained, the network can infer the centromeres of other species \(see Section [5](https://arxiv.org/html/2605.21617#S5) and Appendix [J\.4\.2](https://arxiv.org/html/2605.21617#A10.SS4.SSS2)\)\. Moreover, Centurion is specifically tuned for centromere identification and needs adaptations before being used for other inference tasks \(e\.g\. loop localization\)\.
## Appendix C The simulator
### C\.1 Simulation process
The goal of the simulator is to rapidly create a contact map $C$ given the centromere positions $\theta$\. As $C$ is symmetric, we only simulate the upper trans\-blocks\. We want to mimic the peak of interactions that appears in those blocks, as well as some rare interactions that can occur among the population of DNA\. Given the $L$ chromosome lengths in bp $\{l_{i}\}_{1\leq i\leq L}$, the centromere positions $\theta=(\theta_{1},\dots,\theta_{L})$ are sampled from the prior $\mathcal{U}\left(\prod_{1\leq i\leq L}[1,l_{i}-1]\right)$\. The process used to create each contact map $C$ is described in Algorithm [1](https://arxiv.org/html/2605.21617#alg1)\.
Algorithm 1: Simulator of contact maps

Input: $L$ chromosome lengths in bp $\{l_{i}\}_{1\leq i\leq L}$, resolution of the contact map in bp $r$ \(e\.g\. $r=32$ kb\), centromere positions $\theta=(\theta_{1},\dots,\theta_{L})$

Return: the upper trans\-blocks of a simulated contact map $C$ at the resolution $r$ bp\.

1. Choose the size of the peaks of interaction: sample $\sigma^{2}$ from $\mathcal{U}(0.1,10)$\.
2. Choose the intensity of interaction $\alpha$ to simulate the DNA population size: sample $\alpha$ from $\mathcal{U}(\llbracket 1,1\ 000\rrbracket)$\.
3. Construct the upper trans\-blocks of $C$, denoted $C^{\text{upper}}$, as follows\. For each chromosome pair $(i,j)$, $j>i$, do:
    - define a block of interaction $C^{\text{upper}}_{ij}$ of size $(\frac{l_{i}}{r},\frac{l_{j}}{r})$;
    - define the center of the peak $(\theta_{i},\theta_{j})$;
    - apply the Gaussian density $\mathcal{N}((\theta_{i}/r,\theta_{j}/r),\sigma^{2})$ to the pixels of the block $C^{\text{upper}}_{ij}$;
    - multiply each pixel of $C^{\text{upper}}_{ij}$ by the intensity factor $\alpha$;
    - add Gaussian noise up to $10\%$ of the maximal value of $C^{\text{upper}}_{ij}$ to mimic the rare contacts: construct a random matrix $M_{ij}$ of size $(\frac{l_{i}}{r},\frac{l_{j}}{r})$ where each pixel is sampled from $\mathcal{N}(\max(C^{\text{upper}}_{ij})\times 0.05,(\max(C^{\text{upper}}_{ij})\times 0.05)^{2})$, then add $M_{ij}$ to $C^{\text{upper}}_{ij}$;
    - draw a cross of non interaction \(set values to $0$\) of width $\sigma$ passing through $(\theta_{i},\theta_{j})$\.

    End for\.
4. Set $C=C^{\text{upper}}+C^{\text{upper},T}$\.
5. Return a simulated contact map $C$ at resolution $r$ bp\.
Figure 7: Reference Hi\-C map \(left\) and a simulated map \(right\) generated from the S\.C\. reference genome \(resolution 32 kb\)\.
### C\.2 Misspecification analysis
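The steps of Algorithm 1 can be sketched in NumPy as follows\. This is an illustrative reading of the algorithm, not the authors' code: the function name and signature are hypothetical, block sizes are rounded up to whole bins, the Gaussian is taken as isotropic with variance $\sigma^{2}$, and the cross of width $\sigma$ is interpreted as zeroing the rows and columns whose bin index lies within $\sigma/2$ of the peak center\. All of these are assumptions\.

```python
import numpy as np


def simulate_contact_map(lengths_bp, theta_bp, resolution_bp=32_000, rng=None):
    """Simulate a block-structured contact map with one spot per trans-block."""
    rng = np.random.default_rng() if rng is None else rng
    n_bins = [int(np.ceil(l / resolution_bp)) for l in lengths_bp]  # assumption
    offsets = np.concatenate([[0], np.cumsum(n_bins)])
    total = int(offsets[-1])

    sigma2 = rng.uniform(0.1, 10.0)   # size of the interaction peaks
    alpha = rng.integers(1, 1001)     # intensity, uniform on {1, ..., 1000}
    half_width = np.sqrt(sigma2) / 2  # half of the cross width (assumption)

    upper = np.zeros((total, total))
    for i in range(len(lengths_bp)):
        for j in range(i + 1, len(lengths_bp)):
            ci = theta_bp[i] / resolution_bp
            cj = theta_bp[j] / resolution_bp
            rows = np.arange(n_bins[i])[:, None]
            cols = np.arange(n_bins[j])[None, :]
            # Isotropic 2D Gaussian density centered at the two centromeres.
            dist2 = (rows - ci) ** 2 + (cols - cj) ** 2
            block = alpha * np.exp(-dist2 / (2 * sigma2)) / (2 * np.pi * sigma2)
            # Background noise with mean and std equal to 5% of the block max.
            # Values are not clipped, so a few pixels may be negative.
            level = 0.05 * block.max()
            block = block + rng.normal(level, level, size=block.shape)
            # Cross of non-interaction through the peak center.
            block[np.abs(rows[:, 0] - ci) <= half_width, :] = 0.0
            block[:, np.abs(cols[0, :] - cj) <= half_width] = 0.0
            upper[offsets[i]:offsets[i + 1], offsets[j]:offsets[j + 1]] = block
    # Symmetrize; cis-blocks on the diagonal stay empty.
    return upper + upper.T
```

Because every step is a vectorized array operation on small blocks, generating one map is cheap, which is what makes large simulated training sets affordable\.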
Contact maps vary significantly across species due to differences in folding organization and underlying physical constraints\. The simulator we designed is deliberately simplified: it avoids overfitting to any specific species and generates contact maps that capture only the information necessary for inference, and that are sometimes far from real contact maps\. The mismatch of the simulator can come from 3 factors:
- the structure of the spot: to test it, we extract the real spots and add simulated noise to create a simulated map with real spots \(referred to as $C_{\text{spot}}$\)\.
- the structure of the noise: to test it, we transport the Centurion pre\-localization reference map to the closest simulated one via quantile mapping \(referred to as $C_{\text{trans}}$\)\.
- the pirate pixels \(e\.g\. telomere interactions\): to test it, we use the reference map used in the Centurion pre\-localization, in which the borders of each trans\-block are removed to discard telomere interactions \(referred to as $C_{\text{cent}}$\)\.
The raw reference map is referred to as $C_{\text{raw}}$ and the closest simulated one as $C_{\text{simu}}$\. To analyze qualitatively the mismatch introduced by our simulator, we present heatmaps as well as histograms of the pixel values of those maps\. For a fair comparison, each map is normalized between 0 and 1 and the cis\-blocks are set to NaN\. The metric used to quantify the mismatch is based on the Pearson correlation, commonly used in the domain\. We average the Pearson correlation between each row of one trans\-block of $C_{\text{simu}}$ and $C_{\text{ref}}$, and then average all those correlations over the number of upper trans\-blocks \(the closer to 1 the better\)\. We also report the runtime as well as the inference performance of the model through the normalized error: the mean absolute error divided by the resolution \(the smaller the better\)\. Any value below 1 is satisfactory \(meaning an error below the resolution\)\.
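The two metrics described above can be sketched as follows\. The function names are hypothetical, the inputs are assumed to be lists of matching upper trans\-blocks given as 2D arrays, and skipping constant rows \(for which the Pearson correlation is undefined\) is an assumption rather than something stated in the text\.

```python
import numpy as np


def rowwise_pearson(block_a, block_b):
    """Mean Pearson correlation between matching rows of two blocks."""
    a = block_a - block_a.mean(axis=1, keepdims=True)
    b = block_b - block_b.mean(axis=1, keepdims=True)
    denom = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    valid = denom > 0  # constant rows have no defined correlation
    if not valid.any():
        return float("nan")
    return float(np.mean((a * b).sum(axis=1)[valid] / denom[valid]))


def map_correlation(blocks_a, blocks_b):
    """Average the row-wise correlation over all upper trans-blocks."""
    scores = [rowwise_pearson(x, y) for x, y in zip(blocks_a, blocks_b)]
    return float(np.nanmean(scores))


def normalized_error(theta_hat_bp, theta_true_bp, resolution_bp):
    """Mean absolute error divided by the resolution (below 1 is sub-resolution)."""
    diff = np.asarray(theta_hat_bp, dtype=float) - np.asarray(theta_true_bp, dtype=float)
    return float(np.mean(np.abs(diff)) / resolution_bp)
```

For instance, two centromeres each misplaced by 16 kb on a 32 kb map give a normalized error of 0\.5\.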



Figure 8: \(top\) Contact maps of yeasts S\.C\., L\.T\., and S\.M\., shown in the following order: simulated map $C_{\text{simu}}$, map with real spots and simulated noise $C_{\text{spot}}$, real map without telomere interactions $C_{\text{cent}}$, and real map transported to the closest simulated one $C_{\text{trans}}$\. \(bottom\) Histograms of pixel\-normalized values\.

Table 5: Mismatch versus inference performance of BlockFormer across three species of yeast\. We report the normalized error, runtime \(in seconds\), and correlation with the simulated contact map $C_{\text{simu}}$\. BlockFormer achieves similar performance across settings, illustrating that simulator mismatch does not significantly impact results\.

| Species | Method | Norm\. error | Time \(s\) | Corr\. with $C_{\text{simu}}$ |
| --- | --- | --- | --- | --- |
| S\.C\. \(16 chr\.\) | $C_{\text{simu}}$ | 0\.30 | 0\.32 | – |
| | $C_{\text{raw}}$ | 0\.30 | 0\.35 | 0\.49 |
| | $C_{\text{spot}}$ | 0\.41 | 0\.40 | 0\.49 |
| | $C_{\text{cent}}$ | 0\.24 | 0\.32 | 0\.37 |
| | $C_{\text{trans}}$ | 0\.30 | 0\.30 | 0\.58 |
| L\.T\. \(8 chr\.\) | $C_{\text{simu}}$ | 0\.60 | 0\.23 | – |
| | $C_{\text{raw}}$ | 1\.25 | 0\.27 | 0\.11 |
| | $C_{\text{spot}}$ | 0\.66 | 0\.28 | 0\.11 |
| | $C_{\text{cent}}$ | 1\.25 | 0\.23 | 0\.11 |
| | $C_{\text{trans}}$ | 0\.84 | 0\.27 | 0\.37 |
| S\.M\. \(16 chr\.\) | $C_{\text{simu}}$ | 0\.36 | 0\.35 | – |
| | $C_{\text{raw}}$ | 1\.63 | 0\.40 | 0\.28 |
| | $C_{\text{spot}}$ | 0\.43 | 0\.31 | 0\.27 |
| | $C_{\text{cent}}$ | 0\.58 | 0\.30 | 0\.25 |
| | $C_{\text{trans}}$ | 0\.57 | 0\.47 | 0\.46 |

The correlations are quite low \(e\.g\. 0\.11 for L\.T\.\) and, judging from the histograms, the distributions are very different \(both the distribution of the noise and the distribution of the spots\): the distribution of spot values is discrete in simulated maps whereas it is continuous in real maps\. On $C_{\text{raw}}$, the inference performance degrades by at most a factor 5, mostly due to strong telomere interactions\. If we remove those outliers as in $C_{\text{cent}}$ \(which is done in Centurion\), the normalized error degrades by about a factor 2 at most compared to $C_{\text{simu}}$ \(worst case: 1\.25 versus 0\.60 for L\.T\.\)\. The mismatch thus degrades performance when working directly with the raw reference matrix, in particular when there are strong telomeric interactions\. However, since telomeres lie at the boundaries of interaction blocks, they can be removed \(as done in Centurion\), and in such cases the simulator mismatch no longer significantly impacts inference accuracy\. Overall, our results indicate that accurate inference does not require highly realistic simulators, but rather capturing the key structural patterns of the interaction maps\.
### C.3 Failure cases and solutions
#### C.3.1 Too large chromosomes/high resolution: refinement step
BlockFormer was trained on block sizes between $6$ and $62$ (see Appendix [D](https://arxiv.org/html/2605.21617#A4)) and struggles when block heights fall far outside this range. When a species (e.g. A.T.) has very large chromosomes or when the resolution is too fine (e.g. $10$ kb), the matrices become too large and the inference fails. To address this problem, we propose an iterative refinement with multi-resolution analysis: we start with a pre-localization step on a modified map at coarse resolution, ensuring that block sizes fall within the training range. As in the pre-localization stage of Centurion, the borders of each block in this map are set to $0$ to avoid bias caused by telomere interactions. We then consider the raw map at a fine resolution, where the block sizes are too large. In each trans-block, we cut a patch of maximal size $60 \times 60$ centered on the coarse parameter estimate in the fine map and clipped to the border of the trans-block; the patches are then stacked for a refined estimation (the resulting input is referred to as $C_{\text{refine}}$).
We present refinement results on three species whose chromosome sizes are outside the training range. The yeast K.L. is studied at resolution $60$ kb then $30$ kb, the yeast S.P. at $90$ kb then $30$ kb, and the plant A.T. at $600$ kb then $40$ kb. We report the runtime as well as the normalized error, i.e. the mean absolute error divided by the resolution (the smaller the better). Any value below 1 is satisfactory, since it means an error below the resolution.
Table 6: Comparison between raw and refined interaction maps across species. We report the normalized error and runtime (in seconds). The refinement step substantially improves accuracy while maintaining low computational cost.

| Species | Method | Norm. error | Time (s) |
|---|---|---|---|
| K.L. (6 chr.) | $C_{\text{raw}}$ | 5.74 | 0.23 |
| | $C_{\text{refine}}$ | 1.60 | 0.31 |
| S.P. (3 chr.) | $C_{\text{raw}}$ | 35.02 | 0.88 |
| | $C_{\text{refine}}$ | 4.30 | 0.18 |
| A.T. (5 chr.) | $C_{\text{raw}}$ | 125.94 | 718.2 |
| | $C_{\text{refine}}$ | 3.70 | 0.38 |
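To illustrate the patch-cutting step of the refinement, the following Python sketch computes a window of at most 60 bins centered on a coarse estimate and clipped to the borders of a trans-block. It is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: the function names (`patch_bounds`, `extract_patch`), the list-of-lists block representation, and the choice to truncate (rather than shift) the window at a border are all hypothetical.

```python
def patch_bounds(center_bin, block_len, max_size=60):
    """Return (start, stop) bin indices of a window of at most `max_size`
    bins centered on `center_bin` and truncated to [0, block_len)."""
    half = max_size // 2
    start = max(0, center_bin - half)
    stop = min(block_len, center_bin + half)
    return start, stop


def extract_patch(block, theta_row_bp, theta_col_bp, fine_resolution_bp, max_size=60):
    """Cut a patch from one trans-block of the fine map.

    `block` is a list of rows (hypothetical representation).
    `theta_row_bp` / `theta_col_bp` are the coarse estimates, in base pairs,
    for the two chromosomes indexing the rows and columns of the block.
    Returns the patch and its (row, col) offset in the fine block.
    """
    # Convert the coarse estimates (bp) to bin indices at the fine resolution.
    row_bin = int(theta_row_bp // fine_resolution_bp)
    col_bin = int(theta_col_bp // fine_resolution_bp)
    r0, r1 = patch_bounds(row_bin, len(block), max_size)
    c0, c1 = patch_bounds(col_bin, len(block[0]), max_size)
    patch = [row[c0:c1] for row in block[r0:r1]]
    return patch, (r0, c0)
```

The returned offset allows an estimate obtained inside the patch to be mapped back to fine-map coordinates by adding it to the patch-local position.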
#### C.3.2 Map too noisy: downsampling procedures
If the contact map deviates strongly from the simulator distribution (e.g. excessive noise obscuring spots), we propose two approaches: (i) reduce the resolution via downsampling to smooth the map (the method is denoted $C_{\text{coarse}}$) or (ii) construct a map closer to the simulated ones by extracting the enrichment regions and adding simulated background noise (method denoted $C_{\text{spot}}$). To change the resolution of the map, we apply a downsampling procedure. Suppose we want to go from a fine resolution of $r$ bp to a coarser resolution of $k \times r$ bp. Starting from the high-resolution map, each pixel in the coarse map is obtained by aggregating (e.g. summing) the values of the corresponding $k \times k$ neighborhood in the fine map. This aggregation is performed independently within each chromosome block, ensuring that the original block structure is preserved. We evaluate performance via the normalized error on a noisy contact map from yeast S.K. (the smaller the better). Both approaches improve performance, with $C_{\text{spot}}$ achieving the lowest error at low computational cost.
Table 7: Comparison between noisy raw and smoothed interaction maps for species S.K. We report the normalized error and runtime (in seconds). Changing the resolution substantially improves accuracy while maintaining low computational cost.

| Species | Method | Norm. error | Time (s) |
|---|---|---|---|
| S.K. (16 chr.) | $C_{\text{raw}}$ ($30$ kb) | 1.90 | 0.44 |
| | $C_{\text{spot}}$ ($30$ kb) | 0.59 | 0.31 |
| | $C_{\text{coarse}}$ ($60$ kb) | 0.74 | 0.30 |
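The block-wise downsampling described above can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' code: the names `downsample_block` and `downsample_map`, the list-of-lists map representation, and the handling of chromosome lengths that are not multiples of $k$ (the last coarse bin aggregates a partial tile) are assumptions made for the example.

```python
def downsample_block(block, k):
    """Sum k x k neighborhoods of a single block (list of rows).
    Assumption: partial tiles at the block edge are summed as they are."""
    n_rows, n_cols = len(block), len(block[0])
    out_rows = -(-n_rows // k)  # ceiling division
    out_cols = -(-n_cols // k)
    coarse = [[0.0] * out_cols for _ in range(out_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            coarse[i // k][j // k] += block[i][j]
    return coarse


def _starts(lengths):
    """Cumulative start index of each chromosome, plus the total size."""
    starts = [0]
    for n in lengths:
        starts.append(starts[-1] + n)
    return starts


def downsample_map(contact_map, chrom_lengths, k):
    """Downsample a square map block by block so that coarse bins never
    straddle a chromosome boundary. `chrom_lengths` are in fine bins."""
    fine = _starts(chrom_lengths)
    coarse_lengths = [-(-n // k) for n in chrom_lengths]
    coarse_starts = _starts(coarse_lengths)
    size = coarse_starts[-1]
    coarse = [[0.0] * size for _ in range(size)]
    n_chr = len(chrom_lengths)
    for a in range(n_chr):
        for b in range(n_chr):
            # Extract block (a, b) of the fine map and aggregate it alone.
            block = [row[fine[b]:fine[b + 1]]
                     for row in contact_map[fine[a]:fine[a + 1]]]
            small = downsample_block(block, k)
            for i, row in enumerate(small):
                for j, value in enumerate(row):
                    coarse[coarse_starts[a] + i][coarse_starts[b] + j] = value
    return coarse, coarse_lengths
```

Summing (rather than averaging) keeps the total contact count of each block unchanged, which is one reason it is a common choice for count data such as Hi-C maps.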
## Appendix D Training data generation
Our model can generalize to unseen configurations because training samples span a wide range of block numbers and sizes, forcing the network to learn representations invariant to both the number and the size of blocks. We detail here the training data generation process. Since the training set must include interaction maps of varying sizes, we organize the data into batches for computational efficiency. In each batch, we create a synthetic genome composed of $2$ to $10$ chromosomes with sizes ranging from $2 \times 10^{5}$ to $2 \times 10^{6}$ bp. Both the number of chromosomes and their individual sizes are sampled uniformly. The entity for which the parameter is to be inferred is also selected uniformly. The resolution of the maps is set to $r = 32$ kb. Let $L$ denote the number of chromosomes, with sizes $\{l_{i}\}_{1 \leq i \leq L}$, and let $j$ be the selected entity. Within a batch, all training maps share the same structure derived from this synthetic genome: a sequence of $L - 1$ trans-blocks of size $(\frac{l_{j}}{r}, \frac{l_{i}}{r})$ (namely varying from $6$ to $62$). Inside a batch, the generation procedure follows Algorithm [1](https://arxiv.org/html/2605.21617#alg1): each parameter $\theta$ is sampled from $\prod_{i} \mathcal{U}(1, l_{i} - 1)$, the spot size is sampled from $\mathcal{U}(0.1, 10)$ and Gaussian noise is added to each map with amplitude up to $10\%$ of the maximum value in the map. The map is normalized between $0$ and $1$ and each block in the map is then $0$-padded such that its size is a multiple of the patch size.
Figure 9: Examples of simulated maps; the number and the size of blocks vary. We provide sequences of trans-blocks created from synthetic genomes of $2$, $4$, $6$, $8$ or $10$ chromosomes. The spots also vary in size ($\sigma^{2}$) and location ($\theta$).
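The sampling ranges above can be put together in a short Python sketch of one training sample. It is only a rough illustration under stated assumptions, since Algorithm 1 is not reproduced here: the Gaussian spot shape, the spot size expressed in bins, the floor rounding of block sizes, the patch size of 4, the bottom-right zero padding inside each block, and all function names are hypothetical choices for the example.

```python
import math
import random

RESOLUTION = 32_000  # r = 32 kb, as stated in the text
PATCH = 4            # assumed patch size (hypothetical value)


def sample_genome(rng):
    """Sample a synthetic genome: 2 to 10 chromosomes, sizes in bp,
    and the index of the entity whose parameter is inferred."""
    n_chr = rng.randint(2, 10)
    sizes = [rng.uniform(2e5, 2e6) for _ in range(n_chr)]
    target = rng.randrange(n_chr)
    return sizes, target


def pad_block(block, patch=PATCH):
    """Zero-pad one block so both dimensions are multiples of `patch`."""
    rows, cols = len(block), len(block[0])
    new_rows = -(-rows // patch) * patch
    new_cols = -(-cols // patch) * patch
    padded = [row + [0.0] * (new_cols - cols) for row in block]
    padded += [[0.0] * new_cols for _ in range(new_rows - rows)]
    return padded


def simulate_blocks(sizes, target, rng):
    """Return the L - 1 padded trans-blocks and the target parameter (bp)."""
    theta = [rng.uniform(1, l - 1) for l in sizes]
    sigma = rng.uniform(0.1, 10)  # spot size, assumed to be in bins
    rows = int(sizes[target] // RESOLUTION)
    cy = theta[target] / RESOLUTION
    blocks = []
    for i, l in enumerate(sizes):
        if i == target:
            continue  # cis-block is skipped: only trans-blocks are kept
        cols = int(l // RESOLUTION)
        cx = theta[i] / RESOLUTION
        # Assumed Gaussian spot centered on (theta_target, theta_i).
        blocks.append([[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
                        for x in range(cols)] for y in range(rows)])
    # Gaussian noise with amplitude up to 10% of the maximum value.
    peak = max(v for b in blocks for row in b for v in row)
    amp = rng.uniform(0, 0.1) * peak
    blocks = [[[v + rng.gauss(0, amp) for v in row] for row in b] for b in blocks]
    # Min-max normalization over the whole sequence of blocks.
    lo = min(v for b in blocks for row in b for v in row)
    hi = max(v for b in blocks for row in b for v in row)
    span = (hi - lo) or 1.0
    blocks = [[[(v - lo) / span for v in row] for row in b] for b in blocks]
    return [pad_block(b) for b in blocks], theta[target]
```

With a resolution of 32 kb, chromosome sizes between $2 \times 10^{5}$ and $2 \times 10^{6}$ bp give unpadded block sides between 6 and 62 bins, matching the range quoted in the text.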
## Appendix E Ablations
We demonstrate that per-block modeling along with a highly varied training set is necessary to obtain optimal performance in many settings. In the following ablations [E.2](https://arxiv.org/html/2605.21617#A5.SS2) and [E.3](https://arxiv.org/html/2605.21617#A5.SS3), all the architectures are tested in the same setting: we consider maps that are sequences of $1$ to $14$ trans-blocks constructed from a synthetic genome made of $2$ to $15$ chromosomes of size ranging from $2 \times 10^{5}$ to $2 \times 10^{6}$ bp. For each number of blocks, we simulate $100$ maps as in Appendix [D](https://arxiv.org/html/2605.21617#A4), where block sizes vary as well as parameter locations and spot sizes. We evaluate performance by reporting the absolute error between the estimate $\hat{\theta}$ and $\theta$, normalized by the resolution. We first provide a comparison of BlockFormer with other learning-based architectures.
### E.1 Comparison of BlockFormer to other architectures
To the best of our knowledge, no modern baseline approaches such as graph-based or attention-based architectures are available for centromere inference from interaction maps, except our prior work \[[31](https://arxiv.org/html/2605.21617#bib.bib31)\], which proposes a probabilistic framework that uses a CNN-based architecture as summary statistic. We include this architecture in our study. We also tried several architectures, including attention-based ones, to estimate the parameters. As some architectures are flexible to any map and some are specific, we compare them in a single setting with a synthetic genome made of the first three chromosomes of the species S.C. at resolution $32$ kb. The reference map of S.C. including only interactions between the first three chromosomes is denoted $C_{\text{ref}}$. We test the performance of each architecture with the normalized mean absolute error, either averaged over $1000$ synthetic maps $C_{\text{simu}}$ or on the reference map $C_{\text{ref}}$. We designed architectures that take as input:
- •the entire map $C$ with only the upper trans-blocks, the lower symmetric part being set to 0; those methods are trained on maps with a fixed number and size of blocks. Moreover, we rapidly face the curse of dimensionality when training those architectures on maps of increasing size (methods referred to as entire $C$). Those models were trained on synthetic maps with the same structure as $C_{\text{ref}}$.
- •the row of all blocks for each chromosome, including the cis-block set to 0; those methods are trained on fixed-size rows with a fixed number of blocks per chromosome (methods referred to as fixed row of $C$). Those models were trained on rows with the same structure as those of $C_{\text{ref}}$.
- •the row of all blocks for each chromosome, including the cis-block set to 0; those methods are trained on rows of variable sizes with a fixed number of blocks (methods referred to as variable row of $C$). Those models were trained on rows of $3$ blocks of variable sizes.
- •variable sequences of trans-blocks; those methods are trained on various numbers of blocks with various sizes (methods referred to as variable blocks of $C$). Those models were trained on sequences of $1$ to $9$ trans-blocks of variable sizes.
All the architectures that use a convolutional network (CNN) have the following backbone: 2 convolutional layers (1 to 6 to 12 channels with $3 \times 3$ kernels) with interleaved max-pooling (with $2 \times 2$ kernels). The MLPs used have a factor-4 reduction to a single unit and a sigmoid activation to ensure a valid output. The attention pooling (att-pool) is an MLP (12 to 64 to 1 channels). The transformers (transf) used have 4 blocks, with 4 attention heads, a patch size of 4 and an embedding dimension of 24. All the architectures are trained on $5000$ training samples with batch size $200$ over $200$ epochs and a fixed learning rate of $5 \times 10^{-4}$, except the shared architectures based on transformers, which are trained on $50000$ samples because of the variety in the training data and the number of model parameters.
Table 8: Comparison of architectures across different input configurations. Normalized mean absolute error reported for simulated ($C_{\text{simu}}$) and reference ($C_{\text{ref}}$) maps.

| Input | Method | Error on $C_{\text{simu}}$ | Error on $C_{\text{ref}}$ | Generalization |
|---|---|---|---|---|
| entire $C$ | CNN+MLP | 0.35 | 0.37 | specific to map structure |
| | CNN+att-pool+MLP | 0.95 | 3.11 | flexible to any $3 \times 3$ per-block map |
| | Transf+MLP | 0.44 | 0.26 | specific to map structure |
| fixed row of $C$ | CNN shared+MLP per chr. | 0.29 | 0.49 | specific to map row structure |
| | Transf+MLP (per chr.) | 0.41 | 0.60 | specific to map row structure |
| | Transf+cls token (per chr.) | 0.46 | 1.06 | flexible to any row |
| variable row of $C$ | CNN+att-pool+MLP (shared) | 0.58 | 2.54 | flexible to any row |
| | Transf+cls token (shared) | 0.40 | 0.86 | flexible to any row |
| variable blocks of $C$ | Transf+cls token (shared) | 0.52 | 1.34 | flexible to any sequence of blocks |

Architectures specific to the map structure perform better than flexible architectures (e.g. CNN+MLP versus Transf+cls token). However, those architectures must be retrained when the map changes, rendering the approach too costly compared to Centurion. Among the flexible structures, the transformer-based approach outperforms the other attention-based approaches (Transf+cls token versus CNN+att-pool+MLP). Finally, presenting full rows of blocks containing the cis-block set to $0$, which could mislead inference, seems less appropriate than variable sequences of trans-blocks.
### E.2 Ablations: importance of block-aware architecture
One of our main contributions is the design of a block-aware architecture consisting of per-block padding and per-block 3D positional encoding, which preserves the block-wise structure of the map and cannot be naturally included in CNN-based architectures. The transformer architectures and training setup used in each case are identical to those described in Section [4](https://arxiv.org/html/2605.21617#S4) (Training details) and Appendix [D](https://arxiv.org/html/2605.21617#A4). We consider the following ablations:
- •no positional encoding with per-block padding (referred to as no pos.)
- •1D positional encoding with per-block padding (referred to as 1D pos.)
- •2D positional encoding with bottom-right padding (referred to as 2D pos. pad.)
- •2D positional encoding with per-block padding (referred to as 2D pos.)
- •2D per-block positional encoding with per-block padding (referred to as 2D pos. per block)
- •3D per-block positional encoding with block index and per-block padding (referred to as 3D pos. per block)
Table 9: Performance comparison across methods for the mid/high regimes (4, 10 blocks). Results are reported as mean $\pm$ std, median, and 95% CI.

| Method | 4 blocks: Mean $\pm$ Std | Median | 95% CI | 10 blocks: Mean $\pm$ Std | Median | 95% CI |
|---|---|---|---|---|---|---|
| 3D pos. per block | 0.39 $\pm$ 0.30 | 0.33 | [0.02, 1.08] | 0.39 $\pm$ 0.30 | 0.35 | [0.03, 1.13] |
| 2D pos. per block | 0.40 $\pm$ 0.30 | 0.34 | [0.02, 1.04] | 0.41 $\pm$ 0.29 | 0.36 | [0.02, 0.96] |
| 2D pos. | 0.42 $\pm$ 0.33 | 0.36 | [0.01, 1.19] | 0.41 $\pm$ 0.32 | 0.34 | [0.01, 1.09] |
| 2D pos. pad. | 0.39 $\pm$ 0.34 | 0.31 | [0.01, 1.19] | 0.42 $\pm$ 0.39 | 0.34 | [0.03, 1.36] |
| 1D pos. | 0.52 $\pm$ 0.48 | 0.43 | [0.01, 1.26] | 1.04 $\pm$ 0.92 | 0.73 | [0.04, 3.41] |
| no pos. | 4.18 $\pm$ 5.07 | 1.70 | [0.08, 17.88] | 4.06 $\pm$ 5.16 | 1.32 | [0.05, 17.21] |

Results for the low ($1$ block) and extreme (out-of-training range: $14$ blocks) regimes are in Table [4](https://arxiv.org/html/2605.21617#S6.T4). The 1D positional encoding (1D pos.) is generally unstable except in the medium regime ($4$ blocks), and the variant without positional encoding (no pos.) is always unstable. This supports using a 2D or 3D positional encoding. The bottom-right padding method (2D pos. pad.) is the worst among the 2D positional encoding methods: it is highly variable, especially in the low (1 block) and extreme (14 blocks) regimes, showing no consistent improvement with scale and reinforcing the need for per-block padding. The 2D positional encoding with per-block padding (2D pos.) is more robust but still shows instabilities, with degradation in the extreme regime (14 blocks), and does not improve with scale. The per-block 2D positional encoding with per-block padding (2D pos. per block) is the closest method to 3D pos. and performs better than 2D pos. It is unstable in the low regime (1 block) but stabilizes with scale, with no degradation in the extreme regime (14 blocks). This supports using a per-block positional encoding.
The per-block 3D positional encoding along with per-block padding (3D pos. per block) is the best overall method: it most consistently has the lowest or near-lowest mean error, with competitive or best median performance, and is in general more stable (lower std in most settings). All methods that do not use per-block information (2D pos. or 2D pos. pad.) are stable in the medium regime but become unstable in the low/extreme regimes. Both methods that incorporate per-block positional information (2D pos. per block or 3D pos. per block) stay stable in the extreme regime. Adding the block index to the positional encoding, as in 3D pos., helps stabilize the method in all regimes.
### E.3 Ablations: importance of variety in training strategy
We design a training strategy that enables as much flexibility as possible: the parameter location and the spot size vary, as do the entity to infer and the size and number of blocks. We show that this design is necessary for the architecture to be flexible across many interaction maps.
#### E.3.1 Necessity of various block numbers during training
Presenting maps with various numbers of blocks to the model is necessary to obtain an architecture that is flexible across many numbers of blocks. We consider the following ablations in the training set:
- •the model is trained on maps of $2$ blocks (referred to as 3 chr.)
- •the model is trained on maps of $4$ blocks (referred to as 5 chr.)
- •the model is trained on maps of $6$ blocks (referred to as 7 chr.)
- •the model is trained on maps of $2$, $4$ or $6$ blocks (referred to as 3-5-7 chr.)
Table 10: Normalized absolute error comparison across methods for the low/mid regimes (1, 4 blocks). Results are reported as mean $\pm$ std, median, and 95% CI.

| Method | 1 block: Mean $\pm$ Std | Median | 95% CI | 4 blocks: Mean $\pm$ Std | Median | 95% CI |
|---|---|---|---|---|---|---|
| BlockFormer | 0.43 $\pm$ 0.32 | 0.39 | [0.02, 1.22] | 0.39 $\pm$ 0.30 | 0.33 | [0.02, 1.08] |
| 3-5-7 chr. | 0.47 $\pm$ 0.43 | 0.38 | [0.02, 1.47] | 0.37 $\pm$ 0.29 | 0.28 | [0.03, 1.05] |
| 7 chr. | 0.62 $\pm$ 0.54 | 0.49 | [0.02, 1.74] | 0.39 $\pm$ 0.28 | 0.36 | [0.03, 1.06] |
| 5 chr. | 0.89 $\pm$ 0.68 | 0.74 | [0.02, 2.42] | 0.44 $\pm$ 0.68 | 0.30 | [0.03, 1.06] |
| 3 chr. | 0.45 $\pm$ 0.36 | 0.40 | [0.05, 1.32] | 0.76 $\pm$ 0.69 | 0.52 | [0.04, 2.38] |

Table 11: Normalized absolute error comparison across methods for the high and extreme regimes (10, 14 blocks). Results are reported as mean $\pm$ std, median, and 95% CI.

| Method | 10 blocks: Mean $\pm$ Std | Median | 95% CI | 14 blocks: Mean $\pm$ Std | Median | 95% CI |
|---|---|---|---|---|---|---|
| BlockFormer | 0.39 $\pm$ 0.30 | 0.35 | [0.03, 1.13] | 0.36 $\pm$ 0.29 | 0.29 | [0.01, 0.93] |
| 3-5-7 chr. | 1.13 $\pm$ 1.89 | 0.80 | [0.01, 3.15] | 1.17 $\pm$ 1.02 | 0.88 | [0.01, 3.71] |
| 7 chr. | 1.22 $\pm$ 2.40 | 0.61 | [0.01, 5.96] | 1.82 $\pm$ 2.80 | 0.97 | [0.03, 10.58] |
| 5 chr. | 5.66 $\pm$ 7.81 | 2.51 | [0.17, 27.56] | 9.47 $\pm$ 10.77 | 4.53 | [0.47, 34.95] |
| 3 chr. | 2.12 $\pm$ 1.82 | 1.66 | [0.07, 6.36] | 6.40 $\pm$ 10.14 | 2.58 | [0.09, 38.50] |

Figure 10: Mean normalized error with standard deviation. BlockFormer (BF) outperforms all the other methods, especially for a high number of blocks, showing that including maps with various numbers of blocks in the training set is necessary to maintain sub-resolution accuracy (error under 1) regardless of the number of blocks.
#### E.3.2 Sensitivity to simulator parameters
The training set was designed to avoid any sensitivity to simulator parameters, especially the parameter locations and spot sizes. We consider the following ablations in the training set:
- •the spot size is fixed to $1.0$ in each map (referred to as fixed $\sigma$)
- •the maps have no background noise (referred to as no noise)
The training setting is otherwise identical to that of BlockFormer, detailed in Appendix [D](https://arxiv.org/html/2605.21617#A4).
Table 12: Normalized absolute error comparison across methods for the low/mid regimes (1, 4 blocks). Results are reported as mean $\pm$ std, median, and 95% CI.

| Method | 1 block: Mean $\pm$ Std | Median | 95% CI | 4 blocks: Mean $\pm$ Std | Median | 95% CI |
|---|---|---|---|---|---|---|
| BlockFormer | 0.43 $\pm$ 0.32 | 0.39 | [0.02, 1.22] | 0.39 $\pm$ 0.30 | 0.33 | [0.02, 1.08] |
| fixed $\sigma$ | 1.39 $\pm$ 1.31 | 1.03 | [0.02, 4.54] | 1.82 $\pm$ 2.41 | 1.10 | [0.07, 5.98] |
| no noise | 2.33 $\pm$ 2.64 | 1.29 | [0.03, 9.50] | 3.08 $\pm$ 3.24 | 2.85 | [0.22, 7.02] |

Table 13: Normalized absolute error comparison across methods for the high and extreme regimes (10, 14 blocks). Results are reported as mean $\pm$ std, median, and 95% CI.

| Method | 10 blocks: Mean $\pm$ Std | Median | 95% CI | 14 blocks: Mean $\pm$ Std | Median | 95% CI |
|---|---|---|---|---|---|---|
| BlockFormer | 0.39 $\pm$ 0.30 | 0.35 | [0.03, 1.13] | 0.36 $\pm$ 0.29 | 0.29 | [0.01, 0.93] |
| fixed $\sigma$ | 2.05 $\pm$ 2.75 | 1.33 | [0.03, 9.74] | 2.77 $\pm$ 4.02 | 1.67 | [0.07, 13.22] |
| no noise | 2.81 $\pm$ 2.71 | 2.13 | [0.09, 9.48] | 3.63 $\pm$ 3.46 | 2.81 | [0.04, 11.74] |

Figure 11: Mean normalized error with standard deviation. BlockFormer (BF) outperforms all the other methods, showing that adding noise and varying the spot size in the training set are necessary to reach sub-resolution accuracy (error under 1).
## Appendix F Simulation-based inference for parameter uncertainty quantification
### F.1 Generic framework
Interaction maps are summary statistics over a population of entities and thus inherently contain bias and noise. This is particularly visible in Hi-C data, where DNA folding is highly variable. A Bayesian approach offers a clear advantage over simple point estimation by explicitly modeling this uncertainty (the posterior width reflects confidence), allowing variability and noise to be incorporated into the analysis. This stochastic framework provides a more faithful representation of the underlying biological heterogeneity and yields more robust, interpretable inferences. The goal is to infer a set of per-entity parameters $\theta$ from an interaction map $C_{\text{ref}}$ using a probabilistic framework based on simulations. The usual way of doing so would be to search for the most appropriate $\theta$ for a given $C_{\text{ref}}$ by maximizing the likelihood:

$$\hat{\theta} = \underset{\theta \in \Omega}{\text{argmax}}\ \log p(C_{\text{ref}}|\theta)~.$$

However, as the simulator is often very complex (e.g. biological simulators), the likelihood $p(C|\theta)$ may be intractable. As such, we directly target the posterior density $p(\theta|C_{\text{ref}})$ using data from the simulator, either via approximate Bayesian computation (Sequential Monte-Carlo ABC: SMC-ABC) or by estimating the posterior density with a conditional normalizing flow (Sequential neural posterior estimation: SNPE) \[[28](https://arxiv.org/html/2605.21617#bib.bib28),[14](https://arxiv.org/html/2605.21617#bib.bib14)\].
### F.2 Inference of variable size parameter
As the dimensionality of the parameter to infer varies with the number of entities, directly inferring any $\theta$ from any interaction map $C$ becomes challenging. Moreover, if we consider the entire interaction maps and the corresponding parameters in the Bayesian inference methods, we face the curse of dimensionality: the space of parameters $\theta$ becomes too large to cover with few simulations, and the maps $C$ are too big. To bypass those issues, we decompose the problem into multiple sub-problems: we perform in parallel one inference per entity, estimating $\theta_{i}$ from $C_{i}$, the parameter-related part of $C$. The space of $\theta$ is thus cut into several entity-length 1D intervals, reducing the train set size. Assuming $I$ entities, the actual targeted posterior is

$$p^{\otimes}(\theta|C_{\text{ref}}) := \prod_{i=1}^{I} p(\theta_{i}|C_{\text{ref},i})$$

Note that $p^{\otimes}(\theta|C_{\text{ref}})$ is not the true joint posterior $p(\theta|C_{\text{ref}})$. To justify this approximation, we represent $C$ as a sequence of trans-blocks, $C = (C_{k,l})_{1 \leq k \neq l \leq I}$, and draw a part of the causal graph between $\theta$ and $C$:
[Causal graph: parameter nodes $\theta_{1}, \cdots, \theta_{i}, \theta_{j}, \theta_{k}, \cdots, \theta_{n}$ and block nodes $C_{i,j}$, $C_{i,k}$, $C_{j,k}$, each block being generated by the two parameters of its indices.]
We first introduce a *mean-field approximation* by assuming conditional independence across chromosomes: $p(\theta|C) \approx \prod_{i=1}^{I} p(\theta_{i}|C)$. This is only an approximation: since $C$ contains the block $C_{i,j}$, which is jointly generated from $(\theta_{i}, \theta_{j})$, $C_{i,j}$ acts as a collider in the causal graph (in red) and $\theta_{i} \not\!\perp\!\!\!\perp \theta_{j} | C$. We furthermore introduce a *locality assumption* by stating that $p(\theta_{i}|C) \approx p(\theta_{i}|C_{i})$. $\theta_{i}$ appears directly in $C$ only through the blocks $C_{i,k}$ or $C_{k,i}$ and is indirectly influenced by the other blocks $C_{j,k}$. Indeed, in the causal graph, the green path is open because conditioning on the collider $C_{i,k}$ unblocks it, so that $\theta_{i} \not\!\perp\!\!\!\perp C_{j,k} | C_{i,k}$. However, we only consider first-order dependencies (direct causal links), so $p(\theta_{i}|C) = p(\theta_{i}|(C_{k,l})_{1 \leq k \neq l \leq I}) \approx p(\theta_{i}|[C_{i,k}, C_{k,i}], k \neq i)$. Since $C_{k,i} = C_{i,k}^{T}$, it does not contain additional information and $p(\theta_{i}|[C_{i,k}, C_{k,i}], k \neq i) = p(\theta_{i}|C_{i,k}, k \neq i) := p(\theta_{i}|C_{i})$, such that $p(\theta_{i}|C) \approx p(\theta_{i}|C_{i})$. To summarize:
$$p(\theta|C_{\text{ref}}) \underset{\text{mean-field approx.}}{\approx} \prod_{1 \leq i \leq I} p(\theta_{i}|C_{\text{ref}}) \underset{\text{locality}}{\approx} \prod_{1 \leq i \leq I} p(\theta_{i}|C_{\text{ref},i}) = p^{\otimes}(\theta|C_{\text{ref}})$$

Although this approximation ignores higher-order dependencies induced by collider paths in the full causal graph, it yields a scalable inference procedure that can be independently applied across entities of varying dimensionality. In all the inference methods presented in Appendices [G](https://arxiv.org/html/2605.21617#A7) and [H](https://arxiv.org/html/2605.21617#A8), we target in parallel, dimension per dimension, each marginal $p(\theta_{i}|C_{i})$. To avoid notational overhead in the algorithms, $\theta$ will represent any $\theta_{i}$ and $C$ any $C_{i}$.
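To make the per-entity decomposition concrete, the sketch below extracts $C_{i}$, the sequence of trans-blocks $C_{i,k}$ with $k \neq i$, from a full map. It is an illustrative example only: the function name `trans_blocks_for_entity`, the list-of-lists map, and the `chrom_lengths` argument (chromosome sizes in bins) are assumptions introduced here, not part of the original method description.

```python
def trans_blocks_for_entity(contact_map, chrom_lengths, i):
    """Return C_i = [C_{i,k} for k != i], each block as a list of rows.

    `contact_map` is the full square map and `chrom_lengths` gives the
    number of bins of each chromosome, in map order (hypothetical inputs).
    """
    # Cumulative bin index at which each chromosome starts.
    starts = [0]
    for n in chrom_lengths:
        starts.append(starts[-1] + n)
    # Rows of the map that belong to chromosome i.
    rows = contact_map[starts[i]:starts[i + 1]]
    # Keep every column range except the cis-block (k == i).
    return [
        [row[starts[k]:starts[k + 1]] for row in rows]
        for k in range(len(chrom_lengths))
        if k != i
    ]
```

Each per-entity inference can then be run on `trans_blocks_for_entity(C, lengths, i)` for `i` in turn (or in parallel), which is what targeting the product of marginals amounts to in practice.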
## Appendix G SMC-ABC
We use a variant of ABC coupled with sequential Monte-Carlo (SMC) \[[30](https://arxiv.org/html/2605.21617#bib.bib30)\]. It consists of multiple rounds of ABC where, at each round, relevant $\{\theta^{k,*}\}_{k}$ are selected from the training set $\{(\theta^{n}, C^{n})\}_{n}$ depending on a closeness criterion between $C$ and $C_{\text{ref}}$. We then associate weights $\{w^{k}\}_{k}$ to those selected $\{\theta^{k,*}\}_{k}$, and use the set $\{(\theta^{k,*}, w^{k})\}_{k}$ to create the next population of $\{\theta^{n}\}_{n}$ for the next round of ABC. This sequential approach enables us to refine the relevant $\theta$ at each round. However, we need to define a metric for discriminating $(\theta^{n}, \theta^{m})$ based on their associated observations $(C^{n}, C^{m})$.
### G.1 With the Pearson correlation metric – ABC-Pearson
To measure the closeness between $C$ and $C_{\text{ref}}$, the Pearson correlation is commonly used \[[29](https://arxiv.org/html/2605.21617#bib.bib29),[23](https://arxiv.org/html/2605.21617#bib.bib23),[32](https://arxiv.org/html/2605.21617#bib.bib32)\]. We find that the vector-based Pearson correlation averaged over all trans-contact blocks is the most discriminative metric: each trans-contact block of $C$ and $C_{\text{ref}}$ is vectorized and the Pearson correlation is computed between both. We then average all the correlations over the trans-contact blocks (see Algorithm [2](https://arxiv.org/html/2605.21617#alg2)). However, this metric is fine-tuned to this specific inference task.
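The block-averaged metric described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, blocks are represented as lists of rows, and returning a correlation of zero for a constant block (where the Pearson correlation is undefined) is an assumption made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant block contributes zero
    return sxy / math.sqrt(sxx * syy)


def block_pearson(blocks, ref_blocks):
    """Average, over trans-blocks, of the Pearson correlation between
    the vectorized simulated block and the vectorized reference block."""
    corrs = []
    for block, ref in zip(blocks, ref_blocks):
        flat_block = [v for row in block for v in row]
        flat_ref = [v for row in ref for v in row]
        corrs.append(pearson(flat_block, flat_ref))
    return sum(corrs) / len(corrs)
```

Averaging per block, instead of correlating the whole flattened map at once, gives every trans-block the same weight regardless of its size.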
Algorithm 2 SMC-ABC based on Pearson correlation, inspired from \[[30](https://arxiv.org/html/2605.21617#bib.bib30)\]

Input: $T$ rounds, prior $p$, train set of size $N$, acceptance rate $5\%$, perturbation kernel $K = \mathcal{N}(., \sigma^{2}\text{Id})$ ($\sigma =$ resolution (bp))

Return: $\theta \sim p(\theta|\text{corr}(C, C_{\text{ref}}) \geq \epsilon_{\text{corr}})$

round $t = 0$
- sample $\theta^{n} \sim p$, and $C^{n} \sim p(.|\theta^{n}), n \in \llbracket 1, N \rrbracket$
- compute $\text{corr}(C^{n}, C_{\text{ref}})$ and keep the top $5\%$ of $\{\theta^{n}\}_{n}$ in terms of the highest correlation: $\{\theta^{m,0}, m \in \llbracket 1, M \rrbracket\}$ with $M = 5\% \times N$
- compute weights $\{w^{m,0} = \frac{1}{M}, m \in \llbracket 1, M \rrbracket\}$

output round $t = 0$: $\{(\theta^{m,0}, w^{m,0})\}_{m \in \llbracket 1, M \rrbracket}$

for $0 < t < T$ do

round $t$
- from the previous accepted $\{\theta^{m,t-1}\}_{m \in \llbracket 1, M \rrbracket}$, sample $\{\bar{\theta}^{k}, k \in \llbracket 1, M \rrbracket\}$ from the multinomial $\mathcal{M}(\{\theta^{m,t-1}\}_{m}, \{w^{m,t-1}\}_{m})$ with replacement
- perturb $\frac{N}{M}$ times the $M$ samples $\bar{\theta}^{k}$ to have $N$ samples $\theta^{n}$: $\theta^{n} \leftarrow \bar{\theta}^{k} + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^{2}\text{Id})$ for $k = n \text{ mod } M$ and $n = 1, \ldots, N$
- check that $\theta^{n}$ is within the prior bounds; otherwise, set $\theta^{n} \leftarrow \bar{\theta}^{k}$
- from this set $\{\theta^{n}\}_{n \in \llbracket 1, N \rrbracket}$, sample $C^{n} \sim p(.|\theta^{n}), n \in \llbracket 1, N \rrbracket$
- compute $\text{corr}(C^{n}, C_{\text{ref}})$ and keep the top $5\%$ of $\{\theta^{n}\}_{n}$ in terms of the highest correlation: $\{\theta^{m,t}, m \in \llbracket 1, M \rrbracket\}$
- compute the corresponding weights $w^{m,t} = \frac{p(\theta^{m,t})}{\sum_{k=1}^{M} w^{k,t-1} K(\theta^{m,t}; \theta^{k,t-1})}$

output round $t$: $\{(\theta^{m,t}, w^{m,t})\}_{m \in \llbracket 1, M \rrbracket}$

end for

return accepted samples $\theta^{n} \sim p(\theta|\text{corr}(C^{n}, C_{\text{ref}}) \geq \epsilon_{\text{corr}})$
When $\epsilon_{\text{corr}} \to 1$, $p(\theta|\text{corr}(C, C_{\text{ref}}) \geq \epsilon_{\text{corr}}) \to p(\theta|C_{\text{ref}})$ \[[30](https://arxiv.org/html/2605.21617#bib.bib30)\].
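For readers who prefer code, the rounds of Algorithm 2 can be sketched in Python on a toy problem. This is a hedged illustration, not the authors' implementation: it assumes a one-dimensional $\theta$ with a uniform prior, a hypothetical toy simulator (`toy_simulate`, a noisy Gaussian bump on a 1D vector standing in for a contact map), a plain Pearson correlation in place of the block-averaged one, and a weight normalization added purely for convenience.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def toy_simulate(theta, rng, n_bins=40, width=2.0, noise=0.05):
    """Hypothetical simulator: a noisy bump centered on theta."""
    return [math.exp(-0.5 * ((x - theta) / width) ** 2) + rng.gauss(0, noise)
            for x in range(n_bins)]


def smc_abc(simulate, corr, c_ref, prior_lo, prior_hi, sigma,
            n=200, rounds=3, keep=0.05, rng=None):
    """SMC-ABC with a uniform prior on [prior_lo, prior_hi] (assumption)."""
    rng = rng or random.Random()
    m = max(1, round(keep * n))
    prior_pdf = 1.0 / (prior_hi - prior_lo)

    def kernel(x, mean):
        # Density of the Gaussian perturbation kernel K(x; mean).
        return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def select(thetas):
        # Simulate once per theta and keep the m most correlated ones.
        ranked = sorted(thetas, key=lambda t: corr(simulate(t, rng), c_ref), reverse=True)
        return ranked[:m]

    # Round 0: sample from the prior, uniform weights.
    particles = select([rng.uniform(prior_lo, prior_hi) for _ in range(n)])
    weights = [1.0 / m] * m

    for _ in range(1, rounds):
        resampled = rng.choices(particles, weights=weights, k=m)
        proposals = []
        for i in range(n):
            base = resampled[i % m]
            cand = base + rng.gauss(0.0, sigma)
            # Fall back to the unperturbed sample outside the prior bounds.
            proposals.append(cand if prior_lo <= cand <= prior_hi else base)
        new_particles = select(proposals)
        new_weights = [prior_pdf / sum(w * kernel(t, p) for w, p in zip(weights, particles))
                       for t in new_particles]
        total = sum(new_weights)  # normalization added for convenience
        particles, weights = new_particles, [w / total for w in new_weights]
    return particles, weights
```

A typical call is `smc_abc(toy_simulate, pearson, c_ref, 0.0, 39.0, sigma=1.0)`, after which the weighted particles approximate the ABC posterior of the toy problem.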
### G.2 With a summary statistic and the classical $l^{2}$-norm – ABC-Transf or ABC-CNN
Instead of looking for a specific metric to compare $C$ to $C_{\text{ref}}$, we choose to use the classical $l^{2}$-norm. For this, we need a summary statistic $S$ that will extract the main features of $C$ and project it into a low-dimensional vector. We employ BlockFormer as a *data-driven summary statistic* (method referred to as ABC-Transf), motivated by the fact that $\mathcal{BF}_{\phi}(C)$ pre-trained with Equation [1](https://arxiv.org/html/2605.21617#S3.E1) approximates the conditional expectation $\mathbb{E}\left[\theta|C\right]$, which is, by definition, the solution to the regression of $\theta$ from $C$:

$$\mathbb{E}\left[\theta|C\right] = \underset{S \in \mathcal{F}}{\text{argmin}}\ \mathbb{E}\left[\|S(C) - \theta\|^{2}_{2}\right]~,$$

where $\mathcal{F}$ is the set of square integrable functions. As shown in \[[19](https://arxiv.org/html/2605.21617#bib.bib19)\], such a choice is relevant because $\mathbb{E}\left[\theta|C\right]$ preserves first-order information when summarizing $C$ as the threshold $\epsilon$ approaches $0$. We also compare the approach with another architecture for $S$, composed of a CNN shared across chromosomes followed by a per-chromosome set of MLPs (method referred to as ABC-CNN).
Algorithm 3 ABC with learned summary statistic, inspired from \[[19](https://arxiv.org/html/2605.21617#bib.bib19)\]

Input: (deep) neural network (DNN) $S_{\phi}$, threshold $\epsilon$, Euclidean norm in $\mathbb{R}^{n}$, simulator, prior $p$

Return: Samples $\theta$ from the estimated posterior density $p(. \mid \|S_{\phi}(C) - S_{\phi}(C_{\text{ref}})\|_{2} \leq \epsilon)$

Stage 1: learn the summary statistic $S_{\phi}(.)$ s.t. $S_{\phi}(C) \approx \mathbb{E}\left[\theta|C\right]$
- generate a train set $(\theta^{n}, C^{n})$ from $p(\theta)p(C|\theta)$
- train a DNN $S_{\phi}$ on this train set with the loss to minimize in $\phi$: $\widehat{\mathcal{L}}_{\text{DNN}}(\phi) = \frac{1}{N}\sum_{1 \leq n \leq N} \|S_{\phi}(C^{n}) - \theta^{n}\|^{2}_{2}$

output $S_{\phi}(.)$ s.t. $S_{\phi}(C) \approx \mathbb{E}\left[\theta|C\right]$

Stage 2: run ABC with the learned summary statistic $S_{\phi}$ and the criterion $\|S_{\phi}(C) - S_{\phi}(C_{\text{ref}})\|_{2} \leq \epsilon$

return accepted samples $\theta^{n} \sim p(. \mid \|S_{\phi}(C^{n}) - S_{\phi}(C_{\text{ref}})\|_{2} \leq \epsilon)$
For $S_{\phi}$ informative enough and when $\epsilon\to 0$, as shown in [[19](https://arxiv.org/html/2605.21617#bib.bib19)],

$$p(\theta\mid\|S_{\phi}(C)-S_{\phi}(C_{\text{ref}})\|\leq\epsilon)\to p(\theta|S_{\phi}(C_{\text{ref}}))\approx p(\theta|C_{\text{ref}}).$$
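The two stages of ABC with a learned summary statistic can be illustrated on a toy problem. The sketch below is not the authors' pipeline: it assumes a hypothetical scalar simulator (noisy copies of $\theta$), replaces the DNN by a linear least-squares regressor, and uses plain rejection ABC rather than a sequential variant. All names (`simulator`, `fit_summary`, `rejection_abc`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, d=20, noise=0.1):
    """Toy stand-in for p(C | theta): d noisy copies of each scalar theta."""
    theta = np.atleast_1d(theta)
    return theta[:, None] + noise * rng.standard_normal((theta.size, d))

def fit_summary(theta_train, c_train):
    """Stage 1: regress theta on C by least squares (linear stand-in for the DNN)."""
    design = np.hstack([c_train, np.ones((c_train.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, theta_train, rcond=None)
    return lambda c: np.hstack([c, np.ones((c.shape[0], 1))]) @ weights

def rejection_abc(summary, c_ref, n_sim, eps):
    """Stage 2: keep prior draws whose summary is within eps of the observed summary."""
    theta = rng.uniform(0.0, 1.0, size=n_sim)        # Uniform(0, 1) prior (assumption)
    dist = np.abs(summary(simulator(theta)) - summary(c_ref))  # l2 norm in 1D
    return theta[dist <= eps]

theta_train = rng.uniform(0.0, 1.0, size=5000)
summary = fit_summary(theta_train, simulator(theta_train))

theta_ref = 0.3                                      # hypothetical ground truth
c_ref = simulator(theta_ref)                         # plays the role of C_ref
accepted = rejection_abc(summary, c_ref, n_sim=20000, eps=0.02)
print(accepted.size, accepted.mean())
```

The regression loss minimized in `fit_summary` is the empirical squared error of Stage 1, so the fitted map plays the role of $S_{\phi}(C)\approx\mathbb{E}[\theta|C]$ within the linear family.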
## Appendix H SNPE: NPE-Transf or NPE-CNN
SMC-ABC yields only samples from the target posterior distribution $p(\theta|C_{\text{ref}})$, but evaluating log-probabilities can be useful for downstream tasks. In contrast, Neural Posterior Estimation (NPE) trains a conditional normalizing flow $p_{\psi}(\theta|C)$ [[28](https://arxiv.org/html/2605.21617#bib.bib28),[14](https://arxiv.org/html/2605.21617#bib.bib14)] to estimate the posterior distribution. It is then easy to sample from the posterior and to evaluate its log-probabilities. To ensure that $p_{\psi}(\theta|C_{\text{ref}})$ is close to $p(\theta|C_{\text{ref}})$, we minimize the Kullback–Leibler divergence ($D_{\mathrm{KL}}$) between both densities, averaged over the observations $C$, that is $\mathbb{E}_{C}\big[D_{\mathrm{KL}}\big(p(\cdot|C)\|p_{\psi}(\cdot|C)\big)\big]$. After simplifications and using a Monte Carlo estimator, the flow is trained to minimize, with $(\theta^{n},C^{n})\sim p(\theta)p(C|\theta)$:

$$\widehat{\mathcal{L}}_{\text{NPE}}(\psi)=-\frac{1}{N}\sum_{n}\log\big(p_{\psi}(\theta^{n}|C^{n})\big).$$

Once trained, we obtain an amortized estimator of the posterior densities $p(\theta|C)$ valid for any $C$. We just have to plug in $C_{\text{ref}}$ to get the estimated posterior density $p_{\psi}(.|C_{\text{ref}})$ (see Algorithm [4](https://arxiv.org/html/2605.21617#alg4)). Since we are actually interested in the posterior at $C_{\text{ref}}$, parameters $\theta$ with very low posterior density may not be useful for learning $\psi$. Thus, we consider a sequential variant of NPE (SNPE) with several rounds to iteratively refine the posterior estimate [[14](https://arxiv.org/html/2605.21617#bib.bib14)]. From the second round onward, the $\theta^{n}$ are sampled from the latest estimated posterior instead of the prior. This way, training samples are more informative about $C_{\text{ref}}$, gradually improving the learning of $\psi$. As the observations $C$ are high-dimensional (e.g. 2D matrices), we encode them with a summary statistic $S_{\phi}$ before providing them to the normalizing flow $p_{\psi}$. This way, we actually learn $p_{\psi}(.|S_{\phi}(C_{\text{ref}}))$, which should be close to $p_{\psi}(.|C_{\text{ref}})$ if $S_{\phi}$ is a *sufficient* summary statistic. When we use the (frozen) pre-trained BlockFormer, the method is referred to as NPE-Transf. For comparison, we also use the CNN-based architecture in the method referred to as NPE-CNN.
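The simplification from the averaged KL divergence to the NPE loss can be spelled out as follows. This is the standard argument, written here as a sketch; it only uses the identity $p(C)\,p(\theta|C)=p(\theta)\,p(C|\theta)$ and the fact that the first term does not depend on $\psi$.

```latex
\begin{aligned}
\mathbb{E}_{C}\big[D_{\mathrm{KL}}\big(p(\cdot|C)\,\|\,p_{\psi}(\cdot|C)\big)\big]
&= \mathbb{E}_{C}\,\mathbb{E}_{\theta\sim p(\cdot|C)}\big[\log p(\theta|C)-\log p_{\psi}(\theta|C)\big] \\
&= \underbrace{\mathbb{E}_{(\theta,C)\sim p(\theta)p(C|\theta)}\big[\log p(\theta|C)\big]}_{\text{independent of }\psi}
 \;-\; \mathbb{E}_{(\theta,C)\sim p(\theta)p(C|\theta)}\big[\log p_{\psi}(\theta|C)\big] \\
&\approx \text{const} \;-\; \frac{1}{N}\sum_{n=1}^{N}\log p_{\psi}(\theta^{n}|C^{n}),
\qquad (\theta^{n},C^{n})\sim p(\theta)p(C|\theta).
\end{aligned}
```

Minimizing the last line in $\psi$ is therefore equivalent to minimizing $\widehat{\mathcal{L}}_{\text{NPE}}(\psi)$, up to Monte Carlo error.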
Algorithm 4 SNPE, inspired from [[28](https://arxiv.org/html/2605.21617#bib.bib28)] and [[14](https://arxiv.org/html/2605.21617#bib.bib14)]

Input: $T$ rounds, posterior density estimator $p_{\psi}$, simulator, prior $p$, simulation budget $N$, observation $C_{\text{ref}}$, pre-learned summary statistic $S_{\phi}$

Return: the estimated posterior density $p_{\psi}(.|S_{\phi}(C_{\text{ref}}))$

- for round $t=1,\dots,T$ do
  - if $t=1$ then $p_{t}=p$ endif
  - for $n=1,\dots,N$ do
    - sample $\theta^{n}\sim p_{t}$
    - sample $C^{n}\sim p(.|\theta^{n})$
  - endfor
  - train the posterior estimator $p_{\psi}$ on $\mathcal{D}=\{(\theta^{n},C^{n})\}_{n}$ with the loss to minimize in $\psi$: $\widehat{\mathcal{L}}_{\text{NPE}}(\psi)=-\frac{1}{N}\sum_{1\leq n\leq N}\log p_{\psi}(\theta^{n}|S_{\phi}(C^{n}))$
  - use $p_{\psi}$ to construct the estimated posterior: $p_{\psi}(.|S_{\phi}(C_{\text{ref}}))$
  - define the proposal for the next round: $p_{t+1}(\theta)=p_{\psi}(\theta|S_{\phi}(C_{\text{ref}}))$
- endfor
- return samples $\theta^{n}\sim p_{\psi}(\theta|S_{\phi}(C_{\text{ref}}))$
## Appendix I Application: centromere inference for *Saccharomyces cerevisiae*, posterior estimation
*S. cerevisiae* has a genome of $16$ chromosomes, so we look for the centromeres $\theta=(\theta_{1},\dots,\theta_{16})$. To reduce the dimension of the problem, we carry out $16$ parallel inferences: one per dimension of $\theta$. Thus, we have $16$ 1D inference problems where the parameter $\theta_{i}$ is drawn from a uniform prior whose range is the size of chromosome $i$ in bp. The simulator creates the $i^{\mathrm{th}}$ row of trans-blocks of a contact map $C$ (denoted $C_{i}$). All the inference methods target the posterior $p(\theta_{i}|C_{\text{ref},i})$. The summary statistic $S_{\phi}$ that projects each row of trans-contact blocks $C_{i}$ to $\theta_{i}$ is BlockFormer. To construct the parameter estimate using BlockFormer, we randomly select $10$ sets of $2$ trans-blocks from $C_{i}$ and pass those $10$ maps to the transformer. We obtain $10$ candidates and the final estimate is the mean over these $10$ candidates. We compare with the case where $S_{\phi}$ is a CNN that captures the information of $C_{i}$, followed by an MLP that projects this information onto $\theta_{i}$. On the one hand, as the rows of trans-contact blocks $C_{i}$ are quite similar, we choose a CNN architecture shared between chromosomes. On the other hand, each MLP depends on the size of its chromosome, so a chromosome-specific architecture is needed for this part of the network. The summary statistic is used in both methods, ABC and NPE. When BlockFormer is used, the methods are referred to as ABC-Transf and NPE-Transf; when the CNN is used, they are referred to as ABC-CNN and NPE-CNN. For the NPE methods, since we consider sequential methods that are specific to one observation $C_{\text{ref},i}$, we need to learn one normalizing flow $p_{\psi}$ per parameter.
However, we choose the same density estimator for all inferences, a Masked Autoregressive Flow (MAF), and the same sequential approach, SNPE-C [[14](https://arxiv.org/html/2605.21617#bib.bib14)], for the experiments. The MAF consists of 11 sequential transforms: 1 PointwiseAffineTransform followed by 5 blocks of [MAF + RandomPermutation], where each MAF is parameterized by a MADE network with 2 feedforward blocks (MaskedLinear 50→50) and a final MaskedLinear 50→2. As the ground-truth posterior $p^{*}(\theta|C_{\text{ref}})$ is unknown, we choose to model it via independent Gaussian distributions. We assume conditional independence, as explained in Section [F.1](https://arxiv.org/html/2605.21617#A6.SS1), such that:

$$p^{*}(\theta|C_{\text{ref}})\approx\prod_{i=1}^{16}p(\theta_{i}|C_{\text{ref},i})=\prod_{i=1}^{16}\mathcal{N}(\theta_{i};\theta_{\text{ref},i},\sigma_{\text{ref}}^{2}),$$

where $\sigma_{\text{ref}}=50$, based on the fact that centromeres in yeasts span genomic regions of roughly $100$ bp centered around $\theta_{\text{ref}}$ [[7](https://arxiv.org/html/2605.21617#bib.bib7)]. The choice of this posterior is further motivated by the result in [[19](https://arxiv.org/html/2605.21617#bib.bib19)], which states that if the posterior distribution belongs to the exponential family (e.g. Gaussian density), then the summary statistic $S_{\phi}=\mathbb{E}\left[\theta|C\right]$ is a *sufficient* summary statistic.
Figure 12: We report (a) the absolute error per dimension of $\theta$ between $\theta_{\text{ref}}$ and the mean $\theta$ computed over the $5\%$ best $\theta$ according to the ABC criterion, or over $\theta$ sampled from the flow, as well as (b) the MMD and (c) the Wasserstein-2 distance between $p(\theta|C_{\text{ref}})$ and $\delta_{\theta_{\text{ref}}}$. The horizontal dotted line in the top figure stands for the resolution of the contact map $C_{\text{ref}}$ (in bp).

As the method NPE-Transf outperforms all the others and gives access to the density, we further provide calibration diagnostics. We use the normalizing flow obtained in the final inference round to evaluate whether the predicted posterior distributions are well calibrated. Specifically, we report simulation-based calibration (SBC) ranks, which check whether the true parameters are uniformly distributed within posterior samples, and the expected coverage at level $\alpha=0.9$, which measures whether the nominal credible intervals match their empirical frequency.


Figure 13: (top) Simulation-based calibration rank. (bottom) Expected coverage at level $\alpha=90\%$.

The posteriors are globally well calibrated: in many dimensions, the histograms are roughly flat and the expected coverage is close to the nominal level (e.g. for $\theta_{1}$, $\theta_{6}$ or $\theta_{11}$). However, some dimensions exhibit U-shaped histograms as well as coverage under the nominal level (e.g. for $\theta_{4}$ or $\theta_{8}$), indicating overconfident posteriors where the true value tends to fall in the tails. Conversely, other dimensions show inverted-U-shaped histograms as well as coverage over the nominal level (e.g. for $\theta_{7}$ or $\theta_{9}$), reflecting overly diffuse posteriors and conservative uncertainty estimates. Overall, these diagnostics suggest good global calibration, with noticeable variability across parameters, likely due to differences in identifiability across chromosomes.
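The two diagnostics discussed above can be sketched on a toy model where the exact posterior is known. This is an illustration only, not the paper's evaluation code: it assumes a conjugate Gaussian model ($\theta\sim\mathcal{N}(0,1)$, $x|\theta\sim\mathcal{N}(\theta,1)$, hence posterior $\mathcal{N}(x/2,1/2)$) in place of the trained flow, and central credible intervals for the coverage.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_post = 2000, 999   # number of simulated datasets, posterior draws each

# Draw ground-truth parameters from the prior and simulate one observation each.
theta_true = rng.standard_normal(n_trials)             # theta ~ N(0, 1)
x_obs = theta_true + rng.standard_normal(n_trials)     # x | theta ~ N(theta, 1)

# Posterior samples; here the exact posterior N(x/2, 1/2) stands in for the flow.
post = x_obs[:, None] / 2 + np.sqrt(0.5) * rng.standard_normal((n_trials, n_post))

# SBC rank: number of posterior draws below the true value (uniform if calibrated).
ranks = (post < theta_true[:, None]).sum(axis=1)
hist, _ = np.histogram(ranks, bins=10, range=(0, n_post))

# Expected coverage: how often the central 90% credible interval contains the truth.
lo, hi = np.quantile(post, [0.05, 0.95], axis=1)
coverage = np.mean((theta_true >= lo) & (theta_true <= hi))

print(hist, coverage)
```

With an overconfident estimator (posterior too narrow), the rank histogram would pile up at both ends and the coverage would drop below the nominal level; an overly diffuse estimator gives the opposite pattern.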
## Appendix J Experiments
### J.1 A model flexible to various sizes of blocks
Figure 14: Boxplot of the absolute error between estimated $\hat{\theta}_{i}$ and ground truth $\theta_{i}$ averaged over $1\,000$ synthetic contact maps per number of trans-blocks $k$, where $k$ varies from $1$ to $10$. The maps are generated at resolution $32$ kb from a synthetic genome of $k+1$ chromosomes, whose sizes vary from $2\times 10^{5}$ bp to $2$ Mbp. At this resolution, each trans-block has a size varying between $6$ and $62$ pixels. In all cases, more than $75\%$ of the parameter estimation errors lie under the resolution.
### J.2 A model flexible to various sequencing depths
We test the accuracy of BlockFormer on $100$ synthetic maps generated from the reference genome of S.C. and also on the reference map of S.C., always keeping $10\%$ or $50\%$ of the sequencing depth. For each number of blocks $k$ and each chromosome $i$, we report in Figure [15](https://arxiv.org/html/2605.21617#A10.F15) the absolute error $err_{i}^{k}$, defined as follows: per number of blocks $k$, we make 10 random choices ($l=1,\dots,10$) of $k$ blocks of various sizes from the map (e.g. for $k=2$, we choose blocks $(2,9)$ (choice $l=1$), blocks $(1,7)$ (choice $l=2$), …). Then, for target chromosome $i$, $k$ blocks, and map $n$, we compute $e^{k,n}_{i}=\frac{1}{10}\sum_{l=1}^{10}|\hat{\theta}_{i}^{n,k,l}-\theta_{i}|$. We report $err^{k}_{i}=\frac{1}{100}\sum_{n=1}^{100}e^{k,n}_{i}$. Figure [15](https://arxiv.org/html/2605.21617#A10.F15) shows that the model generalizes well to different sequencing depths: for synthetic data, the accuracy does not change from $10\%$ to $50\%$ of the sequencing depth and the error stays under the resolution in most cases. For the reference map, keeping $50\%$ of the sequencing depth does not drastically change the accuracy, but at $10\%$ we start to see a slight deterioration.
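The nested averaging that defines $err_{i}^{k}$ can be written in a few lines. The sketch below assumes the predictions for a fixed $k$ are stored in an array of shape (maps, block choices, chromosomes); the function name and the toy values are illustrative, not taken from the paper.

```python
import numpy as np

def error_per_chromosome(theta_hat, theta):
    """Compute err_i^k for a fixed number of blocks k.

    theta_hat: array of shape (n_maps, n_choices, n_chrom), predictions
               theta_hat_i^{n,k,l} for map n, block choice l, chromosome i.
    theta:     array of shape (n_chrom,), ground-truth centromere positions.
    """
    # e_i^{k,n}: average absolute error over the random block choices l.
    e_kn = np.abs(theta_hat - theta).mean(axis=1)
    # err_i^k: average of e_i^{k,n} over the maps n.
    return e_kn.mean(axis=0)

# Toy example: 3 maps, 2 block choices, 2 chromosomes (values in bp).
theta = np.array([100.0, 200.0])
theta_hat = np.empty((3, 2, 2))
theta_hat[..., 0] = 110.0   # always 10 bp off on chromosome 1
theta_hat[..., 1] = 170.0   # always 30 bp off on chromosome 2
err = error_per_chromosome(theta_hat, theta)
print(err)
```

In the setting described above, `n_maps` would be 100 and `n_choices` would be 10.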

(a) Synthetic maps, 10% sequencing depth.
(b) Synthetic maps, 50% sequencing depth.
(c) Reference map, 10% sequencing depth.
(d) Reference map, 50% sequencing depth.

Figure 15: Absolute error per centromere over $100$ synthetic contact maps generated from the S.C. genome, (a) and (b), and one reference map, (c) and (d), with $10\%$ or $50\%$ of sequencing depth. Chromosomes on the x-axis are sorted by length (bp). Color shades range from blue to red as the number of blocks $k$ increases from $1$ to $15$.
### J.3 A model flexible to various spot pattern settings
We compare BlockFormer with Centurion in settings that differ slightly from the centromere inference task. The spots have different shapes that can appear in realistic biological cases. In each setting, we generate $100$ synthetic maps at resolution $30$ kb based on the reference genome of S.C. We report the mean absolute error between $\hat{\theta}$ and $\theta$ in bp. $\hat{\theta}$ is constructed as in Appendix [J.4.1](https://arxiv.org/html/2605.21617#A10.SS4.SSS1).
#### J.3.1 Gaussian spots
Centurion is fine-tuned to the centromere inference problem based on Hi-C maps. We first compare BlockFormer with it in a synthetic setting with Gaussian spots of different sizes and locations. In this setting, favorable to both methods, our network is much faster than Centurion and reaches similar accuracy: around $10$ kb when working at resolution $30$ kb.

Figure 16: Mean absolute error (bp, left) and runtime (s, right) over $100$ synthetic maps generated from the S.C. genome at resolution $30$ kb. The spots in each trans-block are Gaussian. The black dotted line in the left plot represents the resolution of the maps.
#### J.3.2 Square spots
Figure 17: One synthetic map simulated from the S.C. reference genome with square spots in each trans-block.
#### J.3.3 Elliptical spots
In gene-dense regions, the chromatin is more dynamic, leading to fluctuating interactions and anisotropic spots. To represent such a situation, we generate maps with elliptical spots. In this setting, Centurion is slightly more accurate than our model but remains slower.

Figure 18: One synthetic map simulated from the S.C. reference genome with elliptical spots in each trans-block.

When the spots are elliptical, both methods output parameter estimates with an error under the resolution. Centurion is $1.5$ times more accurate but more than $10$ times slower than BlockFormer.

Figure 19: Mean absolute error (bp, left) and runtime (s, right) over $100$ synthetic maps generated from the S.C. genome at resolution $30$ kb. The spots in each trans-block are elliptical. The black dotted line in the left plot represents the resolution of the maps.
#### J.3.4 Ring spots
Interactions between entities can also happen at a preferred distance, with an exclusion at the center likely due to structural constraints (e.g. protein complexes blocking direct contact). To mimic this scenario, we generate maps with ring-shaped spots. When the spots are ring-shaped, Centurion is more accurate than our method but remains too slow.

Figure 20: One synthetic map simulated from the S.C. reference genome with ring spots in each trans-block.

When the spots in the maps are rings instead of Gaussian spots, Centurion has difficulty producing accurate results in a reasonable time. Our approach is $1.2$ times less accurate than Centurion, with an error around twice the resolution, but its runtime is nearly $10^{2}$ times smaller.

Figure 21: Mean absolute error (bp, left) and runtime (s, right) over $100$ synthetic maps generated from the S.C. genome at resolution $30$ kb. The spots in each trans-block are rings. The black dotted line in the left plot represents the resolution of the maps.
#### J.3.5 Multiple spots
Figure 22: One synthetic map simulated from the S.C. genome with two Gaussian spots per trans-block.

When the contact maps contain $2$ Gaussian spots per trans-block, one major and one auxiliary (smaller and less bright), our model outperforms Centurion in both speed and accuracy, achieving 1.8 times better accuracy while running 10 times faster. Moreover, our method manages to estimate $\theta$ with an error under the resolution.
#### J.3.6 Noisy map
Figure 23: One synthetic map simulated from the S.C. reference genome with noise.

Here the map is made noisy by adding, per trans-block, traps consisting of $5$ random pixels with intensity equal to the maximum of the block. In this setting, Centurion produces estimates with an error exceeding the map resolution, whereas our method is able to estimate $\theta$ with sub-resolution accuracy. Centurion is $2.5$ times less accurate than our method and nearly $10$ times slower.

Figure 24: Absolute error (bp, left) and runtime (s, right) over $100$ synthetic maps generated from the S.C. genome at resolution $30$ kb. Per trans-block, we add $5$ random pixels with intensity equal to the maximum of the block. The black dotted line in the left plot represents the resolution of the maps.
### J.4 Inference from real-world contact maps
#### J.4.1 Strategy to construct the parameter estimate $\hat{\theta}$
In all experiments, $\hat{\theta}$ is estimated independently for each component. In contact maps, some blocks may be large or noisy, which can corrupt the centromere signal and make inference difficult. This can lead to inaccurate estimates depending on the selected blocks. We exploit the flexibility of our model with respect to block sizes to bypass this issue. If we want to use $k$ trans-blocks, for each parameter component $i$, we randomly sample $10$ sequences of $k$ blocks and provide them to the network, producing $10$ candidate predictions. The final estimate $\hat{\theta}_{i}$ is computed as the mean over these candidates.
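The candidate-averaging strategy above can be sketched as follows. The `predict` callable is a hypothetical stand-in for a forward pass of the trained network on a sequence of blocks, and sampling the $k$ blocks without replacement within a sequence is an assumption; the source does not specify either detail.

```python
import random
from statistics import mean

def estimate_component(predict, blocks, k, n_candidates=10, seed=0):
    """Average predictions over random sequences of k trans-blocks.

    predict: hypothetical callable mapping a list of k blocks to a scalar
             estimate (stand-in for the trained network).
    blocks:  list of the trans-blocks available for the target chromosome.
    """
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_candidates):
        chosen = rng.sample(blocks, k)       # k distinct blocks (assumption)
        candidates.append(predict(chosen))
    return mean(candidates)

# Toy usage: each "block" is already a noisy position estimate in bp, and the
# stand-in predictor simply averages the blocks it receives.
toy_blocks = [151_000.0, 149_500.0, 150_200.0, 180_000.0, 150_100.0]
theta_hat_i = estimate_component(lambda bs: sum(bs) / len(bs), toy_blocks, k=2)
print(theta_hat_i)
```

Averaging over several random block subsets dilutes the effect of a single corrupted block (here the outlier at 180 000 bp) compared with relying on one fixed pair of blocks.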
#### J.4.2 Centromere identification
We first provide genomic information for each studied species.

| Group | Species | Abbreviation | Number of chromosomes | Genome range (bp) |
| --- | --- | --- | --- | --- |
| Yeasts | Kluyveromyces lactis | K.L. | 6 | 1 062 590 – 2 602 197 |
| Yeasts | Lachancea kluyveri | L.K. | 8 | 951 467 – 2 314 951 |
| Yeasts | Lachancea thermotolerans | L.T. | 8 | 687 718 – 1 720 065 |
| Yeasts | Saccharomyces cerevisiae | S.C. | 16 | 230 209 – 1 533 918 |
| Yeasts | Saccharomyces kudriavzevii | S.K. | 16 | 170 892 – 1 436 357 |
| Yeasts | Saccharomyces mikatae | S.M. | 16 | 188 471 – 1 444 590 |
| Yeasts | Schizosaccharomyces pombe | S.P. | 3 | 2 452 883 – 5 579 133 |
| Parasite | Plasmodium Falciparum | P.F. | 14 | 640 851 – 3 291 936 |
| Plant | Arabidopsis thaliana | A.T. | 5 | 18 585 056 – 30 427 671 |

Table 14: Species used in this study with abbreviations and chromosome counts.

We show the parameter estimation using our network with $2$ trans-blocks for the yeasts L.T. and L.K. at resolution $30$ kb (see Figure [25](https://arxiv.org/html/2605.21617#A10.F25)).
Figure 25: Centromere estimation for the yeasts L.T. (left) and L.K. (right), resolution $30$ kb.

In almost all dimensions, except for chromosome $7$ in L.T. and chromosome $8$ in L.K., the network manages to estimate $\theta_{i}$ with a precision close to the resolution: graphically, the positions are aligned with the interaction spots on the reference maps.
#### J.4.3 Loop localization
We evaluate the capacity of BlockFormer to estimate loops given Hi-C data from the human IMR90 cell line. Since full contact maps are significantly larger than those used during training, we restrict our analysis to genomic regions around the main diagonal, considered as local cis-blocks. Several structural characteristics require adaptations to fit BlockFormer:
- Loops are intra-chromosomal features, which requires working on individual cis-blocks instead of the multiple trans-blocks used for centromere identification.
- Cis-blocks are symmetric, causing loops to appear twice in the map. We thus provide the model with either the upper or lower triangular part of the map, or with localized patches around each loop.
- The strong signal along the main diagonal in cis-blocks can bias the estimation, motivating the use of an observed-over-expected normalization of the map (see Algorithm [5](https://arxiv.org/html/2605.21617#alg5)).
- The number and locations of loops within a given cis-block are unknown, whereas BlockFormer was trained on sequences of trans-blocks containing a single spot per block. We thus restrict our analysis to genomic regions where only one loop is visible.
- Each loop position is defined by two coordinates, whereas BlockFormer produces only a single scalar parameter per entity. We therefore perform two passes through the model to separately infer the x- and y-coordinates of each loop.

Since no experimentally validated ground-truth loop annotations are available, we use qualitative visual inspection and an external loop-calling method for quantitative evaluation. Specifically, we use Chromosight [[24](https://arxiv.org/html/2605.21617#bib.bib24)], a pattern-based algorithm for detecting structural features in Hi-C maps such as loops or TADs, as a reference for loop localization.
Algorithm 5 Observed-over-expected normalization

Input: contact map $C$ of size $(N\times N)$

Return: observed-over-expected map $OE$ of size $(N\times N)$

- compute the matrix of distances to the diagonal $D$ of size $(N\times N)$ such that $D_{i,j}=|i-j|$
- compute the expected map $E$ of size $(N\times N)$ such that $E_{i,j}=\underset{|k-l|=D_{i,j}}{\mathrm{mean}}(C_{k,l})$
- construct the observed-over-expected map $OE$ of size $(N\times N)$ such that $OE_{i,j}=\frac{C_{i,j}}{E_{i,j}}$
- return the observed-over-expected map $OE$ of size $(N\times N)$.
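Algorithm 5 translates almost directly into NumPy. The sketch below is one possible implementation, not the authors' code; the handling of distances whose expected value is zero (set to $0$ in the output) is an assumption, since the algorithm does not specify it.

```python
import numpy as np

def observed_over_expected(c):
    """Divide each entry of a square contact map by the mean of its diagonal."""
    c = np.asarray(c, dtype=float)
    n = c.shape[0]
    i, j = np.indices((n, n))
    d = np.abs(i - j)                                  # D[i, j] = |i - j|
    # Mean contact value at each distance to the diagonal.
    sums = np.bincount(d.ravel(), weights=c.ravel(), minlength=n)
    counts = np.bincount(d.ravel(), minlength=n)
    expected = (sums / counts)[d]                      # E[i, j]
    # Assumption: where the expected value is zero, the output is set to 0.
    return np.divide(c, expected, out=np.zeros_like(c), where=expected > 0)

# Toy check: a map that depends only on |i - j| is flattened to all ones.
idx = np.arange(5)
toy = 1.0 / (1.0 + np.abs(np.subtract.outer(idx, idx)))
oe = observed_over_expected(toy)
print(oe)
```

After this normalization, the distance-dependent decay along the main diagonal is removed, so that local enrichments such as loops stand out against a background close to $1$.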
We first report quantitative performance results by treating Chromosight detections as reference loop positions in an automated procedure. We compare BlockFormer's performance with Centurion. As Centurion is fine-tuned for centromere identification using genome-wide contact maps, it faces the same limitations as our approach (single block, unknown number of spots per block, estimation of a pair of coordinates per spot). Therefore, BlockFormer and Centurion need to operate on regions of the Hi-C map that are clean and contain only a single loop. A pre-localization step is necessary to identify such regions. To this end, we apply to the Hi-C map the simple peak detection algorithm presented in Algorithm [6](https://arxiv.org/html/2605.21617#alg6).
Algorithm 6 Loop pre-localization

Input: observed-over-expected map $OE$ of size $(N\times N)$

Return: pre-located loops

- smooth the map via Gaussian blur with $\sigma=1$ to denoise it.
- construct a threshold defining what counts as a peak in the map via a percentile (e.g. $92\%$).
- find local maxima in the map: compare each pixel to its neighborhood region of a given size (e.g. $5$).
- filter candidates: a peak is a local maximum with intensity above the threshold.
- filter out peaks that are too close to the diagonal, within a given distance (e.g. $3$), because they correspond to self-interactions.
- return a set of pre-localized loops.
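A NumPy-only sketch of Algorithm 6 is given below. It is an illustration under stated assumptions rather than the authors' implementation: the blur is a separable Gaussian truncated at $3\sigma$ with reflective padding, the neighborhood is a square window, and a peak is discarded when $|i-j|$ is less than or equal to the diagonal distance. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur, kernel truncated at 3 sigma, reflective borders."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def prelocalize_loops(oe, sigma=1.0, percentile=92, size=5, min_diag_dist=3):
    """Return (i, j) pixel coordinates of candidate loops in an OE map."""
    smooth = gaussian_blur(np.asarray(oe, dtype=float), sigma)
    threshold = np.percentile(smooth, percentile)
    # Local maxima: compare each pixel with the max of its size x size window.
    half = size // 2
    padded = np.pad(smooth, half, mode="constant", constant_values=-np.inf)
    local_max = sliding_window_view(padded, (size, size)).max(axis=(2, 3))
    is_peak = (smooth == local_max) & (smooth > threshold)
    i, j = np.nonzero(is_peak)
    # Assumption: drop peaks with |i - j| <= min_diag_dist (self-interactions).
    keep = np.abs(i - j) > min_diag_dist
    return list(zip(i[keep].tolist(), j[keep].tolist()))

# Toy map: two symmetric off-diagonal spots and one spot next to the diagonal.
toy = np.zeros((40, 40))
toy[10, 25] = toy[25, 10] = 10.0
toy[20, 21] = 10.0
peaks = prelocalize_loops(toy)
print(peaks)
```

On a symmetric cis-block each loop is detected twice, once on each side of the diagonal, which is consistent with the remark above about restricting the input to one triangular part.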
The full process for loop localization using BlockFormer or Centurion is described in Algorithm [7](https://arxiv.org/html/2605.21617#alg7).
Algorithm 7 Loop detection with BlockFormer

Input: region from Hi-C map $C$ of size $(N\times N)$

Return: loops estimation

- compute the observed-over-expected map $OE$ from $C$ via Algorithm [5](https://arxiv.org/html/2605.21617#alg5).
- find pre-localized loops via Algorithm [6](https://arxiv.org/html/2605.21617#alg6) (referred to as pre-loc loops).
- find reference loop positions via Chromosight (referred to as true loops).
- filter pre-loc loops to ensure that each of them is close to one true loop.
- for each pre-loc loop do
  - define a region of size $30\times 30$ in the map $OE$ around the pre-loc loop (referred to as $C_{\text{loop}}$).
  - $0$-pad $C_{\text{loop}}$ to a multiple of the patch size and normalize it between $0$ and $1$.
  - pass $C_{\text{loop}}$ to BlockFormer to find the $i$-coordinate of the loop: $\theta_{i}$.
  - pass $C_{\text{loop}}^{T}$ to BlockFormer to find the $j$-coordinate of the loop: $\theta_{j}$.
- endfor
- return a set of loop localization estimates $(\theta_{i},\theta_{j})$.
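The inner loop of Algorithm 7 (crop, zero-pad, normalize, then two passes on the region and its transpose) can be sketched as below. Several elements are assumptions made for illustration: the patch size of 8, the convention that the model returns a row index within the region, the conversion back to map coordinates by adding the crop offset, and the `model` callable itself, which stands in for the trained network.

```python
import numpy as np

def crop_pad_normalize(oe, center, size=30, patch=8):
    """Crop a size x size region, zero-pad to a multiple of patch, rescale to [0, 1]."""
    half = size // 2
    i0, j0 = max(center[0] - half, 0), max(center[1] - half, 0)
    region = oe[i0:i0 + size, j0:j0 + size].astype(float)
    pad_i = (-region.shape[0]) % patch
    pad_j = (-region.shape[1]) % patch
    region = np.pad(region, ((0, pad_i), (0, pad_j)))   # zero padding
    span = region.max() - region.min()
    if span > 0:
        region = (region - region.min()) / span
    return region, (i0, j0)

def localize_loop(model, oe, center, size=30, patch=8):
    """Two passes through the model: region for i, transposed region for j."""
    region, (i0, j0) = crop_pad_normalize(oe, center, size, patch)
    theta_i = model(region)      # first pass: i-coordinate within the region
    theta_j = model(region.T)    # second pass on the transpose: j-coordinate
    return i0 + theta_i, j0 + theta_j   # back to map coordinates (assumption)

# Hypothetical stand-in for the network: index of the row with the largest sum.
toy_model = lambda m: float(np.argmax(m.sum(axis=1)))

toy_map = np.zeros((100, 100))
toy_map[40, 60] = 1.0                                   # one loop at (40, 60)
loop = localize_loop(toy_model, toy_map, center=(42, 58))
print(loop)
```

Transposing the region swaps the roles of rows and columns, which is what lets a model with a single scalar output per entity recover both coordinates of a loop.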
We report the performance of both methods across multiple chromosomes. The metric used is the absolute error normalized by the resolution. In each chromosome, more than $200$ loops are identified.
Figure 26: Loop detection at resolution $5$ kb in automatically selected genomic regions of chromosomes $1$ (left), $3$ (middle) and $7$ (right). BlockFormer (referred to as BF) and Centurion (referred to as Cent.) take as input pre-localized regions in the observed-over-expected maps. Red lines in the top plots indicate the mean error. Depending on the region, BlockFormer does not always estimate loops at under-resolution precision, but it remains more precise and faster than Centurion.

Table 15: Performance comparison (mean $\pm$ std) across chromosomes.

| Method | chr. 1 | chr. 3 | chr. 7 |
| --- | --- | --- | --- |
| BlockFormer | 2.21 $\pm$ 1.03 | 2.12 $\pm$ 1.08 | 2.01 $\pm$ 1.09 |
| Centurion | 3.73 $\pm$ 3.68 | 3.26 $\pm$ 3.27 | 3.70 $\pm$ 3.93 |

BlockFormer can be adapted to loop localization, although this requires a pre-localization step. Centurion is not well suited for single-block inference because its optimization process is based on the alignment of multiple spots across blocks in the map. In contrast, thanks to its training strategy and its flexible architecture, BlockFormer can adapt to this setup and outperforms Centurion in both accuracy and runtime.
We further present qualitative results where genomic regions have been manually selected across multiple chromosomes\.


Figure 27: Example of observed-over-expected maps used as input to BlockFormer for loop position estimation. Only the upper (left) or lower (right) triangular part is considered for the y- or x-coordinate estimation, respectively. Dashed lines represent the model's predictions of the loops.


Figure 28: Loop detection at resolution $5$ kb in selected genomic regions of chromosomes $1$ (left), $3$ (middle) and $7$ (right). Loops appear as bright off-diagonal enrichment spots. BlockFormer takes as input the upper or lower triangular part of the observed-over-expected maps. Chromosight takes as input the entire raw map but considers only the upper triangular part. Dashed lines represent the model's loop predictions, while green dots indicate Chromosight's loop positions. Agreement between both methods indicates that when the map is sufficiently clear, the model accurately identifies loops.