dna-modeling


LDARNet: DNA Adaptive Representation Network with Learnable Tokenization for Genomic Modeling

arXiv cs.CL · 2026-06-04

LDARNet is a 120M-parameter hierarchical genomic foundation model that introduces learnable adaptive tokenization, inspired by H-Net's dynamic chunking, for masked language modeling on DNA sequences. It achieves state-of-the-art results on five histone modification tasks and outperforms models up to 20× larger on several genomic benchmarks. Its learned token boundaries align with biological features such as promoter motifs and splice junctions.
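To make the idea of learnable token boundaries concrete, here is a minimal sketch of similarity-based dynamic chunking in the spirit of H-Net: a boundary is scored at each position by how dissimilar its projected representation is from that of the previous position. This is an illustration only, not LDARNet's implementation. The function names (`boundary_probs`, `chunk`), the one-hot base encoding, the 0.5 threshold, and the random, untrained projection matrices `W_q` and `W_k` are all assumptions made for the example; in a real model the projections would be learned and would act on encoder hidden states.

```python
import math
import random

BASES = "ACGT"


def one_hot(base):
    """Encode a nucleotide as a 4-dimensional one-hot vector (illustrative input features)."""
    return [1.0 if base == b else 0.0 for b in BASES]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-9)


def boundary_probs(seq, W_q, W_k):
    """Score each position as a chunk start: p_t = 0.5 * (1 - cos(W_q x_t, W_k x_{t-1}))."""
    xs = [one_hot(b) for b in seq]
    probs = [1.0]  # the first position always opens a chunk
    for t in range(1, len(xs)):
        q = matvec(W_q, xs[t])
        k = matvec(W_k, xs[t - 1])
        probs.append(0.5 * (1.0 - cosine(q, k)))
    return probs


def chunk(seq, probs, threshold=0.5):
    """Split seq into variable-length tokens, opening a new one wherever p_t >= threshold."""
    chunks = []
    for base, p in zip(seq, probs):
        if p >= threshold or not chunks:
            chunks.append(base)
        else:
            chunks[-1] += base
    return chunks


def random_matrix(rows, cols, rng):
    """Stand-in for learned projection weights (hypothetical, untrained)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
W_q = random_matrix(8, 4, rng)
W_k = random_matrix(8, 4, rng)

seq = "TATAAAGGCGTACGT"
probs = boundary_probs(seq, W_q, W_k)
chunks = chunk(seq, probs)
print(chunks)
```

With untrained weights the resulting chunks are arbitrary. The claim in the summary is that, after training, boundaries of this kind come to coincide with features such as promoter motifs and splice junctions.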
