dna-modeling

LDARNet: DNA Adaptive Representation Network with Learnable Tokenization for Genomic Modeling

arXiv cs.CL · 5d ago

LDARNet is a 120M-parameter hierarchical genomic foundation model that introduces learnable adaptive tokenization, inspired by H-Net's dynamic chunking, for masked language modeling on DNA sequences. It achieves state-of-the-art results on five histone modification tasks and outperforms models up to 20× larger on several genomic benchmarks. Its learned token boundaries align with biological features such as promoter motifs and splice junctions.
