Carbon: Decoding the Language of Life
Summary
Hugging Face released Carbon, a family of open DNA foundation models that matches the performance of the state-of-the-art Evo2-7B while running 275x faster. It relies on 6-mer tokenization, a factorized loss, and curated genomic data.
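To make "6-mer tokenization" concrete, here is a minimal sketch of the general idea: cut a DNA string into non-overlapping 6-base chunks and map each chunk to one of 4^6 = 4096 integer ids. This is not Carbon's actual tokenizer. The names `kmer_to_id` and `tokenize`, the base-4 id scheme, and the choice to drop a trailing partial chunk are all assumptions made for illustration, and the sketch does not show how Carbon keeps single-base resolution.

```python
from typing import List

BASES = "ACGT"  # assumed alphabet; ambiguous bases such as N are not handled
K = 6           # chunk length, matching the "6-mer" named in the summary


def kmer_to_id(kmer: str) -> int:
    """Map a k-mer over A/C/G/T to an integer in [0, 4**k) via base-4 encoding."""
    idx = 0
    for base in kmer:
        # str.index raises ValueError for any character outside BASES.
        idx = idx * 4 + BASES.index(base)
    return idx


def tokenize(seq: str, k: int = K) -> List[int]:
    """Split seq into non-overlapping k-mers and return their ids.

    Assumption: a trailing chunk shorter than k is dropped.
    """
    seq = seq.upper()
    return [kmer_to_id(seq[i:i + k]) for i in range(0, len(seq) - k + 1, k)]


if __name__ == "__main__":
    print(tokenize("ACGTACGTACGTA"))  # two full 6-mers; the final "A" is dropped
```

Under this scheme a sequence of N bases becomes roughly N/6 tokens, which is one plausible source of speedup over single-base tokenization, although the summary does not break down where the 275x figure comes from.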
Similar Articles
@lvwerra: We are releasing Carbon: a crazy fast DNA model Carbon is 275x faster than the next best model. So fast you can process…
Hugging Face releases Carbon, a DNA model that is 275x faster than the previous state of the art (Evo2), fast enough to process the entire human genome on a single GPU in under two days. Its tokenizer splits sequences into 6-base chunks while maintaining single-base resolution, and the release includes an interactive demo.
@ClementDelangue: The future of biology shouldn’t stay behind black-box APIs. Especially when it touches personal health. Whether you’re …
Hugging Face releases Carbon, an open-source DNA base model that is 275x faster than comparable models, enabling local processing of whole genomes on a single GPU.
@adithya_s_k: Wake up ppl Huggingface just open sourced Genomic Foundational Models
Hugging Face has open-sourced genomic foundation models, including Carbon, a DNA model that is 275x faster than the next best model and can process the entire human genome on a single GPU in under two days.
LDARNet: DNA Adaptive Representation Network with Learnable Tokenization for Genomic Modeling
LDARNet is a 120M-parameter hierarchical genomic foundation model that introduces learnable adaptive tokenization (inspired by H-Net's dynamic chunking) for masked language modeling on DNA sequences. It achieves state-of-the-art results on 5 histone modification tasks and outperforms models up to 20× larger on several genomic benchmarks, with learned token boundaries aligning with biological features like promoter motifs and splice junctions.
Decoding genetics with OpenAI o1
OpenAI introduces the o1 model series, designed to reason through complex tasks before responding, with applications in genetics, science, coding, and math. The announcement highlights use cases in decoding genetics with researcher Catherine Brownstein.