Hugging Face has open-sourced genomic foundation models, including Carbon, a DNA model that is 275x faster than the next-best model and can process the entire human genome on a single GPU in under two days.
Hugging Face releases Carbon, a DNA model that is 275x faster than the previous state of the art, Evo2, enabling it to process the entire human genome on a single GPU in under two days. The model uses an unusual tokenizer that splits sequences into 6-base chunks while maintaining single-base resolution, and it ships with an interactive demo.
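The summary says Carbon's tokenizer groups bases into 6-base chunks yet keeps single-base resolution. Below is a minimal sketch of the chunking half. It assumes non-overlapping chunks read left to right; Carbon's real tokenizer and vocabulary are not described here. The sketch shows how any base position maps to a (chunk, offset) pair and back, which is one simple way per-base addressing can survive chunking. The helper names are hypothetical.

```python
from typing import List, Tuple

K = 6  # chunk width reported for Carbon; the rest of this sketch is assumed


def chunk_sequence(seq: str, k: int = K) -> List[str]:
    """Split a DNA string into non-overlapping k-base chunks (hypothetical helper).

    The last chunk is shorter than k when len(seq) is not a multiple of k.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def locate_base(position: int, k: int = K) -> Tuple[int, int]:
    """Map a 0-based base position to (chunk index, offset within chunk)."""
    return divmod(position, k)


def base_position(chunk_index: int, offset: int, k: int = K) -> int:
    """Inverse of locate_base: recover the base position from chunk and offset."""
    return chunk_index * k + offset


if __name__ == "__main__":
    chunks = chunk_sequence("ACGTACGGTTCAAGT")
    print(chunks)          # ['ACGTAC', 'GGTTCA', 'AGT']
    print(locate_base(7))  # (1, 1): second chunk, second base
```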
This GitHub repository provides 135 ready-to-use scientific AI skills covering biology, chemistry, medicine, and other fields. They can be integrated into AI agents with one click to accelerate research workflows.
This paper introduces scShapeBench, a benchmark dataset for shape detection in high-dimensional single-cell data, and scReebTower, a baseline method that uses diffusion geometry and Reeb graphs to classify each dataset's shape as clusters, trajectories, multi-branch structures, or archetypes.
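Diffusion geometry usually starts from a diffusion map. A Gaussian affinity graph is turned into a Markov transition matrix, and that matrix's leading eigenvectors give smooth coordinates along the data's shape. The sketch below covers only that step and uses an assumed bandwidth `epsilon`. It does not build the Reeb graph or reproduce scReebTower's shape classifier.

```python
import numpy as np


def diffusion_map(X: np.ndarray, epsilon: float, n_components: int = 2, t: int = 1):
    """Basic diffusion-map embedding.

    X: (n_points, n_features), e.g. PCA-reduced single-cell expression.
    epsilon: Gaussian kernel bandwidth (assumed; needs tuning per dataset).
    Returns all eigenvalues (descending) and the first n_components
    non-trivial diffusion coordinates scaled by eigenvalue**t.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    K = np.exp(-d2 / epsilon)                 # Gaussian affinities
    d = K.sum(axis=1)                         # node degrees
    # Symmetric conjugate of the Markov matrix P = D^-1 K, so eigh can be used.
    A = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(A)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(d)[:, None]         # right eigenvectors of P
    # Column 0 is the constant eigenvector (eigenvalue 1), so skip it.
    coords = psi[:, 1:n_components + 1] * evals[1:n_components + 1] ** t
    return evals, coords


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = np.sort(rng.uniform(0, 1, 200))
    X = np.column_stack([s, np.sin(3 * s)])  # a noiseless 1-D curve ("trajectory")
    evals, coords = diffusion_map(X, epsilon=0.01)
    print(round(evals[0], 6))                 # 1.0
```

On a trajectory-like dataset, the first diffusion coordinate tends to order cells along the curve. A Reeb-graph step would then summarize how level sets of such coordinates split and merge.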
This paper proposes SoftBlobGIN, a framework that enhances the interpretability of protein language model representations by projecting them onto contact graphs for structure-aware message passing. It demonstrates improved performance on enzyme classification and binding-site detection while providing auditable structural explanations.
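The sketch below makes "projecting embeddings onto contact graphs for message passing" concrete. It runs one Graph Isomorphism Network (GIN) layer over a residue contact map. The 8 angstrom C-alpha cutoff, the toy dimensions, the random weights, and the random-walk coordinates are all assumptions for illustration. SoftBlobGIN's own architecture is not reproduced.

```python
import numpy as np


def contact_graph(ca_coords: np.ndarray, threshold: float = 8.0) -> np.ndarray:
    """Binary residue contact map from C-alpha coordinates in angstroms.

    The 8 angstrom cutoff is a common convention, assumed here.
    """
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops; GIN adds the node's own term separately
    return adj


def gin_layer(h, adj, W1, b1, W2, b2, eps=0.0):
    """One GIN update: h' = MLP((1 + eps) * h + sum of neighbour features)."""
    agg = (1.0 + eps) * h + adj @ h
    hidden = np.maximum(agg @ W1 + b1, 0.0)  # ReLU
    return hidden @ W2 + b2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d_plm, d_hidden = 30, 16, 8  # toy sizes; real PLM embeddings are much wider
    ca = np.cumsum(rng.normal(scale=2.2, size=(L, 3)), axis=0)  # random-walk "backbone"
    plm_emb = rng.normal(size=(L, d_plm))  # stand-in for per-residue PLM embeddings
    adj = contact_graph(ca)
    W1, b1 = rng.normal(size=(d_plm, d_hidden)), np.zeros(d_hidden)
    W2, b2 = rng.normal(size=(d_hidden, d_hidden)), np.zeros(d_hidden)
    out = gin_layer(plm_emb, adj, W1, b1, W2, b2)
    print(out.shape)  # (30, 8)
```

Because aggregation follows the contact map, each residue's updated feature depends only on residues spatially close to it. This is what makes per-residue outputs traceable to structure.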
This paper introduces a new paradigm for universal gene regulatory network (GRN) inference using single-cell foundation models, proposing two methods, Virtual Value Perturbation and Gradient Trajectory, to distill the regulatory knowledge these models encode.
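One generic way to read regulatory edges out of a trained model is in-silico perturbation: raise one gene's input value and record how every predicted output shifts. The sketch below does this by finite differences against any callable `model`. It is not the paper's Virtual Value Perturbation method. The toy linear matrix `W` is hypothetical and stands in for a single-cell foundation model.

```python
import numpy as np


def perturbation_scores(model, x_base: np.ndarray, delta: float = 1.0) -> np.ndarray:
    """Finite-difference in-silico perturbation (generic, not the paper's method).

    model: callable mapping an expression vector (n_genes,) to predicted
           expression (n_genes,); stands in for a single-cell foundation model.
    Returns S where S[i, j] is the change in predicted gene j per unit
    increase of input gene i, read as a putative i -> j regulatory edge.
    """
    n = x_base.shape[0]
    y_base = model(x_base)
    scores = np.zeros((n, n))
    for i in range(n):
        x_pert = np.array(x_base, dtype=float)  # fresh copy for each perturbation
        x_pert[i] += delta                      # virtually raise gene i's value
        scores[i] = (model(x_pert) - y_base) / delta
    np.fill_diagonal(scores, 0.0)               # drop self-effects
    return scores


if __name__ == "__main__":
    # Toy "model": a known linear regulatory matrix W (hypothetical values).
    W = np.array([[0.0, 2.0, 0.0],
                  [0.0, 0.0, -1.5],
                  [0.5, 0.0, 0.0]])
    S = perturbation_scores(lambda x: x @ W, np.ones(3))
    print(np.allclose(S, W))  # True
```

For a differentiable model, the infinitesimal version of the same idea is the Jacobian of outputs with respect to inputs. Gradient-based methods estimate that Jacobian directly instead of looping over genes.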
This paper introduces Evo-PU, a positive-unlabeled learning framework that models survivorship bias in protein sequence data by leveraging evolutionary mutation processes. The authors demonstrate that Evo-PU outperforms standard PU methods and protein language models in predicting protein functionality for influenza, RSV, and SARS-CoV-2.
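For contrast with Evo-PU, here is a sketch of a standard PU baseline, the Elkan and Noto (2008) correction. It trains a classifier to separate labelled from unlabelled examples, estimates the labelling rate `c` on known positives, and divides by it. The method assumes labelled positives are a uniformly random subset of all positives, an assumption that survivorship bias could plausibly violate. The source does not say how Evo-PU handles this. For brevity, `c` is estimated on the training data rather than on a held-out set as in the original method.

```python
import numpy as np


def fit_logistic(X, s, lr=0.1, n_iter=2000):
    """Plain batch gradient-descent logistic regression; returns (w, b)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - s
        w -= lr * X.T @ grad / len(s)
        b -= lr * grad.mean()
    return w, b


def elkan_noto(X, s):
    """Elkan and Noto (2008) PU correction.

    s[i] = 1 for labelled positives, 0 for unlabelled examples.
    Assumes labelled positives are a uniformly random subset of all positives.
    Returns estimated P(y = 1 | x) and the labelling rate c.
    """
    w, b = fit_logistic(X, s)
    g = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # estimates P(labelled | x)
    c = g[s == 1].mean()                      # estimates P(labelled | positive)
    return np.clip(g / c, 0.0, 1.0), c
```

Any probabilistic classifier could replace the hand-rolled logistic regression. It is written out here only to keep the sketch dependency-free beyond NumPy.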
This article introduces ProtSent, a contrastive fine-tuning framework for protein language models that improves embedding quality for downstream tasks like remote homology detection and structural retrieval.
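Contrastive fine-tuning of this kind typically optimizes an InfoNCE-style loss, which pushes paired embeddings to be more similar to each other than to the other items in the batch. Below is a NumPy sketch of that loss. It assumes cosine similarity, a temperature of 0.1, and in-batch negatives. How ProtSent actually forms positive pairs is not stated here.

```python
import numpy as np


def info_nce(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """One-directional InfoNCE loss with in-batch negatives.

    z1[i] and z2[i] embed two views of the same protein (how ProtSent pairs
    them is not assumed here); every z2[j] with j != i acts as a negative.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # positives lie on the diagonal
```

In a real fine-tuning loop, the same computation would run in an autodiff framework so gradients flow back into the protein language model.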
This paper presents a Transformer-based model for classifying wildlife species using only daily GPS movement trajectories, demonstrating superior accuracy over LSTM and CNN baselines across different studies and regions.
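A sequence model over GPS tracks needs per-step inputs. One plausible featurization, assumed here rather than taken from the paper, converts consecutive daily fixes into great-circle step lengths with the haversine formula. The Transformer itself is not shown, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding error


def daily_steps(track: List[Tuple[float, float]]) -> List[float]:
    """Step length (km) between consecutive daily (lat, lon) fixes: one value per day."""
    return [haversine_km(*p, *q) for p, q in zip(track, track[1:])]


if __name__ == "__main__":
    print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 2))  # 111.19 (one degree of latitude)
```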
This paper introduces PlantMarkerBench, a multi-species benchmark for evaluating language models' ability to interpret evidence for plant marker genes from scientific literature across four species. It highlights that while frontier models perform well on direct evidence, they struggle with functional and indirect evidence types.
The author describes a workflow using Gemini Nano Pro, Tripo, and Codex to generate 3D biological structures, highlighting AI's potential to accelerate education.
The article reviews DeepMind's decision-making behind open-sourcing AlphaFold in 2021, praising Demis Hassabis's willingness to take risks and make fundamental research freely available. It also notes that although the move generated no direct profit, it led to the creation of Isomorphic Labs, valued at $2 billion.
TD3B is a sequence-based generative framework for designing allosteric binders with specific agonist or antagonist behaviors using transition-directed discrete diffusion. The paper introduces a method to control directional transitions in protein states, addressing limitations of static structure-based design.
This content covers methods for categorizing amino acids, likely drawing on computational or biological analysis.
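As one concrete example of such a categorization, the sketch below encodes a common textbook grouping of the 20 standard amino acids by side-chain chemistry and counts group composition for a sequence. The grouping is one convention among several and may not match the schemes the content describes.

```python
from collections import Counter

# One common textbook grouping by side-chain chemistry at physiological pH.
# Other schemes exist; e.g. histidine is sometimes listed as polar rather
# than positively charged.
AA_GROUPS = {
    "nonpolar_aliphatic": set("GAVLIMP"),
    "aromatic": set("FYW"),
    "polar_uncharged": set("STCNQ"),
    "positive": set("KRH"),
    "negative": set("DE"),
}
AA_TO_GROUP = {aa: group for group, members in AA_GROUPS.items() for aa in members}


def group_composition(seq: str) -> Counter:
    """Count residues per group; unknown letters (e.g. X) go to 'other'."""
    return Counter(AA_TO_GROUP.get(aa, "other") for aa in seq.upper())
```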
This paper introduces GATHER, a convergence-centric retrieval method for zero-shot cell-type annotation using knowledge graphs, which improves accuracy and reduces LLM costs compared to existing KG-RAG baselines.
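The summary does not spell out what "convergence-centric" means. Under one plausible reading, assumed here, candidate cell types are ranked by how many of a cell's marker genes reach them in the knowledge graph within a few hops. The toy graph, hop limit, and function names are hypothetical, and the LLM step is omitted.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def reachable(graph: Dict[str, List[str]], start: str, max_hops: int) -> Set[str]:
    """Nodes reachable from start within max_hops directed edges (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def convergence_scores(graph, markers, cell_types, max_hops=2) -> List[Tuple[str, int]]:
    """Rank cell types by how many distinct query markers converge on them."""
    hits = {ct: 0 for ct in cell_types}
    targets = set(cell_types)
    for m in markers:
        for node in reachable(graph, m, max_hops) & targets:
            hits[node] += 1
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    kg = {  # toy graph, not a real knowledge base
        "CD3E": ["T cell"],
        "CD4": ["T cell", "monocyte"],
        "CD14": ["monocyte"],
        "MS4A1": ["B cell"],
    }
    print(convergence_scores(kg, ["CD3E", "CD4"], ["T cell", "B cell", "monocyte"]))
    # [('T cell', 2), ('monocyte', 1), ('B cell', 0)]
```

Passing only the top-ranked candidates to an LLM, rather than whole retrieved subgraphs, is one way such a scheme could cut LLM cost. The summary does not confirm that this is how GATHER achieves its savings.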
This paper introduces Shesha, a geometric stability metric that quantifies directional coherence of single-cell CRISPR perturbation responses using mean cosine similarity, revealing regulatory architecture and predicting cellular stress across 2,200+ perturbations in five CRISPR datasets.
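Below is a sketch of a directional-coherence score in this spirit. It assumes each cell's response is its offset from the mean of non-targeting controls, and that coherence is the mean pairwise cosine similarity of those offsets. Shesha's exact definition may differ. Values near 1 mean cells shift the same way; values near 0 mean scattered responses.

```python
import numpy as np


def directional_coherence(perturbed: np.ndarray, control: np.ndarray) -> float:
    """Mean pairwise cosine similarity of per-cell response vectors.

    perturbed: (n_cells, n_genes) expression of cells carrying one perturbation.
    control:   (m_cells, n_genes) expression of non-targeting control cells.
    Each cell's response is its offset from the control centroid (assumed).
    """
    shifts = perturbed - control.mean(axis=0)
    unit = shifts / np.linalg.norm(shifts, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = len(unit)
    # Average the off-diagonal entries (each diagonal entry is 1).
    return float((sim.sum() - n) / (n * (n - 1)))
```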
Machine Learning at Berkeley collaborated with LatchBio to benchmark LatchBio's AI agent on spatial transcriptomics workflows, evaluating how well it automates complex bioinformatics tasks.
The paper introduces RGxEStat, a lightweight interactive tool that applies mixed-effects models to analyze gene-environment interactions, offering breeders a user-friendly alternative to writing complex SAS or R code.
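The core quantity such tools estimate is the genotype-by-environment interaction left after main effects are removed. The sketch below uses a simple fixed-effects decomposition of a table of cell means. It is not the mixed-effects model RGxEStat fits, which would, for example, treat some factors as random and estimate variance components. The function name is hypothetical.

```python
import numpy as np


def gxe_decomposition(means: np.ndarray):
    """Additive-plus-interaction decomposition of a genotype x environment table.

    means[i, j] = mean trait value (e.g. yield) of genotype i in environment j.
    Model: y_ij = mu + g_i + e_j + ge_ij, with effects summing to zero.
    """
    mu = means.mean()
    g = means.mean(axis=1) - mu                   # genotype main effects
    e = means.mean(axis=0) - mu                   # environment main effects
    ge = means - mu - g[:, None] - e[None, :]     # interaction residuals
    return mu, g, e, ge
```

A purely additive table yields zero interaction everywhere. Large `ge` entries flag genotypes whose ranking changes across environments, which is the pattern breeders care about.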
This article details the development of CodonRoBERTa, a language model for mRNA codon optimization trained on sequences from 25 species, and highlights a cost-effective pipeline that includes protein folding and sequence design.
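A simple baseline for codon optimization picks, for each amino acid, the synonymous codon used most often in the host's genes. The sketch below implements that baseline with the standard genetic code. It is not CodonRoBERTa's method, and it ignores factors such as mRNA secondary structure and GC content. The helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(cds_list):
    """Count codons across in-frame coding sequences from the host organism."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        counts.update(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    return counts


def most_frequent_codon_design(protein, usage):
    """Back-translate a protein using the host's most frequent synonymous codon."""
    best = {}
    for codon, aa in CODON_TABLE.items():
        # Ties keep the codon that comes first in TCAG order.
        if aa not in best or usage[codon] > usage[best[aa]]:
            best[aa] = codon
    return "".join(best[aa] for aa in protein.upper())


if __name__ == "__main__":
    usage = codon_usage(["ATGGCTGCTGCCAAATAA", "ATGGCTAAGAAATGA"])
    print(most_frequent_codon_design("MAK", usage))  # ATGGCTAAA
```

Learned models aim to beat this baseline by capturing context-dependent codon choice, which a single per-amino-acid lookup cannot express.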
Biohub releases ESMC, ESMFold2, and ESM Atlas, presented as a world model for protein biology that enables state-of-the-art prediction, design, and discovery across scales and includes a billion-structure atlas.