Tag
Baidu has released Unlimited-OCR, which processes entire documents in a single pass without chunking, overcoming a major limitation of current OCR technology.
Lite Any Stereo V2 presents an efficient stereo matching approach that achieves state-of-the-art accuracy at significantly reduced latency, using a 2D-only cost aggregation framework and a three-stage training strategy.
A detailed guide to building a correct PyTorch training loop, highlighting common mistakes and proper ordering of operations.
This article delves into the principles of LoRA and its variants (QLoRA, VeRA, DoRA), explaining how low-rank decomposition reduces trainable parameters to enable efficient fine-tuning of large models.
A reflection on the landmark 'Attention Is All You Need' paper, highlighting how removing recurrence and relying solely on attention mechanisms revolutionized AI and led to modern LLMs like GPT and Claude.
This paper introduces Tapered Language Models (TLMs), an architectural principle that allocates more parameters to earlier layers and fewer to later layers, consistently improving perplexity and downstream performance across multiple architectures without extra cost.
Recommends 15 YouTube channels for learning AI in 2026, categorized by learning stage, with study-path advice covering beginner fundamentals, engineering projects, and cutting-edge trends.
An update on Matrix Recurrent Units (MRU), a linear-time attention alternative. The author explores methods to stabilize training, finding that orthogonal matrices underperform while LDU factorization works best, and shows MRU underperforms transformers on larger datasets like TinyStories.
An educational overview of knowledge distillation, covering its history, core concepts like softmax and temperature, types, scaling laws, and practical examples including DeepSeek-R1.
A comprehensive 15-part series covering LLM internals from tokenization to serving, grounded in Gemma 4 12B's actual config.
This paper proposes eCNNTO, a CNN with residual connections that accelerates density-based topology optimization by predicting near-optimal densities from early iteration histories. It reports up to a 97% reduction in iterations and strong generalization across different boundary conditions, geometries, and mesh resolutions.
Introduces ITNet, a neural architecture based on a learnable integral transform that unifies convolution, attention, and recurrence, achieving strong results across multiple modalities.
David Ha and Jürgen Schmidhuber recount how foundational deep learning techniques like transformers, unsupervised pretraining, knowledge distillation, and residual networks were pioneered in Munich in 1991, laying the groundwork for the current AI boom.
The author explains operator fusion as a key mechanism behind torch.compile's speedups, and provides a minimal 500-line Python implementation and notebook as an educational tool.
A tweet promoting a free deep learning resource with 68 interactive Python notebooks covering topics from basics to advanced techniques like GANs and diffusion models, ideal for self-learners.
Proposes a Wasserstein-GAN approach for unsupervised calibration of sensor-induced distribution drifts, validated on tracking detector toy models and simulated calorimeter data with aging effects.
The paper investigates whether weight norm directly controls the grokking delay in neural networks or whether its effect is mediated by logit scale and softmax saturation under cross-entropy loss. Experiments show that the delay is almost entirely explained by the effective logit scale, with weight norm contributing negligibly.
A structured survey of AI-based models for soil moisture estimation and classification, covering statistical time-series, geostatistical, classical ML, deep learning, and probabilistic/Bayesian methods.
This paper introduces QG-MIL, a gated transformer aggregator that mitigates attention concentration in multiple instance learning for medical imaging, achieving domain-agnostic performance without auxiliary losses.
A discussion of the challenge of enabling AI systems to develop deep expertise from documents, much as humans learn from textbooks, framed as a form of continual learning.