deep-learning

#deep-learning

@ErickSky: Baidu has just broken one of the biggest limitations of current OCR. Unlimited-OCR processes entire documents in a sing…

X AI KOLs Timeline · 4d ago Cached

Baidu has released Unlimited-OCR, which processes entire documents in a single pass without chunking, overcoming a major limitation of current OCR technology.

Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching

Hugging Face Daily Papers · 4d ago Cached

Lite Any Stereo V2 presents an efficient stereo matching approach achieving state-of-the-art accuracy with significantly reduced latency through optimized architecture and training strategies, including a 2D-only cost aggregation framework and a three-stage training strategy.

The annotated PyTorch training loop

Hacker News Top · 4d ago Cached

A detailed guide to building a correct PyTorch training loop, highlighting common mistakes and proper ordering of operations.

@0xSero: Highly recommended educational content. LoRA is one of the coolest things to dabble in, lets anyone fine tune models re…

X AI KOLs Timeline · 5d ago Cached

This article delves into the principles of LoRA and its variants (QLoRA, VeRA, DoRA), explaining how low-rank decomposition reduces trainable parameters to enable efficient fine-tuning of large models.
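The parameter saving from low-rank decomposition can be checked with simple arithmetic. The sketch below assumes the standard LoRA formulation, in which a frozen weight matrix of shape d_out × d_in is adapted by the product of two trainable matrices B (d_out × r) and A (r × d_in). The function name and the layer sizes are illustrative assumptions and are not taken from the linked article.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer.

    full: parameters updated when fine-tuning the whole d_out x d_in matrix.
    lora: parameters in the two low-rank factors B (d_out x rank) and
          A (rank x d_in), assuming the base weight stays frozen.
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


# Hypothetical layer size, chosen only for illustration.
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=8)
print(full, lora, lora / full)  # → 16777216 65536 0.00390625
```

Under these assumed sizes, the adapter trains about 0.4% of the parameters that full fine-tuning of the same layer would update.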

Attention Is All You Need

Reddit r/ArtificialInteligence · 5d ago

A reflection on the landmark 'Attention Is All You Need' paper, highlighting how removing recurrence and relying solely on attention mechanisms revolutionized AI and led to modern LLMs like GPT and Claude.

Tapered Language Models

Hugging Face Daily Papers · 5d ago Cached

This paper introduces Tapered Language Models (TLMs), an architecture principle that allocates more parameters to earlier layers and fewer to later layers, consistently improving perplexity and downstream performance across multiple architectures without extra cost.

@PandaTalk8: The Most Worth-Following YouTube Channels for Learning AI in 2026, No-Nonsense Edition. Bookmark them, study in this order: 1. 3Blue1Brown AI / Math Foundation. Uses visualizations to clearly explain linear algebra, neural networks, and underlying mathematical intuition. https://youtube.c…

X AI KOLs Timeline · 5d ago Cached

Recommends 15 YouTube channels for learning AI in 2026, categorized by learning stage, with study path advice for beginners, engineering projects, and cutting-edge trends.

An Update on Matrix Recurrent Units, an Attention Alternative [R]

Reddit r/MachineLearning · 6d ago

An update on Matrix Recurrent Units (MRU), a linear-time attention alternative. The author explores methods to stabilize training, finding that orthogonal matrices underperform while LDU factorization works best, and shows MRU underperforms transformers on larger datasets like TinyStories.

@TheTuringPost: https://x.com/TheTuringPost/status/2068474648925216861

X AI KOLs Timeline · 6d ago Cached

An educational overview of knowledge distillation, covering its history, core concepts like softmax and temperature, types, scaling laws, and practical examples including DeepSeek-R1.
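The role of temperature in the softmax can be illustrated with a short sketch. It assumes the common distillation setup in which teacher logits are divided by a temperature T before the softmax, so that a larger T spreads probability mass across the non-argmax classes. The function name and the example logits are illustrative assumptions, not values from the linked overview.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical teacher logits for a three-class problem.
teacher_logits = [4.0, 2.0, 0.0]
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)
print(sharp)
print(soft)
```

With the higher temperature, the top class keeps the largest probability but the distribution is flatter, which is what lets a student model learn from the teacher's relative ranking of the other classes.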

I wrote a free 15-part series on LLM internals — real math, real tensor shapes, real hardware constraints. All grounded in Gemma 4 12B's actual config.

Reddit r/LocalLLaMA · 2026-06-20

A comprehensive 15-part series covering LLM internals from tokenization to serving, grounded in Gemma 4 12B's actual config.

eCNNTO: A Highly Generalizable ConvNet for Accelerating Topology Optimization

arXiv cs.AI · 2026-06-20 Cached

This paper proposes eCNNTO, a CNN with residual connections to accelerate density-based topology optimization by predicting near-optimal densities from early iteration histories, achieving up to 97% reduction in iterations and strong generalization across different boundary conditions, geometries, and mesh resolutions.

ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence

arXiv cs.AI · 2026-06-20 Cached

Introduces ITNet, a neural architecture based on a learnable integral transform that unifies convolution, attention, and recurrence, achieving strong results across multiple modalities.

Munich 1991: The Roots of the Current AI Boom

Hacker News Top · 2026-06-19 Cached

David Ha and Jürgen Schmidhuber recount how foundational deep learning techniques like transformers, unsupervised pretraining, knowledge distillation, and residual networks were pioneered in Munich in 1991, laying the groundwork for the current AI boom.

How does torch.compile() achieve massive speedups despite highly optimized NumPy functions? [D]

Reddit r/MachineLearning · 2026-06-19

The author explains operator fusion as a key mechanism behind torch.compile's speedups, and provides a minimal 500-line Python implementation and notebook as an educational tool.

@0x0SojalSec: This free Deep Learning resource is insane bro, Perfect for self-learners. 68 interactive Python notebooks. One of the …

X AI KOLs Timeline · 2026-06-19 Cached

A tweet promoting a free deep learning resource with 68 interactive Python notebooks covering topics from basics to advanced techniques like GANs and diffusion models, ideal for self-learners.

Correcting Sensor-Induced Distribution Drift with Wasserstein Adversarial Learning

arXiv cs.LG · 2026-06-18 Cached

Proposes a Wasserstein-GAN approach for unsupervised calibration of sensor-induced distribution drifts, validated on tracking detector toy models and simulated calorimeter data with aging effects.

What Does the Weight Norm Control in Grokking? Logit-Scale Mediation under Cross-Entropy

arXiv cs.LG · 2026-06-18 Cached

The paper investigates whether weight norm directly controls the grokking delay in neural networks or if its effect is mediated by logit scale and softmax saturation under cross-entropy loss. Experiments show that the delay is almost entirely explained by the effective logit scale, with weight norm contributing negligibly.

A Survey on Data-Driven Models for Soil Moisture Regression and Classification

arXiv cs.LG · 2026-06-18 Cached

A structured survey of AI-based models for soil moisture estimation and classification, covering statistical time-series, geostatistical, classical ML, deep learning, and probabilistic/Bayesian methods.

QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

Hugging Face Daily Papers · 2026-06-18 Cached

This paper introduces QG-MIL, a gated transformer aggregator that mitigates attention concentration in multiple instance learning for medical imaging, achieving domain-agnostic performance without auxiliary losses.

@lateinteraction: Been extremely excited about this work by @jacobli99! We're disappointed in the current ways our agents develop experti…

X AI KOLs Following · 2026-06-17 Cached

A discussion on the challenge of enabling AI systems to develop deep expertise from documents, akin to humans learning from textbooks, highlighting a form of continual learning.
