Tag
A discussion of the methodologies and challenges of evaluating AI features after they are deployed to production.
Introduces Neural Particle Automata, a method for learning self-organizing particle dynamics. It uses smoothed-particle hydrodynamics (SPH) perception to give each particle a local perception vector that feeds its update rule, analogous to Neural Cellular Automata but on continuous particle positions.
This guest post explores the proposed Cross-Origin Storage API to improve caching of AI model resources in Transformers.js, enabling efficient reuse across origins while maintaining privacy and integrity for in-browser inference.
This article explains the core ideas of JAX in detail, including function purity, immutability, explicit state management, and JIT compilation, to help readers shift from object-oriented to functional thinking and improve machine learning performance.
Uses the Bradley-Terry model and Elo rating system to statistically determine a dog's favorite treat through pairwise comparison experiments.
The author shares progress on building a CPU-only tensor library in C, covering basics like add/mul, reduce, strides, and 2D matmul, along with insights from reading Arcee's technical blogs on foundation models.
Meta employees are petitioning against the Model Capability Initiative (MCI), which collects computer-use data like keystrokes, mouse movements, and screen content for AI training, raising serious privacy and regulatory concerns.
Recommends 15 YouTube channels for learning AI in 2026, categorized by learning stage, with study path advice for beginners, engineering projects, and cutting-edge trends.
An update on Matrix Recurrent Units (MRU), a linear-time alternative to attention. The author explores methods to stabilize training, finds that orthogonal matrices underperform while LDU factorization works best, and shows that MRU underperforms transformers on larger datasets such as TinyStories.
A list of essential research papers for LLM engineers, including key works on transformers, scaling laws, and fine-tuning techniques.
Shares a GitHub repository of machine learning systems notes covering distributed computing, parallelization, quantization, and PyTorch internals as they relate to LLM training and inference. Suitable for learners interested in ML systems.
MIT researchers have developed a machine-learning-based approach to accurately model the behavior of metal alloys, regardless of chemical complexity, enabling faster and cheaper materials innovation.
A thread introducing Loop Engineering as a solution to the common problem of quant strategies that backtest perfectly but fail in live trading, emphasizing the need for iterative optimization.
Noam Shazeer describes a coding convention for naming tensors with dimension suffixes to improve code readability and sanity, used at Character.AI since 2022.
A discussion thread on multivariate probability models in machine learning.
This paper introduces TS-Fault, a benchmark for evaluating time series forecasting models under structured fault scenarios like broken dependencies and regime changes, finding that clean-data accuracy often anti-correlates with robustness and that foundation models are especially fragile.
This paper systematically evaluates reject inference methods in credit scoring and identifies a failure mode where accuracy improves while recall collapses, creating an illusion of improvement while rejection quality deteriorates. It proposes a controlled exploration strategy that breaks the feedback loop and shows that even minimal exploration rates are sufficient to diagnose the problem.
This paper argues that measurement noise, not model inadequacy, explains why nonlinear models often fail to outperform linear regression in biomedical prediction, as noise attenuates nonlinear structure faster than linear structure, a limitation that cannot be overcome by more data or model complexity.
Introduces P²CE, a model-agnostic algorithm for generating plausible Pareto-optimal counterfactual explanations that balances feasibility, plausibility, and computational efficiency using an isolation forest outlier detector and SHAP values.
TxBench-PP is a benchmark for evaluating AI agents on small-molecule preclinical pharmacology tasks. Across 16 model-harness configurations, the best system achieved only 59.3% accuracy, indicating significant room for improvement.