Tag: #model-scaling

Olmo Hybrid: From Theory to Practice and Back

arXiv cs.CL · 2026-04-20

This paper presents Olmo Hybrid, a 7B-parameter language model that combines attention and Gated DeltaNet recurrent layers, demonstrating both theoretical and empirical advantages over pure transformers. The work shows that hybrid models have greater expressivity, scale more efficiently during pretraining, and outperform comparable transformer baselines.
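The hybrid design described above interleaves a small number of attention layers among recurrent Gated DeltaNet layers. A minimal sketch of such a layer schedule, where the one-attention-per-four-layers ratio and the layer names are illustrative assumptions rather than the paper's actual configuration:

```python
def hybrid_layer_schedule(n_layers: int, attn_every: int = 4) -> list[str]:
    """Return a layer-type list with one attention layer per `attn_every` layers.

    The ratio and type names are hypothetical; Olmo Hybrid's real schedule
    may differ.
    """
    return [
        "attention" if (i + 1) % attn_every == 0 else "gated_deltanet"
        for i in range(n_layers)
    ]

schedule = hybrid_layer_schedule(8)
# e.g. ['gated_deltanet', 'gated_deltanet', 'gated_deltanet', 'attention', ...]
```

Keeping most layers recurrent is what gives such hybrids their efficiency during pretraining, while the sparse attention layers preserve expressivity.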


LLM Attribution Analysis Across Different Fine-Tuning Strategies and Model Scales for Automated Code Compliance

arXiv cs.CL · 2026-04-20

This paper analyzes how different fine-tuning strategies (full fine-tuning (FFT), LoRA, quantized LoRA) and model scales affect LLM interpretive behavior for automated code compliance tasks, using perturbation-based attribution analysis. The findings show that FFT produces more focused attribution patterns than parameter-efficient methods, and that larger models develop specific interpretive strategies with diminishing performance returns beyond 7B parameters.
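Perturbation-based attribution generally works by masking each input token in turn and measuring the resulting change in the model's score. A minimal sketch, where `model_score` is a hypothetical stand-in for a real model call and the toy compliance vocabulary is invented for illustration:

```python
def perturbation_attribution(tokens, model_score, mask_token="[MASK]"):
    """Attribute each token by the score drop when it is masked out."""
    base = model_score(tokens)
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [mask_token] + tokens[i + 1:]
        scores.append(base - model_score(perturbed))
    return scores

# Toy scorer: counts tokens from a (hypothetical) compliance vocabulary.
vocab = {"fire", "exit", "clearance"}
score = lambda toks: float(sum(t in vocab for t in toks))
print(perturbation_attribution(["the", "fire", "exit"], score))
# → [0.0, 1.0, 1.0]
```

"More focused" attribution patterns, in this framing, means the score mass concentrates on fewer tokens.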


Model Capability Dominates: Inference-Time Optimization Lessons from AIMO 3

Hugging Face Daily Papers · 2026-04-16

This paper analyzes inference-time optimization techniques for AIMO 3, finding that model capability dominates over prompt engineering and diverse sampling strategies. The study reveals that high-temperature sampling already decorrelates errors maximally, leaving no room for prompt-based improvements, and identifies a 6-point selection loss gap between individual model pass@20 and majority voting consensus.
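The majority-voting consensus mentioned above can be sketched in a few lines: sample k answers at high temperature, then select the most frequent one. The sample values below are invented for illustration:

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer among k samples (consensus pick)."""
    return Counter(answers).most_common(1)[0][0]

samples = ["42", "41", "42", "42", "17"]  # hypothetical k=5 samples
print(majority_vote(samples))  # → "42"
```

The "selection loss" gap is the difference between pass@k (at least one sample is correct) and the accuracy of this consensus pick: voting can discard a correct minority answer.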


Simplifying, stabilizing, and scaling continuous-time consistency models

OpenAI Blog · 2024-10-23

OpenAI presents sCM (simplified continuous-time consistency models), a new approach that scales consistency models to 1.5B parameters and achieves ~50x speedup over diffusion models by generating high-quality samples in just 2 steps. The method demonstrates comparable sample quality to state-of-the-art diffusion models while using less than 10% of the effective sampling compute.
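Two-step consistency sampling can be sketched as follows: a trained consistency function f(x, t) maps a noisy sample at noise level t directly to a clean estimate, so one call denoises pure noise and a second call refines a re-noised version of that estimate. The noise levels and the linear re-noising rule below are illustrative simplifications, not sCM's actual formulation:

```python
import random

def two_step_sample(f, sigma_max=80.0, sigma_mid=0.8, dim=4):
    """Draw one sample with exactly two calls to the consistency function f."""
    # Step 1: denoise pure Gaussian noise in a single call.
    x = [random.gauss(0.0, sigma_max) for _ in range(dim)]
    x0 = f(x, sigma_max)
    # Re-noise the clean estimate to an intermediate noise level.
    x_mid = [xi + random.gauss(0.0, sigma_mid) for xi in x0]
    # Step 2: a second consistency call yields the final sample.
    return f(x_mid, sigma_mid)
```

The ~50x speedup comes from replacing tens of iterative diffusion denoising steps with these two function evaluations.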
