single-gpu

#single-gpu

@che_shr_cat: 1/ We have been treating GPU memory all wrong. What if the GPU didn't need to store your model at all? MegaTrain enable…

X AI KOLs Timeline · 20h ago

MegaTrain enables full-precision training of 100B+ LLMs on a single GPU by treating VRAM as a transient stateless cache, inverting the memory hierarchy.


Ornith-1.0-35B GGUF update: native MTP speculative-decode graft + full serving/TTFT/long-context numbers (llama.cpp, tp=1)

Reddit r/LocalLLaMA · yesterday

An update on the Ornith-1.0-35B GGUF model introduces a native MTP speculative-decode graft for faster inference on a single GPU, achieving ~1.3-1.35x decode speedup while maintaining near-identical token distribution. Benchmark numbers for throughput, TTFT, and long-context performance across multiple quants are provided.


Small Experiments, Cheaper Decisions: A Case Study in Staged Promotion for Micro-Pretraining

arXiv cs.CL · 2026-06-11

This paper studies a staged promotion protocol for micro-pretraining, using escalating budgets from minutes to hours to filter configurations. It finds that early screens are useful but unstable, and that a staged approach can retain a long-horizon reference while identifying alternatives that fail continuation thresholds.


I designed a methodology for (autonomously) training transformer language models on a single consumer GPU.

Reddit r/openclaw · 2026-05-31

A methodology for autonomously training transformer language models on a single consumer GPU, structured in six stages with verification gates and AGENTS.md specs for orchestration frameworks like OpenClaw.


ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU

arXiv cs.LG · 2026-05-25

ModeSwitch-LLM is a lightweight controller that routes LLM inference requests to appropriate fixed modes (e.g., FP16, quantization, speculative decoding) on a single GPU, achieving up to 2.10× latency speedup and 51.7% energy reduction without retraining the model.


@heygurisingh: Billion-parameter LLMs used to cost $10M+ to train. Someone open sourced a repo t…

X AI KOLs Timeline · 2026-05-20

An open-source repository called train-llm-from-scratch enables training billion-parameter LLMs on a single GPU, with a configurable pipeline from raw text to inference, including dataset streaming and checkpointing, under MIT License.


TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization

Hugging Face Daily Papers · 2026-05-19

TideGS introduces an out-of-core training framework that enables 3D Gaussian Splatting with over one billion primitives on a single GPU by managing parameters across SSD-CPU-GPU hierarchy via block-virtualization, asynchronous pipeline, and differential streaming techniques.


@_vmlops: FINE-TUNING A 12B MODEL ON A SINGLE GPU IS REAL NOW most people think you need a massive gpu cluster to fine-tune large…

X AI KOLs Timeline · 2026-05-17

Hugging Face's PEFT library enables parameter-efficient fine-tuning of large models on a single GPU, reducing compute and storage costs while maintaining performance.


@tom_doerr: Trains billion-parameter LLMs from scratch on a single GPU https://github.com/FareedKhan-dev/train-llm-from-scratch…

X AI KOLs Timeline · 2026-05-17

A GitHub repository provides scripts to train billion-parameter language models from scratch on a single GPU using PyTorch, based on the Transformer architecture.


Built an open-source one-prompt-to-cinematic-reel pipeline on a single GPU — FLUX.2 [klein] for character keyframes, Wan2.2-I2V for animation, vision critic with auto-retry, music + 9-language narration in the same pipeline

Reddit r/LocalLLaMA · 2026-05-14

An open-source pipeline takes a single sentence and produces a cinematic reel with characters, animation, music, and narration, using FLUX.2, Wan2.2, and other models on a single AMD GPU. The pipeline includes director agent, character generation, keyframe animation, vision critic, music, and narration stages.


@tom_doerr: Runs 70B LLMs on single 4GB GPU https://github.com/lyogavin/airllm

X AI KOLs Timeline · 2026-05-13

AirLLM is an open-source tool that optimizes inference memory usage, enabling 70B LLMs to run on a single 4GB GPU without quantization, and supports 405B models on 8GB VRAM.


@seclink: Just hit 134 tok/s with Qwen 3.5-27B Dense and 73 tok/s with the new Qwen 3.6-27B on a single RTX 3090. The 2026 open-source scene is moving at lightspeed…

X AI KOLs Following · 2026-04-23

A single RTX 3090 reaches 134 tok/s on Qwen 3.5-27B Dense and 73 tok/s on Qwen 3.6-27B using fused kernels plus speculative decoding, with GGUF releases following the same evening.


lyogavin/airllm

GitHub Trending (daily) · 2026-06-03

AirLLM is an open-source library that enables running large language models (up to 405B) on a single 4GB GPU without quantization, distillation, or pruning, significantly lowering the hardware barrier for local LLM inference.
