local-llm

#local-llm

@analogalok: I just got Gemma 4 26B A4B MoE model running fully locally with Hermes agent on an 8GB RTX 4060 and it's now backtestin…

X AI KOLs Following · 1h ago Cached

A developer demonstrates running the Gemma 4 26B MoE model locally on an 8GB RTX 4060 with the Hermes agent to fully automate backtesting of trading strategies, highlighting the growing capability of local LLMs as autonomous agents.

@PrajwalTomar_: https://x.com/PrajwalTomar_/status/2069409824824316060

X AI KOLs Following · 2h ago Cached

The author built a fully offline AI agent using local embedding models, Llama via Ollama, and VectorAI DB to address the risks of cloud-dependent AI. The agent runs on an 8GB MacBook, processes sensitive documents, and maintains memory across sessions.

Is there any reason for a lack of love for Gemma 4 26b?

Reddit r/LocalLLaMA · 10h ago

A user asks why Gemma 4 26b receives less attention than Qwen models, sharing their experience using these models for a personal assistant project on a 3090.

@rohanpaul_ai: Sakana Fugu Ultra just beat the other models on visual polish in a live trading-desk coding test, got close to GLM 5.2,…

X AI KOLs Following · 17h ago Cached

Sakana's Fugu Ultra model-orchestration system beat other models on visual polish in a live coding test for a trading-desk UI, though at 17x higher cost, demonstrating its strength in multi-agent coordination.

been tracking EU DDR5 data for 25 days: Prices are dropping, and the DE vs. NL gap is wild (good news for local LLM builders in EU)

Reddit r/LocalLLaMA · yesterday

DDR5 RAM prices are dropping across the EU, with Germany up to 20% cheaper than the Netherlands/Belgium, making it a good time for local LLM builders to upgrade. A live tracker at pricesquirrel.com monitors these trends.

@karminski3: Thinking of buying a Mac to run large models? This post is meant to talk you out of it. The estimate is actually simple. Even if you buy a Mac Studio to run the 4-bit quantized Qwen3.6-27B and enable DFlash to use Qwen's built-in speculative decoding, it only reaches 65 tokens/s. And now most large models can run at 40 tokens/s…

X AI KOLs Timeline · yesterday Cached

The author calculates the per-token cost and break-even period of running large models on a Mac Studio and concludes that buying a Mac for personal large-model use is not cost-effective for ordinary users; using APIs or renting GPUs is more economical.

Do you think dedicated hardware for running local LLMs will become affordable anytime soon?

Reddit r/LocalLLaMA · yesterday

Discusses the potential for affordable dedicated hardware for running local LLMs, considering Chinese manufacturers' ability to produce low-cost hardware at scale.

@TheAhmadOsman: INCREDIBLE RESOURCE The MOST COMPLETE GUIDE for understanding LLMs from first principles is now available online to rea…

X AI KOLs Timeline · yesterday Cached

A comprehensive free guide explaining LLMs from first principles, covering tokens, transformers, attention, fine-tuning, and local deployment.

Good results fine tuning a local LLM like Qwen 3:0.6B to categorize questions

Hacker News Top · yesterday Cached

A developer fine-tunes a small Qwen 3 0.6B model using the Unsloth framework to categorize household questions, achieving good results with only 850 training examples.

It’s time to decentralize model distribution! Introducing Noema Atlas

Reddit r/LocalLLaMA · 2d ago

Noema Atlas is a free, open-source peer-to-peer desktop app for decentralized distribution of LLM weights, using content-addressed verification and Iroh for direct machine-to-machine transfers, with Hugging Face as a fallback.

You can now convert EXL3 quants on Apple Silicon Mac

Reddit r/LocalLLaMA · 2d ago

A new tool enables converting and running EXL3 quantized models on Apple Silicon Macs, matching or nearly matching RTX conversion quality, making high-fidelity quants more accessible.

Best local LLM for English story summarization

Reddit r/LocalLLaMA · 3d ago

A guide comparing the best local LLMs for English story summarization, offering recommendations based on performance and accessibility.

@onchainmilady: ANTHROPIC TRIED TO BAN HIS GITHUB Chinese guy published 70B parameter LLM, 20,000 stars on GitHub + a lawsuit from big…

X AI KOLs Timeline · 3d ago Cached

A Chinese developer published a 70B parameter LLM that runs locally on minimal hardware (4GB GPU) using flat memory and layer-by-layer loading, potentially replacing expensive subscription services.

@ciruai: Testing DeepSeek v4 Flash on the AMD Ryzen AI Max+ 395 Strix Halo with 128GB RAM. Getting ~15 TPS over a decently long …

X AI KOLs Timeline · 5d ago Cached

DeepSeek v4 Flash, a 284B MoE model (13B active), runs locally at ~15 TPS on the AMD Ryzen AI Max+ 395 with 128GB RAM, a machine costing $3,000 versus $25,000+ for a datacenter setup, highlighting the feasibility of running large models on consumer hardware.

gave my local llm agent mcp tools for local image + video gen, so it just generates when i ask (fully offline+free)

Reddit r/LocalLLaMA · 5d ago

A user demonstrates giving a local LLM agent MCP tools for local image and video generation, enabling fully offline and free generation on demand.

LocalLLaMA crowdsourced coding dataset

Reddit r/LocalLLaMA · 5d ago

A community member proposes creating a crowdsourced coding dataset for local LLMs to enable collaborative model training and fine-tuning, addressing concerns about future availability of open-weight models.

@0xSero: Best models for your hardware - 4gb to 12gb vram - VibeThinker-3B - smokes everything remotely close to its weight clas…

X AI KOLs Timeline · 5d ago Cached

This thread recommends AI models optimized for different VRAM levels, highlighting VibeThinker-3B for its strong reasoning performance at 3B parameters, along with other models for coding and general use.

I made a FAQ Chatbot that runs completely in browser; Local AI in Two Clicks

Reddit r/artificial · 5d ago Cached

A FAQ chatbot that runs entirely in the browser using local AI, requiring only two clicks to start.

@julien_c: Llama.cpp has a new branding + official website. Run local models today! Now more than ever, open source must win. By @…

X AI KOLs Following · 5d ago Cached

Llama.cpp has unveiled new branding and an official website, promoting the local execution of AI models and reinforcing the importance of open-source software.

llama.cpp - how to free up even more space on your GPU

Reddit r/LocalLLaMA · 5d ago

A thread sharing practical tips for freeing up GPU memory in llama.cpp, such as offloading mmproj to the CPU and adjusting KV cache types, with discussion of parameters like --cache-type-k, --cache-type-v, and --spec-draft-n-max.
