@analogalok: I just got Gemma 4 26B A4B MoE model running fully locally with Hermes agent on an 8GB RTX 4060 and it's now backtestin…

X AI KOLs Following News

Summary

A developer demonstrates running Gemma 4 26B MoE model locally on an 8GB RTX 4060 with Hermes agent to fully automate backtesting of trading strategies, highlighting the growing capability of local LLMs as autonomous agents.

I just got Gemma 4 26B A4B MoE model running fully locally with Hermes agent on an 8GB RTX 4060 and it's now backtesting trading strategies end to end, no hand holding. If you’re a trader or work on Wall Street, you don’t want to miss this. Yes. fully automated. No cloud. No APIs beyond market data. # Here's what I did: Setup: - Model: Gemma 4 26B-A4B QAT (MoE), Q4_K_XL Unsloth's quant (link in the comments) - Inference: llama.cpp (turboquant fork by @no_stp_on_snek link in the comments) - Hardware: RTX 4060, 8GB VRAM + 16GB RAM only (with 50 other chrome tabs open) - Context: 64K llama.cpp turboquant flags: -m gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf -c 64000 --cache-type-k q8_0 --cache-type-v turbo3 --port 8080 turboquant helps achieve high prefill and decode throughput for interactive sessions. throughput with Hermes agent: decode: 25+ tokens/sec prefill: 250+ tokens/sec # Then I gave the agent one task: Backtest a strategy: - Buy when RSI crosses above 30 - Sell at +2% profit or -1% stoploss - No overlapping positions - Use Google stock via yfinance - Generate a full HTML report with candlestick charts + signals What happened next was wild. It didn't just write code, it ran the entire workflow itself: Audited the environment (pip list, dependency check) Hit a ModuleNotFoundError, multiple Python installs were conflicting Ran where python to map every interpreter on the system Manually selected the correct Python 3.13 path and re ran the script Wrote a clean statevmachine backtester (strict no overlapping trades logic) Patched a yfinance MultiIndex quirk that would've crashed the script Built Plotly candlestick + RSI charts with buy/sell markers Calculated win rate, PnL, and summary stats Exported a polished single file HTML report. check the report at the end of the video or in the comments. Biggest takeaway: local LLMs aren't just "chat assistants" anymore. They debug their own environment, write production code, and ship a finished deliverable on consumer hardware, for $0 in API costs. If you're still calling local models "toys," you're already behind. This is just the beginning. Hermes agent just surpassed 1 trillion tokens in a single day on OpenRouter. Think about the scale of total token generation happening right now. Disclaimer: This is not financial advice. Consult a professional before making any trading decisions.
Original Article
View Cached Full Text

Cached at: 06/23/26, 03:51 PM

I just got Gemma 4 26B A4B MoE model running fully locally with Hermes agent on an 8GB RTX 4060 and it’s now backtesting trading strategies end to end, no hand holding.

If you’re a trader or work on Wall Street, you don’t want to miss this.

Yes. fully automated. No cloud. No APIs beyond market data.

Here’s what I did:

Setup:

  • Model: Gemma 4 26B-A4B QAT (MoE), Q4_K_XL Unsloth’s quant (link in the comments)
  • Inference: llama.cpp (turboquant fork by @no_stp_on_snek link in the comments)
  • Hardware: RTX 4060, 8GB VRAM + 16GB RAM only (with 50 other chrome tabs open)
  • Context: 64K

llama.cpp turboquant flags: -m gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf -c 64000 –cache-type-k q8_0 –cache-type-v turbo3 –port 8080

turboquant helps achieve high prefill and decode throughput for interactive sessions.

throughput with Hermes agent: decode: 25+ tokens/sec prefill: 250+ tokens/sec

Then I gave the agent one task:

Backtest a strategy:

  • Buy when RSI crosses above 30
  • Sell at +2% profit or -1% stoploss
  • No overlapping positions
  • Use Google stock via yfinance
  • Generate a full HTML report with candlestick charts + signals

What happened next was wild. It didn’t just write code, it ran the entire workflow itself:

Audited the environment (pip list, dependency check)

Hit a ModuleNotFoundError, multiple Python installs were conflicting

Ran where python to map every interpreter on the system

Manually selected the correct Python 3.13 path and re ran the script

Wrote a clean statevmachine backtester (strict no overlapping trades logic)

Patched a yfinance MultiIndex quirk that would’ve crashed the script

Built Plotly candlestick + RSI charts with buy/sell markers

Calculated win rate, PnL, and summary stats Exported a polished single file HTML report. check the report at the end of the video or in the comments.

Biggest takeaway: local LLMs aren’t just “chat assistants” anymore. They debug their own environment, write production code, and ship a finished deliverable on consumer hardware, for $0 in API costs.

If you’re still calling local models “toys,” you’re already behind.

This is just the beginning.

Hermes agent just surpassed 1 trillion tokens in a single day on OpenRouter. Think about the scale of total token generation happening right now.

Disclaimer: This is not financial advice. Consult a professional before making any trading decisions.

Teknium 🪽 (@Teknium): Wait we actually just broke 1T tokens in a day for the first time on OpenRouter :O

Please keep contributing to the most awesome project I’ve ever worked on to help make Hermes Agent the best software stack on the planet! Thank you contributors🍻🍻

Similar Articles

@VincentLogic: An entry-level laptop with 8GB VRAM can now run a fully autonomous AI Agent. Method: Gemma 4 26B + Hermes Desktop. Run the 26B model locally with just 8GB VRAM + 16GB RAM. What can it do after connecting Hermes? …

X AI KOLs Timeline

Introduces running a fully autonomous AI Agent on an entry-level laptop with 8GB VRAM using the Gemma 4 26B model and Hermes Desktop tool, enabling local file operations, code modification, web browsing, etc., significantly lowering the barrier for local Agents.

Gemma4 26b MoE running in MLX with turboquant (and custom kernel)

Reddit r/LocalLLaMA

A developer successfully ran Gemma4 26b MoE on Apple MacBook Air M5 using MLX with turboquant and a custom kernel, achieving faster prompt processing and generation speeds than llama.cpp with lower memory usage. The implementation includes instructions for local deployment.