@analogalok: I just got Gemma 4 26B A4B MoE model running fully locally with Hermes agent on an 8GB RTX 4060 and it's now backtestin…

X AI KOLs Following 06/23/26, 02:09 PM News

local-llm gemma-4 moe hermes-agent automation backtesting trading

Summary

A developer demonstrates running the Gemma 4 26B MoE model locally on an 8GB RTX 4060 with Hermes agent to fully automate the backtesting of a trading strategy, highlighting the growing capability of local LLMs as autonomous agents.



Cached at: 06/23/26, 03:51 PM

I just got Gemma 4 26B A4B MoE model running fully locally with Hermes agent on an 8GB RTX 4060 and it’s now backtesting trading strategies end to end, no hand holding.

If you’re a trader or work on Wall Street, you don’t want to miss this.

Yes. Fully automated. No cloud. No APIs beyond market data.

Here’s what I did:

Setup:

Model: Gemma 4 26B-A4B QAT (MoE), Unsloth's Q4_K_XL quant (link in the comments)
Inference: llama.cpp (turboquant fork by @no_stp_on_snek; link in the comments)
Hardware: RTX 4060, 8GB VRAM + 16GB RAM only (with 50 other Chrome tabs open)
Context: 64K

llama.cpp turboquant flags: -m gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf -c 64000 --cache-type-k q8_0 --cache-type-v turbo3 --port 8080

turboquant helps achieve high prefill and decode throughput for interactive sessions.

throughput with Hermes agent: decode: 25+ tokens/sec prefill: 250+ tokens/sec
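To put those throughput numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the quoted rates hold steady for a whole request, which the post does not claim, and the 1,000-token reply length is an illustrative assumption. The helper name seconds_for is hypothetical.

```python
def seconds_for(tokens: int, tokens_per_sec: float) -> float:
    """Time to process `tokens` at a constant rate (a simplifying assumption)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return tokens / tokens_per_sec


# Rates quoted in the post are lower bounds ("250+" and "25+" tokens/sec),
# so the times below are upper bounds under the constant-rate assumption.
prefill_s = seconds_for(64_000, 250.0)  # filling the whole 64000-token context
decode_s = seconds_for(1_000, 25.0)     # assumed 1,000-token reply

print(f"full-context prefill: {prefill_s:.0f} s")
print(f"1,000-token reply:    {decode_s:.0f} s")
```

In practice an agent session rarely prefills the full context in one go, so typical turns should be much shorter than the worst case.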

Then I gave the agent one task:

Backtest a strategy:

Buy when RSI crosses above 30
Sell at +2% profit or -1% stop-loss
No overlapping positions
Use Google stock via yfinance
Generate a full HTML report with candlestick charts + signals
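For readers who want to see what such a strategy looks like in code, here is a minimal sketch of the rules above. It is not the code the agent generated. The 14-period Wilder RSI, close-only fills, and entry at the signal bar's close are all assumptions the post does not specify, and the names wilder_rsi, backtest and Trade are hypothetical. Prices are passed in as a plain list, so fetching data with yfinance and building the HTML report are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


def wilder_rsi(closes: List[float], period: int = 14) -> List[Optional[float]]:
    """RSI with Wilder smoothing; entries before index `period` are None."""
    rsi: List[Optional[float]] = [None] * len(closes)
    if len(closes) <= period:
        return rsi
    deltas = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    avg_gain = sum(max(d, 0.0) for d in deltas[:period]) / period
    avg_loss = sum(max(-d, 0.0) for d in deltas[:period]) / period
    for i in range(period, len(closes)):
        if i > period:
            d = deltas[i - 1]  # change from bar i-1 to bar i
            avg_gain = (avg_gain * (period - 1) + max(d, 0.0)) / period
            avg_loss = (avg_loss * (period - 1) + max(-d, 0.0)) / period
        if avg_loss == 0:
            rsi[i] = 100.0
        else:
            rsi[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return rsi


@dataclass
class Trade:
    entry_index: int
    entry_price: float
    exit_index: int
    exit_price: float

    @property
    def return_pct(self) -> float:
        return (self.exit_price / self.entry_price - 1.0) * 100.0


def backtest(
    closes: List[float],
    rsi: List[Optional[float]],
    threshold: float = 30.0,
    take_profit: float = 0.02,
    stop_loss: float = 0.01,
) -> List[Trade]:
    """Two-state machine (flat or long), so positions can never overlap."""
    trades: List[Trade] = []
    entry_index: Optional[int] = None  # None means flat, otherwise long
    for i in range(1, len(closes)):
        if entry_index is None:
            prev, cur = rsi[i - 1], rsi[i]
            # Buy when RSI crosses from at-or-below the threshold to above it.
            if prev is not None and cur is not None and prev <= threshold < cur:
                entry_index = i
        else:
            change = closes[i] / closes[entry_index] - 1.0
            # Exit on the first close at or beyond either target.
            if change >= take_profit or change <= -stop_loss:
                trades.append(
                    Trade(entry_index, closes[entry_index], i, closes[i])
                )
                entry_index = None
    return trades


if __name__ == "__main__":
    # Tiny hand-made series with a hand-made RSI, just to exercise the logic.
    demo_closes = [100.0, 100.0, 100.0, 101.0, 102.5, 100.0, 100.0, 98.9]
    demo_rsi = [None, 25.0, 35.0, 40.0, 50.0, 28.0, 31.0, 20.0]
    for t in backtest(demo_closes, demo_rsi):
        print(t.entry_index, t.exit_index, round(t.return_pct, 2))
```

Because the loop only looks for an entry while flat, any RSI cross that happens during an open position is ignored, which is one simple way to enforce the no-overlap rule. Checking only closing prices means intrabar highs and lows are never tested against the targets, so a real backtest would likely fill differently.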

What happened next was wild. It didn’t just write code, it ran the entire workflow itself:

Audited the environment (pip list, dependency check)

Hit a ModuleNotFoundError: multiple Python installs were conflicting

Ran where python to map every interpreter on the system

Manually selected the correct Python 3.13 path and re-ran the script

Wrote a clean state-machine backtester (strict no-overlapping-trades logic)

Patched a yfinance MultiIndex quirk that would’ve crashed the script

Built Plotly candlestick + RSI charts with buy/sell markers

Calculated win rate, PnL, and summary stats

Exported a polished single-file HTML report

Check the report at the end of the video or in the comments.
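The win rate and PnL figures mentioned in the steps above can be computed along these lines. This is a sketch only: the post does not show how the agent's report defines these metrics, so compounding each trade's return on full capital is an assumption, and the function name summarize and its dictionary keys are hypothetical.

```python
from typing import Dict, List


def summarize(returns_pct: List[float]) -> Dict[str, float]:
    """Summary stats from per-trade percentage returns, e.g. [2.0, -1.0]."""
    n = len(returns_pct)
    if n == 0:
        return {
            "trades": 0,
            "win_rate_pct": 0.0,
            "total_return_pct": 0.0,
            "avg_return_pct": 0.0,
        }
    wins = sum(1 for r in returns_pct if r > 0)
    equity = 1.0
    for r in returns_pct:
        # Assumption: all capital is reinvested in each sequential trade.
        equity *= 1.0 + r / 100.0
    return {
        "trades": n,
        "win_rate_pct": 100.0 * wins / n,
        "total_return_pct": (equity - 1.0) * 100.0,
        "avg_return_pct": sum(returns_pct) / n,
    }


if __name__ == "__main__":
    print(summarize([2.0, -1.0, 2.0, -1.0]))
```

With a +2% target and a -1% stop, a strategy of this shape needs to win a bit more than one trade in three to break even before costs, which is why the win rate is worth reporting next to total PnL.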

Biggest takeaway: local LLMs aren’t just “chat assistants” anymore. They debug their own environment, write production code, and ship a finished deliverable on consumer hardware, for $0 in API costs.

If you’re still calling local models “toys,” you’re already behind.

This is just the beginning.

Hermes agent just surpassed 1 trillion tokens in a single day on OpenRouter. Think about the scale of total token generation happening right now.

Disclaimer: This is not financial advice. Consult a professional before making any trading decisions.

Teknium 🪽 (@Teknium): Wait we actually just broke 1T tokens in a day for the first time on OpenRouter :O

Please keep contributing to the most awesome project I’ve ever worked on to help make Hermes Agent the best software stack on the planet! Thank you contributors🍻🍻


Similar Articles

@analogalok: Run Gemma 4 26B MoE on 8GB VRAM with 250k context at 20+ tokens/sec If you own any 8GB VRAM graphics card, stop what yo…

@VincentLogic: An entry-level laptop with 8GB VRAM can now run a fully autonomous AI Agent. Method: Gemma 4 26B + Hermes Desktop. Run the 26B model locally with just 8GB VRAM + 16GB RAM. What can it do after connecting Hermes? …

@svpino: Hermes with Gemma 4 or Qwen 3.5 is literally the best combo you can run locally on your computer. You've got to give th…

@analogalok: my 8 GB VRAM gaming laptop is absolutely going to hate me for this. but I still did it. ran a 31b dense model (Gemma 4 …

Gemma4 26b MoE running in MLX with turboquant (and custom kernel)
