prompt-processing

#prompt-processing

Nemotron - King of the Deep? Comparison of 4 models <=120B

Reddit r/LocalLLaMA ↗ · 2026-06-14

A comparison of four large language models (≤120B parameters) on long-context performance, run on Strix Halo hardware. Nemotron Super processes prompts faster at deep context than the GPT-OSS and Qwen models.

#prompt-processing

Macs for Local LLM and Openclaw - What I wish I had known.....

Reddit r/openclaw ↗ · 2026-05-25

A user shares their experience running local LLMs on a Mac. They note that prompt processing for AI agents is slow compared to Nvidia GPUs, and recommend cloud models such as DeepSeek unless privacy is a concern.

#prompt-processing

For everyone that uses OpenCode / Pi - Heres your promptprocessing fix!

Reddit r/LocalLLaMA ↗ · 2026-05-21

A pull request for llama.cpp fixes an issue where the prompt is constantly reprocessed when OpenCode or Pi is used with the library.

#prompt-processing

[Benchmark] 5090RTX: Promt Parsing, Token Generation and Power Level

Reddit r/LocalLLaMA ↗ · 2026-05-14

A user benchmarks the Nvidia RTX 5090 for LLM inference with llama.cpp, measuring prompt processing and token generation at various power levels. They find that prompt processing is more sensitive to power limits than token generation, and note differences from the RTX 4090.

#prompt-processing

Drastically improve prompt processing speed for --n-cpu-moe partially offloaded models

Reddit r/LocalLLaMA ↗ · 2026-05-12

The post shares a performance trick for llama.cpp: increasing the micro-batch size (`-ub`) while keeping part of the MoE expert weights on the CPU (`--n-cpu-moe`) can drastically improve prompt processing speed for large models like gpt-oss-120b on consumer GPUs.
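As a rough illustration of the kind of invocation described, the sketch below builds `llama-server` command lines that combine `--n-cpu-moe` with several micro-batch sizes so they can be compared by hand. The model filename, the value 24 for `--n-cpu-moe`, the list of micro-batch sizes, and the helper names are all assumptions for illustration, not values taken from the post. The logical batch size (`-b`) is raised alongside `-ub` because the micro-batch cannot usefully exceed it.

```python
import shlex


def build_llama_server_cmd(model_path, n_cpu_moe, ubatch, gpu_layers=99):
    """Build an argv list for llama-server (hypothetical helper).

    --n-cpu-moe keeps the MoE expert weights of the first N layers on the CPU;
    -ub sets the micro-batch size used during prompt processing.
    """
    batch = max(2048, ubatch)  # keep the logical batch at least as large as -ub
    return [
        "llama-server",
        "-m", model_path,
        "-ngl", str(gpu_layers),        # offload all remaining layers to the GPU
        "--n-cpu-moe", str(n_cpu_moe),
        "-b", str(batch),
        "-ub", str(ubatch),
    ]


def sweep(model_path, n_cpu_moe, ubatch_sizes=(512, 1024, 2048, 4096)):
    """Return one command per micro-batch size (sizes are assumed, not from the post)."""
    return [build_llama_server_cmd(model_path, n_cpu_moe, ub) for ub in ubatch_sizes]


if __name__ == "__main__":
    # Placeholder model path and layer count; adjust to the model and VRAM at hand.
    for cmd in sweep("gpt-oss-120b.gguf", n_cpu_moe=24):
        print(shlex.join(cmd))
```

Larger micro-batches need more VRAM for compute buffers, so in practice `--n-cpu-moe` may have to increase as `-ub` grows.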
