Ollama Model Tester (GitHub Repo)

TLDR AI 06/05/26, 12:00 AM Tools

ollama cli model-testing open-source python llm local-ai

Summary

A small, dependency-free Python CLI tool that runs the same prompt against your local Ollama models and saves every response to disk, making it easy to compare models side by side.

Ollama Model Tester is a CLI tool for comparing local Ollama models: it runs the same prompt multiple times and saves every response to disk for side-by-side review.

Original Article


Cached at: 06/05/26, 02:06 PM

ulyssestenn/omt

Source: https://github.com/ulyssestenn/omt

Ollama Model Tester

A small, dependency-free CLI for running the same prompt against your local Ollama models and saving every response to disk — so you can compare models (or compare repeated runs of one model) side by side.

It uses only the Python standard library: no pip install required.

Requirements

Python 3.7 or newer
Ollama running locally (default address: http://localhost:11434)
At least one model pulled, e.g. ollama pull llama3.1:8b
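The standard library is enough to check that Ollama is reachable and that at least one model is installed. The sketch below queries Ollama's `GET /api/tags` endpoint, which lists locally pulled models. It is an illustration built on that assumption, not code from the repository, and the helper names `parse_model_names` and `list_local_models` are hypothetical.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # Ollama's default address


def parse_model_names(payload):
    """Extract sorted model names from a decoded /api/tags response body."""
    return sorted(model["name"] for model in payload.get("models", []))


def list_local_models(host=OLLAMA_HOST, timeout=5.0):
    """Return installed model names, or raise RuntimeError if Ollama can't be reached."""
    try:
        with urllib.request.urlopen(host + "/api/tags", timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except OSError as exc:  # URLError and socket timeouts are both OSError subclasses
        raise RuntimeError(f"Could not reach Ollama at {host}: {exc}") from exc
    return parse_model_names(payload)
```

Printing `enumerate(list_local_models(), start=1)` produces a numbered menu much like the model picker described in the quick start.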

Quick start

Make sure Ollama is running, then:

python3 ollama_model_test.py

You’ll be asked, in order:

Which model to use (pick a number from your installed models)
The prompt — type as many lines as you like, then put /done on its own line to finish
How many times to run the prompt
Temperature (0.0–2.0), or press Enter to use Ollama’s default
Whether to stream the responses live to the terminal
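The prompt and temperature questions can be approximated in a few lines. This is a hypothetical sketch, not the tool's own code. `read_prompt` stops at a line that is exactly `/done`, ignoring surrounding whitespace (an assumption). `parse_temperature` treats blank input as "use Ollama's default" and rejects values outside 0.0–2.0.

```python
def read_prompt(lines):
    """Collect prompt lines until a line containing only '/done'."""
    collected = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() == "/done":  # terminator line is not part of the prompt
            break
        collected.append(line)
    return "\n".join(collected)


def parse_temperature(text):
    """Return None for blank input (keep Ollama's default), else a float in [0.0, 2.0]."""
    text = text.strip()
    if not text:
        return None
    value = float(text)  # raises ValueError on non-numeric input
    if not 0.0 <= value <= 2.0:
        raise ValueError(f"temperature must be between 0.0 and 2.0, got {value}")
    return value
```

At the terminal, `read_prompt(sys.stdin)` reads until `/done`, and `parse_temperature(input("Temperature: "))` covers the press-Enter-for-default case.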

It then runs the prompt the requested number of times and writes the results under ollama-runs/.
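Each run amounts to one request to Ollama's `POST /api/generate` endpoint. The sketch below relies on that endpoint's documented behavior. With `"stream": false`, the reply is a single JSON object. With `"stream": true`, it is newline-delimited JSON chunks, and the last chunk has `"done": true` and carries metrics such as `eval_count` and `total_duration`. The function names are hypothetical, and the temperature is sent under `options` only when the user supplied one.

```python
import json
import urllib.request


def build_payload(model, prompt, temperature=None, stream=False):
    """Build an Ollama /api/generate request body."""
    payload = {"model": model, "prompt": prompt, "stream": stream}
    if temperature is not None:  # leave options out to keep Ollama's default
        payload["options"] = {"temperature": temperature}
    return payload


def collect_stream(lines, echo=None):
    """Join the 'response' chunks of a streamed reply; return (text, final_chunk)."""
    parts, final = [], {}
    for raw in lines:
        if not raw.strip():  # tolerate blank lines between JSON objects
            continue
        chunk = json.loads(raw)  # accepts str or bytes
        piece = chunk.get("response", "")
        parts.append(piece)
        if echo is not None:
            echo(piece)
        if chunk.get("done"):
            final = chunk  # the last object carries token counts and timings
            break
    return "".join(parts), final


def generate(model, prompt, temperature=None, stream=False,
             host="http://localhost:11434"):
    """Run one generation; return (response_text, ollama_metadata)."""
    body = json.dumps(build_payload(model, prompt, temperature, stream)).encode("utf-8")
    req = urllib.request.Request(host + "/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        if stream:
            # Echo each chunk to the terminal as it arrives.
            return collect_stream(resp, echo=lambda s: print(s, end="", flush=True))
        data = json.loads(resp.read().decode("utf-8"))
        return data.get("response", ""), data
```

Running the prompt N times then becomes a loop such as `results = [generate(model, prompt, temperature, stream) for _ in range(n)]`, with each `(text, metadata)` pair written to disk afterwards.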

Command-line flags (optional)

Every prompt above can be supplied up front, which makes the tool scriptable. Anything you omit is still asked interactively.

| Flag | Description |
|---|---|
| `--model NAME` | Local model to use (must already be installed) |
| `--runs N` | Number of generations to run |
| `--temperature T` | Temperature, `0.0`–`2.0` |
| `--prompt-file PATH` | Read the prompt from a UTF-8 text file |
| `--stream` / `--no-stream` | Stream responses live, or don’t |
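A parser with the same flags can be written with `argparse`. This sketch mirrors the table above and is not the repository's parser. Every option defaults to `None`, so an omitted flag can be detected and asked for interactively. `--stream` and `--no-stream` share one destination in a mutually exclusive group. `argparse.BooleanOptionalAction` would be the shorter route, but it only exists from Python 3.9 onward and would break the stated 3.7 support.

```python
import argparse


def build_parser():
    """Argument parser mirroring the documented flags; unset options stay None."""
    parser = argparse.ArgumentParser(
        description="Run one prompt against a local Ollama model.")
    parser.add_argument("--model", metavar="NAME",
                        help="local model to use (must already be installed)")
    parser.add_argument("--runs", metavar="N", type=int,
                        help="number of generations to run")
    parser.add_argument("--temperature", metavar="T", type=float,
                        help="temperature, 0.0-2.0")
    parser.add_argument("--prompt-file", metavar="PATH",
                        help="read the prompt from a UTF-8 text file")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--stream", dest="stream", action="store_true",
                       help="stream responses live")
    group.add_argument("--no-stream", dest="stream", action="store_false",
                       help="don't stream responses")
    # store_true/store_false default to False/True; None means "ask the user".
    parser.set_defaults(stream=None)
    return parser
```

A caller could then fall back to a question for anything left unset, for example `model = args.model or ask_for_model()` or `if args.stream is None: ...`, where `ask_for_model` is a hypothetical interactive helper.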

Example — run a saved prompt three times, fully non-interactive:

python3 ollama_model_test.py \
  --model llama3.1:8b \
  --prompt-file prompt.txt \
  --runs 3 \
  --temperature 0.7 \
  --no-stream

Output

Results are grouped into one folder per prompt:

ollama-runs/
  what-are-the-main-tradeoffs-between_835562a4/
    prompt.md         # the prompt, with its hash and timestamp
    metadata.json     # every run against this prompt (model, timing, options)
    llama3.1-8b.md    # responses + Ollama metadata for this model
    gemma3-1b.md

The folder name is the first few words of the prompt plus a short hash of the full prompt. Because the folder is keyed on the prompt, running the same prompt against a different model drops its output into the same folder — making model-to-model comparison easy. Each model’s file records every run’s response alongside Ollama’s run metadata (token counts, timings, and so on).
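The naming scheme can be sketched as follows. Several details are assumptions: the word count (six, matching the example folder), the slug rules, and the use of a truncated SHA-256 digest. The README does not say which hash the tool uses, so the digest produced here will not necessarily match `835562a4`. `model_file_name` assumes that characters such as `:` are replaced with `-`, which is consistent with `llama3.1:8b` becoming `llama3.1-8b.md` in the listing above.

```python
import hashlib
import re


def prompt_folder_name(prompt, max_words=6, hash_len=8):
    """Slug of the prompt's first few words plus a short hash of the full prompt."""
    words = re.findall(r"[a-z0-9]+", prompt.lower())[:max_words]
    slug = "-".join(words) or "prompt"  # fallback for prompts with no word characters
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:hash_len]
    return f"{slug}_{digest}"


def model_file_name(model):
    """Map a model tag to a file name, e.g. 'llama3.1:8b' -> 'llama3.1-8b.md'."""
    return re.sub(r"[^A-Za-z0-9._-]", "-", model) + ".md"
```

Because the hash covers the whole prompt, two prompts that begin with the same words still land in different folders, while re-running an identical prompt against another model reuses its folder.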


Similar Articles

Added direct model downloads right from the UI in Anubis OSS - if anyone would help test that would be great

@NousResearch: Ollama now supports Hermes Desktop Run: 'ollama launch hermes-desktop'

I built a local autonomous coding agent with Ollama — fine-tuned soul model, 40-round agentic loop, MiniMax M3 for the heavy lifting

I made a small local model (llama3.2 3B) reliably extract structured JSON from documents - the hard part wasn't the model, it was everything around it

Built a Tauri v2 desktop chat shell for local LLMs — point it at Ollama / llama.cpp / any OpenAI-compatible endpoint, MIT, ~12 MB binary
