Local Inference & Shell
Not every recipe step needs a cloud model. Mentu runs steps three ways:
| Mode | Key | Cost | What it does |
|---|---|---|---|
| Cloud | claude | Per-call API pricing | Reasoning via Claude Opus or Sonnet |
| Local LLM | ollama / ollamaAgent | $0.00 | On-device inference via Ollama |
| Shell | shell | $0.00 | Direct /bin/sh -c execution. No model involved. |
You can mix all three in a single recipe. A local model scans the codebase, Claude analyzes what it found, a shell step runs the tests.
Shell
The shell backend executes the prompt content as a raw shell command. No LLM is involved -- the prompt IS the command. Zero cost, zero latency from inference, deterministic output.
{
"label": "run-tests",
"backend": "shell",
"args": ["-p", "cd project && swift test 2>&1; echo STEP_COMPLETE"],
"completion_keyword": "STEP_COMPLETE",
"timeout": 300,
"expected_changes": []
}
Shell backend only activates via explicit "backend": "shell". It is never auto-selected. This prevents accidental shell execution of prompts meant for models.
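
One detail of the run-tests command is worth seeing in isolation: because the test command and the echo are joined by `;`, the keyword is printed whether or not the tests pass. The sketch below uses `false` as a stand-in for a failing test run. It assumes, purely for illustration, that completion is detected by finding the keyword in the captured output; how Mentu actually matches completion_keyword, and whether it also checks the exit status, is not specified here. The helper name `has_keyword` is hypothetical.

```shell
#!/bin/sh
# `false` stands in for a failing test command (for example a red `swift test`).

# Joined with ';' -- the echo always runs, so the keyword always appears.
always=$(/bin/sh -c 'false 2>&1; echo STEP_COMPLETE')

# Joined with '&&' -- the echo runs only if the previous command succeeded.
only_on_success=$(/bin/sh -c 'false 2>&1 && echo STEP_COMPLETE' || true)

# Assumed detection logic: a plain substring match on the captured output.
has_keyword() {
  case "$1" in
    *STEP_COMPLETE*) echo yes ;;
    *)               echo no ;;
  esac
}

echo "with ';':  $(has_keyword "$always")"
echo "with '&&': $(has_keyword "$only_on_success")"
```

If the keyword should signal success rather than mere termination, `&&` is probably the better joiner; with `;` the keyword only shows that the command finished.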
Prompt delivery works two ways:
- -P path/to/script.sh -- file content becomes the shell command
- -p "echo hello" -- inline text becomes the shell command
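
As a rough illustration of the two delivery forms, the sketch below mimics the described behavior: it takes either a file's content or inline text and hands it to /bin/sh -c. The function name `run_shell_step` and the temporary script are hypothetical; this is not Mentu's implementation.

```shell
#!/bin/sh
# Hypothetical helper that mimics the described behavior, not Mentu's code.
# Usage: run_shell_step -P <file>   (file content becomes the command)
#        run_shell_step -p <text>   (inline text becomes the command)
run_shell_step() {
  case "$1" in
    -P) command_text=$(cat "$2") || return 1 ;;
    -p) command_text=$2 ;;
    *)  echo "usage: run_shell_step -P <file> | -p <text>" >&2; return 2 ;;
  esac
  # No model involved: the prompt text is executed as-is.
  /bin/sh -c "$command_text"
}

# Demo: both forms end up running the same command.
script=$(mktemp)
printf 'echo hello\n' > "$script"
from_file=$(run_shell_step -P "$script")
inline=$(run_shell_step -p "echo hello")
rm -f "$script"

echo "from file: $from_file"
echo "inline:    $inline"
```

Either way the step runs exactly the text it was given, which is why the backend has to be selected explicitly.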
Use cases: build verification, test runners, deployment scripts, data pipelines, health checks, linting, any step where the action is known in advance and does not need reasoning.
Shell steps pair naturally with LLM steps. A common pattern: an LLM step writes or modifies code, then a shell step compiles and tests it. The shell step gives you deterministic verification at zero cost.
Local LLM (Ollama)
Local models run on your machine via Ollama. Zero cost, no rate limits, no data leaves your machine.
Two variants:
ollama -- text completion only. Good for formatting, classification, summarization, report generation.
{
"label": "format-report",
"backend": "ollama",
"model": "qwen2.5-coder:7b",
"args": ["-P", "prompts/format-report.md"]
}
ollamaAgent -- full tool access. The model can execute bash commands, read and write files, search with grep and glob. Use this for multi-step workflows that need tool use but not cloud-level reasoning.
{
"label": "run-monitor",
"backend": "ollamaAgent",
"model": "qwen3:30b",
"effort": "max",
"args": ["-P", "prompts/monitor-cycle.md"]
}
Setup
brew install ollama
brew services start ollama
ollama pull qwen2.5-coder:7b # 4.7GB, fast text completion
ollama pull qwen3:30b # 18GB, agentic tool use
Mixing modes in a recipe
A single sequence can use all three modes:
{
"type": "sequence",
"name": "daily-audit",
"steps": [
{
"label": "scan-codebase",
"backend": "ollamaAgent",
"model": "qwen3:30b",
"effort": "max",
"args": ["-P", "prompts/scan.md"]
},
{
"label": "analyze",
"model": "claude-opus-4-6",
"args": ["-P", "prompts/analyze.md"]
},
{
"label": "run-tests",
"backend": "shell",
"args": ["-p", "cd project && swift test 2>&1; echo STEP_COMPLETE"],
"completion_keyword": "STEP_COMPLETE"
}
]
}
Step 1 runs locally via Ollama (zero cost). Step 2 uses Claude (reasoning). Step 3 runs a shell command (deterministic).
Effort-based routing
When a step has "effort": "low" and Ollama is running, the engine routes to the local model automatically. No explicit backend needed.
{ "label": "parse-output", "effort": "low" }
Routing priority:
- Explicit backend on the step (always wins)
- Effort-based routing (effort: "low" or "medium" prefers local)
- Default backend from config
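
The priority order can be sketched as a small resolver. This is an illustration under stated assumptions, not the engine's code: the function `resolve_backend` and its arguments are hypothetical, effort-based routing is assumed to pick `ollama` (the text does not say whether it picks `ollama` or `ollamaAgent`), and the runtime `--backend` override is left out because its place in the order is not stated here.

```shell
#!/bin/sh
# Hypothetical resolver mirroring the documented priority order.
# Args: $1 explicit backend ("" if unset), $2 effort ("" if unset),
#       $3 "yes" if Ollama is running, $4 default backend from config.
resolve_backend() {
  explicit=$1
  effort=$2
  ollama_up=$3
  default=$4

  # 1. An explicit backend on the step always wins.
  if [ -n "$explicit" ]; then
    echo "$explicit"
    return
  fi

  # 2. Low or medium effort prefers the local model when Ollama is running.
  case "$effort" in
    low|medium)
      if [ "$ollama_up" = "yes" ]; then
        echo "ollama"   # assumption: which local variant is chosen is unspecified
        return
      fi
      ;;
  esac

  # 3. Otherwise use the default backend from config.
  echo "$default"
}

resolve_backend "shell" "low" "yes" "claude"   # explicit backend wins
resolve_backend ""      "low" "yes" "claude"   # effort routes to local
resolve_backend ""      "low" "no"  "claude"   # Ollama not running: default
resolve_backend ""      "max" "yes" "claude"   # high effort: default
```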
Fallback
When a local model fails (bad output, connection error), the engine escalates to Claude automatically. Cloud cost is incurred only when the local model fails.
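
A minimal sketch of the escalation pattern, assuming failure is signalled by a non-zero exit status. The functions `run_local`, `run_cloud`, and `run_with_fallback` and the `LOCAL_FAIL` switch are hypothetical stand-ins; how the engine actually detects bad output is not modeled here.

```shell
#!/bin/sh
# Hypothetical stand-ins for the two backends. A real step would call a model;
# here LOCAL_FAIL=1 simulates a connection error or unusable local output.
run_local() {
  if [ "${LOCAL_FAIL:-0}" = "1" ]; then
    return 1
  fi
  echo "local result for $1"
}

run_cloud() {
  echo "cloud result for $1"
}

# Try the local backend first; escalate to the cloud backend only on failure.
run_with_fallback() {
  if result=$(run_local "$1"); then
    echo "$result"
  else
    run_cloud "$1"
  fi
}

LOCAL_FAIL=0
run_with_fallback "parse-output"   # handled locally, no cloud call

LOCAL_FAIL=1
run_with_fallback "parse-output"   # local failed, escalated to cloud
```

The cloud function is only ever reached through the failure branch, which is what keeps the paid path off the common case.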
CLI override
Force all steps to a specific backend at runtime:
mentu sequence daily-audit --backend ollama
mentu sequence daily-audit --backend claude
See also
- Architecture
- CLI Commands for the --backend flag
- Recipe Schema for the backend field