Mentu

Local Inference & Shell

Not every recipe step needs a cloud model. Mentu runs steps three ways:

Mode        Key                    Cost                   What it does
Cloud       claude                 Per-call API pricing   Reasoning via Claude Opus or Sonnet
Local LLM   ollama / ollamaAgent   $0.00                  On-device inference via Ollama
Shell       shell                  $0.00                  Direct /bin/sh -c execution. No model involved.

You can mix all three in a single recipe. For example, a local model scans the codebase, Claude analyzes what it found, and a shell step runs the tests.


Shell

The shell backend executes the prompt content as a raw shell command. No LLM is involved: the prompt is the command. A shell step costs nothing, adds no inference latency, and is as deterministic as the command it runs.

{
  "label": "run-tests",
  "backend": "shell",
  "args": ["-p", "cd project && swift test 2>&1; echo STEP_COMPLETE"],
  "completion_keyword": "STEP_COMPLETE",
  "timeout": 300,
  "expected_changes": []
}

The shell backend activates only when a step explicitly sets "backend": "shell". It is never auto-selected, which prevents prompts meant for a model from being executed as shell commands by accident.

Prompt delivery works two ways:

  • -P path/to/script.sh -- file content becomes the shell command
  • -p "echo hello" -- inline text becomes the shell command
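As a rough illustration of the two delivery forms, the sketch below resolves a step's args list into the text that would be run. The function name resolve_prompt and the strict two-element check are assumptions made for this example, not Mentu's actual implementation.

```python
from pathlib import Path


def resolve_prompt(args: list[str]) -> str:
    """Return the prompt text for a step's args list.

    Assumed semantics, inferred from the two flags described above:
    -P <path> reads the prompt from a file, -p <text> uses the text as-is.
    """
    if len(args) != 2:
        raise ValueError("expected exactly [flag, value]")
    flag, value = args
    if flag == "-P":
        # File content becomes the prompt (or the shell command).
        return Path(value).read_text()
    if flag == "-p":
        # Inline text becomes the prompt (or the shell command).
        return value
    raise ValueError(f"unknown prompt flag: {flag}")
```

For a shell step, the returned string is what would be handed to /bin/sh -c.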

Use cases: build verification, test runners, deployment scripts, data pipelines, health checks, linting, any step where the action is known in advance and does not need reasoning.

Shell steps pair naturally with LLM steps. A common pattern: an LLM step writes or modifies code, then a shell step compiles and tests it. The shell step gives you deterministic verification at zero cost.
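To make the shell step's fields concrete, here is a minimal sketch of how a runner might treat the command, completion_keyword, and timeout. The function run_shell_step is hypothetical, and the assumption that a step succeeds when the keyword appears in the combined output is inferred from the example above rather than confirmed by Mentu's source.

```python
import subprocess


def run_shell_step(command: str, completion_keyword: str, timeout: float) -> tuple[bool, str]:
    """Run a command through /bin/sh -c and report whether the keyword appeared.

    Assumption: a step counts as complete when completion_keyword shows up
    anywhere in the captured output. No model is involved at any point.
    """
    try:
        result = subprocess.run(
            ["/bin/sh", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The command ran past the step's timeout, so treat it as incomplete.
        return False, ""
    output = result.stdout + result.stderr
    return completion_keyword in output, output
```

Note that in a command such as "swift test 2>&1; echo STEP_COMPLETE", the ; means the keyword is printed whether or not the tests pass. If the keyword should signal a passing run, join the commands with && instead.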


Local LLM (Ollama)

Local models run on your machine via Ollama. Zero cost, no rate limits, no data leaves your machine.

Two variants:

ollama -- text completion only. Good for formatting, classification, summarization, report generation.

{
  "label": "format-report",
  "backend": "ollama",
  "model": "qwen2.5-coder:7b",
  "args": ["-P", "prompts/format-report.md"]
}

ollamaAgent -- full tool access. The model can execute bash commands, read and write files, search with grep and glob. Use this for multi-step workflows that need tool use but not cloud-level reasoning.

{
  "label": "run-monitor",
  "backend": "ollamaAgent",
  "model": "qwen3:30b",
  "effort": "max",
  "args": ["-P", "prompts/monitor-cycle.md"]
}

Setup

brew install ollama
brew services start ollama
ollama pull qwen2.5-coder:7b    # 4.7GB, fast text completion
ollama pull qwen3:30b           # 18GB, agentic tool use
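After setup, you may want to confirm that the Ollama server is reachable before relying on local steps. The sketch below queries Ollama's /api/tags endpoint, which lists locally pulled models, on the default port 11434. The helper name ollama_is_running is an assumption for this example; how Mentu itself detects Ollama is not documented here.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url, False otherwise."""
    try:
        # GET /api/tags lists the models available locally.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar errors.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_running())
```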

Mixing modes in a recipe

A single sequence can use all three modes:

{
  "type": "sequence",
  "name": "daily-audit",
  "steps": [
    {
      "label": "scan-codebase",
      "backend": "ollamaAgent",
      "model": "qwen3:30b",
      "effort": "max",
      "args": ["-P", "prompts/scan.md"]
    },
    {
      "label": "analyze",
      "model": "claude-opus-4-6",
      "args": ["-P", "prompts/analyze.md"]
    },
    {
      "label": "run-tests",
      "backend": "shell",
      "args": ["-p", "cd project && swift test 2>&1; echo STEP_COMPLETE"],
      "completion_keyword": "STEP_COMPLETE"
    }
  ]
}

Step 1 runs locally via Ollama (zero cost). Step 2 uses Claude (reasoning). Step 3 runs a shell command (deterministic).

Effort-based routing

When a step sets "effort": "low" and Ollama is running, the engine routes it to the local model automatically. No explicit backend is needed.

{ "label": "parse-output", "effort": "low" }

Routing priority:

  1. Explicit backend on the step (always wins)
  2. Effort-based routing (effort: "low" or "medium" prefers local)
  3. Default backend from config
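The priority order above can be sketched as a small function. This is an illustration under stated assumptions, not the engine's code: choose_backend, LOCAL_EFFORTS, and the local_backend parameter are hypothetical names, and the choice of "ollama" as the local target is a guess, since the docs do not say which Ollama variant effort-based routing selects.

```python
# Effort levels that prefer a local model, per the routing priority list.
LOCAL_EFFORTS = {"low", "medium"}


def choose_backend(
    step: dict,
    default_backend: str,
    ollama_running: bool,
    local_backend: str = "ollama",  # assumption: which local variant is used
) -> str:
    """Pick a backend for a step following the documented priority order."""
    # 1. An explicit backend on the step always wins.
    if step.get("backend"):
        return step["backend"]
    # 2. Low or medium effort prefers local, but only if Ollama is running.
    if step.get("effort") in LOCAL_EFFORTS and ollama_running:
        return local_backend
    # 3. Otherwise fall back to the default backend from config.
    return default_backend
```

Under these assumptions, choose_backend({"label": "parse-output", "effort": "low"}, "claude", True) selects the local backend, while the same step with Ollama stopped falls through to the configured default.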

Fallback

When a local model fails (bad output, connection error), the engine escalates to Claude automatically. You incur API cost only when the local attempt fails.
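One way to picture the escalation is a try-local-first wrapper. The sketch below is a generic pattern, not Mentu's implementation: run_with_fallback, run_local, run_cloud, and is_acceptable are hypothetical callables, and what the engine actually counts as "bad output" is not specified here.

```python
from typing import Callable


def run_with_fallback(
    prompt: str,
    run_local: Callable[[str], str],
    run_cloud: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
) -> tuple[str, str]:
    """Try the local model first and escalate to the cloud model on failure.

    Returns (backend_used, output). The cloud call, and therefore any API
    cost, happens only when the local attempt fails.
    """
    try:
        output = run_local(prompt)
        if is_acceptable(output):
            return "local", output
    except ConnectionError:
        # The local server is unreachable, so fall through to the cloud.
        pass
    return "cloud", run_cloud(prompt)
```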

CLI override

Force all steps to a specific backend at runtime:

mentu sequence daily-audit --backend ollama
mentu sequence daily-audit --backend claude

© 2026 Mentu.