Autonomous QA

Point it at your web app. It logs in as every user, tests every page, captures every API call, and tells you what broke.

formula · 29 steps · 9 waves · $9 · 35m

claude · Crawlio · database

Preflight

1/29 · Sonnet 4.6

Before anything runs, mentu checks that your dev server is up, the browser is reachable, and your test users can log in. If any of these fail, the run stops here. Not three steps later.

46s · $0.14

Where each step's output is infrastructure for the next.

▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 1/29 · preflight · Sonnet 4.6

MCP: headless-browser ✓, database ✓

Dev server responding. Database connected. Test users exist.

dev_server_status = "200", buyer_user_exists = true, supplier_user_exists = true

builder · closed · 8 turns · $0.14 · 32s

▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 1/29 · preflight · ✓ 46s · $0.14
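The fail-fast behavior described above can be sketched in a few lines. This is a minimal illustration, not mentu's implementation: `Check`, `PreflightError`, and `run_preflight` are hypothetical names, and the probes are stubs standing in for real server, browser, and login checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    name: str
    probe: Callable[[], bool]  # returns True when the dependency is healthy


class PreflightError(RuntimeError):
    """Raised on the first failing check so nothing downstream runs."""


def run_preflight(checks: list[Check]) -> dict[str, bool]:
    results: dict[str, bool] = {}
    for check in checks:
        ok = check.probe()
        results[check.name] = ok
        if not ok:
            # Stop here, not three steps later.
            raise PreflightError(f"preflight failed: {check.name}")
    return results


# Stub probes (assumptions); real ones would hit the dev server,
# the browser, and the login form.
checks = [
    Check("dev_server_up", lambda: True),
    Check("browser_reachable", lambda: True),
    Check("buyer_user_exists", lambda: True),
    Check("supplier_user_exists", lambda: True),
]
preflight_results = run_preflight(checks)
```

Raising on the first failure keeps the error next to its cause; later checks never run against a broken environment.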

Scaffold Data

2/29 · Sonnet 4.6

The agent creates test data in your database. Suppliers, tenders, purchase orders, invoices. Then it queries every table to verify counts match. If the data isn't right, nothing downstream will be either.

5m6s · $0.74

▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2/29 · scaffold-data · Sonnet 4.6

MCP: headless-browser ✓, database ✓

Creating test entities: suppliers, licitaciones, purchase orders, invoices.

All counts verified:

suppliers 3 ✓ · licitaciones 3 ✓ · lotes 6 ✓ · purchase_orders 3 ✓

invoices 4 ✓ · goods_receipts 1 ✓ · budgets 2 ✓ · pools 1 ✓

builder · closed · 31 turns · $0.74 · 4m57s

▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2/29 · scaffold-data · ✓ 5m6s · $0.74
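A count check like the one in this step might look as follows. This is a sketch under assumptions: an in-memory SQLite database stands in for the real one, the tables hold only an id column, and `seed` and `verify_counts` are hypothetical helpers. Table names and expected counts are taken from the log above.

```python
import sqlite3

# Expected row counts, as reported in the scaffold-data log.
EXPECTED = {
    "suppliers": 3, "licitaciones": 3, "lotes": 6, "purchase_orders": 3,
    "invoices": 4, "goods_receipts": 1, "budgets": 2, "pools": 1,
}


def seed(conn: sqlite3.Connection, expected: dict[str, int]) -> None:
    # Table names come from the fixed dict above, never from user input,
    # so interpolating them into SQL is safe in this sketch.
    for table, n in expected.items():
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
        conn.executemany(
            f"INSERT INTO {table} (id) VALUES (?)",
            [(i,) for i in range(1, n + 1)],
        )


def verify_counts(conn: sqlite3.Connection,
                  expected: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {table: (expected, actual)} for every table that is off."""
    mismatches = {}
    for table, want in expected.items():
        (got,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        if got != want:
            mismatches[table] = (want, got)
    return mismatches


conn = sqlite3.connect(":memory:")
seed(conn, EXPECTED)
mismatches = verify_counts(conn, EXPECTED)  # empty dict means all counts match
```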

Scaffold Verify

3/29 · skipped

This step already passed in a prior run. The engine skips it. No reason to verify data that hasn't changed.

0s · $0

⚡ scaffold-verify · skipped (already succeeded in prior run)
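One way to implement this kind of skip is a ledger keyed by step name that stores a fingerprint of the step's inputs. The sketch below is an assumption about how such a mechanism could work, not a description of the engine; `fingerprint`, `run_or_skip`, and the ledger shape are all hypothetical.

```python
import hashlib
import json
from typing import Callable


def fingerprint(inputs: dict) -> str:
    # Stable hash of the inputs; sort_keys makes key order irrelevant.
    payload = json.dumps(inputs, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def run_or_skip(name: str, inputs: dict, action: Callable[[], None],
                ledger: dict[str, str]) -> str:
    key = fingerprint(inputs)
    if ledger.get(name) == key:
        return "skipped"  # same inputs already succeeded in a prior run
    action()              # an exception here leaves the ledger untouched
    ledger[name] = key
    return "succeeded"


ledger: dict[str, str] = {}
executed: list[str] = []
inputs = {"suppliers": 3, "invoices": 4}

first = run_or_skip("scaffold-verify", inputs,
                    lambda: executed.append("run 1"), ledger)
second = run_or_skip("scaffold-verify", inputs,
                     lambda: executed.append("run 2"), ledger)
```

Here `first` is `"succeeded"` and `second` is `"skipped"`; changing `inputs` would force the step to run again.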

Login

2 steps · parallel

Two agents launch at the same time. One logs in as a buyer, the other as a supplier. Each types credentials into a real browser, submits the form, and extracts the auth token. Both write reports.

2m10s · $0.56 · 2/2 passed

┌ Parallel layer 4: buyer-login-deep, supplier-login-deep

⠙ buyer-login-deep · 12s

⠹ supplier-login-deep · 12s

⠹ buyer-login-deep · 1m57s

⠸ supplier-login-deep · 1m57s

┌─ supplier-login-deep ─────────────────

│ Login form loaded. Credentials typed. JWT extracted.

│ consoleErrorCount = 0

✓ supplier-login-deep · 2m2s · $0.28

┌─ buyer-login-deep ────────────────────

│ Login form loaded. Credentials typed. JWT extracted.

│ consoleErrorCount = 0

✓ buyer-login-deep · 2m10s · $0.28

└ Layer 4 complete
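A parallel layer of this shape can be sketched with `asyncio.gather`. The `login` coroutine below is a stub: a short sleep stands in for driving a real browser, and the token value is fake. Only the field name `consoleErrorCount` comes from the log; everything else is an assumption for illustration.

```python
import asyncio

events: list[str] = []


async def login(role: str) -> dict:
    events.append(f"start {role}")
    await asyncio.sleep(0.01)  # stands in for typing credentials and submitting
    events.append(f"done {role}")
    return {
        "role": role,
        "token": f"fake-jwt-for-{role}",  # placeholder, not a real JWT
        "consoleErrorCount": 0,
    }


async def login_layer() -> list[dict]:
    # Both logins are started together; gather returns results in call order.
    return await asyncio.gather(login("buyer"), login("supplier"))


reports = asyncio.run(login_layer())
```

Both `start` events land in `events` before either `done` event, which is what makes the layer take about as long as its slowest step rather than the sum of both.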

Browser Tests

15 steps · parallel burst

The big one. 15 agents run at once, each testing a different page. Every agent logs in, navigates, records all network traffic, checks the DOM, counts errors, takes a screenshot, and writes a report. Dashboard, invoices, analytics, settings -- every page your users touch.

16m9s · $4.20 · 15/15 passed

┌ Parallel layer 5: 15 browser tests

⠼ test-buyer-analytics · 20s

⠴ test-buyer-calificacion · 20s

⠦ test-buyer-configuracion · 20s

⠧ test-buyer-dashboard · 30s

⠇ test-buyer-facturas · 30s

⠏ test-buyer-licitaciones · 46s

⠋ test-buyer-ordenes · 46s

⠙ test-buyer-pools · 46s

⠹ test-supplier-dashboard · 1m16s

⠸ test-supplier-oportunidades · 1m16s

⠼ test-supplier-perfil · 1m16s

… +4 more running

┌─ test-buyer-proveedores ──────────────

│ auth_audit: all_authenticated=true, token_consistent=true

│ console_errors: 0 · api_errors: 0

✓ test-buyer-proveedores · 4m27s · $0.57

┌─ test-buyer-licitaciones ─────────────

│ auth_audit: all_authenticated=true, token_consistent=true

│ api_calls: 5 unique endpoints, all status < 400

│ console_errors: 0

✓ test-buyer-licitaciones · 4m48s · $0.60

┌─ test-buyer-facturas ─────────────────

│ Network: 215 requests captured, 0 errors

│ console_errors: 0

✓ test-buyer-facturas · 5m55s · $0.75

┌─ test-buyer-dashboard ────────────────

│ auth_audit: all_authenticated=true, org_consistent=true

│ all_api_success: true

✓ test-buyer-dashboard · 16m9s · $0.53

└ Layer 5 complete · 15/15 passed
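The per-page verdict and the layer summary could be modeled roughly like this. It is a sketch, assuming a page passes when it has no console errors and no API call with status 400 or above; `PageReport` and `summarize` are hypothetical names, and the status lists are made-up sample data rather than values from the run.

```python
from dataclasses import dataclass


@dataclass
class PageReport:
    step: str
    statuses: list[int]      # HTTP status of each captured API call
    console_errors: int = 0

    @property
    def api_errors(self) -> int:
        return sum(1 for status in self.statuses if status >= 400)

    @property
    def passed(self) -> bool:
        return self.console_errors == 0 and self.api_errors == 0


def summarize(reports: list[PageReport]) -> str:
    ok = sum(1 for report in reports if report.passed)
    return f"{ok}/{len(reports)} passed"


# Sample data (hypothetical): two clean pages and one with a failing call.
sample_reports = [
    PageReport("test-buyer-dashboard", [200, 200, 304]),
    PageReport("test-buyer-facturas", [200, 201]),
    PageReport("test-buyer-pools", [200, 500]),
]
summary = summarize(sample_reports)
```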

Detail + Fault Injection

6 steps · parallel

Now mentu breaks things on purpose. It injects server errors, timeouts, and permission denials on critical pages. The test: does your app show an error message, or does it crash?

5m33s · $1.95 · 6/6 passed

┌ Parallel layer 6: detail tests + fault injection

⠸ fault-buyer-critical · 12s

⠼ fault-supplier-critical · 12s

⠴ test-buyer-detalle-licitacion · 12s

⠦ test-buyer-nueva-licitacion · 17s

⠧ test-buyer-proveedor-detail · 17s

⠇ test-supplier-oportunidad-detalle · 17s

┌─ fault-buyer-critical ────────────────

│ 4 fault scenarios injected: all_graceful = true

✓ fault-buyer-critical · 3m12s · $0.54

┌─ fault-supplier-critical ─────────────

│ 3 fault scenarios injected: all_graceful = true

✓ fault-supplier-critical · 3m9s · $0.54

┌─ test-supplier-oportunidad-detalle ───

│ auth_audit: all_authenticated=true, token_consistent=true

│ api_calls: 8 unique endpoints

✓ test-supplier-oportunidad-detalle · 5m33s · $0.74

└ Layer 6 complete
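The question this layer asks, error message or crash, can be illustrated with a toy page and three injected faults. Everything here is a sketch under assumptions: `Fault`, `faulty_fetch`, `render_page`, and `check_graceful` are hypothetical, and a real run would inject faults into browser network traffic rather than a Python function.

```python
from enum import Enum
from functools import partial
from typing import Callable


class Fault(Enum):
    SERVER_ERROR = "server_error"
    TIMEOUT = "timeout"
    FORBIDDEN = "forbidden"


def faulty_fetch(fault: Fault) -> tuple[int, dict]:
    """Simulate a backend call with one fault injected."""
    if fault is Fault.TIMEOUT:
        raise TimeoutError("injected timeout")
    status = 500 if fault is Fault.SERVER_ERROR else 403
    return status, {}


def render_page(fetch: Callable[[], tuple[int, dict]]) -> str:
    """Toy page that degrades to an error message instead of crashing."""
    try:
        status, body = fetch()
    except TimeoutError:
        return "error: request timed out"
    if status >= 400:
        return f"error: request failed ({status})"
    return f"ok: {len(body)} fields"


def check_graceful(render: Callable, faults: list[Fault]) -> dict[str, bool]:
    results = {}
    for fault in faults:
        try:
            text = render(partial(faulty_fetch, fault))
            results[fault.value] = text.startswith("error:")
        except Exception:
            results[fault.value] = False  # the page crashed
    return results


graceful = check_graceful(render_page, list(Fault))
all_graceful = all(graceful.values())
```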

API Error Analysis

27/29 · Sonnet 4.6

Every browser test captured its network traffic. This step reads all of it and looks for any API call that returned an error. Zero errors across 149 calls.

32s · $0.11

▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░ 27/29 · api-error-analysis · Sonnet 4.6

API error analysis complete: 0 errors across 0 unique URLs.

All test steps reported clean -- no 4xx/5xx API errors detected

across any of the buyer/supplier flow tests.

builder · closed · 2 turns · $0.11 · 22s

▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░ 27/29 · api-error-analysis · ✓ 32s · $0.11
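A scan over captured traffic might be as simple as the sketch below. The capture format (a list of dicts with `url` and `status`) is an assumption, as is counting unique URLs among failing calls only; `analyze` is a hypothetical name and the sample captures are invented.

```python
def analyze(captures: list[dict]) -> dict:
    """Count failing API calls and the distinct URLs they hit."""
    errors = [c for c in captures if c["status"] >= 400]
    return {
        "total_calls": len(captures),
        "errors": len(errors),
        "unique_error_urls": len({c["url"] for c in errors}),
    }


# Sample captures (hypothetical), merged from several page reports.
clean_captures = [
    {"url": "/api/invoices", "status": 200},
    {"url": "/api/suppliers", "status": 200},
    {"url": "/api/suppliers", "status": 304},
]
clean_result = analyze(clean_captures)

broken_captures = clean_captures + [
    {"url": "/api/pools", "status": 500},
    {"url": "/api/pools", "status": 500},
    {"url": "/api/budgets", "status": 404},
]
broken_result = analyze(broken_captures)
```

Under this counting rule a fully clean run reports zero errors and zero unique error URLs, however many calls were captured.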

Auth Cross-Audit

28/29 · Sonnet 4.6

Did any auth token leak between buyer and supplier sessions? This step reads every report and checks. Org IDs, JWT consistency, request authentication -- across both portals, across 149 API calls.

1m12s · $0.38

▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░28/29 · auth-cross-audit · Sonnet 4.6

Reading all 21 reports in parallel.

Buyer org_id consistency: true (single org across all pages)

Supplier org_id consistency: true

Total API calls audited: 149 (96 buyer + 53 supplier)

All pages authenticated: true · All tokens consistent: true

Fault injection: 7 scenarios, 7 graceful

builder · closed · 26 turns · $0.38 · 1m1s

▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░ 28/29 · auth-cross-audit · ✓ 1m12s · $0.38
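The checks named in this step can be expressed over a flat list of audited calls. This is a sketch under assumptions: each call is a dict with `role`, `org_id`, and `token`, a "leak" means the same token showing up in both portals, and `cross_audit` is a hypothetical function. The sample calls are invented.

```python
def cross_audit(calls: list[dict]) -> dict:
    roles = {c["role"] for c in calls}
    # Distinct non-empty tokens and org IDs seen per portal.
    tokens = {r: {c["token"] for c in calls if c["role"] == r and c["token"]}
              for r in roles}
    orgs = {r: {c["org_id"] for c in calls if c["role"] == r} for r in roles}
    shared = tokens.get("buyer", set()) & tokens.get("supplier", set())
    return {
        "total_calls": len(calls),
        "all_authenticated": all(bool(c["token"]) for c in calls),
        "token_consistent": all(len(t) <= 1 for t in tokens.values()),
        "org_consistent": all(len(o) == 1 for o in orgs.values()),
        "token_leaked": bool(shared),
    }


# Sample data (hypothetical): two buyer calls and one supplier call.
clean_calls = [
    {"role": "buyer", "org_id": "org-b", "token": "tok-buyer"},
    {"role": "buyer", "org_id": "org-b", "token": "tok-buyer"},
    {"role": "supplier", "org_id": "org-s", "token": "tok-supplier"},
]
clean_audit = cross_audit(clean_calls)

# A supplier request carrying the buyer's token should be flagged.
leaky_audit = cross_audit(clean_calls + [
    {"role": "supplier", "org_id": "org-s", "token": "tok-buyer"},
])
```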

Result

29 steps complete

Done. 28 of 29 steps passed and one was skipped. The engine saves two patterns to the recipe library for future runs. Every report, screenshot, and network capture is on disk.

CIR: 2 pattern(s) crystallized

⊕ Promoted pattern → recipe cites-in-commitment (formula, 1 step)

⊕ Promoted pattern → recipe cites-in-step (formula, 1 step)

══════════════════════════════════════════════════

✓ test-platform--interceptor · 29 steps · 28 ok · 1 skipped

══════════════════════════════════════════════════



Key numbers

29 steps · 9 waves · $9 total cost · 35m duration
