Been iterating on @tomosman's loop.
This one's winning:
/goal produce a verified, code-derived behavioral spec for this web platform, captured in one canonical spreadsheet that carries every feature from spec -> tested -> fixed -> verified.
Why: we need a single source of truth that maps every feature to its expected behavior *as the code implements it*, so that gaps and bugs surface and the platform can be driven to a known-good state. The spreadsheet is the source of truth.
Work on the current repo. Do Phase 0 and Phase 1 under this goal; when the spec is complete, switch into the /loop below to drive testing and remediation. Keep moving through phases without stopping, except at a real checkpoint (defined below).
Phase 0 - Plan (first): Detect the stack, the feature surface (routes, pages, components, API endpoints, background jobs, auth, settings…), and the test infra that already exists (unit/integration/e2e, browser automation, seeds/fixtures, a runnable dev server).
Propose (a) how you'll inventory features, (b) the spreadsheet schema, and (c) how you'll test in the loop given what's available. Proceed once the plan holds.
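A minimal sketch of the Phase 0 detection step, assuming the stack and test infra can be hinted at by well-known marker files at the repo root. The `MARKERS` table and the `detect_stack` helper are hypothetical names for illustration, not part of the original loop, and the table is nowhere near exhaustive.

```python
from pathlib import Path

# Hypothetical marker table: file name -> what its presence suggests.
# These are common conventions only; extend the table for the repo at hand.
MARKERS = {
    "package.json": "Node.js project",
    "pyproject.toml": "Python project",
    "go.mod": "Go module",
    "Cargo.toml": "Rust crate",
    "playwright.config.ts": "Playwright e2e tests",
    "cypress.config.js": "Cypress e2e tests",
    "docker-compose.yml": "Compose-based dev environment",
}


def detect_stack(repo_root: str) -> dict[str, str]:
    """Return {marker file: hint} for every marker found at the repo root."""
    root = Path(repo_root)
    return {name: hint for name, hint in MARKERS.items() if (root / name).is_file()}
```

A hit here is only a hint: the plan should still confirm each finding by reading the config and running the tool.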
Phase 1 - Catalog & spec: Read the code and, for every feature, write a user story + the expected behavior as implemented, citing the file/function. Where the code is ambiguous or the behavior is undefined, log an open question - don't guess. Record every feature as a row in the canonical spreadsheet (create with the xlsx skill). Exit: every discoverable feature has a row.
One row, concretely:
| Area | User story | Expected behavior (from code) | Status | Defects | Defect type | Notes / source |
|---|---|---|---|---|---|---|
| Auth | As a returning user I want to log in with email+password so I can reach my dashboard | `POST /api/login` validates via bcrypt, sets httpOnly session cookie, 302 -> `/dashboard`; bad creds -> 401 + inline error | Spec'd | - | - | `api/auth/login.ts`, `LoginForm.tsx` |
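For anyone scripting around the sheet, the row above could be modeled roughly like this. The `FeatureRow` class, its field names, and `to_cells` are hypothetical, chosen only to mirror the seven columns; the sketch stops short of writing the .xlsx itself.

```python
from dataclasses import dataclass


@dataclass
class FeatureRow:
    # One field per spreadsheet column, in the same left-to-right order.
    area: str
    user_story: str
    expected_behavior: str
    status: str = "Spec'd"   # every row starts here after Phase 1
    defects: str = "-"
    defect_type: str = "-"
    source: str = ""


def to_cells(row: FeatureRow) -> list[str]:
    """Flatten a row into cell values, in column order."""
    return [
        row.area,
        row.user_story,
        row.expected_behavior,
        row.status,
        row.defects,
        row.defect_type,
        row.source,
    ]


login = FeatureRow(
    area="Auth",
    user_story="As a returning user I want to log in with email+password so I can reach my dashboard",
    expected_behavior="POST /api/login validates via bcrypt, sets httpOnly session cookie, 302 -> /dashboard; bad creds -> 401 + inline error",
    source="api/auth/login.ts, LoginForm.tsx",
)
```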
Canonical artifact: exactly one .xlsx, updated in place across every phase and loop iteration - never fork into per-phase or per-iteration files. Status flows Spec'd -> Tested-Pass / Tested-Fail -> Fixed -> Verified. The main thread is the single writer.
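The status flow can be read as a small state machine. A sketch, with two assumptions that the text does not spell out: a Tested-Pass row is promoted straight to Verified, and a Fixed row that fails its re-test goes back to Tested-Fail (as step 3 of the loop describes). `TRANSITIONS` and `advance` are hypothetical names.

```python
# Allowed status changes, read off the flow above.
# Assumptions: Tested-Pass -> Verified is allowed directly, and a failed
# re-test sends a Fixed row back to Tested-Fail.
TRANSITIONS = {
    "Spec'd": {"Tested-Pass", "Tested-Fail"},
    "Tested-Pass": {"Verified"},
    "Tested-Fail": {"Fixed"},
    "Fixed": {"Verified", "Tested-Fail"},
    "Verified": set(),
}


def advance(current: str, new: str) -> str:
    """Return the new status, or raise if the change skips a step."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal status change: {current} -> {new}")
    return new
```

Guarding writes this way keeps a row from jumping from Spec'd to Verified without a recorded test result.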
Agentic execution:
- Delegate breadth to subagents: fan feature discovery and per-area testing across subagents so the main thread stays focused.
- Verify by running, not claiming - report real command/test output; state skips and unknowns plainly.
- Checkpoint (pause, ask, end the turn) only for a destructive/irreversible action, a fix needing a genuine product decision, or input only I can give. Otherwise, keep going.
- Self-check at each phase/loop boundary via a fresh-context subagent: re-verify the spreadsheet against the code (Phase 1) and against actual results (each loop pass).
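"Verify by running, not claiming" could look like the sketch below: run the command, keep the real exit code and output, and say plainly when it never ran. `run_and_record` and the shape of its result dict are assumptions for illustration; only the standard-library `subprocess.run` call is real API.

```python
import subprocess


def run_and_record(cmd: list[str], timeout: float = 600.0) -> dict:
    """Run a command and return what actually happened, not a summary of it."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        # The tool is not installed: report the skip instead of implying a pass.
        return {"cmd": cmd, "completed": False, "exit_code": None,
                "output": "command not found"}
    except subprocess.TimeoutExpired:
        return {"cmd": cmd, "completed": False, "exit_code": None,
                "output": f"timed out after {timeout}s"}
    return {"cmd": cmd, "completed": True, "exit_code": proc.returncode,
            "output": proc.stdout + proc.stderr}
```

A row should only move to a passing status when `completed` is true and `exit_code` is 0; anything else is a fail or a stated unknown.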
/loop Quality cycle - once the spec is complete, iterate test -> fix -> re-test until clean.
Each iteration, in order:
1. Test: exercise every user story not yet Verified against the running app, preferring the strongest method available (browser/e2e automation > existing suites > documented static check only where execution truly isn't possible). Record actual pass/fail in the same spreadsheet; log every defect with its type (functional/logistical or UX). No app-behavior changes in this step.
2. Fix: think hard about root cause, then fix every functional/logistical and UX defect logged this iteration - cause, not symptom. Scope: only logged defects; no new features, no unrelated refactors. Update each row's status.
3. Re-test: re-run every story touched by a fix using the same method; set Verified, or back to Tested-Fail with notes if the fix didn't hold.
Exit when all user stories are Verified and no open functional/UX defects remain. Safety cap: if a story is still failing after 3 full iterations, stop, leave it Tested-Fail with root-cause notes, and report it rather than looping further.
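The control flow of the loop, including the exit condition and the 3-iteration safety cap, can be sketched as follows. `quality_loop`, `test`, and `fix` are hypothetical stand-ins for the real testing and remediation work, and the sketch assumes a story that passes its first test is marked Verified directly.

```python
from typing import Callable

MAX_ITERATIONS = 3  # safety cap from the exit rule above


def quality_loop(
    stories: list[str],
    test: Callable[[str], bool],   # True if the story passes against the running app
    fix: Callable[[str], None],    # applies a root-cause fix for one failing story
) -> dict[str, str]:
    status = {story: "Spec'd" for story in stories}
    for _ in range(MAX_ITERATIONS):
        pending = [s for s in stories if status[s] != "Verified"]
        if not pending:
            break  # exit: every story is Verified
        # 1. Test: record results only, no app changes.
        failed = []
        for story in pending:
            if test(story):
                status[story] = "Verified"  # assumption: a pass promotes directly
            else:
                status[story] = "Tested-Fail"
                failed.append(story)
        # 2. Fix: only the defects logged this iteration.
        for story in failed:
            fix(story)
            status[story] = "Fixed"
        # 3. Re-test every story touched by a fix, using the same method.
        for story in failed:
            status[story] = "Verified" if test(story) else "Tested-Fail"
    return status
```

Anything still Tested-Fail in the returned dict has hit the cap and should be reported with root-cause notes rather than retried.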