How I work with Claude
A method, not magic.
I treat AI pair-engineering as a discipline, not a shortcut: written specs, verification gates, and harvested learnings. Claude multiplies my output only because the process around it is strict. I built this portfolio with Claude Design and Claude Code, reviewed it commit by commit, and the page you are reading is that process running in public.
Principles
Spec before prompt
I write the brief — schema, constraints, acceptance criteria — before Claude generates anything. On the Zelebrix RAG build I specified hybrid retrieval, PII scrubbing, and local data queries up front, then had Claude implement against that.
PREVENTS: the model inventing requirements, and a payments tenant’s data leaking past a boundary I never named.
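A brief like this can be held as data, so that generation cannot start while a section is still empty. The following is a minimal sketch under assumptions: `Spec`, `gaps`, and `render_brief` are hypothetical names chosen for illustration, not the code used on the Zelebrix build.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Spec:
    """A brief written before any generation (hypothetical structure)."""
    problem: str
    schema: dict[str, str]
    constraints: tuple[str, ...]
    acceptance_criteria: tuple[str, ...]

    def gaps(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def render_brief(spec: Spec) -> str:
    """Render the spec as prompt text, refusing to run on an incomplete spec."""
    missing = spec.gaps()
    if missing:
        raise ValueError("spec incomplete, missing: " + ", ".join(missing))
    lines = [f"Problem: {spec.problem}", "Schema:"]
    lines += [f"  {name}: {kind}" for name, kind in spec.schema.items()]
    lines += ["Constraints:"] + [f"  - {c}" for c in spec.constraints]
    lines += ["Acceptance criteria:"] + [f"  - {a}" for a in spec.acceptance_criteria]
    return "\n".join(lines)
```

Calling `render_brief` on a spec with no acceptance criteria raises `ValueError`, so the gap surfaces before any prompt exists.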
Claude proposes, evals dispose
Acceptance is decided by tests, not by how the output reads. At Gestamp, every Text-to-SQL deploy must score 100% on controlled tests against my golden dataset of ground-truth results; at Zelebrix, a 150-question golden set plus a retrieval probe scores each change.
PREVENTS: “looks right” shipping, and a wrong SQL answer breaking trust silently.
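A gate of this kind can be very small. The sketch below is illustrative only: `pass_rate`, `gate`, and the two-case golden set are hypothetical stand-ins, and exact equality is an assumed comparison rule rather than the one used in the Gestamp framework.

```python
from typing import Callable, Sequence

Case = tuple[str, object]  # (question, ground-truth result)


def pass_rate(answer: Callable[[str], object], golden: Sequence[Case]) -> float:
    """Fraction of golden cases whose answer equals the ground truth exactly."""
    if not golden:
        raise ValueError("golden set is empty; nothing to gate on")
    passed = sum(1 for question, expected in golden if answer(question) == expected)
    return passed / len(golden)


def gate(answer: Callable[[str], object], golden: Sequence[Case],
         threshold: float = 1.0) -> bool:
    """Allow a deploy only when the pass rate meets the threshold (1.0 = every case)."""
    return pass_rate(answer, golden) >= threshold


# Toy golden set, invented for this example.
GOLDEN: list[Case] = [("count of orders", 3), ("largest order id", 42)]


def candidate(question: str) -> object:
    """A candidate system that gets one of the two answers wrong."""
    return {"count of orders": 3, "largest order id": 41}.get(question)
```

With the default threshold, `gate(candidate, GOLDEN)` blocks the deploy because one wrong answer is enough to fail, however plausible the output reads.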
Measure before you blame the model
I make Claude prove the bottleneck with data before I let it tune anything. On Zelebrix, an LLM benchmark showed the failure was in retrieval (synonym gaps), not in the generation model.
PREVENTS: effort flowing to the loudest suspect, and the real defect surviving a redesign that changed nothing.
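One way to tell a retrieval failure from a generation failure is to probe retrieval on its own, with literal and paraphrased versions of the same questions. This is a sketch under assumptions: the corpus, the queries, and the keyword retriever are toys invented for the example, and `hit_rate_at_k` is a hypothetical helper, not the Zelebrix probe.

```python
from typing import Callable, Sequence

Probe = tuple[str, str]  # (query, id of the document that must come back)


def hit_rate_at_k(retrieve: Callable[[str, int], Sequence[str]],
                  probes: Sequence[Probe], k: int = 3) -> float:
    """Fraction of probes whose expected document appears in the top k results."""
    if not probes:
        raise ValueError("no probes given")
    hits = sum(1 for query, doc_id in probes if doc_id in retrieve(query, k))
    return hits / len(probes)


# Toy corpus, invented for this example.
CORPUS = {
    "doc-refund": "request refund order payment",
    "doc-login": "reset password sign in account",
}


def keyword_retrieve(query: str, k: int) -> list[str]:
    """Rank documents by shared words; no synonym handling at all."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.split())), doc_id) for doc_id, text in CORPUS.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0][:k]


LITERAL = [("refund order", "doc-refund"), ("reset password", "doc-login")]
SYNONYM = [("get my money back", "doc-refund"), ("forgot my credentials", "doc-login")]

literal_rate = hit_rate_at_k(keyword_retrieve, LITERAL)
synonym_rate = hit_rate_at_k(keyword_retrieve, SYNONYM)
```

On this toy data every literal probe hits and every paraphrased probe misses, which points at a vocabulary gap in retrieval before the generation model is involved at all.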
Review every commit
I read AI-assisted work commit by commit before it lands, the same way I review intern PRs at Gestamp and give architectural guidance. Claude drafts; I accept, reject, or rewrite each diff.
PREVENTS: unreviewed changes accreting, and a subtle regression in retrieval or impersonation shipping under my name.
Harvest everything
I mine every build for a reusable artifact: the Gestamp eval framework, the Zelebrix RAGAS-lite scorer, the IDEA ingestion and retrieval libraries. Claude helps me generalize one project’s harness for the next.
PREVENTS: hard-won workflows dying in old repos — exactly the gap my planned skills-and-harnesses collection closes.
The workflow
Frame
I write the problem, constraints, and acceptance criteria.
Claude restates the brief and surfaces ambiguities.
EXIT ARTIFACT: an agreed spec with named failure modes.
Scaffold
I define the module contracts (endpoint factory, ingestion/retrieval libraries).
Claude generates the first implementation.
EXIT ARTIFACT: a running skeleton that compiles and wires together.
Iterate
I ask for small, single-purpose changes.
Claude edits one slice and runs it.
EXIT ARTIFACT: a reviewed commit that does one thing.
Verify
I write the eval set and edge cases.
Claude generates the test harness and runs it.
EXIT ARTIFACT: a green run on the golden set before deploy.
Harvest
I name what should outlive the project.
Claude extracts it into a reusable skill or library.
EXIT ARTIFACT: a harness ready for the next build.
DISCIPLINE
Where it breaks — and what I do about it:
- Claude over-trusts its first architecture. I demand alternatives with trade-offs and benchmark them — the Zelebrix LLM comparison settled a design call with data, not preference.
- It cannot judge its own retrieval quality. I own eval design — golden sets, ground-truth results, retrieval probes — because a model grading itself reports comfort, not correctness.
- It does not know my privacy and permission boundaries. I specify and verify them by hand — PII scrubbing, local data queries, RBAC impersonation — because a leaked boundary is not something a prompt fixes after the fact.
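For the last point, "specify and verify" can mean pairing every scrubbing rule with a check that fails loudly when something slips through. The sketch below is illustrative only: the two regular expressions, `scrub`, and `leaks` are hypothetical, and real PII scrubbing needs far more than an email and a phone pattern.

```python
import re

# Illustrative patterns only; they will miss many real-world formats.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub(text: str) -> str:
    """Replace each email address and phone number with a placeholder."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def leaks(text: str) -> list[str]:
    """Return every email address or phone number still present in the text."""
    return EMAIL.findall(text) + PHONE.findall(text)


def scrub_checked(text: str) -> str:
    """Scrub, then verify by hand-written rule rather than trusting the scrub."""
    cleaned = scrub(text)
    remaining = leaks(cleaned)
    if remaining:
        raise ValueError(f"PII survived scrubbing: {len(remaining)} match(es)")
    return cleaned
```

Keeping the check separate from the scrub means a rule that is later loosened by an edit still trips the verification step.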
Toolchain
| Tool | Role |
| --- | --- |
| Claude Code | The implementation loop: scaffolding, edits, and the commit-by-commit reviews I gate every change through. |
| Claude Design | Visual and UX iteration: this site is the demo, built with no prior knowledge of the front-end stack. |
| Skills & harnesses | Codified practice: my eval frameworks, retrieval libraries, and scorers, carried forward between builds. |
| Evals / RAGAS | Acceptance gates: golden sets, retrieval probes, and the RAGAS module wired into the IDEA platform. |
| Langfuse | Observability on the IDEA RAG platform: tracing what the agents actually did in production. |
Worked example
A full annotated build log from the LLMwiki project is coming. In the meantime, the working-with-Claude notes on each project page (Zelebrix, IDEA, and the Gestamp agents) show this method in action on shipped work.
A spec the generator cannot deviate from.