The work
Everything →
AI automations
Workflows that run unattended in real businesses. No babysitting required.
Look here →
Production systems
Full products, built and run end-to-end. Isolation enforced at the database. One metered funnel for every model call.
Look here →
Agents & tooling
Loops, tool registries, MCP, permission engines. Written by hand first, so no framework holds mysteries.
Look here →
Measurement & safety
Evals, retrieval benchmarks, tracing, red-teams. The machinery that makes the rest defensible.
Look here →
Hardware & embedded
The same detector in Python, C, and Arduino, under CI. Range past the browser.
Look here →
Current focus
One measured artifact per phase · 2026–27
Evals
Golden datasets from real briefs, deterministic checks, a calibrated LLM judge, and a CI gate that fails before a regression reaches a customer.
Ships: the harness, and a regression caught on camera
Retrieval
Hybrid search with reranking, scored on precision and recall. Four approaches, benchmarked on production data.
Ships: the four-way bake-off
Observability
Every call traced, every dollar attributed, live traffic scored asynchronously. Then cost comes down without quality going with it.
Ships: the trace that found the waste
Safety
A versioned attack battery run against my own systems, guardrails in depth, and the before/after numbers, responsibly disclosed.
Ships: the red-team report