Applied AI engineer

Tim Costagliola.

I build AI systems I can defend. Evals gate the releases. Retrieval gets measured. Traces explain the bill. Guardrails get attacked first.

Five systems running
Measuring: evals · Q3 2026
Every number here is earned, or absent

Current focus

One measured artifact per phase · 2026–27
Now · Q3 2026

Evals

Golden datasets from real briefs, deterministic checks, a calibrated LLM judge, and a CI gate that fails before a regression reaches a customer.

Ships: the harness, and a regression caught on camera
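A minimal sketch of how a CI gate over a golden dataset could work, assuming the simplest kind of deterministic check (a required substring). The names `Case`, `passes`, `pass_rate`, and `gate`, the `baseline` value, and the `tolerance` default are all hypothetical and are not taken from the harness described above; the calibrated LLM judge is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Case:
    """One golden example: an input brief and a deterministic expectation."""
    brief: str
    must_contain: str  # hypothetical check: output must mention this term


def passes(case: Case, output: str) -> bool:
    # Deterministic check: case-insensitive substring match.
    return case.must_contain.lower() in output.lower()


def pass_rate(cases: Sequence[Case], generate: Callable[[str], str]) -> float:
    # Run the system under test on every brief and count the passing cases.
    passed = sum(passes(case, generate(case.brief)) for case in cases)
    return passed / len(cases)


def gate(cases: Sequence[Case], generate: Callable[[str], str],
         baseline: float, tolerance: float = 0.02) -> int:
    """Return a process exit code: 0 if quality held, 1 if it regressed.

    `baseline` is the pass rate of the last accepted release (assumed to be
    stored somewhere by the pipeline); `tolerance` is an assumed allowance
    for noise.
    """
    rate = pass_rate(cases, generate)
    return 0 if rate >= baseline - tolerance else 1
```

In a pipeline, the return value would typically be passed to `sys.exit`, so a non-zero code fails the build before the change ships.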
Q4 2026

Retrieval

Hybrid search with reranking, scored on precision and recall. Four approaches, benchmarked on production data.

Ships: the four-way bake-off
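For reference, precision and recall for a ranked result list are usually computed at a cutoff k. The sketch below uses the standard definitions; the function name and the document IDs in the usage note are illustrative assumptions, not part of the benchmark described above.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(retrieved: Sequence[str], relevant: Set[str],
                          k: int) -> Tuple[float, float]:
    """Score one query's ranked results against its labeled relevant set.

    precision@k = relevant hits in the top k / k
    recall@k    = relevant hits in the top k / total relevant documents
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    # Divide by k (not len(top_k)) so that short result lists are penalized.
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

With `retrieved = ["d1", "d7", "d3", "d9"]`, `relevant = {"d1", "d3", "d5"}`, and `k = 3`, two of the top three results are relevant, so both precision and recall are 2/3. Averaging these scores over all queries gives a single number per retrieval approach to compare.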
Q1 2027

Observability

Every call traced, every dollar attributed, live traffic scored asynchronously. Then cost comes down without quality going with it.

Ships: the trace that found the waste
Q2 2027

Safety

A versioned attack battery run against my own systems, guardrails in depth, and the before/after numbers, responsibly disclosed.

Ships: the red-team report