Retrieval-first AI product consulting

Make AI products with complex knowledge reliable.

I help teams turn unreliable AI answers into inspectable systems: better context, clearer controls, and workflows that hold up in real use.

“If I pick one individual for an unknown data problem, it will be Isaac.”
Jake Selvey, Director of Analytics & Data Science

Diagnostic checklist

If this sounds familiar

These failures show up in docs assistants, support bots, internal copilots, and agents that need retrieved memory or source-grounded tool use.

Common breakpoints

retrieval · reranking · context · citation · eval · workflow
1. The answer sounds right but used the wrong evidence

The model gives a polished answer from a partial, stale, or irrelevant context packet.

2. Citations point to pages, not proof

The cited source exists, but the claim is not there, or the page is not the source of authority.

3. Search finds nearby docs and misses the deciding one

A plausible chunk outranks the policy, ticket, code path, or section that actually controls the answer.

4. No one can replay how the answer happened

The query rewrite, retrieved sources, rerank, context, answer, and citations are scattered or missing (a minimal trace record is sketched after this list).

5. Every fix turns into a prompt argument

Prompt, retrieval, eval, and workflow issues blur together because there is no shared trace.

6. Real users ask questions your eval set never sees

The product is tested on tidy examples while production questions mix missing details, messy wording, and private context.
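
Breakpoints 4 and 5 usually shrink once every answer writes one reviewable record. A minimal sketch in Python; the record shape and field names are assumptions for illustration, not a prescribed schema or tool.

```python
from dataclasses import dataclass, field, asdict
import json, time, uuid

@dataclass
class AnswerTrace:
    """One record per answer: enough to replay how the answer happened."""
    question: str                                   # what the user actually asked
    rewritten_query: str                            # the query after any rewrite step
    retrieved: list = field(default_factory=list)   # (doc_id, score) pairs before rerank
    reranked: list = field(default_factory=list)    # the order the model actually received
    context: list = field(default_factory=list)     # chunk ids kept in the context packet
    answer: str = ""
    citations: list = field(default_factory=list)   # (doc_id, quoted span) pairs
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    ts: float = field(default_factory=time.time)

def log_trace(trace: AnswerTrace, path: str = "traces.jsonl") -> None:
    """Append one JSON line per answer so prompt, retrieval, and eval debates share the same evidence."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(trace)) + "\n")
```

With a record like this in place, "every fix turns into a prompt argument" becomes "which field in the trace was wrong".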

Recommended engagement

Start with a retrieval product audit

The audit is the clean first move when the product is already answering real questions and quality is uneven. It tells you where the context pipeline breaks and what to fix first.

Recommended first step

Retrieval Product Audit

Find where the product first loses the evidence it needs, then leave with the failures, fixes, and next implementation decisions clearly named.

Timing: 1-2 weeks
Input: Real questions, source docs, traces, logs, or a demo
Output: Failure taxonomy, trace review, and prioritized fix plan

Best for: Teams with an existing assistant, search product, support bot, docs assistant, or agent workflow where quality is uneven.

What you leave with

  • Source and workflow review
  • Query set and failure review
  • Retrieval trace and citation review
  • Product-specific failure taxonomy
  • 30-day implementation plan

After diagnosis

Retrieval Pipeline Sprint

Build or repair the context pipeline behind a retrieval-heavy AI product or agent workflow.

Timing: 2-6 weeks
Input: A target workflow, source corpus, and team owner
Output: Pipeline changes, trace logging, eval harness, and handoff docs

Best for: Teams that know the context pipeline needs implementation work, not only advice.

Ongoing review

AI Product Advisory

Senior product and architecture review while your team builds AI workflows and agents over private or complex knowledge.

Timing: Monthly or project-based
Input: Roadmap, architecture, evals, traces, and product questions
Output: Decision reviews, failure analysis, and next-step guidance

Best for: Teams building multiple AI workflows that need steady judgment while they ship.

Audit method

The evidence-path audit method

For RAG assistants and agents, the audit follows one question: where did the right evidence first stop reaching the answer or action?

Evidence path trace

Every audit follows the same chain, and the expensive bugs usually appear where evidence changes shape. A minimal walk of the chain is sketched after the steps below.

source · context · answer

1. User question (input): the actual task, not a tidy test case.
2. Source truth (authority): policies, tickets, code, docs, memory.
3. Retrieval (selection): search, filters, chunking, reranking.
4. Context packet (payload, audit breakpoint): what the model or agent actually sees.
5. Answer/action (output): the response, tool call, or handoff.
6. Support (record): citations, trace, and review record.
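
One way to make the audit question mechanical is to walk the chain above and report the first stage where the deciding document drops out. A rough sketch; the trace fields and the gold document id are assumptions for illustration, not a fixed API.

```python
def first_break(trace: dict, gold_doc_id: str) -> str:
    """Return the first stage of the evidence path where the deciding document was lost."""
    if gold_doc_id not in trace["source_index"]:                   # source truth
        return "source truth: the deciding doc was never ingested"
    if gold_doc_id not in [d for d, _ in trace["retrieved"]]:      # retrieval
        return "retrieval: the deciding doc never came back"
    if gold_doc_id not in trace["context"]:                        # context packet
        return "context packet: the deciding doc was trimmed before the model saw it"
    if gold_doc_id not in [d for d, _ in trace["citations"]]:      # support
        return "support: the answer never cited the deciding doc"
    return "evidence path intact: review the answer or action itself"
```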

Audit sequence

Five decisions turn a vague quality problem into a fixable system problem.

1. Inspect the workflow

Name the user task, the decision the answer or action supports, and the knowledge the system needs.

2. Trace the evidence path

Review what was searched, retrieved, reranked, kept in context, cited, used, or acted on.

3. Name the failures

Separate source coverage, ranking, context assembly, citation, eval, tool, state, and workflow failures (a labeling sketch follows these steps).

4. Build the next fix

Turn the finding into an eval set, pipeline change, memory or tool change, trace loop, or workflow change.

5. Hand off the system

Leave the team with the method, not only the patch.
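
A small sketch of the labeling habit behind step 3, assuming each reviewed question gets exactly one primary label; the label names mirror the categories above and the example data is illustrative.

```python
from collections import Counter

# One primary label per reviewed question, mirroring the step 3 categories.
LABELS = {"source_coverage", "ranking", "context_assembly", "citation",
          "eval", "tool", "state", "workflow"}

def tally(reviews: list[tuple[str, str]]) -> Counter:
    """Count (question, label) pairs so the fix plan targets the biggest bucket first."""
    return Counter(label for _, label in reviews if label in LABELS)

# Illustrative reviews: two ranking failures and one citation failure.
print(tally([
    ("What is the refund window?", "ranking"),
    ("Which SSO config applies to EU tenants?", "ranking"),
    ("Does the rate limit reset daily?", "citation"),
]))
```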

What gets checked

Sources

coverage · freshness · permissions · source of truth

Retrieval

parsing · chunking · metadata · exact search · grep/rg · hybrid search · reranking

Generation

context assembly · answer support · citations · source support (see the sketch below)

Agent Workflow

memory · tool calls · skills · MCP servers · state · handoffs

Review Loop

real-question evals · traces · observability · run logs · review workflow
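
The answer-support and citation checks under Generation can start very literally: does the cited chunk actually contain the quoted claim? A deliberately naive sketch; the exact-substring test and dict-shaped chunk store are simplifications, and real corpora usually need normalization and fuzzier matching.

```python
def citation_supported(quote: str, doc_id: str, chunks: dict[str, str]) -> bool:
    """True only if the cited chunk exists and contains the quoted claim."""
    return bool(quote) and quote.strip().lower() in chunks.get(doc_id, "").lower()

def unsupported_citations(citations: list[tuple[str, str]],
                          chunks: dict[str, str]) -> list[str]:
    """Doc ids cited as proof whose text does not support the quoted span."""
    return [doc_id for doc_id, quote in citations
            if not citation_supported(quote, doc_id, chunks)]
```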

Buying questions

Questions buyers ask

You do not need a perfect eval setup before we start. You need a real workflow, a real knowledge problem, and a team willing to inspect the evidence path.

Do we need evals or traces already?

No. If you have them, I will use them. If you do not, the first step is usually turning real user questions into a small eval set and a trace format the team can review.
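
A hedged sketch of that first step: a few real questions, the document that should decide each one, and what a correct answer must state, stored as JSONL the whole team can read. The file name, fields, and example rows are illustrative only.

```python
import json

# Illustrative starting eval set built from real production questions.
EVAL_SET = [
    {"question": "Can a customer cancel after 30 days?",
     "deciding_doc": "policies/refunds.md",
     "must_state": "cancellation is allowed and the refund is prorated after day 30"},
    {"question": "Which plan includes SSO?",
     "deciding_doc": "pricing/plans.md",
     "must_state": "SSO is included on the Business plan and above"},
]

with open("eval_set.jsonl", "w") as f:
    for case in EVAL_SET:
        f.write(json.dumps(case) + "\n")
```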

Is this for existing products or new builds?

Both. Existing products usually start with an audit. New builds usually start with workflow design, source review, and a small retrieval prototype before the team commits to an architecture.

Can this work with private data?

Yes. The work can start from sanitized traces, representative source samples, or a private environment. The important part is preserving the shape of the workflow and the evidence path.

Is this only for RAG, or also agents?

Both. Agents are often context systems with memory, tools, state, and handoffs. If an agent depends on knowledge outside the prompt, we need to trace what it found, what it kept, what it used, and what it cited or acted on.

What do we get at the end of an audit?

You get the failure modes, the evidence behind them, the highest-impact fixes, and a 30-day plan. The goal is to leave with decisions the team can act on.

When is a sprint better than an audit?

Choose a sprint when the team already knows the retrieval layer needs implementation work: ingestion, chunking, hybrid search, reranking, citations, trace logging, or eval plumbing.
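
As one example of that kind of work, hybrid search often reduces to fusing a keyword ranking with an embedding ranking before anything is reranked. A minimal reciprocal rank fusion sketch, assuming both ranked lists already exist; k=60 is the common default, not a tuned value.

```python
def rrf_fuse(keyword_ranked: list[str], vector_ranked: list[str], k: int = 60) -> list[str]:
    """Merge two ranked doc-id lists with reciprocal rank fusion, best fused score first."""
    scores: dict[str, float] = {}
    for ranking in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked well by both retrievers beats one that only a single retriever liked.
print(rrf_fuse(["policy-7", "faq-2", "ticket-91"], ["faq-2", "policy-7", "guide-4"]))
```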

How technical does our team need to be?

I can work with product, engineering, data, or founder-led teams. The work is most useful when someone owns the product workflow and someone can make or review implementation changes.

Founder note

I care about leaving the team sharper than I found it.

Isaac Flath / retrieval systems, RAG, and AI product review

I have taught AI coding and RAG, and that shapes how I consult. The goal is not to make me the only person who understands the system.

The goal is to leave your team with a review habit: name the failure, tie the fix to evidence, and keep the method after I leave.

  • Name the failure clearly.
  • Tie the fix to evidence.
  • Keep the review method.

Next step

Send the retrieval or agent problem

Send one failure you are seeing in a retrieval product, assistant, or agent workflow. I will reply with the best next step: a quick diagnostic call, an audit, a sprint, advisory, or a clear no-fit.

A useful first message includes:

  • One real failure, user question, or agent run.
  • What the system should have done.
  • Whether you already have traces, evals, logs, or source docs. No need to send private data yet.

Problem brief