Retrieval-first AI product consulting

Make AI products with complex knowledge reliable.

I help teams turn unreliable AI answers into inspectable systems: better context, clearer controls, and workflows that hold up in real use.

“If I pick one individual for an unknown data problem, it will be Isaac.”
Jake Selvey, Director of Analytics & Data Science

Diagnostic checklist

If this sounds familiar

These failures show up in docs assistants, support bots, internal copilots, and agents that need retrieved memory or source-grounded tool use.

Common breakpoints

retrieval · reranking · context · citation · eval · workflow
1. The answer sounds right but used the wrong evidence

The model gives a polished answer from a partial, stale, or irrelevant context packet.

2. Citations point to pages, not proof

The cited source exists, but the claim is not there, or the page is not the source of authority.

3. Search finds nearby docs and misses the deciding one

A plausible chunk outranks the policy, ticket, code path, or section that actually controls the answer.

4. No one can replay how the answer happened

The query rewrite, retrieved sources, rerank, context, answer, and citations are scattered or missing (a minimal trace record is sketched after this list).

5. Every fix turns into a prompt argument

Prompt, retrieval, eval, and workflow issues blur together because there is no shared trace.

6. Real users ask questions your eval set never sees

The product is tested on tidy examples while production questions mix missing details, messy wording, and private context.
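
Breakpoints 4 and 5 usually shrink once every answer writes one reviewable record. A minimal sketch in Python; the record shape and field names are assumptions for illustration, not a prescribed schema or tool.

```python
from dataclasses import dataclass, field, asdict
import json, time, uuid

@dataclass
class AnswerTrace:
    """One record per answer: enough to replay how the answer happened."""
    question: str                                   # what the user actually asked
    rewritten_query: str                            # the query after any rewrite step
    retrieved: list = field(default_factory=list)   # (doc_id, score) pairs before rerank
    reranked: list = field(default_factory=list)    # the order the model actually received
    context: list = field(default_factory=list)     # chunk ids kept in the context packet
    answer: str = ""
    citations: list = field(default_factory=list)   # (doc_id, quoted span) pairs
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    ts: float = field(default_factory=time.time)

def log_trace(trace: AnswerTrace, path: str = "traces.jsonl") -> None:
    """Append one JSON line per answer so prompt, retrieval, and eval debates share the same evidence."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(trace)) + "\n")
```

With a record like this in place, "every fix turns into a prompt argument" becomes "which field in the trace was wrong".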

Recommended engagement

Start with a retrieval product audit

The audit is the clean first move when the product is already answering real questions and quality is uneven. It tells you where the context pipeline breaks and what to fix first.

Recommended first step

Retrieval Product Audit

Find where the product first loses the evidence it needs, then leave with the failures, fixes, and next implementation decisions clearly named.

Timing: 1-2 weeks
Input: Real questions, source docs, traces, logs, or a demo
Output: Failure taxonomy, trace review, and prioritized fix plan

Best for: Teams with an existing assistant, search product, support bot, docs assistant, or agent workflow where quality is uneven.

What you leave with

  • Source and workflow review
  • Query set and failure review
  • Retrieval trace and citation review
  • Product-specific failure taxonomy
  • 30-day implementation plan

After diagnosis

Retrieval Pipeline Sprint

Build or repair the context pipeline behind a retrieval-heavy AI product or agent workflow.

Timing: 2-6 weeks
Input: A target workflow, source corpus, and team owner
Output: Pipeline changes, trace logging, eval harness, and handoff docs

Best for: Teams that know the context pipeline needs implementation work, not only advice.

Ongoing review

AI Product Advisory

Senior product and architecture review while your team builds AI workflows and agents over private or complex knowledge.

Timing: Monthly or project-based
Input: Roadmap, architecture, evals, traces, and product questions
Output: Decision reviews, failure analysis, and next-step guidance

Best for: Teams building multiple AI workflows that need steady judgment while they ship.

Audit method

The evidence-path audit method

For RAG assistants and agents, the audit follows one question: where did the right evidence first stop reaching the answer or action?

Evidence path trace

Every audit follows the same chain, and the expensive bugs usually appear where evidence changes shape. A minimal walk of the chain is sketched after the steps below.

source · context · answer

1. User question (input): the actual task, not a tidy test case.
2. Source truth (authority): policies, tickets, code, docs, memory.
3. Retrieval (selection): search, filters, chunking, reranking.
4. Context packet (payload, audit breakpoint): what the model or agent actually sees.
5. Answer/action (output): the response, tool call, or handoff.
6. Support (record): citations, trace, and review record.
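
One way to make the audit question mechanical is to walk the chain above and report the first stage where the deciding document drops out. A rough sketch; the trace fields and the gold document id are assumptions for illustration, not a fixed API.

```python
def first_break(trace: dict, gold_doc_id: str) -> str:
    """Return the first stage of the evidence path where the deciding document was lost."""
    if gold_doc_id not in trace["source_index"]:                   # source truth
        return "source truth: the deciding doc was never ingested"
    if gold_doc_id not in [d for d, _ in trace["retrieved"]]:      # retrieval
        return "retrieval: the deciding doc never came back"
    if gold_doc_id not in trace["context"]:                        # context packet
        return "context packet: the deciding doc was trimmed before the model saw it"
    if gold_doc_id not in [d for d, _ in trace["citations"]]:      # support
        return "support: the answer never cited the deciding doc"
    return "evidence path intact: review the answer or action itself"
```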

Audit sequence

Five decisions turn a vague quality problem into a fixable system problem.

1. Inspect the workflow

Name the user task, the decision the answer or action supports, and the knowledge the system needs.

2. Trace the evidence path

Review what was searched, retrieved, reranked, kept in context, cited, used, or acted on.

3. Name the failures

Separate source coverage, ranking, context assembly, citation, eval, tool, state, and workflow failures (a labeling sketch follows these steps).

4. Build the next fix

Turn the finding into an eval set, pipeline change, memory or tool change, trace loop, or workflow change.

5. Hand off the system

Leave the team with the method, not only the patch.
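
A small sketch of the labeling habit behind step 3, assuming each reviewed question gets exactly one primary label; the label names mirror the categories above and the example data is illustrative.

```python
from collections import Counter

# One primary label per reviewed question, mirroring the step 3 categories.
LABELS = {"source_coverage", "ranking", "context_assembly", "citation",
          "eval", "tool", "state", "workflow"}

def tally(reviews: list[tuple[str, str]]) -> Counter:
    """Count (question, label) pairs so the fix plan targets the biggest bucket first."""
    return Counter(label for _, label in reviews if label in LABELS)

# Illustrative reviews: two ranking failures and one citation failure.
print(tally([
    ("What is the refund window?", "ranking"),
    ("Which SSO config applies to EU tenants?", "ranking"),
    ("Does the rate limit reset daily?", "citation"),
]))
```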

What gets checked

Sources

coverage · freshness · permissions · source of truth

Retrieval

parsing · chunking · metadata · exact search · grep/rg · hybrid search · reranking

Generation

context assembly · answer support · citations · source support (see the sketch below)

Agent Workflow

memory · tool calls · skills · MCP servers · state · handoffs

Review Loop

real-question evals · traces · observability · run logs · review workflow
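
The answer-support and citation checks under Generation can start very literally: does the cited chunk actually contain the quoted claim? A deliberately naive sketch; the exact-substring test and dict-shaped chunk store are simplifications, and real corpora usually need normalization and fuzzier matching.

```python
def citation_supported(quote: str, doc_id: str, chunks: dict[str, str]) -> bool:
    """True only if the cited chunk exists and contains the quoted claim."""
    return bool(quote) and quote.strip().lower() in chunks.get(doc_id, "").lower()

def unsupported_citations(citations: list[tuple[str, str]],
                          chunks: dict[str, str]) -> list[str]:
    """Doc ids cited as proof whose text does not support the quoted span."""
    return [doc_id for doc_id, quote in citations
            if not citation_supported(quote, doc_id, chunks)]
```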

Buying questions

Questions buyers ask

You do not need a perfect eval setup before we start. You need a real workflow, a real knowledge problem, and a team willing to inspect the evidence path.

Do we need evals or traces already?

No. If you have them, I will use them. If you do not, the first step is usually turning real user questions into a small eval set and a trace format the team can review.
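
A hedged sketch of that first step: a few real questions, the document that should decide each one, and what a correct answer must state, stored as JSONL the whole team can read. The file name, fields, and example rows are illustrative only.

```python
import json

# Illustrative starting eval set built from real production questions.
EVAL_SET = [
    {"question": "Can a customer cancel after 30 days?",
     "deciding_doc": "policies/refunds.md",
     "must_state": "cancellation is allowed and the refund is prorated after day 30"},
    {"question": "Which plan includes SSO?",
     "deciding_doc": "pricing/plans.md",
     "must_state": "SSO is included on the Business plan and above"},
]

with open("eval_set.jsonl", "w") as f:
    for case in EVAL_SET:
        f.write(json.dumps(case) + "\n")
```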

Is this for existing products or new builds?

Both. Existing products usually start with an audit. New builds usually start with workflow design, source review, and a small retrieval prototype before the team commits to an architecture.

Can this work with private data?

Yes. The work can start from sanitized traces, representative source samples, or a private environment. The important part is preserving the shape of the workflow and the evidence path.

Is this only for RAG, or also agents?

Both. Agents are often context systems with memory, tools, state, and handoffs. If an agent depends on knowledge outside the prompt, we need to trace what it found, what it kept, what it used, and what it cited or acted on.

What do we get at the end of an audit?

You get the failure modes, the evidence behind them, the highest-impact fixes, and a 30-day plan. The goal is to leave with decisions the team can act on.

When is a sprint better than an audit?

Choose a sprint when the team already knows the retrieval layer needs implementation work: ingestion, chunking, hybrid search, reranking, citations, trace logging, or eval plumbing.
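
As one example of that kind of work, hybrid search often reduces to fusing a keyword ranking with an embedding ranking before anything is reranked. A minimal reciprocal rank fusion sketch, assuming both ranked lists already exist; k=60 is the common default, not a tuned value.

```python
def rrf_fuse(keyword_ranked: list[str], vector_ranked: list[str], k: int = 60) -> list[str]:
    """Merge two ranked doc-id lists with reciprocal rank fusion, best fused score first."""
    scores: dict[str, float] = {}
    for ranking in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked well by both retrievers beats one that only a single retriever liked.
print(rrf_fuse(["policy-7", "faq-2", "ticket-91"], ["faq-2", "policy-7", "guide-4"]))
```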

How technical does our team need to be?

I can work with product, engineering, data, or founder-led teams. The work is most useful when someone owns the product workflow and someone can make or review implementation changes.

Founder note

I care about leaving the team sharper than I found it.

Isaac Flath / retrieval systems, RAG, and AI product review

I have taught AI coding and RAG, and that shapes how I consult. The goal is not to make me the only person who understands the system.

The goal is to leave your team with a review habit: name the failure, tie the fix to evidence, and keep the method after I leave.

  • Name the failure clearly.
  • Tie the fix to evidence.
  • Keep the review method.

Next step

Send the retrieval or agent problem

Send one failure you are seeing in a retrieval product, assistant, or agent workflow. I will reply with the best next step: a quick diagnostic call, an audit, a sprint, advisory, or a clear no-fit.

A useful first message includes:

  • One real failure, user question, or agent run.
  • What the system should have done.
  • Whether you already have traces, evals, logs, or source docs. No need to send private data yet.

Problem brief