
mgrep with Founding Engineer Rui Huang

By Isaac Flath · December 10, 2025

I hosted a talk with Rui Huang, a founding engineer at Mixedbread, about mgrep, a tool now essential to my workflow.

I sync every repo locally, automatically, all the time. I've seen companies replace production semantic search with Mixedbread's cloud service because it finds more relevant results and ranks them well.

Rui's talk focused on using their semantic search technology, specifically mgrep, to build more effective coding agents. I was excited to hear from him directly, since he's on the engineering team building the product.

This post covers key points from our conversation: the limits of tools like grep, how mgrep provides a semantic alternative, and the multi-vector and multimodal technology that powers it.

For an even deeper dive into what makes mgrep work and the research that went into it, see Most RAG Systems Have a Context Problem.

The Problem with grep for AI Agents

The "RAG is dead" headlines are bombastic, but common. The argument goes: modern AI agents are so powerful they don't need fancy semantic search. They can use basic tools like grep to find context with better results than semantic search.

This argument usually targets older semantic search methods.

Agentic search using `grep` is powerful but has significant drawbacks. Rui explained issues his team observed in long-running tasks.

1. It's Slow and Expensive.

grep is a pattern-matching tool. For a high-level task like "refactor the authentication flow," an agent must guess keywords (authentication, middleware, credentials) and run multiple tool calls. That increases latency and token consumption, stuffing the context window with noise.
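In practice, that fan-out looks something like this (an illustrative sequence of guesses, not output from any particular agent):

grep -rn "authentication" src/
grep -rn "middleware" src/
grep -rn "credentials" src/
# each call returns raw matches the agent still has to read, filter, and follow up on

Every round trip adds latency, and every match lands in the context window whether or not it's relevant.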

2. It Degrades Quality.

More tool calls fill the context window with partial information. The original intent gets diluted. Rui pointed out this is when you see agents hallucinate or get stuck, responding with phrases like "You're absolutely right" without making progress.

I've found this true in my own work. Faster feedback loops matter. When an agent uses fewer tokens and gets to the point faster, I can iterate and stay focused.

Introducing mgrep: Semantic Search for Code

Mixedbread's solution is mgrep, a command-line tool designed to be a semantic version of grep. Instead of matching patterns, it understands the intent behind a natural language query.

The Semantic Query

mgrep "how streaming is implemented"

The tool returns a list of relevant files with specific line numbers and a similarity score, which helps the agent gauge the confidence of each result.

Slide showing the mgrep command and its output format
This approach lets the agent progressively discover context instead of stuffing multiple files into its prompt.

The impact is not subtle. In Mixedbread's internal tests with Claude on complex coding tasks, mgrep delivered a large performance advantage:

  • 53% fewer tokens used

  • 48% faster response times

  • 3.2x better quality responses

Slide with performance metrics for mgrep
> Note: I've seen similar gains in [my own experiments](/writing/boosting-claude-faster-clearer-code-analysis-with-mgrep), which is what inspired me to host this talk.

Live Demo: mgrep vs. grep

Rui demoed a playground that runs Claude side-by-side: standard grep vs. mgrep. The task was to query the React codebase (over 6 million tokens) with the question: "Explain how the useEffect hook works in common patterns."
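Conceptually, the two agents issue very different kinds of tool calls. Roughly (illustrative commands, not the playground's exact invocations):

grep -rn "useEffect" .   # pattern match: thousands of raw hits to sift through
mgrep "how the useEffect hook works in common patterns"   # semantic query: ranked, relevant locations

The grep side has to narrow things down over several turns; the mgrep side gets ranked candidates back in a single call.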

Side-by-side comparison of Claude with grep vs. mgrep
mgrep is complementary to grep, not a replacement. Agents get both tools and can choose the right one: mgrep for semantic exploration and grep for exact-match symbol searching.

Try the playground yourself to see the difference without any setup.

Getting Started: Setup & Terminal Demo

mgrep syncs a local directory to a cloud-backed search index. Rui walked through the setup.

First, install the tool via npm:

npm install -g @mixedbread/mgrep # or pnpm / bun

Then, log in to connect to your Mixedbread account:

mgrep login

Once set up, you can sync any directory. Rui used Andrej Karpathy's nanogpt repository as an example. The watch command indexes the current folder, respects .gitignore, and syncs to a remote Mixedbread store.

mgrep watch

Screenshot at 22:15
Ingestion is fast and cheap: the React codebase (over 6 million tokens) takes about five minutes and costs roughly $20 to index.

Once synced, you can query your code with natural language. mgrep also includes a -a (--answer) flag that uses an LLM to return a direct answer with citations.
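For example, reusing the query from earlier (illustrative; the answer will depend on your index):

mgrep -a "how streaming is implemented"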

Screenshot at 26:30
This gives the agent a concise summary, reducing the need to process large file snippets.

Integrating mgrep with Coding Agents

While you can use mgrep manually, its core use case is agent integration. Mixedbread provides plugins for popular coding agents like Claude Code.

A simple install command configures the agent to be aware of mgrep and how to use it.

mgrep install-claude-code

This command sets up the necessary skills and prompts. The mgrep watch process runs in the background during an agent session, keeping the index up to date.

These plugins are transparent. They're wrappers around the mgrep CLI with a pre-written prompt. You can customize behavior by creating your own prompts in your agent's configuration files.
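With Claude Code, for instance, one lightweight way to do that is a project-level instruction (a hypothetical snippet; adapt the wording and file to your agent's configuration conventions):

# nudge the agent toward mgrep for semantic queries and grep for exact symbols
echo 'Prefer mgrep "<natural language query>" for semantic code search; use grep for exact symbol matches.' >> CLAUDE.md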

Under the Hood: Multi-Vector & Multimodal Search

mgrep is powered by the Mixedbread Search API. When you sync files, they're sent to a Mixedbread store where the pipeline takes over.

Architecture diagram of the mgrep system
This is more than a vector database. The service analyzes file types, applies chunking strategies (e.g., different logic for Markdown vs. code), and generates embeddings using state-of-the-art models.

The key innovation is multi-vector search. Traditional semantic search creates one vector per chunk. Mixedbread represents every word as its own vector, creating a richer, more granular representation. Advanced quantization techniques make this approach scalable and affordable.

This system is also multimodal. It can natively index and search images, videos, audio, and PDFs without transcribing them to text. Rui demoed searches like "sad cat" and "angry cat."

Screenshot at 40:50: terminal showing a multimodal image search
This is powerful for codebases with diagrams, visual assets, or complex PDFs. Agents can find visual information that text-only tools like grep miss.

In legal work, PDFs are often the source of truth for contracts; in e-commerce, it's often product images.

Conclusion: Give AI the Best Tools

Capable agents don't mean semantic search is dead. It means agents need better search tools. Combining mgrep with grep gives agents a more powerful, efficient way to understand a codebase.

Conclusion slide from the presentation
The key takeaways are:
  • Agentic search needs semantic search. Relying on grep alone is slow, expensive, and leads to lower-quality results for complex tasks.

  • Better tools lead to better agents. mgrep improves agent performance by reducing token usage, increasing speed, and providing more relevant context.

  • The future is multimodal. As agents handle more data types, their tools must keep up. Native search across code, PDFs, and images is a significant advantage.

No matter how advanced agents become, their performance is constrained by tool quality. Search is fundamental, and it shouldn't be the bottleneck.

Q&A Highlights

We ended with audience questions:

  • Multilingual Support: Mixedbread's models are multilingual by default, supporting languages like Arabic, Chinese, and Korean. Try the search demo in different languages.

  • Cost: Indexing is priced per token. The full React codebase (6M+ tokens) costs roughly $20. The team is open to discounts at scale.

  • Engineering Optimizations: Low latency comes from quantization research and an optimized end-to-end pipeline. The team plans more blog posts detailing the work.