mgrep with Founding Engineer Rui Huang
I hosted a talk with Rui Huang, a founding engineer at Mixedbread, about
mgrep, a tool now essential to my workflow.
I sync every repo locally, automatically, all the time. I've seen companies replace production semantic search with Mixedbread's cloud service because it finds more relevant results and ranks them well.
Rui's talk focused on using their semantic search technology, specifically mgrep, to build more effective coding agents. I was excited to hear from him directly, since he's on the engineering team building the product.
This post covers key points from our conversation: the limits of tools like grep, how mgrep provides a semantic alternative, and the multi-vector and multimodal technology that powers it.
For a deeper dive into what makes mgrep work and the research behind it, see "Most RAG Systems Have a Context Problem."
The Problem with grep for AI Agents
The "RAG is dead" headlines are bombastic, but common. The argument
goes: modern AI agents are so powerful they don't need fancy semantic
search. They can use basic tools like grep to find context with better
results than semantic search.
This argument usually targets older semantic search methods, and it skips over two problems with relying on grep alone.
1. It's Slow and Expensive.
grep is a pattern-matching tool. For a high-level task like "refactor
the authentication flow," an agent must guess keywords
(authentication, middleware, credentials) and run multiple tool
calls. That increases latency and token consumption, stuffing
the context window with noise.
2. It Degrades Quality.
More tool calls fill the context window with partial information. The original intent gets diluted. Rui pointed out this is when you see agents hallucinate or get stuck, responding with phrases like "You're absolutely right" without making progress.
I've found this true in my own work. Faster feedback loops matter. When an agent uses fewer tokens and gets to the point faster, I can iterate and stay focused.
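To make the keyword-guessing cost concrete, here is a toy sketch of that loop. The file contents, the keyword list, and the whitespace-based token count are all invented for illustration; this is not how any particular agent or grep is implemented.

```python
import re

# Hypothetical mini-codebase: file path -> contents (invented for this sketch).
FILES = {
    "auth/session.py": "def login(user):\n    token = issue_token(user)\n    return token\n",
    "auth/middleware.py": "def require_login(handler):\n    # checks credentials on every request\n    return handler\n",
    "billing/invoice.py": "def send_invoice(user):\n    # authentication is handled upstream\n    pass\n",
}

def grep(pattern):
    """Return (path, line_number, line) for every line matching the regex, like `grep -n`."""
    hits = []
    for path, text in FILES.items():
        for n, line in enumerate(text.splitlines(), start=1):
            if re.search(pattern, line):
                hits.append((path, n, line.strip()))
    return hits

def keyword_search(keywords):
    """One grep call per guessed keyword; every hit lands in the context window."""
    calls, context = 0, []
    for kw in keywords:
        calls += 1
        context.extend(grep(kw))
    # Crude proxy for token usage: whitespace-separated words across all hits.
    approx_tokens = sum(len(line.split()) for _, _, line in context)
    return calls, context, approx_tokens

calls, context, approx_tokens = keyword_search(["authentication", "middleware", "credentials"])
print(calls, [path for path, _, _ in context], approx_tokens)
```

In this toy setup, three calls return two comment lines, one of them from an unrelated billing file, and the function that actually issues the token never matches because it doesn't contain any of the guessed words.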
Introducing mgrep: Semantic Search for Code
Mixedbread's solution is mgrep, a command-line tool designed to be a semantic version of grep. Instead of matching patterns, it understands the intent behind a natural language query.
The Semantic Query
mgrep "how streaming is implemented"
The tool returns a list of relevant files with specific line numbers and a similarity score, which helps the agent gauge the confidence of each result.
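One way an agent could use those scores is to drop low-confidence hits before reading any files. The sketch below assumes a hypothetical result shape (`SearchHit`) and made-up scores on a 0 to 1 scale; mgrep's real output format and score range may differ, and the threshold is arbitrary.

```python
from dataclasses import dataclass

@dataclass
class SearchHit:
    # Hypothetical shape for one result; mgrep's real output format may differ.
    path: str
    start_line: int
    end_line: int
    score: float  # similarity score, assumed here to be higher = more relevant

def select_hits(hits, min_score=0.5, limit=3):
    """Keep only confident hits, best first, capped so the context window stays small."""
    confident = [h for h in hits if h.score >= min_score]
    return sorted(confident, key=lambda h: h.score, reverse=True)[:limit]

# Made-up results for illustration only.
hits = [
    SearchHit("src/stream/reader.py", 10, 42, 0.81),
    SearchHit("docs/changelog.md", 5, 9, 0.22),
    SearchHit("src/stream/chunker.py", 1, 30, 0.67),
]

for h in select_hits(hits):
    print(f"{h.path}:{h.start_line}-{h.end_line} ({h.score:.2f})")
```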
The impact is not subtle. In Mixedbread's internal tests with Claude on
complex coding tasks, mgrep delivered a large performance advantage:
- 53% fewer tokens used
- 48% faster response times
- 3.2x better quality responses
Live Demo: mgrep vs. grep
Rui demoed a playground that runs Claude side-by-side: standard grep vs. mgrep. The task was to query the React codebase (over 6 million tokens) with the question: "Explain how the useEffect hook works in common patterns."
Try the playground yourself to see the difference without any setup.
Getting Started: Setup & Terminal Demo
mgrep syncs a local directory to a cloud-backed search index. Rui walked through the setup.
First, install the tool via npm:
npm install -g @mixedbread/mgrep # or pnpm / bun
Then, log in to connect to your Mixedbread account:
mgrep login
Once set up, you can sync any directory. Rui used Andrej Karpathy's
nanoGPT repository as an example. The watch command indexes the current
folder, respects .gitignore, and syncs to a remote Mixedbread store.
mgrep watch
Once synced, you can query your code with natural language. mgrep also
includes a -a (--answer) flag that uses an LLM to return a direct
answer with citations.
Integrating mgrep with Coding Agents
While you can use mgrep manually, its main use is inside coding agents. Mixedbread provides plugins for popular coding agents like Claude Code.
A simple install command configures the agent to be aware of mgrep and how to use it.
mgrep install-claude-code
This command sets up the necessary skills and prompts. The mgrep watch process runs in the background during an agent session, keeping the index up to date.
These plugins are transparent. They're wrappers around the mgrep CLI with a pre-written prompt. You can customize behavior by creating your own prompts in your agent's configuration files.
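To show what "a wrapper around the CLI with a pre-written prompt" can look like, here is a minimal sketch of a custom tool. The prompt text and the function names are hypothetical, not the actual plugin contents, and placing `--answer` before the query is an assumption about argument order.

```python
import shutil
import subprocess

# Hypothetical instruction text; the real plugin prompt is not reproduced here.
AGENT_PROMPT = (
    "When you need to find code by meaning rather than exact text, "
    'run `mgrep "<natural language query>"` before falling back to grep.'
)

def build_mgrep_command(query, answer=False):
    """Build the argv list for an mgrep call (flag placement is an assumption)."""
    cmd = ["mgrep"]
    if answer:
        cmd.append("--answer")  # the -a / --answer flag mentioned earlier
    cmd.append(query)
    return cmd

def run_mgrep(query, answer=False):
    """Run mgrep if it is installed and return its stdout; return None otherwise."""
    if shutil.which("mgrep") is None:
        return None
    result = subprocess.run(
        build_mgrep_command(query, answer), capture_output=True, text=True
    )
    return result.stdout

print(build_mgrep_command("how streaming is implemented"))
```

Passing the query as a list element rather than through a shell string avoids quoting problems when an agent generates queries containing quotes or special characters.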
Under the Hood: Multi-Vector & Multimodal Search
mgrep is powered by the Mixedbread Search API. When you sync files, they're sent to a Mixedbread store where the pipeline takes over.
The key innovation is multi-vector search. Traditional semantic search creates one vector per chunk. Mixedbread represents every word as its own vector, creating a richer, more granular representation. Advanced quantization techniques make this approach scalable and affordable.
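The talk didn't spell out how Mixedbread scores multi-vector matches, so the sketch below uses ColBERT-style late interaction (MaxSim), a common multi-vector technique, purely as an illustration. The three-dimensional "embeddings" are made up; real models produce far larger vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Average token vectors into one vector (a stand-in for single-vector embedding)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def single_vector_score(query_vecs, doc_vecs):
    """One vector per side: pool first, then compare."""
    return cosine(mean_vector(query_vecs), mean_vector(doc_vecs))

def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: each query token takes its best-matching doc token; sum the maxima."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)

# Toy token vectors (invented). The query has two tokens.
query = [[1, 0, 0], [0, 1, 0]]
# Chunk A contains exact matches for both query tokens, buried among unrelated tokens.
chunk_a = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1]]
# Chunk B is only vaguely on-topic: no token matches either query token closely.
chunk_b = [[0.7, 0.7, 0.1], [0.7, 0.7, 0.1]]

for name, chunk in [("A", chunk_a), ("B", chunk_b)]:
    print(name, round(single_vector_score(query, chunk), 3), round(maxsim_score(query, chunk), 3))
```

With these toy numbers, pooling into one vector ranks the vague chunk B higher because the unrelated tokens in A wash out its exact matches, while MaxSim ranks A higher. That is the intuition behind keeping one vector per word.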
This system is also multimodal. It can natively index and search images, videos, audio, and PDFs without transcribing them to text. Rui demoed searches like "sad cat" and "angry cat."
In legal domains, PDFs are often the source of truth for contracts. In e-commerce, product images are often the source of truth.
Conclusion: Give AI the Best Tools
Capable agents don't mean semantic search is dead. They mean agents need
better search tools. Combining mgrep with grep gives agents a more
powerful, efficient way to understand a codebase.
- Agentic search needs semantic search. Relying on grep alone is slow, expensive, and leads to lower-quality results for complex tasks.
- Better tools lead to better agents. mgrep improves agent performance by reducing token usage, increasing speed, and providing more relevant context.
- The future is multimodal. As agents handle more data types, their tools must keep up. Native search across code, PDFs, and images is a significant advantage.
No matter how advanced agents become, their performance is constrained by tool quality. Search is fundamental, and it shouldn't be the bottleneck.
Q&A Highlights
We ended with audience questions:
- Multilingual Support: Mixedbread's models are multilingual by default, supporting languages like Arabic, Chinese, and Korean. Try the search demo in different languages.
- Cost: Indexing is priced per token. The full React codebase (6M+ tokens) costs roughly $20. The team is open to discounts at scale.
- Engineering Optimizations: Low latency comes from quantization research and an optimized end-to-end pipeline. The team plans more blog posts detailing the work.
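For a rough sense of what the cost figure implies for other repos, here is a back-of-the-envelope sketch. It assumes pricing scales linearly from the quoted numbers (about $20 for about 6 million tokens); actual pricing, tiers, and discounts may differ, and the 500k-token repo is a made-up example.

```python
# Figures quoted in the Q&A: roughly $20 to index about 6 million tokens.
QUOTED_COST_USD = 20.0
QUOTED_TOKENS = 6_000_000

def implied_rate_per_million(cost_usd=QUOTED_COST_USD, tokens=QUOTED_TOKENS):
    """Dollars per million tokens implied by a quoted (cost, token count) pair."""
    return cost_usd / (tokens / 1_000_000)

def estimate_indexing_cost(tokens, rate_per_million=None):
    """Linear estimate; ignores discounts, re-indexing, and any pricing tiers."""
    if rate_per_million is None:
        rate_per_million = implied_rate_per_million()
    return tokens / 1_000_000 * rate_per_million

print(f"implied rate: ${implied_rate_per_million():.2f} per million tokens")
print(f"500k-token repo: ${estimate_indexing_cost(500_000):.2f}")
```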