
Retrieval 101

A practical guide to implementing semantic search for technical content

The Challenge of Finding Relevant Content

Have you ever spent hours writing a blog post only to have it disappear into the void? You're not alone. I spend a lot of time making detailed tutorials, explanations, and code examples, yet much of this content is hard to find.

Roughly 90% of users never venture past the first page of search results, and most scan only the top 3-5 entries before reformulating their query or abandoning the search altogether. For technical blogs, this problem is even more acute - the specialized vocabulary and conceptual relationships between topics make keyword matching particularly ineffective.

Search solutions that rely solely on exact word matches will often have huge misses. For example, my post about custom tags for FastHTML may be highly relevant to someone looking to use web components, yet "web components" is a term never mentioned directly in the blog post!

This disconnect creates a frustrating experience for both content creators and consumers. Content creators have to spend time trying to jam in keywords to make things more discoverable, and readers struggle to find the solutions they're looking for.

The question becomes: how can we make technical content discoverable by meaning rather than just keywords?

What This Post Will Deliver

In this tutorial, you'll learn how to implement a semantic search system using LanceDB. Instead of relying on exact keyword matches, you'll be able to find content based on meaning and conceptual relationships. We will go beyond the hello-world of vector search: we'll combine vector search with keyword search in a hybrid approach, then re-rank the final results with a cross-encoder. By the end of this tutorial, you'll know what all of that means and how to implement it.

By the end of this post, you'll have a complete solution that can:

  • Find conceptually related blog posts even when they use different terminology
  • Surface relevant technical content based on the intent behind a search query
  • Provide more accurate and helpful search results to your readers

Here's a quick preview of what we'll build:

# A simple semantic search query that finds related content
query = "How do I make web components?"

results = blog_search.search(query, limit=3)

for result in results:
    print(f"Post: {result.metadata['title']}")
    print(f"Relevance score: {result.score:.2f}")

# Output:
# Post: Creating Custom FastHTML Tags for Markdown Rendering
# Relevance score: 0.87
#
# Post: Python Programming Tips
# Relevance score: 0.72

This isn't just theory - you'll implement this system step-by-step using real blog posts, and I'll show you how to adapt it to your own content. Whether you have a personal tech blog, manage documentation for a larger project, or have anything else you want people to be able to search, this approach will help your valuable content reach the right audience.

This is the foundation of a modern retrieval system, and I am hard-pressed to think of an example where you want good semantic search but would not want this as the foundation.

💡 Check out the companion repo for runnable code examples

Vector Embeddings: The Key to Semantic Search

At the heart of semantic search is the concept of vector embeddings. AI models cannot understand words directly, so we have to convert everything to numbers to compare them programmatically. Let's take a simple example: suppose we give four words vector embeddings

  • Cat = [1.5, 2.5, 2.2]
  • Dog = [1.8, 2.6, 2.6]
  • Animal = [1.6, 2.4, 2.4]
  • Bog = [0.1, 0.3, 5.9]

You can see, using just the numbers, that Cat and Dog are more similar to Animal than they are to Bog. What does each specific number mean? I don't know - they don't map to human language! Similar concepts end up close to each other in this space, even if they use different words. That's what powers semantic search.
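
To make this concrete, here's a minimal sketch that scores the toy vectors above with cosine similarity, the same comparison we'll lean on later when searching:

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 means pointing in the same direction
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

cat, dog, animal, bog = [1.5, 2.5, 2.2], [1.8, 2.6, 2.6], [1.6, 2.4, 2.4], [0.1, 0.3, 5.9]
print(f"cat vs animal: {cosine_similarity(cat, animal):.2f}")  # ~1.0, very similar
print(f"dog vs animal: {cosine_similarity(dog, animal):.2f}")  # ~1.0, very similar
print(f"cat vs bog:    {cosine_similarity(cat, bog):.2f}")     # noticeably lower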

In a more targeted example, the phrases "how to test Python code" and "writing unit tests for Python functions" might use different words, but their vector embeddings would be very similar because they represent the same concept.

This leads to the question: "How do you pick the numbers for each word, sentence, or article?".

Modern language models like BERT, GPT, and their derivatives can generate these embeddings by processing vast amounts of text and learning the relationships between words and concepts. We can leverage those pre-existing models to generate these embeddings (training a new model to do this is out of the scope of this tutorial).

Initial Solution Attempt: Setting Up LanceDB

First, let's pull in four sample posts (clean_post and read_url are small helpers that fetch and clean each post):

fh_tutorial = clean_post(read_url('https://isaac.up.railway.app/blog/blog_post?fpath=posts%2FFasthtmlTutorial.ipynb'))
ms_scratch = clean_post(read_url('https://isaac.up.railway.app/blog/blog_post?fpath=posts%2FMeanShiftFromScratch.ipynb'))
py_tips = clean_post(read_url('https://isaac.up.railway.app/blog/blog_post?fpath=posts%2FPython.ipynb'))
bas_test = clean_post(read_url('https://isaac.up.railway.app/blog/blog_post?fpath=posts%2FBasicTesting.ipynb'))

We'll prepare our blog posts. For simplicity, we'll extract titles and content from our sample posts:

Credit: This post was inspired by Ben Clavié's excellent work on semantic search implementations. His conference talk on RAG systems provided many of the core concepts and techniques explored in this tutorial. I highly recommend checking out his original presentation for additional context and perspectives. You will be glad you did - it is well worth the time.

import pandas as pd
from sentence_transformers import SentenceTransformer
import lancedb
posts = [fh_tutorial, ms_scratch, py_tips, bas_test]
blog_data = [{'title': post.split('\n')[0].replace('# ', ''), 'content': post} for post in posts]
df = pd.DataFrame(blog_data)
df
                                               title                                            content
0  Creating Custom FastHTML Tags for Markdown Ren...  # Creating Custom FastHTML Tags for Markdown R...
1                             MeanShift From Scratch  # MeanShift From Scratch\n\nA deep dive on mea...
2                            Python Programming Tips  # Python Programming Tips\n\nA list of handy t...
3               Introduction To Statistical Testing  # Introduction To Statistical Testing\n\nAn in...

The next thing we need is a vector embedding for each post. As mentioned before, we can use a pretrained model for this.

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

💡 One place to look for embedding models is the MTEB Leaderboard (see the BEIR retrieval benchmark). Though be careful: many of the models are overfit to the leaderboard.
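
As a quick sanity check of the earlier claim about paraphrases, here's a minimal sketch comparing the two testing phrases from before. Because the embeddings are normalized, a plain dot product is their cosine similarity:

a = model.encode("how to test Python code", normalize_embeddings=True)
b = model.encode("writing unit tests for Python functions", normalize_embeddings=True)
c = model.encode("how to bake sourdough bread", normalize_embeddings=True)
print(float(a @ b))  # high - same concept, different words
print(float(a @ c))  # much lower - unrelated concept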

df['vector'] = df['content'].apply(lambda x: model.encode(x, normalize_embeddings=True))
df.head()
                                               title                                            content                                             vector
0  Creating Custom FastHTML Tags for Markdown Ren...  # Creating Custom FastHTML Tags for Markdown R...  [-0.0600754, 0.07570859, 0.0017243278, 0.00696...
1                             MeanShift From Scratch  # MeanShift From Scratch\n\nA deep dive on mea...  [0.022285162, -0.03508064, -0.006767927, -0.04...
2                            Python Programming Tips  # Python Programming Tips\n\nA list of handy t...  [-0.017614271, -0.015829531, -0.044863362, 0.0...
3               Introduction To Statistical Testing  # Introduction To Statistical Testing\n\nAn in...  [-0.0035132996, -0.026535656, -0.10052579, 0.0...

We can load all of that into our vector database, LanceDB.

db = lancedb.connect("./lancedb-tutorial")
table = db.create_table("blog_posts", data=df, mode="overwrite")

Now, we're ready to use LanceDB to search. There are three main steps:

1. Create an embedding of your query or question

query = "How do I make web components?" # Get a query
query_embedding = model.encode(query) # Create an embedding of it
query_embedding.shape, query_embedding[:3] # 384 numbers representing the query

((384,), array([-0.01018321, -0.05370776, -0.08562952], dtype=float32))

2. Compare Similarity

Now we can search the table to find most similar embeddings

results = table.search(query_embedding).metric('cosine').to_pandas()
results['_distance']

0    0.585938
1    0.810537
2    0.951037
3    0.976564
Name: _distance, dtype: float32

💡 Different models use different distance metrics. This model was trained with cosine similarity per the Hugging Face model card, so I matched that.
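
If you're curious what `_distance` actually is: with the cosine metric, LanceDB reports 1 minus the cosine similarity, which we can reproduce by hand. A minimal sketch (row 0 of df is the FastHTML post, the top hit above):

import numpy as np

q = query_embedding / np.linalg.norm(query_embedding)  # normalize the query vector
d = df['vector'][0]                                    # post vectors were stored normalized
print(1 - float(np.dot(q, d)))                         # ~0.586, matching the distance above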

3. See Results

Now we can see which post is the most similar to the query based on those embeddings

for _, row in results.iterrows():
    print(f"Post: {row['title']}")
    print(f"Distance: {row['_distance']:.2f}")
Post: Creating Custom FastHTML Tags for Markdown Rendering
Distance: 0.59
Post: Python Programming Tips
Distance: 0.81
Post: Introduction To Statistical Testing
Distance: 0.95
Post: MeanShift From Scratch
Distance: 0.98
We can also verify that the phrase "web component" truly never appears in the retrieved post:

assert 'web component' not in fh_tutorial.lower()

This basic implementation allows us to search our blog posts semantically. When we run a query like "How do I make web components?", it finds the relevant post even though web components are never mentioned directly in the tutorial.

However, this approach has a lot of limitations.

Why the Initial Approach Isn't Enough

Our simple semantic search implementation works, but it has several limitations that make it inadequate for serious technical content:

  1. Full document embeddings lose detail: When we embed entire blog posts, we're compressing thousands of words into a single 384-dimensional vector. This means specific technical details get lost.

  2. Code blocks get mixed with text: Technical blogs contain a mix of explanatory text and code examples. Our current approach treats both the same way, diluting the search quality.

  3. No re-ranking: We're using a simple bi-encoder approach with cosine similarity, but as Ben Clavié explains, this misses the nuance that cross-encoders can provide.

Chunking

To address the limitations of our initial approach, we need to implement a more sophisticated solution. Following Ben Clavié's recommendations, we'll improve our system by:

  1. Chunking our documents by markdown sections
  2. Adding keyword search capabilities (BM25)
  3. Implementing a re-ranking step

Let's start with the chunking strategy, which will help us preserve more context and detail:

There are a plethora of chunking strategies you can test, but let's start with the most obvious one: use the chunks already defined by the author of the article, and split on markdown headers.

Note: There's lots of discussion about optimal chunk length, overlap, and different ways to split. I recommend starting with what makes sense, and then trying out alternatives.

def chunk_by_markdown_sections(markdown_text, min_length=250):
    """Split markdown text into chunks based on header sections."""
    lines = markdown_text.split('\n')
    chunks = []
    current_chunk = []
    current_title = "Introduction"
    
    for line in lines:
        if line.startswith('#'):  # New header found
            # Save previous chunk if it's substantial
            if current_chunk and len('\n'.join(current_chunk)) >= min_length:
                chunks.append({'title': current_title, 'content': '\n'.join(current_chunk)})
            
            # Start new chunk
            current_title = line.lstrip('# ')
            current_chunk = [line]
        else:
            current_chunk.append(line)
    
    # Add the final chunk if it exists and meets length requirement
    if current_chunk and len('\n'.join(current_chunk)) >= min_length:
        chunks.append({'title': current_title, 'content': '\n'.join(current_chunk)})
    
    return chunks
chunk_by_markdown_sections("""# Markdown\n# Chunked\n## Based on markdown\n## Headers for RAG""", min_length=1)

[{'title': 'Markdown', 'content': '# Markdown'}, {'title': 'Chunked', 'content': '# Chunked'}, {'title': 'Based on markdown', 'content': '## Based on markdown'}, {'title': 'Headers for RAG', 'content': '## Headers for RAG'}]
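
For contrast, a fixed-size chunker with overlap is another common strategy you might test. Here's a minimal sketch (illustrative only - we'll stick with the markdown-section chunks for the rest of this post):

def chunk_by_size(text, chunk_size=1000, overlap=200):
    # Naive fixed-size chunking with character overlap between neighboring chunks
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks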

Now let's apply this chunking function to our blog posts:

all_chunks = []
for i, row in df.iterrows():
    for chunk in chunk_by_markdown_sections(row['content']):
        all_chunks.append({
            'post_title': row['title'], 
            'chunk_title': chunk['title'],
            'content': chunk['content'],
            'vector': model.encode(chunk['content'], normalize_embeddings=True)})

# Create DataFrame with all chunks
chunks_df = pd.DataFrame(all_chunks)

The most important thing to learn from this entire guide is that when you do things, you should look at your data. Don't assume things were right. Don't assume your idea made sense. Print it out and look!

for i,r in chunks_df.iterrows():
    print(r.content)
    print("-----\n")
    if i==2: break
# Intro

This post will cover how to render markdown using zero-md in FastHTML in a practical example. This includes:

  * Defining a custom HTML tag in FastHTML
  * Using external CSS and javascript libraries with FastHTML
  * Adding CSS styling
  * Organize UI into columns

In this tutorial we will convert a markdown of an early lesson in the boot.dev curriculum and a fake conversation between a student and a chatbot about the lesson to HTML. Boot.dev is an online learning platform that offers self-paced, gamified courses for back-end web development.

-----

# Markdown With Zero-md

[code]

    # Import style 1 
    from fasthtml.common import *
    from functools import partial
    
    # Import style 2
    from fasthtml.core import P, Script, Html, Link, Div, Template, Style, to_xml
    from fasthtml.components import show
[/code]

In FastHTML we can use the `P` function to put text in a paragraph `<p></p>` tag (a common way of displaying text). However, markdown is not rendered properly and is hard to read!

While text can be read without styling, markdown has headers, code, bullets and other elements. So we need something more than just a regular text rendering.

We need to convert markdown formatting into a format that HTML understands. We can use a javascript library called zero-md to do this, but this tag does not have a function in FastHTML. There are still two options for using this tag in FastHTML.

> ### 💡 What is zero-md?
>
> In web development, HTML defines the general structure of a web page. However, HTML alone is usually not sufficient. Javascript allows us to extend what we can do beyond out-of-the-box HTML. `zero-md` is a Javascript library that adds functionality for displaying markdown content that we can use with an HTML tag.

The first option is to write the HTML in a text string and use that.

[code]

    NotStr(f'''<zero-md><script type="text/markdown">{lesson_content}</script></zero-md>''')
    
[/code]

> ### 💡 Tip
>
> `NotStr` is a FastHTML function designed for passing a string that should be executed as HTML code rather than a string. In the example above, because `NotStr` is used, FastHTML will treat it as HTML code rather than a Python string. If we removed the `NotStr`, all the HTML tags would be displayed on the page just as they are written rather than being rendered nicely for your web application.

This is fine for very simple things, but the more you build, the messier and harder it gets to work with. It is better to create a FastHTML style tag that works just like everything else. It's incredibly simple to create a custom tag. By importing from `fasthtml.components` the HTML tag will be created automatically (defined in the module's `__getattr__`).

[code]

    from fasthtml.components import Zero_md
[/code]

Now that we have our custom tag defined, we can use that with the `<script>` tag (included in FastHTML) to apply the formatting per the zero-md documentation. For now, we will use the defaults and do nothing with CSS (more details on this later).

[code]

    def render_local_md(md, css = ''):
        css_template = Template(Style(css), data_append=True)
        return Zero_md(css_template, Script(md, type="text/markdown"))
    
    lesson_content_html = render_local_md(lesson_content)
    print(to_xml(lesson_content_html))
[/code]

[code]

    <zero-md><template data-append>    <style></style>
    </template><script type="text/markdown"># Startup bug
    
    A new startup has a bug in its server code. The code is supposed to print messages indicating the server has started successfully.
    
    ## Challenge
    
    Fix the 2 errors in the code and get it to run!
    
    ```python
    print(&quot;Starting up server...&#x27;)
    prnt(&quot;local server is listening on port 8080&quot;)
    ```</script></zero-md>
    
[/code]

The last thing we need to do is load zero-md from a CDN. We can do this by adding a `<script>` tag to the `<head>` of the HTML, and it all works!

[code]

    with open('static/_readme.md') as f: lesson_content = f.read()
    
    zeromd_headers = [Script(type="module", src="https://cdn.jsdelivr.net/npm/zero-md@3?register")]
[/code]

`Html(*zeromd_headers, lesson_content_html)`

-----

# Markdown Conversation Bubbles

We will start with default DaisyUI chat bubbles. For many types of conversations this is fine, but for this use case we need markdown to render properly for code snippets and structural elements.

> ### 💡 Note
>
> This part of the tutorial picks up where the step-by-step DaisyUI example in the FastHTML documentation leaves off. For more information, start there!
[code]

    #loading messages
    import json
    with open('static/conversation.json') as f:
        messages = json.load(f)['messages']
[/code]

[code]

    # Loading tailwind and daisyui
    chat_headers = [Script(src="https://cdn.tailwindcss.com"),
               Link(rel="stylesheet", href="https://cdn.jsdelivr.net/npm/daisyui@4.11.1/dist/full.min.css")]
[/code]

We re-use the code from the daisyUI example with one change. We are using the `render_local_md` function we defined.

[code]

    # Functionality identical to Daisy UI example linked above
    def ChatMessage(msg, render_md_fn=lambda x: x):
        md = render_md_fn(msg['content'])
        return Div(
            Div(msg['role'], cls="chat-header"),
            Div(md, cls=f"chat-bubble chat-bubble-{'primary' if msg['role'] == 'user' else 'secondary'}"),
            cls=f"chat chat-{'end' if msg['role'] == 'user' else 'start'}")
[/code]

Using this, markdown doesn't render properly, causing readability issues.

Instead let's do exactly what we did before with Zero-md. Our markdown renders, however there are some issues with css styles clashing.

[code]

    chat_bubble =Html(*(chat_headers+zeromd_headers), ChatMessage(messages[1], render_md_fn=render_local_md))
[/code]

We can inject CSS styling to handle this issue by telling zero-md to use a template and ignore the default styles to make beautiful properly rendered conversations.

> ### 💡 Tip
>
> CSS allows us to extend what we can do with just HTML by providing a syntax for adding styling to HTML elements in a programmatic way. You may want every header to have a specific text color or every paragraph to have a specific background color. CSS allows us to do that.
[code]

    css = '.markdown-body {background-color: unset !important; color: unset !important;}'
    _render_local_md = partial(render_local_md, css=css)
    chat_bubble = Html(*(chat_headers+zeromd_headers), ChatMessage(messages[1], render_md_fn=_render_local_md))
[/code]

Now that it looks good we can apply this style to all messages

-----

Now we can create a new LanceDB table with our chunked content:

chunk_table = db.create_table("blog_chunks", data=chunks_df, mode="overwrite")

We can then query to find the most relevant chunks instead of entire documents.

results = chunk_table.search(query_embedding).metric('cosine').to_pandas()

# Print the top 3 results
for i, (_, row) in enumerate(results.iterrows()):
    print(f"Distance: {row['_distance']:.2f}")
    print(f"Post: {row['post_title']} : {row['chunk_title']}")
    print('---')
    if i == 2: break
Distance: 0.63
Post: Creating Custom FastHTML Tags for Markdown Rendering : Intro
---
Distance: 0.64
Post: Creating Custom FastHTML Tags for Markdown Rendering : Putting it Together
---
Distance: 0.66
Post: Creating Custom FastHTML Tags for Markdown Rendering : Markdown With Zero-md
---

This chunking approach gives us several advantages:

  1. Each vector now represents a more focused piece of content
  2. We preserve the hierarchical structure of the document
  3. We can return specific sections rather than entire posts
  4. We stay within the token limits of our embedding model
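
That last point is worth verifying rather than assuming - chunks longer than the model's limit get truncated silently. A quick sketch using the tokenizer attached to our sentence-transformers model:

token_counts = [len(model.tokenizer.encode(c)) for c in chunks_df['content']]
print(f"longest chunk: {max(token_counts)} tokens")
print(f"model limit:   {model.max_seq_length} tokens")  # 256 for all-MiniLM-L6-v2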

In the next section, we'll enhance our retrieval system by implementing a hybrid search approach that combines vector similarity with keyword matching for more accurate results.

Hybrid Search

Now that we've improved our system with chunking, let's implement the next key improvements from Ben Clavié's recommendations:

  1. Adding keyword search (BM25) alongside vector search
  2. Implementing a re-ranking step

Let's start with implementing keyword search to complement our vector search:

import bm25s

BM25 is a powerful keyword-based search algorithm that works by analyzing term frequency and document length. We'll use it to complement our vector search by capturing exact keyword matches that semantic search might miss.
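
Before wiring BM25 into our search function, here's a minimal standalone sketch of the bm25s workflow on a toy corpus. When no corpus is attached to the index, retrieve returns row indices rather than documents:

toy_corpus = ["testing python code", "fasthtml custom tags", "meanshift clustering"]
retriever = bm25s.BM25()
retriever.index(bm25s.tokenize(toy_corpus))

idxs, scores = retriever.retrieve(bm25s.tokenize("python testing"), k=2)
print(idxs[0], scores[0])  # the "testing python code" doc should rank first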

def hybrid_search(query, top_k=5, vector_weight=0.7):
    """Perform hybrid search using both vector similarity and BM25 keyword matching."""
    # Vector search
    query_embedding = model.encode(query, normalize_embeddings=True)
    vector_results = chunk_table.search(query_embedding).metric('cosine').limit(top_k*2).to_pandas()
    vector_results['vector_score'] = 1 - vector_results['_distance']
    
    # Keyword search with bm25s (for simplicity we rebuild the index per query;
    # in production you would build it once up front)
    corpus = chunks_df['content'].tolist()
    retriever = bm25s.BM25()
    retriever.index(bm25s.tokenize(corpus))
    
    # Score every chunk, then key the scores by chunk text so we can join them
    # onto the vector results (LanceDB results don't carry our dataframe index)
    idxs, scores = retriever.retrieve(bm25s.tokenize(query), k=len(corpus))
    bm25_scores = {corpus[int(i)]: float(scores[0, rank]) for rank, i in enumerate(idxs[0])}
    vector_results['bm25_score'] = vector_results['content'].map(lambda c: bm25_scores.get(c, 0.0))
    
    # Normalize BM25 scores to [0, 1] so they're comparable to cosine similarity
    if vector_results['bm25_score'].max() > 0:
        vector_results['bm25_score'] = vector_results['bm25_score'] / vector_results['bm25_score'].max()
    
    # Combine scores with weighting
    vector_results['combined_score'] = (
        vector_weight * vector_results['vector_score'] + 
        (1 - vector_weight) * vector_results['bm25_score'])
    
    return vector_results.sort_values('combined_score', ascending=False).head(top_k)
query = "How do I make web components?"
results = hybrid_search(query, 6)

for i, (_, row) in enumerate(results.iterrows()):
    print(f"Post: {row['post_title']}")
    print(f"Section: {row['chunk_title']}")
    print(f"Relevance: {row['combined_score']:.2f}")
    print("---")
    if i == 2: break

Post: Creating Custom FastHTML Tags for Markdown Rendering
Section: Intro
Relevance: 0.26
---
Post: Creating Custom FastHTML Tags for Markdown Rendering
Section: Putting it Together
Relevance: 0.25
---
Post: Creating Custom FastHTML Tags for Markdown Rendering
Section: Markdown With Zero-md
Relevance: 0.24
---

Re-ranking

So far, with vector search, a model takes in a piece of text and produces a fixed vector representation of it. This is really fast in practice because the vectors for all the documents you want to search against can be pre-calculated: a document's vector doesn't change regardless of the user query, so we computed them up front and stored them in LanceDB.

However, a cross-encoder can judge relevance more accurately. Instead of embedding the query and document separately, it examines each query-document pair together in context, which helps it capture nuanced relationships that the initial retrieval step might miss. For example, it can better understand when a document answers a question even if it uses different terminology.

The process works in two stages:

  1. We use our hybrid search (vector + BM25) to efficiently retrieve a set of candidate chunks
  2. We then apply the more computationally expensive cross-encoder to re-rank these candidates

This approach gives us the best of both worlds - the efficiency of bi-encoders for initial retrieval and the accuracy of cross-encoders for final ranking.
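
Conceptually, a cross-encoder call is just scoring (query, document) pairs. A minimal sketch using sentence-transformers' CrossEncoder with a small MS MARCO reranker (below we'll use the rerankers library instead, which wraps this same idea):

from sentence_transformers import CrossEncoder

ce = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
pairs = [(query, doc) for doc in chunks_df['content'].head(3)]
print(ce.predict(pairs))  # one relevance score per (query, document) pair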

from rerankers import Reranker
ranker = Reranker('cross-encoder', model='mixedbread-ai/mxbai-rerank-base-v1', verbose=False)

Loading default cross-encoder model for language en
Warning: Model type could not be auto-mapped with the defaults list. Defaulting to TransformerRanker.
If your model is NOT intended to be ran as a one-label cross-encoder, please reload it and specify the model_type! Otherwise, you may ignore this warning. You may specify `model_type='cross-encoder'` to suppress this warning in the future.

💡 A good safe bet (today) is the Cohere Rerank API. I'm not using it for this blog post because it requires an API key and it's not necessary for this intro tutorial, but it's something you should look into!

def search_blog_posts(query, top_k=3):
    "Search blog posts using hybrid search followed by cross-encoder reranking"
    # Get candidates with hybrid search
    candidates = hybrid_search(query, top_k=top_k*2)
    
    # Rerank candidates
    texts = candidates['content'].tolist()
    doc_ids = candidates.index.tolist()
    ranked = ranker.rank(query=query, docs=texts, doc_ids=doc_ids)
    
    # Map scores back to candidates and return top results
    candidates['rerank_score'] = candidates.index.map(
        {r.document.doc_id: r.score for r in ranked.results}.get)
    return candidates.sort_values('rerank_score', ascending=False).head(top_k)
query = "How do I make web components?"
results = search_blog_posts(query)

for _, row in results.iterrows():
    print(f"Post: {row['post_title']}")
    print(f"Section: {row['chunk_title']}")
    print(f"Relevance: {row['rerank_score']:.2f}")
    print("---")
Post: Creating Custom FastHTML Tags for Markdown Rendering
Section: Putting it Together
Relevance: 0.01
---
Post: Creating Custom FastHTML Tags for Markdown Rendering
Section: Markdown With Zero-md
Relevance: -1.34
---
Post: Creating Custom FastHTML Tags for Markdown Rendering
Section: Markdown Conversation Bubbles
Relevance: -2.31
---

While our hybrid search with re-ranking is already a significant improvement, there are still lots of improvements that can be made.

Key Takeaways and Principles

  1. Semantic search isn't magic - It's about transforming text into numbers that capture meaning. These numerical representations are flawed and cannot be relied on exclusively for every type of query.

  2. Domain Knowledge is king - Understanding your specific domain and content type allows you to make intelligent decisions. Be a user of your own system: actually run queries and read the responses. Do this A LOT. It allows you to start with a simple approach, identify limitations, and systematically address them with targeted improvements.

  3. Hybrid approaches outperform single methods - There is no magic answer that fixes all problems. A combination of vector search, keyword matching, and re-ranking provides significantly better results than any single approach alone. But this is just the basics, and there are many more things to add based on your use-case.

  4. Chunking matters - How you divide your content has a huge impact on retrieval quality and user experience. The chunks should be meaningful units that preserve context while remaining focused enough to be useful. After chunking, look at them to see if they make sense!

  5. The bi-encoder/cross-encoder pattern is widely applicable - The pattern of using a fast but less accurate method for initial retrieval, followed by a slower but more accurate method for refinement, is super powerful.

  6. Evaluation is essential - Though we didn't cover it in this tutorial, having a way to measure search quality is critical for ongoing improvement. What gets measured can be improved.

Next Steps

If you've followed along to this point, you now have a powerful semantic search system for your blog posts. But there's massive room for growth and experimentation:

💡 Check out the companion repo to start experimenting!

Extend Your Implementation

  • Add Metadata Filtering: Enable users to narrow results by programming language, difficulty level, or post type (e.g., "Show me only Python tutorials for beginners"). A minimal sketch follows this list.

  • Add Multi-Modal Search: Incorporate images, diagrams, and code snippets in your search index to find visual explanations alongside text (e.g., "Find me posts with architecture diagrams for microservices").

  • Add Evaluation Framework: Build a systematic way to measure search quality with metrics like precision and recall to continuously improve your system. Create implicit evaluation (link tracking) or user feedback systems.

  • Add Query Pre-processing: Identify and prioritize technical terms and entities in user queries to better match domain-specific content (e.g., recognizing "React hooks" as a specific technical concept).

  • Add Query Classification: Detect the intent behind searches to provide more tailored results (e.g., distinguishing between "how to" tutorials vs conceptual explanations).

  • Add Query Expansion: Automatically add related technical terms to queries to improve recall (e.g., expanding "web components" to include "custom elements" and "shadow DOM").

  • Experiment with different chunking strategies: Test different chunking strategies and sizes to find the optimal balance between context preservation and specificity for your content.

  • "More Like This" Functionality: Allow users to find similar content to a post they're already reading, creating a natural exploration path through your technical content.

Join the Conversation

I'd love to hear about your experiences implementing semantic search:

  • What chunking strategies worked best for your content?
  • Which embedding models performed best in your domain?
  • What unexpected challenges did you encounter?

Share your implementation, ask questions, or suggest improvements in the comments below or reach out on Twitter/X @isaac_flath.

Remember, the field of semantic search and retrieval is evolving rapidly. The techniques we've covered provide a solid foundation, but staying curious and experimental will keep your system at the cutting edge.
