---
title: Brain Memory Hybrid Search
category: product
entity_type: skill
price: $15
canonical: https://forgehouse.ai/skills/brain-memory-hybrid-search/
lang: en
hreflang_alt: https://forgehouse.ai/tr/skiller/brain-memory-hybrid-search/
last_updated: 2026-06-20
---

# Brain Memory Hybrid Search

> BM25 (lexical) + pgvector (semantic) hybrid search with RRF score fusion for an agent's memory/knowledge corpus…

A complete recipe for a hybrid memory-search endpoint that combines BM25 lexical search with pgvector semantic search, fuses the two rankings with Reciprocal Rank Fusion, and returns a diverse top-5. It recalls thousands of accumulated notes through one RAG endpoint with a sub-200ms P95 target and injects the results into agent context. Both searches run on every query: BM25 catches exact-term matches, vectors catch intent, and the fused ranking is designed to beat either one alone.

## Use cases
- Injecting relevant past notes into agent context at the start of a task
- Standing up a hybrid RAG endpoint on Supabase with pgvector plus tsvector
- Migrating loose JSON memory files into an indexed Postgres table
- Tracking how often a note is recalled to detect self-reinforcing bias
- Planning a re-embed when upgrading the embedding model and detecting drift
- Adding paid-access course content search with row-level security

## Benefits
- Recall that beats single-mode search by combining exact-term and semantic matching
- Fast retrieval with a sub-200ms P95 target via tuned HNSW and GIN indexes
- Balanced context that avoids one-sided bias through source-file and cluster diversity caps
- Near-zero embedding cost and lower latency through batching and a short-TTL query cache

## What’s included
- Full schema with tsvector GIN and pgvector HNSW indexes plus row-level security
- A Reciprocal Rank Fusion search function with diversity filtering by source and cluster
- A Python embedding pipeline with pre-embedding PII masking and token-aware chunking
- A recall API that injects the top-5 into agent context and bumps recall counters asynchronously
- A 12-row anti-pattern table and 8 defensive patterns (cache, async counters, drift checks)
- A worked example for paid-access course search with section-weighted ranking
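
To give a feel for the token-aware chunking in the embedding pipeline, here is a minimal sketch using the 512-token window and 64-token overlap described under "How it runs". It is an illustration, not the shipped pipeline: the function name `chunk_tokens` is hypothetical, and it assumes the text has already been split into a list of tokens by whatever tokenizer your embedding model uses.

```python
from typing import List


def chunk_tokens(tokens: List[str], window: int = 512, overlap: int = 64) -> List[List[str]]:
    """Split a token list into fixed-size windows that overlap by `overlap` tokens."""
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap  # each new window starts this many tokens after the last
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks
```

With the defaults, consecutive chunks share 64 tokens, so a sentence that straddles a window boundary still appears whole in at least one chunk as long as it is shorter than the overlap.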

## Who it’s for
AI engineers and teams building a RAG memory layer who need fast, bias-aware hybrid recall on Postgres for both internal agents and customer-facing knowledge bases.

## How it runs
Every recall query fires two searches at once, lexical and semantic, then fuses the results. What follows is the full pipeline, from chunking and masking to the five diverse chunks that land in agent context within the 200ms budget.
1. Chunks each memory file into 512-token windows with 64-token overlap, masks personal data and secrets before any embedding API call, and tags every chunk with its source file and cluster
2. Embeds chunks in batches of 64 and upserts them into Postgres with an embedding version label, alongside an auto-generated full-text index on the same rows
3. At query time, runs keyword search (BM25 over the text index) and vector search (HNSW cosine) in parallel, each returning its top 100 ranked candidates
4. Fuses the two lists with Reciprocal Rank Fusion (k=60), a rank-based merge that needs no score normalization, so a document strong in either modality rises
5. Applies a diversity filter before returning: maximum 2 chunks per source file plus a per-cluster quota, so one old note cannot dominate the injected context
6. Returns the top 5 chunks into the agent context, bumps their recall counters asynchronously, and logs latency against the 200ms budget; chunks recalled too often get flagged for staleness review
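
Steps 3 to 5 above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's shipped search function: the names `Chunk`, `rrf_fuse`, and `diversify` are hypothetical, and the per-cluster quota of 3 is an assumed value, since only the per-source cap of 2 is stated above.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Chunk(NamedTuple):
    chunk_id: str
    source_file: str
    cluster: str


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists: each list adds 1 / (k + rank) to a chunk's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def diversify(
    ordered_ids: List[str],
    chunks: Dict[str, Chunk],
    top_n: int = 5,
    max_per_source: int = 2,
    max_per_cluster: int = 3,  # assumed quota, not specified in the text
) -> List[str]:
    """Walk the fused ranking and skip chunks that exceed a source or cluster cap."""
    per_source: Dict[str, int] = defaultdict(int)
    per_cluster: Dict[str, int] = defaultdict(int)
    picked: List[str] = []
    for cid in ordered_ids:
        chunk = chunks[cid]
        if per_source[chunk.source_file] >= max_per_source:
            continue
        if per_cluster[chunk.cluster] >= max_per_cluster:
            continue
        picked.append(cid)
        per_source[chunk.source_file] += 1
        per_cluster[chunk.cluster] += 1
        if len(picked) == top_n:
            break
    return picked
```

In use, `rrf_fuse([lexical_ids, semantic_ids])` takes the two top-100 candidate lists, and `diversify` trims the fused order down to the five chunks that get injected. Because only ranks enter the score, the raw keyword and cosine scores never need to be put on a common scale.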

## FAQ
### Do I need a dedicated vector database, or will Postgres carry this?
Postgres carries it: pgvector handles the semantic side and tsvector the lexical side, so on Supabase or plain Postgres there's no separate vector store to run. The whole hybrid endpoint lives in one database.

### If semantic search already understands meaning, why bother with BM25 lexical on top?
Because semantic search fumbles exact tokens (IDs, names, error codes), while lexical search misses paraphrases. Reciprocal Rank Fusion blends both rankings so you lose neither the precise matches nor the conceptual ones; that combination is the whole point.
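
A short worked example shows why an exact match survives the fusion. With the k=60 setting used in this recipe, and assuming a document that is absent from one list simply contributes nothing for that list, the fused score is:

$$
\mathrm{RRF}(d) = \sum_{r \in \{\text{lexical},\ \text{semantic}\}} \frac{1}{k + \operatorname{rank}_r(d)}, \qquad k = 60
$$

Take a note containing an exact error code that ranks 1st lexically but does not appear in the semantic candidates, and a loosely related note that ranks 70th in both lists:

$$
\frac{1}{60 + 1} \approx 0.0164 \qquad \text{versus} \qquad \frac{1}{60 + 70} + \frac{1}{60 + 70} \approx 0.0154
$$

The exact match still ranks higher, even though only one of the two searches found it.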

### Does the sub-200ms target hold once my notes reach the millions?
That P95 target is framed around thousands of notes, not millions. At much larger scale you move into index-tuning territory. This is a solid starting architecture, not a promise that latency stays flat forever.

## Price
$15, one-time, no subscription. VAT included.

Related guide: [AI and LLM engineering](https://forgehouse.ai/guides/ai-llm-engineering/)
