Vector Databases & RAG for Semantic Search and Retrieval
1. Vector Databases — High-Dimensional Embeddings
Store and search high-dimensional vector embeddings. Used in semantic search, similarity search, and RAG pipelines.
Indexing Techniques
- Flat Index (Brute Force) → exact but slow: every query scans all stored vectors.
- Approximate Nearest Neighbor (ANN) → fast and scalable, at a small cost in accuracy.
- Example algorithm: HNSW; common libraries: FAISS, Annoy.
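The contrast above can be sketched in plain Python: a flat index is just a linear scan with a similarity function (cosine similarity here). The index contents are toy data for illustration.

```python
# Minimal sketch of a flat (brute-force) index: compare the query against
# every stored vector. Exact, but O(n) per query -- the cost ANN indexes avoid.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def flat_search(index, query, k=2):
    """Score every stored vector, return the top-k (id, score) pairs."""
    scored = [(doc_id, cosine(vec, query)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

index = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.0, 1.0, 0.0],
    "doc3": [0.7, 0.7, 0.1],
}
print(flat_search(index, [1.0, 0.0, 0.0], k=2))  # top hit is doc1, the closest direction
```

ANN indexes such as HNSW avoid this full scan by navigating a graph of neighbors, visiting only a small fraction of the stored vectors per query.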
3. Retrieval-Augmented Generation (RAG)
Overview
Enhances LLM output by integrating retrieved external knowledge.
- Reduces hallucination and outdated responses.
- Improves factual grounding.
RAG Workflow
- Indexing: Convert raw data (PDF, HTML, Word) → chunks → embeddings.
- Retrieval: Retrieve relevant document chunks using similarity search.
- Generation: LLM synthesizes results with the query to produce the final answer.
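The three-step workflow can be sketched end to end. The `embed` and `generate` functions below are toy stand-ins for a real embedding model and LLM, and the corpus is invented; only the control flow matters.

```python
# Toy RAG pipeline: index -> retrieve -> generate.

def embed(text):
    # Stand-in embedding: bag-of-words over a tiny fixed vocabulary.
    vocab = ["vector", "database", "rag", "retrieval", "graph"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def similarity(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(corpus, query, k=1):
    # Retrieval step: rank stored chunks by similarity to the query embedding.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: similarity(embed(doc), q), reverse=True)
    return ranked[:k]

def generate(query, chunks):
    # Stand-in for the LLM call: stitch retrieved context and query together.
    return f"Answer to '{query}' grounded in: {' | '.join(chunks)}"

corpus = [
    "vector database stores embeddings",
    "rag improves retrieval",
    "graph rag links entities",
]
query = "what is a vector database"
answer = generate(query, retrieve(corpus, query))
print(answer)
```

In a real pipeline, `embed` would call an embedding model, `retrieve` would hit a vector database's ANN index, and `generate` would prompt an LLM with the retrieved chunks.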
Retrieval Types
| Type | Description | Example |
|---|---|---|
| Sparse (Lexical) | Term-based retrieval | TF-IDF, BM25 |
| Dense (Semantic) | Embedding-based | BERT, SentenceTransformers |
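A minimal sketch of the sparse side of this table: a self-contained BM25 scorer (using a common non-negative IDF variant). Documents and query are made up for illustration.

```python
# Minimal BM25 (sparse/lexical) scorer: matches on exact terms, weighted
# by term rarity (IDF) and term frequency with length normalization.
import math

def bm25_scores(docs, query, k1=1.5, b=0.75):
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for toks in tokenized:
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # non-negative IDF variant
            tf = toks.count(term)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores

docs = [
    "sparse retrieval matches terms",
    "dense retrieval uses embeddings",
    "bm25 is a sparse method",
]
print(bm25_scores(docs, "sparse terms"))
```

Note the failure mode this illustrates: a document phrased with synonyms but no shared terms scores zero, which is exactly the gap dense (embedding-based) retrieval closes.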
Dense Retrieval
- Uses Cosine Similarity or Euclidean Distance to find nearest neighbors.
- ANN Algorithms: Graph-based (HNSW), Hash-based (LSH), Clustering-based (IVF-PQ).
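As a rough illustration of the hash-based family (LSH), random hyperplanes can bucket vectors by which side of each plane they fall on; a query then scans only the vectors sharing its bucket key. This is a toy sketch with made-up vectors, not a production index.

```python
# Sketch of random-hyperplane LSH: one hash bit per hyperplane.
import random

random.seed(0)
DIM, N_PLANES = 3, 4
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PLANES)]

def lsh_key(vec):
    # One bit per hyperplane: which side of it the vector falls on.
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in planes)

# Build buckets: vectors with identical sign patterns land together.
buckets = {}
vectors = {"a": [1.0, 0.1, 0.0], "b": [0.9, 0.2, 0.1], "c": [-1.0, 0.0, 0.5]}
for name, vec in vectors.items():
    buckets.setdefault(lsh_key(vec), []).append(name)

print(buckets)
print(lsh_key([1.0, 0.0, 0.0]))  # a query is hashed the same way to pick a bucket
```

Real systems use many hash tables to keep the probability of missing a true neighbor low; IVF-PQ instead clusters vectors (IVF) and compresses them (product quantization) to shrink both search time and memory.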
Generator Evaluation
| Metric | Description |
|---|---|
| Exact Match (EM) | Checks if generated answer equals ground truth |
| Semantic Similarity | BLEU, ROUGE (n-gram overlap); BERTScore, cosine similarity (embedding-based) |
| Knowledge Gap Detection | Ability to respond with "don't know" when uncertain |
| Groundedness / Faithfulness | Whether output relies on retrieved information or hallucination |
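Exact Match is simple enough to sketch directly. The normalization below (lowercasing, stripping punctuation and articles) follows the common QA-evaluation convention.

```python
# Exact Match (EM): normalize both strings, then compare for equality.
import re
import string

def normalize(text):
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop English articles
    return " ".join(text.split())               # collapse whitespace

def exact_match(prediction, ground_truth):
    return int(normalize(prediction) == normalize(ground_truth))

print(exact_match("The Eiffel Tower!", "eiffel tower"))  # → 1
print(exact_match("Paris", "London"))                    # → 0
```

EM is strict by design; the semantic-similarity metrics in the table exist precisely because a correct answer phrased differently scores 0 under EM.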
5. RAG Optimization Techniques
Chunking Strategies
| Method | Description |
|---|---|
| Fixed-Length | Simple, but may cut context |
| Recursive Chunking | Section → paragraph → sentence |
| Token-Based | Aligns with model tokenization |
| Overlapping Chunks | Preserve sentence continuity |
Trade-offs:
- Small chunks → more precise retrieval but less context per chunk. Large chunks → more context but lower retrieval precision.
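A minimal sketch of fixed-length chunking with overlap, using whitespace words as a stand-in for tokens:

```python
# Fixed-length chunking with overlap: consecutive chunks share `overlap`
# words so sentences cut at a boundary survive in the next chunk.

def chunk(text, size=8, overlap=2):
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = " ".join(f"w{i}" for i in range(20))
for c in chunk(doc, size=8, overlap=2):
    print(c)
```

Token-based chunking works the same way but slices a tokenizer's output instead of words, so chunk sizes line up exactly with the embedding model's limits.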
Reranking
- Improves retrieval accuracy by reordering candidates.
- Two-Stage Retrieval: Fast retriever → Precise reranker.
- Time-Based Reranking: Prioritizes recent data.
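The two-stage pattern can be sketched with toy scorers: a cheap term-count as the fast first-stage retriever and a phrase-bonus score standing in for a precise (e.g. cross-encoder) reranker. Documents and query are invented.

```python
# Two-stage retrieval: over-fetch candidates cheaply, rerank the short list.

def cheap_score(doc, query):
    # Stage 1 stand-in: raw term-frequency overlap (fast, easily fooled).
    return sum(doc.lower().split().count(t) for t in query.lower().split())

def expensive_score(doc, query):
    # Stage 2 stand-in for a cross-encoder: rewards the exact query phrase.
    return cheap_score(doc, query) + (2 if query.lower() in doc.lower() else 0)

def two_stage(docs, query, fetch_k=2, top_k=1):
    candidates = sorted(docs, key=lambda d: cheap_score(d, query), reverse=True)[:fetch_k]
    return sorted(candidates, key=lambda d: expensive_score(d, query), reverse=True)[:top_k]

docs = ["search search search engines", "vector search library", "unrelated text"]
print(two_stage(docs, "vector search"))  # → ['vector search library']
```

The keyword-stuffed first document wins stage 1 but loses the rerank, which is the point of the pattern: the expensive scorer only ever sees `fetch_k` candidates.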
Empirical Findings
- Best chunk sizes: 512–1024 tokens. Optimal retrieval: Top-k = 7–9 chunks.
6. Graph RAG — Knowledge Graph + RAG
Limitations of Basic RAG
- Fails in multi-hop reasoning (connecting related facts).
Graph RAG Concepts
- Integrates Knowledge Graphs (KGs) for structured retrieval.
- Nodes = Entities (People, Projects, Concepts, etc.).
- Edges = Relationships (works_on, uses, related_to, etc.).
Architecture Steps
- Graph Construction: Extract triples (Entity A → Relation → Entity B). Example: ("Azure AI Studio" → integrates_with → "OpenAI APIs").
- Community Clustering: Groups related entities (Leiden Algorithm).
- Hierarchical Summarization: Local → Global summaries.
- Query Modes: Local, Global, DRIFT, Dynamic Community Selection.
- LLM Response Generation:
- Map Phase: Local summaries.
- Reduce Phase: Combine & refine for final output.
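The multi-hop advantage can be illustrated with a toy triple store built on the example above; entities beyond the one given in the text are invented for illustration.

```python
# Toy knowledge graph as (subject, relation, object) triples, with a
# multi-hop lookup that chunk-level (basic) RAG cannot express directly.

triples = [
    ("Azure AI Studio", "integrates_with", "OpenAI APIs"),
    ("OpenAI APIs", "used_by", "RAG pipeline"),
    ("RAG pipeline", "retrieves_from", "Vector DB"),
]

def neighbors(entity):
    """Outgoing edges of an entity as (relation, object) pairs."""
    return [(rel, obj) for subj, rel, obj in triples if subj == entity]

def multi_hop(start, hops=2):
    """Follow edges `hops` steps out from `start`, collecting reachable entities."""
    frontier, seen = {start}, set()
    for _ in range(hops):
        frontier = {obj for e in frontier for _, obj in neighbors(e)}
        seen |= frontier
    return seen

print(multi_hop("Azure AI Studio", hops=2))  # entities two relations away
```

A similarity search over text chunks would only surface facts co-located in one chunk; the graph traversal connects facts stored in separate triples, which is what multi-hop questions require.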