Vector Databases & RAG for Semantic Search and Retrieval

Posted by Anonymous and classified in Computers


1. Vector Databases — High-Dimensional Embeddings

Vector databases store and search high-dimensional vector embeddings. They power semantic search, similarity search, and RAG pipelines.

Indexing Techniques

  • Flat Index (Brute Force) → accurate but slow.
  • Approximate Nearest Neighbor (ANN) → fast and scalable.
    • Algorithms: HNSW, LSH, IVF-PQ; libraries such as FAISS and Annoy implement them.
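To illustrate the trade-off, a flat index is just an exhaustive similarity scan over every stored vector. A minimal NumPy sketch on toy data (the `flat_search` helper is hypothetical, not a library API):

```python
import numpy as np

def flat_search(index_vectors, query, k=3):
    """Brute-force (flat) nearest-neighbor search by cosine similarity."""
    # Normalize rows so dot products equal cosine similarities.
    index_norm = index_vectors / np.linalg.norm(index_vectors, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = index_norm @ query_norm          # one similarity per stored vector
    return np.argsort(scores)[::-1][:k]       # indices of the k most similar

# Toy index of 5 four-dimensional embeddings.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(5, 4))
print(flat_search(vectors, vectors[2], k=2))  # the query itself ranks first
```

This scan is exact but costs O(n) per query, which is why ANN indexes trade a little accuracy for sub-linear search at scale.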

3. Retrieval-Augmented Generation (RAG)

Overview

Enhances LLM output by integrating retrieved external knowledge.

  • Reduces hallucinations and reliance on outdated training data.
  • Improves factual grounding.

RAG Workflow

  1. Indexing: Convert raw data (PDF, HTML, Word) → embeddings.
  2. Retrieval: Retrieve relevant document chunks using similarity search.
  3. Generation: LLM synthesizes results with the query to produce the final answer.
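The three steps can be sketched end to end. This is a toy walkthrough: the letter-frequency `embed` function stands in for a real sentence-embedding model, and the generation step is stubbed as a prompt string rather than an LLM call.

```python
import numpy as np

# Stand-in embedder: in practice this would be a sentence-embedding model.
def embed(text):
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    return vec / (np.linalg.norm(vec) or 1)

# 1. Indexing: convert raw documents to embeddings.
docs = ["RAG retrieves external knowledge.",
        "Vector databases store embeddings.",
        "LLMs generate text."]
index = np.stack([embed(d) for d in docs])

# 2. Retrieval: similarity search against the query embedding.
query = "How are embeddings stored?"
scores = index @ embed(query)
best = docs[int(np.argmax(scores))]

# 3. Generation: the retrieved chunk is placed in the LLM prompt (stubbed here).
prompt = f"Context: {best}\nQuestion: {query}\nAnswer:"
print(prompt)
```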

Retrieval Types

| Type | Description | Example |
| --- | --- | --- |
| Sparse (Lexical) | Term-based retrieval | TF-IDF, BM25 |
| Dense (Semantic) | Embedding-based retrieval | BERT, SentenceTransformers |

Dense Retrieval

  • Uses Cosine Similarity or Euclidean Distance to find nearest neighbors.
  • ANN Algorithms: Graph-based (HNSW), Hash-based (LSH), Clustering-based (IVF-PQ).
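A small sketch comparing the two distance measures on random vectors. One useful fact: on unit-normalized embeddings, cosine similarity and Euclidean distance produce the same ranking, since ||a − b||² = 2 − 2·cos(a, b) for unit vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
corpus = rng.normal(size=(100, 8))
query = rng.normal(size=8)

# Normalize to unit length so dot products are cosine similarities.
unit = lambda v: v / np.linalg.norm(v, axis=-1, keepdims=True)

# Cosine similarity: higher = closer (sort descending).
cos_rank = np.argsort(unit(corpus) @ unit(query))[::-1]

# Euclidean distance: lower = closer (sort ascending).
euc_rank = np.argsort(np.linalg.norm(corpus - query, axis=1))

print(cos_rank[:5], euc_rank[:5])
# On raw (unnormalized) vectors the two rankings can differ, because
# Euclidean distance is sensitive to vector magnitude.
```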

Generator Evaluation

| Metric | Description |
| --- | --- |
| Exact Match (EM) | Checks whether the generated answer equals the ground truth |
| Semantic Similarity | BLEU, ROUGE, BERTScore, Cosine Similarity |
| Knowledge Gap Detection | Ability to respond with "don't know" when uncertain |
| Groundedness / Faithfulness | Whether the output relies on retrieved information or hallucinates |
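Exact Match is usually computed after light normalization (lowercasing, stripping punctuation and articles). A minimal sketch assuming that SQuAD-style normalization:

```python
import re
import string

def normalize(text):
    """Lowercase, strip punctuation, drop articles, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, ground_truth):
    return normalize(prediction) == normalize(ground_truth)

print(exact_match("The Eiffel Tower.", "eiffel tower"))  # True
print(exact_match("Paris", "eiffel tower"))              # False
```

EM is strict: a correct answer phrased differently scores 0, which is why the semantic-similarity metrics above are reported alongside it.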

5. RAG Optimization Techniques

Chunking Strategies

| Method | Description |
| --- | --- |
| Fixed-Length | Simple, but may cut context mid-thought |
| Recursive Chunking | Splits section → paragraph → sentence |
| Token-Based | Aligns chunk boundaries with model tokenization |
| Overlapping Chunks | Preserves sentence continuity across chunk boundaries |

Trade-offs:

  • Small chunks → more precise retrieval, but less context per chunk.
  • Large chunks → more context, but less retrieval precision.
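A minimal sketch of fixed-length chunking with overlap. Sizes here are in characters for simplicity; a token-based variant would count model tokens instead.

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into fixed-size chunks; consecutive chunks share
    `overlap` characters to preserve continuity across boundaries."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "Retrieval-augmented generation grounds model output in retrieved text."
for c in chunk_text(doc, size=30, overlap=10):
    print(repr(c))
```

Each chunk repeats the last `overlap` characters of its predecessor, so a sentence cut at one boundary is still intact in the next chunk.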

Reranking

  • Improves retrieval accuracy by reordering candidates.
  • Two-Stage Retrieval: Fast retriever → Precise reranker.
  • Time-Based Reranking: Prioritizes recent data.
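Two-stage retrieval can be sketched as a cheap dot-product shortlist followed by an expensive reranker. The reranker here is stubbed with an exact cosine score; a real system would run a cross-encoder over the (query, document) text pairs.

```python
import numpy as np

def two_stage_retrieve(index, query_vec, rerank_fn, k_fast=20, k_final=5):
    # Stage 1 (fast retriever): coarse shortlist by raw dot product.
    coarse = np.argsort(index @ query_vec)[::-1][:k_fast]
    # Stage 2 (precise reranker): expensive scorer reorders the shortlist.
    return sorted(coarse, key=rerank_fn, reverse=True)[:k_final]

rng = np.random.default_rng(2)
index = rng.normal(size=(50, 8))
query = rng.normal(size=8)

# Stand-in reranker: exact cosine score instead of a cross-encoder.
rerank = lambda i: index[i] @ query / np.linalg.norm(index[i])

print(two_stage_retrieve(index, query, rerank))
```

The point of the split is cost: the expensive scorer only ever sees `k_fast` candidates, not the whole corpus.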

Empirical Findings

  • Best chunk sizes: 512–1024 tokens.
  • Optimal retrieval depth: top-k = 7–9 chunks.

6. Graph RAG — Knowledge Graph + RAG

Limitations of Basic RAG

  • Fails at multi-hop reasoning (connecting facts spread across documents).

Graph RAG Concepts

  • Integrates Knowledge Graphs (KGs) for structured retrieval.
  • Nodes = Entities (People, Projects, Concepts, etc.).
  • Edges = Relationships (works_on, uses, related_to, etc.).

Architecture Steps

  1. Graph Construction: Extract triples (Entity A → Relation → Entity B). Example: ("Azure AI Studio" → integrates_with → "OpenAI APIs").
  2. Community Clustering: Groups related entities (Leiden Algorithm).
  3. Hierarchical Summarization: Local → Global summaries.
  4. Query Modes: Local, Global, DRIFT, Dynamic Community Selection.
  5. LLM Response Generation:
    • Map Phase: Local summaries.
    • Reduce Phase: Combine & refine for final output.
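The triples from step 1 can be stored as a simple adjacency list, which makes explicit the multi-hop traversal that basic chunk retrieval struggles with. A toy sketch built around the example triple above (the other triples are illustrative):

```python
from collections import defaultdict

# Toy knowledge graph as (entity, relation, entity) triples.
triples = [
    ("Azure AI Studio", "integrates_with", "OpenAI APIs"),
    ("OpenAI APIs", "used_by", "RAG pipeline"),
    ("RAG pipeline", "retrieves_from", "Vector DB"),
]

graph = defaultdict(list)
for head, rel, tail in triples:
    graph[head].append((rel, tail))

def multi_hop(start, hops=2):
    """Collect entities reachable within `hops` edges of the start node."""
    frontier, seen = {start}, set()
    for _ in range(hops):
        frontier = {tail for node in frontier for _, tail in graph[node]} - seen
        seen |= frontier
    return seen

print(multi_hop("Azure AI Studio", hops=2))
# reachable: "OpenAI APIs" (1 hop) and "RAG pipeline" (2 hops)
```

A query like "which pipeline depends on Azure AI Studio?" is answerable from this traversal even though no single document chunk states it directly.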
