RAG Knowledge Assistant
Upload documents or crawl web pages, then ask questions answered exclusively from your indexed content using hybrid AI search.
Built with FastAPI, Next.js, PostgreSQL + pgvector, LangChain, and OpenAI. Full-stack RAG with streaming responses, real-time pipeline visibility, and interactive search analysis.
How It Works
A production-grade Retrieval-Augmented Generation system that grounds every answer in your documents -- no hallucinations.
Multi-Source Ingestion
Upload PDF and TXT files, or paste any URL. A headless Chromium browser renders JavaScript-heavy pages to extract their full content.
Hybrid Search
Combines vector similarity search (cosine distance) with PostgreSQL full-text search, merged via Reciprocal Rank Fusion for the best of both worlds.
Streaming Answers
Responses are streamed token-by-token via Server-Sent Events. You see the answer forming in real time instead of waiting for the full generation.
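The token stream can be sketched as an async generator that emits SSE frames; in FastAPI such a generator would be wrapped in `StreamingResponse(..., media_type="text/event-stream")`. The function names here are illustrative, not the app's actual code.

```python
import asyncio

async def sse_stream(tokens):
    """Yield LLM tokens as Server-Sent Events frames.

    Illustrative sketch: in FastAPI this generator would be passed to
    StreamingResponse(..., media_type="text/event-stream").
    """
    for token in tokens:
        # Each SSE frame is a "data:" line followed by a blank line.
        yield f"data: {token}\n\n"
        await asyncio.sleep(0)  # yield control so the frame can be flushed
    yield "data: [DONE]\n\n"    # sentinel telling the client the answer ended

async def collect(gen):
    return [frame async for frame in gen]

frames = asyncio.run(collect(sse_stream(["Hel", "lo"])))
```

The `[DONE]` sentinel lets the browser-side `EventSource` (or fetch reader) know when to stop rendering.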
Pipeline Transparency
Watch every step of the ingestion pipeline live: text extraction, chunking, embedding generation, and vector storage. See the AI thinking.
Search Analysis
Compare semantic, keyword, and hybrid search results side-by-side with relevance scores. Understand why certain chunks are retrieved.
Multi-Tenant Security
Clerk JWT authentication, per-user data isolation, SSRF protection on URLs, rate limiting, and automatic cleanup of inactive sessions.
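The SSRF guard on submitted URLs can be sketched as a scheme-and-address check. This is a minimal illustration, not the app's actual validator: a production check would also resolve hostnames to IPs and re-validate after redirects.

```python
import ipaddress
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Basic SSRF guard: reject URLs that could reach internal services.

    Minimal sketch only -- a real implementation would also resolve
    hostnames and re-check every redirect target.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # no file://, gopher://, etc.
    host = parsed.hostname or ""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A hostname, not an IP literal; DNS-resolution checks would go here.
        return host != "localhost"
    # Block loopback, private ranges, and link-local (cloud metadata) addresses.
    return not (addr.is_loopback or addr.is_private
                or addr.is_link_local or addr.is_reserved)
```

Blocking link-local addresses matters because `169.254.169.254` is the cloud metadata endpoint on most providers.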
The AI Pipeline
Every document goes through a multi-stage pipeline before it can answer your questions. Here is exactly what happens at each step.
Document Ingestion
1. Text Extraction
PDF files are parsed page-by-page with PyPDF2. URLs are rendered in a headless Chromium browser via Playwright, executing JavaScript and expanding collapsed sections.
Handles SPAs, dynamic content, and lazy-loaded pages
2. Recursive Chunking
Text is split using LangChain's RecursiveCharacterTextSplitter, which tries paragraph breaks first, then sentences, then words -- preserving semantic boundaries.
1200 chars per chunk, 200 char overlap
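The idea behind recursive splitting can be sketched in plain Python: try the coarsest separator first, recurse to finer ones only when a piece is still too large, then greedily repack pieces with an overlapping tail. This is an illustration of the technique, not LangChain's actual implementation.

```python
def chunk_text(text, max_chars=1200, overlap=200,
               separators=("\n\n", ". ", " ")):
    """Sketch of recursive chunking: prefer paragraph breaks, then
    sentences, then words, and hard-cut only as a last resort."""
    def split(piece, seps):
        if len(piece) <= max_chars:
            return [piece]
        if not seps:
            # Last resort: hard cut at the character limit.
            return [piece[i:i + max_chars] for i in range(0, len(piece), max_chars)]
        parts = [p for p in piece.split(seps[0]) if p]
        if len(parts) == 1:
            return split(piece, seps[1:])
        out = []
        for p in parts:
            out.extend(split(p, seps[1:]))
        return out

    pieces = split(text, separators)
    # Greedily repack pieces up to max_chars, carrying an overlapping tail.
    chunks, current = [], ""
    for p in pieces:
        candidate = (current + " " + p).strip() if current else p
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            tail = current[-overlap:] if current else ""
            current = (tail + " " + p).strip()
            if len(current) > max_chars:
                current = p  # overlap tail didn't fit; start fresh
    if current:
        chunks.append(current)
    return chunks
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, which keeps retrieval from missing boundary-straddling facts.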
3. Embedding Generation
Each chunk is sent to OpenAI's text-embedding-3-small model, which converts it into a 1536-dimensional vector capturing its semantic meaning.
Cached in Redis to avoid redundant API calls
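The caching step can be sketched as a content-hash lookup. Here a plain dict stands in for Redis and a deterministic fake stands in for the OpenAI call; both are stand-ins for illustration only.

```python
import hashlib
import json

# Stand-ins for illustration: `cache` would be Redis, and `embed_remote`
# would call OpenAI's text-embedding-3-small endpoint.
cache = {}
calls = {"count": 0}

def embed_remote(text):
    calls["count"] += 1
    # Fake deterministic "embedding" derived from the text's hash.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

def embed_cached(text):
    """Return the embedding for `text`, hitting the API at most once per
    distinct chunk. Keying by content hash means re-ingesting an
    unchanged document costs zero API calls."""
    key = "emb:" + hashlib.sha256(text.encode()).hexdigest()
    if key in cache:
        return json.loads(cache[key])
    vec = embed_remote(text)
    cache[key] = json.dumps(vec)  # Redis stores strings/bytes, so serialize
    return vec
```

With Redis, the dict operations become `GET`/`SET` (typically with a TTL) but the hash-keyed flow is the same.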
4. Vector Storage
Vectors are stored in PostgreSQL with the pgvector extension. An HNSW index (m=16, ef_construction=64) enables sub-millisecond approximate nearest-neighbor search.
Plus a GIN index on tsvector for full-text search
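A storage schema along these lines supports both search paths; the table and column names here are illustrative, not the app's actual schema.

```sql
-- Sketch of the chunk table and its two indexes (names are illustrative).
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id        BIGSERIAL PRIMARY KEY,
    user_id   TEXT NOT NULL,   -- per-user data isolation
    content   TEXT NOT NULL,
    embedding vector(1536),    -- text-embedding-3-small output size
    tsv       tsvector GENERATED ALWAYS AS
                  (to_tsvector('english', content)) STORED
);

-- Approximate nearest-neighbor search over embeddings.
CREATE INDEX chunks_embedding_hnsw ON chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Full-text search side of the hybrid retrieval.
CREATE INDEX chunks_tsv_gin ON chunks USING gin (tsv);
```

Generating `tsv` as a stored column keeps the full-text index in sync automatically on every insert and update.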
Query and Retrieval
5. Semantic Search
Your question is embedded into the same vector space and compared against all stored chunks using cosine similarity. Finds conceptually related content even with different wording.
Returns top 15 candidates ranked by vector distance
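Cosine ranking is simple to state in code. A minimal sketch (pgvector's `<=>` operator returns cosine *distance*, i.e. one minus the similarity computed here):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=15):
    """Rank stored chunks by similarity to the query embedding.
    `chunks` is a list of (chunk_id, vector) pairs."""
    scored = [(cosine_similarity(query_vec, vec), cid) for cid, vec in chunks]
    return [cid for _, cid in sorted(scored, reverse=True)[:k]]
```

In production this brute-force scan is what the HNSW index replaces: it finds approximately the same top-k without comparing against every vector.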
6. Keyword Search
In parallel, PostgreSQL's full-text search engine matches exact terms, acronyms, and technical identifiers that vector similarity might miss.
plainto_tsquery with ts_rank_cd scoring
7. Reciprocal Rank Fusion
Results from both searches are merged using RRF: each document gets score 1/(k+rank) from each method, then scores are summed. This typically outperforms either method alone.
k=60, top 5 results selected
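The fusion step is a few lines of code. A sketch of the scoring described above, with illustrative document IDs:

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Merge ranked result lists: each document scores 1/(k + rank) in
    every list it appears in (rank is 1-based), and scores are summed."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    merged = sorted(scores, key=scores.get, reverse=True)
    return merged[:top_n]

semantic = ["d3", "d1", "d7", "d2"]  # ranked by vector similarity
keyword  = ["d1", "d5", "d3"]        # ranked by ts_rank_cd
merged = reciprocal_rank_fusion([semantic, keyword])
```

Here `d1` wins: ranking well in *both* lists (1/61 + 1/62) beats `d3`'s first place in one list (1/61 + 1/63). That bias toward cross-method agreement is why RRF is robust without any score normalization.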
8. Grounded Generation
The top chunks are sent to GPT-4o-mini along with conversation history. The system prompt strictly constrains answers to the provided context -- no hallucinations.
temperature=0.1, last 10 messages of history
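Assembling the grounded prompt can be sketched as below; the prompt wording is illustrative, not the app's actual system prompt.

```python
def build_messages(question, context_chunks, history, max_history=10):
    """Assemble the chat payload: a strict system prompt carrying the
    retrieved chunks, the recent conversation turns, and the question.
    (Prompt text is illustrative only.)"""
    context = "\n\n---\n\n".join(context_chunks)
    system = (
        "Answer ONLY from the context below. If the context does not "
        "contain the answer, say you don't know. Do not use outside "
        "knowledge.\n\nContext:\n" + context
    )
    messages = [{"role": "system", "content": system}]
    messages += history[-max_history:]  # keep the last 10 turns by default
    messages.append({"role": "user", "content": question})
    return messages
```

This message list would then go to the chat completions API with `model="gpt-4o-mini"` and `temperature=0.1`; the low temperature further discourages the model from straying beyond the supplied chunks.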
Tech Stack
Frontend
Next.js 15, React 19, Tailwind
Backend
FastAPI, Python 3.11, LangChain
Database
PostgreSQL 16, pgvector, Redis
AI Models
GPT-4o-mini, text-embedding-3-small
Auth
Clerk (JWT RS256)
Crawler
Playwright (headless Chromium)
Queue
Redis Queue (RQ) + workers
Deploy
Docker Compose, Traefik
Try It Out
Upload a document, ask a question, and explore the pipeline in action.
Get Started
Showcase mode -- data is auto-deleted after 10 minutes of inactivity
