Air-Gapped RAG Platform
Production Q&A system for regulated enterprise environments
Architected and deployed a production-grade, deterministic RAG system fully deployable in secure air-gapped enterprise environments — zero external API calls. Docling-powered document ingestion, semantic chunking, vector embedding via Sentence Transformers, and vLLM-served LLM inference behind FastAPI microservices.
Tech stack
Problem
Enterprise clients in regulated industries needed LLM-powered document Q&A on sensitive internal data — but couldn't allow any data to leave their network perimeter.
Architecture
Key Engineering Decisions
Why vLLM over TGI: better continuous batching, PagedAttention for memory efficiency, OpenAI-compatible API surface makes swapping models trivial.
Why Elasticsearch over Chroma/FAISS: enterprise-grade reliability, existing ops tooling, hybrid BM25 + dense retrieval out of the box.
Outcome
Successfully deployed in two regulated enterprise environments. Deterministic retrieval pipeline with audit logs for every query-document pair.