AI Engineer · Toronto, ON

Bhaven Naik

LLM Infrastructure · RAG Systems · MLOps · AI Platform Engineering

Building production-grade LLM systems, RAG pipelines, and ML infrastructure. 3+ years delivering enterprise GenAI — from data ingestion to scalable inference in regulated and air-gapped environments.

vLLM · LangChain · Hugging Face · Elasticsearch · FAISS · Sentence Transformers · FastAPI · PyTorch · TensorFlow · Airflow · MLflow · Docker

Building AI that ships to production

Bhaven Naik

AI Engineer

Toronto, ON

AI Engineer with 3+ years of production experience building and deploying end-to-end LLM applications, RAG systems, and scalable ML infrastructure. Deep expertise in Generative AI, LLM orchestration, retrieval-augmented generation, and MLOps. I bridge research and real-world impact — from data ingestion and model development to scalable inference, monitoring, and operationalization in enterprise and air-gapped environments.

I'm particularly interested in designing robust AI platforms and cloud-native ML infrastructure: reliable, scalable, maintainable solutions that hold up in regulated and constrained environments.

Based in downtown Toronto.

3+ · Years production AI
2 · Enterprise RAG systems
~40% · Analyst time saved (GraphRAG)
~50% · Infra deploy time reduced

Master of Applied Computer Science

St. Francis Xavier University · 2020 – 2022

Bachelor of Computer Engineering

University of Mumbai · 2016 – 2020

IBM Machine Learning Essentials

IBM · 2022

Where I've built things

  • Architected a production-grade, deterministic RAG Q&A system using Elasticsearch, vLLM, Sentence Transformers, and Docling — deployed in secure air-gapped enterprise environments.
  • Engineered a GraphRAG-powered due diligence platform using Neo4j knowledge graphs, integrating NER, multi-document summarization, and multi-source ETL via LLM pipelines, cutting analyst research cycles by ~40%.
  • Built full-stack GenAI applications with Angular frontend and FastAPI backend, delivering AI interfaces to enterprise clients.
  • Automated end-to-end ML training workflows using Python, Apache Airflow, and MLflow — enabling reproducible experiment tracking and one-click model promotion.
  • Provisioned scalable cloud infrastructure via Terraform and Ansible on AWS, reducing deployment lead time by ~50%.
  • Designed distributed RL environments using multi-GPU setups to accelerate training pipelines.
vLLM · LangChain · Elasticsearch · Neo4j · FastAPI · Airflow · MLflow · Terraform · Docker · Kubernetes · AWS

Things I've built

Production systems, research projects, and open-source work. NDA-protected systems are described architecturally — the patterns are mine to share.

Production · Featured · NDA

Air-Gapped RAG Platform

Production Q&A system for regulated enterprise environments

Architected and deployed a production-grade, deterministic RAG system that runs entirely inside secure air-gapped enterprise environments, with zero external API calls. The pipeline combines Docling-powered document ingestion, semantic chunking, vector embedding via Sentence Transformers, and vLLM-served LLM inference behind FastAPI microservices.

vLLM · Elasticsearch · Sentence Transformers · Docling · FastAPI · Docker · GitLab CI

Production · Featured · NDA

GraphRAG Due Diligence Tool

Knowledge-graph-powered M&A document analysis

Knowledge-graph-powered analysis platform for due diligence workflows. Automated NER, entity resolution, multi-source ETL, and multi-document summarization via LLM pipelines. Reduced analyst research cycles by ~40%.

Neo4j · LangChain · spaCy · Python · FastAPI · Docker

Research · Featured

DCGAN Video Augmentation

GAN-based data augmentation for human action recognition

Research implementation of a Deep Convolutional GAN for Human Action Recognition data augmentation on the HMDB51 dataset. Scalable training pipeline with multi-GPU support, benchmarked against pre-trained PyTorchVideo classifiers.

PyTorch · PyTorch Lightning · PyTorchVideo · Python · GANs

ML Application

Diabetic Retinopathy Classifier

End-to-end medical AI with web deployment on AWS

Fine-tuned VGG16 to classify 5 severity grades of diabetic retinopathy from fundus images. Built a Flask web interface and deployed on AWS EC2 for clinical accessibility.

TensorFlow · Keras · VGG16 · Flask · OpenCV · AWS EC2

What I work with

Generative AI & LLMs

LangChain · LlamaIndex · vLLM · Hugging Face · Prompt Engineering · Fine-tuning

RAG & Search

Elasticsearch · FAISS · Sentence Transformers · Docling · GraphRAG · Neo4j

MLOps & Infrastructure

Airflow · MLflow · DVC · Docker · Kubernetes · Terraform · Ansible

Cloud & DevOps

AWS · Azure · GitLab CI/CD · FastAPI · Flask · Microservices

ML & Deep Learning

PyTorch · TensorFlow · Scikit-Learn · Reinforcement Learning · GANs

Data & Streaming

Kafka · PySpark · PostgreSQL · MongoDB · Redis · Hadoop

Let's build something

Working on something interesting in LLMs, RAG, or ML infrastructure? Always happy to connect with people building in this space — whether it's a technical question, a collaboration, or just a conversation.

Get in touch