Designing the Future of Policy-Driven AI: My Ongoing Journey with Semantic Chunking, Blue/Green RAG, and Ollama + pgvector
- Chandrasekar Jayabharathy
- Aug 14

Where I’m Starting
Right now, I’m deep into an AI project that’s both exciting and challenging.
The mission?
To design and implement an enterprise-grade, policy-aware AI knowledge base that can handle frequent policy updates without any downtime, and to lay out a roadmap that keeps the system future-ready for the next decade.
We’re not just building a system; we’re building a platform that will support real-time compliance, scalable retrieval, and explainable AI for our organisation.
The Problem We’re Tackling
Every quarter, the policy team delivers a fresh batch of PDFs: regulatory changes, new internal guidelines, revised risk procedures.
The traditional approach would mean:
Waiting days for model retraining
Risking downtime during index updates
Accepting inconsistent retrieval accuracy due to naive chunking
Our stakeholders want something better: always-on AI that instantly reflects the latest policies.
The Architecture I’m Designing
I’m currently finalizing an architecture that blends:
Semantic Chunking: break policies at logical boundaries for meaningful context.
Blue/Green Re-Embedding: maintain two vector stores (BLUE and GREEN) and swap the active view instantly.
Local AI Inference: use Ollama to run Mistral (mistral:instruct) and mxbai-embed-large embeddings entirely on-prem.
pgvector Indexing: leverage HNSW for high-speed, high-accuracy semantic search (schema sketch below).
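For anyone who wants to see what this looks like in Postgres, here's a minimal schema sketch. It assumes PostgreSQL with the pgvector extension and the psycopg 3 driver; the table and view names (policy_chunks_blue / green / active) are purely illustrative, and vector(1024) matches the dimensionality of mxbai-embed-large.

```python
# Minimal blue/green schema sketch for pgvector.
# Assumes PostgreSQL with the pgvector extension and the psycopg 3 driver;
# table/view names are illustrative.
import psycopg

with psycopg.connect("dbname=policies") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")

    # Two identical chunk tables: one serving traffic, one standing by for re-embedding.
    for color in ("blue", "green"):
        conn.execute(f"""
            CREATE TABLE IF NOT EXISTS policy_chunks_{color} (
                id        bigserial PRIMARY KEY,
                doc_id    text NOT NULL,
                chunk     text NOT NULL,
                embedding vector(1024)   -- mxbai-embed-large returns 1024-dim vectors
            )""")
        # HNSW index for fast approximate cosine search.
        conn.execute(f"""
            CREATE INDEX IF NOT EXISTS policy_chunks_{color}_hnsw
            ON policy_chunks_{color} USING hnsw (embedding vector_cosine_ops)""")

    # Readers only ever query the active view; a BLUE/GREEN swap is just a
    # view redefinition, so it completes in milliseconds with no downtime.
    conn.execute("CREATE OR REPLACE VIEW policy_chunks_active AS "
                 "SELECT * FROM policy_chunks_blue")
```

The key idea is that queries hit the view, never a table directly, so the swap is a metadata change rather than a data migration.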
The pipeline design (a code sketch follows the list):
Ingestion: PDF → semantic chunking → embeddings → inactive table.
Retrieval: Query embedding → search active view → pass results to Mistral.
Update: Re-embed into inactive table → swap → rollback if needed.
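Here's a condensed sketch of those three steps, under a few assumptions: the schema from the previous snippet, the ollama Python client talking to a local Ollama server with mxbai-embed-large and mistral:instruct pulled, and a naive blank-line splitter standing in for the full semantic chunker (PDF text extraction is out of scope here). The helper names ingest, swap, and ask are mine, not a library API.

```python
# Condensed ingest -> swap -> retrieve sketch (assumes the blue/green schema above).
import ollama
import psycopg

EMBED_MODEL = "mxbai-embed-large"
LLM_MODEL = "mistral:instruct"

def embed(text: str) -> str:
    """Embed text with Ollama and return it in pgvector's '[x,y,...]' text form."""
    vec = ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"]
    return "[" + ",".join(str(x) for x in vec) + "]"

def ingest(conn, doc_id: str, text: str, inactive: str) -> None:
    """Chunk a policy document and load it into the *inactive* table."""
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]  # naive chunking stand-in
    conn.execute(f"TRUNCATE policy_chunks_{inactive}")
    for chunk in chunks:
        conn.execute(
            f"INSERT INTO policy_chunks_{inactive} (doc_id, chunk, embedding) "
            "VALUES (%s, %s, %s::vector)",
            (doc_id, chunk, embed(chunk)),
        )

def swap(conn, to: str) -> None:
    """Point the active view at the freshly embedded table (instant, reversible)."""
    conn.execute("CREATE OR REPLACE VIEW policy_chunks_active AS "
                 f"SELECT * FROM policy_chunks_{to}")

def ask(conn, question: str, k: int = 5) -> str:
    """Retrieve the top-k chunks from the active view and answer with Mistral."""
    rows = conn.execute(
        "SELECT chunk FROM policy_chunks_active "
        "ORDER BY embedding <=> %s::vector LIMIT %s",
        (embed(question), k),
    ).fetchall()
    context = "\n\n".join(r[0] for r in rows)
    reply = ollama.chat(
        model=LLM_MODEL,
        messages=[{"role": "user",
                   "content": f"Answer using only this policy context:\n{context}\n\nQ: {question}"}],
    )
    return reply["message"]["content"]
```

Rolling back is just calling swap with the previous colour, which is why the update path stays boring even when a new policy batch turns out to be wrong.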

POC to Production
I've already run a proof of concept where an 80-page policy PDF was ingested, semantically chunked, embedded, and indexed in under two minutes. The swap between the BLUE and GREEN vector views took milliseconds, delivering truly zero downtime. Retrieval accuracy improved significantly compared to fixed-token chunking. We're now moving from POC to MVP with three major additions:
Automated re-embedding scripts for streamlined updates
Quality metrics like recall@k and groundedness checks for continuous evaluation (see the sketch after this list)
Integration with downstream decision engines to enable instant, policy-driven actions
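For the metrics piece, recall@k is straightforward to compute offline against a small, hand-labelled set of policy questions. A minimal sketch follows; the function name and data shapes are illustrative, and groundedness checks aren't shown.

```python
# Minimal recall@k sketch over a labelled evaluation set.
# `retrieved` holds the chunk ids returned for each question (in rank order);
# `relevant` holds the chunk ids a reviewer marked as correct. Names are illustrative.
def recall_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int) -> float:
    """Fraction of relevant chunks found in the top-k results, averaged over queries."""
    scores = []
    for got, want in zip(retrieved, relevant):
        if not want:
            continue  # skip questions with no labelled answer
        hits = len(set(got[:k]) & want)
        scores.append(hits / len(want))
    return sum(scores) / len(scores) if scores else 0.0

# Example: two evaluation questions, k = 3
print(recall_at_k([["c1", "c7", "c3"], ["c9", "c2", "c4"]],
                  [{"c1", "c3"}, {"c5"}], k=3))   # -> (2/2 + 0/1) / 2 = 0.5
```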
The Roadmap I’m Building
I’m shaping a 12-month roadmap that includes:
Phase 1: MVP with semantic chunking + blue/green updates + local inference
Phase 2: Observability (Prometheus/Grafana), policy change diffing
Phase 3: Multi-modal ingestion (text, tables, images)
Phase 4: Hybrid RAG + rule engine integration
Phase 5: Model versioning and automated regression testing
The goal is to future-proof this platform so it can handle:
New embedding models without full re-engineering
Increased policy update frequency
Regulatory changes demanding higher explainability
If You’re Building Something Similar
Here’s my advice:
Invest in semantic chunking early; it's worth the extra effort.
Treat your vector store like production code, with staging, rollouts, and monitoring.
Keep your embedding model and LLM separate so you can swap them independently (a config sketch follows this list).
Automate re-embedding so it's just another pipeline step.
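On that last pair of points, the cheapest way I've found to keep models swappable is to treat them as configuration rather than code. A minimal sketch, assuming environment variables as the source of truth (the variable names and defaults are illustrative):

```python
# Minimal sketch: embedding model and LLM as independent configuration,
# so either can be swapped (and the corpus re-embedded) without touching code.
# Environment variable names and defaults are illustrative.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    embed_model: str = os.getenv("EMBED_MODEL", "mxbai-embed-large")
    embed_dim: int = int(os.getenv("EMBED_DIM", "1024"))   # must match the vector column
    llm_model: str = os.getenv("LLM_MODEL", "mistral:instruct")

config = ModelConfig()
# The re-embedding pipeline reads config.embed_model / config.embed_dim,
# while the answering path reads config.llm_model; changing one never
# forces a change to the other.
```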
Why This Project Excites Me
I’m not just solving a technical problem. I’m shaping the operating model for how policy-driven AI systems should work:
Always-on availability
Privacy and compliance with on-prem LLMs
Dynamic knowledge base without retraining overhead
Scalable architecture ready for AI evolution