Designing the Future of Policy-Driven AI: My Ongoing Journey with Semantic Chunking, Blue/Green RAG, and Ollama + pgvector

  • Writer: Chandrasekar Jayabharathy
  • Aug 14
  • 2 min read


Where I’m Starting

Right now, I’m deep into an AI project that’s both exciting and challenging.


The mission?

To design and implement an enterprise-grade, policy-aware AI knowledge base that can handle frequent policy updates without any downtime, and to lay out a roadmap that keeps the system future-ready for the next decade.


We’re not just building a system; we’re building a platform that will support real-time compliance, scalable retrieval, and explainable AI for our organisation.


The Problem We’re Tackling

Every quarter, the policy team delivers a fresh batch of PDFs: regulatory changes, new internal guidelines, revised risk procedures.


The traditional approach would mean:

  • Waiting days for model retraining

  • Risking downtime during index updates

  • Accepting inconsistent retrieval accuracy due to naive chunking

Our stakeholders want something better: always-on AI that instantly reflects the latest policies.


The Architecture I’m Designing

I’m currently finalizing an architecture that blends:

  1. Semantic Chunking: break policies at logical boundaries for meaningful context (see the sketch after this list).

  2. Blue/Green Re-Embedding: maintain two vector stores (BLUE and GREEN) and swap the active view instantly.

  3. Local AI Inference: use Ollama to run Mistral (mistral:instruct) and mxbai-embed-large embeddings entirely on-prem.

  4. pgvector Indexing: leverage HNSW for high-speed, high-accuracy semantic search.
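To make the chunking idea concrete, here is a minimal sketch of how a policy document can be split at logical boundaries. It assumes the text has already been extracted from the PDF, that sections begin with numbered headings (e.g. "3.2 Data Retention"), and that paragraphs are separated by blank lines; the function name and character budget are illustrative, not our production code.

```python
import re

# Headings like "3.2 Data Retention" mark logical section boundaries
HEADING = re.compile(r"^\d+(\.\d+)*\s+\S")

def semantic_chunks(text: str, max_chars: int = 1500) -> list[str]:
    """Split extracted policy text at logical boundaries (headings and
    paragraphs), packing paragraphs into chunks under a size budget."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in paragraphs:
        starts_section = bool(HEADING.match(para))
        # Close the current chunk at a new section or when the budget is hit,
        # so no chunk straddles a policy section boundary.
        if current and (starts_section or size + len(para) > max_chars):
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

The difference from fixed-token chunking is that a chunk never straddles a section boundary, so each embedded chunk carries one coherent piece of policy context.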

The pipeline design:

  • Ingestion: PDF → semantic chunking → embeddings → inactive table.

  • Retrieval: Query embedding → search active view → pass results to Mistral.

  • Update: Re-embed into inactive table → swap → rollback if needed.
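For the update step, the swap itself can be expressed as a single view change in Postgres. The sketch below is a pattern, not our production script: the table and view names (policy_chunks_green, active_chunks), the schema, and the connection are illustrative assumptions, and the rows are assumed to be (content, embedding) pairs produced by the chunking and embedding steps.

```python
import psycopg2  # e.g. conn = psycopg2.connect("dbname=policies")  # illustrative DSN

def publish_to_inactive(conn, rows, inactive_table="policy_chunks_green"):
    """Load freshly re-embedded chunks into the inactive table, build the
    HNSW index, then atomically repoint the active view (the blue/green swap)."""
    with conn.cursor() as cur:
        # 1. Rebuild the inactive table with the new embeddings
        cur.execute(f"TRUNCATE {inactive_table}")
        for content, embedding in rows:
            vec = "[" + ",".join(str(x) for x in embedding) + "]"
            cur.execute(
                f"INSERT INTO {inactive_table} (content, embedding) "
                f"VALUES (%s, %s::vector)",
                (content, vec),
            )
        # 2. HNSW index for fast, high-recall approximate nearest-neighbour search
        cur.execute(
            f"CREATE INDEX IF NOT EXISTS {inactive_table}_hnsw "
            f"ON {inactive_table} USING hnsw (embedding vector_cosine_ops)"
        )
        # 3. The swap: repointing the view is a catalog-only change, so it
        #    takes milliseconds. Rollback = pointing the view back at BLUE.
        cur.execute(
            f"CREATE OR REPLACE VIEW active_chunks AS "
            f"SELECT * FROM {inactive_table}"
        )
    conn.commit()
```

Because repointing the view is a metadata change rather than a data copy, readers querying active_chunks never see a half-built index, and rollback is just running the same statement against the BLUE table.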




POC to Production

I’ve already run a proof of concept where an 80-page policy PDF was ingested, semantically chunked, embedded, and indexed in under two minutes. The swap between BLUE and GREEN vector views took milliseconds, delivering truly zero downtime. Retrieval accuracy improved significantly compared to fixed-token chunking. We’re now moving from POC to MVP with three major additions:

  • Automated re-embedding scripts for streamlined updates

  • Quality metrics like recall@k and groundedness checks for continuous evaluation (recall@k is sketched after this list)

  • Integration with downstream decision engines to enable instant, policy-driven actions
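On the quality metrics: recall@k only needs a small hand-labelled set of questions mapped to the chunk IDs that should answer them. A minimal sketch, assuming such a gold set exists; the names and example data are illustrative.

```python
def recall_at_k(gold: dict[str, set[str]],
                retrieved: dict[str, list[str]],
                k: int = 5) -> float:
    """Mean fraction of relevant chunk IDs that appear in the top-k results.

    gold:      query -> set of relevant chunk IDs (hand-labelled)
    retrieved: query -> ranked chunk IDs returned by the retrieval pipeline
    """
    scores = []
    for query, relevant in gold.items():
        top_k = set(retrieved.get(query, [])[:k])
        scores.append(len(relevant & top_k) / len(relevant))
    return sum(scores) / len(scores) if scores else 0.0

# Illustrative: one labelled query, evaluated at k=2
gold = {"What is the data retention period?": {"chunk-17"}}
retrieved = {"What is the data retention period?": ["chunk-17", "chunk-03", "chunk-42"]}
print(recall_at_k(gold, retrieved, k=2))  # -> 1.0
```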


The Roadmap I’m Building

I’m shaping a 12-month roadmap that includes:

  • Phase 1 : MVP with semantic chunking + blue/green updates + local inference

  • Phase 2 : Observability (Prometheus/Grafana), policy change diffing

  • Phase 3 : Multi-modal ingestion (text, tables, images)

  • Phase 4 : Hybrid RAG + rule engine integration

  • Phase 5 : Model versioning and automated regression testing

The goal is to future-proof this platform so it can handle:

  • New embedding models without full re-engineering

  • Increased policy update frequency

  • Regulatory changes demanding higher explainability


If You’re Building Something Similar

Here’s my advice:

  • Invest in semantic chunking early; it’s worth the extra effort.

  • Treat your vector store like production code, with staging, rollouts, and monitoring.

  • Keep your embedding model and LLM separate so you can swap them independently (see the sketch after this list).

  • Automate re-embedding so it becomes just another pipeline step.
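On keeping the embedding model and LLM separate: in the retrieval path they are two independent calls, so either can be swapped without touching the other. A minimal sketch, assuming the official Ollama Python client, psycopg2, and the active_chunks view described earlier; the helper name and prompt template are illustrative.

```python
import ollama      # official Ollama Python client (assumed available)
import psycopg2    # connection creation omitted, e.g. psycopg2.connect(...)

EMBED_MODEL = "mxbai-embed-large"  # swappable without touching the LLM
LLM_MODEL = "mistral:instruct"     # swappable without touching the embedder

def answer(conn, question: str, k: int = 5) -> str:
    # 1. Query embedding comes from the embedding model only
    emb = ollama.embeddings(model=EMBED_MODEL, prompt=question)["embedding"]
    vec = "[" + ",".join(str(x) for x in emb) + "]"
    # 2. Semantic search against the active blue/green view
    with conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM active_chunks "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (vec, k),
        )
        context = "\n\n".join(row[0] for row in cur.fetchall())
    # 3. Only now is the LLM called, grounded on the retrieved policy text
    reply = ollama.chat(
        model=LLM_MODEL,
        messages=[{
            "role": "user",
            "content": f"Answer using only this policy context:\n{context}\n\n"
                       f"Question: {question}",
        }],
    )
    return reply["message"]["content"]
```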


Why This Project Excites Me

I’m not just solving a technical problem. I’m shaping the operating model for how policy-driven AI systems should work:

  • Always-on availability

  • Privacy and compliance with on-prem LLMs

  • Dynamic knowledge base without retraining overhead

  • Scalable architecture ready for AI evolution


