Horizon Labs

RAG Implementation Consulting

Most production RAG systems fail not because the LLM is wrong, but because retrieval is. We design RAG architectures end-to-end — chunking and embedding strategies tuned to your data, vector store selection (pgvector, Pinecone, Weaviate, or self-hosted), hybrid search combining semantic and keyword recall, reranking layers, citation-grounded prompts, and the evaluation harnesses that prove the system actually works on real queries. Built for Australian compliance contexts where data residency, Privacy Act obligations under APP 11, and auditability are not optional. Runs in your VPC if needed; portable across model providers; no vendor lock-in.

What you get

Retrieval evaluation harnesses that measure recall@k on your actual queries
Hybrid retrieval combining dense embeddings, BM25, and reranking
Citation-grounded outputs — every claim backed by a retrieved source
AU data residency and APP 11 compliance built into the architecture
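As a rough illustration of hybrid retrieval, the sketch below merges a dense-embedding ranking and a BM25 ranking with reciprocal rank fusion (RRF). RRF is one common fusion method, used here as an assumption and not necessarily the one chosen for a given engagement. The document IDs and the constant k=60 are illustrative, and a reranker would normally run over the fused list afterwards.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) in every list it appears in;
    the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the two retrievers for the same query.
dense_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, which is why fusion tends to recover exact-match queries (product codes, clause numbers) that dense search alone can miss.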

Real examples

Internal knowledge retrieval

Illustrative scenario: an Australian financial services firm with ~40,000 policy documents needs employees to query in natural language. RAG over Confluence + SharePoint + internal wikis with role-based access controls and an audit log of every retrieval.
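A minimal sketch of how role-based filtering and a retrieval audit log might fit together in a scenario like this. The `Chunk` type, the role names, and the in-memory list used as a log are all hypothetical; a real deployment would enforce permissions from the source systems and write to durable, append-only storage.

```python
import json
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_roles: frozenset  # roles permitted to read this chunk


def retrieve_with_acl(candidates, user_id, user_roles, query, audit_log):
    """Drop chunks the user may not see, then record what was returned."""
    roles = set(user_roles)
    visible = [c for c in candidates if roles & c.allowed_roles]
    audit_log.append(json.dumps({
        "ts": time.time(),
        "user": user_id,
        "query": query,
        "returned": [c.chunk_id for c in visible],
    }))
    return visible


# Hypothetical candidates returned by the retriever before filtering.
candidates = [
    Chunk("hr-001", "Leave policy text", frozenset({"hr", "all_staff"})),
    Chunk("risk-014", "Credit risk limits text", frozenset({"risk"})),
]
log = []
visible = retrieve_with_acl(candidates, "u123", ["all_staff"], "annual leave", log)
print([c.chunk_id for c in visible])
```

Filtering before the chunks reach the prompt matters: once restricted text is in the model's context, no downstream check can reliably keep it out of the answer.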

Customer-facing answer engines

Illustrative scenario: an enterprise SaaS replacing a static documentation search with a RAG-powered answer interface. Citations to source articles preserved on every response; deflection metrics measurable against support ticket volume.
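One way citation-preserving responses could be wired up, sketched under assumptions: retrieved articles are numbered in the prompt, the model is asked to cite by number, and a simple check rejects answers that cite nothing or cite a source that was never supplied. The function names and prompt wording are illustrative only.

```python
import re


def build_cited_prompt(question, sources):
    """Build a prompt from (title, text) pairs, numbering each source."""
    numbered = "\n\n".join(
        f"[{i}] {title}\n{text}"
        for i, (title, text) in enumerate(sources, start=1)
    )
    return (
        "Answer using only the sources below. "
        "Cite each claim with its source number, e.g. [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )


def cited_ids_are_valid(answer, n_sources):
    """True if the answer cites at least one source and every ID exists."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and all(1 <= c <= n_sources for c in cited)


# Hypothetical documentation articles returned by retrieval.
sources = [
    ("Reset password", "Open Settings, then Security, then Reset."),
    ("Billing FAQ", "Invoices are issued monthly."),
]
prompt = build_cited_prompt("How do I reset my password?", sources)
print(cited_ids_are_valid("Open Settings, then Security [1].", len(sources)))
```

This check only confirms that citations point at real sources. Whether the cited source actually supports the claim is a separate question for the end-to-end evaluation.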

Common questions

Why not just use ChatGPT with file upload?

Consumer LLM tools don’t give you control over chunking, retrieval evaluation, residency, or citation. Production RAG needs reproducible retrieval, evaluation harnesses, and infrastructure that runs inside your security boundary. We build that.

Which vector database do you recommend?

Depends on scale and ops appetite. pgvector for teams already on Postgres and under ~10M chunks. Pinecone or Weaviate for larger workloads or when you need managed scaling. We benchmark options against your actual query patterns before committing.

How do you evaluate RAG quality?

Two layers: retrieval evaluation (recall@k, MRR on a labelled query set) and end-to-end evaluation (faithfulness, answer relevance, citation accuracy via LLM-as-judge with calibration). Every change ships with an eval delta — no ‘looks better to me’.
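For the retrieval layer, the two metrics named above can be computed in a few lines. The sketch below assumes a labelled query set in the form of (retrieved IDs, relevant IDs) pairs; the document IDs and the helper names are illustrative.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(labelled_queries, k=5):
    """Average recall@k and MRR over (retrieved_ids, relevant_ids) pairs."""
    n = len(labelled_queries)
    return {
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in labelled_queries) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in labelled_queries) / n,
    }


# Hypothetical labelled set: what the retriever returned vs. what was relevant.
runs = [
    (["d1", "d2", "d3"], ["d2"]),
    (["d4", "d5", "d6"], ["d6", "d9"]),
]
metrics = evaluate(runs, k=2)
print(metrics)
```

Running the same function before and after a change to chunking, embeddings, or reranking gives the eval delta: the difference in each metric on an unchanged query set.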

Can RAG run on-premises or in our VPC?

Yes. We deploy fully self-hosted RAG using open-source models (Llama, Mistral, embedding models from BGE/E5) or routed through Anthropic/OpenAI via Australian regional endpoints depending on your residency requirements.

How long does a RAG implementation take?

Initial production-ready pilot in 6–8 weeks for a single document corpus. Multi-corpus enterprise rollouts take 3–6 months including evaluation harness, observability, and operational runbooks.

Ready to get started?

Tell us about your project and we'll tell you honestly how we can help.

Get in Touch

Let's build something intelligent

Tell us about your product challenge. Whether you're launching from scratch, scaling an existing product, or need AI capabilities — we'll tell you honestly how we can help.

First conversation is free, no obligations. If there's a fit, we'll scope a small first step so you can see results before committing to anything bigger.