Horizon LabsHorizon Labs

LangChain & LlamaIndex

LangChain and LlamaIndex are useful prototyping frameworks — fast enough to stand up a candidate RAG or agent architecture and pressure-test whether it'll hold under real data. But they're opinionated, fast-moving, and full of abstractions that don't always survive the trip to production. Our pattern: prototype with LangChain or LlamaIndex to validate the architecture, then strip back to direct API calls for the load-bearing pieces. Prototyping quickly is the easy part; what separates a working pilot from a production system that runs at 3am without anyone watching is the evaluation harness, the observability, and the operational discipline that lives outside any framework.

What you get

Architecture validation — stand up a candidate RAG or agent design and pressure-test it against real data before committing to production patterns
LlamaIndex's data connectors do most of the heavy lifting on enterprise ingestion (SharePoint, Confluence, Notion, etc.)
Agent scaffolding (tool calling, ReAct loops, memory) covered by the framework — production hardening (eval harnesses, guardrails, observability) is still our work to do
Strong evaluation tooling (LangSmith) for tracing and debugging during development
Framework-agnostic deployment — when the architecture is validated, we strip to direct API calls if the abstraction isn't pulling weight

Real examples

Fast RAG prototyping for proof-of-concept engagements

Illustrative scenario: a 4-week paid pilot for a Melbourne-based legal firm. LlamaIndex handles document chunking and embedding, LangChain orchestrates retrieval-and-generation. End-to-end working system in week 1; remaining 3 weeks spent on evaluation harnesses and accuracy tuning.

Agent prototypes before committing to a production architecture

Illustrative scenario: a growing retailer wants to automate customer service triage. We build the agent loop in LangChain to validate the workflow shape — tools, prompts, escalation paths — then port the production version to direct Claude API calls with our own orchestration to remove framework risk.

LangSmith for production observability

Illustrative scenario: an established RAG system shipping inconsistent answers. We instrument with LangSmith for tracing without rebuilding — every retrieval step, every prompt, every model call is visible. Root cause identified in days, fix shipped in a week.

Common questions

When do you NOT recommend LangChain?

Three cases. One, simple single-step RAG over one corpus — direct API calls are cleaner and have fewer dependencies. Two, anything where the abstraction overhead matters for latency (sub-1-second response time targets). Three, projects where the team will maintain the code long-term but can't keep up with LangChain's API churn — the framework has shipped breaking changes too often for that to be a safe bet.

LangChain or LlamaIndex?

Different strengths. LlamaIndex is the better data-loading and indexing layer — its connectors and ingestion patterns save real engineering time. LangChain is the better orchestration layer — chains, agents, tool calling. We routinely use both in the same project, each for what it's best at.

What about LangGraph?

LangGraph (the agent-orchestration sibling) is more mature now and we use it when the agent workflow has genuine branching or stateful loops. For simpler agents we still write the orchestration ourselves — fewer dependencies, easier to reason about, no framework upgrades to track.

How do you handle LangChain's frequent breaking changes?

Pin versions aggressively. Treat the LangChain version as a deliberate infrastructure choice, not a 'latest' default. Upgrade only when there's a concrete reason and run the full eval suite afterwards. This is part of why we strip framework dependencies from load-bearing code — fewer surfaces to break on upgrades.

Can you migrate an existing LangChain prototype to production?

Yes — this is a common engagement shape. We audit the existing system, identify which framework pieces are pulling weight (LlamaIndex data loaders, LangSmith tracing) and which are unnecessary abstraction. The production version keeps the genuinely useful parts and replaces the rest with direct API calls and our own evaluation harnesses.

Ready to get started?

Tell us about your project and we'll tell you honestly how we can help.

Get in Touch

Let's build something intelligent

Tell us about your product challenge. Whether you're launching from scratch, scaling an existing product, or need AI capabilities — we'll tell you honestly how we can help.

First conversation is free, no obligations. If there's a fit, we'll scope a small first step so you can see results before committing to anything bigger.