Horizon Labs

OpenAI GPT

OpenAI's GPT family is part of our toolkit but not our default. Where it wins: broad general-purpose reasoning and creative generation (the GPT-5 line leads on creative writing), strong multimodal vision, native image generation (GPT Image) and speech-to-text (Whisper) in the same SDK, and clients with existing OpenAI / Azure OpenAI commitments where switching would create unnecessary procurement friction. Where we hesitate: long-context RAG and agentic coding (Claude consistently outperforms on our eval suites), instruction stability over time (OpenAI deprecates and swaps models aggressively: GPT-4, GPT-4o and GPT-4.1 were all retired within months of each other), and pricing for the highest-tier models when Claude Sonnet or Gemini Flash delivers similar quality at lower cost. We route per task, not per vendor.

What you get

Broadest general-purpose model — GPT-5 is a strong default for open-ended reasoning, drafting, and creative generation
Strong multimodal vision — handles diagram, screenshot, and document-image extraction with low error rates
Native image generation (GPT Image) and speech-to-text (Whisper) in one SDK — image + speech workflows without extra vendors
Mature function-calling for structured agent workflows, with the broadest ecosystem support of any vendor
The Azure OpenAI route provides Australian-region inference and enterprise compliance tooling that direct OpenAI access doesn't

Real examples

Natural-language interfaces in developer-facing tools

Illustrative scenario: an Australian SaaS company embeds an in-app assistant. GPT-5 generates SQL queries from natural language, with the generated query previewed before execution. Function-calling returns the query as structured output, and prompt caching keeps per-query cost under 1 cent at scale.
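The "previewed before execution" step can be sketched as a small gate between the model and the database. This is a minimal illustration, not a description of any production system: the function names, the `orders` table, and the hard-coded query standing in for model output are all assumptions, and the prefix check is deliberately conservative rather than a full SQL parser.

```python
import sqlite3
from typing import List, Tuple


def is_previewable(sql: str) -> bool:
    """Accept only a single SELECT statement; reject everything else.

    Conservative by design: any semicolon inside the statement is rejected,
    even one inside a string literal.
    """
    stripped = sql.strip().rstrip(";").strip()
    if not stripped or ";" in stripped:
        return False
    return stripped.lower().startswith("select")


def run_after_approval(conn: sqlite3.Connection, sql: str, approved: bool) -> List[Tuple]:
    """Execute generated SQL only if the user approved the preview."""
    if not approved:
        raise PermissionError("query was not approved by the user")
    if not is_previewable(sql):
        raise ValueError("only a single SELECT statement is allowed")
    return conn.execute(sql).fetchall()


# Hypothetical demo data and a stand-in for the model-generated query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

generated_sql = "SELECT COUNT(*) FROM orders WHERE total > 20"
rows = run_after_approval(conn, generated_sql, approved=True)
```

In a real deployment the same gate would usually sit alongside a read-only database role, so that a missed edge case in the check cannot modify data.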

Multimodal document extraction at scale

Illustrative scenario: an insurer needs to extract structured data from scanned PDFs, diagrams, and form images. GPT-5's vision layer produces structured JSON that flows into the downstream system. In our accuracy benchmarks it beat every OCR-plus-rules pipeline we tested.
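Before extracted JSON flows into a downstream system, it is usually worth validating it against the expected shape. The sketch below assumes a hypothetical claim record with three fields; the field names, the `parse_claim` helper, and the sample payload are illustrative only and do not come from any real insurer schema.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("policy_number", "claimant_name", "claim_amount")


@dataclass
class ClaimRecord:
    policy_number: str
    claimant_name: str
    claim_amount: float


def parse_claim(raw: str) -> ClaimRecord:
    """Parse model output and fail loudly if it does not match the schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return ClaimRecord(
        policy_number=str(data["policy_number"]),
        claimant_name=str(data["claimant_name"]),
        claim_amount=float(data["claim_amount"]),
    )


# Stand-in for the JSON a vision model might return for one scanned form.
sample = '{"policy_number": "P-1001", "claimant_name": "A. Citizen", "claim_amount": "1250.50"}'
record = parse_claim(sample)
```

Records that fail validation can be routed to manual review instead of being written downstream.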

Clients with existing Azure OpenAI commitments

Illustrative scenario: a regulated financial services client has provisioned Azure OpenAI capacity already. We build the production RAG and agent system on Azure OpenAI rather than switching to Claude, deferring to procurement reality while maintaining a model-agnostic abstraction layer for future moves.
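A model-agnostic abstraction layer can be as small as one interface plus a routing table. The sketch below is an assumption about how such a layer might look, not the actual implementation: `ChatModel`, `TaskRouter`, the task labels, and the `EchoModel` stand-in (used here in place of real vendor clients) are all hypothetical names.

```python
from typing import Dict, Protocol


class ChatModel(Protocol):
    """The only surface application code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for a vendor client; a real adapter would call the vendor SDK."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class TaskRouter:
    """Routes by task label, so swapping vendors is a configuration change."""

    def __init__(self, routes: Dict[str, ChatModel], default: ChatModel) -> None:
        self.routes = routes
        self.default = default

    def complete(self, task: str, prompt: str) -> str:
        return self.routes.get(task, self.default).complete(prompt)


gpt = EchoModel("gpt")
claude = EchoModel("claude")
router = TaskRouter(
    routes={"creative_writing": gpt, "long_context_rag": claude},
    default=claude,
)
reply = router.complete("creative_writing", "Draft a product tagline")
```

Because callers only see `complete`, moving a task from one vendor to another touches the routing table and nothing else.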

Common questions

When do you pick GPT over Claude?

Three scenarios. One, general-purpose breadth and creative generation — the GPT-5 line leads on creative writing and open-ended drafting. Two, image generation and vision — GPT Image plus GPT-5's multimodal accuracy cover both in one SDK. Three, when the client already has Azure OpenAI procurement done and switching would slow shipping more than the marginal quality gain justifies. For long-context RAG and agentic coding we still default to Claude.

Why is GPT not your default?

Two reasons. Long-context RAG performance (Claude's 200K window holds up better at the upper end), and instruction stability — OpenAI has shipped behaviour changes that broke production prompts. We've watched it cost teams several days of post-incident debugging. Claude has been more predictable for the workload shape we ship most.

Azure OpenAI vs OpenAI direct?

Azure OpenAI for any client with compliance requirements (Privacy Act, APRA), or when you need Australian-region inference, or when the client already has Azure procurement done. Direct OpenAI for fastest access to new models (Azure typically trails by weeks) and for non-regulated workloads.

Do you fine-tune GPT?

Rarely. GPT fine-tuning is expensive and the maintenance burden is real: every model version change can degrade your tuned model. We default to prompt engineering + RAG for the same outcome. If genuine fine-tuning is needed, we'd often choose an open-source model on dedicated infrastructure instead.

How do you control GPT API costs?

Same playbook as Claude. Route to a smaller GPT-5 mini/nano tier for high-volume / lower-stakes tasks. Use prompt caching aggressively. Set hard per-day cost ceilings via the OpenAI dashboard. Set up evaluation harnesses so you catch a cost regression in days, not weeks.
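Tier routing and a spend ceiling can also be enforced in application code as a second line of defence alongside dashboard limits. The sketch below is a minimal illustration under stated assumptions: the `DailyBudget` class, the `pick_tier` helper, the tier labels, and the dollar amounts are hypothetical and are not real OpenAI model IDs or prices.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DailyBudget:
    """Tracks spend per calendar day and refuses calls past the ceiling."""

    ceiling_usd: float
    spent_usd: float = 0.0
    day: date = field(default_factory=date.today)

    def charge(self, cost_usd: float, today: Optional[date] = None) -> None:
        today = today or date.today()
        if today != self.day:
            # New day: reset the running total.
            self.day = today
            self.spent_usd = 0.0
        if self.spent_usd + cost_usd > self.ceiling_usd:
            raise RuntimeError("daily cost ceiling reached")
        self.spent_usd += cost_usd


def pick_tier(high_stakes: bool) -> str:
    """Send lower-stakes, high-volume work to the smaller tier."""
    return "full" if high_stakes else "mini"


budget = DailyBudget(ceiling_usd=1.0)
day_one = date(2025, 1, 1)
budget.charge(0.6, today=day_one)

blocked = False
try:
    budget.charge(0.5, today=day_one)  # would take the day's total past 1.0
except RuntimeError:
    blocked = True

budget.charge(0.5, today=date(2025, 1, 2))  # next day, counter resets
```

Logging each refused call gives the evaluation harness an early signal that cost per request has drifted.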

Ready to get started?

Tell us about your project and we'll tell you honestly how we can help.

Let's build something intelligent

Tell us about your product challenge. Whether you're launching from scratch, scaling an existing product, or need AI capabilities — we'll tell you honestly how we can help.

First conversation is free, no obligations. If there's a fit, we'll scope a small first step so you can see results before committing to anything bigger.