Horizon Labs

GPU Compute (A100 / H100)

Most production AI doesn't need dedicated GPU infrastructure; hosted APIs handle inference fine. But when GPU compute is the right answer, getting it right matters: idle GPU hours can add up to more per month than an entire team's salaries. We provision GPU compute on AWS (P5 instances), GCP (A3), Azure (NDv5), or specialised providers (Lambda, CoreWeave) based on workload economics. Sustained-utilisation workloads tip toward dedicated infrastructure, bursty training favours spot instances, and experimental workflows go to lower-cost hosted providers. The goal is always utilisation: a GPU running at 80% justifies its cost; one running at 15% doesn't.
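To make the utilisation point concrete, here is a small Python sketch that divides an hourly GPU price by utilisation to get the price of each hour the GPU is actually busy. The $4.00/hour rate is a hypothetical placeholder, not a quoted price from any provider.

```python
def cost_per_useful_hour(hourly_rate: float, utilisation: float) -> float:
    """Return the price paid for each hour the GPU is actually doing work."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return hourly_rate / utilisation


rate = 4.00  # hypothetical price per GPU-hour (USD), for illustration only
busy = cost_per_useful_hour(rate, 0.80)
idle = cost_per_useful_hour(rate, 0.15)
print(f"80% utilised: ${busy:.2f} per useful hour")   # $5.00
print(f"15% utilised: ${idle:.2f} per useful hour")   # $26.67
print(f"ratio: {idle / busy:.1f}x")                   # 5.3x
```

Whatever the base rate, each useful hour at 15% utilisation costs about 5.3 times what it costs at 80%.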

What you get

Right-sized GPU selection — A100 for most production fine-tuning, H100 for cutting-edge training or large-batch inference
Spot instances cut training cost 50-70% — viable for any workload that tolerates interruption (most training does)
Multi-region failover for production inference — Sydney + Singapore + US-East routing for latency + availability
Specialised providers (Lambda, CoreWeave, RunPod) for cost-optimal experimentation — often 30-50% cheaper than hyperscalers for non-production
Reserved capacity contracts for predictable workloads — discounts of 30-60% when annual usage justifies it

Real examples

Self-hosted Llama 70B inference on H100s

Illustrative scenario: a sovereign-data client needs a 70B-parameter Llama model running in their own VPC. We provision dual H100s on AWS P5 with vLLM as the inference server, autoscaling between 1 and 4 GPUs based on request volume. Utilisation holds at a sustained 60-70%, and the setup becomes cost-effective above roughly 3M tokens/day of inference.
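The autoscaling rule in this scenario can be sketched as a replica calculation: divide current demand by the throughput each GPU can sustain at a target utilisation, round up, and clamp to the 1-4 GPU range. This is a hypothetical helper, not part of vLLM's API; in practice an external autoscaler would apply a rule like it. The 2,000 tokens/second per GPU figure and the 65% target are assumptions chosen to sit inside the scenario's 60-70% band.

```python
import math


def desired_gpus(demand_tokens_per_s: float,
                 gpu_tokens_per_s: float,
                 target_utilisation: float = 0.65,
                 min_gpus: int = 1,
                 max_gpus: int = 4) -> int:
    """Replica count that keeps each GPU near the target utilisation."""
    # Throughput we plan to use per GPU, leaving headroom for bursts.
    usable_per_gpu = gpu_tokens_per_s * target_utilisation
    needed = math.ceil(demand_tokens_per_s / usable_per_gpu)
    # Clamp to the autoscaling range; the floor keeps one warm replica.
    return max(min_gpus, min(max_gpus, needed))


# Hypothetical per-GPU throughput of 2,000 tokens/s.
print(desired_gpus(0, 2000))       # 1 (never below the floor)
print(desired_gpus(3000, 2000))    # 3
print(desired_gpus(10000, 2000))   # 4 (capped at the ceiling)
```

Targeting utilisation below 100% is the design choice that matters here: it trades some idle capacity for headroom, so a traffic spike is absorbed while a new replica starts.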

Spot training for a custom recommendation model

Illustrative scenario: a media company trains a recommendation model weekly on their full catalogue. AWS Spot A100s + SageMaker handle the training; W&B tracks runs; checkpoint resumption handles spot interruptions. Cost per training run drops to ~35% of on-demand pricing.
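Checkpoint resumption is what makes spot training safe to interrupt. The stdlib-only Python sketch below shows the pattern under simplifying assumptions: a small JSON dict stands in for model and optimiser state, each checkpoint is written atomically (temp file, then os.replace), and the interruption reaches the process as SIGTERM, which is how a stopped container or shutting-down instance typically signals it. A real job would save framework state (for example with torch.save) to a directory synced to durable storage such as S3; the file layout and function names here are hypothetical.

```python
import json
import os
import signal
import tempfile

stop_requested = False


def _on_sigterm(signum, frame):
    # Finish the current step, save, and exit instead of dying mid-write.
    global stop_requested
    stop_requested = True


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically so an interruption never leaves a half-written file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename over the previous checkpoint


def load_checkpoint(path: str) -> dict:
    """Resume from the last checkpoint, or start fresh if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(path: str, total_steps: int, save_every: int = 100) -> dict:
    signal.signal(signal.SIGTERM, _on_sigterm)
    state = load_checkpoint(path)
    while state["step"] < total_steps and not stop_requested:
        state["loss"] = 1.0 / (state["step"] + 1)  # placeholder for a real training step
        state["step"] += 1
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final save, including after SIGTERM
    return state
```

Calling train() again on a fresh instance picks up from the last saved step rather than step 0. If the process is killed without any signal, fewer than save_every steps are lost.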

Hybrid GPU strategy across experimentation + production

Illustrative scenario: a research-led AI startup needs cheap GPU for ad-hoc experimentation but reliable infrastructure for production. We split: Lambda / RunPod for daily experimentation (low ops, low cost), AWS P5 reserved instances for production serving (high reliability, predictable cost).

Common questions

When do we actually need dedicated GPU compute?

Three cases. Self-hosted LLM inference at high volume (above ~$5K/month spent on hosted APIs typically tips toward self-hosting). Custom model training (basically always GPU). Cases where data can't leave a sovereign boundary. Outside those, hosted APIs are operationally cheaper and we recommend them.
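The self-hosting threshold is ultimately a comparison of two monthly figures. A rough Python sketch, assuming an always-on deployment billed per GPU-hour plus a flat monthly operations cost; the $2.00/GPU-hour rate, two GPUs, and $2,000 ops overhead are hypothetical inputs, not quotes. It covers only the cost case; data-sovereignty requirements override it.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def self_hosting_monthly_cost(gpu_hourly_rate: float, gpus: int,
                              ops_overhead: float) -> float:
    """Always-on GPU rental plus a flat monthly operations cost."""
    return gpu_hourly_rate * gpus * HOURS_PER_MONTH + ops_overhead


def should_self_host(api_monthly_spend: float, gpu_hourly_rate: float,
                     gpus: int, ops_overhead: float) -> bool:
    """True when current hosted-API spend exceeds the self-hosting estimate."""
    return api_monthly_spend > self_hosting_monthly_cost(gpu_hourly_rate, gpus, ops_overhead)


# Hypothetical inputs: $2.00 per GPU-hour, two GPUs, $2,000/month of ops effort.
cost = self_hosting_monthly_cost(2.00, 2, 2000)
print(f"self-hosting estimate: ${cost:,.0f}/month")   # $4,920/month
print(should_self_host(3000, 2.00, 2, 2000))          # False
print(should_self_host(8000, 2.00, 2, 2000))          # True
```

Real GPU rates, reserved discounts, and engineering time will all move the line, so the inputs matter more than the formula.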

A100 or H100?

A100 for most production fine-tuning and inference of 7-70B models — cheaper, widely available, mature. H100 when the workload genuinely needs the extra throughput (frontier model training, 100B+ inference, very high request rates). The H100 premium is rarely worth it for the workloads we typically run.
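A quick way to sanity-check GPU count for inference is to size the weights first: parameters times bytes per parameter, plus an allowance for KV cache and activations. This Python sketch assumes 80 GB cards (the common A100 and H100 memory size) and a flat 30% headroom, which is a hypothetical rule of thumb; real KV-cache needs grow with context length and batch size. FP8 is included because H100s support it natively and A100s do not.

```python
import math

# Bytes needed per parameter at common serving precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """GB (1e9 bytes) needed just to hold the weights."""
    return params_billion * BYTES_PER_PARAM[precision]


def min_gpus(params_billion: float, precision: str,
             gpu_memory_gb: float = 80.0, headroom: float = 0.30) -> int:
    """Smallest GPU count that fits the weights plus a KV-cache/activation allowance."""
    needed_gb = weight_memory_gb(params_billion, precision) * (1.0 + headroom)
    return math.ceil(needed_gb / gpu_memory_gb)


print(min_gpus(70, "fp16"))  # 3
print(min_gpus(70, "fp8"))   # 2
print(min_gpus(7, "fp16"))   # 1
```

Under these assumptions a 70B model needs three 80 GB GPUs at FP16 but two at FP8, while a 7B model fits on one.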

Spot vs reserved vs on-demand?

Spot for any training where interruption is acceptable, which with proper checkpointing is usually all of it. Reserved for production serving with predictable utilisation, where a 30-60% discount justifies the lock-in. On-demand for short-duration burst workloads or experimentation. Most projects end up mixing all three.
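The reserved-versus-on-demand call reduces to one number. A reservation bills every hour at (1 - discount) of the on-demand rate, while on-demand bills only the hours used, so reserving wins once utilisation exceeds 1 - discount. A Python sketch of that arithmetic, with costs normalised to the on-demand price per hour; it ignores upfront-payment terms and contract length.

```python
def reserved_breakeven_utilisation(discount: float) -> float:
    """Utilisation above which reserving beats paying on-demand for used hours only."""
    return 1.0 - discount


def cheaper_option(utilisation: float, discount: float) -> str:
    """Compare costs per calendar hour, normalised so on-demand = 1.0 per used hour."""
    reserved_cost = 1.0 - discount      # every hour billed at the discounted rate
    on_demand_cost = utilisation        # only busy hours billed, at the full rate
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


for discount in (0.30, 0.60):
    threshold = reserved_breakeven_utilisation(discount)
    print(f"{discount:.0%} discount: reserve above {threshold:.0%} utilisation")
# 30% discount: reserve above 70% utilisation
# 60% discount: reserve above 40% utilisation
```

So a 30-60% discount pays off only for GPUs that stay busy at least 40-70% of the time, which is why reservations suit steady production serving rather than experimentation.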

Australian-region availability?

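Limited and expensive. AWS Sydney has A100s (P4d) but waitlists are real; H100s are rare. GCP Sydney has A100s on A2 instances. For training, we often route to US regions (cheaper, more available) since training jobs have no user-facing latency requirements. For inference serving Australian customers, Sydney + Singapore is the typical routing: Singapore adds some latency but keeps data in the Asia-Pacific region. Singapore is outside Australia, though, so workloads with strict Australian data-residency requirements need to be served from Sydney alone.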
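A minimal sketch of residency-aware routing with failover, in Python: pick the lowest-latency healthy region that satisfies an optional country allow-list. The region codes are real AWS identifiers, but the latencies are hypothetical and the healthy flag stands in for a real health check; the function itself is illustrative, not a specific load balancer's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Region:
    name: str              # AWS region code
    country: str           # where data in this region is stored
    latency_ms: float      # hypothetical round-trip time from the user
    healthy: bool = True   # stands in for a real health check


def pick_region(regions: list[Region],
                allowed_countries: set[str] | None = None) -> Region:
    """Return the lowest-latency healthy region that satisfies the residency rule."""
    candidates = [
        r for r in regions
        if r.healthy and (allowed_countries is None or r.country in allowed_countries)
    ]
    if not candidates:
        raise RuntimeError("no healthy region satisfies the residency constraint")
    return min(candidates, key=lambda r: r.latency_ms)


regions = [
    Region("ap-southeast-2", "AU", 15.0),   # Sydney
    Region("ap-southeast-1", "SG", 95.0),   # Singapore
    Region("us-east-1", "US", 200.0),       # N. Virginia
]
print(pick_region(regions).name)            # ap-southeast-2

regions[0].healthy = False                  # simulate a Sydney outage
print(pick_region(regions).name)            # ap-southeast-1
try:
    pick_region(regions, allowed_countries={"AU"})
except RuntimeError as err:
    print(f"AU-only: {err}")
```

With Sydney down, an unconstrained request fails over to Singapore, but an Australia-only residency rule raises an error instead. That trade-off is worth settling before choosing a failover pair.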

Specialised providers — Lambda, CoreWeave, RunPod — are they production-ready?

Lambda + CoreWeave yes for production at smaller scale; both have legitimate uptime and support. RunPod and Vast.ai are great for experimentation but we wouldn't put production serving on them. Bigger picture: specialised providers can be 30-50% cheaper than hyperscalers but trade off on tooling integration, compliance certifications, and enterprise support — match the choice to the workload.
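The placement guidance in this answer can be written down as a rule. A deliberately simplified Python sketch: the tiers and the compliance flag are hypothetical inputs, and provider certifications change, so a real check should consult each provider's current compliance documentation rather than a hard-coded list.

```python
from __future__ import annotations

from enum import Enum


class Tier(Enum):
    EXPERIMENT = "experiment"
    PRODUCTION = "production"


HYPERSCALERS = ["AWS", "GCP", "Azure"]


def candidate_providers(tier: Tier, needs_enterprise_compliance: bool) -> list[str]:
    """Simplified placement rule; verify each provider's current certifications separately."""
    if tier is Tier.EXPERIMENT:
        # Cheapest capacity is fine when an outage only costs a rerun.
        return ["RunPod", "Vast.ai", "Lambda", "CoreWeave"] + HYPERSCALERS
    if needs_enterprise_compliance:
        # Certifications and enterprise support favour the hyperscalers.
        return list(HYPERSCALERS)
    # Smaller-scale production: Lambda and CoreWeave are viable alongside hyperscalers.
    return ["Lambda", "CoreWeave"] + HYPERSCALERS


print(candidate_providers(Tier.PRODUCTION, needs_enterprise_compliance=False))
```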

Ready to get started?

Tell us about your project and we'll tell you honestly how we can help.


Let's build something intelligent

Tell us about your product challenge. Whether you're launching from scratch, scaling an existing product, or need AI capabilities — we'll tell you honestly how we can help.

First conversation is free, no obligations. If there's a fit, we'll scope a small first step so you can see results before committing to anything bigger.