Horizon Labs

Weights & Biases

W&B is the discipline layer for custom ML work. Once a training script is instrumented, every run logs hyperparameters, metrics, gradients, system stats, and model artefacts with no further manual bookkeeping. Three weeks into a project, when someone asks 'why did the model do worse on Tuesday?', W&B has the answer. The model registry tracks every promotion to production with full lineage. Sweeps automate hyperparameter search without us writing the orchestration. We use W&B on every PyTorch / TensorFlow project: the operational cost is minimal and the time it saves the next time something goes wrong is enormous.

What you get

Automatic experiment tracking — every training run captures hyperparameters, metrics, gradients, system stats without manual logging
Model registry tracks lineage — every production model has a full chain from data to weights to deployment
Sweeps automate hyperparameter search across grid, random, and Bayesian strategies — no orchestration code
Reports turn experiments into shareable narratives — collaboration with stakeholders without screenshots in Slack
Integrates cleanly with PyTorch, TensorFlow, Hugging Face, SageMaker — one line of code in the training loop
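
To give a feel for what that instrumentation looks like, here is a minimal sketch of a plain Python training loop. The project name, the config values and the simulated loss curve are placeholders rather than anything from a real project. The loop takes a logging callable so it can be dry-run without a W&B account; the W&B-backed variant uses only `wandb.init`, `run.log` and `run.finish`.

```python
import math
from typing import Callable, Dict, List

# Hypothetical hyperparameters; in a real project these come from your own config.
CONFIG = {"learning_rate": 0.01, "epochs": 5, "batch_size": 32}


def train(config: Dict[str, float], log: Callable[[Dict[str, float]], None]) -> float:
    """Toy training loop: emits one metrics dict per epoch, returns the final loss."""
    loss = 1.0
    for epoch in range(int(config["epochs"])):
        # Stand-in for a real optimisation step: loss decays with the learning rate.
        loss *= math.exp(-10 * config["learning_rate"])
        log({"epoch": epoch, "train_loss": loss})
    return loss


def train_with_wandb() -> float:
    """Same loop with metrics sent to W&B. Needs `pip install wandb` and a login."""
    import wandb  # imported lazily so the toy loop above runs without it

    run = wandb.init(project="demo-project", config=CONFIG)  # placeholder project name
    try:
        return train(CONFIG, run.log)
    finally:
        run.finish()


# Dry run without W&B: collect the metrics in a list instead.
history: List[Dict[str, float]] = []
final_loss = train(CONFIG, history.append)
```

In this sketch the hyperparameters reach W&B through the `config` argument and the metrics through `run.log`. Gradient histograms for a PyTorch model additionally need a `wandb.watch(model)` call, which is not shown here.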

Real examples

Reproducibility audit for a regulated client

Illustrative scenario: a financial services client needs to prove how a deployed model was trained for regulatory audit. W&B has the full lineage — training data version, hyperparameters, code commit, system environment. Audit answered in hours from the registry, not weeks of reconstruction.
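
As a sketch of how such a lineage record might be assembled, the helper below gathers the four items named above into one dictionary and, optionally, attaches it to a W&B run together with the dataset as an artifact. The function names, project name, artifact name and commit hash are all hypothetical, and hashing the file by hand is only one possible way to pin a data version.

```python
import hashlib
import platform
import sys
import tempfile
from pathlib import Path
from typing import Any, Dict


def file_sha256(path: Path) -> str:
    """Content hash of a file, used here as a simple dataset version identifier."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lineage_record(data_path: Path, hyperparams: Dict[str, Any], commit: str) -> Dict[str, Any]:
    """Collect data version, hyperparameters, code commit and environment in one dict."""
    return {
        "data_sha256": file_sha256(data_path),
        "hyperparameters": dict(hyperparams),
        "code_commit": commit,
        "environment": {"python": sys.version.split()[0], "platform": platform.platform()},
    }


def log_lineage_to_wandb(data_path: Path, record: Dict[str, Any]) -> None:
    """Attach the record and the dataset to a W&B run (needs wandb and a login)."""
    import wandb  # lazy import so the helpers above work without it

    run = wandb.init(project="audit-demo", config=record)  # placeholder project name
    artifact = wandb.Artifact(name="training-data", type="dataset")  # placeholder name
    artifact.add_file(str(data_path))
    run.log_artifact(artifact)
    run.finish()


# Dry run on a throwaway file; the commit hash is a made-up placeholder.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "train.csv"
    sample.write_text("x,y\n1,2\n")
    record = lineage_record(sample, {"learning_rate": 0.01}, commit="abc1234")
```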

Hyperparameter optimisation at scale

Illustrative scenario: a custom NLP classifier project requires sweeping ~200 hyperparameter combinations. W&B Sweeps orchestrates the runs across a GPU cluster and Bayesian optimisation suggests the next combination to try, so the best configuration emerges in 3 days instead of after weeks of manual experimentation.
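
A sweep like this is driven by a small configuration plus two client calls. The sketch below assumes a hypothetical search space (the parameter names, ranges, metric name and project name are illustrative, not from a real engagement) and uses `wandb.sweep` to register the sweep and `wandb.agent` to pull configurations and run them.

```python
from typing import Any, Callable, Dict

# Hypothetical search space for a text classifier; names and ranges are illustrative.
SWEEP_CONFIG: Dict[str, Any] = {
    "method": "bayes",  # "grid" and "random" are the other supported strategies
    "metric": {"name": "val_f1", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-3},
        "batch_size": {"values": [16, 32, 64]},
        "dropout": {"distribution": "uniform", "min": 0.0, "max": 0.5},
    },
}


def launch_sweep(train_fn: Callable[[], None], max_runs: int = 200) -> str:
    """Register the sweep and run an agent locally (needs wandb and a login)."""
    import wandb  # lazy import so the config above can be inspected without it

    sweep_id = wandb.sweep(sweep=SWEEP_CONFIG, project="nlp-classifier-demo")  # placeholder
    # train_fn is expected to call wandb.init(), read wandb.config and log "val_f1".
    wandb.agent(sweep_id, function=train_fn, count=max_runs)
    return sweep_id
```

On a cluster, each worker would typically start its own agent against the same sweep ID, so the workers share one search instead of each running a separate one.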

Production model monitoring for drift

Illustrative scenario: a deployed model's accuracy is degrading over weeks. W&B logs production inference distributions alongside training distributions, so the drift is visible in a chart and can trigger automatic retraining. The problem is caught before the business impact reaches the executive dashboard.
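
One way to turn "the distributions have drifted" into a number that can be charted is the Population Stability Index (PSI). The scenario above does not prescribe a drift metric, so PSI, the bin count and the 0.2 threshold below are assumptions chosen for illustration; the scores are synthetic.

```python
import math
import random
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic model scores: training-time, production (stable), production (shifted).
rng = random.Random(0)
train_scores = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same_scores = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted_scores = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_stable = psi(train_scores, same_scores)
psi_drifted = psi(train_scores, shifted_scores)

DRIFT_THRESHOLD = 0.2  # a common rule of thumb, assumed here; tune per model
needs_retraining = psi_drifted > DRIFT_THRESHOLD
```

Logging the value on a schedule, for example with `run.log({"drift/psi": psi_drifted})`, gives the chart described above. How a threshold breach kicks off retraining depends on the surrounding pipeline and is not shown.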

Common questions

W&B vs MLflow vs TensorBoard?

Different scopes. TensorBoard is primarily a visualisation tool: fine for solo development, but it gives a team no shared, persistent record of runs. MLflow comes closer to feature parity with W&B, but the UX is rougher and the model registry is less mature. W&B's collaboration features (Reports, dashboards, team workspaces) make it the better choice when more than one person touches the project. Choose MLflow when an open-source, self-hosted stack is mandatory.

Does W&B handle data sovereignty?

W&B Cloud is multi-region but not Australian-region native. For strict residency we deploy W&B Server self-hosted on Australian infrastructure: same product, your VPC. This adds ops overhead, so it is only worth it when the data and experiment metadata genuinely can't leave the boundary.

Is W&B necessary for LLM projects?

Less so. LLM projects with hosted APIs (Claude, GPT) don't have the training-experiment shape W&B is built for — there's nothing to train, so the experiment tracking layer collapses to prompt versioning (which LangSmith handles better). W&B is genuinely important for custom-ML projects with training loops.

How do you keep W&B costs under control?

The free tier is generous for small teams. For production usage at scale: set a run retention policy (delete experiments older than X months), reduce logging frequency (every 10 steps, not every step), and use the file-storage limits to avoid uploading model artefacts you don't need long-term. Total monthly cost is usually negligible compared to the GPU bill.
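
Logging every tenth step rather than every step can be done with a thin wrapper around the logging call. The sketch below is one possible approach; the helper name `make_throttled_logger` is hypothetical, and the wrapped callable is assumed to accept a metrics dict and a `step` keyword, as `run.log` in the W&B client does.

```python
from typing import Callable, Dict, List, Tuple


def make_throttled_logger(
    log: Callable[..., None], every: int = 10
) -> Callable[[int, Dict[str, float]], None]:
    """Wrap a logging callable so that only every `every`-th step is forwarded."""

    def throttled(step: int, metrics: Dict[str, float]) -> None:
        if step % every == 0:
            log(metrics, step=step)

    return throttled


# Dry run with a stand-in for run.log that records what would have been sent.
sent: List[Tuple[int, Dict[str, float]]] = []


def fake_log(metrics: Dict[str, float], step: int) -> None:
    sent.append((step, metrics))


log_step = make_throttled_logger(fake_log, every=10)
for step in range(100):
    log_step(step, {"train_loss": 1.0 / (step + 1)})
```

With a real run, `make_throttled_logger(run.log, every=10)` would forward one tenth of the calls. The trade-off is coarser curves, which rarely matters for long training runs.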

Can we onboard W&B into an existing custom-ML team?

Yes, it's typically a low-friction adoption. We instrument existing training scripts with W&B logging (a few lines), backfill the most important historical runs where the data still exists, then make it the default for new work. Within a month the team usually wonders how they worked without it.

Ready to get started?

Tell us about your project and we'll tell you honestly how we can help.
