The invoice arrived in January 2024 and the head of technology at a mid-sized Sydney retail business stared at it for a long time. The AWS bill for December had come in at $67,000. The budget had been $42,000. The overage came from a combination of factors that individually seemed minor: a data analytics project that had spun up Redshift clusters for testing and left them running over the Christmas break, an S3 lifecycle policy that had never been configured so archived data sat in expensive standard storage, and a new application environment that a developer had built on large EC2 instances because the team had not established instance sizing guidelines.
Nothing was malicious. Nothing was even careless in the way that word usually implies. It was the normal entropy of a growing engineering organisation in a cloud environment without governance structures. The company spent three weeks investigating the overage, identified $18,000 in monthly waste that could be eliminated immediately, and concluded that the December bill was probably not entirely an anomaly.
The Scale of the Problem
Cloud waste is not a niche issue. Flexera's 2024 State of the Cloud Report found that organisations globally estimated they were wasting 28 percent of their cloud spend — a figure that has remained stubbornly consistent for several years despite growing awareness of the problem. For Australian businesses, where cloud adoption accelerated sharply between 2020 and 2023, the accumulated inefficiency in many environments is significant.
The anatomy of cloud waste follows predictable patterns. Idle and underutilised resources — virtual machines running at single-digit CPU utilisation, databases with negligible connection counts, load balancers with no traffic — are the largest category, typically accounting for 30 to 40 percent of waste. Storage costs compound quietly: snapshots accumulate, old backups are never deleted, data sits in expensive tiers when lifecycle policies would move it to near-zero cost archive storage automatically. And provisioned capacity that was right-sized at deployment often becomes oversized as usage patterns evolve, with no process to trigger a review.
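The idle-resource category described above lends itself to a simple screen. The sketch below is a minimal illustration in Python, assuming utilisation summaries have already been exported from the provider's monitoring service. The `ResourceUsage` record, the resource names, and the thresholds (5 percent average CPU, one connection per hour) are hypothetical choices for illustration, not provider defaults.

```python
from dataclasses import dataclass


@dataclass
class ResourceUsage:
    """One resource's utilisation summary (hypothetical export format)."""
    resource_id: str
    avg_cpu_percent: float   # mean CPU over the review window
    avg_connections: float   # mean connections or requests per hour
    monthly_cost: float      # current monthly charge in dollars


def find_idle(resources, cpu_threshold=5.0, connection_threshold=1.0):
    """Return resources whose CPU and connection counts are both below threshold."""
    return [
        r for r in resources
        if r.avg_cpu_percent < cpu_threshold
        and r.avg_connections < connection_threshold
    ]


# Illustrative data only; real figures would come from monitoring exports.
sample = [
    ResourceUsage("vm-web-01", 41.0, 850.0, 310.0),
    ResourceUsage("vm-test-07", 2.1, 0.2, 520.0),
    ResourceUsage("db-analytics-old", 0.8, 0.0, 1400.0),
]

idle = find_idle(sample)
idle_cost = sum(r.monthly_cost for r in idle)
print([r.resource_id for r in idle], idle_cost)
```

A screen like this only produces candidates. Each flagged resource still needs an owner to confirm it is genuinely unused before anything is switched off.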
Right-Sizing: Where to Start
The highest-return cloud cost work is almost always right-sizing: identifying resources that are provisioned larger than their workload requires and reducing them to an appropriate scale. AWS, Azure, and Google Cloud all provide native tools (AWS Cost Explorer, Azure Advisor, and Google Cloud Recommender respectively) that analyse utilisation metrics and surface specific right-sizing recommendations with estimated monthly savings.
Marcus Chen, who leads cloud practice at a Sydney-based MSP, runs right-sizing assessments for new clients as standard practice. "The median finding for a client who hasn't done this work before is about 25 percent immediate savings," he says. "It's not uncommon to find development environments running production-sized infrastructure, or test databases on tiers that were specified for peak load that never materialised."
The risk in right-sizing is over-correcting: reducing resources below what peak demand requires, causing performance degradation. The safe approach is to pull at least 30 days of utilisation metrics, identify the 95th percentile load (not the average), and size to that with a modest headroom allowance. For non-production environments that do not require constant availability, automated shutdown schedules that turn dev and test environments off outside business hours deliver meaningful savings without touching production performance.
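The sizing rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the 20 percent headroom, the nearest-rank percentile method, the sample data, and the 10-hours-a-day weekday schedule are all illustrative choices rather than figures from any provider's tooling.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def required_capacity(samples, pct=95, headroom=0.20):
    """Size to the chosen percentile of observed load plus a headroom allowance."""
    return percentile(samples, pct) * (1 + headroom)


def scheduled_fraction(hours_per_day, days_per_week):
    """Fraction of the week that a scheduled environment actually runs."""
    return hours_per_day * days_per_week / (24 * 7)


# Illustrative vCPU-in-use samples; a real review would use 30+ days of metrics.
vcpus_in_use = [2, 2, 3, 2, 2, 3, 2, 2, 3, 2, 2, 3, 2, 2, 3, 2, 6, 2, 3, 8]

print("average load:", mean(vcpus_in_use))
print("p95 load:", percentile(vcpus_in_use, 95))
print("capacity target:", required_capacity(vcpus_in_use))
print("share of week running:", round(scheduled_fraction(10, 5), 2))
```

In this sample the average load is well under half the 95th percentile, which is why sizing to the average invites trouble. The last line shows the arithmetic behind shutdown schedules: an environment that runs ten hours a day on weekdays is up for 50 of the 168 hours in a week.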
Commitments and Reservations
On-demand pricing is a convenience premium. Organisations running predictable workloads on AWS, Azure, or Google Cloud can reduce compute costs by 30 to 60 percent by committing to one- or three-year reserved instances, savings plans, or committed use discounts. The catch is that commitments are less flexible: if a workload changes significantly, unused committed capacity is a sunk cost.
The right approach is to establish what proportion of compute capacity is genuinely stable (typically the baseline load that persists regardless of business seasonality) and commit that portion, leaving variable or uncertain capacity on demand. Most organisations can safely commit 50 to 70 percent of their baseline compute. Some commit more aggressively, but doing so safely requires visibility into planned infrastructure changes, not just current utilisation.
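The trade-off between discount and flexibility can be made concrete with a small model. The Python sketch below is a simplification under stated assumptions: usage is measured in abstract compute units per hour, the on-demand rate is $1.00 per unit-hour, the commitment discount is 40 percent, and the baseline is taken as the minimum observed usage. All of these are illustrative values, not any provider's pricing.

```python
def blended_cost(hourly_usage, commit_level, on_demand_rate, discount):
    """Cost when `commit_level` units are paid for every hour at a discount
    (used or not) and usage above that level is billed on demand."""
    committed_rate = on_demand_rate * (1 - discount)
    committed = commit_level * committed_rate * len(hourly_usage)
    overflow = sum(max(u - commit_level, 0) for u in hourly_usage) * on_demand_rate
    return committed + overflow


# Illustrative usage in compute units per hour.
usage = [10, 10, 12, 18, 25, 14, 10, 10]
baseline = min(usage)            # load that persists in every hour
commit_level = 0.6 * baseline    # commit 60 percent of the baseline
rate, discount = 1.00, 0.40      # assumed on-demand rate and discount

on_demand_only = sum(usage) * rate
with_commitment = blended_cost(usage, commit_level, rate, discount)
print("on demand:", on_demand_only, "with commitment:", round(with_commitment, 2))

# If the workload later shrinks below the commitment, the unused portion is still paid for.
shrunk = [3] * len(usage)
shrunk_on_demand = sum(shrunk) * rate
shrunk_with_commitment = blended_cost(shrunk, commit_level, rate, discount)
print("after shrink:", shrunk_on_demand, "vs", round(shrunk_with_commitment, 2))
```

The second scenario is the sunk-cost case: once usage falls below the committed level, the commitment costs more than paying on demand would have.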
Governance: The Boring Part That Actually Matters
Technical optimisation without governance is a temporary fix. Cloud costs drift back toward waste because the underlying behaviour — engineers creating resources without cost context, projects without budget accountability, no process to review and clean up infrastructure — continues unchanged.
Tagging discipline is foundational. Every cloud resource should carry tags that identify its owner, its purpose, its environment (production, staging, development), and its cost centre. Without tagging, cloud bills are aggregated totals with no way to allocate costs to projects or teams, no way to identify orphaned resources, and no accountability mechanism. Implementing a tagging policy retroactively on a large cloud environment is painful; doing it from the start is trivial.
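A tagging policy is only useful if it is checked. The Python sketch below is a minimal illustration of such a check, assuming resource tags have already been exported as plain dictionaries. The tag keys (`owner`, `purpose`, `environment`, `cost_centre`) mirror the four attributes named above, but the exact key names and the sample resources are hypothetical.

```python
REQUIRED_TAGS = ("owner", "purpose", "environment", "cost_centre")
ALLOWED_ENVIRONMENTS = {"production", "staging", "development"}


def tag_problems(tags):
    """Return a list of human-readable problems with one resource's tags."""
    problems = [
        f"missing tag: {key}"
        for key in REQUIRED_TAGS
        if not (tags.get(key) or "").strip()
    ]
    env = (tags.get("environment") or "").strip().lower()
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment: {env}")
    return problems


# Illustrative inventory; real data would come from the provider's resource listing.
inventory = {
    "vm-web-01": {"owner": "web-team", "purpose": "storefront",
                  "environment": "production", "cost_centre": "CC-110"},
    "vm-test-07": {"owner": "data-team", "environment": "sandbox"},
    "bucket-archive": {},
}

for resource_id, tags in inventory.items():
    issues = tag_problems(tags)
    if issues:
        print(resource_id, "->", "; ".join(issues))
```

Resources with no owner tag at all, like the last entry, are the likeliest orphans and a sensible first place to look.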
Budget alerts at the project, team, and account level catch overruns before they become invoices. Anomaly detection, which AWS, Azure, and Google Cloud all offer natively, flags unusual spending patterns in near real time, typically catching errant workloads within hours rather than at month end.
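The idea behind spend anomaly detection can be illustrated with a simple trailing-window check. The Python sketch below is not how any provider's native service works; those use their own models. The seven-day window, the threshold of three standard deviations, the one-percent floor on the deviation, and the daily spend figures are all assumptions for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(daily_spend, window=7, threshold=3.0):
    """Return indices of days whose spend exceeds the trailing-window mean
    by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Floor the deviation at 1% of the mean so a flat history does not
        # turn every small fluctuation into an alert.
        limit = mu + threshold * max(sigma, 0.01 * mu)
        if daily_spend[i] > limit:
            flagged.append(i)
    return flagged


# Illustrative daily spend in dollars; a forgotten cluster starts on day index 8.
spend = [1400, 1380, 1420, 1410, 1390, 1405, 1395, 1400, 1810, 1820]

print(flag_anomalies(spend))
```

With this data only the first day of the jump is flagged, because the spike itself inflates the trailing statistics for the following day. That limitation is one reason a first alert needs to be acted on promptly rather than waiting for a pattern to repeat.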
The organisations that have genuinely brought cloud costs under control share a characteristic: they have made costs visible to the engineers who create them. When a developer can see in real time that the Redshift cluster they left running over Christmas is generating $400 a day in charges, they turn it off. When that information stays invisible until the monthly invoice, the cluster keeps running.