Kubernetes Cost Optimisation: How We Reduced Cloud Spend by 40% for a MENA Enterprise
Kubernetes makes it easy to run workloads. It also makes it easy to waste money running them. A systematic approach to finding and eliminating the waste.
Kubernetes abstracts away the underlying infrastructure. This is mostly a feature. One of the places it becomes a problem: cost visibility. When everything is a pod, it's easy to lose track of what's actually running, what it costs, and whether anyone needs it.
In mid-2024 we engaged with a regional e-commerce enterprise to audit their Kubernetes spend. Their monthly cloud bill was consistently over $120,000, had grown 40% year-on-year, and the engineering team had no clear explanation for why. Over eight weeks we brought the bill down to $71,000 — a 41% reduction — without removing any production capability. This is what we found.
The audit methodology
Before making any changes, we spent two weeks in observation mode. The goal was to understand what was running, why it was running, and what it was costing.
Tooling: We deployed Kubecost alongside their existing Prometheus stack. Kubecost attaches cost attribution to every pod, namespace, and workload using cloud provider pricing APIs. It surfaces idle resource costs — the gap between what's allocated (requests/limits) and what's actually used.
Key metrics we tracked:
- CPU and memory utilisation vs requests, by namespace and workload
- Node utilisation and bin-packing efficiency
- Spot vs on-demand split for stateless workloads
- Storage costs (persistent volumes, often overlooked)
- Data transfer costs (frequently the hidden bill)
The audit surfaced five categories of waste. Here they are in order of impact.
Category 1: Over-provisioned resource requests
This was the single largest driver of waste. Resource requests determine how Kubernetes schedules pods — a pod requesting 4 CPU cores will not be placed on a node that doesn't have 4 cores available, even if the pod is actually using 0.3 cores.
The pattern we found: teams had set resource requests conservatively high to avoid OOMKills and CPU throttling. This is reasonable defensive behaviour. The consequence: nodes were only 28% utilised on average — but reported as "full" because requests were consumed.
We ran a 10-day profiling window on every deployment, captured p95 CPU and memory usage, and reset requests to p95 + 20% headroom. Memory limits were kept higher to avoid OOMKills on burst. The result: average node utilisation improved from 28% to 61%, allowing us to run 40% fewer nodes.
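To make the shape of the change concrete, a right-sized container spec ended up looking roughly like the sketch below. The service name, image, and figures are illustrative rather than taken from the client's manifests.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api                # hypothetical service name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: orders-api
          image: registry.example.com/orders-api:1.4.2   # placeholder image
          resources:
            requests:
              cpu: 500m       # p95 observed CPU plus ~20% headroom (illustrative)
              memory: 640Mi   # p95 observed memory plus ~20% headroom (illustrative)
            limits:
              memory: 1Gi     # memory limit kept well above the request so bursts don't OOMKill
              # no CPU limit in this sketch; the write-up does not specify one
```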
Important caveat: Do not reduce resource requests on JVM-based workloads without understanding their heap configuration. The JVM sizes its heap from startup flags, not from observed load, so memory measured under normal traffic understates what the process will eventually claim. We found several Spring Boot services that would have been OOMKilled if their memory requests and limits had been cut to match observed RSS.
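For the JVM services, the safer pattern is to let the heap track the container's memory limit rather than sizing memory down to observed RSS. A minimal sketch, assuming a containerised Spring Boot service on a JVM recent enough to honour container limits; the service name and figures are made up:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-service           # hypothetical JVM service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: billing-service
  template:
    metadata:
      labels:
        app: billing-service
    spec:
      containers:
        - name: billing-service
          image: registry.example.com/billing-service:2.1.0   # placeholder image
          env:
            # Size the heap as a percentage of the container limit instead of a
            # fixed -Xmx, so the limit can be tuned without touching the app.
            - name: JAVA_TOOL_OPTIONS
              value: "-XX:InitialRAMPercentage=50.0 -XX:MaxRAMPercentage=75.0"
          resources:
            requests:
              cpu: 500m
              memory: 1Gi       # illustrative figure, not from the audit
            limits:
              memory: 1Gi       # heap (~75%) plus metaspace and threads must fit inside this
```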
Category 2: Orphaned workloads
The second-largest category was simply things that were running that nobody needed.
Over three years of development and migration, the cluster had accumulated:
- 14 staging deployments that were no longer associated with active feature branches
- 6 "temporary" environments that had outlived the projects they served
- 3 copies of a monitoring stack from a vendor evaluation that was never cleaned up
- 11 CronJobs, 4 of which ran jobs that wrote to databases that no longer existed
Identifying these required conversations with engineering teams, not just tooling. The tooling could tell us which workloads had low traffic; engineers could tell us which ones were actually safe to delete.
We established a namespace lifecycle policy: every namespace must carry a labelled owner and a last-reviewed date. Any namespace whose review is more than 30 days old triggers a Slack alert to the labelled owner, and the labels themselves are now enforced by an admission webhook.
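The write-up doesn't name the admission webhook in use; as one concrete way to enforce the labelling, a Kyverno ClusterPolicy along these lines rejects namespaces that lack the two labels (the label names here are assumptions):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-namespace-ownership
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-owner-and-review-labels
      match:
        any:
          - resources:
              kinds:
                - Namespace
      validate:
        message: "Namespaces need an 'owner' and a 'last-reviewed' label."
        pattern:
          metadata:
            labels:
              owner: "?*"            # team name or Slack handle
              last-reviewed: "?*"    # e.g. 2024-09-01
```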
"We deleted a deployment that had been running for 18 months and nobody noticed. Nobody had looked at it in at least six months. It was spending $2,800 per month." — Platform Lead
Category 3: Node type mismatches
The cluster was running exclusively on general-purpose m5.xlarge nodes. Some workloads were heavily CPU-bound; others were memory-bound; a large number were lightweight stateless services.
We introduced a node pool strategy:
- General-purpose pool (m5.xlarge): unchanged for mixed workloads
- Memory-optimised pool (r6i.large): for the three data processing services that were running hot on memory
- Spot pool (mixed instance, c5/m5 family): for all stateless workloads that could tolerate interruption
Moving stateless workloads to spot reduced the per-node cost by 65% for those workloads. Spot interruption handling was implemented via a PodDisruptionBudget on each deployment and a Karpenter consolidation policy. In two months of production operation, we had three spot interruptions — all handled gracefully with zero user impact.
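The client's manifests aren't reproduced here, but the two pieces look roughly like the sketch below, assuming the Karpenter v1 API, an existing default EC2NodeClass, and a hypothetical stateless service:

```yaml
# Spot pool for stateless workloads, limited to the c5/m5 families,
# with consolidation enabled so underutilised nodes get repacked.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: stateless-spot
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c5", "m5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
# Keep a floor of ready replicas through interruptions and consolidation.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: catalogue-api            # hypothetical stateless service
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: catalogue-api
```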
Category 4: Storage waste
Persistent volumes are cheap per GB. They become expensive when provisioned generously and never reviewed. We found:
- 4.2 TB of PersistentVolumes actively attached to running workloads
- 1.8 TB of PersistentVolumes attached to no workloads (orphaned volumes from deleted StatefulSets)
- Average PV utilisation: 34%
The orphaned volumes were straightforward: identify, confirm with owners, delete. The low utilisation was trickier — for databases and message queues, disk is cheap and headroom is safety. We focused on application-level storage: log volumes, cache directories, and scratch space. Reducing these from their over-provisioned baselines recovered meaningful cost.
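Worth noting as a preventative measure (our suggestion, not part of the cleanup described above): recent Kubernetes versions let a StatefulSet declare what happens to its PersistentVolumeClaims when it is deleted or scaled down, which avoids this class of orphan in the first place. A sketch with hypothetical names:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: search-index              # hypothetical stateful workload
spec:
  serviceName: search-index
  replicas: 3
  selector:
    matchLabels:
      app: search-index
  # Delete the PVCs when the StatefulSet itself is deleted, but keep them
  # on scale-down so a later scale-up can reuse the data.
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete
    whenScaled: Retain
  template:
    metadata:
      labels:
        app: search-index
    spec:
      containers:
        - name: search-index
          image: registry.example.com/search-index:0.9.0   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/search
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
```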
Category 5: Data transfer costs
Often invisible until it's a line item that makes a finance team ask questions. Data transfer costs in AWS accrue for:
- Traffic leaving a VPC to the internet
- Traffic between AZs
- Traffic to S3 from EC2 outside the same region
We found two patterns:
Cross-AZ traffic: Several workloads sat behind Kubernetes Services whose endpoints were spread across zones with no affinity hints, so a large share of requests crossed AZ boundaries. Adding topologySpreadConstraints and Topology Aware Routing hints on the Services cut cross-AZ traffic by ~60% for those workloads.
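In practice the combination looks roughly like the sketch below. The service name is hypothetical, and the annotation shown is the Topology Aware Routing form available in recent Kubernetes releases (older clusters use the topology-aware-hints annotation instead):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recommendations           # hypothetical service
spec:
  replicas: 6
  selector:
    matchLabels:
      app: recommendations
  template:
    metadata:
      labels:
        app: recommendations
    spec:
      # Spread replicas evenly across zones so every zone has local endpoints.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway   # soft constraint; never blocks scheduling
          labelSelector:
            matchLabels:
              app: recommendations
      containers:
        - name: recommendations
          image: registry.example.com/recommendations:3.2.0   # placeholder image
---
apiVersion: v1
kind: Service
metadata:
  name: recommendations
  annotations:
    # Prefer endpoints in the caller's zone when enough capacity exists there.
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: recommendations
  ports:
    - port: 80
      targetPort: 8080
```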
S3 access patterns: A data pipeline was downloading the same reference datasets from S3 on every job run, rather than caching them. Switching to a local EFS mount with a 24-hour refresh reduced S3 GET requests by 94% and transfer costs from $3,200/month to $180/month.
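A sketch of the refresh half of that change, assuming an existing EFS-backed PersistentVolumeClaim and a service account with read access to the bucket; the bucket, image, and names are placeholders:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: refresh-reference-data
spec:
  schedule: "0 2 * * *"           # once a day, 02:00
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: reference-data-reader   # assumed to carry S3 read access (e.g. via IRSA)
          containers:
            - name: sync
              image: amazon/aws-cli:latest             # pin a specific version in practice
              # Pull only changed objects; every pipeline run then reads the
              # shared EFS copy instead of hitting S3 directly.
              args:
                - s3
                - sync
                - s3://example-reference-datasets      # placeholder bucket
                - /cache/reference
              volumeMounts:
                - name: cache
                  mountPath: /cache
          volumes:
            - name: cache
              persistentVolumeClaim:
                claimName: reference-data-cache        # assumed EFS-backed PVC
```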
What we built to sustain it
Cost reduction without a maintenance process reverts. We left the client with:
- Kubecost with weekly cost reports emailed to each team lead
- Namespace lifecycle policy enforced by admission webhook
- Resource right-sizing runbook run quarterly against the Prometheus data
- Spot instance playbook covering instance type diversification and graceful interruption handling
- Budget alerts at 85% and 100% of the monthly target, triggering a review meeting
Five things to take away
- Over-provisioned resource requests are the most common source of Kubernetes waste; sustained node utilisation below 50% almost always means recoverable spend.
- Orphaned workloads are invisible to tooling without human context. Schedule a quarterly "what is this?" review with each team.
- Spot instances for stateless workloads with proper interruption handling typically reduce compute cost by 60–70%.
- Cross-AZ data transfer costs are often invisible until they're significant — topology hints and spread constraints address the root cause.
- Cost reduction without a sustained process is temporary. Build the review cadence into engineering rituals before you finish the project.