Five engagements, written without the consultancy gloss. What broke, what we built, the AWS services we used, and the number that changed the conversation. Names and figures are used with each client's permission.
// case 01 | FINTECH | us-east-1 · eu-west-1 · af-south-1 | 11 weeks
Hivecart Pay
Active-active multi-region for an 11M-transaction-per-month payment platform — without changing a single line of application code.
// the challenge
A West-African payments processor was running on a single us-east-1 stack. Their largest merchant — a logistics platform processing 4.2M of those 11M monthly transactions — refused to renew a $1.2M ARR contract without a contractual multi-region SLA. The application team had no spare capacity for a refactor, and the merchant's deadline was eight weeks out.
// what we built
We added eu-west-1 as a true active-active second region behind Route 53 latency-based routing. State moved to DynamoDB Global Tables with three-region replication, including a replica in af-south-1 that serves reads for South African merchants. S3 cross-region replication carried receipts and webhook payloads. EventBridge global endpoints kept idempotency keys consistent across regions.
The application code itself didn't change — only its IAM role and a single config flag. We ran two intentional regional failover fire-drills before sign-off, both completing inside 90 seconds.
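To make the latency-routed setup concrete, here is a minimal sketch of the kind of change batch Route 53 accepts for two latency-based alias records. The domain, alias targets, and hosted zone IDs are placeholders we made up for illustration, not values from the engagement.

```python
import json


def latency_record(name, region, alias_dns, alias_zone_id):
    """Build one latency-routed alias record for a Route 53 change batch."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"api-{region}",  # must be unique per record
            "Region": region,  # this field is what makes the record latency-routed
            "AliasTarget": {
                "HostedZoneId": alias_zone_id,
                "DNSName": alias_dns,
                # Lets Route 53 stop answering with a region that fails health checks.
                "EvaluateTargetHealth": True,
            },
        },
    }


def change_batch(name, endpoints):
    """endpoints maps region -> (alias DNS name, alias hosted zone ID)."""
    changes = [
        latency_record(name, region, dns, zone)
        for region, (dns, zone) in sorted(endpoints.items())
    ]
    return {"Comment": "latency-routed active-active sketch", "Changes": changes}


# Hypothetical endpoints: both values per region are placeholders.
ENDPOINTS = {
    "us-east-1": ("api-use1.example.com.", "ZEXAMPLEUSE1"),
    "eu-west-1": ("api-euw1.example.com.", "ZEXAMPLEEUW1"),
}
BATCH = change_batch("api.example.com.", ENDPOINTS)

if __name__ == "__main__":
    print(json.dumps(BATCH, indent=2))
```

A dictionary of this shape is what the `ChangeBatch` argument of Route 53's ChangeResourceRecordSets call expects. In practice the same records would more likely live in Terraform, which is listed in the stack.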
// stack
Route 53 · API Gateway · AWS Lambda · DynamoDB Global Tables · EventBridge · S3 CRR · CloudFront · KMS · Terraform
// result
+$1.2M ARR
Contract closed two weeks after go-live. p99 transaction latency held at 117ms across both primary regions. First six months in production: zero unplanned downtime and two intentional regional failovers (fire drills).
"Fortbees rewrote our deployment topology without touching our deployment artefacts. The merchant signed before we'd even drafted the press release."
// case 02 | HEALTHTECH | us-east-1 · af-south-1 | 3 weeks
Lumen Health
Migrated $84,000-a-month of mostly-idle RDS to Aurora Serverless v2 — without dropping below the latency floor that telehealth depends on.
// the challenge
A telehealth platform serving 200,000 monthly active patients was paying $84,000/month for provisioned RDS Postgres at 100% allocated capacity. Off-peak utilisation sat under 30%; peak rarely cracked 70%. Downsizing risked breaching their 100ms p99 SLA on appointment booking — a regulatory commitment, not just a UX nice-to-have.
// what we built
We migrated to Aurora Serverless v2 over three weeks using zero-downtime DMS replication. ACUs were tuned to absorb the diurnal pattern — 1.0 floor, 16.0 ceiling — with CloudWatch alarms wired to both ends. RDS Proxy went in front to absorb connection bursts from Lambda.
We added a read replica in af-south-1 for the South African user base, which had been suffering 280ms cross-Atlantic round-trips. Performance Insights baselines were captured before, during, and after — published to the engineering channel for transparency.
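The ACU range and the alarms "wired to both ends" can be sketched as plain request parameters. This is an illustration under assumptions: the cluster identifier, the 90% headroom factor, the alarm names, and the evaluation windows are ours, and the source does not say exactly which conditions the alarms fired on.

```python
def scaling_config(min_acu=1.0, max_acu=16.0):
    """Capacity range in the shape of Aurora's ServerlessV2ScalingConfiguration."""
    if not 0 < min_acu <= max_acu:
        raise ValueError("expected 0 < min_acu <= max_acu")
    return {"MinCapacity": min_acu, "MaxCapacity": max_acu}


def capacity_alarms(cluster_id, min_acu, max_acu, headroom=0.9):
    """Two CloudWatch alarm definitions, one near each end of the ACU range.

    headroom is a hypothetical factor: alert before the ceiling is actually hit.
    """
    base = {
        "Namespace": "AWS/RDS",
        "MetricName": "ServerlessDatabaseCapacity",
        "Dimensions": [{"Name": "DBClusterIdentifier", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,  # seconds per datapoint
        "EvaluationPeriods": 3,  # assumed: 15 minutes sustained
    }
    near_ceiling = {
        **base,
        "AlarmName": f"{cluster_id}-acu-near-ceiling",
        "Threshold": max_acu * headroom,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }
    at_floor = {
        **base,
        "AlarmName": f"{cluster_id}-acu-at-floor",
        "Threshold": min_acu,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
    }
    return [near_ceiling, at_floor]


if __name__ == "__main__":
    print(scaling_config())
    for alarm in capacity_alarms("booking-db", 1.0, 16.0):  # hypothetical cluster name
        print(alarm["AlarmName"], alarm["ComparisonOperator"], alarm["Threshold"])
```

The ceiling alarm is the one that protects the latency SLA: sustained capacity near 16 ACUs means the cluster has little room left to scale. The floor alarm is more of a cost signal, since long stretches at the minimum suggest the floor could be revisited.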
Monthly database spend cut by 41%. p99 booking latency dropped from 94ms to 71ms. South African users saw p99 fall from 320ms to 84ms. Cutover finished in a 12-minute scheduled maintenance window — versus the 4-hour outage their previous vendor had quoted.
"We were three weeks from a board conversation about cutting a feature to afford the database bill. Fortbees made the conversation unnecessary."
// case 03 | LOGISTICS · IOT | af-south-1 · us-east-1 | 14 weeks
Routelane
Replatformed a logistics-tracking monolith from EC2 cron jobs to an event-driven AWS pipeline. The fleet grew from 8,000 to 14,000 vehicles, at 62% lower cost per vehicle.
// the challenge
Routelane was tracking 8,000 commercial vehicles across Nigeria, Ghana, and Kenya on a Django monolith that polled vehicle modems every 90 seconds. The system couldn't accept a 14,000-vehicle expansion contract: at the existing per-vehicle compute cost, Routelane would be operating at a loss above 9,500 vehicles, and database lock contention was already throwing intermittent 5xx errors at peak.
// what we built
We rebuilt the ingestion path as event-driven. Vehicle modems publish MQTT to AWS IoT Core (af-south-1 for the West African fleet). The IoT rules engine routes telemetry into a Kinesis stream. A Lambda consumer writes hot state to a DynamoDB single-table design: one item per vehicle, with sparse GSIs for region and status queries.
A second Lambda fan-out writes raw events to S3 in Parquet for Athena. OpenSearch indexes a rolling 30-day window for the dashboards customer-success uses. The Django monolith stays — but only as a thin web tier reading from the new stores.
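The hot-path step can be sketched as a pure function that turns a Kinesis batch into one DynamoDB item per vehicle. The key names (`PK`, `SK`, `GSI1PK`, `GSI2PK`) and the telemetry fields (`vehicle_id`, `ts`, `lat`, `lon`, `region`, `status`) are assumptions for illustration; the source does not describe the real schema.

```python
import base64
import json
from decimal import Decimal


def to_item(telemetry):
    """Map one telemetry message to the single per-vehicle state item."""
    item = {
        "PK": f"VEHICLE#{telemetry['vehicle_id']}",
        "SK": "STATE",
        "lat": telemetry["lat"],
        "lon": telemetry["lon"],
        "updated_at": telemetry["ts"],
    }
    # Sparse GSIs: index keys are written only when the attribute is present,
    # so items without a region or status never appear in that index.
    if telemetry.get("region"):
        item["GSI1PK"] = f"REGION#{telemetry['region']}"
        item["GSI1SK"] = item["PK"]
    if telemetry.get("status"):
        item["GSI2PK"] = f"STATUS#{telemetry['status']}"
        item["GSI2SK"] = item["PK"]
    return item


def items_from_kinesis_event(event):
    """Decode a Lambda Kinesis event and keep the newest reading per vehicle."""
    latest = {}
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        # Decimal rather than float, since boto3's resource API rejects floats.
        item = to_item(json.loads(raw, parse_float=Decimal))
        current = latest.get(item["PK"])
        if current is None or item["updated_at"] >= current["updated_at"]:
            latest[item["PK"]] = item
    return list(latest.values())


def fake_record(payload):
    """Test helper: wrap a payload the way Kinesis delivers it to Lambda."""
    data = base64.b64encode(json.dumps(payload).encode()).decode()
    return {"kinesis": {"data": data}}
```

Collapsing a batch to the newest reading per vehicle keeps writes proportional to fleet size rather than message volume. A real handler would then write these items, for example with a batch writer, and would need its own strategy for out-of-order data across batches.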
// stack
AWS IoT Core · Kinesis Data Streams · AWS Lambda · DynamoDB · OpenSearch · Athena · S3 + Parquet · Glue Crawlers · Terraform
// result
14k vehicles, −62% unit cost
Scaled to 14,000 vehicles inside the same architecture (validated to 22,000 in load tests). p95 location-update latency dropped from 41 seconds to 780ms. Infrastructure cost per vehicle per month fell 62%. The expansion contract closed at $4.4M ARR.
// architecture, in one breath
01. Modem → MQTT → IoT Core (af-south-1)
02. IoT Rule → Kinesis Data Stream
03. Lambda hot-path → DynamoDB single table
04. Lambda fan-out → S3 Parquet → Athena
05. OpenSearch sink → CS dashboards
// case 04 | FINTECH · COMPLIANCE | af-south-1 · eu-west-1 (DR) | 9 weeks
Aro Microfinance
Stood up an NDPA-compliant multi-account landing zone in nine weeks — with the audit completed at first attempt.
// the challenge
A Lagos-based microfinance lender serving 340,000 customers had grown out of a single AWS account. Dev, staging, and prod all shared one blast radius, and cross-team incidents averaged six per quarter (over-broad roles, accidental prod access, secret leakage). The audit window under the Nigeria Data Protection Act 2023 (NDPA) opened in 90 days. "We'll fix it later" stopped being a viable answer.
// what we built
We deployed AWS Control Tower with a custom OU structure — Security, Workloads/Prod, Workloads/Non-Prod, Sandbox, Suspended. Service Control Policies enforced data residency to af-south-1 for customer PII, blocked unencrypted S3 entirely, and prevented IAM policy changes outside CloudFormation. CloudTrail and AWS Config aggregated org-wide into a dedicated Audit account.
IAM Identity Center replaced 47 long-lived IAM users with SSO and time-limited assumed roles. KMS keys were created with mandatory rotation, and Secrets Manager replaced .env files in CI. GuardDuty and Security Hub gave the security lead a single pane of glass to bring into the audit meeting.
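The residency and encryption guardrails follow a widely used Service Control Policy pattern, sketched below. This is not the policy we deployed: the list of exempted global services is illustrative and incomplete, and a region-level deny like this applies to everything in the account rather than to PII alone, so a DR region would need its own carve-out.

```python
import json

# Illustrative exemptions: global services whose API calls are not tied to
# the workload region. A production policy usually needs a longer list.
GLOBAL_SERVICE_ACTIONS = [
    "iam:*",
    "organizations:*",
    "route53:*",
    "cloudfront:*",
    "sts:*",
    "support:*",
]


def region_guardrail(allowed_regions):
    """Build an SCP document that denies out-of-region calls and
    S3 uploads that do not request server-side encryption."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                "NotAction": GLOBAL_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(allowed_regions)
                    }
                },
            },
            {
                "Sid": "DenyS3PutsWithoutEncryptionHeader",
                "Effect": "Deny",
                "Action": "s3:PutObject",
                "Resource": "*",
                # True when the request carries no encryption header at all.
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(region_guardrail(["af-south-1"]), indent=2))
```

Because SCPs only ever restrict, both statements are explicit denies. The policy grants nothing on its own, and permissions still come from the roles issued through IAM Identity Center.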
The NDPA audit passed at first attempt, and the auditor specifically commended the org-wide CloudTrail aggregation and least-privilege IAM design. Cross-team blast-radius incidents dropped from six per quarter to zero over the six months following go-live. New-engineer onboarding fell from 2 days to 90 minutes.
"We'd been told an NDPA-ready landing zone took six months. Fortbees did it in nine weeks and the audit was the easiest meeting we had that quarter."
// case 05 | EDTECH · COST | us-east-1 | 6 weeks
Kira Learning
Cut a $187,000-a-month bill by 37% in six weeks — without slowing down a single learner.
// the challenge
A K-12 EdTech serving 1.4M students was running EKS workloads with 60% headroom-by-default and a $187k/month AWS bill that had grown faster than revenue for three quarters. NAT Gateway data transfer alone accounted for 28% of the bill — internal S3 traffic had been routing through NAT for years. Finance had floated a parent-facing price increase. The CTO wanted a smaller bill instead.
// what we built
We started with the Cost & Usage Report and built a QuickSight dashboard finance and engineering could read together — a deliverable in itself. Phase 1: VPC Endpoints for S3, ECR, CloudWatch Logs, and SSM eliminated the bulk of NAT egress. Phase 2: Karpenter replaced the cluster autoscaler on EKS, bin-packing pods onto Spot with a 5% on-demand floor for stateful workloads.
Phase 3: Savings Plans purchased against the new, post-optimisation baseline — not the old inflated one (a common mistake that locks in the waste). Compute Optimizer informed RDS and ElastiCache rightsizing.
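Phase 1 can be sketched as the set of endpoint requests that take AWS-internal traffic off the NAT path. We assume here that S3 used a gateway endpoint and the other services used interface endpoints, which the write-up above does not specify. The VPC, route table, subnet, and security group IDs are placeholders.

```python
# Interface endpoint service suffixes. ECR needs both its API and its
# Docker registry endpoint for image pulls to stay inside the VPC.
INTERFACE_SERVICES = ["ecr.api", "ecr.dkr", "logs", "ssm"]


def endpoint_requests(region, vpc_id, route_table_ids, subnet_ids, security_group_ids):
    """Build parameter sets in the shape of EC2's CreateVpcEndpoint call."""
    requests = [
        {
            # Gateway endpoints attach to route tables and carry no hourly charge.
            "VpcEndpointType": "Gateway",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.s3",
            "RouteTableIds": list(route_table_ids),
        }
    ]
    for suffix in INTERFACE_SERVICES:
        requests.append(
            {
                "VpcEndpointType": "Interface",
                "VpcId": vpc_id,
                "ServiceName": f"com.amazonaws.{region}.{suffix}",
                "SubnetIds": list(subnet_ids),
                "SecurityGroupIds": list(security_group_ids),
                # Private DNS lets existing SDK calls resolve to the endpoint
                # without any application change.
                "PrivateDnsEnabled": True,
            }
        )
    return requests


if __name__ == "__main__":
    # All IDs below are hypothetical.
    for req in endpoint_requests(
        "us-east-1", "vpc-0example", ["rtb-0example"], ["subnet-0a", "subnet-0b"], ["sg-0example"]
    ):
        print(req["VpcEndpointType"], req["ServiceName"])
```

Interface endpoints carry their own hourly and per-GB charges, so they pay off only where they displace enough NAT traffic. That is one reason to build the cost dashboard first and choose endpoints from the data.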
$69,000/month in savings — 37% of the bill — locked in within six weeks. NAT data-transfer line item dropped 94%. Average EKS node utilisation moved from 41% to 73%. Zero customer-visible latency regressions across 14 monitored endpoints. The price-increase conversation got shelved.
// where the savings came from
NAT Gateway egress: −$31.2k
EKS rightsizing + Spot: −$22.8k
Savings Plans coverage: −$11.4k
RDS / ElastiCache rightsizing: −$3.6k
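As a quick check, the four line items above reconcile with the headline figures. The sketch below only restates numbers already given in this case: the per-item monthly savings and the $187k starting bill.

```python
# Monthly savings per line item, in thousands of USD (from the breakdown above).
SAVINGS_K = {
    "NAT Gateway egress": 31.2,
    "EKS rightsizing + Spot": 22.8,
    "Savings Plans coverage": 11.4,
    "RDS / ElastiCache rightsizing": 3.6,
}
BILL_K = 187.0  # starting monthly bill, in thousands of USD

total_k = round(sum(SAVINGS_K.values()), 1)
share_pct = round(100 * total_k / BILL_K)

if __name__ == "__main__":
    print(f"total savings: ${total_k}k/month")  # total savings: $69.0k/month
    print(f"share of bill: {share_pct}%")       # share of bill: 37%
```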
// across all five
A pattern that keeps showing up.
$5.6M
In contract revenue unlocked or annual cost saved across the five engagements above.
43 wks
Total delivery time, summed. Average per engagement: 8.6 weeks.
0
Unplanned production outages caused during cutover or after handover.
5/5
Of the five clients above are still on retainer or have rebooked us.
// next steps
Want to be the next story?
Tell us what's broken, what's expensive, or what your auditor is asking about. A senior engineer reads every email and replies inside one business day.