Forget Backup and Disaster Recovery... Business Continuity Is Where It's At

Akins IT • August 10, 2016

Earlier this week, Delta Air Lines’ computer systems suffered a catastrophic failure, leading to the grounding of over 1,000 flights and delays to countless more. Analysts attributed the colossal service interruption to a power outage in Atlanta that crippled its network infrastructure. Given the cobbled-together nature of the airline’s systems, it’s not hugely surprising that one of the many points of failure eventually gave way. After all, each time Delta absorbed another airline, the systems had to be integrated on the fly… pun intended.


But two main points stand out to me. The first, which I just can’t seem to ignore, is how on earth the systems for one of the world’s more prominent airlines didn’t have gasoline-powered generators to cover at least 24 hours of downtime. I’m just going to leave it at that.



The second, which seems to be a much more common problem, is that Delta just continued to kick the can down the road when it came to business continuity. In typical old-guard fashion, execs under the scrutiny of the board and/or stockholders didn’t make the necessary investments in infrastructure to ensure that, if a failure did take place, a reasonable RTO (Recovery Time Objective) could be achieved. It’s not as pretty as a new fleet of planes or fancy accommodations in the Sky Club, but when it comes down to overall customer satisfaction, ensuring that your systems are available will always reign supreme.


In today’s world, merely having a backup of your data won’t cut it. Businesses can no longer tolerate an interruption to their technology services, and ensuring that systems can fail over within minutes is the new standard. Hopefully the failures of titans like Delta can teach smaller companies valuable lessons in how to properly allocate funds for IT, specifically when it comes to the resiliency of their infrastructure.
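To make "fail over within minutes" a little more concrete, here is a minimal Python sketch of the decision logic behind an automated failover. Everything in it is an assumption for illustration: the `FailoverMonitor` class, the site names, and the `failure_threshold` setting are hypothetical, and the health check is passed in as a plain function rather than a real network probe.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FailoverMonitor:
    """Hypothetical helper: counts consecutive failed checks per site."""

    sites: List[str]                 # ordered by preference, primary first
    check: Callable[[str], bool]     # returns True when a site is healthy
    failure_threshold: int = 3       # consecutive failures before a site is skipped
    _failures: Dict[str, int] = field(default_factory=dict, init=False)

    def poll(self) -> str:
        """Run one round of checks and return the site that should serve traffic."""
        for site in self.sites:
            if self.check(site):
                self._failures[site] = 0  # any success resets the counter
            else:
                self._failures[site] = self._failures.get(site, 0) + 1

        # Pick the most preferred site that has not reached the threshold.
        for site in self.sites:
            if self._failures.get(site, 0) < self.failure_threshold:
                return site
        raise RuntimeError("no healthy site available")


if __name__ == "__main__":
    # Simulated health state; a real check would probe an HTTP endpoint or similar.
    health = {"primary": True, "standby": True}
    monitor = FailoverMonitor(["primary", "standby"], lambda s: health[s])
    health["primary"] = False
    for _ in range(3):
        active = monitor.poll()
    print(active)
```

The time to detect a failure in a design like this is roughly the polling interval multiplied by the threshold, so checks every 20 seconds with a threshold of 3 would take about a minute to trigger a cutover. Note that this sketch fails back to the primary as soon as one check succeeds; a production setup may want a slower, more cautious fail-back.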

By Shawn Akins October 20, 2025
October 20, 2025: Early today, Amazon Web Services experienced a major incident centered in its US‑EAST‑1 (N. Virginia) region. AWS reports the event began around 12:11 a.m. PT and tied back to DNS resolution affecting DynamoDB, with mitigation within a couple of hours and recovery continuing thereafter. As the outage rippled, popular services like Snapchat, Venmo, Ring, Roblox, Fortnite, and even some Amazon properties saw disruptions before recovering.

If your apps or data are anchored to a single cloud, a morning like this can turn into a help‑desk fire drill. A multi‑cloud or cloud‑smart approach helps you ride through these moments with minimal end‑user impact.

What happened (and why it matters)

Single‑region fragility: US‑EAST‑1 is massive, and when it sneezes, the internet catches a cold. Incidents here have a history of wide blast radius.

Shared dependencies: DNS issues to core services (like DynamoDB endpoints) can cascade across workloads that never directly “touch” that service.

Multi‑cloud: practical resilience, not buzzwords

For mid‑sized orgs, schools, and local government, multi‑cloud doesn’t have to mean “every app in every cloud.” It means thoughtful redundancy where it counts:

Multi‑region or multi‑provider failover for critical apps
Run active/standby across AWS and Azure (or another provider), or at least across two AWS regions with automated failover. Start with citizen‑facing portals, SIS/LMS access, emergency comms, and payment gateways.

Portable platforms
Use Kubernetes and containers, keep state externalized, and standardize infra with Terraform/Ansible so you can redeploy fast when a region (or a provider) wobbles. (Today’s DNS hiccup is exactly the kind of scenario this protects against.)

Resilient data layers
Replicate data asynchronously across clouds/regions; choose databases with cross‑region failover and test RPO/RTO quarterly. If you rely on a managed database tied to one region, design an escape hatch.

Traffic and identity that float
Use global traffic managers/DNS to shift users automatically; keep identity (MFA/SSO) highly available and not hard‑wired to a single provider’s control plane.

Run the playbook
Document health checks, automated cutover, and comms templates. Then practice: tabletops and live failovers. Many services today recovered within hours, but only teams with rehearsed playbooks avoided user‑visible downtime.

The bottom line

Cloud concentration risk is real. Outages will happen; what matters is whether your constituents, students, and staff feel it. A pragmatic multi‑cloud stance limits the blast radius and keeps your mission‑critical services online when one provider has a bad day.

Need a resilience check?

Akins IT can help you prioritize which systems should be multi‑cloud, design the right level of redundancy, and validate your failover plan, without overspending. Let’s start with a quick, 30‑minute review of your most critical services and RPO/RTO targets. (No slideware, just actionable next steps.)
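As a rough illustration of what testing RPO/RTO in a failover drill can look like, the Python sketch below scores one drill against its targets. It is only a sketch under stated assumptions: the `DrillResult` class, the `meets_targets` function, the timestamps, and the targets (15 minutes of data loss, 1 hour of downtime) are all hypothetical and not taken from any real incident.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass(frozen=True)
class DrillResult:
    """Hypothetical record of one failover drill."""

    last_replicated_write: datetime  # newest write that reached the standby
    outage_start: datetime           # when the primary was taken down
    service_restored: datetime       # when users could work again

    @property
    def data_loss_window(self) -> timedelta:
        # Writes made in this window would be lost (compare against RPO).
        return self.outage_start - self.last_replicated_write

    @property
    def downtime(self) -> timedelta:
        # Time users were without service (compare against RTO).
        return self.service_restored - self.outage_start


def meets_targets(result: DrillResult, rpo: timedelta, rto: timedelta) -> Dict[str, bool]:
    """Compare the measured windows against the agreed targets."""
    return {
        "rpo_met": result.data_loss_window <= rpo,
        "rto_met": result.downtime <= rto,
    }


if __name__ == "__main__":
    # Made-up drill timings, for illustration only.
    drill = DrillResult(
        last_replicated_write=datetime(2025, 1, 15, 9, 55),
        outage_start=datetime(2025, 1, 15, 10, 0),
        service_restored=datetime(2025, 1, 15, 11, 30),
    )
    print(meets_targets(drill, rpo=timedelta(minutes=15), rto=timedelta(hours=1)))
```

In this made-up drill the data loss window is 5 minutes, which is inside the 15-minute RPO, but the 90 minutes of downtime misses the 1-hour RTO. Recording results like this after each drill gives a simple trend to review alongside the playbook.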