Rick


Thursday, April 27, 2017

Random thoughts by Rick Hightower: DR and multi-region

DR based on region is silly for most apps and services. It is an expensive bet.

Multi-region is great for reducing latency for sure, and for DR for sure, but multi-region hot standbys are silly for most apps.

Multi-AZ deployments are enough for DR, IMO, for 99% of use cases.
If your app/service can survive a single-AZ outage, it is better than 99.999% of apps out there. 

I am not saying you should never do multi-region deploys (hot standbys), merely that they have a cost, and your app may not need them.


If you have a regular backup and a way to restore from another region, you are ahead of the game.
  • frequent EBS snapshots sent to another region
  • back things up to S3, and replicate S3 bucket to S3 bucket
  • read replicas for DBs in another region, if you must
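As a rough sketch of the first bullet, here is one way to send an EBS snapshot to another region. It assumes you pass in a boto3 EC2 client created for the destination region (boto3's `copy_snapshot` is called on the destination side and pulls from the source). The function name, the description text, and the region names and snapshot ID in the usage comment are made up for illustration.

```python
def copy_snapshot_to_region(dest_ec2_client, source_region, snapshot_id):
    """Ask the destination region to pull a copy of an EBS snapshot.

    dest_ec2_client is assumed to be a boto3 EC2 client created for the
    destination (DR) region. Returns the ID of the new snapshot copy.
    """
    response = dest_ec2_client.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id} from {source_region}",
    )
    return response["SnapshotId"]


# Hypothetical usage (needs boto3 and AWS credentials):
#   import boto3
#   dest = boto3.client("ec2", region_name="us-west-2")
#   copy_snapshot_to_region(dest, "us-east-1", "snap-0123456789abcdef0")
```

Run something like this on a schedule after each snapshot and the DR region always has a recent copy sitting there, costing you only storage.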


For many services and applications, you don’t have to run a hot standby if you are spread across three AZs. 
Focus on surviving a single AZ failure. Get that right. Then focus on how to recover in another region from backups:

  • snapshots, AMIs, etc. ready to go, ready to be spun up
  • backups to S3 with S3 bucket replication. Cheap and easy.

If all hell breaks loose, and it takes you 15 minutes to 1 hour to spin up in a new region, that is a lot cheaper than running a hot standby in a second region 24/7, 365 days a year. Weigh the probability of a complete region failure, and the cost to your business of being down for 15 minutes to an hour, against the cost of running a second set of servers all of the time.
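To make that trade-off concrete, here is a back-of-the-envelope sketch. Every number in it is made up for illustration (how often a whole region fails, what an hour of downtime costs you, what the standby fleet costs per hour); plug in your own.

```python
def expected_annual_outage_cost(years_between_region_failures,
                                hours_down_per_failure,
                                loss_per_hour_down):
    """Average yearly cost of region outages if you restore from backups."""
    cost_per_failure = hours_down_per_failure * loss_per_hour_down
    return cost_per_failure / years_between_region_failures


def hot_standby_annual_cost(standby_cost_per_hour):
    """Yearly cost of running a second set of servers 24/7, 365 days a year."""
    return standby_cost_per_hour * 24 * 365


# Hypothetical inputs, not measurements:
outage = expected_annual_outage_cost(
    years_between_region_failures=10,  # assume one full region failure per decade
    hours_down_per_failure=1.0,        # the worst case from the text: one hour
    loss_per_hour_down=20_000.0,       # assumed business loss per hour of downtime
)
standby = hot_standby_annual_cost(standby_cost_per_hour=15.0)  # assumed fleet cost

print(f"expected outage cost per year: ${outage:,.0f}")
print(f"hot standby cost per year:     ${standby:,.0f}")
print("hot standby pays for itself" if standby < outage else "restore from backups wins")
```

With these assumed inputs the standby costs far more than the downtime it prevents. If your loss per hour is very large, or you already need the second region for latency, the answer flips.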

Engineers love to over-engineer (especially bad ones). Hot standbys are expensive, and they are not worth it unless you need to run in multiple regions anyway to reduce latency.

If CA falls into the ocean, no one is going to care if your app serving virtual tractors is down for a few hours. 
If Ohio is nuked, and your app is down for an hour, no one will care that they saw the same ad twice.
We can serve a default ad without personalization for an hour. 

