Building a Highly Available Web Server with Terraform - Auto Scaling, Load Balancing, and Cross-Region Failover

As part of honing my Terraform skills, I wrote a Terraform configuration that provisions a highly available web server, designed to simulate a real-world infrastructure solution.

Architecture Overview

My architecture follows the recommended AWS best practices, including:

  1. Multi-region deployment - Primary in US West, Secondary in US East
  2. Auto Scaling Groups in each region to handle traffic fluctuations
  3. Application Load Balancers to distribute traffic
  4. Route53 DNS Failover for automatic cross-region recovery
  5. Health checks to detect and respond to failure

To reduce expenses, I omitted the database from this project. The diagram shows my complete setup minus the database.

[Architecture diagram]
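
Under the hood, the two regions are handled with Terraform provider aliases: the default provider points at the primary region and an aliased one at the secondary, and each resource picks the provider it needs. A minimal sketch, with region codes and alias names of my own choosing rather than the exact ones in my code:

```hcl
# Default provider: the primary region (US West).
provider "aws" {
  region = "us-west-2"
}

# Aliased provider: the secondary region (US East).
provider "aws" {
  alias  = "secondary"
  region = "us-east-1"
}

# Secondary-region resources then opt in explicitly, for example:
# resource "aws_lb" "secondary" {
#   provider = aws.secondary
#   ...
# }
```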

How It Works

Here’s how the system handles different scenarios:

Normal Operation

  1. Users access the website, in this case web.khoah.net
  2. Route53 routes traffic to the ALB in the primary region (US West); the DNS records behind this are sketched after this list
  3. The ALB distributes requests to healthy instances in the ASG
  4. ASG maintains the desired number of instances based on load
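
The routing in steps 1 and 2 boils down to a pair of Route 53 failover records that alias the same name to the two ALBs, with a health check attached to the primary record. A sketch under hypothetical resource names (aws_lb.primary, aws_lb.secondary, aws_route53_health_check.primary):

```hcl
resource "aws_route53_record" "primary" {
  zone_id        = var.zone_id
  name           = "web.khoah.net"
  type           = "A"
  set_identifier = "primary"

  # Serve this record as long as the primary health check passes.
  failover_routing_policy {
    type = "PRIMARY"
  }

  alias {
    name                   = aws_lb.primary.dns_name
    zone_id                = aws_lb.primary.zone_id
    evaluate_target_health = true
  }

  health_check_id = aws_route53_health_check.primary.id
}

resource "aws_route53_record" "secondary" {
  zone_id        = var.zone_id
  name           = "web.khoah.net"
  type           = "A"
  set_identifier = "secondary"

  # Route 53 falls back to this record when the primary is unhealthy.
  failover_routing_policy {
    type = "SECONDARY"
  }

  alias {
    name                   = aws_lb.secondary.dns_name
    zone_id                = aws_lb.secondary.zone_id
    evaluate_target_health = true
  }
}
```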

Primary Region Failure

  1. Health checks detect the primary region is unavailable
  2. A CloudWatch Alarm triggers a Lambda function to spin up the secondary region (US East); it also triggers an SNS topic to notify me (the wiring is sketched after this list)
  3. Route53 automatically routes traffic to the secondary region
  4. Users continue to access the application through the same URL
  5. When the primary region recovers, traffic fails back automatically, and the CloudWatch Alarm turns the secondary region back off to save cost
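
One common way to wire up step 2 is to have the alarm publish to an SNS topic, with an email subscription for the notification and the Lambda subscribed to the same topic; my code may differ in detail, but the moving pieces look roughly like this (topic name, function name, and email address are placeholders, and the Lambda body itself is omitted):

```hcl
resource "aws_sns_topic" "failover" {
  name = "primary-region-failover"
}

# Notify me by email when the alarm fires.
resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.failover.arn
  protocol  = "email"
  endpoint  = "me@example.com"
}

# Invoke the Lambda that spins up the secondary region.
resource "aws_sns_topic_subscription" "lambda" {
  topic_arn = aws_sns_topic.failover.arn
  protocol  = "lambda"
  endpoint  = aws_lambda_function.activate_secondary.arn
}

# Allow SNS to invoke that Lambda.
resource "aws_lambda_permission" "from_sns" {
  statement_id  = "AllowExecutionFromSNS"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.activate_secondary.function_name
  principal    = "sns.amazonaws.com"
  source_arn    = aws_sns_topic.failover.arn
}
```

The alarm that publishes to this topic appears in the Encountered Issues section below, because it comes with a region quirk.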

Traffic Spike

  1. Increased load causes higher CPU utilization
  2. ASG scales out additional instances when CPU exceeds 50% (a sample scaling policy follows this list)
  3. ALB distributes traffic across all healthy instances
  4. As load decreases, ASG terminates excess instances
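
The simplest way to express step 2 in Terraform is a target tracking policy that keeps average CPU around 50% (my code may instead use CloudWatch alarms with simple scaling policies, which behaves similarly); resource names here are placeholders:

```hcl
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "keep-average-cpu-near-50"
  autoscaling_group_name = aws_autoscaling_group.web.name
  policy_type            = "TargetTrackingScaling"

  # Scale out when average CPU rises above 50%, scale back in as it drops.
  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50.0
  }
}
```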

Cost Optimization

The architecture is designed to be cost-effective:

  • Auto-scaling ensures you only pay for what you need
  • Multi-AZ deployment provides high availability with minimal resources
  • CloudWatch Alarms and Lambda functions manage secondary-region activation based on primary-region health, so the standby fleet runs only when it is needed (sketched below)
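
For the last point, one way to keep the standby fleet essentially free is to define the secondary Auto Scaling Group with a desired capacity of zero and let the failover Lambda raise it. A sketch under that assumption (names, sizes, and variables are placeholders):

```hcl
resource "aws_autoscaling_group" "secondary" {
  provider = aws.secondary # the us-east-1 alias from the provider sketch above

  name                = "web-secondary"
  min_size            = 0
  desired_capacity    = 0 # stays at zero until the failover Lambda scales it up
  max_size            = 4
  vpc_zone_identifier = var.secondary_subnet_ids
  target_group_arns   = [aws_lb_target_group.secondary.arn]

  launch_template {
    id      = aws_launch_template.web_secondary.id
    version = "$Latest"
  }

  # Without this, the next terraform apply would undo whatever the Lambda set.
  lifecycle {
    ignore_changes = [desired_capacity]
  }
}
```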

For the full setup, you can check out my code here.

Encountered Issues

Initially, I designed a system with two auto-scaling groups running simultaneously in two regions. However, I realized this was not cost-effective, so I redesigned it and had to rewrite the Terraform code.

While setting up auto scaling with Terraform, I also spent hours troubleshooting why the Auto Scaling Group wasn't created, only to realize I had referenced the security group by name instead of by ID (security_groups = [aws_security_group.my_sg.id]).
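
In other words, the difference between the broken and the working reference was just the attribute (my_sg is a hypothetical name):

```hcl
# What I had (the argument expects security group IDs, so the ASG never came up):
# security_groups = [aws_security_group.my_sg.name]

# What it needed:
security_groups = [aws_security_group.my_sg.id]
```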

Moreover, "thanks to" another bug, I learned that when you enable a public IP address on the instance's network interface in the launch template, the Terraform code also has to assign a security group to that network interface. If you create the launch template in the console, it automatically selects the default security group for you.
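
For reference, the relevant corner of the launch template looks roughly like this: once a network_interfaces block is used to request the public IP, the security group goes on that block rather than at the top level (AMI, instance type, and names are placeholders):

```hcl
resource "aws_launch_template" "web" {
  name_prefix   = "web-"
  image_id      = var.ami_id
  instance_type = "t3.micro"

  network_interfaces {
    associate_public_ip_address = true
    # Attach the security group here; the console does this for you with the
    # default security group, but in Terraform it is easy to forget.
    security_groups = [aws_security_group.my_sg.id]
  }
}
```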

I also hit an issue configuring a CloudWatch Alarm for a Lambda trigger. The Route 53 Health Check metric is only in us-east-1, and my aws_cloudwatch_metric_alarm failed until I explicitly set the correct provider in the CloudWatch configuration.
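
The fix was simply to pin the alarm to us-east-1 through a provider alias. A sketch, reusing the aws.secondary alias from earlier and hypothetical names for the health check and the SNS topic:

```hcl
resource "aws_cloudwatch_metric_alarm" "primary_unhealthy" {
  provider = aws.secondary # us-east-1, where Route 53 publishes health check metrics

  alarm_name          = "primary-health-check-failed"
  namespace           = "AWS/Route53"
  metric_name         = "HealthCheckStatus"
  statistic           = "Minimum"
  comparison_operator = "LessThanThreshold"
  threshold           = 1
  period              = 60
  evaluation_periods  = 1

  dimensions = {
    HealthCheckId = aws_route53_health_check.primary.id
  }

  # The SNS topic it notifies has to live in the same region as the alarm.
  alarm_actions = [aws_sns_topic.failover.arn]
}
```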

Improvements

In the real world, there is a lot more that could be added to make a web application like this perform better, but here are the things I would consider next:

  • Multi-region Active-Active architecture: the same concept, but with the secondary region always up
  • WAF integration for additional security
  • CloudFront for global edge caching
  • Private subnets for the application tier
  • Enhanced monitoring with CloudWatch dashboards