Automated Infrastructure Provisioning on Cloud with Auto-Scale

Mehul Malik
Infra Provisioning AWS Auto Scale Terraform

Automated Infra Provisioning

Problem Statement

Set up an automated infrastructure provisioning process for a License Manager application on the cloud, with the ability to auto-scale based on custom metrics.

Infrastructure Setup

The following top-level components need to be considered while setting up the infrastructure:

  • Network
  • Container Service
  • Registry Service
  • Load Balancers
  • IAM
  • S3
  • Elasticsearch

The following high-level diagram describes the infrastructure provisioning process:

The above diagram gives an overview of the Terraform scripts, which are divided into modules. Each module gets its inputs either from variables or from user input. Because the modules are interlinked, the output of each module is passed on to the next one.

Monitoring, Alerting & Scaling

Once the ECS Service is up and running, it starts to emit metrics to CloudWatch. The CloudWatch module sets up the following four resources:

  • CloudWatch Dashboard
  • CloudWatch Event Alarm
  • CloudWatch Event Rule
  • CloudWatch Event Target

Once CloudWatch starts receiving usage metrics from the ECS Service (via the sidecar container), we define Event Rules with specific thresholds. When a threshold is crossed, the rule triggers the Scaling Module to scale the ECS Service up or down.

Along with this, we set up a basic dashboard for the metrics that the sidecar container emits. If needed, we can also set up CPU and memory dashboards for the containers.
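To make the sidecar's role more concrete, here is a minimal sketch of how a custom metric could be published to CloudWatch from Python. It assumes the sidecar uses a boto3 CloudWatch client and its `put_metric_data` call. The namespace `LicenseManager`, the metric name `ActiveLicenses`, and the `ServiceName` dimension are hypothetical names chosen for illustration, not taken from the actual setup.

```python
from datetime import datetime, timezone


def build_metric_datum(metric_name, value, service_name, unit="Count"):
    """Build one entry for the MetricData list of CloudWatch PutMetricData."""
    return {
        "MetricName": metric_name,
        # Hypothetical dimension so metrics can be filtered per service.
        "Dimensions": [{"Name": "ServiceName", "Value": service_name}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


def publish_metric(cloudwatch, namespace, datum):
    """Send one datum through a boto3 CloudWatch client (or a stand-in for tests)."""
    cloudwatch.put_metric_data(Namespace=namespace, MetricData=[datum])


# Example wiring inside the sidecar (requires boto3 and AWS credentials):
#   import boto3
#   datum = build_metric_datum("ActiveLicenses", 42, "license-manager")
#   publish_metric(boto3.client("cloudwatch"), "LicenseManager", datum)
```

Passing the client in as an argument keeps the publishing logic easy to exercise without AWS access.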

Scaling Up

Once the metric for a particular container reaches an upper threshold, we trigger a CloudWatch event, which in turn triggers a Scale Up action for the ECS Cluster.
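The scale-up decision can be sketched roughly as follows. This is only an illustration under assumptions: it treats "scale up" as raising the desired task count of the ECS Service by one step, capped at a maximum, and it uses the boto3 ECS calls `describe_services` and `update_service`. The function names, the step size, and the cap are hypothetical.

```python
def scaled_up_count(current, metric_value, upper_threshold, max_tasks, step=1):
    """Return the desired task count after evaluating the upper threshold."""
    if metric_value < upper_threshold:
        return current  # threshold not reached, nothing to do
    return min(current + step, max_tasks)  # never exceed the configured cap


def scale_up_service(ecs, cluster, service, metric_value, upper_threshold, max_tasks):
    """Raise the desired count of an ECS service if the threshold is reached."""
    described = ecs.describe_services(cluster=cluster, services=[service])
    current = described["services"][0]["desiredCount"]
    target = scaled_up_count(current, metric_value, upper_threshold, max_tasks)
    if target != current:
        ecs.update_service(cluster=cluster, service=service, desiredCount=target)
    return target
```

Keeping the threshold check in a pure function makes it simple to reason about edge cases, such as already running at the maximum number of tasks.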

Scaling Down

If the metric for a given container reaches a lower threshold, we trigger a CloudWatch event, which in turn triggers a Scale Down event that removes that specific container. As part of the scale-down event, we first stop traffic to that container and then fire a Lambda function. The function essentially runs something like lm down, which gracefully shuts the server down and kills the container as well.
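A Lambda function for this scale-down flow might look something like the sketch below. Several things here are assumptions rather than the actual implementation: the event fields (`target_group_arn`, `target_ip`, `target_port`, `cluster`, `task_arn`) are hypothetical, the load balancer is assumed to use IP targets, and the way lm down gets invoked is assumed to be a SIGTERM handler inside the container, since ECS sends SIGTERM on `stop_task` before force-killing the task after its stop timeout. The sketch also does not wait for the target group's deregistration delay to elapse.

```python
def drain_and_stop(elbv2, ecs, event):
    """Stop routing traffic to one task, then ask ECS to stop it."""
    # 1. Take the container out of the load balancer target group.
    elbv2.deregister_targets(
        TargetGroupArn=event["target_group_arn"],
        Targets=[{"Id": event["target_ip"], "Port": event["target_port"]}],
    )
    # 2. Stop the task. ECS sends SIGTERM first; the container is assumed
    #    to trap it and run `lm down` for a graceful shutdown.
    ecs.stop_task(
        cluster=event["cluster"],
        task=event["task_arn"],
        reason="Scale down: metric reached lower threshold",
    )
    return {"stopped_task": event["task_arn"]}


def handler(event, context):
    """Lambda entry point; boto3 ships with the AWS Lambda Python runtime."""
    import boto3

    return drain_and_stop(boto3.client("elbv2"), boto3.client("ecs"), event)
```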

Additionally, we send out alarms / mails so that we are apprised of the events taking place.
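One possible way to send such notifications is through an SNS topic with email subscribers. The sketch below assumes that approach. The topic, the subject format, and the function name are hypothetical, and the only AWS call used is the boto3 SNS `publish` method.

```python
import json


def notify_scaling_event(sns, topic_arn, action, service, detail):
    """Publish a scaling notification to an SNS topic and return the subject used."""
    # SNS subjects must be shorter than 100 characters.
    subject = f"[{service}] {action}"[:99]
    sns.publish(
        TopicArn=topic_arn,
        Subject=subject,
        Message=json.dumps(detail, indent=2, default=str),
    )
    return subject


# Example wiring (requires boto3 and AWS credentials):
#   import boto3
#   notify_scaling_event(boto3.client("sns"), topic_arn, "scale-up",
#                        "license-manager", {"desired_count": 3})
```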