Auto Scaling - Introduction

AWS Auto Scaling is a feature that allows you to automatically adjust the number of resources in use based on predefined conditions or in response to changing demand.

20 May 2026 by Tony FinOps

Overview

The primary goal of Auto Scaling is to ensure that you have the right number of resources available to handle the current workload efficiently and cost-effectively.

Benefits

Scalability: Auto Scaling helps you scale out (add resources) to handle increased load and scale in (remove resources) when the demand decreases.

High Availability: By automatically replacing unhealthy instances, Auto Scaling helps maintain the availability of your application.

Cost Efficiency: Auto Scaling helps optimize costs by ensuring that you only pay for the resources you need. It reduces the need for over-provisioning resources to handle peak loads.

Automatic Management: Auto Scaling can be configured to automatically manage the lifecycle of your resources, including launching, terminating, and maintaining a specified number of instances.
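
To make the High Availability and Automatic Management points concrete, here is a minimal sketch of the kind of reconciliation a scaling group performs: drop unhealthy instances, launch replacements, and trim any surplus. The `Instance` class, the `reconcile` function and the instance IDs are hypothetical names used for illustration only; this is not the AWS implementation or API.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


def reconcile(fleet, desired_capacity, new_ids):
    """Return a fleet with unhealthy instances replaced and capacity restored."""
    # Keep only the instances that pass their health check.
    kept = [inst for inst in fleet if inst.healthy]
    # Launch replacements until the desired capacity is reached.
    while len(kept) < desired_capacity:
        kept.append(Instance(next(new_ids)))
    # Terminate surplus instances if the fleet is above the desired capacity.
    return kept[:desired_capacity]


# Hypothetical ID generator standing in for newly launched instances.
ids = (f"i-new{n}" for n in count(1))
fleet = [Instance("i-a"), Instance("i-b", healthy=False), Instance("i-c")]
fleet = reconcile(fleet, desired_capacity=3, new_ids=ids)
print([inst.instance_id for inst in fleet])  # ['i-a', 'i-c', 'i-new1']
```

The unhealthy instance `i-b` is replaced so the group returns to its desired capacity of three.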

Example: Cover variable demand

Let’s consider a basic web application running on AWS.

During the beginning and end of the week, usage of this application is minimal.
During the middle of the week, the demand on the application increases significantly.

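Option 1: Provision enough servers to cover peak demand at all times. The application always has enough capacity, but you pay for idle servers at the beginning and end of the week.

Option 2: Provision enough capacity to handle the average demand on the application. This costs less, but the application is short of capacity in the middle of the week.

Option 3: Add new instances to the application only when necessary, and terminate them when they are no longer needed. Capacity follows demand, which is the approach Auto Scaling automates.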
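
The trade-off between the three options can be checked with simple arithmetic. The sketch below assumes a made-up weekly demand profile (instances needed per day, Monday to Sunday) and ignores the time it takes to launch an instance; the numbers are illustrative, not measurements.

```python
import math

# Hypothetical number of instances needed per day, Monday to Sunday:
# low at the start and end of the week, high in the middle.
demand = [2, 2, 8, 8, 8, 2, 2]


def evaluate(capacity_per_day):
    """Summarise a capacity plan in instance-days."""
    pairs = list(zip(demand, capacity_per_day))
    return {
        "paid": sum(capacity_per_day),                # capacity you pay for
        "idle": sum(max(0, c - d) for d, c in pairs),   # paid but unused
        "unmet": sum(max(0, d - c) for d, c in pairs),  # demand not served
    }


peak = max(demand)
average = math.ceil(sum(demand) / len(demand))

results = {
    "peak": evaluate([peak] * len(demand)),        # Option 1
    "average": evaluate([average] * len(demand)),  # Option 2
    "on_demand": evaluate(list(demand)),           # Option 3
}

for name, summary in results.items():
    print(name, summary)
```

With this profile, Option 1 pays for 56 instance-days of which 24 are idle, Option 2 pays for 35 but leaves 9 instance-days of demand unserved, and Option 3 pays for 32 with neither idle capacity nor unmet demand.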

Scaling Strategy: Dynamic Scaling

Dynamic scaling creates target tracking scaling policies for the resources in your scaling plan.
The intention is to provide enough capacity to maintain utilization at the target value specified.

For example, you can configure your scaling plan to keep the average CPU utilization of your ECS service at 75% by adjusting the number of tasks it runs.

When the CPU utilization of your service exceeds 75%, your scaling policy adds another task to help with the increased load. When utilization falls well below the target, the policy removes tasks again.
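
The idea behind target tracking can be sketched with a simple proportional rule: scale the task count by the ratio of observed to target utilization. This is an assumption made for illustration, since AWS does not publish the exact calculation it uses; the function name `desired_task_count` and the minimum and maximum limits are hypothetical.

```python
import math


def desired_task_count(current_tasks, current_cpu, target_cpu=75.0,
                       min_tasks=1, max_tasks=10):
    """Estimate the task count that brings average CPU back to the target."""
    # Proportional rule (assumed): total work stays the same, so the task
    # count scales with observed utilization divided by target utilization.
    raw = math.ceil(current_tasks * current_cpu / target_cpu)
    # Never go below the minimum or above the maximum capacity.
    return max(min_tasks, min(max_tasks, raw))


print(desired_task_count(4, 90))   # 5: above target, add a task
print(desired_task_count(4, 75))   # 4: on target, no change
print(desired_task_count(4, 30))   # 2: well below target, remove tasks
print(desired_task_count(8, 150))  # 10: capped at max_tasks
```

Rounding up favours having slightly too much capacity over too little, which matches the stated intention of providing enough capacity to hold utilization at the target.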

Scaling Strategy: Predictive Scaling

Predictive scaling uses machine learning to analyze each resource's historical workload and regularly forecasts the future load.
Using the forecast, predictive scaling generates scheduled scaling actions to make sure that the resource capacity is available before your application needs it.

For example, you can enable predictive scaling and configure your scaling strategy to keep the average CPU utilization of your Auto Scaling group at 50%.

Your forecast calls for traffic spikes to occur every day at 8:00.

Your scaling plan creates scheduled scaling actions in advance so that your Auto Scaling group is ready to handle that traffic ahead of time.
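
The flow from history to forecast to scheduled actions can be sketched as follows. Note the substitution: the real service uses machine learning, whereas this sketch uses a plain per-hour average as a stand-in. The request counts, the `REQUESTS_PER_INSTANCE` figure and the one-hour lead time are all assumptions chosen for illustration.

```python
import math
from statistics import mean

# Hypothetical hourly request counts for the same hours on three past days.
history = {
    6: [100, 120, 110],
    7: [150, 140, 160],
    8: [900, 950, 1000],  # recurring spike at 8:00
    9: [400, 380, 420],
}
REQUESTS_PER_INSTANCE = 200  # assumed throughput of a single instance


def forecast(history):
    """Naive forecast: average load observed for each hour of the day."""
    return {hour: mean(samples) for hour, samples in history.items()}


def schedule_actions(forecast_by_hour, lead_hours=1):
    """Turn a forecast into (hour, capacity) scheduled scaling actions."""
    actions = []
    previous = None
    for hour in sorted(forecast_by_hour):
        capacity = max(1, math.ceil(forecast_by_hour[hour] / REQUESTS_PER_INSTANCE))
        if capacity == previous:
            continue  # no change in capacity, no action needed
        # Scale out ahead of the load; scale in only once the load has passed.
        scale_out = previous is None or capacity > previous
        actions.append((hour - lead_hours if scale_out else hour, capacity))
        previous = capacity
    return actions


print(schedule_actions(forecast(history)))  # [(5, 1), (7, 5), (9, 2)]
```

The action scheduled for 7:00 raises capacity to five instances one hour before the 8:00 spike, and capacity is reduced only at 9:00, after the spike has passed.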

Impacted Services

EC2 Instances
ECS
EKS
RDS
DynamoDB
SQS
Lambda
SageMaker
…

