Auto-Scaling

Auto-scaling is the process of increasing or decreasing the number of instances based on rules you define. For example, if the API receives more than 1,000 requests over the course of an hour, you could add 2 instances.

Adding Auto-Scaling Conditions

To create a new auto-scaling condition, choose to add a scaling condition.

Scaling Conditions

The following settings can be configured for each scaling condition.

  • Metric
    • Select a metric that will be used as the basis for increasing or decreasing instances.
    • You can choose from metrics like total requests (total_requests), requests per second (requests_per_sec), or request latency (latency_ms).
  • Target Value, Operator
    • The metric value is compared against the target value using the selected operator.
    • For example, you can choose "greater than or equal to" with a target value of "50" for the total requests (total_requests) metric.
  • Measurement Period
    • This is the time window over which metric values are collected.
    • For instance, if the measurement period for the total requests (total_requests) metric is "10" minutes, it will aggregate the total number of requests over the last 10 minutes.
  • Cooldown
    • This is the period during which no additional actions will take place after the auto-scaling has been triggered.
    • By setting this value, you can limit how frequently the auto-scaling occurs.
  • Action
    • Choose the desired action, such as increasing or decreasing instances.
    • For example, if you select "decrease instances" with a count of "1," then the number of instances is reduced by 1 whenever the specified metric condition is met.
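
The way these settings fit together can be illustrated with a minimal Python sketch. This is not the product's actual implementation or API: the `ScalingCondition` class, the operator names (`gte`, `gt`, `lte`, `lt`), the `evaluate` function, and the floor of zero instances are all assumptions made for illustration. Only the metric name `total_requests` comes from the description above.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical operator names; the real product may label these differently.
OPERATORS: dict[str, Callable[[float, float], bool]] = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


@dataclass
class ScalingCondition:
    """Illustrative container for the settings described above."""
    metric: str                   # e.g. "total_requests"
    comparison: str               # key into OPERATORS
    target_value: float
    measurement_period_min: int   # aggregation window, in minutes
    cooldown_min: int             # minutes to wait after a trigger
    action: str                   # "increase" or "decrease"
    count: int                    # number of instances to add or remove


def aggregate_total(samples, now_min, period_min):
    """Sum (timestamp_min, value) samples from the last `period_min` minutes.

    Summing suits a counter such as total_requests; a latency metric
    would more likely be averaged (assumption).
    """
    return sum(v for t, v in samples if now_min - period_min < t <= now_min)


def evaluate(cond: ScalingCondition, samples, current_instances: int,
             now_min: int, last_triggered_min: Optional[int] = None) -> int:
    """Return the instance count after applying the condition once."""
    # Cooldown: take no action if the condition fired too recently.
    if (last_triggered_min is not None
            and now_min - last_triggered_min < cond.cooldown_min):
        return current_instances

    value = aggregate_total(samples, now_min, cond.measurement_period_min)
    if not OPERATORS[cond.comparison](value, cond.target_value):
        return current_instances

    delta = cond.count if cond.action == "increase" else -cond.count
    # Assumed floor of zero instances; a real service may enforce a minimum.
    return max(0, current_instances + delta)
```

With a condition of "total_requests greater than or equal to 50 over 10 minutes, decrease instances by 1", calling `evaluate` on samples that sum to 55 within the window returns one fewer instance, while the same call made inside the cooldown period returns the instance count unchanged.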

Auto-Scaling Records

These records show when each configured auto-scaling condition was triggered, so you can verify when scaling actions took place.