Notice: This resource is no longer being actively maintained, and as a result some information may be out of date or inaccurate.

Autoscaling explained

Last modified by Michael Neale on 2013/10/15 03:55

Automatic scaling of your application (autoscale) sometimes causes concern (will it cost me too much? how does it work?).

At its core, autoscaling is simply a layer of continuous monitoring and measurement that ensures your app stays within certain constraints (which relate to your users' experience) while running. This article will hopefully explain it in more detail.

What is autoscaling?

First, let's look at an app's request graph (from the console):


You can see this is fairly "peaky" - in this case it isn't to do with time of day. If it were time-of-day related, it would be a bit "smoother" and perhaps have some curves:


So - in the above graphs there is a red and a green line (put there by me). Let's say that above the red line, user experience suffers. Below the green line, we are wasting money (running too many servers)! I.e. the ideal would be for requests to sit in between the lines as much as possible (optimal user experience for the amount of money spent).

That is all autoscale does! It tries to keep things in the application's "happy place" as you describe it. There are many things you can measure and base it on (request count per server is usually a good, simple one).

Note: keep an eye out for *why* your application may be responding that way - if it is due to errors and client retries, then adding servers may not help (unless it really is a capacity issue).

How you use it - quick start

In the console, go to the configuration page, find the section that sets the scaling, and set it to "Automatic":

Then set "maximum instances" to 4.

Then put in some settings like this (this is a typical, almost recommended, setup) - this assumes a single instance of your app can handle 1000 requests PER MINUTE happily (adjust accordingly):


For most people, a setting like the above should do nicely - if you want to know more, read on!

What this says is: "when an instance of my app gets more than 1500 requests per minute, scale up; if there are fewer than 1000 requests per minute, scale down" - i.e. add or remove instances. "Request count" in this case is really "requests per minute" - you select "median" or "average" to say "the average value of requests per minute, over this time window". You want to average this out (or take the median) because, as more servers come online, you will naturally get more data captured in your measurement window.
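The decision described above can be sketched in a few lines. This is purely illustrative (the function name and structure are hypothetical, not the actual RUN@cloud implementation); the thresholds match the example settings of 1500 and 1000 requests per minute:

```python
# Minimal sketch of the per-instance scaling decision described above.
# Thresholds are the example values; adjust for your own app.

def scaling_decision(requests_per_minute, up_threshold=1500, down_threshold=1000):
    """Return +1 to add an instance, -1 to remove one, or 0 to hold steady."""
    if requests_per_minute > up_threshold:
        return +1   # above the red line: user experience suffers, scale up
    if requests_per_minute < down_threshold:
        return -1   # below the green line: wasting money, scale down
    return 0        # in the "happy place" between the lines

print(scaling_decision(1800))   # busy instance: add capacity
print(scaling_decision(1200))   # within bounds: no change
print(scaling_decision(600))    # idle instance: remove capacity
```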

Min and Max instances

Set min/max instances to set the limits of your cluster - a min of 2 means you will always have at least 2 instances.

Measurement and cool down windows

Defaults are fine for this.

Measurement window describes how much data to take in - i.e. look at your statistics every 1 minute, or 10 minutes (1 minute is typical) - anything older than 1 minute is discarded.

Cool down means that after taking an action (e.g. adding or removing a server), it will wait (for example) 1 minute before doing another action. This limits the rate of change to your app.
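Those two windows can be sketched as follows. This is an illustrative model only, with hypothetical names and example one-minute settings, assuming samples are tagged with a timestamp in seconds:

```python
# Sketch of how a measurement window and a cool-down interact: samples
# older than the window are discarded, and after a scaling action no
# further action is taken until the cool-down has elapsed.

WINDOW_SECONDS = 60     # measurement window: 1 minute
COOLDOWN_SECONDS = 60   # wait 1 minute after each action

samples = []            # list of (timestamp, requests_per_minute) tuples

def record_sample(now, value):
    """Add a sample and discard anything older than the window."""
    samples.append((now, value))
    samples[:] = [(t, v) for (t, v) in samples if now - t <= WINDOW_SECONDS]

def may_act(now, last_action):
    """True if enough time has passed since the last scaling action."""
    return last_action is None or now - last_action >= COOLDOWN_SECONDS

record_sample(0, 1000)    # sample at t=0s
record_sample(30, 1200)   # sample at t=30s
record_sample(90, 1400)   # at t=90s the t=0 sample falls out of the window
```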

Fields to calculate on

Request Count: requests per minute - for this, use average or median (recommended) - how many requests per minute a single instance of your app is processing.

Response Time: time taken to process a request. 

Active sessions: the number of active sessions per instance of your app.

Average, median, min, max, total

Each instance of your app reports stat data once per minute - as more instances report, their data is captured in the "measurement window". You need to use an aggregate (e.g. average) to smooth these multiple reports into a single value you can decide to scale on.
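For example, aggregating the per-instance reports in a window might look like this (the numbers are made up, and Python's standard `statistics` module stands in for whatever the service actually uses):

```python
import statistics

# Sketch: turning the per-instance reports captured in a measurement
# window into one value to scale on. Each figure is one instance's
# reported requests-per-minute; one instance here is an outlier.

window_reports = [1400, 1600, 1500, 2100]

avg = statistics.mean(window_reports)     # smooths spikes across instances
med = statistics.median(window_reports)   # even more robust to one outlier

print(avg)   # 1650.0
print(med)   # 1550.0
```

Note how the median is pulled less far by the single busy instance than the average is - which is why it is often the recommended choice.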

Settings in the console like the above would try to keep your application within the red and green lines (red being 1500 requests per server per minute, green being below 1000). As autoscale adds more servers, the load is spread out, thus reducing the per-server counts (and so on).
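The arithmetic behind "spreading the load" is simple division. A hypothetical worked example: 3000 total requests per minute across two instances puts each one right at the red line; adding a third brings them back to the green line:

```python
# Illustrative only: how adding an instance reduces the per-server count.

total_requests_per_minute = 3000

per_instance_before = total_requests_per_minute / 2   # two servers: 1500 each (red line)
per_instance_after = total_requests_per_minute / 3    # three servers: 1000 each (green line)

print(per_instance_before)   # 1500.0
print(per_instance_after)    # 1000.0
```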

There are many metrics you can use to trigger scaling - it could be CPU, threads, or memory; every application is different (but request count is a great starting point). Note that "request count" in this context means the per-server count over the period of time selected below:


You also set upper and lower limits of course (so in the worst case you would be running 10 app instances). 

The measurement window is how long to take the samples over (a minute is sensible). 

Cool-down means how long to wait after taking action before taking action again - this allows the change to take effect before autoscale will re-evaluate and decide if more corrective action is required. 

You can see these are short periods of time - so the system can quickly adapt to changes in load and hopefully save you some money and hassle!

So how does it work?

A picture is worth a few words: 


Each app instance that runs is supervised by an agent - part of this agent's job is to collect all sorts of statistical information and health data. This data is then pushed onto a message bus (we call it the "app.stat" stream), which is tapped into by the Autoscale service.

The autoscale service also knows about your application's preferences - these are stored as a list of scale rules, in JSON. The service listens to the app.stat stream and does very simple stream processing over it (keeping a short window of data to calculate means, medians, etc.). Should it detect that your application is outside its "happy place", it sends a message to the deployment API service, which then adds or reduces capacity as instructed - as long as it stays within the range you specified (it won't go crazy and just keep adding or removing servers). Tools like New Relic can provide insight to help you identify what the triggers should be for your application.
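The loop above can be sketched roughly as follows. To be clear, every name and field here is hypothetical - the real service's internals and rule format are not documented on this page - but the shape of one evaluation tick is: aggregate the window, compare against the rules, clamp to the min/max range:

```python
import statistics

# Rough sketch of one evaluation tick of an autoscale control loop:
# aggregate the windowed per-instance stats, compare against the scale
# rules, and clamp the result to the user-specified instance range.

def evaluate(stats_window, rules, current_instances):
    """Return the desired instance count after one evaluation tick."""
    value = statistics.mean(stats_window)        # e.g. requests/min per instance
    desired = current_instances
    if value > rules["scale_up_above"]:
        desired += 1                             # outside the "happy place": add capacity
    elif value < rules["scale_down_below"]:
        desired -= 1                             # over-provisioned: remove capacity
    # never go outside the configured range - no runaway scaling
    return max(rules["min_instances"], min(rules["max_instances"], desired))

rules = {"scale_up_above": 1500, "scale_down_below": 1000,
         "min_instances": 2, "max_instances": 4}

print(evaluate([1700, 1900, 1600], rules, 2))   # overloaded: grow to 3
print(evaluate([800, 700], rules, 2))           # idle, but min is 2: stay at 2
```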

Hopefully this takes some of the mystery out of the autoscale service. Interesting trivia: the autoscale service is written as an Erlang application.

Created by Spike Washburn on 2011/12/19 04:24