Value Proposition for the Auto Scale feature in a Cloud environment
With the Cloud paradigm that it is getting every year more and more popular, the providers (AWS first and Azure, Rackspace etc later) are offering the Auto Scale feature highlighting its benefits.
In a nutshell, the auto scale feature allows to configure the infrastructure in such a way that, according to certain metrics, the system automatically add or reduce the underlying Virtual Machines, that are hosting your application, in order to smoothly meet the workload. In contrast with a traditional cluster setup, where it must be decided upfront how many machines should be used for a certain cluster setup, with the auto scale feature there are the following benefits:
- No need to perform any estimation exercise to figure out the cluster size for the production deployment and no need to manually provision the hardware (physical or virtual). The cloud watches certain configurable parameters (e.g. CPU, network data in or out, Disk I/O etc ) and automatically add or delete the underlying Virtual Machines with the full stack to accomodate the user workload. Efficient usage of resources.
- Avoid costs for idle resources during low system load. Since the system can detect situations when there is no load (or low), the number of resources can be ramped down to the minimum and pay a fraction.
A step forward: Serverless
There is no doubt that the Auto Scale feature is a step forward compared to the traditional setup but there is a new approach going under the name Serverless that in my opinion has a room for further grow.
An example of this paradigma is AWS Lambda or Microsoft Azure Funtions.
The idea is to write a piece of code that will be executed in the Cloud and the infrastructure is not a concern at ALL!
When we compare the Serverless approach to the classic Cloud Auto Scale Feature it is easy to spot the following limits:
- Although with the Auto Scale feature, it is not anymore a concern the setup of a large cluster to handle huge workload, there is still some thoughts to have on the infrastructure design as the auto scaling metric need to be configured. Depending on how the metrics are configured, the resource allocation will vary with the related costs. With Serverless approach every service request is executed in parallel on the common Cloud infrastructure where there is no need to instantiate a specific VM with the custom client’s software stack.
- Although the Auto Scale can be configured to shrink the system in case of no workload, at minimum one Virtual Machine must be up and running at all the times to ensure the system remains alive. This means that there will be a cost associated to maintain the VM alive even if nobody uses the system for several hours/days. With Serverless approach, the charge occurs only when the code get executed.
- The Auto Scale feature seems more suitable for a traditional layered Architecture (Front End, Business, Data) to scale the entire deployment bundle (e.g. FE + Business when shipped together, web service module). Although it is possible to separate them, usually multiple business services are shipped together in a single deployment bundle (e.g. WAR module with all web services). If one service become busy it can trigger the auto scaling mechanism leading the deployment of the entire bundle with all services and not just the busy one. The Serverless approach promise instead to scale independently every single service/function that would make it a perfect match for a Microservice Architecture
Limit and Challenges of Serverless
Looking at the current offer of serverless implementation the first restriction is on the limited number of languages and dependencies injections.
Probably Java will become available in the future.
It is clear that the serverless environment consist of pre-built environment runtime that are ready to execute the customer code.
For this reason, it is feasible for a Cloud provider to prepare under the hood an Auto Scaling cluster for each runtime that it is shared across all users and hence hide the infrastructure scaling details.
As I said, the Funtion code works well on top of pre-build environment but it is not possible to specify today any dependency required by your code. It means that your code can uses only the available libraries in the pre-built environment and the functionalities provided by external service calls (e.g. other function, other network remote service).
Potential Extension of Serverless approach
It would be great if a custom service (e.g. Java module with all its dependencies and runtime) could be deployed in a Cloud environment in a Serverless fashion with all related benefits (e.g. Charge occurs only when the code is executed and no costs at all during idle time).
As explained above, unfortunately this is not possible yet.
The main impediment consists on the fact that a custom software stack should be built for every single custom service and leave it idle at the cost of the Cloud provider.
The following trade off could make it technically feasible and maybe as well as a value add for the Cloud Provider offer.
- Each custom service can be configured with a deployment bundle that allow the setup of the entire execution environment of the given custom service.
- The Execution environment is not deployed until the first request. Since the environment deployment requires some time, a Circuit Breaker as Gateway middle component could provide a smooth temporary “out of service” response. It is accepted that by time to time the service is temporary not available. As soon as the environment is ready, the circuit breaker will forward the request to the up and running service and consequent requests will be processed by the deployed code. The Cloud Provider watches the usage of the service and it will keep the environment up and running until it won’t be used for a certain amount of time. According to the Serverless style, the Cloud provider will charge the client not based on the time the system has kept up and running but according to the Servless metrics (e.g. number of request, execution time etc). If the system is not used after a given timeout, the system is deleted and the next request will experience again the temporary out of service.
- Different class of services can be defined based on several criteria (e.g. Timeout after which the system is deleted)
I believe that for those cases where temporary out of services are acceptable, it could be a good compromise to have a Cloud scalable system that it is even cheaper to the current offering.
To find out more about the Auto Scale feature and the differences across the biggest Cloud Providers, see the following references