Value proposition for a Microservice Architecture in a serverless Cloud environment

Value Proposition for the Auto Scale feature in a Cloud environment

With the Cloud paradigm growing more popular every year, providers (AWS first, then Azure, Rackspace and others) offer an Auto Scale feature and highlight its benefits.

In a nutshell, the Auto Scale feature lets you configure the infrastructure so that, according to certain metrics, the system automatically adds or removes the Virtual Machines hosting your application in order to smoothly meet the workload. In contrast with a traditional cluster setup, where the number of machines must be decided upfront, the Auto Scale feature offers the following benefits:

  1. No need to estimate the cluster size for the production deployment and no need to manually provision hardware (physical or virtual). The cloud watches configurable parameters (e.g. CPU, network data in or out, disk I/O) and automatically adds or removes Virtual Machines, each with the full stack, to accommodate the user workload. The result is an efficient use of resources.
  2. No costs for idle resources during low system load. Since the system can detect when there is little or no load, the number of resources can be ramped down to the minimum, and you pay only a fraction of the full cost.
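
To make the mechanism concrete, here is a minimal sketch of a metric-driven scaling rule in Python. The class name, the function name and the thresholds are all hypothetical and do not reflect any provider's actual API; real providers expose richer policies (cooldowns, multiple metrics, step sizes).

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy: every name and threshold here is illustrative."""
    min_instances: int = 1
    max_instances: int = 10
    scale_out_cpu: float = 70.0  # add a VM when average CPU % is above this
    scale_in_cpu: float = 30.0   # remove a VM when average CPU % is below this


def desired_instances(current: int, avg_cpu: float, policy: ScalingPolicy) -> int:
    """Return the number of VMs the cluster should move to."""
    if avg_cpu > policy.scale_out_cpu:
        target = current + 1
    elif avg_cpu < policy.scale_in_cpu:
        target = current - 1
    else:
        target = current
    # Never leave the configured bounds.
    return max(policy.min_instances, min(policy.max_instances, target))


if __name__ == "__main__":
    policy = ScalingPolicy()
    instances = 2
    for cpu in [85.0, 90.0, 50.0, 20.0, 10.0, 5.0]:
        instances = desired_instances(instances, cpu, policy)
        print(f"avg CPU {cpu:5.1f}% -> {instances} instance(s)")
```

Note that even under no load the rule never drops below `min_instances`, so at least one VM keeps running.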

A step forward: Serverless

There is no doubt that the Auto Scale feature is a step forward compared to the traditional setup, but there is a new approach, going under the name Serverless, that in my opinion has room for further growth.

Examples of this paradigm are AWS Lambda and Microsoft Azure Functions.

The idea is to write a piece of code that will be executed in the Cloud and the infrastructure is not a concern at ALL!

When we compare the Serverless approach to the classic Cloud Auto Scale feature, it is easy to spot the following limits of Auto Scale:

  1. Although with the Auto Scale feature setting up a large cluster to handle a huge workload is no longer a concern, some thought must still go into the infrastructure design, because the auto scaling metrics need to be configured. Depending on how the metrics are configured, the resource allocation will vary, and so will the related costs. With the Serverless approach, every service request is executed in parallel on the common Cloud infrastructure, and there is no need to instantiate a specific VM with the client’s custom software stack.
  2. Although Auto Scale can be configured to shrink the system when there is no workload, at least one Virtual Machine must be up and running at all times to ensure the system remains alive. This means there is a cost for keeping the VM alive even if nobody uses the system for several hours or days. With the Serverless approach, charges occur only when the code gets executed.
  3. The Auto Scale feature seems more suitable for a traditional layered architecture (Front End, Business, Data), where it scales an entire deployment bundle (e.g. FE + Business when shipped together, or a web service module). Although it is possible to separate them, multiple business services are usually shipped together in a single deployment bundle (e.g. a WAR module with all web services). If one service becomes busy, it can trigger the auto scaling mechanism, leading to the deployment of the entire bundle with all services and not just the busy one. The Serverless approach promises instead to scale every single service/function independently, which would make it a perfect match for a Microservice Architecture.
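
The cost argument in point 2 can be sketched as simple arithmetic. The two functions below are a rough model only, and every price used with them is a made-up placeholder, not a real provider tariff.

```python
def always_on_cost(hours: float, vm_price_per_hour: float, min_instances: int = 1) -> float:
    """Cost of keeping the minimum number of VMs alive, regardless of traffic."""
    return hours * vm_price_per_hour * min_instances


def pay_per_use_cost(requests: int, price_per_request: float,
                     avg_seconds: float, price_per_second: float) -> float:
    """Serverless-style cost: a fee per request plus a fee per second of execution."""
    return requests * (price_per_request + avg_seconds * price_per_second)


if __name__ == "__main__":
    # Hypothetical prices, chosen only to illustrate the comparison.
    month_hours = 24 * 30
    print("always-on VM :", always_on_cost(month_hours, vm_price_per_hour=0.05))
    print("idle function:", pay_per_use_cost(0, 0.0000002, 0.2, 0.00002))
    print("busy function:", pay_per_use_cost(1_000_000, 0.0000002, 0.2, 0.00002))
```

With zero requests the pay-per-use model costs nothing, while the always-on model still charges for every hour. Which model is cheaper under sustained load depends entirely on the real prices and the traffic profile.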

Limits and Challenges of Serverless

Looking at the current serverless offerings, the first restriction is the limited number of supported languages and the lack of support for custom dependencies.

Languages

  • Microsoft Azure provides support for C#, JavaScript, Bash, PowerShell, PHP and a few more. No Java.
  • AWS provides support for JavaScript (Node.js) and Python. No Java.

Probably Java will become available in the future.

It is clear that a serverless environment consists of pre-built runtime environments that are ready to execute the customer code.

For this reason, it is feasible for a Cloud provider to prepare, under the hood, an Auto Scaling cluster for each runtime that is shared across all users, and hence to hide the infrastructure scaling details.

Dependencies

As I said, the Function code works well on top of a pre-built environment, but today it is not possible to specify any dependency required by your code. This means that your code can use only the libraries available in the pre-built environment and the functionality provided by external service calls (e.g. another function or another remote network service).

Potential Extension of Serverless approach

It would be great if a custom service (e.g. a Java module with all its dependencies and runtime) could be deployed in a Cloud environment in a Serverless fashion with all the related benefits (e.g. charges occur only when the code is executed, with no costs at all during idle time).

As explained above, unfortunately this is not possible yet.

The main impediment is that a custom software stack would have to be built for every single custom service and then left idle at the Cloud provider’s expense.

The following trade-off could make it technically feasible, and maybe also a value add for the Cloud Provider’s offer.

  • Each custom service can be configured with a deployment bundle that allows the setup of the entire execution environment of that service.
  • The execution environment is not deployed until the first request. Since deploying the environment requires some time, a Circuit Breaker acting as a Gateway component in the middle could provide a smooth temporary “out of service” response. It is accepted that from time to time the service is temporarily unavailable. As soon as the environment is ready, the circuit breaker forwards requests to the up and running service, and subsequent requests are processed by the deployed code. The Cloud Provider watches the usage of the service and keeps the environment up and running until it has not been used for a certain amount of time. In keeping with the Serverless style, the Cloud Provider charges the client not for the time the system has been kept up and running but according to Serverless metrics (e.g. number of requests, execution time). If the system is not used within a given timeout, it is deleted, and the next request will again experience the temporary out of service.
  • Different classes of service can be defined based on several criteria (e.g. the timeout after which the system is deleted).
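
The behaviour described in the bullets above can be sketched as a small state machine. This is only an illustration of the proposal, not an existing Cloud feature: the class `ColdStartGateway`, its parameters and the 503 response are all assumptions, and a real gateway would deploy the environment asynchronously rather than simulate it with timestamps.

```python
import time
from typing import Callable, Optional, Tuple


class ColdStartGateway:
    """Hypothetical gateway: deploys on first request, tears down when idle."""

    def __init__(self, deploy_seconds: float, idle_timeout: float,
                 handler: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.deploy_seconds = deploy_seconds  # simulated time to build the environment
        self.idle_timeout = idle_timeout      # teardown after this much inactivity
        self.handler = handler                # stands in for the deployed custom service
        self.clock = clock
        self.ready_at: Optional[float] = None  # None means "not deployed"
        self.last_used: Optional[float] = None
        self.billed_requests = 0               # only served requests are charged

    def handle(self, request: str) -> Tuple[int, str]:
        now = self.clock()
        # Delete the environment if it has been idle longer than the timeout.
        idle_since = self.last_used if self.last_used is not None else self.ready_at
        if idle_since is not None and now - idle_since > self.idle_timeout:
            self.ready_at = None
            self.last_used = None
        if self.ready_at is None:
            # First request (or first after a teardown): start the deployment.
            self.ready_at = now + self.deploy_seconds
            return 503, "service temporarily unavailable"
        if now < self.ready_at:
            # Deployment still in progress: keep answering "out of service".
            return 503, "service temporarily unavailable"
        self.last_used = now
        self.billed_requests += 1
        return 200, self.handler(request)
```

The idle timeout is the natural knob for the different classes of service: a longer timeout means fewer cold starts for the client and more idle capacity carried by the provider.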

 

I believe that for those cases where temporary outages are acceptable, it could be a good compromise: a scalable Cloud system that is even cheaper than the current offering.

The Reactive Manifesto

You may have already heard the expression “Reactive Systems” in the Software Architecture space.

If not, I’d suggest reading the Reactive Manifesto (http://www.reactivemanifesto.org/) and optionally signing it, as I did.

It is nothing new, simply four characteristics that qualify a software system as “Reactive”:

  1. Responsive
  2. Resilient
  3. Elastic
  4. Message Driven

Enjoy your reading.

A general purpose architecture for background processes

Introduction

The purpose of this article is to present a general purpose architecture for background processes. On this page, “background process” is used interchangeably with “job”.

Like any general purpose architecture, it cannot fulfil every single case, since requirements differ across applications.

This architecture tries to address the requirements listed below, and it should be adjusted wherever a specific case has a different set of constraints, requirements and wishes.

Requirements

In order to have a general purpose architecture that can be applied in a high number of scenarios, the following requirements should be satisfied.

Separation of concerns

The business logic performed by the job must be loosely coupled from the launcher logic.

The launcher component is responsible only for invoking a specific job.

The launcher can be either a scheduler that automatically triggers the job according to its configuration, or a component that is invoked manually through an explicit trigger action.

Multiple Execution Environments

A general purpose architecture should allow jobs to run either in a standalone environment or in a container.

Cluster awareness

If jobs are deployed and launched within a container (e.g. within a web application), they must support scenarios where the deployment is performed in a cluster environment.

If a scheduler is used as the launcher component to trigger a job, it must take into account that the job, according to its schedule, must be triggered only once and not multiple times because of the multiple nodes.

Description

The architecture is based on a launcher (trigger) component that is responsible for running the job.

Since there can be different ways to launch a job, the launcher component should offer multiple implementations, such as:

  1. REST API: If the launcher is running in a container, a REST endpoint (or any other protocol) could be used to trigger the job remotely.
  2. Command Line Interface (CLI): A command line interface could be useful to trigger a job (either on a local machine or remotely) using shell commands.
  3. Internal Java API: When the launcher component is deployed along with an enterprise application, it could be useful to expose the launcher API so that other Java components can trigger jobs.
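
The separation between launcher and job, with several front ends sharing one launcher, can be sketched as follows. The article targets Java; this sketch uses Python only for brevity, and the names `Launcher`, `register`, `launch` and `cli_main` are hypothetical, not part of any library.

```python
import argparse
from typing import Callable, Dict, List, Optional

# A job is just a callable holding business logic; it knows nothing about launchers.
Job = Callable[[], str]


class Launcher:
    """Internal API: invokes a job by name and contains no business logic."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def register(self, name: str, job: Job) -> None:
        self._jobs[name] = job

    def launch(self, name: str) -> str:
        if name not in self._jobs:
            raise KeyError(f"unknown job: {name}")
        return self._jobs[name]()


def cli_main(launcher: Launcher, argv: Optional[List[str]] = None) -> str:
    """CLI front end: parses the job name and delegates to the same launcher."""
    parser = argparse.ArgumentParser(description="Trigger a job by name")
    parser.add_argument("job_name")
    args = parser.parse_args(argv)
    return launcher.launch(args.job_name)


if __name__ == "__main__":
    launcher = Launcher()
    launcher.register("report", lambda: "report generated")
    print(launcher.launch("report"))      # internal API call
    print(cli_main(launcher, ["report"]))  # same job through the CLI front end
```

A REST front end would follow the same pattern: map the request to a job name and call `launch`, so every trigger path ends in the same place.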

In the end, all the launcher implementations could rely on the popular Quartz Scheduler to actually trigger a job.

Among many other features, Quartz supports deployments in cluster environments where the scheduler runs on multiple nodes but each job is still triggered only once.
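
To illustrate the "only once" requirement, here is a minimal sketch of nodes racing to claim a scheduled firing in a shared store. It is not how Quartz is implemented; it only shows the general idea of coordinating through shared state. The table layout and function names are assumptions, and a single in-memory SQLite connection stands in for the database that the cluster nodes would share.

```python
import sqlite3


def create_store() -> sqlite3.Connection:
    """Create the shared store; the primary key allows one claim per firing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE firings (job TEXT, fire_time TEXT, node TEXT, "
        "PRIMARY KEY (job, fire_time))"
    )
    return conn


def try_claim(conn: sqlite3.Connection, job: str, fire_time: str, node: str) -> bool:
    """Return True only for the first node that claims this (job, fire_time)."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO firings (job, fire_time, node) VALUES (?, ?, ?)",
                (job, fire_time, node),
            )
        return True
    except sqlite3.IntegrityError:
        # Another node already claimed this firing, so this node skips the job.
        return False


if __name__ == "__main__":
    store = create_store()
    for node in ["node-a", "node-b", "node-c"]:
        if try_claim(store, "nightly-report", "2024-01-01T00:00", node):
            print(f"{node} runs the job")
        else:
            print(f"{node} skips the job")
```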

The job itself can be either a typical Spring Batch job or any generic background process.

[Figure: GeneralPurposeArchitecture]

Enterprise Architecture frameworks comparison

Before comparing Enterprise Architecture (EA) frameworks, it is important to have clearly in mind what EA itself is all about:

From Wikipedia

Enterprise architecture (EA) is “a well-defined practice for conducting enterprise analysis, design, planning, and implementation, using a holistic approach at all times, for the successful development and execution of strategy. Enterprise architecture applies architecture principles and practices to guide organizations through the business, information, process, and technology changes necessary to execute their strategies. These practices utilize the various aspects of an enterprise to identify, motivate, and achieve these changes.”

Large organizations embracing the EA practice should treat it as a business of its own, with continuous development, resources and budget allocated to it.

Since a number of frameworks (claiming to provide guidelines on how to develop EA) have been created, I was curious to analyze the differences between them. During my research, I came across the following comparison made by Pragmaticea.

The comparison, carried out by Pragmaticea, analyzes four frameworks (MAGENTA, TOGAF, Zachman, PEAF) and, for obvious reasons, the best one turns out to be PEAF, which is sponsored by Pragmaticea 🙂

I personally don’t agree with that analysis, but what I find very unfair is that it ranks TOGAF last.

I don’t want to be a TOGAF advocate, but the following statement in their report is quite debatable:

From comparison made by Pragmaticea

TOGAF is mostly concerned with IT rather than the entire organization

In answer to this point, I would refer to the Business Architecture (B) domain as defined in TOGAF.

[Figure: TOGAF ADM]

As far as I am concerned, according to TOGAF, the Business Architecture is all about business:

As a matter of fact, the Business Architecture:

  • focuses on business capability.
  • is owned by business people and not by IT people.
  • is not concerned with IT execution.

That said, it is true that TOGAF also addresses IT execution concerns, but it does so from phase E onwards, and this function is usually owned by the PMO (Project Management Office).

Moreover, I find this statement misleading because we are all in the space of setting up an EA framework, and the scope of EA (regardless of the framework adopted) is all about designing how the IT system should operate in order to align it with the business concerns.

From this perspective, I would say that TOGAF fully addresses this concern.