Tutorial: Integrating GitLab and Jenkins

This is a short tutorial showing how to integrate Jenkins and GitLab.

There are many use cases and customized development processes established across teams. For the sake of this discussion, I will reference the following development process, which could be used in environments with a strong focus on keeping high-quality code on the main branch at all times.

Typical Development Process

Usually the workflow starts as soon as a developer creates or updates a Pull Request. Some people set up processes by running build, test and deployment from the feature branch. If all tests pass, the Pull Request is considered a good candidate to be merged with the main branch.

Although it works in many cases, this approach has a potential flaw: after the Pull Request is merged, the new version on the main branch may not work as expected (e.g. build failures, test failures, deployment failures).

The key point is that the feature branch MUST be merged with the main branch before running any build/test/deployment, rather than building the feature branch on its own.

This increases the level of confidence when performing code integration, which is especially crucial when the integration is performed automatically.

After the build/test/deployment completes, the pipeline outcome is reported back to the Pull Request by updating the Git commit status.

If the pipeline fails, the merge button is disabled, forcing the team to fix the issue by pushing an additional commit, or to re-execute the pipeline if the problem was not code related.

The following diagram shows the example workflow.

 

Prerequisites

The Jenkins GitLab plugin is the best source of information about supported versions, as well as for community discussions.

My setup is based on the following stack:

  • Jenkins 2.89.1
    • Git Plugin 3.6.4
    • Gitlab Plugin 1.5.2
  • GitLab Community Edition 10.2.3

 

Jenkins Global setup

Configure the GitLab plugin to point to your GitLab instance, as shown in the following example.

Jenkins Job

Create a Jenkins job that will be used as a quality gate. It should be comprehensive, so that all the typical stages are executed (build, test, deploy, etc). If possible, it should be implemented as a Jenkins pipeline to leverage all the benefits of the Infrastructure as Code paradigm. The following screenshots show an example of a traditional Maven Jenkins job, but the same configuration can easily be translated into a Jenkins pipeline.

SCM Git

The SCM must be configured with Git, pointing to the Git URL where the project is hosted.

In order to build the feature branch and to perform the merge with the target branch before the build execution, use the following settings:

  • Name: origin
  • Refspec: +refs/heads/*:refs/remotes/origin/* +refs/merge-requests/*/head:refs/remotes/origin/merge-requests/*
  • Branch Specifier: origin/${gitlabSourceBranch}
  • Additional Behaviour: Merge before build
    • Name of repository: origin
    • Branch to merge to: ${gitlabTargetBranch}

Build Trigger

The Jenkins job must be configured to start automatically as soon as the relevant events occur. In this example workflow, we want to trigger a Jenkins build whenever either of the following two events occurs:

  1. A Pull Request is created or updated. We don’t want to start a build as soon as a push is performed against our branch (unless the Pull Request is already open and its title does not contain the WIP prefix), but only upon Pull Request creation/update.
  2. A specific comment is added to the Pull Request. It sometimes happens that a build fails and must be restarted. Triggering it from GitLab allows us to keep the Pull Request status in sync.

Jenkins should be configured like the following example:

Take note of the GitLab CI Service URL, since it is required to properly configure GitLab in the next step.

Update Commit Status

The last configuration step is to tell Jenkins to update GitLab (the commit that triggered the build) with the build status. Simply add the post-build action as shown below:
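Behind the scenes, this post-build action reports the result through GitLab's commit status API, so nothing has to be implemented by hand. Purely as an illustration, here is a minimal sketch of the equivalent call, assuming GitLab API v4; the project ID, commit SHA and token are hypothetical placeholders:

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative only: roughly what the GitLab plugin does for us after the build.
public class CommitStatusExample {
    public static void main(String[] args) throws Exception {
        String gitlabHost = "https://gitlab.example.com";   // assumption: your GitLab instance
        String projectId  = "42";                           // assumption: numeric project id
        String sha        = "abc123";                       // assumption: commit that was built
        String state      = "success";                      // pending, running, success, failed or canceled

        URL url = new URL(gitlabHost + "/api/v4/projects/" + projectId
                + "/statuses/" + sha + "?state=" + state + "&name=jenkins");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("PRIVATE-TOKEN", "<personal-access-token>"); // assumption: token with api scope
        conn.setDoOutput(true);
        conn.getOutputStream().close(); // empty body: parameters are passed in the query string
        System.out.println("GitLab responded with HTTP " + conn.getResponseCode());
    }
}
```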

GitLab setup

Project creation

Create a new project and upload it to GitLab. The build process can be based on any tool (e.g. Maven, Gradle, etc).

GitLab should contain the source code, as well as the build scripts (e.g. pom.xml) and the Jenkins configuration such as the Jenkins pipeline (e.g. Jenkinsfile). This example instead shows the Jenkins configuration using the traditional web UI approach, but the same concept applies if the job configuration is moved to the Jenkinsfile (recommended).

Disable GitLab Pipeline

Since we will be using Jenkins as the CI tool, it is recommended to disable the GitLab CI tool to avoid conflicts when setting the commit status.

Navigate to Project / Settings / General / Permissions / Repository / Pipeline and disable it.

GitLab webhook

Once the Jenkins job is configured to kick off upon a GitLab event, the GitLab project must also be configured to send those events to Jenkins.

Navigate to GitLab project / Settings / Integration and create a new webhook like the following example. Make sure to use the URL shown by the Jenkins GitLab plugin and select the relevant events (in this example, only merge requests and comments).

Test

Everything is ready and should work as expected.

To test the setup, create a branch and push some changes. No build should start on Jenkins yet.

When you create a Pull Request, a Jenkins build should start automatically and the Pull Request should show that the pipeline is running.

Under the Pull Request, GitLab shows all Jenkins invocations (as GitLab pipelines) along with their status:

CloudU Certification

Cloud Computing as an enabler of the Digital Business revolution

Digital Business Revolution

As an IT professional, there could not be a better time for me to live in, as I am directly seeing the effects of today's digital business revolution.

Every day we learn of new opportunities and new businesses created thanks to the multitude of applications, services and, more generally, the technology that can be leveraged.

Even traditional activities, which we have been doing for ages without any IT support, now have some sort of IT system supporting them.

As a whole, technology has been an enabler for developing new business ideas, with key factors such as:

  • The boom of the mobile user population and the consequent business opportunities.
  • IoT, which shows a new use case every day.
  • Big Data & Analytics, which allow us to extract value from the huge amount of data produced every second.
  • Virtual worlds, where examples like Pokémon GO showed how people tend to live more in the virtual world than in real life.
  • Cryptocurrency, with Blockchain technology and its most popular implementation, Bitcoin, where several products and services are now only a “bit” away.

Regardless of the size of the business, every organization recognizes that flexibility and agility are paramount characteristics that cannot be overlooked; otherwise, chances are high that IT systems end up no longer aligned with business objectives, with the related consequences.

In recent years, the disruptive technology that has enabled agility and time to market is Cloud Computing.

One great benefit of the Cloud is that small companies and startups can try out their ideas at a fraction of the cost.

With the Cloud Computing paradigm, self-starters suddenly had the opportunity to implement their ideas without the need for an upfront investment to set up an IT infrastructure; they could quickly develop and deploy a new product and shut everything down if the experiment was unsuccessful.

CloudU Certification

If you are interested in the Cloud topic, I suggest you look at the CloudU certification program organized by Rackspace Cloud University.

By going through the course package, you’ll learn the fundamentals of Cloud Computing, which are valid across the various Cloud providers since the program is vendor neutral.

Once you’re ready, you can take the online exam and, if you pass it, you’ll receive a certificate as I did.

Renato_Del_Gaudio-CloudU-certificate-41cc489f-f8ac-4169-aa81-66b8362fd315

Downloadable version

Networking at home

Sometimes you need to do things just for fun and not because there is a real need.

This is the case with my home network, which I have over-complicated ON PURPOSE.

network-diagram

  • Two different private LANs (192.168.1.0/24, 10.0.0.0/24)
  • 8 Wi-Fi networks
  • Internal DNS service to resolve hostnames under a subdomain (intranet.renatodelgaudio.com)
  • Internal DHCP service (the one provided by the router is disabled) that automatically updates the entries in the DNS (picture below)
  • Two DNS machines behind a clustered load balancer that redirects clients’ DNS queries in a round-robin fashion
  • Two NAS devices with over 9 TB of available storage
  • Home server running 24×7, hosting around 20 VMs, backed by an APC battery

 

Cluster of load balancers to serve DNS queries (this is really over-engineering, done only for fun!!)

DNS-cluster-loadbalancer

My first AWS event on BigData & Analytics

I am glad that I joined the AWS event about Big Data and Data Analytics, as it was definitely an interesting day.

I found the sessions useful, as they offered a mix of architectural approaches, demos and best practices for working with Big Data and Amazon Web Services (AWS).

A picture taken during the talk shows the AWS cloud services that can be used across the typical workflow:

Collect ==> Store ==> Analyze

aws-services-by-type

With data analytics we usually have to deal with unstructured data that must be turned into structured data before it can be analysed.

The next picture instead shows how those services can be combined to process data depending on its nature.

aws-services-combined

Particular focus was given to the following services:

My thoughts about Amazon Kinesis

Amazon Kinesis consists of a set of services to process streaming data in the cloud.

The first service launched was Kinesis Stream, where the data stream capacity (the number of shards) must be determined at creation time and charges apply accordingly.

For further reading, the following source contains all the key concepts, including the following diagram offering a visual representation of what Kinesis is:

Once the records are sent to the stream from the producers, the Kinesis applications (consumers) can consume them.

No autoscaling in Kinesis Stream yet

Surprisingly, the data stream capacity does not automatically scale up/down; a manual stream resizing operation must be carried out to increase/decrease the number of shards whenever desired. This is a straightforward operation from the AWS console, but I suppose that automatic scale up/down would be useful.
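For completeness, the resize can also be triggered programmatically rather than from the console. A minimal sketch using the AWS SDK for Java, assuming an SDK version that exposes the UpdateShardCount operation; the stream name and target shard count are hypothetical:

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.ScalingType;
import com.amazonaws.services.kinesis.model.UpdateShardCountRequest;

// Sketch only: resizes a stream by hand, since there is no built-in autoscaling.
public class ResizeStreamExample {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        kinesis.updateShardCount(new UpdateShardCountRequest()
                .withStreamName("my-stream")                 // hypothetical stream name
                .withTargetShardCount(4)                     // hypothetical target capacity
                .withScalingType(ScalingType.UNIFORM_SCALING));
    }
}
```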

Partition key as mandatory input

The decision to make the partition key a mandatory input to put/get records from the stream is arguable. It is clear that the partition key is used to provide the record grouping feature, but in my opinion it should be optional, since grouping/sorting might not be a requirement for every use case.
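To illustrate the point, here is a minimal producer sketch using the AWS SDK for Java; the stream name and partition key are hypothetical placeholders, but note that the partition key must always be supplied:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.kinesis.model.PutRecordResult;

// Sketch only: sends one record to a Kinesis stream.
public class KinesisProducerExample {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        PutRecordRequest request = new PutRecordRequest()
                .withStreamName("my-stream")   // hypothetical stream
                .withPartitionKey("sensor-42") // mandatory: drives shard selection and grouping
                .withData(ByteBuffer.wrap("hello kinesis".getBytes(StandardCharsets.UTF_8)));

        PutRecordResult result = kinesis.putRecord(request);
        System.out.println("Stored in shard " + result.getShardId()
                + " with sequence number " + result.getSequenceNumber());
    }
}
```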

Kinesis Firehose

Leveraging Kinesis Stream, AWS has built Kinesis Firehose, which allows users to easily create a stream delivering data directly to Amazon Redshift or Amazon S3.

 

 

With Firehose, in my opinion, AWS has made a great step forward. When creating a delivery stream, there is no sizing to specify for the underlying shards nor any partition key to provide as an input parameter, since the service handles it transparently.
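For comparison with the Kinesis Stream producer sketch above, here is a minimal Firehose equivalent using the AWS SDK for Java; the delivery stream name is a hypothetical placeholder, and note that no shard count or partition key appears anywhere:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehose;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClientBuilder;
import com.amazonaws.services.kinesisfirehose.model.PutRecordRequest;
import com.amazonaws.services.kinesisfirehose.model.Record;

// Sketch only: sends one record to a Firehose delivery stream (no shards, no partition key).
public class FirehoseProducerExample {
    public static void main(String[] args) {
        AmazonKinesisFirehose firehose = AmazonKinesisFirehoseClientBuilder.defaultClient();

        firehose.putRecord(new PutRecordRequest()
                .withDeliveryStreamName("my-delivery-stream") // hypothetical delivery stream
                .withRecord(new Record().withData(
                        ByteBuffer.wrap("hello firehose\n".getBytes(StandardCharsets.UTF_8)))));
    }
}
```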

Why a temporary S3 bucket to deliver to Redshift?

My only remark regarding Kinesis Firehose is the intermediate S3 bucket that must be specified when configuring it against a Redshift target. In my opinion, the intermediate S3 bucket should be transparent to the user. Why should a user have an additional S3 bucket in their account just because Firehose has been implemented with an intermediate store in mind?

The following screenshot shows the concerned configuration:

kinesis-firehose.creation

Grid Computing for a better future

Everyone should donate some processing power

In this post, it’s not my intention to talk about the technology behind grid computing, as there are already a large number of sources where you can learn about this topic.

My main goal is instead to raise awareness and promote the projects relying on grid computing that aim to improve human life, for example by discovering treatments for diseases such as HIV, Ebola, cancer and so on.

Just to give a minimum of context, the following minimalistic definition of grid computing taken from Wikipedia is enough:

Grids are composed of many computers acting together to perform large tasks.

In other terms, when there is a very complex computational problem to solve, putting together hundreds of thousands of PCs working together is in most cases the only feasible way to go.

Examples of use cases requiring high computational power range from art to cryptography, finance, games, maths, molecular science and so on.

For this reason, the need arose to create a distributed computing platform to be used by these demanding projects.

One of the distributed computing platforms that is becoming more and more popular is the one developed at the University of California, Berkeley, known as BOINC:

Berkeley Open Infrastructure for Network Computing

Today there are a number of research projects that leverage BOINC and distributed computing to fight cancer and many other global issues.

The key message is that all those projects need our support and we should all contribute as it is for a good cause.

All you have to do is download a piece of software and let it run. It will receive some work, process it when your PC, laptop or phone is idle (not using any resources), and eventually return the results to the researchers, who will analyse them.

I think it makes no sense to write a tutorial here on how to proceed, as the links below contain enough information on what to do; but should you encounter any problem or have a question on this topic, I’ll be delighted to help you, as this is my active contribution to this cause.

My Contributions

Here are the actions I took to contribute to this cause. You can do the same or adapt them to your possibilities.

Download the BOINC software

The first step is to download the software from BOINC

I installed it on

  • my Android phone
  • my laptop
  • my desktop
  • a dedicated VM running on my VMware ESXi host (24×7)

Although you can configure the program to run only when your device is not doing anything else, you need to be aware that there will be slightly higher power consumption, as intensive CPU work is not free. This is all about donations.

In my case, the power consumption of a VM running 24×7 (with limited CPU) is only a few watts:

boinc_consuption

Join a Project

You can either join individual projects or decide to use an Account Manager like BAM (as I did), where you centrally manage the projects that you wish to contribute to.

At the current time, I’ve joined

My contributions are visible here

Spread the word

I wrote this article 🙂

 

AWS Route53 IP updater utility program

I started exploring AWS (Amazon Web Services) back in 2007 and since then I’ve been keeping an eye on their services and offerings, as I believe they play an important role in the cloud space.

This post is related to the AWS Route53 service, which allows you to leverage the distributed Amazon DNS.
As an AWS Route53 user and owner of a domain, you can easily associate domain names with public IPs and, as usual, you get charged on a usage basis.

It is possible to use the AWS Route53 service to point hostnames either to EC2 components like instances and ELBs (Elastic Load Balancers) or to external machines.

Since I have been using AWS Route53 to point my domain to a machine with a dynamic IP, I was forced to log into the AWS console to update the configuration every time a new IP was assigned to my machine.

I was looking around for a utility like No-IP, where you install a program that detects your public IP and updates your AWS Route53 record.

Surprisingly, I could not find anything that suited my needs, so I decided to write my own solution leveraging the AWS Route53 API.

I’ve implemented this small utility in Java, and the source code along with the binaries is available to everyone for download on GitHub.
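The core idea can be sketched in a few lines with the AWS SDK for Java. This is not the actual implementation from the GitHub repository, just a minimal illustration; the hosted zone ID and hostname are hypothetical placeholders:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.Collections;

import com.amazonaws.services.route53.AmazonRoute53;
import com.amazonaws.services.route53.AmazonRoute53ClientBuilder;
import com.amazonaws.services.route53.model.Change;
import com.amazonaws.services.route53.model.ChangeAction;
import com.amazonaws.services.route53.model.ChangeBatch;
import com.amazonaws.services.route53.model.ChangeResourceRecordSetsRequest;
import com.amazonaws.services.route53.model.RRType;
import com.amazonaws.services.route53.model.ResourceRecord;
import com.amazonaws.services.route53.model.ResourceRecordSet;

// Sketch only: detects the current public IP and upserts an A record in Route53.
public class Route53IpUpdaterSketch {
    public static void main(String[] args) throws Exception {
        // 1. Detect the public IP (checkip.amazonaws.com simply echoes the caller's address).
        String publicIp;
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new URL("https://checkip.amazonaws.com").openStream()))) {
            publicIp = in.readLine().trim();
        }

        // 2. Upsert the A record. Hosted zone ID and hostname are hypothetical.
        ResourceRecordSet recordSet = new ResourceRecordSet()
                .withName("home.example.com.")
                .withType(RRType.A)
                .withTTL(300L)
                .withResourceRecords(new ResourceRecord(publicIp));

        AmazonRoute53 route53 = AmazonRoute53ClientBuilder.defaultClient();
        route53.changeResourceRecordSets(new ChangeResourceRecordSetsRequest()
                .withHostedZoneId("Z1234567890ABC")
                .withChangeBatch(new ChangeBatch(
                        Collections.singletonList(new Change(ChangeAction.UPSERT, recordSet)))));

        System.out.println("Updated home.example.com to " + publicIp);
    }
}
```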

Additional use case:

Another interesting use case for my utility is when you run one or more AWS EC2 instances and you would like to point some hostnames to those instances.

One possible way to do so would be to reserve one or more AWS Elastic IPs and then configure Route53 to point your hostname to the reserved public IP.

Unfortunately, only the first Elastic IP is free, as AWS will charge for the additional ones.

Installing this utility program on your AWS EC2 instance will do the trick, as the public IP associated with the EC2 instance will be automatically registered in AWS Route53 without the need to reserve an Elastic IP (and get charged) for the instance.
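On an EC2 instance, the public IP can be read from the instance metadata service instead of an external lookup. A small variation of the detection step, assuming the classic (v1) metadata endpoint is available:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Sketch only: on EC2, the instance metadata service exposes the current public IPv4 address.
public class Ec2PublicIpExample {
    public static void main(String[] args) throws Exception {
        URL metadata = new URL("http://169.254.169.254/latest/meta-data/public-ipv4");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(metadata.openStream()))) {
            System.out.println("Current public IP: " + in.readLine().trim());
        }
    }
}
```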

Of course, this approach is only feasible for EC2 instances where you have full control, and not for managed services such as the load balancer.