Monday, June 25, 2007

Continuous integration infrastructure, which OS to use

I have been on several projects lately where we have been using continuous integration-servers like CruiseControl and Continuum. This part of the development infrastructure is crucial, unless you are all by yourself (unlikely) you will need some server that will continuously integrate, build and deploy your code. This has been pretty much been a de-facto standard for quite a while now, and is not what I am trying to address here. What I am trying to address is, does the OS used for continuous integration-servers (CI servers) really matter? Yes it does and it should not run on Windows for reasons that should become obvious after reading this article.

It really matters, it is really not the question of which operating-system to use, but which operation system not to use. For some reason, it is not uncommon to configure and set-up CI-servers on Windows early in early development stages. I know several companies that only supports W2K/XP as part of their central operations, or Unix of course. But in case of Unix, the servers are often quite costly, cannot be dedicated for running CI, needs to be ordered and configured centrally and so on. Lately more and more companies are also supporting Linux as part of their infrastructure base, but it is not as widely supported as W2K/XP (I don't know anyone running Vista yet).

It is also quite common to use Windows as development desktop, even in environments that does not rely on windows-specific features. I am primarily developing in Java and for the most I use Windows desktops for development at client sites, not because I want to, but becose this is the company standard at most of my clients where I do contracting. At my home office I run Linux (Ubuntu) and Laptop runs Mac OSX. Given the fact that you will most likely end-up developing code using a Windows box when you are at work, it is also quite possible that it is easier to get access to a spare box for running one or several instances of your preferred CI-server if you go for Windows.

Well, so you go about and install CruiseControl, Continuum or whatever other software that you want to use, on Windows. Well, I have tried the same quite a few times now, and I can tell you, it simply doesn't work on Windows. Windows cannot and should not be used ever to run CI-server, because you will most certainly end up in a situation that some file is locked and that the CI server cannot run its goal because it cannot delete a file locked by some other process.
Another thing is long pathnames. Windows doesn't support pathnames that are longer than 255 characters, and I guarantee you, if you have a deep project structure you will almost certainly run into this problem. A third issue that may become a problem is access. When accessing a windows-box, people has this mental image that they will need to use explorer to actually view and edit contents of directories and files, which is just stupid. People will need to access the box when the test fails on the CI-sever and not locally, which is a quite common case (hard coded paths, locale-issues and network access in integration tests are common things). This means that developers will need to use RemoteDesktop to access the server. Well, only a limited amount on connections are available, and it is heavy on the server to support many simultaneous connections so this will quickly bring you into trouble as well. All these issues makes Windows an no-go for CI servers, it is simply not possible to maintain stable CI servers without file-locking, pathname-problems and locking because of simultaneous access. You will need a full time employee to do the maintenance or go with Unix or Linux.

What should I do?

Well, an option is to take any Unix/Linux box available and use that for continuous integration instead. It will make all the problems mentioned in the previous paragraph go away. File locking is not a problem, pathnames can be as long as you want (almost true) and since everyone is using ssh, you could support a high number of simultaneous users viewing the build-log files using terminal access (which is very lightweight) instead of RemoteDesktop. The short answer is go with Unix based system. But what if that is not possible?
That was the case at previous engagement. Going for Linux was a no-op. And if we wanted to go for Unix, Solaris was the way to go. But we would never get root-access to that box and they have to order it first....*sigh*. Well, first of all.

My experience is that configuring CI-servers are generally best left as a task to the development team, preferably someone that has done this a couple of times before. I don't think it is possible to put these kind of things on order at your system-administration department, those responsible for test and production environments. It is probably outsourced anyway, so you will have to fill out a form before you are even allowed to all them. So much for agile.
Well, so what did we do. We went for Virtualization using VMWare. VMWare gives you the ability to run any number for operating systems, called the guest OS, by using a player. The OS running the player is referred to as the the host OS. Here we were able to use a Windows-box running Ubuntu-Linux (my choice of course :-) as a guest operating system. After the guest operating system has been started it behaves looks an performs like a full-blown standalone OS without all the quirks mentioned earlier. The Virtual image was created with VMWare Workstation and may be executed with either VMWare Workstation or VMWare Player which is free. The process of installing the operation system (Ubuntu Feisty) is identical to what you would do when you install it from scratch. I will not cover the installation details here, but briefly reflect on some additional benefit that may be provided when using VMware when running CI-servers (these issues may also apply to running appliances in general). Note that this also makes Windows boxes a viable alternative as a host OS.

Benefits of running a virtualized CI-server

In my experience, CI servers are not treated as production-goods my most organizations, although they should, This often results in decisions like, hey we need to use that box we gave you last week, can't you just use this one instead? The point is, it should be really easy to change hardware. In VMWare your entire installation is basically just a couple of files in a directory. If you copy and transfer those files to a new box, you can start the guest-OS using the player there instead. The only thing you will have to install on the new box is the VMWare Player.

Staging or upgrading such environments are often also problematic. This is actually production software for your development team. If you want to upgrade the CI server with lets' say a new version of Continuum, you can do so my taking a snapshot of your current image. Copy that image and start it up on your desktop. Here you can do whatever changes that you want to the guest operating system, including upgrading to the newest version of Maven and Continuum. You can even dry test the build on your local machine. If it work, you are all set to offload an updated snapshot to the CI-production server. By using VMWare in this way, it really gives the term "staging" true meaning.
Based upon experience for several years running large projects with CI servers, this is really the best approach. Having a build CI-server going to a stop caused by reasons not related to code or test uncovering actual problems over and over a again will quickly render your CI infrastructure useless, people will stop to trust it, and this is not what you want.