Saturday, December 5, 2015

Choosing a Linux Distribution for Your Web Server

http://inthebox.webmin.com/choosing-a-linux-distribution-for-web-server

Several years ago, I wrote a post about choosing a Linux distribution for a web server. It’s been so long that I don’t even remember where I posted it (so, unfortunately, I can’t link to it), so it’s probably time to revisit the subject, as it comes up pretty frequently in our forums and in conversations with customers. The choice is somewhat more obvious today than it was back then; I recall covering at least five distributions (and, I believe, Solaris and FreeBSD) in that previous article. Today, the leaders in the server operating system market are pretty clear, at least for web deployments on Open Source platforms such as node.js, Ruby, Python, PHP, Perl, or Go. Because there are clear market leaders, I’m going to focus my attention on just three Linux distributions: CentOS, Debian, and Ubuntu.
I will briefly explain why there are only three distributions most people should be considering for server deployment, and I’ll also briefly mention some situations where you might want to branch out and consider other options.
So, let’s get on with it, and pick out the right Linux distribution for your new web deployment!

Lifecycle Is Really, Really, Incredibly Important

The average server remains in service for over 36 months. I have a couple of machines that have been in use for over six years without an OS upgrade! Upgrading the Operating System on a production server, even when a remote or in-place upgrade option is available, is prone to breaking existing services in unpredictable ways, or at least in ways that are difficult to predict without a very long and time-consuming audit of all of the software running on the system and how all of the pieces interact and how they will change when upgrading to newer versions.
Thus, one goal when selecting an OS for your server should be to ensure you have plenty of time between mandatory upgrades. Of course, nothing stops you from upgrading earlier than you need to, if you want newer packages and have the time to perform the upgrade or to migrate to a new server before the OS reaches its end-of-life date. What we are more concerned about is how soon that decision will be forced on us.
With regard to lifecycle of the major Linux server distributions, CentOS (and RHEL) is, by far, the king, with a 10 year support period. Ubuntu LTS is second with a 5 year cycle. Debian is somewhat unpredictable, but always has at least a 3 year lifecycle; sometimes there may be an LTS repository that will continue support for a given version.
Non-LTS Ubuntu releases should not be considered for server usage under any circumstances, as the lifecycle of ~18 months is simply too short. Likewise, Fedora Linux should not be considered for any server deployment.
The end-of-life for current CentOS releases is as follows:
CentOS 5: March 31, 2017
CentOS 6: November 30, 2020
CentOS 7: June 30, 2024
For Ubuntu LTS:
Ubuntu 10.04 LTS: April 2015
Ubuntu 12.04 LTS: April 2017
Ubuntu 14.04 LTS: April 2019
For Debian:
Debian 6 (with LTS support): February 2016
Debian 7: late 2016 (estimated)
Lifecycle Winner
CentOS by a five year landslide. If you don’t know when you’ll have the time and inclination to upgrade your server OS or move to a new server, CentOS may be the best choice for your deployment, if the other deciding factors don’t sway you to something else. Not having to think about server upgrades until 2024 is pretty cool.
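As a rough check on that gap, here is a small shell sketch that counts the calendar months between this article's date (December 2015) and the end-of-life dates listed above. The function name months_between is made up for illustration, and the calculation ignores the day of the month, so treat the results as approximate.

```shell
#!/bin/sh
# months_between FROM TO: whole calendar months from FROM to TO (both YYYY-MM).
# Hypothetical helper for illustration; it ignores the day of the month.
months_between() {
    from_year=${1%-*}; from_month=${1#*-}
    to_year=${2%-*};   to_month=${2#*-}
    # Strip a leading zero so "06" is not read as an octal number.
    from_month=${from_month#0}; to_month=${to_month#0}
    echo $(( ($to_year - $from_year) * 12 + $to_month - $from_month ))
}

# Months of support left as of December 2015:
months_between 2015-12 2024-06   # CentOS 7: prints 102
months_between 2015-12 2019-04   # Ubuntu 14.04 LTS: prints 40
```

The difference of 62 months is the "five year landslide" in concrete terms.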

Package Management

The reason a long lifecycle for your server operating system is so important is that you need to be able to count on your OS to provide security updates for the useful life of your server. And the method by which software updates, particularly security updates, are provided is vitally important. It needs to be easy, reliable, and preferably something you can automate without risk.
All of the distributions in this comparison have excellent package management tools and infrastructure. In fact, they are all so excellent that I was tempted to ignore this factor altogether. But, there are some subtle differences, particularly in the available package selection. And, if you’re considering going outside of the Big Three Linux distributions covered here, or are considering a BSD or Windows for your deployment, you should definitely consider how updates will be handled, as the picture is not nearly as pleasant on every distribution and OS, and many cannot be reliably automated.

apt

The package manager invented for Debian and also found on Ubuntu is called apt. It is a very capable, fast, and efficient package manager that handles dependency resolution, as well as downloading and installing packages from both the OS-standard repositories and third-party repositories. It is easy to use, has numerous GUIs for searching and installing packages, and can be automated relatively reliably. apt installs and manages .deb packages. It is reasonably well-documented, though it has some surprising edge cases.

yum

Yum, aka Yellowdog Updater, Modified, began life as the Yellow Dog Updater, developed for the Yellow Dog Linux distribution (a special build of Red Hat/Fedora for Macintosh hardware), and was then forked and enhanced by Seth Vidal. yum installs and manages RPM packages, and is found on CentOS, Fedora, RHEL, and several other RPM-based distributions. There are both command line and GUI utilities for working with yum, and it is well-documented.

Which is better?

Choosing between package managers is difficult, as both mostly have the same basic capabilities, and both are reasonably reliable. They both have been in use for many years, and have received significant development attention, so they are quite stable. I believe you could easily find fans of both package managers, and I wouldn’t really want to argue too strongly either way.
I’ve worked extensively with both, and the only times I had a preference were when I was creating my own repositories of packages and when I needed to customize the package manager. In both cases, yum was much more hacker-friendly. Creating yum repositories is as simple as putting some files on a web server and running the createrepo command. Creating apt repositories is much more time-consuming, and requires learning a number of disparate tools and creating scripts to automate management of the repositories.
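To make the yum side of that comparison concrete, here is a minimal sketch of the workflow: copy RPMs into a directory your web server already serves, run createrepo on it, and point clients at it with a .repo stanza. The function names, the /var/www/html/myrepo path, and the repo.example.com host are all placeholders of my own, and gpgcheck=0 is only there to keep the sketch short.

```shell
#!/bin/sh
# publish_repo RPM_DIR REPO_DIR: copy RPMs into REPO_DIR and (re)build its metadata.
# REPO_DIR is assumed to be a directory your web server already serves.
publish_repo() {
    mkdir -p "$2" || return 1
    cp "$1"/*.rpm "$2"/ || return 1
    # createrepo scans the directory and writes the repodata/ index that yum reads.
    createrepo "$2"
}

# client_repo_stanza NAME BASEURL: print a .repo stanza pointing yum at the repository.
# gpgcheck=0 accepts unsigned packages; sign your packages and enable it for real use.
client_repo_stanza() {
    cat <<EOF
[$1]
name=$1
baseurl=$2
enabled=1
gpgcheck=0
EOF
}

# Example usage (hypothetical paths and host name):
#   publish_repo ./rpms /var/www/html/myrepo
#   client_repo_stanza myrepo http://repo.example.com/myrepo > /etc/yum.repos.d/myrepo.repo
```

Re-running publish_repo after adding packages rebuilds the metadata, which is essentially all the ongoing maintenance a simple repository needs.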
Package Management Winner
yum on CentOS, by a small margin, if you plan to host your own package repositories. If you have no need for your own repos, or are already familiar with apt, either as a user or developer, it is a tie.

Package Selection

Closely related to package management is package selection. In other words, how many packages are readily available for your OS from the system standard repositories, and how new are those packages? Here, there are some interesting differences in philosophy between the various systems, and those differences may help you choose.

CentOS

CentOS package selection in the standard OS repositories is the smallest, by far, of these three distributions. In the Virtualmin repositories, we have to fill in the gaps by providing a number of packages that we consider core to a hosting service. CentOS is missing things like the ClamAV virus scanner for email processing, the ProFTPd FTP server (among the most popular and feature-filled FTP servers available), and others. This is an annoyance which the other two distros do not make you endure. CentOS has about 6,000 packages in the standard repository.
On the other hand, CentOS has the Fedora EPEL repositories, which provide Fedora packages rebuilt for CentOS. This expands the selection of available packages on CentOS with a couple thousand extra packages. One thing to keep in mind is that EPEL is not subject to the lifecycle promises of the official CentOS repositories, and relies on volunteer contributions to keep the packages up to date (much like Debian). Most of the popular packages are pretty well-maintained, but I have occasionally seen security updates fall behind in the EPEL repos for some packages for older versions of CentOS, which can be worrying. I generally advise selectively enabling EPEL repositories, by using the includepkgs or exclude options within the repo configuration file. In this way, you’ll know exactly which packages have come from EPEL and which ones need extra caution as time passes to ensure they are kept up to date and secure.
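As an illustration of the includepkgs approach, the sketch below adds an includepkgs line under a chosen section of a repo file, so that yum only sees the listed packages from that repository. The helper name restrict_repo and the package patterns are my own examples; it assumes the usual /etc/yum.repos.d/epel.repo layout with an [epel] section, and it prints the result rather than editing the file in place so you can review it first.

```shell
#!/bin/sh
# restrict_repo REPO_FILE SECTION PATTERNS: print REPO_FILE with an includepkgs
# line added directly under [SECTION]. PATTERNS is a space-separated list of
# package names or globs. Hypothetical helper for illustration.
restrict_repo() {
    awk -v section="[$2]" -v pkgs="$3" '
        { print }
        $0 == section { print "includepkgs=" pkgs }
    ' "$1"
}

# Example usage (as root, on a system with EPEL installed):
#   restrict_repo /etc/yum.repos.d/epel.repo epel "clamav* proftpd*" > /tmp/epel.repo
#   mv /tmp/epel.repo /etc/yum.repos.d/epel.repo
```

The exclude option works the other way around: everything in the repository stays visible except the packages you list.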
CentOS packages in the latest release also tend to be older than those found in the latest Ubuntu release. Often this merely depends on who has had a more recent major version release, and for the moment CentOS 7 has some newer packages than the latest Ubuntu 14.04 LTS release. But, the latter also has newer versions of some important packages despite being released earlier.
CentOS is particularly strong (or weak, depending on how you look at it) about keeping the same version of packages throughout the entire lifecycle of the OS release. Thus, CentOS 7 will have Apache version 2.4.6 throughout the entire ten year life of the OS. Security updates will be applied as patches to that version of Apache, rather than adding new versions to the repository. This ensures compatibility throughout the entire lifecycle, and makes it much more predictable that your server will continue to function through security updates. However, it also guarantees that in five years you’ll be wishing for newer versions of PHP, Ruby, Perl, Python, MySQL or MariaDB, and Apache. It is a double-edged sword, and for some people the cost is too high.
In addition to the EPEL package repositories, there is also the Software Collections (SCL) repository. This repository includes updated versions of popular software, mostly programming languages and databases. There is currently SCL support for CentOS 6, but it is likely to be available for CentOS 7 as the packages found in CentOS 7 become more dated. This can allow you to continue to use an older OS version while still utilizing modern language and database versions. You can read more about the Software Collections in the CentOS Wiki.

Ubuntu

Ubuntu, with all repositories enabled (including universe), has about 23,000 packages. As you can see, there are a lot more packages available for Ubuntu than CentOS. But, many of the less popular packages are considerably less well-maintained. Sticking to the core repositories (main and security) may be advisable, in the same way that avoiding general use of EPEL on CentOS is advised. It’s best to know your packages are being well-cared for and that lots of other people are using those packages, so bugs are found quickly.
In our Virtualmin repositories for Ubuntu, we don’t have to maintain any binary packages aside from our own programs, which is indicative of how well-equipped the standard Ubuntu repositories are for web hosting deployments. It is possible to install nearly anything you could want or need, and in a relatively recent version, on the latest Ubuntu release. Ubuntu is also less strict about keeping the same version, and more likely to provide multiple versions, of common packages, like Apache, PHP, and MySQL or MariaDB. This makes Ubuntu a favorite among developers who like to stay on the bleeding edge of web development tools like PHP, Ruby On Rails, Perl Dancer, Python Django, etc.
In short, Ubuntu has far more packages, and generally more recent packages, than CentOS. Ubuntu usually has more recent packages than Debian stable releases, as well, and a better update policy in terms of stability. Ubuntu’s update policy is not as strict or predictable as that of CentOS, but it is unlikely you will run into compatibility problems from the minor version changes that can happen on Ubuntu with some of the core hosting software.

Debian

Debian has the most packages in its standard repositories, with something along the lines of 23,000 packages. The popular packages tend to be well-maintained by a veritable army of volunteers and using excellent infrastructure to assure quality. However, many of the packages will be quite old, at any given time. And there is less assurance of compatibility between updates in Debian than in CentOS, or even the core Ubuntu repositories.
Given Debian’s short lifecycle vs CentOS, and Ubuntu’s ability to tap into the universe repository for access to roughly the same number and quality of packages as Debian, it is hard to argue that Debian leads in this category, even though historically its huge selection of packages was hard to beat. Debian’s stable release also tends to have somewhat older packages, even in the beginning of its lifecycle, which can be a negative for some deployments.
Package Selection Winner
Ubuntu, if sheer number and newness of packages is most important. Or, possibly CentOS, by a small margin, if you prefer stability over newness, and want to ensure your software never stops working due to incompatible changes in software running on the system.

Upgrading

I recommend not upgrading servers to entirely new versions of the OS frequently, generally speaking, since it can be time-consuming and it can introduce subtle malfunctions that can be hard to identify and fix. If you do need to upgrade, a valuable feature is the ability to upgrade without physical access to the system. This can be somewhat nerve-wracking, for servers you don’t have easy hands-on access to, but some distributions are better at it than others.

Debian and Ubuntu

apt has long been an accepted method of performing an OS upgrade on Debian, since long before Ubuntu even existed. The apt-get dist-upgrade command will handle not just dependency resolution, but it will also handle packages that have been made obsolete by newer packages or situations where various libraries have moved to new packages. This allows a system to be upgraded to a new version with very little disruption, and because it has been in use for many years, it is generally pretty reliable and a well-supported method of upgrading the system.
The process of upgrading Debian or Ubuntu using apt is quite similar, though in my experience Debian upgrades have historically been smoother than Ubuntu upgrades. There are a variety of reasons for this, but it is mostly because of the more conservative nature of Debian development, and because more Debian users run newer and older software together (mixing and matching repositories is more commonly done on Debian, both to get newer packages and for development purposes), so community testing of various package versions within each system version is broader, if not deeper. This is a historic difference, based on my own experiences with Debian and Ubuntu upgrades, and may be alleviated by the much larger number of Ubuntu users today.
The important thing here, however, is that upgrades on Debian or Ubuntu are a relatively painless affair, at least when compared to CentOS.
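For readers who have not done one, here is a rough sketch of what an apt-based release upgrade looks like on Debian: point the package sources at the new release codename, refresh the package lists, and let apt-get dist-upgrade sort out the rest. The function names are hypothetical, the wheezy and jessie codenames (Debian 7 and 8) are just an example, and a real upgrade should start with a backup and a read of the release notes. On Ubuntu, the do-release-upgrade tool wraps a similar process.

```shell
#!/bin/sh
# retarget_sources OLD NEW FILE: print FILE with release codename OLD replaced by NEW.
# Hypothetical helper; it is a plain text substitution, so review the output.
retarget_sources() {
    sed "s/$1/$2/g" "$3"
}

# upgrade_release OLD NEW: sketch of an in-place release upgrade (run as root).
# Defined here but not called; invoke it deliberately, e.g. upgrade_release wheezy jessie
upgrade_release() {
    cp /etc/apt/sources.list /etc/apt/sources.list.bak &&
    retarget_sources "$1" "$2" /etc/apt/sources.list.bak > /etc/apt/sources.list &&
    apt-get update &&
    apt-get upgrade &&       # upgrade what can be upgraded without removals first
    apt-get dist-upgrade     # then handle obsoleted and relocated packages
}
```

Keeping the sources.list.bak copy around makes it easy to see exactly what was changed if something goes wrong.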

CentOS

Upgrading a CentOS system is more cumbersome. While it is possible to perform an OS upgrade with yum, it is not currently recommended or supported by the CentOS developers, so remote upgrades are very challenging. In fact, there isn’t even a very clear path for upgrading from CentOS 6 to CentOS 7 while sitting at the console. There are new tools in development for handling OS upgrades using yum (fedup on Fedora and redhat-upgrade-tool on RHEL/CentOS), which will likely eventually provide a reasonable upgrade process. However, I have never seen an upgrade using this process work without significant manual correction of issues after the upgrade process completes. I would not trust this method to upgrade a remote system unless I had KVM access and remote hands available in the data center to insert a rescue CD, should it come to that.
In short, CentOS should be considered a “cannot upgrade” OS for servers in remote locations. The only tools for performing remote upgrades are very early alpha quality at best and are not recommended by their developers for production systems.
Upgrade Winner
Debian, because of its long history of users upgrading via apt and its ideology of mixing and matching packages from various repositories, relying on the dependency metadata of the packages to allow them to reliably interoperate. Ubuntu provides a reasonable upgrade path using the same mechanism, and is a very close second. CentOS isn’t even in the game, and cannot be upgraded remotely via any reasonable mechanism.

Popularity

Ordinarily, I don’t recommend looking to popularity as a major deciding factor in choosing software, though for a variety of reasons, it does make sense to choose tools that are used by a reasonably large community. This is especially true for Open Source software.
Popular software will have more people using it, more people asking and answering questions about it online, and more people who are experts or at least comfortable working with it. This ensures you can get help when you need it, you’ll be able to find plentiful documentation, and you’ll be able to hire people with expertise if you get stuck in a situation that’s over your head.
On this front, things have shifted quite a bit in the past several years. CentOS once ruled the web server market, with a huge market share advantage. Among our many thousand Virtualmin installations, CentOS accounted for approximately 85%. Today, CentOS is still the most popular web server OS, with about 50% market share (depending on who you ask and which specific niche you’re talking about, this may vary quite a bit), with Ubuntu following closely behind with 30% (and in some niches it may even hold a larger share than CentOS), and Debian following behind with about 15%.
For the majority of users, any of these three systems has achieved the minimum level of popularity necessary to ensure you have a large and vibrant community of developers, users, authors, and freelancers available to make the system work well in a wide variety of use-cases. I would not hesitate to recommend any of these systems, but would caution against going outside of these three, because the user base of everything else is so very small.
Popularity Winner
CentOS, but it probably doesn’t matter all that much. With a 50% market share, you’re most likely to find the help you need when problems or questions arise. But, Ubuntu and Debian also have very large and active communities, and you’re likely to find all the help and documentation you need for any of them.

Your Experience Level

This one won’t have a winner that I can choose for you, and simply has to be decided based on your own experience level. And, it may even be the most important single factor. If you are an expert on one distribution, but a novice on the others, you would almost certainly want to choose the one you know over the ones you don’t (unless others on your team have different expertise).
If you use Ubuntu on your desktop or laptop machine, you may find that using an Ubuntu LTS release on the server provides the least friction; you can develop in roughly the same environment you’ll be deploying into. Likewise, if you are a Fedora user on the desktop, CentOS is an obvious choice, because they share the same philosophy, package manager, and many of the same packages (Fedora can be seen as the rapidly moving development version of CentOS, and most packages and policies that find their way into CentOS began by being introduced into Fedora a year or more before).
Of course, if you have no strong existing preference, it would be wise to consider your needs for your systems and compare the other factors in this article.
Experience Winner
You! You get to choose from some of the most amazing feats of software engineering ever to exist, representing millions of person-hours of development, and they’re all free and Open Source. We live in amazing times, don’t we?

Some Final Thoughts

If you’ve made it this far, congratulations! You now know I like all three of the most popular web server Linux distributions quite a bit, and think you will probably be pretty happy with any of them. You also know that CentOS is possibly the “safest” choice for new users, by virtue of being so popular on servers, but that Ubuntu is also a fine choice, especially if you use Ubuntu on the desktop.
But, let’s talk about the other distributions out there for a moment. There are some excellent but less popular distributions, and some of them even have a reasonable lifecycle and a good package manager with good package selection and a good upgrade process. I won’t start naming them here, as the list could grow quite long. I do think that if you have a Linux distribution that you are extremely fond of, and more importantly, extremely familiar with, and the rest of your team shares that enthusiasm and experience, you may be best off choosing what you know, as long as you do the research and make sure the lifecycle is reasonable (three years is a little short, but most folks would be OK with a 5 year lifecycle, especially if upgrading is reasonably painless).
There are also a variety of special purpose distributions out there that may play a role in your deployment, if your server’s purpose matches that of the distribution. Some good examples of this include CoreOS or Boot2docker, which are very small distributions designed just for launching Docker containers, and those containers would include a more standard Linux distribution. Those are outside of the scope of this particular article, but I’ll talk more about them in a future post.
And, if you’ll be installing the Virtualmin control panel on the system (and I think you should, because it’s the most powerful Open Source control panel and also has a well-supported commercial version), you’ll want to make sure it’s one of our Grade A Supported operating systems.