LXC containers overview

All user-installable applications run in LXC containers. A container is defined by a configuration file with the following settings:

  • Network type and interface configuration
  • OverlayFS storage layers
  • Mountpoints to store / load persistent data
  • Functional user and binary to be executed on startup
  • Environment variables propagated to the container namespace
  • Signal used to stop the container
  • TTY / console logging
  • Syscall capability restrictions
  • Event hooks
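
To make the list above more concrete, here is a minimal sketch of what such a configuration file might look like. It assumes the key names used by LXC 3.x and later; the container name `myapp`, all paths, the bridge name, the user IDs and the hook scripts are hypothetical and are not taken from an actual package.

```
# Hypothetical container "myapp"; every path, name and value is illustrative.
lxc.uts.name = myapp

# Network type and interface configuration
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up

# Root filesystem; the OverlayFS layers are assumed to be merged at this path
lxc.rootfs.path = /var/lib/lxc/myapp/rootfs

# Mountpoint for persistent data (host path, then path relative to the rootfs)
lxc.mount.entry = /srv/myapp/data srv/data none bind,create=dir 0 0

# Functional user and binary executed on startup
lxc.init.uid = 1000
lxc.init.gid = 1000
lxc.init.cmd = /usr/bin/myapp --serve

# Environment variable propagated to the container
lxc.environment = MYAPP_ENV=production

# Signal used to stop the container
lxc.signal.halt = SIGTERM

# Console logging
lxc.console.logfile = /var/log/lxc/myapp.log

# Capability restrictions
lxc.cap.drop = sys_admin

# Event hooks (hypothetical scripts)
lxc.hook.pre-start = /usr/local/bin/myapp-prepare
lxc.hook.post-stop = /usr/local/bin/myapp-cleanup
```

How the layers end up merged at the root filesystem path is deliberately left open here; one possibility is a pre-start hook, but that is an assumption of this sketch.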

The container must have at least one storage layer defined. The term layer is used because the storage is handled by the OverlayFS filesystem, which consists of groups of files overlaid on top of each other. This makes it possible to have one layer with the basic operating system, another layer with the Python runtime, and a final layer with the application using both. It is therefore not necessary to duplicate the same files in every container and waste disk space.
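
The layering described above can be illustrated with a plain OverlayFS mount. The following is a minimal sketch, not the mechanism actually used by the host tooling: the directory names are hypothetical, and the mount command is only printed because a real mount requires root privileges and existing directories. OverlayFS expects the topmost lower layer first in `lowerdir`, so the helper reverses a list given from the base system upwards.

```shell
#!/bin/sh
# Build the lowerdir option for an OverlayFS mount from layers listed bottom-up.
# OverlayFS treats the leftmost lowerdir entry as the topmost layer,
# so the argument order is reversed.
overlay_lowerdir() {
    lower=""
    for layer in "$@"; do
        # Prepend each layer so that the last argument ends up leftmost.
        lower="${layer}${lower:+:${lower}}"
    done
    printf '%s\n' "$lower"
}

# Hypothetical layer directories, ordered from the base system up to the application.
LOWER=$(overlay_lowerdir \
    /var/lib/lxc/shared/alpine \
    /var/lib/lxc/shared/python \
    /var/lib/lxc/shared/myapp)

# Print the mount command instead of running it.
echo mount -t overlay overlay \
    -o "lowerdir=${LOWER},upperdir=/var/lib/lxc/myapp/upper,workdir=/var/lib/lxc/myapp/work" \
    /var/lib/lxc/myapp/rootfs
```

The `upperdir` receives all changes the container writes, while the shared lower layers stay untouched and can be reused by other containers.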

Each layer is then packaged as a separate installable package and declared as a dependency of the packages which require it. Packages with the final application layer also contain the container configuration and the installation scripts which set up the application and interface it with other components installed on the host and in other containers.

Why LXC

There are several container runtimes, with Docker probably being the most popular nowadays. LXC was eventually selected instead, for the following reasons.

First and foremost, Docker contains a huge set of tools for use with various orchestrators and large-scale applications. The premise of Docker is to run multiple instances of application containers, where the individual instances are configured at runtime via command line parameters. The Docker daemon and its shim processes contain a lot of abstraction, which effectively obscures what is actually going on under the hood. LXC, on the other hand, keeps things close to the bare minimum and transparently uses the container techniques, syscalls and namespaces exposed by the Linux kernel. Containers in LXC are fully defined via configuration files and don't require any additional configuration at runtime. This can arguably be achieved in Docker as well via docker-compose, but that adds yet another layer of abstraction and is generally not suitable for scenarios where container images need to be added or removed on the fly.

Docker is written in the Go language, which is designed to create runtime-safe, statically linked executables. With the sheer amount of Docker capabilities, this unfortunately means that the whole Docker infrastructure occupies roughly 200 MB on the VM hard drive. The basic virtual machine image is designed to be as small as possible, so having a 200 MB container host on an operating system which alone occupies roughly 40 MB does not seem ideal. The LXC runtime, written in C, on the other hand occupies roughly 4 MB and doesn't need any other dependencies besides cgroupfs which, for performance reasons, is good to have installed anyway.

Due to Docker's approach, storage overlay layers cannot be easily managed by the container builder and instead depend on the number and order of directives in the Dockerfile recipe. This often leads to duplication of layers just because they are in a slightly different order than in another container. So if one container has the layer order system -> python -> java and another has system -> java -> nodejs, only the system layer will be shared and the java layer will be duplicated. This of course makes sense if reordering the layers makes the final content inconsistent, but that is not the case with Alpine Linux (there is one specific case where it is a problem, but it can be circumvented). With LXC, we therefore have full control over what goes into a single layer and in which order the layers are overlaid.
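
The effect of ordering can be shown with a small sketch. It assumes a simplified model in which two stacks can share only their common leading layers, as in the example above; the function name and the layer names are purely illustrative and do not correspond to any Docker or LXC tool.

```shell
#!/bin/sh
# Print the layers two stacks can share when sharing is limited to a common
# prefix. Both arguments are space-separated layer names, ordered bottom-up.
shared_prefix() {
    other=$2
    shared=""
    for layer in $1; do
        # First remaining layer of the other stack.
        head=${other%% *}
        # Stop at the first position where the stacks differ.
        [ "$layer" = "$head" ] || break
        shared="${shared:+${shared} }${layer}"
        # Drop the compared layer from the other stack.
        case $other in
            *" "*) other=${other#* } ;;
            *) other="" ;;
        esac
    done
    printf '%s\n' "$shared"
}

# The two stacks from the text: java appears in both, but at different positions.
shared_prefix "system python java" "system java nodejs"
```

For the two stacks from the text this prints only `system`. With independently managed OverlayFS layers, the java layer could be reused by both containers regardless of its position.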

Finally, the Docker maintainers explicitly refuse to implement a way to restrict the Docker daemon to private Docker repositories (registries) in the community edition of Docker. It is possible to have custom and even private repositories, but it is not possible to deactivate the default public Docker Hub.

The downsides of using LXC are that it requires a bit more knowledge about how Linux containers actually work, and that most 3rd party applications are distributed with a Dockerfile, which has to be rewritten for LXC. This is simplified by the lxcbuild tool, which aims to automate LXC container building using a Dockerfile-like syntax.

Container interfaces

Because an LXC container has all its environment variables, mounts, layers, init command and everything else it needs for starting, running and stopping defined in its config file, you are free to build the container in any way you like. The only requirement imposed by the host (virtual machine) infrastructure concerns containers with user-accessible web interfaces served over HTTP(S), which need to be proxied by the host's nginx HTTP server: the application needs to be reachable via plain HTTP on port 8080/TCP.

The container itself is normally handled as a service via the host's OpenRC init scripts, which call lxc-start and lxc-stop. Should the application in the container be user-accessible, it needs to register itself in the host's nginx HTTP proxy server via the hooks described in VMMgr hooks. A full example of an application container init script follows:

#!/sbin/openrc-run

description="CKAN container"

depend() {
	need ckan-datapusher postgres redis solr
}

start() {
	lxc-start ckan
}

start_post() {
	vmmgr register-proxy ckan
}

stop_pre() {
	vmmgr unregister-proxy ckan
}

stop() {
	lxc-stop ckan
}

See the openrc-run(8) manual page for reference.

If the application itself doesn't support connections via plain HTTP (e.g. it is a CGI/WSGI application), the container also needs to contain a web server which will proxy the connection. The recommended web server for this purpose is the nginx HTTP server, which is lightweight and can proxy all commonly used gateway interfaces. Bear in mind that in some cases the application needs to be able to infer its own HTTP host, so some header tuning for both the HTTP and CGI/WSGI protocols might be in order.

In case there are several components or services running within the same container (e.g. nginx HTTP server and PHP-FPM), it is advised to have them spawned and supervised by a lightweight init daemon. The recommended init system for LXC is s6, but if you are familiar with daemontools or runit, feel free to use them as well. In the worst case you can use OpenRC. Systems like Upstart, systemd or SysV init are not recommended because of their complexity or their inability to properly supervise the spawned processes.
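
As a rough illustration of the s6 approach, the sketch below prepares service directories for an s6 scan directory. The helper name, the scan directory path and the exact daemon command lines are assumptions made for this example; in particular, the PHP-FPM binary name differs between distributions and versions.

```shell
#!/bin/sh
# Create an s6 service directory whose run script executes the given command.
# Usage: make_service SCANDIR NAME COMMAND
make_service() {
    scandir=$1
    name=$2
    shift 2
    mkdir -p "${scandir}/${name}"
    # s6-supervise starts ./run and restarts it whenever it exits, so the
    # command must stay in the foreground and replace the shell via exec.
    printf '#!/bin/sh\nexec %s\n' "$*" > "${scandir}/${name}/run"
    chmod +x "${scandir}/${name}/run"
}

# Example usage inside a container image build (hypothetical paths and binaries):
#   make_service /etc/s6 nginx "nginx -g 'daemon off;'"
#   make_service /etc/s6 php-fpm "php-fpm -F"
# The container's startup command would then be: s6-svscan /etc/s6
```

Both daemons are started in the foreground so that the supervisor, rather than the daemon itself, owns the process and can restart it when it dies.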