SPOC overview
===============

SPOC is the main component of the platform. It provides the means to build, publish, install and run layers, images, containers and applications. The name "*SPOC*" has several meanings: it can stand for **Spo**\ tterVM **C**\ ontainers or for **S**\ ingle **P**\ oint **o**\ f **C**\ ontact, as it is the only component with which the builder and the user need to interact, or it can represent a wish for the applications to "*Live long and prosper*", the catchphrase of the better-known bearer of the same name.

SPOC is written in Python 3.7 and uses LXC as a userspace interface to the containment features of the Linux kernel. SPOC consists of a Python module, a configuration file, an init task and three main binaries for working with the respective deliverable types.

.. image:: attachments/schema-arch-overview.png
   :alt: Architecture - Platform overview schema
   :align: center

Goals and non-goals
-------------------

SPOC aims to:

- Be easily comprehensible from a system administration point of view - the less abstraction the better
- Be minimalistic - implement only features which are consumed locally
- Be independent of any 3rd party services (both paid and free) and online repositories - everything is self-hosted
- Have a minimal storage footprint for use on desktops and laptops and on connections with low bandwidth - prefer small size over speed

SPOC doesn't aim to:

- Replace or reimplement existing container management or application distribution solutions (Docker, Flatpak, Snap, Conda, BIBBOX etc.)
- Implement any features for orchestration or remote management

Container runtime
-----------------

LXC is used as the container runtime. There are several well-known container runtimes, with *Docker* probably being the most popular. At one point in the history of SpotterVM development, Docker was considered and used. However, its size requirements, the opacity of its configuration from the administration point of view, and an unsatisfactory overlap between the required and provided functionality led to LXC eventually being selected instead. This section could more appropriately be titled *Why not Docker*, as most comparisons are made against it.

First and foremost, Docker contains a huge set of tools for use with various orchestrators and large-scale applications. Docker is a perfect tool when multiple instances of the same container image need to be run and controlled on multiple systems. The individual instances are configured at runtime via command line parameters or stored as Docker services in the cluster configuration. The Docker daemon and its shim processes contain a considerable amount of abstraction, effectively obscuring what is happening under the hood. LXC, on the other hand, keeps things close to the bare minimum and transparently uses the container techniques, syscalls and namespaces exposed by the Linux kernel. Containers in LXC are fully defined via configuration files and don't require any additional configuration at runtime. This can arguably be achieved with Docker too via *docker-compose*, but that adds yet another layer of abstraction and is generally not suitable for scenarios where container images need to be added or removed on the fly.

Docker is written in the Go language, which is designed to create runtime-safe statically linked executables. With the sheer amount of Docker capabilities, this unfortunately means that the whole Docker infrastructure occupies roughly 200 MB on the VM hard drive. The basic virtual machine image is designed to be as small as possible, so having a 200 MB container host on an operating system which alone occupies roughly 40 MB does not seem ideal. The LXC runtime, written in C, on the other hand occupies roughly 4 MB and doesn't need any other dependencies besides *cgroupfs* which, for performance reasons, is good to have installed anyway. This may change with the general availability of Podman on Alpine Linux, as Podman offers functionality similar to Docker at a quarter of the size.

Due to Docker's approach, storage overlay layers cannot be easily managed by the container builder. They instead depend on the number and order of directives in the *Dockerfile* recipe file. Docker layers reside in directories with hashed names, which makes them hard to manage effectively, especially with regard to a minimal filesystem footprint.

Docker requires a daemon to run at all times. Whenever the daemon needs to be stopped or restarted (e.g. due to an update), all containers need to be stopped too. It is also yet another single point of failure. LXC is daemonless, as it simply prepares the individual container namespaces and launches the init process in them.

Finally, Docker maintainers explicitly refuse to implement a feature which would allow restricting the Docker daemon to private repositories (registries) in the community edition of Docker (*Docker Enterprise* allows this). It is possible to have custom and even private repositories, but it is not possible to deactivate the default public *Docker Hub*.

The downside of LXC is that using it requires more knowledge of how Linux containers actually work. Another problem is quick availability of the desired image, as most 3rd party applications are shipped with a *Dockerfile* or distributed directly as Docker images. For SPOC/LXC, these need to be rewritten into LXC-compatible containers; however, SPOC simplifies this, as the `image building <spoc-builder.html>`_ aims to mimic the features of Docker and automate LXC container building using a *Dockerfile*-like syntax.

In the future, LXC and part of SPOC may be replaced by Podman and Podman-compose, Vagrant or even Ansible to reach an even wider audience.

Package manager
---------------

SPOC uses a custom, deliberately simple built-in package manager. Custom package management has several benefits and avoids the shortcomings of the native APK package manager described below.

The native packaging toolchain (abuild) is designed for automated bulk package building. Building packages from pre-existing directories requires customizations and workarounds to skip unnecessary or outright harmful steps such as binary stripping, symlink resolution and dependency tracing. It must also be run as a non-root user inside ``fakeroot``, which is problematic when LXC containers are to be packaged. Most of these limitations can be worked around (run as root using ``-F``, spoof the build process by bind-mounting an existing directory onto the packaging directory, skip dependency tracing using ``options="!tracedeps"`` in ``APKBUILD`` and omit the majority of the build process by running only the ``build package prepare_metafiles create_apks index clean`` abuild targets), but there is no real benefit in abusing the native tools this way.

Furthermore, when the ``apk`` package manager installs a package, it first unpacks it, then runs the post-install script, and only once all packages are installed does it set the permissions and ownership of the files to the original values contained in the package. This means that container setup cannot be run as part of the post-install script, as most applications require the permissions to be correct already. In addition, every single file, including its ownership, permissions and hash, is recorded in ``/lib/apk/db/installed``, which unnecessarily bloats the database of locally installed packages (e.g. the basic Python 3 layer contains ~6500 files).

With a custom package manager, the whole download, unpacking and installation process can be observed directly, keeping the user informed about the step currently in progress, as opposed to the mere download percentage offered by the bare ``apk``. Finally, APK packages are only gzipped, whereas the custom solution uses xz (LZMA2), allowing for up to 70% smaller packages, which serves the SPOC goal of a minimal storage footprint.
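
The difference between the two compression formats can be tried out with the Python standard library alone. The following is a minimal sketch, not SPOC's actual packaging code; the function names and the file contents are hypothetical and serve only to have something to compress.

.. code-block:: python

    import io
    import tarfile


    def pack_entries(entries, mode):
        """Pack a mapping of {archive path: bytes} into an in-memory tar archive.

        ``mode`` is a tarfile write mode such as "w:gz" or "w:xz".
        """
        buffer = io.BytesIO()
        with tarfile.open(fileobj=buffer, mode=mode) as archive:
            for name, data in sorted(entries.items()):
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                archive.addfile(info, io.BytesIO(data))
        return buffer.getvalue()


    def unpack_entries(blob):
        """Read an archive back; "r:*" lets tarfile detect the compression."""
        with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as archive:
            return {
                member.name: archive.extractfile(member).read()
                for member in archive.getmembers()
                if member.isfile()
            }


    # Hypothetical layer content, used only for illustration.
    layer_files = {
        "srv/app/index.php": b"<?php echo 'hello'; ?>\n" * 500,
        "srv/app/README": b"Example layer payload\n" * 200,
    }

    gz_blob = pack_entries(layer_files, "w:gz")
    xz_blob = pack_entries(layer_files, "w:xz")
    print("gzip:", len(gz_blob), "bytes; xz:", len(xz_blob), "bytes")

The actual savings depend heavily on the content being packaged, so the printed sizes are only indicative.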

SPOC recognizes three types of repositories, all of which work in the same fashion (i.e. with ``.tar.xz`` archives and JSON metadata):

- Local repository - Contains information about all layers, images, containers and applications installed and available on the local machine.
- Online repository - A remote repository containing images and applications available for download and installation. Unlike traditional package managers, SPOC can be configured with only a single online repository.
- Publish repository - A local directory where the builder stages archives and metadata, which are later supposed to be copied to the online repository. SPOC doesn't handle the copying, so that remains at the discretion of the builder.

Deliverables
------------

Layers
^^^^^^

A layer is a collection of files which logically belong together and form a product or a group of products, which can be reused in a larger unit.

The term *layer* is used because the storage is handled by the OverlayFS filesystem, which consists of groups of files overlaid over each other. This allows us to have e.g. a layer with the basic operating system, another layer with a runtime, and a final layer with the application itself. This reduces the need to duplicate functionality in every container and waste disk space.
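
How such a stack translates into an OverlayFS mount can be sketched as follows. This is an illustration only, not SPOC's implementation; the function name and all paths are hypothetical. The one fixed rule it relies on is that OverlayFS lists the ``lowerdir`` entries from the topmost layer to the bottommost one, separated by colons.

.. code-block:: python

    def overlay_mount_options(layers, upper_dir, work_dir):
        """Build the option string for an OverlayFS mount.

        ``layers`` is ordered from the bottommost layer to the topmost one.
        OverlayFS expects the opposite order in ``lowerdir`` (topmost first).
        """
        if not layers:
            raise ValueError("at least one lower layer is required")
        lower = ":".join(reversed(layers))
        return f"lowerdir={lower},upperdir={upper_dir},workdir={work_dir}"


    # Hypothetical paths: OS layer, runtime layer and application layer.
    options = overlay_mount_options(
        ["/layers/os", "/layers/runtime", "/layers/app"],
        upper_dir="/containers/demo/upper",
        work_dir="/containers/demo/work",
    )
    print(options)

The resulting string is what would be passed as the options of a mount of type ``overlay``; the writable ``upperdir`` then holds the changes specific to one container while the lower layers stay shared and read-only.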

Images
^^^^^^

An image is a layer or a collection of layers with attached metadata describing how to work with the layers. It is basically a blueprint for container creation.

The metadata describe the order of OverlayFS layers, the init working directory, the user and command to be executed on startup etc. Images are closely tied to layers, therefore they always have the same name as their topmost layer. SPOC doesn't allow creating layers without images, so even with no metadata, it is possible to spawn a container out of a layer (that is, an image with no metadata) if all parent layers exist. Metadata from a parent image are always overridden by the metadata of the image which builds on top of it.
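
The override rule can be sketched as a walk along the parent chain. This is a minimal illustration under assumed data structures: the ``parent`` and ``meta`` keys, the image names and the metadata fields are all hypothetical and do not reflect the real SPOC manifest format.

.. code-block:: python

    def resolve_image(name, images):
        """Return (layer order, merged metadata) for an image.

        ``images`` maps an image name to {"parent": name or None, "meta": dict}.
        """
        chain = []
        seen = set()
        current = name
        while current is not None:
            if current in seen:
                raise ValueError(f"cyclic parent reference at {current!r}")
            seen.add(current)
            chain.append(current)
            current = images[current]["parent"]
        chain.reverse()  # bottommost layer first

        merged = {}
        for image_name in chain:
            # Later (child) images override values set by their parents.
            merged.update(images[image_name]["meta"])
        return chain, merged


    # Hypothetical images; names and metadata keys are made up for the example.
    images = {
        "os": {"parent": None, "meta": {"user": "root", "cwd": "/"}},
        "php": {"parent": "os", "meta": {"cmd": "php-fpm"}},
        "blog": {"parent": "php", "meta": {"user": "www", "cwd": "/srv/blog"}},
    }

    layers, meta = resolve_image("blog", images)
    print(layers, meta)

Here ``blog`` inherits ``cmd`` from ``php`` but replaces ``user`` and ``cwd`` defined by ``os``, which matches the rule that the image built on top always wins.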

Each layer is packaged as a separate installable archive and the associated image's metadata are written in the repository manifest.

Containers
^^^^^^^^^^

A container is a single runnable instance of an image with additional metadata specific to the purpose of that instance - e.g. persistent volume paths.

All user-installable applications run in containers. A *container* is defined by the following settings, combined from the image from which it has been created and from additional metadata:

- Network type and interface configuration
- OverlayFS storage layers
- Mountpoints to store persistent data
- Functional user and binary to be executed on startup
- Environment variables propagated to the container namespace
- Signal used to stop the container
- TTY / console logging
- Syscall capability restrictions
- Event hooks (these are used by SPOC to circumvent some shortcomings of LXC, resp. OverlayFS)
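
To give an idea of how several of these settings map onto an LXC configuration file, the sketch below renders a dictionary into configuration lines. The ``lxc.*`` keys are taken from the LXC container configuration format, whereas the function name, the dictionary keys and all values are hypothetical and are not SPOC's actual metadata schema or generated output.

.. code-block:: python

    def render_lxc_config(name, settings):
        """Render a dict of container settings into LXC configuration text."""
        lines = [
            f"lxc.uts.name = {name}",
            f"lxc.net.0.type = {settings.get('net_type', 'veth')}",
        ]
        for host_path, mount_point in sorted(settings.get("mounts", {}).items()):
            # Mount entries use fstab syntax; the target is relative to the rootfs.
            lines.append(
                f"lxc.mount.entry = {host_path} {mount_point} none bind,create=dir 0 0"
            )
        if "cmd" in settings:
            lines.append(f"lxc.init.cmd = {settings['cmd']}")
        if "uid" in settings:
            lines.append(f"lxc.init.uid = {settings['uid']}")
        for key, value in sorted(settings.get("env", {}).items()):
            lines.append(f"lxc.environment = {key}={value}")
        if "halt_signal" in settings:
            lines.append(f"lxc.signal.halt = {settings['halt_signal']}")
        for capability in settings.get("drop_caps", []):
            lines.append(f"lxc.cap.drop = {capability}")
        return "\n".join(lines) + "\n"


    # Hypothetical container settings used only for illustration.
    config = render_lxc_config(
        "wordpress",
        {
            "mounts": {"/srv/volumes/wordpress": "srv/wordpress/data"},
            "cmd": "/usr/sbin/php-fpm -F",
            "uid": 1000,
            "env": {"LANG": "C.UTF-8"},
            "halt_signal": "SIGTERM",
            "drop_caps": ["sys_admin"],
        },
    )
    print(config)

Because the whole container is described by such a plain text file, an administrator can inspect or adjust it with ordinary tools, which is in line with the goal of keeping abstraction low.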

Applications
^^^^^^^^^^^^

An application is a collection of distinct containers working together to form an environment dedicated to the given application. Apart from metadata, the application package also includes scripts for installation, update and uninstallation of this environment on a host system.

The application should be the final product exposed to the user, which the user can effortlessly install and run, without thinking about integration with other components, persistence or migrations between versions during updates.

Example
^^^^^^^

To illustrate the types, let's take a simple web application - e.g. WordPress. WordPress requires a web server, a PHP interpreter and a database to function. The aim of the layers and images is to be reusable in multiple applications, so we can logically split the WordPress image into several lower layers/images:

- Basic OS layer - Can be later reused for any other images which we wish to build on the same OS.
- nginx+PHP layer - On top of the OS layer, we create another layer with nginx web server and PHP interpreter, as these can be reused again for any other web application requiring the same components.
- WordPress layer - Finally, on top of the nginx+PHP layer we create a layer with the actual WordPress files.
- MySQL layer - Since we also need a database and it's likely that another application will need one too, we base our MySQL layer on the basic OS layer again. That way we deduplicate the basic OS files and reduce the amount of data the user needs to download and store.

Once the images are built, we create the application definition, which instructs SPOC that two containers should be spawned: one from the image with WordPress and the other from the image with MySQL. We also need to write a short installation script to set up the application environment - e.g. populate the database and create an admin user.

Finally, we publish the images and the application and end up with the following structure:

.. code-block:: text

    Basic OS layer/image
    ├─ nginx+PHP layer/image
    │  └─ WordPress layer/image
    └─ MySQL layer/image

.. code-block:: text

    WordPress application definition
    ├─ WordPress container definition based on WordPress image
    ├─ MySQL container definition based on MySQL image
    └─ install.sh script

The end user then issues a single command to install the WordPress application, which downloads all the required layers, registers the image metadata, creates the containers, executes the installation script and finally registers the application metadata too.
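
The deduplication in this example can be sketched as a small dependency walk which determines which layers have to be downloaded. The function and the short layer names are hypothetical and only mirror the tree shown above; this is not SPOC's installation code.

.. code-block:: python

    # Layer -> parent layer, mirroring the tree from the example above.
    LAYER_PARENTS = {
        "os": None,
        "nginx-php": "os",
        "wordpress": "nginx-php",
        "mysql": "os",
    }

    # Container name -> image it is created from.
    APP_CONTAINERS = {"wordpress": "wordpress", "mysql": "mysql"}


    def layers_to_download(app_containers, layer_parents, installed=()):
        """List missing layers in installation order, each one only once."""
        needed = []
        for image in app_containers.values():
            chain = []
            current = image
            while current is not None:
                chain.append(current)
                current = layer_parents[current]
            # Parents go first so that every layer lands on top of its base.
            for layer in reversed(chain):
                if layer not in needed and layer not in installed:
                    needed.append(layer)
        return needed


    print(layers_to_download(APP_CONTAINERS, LAYER_PARENTS))
    print(layers_to_download(APP_CONTAINERS, LAYER_PARENTS, installed={"os"}))

The basic OS layer is shared by both containers, so it appears only once in the first result, and it is skipped entirely in the second call, where it is assumed to be already present in the local repository.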