SPOC architecture
=================

.. image:: attachments/schema-arch-interop.png
   :alt: Architecture - Services interoperation schema
   :align: center

Configuration file
------------------
The configuration file is located in ``/etc/spoc/spoc.conf`` and contains just a few directives. A standard user rarely needs to touch it; a builder may need to adjust a few values for building and package signing to work.

.. code-block:: ini

   [general]
   data-dir = /var/lib/spoc/
   log-dir = /var/log/spoc/
   network-interface = spocbr0

   [publish]
   publish-dir = /srv/build/spoc/
   signing-key = /etc/spoc/publish.key

   [repo]
   url = https://repo.spotter.cz/spoc/
   public-key = MHYwEAYHKoZIzj0CAQYFK4EEACIDYgAEWJXH4Qm0kt2L86sntQH+C1zOJNQ0qMRt0vx4krTxRs9HQTQYAy//JC92ea2aKleA8OL0JF90b1NYXcQCWdAS+vE/ng9IEAii8C2+5nfuFeZ5YUjbQhfFblwHSM0c7hEG

Section ``general`` contains common settings which are absolutely necessary for SPOC to work.

- ``data-dir`` is the path to the root directory of SPOC, under which all data are stored - repository metadata, layers, container configurations, persistent volumes, installation scripts etc. See section *Filesystem structure* for details on the hierarchy.
- ``log-dir`` is the path to the log directory, where transcripts of the individual containers' PTYs are logged.
- ``network-interface`` is the host bridge interface dedicated for use with SPOC containers. There are some assumptions made to keep the configuration as lightweight as possible. See section *Networking* for more details.
Section ``publish`` contains configuration for image and application building and publishing. A consumer doesn't need to set anything in this section.

- ``publish-dir`` is the path to which newly built images and applications are published, i.e. where the archives and metadata are stored so they can be picked up manually or by another process to be transferred elsewhere.
- ``signing-key`` is the path to the ECDSA private key used for digitally signing the built packages. See section *Package manager* for more details on how packages are signed and verified.
Section ``repo`` contains information about the online repository used by the consumer to access the metadata and download the packages.

- ``url`` is the URL of the base repository directory. The URL may contain a username and password and is expected to be in RFC 1738 compliant common internet scheme syntax ``<scheme>://<user>:<password>@<host>:<port>/<url-path>`` (see the example below).
- ``public-key`` is the pinned ECDSA public key used for verification of the metadata and downloaded content. See section *Package manager* for more details on how packages are signed and verified.
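For illustration, the credential-bearing URL form can be parsed with the Python standard library; the credentials and host below are made up:

.. code-block:: python

   # Illustration of the RFC 1738 common internet scheme syntax accepted
   # by the "url" directive; credentials and host are made-up examples.
   from urllib.parse import urlsplit

   parts = urlsplit('https://builder:s3cret@repo.example.org:8443/spoc/')
   print(parts.username, parts.password, parts.hostname, parts.port, parts.path)
   # builder s3cret repo.example.org 8443 /spoc/
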
Package manager
---------------
There are several types of deliverables, however only images and applications are subject to packaging and lifecycle management. In order to create a deliverable package, the builder must have an ECDSA private key which will be used for signing. The principle is the same as in any other public-key cryptography, so RSA or GPG keys could have been used as well, but ECDSA has been chosen as it has reasonably short keys while retaining a high level of security.
When an image or application is packaged, the files (the layer in case of an image, the setup scripts in case of an application) are archived as *.tar.xz* and placed in the publish directory. XZ (LZMA2) has been selected as it has the best compression ratio of the commonly used compression algorithms. Then, using the builder's ECDSA private key, a signed SHA512 hash of the archive is calculated. This hash, the uncompressed data size and the compressed archive size (download size) are written into the metadata file ``repository.json`` along with the rest of the metadata. Finally, another signed SHA512 hash is calculated from the metadata file and stored as a separate signature file ``repository.sig``. All files (the XZ archives, ``repository.json`` and ``repository.sig``) then need to be transferred to the desired location on a distribution server.
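A minimal sketch of the signing step, assuming a PEM-encoded private key and the Python ``cryptography`` library; the archive path is illustrative and the actual SPOC implementation may differ:

.. code-block:: python

   # Sign an archive with the builder's ECDSA key, producing the signed
   # SHA512 hash stored in the repository metadata (key format and paths
   # are assumptions, not SPOC's verbatim code).
   from cryptography.hazmat.primitives import hashes, serialization
   from cryptography.hazmat.primitives.asymmetric import ec

   with open('/etc/spoc/publish.key', 'rb') as f:
       private_key = serialization.load_pem_private_key(f.read(), password=None)

   with open('/srv/build/spoc/layers/example.tar.xz', 'rb') as f:
       archive = f.read()

   # ECDSA over SHA-512 - the archive is hashed and the hash is signed
   signature = private_key.sign(archive, ec.ECDSA(hashes.SHA512()))
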
In order to consume the packages, the consumer needs to know the builder's (i.e. the online repository's) ECDSA public key and needs to have it configured (pinned) in ``/etc/spoc/spoc.conf``, otherwise any attempt at signature verification will fail.
The consumer issues a command to install an application. First, SPOC downloads the ``repository.json`` metadata file and the ``repository.sig`` signature file and verifies the signature of the metadata file. If the signature can't be verified, the installation immediately fails at this point. Otherwise, the installation continues. SPOC's dependency resolver then checks which images and layers the application (or rather its containers) requires and compares the requirements with the packages already registered in the local repository. If the application requires any images which are not yet locally available, they are downloaded into a temporary location under ``/var/lib/spoc/tmp``. All downloaded archives have their signature verified against the signed hash contained in ``repository.json``. This ensures both that the archive has been downloaded completely and without errors, and that it hasn't been tampered with. Only after successful verification is the archive extracted. The extraction reuses the same file descriptor to make the verification and extraction atomic, in an attempt to prevent local attack vectors replacing the archive between the two subtasks. Once the image archive is unpacked, its metadata are registered to the local repository and the downloaded archive is deleted.
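The verify-then-extract step could look roughly like the following sketch; ``signature`` stands for the signed hash from ``repository.json``, ``public_key`` for the pinned key from ``spoc.conf``, and the Python ``cryptography`` library is assumed:

.. code-block:: python

   import hashlib
   import tarfile
   from cryptography.hazmat.primitives import hashes
   from cryptography.hazmat.primitives.asymmetric import ec, utils

   def verify_and_extract(path, signature, public_key, target):
       with open(path, 'rb') as f:
           digest = hashlib.sha512(f.read()).digest()
           # Raises InvalidSignature on mismatch, aborting the installation
           public_key.verify(signature, digest,
                             ec.ECDSA(utils.Prehashed(hashes.SHA512())))
           f.seek(0)  # rewind the very same descriptor...
           # ...so the archive can't be swapped between the two subtasks
           with tarfile.open(fileobj=f, mode='r:xz') as tar:
               tar.extractall(target)
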
When all required images are present, the archive with the application setup scripts is downloaded, verified and unpacked in a similar fashion. Then the containers are created as described in the metadata. If an ``uninstall.sh`` file is present in the application archive, it is run in order to ensure sanity of the environment; the script is supposed to clean up any data from previous, potentially failed installations. Then, if an ``install.sh`` script is present in the now extracted application archive, it is executed. It is expected to prepare the application environment for immediate use by the consumer (configure the application, generate passwords, populate the database etc.). If the script fails, SPOC attempts to destroy and clean up the containers, so the installation can be retried at a future time. If the installation succeeds, the application is registered to the local repository.
SPOC compares the ``"version"`` field using `semantic versioning <https://semver.org/>`_ rules. Once a new version of an application is packaged and made available in the online repository, the consumer may choose to update the locally installed version. All containers belonging to the application are stopped and the dependency resolver checks whether any new images/layers need to be downloaded. After all necessary packages are downloaded, verified and unpacked, SPOC destroys all containers of the previous application version and recreates new ones for the current version. The recreated containers are expected to reuse the persistent storage with application data from the previous version. After the containers are recreated, an ``update.sh`` script is launched, if it exists in the application archive. This script carries out any update or migration steps needed to make the application environment ready for immediate use after the upgrade. Old images/layers which are no longer in use are not automatically removed, but the user may issue the ``spoc-image clean`` cleanup command to do so.
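The update check boils down to a semantic version comparison; a minimal illustration, assuming plain ``MAJOR.MINOR.PATCH`` versions without pre-release tags:

.. code-block:: python

   # Decide whether the online repository offers a newer version
   def parse(version):
       return tuple(int(part) for part in version.split('.'))

   installed, available = '1.4.2', '1.5.0'  # illustrative values
   if parse(available) > parse(installed):
       print('update available')
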
When the consumer wishes to uninstall an application, SPOC destroys its containers including their log files and runs the ``uninstall.sh`` application script, if it is present. The script is expected to clean up the application's persistent volumes. Finally, SPOC unregisters the application from the local repository metadata and removes its setup scripts. Once again, orphaned images/layers are not automatically removed, but the user may do so with the ``spoc-image clean`` cleanup command.

Filesystem structure
--------------------

.. code-block:: text

   /etc/spoc/
   ├─ spoc.conf - Main (and the only) configuration file described above
   └─ publish.key - ECDSA private key used by the builder for package signing

   /var/lib/spoc/ - Root directory with all data
   ├─ apps/ - Applications directory
   │  └─ <application>/ - Application-specific directory with install/update/uninstall scripts
   ├─ containers/ - Containers directory
   │  └─ <container>/ - Container-specific data
   │     ├─ config - LXC container configuration file
   │     ├─ ephemeral/ - Directory with the topmost ephemeral OverlayFS layer
   │     ├─ olwork/ - Working directory for OverlayFS
   │     └─ rootfs/ - Mountpoint for the OverlayFS serving as the root directory of the container
   ├─ hosts - SPOC-global /etc/hosts file shared by all containers
   ├─ layers/ - Layers directory
   │  └─ <layer> - Layer-specific files
   ├─ repository.json - Local repository containing info about installed images, containers and apps
   ├─ tmp/ - Directory where the downloaded archives are temporarily stored before they are unpacked
   │  ├─ apps/ - Temporary location for application archives
   │  └─ layers/ - Temporary location for layer archives
   └─ volumes/ - Volumes directory
      └─ <volume>/ - Specific persistent volume storage

   /var/lock/ - Standard system directory with locks
   ├─ spoc-hosts.lock - Lock file for operations with the SPOC-global hosts file
   ├─ spoc-local.lock - Lock file for operations with the local repository metadata file
   ├─ spoc-publish.lock - Lock file for operations with the publishing repository metadata file
   └─ spoc.lock - Main lock file for the installation operations

   /var/log/spoc/ - Log directory
   └─ <container>.log - Container-specific PTY transcript log file

Repository metadata
-------------------
Metadata are stored as a JSON file. They contain all application, container and image definitions and the relations between them. The metadata basically comprise all the information besides the actual files used by OverlayFS.
Metadata pertaining to local installation reside in ``/var/lib/spoc/repository.json`` and look as follows.

.. code-block:: json

   {
       "apps": {
           "<application>": {
               "containers": [
                   "<container>"
               ],
               "meta": {
                   "desc-cs": "<Description in Czech>",
                   "desc-en": "<Description in English>",
                   "license": "<Licence type>",
                   "title": "<Application Title>"
               },
               "version": "<version>"
           }
       },
       "containers": {
           "<container>": {
               "cmd": "<init command>",
               "gid": "<gid>",
               "layers": [
                   "<layer>"
               ],
               "mounts": {
                   "<volume>": "<mountpoint>"
               },
               "ready": "<readiness check command>",
               "uid": "<uid>"
           }
       },
       "images": {
           "<image>": {
               "cmd": "<init command>",
               "dlsize": "<archive size>",
               "hash": "<archive hash>",
               "layers": [
                   "<layer>"
               ],
               "size": "<filesystem size>"
           }
       }
   }

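As the structure above suggests, the local metadata can be consumed with any JSON parser; for example, listing the installed applications:

.. code-block:: python

   import json

   with open('/var/lib/spoc/repository.json') as f:
       repo = json.load(f)

   # Print name, version and title of every locally installed application
   for name, app in repo['apps'].items():
       print(name, app['version'], app['meta']['title'])
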
Networking
----------
Container managers traditionally use services like *dnsmasq* to provide DNS and DHCP capabilities. SPOC, however, takes a very naive approach and uses mostly static network configuration. SPOC expects that simple init systems will be used in the containers and that the network can be configured in a "traditional" way using static IP addresses and the ``/etc/hosts`` and ``/etc/resolv.conf`` files.
The cornerstone of the network configuration is the host bridge. The default host bridge found on the VM is named ``spocbr0`` and has IP 172.17.0.1/16. This IP and netmask are all SPOC uses for configuration; from them alone it infers that the containers need to have their IP addresses in the range 172.17.0.2 - 172.17.255.254 with the same netmask, as illustrated below.
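A short illustration of that inference using only the Python standard library:

.. code-block:: python

   import ipaddress

   # The bridge address and netmask are the only inputs
   bridge = ipaddress.ip_interface('172.17.0.1/16')  # spocbr0

   # Every host address in the bridge network except the bridge itself
   pool = (ip for ip in bridge.network.hosts() if ip != bridge.ip)
   print(next(pool))  # 172.17.0.2 - the first address usable by a container
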
*Poor man's DHCP* is implemented via the SPOC-global hosts file ``/var/lib/spoc/hosts``. This file holds the IP leases of the individual containers and at the same time it is mounted read-only into the containers as ``/etc/hosts``, so the containers know about the surrounding containers and can communicate with them. Leases are added when a container is created and removed when the container is destroyed, so a container keeps the same IP address for its whole lifetime.
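A sketch of such lease management; the helper names are hypothetical and real SPOC serializes these writes with ``/var/lock/spoc-hosts.lock`` (see *File locking* below):

.. code-block:: python

   HOSTS = '/var/lib/spoc/hosts'

   def add_lease(ip, container):
       # Called when a container is created
       with open(HOSTS, 'a') as f:
           f.write(f'{ip} {container}\n')

   def remove_lease(container):
       # Called when a container is destroyed
       with open(HOSTS) as f:
           lines = [l for l in f if l.split()[1:2] != [container]]
       with open(HOSTS, 'w') as f:
           f.writelines(lines)
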
Since there is no DNS server, the host's ``/etc/resolv.conf`` is mounted read-only into the containers as-is. The containers therefore inherit the DNS configuration of the host.

File locking
------------
SPOC uses file locks for every operation where there is a risk of a race condition while writing metadata or other files. This allows multiple install / update / uninstall commands to run simultaneously; they will always race for the lock and will therefore run in sequence without interfering with each other. The individual actions needed for an operation to succeed are always determined only once the process performing the operation obtains the lock and starts processing. Whenever another tool uses the SPOC API (Python module), it is important that it uses the locking mechanism as well. VMMgr uses the locking, so it's safe to run install / update / uninstall from the command line and the web UI in parallel; they will be automatically serialized using the lock, as sketched below.
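A minimal sketch of the pattern, assuming ``fcntl``-style advisory locks on the main lock file:

.. code-block:: python

   import fcntl
   from contextlib import contextmanager

   @contextmanager
   def spoc_lock(path='/var/lock/spoc.lock'):
       with open(path, 'w') as f:
           fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
           try:
               yield
           finally:
               fcntl.flock(f, fcntl.LOCK_UN)

   # Concurrent install/update/uninstall commands race for this lock
   # and therefore effectively run in sequence
   with spoc_lock():
       pass  # determine and perform the required actions here
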
OverlayFS hooks
---------------
The containers are run as unprivileged ones, but they are started by the root user. This helps to work around some restrictions of OverlayFS mounts and simplifies changing ownership of the files. Normally, OverlayFS mounting is restricted to root and it's `not possible to create an OverlayFS mount from a user namespace <https://www.spinics.net/lists/linux-fsdevel/msg105877.html>`_. Some Linux distributions, notably Debian and Ubuntu, ship with kernel `patches allowing to circumvent the restriction <https://salsa.debian.org/kernel-team/linux/blob/master/debian/patches/debian/overlayfs-permit-mounts-in-userns.patch>`_, however this is not the case with Alpine. Another option is to use `fuse-overlayfs <https://github.com/containers/fuse-overlayfs>`_, but it has some known problems on non-glibc Linux systems and is not mature enough for production usage.
SPOC therefore simply uses standard OverlayFS mounts. However, since the containers are unprivileged and the rootfs mounting traditionally happens in the container namespace, the overlay layers can't be defined in the LXC container configuration. Instead, they are taken from the local repository metadata and the rootfs mountpoint is prepared before the container namespace is created, using LXC hooks.
During the container startup, the ``lxc.hook.pre-start`` event hook launches a process which cleans the contents of the topmost ephemeral layer (in case the container was previously shut down unexpectedly) and creates the rootfs via standard means:

.. code-block:: bash

   mount -t overlay \
       -o upperdir=/var/lib/spoc/containers/<container>/ephemeral,lowerdir=/var/lib/spoc/layers/<layer1>:/var/lib/spoc/layers/<layer2>,workdir=/var/lib/spoc/containers/<container>/olwork \
       none /var/lib/spoc/containers/<container>/rootfs

On container shutdown, the ``lxc.hook.post-stop`` event hook launches another process which unmounts the rootfs and cleans the ephemeral layer.

Init service
------------
SPOC installs an init service, however no long-lived service process is run. The init task simply executes ``spoc-app start-autostarted`` on host startup and ``spoc-app stop-all`` on host shutdown. The former command starts all containers of all applications which have the autostart flag set, the latter stops all currently running SPOC containers.