SPOC building and packaging
=============================

.. image:: attachments/schema-spoc-builder.png
   :alt: Deliverable lifecycle - Building and packaging schema
   :align: center

Usage
-----

``spoc-image build`` creates an LXC image from a file containing a sequence of build directives (also known as a build recipe), whose path is given as a command line parameter. The directory in which the file resides is taken as the build context, i.e. all relative paths in the recipe are resolved from it.

.. code-block:: text

    usage: spoc-image build [-f] [-p] filename

    positional arguments:
      filename       Path to the file with build recipe

    optional arguments:
      -f, --force    Force rebuild already existing image
      -p, --publish  Publish the image after successful build

``spoc-image publish`` creates a ``.tar.xz`` archive of the built image and publishes it to the publish repository along with its metadata.

.. code-block:: text

    usage: spoc-image publish [-f] image

    positional arguments:
      image        Name of the image to publish

    optional arguments:
      -f, --force  Force republish already published image

To remove a published image from the publish repository, use the ``spoc-image unpublish`` command.

.. code-block:: text

    usage: spoc-image unpublish image

    positional arguments:
      image       Name of the image to unpublish

Similarly, ``spoc-app publish`` creates a ``.tar.xz`` archive of the application setup scripts (``install.sh``, ``update.sh``, ``uninstall.sh`` and associated directories) and publishes it to the publish repository along with its metadata.

.. code-block:: text

    usage: spoc-app publish [-f] filename

    positional arguments:
      filename     Path to metadata file of the application to publish

    optional arguments:
      -f, --force  Force republish already published application

To remove a published application, use the ``spoc-app unpublish`` command.

.. code-block:: text

    usage: spoc-app unpublish [-h] app

    positional arguments:
      app         Name of the application to unpublish

Build directives
----------------

The syntax is designed to resemble *Dockerfile* syntax in order to ease a potential transition. Since LXC operates on a much lower level of abstraction than Docker, some principles are applied more explicitly and verbosely. The major difference between Docker and SPOC is that every directive in a *Dockerfile* creates a new filesystem layer, whereas layers in SPOC are managed manually and usually only a single layer is produced from a build recipe.

IMAGE
^^^^^

- **Usage:** ``IMAGE <name>``
- **Description:** Sets the image/layer name. Every image/layer needs to have one. Any subsequent directive which needs to be handled in the container's namespace also uses this name to create a temporary LXC container.
- **Docker equivalent:** ``-t`` in ``docker build`` command line parameters

FROM
^^^^^

- **Usage:** ``FROM <path>``
- **Description:** Designates an OverlayFS layer on which the layer will be based. Unlike the *Dockerfile* ``FROM``, this directive is optional in SPOC. If no ``FROM`` directive is given, the result is equivalent to Docker's ``FROM scratch``.
- **Docker equivalent:** ``FROM``

RUN
^^^

- **Usage:**

  .. code-block:: docker

    RUN <label>
      <commands>
    <label>

- **Description:** Executes a shell script in the image/container which is currently being built. The ``<label>`` is an arbitrary user-defined string which needs to be given as the first parameter and repeated at the end of the script block, similarly to a heredoc label, to which it actually translates. The shell script between the labels is passed as-is, including comments and empty lines, to a POSIX shell with the ``-e`` and ``-v`` options set. For instance, the following ``RUN`` entry:

  .. code-block:: docker

    RUN EOF
      # Comment
      command1
      command2
    EOF

  translates to the following script:

  .. code-block:: bash

    #!/bin/sh
    set -ev

    # Comment
    command1
    command2

  Command chaining via ``&&``, which is commonly used in a *Dockerfile*, is not needed in SPOC, because the ``-e`` option makes the shell abort the script as soon as any command fails.

- **Docker equivalent:** ``RUN``
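
The effect of the ``-e`` and ``-v`` options can be tried out on any machine with a POSIX shell. The following is only a sketch: it runs the script body on the host via a heredoc instead of inside a container, and the commands in the body are made up for illustration.

.. code-block:: bash

    #!/bin/sh
    # Feed a script body to a POSIX shell with -e and -v set, similarly
    # to what is described for the RUN directive above.
    status=0
    sh -ev <<'EOF' || status=$?
    # Comment
    echo "step 1"
    false
    echo "step 2 (never reached)"
    EOF
    echo "exit status: ${status}"

The ``false`` command fails, so the shell stops before the second ``echo`` and reports a non-zero exit status. The ``-v`` option makes the shell print the script lines to standard error as they are read, which helps when reading build logs.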

COPY
^^^^

- **Usage:** ``COPY <source> [destination]``
- **Description:** Recursively copies ``<source>`` files into ``<destination>``. The source path is relative to the build context directory, the destination path is relative to the container root directory. The files are copied with their original permissions and owner/group, which are immediately shifted to the appropriate UID/GID within the container namespace. The ``<source>`` can also be given as an *http://* or *https://* URL, in which case a gzip, bzip2 or xz tar archive is expected to be downloaded and unpacked into the ``<destination>``. This is commonly used for creating the basic root filesystem of a container, similarly to Docker's ``FROM scratch``.
- **Docker equivalent:** ``COPY`` or ``ADD``

USER
^^^^

- **Usage:** ``USER <uid> <gid>``
- **Description:** Sets the UID/GID of the container init process to ``<uid>`` and ``<gid>``. The default UID/GID is ``0:0`` (``root:root``). The values can also be given as user/group names, in which case they are looked up in the image/container namespace.
- **Docker equivalent:** ``USER``
- **Populates LXC field:** ``lxc.init.uid`` and ``lxc.init.gid``

CMD
^^^

- **Usage:** ``CMD <command> [parameters...]``
- **Description:** Sets the init process of the container. This is the process which is automatically started after the container is launched. The default command is ``/bin/true`` which immediately terminates with return code 0.
- **Docker equivalent:** ``CMD``
- **Populates LXC field:** ``lxc.init.cmd``

ENV
^^^

- **Usage:** ``ENV <variable> <value>``
- **Description:** Populates environment variable ``<variable>`` with value ``<value>`` which is then passed to the init process when the container is launched and is expected to be propagated by it to subsequently launched processes.
- **Docker equivalent:** ``ENV``
- **Populates LXC field:** ``lxc.environment``

WORKDIR
^^^^^^^

- **Usage:** ``WORKDIR <dirname>``
- **Description:** Sets working directory of the container init process to ``<dirname>``. The default working directory is the container's root directory.
- **Docker equivalent:** ``WORKDIR``
- **Populates LXC field:** ``lxc.init.cwd``

HALT
^^^^

- **Usage:** ``HALT <signal>``
- **Description:** Sets container stop signal to ``<signal>``. The default signal is SIGINT.
- **Docker equivalent:** ``--signal`` in ``docker kill`` command line parameters
- **Populates LXC field:** ``lxc.signal.halt``

READY
^^^^^

- **Usage:** ``READY <command> [parameters...]``
- **Description:** Sets a command to be run after the container is started to check whether the container is ready and dependent containers may be started too. The command is retried until it returns exit code 0 or until the attempt to start the container times out after 30 seconds.
- **Docker equivalent:** ``HEALTHCHECK``
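
The retry behavior described for ``READY`` can be pictured as a simple polling loop. The sketch below is not SPOC's implementation; the function name ``wait_ready`` and the one-second polling interval are assumptions made for illustration, only the 30 second timeout comes from the description above.

.. code-block:: bash

    #!/bin/sh
    # wait_ready TIMEOUT COMMAND [ARGS...]
    # Retries COMMAND roughly once per second until it succeeds or until
    # about TIMEOUT seconds have passed (the polling interval is an assumption).
    # Returns 0 if the command succeeded, 1 on timeout.
    wait_ready() {
        ready_timeout=$1
        shift
        ready_elapsed=0
        until "$@"; do
            ready_elapsed=$((ready_elapsed + 1))
            if [ "$ready_elapsed" -ge "$ready_timeout" ]; then
                return 1
            fi
            sleep 1
        done
        return 0
    }

    # Hypothetical usage: wait up to 30 seconds for a marker file to appear
    # wait_ready 30 test -e /tmp/ready

A good ``READY`` command is cheap and exits quickly, because it may be executed many times before the service inside the container is up.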

LXC config
----------

Although SPOC populates some LXC config fields, a lot of defaults remain unchanged. The template file into which ``spoc-image build`` fills the values looks as follows:

.. code-block:: ini

    # Container name
    lxc.uts.name = {name}

    # Network
    lxc.net.0.type = veth
    lxc.net.0.link = spocbr0
    lxc.net.0.flags = up
    lxc.net.0.ipv4.address = {ip_address}/{ip_netmask}
    lxc.net.0.ipv4.gateway = {ip_gateway}

    # Root filesystem
    lxc.rootfs.path = {rootfs}

    # Mounts
    lxc.mount.entry = shm dev/shm tmpfs rw,nodev,noexec,nosuid,relatime,mode=1777,create=dir 0 0
    lxc.mount.entry = /etc/resolv.conf etc/resolv.conf none bind,ro,create=file 0 0
    lxc.mount.entry = {hosts} etc/hosts none bind,ro,create=file 0 0
    {mounts}

    # Environment
    lxc.environment = PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    {env}

    # Init
    lxc.init.uid = {uid}
    lxc.init.gid = {gid}
    lxc.init.cwd = {cwd}
    lxc.init.cmd = {cmd}

    # Halt
    lxc.signal.halt = {halt}

    # Log
    lxc.console.size = 1MB
    lxc.console.logfile = {log}

    # ID map
    lxc.idmap = u 0 100000 65536
    lxc.idmap = g 0 100000 65536

    # Hooks
    lxc.hook.version = 1
    lxc.hook.pre-start = /usr/bin/spoc-hook
    lxc.hook.post-stop = /usr/bin/spoc-hook

    # Other
    lxc.arch = linux64
    lxc.include = /usr/share/lxc/config/common.conf
    lxc.include = /usr/share/lxc/config/userns.conf

For an explanation of hooks and overall container integration and behavior, refer to the `OverlayFS hooks <spoc-architecture.html#overlayfs-hooks>`_ section on the `SPOC Architecture <spoc-architecture.html>`_ page.
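
The two ``lxc.idmap`` lines in the template map the container UIDs and GIDs 0 to 65535 onto host IDs starting at 100000. Files created on the host for use inside a container therefore need host-side owners. The sketch below shows the arithmetic; the helper name ``host_id`` is hypothetical and not part of SPOC.

.. code-block:: bash

    #!/bin/sh
    # Values taken from the lxc.idmap lines in the template
    IDMAP_BASE=100000
    IDMAP_RANGE=65536

    # host_id CONTAINER_ID
    # Prints the host-side UID/GID for a container-side UID/GID.
    host_id() {
        if [ "$1" -lt 0 ] || [ "$1" -ge "$IDMAP_RANGE" ]; then
            echo "ID $1 is outside of the mapped range" >&2
            return 1
        fi
        echo $((IDMAP_BASE + $1))
    }

    host_id 5432
    host_id 8080

Under this mapping, the two calls print ``105432`` and ``108080``, which would explain the owner IDs passed to ``install -o`` and ``-g`` in the ``install.sh`` example later on this page.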

Example build recipe
--------------------

The following is an example of a build recipe for a *Redis* image:

.. code-block:: docker

    IMAGE redis_5.0.7-200403
    FROM alpine3.11_3.11.5-200403

    RUN EOF
        # Create OS user (which will be picked up later by apk add)
        addgroup -S -g 6379 redis
        adduser -S -u 6379 -h /var/lib/redis -s /bin/false -g redis -G redis redis

        # Install Redis
        apk --no-cache add redis
    EOF

    USER redis
    CMD /usr/bin/redis-server /etc/redis.conf

Container interfaces
--------------------

Since an LXC container has all environment variables, mounts, used layers, the init command and everything else it needs for starting, running and stopping defined in its configuration file and SPOC metadata, you are free to build the container in any way you like. The only requirement imposed by the host (virtual machine) infrastructure is that the container has to be a Linux one.

There is currently one more restriction for containers with user-accessible web interfaces via HTTP(S), as the requests need to be reverse-proxied via the VM's nginx HTTP server. The application within the container needs to be reachable via plain HTTP on port 8080/TCP. The reverse proxy sets the ``X-Forwarded-*`` headers, which can (and should) be processed by the application.

Application publishing
----------------------

An application is defined by a JSON file with metadata and optionally by installation, update and uninstallation scripts and the files associated with them.

- ``"version": "<version string>"`` - Defines the version of the application. The value from the local repository is compared with the one in the online repository whenever the consumer checks for updates.
- ``"meta": { "<key>": "<value>" }`` - Attaches arbitrary metadata intended for consumption by other tools, e.g. VMMgr. These metadata have no meaning for SPOC.
- ``"containers": { "<name>": {<container definition>} }`` - Creates a container definition. The remaining directives below are used within a container definition.
- ``"image": "<image name>"`` - Defines which image is to be used as the base for the container.
- ``"depends": [ "<container name>" ]`` - Defines start dependencies between containers. This helps to ensure that e.g. a database container is always ready before an attempt is made to start the application container.
- ``"mounts": { "<volume path>": "<mountpoint>" }`` - Defines persistent volumes and their mountpoints within the container. The volume path is a relative path under ``/var/lib/spoc/volumes``. The mountpoint is a relative path under the container's rootfs. If the mountpoint doesn't exist within the container, it is created. By default, the volume and the mountpoint are directories, but optionally, the mountpoint can be suffixed with ``:file`` which instructs SPOC to handle the paths as regular files instead of directories.
- ``"env": { "<key>": "<value>" }`` - Defines environment variables specific for the container. These variables are set when the init process starts and it's up to the init process to ensure any propagation to the target applications.
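
To make the path rules of ``"mounts"`` more concrete, the sketch below resolves a volume path and a mountpoint the way the description above suggests. It is an illustration only, not SPOC's code; the helper name ``describe_mount`` and its output format are made up, and the volume directory falls back to the documented default when ``VOLUMES_DIR`` is not set.

.. code-block:: bash

    #!/bin/sh
    # describe_mount VOLUME_PATH MOUNTPOINT
    # Prints the mount type, the host-side path and the container-side path.
    describe_mount() {
        mount_source="${VOLUMES_DIR:-/var/lib/spoc/volumes}/$1"
        case "$2" in
            # A ":file" suffix marks a regular file, the suffix itself is dropped
            *:file) echo "file ${mount_source} -> /${2%:file}" ;;
            *)      echo "dir ${mount_source} -> /$2" ;;
        esac
    }

    describe_mount kanboard/kanboard_data srv/kanboard/data/files
    describe_mount kanboard/kanboard_conf/config.php srv/kanboard/config.php:file

The two calls use mount definitions from the Kanboard example further down this page. The first is reported as a directory, the second as a regular file mounted at ``/srv/kanboard/config.php`` inside the container.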

Setup scripts
^^^^^^^^^^^^^

While packaging / publishing, SPOC looks into the directory with the JSON file and searches for files called ``install.sh``, ``update.sh`` and ``uninstall.sh``. When a script is found, SPOC also looks for the correspondingly named directory (``install``, ``update`` or ``uninstall``). These scripts and directories carry the setup steps, data, configuration files and templates required for the application to be fully installed and used by the consumer without any additional configuration.

The scripts need to be executable and can be written in any language, however they are traditionally written as shell scripts. Note that the VM has only the POSIX-compliant *ash* shell, not *bash*.

A script is always executed with the working directory set to the directory where it resides, so all paths to the installation files can be given as relative paths. In case the default data path is not ``/var/lib/spoc``, the scripts can rely on the following environment variables passed by SPOC:

- ``LAYERS_DIR`` - Directory where the layers are stored. ``/var/lib/spoc/layers`` by default.
- ``VOLUMES_DIR`` - Directory where the persistent volumes are stored. ``/var/lib/spoc/volumes`` by default.
- ``APPS_DIR`` - Directory where the application scripts are stored. ``/var/lib/spoc/apps`` by default.
- ``LOG_DIR`` - Directory with container logs. ``/var/log/spoc`` by default.
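
A setup script can build all of its host-side paths from these variables. The following sketch uses a hypothetical helper, ``volume_path``, which refuses to produce a path when ``VOLUMES_DIR`` is missing, e.g. when the script is run by hand outside of SPOC.

.. code-block:: bash

    #!/bin/sh
    # volume_path RELATIVE_PATH
    # Prints the absolute host path of a persistent volume. The ":?" expansion
    # fails with an error message if VOLUMES_DIR is unset or empty.
    volume_path() {
        echo "${VOLUMES_DIR:?VOLUMES_DIR is not set}/$1"
    }

    # Hypothetical usage in install.sh:
    # KANBOARD_DATA=$(volume_path kanboard/kanboard_data)

Failing early is a design choice here: an empty ``VOLUMES_DIR`` would otherwise silently turn the result into a path directly under the root directory.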

Example application metadata
----------------------------

The following is an example of application metadata for the *Kanboard* application:

.. code-block:: json

    {
        "version": "1.2.15-200416",
        "meta": {
            "title": "Kanboard",
            "description": "Kanban project management",
            "license": "GPL"
        },
        "containers": {
            "kanboard": {
                "image": "kanboard_1.2.15-200416",
                "depends": [
                    "kanboard-postgres"
                ],
                "mounts": {
                    "kanboard/kanboard_data": "srv/kanboard/data/files",
                    "kanboard/kanboard_conf/config.php": "srv/kanboard/config.php:file"
                }
            },
            "kanboard-postgres": {
                "image": "postgres_12.2.0-200403",
                "mounts": {
                    "kanboard/postgres_data": "var/lib/postgresql"
                }
            }
        }
    }

And the associated ``install.sh`` script (other files mentioned in the script are not shown):

.. code-block:: bash

    #!/bin/sh
    set -ev

    # Volumes
    POSTGRES_DATA="${VOLUMES_DIR}/kanboard/postgres_data"
    KANBOARD_CONF="${VOLUMES_DIR}/kanboard/kanboard_conf"
    KANBOARD_DATA="${VOLUMES_DIR}/kanboard/kanboard_data"

    # Create Postgres instance
    install -o 105432 -g 105432 -m 700 -d ${POSTGRES_DATA}
    spoc-container exec kanboard-postgres -- initdb -D /var/lib/postgresql

    # Configure Postgres
    install -o 105432 -g 105432 -m 600 postgres_data/postgresql.conf ${POSTGRES_DATA}/postgresql.conf
    install -o 105432 -g 105432 -m 600 postgres_data/pg_hba.conf ${POSTGRES_DATA}/pg_hba.conf

    # Configure Kanboard
    export KANBOARD_PWD=$(head -c 18 /dev/urandom | base64 | tr -d '+/=')
    install -o 108080 -g 108080 -m 750 -d ${KANBOARD_CONF}
    install -o 108080 -g 108080 -m 750 -d ${KANBOARD_DATA}
    envsubst <kanboard_conf/config.php | install -o 108080 -g 108080 -m 640 /dev/stdin ${KANBOARD_CONF}/config.php

    # Populate database
    spoc-container start kanboard-postgres
    envsubst <createdb.sql | spoc-container exec kanboard-postgres -- psql
    spoc-container exec kanboard -- cat /srv/kanboard/app/Schema/Sql/postgres.sql | spoc-container exec kanboard-postgres -- sh -c "PGPASSWORD=${KANBOARD_PWD} psql kanboard kanboard"

    # Create admin account
    export KANBOARD_ADMIN_USER="admin"
    export KANBOARD_ADMIN_PWD=$(head -c 12 /dev/urandom | base64 | tr -d '+/=')
    # Rewrite only the leading bcrypt prefix (2b -> 2y), not matches inside the hash
    export KANBOARD_ADMIN_HASH=$(python3 -c "import bcrypt; print(bcrypt.hashpw('${KANBOARD_ADMIN_PWD}'.encode(), bcrypt.gensalt()).decode().replace('2b', '2y', 1))")
    envsubst <adminpwd.sql | spoc-container exec kanboard-postgres -- psql kanboard

    # Stop services required for setup
    spoc-container stop kanboard-postgres

Recommended tools and practices
-------------------------------

If the application itself doesn't support connections via plain HTTP (e.g. it is a CGI/WSGI application), the container also needs to contain a web server which will proxy the connection. The recommended web server for this purpose is the *nginx* HTTP server, which is lightweight and can serve as a proxy for all commonly used gateway interfaces. Bear in mind that in some cases the application needs to be able to infer its own HTTP host, so some header tuning for both HTTP and CGI/WSGI protocols might be in order.

In case there are more components or services running within the same container (e.g. nginx HTTP server and PHP-FPM), it is advised to have them spawned and supervised using some lightweight init daemon. The recommended init system for LXC containers is *s6*, however if you are familiar with *daemontools* or *runit*, feel free to use them as well. In the worst case you can use *OpenRC*. Systems like *Upstart*, *systemd* or *SysV init* are not recommended due to their complexity or inability to properly supervise the spawned processes.

When an image is built, its name (and the name of its immediate layer) should be universally unique. Images are not versioned (or rather, SPOC is version-agnostic towards images/layers), so depending on the application or container requirements, it is entirely possible to have multiple "versions" of the same image/layer installed simultaneously. It is therefore required to have the version as part of the image/layer name, e.g. ``alpine3.11_3.11.5-200403``. This makes it clear that it is an image/layer for Alpine 3.11, specifically 3.11.5 built on 2020-04-03. At the same time, this makes it possible to have e.g. ``alpine3.11_3.11.5-200520`` (a newer build of the same version, possibly containing some extra security fixes) installed simultaneously and used by another container without any conflicts between the two Alpine 3.11.5 layers.
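
The names used on this page appear to follow the pattern ``<product>_<version>-<build date>``. This pattern is inferred from the examples and is not enforced by anything described here. Assuming it holds, a name can be taken apart with plain POSIX parameter expansion:

.. code-block:: bash

    #!/bin/sh
    image_name="alpine3.11_3.11.5-200403"

    product="${image_name%%_*}"   # text before the first underscore
    rest="${image_name#*_}"       # text after the first underscore
    version="${rest%-*}"          # text before the last dash
    build="${rest##*-}"           # text after the last dash

    echo "product=${product} version=${version} build=${build}"

For the name above this prints ``product=alpine3.11 version=3.11.5 build=200403``.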

The ``uninstall.sh`` script should be designed in a way which allows it to be executed successfully even when the application is not installed. The script is run *before* ``install.sh`` in order to clean up any data left by previous, potentially failed installations.
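
One way to keep ``uninstall.sh`` safe to run repeatedly is to use only commands which succeed when there is nothing left to remove. The helper below is a hypothetical sketch, not a part of SPOC:

.. code-block:: bash

    #!/bin/sh
    # remove_volume NAME
    # Deletes a persistent volume. "rm -rf" returns success even if the path
    # does not exist, so calling this repeatedly is harmless. The ":?"
    # expansions abort instead of expanding to "/NAME" or to the whole
    # volume directory when a value is missing.
    remove_volume() {
        rm -rf "${VOLUMES_DIR:?VOLUMES_DIR is not set}/${1:?volume name is required}"
    }

    # Hypothetical usage in uninstall.sh of the Kanboard example:
    # remove_volume kanboard

Because the script is also run before ``install.sh``, it should not assume that any volume, container or configuration file already exists.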

Don't hardcode any absolute paths in the install / update / uninstall scripts. Always rely on ``LAYERS_DIR``, ``VOLUMES_DIR``, ``APPS_DIR`` and ``LOG_DIR`` environment variables.