NextGen Stage3 - Dracut-based Initramfs

Based on the Dracut Praktikum project run in Freiburg.

Overview

dracut is a modular initramfs generation framework. Its basic functionality can be easily extended with custom modules, e.g. to realize a network boot module using qcow2 images exported via dnbd3 as the root filesystem.
The core framework provides a set of builtin modules handling typical tasks of the init process (mounting pseudo-fs, hardware detection with udev, ...).
See dracut for more information.

A custom dracut module was needed to extend the base functionality with the required components to support a network boot based on dnbd3.
See systemd-init for the module's repository.

Boot process overview

Here is a brief description of the most critical steps this module performs:
  • Parse the kernel command line for
    • the slxsrv and slxbase parameters, used to fetch the configuration later
    • DHCP information received during the initial PXE boot
  • Hardware detection with udev
    • Network cards (mandatory)
    • Hard disk drive (optional, sort of)
  • Network setup
    • (see the dedicated section)
  • Fetch the configuration file from http://${slxsrv}/${slxbase}/config, which contains, among other things:
    • Server and path to the dnbd3 image to connect
    • Label of the partition to use as writable space during stage4
  • Stage4 setup
    • Connect the dnbd3 image specified in the configuration file
    • Mount it as raw with xmount + libxmount_input_qemu
    • Add a writable layer through device mapper
  • Stage4 configuration
    • Copy over core stage3 services to stage4
    • Extract config.tgz
  • pivot-root

Network setup

(Dedicated section because it is such a critical part of stage3)

The main job of a custom dracut module is to set up the network to access the remote rootfs.
In a PXE setup, the IP information received by the client during the initial DHCP should be passed to the initramfs via kernel command line.
This is achieved with the syslinux option IPAPPEND 3 or by crafting the ip= option using iPXE's builtin variables.

From here on out, different dracut modules can handle the network setup:
  • The standard network module parses the IP information from the kernel command line and configures the network interface accordingly using low-level tools like ip and route. It only supports a limited set of specific ip= parameter formats (see Dracut network), which does not include the PXE-like format. A workaround for this problem is to rewrite the ip= parameter to a dracut-friendly format by modifying the KCL using a bind-mount on /proc/cmdline.
  • The systemd-networkd module uses systemd's network manager to bring up the network. The module itself only installs the required binaries and service files to the initramfs but, unlike the network module, does not set up any "wait-for-network" logic in dracut's main boot process. As the dnbd3-rootfs module expects the network to be accessible in the pre-mount phase, an extension of the basic systemd-networkd module should handle the wait for the network.
  • A custom network setup module would require a lot of work to write a solid network module capable of handling different hardware setups. While this approach is the most flexible when it comes to supporting exotic network configurations, the use of existing network-related modules should be evaluated first and a completely custom network setup solution should only be considered as a last resort.
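
The ip= rewrite mentioned for the standard network module could look roughly like the sketch below. It assumes the PXE-style argument has the four-field form client:boot-server:gateway:netmask appended by syslinux, and that the target is the static form described in dracut.cmdline(7). The function name rewrite_ip_arg and the interface name supplied by the caller are hypothetical, and quoted arguments containing spaces are not handled.

```shell
# Rewrite a PXE-style ip= argument into a dracut-style static one.
#   $1  kernel command line as a single string
#   $2  name of the boot interface (assumption: known to the caller)
# Arguments other than a four-field ip= are passed through unchanged.
rewrite_ip_arg() (
    cmdline=$1
    iface=$2
    out=
    for arg in $cmdline; do
        case $arg in
            ip=*)
                # Split the value on ':' into positional parameters.
                IFS=:
                set -- ${arg#ip=}
                unset IFS
                if [ "$#" -eq 4 ]; then
                    # client::gateway:netmask::interface:none
                    # (boot server and hostname fields are left empty)
                    arg="ip=$1::$3:$4::$iface:none"
                fi
                ;;
        esac
        out="${out:+$out }$arg"
    done
    printf '%s\n' "$out"
)
```

The output could then be written to a file and bind-mounted over /proc/cmdline with mount --bind, as described for the workaround above.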

First tests seem to indicate that systemd-networkd is best suited for our scenarios.

Stage4 setup

This section describes how the stage4 is set up during the stage3 initial boot phase.
The configuration file downloaded during stage3's boot process contains the path to the stage4 image to use.

Several steps are needed to prepare that (still) read-only image for operations:
  • Connect the dnbd3 image specified in the SLX configuration file (config)
  • Expose the qcow2 image as RAW with xmount + libxmount_input_qemu
  • Add RW-layer with device mapper (either ID44 or tmpfs)
  • Mount the device mapper output device as the future rootfs

These steps are central to every boot process and are performed on every dnbd3 image.

Some open questions concerning future development:
  • Where to save qcow2 backing files (diffs)? User management?
  • Mounting: Filesystem snapshots like BTRFS?
    • Caveat: resize2fs problems with ext4 (konrad)

Stage4 configuration

WIP:
  • user-config: download and extract config.tgz (if needed)
  • auto-config: generate configuration file for the stage4's network manager
  • auto-config: generate fstab for stage4
  • auto-config: copy required dracut services

Configuration

During the boot process, the clients will download a configuration file from the boot server containing the path to the DNBD3 image, the root partition label and various options needed for the RW-layer (of note: OpenSLX-ID44 GPT partition label).
Its path is built using the kernel command line parameters slxsrv and slxbase: http://${slxsrv}/${slxbase}/config
It will be downloaded to /etc/openslx in the stage3 and copied over as /opt/openslx/openslx in the stage4.
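
How the URL is derived from the kernel command line can be sketched as follows. The helper names get_kcl_arg and config_url are hypothetical, the parser is deliberately simplistic (no support for quoted values), and the curl call in the usage comment is an assumption about the download tool.

```shell
# Print the value of the key=value argument named $2 from the command
# line string $1. The last occurrence wins; returns 1 if the key is absent.
get_kcl_arg() {
    kcl_value=
    kcl_found=1
    for kcl_arg in $1; do
        case $kcl_arg in
            "$2="*)
                kcl_value=${kcl_arg#"$2="}
                kcl_found=0
                ;;
        esac
    done
    if [ "$kcl_found" -eq 0 ]; then
        printf '%s\n' "$kcl_value"
    fi
    return "$kcl_found"
}

# Build http://${slxsrv}/${slxbase}/config from the command line string $1.
# Fails if either parameter is missing.
config_url() {
    slxsrv=$(get_kcl_arg "$1" slxsrv) || return 1
    slxbase=$(get_kcl_arg "$1" slxbase) || return 1
    printf 'http://%s/%s/config\n' "$slxsrv" "$slxbase"
}

# Possible usage inside the initramfs:
#   url=$(config_url "$(cat /proc/cmdline)") && curl -fsS -o /etc/openslx "$url"
```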

Example

The following configuration will embed the DNBD3 image named SLX_DNBD3_IMAGE with revision SLX_DNBD3_RID from the DNBD3 server at SLX_DNBD3_SERVERS (more than one server is supported, but let's assume only one is given for now).
It will then use SLX_SYSTEM_PARTITION_IDENTIFIER to find the root partition within the DNBD3 image (using lsblk).
If a partition labeled as OpenSLX-ID44 is found, it will be used as writable space for the copy-on-write file of the base qcow2 image using device mapper.
The writable device is then mounted as /sysroot using SLX_MOUNT_ROOT_OPTIONS.

SLX_CONFIGURATION_LOCATION='/opt/openslx/'
SLX_DNBD3_SERVERS='1.2.3.4'
SLX_DNBD3_RID='1'
SLX_DNBD3_DEVICE='/dev/dnbd0'
SLX_DNBD3_IMAGE='packer.ubuntu1604.qcow2'
SLX_SYSTEM_PARTITION_IDENTIFIER='SLX_SYS'
SLX_SYSTEM_PARTITION_PREPARATION_SCRIPT=''
SLX_WRITABLE_DEVICE_IDENTIFIER='OpenSLX-ID44'
SLX_WRITABLE_DEVICE_IDENTIFIER_TIMEOUT_IN_SECONDS='10'
SLX_WRITABLE_DEVICE_STORAGE_FILESYSTEM_CREATE_COMMAND='mkfs.ext4'
SLX_WRITABLE_DEVICE_STORAGE_FILESYSTEM_CHECK_COMMAND='fsck.ext4'
SLX_WRITABLE_DEVICE_STORAGE_MAXIMUM_FILE_SIZE_IN_MB='2000'
SLX_WRITABLE_DEVICE_STORAGE_FILE_PATH=''
SLX_WRITABLE_DEVICE_PERSISTENT='no'
SLX_GENERATE_FSTAB_SCRIPT=''
SLX_RAMDISK_SIZE_IN_KB='1000000'
SLX_MOUNT_ROOT_OPTIONS='-o subvol=@'
SLX_LOG_FILE_PATH='/var/log/openslx'
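
Since the file consists of plain shell variable assignments, it can be sourced directly. The sketch below loads such a file and checks a few variables before they are used. The function name check_slx_config and the choice of which variables count as mandatory are assumptions made for illustration.

```shell
# Source the configuration file $1 in a subshell and verify that the
# variables needed to connect the image are set and non-empty.
# Returns non-zero and lists the missing names on stderr otherwise.
check_slx_config() (
    . "$1" || exit 1
    missing=
    for name in SLX_DNBD3_SERVERS SLX_DNBD3_IMAGE SLX_DNBD3_RID \
                SLX_SYSTEM_PARTITION_IDENTIFIER; do
        # Indirect lookup of the variable called $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="$missing $name"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing" >&2
        exit 1
    fi
)

# Possible usage:
#   check_slx_config /etc/openslx || echo "incomplete configuration" >&2
```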

Notes

systemd-networkd

  • The network dracut module sets up an active wait for network in dracut's initqueue step:
    # make sure dracut runs initqueue
    touch /lib/dracut/need-initqueue
    # stay in initqueue as long as we don't have network access
    /sbin/initqueue --finished /lib/systemd/systemd-networkd-wait-online
    
  • The systemd-networkd module does not do this by default
    • needs a module extension
  • Renaming of the boot interface can be done via .link files
    • Tested ok on Ubuntu 16.04
    • Failed on CentOS 7.3
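
As an illustration of the .link approach, the sketch below derives the MAC address from the BOOTIF= argument that syslinux appends with IPAPPEND 3 (hardware type, then the address separated by dashes) and prints a matching .link file. The helper names, the interface name boot0 and the file name in the usage comment are hypothetical.

```shell
# Convert a BOOTIF value such as 01-aa-bb-cc-dd-ee-ff into a MAC address.
bootif_to_mac() {
    # Drop the leading hardware type ("01-" for Ethernet), then use colons.
    printf '%s\n' "${1#??-}" | sed 's/-/:/g'
}

# Print a systemd .link file renaming the interface with MAC $1 to $2.
link_file() {
    printf '[Match]\nMACAddress=%s\n\n[Link]\nName=%s\n' "$1" "$2"
}

# Possible usage:
#   link_file "$(bootif_to_mac "$BOOTIF")" boot0 > /etc/systemd/network/10-boot0.link
```

Because .link files are applied by udev when the device appears, the file would have to exist before hardware detection runs.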