nixfiles

My NixOS configuration and assorted other crap, powered by flakes. Clone to /etc/nixos.

CI checks ensure that code is formatted and passes linting. Run those locally with:

nix flake check
nix run .#fmt
nix run .#lint

Overview

This is an opinionated config making assumptions which work for me but might not for you:

  • These are primarily single-user hosts, with me being that user. While security and availability are important, convenience takes priority.
  • Observability is good, but there's no central graphing or alerting stack: every host has to run its own.
  • Databases should not be shared: each service has its own containerised instance. This means a single host may run several instances of the same database software, but that's an acceptable overhead.
  • Persistent docker volumes should be backed by bind-mounts to the filesystem.
  • For ZFS systems, wiping / on boot is good actually.

Everything in shared/default.nix is enabled on every host by default. Notable decisions are:

  • Every user gets a ~/tmp directory with files cleaned out after 7 days.
  • Automatic upgrades (including reboots if needed), automatic deletions of generations older than 30 days, and automatic garbage collection are all enabled.
  • Locale, timezone, and keyboard layout all set to UK / GB values (yes, even on servers).
  • Firewall and fail2ban are enabled, but pings are explicitly allowed.
  • SSH accepts pubkey auth only: no passwords.
  • Syncthing is enabled.

For monitoring and alerting specifically:

  • Prometheus, Grafana, and Alertmanager are all enabled by default (Alertmanager needs AWS credentials provided to actually send alerts).
  • The Node Exporter is enabled, along with a dashboard.
  • cAdvisor is enabled, along with a dashboard.

If using ZFS there are a few more things configured:

  • All pools are scrubbed monthly.
  • The auto-trim and auto-snapshot jobs are enabled (for pools which have those configured).
  • There's a Prometheus alert for pools in a state other than "online".

Everything else in shared/ is available to every host, but disabled by default.

Tools

Backups

Backups are managed by shared/restic-backups and uploaded to Backblaze B2 with restic.

List all the snapshots with:

nix run .#backups                                # all snapshots
nix run .#backups -- snapshots --host <hostname> # for a specific host
nix run .#backups -- snapshots --tag <tag>       # for a specific tag

Restore a snapshot to <restore-dir> with:

nix run .#backups restore <snapshot> [<restore-dir>]

If unspecified, the snapshot is restored to /tmp/restic-restore-<snapshot>.

Secrets

Secrets are managed with sops-nix. Create / edit secrets with:

nix run .#secrets                   # secrets.yaml for current host
nix run .#secrets <hostname>        # secrets.yaml for <hostname>
nix run .#secrets <hostname> <name> # <name>.yaml for <hostname>

Hosts

carcosa

This is a VPS (hosted by Hetzner Cloud).

It serves barrucadu.co.uk and other services on it. Websites are served with Caddy, with certs from Let's Encrypt.

It's set up in "erase your darlings" style, so most of the filesystem is wiped on boot and restored from the configuration, to ensure there's no accidentally unmanaged configuration or state hanging around. However, it doesn't reboot automatically, because I also use this server for a persistent IRC connection.

Alerting: enabled (standard only)

Backups: enabled (standard + extras)

Public hostname: carcosa.barrucadu.co.uk

Role: server

Declared in: hosts/carcosa/configuration.nix

nyarlathotep

This is my home server.

It runs writable instances of the bookdb and bookmarks services, which have any updates copied across to carcosa hourly; it acts as a NAS; and it runs a few utility services.

Like carcosa, this host is set up in "erase your darlings" style; unlike carcosa, it automatically reboots to install updates, so the wipe takes effect significantly more frequently.

Alerting: enabled (standard only)

Backups: enabled (standard + extras)

Public hostname: n/a

Role: server

Declared in: hosts/nyarlathotep/configuration.nix

Modules

<shared>

Common configuration enabled on all hosts.

Alerts:

  • A zpool is in "degraded" status (alertmanager)

Options:

Declared in: shared/default.nix

bookdb

bookdb is a webapp to keep track of all my books, with a public instance on bookdb.barrucadu.co.uk.

bookdb uses a containerised elasticsearch database, and also stores uploaded book cover images on disk.

Backups: the elasticsearch database and uploaded files.

Erase your darlings: overrides the dataDir.

Options:

Declared in: shared/bookdb/default.nix

bookmarks

bookmarks is a webapp to keep track of all my bookmarks, with a public instance on bookmarks.barrucadu.co.uk.

bookmarks uses a containerised elasticsearch database.

Backups: the elasticsearch database.

Options:

Declared in: shared/bookmarks/default.nix

concourse

Concourse CI is a "continuous thing-doer": a CI / CD tool. This module sets up a single-user instance, with GitHub authentication.
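
A minimal sketch of enabling it on a host (the sops secret name here is illustrative, not a convention from this repo):

nixfiles.concourse.enable = true;
nixfiles.concourse.githubUser = "barrucadu";
nixfiles.concourse.environmentFile = config.sops.secrets."concourse/env".path;
sops.secrets."concourse/env" = { };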

Concourse uses a containerised postgres database.

Provides a grafana dashboard.

Backups: the postgres database.

Options:

Declared in: shared/concourse/default.nix

erase-your-darlings

Wipe / on boot, inspired by "erase your darlings".

This module is responsible for configuring standard NixOS options and services; all of my modules have their own erase-your-darlings.nix file which makes any changes they need.
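
As a rough sketch (not the verbatim contents of any file in this repo), such a per-module erase-your-darlings.nix might look something like this, for a module with a dataDir option:

{ config, lib, ... }:

let
  cfg = config.nixfiles.eraseYourDarlings;
in
{
  # Only applies when erase-your-darlings is enabled on the host.
  config = lib.mkIf cfg.enable {
    # Keep this module's state on the persistent dataset instead of the wiped /.
    nixfiles.foundryvtt.dataDir = "${toString cfg.persistDir}/foundryvtt";
  };
}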

This requires setting up ZFS in a specific way when first installing NixOS. See the "set up a new host" runbook.

Options:

Declared in: shared/erase-your-darlings/default.nix

finder

finder is a webapp to read downloaded manga. There is no public deployment.

finder uses a containerised elasticsearch database, and requires read access to the filesystem where manga is stored. It does not manage the manga; it only provides an interface to search and read them.

The database can be recreated from the manga files, so this module does not include a backup script.

Options:

Declared in: shared/finder/default.nix

foundryvtt

FoundryVTT is a virtual tabletop to run roleplaying games. It is licensed software and needs to be downloaded after purchase. This module doesn't manage the FoundryVTT program files, only operating it.

The downloaded FoundryVTT program files must be in ${dataDir}/bin.

Backups: the data files - this requires briefly stopping the service, so don't schedule backups during game time.

Erase your darlings: overrides the dataDir.

Options:

Declared in: shared/foundryvtt/default.nix

minecraft

Minecraft Java Edition runner. Supports multiple servers, with mods. This module doesn't manage the Minecraft server files, only operating them.

Yes, I know there's a NixOS minecraft module, but it uses the Minecraft package from nixpkgs and only runs one server, whereas I want to run multiple modded servers.

The Minecraft server files must be in ${dataDir}/{name}.
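
For example (the server name and port are illustrative):

nixfiles.minecraft.enable = true;
nixfiles.minecraft.servers.vanilla = {
  # looks for ${dataDir}/vanilla/minecraft-server.jar
  jar = "minecraft-server.jar";
  # must match the port in server.properties; opened in the firewall
  port = 25565;
};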

This module does not include a backup script. Servers must be backed up independently.

Erase your darlings: overrides the dataDir.

Options:

Declared in: shared/minecraft/default.nix

oci-containers

To do

Run podman containers as a non-root user.

An abstraction over running containers as systemd units, enforcing some good practices:

  • Container DNS behaves the same under docker and podman.
  • Ports are exposed on 127.0.0.1, rather than 0.0.0.0.
  • Volumes are backed up by bind-mounts to the host filesystem.

Switching between using docker or podman for the container runtime should be totally transparent.
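
As a rough sketch (pod, container, and volume names here are made up), a pod with a single postgres container might look like:

nixfiles.oci-containers.pods.example = {
  containers.db = {
    image = "postgres:16";
    # secrets would normally come from environmentFiles instead
    environment.POSTGRES_PASSWORD = "...";
    # exposed on 127.0.0.1:5432 only
    ports = [ { host = 5432; inner = 5432; } ];
    # bind-mounted to ${volumeBaseDir}/example/pgdata on the host
    volumes = [ { name = "pgdata"; inner = "/var/lib/postgresql/data"; } ];
  };
};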

Erase your darlings: overrides the volumeBaseDir.

Options:

Declared in: shared/oci-containers/default.nix

pleroma

Pleroma is a fediverse server.

Pleroma uses a containerised postgres database.

Backups: the postgres database, uploaded files, and custom emojis.

Erase your darlings: transparently stores data on the persistent volume.

Options:

Declared in: shared/pleroma/default.nix

resolved

resolved is a recursive DNS server for LAN DNS.

Provides a grafana dashboard.
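
A minimal sketch of a forwarding setup for LAN DNS (the upstream address is illustrative, and whether forwardAddress takes a port suffix is an assumption):

nixfiles.resolved.enable = true;
# forward queries which can't be answered from local state to an upstream
nixfiles.resolved.forwardAddress = "192.168.1.1:53";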

Options:

Declared in: shared/resolved/default.nix

restic-backups

Manage regular incremental, compressed, and encrypted backups with restic.

Backups are uploaded to the barrucadu-backups-a19c48 B2 bucket.

List all the snapshots with:

nix run .#backups                                # all snapshots
nix run .#backups -- snapshots --host <hostname> # for a specific host
nix run .#backups -- snapshots --tag <tag>       # for a specific tag

Restore a snapshot to <restore-dir> with:

nix run .#backups restore <snapshot> [<restore-dir>]

If unspecified, the snapshot is restored to /tmp/restic-restore-<snapshot>.

Alerts:

  • Creating or uploading a snapshot fails.

Options:

Declared in: shared/restic-backups/default.nix

torrents

Transmission is a bittorrent client. This module configures it along with a web UI.
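
A sketch of a typical setup, reusing the example paths from the options reference (user and group are illustrative):

nixfiles.torrents.enable = true;
nixfiles.torrents.user = "barrucadu";
nixfiles.torrents.group = "users";
nixfiles.torrents.downloadDir = "/mnt/nas/torrents/files";
nixfiles.torrents.watchDir = "/mnt/nas/torrents/watch";
nixfiles.torrents.stateDir = "/var/lib/torrents";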

This module does not include a backup script. Torrented files must be backed up independently.

Erase your darlings: transparently stores session data on the persistent volume.

Options:

Declared in: shared/torrents/default.nix

umami

umami is a web analytics tool.

umami uses a containerised postgres database.

Backups: the postgres database.

Options:

Declared in: shared/umami/default.nix

Options

nixfiles.bookdb.dataDir

Directory to store uploaded files in.

If the erase-your-darlings module is enabled, this is overridden to be on the persistent volume.

Type: string

Default: "/srv/bookdb"

Declared in: shared/bookdb/options.nix

nixfiles.bookdb.elasticsearchPort

Port (on 127.0.0.1) to expose the elasticsearch container on.

Type: signed integer

Default: 47164

Declared in: shared/bookdb/options.nix

nixfiles.bookdb.elasticsearchTag

Tag of the elasticsearch container image to use.

Type: string

Default: "8.0.0"

Declared in: shared/bookdb/options.nix

nixfiles.bookdb.enable

Enable the bookdb service.

Type: boolean

Default: false

Declared in: shared/bookdb/options.nix

nixfiles.bookdb.logFormat

Format of the log messages.

Type: string

Default: "json,no-time"

Declared in: shared/bookdb/options.nix

nixfiles.bookdb.logLevel

Verbosity of the log messages.

Type: string

Default: "info"

Declared in: shared/bookdb/options.nix

nixfiles.bookdb.port

Port (on 127.0.0.1) to expose bookdb on.

Type: signed integer

Default: 46667

Declared in: shared/bookdb/options.nix

nixfiles.bookdb.readOnly

Launch the service in "read-only" mode. Enable this if exposing it to a public network.

Type: boolean

Default: false

Declared in: shared/bookdb/options.nix

nixfiles.bookmarks.elasticsearchPort

Port (on 127.0.0.1) to expose the elasticsearch container on.

Type: signed integer

Default: 43389

Declared in: shared/bookmarks/options.nix

nixfiles.bookmarks.elasticsearchTag

Tag of the elasticsearch container image to use.

Type: string

Default: "8.0.0"

Declared in: shared/bookmarks/options.nix

nixfiles.bookmarks.enable

Enable the bookmarks service.

Type: boolean

Default: false

Declared in: shared/bookmarks/options.nix

nixfiles.bookmarks.logFormat

Format of the log messages.

Type: string

Default: "json,no-time"

Declared in: shared/bookmarks/options.nix

nixfiles.bookmarks.logLevel

Verbosity of the log messages.

Type: string

Default: "info"

Declared in: shared/bookmarks/options.nix

nixfiles.bookmarks.port

Port (on 127.0.0.1) to expose bookmarks on.

Type: signed integer

Default: 48372

Declared in: shared/bookmarks/options.nix

nixfiles.bookmarks.readOnly

Launch the service in "read-only" mode. Enable this if exposing it to a public network.

Type: boolean

Default: false

Declared in: shared/bookmarks/options.nix

nixfiles.concourse.concourseTag

Tag of the concourse/concourse container image to use.

Type: string

Default: "7.11.2"

Declared in: shared/concourse/options.nix

nixfiles.concourse.enable

Enable the Concourse CI service.

Type: boolean

Default: false

Declared in: shared/concourse/options.nix

nixfiles.concourse.environmentFile

Environment file to pass secrets into the service. This is of the form:

# GitHub OAuth credentials
CONCOURSE_GITHUB_CLIENT_ID="..."
CONCOURSE_GITHUB_CLIENT_SECRET="..."

# AWS SSM credentials
CONCOURSE_AWS_SSM_REGION="..."
CONCOURSE_AWS_SSM_ACCESS_KEY="..."
CONCOURSE_AWS_SSM_SECRET_KEY="..."

Type: string

Declared in: shared/concourse/options.nix

nixfiles.concourse.githubUser

The GitHub user to authenticate with.

Type: string

Default: "barrucadu"

Declared in: shared/concourse/options.nix

nixfiles.concourse.metricsPort

Port (on 127.0.0.1) to expose the Prometheus metrics on.

Type: signed integer

Default: 45811

Declared in: shared/concourse/options.nix

nixfiles.concourse.port

Port (on 127.0.0.1) to expose Concourse CI on.

Type: signed integer

Default: 46498

Declared in: shared/concourse/options.nix

nixfiles.concourse.postgresTag

Tag of the postgres container image to use.

Type: string

Default: "16"

Declared in: shared/concourse/options.nix

nixfiles.concourse.workerScratchDir

Mount a directory from the host into the worker container to use as temporary storage. This is useful if the filesystem used for container volumes isn't very big.

Type: null or path

Default: null

Declared in: shared/concourse/options.nix

nixfiles.eraseYourDarlings.barrucaduPasswordFile

File containing the hashed password for barrucadu.

If using sops-nix set the neededForUsers option on the secret.

Type: string

Declared in: shared/erase-your-darlings/options.nix

nixfiles.eraseYourDarlings.enable

Enable wiping / on boot and storing persistent data in ${persistDir}.

Type: boolean

Default: false

Declared in: shared/erase-your-darlings/options.nix

nixfiles.eraseYourDarlings.machineId

An arbitrary 32-character hexadecimal string, used to identify the host. This is needed for journalctl logs from previous boots to be accessible.

See the systemd documentation.

Type: string

Example: "64b1b10f3bef4616a7faf5edf1ef3ca5"

Declared in: shared/erase-your-darlings/options.nix

nixfiles.eraseYourDarlings.persistDir

Persistent directory which will not be erased. This must be on a different ZFS dataset that will not be wiped when rolling back to the rootSnapshot.

This module moves various files from / to here.

Type: path

Default: "/persist"

Declared in: shared/erase-your-darlings/options.nix

nixfiles.eraseYourDarlings.rootSnapshot

ZFS snapshot to roll back to on boot.

Type: string

Default: "local/volatile/root@blank"

Declared in: shared/erase-your-darlings/options.nix

nixfiles.finder.elasticsearchTag

Tag of the elasticsearch container image to use.

Type: string

Default: "8.0.0"

Declared in: shared/finder/options.nix

nixfiles.finder.enable

Enable the finder service.

Type: boolean

Default: false

Declared in: shared/finder/options.nix

nixfiles.finder.image

Container image to run.

Type: string

Declared in: shared/finder/options.nix

nixfiles.finder.mangaDir

Directory to serve manga files from.

Type: path

Example: "/mnt/nas/manga"

Declared in: shared/finder/options.nix

nixfiles.finder.port

Port (on 127.0.0.1) to expose finder on.

Type: signed integer

Default: 44986

Declared in: shared/finder/options.nix

nixfiles.firewall.ipBlocklistFile

File containing IPs to block. This is of the form:

ip-address # comment
ip-address # comment
...

Type: null or string

Default: null

Declared in: shared/options.nix

nixfiles.foundryvtt.dataDir

Directory to store data files in.

The downloaded FoundryVTT program files must be in ${dataDir}/bin.

If the erase-your-darlings module is enabled, this is overridden to be on the persistent volume.

Type: string

Default: "/var/lib/foundryvtt"

Declared in: shared/foundryvtt/options.nix

nixfiles.foundryvtt.enable

Enable the FoundryVTT service.

Type: boolean

Default: false

Declared in: shared/foundryvtt/options.nix

nixfiles.foundryvtt.port

Port (on 127.0.0.1) to expose FoundryVTT on.

Type: signed integer

Default: 46885

Declared in: shared/foundryvtt/options.nix

nixfiles.minecraft.dataDir

Directory to store data files in.

If the erase-your-darlings module is enabled, this is overridden to be on the persistent volume.

Type: path

Default: "/var/lib/minecraft"

Declared in: shared/minecraft/options.nix

nixfiles.minecraft.enable

Enable the Minecraft service.

Type: boolean

Default: false

Declared in: shared/minecraft/options.nix

nixfiles.minecraft.servers

Attrset of minecraft server definitions. Each server {name} is run in the working directory ${dataDir}/{name}.

Type: attribute set of (submodule)

Default: { }

Declared in: shared/minecraft/options.nix

nixfiles.minecraft.servers.<name>.autoStart

Start the server automatically on boot.

Type: boolean

Default: true

Declared in: shared/minecraft/options.nix

nixfiles.minecraft.servers.<name>.jar

Name of the JAR file to use. This file must be in the working directory.

Type: string

Default: "minecraft-server.jar"

Declared in: shared/minecraft/options.nix

nixfiles.minecraft.servers.<name>.jre

Java runtime package to use.

Type: package

Default: <derivation openjdk-headless-17.0.7+7>

Declared in: shared/minecraft/options.nix

nixfiles.minecraft.servers.<name>.jvmOpts

Java runtime arguments. Cargo cult these from a forum post and then never think about them again.

Type: strings concatenated with " "

Default: "-Xmx4G -Xms4G -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:G1NewSizePercent=20 -XX:G1ReservePercent=20 -XX:MaxGCPauseMillis=50 -XX:G1HeapRegionSize=32M"

Declared in: shared/minecraft/options.nix

nixfiles.minecraft.servers.<name>.port

Port to open in the firewall. This must match the port in the server.properties file.

Type: signed integer

Declared in: shared/minecraft/options.nix

nixfiles.oci-containers.backend

The container runtime.

Type: one of "docker", "podman"

Default: "docker"

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods

Attrset of pod definitions.

Type: attribute set of (submodule)

Default: { }

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers

Attrset of container definitions.

Type: attribute set of (submodule)

Default: { }

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.autoStart

Start the container automatically on boot.

Type: boolean

Default: true

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.cmd

Command-line arguments to pass to the container image's entrypoint.

Type: list of string

Default: [ ]

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.dependsOn

Other containers that this one depends on, in ${pod}-${name} format.

Type: list of string

Default: [ ]

Example: [ "concourse-db" ]

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.environment

Environment variables to set for this container.

Type: attribute set of string

Default: { }

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.environmentFiles

List of environment files for this container.

Type: list of path

Default: [ ]

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.extraOptions

Extra options to pass to docker run / podman run.

Type: list of string

Default: [ ]

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.image

Container image to run.

Type: string

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.login.passwordFile

File containing the password for the container registry.

Type: null or string

Default: null

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.login.registry

Container registry to authenticate with.

Type: null or string

Default: null

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.login.username

Username for the container registry.

Type: null or string

Default: null

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.ports

List of ports to expose.

Type: list of (submodule)

Default: [ ]

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.ports.*.host

Host port (on 127.0.0.1) to expose the container port on.

Type: signed integer

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.ports.*.inner

The container port to expose to the host.

Type: signed integer

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.pullOnStart

Pull the container image when starting (useful for :latest images).

Type: boolean

Default: true

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.volumes

List of volume definitions.

Type: list of (submodule)

Default: [ ]

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.volumes.*.host

Directory on the host to bind-mount into the container.

This option conflicts with ${name}.

Type: null or string

Default: null

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.volumes.*.inner

Directory in the container to mount the volume to.

Type: string

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.containers.<name>.volumes.*.name

Name of the volume. This is equivalent to:

host = "${volumeBaseDir}/${volumeSubDir}/${name}";

This option conflicts with ${host}.

Type: null or string

Default: null

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.pods.<name>.volumeSubDir

Subdirectory of the ${volumeBaseDir} to store bind-mounts under.

Type: string

Default: "‹name›"

Declared in: shared/oci-containers/options.nix

nixfiles.oci-containers.volumeBaseDir

Directory to store volume bind-mounts under.

If the erase-your-darlings module is enabled, this is overridden to be on the persistent volume.

Type: string

Declared in: shared/oci-containers/options.nix

nixfiles.pleroma.adminEmail

Email address used to contact the server operator.

Type: string

Default: "mike@barrucadu.co.uk"

Declared in: shared/pleroma/options.nix

nixfiles.pleroma.allowRegistration

Allow new users to sign up.

Type: boolean

Default: false

Declared in: shared/pleroma/options.nix

nixfiles.pleroma.domain

Domain which Pleroma will be exposed on.

Type: string

Example: "social.lainon.life"

Declared in: shared/pleroma/options.nix

nixfiles.pleroma.enable

Enable the Pleroma service.

Type: boolean

Default: false

Declared in: shared/pleroma/options.nix

nixfiles.pleroma.faviconPath

File to use for the favicon.

Type: null or path

Default: null

Declared in: shared/pleroma/options.nix

nixfiles.pleroma.instanceName

Name of the instance; defaults to the ${domain} if not set.

Type: null or string

Default: null

Declared in: shared/pleroma/options.nix

nixfiles.pleroma.notifyEmail

Email address used for notifications; defaults to the ${adminEmail} if not set.

Type: null or string

Default: null

Declared in: shared/pleroma/options.nix

nixfiles.pleroma.port

Port (on 127.0.0.1) to expose Pleroma on.

Type: signed integer

Default: 46283

Declared in: shared/pleroma/options.nix

nixfiles.pleroma.postgresTag

Tag of the postgres container image to use.

Type: string

Default: "16"

Declared in: shared/pleroma/options.nix

nixfiles.pleroma.secretsFile

File containing secret configuration.

See the Pleroma documentation for what this needs to contain.

Type: string

Declared in: shared/pleroma/options.nix

nixfiles.resolved.address

Address to listen on.

Type: string

Default: "0.0.0.0:53"

Declared in: shared/resolved/options.nix

nixfiles.resolved.authoritativeOnly

Only answer queries for which this server is authoritative: do not perform recursive or forwarding resolution.

Type: boolean

Default: false

Declared in: shared/resolved/options.nix

nixfiles.resolved.cacheSize

How many records to hold in the cache.

Type: signed integer

Default: 512

Declared in: shared/resolved/options.nix

nixfiles.resolved.enable

Enable the resolved service.

Type: boolean

Default: false

Declared in: shared/resolved/options.nix

nixfiles.resolved.forwardAddress

Act as a forwarding resolver, not a recursive resolver: forward queries which can't be answered from local state to this nameserver and cache the result.

Type: null or string

Default: null

Declared in: shared/resolved/options.nix

nixfiles.resolved.hostsDirs

List of directories to read hosts files from.

Type: list of string

Default: [ ]

Declared in: shared/resolved/options.nix

nixfiles.resolved.logFormat

Format of the log messages.

Type: string

Default: "json,no-time"

Declared in: shared/resolved/options.nix

nixfiles.resolved.logLevel

Verbosity of the log messages.

Type: string

Default: "dns_resolver=info,resolved=info"

Declared in: shared/resolved/options.nix

nixfiles.resolved.metricsAddress

Address to listen on to serve Prometheus metrics.

Type: string

Default: "127.0.0.1:9420"

Declared in: shared/resolved/options.nix

nixfiles.resolved.protocolMode

How to choose between connecting to upstream nameservers over IPv4 or IPv6 when acting as a recursive resolver.

Type: string

Default: "only-v4"

Declared in: shared/resolved/options.nix

nixfiles.resolved.useDefaultZones

Include the default zone files.

Type: boolean

Default: true

Declared in: shared/resolved/options.nix

nixfiles.resolved.zonesDirs

List of directories to read zone files from.

Type: list of string

Default: [ ]

Declared in: shared/resolved/options.nix

nixfiles.restic-backups.backups

Attrset of backup job definitions.

Type: attribute set of (submodule)

Default: { }

Declared in: shared/restic-backups/options.nix

nixfiles.restic-backups.backups.<name>.cleanupCommand

A script to run after taking the backup.

Type: null or string

Default: null

Declared in: shared/restic-backups/options.nix

nixfiles.restic-backups.backups.<name>.paths

List of paths to back up.

Type: list of string

Default: [ ]

Declared in: shared/restic-backups/options.nix

nixfiles.restic-backups.backups.<name>.prepareCommand

A script to run before beginning the backup.

Type: null or string

Default: null

Declared in: shared/restic-backups/options.nix

nixfiles.restic-backups.backups.<name>.startAt

When to run the backup.

Type: string

Default: "Mon, 04:00"

Declared in: shared/restic-backups/options.nix

nixfiles.restic-backups.checkRepositoryAt

If not null, when to run restic check to validate the repository metadata.

Type: null or string

Default: null

Declared in: shared/restic-backups/options.nix

nixfiles.restic-backups.enable

Enable the backup service.

Type: boolean

Default: false

Declared in: shared/restic-backups/options.nix

nixfiles.restic-backups.environmentFile

Environment file to pass secrets into the service. This is of the form:

# Repository password
RESTIC_PASSWORD="..."

# B2 credentials
B2_ACCOUNT_ID="..."
B2_ACCOUNT_KEY="..."

# AWS SNS credentials
AWS_ACCESS_KEY="..."
AWS_SECRET_ACCESS_KEY="..."
AWS_DEFAULT_REGION="..."

If any of the backup jobs need secrets, those should be specified in this file as well.

Type: null or string

Declared in: shared/restic-backups/options.nix

nixfiles.restic-backups.sudoRules

List of additional sudo rules to grant the backup user.

Type: list of (submodule)

Default: [ ]

Declared in: shared/restic-backups/options.nix

nixfiles.restic-backups.sudoRules.*.command

The command for which the rule applies.

Type: string

Declared in: shared/restic-backups/options.nix

nixfiles.restic-backups.sudoRules.*.runAs

The user / group under which the command is allowed to run.

A user can be specified using just the username: "foo". It is also possible to specify a user/group combination using "foo:bar" or to only allow running as a specific group with ":bar".

Type: string

Default: "ALL:ALL"

Declared in: shared/restic-backups/options.nix

nixfiles.torrents.downloadDir

Directory to download torrented files to.

Type: string

Example: "/mnt/nas/torrents/files"

Declared in: shared/torrents/options.nix

nixfiles.torrents.enable

Enable the Transmission service.

Type: boolean

Default: false

Declared in: shared/torrents/options.nix

nixfiles.torrents.group

The group to run Transmission as.

Type: string

Declared in: shared/torrents/options.nix

nixfiles.torrents.logLevel

Verbosity of the log messages.

Type: integer between 0 and 6 (both inclusive)

Default: 2

Declared in: shared/torrents/options.nix

nixfiles.torrents.openFirewall

Allow connections from TCP and UDP ports ${portRange.from} to ${portRange.to}.

Type: boolean

Default: true

Declared in: shared/torrents/options.nix

nixfiles.torrents.peerPort

Port to accept peer connections on.

Type: 16 bit unsigned integer; between 0 and 65535 (both inclusive)

Default: 50000

Declared in: shared/torrents/options.nix

nixfiles.torrents.rpcPort

Port to accept RPC connections on. Bound on 127.0.0.1.

Type: 16 bit unsigned integer; between 0 and 65535 (both inclusive)

Default: 49528

Declared in: shared/torrents/options.nix

nixfiles.torrents.stateDir

Directory to store service state in.

Type: string

Example: "/var/lib/torrents"

Declared in: shared/torrents/options.nix

nixfiles.torrents.user

The user to run Transmission as.

Type: string

Declared in: shared/torrents/options.nix

nixfiles.torrents.watchDir

Directory to monitor for new .torrent files.

Type: string

Example: "/mnt/nas/torrents/watch"

Declared in: shared/torrents/options.nix

nixfiles.umami.enable

Enable the umami service.

Type: boolean

Default: false

Declared in: shared/umami/options.nix

nixfiles.umami.environmentFile

Environment file to pass secrets into the service. This is of the form:

HASH_SALT="..."

Type: string

Declared in: shared/umami/options.nix

nixfiles.umami.port

Port (on 127.0.0.1) to expose umami on.

Type: signed integer

Default: 46489

Declared in: shared/umami/options.nix

nixfiles.umami.postgresTag

Tag of the postgres container image to use.

Type: string

Default: "16"

Declared in: shared/umami/options.nix

nixfiles.umami.umamiTag

Tag of the ghcr.io/umami-software/umami container image to use.

Type: string

Default: "postgresql-v2.9.0"

Declared in: shared/umami/options.nix

DiskSpaceLow

This alert fires when a partition has under 10% free space remaining.

The alert will say which partitions are affected, df -h also has the information:

$ df -h
Filesystem                Size  Used Avail Use% Mounted on
devtmpfs                  1.6G     0  1.6G   0% /dev
tmpfs                      16G  112K   16G   1% /dev/shm
tmpfs                     7.8G  9.8M  7.8G   1% /run
tmpfs                      16G  1.1M   16G   1% /run/wrappers
local/volatile/root       1.7T  1.8G  1.7T   1% /
local/persistent/nix      1.7T  5.1G  1.7T   1% /nix
local/persistent/persist  1.7T  2.0G  1.7T   1% /persist
local/persistent/var-log  1.7T  540M  1.7T   1% /var/log
efivarfs                  128K   40K   84K  33% /sys/firmware/efi/efivars
local/persistent/home     1.7T   32G  1.7T   2% /home
/dev/nvme0n1p2            487M   56M  431M  12% /boot
data/nas                   33T   22T   11T  68% /mnt/nas
tmpfs                     3.2G   12K  3.2G   1% /run/user/1000

Note that all ZFS datasets in the same pool (local/* and data/* in the example above) share the underlying storage.

Debugging steps:

  • See the node_filesystem_avail_bytes metric for how quickly disk space is being consumed
  • Use ncdu -x to work out where the space is going
  • Buy more storage if need be

ZPoolStatusDegraded

This alert fires when an HDD fails.

The zpool status -x command will say which drive has failed; what, specifically, the problem is; and link to a runbook:

$ zpool status -x
  pool: data
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub in progress since Thu Feb  1 00:00:01 2024
        19.3T / 20.6T scanned at 308M/s, 17.8T / 20.6T issued at 284M/s
        0B repaired, 86.49% done, 02:51:42 to go
config:

        NAME                                         STATE     READ WRITE CKSUM
        data                                         DEGRADED     0     0     0
          mirror-0                                   DEGRADED     0     0     0
            11478606759844821041                     UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST10000VN0004-1ZD101_ZA206882-part2
            ata-ST10000VN0004-1ZD101_ZA27G6C6-part2  ONLINE       0     0     0
          mirror-1                                   ONLINE       0     0     0
            ata-ST10000VN0004-1ZD101_ZA22461Y        ONLINE       0     0     0
            ata-ST10000VN0004-1ZD101_ZA27BW6R        ONLINE       0     0     0
          mirror-2                                   ONLINE       0     0     0
            ata-ST10000VN0008-2PJ103_ZLW0398A        ONLINE       0     0     0
            ata-ST10000VN0008-2PJ103_ZLW032KE        ONLINE       0     0     0

errors: No known data errors

Follow the provided runbook. In most cases the solution will be to:

  1. Buy a new HDD (of at least the same size as the failed one)
  2. Physically replace the failed HDD with the new one
  3. Run zpool replace <pool> <old-device> <new-device>
  4. Wait for the new device to resilver

Set up a new host

Install NixOS

Boot into the ISO and install NixOS with tools/provision-machine.sh:

sudo -i
nix-env -f '<nixpkgs>' -iA git
curl https://raw.githubusercontent.com/barrucadu/nixfiles/master/tools/provision-machine.sh > provision-machine.sh
bash provision-machine.sh gpt /dev/sda

Then:

  1. Rename /mnt/persist/etc/nixos/hosts/new after the new hostname
  2. Add the host to /mnt/persist/etc/nixos/flake.nix
  3. Add the new files to git
  4. Run nixos-install --flake /mnt/persist/etc/nixos#hostname
  5. Reboot

First boot

Generate an age public key from the host SSH key:

nix-shell -p ssh-to-age --run 'ssh-keyscan localhost | ssh-to-age'

Add a new section with this key to /persist/etc/nixos/.sops.yaml:

creation_rules:
  ...
  - path_regex: hosts/<hostname>/secrets(/[^/]+)?\.yaml$
    key_groups:
      - age:
          - *barrucadu
          - '<key>'

Add a users/barrucadu secret with the hashed user password:

nix run .#secrets

Copy the host SSH keys to /persist/etc/ssh:

mkdir /persist/etc/ssh
cp /etc/ssh/ssh_host_rsa_key /persist/etc/ssh/ssh_host_rsa_key
cp /etc/ssh/ssh_host_ed25519_key /persist/etc/ssh/ssh_host_ed25519_key

Enable nixfiles.eraseYourDarlings:

nixfiles.eraseYourDarlings.enable = true;
nixfiles.eraseYourDarlings.barrucaduPasswordFile = config.sops.secrets."users/barrucadu".path;
sops.secrets."users/barrucadu".neededForUsers = true;

Then:

  1. Rebuild the system: sudo nixos-rebuild boot --flake /persist/etc/nixos
  2. Reboot

Optional: Add DNS records

Add A / AAAA records to the ops repo and apply the change via Concourse.

Optional: Configure alerting

All hosts have Alertmanager installed and enabled. To actually publish alerts, create a secret for the environment file with credentials for the host-notifications SNS topic:

AWS_ACCESS_KEY="..."
AWS_SECRET_ACCESS_KEY="..."

Then configure the environment file:

services.prometheus.alertmanager.environmentFile = config.sops.secrets."services/alertmanager/env".path;
sops.secrets."services/alertmanager/env" = { };

Optional: Configure backups

All hosts which run any sort of service with data I care about should take automatic backups.

Firstly, add the backup credentials to the secrets:

nix run .#secrets

Then enable backups in the host configuration:

nixfiles.restic-backups.enable = true;
nixfiles.restic-backups.environmentFile = config.sops.secrets."nixfiles/restic-backups/env".path;
sops.secrets."nixfiles/restic-backups/env" = { };

Most services define their own backup scripts. For any other needs, write a custom backup job:

nixfiles.restic-backups.backups.<name> = { ... };
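
For example, a hypothetical job (the name and path are illustrative) that backs up a directory every Monday morning:

nixfiles.restic-backups.backups.example = {
  paths = [ "/persist/srv/example" ];
  startAt = "Mon, 04:00";
};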

Optional: Generate SSH key

Generate an ed25519 SSH key:

ssh-keygen -t ed25519

If the host should be able to interact with GitHub: add the public key to the GitHub user configuration as an SSH key.

If the host should be able to push commits to GitHub: add the public key to the GitHub user configuration as a signing key, and also add it to the allowed_signers file.

If the host should be able to connect to other machines: add the public key to shared/default.nix.

Optional: Configure Syncthing

Use the Syncthing Web UI (localhost:8384) to get the machine's ID. Add this ID to any other machines which it should synchronise files with, through their web UIs.

Then configure any shared folders.

Move a configuration to a new machine

Follow the set up a new host instructions up to step 5 (cloning the nixfiles repo to /etc/nixos).

Then:

  1. Merge the generated machine configuration into the nixfiles configuration
  2. Copy the sops master key to .config/sops/age/keys.txt
  3. If using secrets: Re-encrypt the secrets
  4. If there is a backup: Restore the latest backup
  5. Remove the sops master key
  6. If wiping / on boot: Copy any files which need to be preserved to the appropriate place in /persist
  7. Optional: Update DNS records
  8. Optional: Generate SSH key
  9. Build the new system configuration with sudo nixos-rebuild switch --flake '.#<hostname>'
  10. Reboot
  11. Commit, push, & merge
  12. Optional: Configure Syncthing

If using secrets: Re-encrypt the secrets

After first boot, generate an age public key from the host SSH key:

nix-shell -p ssh-to-age --run 'ssh-keyscan localhost | ssh-to-age'

Replace the old key in .sops.yaml with the new key:

creation_rules:
  ...
  - path_regex: hosts/<hostname>/secrets(/[^/]+)?\.yaml$
    key_groups:
      - age:
          - *barrucadu
          - '<old-key>' # delete
          - '<new-key>' # insert

Update the host's encryption key:

nix shell "nixpkgs#sops" -c sops updatekeys hosts/<hostname>/secrets.yaml

If there is a backup: Restore the latest backup

Download the latest backup to /tmp/backup-restore:

nix run .#backups restore <hostname>

Then move files to restore to the appropriate locations.

Optional: Update DNS records

If there are any DNS records referring to the old machine which are now incorrect (e.g. due to an IP address change), make the needed changes to the ops repo and apply the change via Concourse.

Optional: Generate SSH key

Generate an ed25519 SSH key:

ssh-keygen -t ed25519

If the host should be able to interact with GitHub: add the public key to the GitHub user configuration as an SSH key.

If the host should be able to push commits to GitHub: add the public key to the GitHub user configuration as a signing key, and also add it to the allowed_signers file.

If the host should be able to connect to other machines: add the public key to shared/default.nix.

Remove the old SSH key for this host from anywhere it's used.

Optional: Configure Syncthing

Use the Syncthing Web UI (localhost:8384) to get the machine's ID. Replace the old machine's ID and folder sharing permissions with the new machine, for any other machines which synchronised files with it.

Upgrade to a new version of postgres

Change the default postgres version for a module

  1. Individually upgrade all hosts to the new version, following the processes below.
  2. Change the default value of the postgresTag option for the module.
  3. Remove the per-host postgresTag options.

Upgrade to a new minor version

This is generally safe. Just change the postgresTag and rebuild the NixOS configuration.

Upgrade to a new major version

In brief: take a backup, shut down the database, bring up the new one, and restore the backup. This does have some downtime, but is relatively risk free.

Shell variables:

  • $CONTAINER - the database container name
  • $POSTGRES_DB - the database name
  • $POSTGRES_USER - the database user
  • $POSTGRES_PASSWORD - the database password
  • $VOLUME_DIR - the directory on the host that the container's /var/lib/postgresql/data is bind-mounted to
  • $TAG - the new container tag to use

Replace podman with docker in the following commands if you're using that.

  1. Stop all services which write to the database.

  2. Dump the database:

    sudo podman exec -i "$CONTAINER" pg_dump -U "$POSTGRES_USER" --no-owner -Fc "$POSTGRES_DB" > "${CONTAINER}.dump"
    
  3. Stop the database container:

    sudo systemctl stop "podman-$CONTAINER"
    
  4. Back up the database volume:

    sudo mv "$VOLUME_DIR" "${VOLUME_DIR}.bak"
    
  5. Create the new volume:

    sudo mkdir "$VOLUME_DIR"
    
  6. Bring up a new database container with the dump bind-mounted into it:

    sudo podman run --rm --name="$CONTAINER" -v "$(pwd):/backup" -v "${VOLUME_DIR}:/var/lib/postgresql/data" -e "POSTGRES_DB=${POSTGRES_DB}" -e "POSTGRES_USER=${POSTGRES_USER}" -e "POSTGRES_PASSWORD=${POSTGRES_PASSWORD}" --shm-size=1g "postgres:${TAG}"
    
  7. In another shell, restore the dump:

    sudo podman exec "$CONTAINER" pg_restore -U "$POSTGRES_USER" -d "$POSTGRES_DB" -Fc -j4 --clean "/backup/${CONTAINER}.dump"
    
  8. Ctrl-c the database container after the dump has restored successfully.

  9. Change the postgresTag option in the host's NixOS configuration.

  10. Rebuild the NixOS configuration and check that the database and all of its dependent services come back up:

    sudo nixos-rebuild switch
    

Rollback

The old database files are still present at ${VOLUME_DIR}.bak, so:

  1. Stop all the relevant services, including the database container.

  2. Restore the backup:

    sudo mv "$VOLUME_DIR" "${VOLUME_DIR}.aborted"
    sudo mv "${VOLUME_DIR}.bak" "$VOLUME_DIR"
    
  3. If the postgresTag has been updated in the NixOS configuration:

    1. Revert it to its previous version.
    2. Rebuild the NixOS configuration.
  4. Restart all the relevant services.