Pushing my Home Raspberry Pi cluster into a production state
Saturday 25 January 2025 · 40 mins read
Introduction
New year, new budget, new hardware. I've upgraded my home Raspberry Pi cluster with a new storage node, a new router, new OSes and new services.
If you read my last article on my home Raspberry Pi cluster, you may know that I was looking to upgrade the storage node and the router. Well, that's what I did, and I've done a lot more since then.
Hardware overhaul
Replacing the router
The setup of my cluster was the following:
- Nodes are interconnected with each other via one switch to use the whole 1Gbps bandwidth.
- To connect the nodes to the internet, I used a wireless "router"... which in reality was a Raspberry Pi 3+ acting as a router.
Small issue: the wireless bandwidth of the Raspberry Pi is very limited.
Luckily, the whole house network has been upgraded to Wi-Fi 6E with a dual TP-Link Deco XE75 setup. This replaced the old repeater, a D-Link router running OpenWRT, which I could then repurpose as the cluster's new wireless router.
I replaced the RPi router using the "Wi-Fi Extender/Repeater with relayd" setup, but without the repeater feature (the Access Point is disabled):
config interface 'lan' # Management interface
    option device 'br-lan'
    option proto 'static'
    option ipaddr '192.168.2.1'
    option netmask '255.255.255.0'
    option ip6assign '60'

config interface 'wwan' # Use to connect to the modem
    option proto 'dhcp'

config interface 'repeater_bridge' # Relay interface
    option proto 'relay'
    option network 'lan wwan'
config wifi-device 'radio0' # 5GHz band, used to connect to the modem.
    option type 'mac80211'
    option path 'pci0000:00/0000:00:00.0'
    option band '5g'
    option htmode 'VHT80'
    option disabled '0'
    option country 'FR' # This is important! Based on the country, the wrong band might be used.

config wifi-iface 'default_radio0'
    option device 'radio0'
    option network 'wwan'
    option mode 'sta'
    option ssid '[REDACTED]'
    option encryption 'psk2'
    option key '[REDACTED]'

config wifi-device 'radio1' # 2.4 GHz band, which can be used as AP.
    option disabled '1'

config wifi-iface 'default_radio1'
    option device 'radio1'
    # Interface is disabled.
config zone
    option name 'lan'
    option network 'lan repeater_bridge wwan'
    option input 'ACCEPT'
    option output 'ACCEPT'
    option forward 'ACCEPT'
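These three blocks live in /etc/config/network, /etc/config/wireless and /etc/config/firewall respectively. As a quick sketch (assuming a standard OpenWrt image), the relay protocol needs the relayd packages, and the configuration is applied with the usual commands:

# Install relayd support (one-time)
opkg update && opkg install relayd luci-proto-relay

# Reload the network, wireless and firewall configuration
service network restart
wifi reload
service firewall restart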
The setup gave me the following results with iperf3:
- Around 160 Mbps download (world → cluster).
- Around 70 Mbps upload (cluster → world).
That may seem low, but considering it's wireless, and faster than the old setup, it's pretty good.
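For reference, these numbers come from plain iperf3 runs between a cluster node and a machine on the other side of the wireless link, roughly like this (the server IP is a placeholder):

# On a machine behind the main router, start an iperf3 server
iperf3 -s

# From a cluster node: upload test (cluster → world), then download test (-R reverses the direction)
iperf3 -c 192.168.1.10
iperf3 -c 192.168.1.10 -R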
Adding the storage node
My old storage node is an ODroid XU4. It doesn't have enough RAM to cache data over NFS, so I need to set up a new storage node. To avoid sending a lot of data over a small "pipe", I also want some computing power on the storage node so that it only exposes processed data. Therefore, I'm going with a two-tier storage setup:
- The slow tier: the ODroid XU4, running on SATA SSDs, which is good for archiving, hosting videos, etc.
- The fast tier: the new storage node, a Raspberry Pi 5 with NVMe SSDs, which is good for storage but also for computing, giving me the power to host PostgreSQL databases, LDAP, etc.
Therefore, I'm replacing CockroachDB with a simple PostgreSQL setup (especially since CockroachDB decided to close the free tier). And this also means I'm installing k3s on the new node, but without k3os.
Replacing k3os with plain RaspiOS and k3s
k3os is dead, and has been for a long time (at least 2 years). I tried to maintain a fork of k3os, but that's when I noticed some issues.
At the very beginning, I used the picl-k3os-image-generator project, which generates k3os images with Busybox as the base.
This approach has these issues:
- Busybox is not updated during k3os upgrades. In fact, k3os upgrades were simply k3s upgrades.
- The kernel wasn't updated either; nodes kept the kernel that was initially installed by the image generator.
Basically, the OS of the Raspberry Pis hadn't been updated for the last 3 years. I obviously tried to update them, but the procedure isn't worth it:
- Eject the SD card from the Raspberry Pi.
- Download the latest RaspiOS image and extract the kernel and firmware.
- Install the new kernel and firmware.
- Reinsert the SD card and boot.
The simple fact that I have to manually install the kernel negates the whole point of k3os: to have an immutable OS.
Therefore, I will use a plain RaspiOS image with k3s, which permits kernel and firmware updates via apt. The installation process was the following:
- Cordon and drain every node.
- Remove any plans from the system-upgrade-controller.
- Back up /var/lib/rancher/k3s/server/token and /var/lib/rancher/k3s/server/db/ (I recommend copying the files and also using the sqlite3 backup utility; see the sketch after this list).
- Install RaspiOS.
- (controller) Restore /var/lib/rancher/k3s/server/token and /var/lib/rancher/k3s/server/db/.
- Install k3s.
- It's good to go!
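For the backup step, here's a minimal sketch of what I mean, assuming the default sqlite datastore and a backup destination mounted at /mnt/backup (that path is only an example):

# On the controller node, stop k3s so the datastore is quiescent
sudo systemctl stop k3s

# Keep the join token and a raw copy of the datastore
sudo cp /var/lib/rancher/k3s/server/token /mnt/backup/token
sudo cp -r /var/lib/rancher/k3s/server/db/ /mnt/backup/db/

# Also take a consistent snapshot with the sqlite3 backup utility
sudo sqlite3 /var/lib/rancher/k3s/server/db/state.db ".backup '/mnt/backup/state.db'"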
By the way, you may need to clean up and vacuum the sqlite database (make sure to back it up before doing so). The commands are:
sqlite3 state.db

sqlite> delete from kine where id in (select id from (select id, name from kine where id not in (select max(id) as id from kine group by name)));
sqlite> vacuum;
sqlite> .quit
This can fix some issues with the database and k3s. After the migration, I'm back to the old style of infrastructure management: mutable OSes... which means I need to set up Ansible.
Software overhaul
Ansible
Wait, what? There wasn't Ansible before?
Indeed. When you set up an immutable infrastructure, the idea is that everything is declarative, including the OS configuration: kernel version, installed software, and so on.
If you declaratively deploy a Kubernetes cluster (for example with Terraform on Google Cloud, or with k0sctl and k0s on your own infrastructure), the only tools you need are your deployment software (Terraform or k0sctl) and kubectl.
But since we switched to a mutable infrastructure (mutable OSes), we need to configure the software installed on the nodes.
Therefore, I've set up Ansible with just two roles (sketched below):
- roles/storage to format the storage node.
- roles/upgrade_and_reboot to upgrade the OS and reboot the node.
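Here's a rough idea of what the playbook tying those roles together might look like; the inventory group names are assumptions on my part:

# site.yml (illustrative sketch; host/group names are placeholders)
- name: Prepare the storage node
  hosts: storage
  become: true
  roles:
    - storage

- name: Upgrade and reboot every node, one at a time
  hosts: cluster
  become: true
  serial: 1 # never take down more than one node at once
  roles:
    - upgrade_and_reboot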
And that's it! I could also handle the k3s installation with Ansible; however, I set up the k3s system-upgrade-controller on Kubernetes, which lets the cluster upgrade itself.
Basically, my rules are:
- If it can be handled with Kubernetes (like CronJobs), use Kubernetes.
- If it's almost at the infrastructure level, use Ansible.
Migrating CockroachDB to PostgreSQL
Since CockroachDB is no longer free, I'm simply switching to PostgreSQL with backups. CockroachDB used too much CPU and RAM when I expected low usage, especially because of the Raft consensus. And since CockroachDB isn't fully compatible with the PostgreSQL API, I find parts of its implementation doubtful.
So... I searched for a method to migrate CockroachDB data to PostgreSQL... and none of them worked.
So here's my method: it's a lot of work, but it works reliably.
The method is the following. Let's say you want to migrate the data of Service A:
- Deploy PostgreSQL on the production cluster.
- To avoid any downtime, deploy Service A locally and connect it to PostgreSQL. This runs the proper DB migrations and therefore fixes any issues with the SQL schema.
- Then, use DBeaver to export the data. WARNING: don't forget to also export the sequences (see the snippet after this list)!
- Finally, reconnect the production Service A to PostgreSQL.
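If the sequences are missed, inserts will fail or reuse old IDs the next time the service writes. A hedged example of realigning one sequence by hand after the import (the table and sequence names are made up):

-- Align a sequence with the imported data (names are illustrative)
SELECT setval('public.users_id_seq', (SELECT COALESCE(MAX(id), 1) FROM public.users));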
That's all. Note that I didn't have many services running on CockroachDB, only Grafana and Vaultwarden. But now, I'm running a lot of services!
New services, and death to some
FluxCD
I've talked about it in an older article, but never really made it official. That's because I still had doubts about FluxCD's capabilities.
Today, I can finally say that FluxCD is the best lightweight yet fully-featured GitOps solution. I've had zero issues with it over the past 4 months.
The issues I've talked about in the past were:
- FluxCD is not clever about Helm charts. In reality, this is because I used the subchart pattern, which works with ArgoCD. What I do instead is simply use the chart at its release tag, and if I need to patch something, I write manifests alongside the HelmRelease thanks to FluxCD's capabilities. (It's hard to explain, but in short there is no need for kustomized-helm with FluxCD; see the sketch after this list.)
- Capacitor is slow. It is still slow, but with notifications set up and the Flux CLI installed, I'm simply not using Capacitor anymore.
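To illustrate the first point, here's a rough sketch of what I mean (the chart, repository and values are hypothetical): the HelmRelease pins the chart at a release tag, and any extra object or patch is just another manifest committed next to it, reconciled by the same Flux Kustomization.

apiVersion: helm.toolkit.fluxcd.io/v2 # may be v2beta2 on older Flux versions
kind: HelmRelease
metadata:
  name: my-app
  namespace: my-app
spec:
  interval: 10m
  chart:
    spec:
      chart: my-app
      version: 1.2.3 # pinned release tag
      sourceRef:
        kind: HelmRepository
        name: my-app-repo
  values:
    replicaCount: 1
# ingress.yaml, network policies, patches, etc. simply live in the
# same directory and are picked up by the same Flux Kustomization.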
LLDAP
I've always hated LDAP and the existing implementations:
- OpenLDAP: too complex to configure, too much "runtime" configuration.
- FreeIPA: too complex and heavy.
- 389ds: actually pretty damn good, but it has runtime configuration issues too.
However, there is now LLDAP, a lightweight LDAP implementation which does exactly what you need: a user database with preconfigured schemas.
LLDAP can use PostgreSQL as DB and has a small UI:
I will be using LLDAP to unify the authentication layer of my services, especially since I want to use it as the user DB for Authelia.
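Pointing LLDAP at PostgreSQL is mostly a matter of environment variables. A minimal sketch of the container env (the values are placeholders, and variable names may differ between LLDAP versions):

env:
  - name: LLDAP_LDAP_BASE_DN
    value: dc=example,dc=com
  - name: LLDAP_DATABASE_URL # e.g. postgres://lldap:password@postgresql:5432/lldap
    valueFrom:
      secretKeyRef:
        name: lldap-secret
        key: database-url
  - name: LLDAP_JWT_SECRET
    valueFrom:
      secretKeyRef:
        name: lldap-secret
        key: jwt-secret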
Authelia
Authelia is a lightweight authentication server. It is an OIDC provider and is also able to handle ForwardAuth requests.
It does NOT support sign-up, so it is mainly used for internal authentication. It uses Postgres as the DB for authentication method storage, and LDAP as the user DB.
And that's it. It's simple, stateless and highly scalable. It doesn't have any admin UI, but has a personal user page:
Authelia can be configured with a simple YAML file.
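For instance, wiring Authelia to LLDAP and PostgreSQL boils down to a couple of blocks in that file. A rough sketch (hosts, DNs and names are placeholders, and the exact keys depend on the Authelia version):

authentication_backend:
  ldap:
    address: ldap://lldap.lldap.svc.cluster.local:3890
    implementation: lldap
    base_dn: dc=example,dc=com
    user: uid=admin,ou=people,dc=example,dc=com
    # the bind password is injected via an environment variable / secret

storage:
  postgres:
    address: tcp://postgresql.postgresql.svc.cluster.local:5432
    database: authelia
    username: authelia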
Crowdsec
To ban bots and use crowdsourced blocklists, I've set up Crowdsec.
The setup is the following:
- Parsers are installed alongside Traefik to parse the logs.
- The Local API fetches the information from the parsers and makes decisions.
- Bouncers are installed on Traefik (they could also run on Linux, but I didn't do that since the only entrypoint to my cluster is HTTP, not SSH and the like).
I've also configured Traefik to emit JSON access logs so that I can analyze them in Grafana.
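JSON access logs are a static-configuration option in Traefik. A sketch of the relevant excerpt (how it's mounted depends on how Traefik is deployed; the file path is an example for a sidecar to tail):

accessLog:
  format: json
  filePath: /data/access.log # tailed by a sidecar container
  fields:
    headers:
      defaultMode: drop # don't log header values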
VictoriaLogs and Vector
Now that I have computing power on my storage node, not only did I migrate my VictoriaMetrics instance onto it, but I've also added VictoriaLogs.
VictoriaLogs is an alternative to Grafana Loki and Elasticsearch, but it is highly optimised for logs and doesn't use a lot of memory, CPU or storage. It doesn't need any custom indexes and it's super fast.
Here's its usage with ~80 services running on the cluster, all of them shipping logs to VictoriaLogs:
It doesn't go above 384 MiB of RAM, and doesn't use more than 100m of CPU!
Now, about the agents. I use Vector to collect Kubernetes logs. Its configuration is dead simple:
data_dir: /vector-data-dir
api:
  enabled: false
  address: 0.0.0.0:8686
  playground: true
enrichment_tables:
  gip:
    type: geoip
    path: /geoip/GeoLite2-City.mmdb
sources:
  k8s:
    type: kubernetes_logs
  internal_metrics:
    type: internal_metrics
transforms:
  parser:
    type: remap
    inputs: [k8s]
    source: |
      structured, err = parse_json(.message)
      if err == null {
        . = merge!(., structured)
      }
  routes:
    type: route
    inputs: [parser]
    route:
      traefik: '.kubernetes.container_name == "tail-accesslogs" && contains(to_string(.kubernetes.pod_name) ?? "", "traefik")'
    reroute_unmatched: true # Send unmatched logs to routes._unmatched stream
  traefik:
    type: remap
    inputs: [routes.traefik]
    source: |
      # Enrich with geoip data
      geoip, err = get_enrichment_table_record("gip", { "ip": .ClientHost }, ["country_code","latitude","longitude"])
      if err == null {
        if is_array(geoip) {
          geoip = geoip[0]
        }
        if geoip != null {
          .geoip = geoip
        }
      }
sinks:
  exporter:
    type: prometheus_exporter
    address: 0.0.0.0:9090
    inputs: [internal_metrics]
  vlogs:
    type: elasticsearch
    inputs: [routes._unmatched, traefik]
    endpoints: << include "vlogs.es.urls" . >>
    mode: bulk
    api_version: v8
    compression: gzip
    healthcheck:
      enabled: false
    request:
      headers:
        VL-Time-Field: timestamp
        VL-Stream-Fields: stream,kubernetes.pod_name,kubernetes.container_name,kubernetes.pod_namespace
        VL-Msg-Field: message,msg,_msg,log.msg,log.message,log
        AccountID: '0'
        ProjectID: '0'
With this configuration, logs are collected from Kubernetes, parsed as JSON (if compatible), and routed based on conditions (e.g., logs from Traefik). GeoIP enrichment is applied to Traefik logs using a GeoLite2 database to add geolocation data. Unmatched logs and enriched Traefik logs are sent to VictoriaLogs, while internal metrics are exported to Prometheus for monitoring.
Parsing these logs in Grafana gives these results:
Pretty great, right? That's over 30 days of data, and the query is instant.
It can give pretty precise insights:
ArchiSteamFarm
Lastly, I'm self-hosting ArchiSteamFarm. Hey! I have a lot of games, okay? I can sell those useless trading cards and buy games that I actually want.
Tried self-hosting a mail server with Maddy, using Scaleway Transactional Email instead
I tried to self-host my own mail server, since Authelia requires an SMTP setup for "password reset" emails.
However, residential IPs are banned almost everywhere, so this was useless (it still worked, though).
Instead, I switched to Scaleway Transactional Email since it's free for the first 300 emails and my domain is already hosted there. Also, due to SMTP constraints, I've enabled DNSSEC, so basically everything was set up around Scaleway.
It also has autoconfiguration. Life is great.
Backups and the AWS Mountpoint S3 CSI driver
Lastly, I've set up backups everywhere, using Scaleway's S3 offering. Obviously, the backups are encrypted.
Something I didn't know about, but which helped a lot, is using mountpoint-s3 (similar to s3fs-fuse) as a CSI driver.
With this, I can upload backups to S3 without needing to install awscli or s3cli, and simply use cp.
Postgres backup
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgresql-read-pgdumpall
  namespace: postgresql
  labels:
    app.kubernetes.io/component: pg_dumpall
    app.kubernetes.io/instance: postgresql
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: postgresql
    app.kubernetes.io/version: 17.2.0
    helm.sh/chart: postgresql-16.4.5
    helm.toolkit.fluxcd.io/name: postgresql
    helm.toolkit.fluxcd.io/namespace: flux-system
  annotations:
    meta.helm.sh/release-name: postgresql
    meta.helm.sh/release-namespace: postgresql
spec:
  schedule: '@daily'
  concurrencyPolicy: Allow
  suspend: false
  jobTemplate:
    metadata:
    spec:
      template:
        metadata:
          labels:
            app.kubernetes.io/component: pg_dumpall
            app.kubernetes.io/instance: postgresql
            app.kubernetes.io/managed-by: Helm
            app.kubernetes.io/name: postgresql
            app.kubernetes.io/version: 17.2.0
            helm.sh/chart: postgresql-16.4.5
        spec:
          volumes:
            - name: raw-certificates
              secret:
                secretName: postgresql.internal.home-cert
                defaultMode: 420
            - name: datadir
              persistentVolumeClaim:
                claimName: postgres-backups-pvc
            - name: empty-dir
              emptyDir: {}
            - name: tmp
              emptyDir: {}
          containers:
            - name: postgresql-read-pgdumpall
              image: docker.io/bitnami/postgresql:17.2.0-debian-12-r8
              command:
                - /bin/sh
                - '-c'
                - >
                  DATE="$(date '+%Y-%m-%d-%H-%M')"

                  pg_dumpall --clean --if-exists --load-via-partition-root
                  --quote-all-identifiers --no-password
                  --file=/tmp/pg_dumpall-$DATE.pgdump

                  mv /tmp/pg_dumpall-$DATE.pgdump
                  ${PGDUMP_DIR}/pg_dumpall-$DATE.pgdump
              env:
                - name: PGUSER
                  value: postgres
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgresql-secret
                      key: postgres-password
                - name: PGHOST
                  value: postgresql-read
                - name: PGPORT
                  value: '5432'
                - name: PGDUMP_DIR
                  value: /backup/pgdump
                - name: PGSSLROOTCERT
                  value: /tmp/certs/ca.crt
              resources:
                limits:
                  ephemeral-storage: 2Gi
                  memory: 192Mi
                requests:
                  cpu: 100m
                  ephemeral-storage: 50Mi
                  memory: 128Mi
              volumeMounts:
                - name: raw-certificates
                  mountPath: /tmp/certs
                - name: datadir
                  mountPath: /backup/pgdump
                - name: empty-dir
                  mountPath: /tmp
                  subPath: tmp-dir
              imagePullPolicy: IfNotPresent
              securityContext:
                capabilities:
                  drop:
                    - ALL
                privileged: false
                seLinuxOptions: {}
                runAsUser: 1001
                runAsGroup: 1001
                runAsNonRoot: true
                readOnlyRootFilesystem: true
                allowPrivilegeEscalation: false
                seccompProfile:
                  type: RuntimeDefault
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 30
          securityContext:
            fsGroup: 1001
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-backups-pvc
  namespace: postgresql
spec:
  resources:
    requests:
      storage: 50Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  storageClassName: ''
  volumeName: postgres-backups-pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-backups-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  storageClassName: ''
  mountOptions:
    - prefix postgres-backups/
    - endpoint-url https://s3.fr-par.scw.cloud
    - uid=1001
    - gid=1001
    - allow-other
  csi:
    driver: s3.csi.aws.com
    volumeHandle: postgres-backups-pv
    volumeAttributes:
      bucketName: REDACTED
K3s backups
apiVersion: batch/v1
kind: CronJob
metadata:
  name: k3s-db-backup
spec:
  schedule: '0 0 * * *' # Runs every day at midnight
  jobTemplate:
    spec:
      template:
        spec:
          priorityClassName: system-cluster-critical
          tolerations:
            - key: 'CriticalAddonsOnly'
              operator: 'Exists'
            - key: 'node-role.kubernetes.io/control-plane'
              operator: 'Exists'
              effect: 'NoSchedule'
            - key: 'node-role.kubernetes.io/master'
              operator: 'Exists'
              effect: 'NoSchedule'
          nodeSelector:
            node-role.kubernetes.io/control-plane: 'true'
          containers:
            - name: k3s-db-backup
              image: alpine:latest
              imagePullPolicy: IfNotPresent
              volumeMounts:
                - name: gpg-passphrase
                  mountPath: /etc/backup
                  readOnly: true
                - name: backup-dir
                  mountPath: /tmp/backups # Directory for temporary backup files
                - name: db-dir
                  mountPath: /host/db # K3s database directory
                  readOnly: true
                - name: output
                  mountPath: /out
              command: ['/bin/ash', '-c']
              args:
                - |
                  set -ex

                  # Install dependencies
                  apk add --no-cache zstd gnupg sqlite

                  # Define backup file paths
                  BACKUP_DIR="/host/db"
                  SQLITE_DB="$BACKUP_DIR/state.db"
                  TIMESTAMP=$(date +"%Y-%m-%d_%H-%M-%S")
                  BACKUP_FILE="/tmp/backups/k3s_db_$TIMESTAMP.tar.zst"
                  BACKUP_SQLITE_FILE="/tmp/backups/state_$TIMESTAMP.db"
                  ENCRYPTED_FILE="$BACKUP_FILE.gpg"
                  ENCRYPTED_SQLITE_FILE="$BACKUP_SQLITE_FILE.gpg"

                  # Compress the database directory (File-based backup)
                  tar -cf - -C "$BACKUP_DIR" . | zstd -q -o "$BACKUP_FILE"

                  # Encrypt with GPG
                  gpg --batch --yes --passphrase-file /etc/backup/gpg-passphrase --cipher-algo AES256 -c -o "$ENCRYPTED_FILE" "$BACKUP_FILE"

                  # Change permissions for the encrypted file
                  chmod 600 "$ENCRYPTED_FILE"

                  # Upload to S3 using custom endpoint
                  cp "$ENCRYPTED_FILE" "/out/$(basename $ENCRYPTED_FILE)"

                  # Cleanup (remove the backup, compressed, and encrypted files)
                  rm -f "$BACKUP_FILE" "$ENCRYPTED_FILE"

                  # Do a sqlite3 backup
                  sqlite3 "$SQLITE_DB" ".backup '$BACKUP_SQLITE_FILE'"

                  # Encrypt the sqlite3 backup
                  gpg --batch --yes --passphrase-file /etc/backup/gpg-passphrase --cipher-algo AES256 -c -o "$ENCRYPTED_SQLITE_FILE" "$BACKUP_SQLITE_FILE"

                  # Change permissions for the encrypted sqlite3 file
                  chmod 600 "$ENCRYPTED_SQLITE_FILE"

                  # Upload to S3 using custom endpoint
                  cp "$ENCRYPTED_SQLITE_FILE" "/out/$(basename $ENCRYPTED_SQLITE_FILE)"

                  # Cleanup (remove the sqlite3 backup, compressed, and encrypted files)
                  rm -f "$BACKUP_SQLITE_FILE" "$ENCRYPTED_SQLITE_FILE"
          restartPolicy: OnFailure
          volumes:
            - name: gpg-passphrase
              secret:
                secretName: backup-secret
                defaultMode: 0400
                items:
                  - key: gpg-passphrase
                    path: gpg-passphrase
            - name: backup-dir
              emptyDir: {} # Empty directory to hold temporary files like backups
            - name: db-dir
              hostPath:
                path: /var/lib/rancher/k3s/server/db
                type: Directory
            - name: output
              persistentVolumeClaim:
                claimName: k3s-backups-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: k3s-backups-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  storageClassName: ''
  volumeName: k3s-backups-pv
  volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: k3s-backups-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  storageClassName: ''
  mountOptions:
    - prefix k3s-backups/
    - endpoint-url https://s3.fr-par.scw.cloud
    - uid=1001
    - gid=1001
    - allow-other
  csi:
    driver: s3.csi.aws.com
    volumeHandle: k3s-backups-pv
    volumeAttributes:
      bucketName: REDACTED
Conclusion
CockroachDB and NFS were the main bottlenecks of my old infrastructure. With a storage node that has compute power, my setup is now super efficient. Having also learned how to clean my k3s DB, there is no more abnormal CPU usage.
Having my storage node also under Kubernetes, I can install monitoring agents super easily:
Control node / Worker 0 / Worker 1 / Storage
As you can see, I have plenty of resources left even with my whole monitoring stack installed, compared to last year when the Prometheus and Grafana Loki stack was running.
Pretty happy with how it turned out, and ready to install more stuff!