# Homelab Kubernetes Pipeline

This repo bootstraps a hybrid kubeadm cluster and then hands app delivery to
Argo CD.

## Architecture

The lab is intentionally small but production-shaped:

- a Debian amd64 host runs the kubeadm control plane and local deployment tools
- a Raspberry Pi arm64 node runs selected workloads
- a provisioning layer can PXE boot Debian 13 arm64 VMs for Pimox worker
  templates
- OpenTofu owns the bootstrap layers for cluster, platform, apps, and edge
- Argo CD continuously reconciles Kubernetes manifests from this repo
- a local registry stores the website and demos images built for the worker
  architecture
- SOPS with age is the committed secret-management path for future encrypted
  Kubernetes secrets
- an OCI jump box provides the public edge path back into the homelab over
  Tailscale

Run `./lab.sh up` and `./lab.sh nuke` only from the Debian homelab server. The
script intentionally refuses to run from non-Debian machines so a laptop cannot
accidentally modify the cluster.

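
The host guard described above could look roughly like the following. This is a
minimal sketch, not the actual `lab.sh` code: the function names
`is_debian_host` and `require_debian_host`, and the choice to read the `ID`
field from an os-release file, are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a Debian-only guard; not the real lab.sh logic.

# is_debian_host [OS_RELEASE_FILE]
# Succeeds only when the os-release file declares ID=debian.
is_debian_host() {
  local os_release="${1:-/etc/os-release}"
  local id
  [[ -r "$os_release" ]] || return 1
  # Take the first ID= line and strip optional quotes around the value.
  id=$(sed -n 's/^ID=//p' "$os_release" | head -n 1 | tr -d "\"'")
  [[ "$id" == "debian" ]]
}

# require_debian_host [OS_RELEASE_FILE]
# Prints an error and returns non-zero on any non-Debian machine.
require_debian_host() {
  if ! is_debian_host "$@"; then
    echo "refusing to run: this command must run on the Debian homelab server" >&2
    return 1
  fi
}
```

A script using this sketch would call `require_debian_host || exit 1` before
any step that touches the cluster. Passing the file path as an argument keeps
the check testable on machines that are not the homelab server.
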
## Flow

1. `bootstrap/provisioning`
   - prepares a Debian server as a PXE and preseed service for arm64 VMs
   - serves Debian 13 arm64 netboot assets through TFTP and HTTP
   - creates a golden image install path with Kubernetes, containerd,
     qemu-guest-agent, cloud-init, and storage client packages ready
   - is driven by `./lab.sh up` when Pimox is reachable, without changing
     Orange Pi host networking

2. `bootstrap/cluster`
   - creates the kubeadm control plane on the Debian amd64 node
   - joins worker nodes such as Raspberry Pi and Pimox Debian arm64 nodes
   - configures a Calico-compatible pod CIDR
   - configures containerd to pull from the in-cluster NodePort registry
   - creates retained host directories under `/var/openebs/local`

3. `bootstrap/platform`
   - installs a minimal Calico deployment through the Tigera operator
   - installs NodeLocal DNSCache for node-local DNS query caching
   - can install MetalLB for LAN `LoadBalancer` services after an address pool
     is chosen
   - installs OpenEBS
   - creates `openebs-hostpath-retain`
   - installs Argo CD
   - installs Kyverno with audit-first baseline Pod Security policies
   - registers the private GitOps repo without storing the SSH private key in
     Terraform state

4. `bootstrap/apps`
   - registers Argo CD Applications from the `applications` map
   - default apps are `container-registry`, `gitea`, `website-production`, and
     `demos-static`

5. `bootstrap/edge`
   - connects to the OCI jump box
   - uploads nginx, HAProxy, Varnish, and Squid configs
   - obtains and renews Let's Encrypt certificates for the configured hostname
   - runs the edge cache/proxy chain with Docker Compose

## Prerequisites

On the Debian host:

- OpenTofu
- Docker with Buildx
- kubeadm, kubelet, kubectl, and containerd
- SSH access to worker nodes
- SSH access to the OCI edge host
- enough persistent storage for `/var/openebs/local` and `/var/lib/docker`

The default kubeconfig path is `/home/jv/.kube/config`. Override it with
`KUBECONFIG_PATH` or `TF_VAR_kubeconfig_path` when needed.

## Deploying

From the Debian server:

```bash
cd ~/my-homelab-configs
./lab.sh up
```

The script detects the Pimox host at `192.168.100.80` in auto mode. When SSH,
`qm`, and `vmbr0` are available, it applies `bootstrap/provisioning`, creates or
reuses the Debian 13 arm64 template, creates or reuses one worker VM clone,
discovers the guest IP through qemu-guest-agent, and passes that worker into the
cluster layer. It then applies the remaining OpenTofu stacks, refreshes Argo CD
apps, waits for the local registry, builds the website and demos images when
their source has changed, pushes them to the registry, recreates pods only after
a new image is built, and applies the edge stack.

Set `LAB_PIMOX_PIPELINE=false` to skip Pimox automation. Set
`LAB_PIMOX_WORKER_COUNT=0` to create or refresh only the template. The pipeline
keeps the template on its configured `local` storage, creates new worker VM
clones on `nvme_thin_pool` by default, checks that the Pimox bridge already
exists, refuses `local` as worker clone storage, and refuses to edit Orange Pi
host networking.

`LAB_PIMOX_SKIP_WORKER_INDEXES` defaults to `1` because the first Pimox worker
slot was created manually. With the default `LAB_PIMOX_WORKER_COUNT=1`, the
pipeline keeps the template current and leaves VMID `9010` alone. Set
`LAB_PIMOX_SKIP_WORKER_INDEXES=''` if you want the pipeline to own the first
slot, or set `LAB_PIMOX_WORKER_COUNT=2` to manage the second slot while still
skipping the first.

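
The interaction between the worker count and the skip list can be sketched as a
small helper. This is an illustration under assumptions, not the pipeline's
real code: the function name `managed_worker_indexes` is hypothetical, and the
skip list is assumed to be comma- or space-separated.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: which Pimox worker indexes would the pipeline manage?

# managed_worker_indexes COUNT SKIP_LIST
# Prints every index in 1..COUNT that is not in SKIP_LIST, one per line.
# SKIP_LIST is assumed to be comma- or space-separated.
managed_worker_indexes() {
  local count="$1" skip_list="${2:-}"
  local -a skipped=()
  local index candidate is_skipped
  # Normalise commas to spaces, then split into an array.
  read -r -a skipped <<< "${skip_list//,/ }"
  for (( index = 1; index <= count; index++ )); do
    is_skipped=false
    for candidate in "${skipped[@]}"; do
      if [[ "$candidate" == "$index" ]]; then
        is_skipped=true
      fi
    done
    if [[ "$is_skipped" == false ]]; then
      echo "$index"
    fi
  done
}

# managed_worker_indexes 1 1    prints nothing (the manual slot is left alone)
# managed_worker_indexes 2 1    prints 2
# managed_worker_indexes 1 ''   prints 1
```

When reading the real variables, `${LAB_PIMOX_SKIP_WORKER_INDEXES-1}` (without
the colon) would keep an explicitly empty value empty while still defaulting an
unset variable to `1`, which matches the behaviour described above.
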
OpenWrt firewall VM automation is opt-in because it attaches to both WAN and
LAN bridges. Set `LAB_OPENWRT_VM=true` after `vmbr1` already exists on the
Orange Pi. The pipeline downloads the OpenWrt ARM SystemReady EFI image, writes
basic WAN/LAN/firewall config into the image, imports it as VM `9050`, attaches
`vmbr0` as WAN and `vmbr1` as LAN, and stores the VM disk on `nvme_thin_pool`.
It does not use the Debian Kubernetes golden-node template for OpenWrt.

The website and demos images default to `linux/arm64` because both deployments
are pinned to the Raspberry Pi worker. Override with `WEBSITE_IMAGE_PLATFORMS`
or `DEMOS_IMAGE_PLATFORMS` only if node placement changes.

Build metadata is written under `.lab/` so repeat runs can skip the website
or demos image build when the source hash, platform, image reference, and
registry manifest still match.

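
A build-skip check of this kind could be sketched as follows. The function
names, the fingerprint format, and the state file layout are assumptions, not
the real `lab.sh` implementation, and the sketch leaves out the registry
manifest comparison, which needs a live registry.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a build-skip check; not the real lab.sh logic.

# source_hash DIR
# Hashes every file under DIR (relative path and content) in a stable order.
source_hash() {
  local dir="$1"
  (
    cd "$dir" || exit 1
    find . -type f -print0 | LC_ALL=C sort -z \
      | xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1
  )
}

# build_fingerprint DIR PLATFORM IMAGE_REF
# Combines the source hash with the target platform and image reference.
build_fingerprint() {
  printf '%s\n%s\n%s\n' "$(source_hash "$1")" "$2" "$3" | sha256sum | cut -d' ' -f1
}

# needs_build DIR PLATFORM IMAGE_REF STATE_FILE
# Succeeds when no fingerprint is recorded or the recorded one differs.
needs_build() {
  local state_file="$4" current
  current=$(build_fingerprint "$1" "$2" "$3")
  [[ -f "$state_file" ]] || return 0
  [[ "$(<"$state_file")" != "$current" ]]
}

# record_build DIR PLATFORM IMAGE_REF STATE_FILE
# Stores the current fingerprint after a successful build and push.
record_build() {
  mkdir -p "$(dirname "$4")"
  build_fingerprint "$1" "$2" "$3" > "$4"
}
```

Folding the platform and image reference into the fingerprint means that
changing either one forces a rebuild even when the source tree is untouched.
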
## Validation

Useful checks after a rebuild:

```bash
export KUBECONFIG=/home/jv/.kube/config

kubectl get nodes
kubectl -n argocd get applications
kubectl -n container-registry get pods
kubectl -n gitea-system get pods
kubectl -n website-production get pods -o wide
kubectl -n demos-static get pods -o wide

docker info --format '{{.DockerRootDir}}'
df -h / /var/openebs/local /var/lib/docker
```

The website should be reached through the configured public hostname, not the raw
OCI IP address, because the Let's Encrypt certificate is issued for the
hostname.

## Adding Nodes

For Pimox on Orange Pi 5 Plus, `./lab.sh up` can create the Debian 13 arm64
template and worker VM clones automatically. Defaults are intentionally tied to
the observed host: Pimox SSH host `192.168.100.80`, bridge `vmbr0`, template VMID
`9000` on `local` storage, worker VMIDs starting at `9010`, and worker clone
storage `nvme_thin_pool`. Details and override variables are in
`bootstrap/provisioning/README.md`.

Worker indexes are stable. Index `1` maps to VMID `9010`, node name
`pimox-worker-01`, and worker key `pimox01`; index `2` maps to VMID `9011`, and
so on. `LAB_PIMOX_SKIP_WORKER_INDEXES=1` leaves the already-created first slot
unmanaged while allowing higher indexes to be automated.

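
The index mapping above can be written down as a one-line formula. The sketch
below assumes the naming pattern continues for higher indexes
(`pimox-worker-02`, `pimox02`, and so on), which the text implies but does not
spell out; the function name `pimox_worker_identity` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the stable index-to-identity mapping.

# pimox_worker_identity INDEX [BASE_VMID]
# Prints the VMID, node name, and worker key for a 1-based worker index.
# BASE_VMID defaults to 9010, the first worker VMID named in this README.
pimox_worker_identity() {
  local index="$1" base_vmid="${2:-9010}"
  printf 'vmid=%d node_name=pimox-worker-%02d worker_key=pimox%02d\n' \
    "$(( base_vmid + index - 1 ))" "$index" "$index"
}

# pimox_worker_identity 1
#   prints: vmid=9010 node_name=pimox-worker-01 worker_key=pimox01
```

Because the identity depends only on the index, skipping or removing one slot
never renumbers the others.
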
Add entries to `bootstrap/cluster/variables.tf` or a `.tfvars` file:

```hcl
worker_nodes = {
  raspberrypi = {
    host         = "192.168.100.89"
    user         = "jv"
    node_name    = "raspberry"
    ssh_key_path = "/home/jv/.ssh/id_ed25519"
  }
}
```

Stateful apps currently pin retained local PVs to the `debian` node. Move or
duplicate those PV manifests when you want storage on another node.

## Workload Placement

`bootstrap/cluster` labels nodes with homelab placement metadata:

- `homelab.dev/node-role=control-plane` and `homelab.dev/storage=local` on the
  Debian control plane
- `homelab.dev/node-role=edge-app` and `homelab.dev/storage=local` on the
  Raspberry Pi worker
- `homelab.dev/node-role=app` and `homelab.dev/storage=nvme` on automated Pimox
  worker clones

Override `control_plane_node_labels`, `worker_node_labels`,
`LAB_RASPBERRY_NODE_LABELS_JSON`, or `LAB_PIMOX_WORKER_NODE_LABELS_JSON` when
the physical layout changes. The current website, demos, registry, and Gitea
manifests are not moved automatically because the public NodePort path and
retained OpenEBS hostpath PVs are node-local. Move workloads only after their
storage and edge path are ready on the target node.

The website and demos NodePorts are reachable from the OCI jump box through the
Raspberry Pi Tailscale interface. `bootstrap/cluster` installs a persistent
`homelab-tailscale-nodeport.service` on the configured worker to restore the
route, rp_filter settings, and iptables rules after reboot. Override the
defaults through `tailscale_nodeport_access` when the jump-box IP, Pi Tailscale
IP, pod CIDR, primary NodePort, or pod target port changes. Add any additional
public NodePorts to `tailscale_nodeport_extra_ports`:

```hcl
tailscale_nodeport_access = {
  enabled           = true
  worker_key        = "raspberrypi"
  peer_ip           = "100.118.255.19"
  node_tailscale_ip = "100.77.80.72"
  pod_cidr          = "10.244.0.0/16"
  node_port         = 30080
  target_port       = 80
}

tailscale_nodeport_extra_ports = [30081]
```

For `./lab.sh nuke`, set `WORKER_SSH_TARGETS` to a space-separated list of
remote SSH targets when more worker nodes exist. Set it to an empty string for a
single-node rebuild.

## Adding Platform Tools

Add Helm releases through `bootstrap/platform`'s `extra_helm_releases` map.

## Policy Guardrails

`bootstrap/platform` installs Kyverno and the upstream baseline Pod Security
policies in `Audit` mode. This gives the lab policy reports for unsafe workload
settings without blocking existing pods during the first rollout. After reports
are clean, individual policies can be promoted to `Enforce` in
`bootstrap/platform/main.tf`.

## DNS Cache

`bootstrap/platform` installs NodeLocal DNSCache in `kube-system` with
`registry.k8s.io/dns/k8s-dns-node-cache`. The default listens on
`169.254.20.10` and the kube-dns service IP `10.96.0.10`, which keeps the
rollout compatible with the current kube-proxy iptables path without rewriting
kubelet DNS settings across the nodes. Override `nodelocal_dns` if the service
CIDR or upstream DNS servers change.

## MetalLB

MetalLB is present in `bootstrap/platform` but disabled by default. Enable it
only after reserving a LAN IP range outside DHCP and outside any future OpenWrt
LAN pool:

```bash
export TF_VAR_metallb='{
  enabled = true
  repository = "https://metallb.github.io/metallb"
  version = "0.16.0"
  namespace = "metallb-system"
  address_pool = ["192.168.100.240-192.168.100.250"]
  l2_advertisement_enabled = true
  pool_name = "homelab-lan"
}'
```

The current website, demos, registry, and Gitea services remain `NodePort`
services until the LAN address pool and edge route are tested manually.

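
Before enabling the pool, it can help to confirm that the reserved range really
sits outside the DHCP range. The helper below is a hypothetical pre-flight
sketch and not part of the repo; the function names are invented, and the DHCP
range in the usage comment is a made-up example, so substitute the range your
router actually hands out.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight sketch: detect overlap between two IPv4 ranges.

# ip_to_int A.B.C.D
# Converts a dotted-quad IPv4 address to its integer value.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# ranges_overlap START-END START-END
# Succeeds when the two inclusive ranges share at least one address.
ranges_overlap() {
  local a_start a_end b_start b_end
  a_start=$(ip_to_int "${1%-*}")
  a_end=$(ip_to_int "${1#*-}")
  b_start=$(ip_to_int "${2%-*}")
  b_end=$(ip_to_int "${2#*-}")
  (( a_start <= b_end && b_start <= a_end ))
}

# Example with an assumed DHCP range of 192.168.100.100-192.168.100.199:
#   if ranges_overlap "192.168.100.240-192.168.100.250" \
#                     "192.168.100.100-192.168.100.199"; then
#     echo "MetalLB pool overlaps DHCP" >&2
#   fi
```
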
## Secrets

Use SOPS with age for secrets that need to live in Git. Start from
`.sops.yaml.example`, replace the age recipient with the public key generated on
the Debian host, and commit the resulting `.sops.yaml`. Keep the private age key
outside the repo. Operational notes are in `docs/secrets.md`.

## Edge Services

The OCI jump box runs the public edge path:

```text
nginx -> HAProxy -> Varnish/Squid -> Raspberry Pi Tailscale NodePort
```

The `bootstrap/edge` stack renders configs from `bootstrap/edge/templates` and
deploys them to `/opt/homelab-edge` on the OCI host. Defaults are in
`bootstrap/edge/variables.tf`; override them through `TF_VAR_*` or a `.tfvars`
file when the public host, SSH key, server name, backend Tailscale IP, or
NodePort changes.

Use the configured `server_name` in the browser, for example
`https://lab2025.duckdns.org`. A raw OCI IP address will still show a browser
certificate warning because the trusted certificate is issued for the hostname.

The edge stack uses HTTP-01 validation, so public DNS for `server_name` must
point to the OCI public IP and inbound TCP 80 and 443 must be open before
`./lab.sh up` runs. Set `TF_VAR_letsencrypt_email` to receive expiry notices,
or leave it empty to register without an email. Set
`TF_VAR_enable_letsencrypt=false` to keep using the temporary local certificate.

## Adding Apps

Add Kubernetes manifests under `apps/<name>` and register them in
`bootstrap/apps`'s `applications` map. Argo CD will own sync, pruning, and
self-healing for the app.

## Storage

OpenEBS provides the platform storage provisioner. Stateful homelab apps use
retained local PV paths such as `/var/openebs/local/gitea` and
`/var/openebs/local/registry`; these paths are intentionally outside kubeadm
reset paths so data can survive cluster destroy/create cycles. Those critical
volumes are declared explicitly as retained local PVs so a rebuilt cluster binds
back to the same host paths instead of creating fresh directories.

For the current lab, `/var/openebs/local` and `/var/lib/docker` are expected to
live on larger storage than the root filesystem. This keeps retained PVs,
container layers, Buildx state, and image caches from filling `/`.

## Gitea

Gitea is deployed from `apps/gitea`, stores data in the retained local PV at
`/var/openebs/local/gitea`, and is exposed through the public edge path at
`https://lab2025.duckdns.org/git/`. HTTP clone and push traffic goes through the
same path. The NodePort remains available inside the lab at port `30300`.

`./lab.sh up` applies the Gitea manifests directly before creating Argo CD
Applications. This keeps the Git service bootstrap-safe if the GitOps repo is
later moved into in-cluster Gitea.

After the repo exists in Gitea, Argo CD can be pointed at the internal service
URL so it no longer depends on the old external Git server:

```bash
export TF_VAR_gitops_repo_url='http://gitea.gitea-system.svc.cluster.local:3000/jv/my-homelab-configs.git'
tofu -chdir=bootstrap/platform apply -auto-approve
tofu -chdir=bootstrap/apps apply -auto-approve
```

## Gitea Backups

`./lab.sh up` installs a Debian-host systemd timer named
`homelab-gitea-backup.timer`. The timer runs daily, executes `gitea dump` inside
the Gitea pod, copies the dump out of Kubernetes, and stores it under
`/var/backups/homelab/gitea` on the Debian server. The default retention is 30
days.

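
The retention step could be sketched with `find`. This is an illustration
rather than the installed backup script: the function name `prune_old_backups`
is hypothetical, and the sketch assumes the dumps are stored as `*.zip` files
directly inside the backup directory.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of backup retention; not the installed backup script.

# prune_old_backups BACKUP_DIR [RETENTION_DAYS]
# Deletes *.zip files in BACKUP_DIR whose modification time is more than
# RETENTION_DAYS whole days in the past, and prints each path it removes.
prune_old_backups() {
  local backup_dir="$1" retention_days="${2:-30}"
  [[ -d "$backup_dir" ]] || return 0
  find "$backup_dir" -maxdepth 1 -type f -name '*.zip' \
    -mtime +"$retention_days" -print -delete
}

# Example: prune_old_backups /var/backups/homelab/gitea 30
```

`find -mtime +30` counts whole 24-hour periods, so a dump is removed once it is
at least 31 days old. Printing each removed path before deleting it leaves a
record in the unit's journal.
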
The same install step also creates `homelab-gitea-restore-drill.timer`. The
monthly drill is non-destructive: it verifies the latest backup ZIP, extracts it
to a temporary directory, records a report under
`/var/backups/homelab/gitea-restore-drills`, and removes the temporary extract.
It does not write into the live Gitea PVC.

Run a manual backup from the Debian server with:

```bash
./lab.sh backup-gitea
```

Run the restore drill manually with:

```bash
./lab.sh drill-gitea-restore
```

Useful checks:

```bash
systemctl list-timers homelab-gitea-backup.timer
systemctl list-timers homelab-gitea-restore-drill.timer
sudo systemctl start homelab-gitea-backup.service
sudo ls -lh /var/backups/homelab/gitea
sudo ls -lh /var/backups/homelab/gitea-restore-drills
```

## Gitea Actions

This repo includes a Gitea Actions workflow at
`.gitea/workflows/homelab-main.yml`. It runs only on pushes to `main` and targets
a repository-scoped Debian host runner with the label `homelab-debian`.

The workflow validates shell syntax, Kubernetes manifests, and all OpenTofu
stacks before deployment. It automatically stops when high-impact files under
`bootstrap/provisioning`, `bootstrap/cluster`, `bootstrap/platform`,
`bootstrap/edge`, `lab.sh`, or `.gitea/workflows` change; those changes still
require a manual Debian run. Lower-risk app changes proceed to `./lab.sh apps`
after validation passes, which skips Pimox, cluster, platform, and edge changes.

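
The high-impact gate can be pictured as a path classifier. The sketch below is
an assumption about how such a check might be written, not a copy of the
workflow: the function names are hypothetical, and only the path list comes
from the paragraph above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the high-impact change gate.

# is_high_impact_path PATH
# Succeeds for paths that should stop automatic deployment.
is_high_impact_path() {
  case "$1" in
    bootstrap/provisioning/*|bootstrap/cluster/*) return 0 ;;
    bootstrap/platform/*|bootstrap/edge/*) return 0 ;;
    lab.sh|.gitea/workflows/*) return 0 ;;
    *) return 1 ;;
  esac
}

# requires_manual_run
# Reads changed paths on stdin, one per line, and succeeds as soon as
# any of them is high-impact.
requires_manual_run() {
  local path
  while IFS= read -r path; do
    if is_high_impact_path "$path"; then
      return 0
    fi
  done
  return 1
}
```

A workflow step could feed it the changed files, for example with
`git diff --name-only "$BEFORE" "$AFTER" | requires_manual_run`, where `BEFORE`
and `AFTER` are placeholder variables for the pushed commit range.
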
Enable Actions for the repository in Gitea, then create a repository-level runner
token from:

```text
https://lab2025.duckdns.org/git/jv/my-homelab-configs/settings/actions/runners
```

Register and start the Debian runner from the Debian server:

```bash
cd ~/my-homelab-configs
GITEA_RUNNER_REGISTRATION_TOKEN='<repo-runner-token>' ./lab.sh install-gitea-runner
```

The runner is installed as `homelab-gitea-runner.service`, runs as user `jv`, and
uses a host label instead of a Docker job container because deployment needs the
Debian host's Docker, OpenTofu, kubeconfig, SSH keys, and local state.

The deployment job is non-interactive. User `jv` must be able to run
`sudo -n true` on the Debian host or the workflow will fail before deployment.

Useful checks:

```bash
systemctl status homelab-gitea-runner.service
journalctl -u homelab-gitea-runner.service -n 100 --no-pager
```

## Renovate

`renovate.json` defines dependency update rules for Dockerfiles, OpenTofu
providers, Helm chart versions, and the pinned tools used by the Gitea Actions
workflow. Renovate should open reviewable update branches or PRs only; it must
not auto-merge infrastructure changes. Keep app-only dependency updates on the
normal Gitea Actions path, and run `./lab.sh up` manually on the Debian server
for platform or provisioning updates.

## Destructive Rebuilds

`./lab.sh nuke` resets kubeadm, containerd runtime state, CNI files, Calico
links, iptables rules, local OpenTofu state, and configured worker nodes. It does
not delete retained data under `/var/openebs/local`.

For multi-node labs, set `WORKER_SSH_TARGETS` to a space-separated list of SSH
targets. For a single-node rebuild, set it to an empty string.

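
Splitting `WORKER_SSH_TARGETS` safely might look like the sketch below. It is
an illustration only: the helper name `worker_targets` is hypothetical, and
treating an unset variable the same as an empty one is an assumption about the
script's behaviour.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: split WORKER_SSH_TARGETS into individual targets.

# worker_targets
# Prints each space-separated target on its own line; prints nothing when
# the variable is empty (single-node rebuild) or, by assumption, unset.
worker_targets() {
  local -a targets=()
  read -r -a targets <<< "${WORKER_SSH_TARGETS:-}"
  if (( ${#targets[@]} > 0 )); then
    printf '%s\n' "${targets[@]}"
  fi
}

# Example with placeholder targets:
#   WORKER_SSH_TARGETS='jv@worker-a jv@worker-b'
#   while IFS= read -r target; do
#     echo "would reset $target"
#   done < <(worker_targets)
```

Reading into an array avoids relying on unquoted word splitting, so an empty
string yields zero iterations instead of one empty target.
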
## Website App

The website is a PHP app under `apps/website`. It includes a home page, CV page,
blog page, and demos page, plus a lightweight translation flow backed by Ollama.
Static language files live in `apps/website/lang`; unsupported browser languages
can be translated by the client and saved through `save_lang.php` as runtime
JSON data on the website PVC.

The CV page has two client-side presentation modes:

- `Elegant`: dark, minimal, terminal-inspired styling with a square profile
  image and light green console text.
- `Fancy`: centered circular profile image, cursive orbit text, and a
  cursor-following portrait rotation effect.

The Demos page is a catalog in the PHP website. The actual demo applications are
served from a separate `demos-static` artifact under `apps/demos-static` and are
published through the `demos-static` Argo CD application. Public traffic reaches
them through the edge path at `/demo-apps/`.

`./lab.sh up` builds and pushes two independent images:

- `php-website:latest` from `apps/website`
- `demos-static:latest` from `apps/demos-static`

The first demo, `The Client-Side Media Cruncher (Wasm + TS)`, currently performs
private, browser-only image compression and conversion using native Canvas APIs.
Heavier video conversion, such as MP4 to WebM, should use a Rust core compiled
to WebAssembly with a TypeScript UI so the codec work stays fast and still
avoids backend uploads.

The demos are designed to be local-first so the current cluster can serve them
from the Raspberry Pi worker without turning either pod into an application
server. The website pod serves the portfolio shell and the `demos-static` pod
serves static demo bundles; CPU-heavy work runs in the visitor's browser. With
the current deployments pinned to the Raspberry Pi, avoid bundling large ML
models, server-side WebSocket probes, or backend video transcoders into either
image. If those demos become production-grade, lazy load model assets in the
browser or move backend workers to a larger node, such as VMs on the Orange Pi 5
Plus.

Current demo inventory:

- Client-side media cruncher: image conversion/compression with Canvas; future
  Rust/Wasm codec path for video.
- Internet quality visualizer: live Canvas graph for latency, jitter, and
  stability using same-origin browser probes; a dedicated WebSocket echo endpoint
  would be the production version.
- Local log and JSON toolbelt: JSON formatting, JWT decoding, URL parsing, and
  local text-log filtering.
- Architecture simulator: click-driven load, crash, and auto-scale simulation.
- Offline traveler converter: PWA shell with timezone, currency, and GB/GiB
  conversions.
- Privacy-first redactor: local image redaction prototype; future
  onnxruntime-web plus quantized YOLO or face model path.
- Local sentiment sandbox: lightweight local sentiment, keyword, and summary
  prototype; future Transformers.js/ONNX path.
- Model drift simulator: visual MLOps playground for spikes, corrupted inputs,
  and retraining.

The Kubernetes deployment uses `apps/website/web-app.yaml`. Keep the image
reference there aligned with `TF_VAR_registry_endpoint`, because `lab.sh` derives
the registry endpoint from that manifest.

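
Deriving a registry endpoint from a manifest could work along these lines. This
is a guess at the general approach, not the real `lab.sh` parsing: the function
name is hypothetical, it inspects only the first `image:` line, and the
endpoint in the usage comment is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: extract the registry endpoint from a manifest.

# registry_endpoint_from_manifest MANIFEST
# Prints the registry host[:port] of the first `image:` reference, or
# fails when the reference does not name an explicit registry.
registry_endpoint_from_manifest() {
  local manifest="$1" image first
  image=$(sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*image:[[:space:]]*//p' "$manifest" \
    | head -n 1 | tr -d "\"'")
  # A reference without a slash (for example nginx:1.27) has no registry part.
  [[ "$image" == */* ]] || return 1
  first="${image%%/*}"
  # The first component is a registry only if it looks like a host:
  # it contains a dot or a port separator, or it is localhost.
  [[ "$first" == *.* || "$first" == *:* || "$first" == localhost ]] || return 1
  echo "$first"
}

# Example: a manifest containing
#   image: registry.example.internal:30500/php-website:latest
# prints registry.example.internal:30500
```
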
Keep the `.terraform.lock.hcl` files committed. They pin provider selections and
make bootstrap behavior reproducible across nodes and rebuilds.