The k8s-cluster-config repository is the single source of truth for the Kubernetes cluster state. It uses the ArgoCD app-of-apps pattern where a single root application automatically discovers and deploys all components.
```
k8s-cluster-config/
├── argocd/
│   └── root-app.yaml            # Root Application (app-of-apps)
├── core-components/
│   └── <component>/
│       ├── application.yaml     # ArgoCD Application resource
│       ├── values.yaml          # Helm values
│       └── resources/           # Optional: VSO secrets, extra manifests
├── applications/
│   └── lifeops/
│       ├── application.yaml
│       └── values.yaml
└── CLAUDE.md
```
The root application (`argocd/root-app.yaml`) is configured to:

```yaml
spec:
  source:
    repoURL: [email protected]:AnhTran1610/k8s-cluster-config.git
    path: core-components
    directory:
      recurse: true
      include: "**/application.yaml"
```
ArgoCD recursively scans core-components/ for all application.yaml files and creates an ArgoCD Application for each one. Sync waves control the deployment order.
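Deployment order is set per component with the standard ArgoCD sync-wave annotation on each child Application; a minimal sketch (the component name and wave number are illustrative):

```yaml
# core-components/<component>/application.yaml (excerpt)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: longhorn
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "5"   # lower waves sync first
```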
Each component uses the multi-source pattern: the Helm chart from its upstream repository, this Git repo (referenced as `$values`) for the values file, and optionally extra manifests from the `resources/` directory:

```yaml
sources:
  - repoURL: https://charts.example.com
    chart: my-chart
    targetRevision: "1.0.0"
    helm:
      valueFiles:
        - $values/core-components/<name>/values.yaml
  - repoURL: [email protected]:AnhTran1610/k8s-cluster-config.git
    targetRevision: HEAD
    ref: values
```
All component Applications share the same conventions: each deploys into its own namespace (e.g., the `monitoring` namespace), values come from `[email protected]:AnhTran1610/k8s-cluster-config.git`, and the sync policy is `prune: true`, `selfHeal: true`, `CreateNamespace=true`, `ServerSideApply=true`. See the full Core Components table for all 30 components with their sync waves, charts, versions, and namespaces.
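A minimal sketch of how those sync defaults appear in a component's Application spec (the destination namespace shown is illustrative):

```yaml
destination:
  server: https://kubernetes.default.svc
  namespace: monitoring
syncPolicy:
  automated:
    prune: true
    selfHeal: true
  syncOptions:
    - CreateNamespace=true
    - ServerSideApply=true
```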
Workloads are pinned to specific nodes via nodeSelector to prevent memory imbalance:
| Node | Role | Pinned Workloads |
|---|---|---|
| k8s-node1 (16GB) | Databases | authentik-postgresql, harbor-database, harbor-redis, nextcloud-postgresql, lifeops-postgres |
| k8s-node2 | Monitoring (logs) | VictoriaLogs |
| k8s-node3 | Monitoring (metrics) | VictoriaMetrics |
| Any | Stateless apps | Traefik, ArgoCD, Harbor core, Authentik server, CrowdSec, etc. |
**Why pin databases to node1?**

- All database volumes use `longhorn` (replicated block storage). Keeping all DBs on one node means Longhorn's primary replica is always local, which gives faster I/O than network-attached.
- Pods with a `nodeSelector` stay Pending until their target node is Ready; they will never accidentally migrate to another node and force a Longhorn volume detach/reattach.

Helm values pattern for `nodeSelector`:
```yaml
# Bitnami postgresql subchart (authentik, nextcloud)
postgresql:
  primary:
    nodeSelector:
      kubernetes.io/hostname: k8s-node1

# Harbor chart (note: nested under .internal)
database:
  internal:
    nodeSelector:
      kubernetes.io/hostname: k8s-node1
redis:
  internal:
    nodeSelector:
      kubernetes.io/hostname: k8s-node1
```
**Harbor gotcha:** Harbor's database and redis components are nested under `.internal`. `database.nodeSelector` and `redis.nodeSelector` at the top level are silently ignored; always use `database.internal.nodeSelector`.
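To confirm the pinning actually landed, the rendered StatefulSets can be checked directly (the `harbor` namespace and resource names are assumptions matching the workload names in the table above):

```bash
# Expect kubernetes.io/hostname: k8s-node1 in both outputs
kubectl -n harbor get statefulset harbor-database \
  -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}'
kubectl -n harbor get statefulset harbor-redis \
  -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}'
```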
Safe procedure to resize a Proxmox VM without data loss:
```bash
# 1. Verify all DB StatefulSets have nodeSelector pinning the target node
# (if not, add nodeSelector first and wait for ArgoCD sync)
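#    Optional check: list every StatefulSet's nodeSelector in one pass
kubectl get statefulsets -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,NODESELECTOR:.spec.template.spec.nodeSelector'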
# 2. Cordon the node (stop new pods scheduling)
kubectl cordon k8s-node1
# 3. Drain (evict all non-DaemonSet pods)
# DB pods will go Pending — nodeSelector prevents them landing elsewhere
kubectl drain k8s-node1 --ignore-daemonsets --delete-emptydir-data
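# Optional: confirm the evicted DB pods are Pending rather than rescheduled elsewhere
kubectl get pods -A --field-selector status.phase=Pending -o wide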
# 4. SSH to Proxmox and resize
ssh [email protected]
qm shutdown 108 # graceful shutdown
qm set 108 --memory 16384 # set new RAM (MB)
qm start 108
# 5. Wait for node to rejoin
until kubectl get node k8s-node1 --no-headers | grep -v NotReady | grep -q Ready; do
  sleep 5; echo -n "."
done
# 6. Uncordon — DB pods will immediately schedule back to node1
kubectl uncordon k8s-node1
```
**Longhorn instance-manager PDB:** During drain, eviction of Longhorn's instance-manager pod is blocked by its PodDisruptionBudget and retries for roughly 2-3 minutes before eventually succeeding. This is normal; just let the drain keep retrying. Do not add `--disable-eviction`, as it would forcibly terminate Longhorn's volume manager.
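Once the node is uncordoned and the DB pods are Running again, a quick Longhorn check can confirm the volumes reattached cleanly (a sketch; assumes Longhorn runs in the default `longhorn-system` namespace):

```bash
# Each volume should be attached and report robustness "healthy" once replicas rebuild
kubectl -n longhorn-system get volumes.longhorn.io \
  -o custom-columns='NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness'
```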