All external HTTP/HTTPS traffic entering the cluster is protected by CrowdSec, a collaborative, behaviour-based intrusion detection and prevention system. CrowdSec runs as native Kubernetes workloads and integrates directly with Traefik at the entrypoint level.
| Component | Detail |
|---|---|
| Chart | crowdsecurity/crowdsec v0.22.1 (appVersion v1.7.6) |
| Namespace | crowdsec |
| Sync Wave | 3 |
| LAPI | 1× Deployment (crowdsec-lapi), NFS RWX PVCs (1 Gi data, 100 Mi config) |
| Agents | 4× DaemonSet pods (one per node: control-plane + node1/2/3) |
| AppSec | 1× Deployment (crowdsec-appsec), port 7422, OWASP CRS + virtual-patching |
| Bouncer | Traefik plugin v1.3.5, stream mode, 30 s sync interval |
| CAPI | Connected — community blocklist pulling enabled |
Every HTTPS request hits the Traefik bouncer plugin first. The plugin maintains an in-memory cache of banned IPs, refreshed from the CrowdSec LAPI every 30 seconds. Banned IPs receive a 403 Forbidden immediately — no backend service is hit.
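For reference, a minimal sketch of the bouncer middleware is shown below. The option names follow the crowdsec-bouncer-traefik-plugin README (verify them against the pinned v1.3.5), and the key under `spec.plugin` must match whatever name the plugin was registered with in Traefik's static configuration — both are assumptions here, not copied from the cluster manifests:

```yaml
# Sketch only — option names per the crowdsec-bouncer-traefik-plugin docs, values from this page
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: crowdsec-bouncer
  namespace: traefik
spec:
  plugin:
    crowdsec-bouncer-traefik-plugin:          # key = plugin name from Traefik static config (assumed)
      enabled: true
      crowdsecMode: stream                    # in-memory decision cache, refreshed periodically
      updateIntervalSeconds: 30               # matches the 30 s sync interval above
      crowdsecLapiScheme: http
      crowdsecLapiHost: crowdsec-service.crowdsec.svc.cluster.local:8080
      crowdsecAppsecEnabled: true
      crowdsecAppsecHost: crowdsec-appsec-service.crowdsec.svc.cluster.local:7422
      clientTrustedIPs:
        - "192.168.88.0/24"
        - "192.168.20.0/24"
        - "192.168.100.0/24"
        - "10.244.0.0/16"
```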
Each allowed request is also forwarded to the AppSec component, which uses a custom appsec-config (`custom/crs-and-vpatch`) that loads all rules as inband (real-time blocking):
| Rule Set | Loaded as | Coverage |
|---|---|---|
| `crowdsecurity/base-config` | Inband | ModSecurity base configuration, anomaly score variables |
| `crowdsecurity/vpatch-*` | Inband | ~130 specific CVE virtual patches (`.env`, `.git/config`, Spring4Shell, etc.) |
| `crowdsecurity/crs` | Inband | OWASP CRS with full anomaly scoring: SQLi, XSS, RCE, LFI, RFI, Log4Shell, Shellshock, etc. |
CRS anomaly scoring: Individual CRS rules add to tx.inbound_anomaly_score. REQUEST-949-BLOCKING-EVALUATION blocks the request when the total score ≥ tx.inbound_anomaly_score_threshold (default: 5, set by REQUEST-901-INITIALIZATION.conf). This means multi-pattern SQLi/XSS payloads accumulate score and trigger a single deny.
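The threshold itself comes from CRS initialization; roughly (paraphrased from `REQUEST-901-INITIALIZATION.conf` — the exact rule id and formatting vary by CRS version):

```
# Sets the inbound threshold to 5 only if crs-setup.conf did not already define it
SecRule &TX:inbound_anomaly_score_threshold "@eq 0" \
    "id:901100,phase:1,pass,nolog,\
    setvar:tx.inbound_anomaly_score_threshold=5"
```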
AppSec catches attack payloads even from IPs not yet in the ban list. It is fail-open: if the AppSec pod is unreachable, requests still flow through (the IP-reputation check still applies).
LAN bypass: Traffic from `clientTrustedIPs` (192.168.88.0/24, 192.168.20.0/24, 192.168.100.0/24) completely skips AppSec. To test AppSec from a LAN machine, use `X-Forwarded-For: <external-ip>` — the NAS is in `forwardedHeadersTrustedIPs`, so XFF is trusted and the effective IP becomes the XFF value.
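A quick way to exercise this from a LAN machine (the hostname and spoofed IP below are placeholders, not values from this cluster):

```bash
# Spoof an external source IP so the bouncer skips the LAN bypass and the request
# goes through AppSec/CRS — a crude SQLi probe should come back as 403.
curl -sk -o /dev/null -w "%{http_code}\n" \
  -H "X-Forwarded-For: 203.0.113.10" \
  "https://some-ingress.example.com/?id=1%27%20OR%20%271%27=%271"
```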
Agents run as a DaemonSet on all 4 nodes and read logs from two sources:
| Source | Acquisition method | Detects |
|---|---|---|
| Traefik HTTP logs | File: `/var/log/containers/traefik-*_traefik_*.log` on the node running Traefik (node2) | HTTP attacks, CVE probes, bad user agents |
| `/var/log/auth.log` | hostPath file (`hostVarLog=true`) | SSH brute force |
When an IP triggers enough suspicious events within a time window, the agent pushes an alert to the LAPI. The LAPI applies the default profile (ban for 4 h) and propagates the decision to the bouncer within ≤30 s.
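For orientation, the stock `default_ip_remediation` profile looks roughly like this (a sketch of `profiles.yaml`; the 4 h duration matches the text above):

```yaml
name: default_ip_remediation
filters:
  - Alert.Remediation == true && Alert.GetScope() == "Ip"
decisions:
  - type: ban
    duration: 4h
on_success: break
```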
Critical: The agent on each node only reads logs from its own node's `/var/log/containers/`. Traefik runs on node2, so only the agent on node2 (crowdsec-agent-gsxr2) reads Traefik logs. The other three agents handle SSH brute force only.
The LAPI is connected to api.crowdsec.net. Known malicious IPs from the global CrowdSec community (~400 k instances worldwide) are blocked automatically before they attempt an attack.
| Collection | Key Scenarios |
|---|---|
| `crowdsecurity/traefik` | Traefik JSON log parsing — the base for all HTTP detection |
| `crowdsecurity/base-http-scenarios` | Generic brute force (`http-generic-bf`), path probing, bad user agents, aggressive crawling, backdoor path requests, admin interface probing (`/wp-admin`, `/phpMyAdmin`), sensitive files (`.env`, `.git`), path traversal (`../`), SQL injection patterns, XSS patterns, open proxy abuse |
| `crowdsecurity/http-cve` | 18 named CVEs: Log4Shell (CVE-2021-44228), Spring4Shell (CVE-2022-22965), Grafana path traversal (CVE-2021-43798), Confluence RCE (CVE-2022-26134, CVE-2023-22515), Apache traversal (CVE-2021-41773), VMware vCenter, Fortinet, F5 BIG-IP, PHP unit RCE, Text4Shell |
| Scenario | Description |
|---|---|
| `ssh-bf` | Classic SSH brute force (many failed logins) |
| `ssh-slow-bf` | Distributed / slow SSH brute force |
| `ssh-time-based-bf` | Time-pattern based SSH attacks |
| `ssh-cve-2024-6387` | RegreSSHion CVE exploit attempts |
| Rule Set | Inband Config | Test Result |
|---|---|---|
| `crowdsecurity/base-config` | `custom/crs-and-vpatch` | ✅ Base config loaded (ModSec variables initialized) |
| `crowdsecurity/vpatch-*` | `custom/crs-and-vpatch` | ✅ `.env`, `.git/config` → HTTP 403 |
| `crowdsecurity/crs` | `custom/crs-and-vpatch` | ✅ SQLi, XSS, Log4Shell, Shellshock, RCE → HTTP 403 |
151 inband rules loaded (up from ~128 when CRS was not included).
Every ban decision triggers a Telegram message to the homelab notification channel:
- Bot token: stored in Vault at `kv/crowdsec/telegram`, injected as the `$TELEGRAM_BOT_TOKEN` env var — never in Git
- Chat ID: `377753554` (same channel as the Alertmanager alerts)
- Profile: `default_ip_remediation` — all ban decisions trigger the notification

# values.yaml — config.notifications.http.yaml
format: |
{"chat_id":"377753554","text":"🚫 CrowdSec Ban\n{{ range . }}IP: {{.Source.Value}}\nScenario: {{.Scenario}}\nDuration: {{(index .Decisions 0).Duration}}\n{{ end }}"}
Common mistake: The `models.Alert` type does NOT have `.Value` or `.Duration` fields at the top level. Use `.Source.Value` for the IP and `(index .Decisions 0).Duration` for the ban duration. Using the wrong fields causes a silent template error and no messages are ever sent.
Traffic from these CIDRs skips the bouncer check (clientTrustedIPs):
| CIDR | Network |
|---|---|
| `192.168.88.0/24` | Main LAN (work VLAN) |
| `192.168.20.0/24` | Home VLAN |
| `192.168.100.0/24` | WireGuard VPN |
| `10.244.0.0/16` | K8s pod CIDR (added 2026-03-08) |
Why pod CIDR? CrowdSec AppSec OWASP CRS flags binary request bodies with anomaly scores 5–10. This caused Kaniko's OCI blob upload PATCH requests to receive 403 Forbidden, while POST (initiate upload) succeeded — a misleading symptom. See Gotchas below.
Additionally, crowdsecurity/whitelist-good-actors whitelists all RFC1918 ranges at the agent level:
`127.0.0.0/8`, `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`. This means LAN traffic that passes the bouncer whitelist is also whitelisted by the detection pipeline — no false positives from internal hosts.
| Gap | Detail |
|---|---|
| East-west traffic | Pod-to-pod traffic inside the cluster never hits Traefik — lateral movement is not detected. Requires Cilium network policies or a service mesh. |
| Layer 4 attacks | Raw TCP floods and port scans at the network level. CrowdSec is HTTP-focused. |
| Container runtime | Privilege escalation, container escape, cryptomining. Falco would complement this. |
| K8s API server | kubectl / API-level attacks. Audit logging is not configured on this cluster. |
| NAS services | Portainer, Immich, Paperless etc. run outside K8s — their logs are not acquired. |
| Stream latency | ≤30 s window between ban decision and enforcement (halved from 60 s). |
| AppSec ban decisions | CRS inband blocks return HTTP 403 per-request immediately. However, the CRS on_match: SendAlert() is not triggered (that only fires in outofband mode). Persistent IP bans rely on the agent detecting repeated attacks in Traefik access logs (30 s window). |
| Vault Path | K8s Secret | Namespace | Purpose |
|---|---|---|---|
| `kv/crowdsec/bouncer` | `crowdsec-bouncer-key` | `crowdsec` | LAPI pre-registers the Traefik bouncer with this API key |
| `kv/traefik/crowdsec-bouncer` | `crowdsec-bouncer-key` | `traefik` | Bouncer plugin reads the API key from the `/etc/crowdsec/api_key` file mount |
| `kv/crowdsec/telegram` | `crowdsec-telegram` | `crowdsec` | Telegram bot token for ban notifications |
Dual-path pattern: The same bouncer API key lives at two Vault paths because K8s secrets cannot be shared across namespaces. VSO syncs each independently.
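A sketch of one of the two VSO objects (the `VaultStaticSecret` CRD from `secrets.hashicorp.com`); the mount/path split, refresh interval, and auth defaults are assumptions — the real manifests live under `k8s-cluster-config/core-components/crowdsec/resources/`:

```yaml
# Sketch — field values assumed from the table above, not copied from the repo
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: crowdsec-bouncer-key
  namespace: traefik
spec:
  mount: kv                       # KV engine mount (assumed)
  type: kv-v2
  path: traefik/crowdsec-bouncer  # i.e. kv/traefik/crowdsec-bouncer
  refreshAfter: 1h                # assumed
  destination:
    name: crowdsec-bouncer-key
    create: true
```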
| Item | Location |
|---|---|
| ArgoCD Application | k8s-cluster-config/core-components/crowdsec/application.yaml |
| Helm values | k8s-cluster-config/core-components/crowdsec/values.yaml |
| VSO resources | k8s-cluster-config/core-components/crowdsec/resources/ |
| Traefik middleware | k8s-cluster-config/core-components/traefik/values.yaml (extraObjects) |
| LAPI service | crowdsec-service.crowdsec.svc.cluster.local:8080 |
| AppSec service | crowdsec-appsec-service.crowdsec.svc.cluster.local:7422 |
# List all active ban decisions
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions list
# Manually ban an IP (4 hour default)
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions add --ip 1.2.3.4
# Manually unban an IP
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions delete --ip 1.2.3.4
# Check registered machines (agents + appsec)
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli machines list
# Check registered bouncers
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli bouncers list
# Check recent alerts (last 10)
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli alerts list --limit 10
# Check collections on a specific agent
kubectl exec -n crowdsec <agent-pod> -- cscli collections list
# Check AppSec collections and metrics
kubectl exec -n crowdsec deploy/crowdsec-appsec -- cscli collections list
kubectl exec -n crowdsec deploy/crowdsec-appsec -- cscli metrics
# Check acquisition pipeline metrics (parsed/whitelisted counts per source)
kubectl exec -n crowdsec <agent-pod> -- cscli metrics show acquisition
# Check CAPI status
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli console status
# Force hard refresh in ArgoCD (clears manifest cache)
kubectl annotate application -n argocd crowdsec argocd.argoproj.io/refresh=hard --overwrite
# Re-register agents if LAPI was replaced (triggers fresh init containers)
kubectl rollout restart ds/crowdsec-agent -n crowdsec
kubectl rollout restart deploy/crowdsec-appsec -n crowdsec
A test script is maintained at k8s-cluster-config/core-components/crowdsec/test-scenarios.sh:
# Run from any machine (uses cscli inside LAPI pod — safe from any IP)
./test-scenarios.sh simulate all # SSH bf, HTTP bf, bad user-agent
# Test Telegram notification (adds a 2 min test ban)
./test-scenarios.sh telegram
# Real HTTP tests (external IP only — not effective from LAN)
./test-scenarios.sh real appsec # SQLi, XSS, path traversal, CVEs
./test-scenarios.sh real bruteforce # 30 rapid login failures
# Full status summary
./test-scenarios.sh status
# Cleanup all active decisions
./test-scenarios.sh cleanup
Incident (2026-03-06): LAPI was configured with Longhorn RWO PVCs. When the LAPI pod rescheduled to a different node, Longhorn threw a Multi-Attach error (RWO PVC already bound to the old node). The new pod started with an emptyDir (no PVC) — corrupting the SQLite WAL that was mid-write when the pod was evicted.
Result: Machine registrations from all 4 agents and AppSec were lost. Agents could not re-register because init containers only run once per pod lifetime — not on container restarts. All agents fell into CrashLoopBackOff with "machine not found" errors for 12+ hours.
Fix: Switch LAPI PVCs to nfs-synology StorageClass with ReadWriteMany. NFS has no single-node attachment constraint, so LAPI can reschedule freely without losing the SQLite database or agent registrations.
# values.yaml — correct PVC config
lapi:
persistentVolume:
data:
enabled: true
storageClassName: nfs-synology
accessModes:
- ReadWriteMany
size: 1Gi
config:
enabled: true
storageClassName: nfs-synology
accessModes:
- ReadWriteMany
size: 100Mi
If changing from Longhorn → NFS: PVC spec is immutable. Delete the old PVCs (remove the pvc-protection finalizer if stuck in Terminating), then force-sync ArgoCD to recreate them.
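A rough sequence for that migration (the PVC names are assumptions — check `kubectl get pvc -n crowdsec` first):

```bash
# Delete the old Longhorn PVCs (names assumed from the chart's defaults)
kubectl delete pvc -n crowdsec crowdsec-lapi-data crowdsec-lapi-config --wait=false
# If a PVC hangs in Terminating, drop the kubernetes.io/pvc-protection finalizer
kubectl patch pvc -n crowdsec crowdsec-lapi-data --type=merge -p '{"metadata":{"finalizers":null}}'
# Then force-sync the crowdsec Application in ArgoCD so the PVCs are recreated with the NFS StorageClass
```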
The wait-for-lapi-and-register init container runs cscli lapi register exactly once per pod lifetime. If the LAPI loses its database and a new LAPI pod starts without the previous machine registrations, the agents will fail with "machine not found" on every main container restart.
Recovery: Force new agent pods (which re-run the init container):
kubectl rollout restart ds/crowdsec-agent -n crowdsec
kubectl rollout restart deploy/crowdsec-appsec -n crowdsec
The HTTP notifier template receives []*models.Alert — the fields are not at the top level:
| Template Variable | Wrong | Correct |
|---|---|---|
| Attacker IP | `{{.Value}}` | `{{.Source.Value}}` |
| Ban duration | `{{.Duration}}` | `{{(index .Decisions 0).Duration}}` |
| Scenario | `{{.Scenario}}` | `{{.Scenario}}` ✅ |
Using .Value causes a silent template render error (can't evaluate field Value in type *models.Alert) — LAPI logs the error but no Telegram message is sent. There is no visible failure in pod health or bouncer status.
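To surface the failure, check the LAPI logs directly (a sketch — the grep terms are guesses, not known log strings):

```bash
kubectl -n crowdsec logs deploy/crowdsec-lapi --since=1h | grep -iE "template|notification|plugin"
```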
### `externalTrafficPolicy: Cluster` Blinds CrowdSec HTTP Detection

Problem: With `externalTrafficPolicy: Cluster` (the default), kube-proxy may route external traffic through any cluster node. The SNAT applied by kube-proxy rewrites the source IP to the pod-network gateway (10.244.x.x). Traefik logs this internal IP as `ClientHost`. The `crowdsecurity/whitelist-good-actors` parser whitelists all 10.0.0.0/8 addresses — so every Traefik log line is silently dropped by the detection pipeline. HTTP scenarios never trigger.
Verified impact: 1130+ Traefik log lines processed — 1130 whitelisted, 0 poured to any detection bucket.
Fix: Set externalTrafficPolicy: Local on the Traefik LoadBalancer service. MetalLB L2 mode then announces the VIP only from the node running the Traefik pod (node2). Traffic arrives at that node without SNAT, so Traefik logs the real client IP.
# traefik values.yaml
service:
spec:
loadBalancerIP: 192.168.88.12
externalTrafficPolicy: Local # was Cluster — see above
MetalLB automatically re-announces from a new node if Traefik reschedules. LAN traffic (192.168.88.x) continues to be whitelisted by crowdsecurity/whitelist-good-actors — only real internet IPs are now subject to detection.
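Two quick checks that the change took effect (the Traefik workload name is an assumption):

```bash
# 1. Traefik access logs should now show real client IPs in ClientHost, not 10.244.x.x
kubectl -n traefik logs deploy/traefik --since=10m | grep '"ClientHost"' | tail -n 5
# 2. The node2 agent should show Traefik lines poured into buckets rather than 100 % whitelisted
kubectl -n crowdsec exec <agent-pod-on-node2> -- cscli metrics show acquisition
```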
The crowdsec-bouncer-traefik-plugin returns 403 for ALL traffic if it cannot reach the LAPI at startup. This caused a complete ingress outage during initial deployment.
Pre-flight before enabling the global middleware (--entryPoints.websecure.http.middlewares):
kubectl get pods -n crowdsec # all Running
kubectl get secret -n traefik crowdsec-bouncer-key # exists (VSO synced)
kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli bouncers list # traefik listed
The chart repo URL `https://crowdsec.github.io/helm-charts` returns 404; the correct URL is `https://crowdsecurity.github.io/helm-charts` (the GitHub org is crowdsecurity, not crowdsec). After changing the repo URL, ArgoCD kept showing the old error — a normal refresh was not enough; a hard refresh was required to clear the manifest cache:
kubectl annotate application -n argocd crowdsec argocd.argoproj.io/refresh=hard --overwrite
Detection (parsers + scenarios) runs on agents, not the LAPI. The LAPI only stores and distributes decisions. crowdsecurity/traefik only needs to be in agent.env COLLECTIONS. The LAPI's cscli collections list showing fewer collections than agents is correct behaviour.
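In values terms this looks roughly like the following (a sketch; the collection list is assembled from the tables above — including an assumed `crowdsecurity/sshd` for the SSH scenarios — and may not match the repo one-for-one):

```yaml
agent:
  env:
    - name: COLLECTIONS
      value: "crowdsecurity/traefik crowdsecurity/base-http-scenarios crowdsecurity/http-cve crowdsecurity/sshd crowdsecurity/whitelist-good-actors"
```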
### Required `source` in `additionalAcquisition`

The chart JSON schema requires a `source` field in every `additionalAcquisition` entry:
additionalAcquisition:
- source: file # required by chart schema
filenames:
- /var/log/auth.log
labels:
type: syslog
### `crowdsecurity/virtual-patching` Does NOT Load CRS

The hub provides three appsec-configs:
| Config | Inband rules | Outofband rules |
|---|---|---|
| `crowdsecurity/virtual-patching` | `base-config`, `vpatch-*` | — |
| `crowdsecurity/appsec-default` | `base-config`, `vpatch-*`, `generic-*` | `experimental-*` |
| `crowdsecurity/crs` | — | `crs` (alerts only — non-blocking by design) |
Using appsec_config: crowdsecurity/virtual-patching (the default example in CrowdSec docs) means CRS is never loaded at all — even if crowdsecurity/appsec-crs is installed via COLLECTIONS. SQLi, XSS, Log4Shell, and Shellshock all return HTTP 200.
Fix: Use a custom appsec-config that loads crowdsecurity/crs as an inband rule alongside vpatch-*:
# values.yaml
appsec:
configs:
crs-and-vpatch.yaml: |
name: custom/crs-and-vpatch
default_remediation: ban
inband_rules:
- crowdsecurity/base-config
- crowdsecurity/vpatch-*
- crowdsecurity/crs
acquisitions:
- source: appsec
listen_addr: "0.0.0.0:7422"
path: /
appsec_config: custom/crs-and-vpatch
labels:
type: appsec
The appsec.configs chart key creates a ConfigMap file mounted to /etc/crowdsec/appsec-configs/<filename>. The name field in the YAML becomes the appsec_config reference in acquisitions.
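A quick sanity check that the ConfigMap made it into the pod and is wired up (paths follow the mount location described above):

```bash
kubectl -n crowdsec exec deploy/crowdsec-appsec -- ls /etc/crowdsec/appsec-configs/
kubectl -n crowdsec exec deploy/crowdsec-appsec -- cat /etc/crowdsec/appsec-configs/crs-and-vpatch.yaml
```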
Note: Loading CRS inband uses anomaly scoring but skips the outofband-only `SendAlert()` that the hub's `crs` config applies. Detected attacks are blocked per-request (HTTP 403) but do not automatically create LAPI ban decisions; persistent bans still require the agent to detect repeated attacks via Traefik log patterns.
Despite the Helm values using namespace/podName syntax, the chart generates a file-based acquisition config that watches /var/log/containers/traefik-*_traefik_*.log on each node's filesystem. The agent on the node where Traefik is NOT running will log a warning:
No matching files for pattern /var/log/containers/traefik-*_traefik_*.log
This is expected — only the agent on the same node as the Traefik pod reads Traefik logs.
Symptom: CI pipeline fails with 403 Forbidden on PATCH /v2/.../blobs/uploads/<uuid>?_state=.... Backend / Harbor logs show no errors. Credentials test fine. POST (initiate upload) returns 202 ✓.
Root cause: OWASP CRS anomaly scoring in the AppSec component flags binary PATCH request bodies (image layer data) with scores of 5–10, which exceeds the blocking threshold. The 403 is returned by Traefik's bouncer plugin — the request never reaches Harbor.
Diagnosis:
kubectl -n crowdsec logs deploy/crowdsec-appsec --since=30m | grep "WAF block"
# WAF block: anomaly score block: anomaly: 10, from 10.244.3.221 (10.244.1.6)
# The first IP is the source pod; the second is the Traefik pod forwarding it.
Fix: Add the K8s pod CIDR to clientTrustedIPs in the Traefik CrowdSec bouncer middleware. In-cluster traffic from pods is trusted internal traffic.
# k8s-cluster-config/core-components/traefik/values.yaml
clientTrustedIPs:
- "192.168.88.0/24"
- "192.168.20.0/24"
- "192.168.100.0/24"
- "10.244.0.0/16" # K8s pod CIDR — bypasses AppSec for in-cluster traffic
Key debugging insight: When a WAF blocks a request, the backend sees nothing — 403 comes from Traefik. Always check crowdsec-appsec logs first when facing unexplained 403s on specific HTTP methods/paths.