# SAPL — Kubernetes Redis Manifests

Manifests for the shared Redis instance used by all SAPL pods for cross-pod rate limiting (DB 1) and view/static-file caching (DB 0).

---

## Directory layout

```
docker/k8s/
└── redis/
    ├── redis-configmap.yaml   # redis.conf — no persistence, allkeys-lru, 5 GB ceiling
    ├── redis-deployment.yaml  # Deployment (1 replica, redis:7-alpine)
    └── redis-service.yaml     # ClusterIP service on port 6379
```

---

## Prerequisites

- `kubectl` configured to talk to the target cluster.
- A `sapl-redis` namespace (created below if it doesn't exist).

---

## Deploy

```bash
# 1. Create the namespace (idempotent)
rancher kubectl create namespace sapl-redis --dry-run=client -o yaml | rancher kubectl apply -f -

# 2. Apply all three manifests
rancher kubectl apply -f docker/k8s/redis/redis-configmap.yaml
rancher kubectl apply -f docker/k8s/redis/redis-deployment.yaml
rancher kubectl apply -f docker/k8s/redis/redis-service.yaml

# 3. Verify the pod is Running
rancher kubectl -n sapl-redis get pods -l app=sapl-redis
```

Expected output:

```
NAME                          READY   STATUS    RESTARTS   AGE
sapl-redis-6d9f8b7c4d-xk2lm   1/1     Running   0          30s
```

---

## Verify the rate limiter

`scripts/test_ratelimiter.py` fires repeated GET requests at a SAPL URL and reports when the first 429 is returned.

### Usage

```
python scripts/test_ratelimiter.py URL [-n NUM] [-d DELAY] [-t TIMEOUT]
```

| Flag | Default | Meaning |
|------|---------|---------|
| `url` | *(required)* | Full URL including scheme, e.g. `http://localhost` |
| `-n`, `--num-requests` | `50` | Maximum requests to send |
| `-d`, `--delay` | `0.1` | Seconds between requests |
| `-t`, `--timeout` | `10` | Per-request timeout in seconds |

The script stops and prints a summary as soon as a 429 is received.
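The probe loop behind such a script can be sketched as follows — a hypothetical minimal version with an injectable `fetch` callable, not the actual implementation in `scripts/test_ratelimiter.py`:

```python
def probe(fetch, num_requests=50):
    """Send up to num_requests requests, stopping at the first 429.

    fetch: a zero-argument callable returning an HTTP status code
    (e.g. a wrapper around urllib.request that also sleeps `delay`).
    Returns (statuses, first_429) where first_429 is the 1-based index
    of the first 429 response, or None if the limit was never hit.
    """
    statuses = []
    for i in range(1, num_requests + 1):
        status = fetch()
        statuses.append(status)
        if status == 429:
            return statuses, i   # threshold reached — stop and report
    return statuses, None
```

Injecting `fetch` keeps the loop testable without a live server; the real script adds timing and the printed summary shown below.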
### Examples

```bash
# Hit the anonymous threshold (35 req/min) — fire 40 requests with minimal delay
python scripts/test_ratelimiter.py http://localhost -n 40 -d 0.05

# Slower pacing — check that legitimate traffic is not rate-limited
python scripts/test_ratelimiter.py http://localhost -n 20 -d 2

# Test against a staging pod via port-forward
rancher kubectl port-forward -n <namespace> deploy/sapl 8080:80 &
python scripts/test_ratelimiter.py http://localhost:8080 -n 40 -d 0.05
```

### Reading the output

```
Request 1: Status 200 | Time: 0.045s
...
Request 36: Status 429 | Time: 0.038s
-> Rate limited on request 36

Summary:
  Total requests attempted: 36
  Successful (200): 35
  Rate limited (429): 1
  First 429 occurred at request: 36
```

A first 429 near the configured anonymous threshold (35 req/min) confirms the middleware is wired correctly. A first 429 much earlier points to nginx `limit_req` firing before Django sees the request.

---

## Inject REDIS_URL into SAPL instances

`REDIS_URL` points at the shared instance:

```
redis://redis.sapl-redis.svc.cluster.local:6379
        ^^^^^ ^^^^^^^^^^
        svc   namespace
```

`start.sh` picks it up on every pod startup and sets the `REDIS_CACHE` waffle switch automatically — no further intervention needed.

### Fleet-wide rollout

The rollout loop uses the `app.kubernetes.io/name=sapl` pod label to discover every SAPL namespace automatically — onboarding a new municipality requires no script changes.
```bash
for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
    -o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
  rancher kubectl set env deployment/sapl \
    REDIS_URL=redis://redis.sapl-redis.svc.cluster.local:6379 \
    -n "$ns"
done
```

### Roll back

```bash
for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
    -o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
  rancher kubectl set env deployment/sapl REDIS_URL- -n "$ns"
done
```

`kubectl set env deployment/sapl REDIS_URL-` (trailing `-`) removes the variable. `start.sh` then falls back to the file-based cache automatically.

---

## Monitor

### Pod and events

```bash
# Pod status
rancher kubectl -n sapl-redis get pods -l app=sapl-redis -o wide

# Deployment events (useful right after apply)
rancher kubectl -n sapl-redis describe deployment sapl-redis

# Pod events (OOMKill, restarts, etc.)
rancher kubectl -n sapl-redis describe pod -l app=sapl-redis
```

### Logs

```bash
# Tail live logs
rancher kubectl -n sapl-redis logs -f deploy/sapl-redis

# Last 100 lines
rancher kubectl -n sapl-redis logs deploy/sapl-redis --tail=100
```

### Redis INFO

```bash
# Memory usage
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli info memory \
  | grep -E 'used_memory_human|maxmemory_human|mem_fragmentation_ratio'

# Connection pressure
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli info stats \
  | grep -E 'rejected_connections|instantaneous_ops_per_sec'

# Key distribution per DB
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli info keyspace

# Recent slow queries
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli slowlog get 10

# Latency sampling (1-second intervals)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli --latency-history -i 1
```

### Rate-limiter keys (DB 1)

```bash
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli -n 1 dbsize

rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli -n 1 --scan --pattern 'rl:ip:*' | head -20
```

---

## Seed the UA deny list (once after first deploy)

`rl:bot:ua:blocked` is a permanent Redis SET in DB 1. Each member is the SHA-256 of a **UA token** — the identifying fragment extracted after splitting on `/`, spaces, `;`, `(`, `)`, e.g.:

```
UA string:   "GPTBot/1.1 (+https://openai.com/gptbot)"
Tokens:      GPTBot  1.1  +https:  ...
Hash stored: sha256("GPTBot")
```

The middleware (`_is_redis_blocked_ua`) tokenises the incoming UA the same way and checks each token hash against the cached set. The SET is fetched from Redis at most once per `RATE_LIMITER_UA_BLOCKLIST_REFRESH` seconds (default 60) per worker process.

The bots in `BOT_UA_FRAGMENTS` (Python list, always active) and this Redis SET are **independent** — the Python list provides the baseline, and the Redis SET allows adding new offenders at runtime **without a code deploy**.

```bash
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
  SADD rl:bot:ua:blocked \
  "$(echo -n 'GPTBot' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'ClaudeBot' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'PerplexityBot' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'Bytespider' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'AhrefsBot' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'meta-externalagent' | sha256sum | cut -d' ' -f1)"

# Add a new offender at runtime (picked up within RATE_LIMITER_UA_BLOCKLIST_REFRESH seconds)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
  SADD rl:bot:ua:blocked "$(echo -n 'NewBot' | sha256sum | cut -d' ' -f1)"
```

---

## Local standalone Redis (development / testing)

No Kubernetes?
Run Redis directly with Docker:

```bash
sudo docker run --rm -p 6379:6379 redis:7-alpine \
  redis-server --save "" --appendonly no
```

Then point Django at it by exporting the env vars before starting the dev server:

```bash
export REDIS_URL="redis://localhost:6379"
export CACHE_BACKEND="redis"
python manage.py runserver
```

Or add them to your local `.env` file:

```
REDIS_URL=redis://localhost:6379
CACHE_BACKEND=redis
```

> **Note**: the waffle switch `REDIS_CACHE` must also be `on` in your local
> database for `start.sh` to activate the Redis backend. Run:
>
> ```bash
> python manage.py waffle_switch REDIS_CACHE on --create
> ```

---

## Update `redis.conf` without redeploying

```bash
# Edit the ConfigMap
rancher kubectl -n sapl-redis edit configmap redis-config

# Restart the pod to pick up the new config
rancher kubectl -n sapl-redis rollout restart deployment/sapl-redis
```

---

## Rate limiting — two layers, two jobs

SAPL enforces rate limits at two independent layers. They use different algorithms and protect different things; their thresholds must be tuned separately.

### Layer 1 — nginx `limit_req` (leaky bucket)

Defined in `docker/config/nginx/nginx.conf` (zones) and `sapl.conf` (burst).

```
sapl_general  rate=30r/m   # 1 token every 2 s
sapl_heavy    rate=10r/m   # 1 token every 6 s (PDF/report endpoints)
```

`burst=N nodelay` means nginx accepts up to N requests instantly above the current token level, then enforces the drip rate. Requests beyond the burst cap return 429 before reaching Gunicorn — **zero Python cost**.

Burst values are set at container startup via env vars:

| Env var | Default | Location |
|---------|---------|----------|
| `NGINX_BURST_GENERAL` | `60` | `location /`, `location /media/` |
| `NGINX_BURST_API` | `60` | `location /api/` |
| `NGINX_BURST_HEAVY` | `20` | `location /relatorios/` |

Defaults are 2× the zone's per-minute rate, so a user can spend a full minute's quota in a single burst before the leaky bucket takes over.
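The leaky-bucket accounting can be illustrated with a small simulation. This is an assumed model of `limit_req`'s excess counter (not nginx's actual source): accepted requests grow an "excess" that drains at the zone rate, and a request that would push excess past the burst cap gets a 429.

```python
class LeakyBucket:
    """Simplified model of nginx limit_req with burst=N nodelay."""

    def __init__(self, rate, burst):
        self.rate = rate        # drain rate in requests per second (30r/m -> 0.5)
        self.burst = burst      # how far excess may grow before 429
        self.excess = 0.0
        self.last = 0.0

    def allow(self, now):
        # Drain the excess accumulated since the previous request.
        self.excess = max(0.0, self.excess - (now - self.last) * self.rate)
        self.last = now
        if self.excess + 1 > self.burst:
            return False        # nginx would return 429 here
        self.excess += 1        # with nodelay the request is served immediately
        return True
```

With `rate=30r/m` and `burst=60`, 60 simultaneous requests pass and the 61st is rejected; two seconds later one token has drained and a new request passes again — the behaviour the defaults table above is tuned for.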
### Layer 2 — Django `RateLimitMiddleware` (sliding window)

Defined in `sapl/middleware/ratelimit.py`, backed by Redis DB 1.

Requests that pass nginx reach Python. The middleware counts them in a 60-second sliding window per IP (anonymous) or per user (authenticated):

| Env var | Default | Scope |
|---------|---------|-------|
| `RATE_LIMITER_RATE` | `35/m` | Anonymous IP |
| `RATE_LIMITER_RATE_AUTHENTICATED` | `120/m` | Authenticated user |
| `RATE_LIMITER_RATE_BOT` | `5/m` | *(reserved — bots are currently blocked outright, not counted)* |
| `RATE_LIMITER_UA_BLOCKLIST_REFRESH` | `60` s | How often each worker re-fetches `rl:bot:ua:blocked` from Redis |

When the window count hits the threshold, the IP/user is written to a Redis blocked-set with a 300 s TTL, and subsequent requests return 429 with `Retry-After: 300` — without touching the database.

Decision flow inside `RateLimitMiddleware._evaluate()`:

```
1.  IP in whitelist?                          → pass (no further checks)
1a. UA matches BOT_UA_FRAGMENTS list?         → 429 reason=known_ua
1b. UA token hash in rl:bot:ua:blocked SET?   → 429 reason=redis_ua
2.  IP in rl:ip:{ip}:blocked?                 → 429 reason=ip_blocked
3.  Authenticated user?
    3a. User in rl:{ns}:user:{uid}:blocked?   → 429 reason=user_blocked
    3b. Suspicious headers (no Accept/AL)?    → 429 reason=suspicious_headers_auth
    3c. User request count ≥ auth threshold?  → SET blocked, 429 reason=auth_user_rate
4.  Anonymous:
    4a. Suspicious headers?                   → 429 reason=suspicious_headers
    4b. IP request count ≥ anon threshold?    → SET blocked, 429 reason=ip_rate
    4c. NS/IP window count ≥ anon threshold?  → SET blocked, 429 reason=ua_rotation
→ pass
```

### Why they are not the same number

| | nginx burst | Django threshold |
|-|-------------|------------------|
| **Algorithm** | Leaky bucket — tokens refill over time | Sliding window — hard count per 60 s |
| **Protects** | Gunicorn workers from being flooded | Per-client fairness, business policy |
| **Tuned by** | Capacity of the server | Acceptable request volume per client |
| **Failure mode** | Workers overwhelmed | Legitimate user over-browsing |

A user loading a page quickly may fire 5–10 Django requests in two seconds. With `rate=30r/m` (1 token/2 s) and `burst=60` they absorb that fine; the leaky bucket refills before they click the next link. The Django threshold (35/m sliding window) catches sustained automated traffic from a single IP that looks like scraping, even if it arrives slowly enough to beat the nginx burst cap.

---

## Request routing — how nginx reaches Django

`proxy_pass http://sapl_server` forwards the HTTP request — with the original path intact — to the Gunicorn Unix socket. Django doesn't know or care that nginx is in front; it sees a standard HTTP request.

```
GET /media/foo.pdf
  │
  ▼
nginx (sapl.conf)
  location /media/ → proxy_pass to Unix socket
  │
  ▼
Gunicorn (WSGI server)
  receives raw HTTP, calls the Django WSGI application
  │
  ▼
Django middleware stack (settings.MIDDLEWARE)
  RateLimitMiddleware → pass or 429
  │
  ▼
Django URL router (sapl/urls.py)
  r'^media/(?P<path>.*)$' → serve_media
  │
  ▼
serve_media(request, path='foo.pdf')
  returns HttpResponse with X-Accel-Redirect: /_accel/media/foo.pdf
  │
  ▼
nginx sees X-Accel-Redirect header
  /_accel/media/ internal location → reads file from disk → sends to client
```

nginx does no routing beyond picking a `location` block. The mapping from URL path to Python function lives entirely in `sapl/urls.py`. `proxy_pass` is just a pipe.
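Returning to Layer 2 for a moment: the 60-second sliding window it enforces can be sketched in memory. This is an illustrative stand-in for the Redis counters the middleware actually uses, not its real code:

```python
from collections import deque

class SlidingWindow:
    """60-second sliding-window counter — in-memory stand-in for the
    rl:...:reqs counters the middleware keeps in Redis DB 1."""

    def __init__(self, limit, window=60.0):
        self.limit = limit      # e.g. RATE_LIMITER_RATE = 35/m
        self.window = window
        self.hits = deque()     # timestamps of requests inside the window

    def allow(self, now):
        # Evict timestamps that have slid out of the window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            return False        # middleware would also set a 300 s blocked marker
        self.hits.append(now)
        return True
```

Unlike the leaky bucket, there is no refill-over-time: the 36th anonymous request inside any 60-second span is rejected, however it is paced.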
---

## Media file serving — `serve_media` and X-Accel-Redirect

All `/media/` requests (public and private) are routed through Gunicorn so that Django middleware runs on every hit. nginx serves the file bytes via `X-Accel-Redirect` — the Gunicorn worker is freed as soon as it sends the response headers.

### nginx locations (`docker/config/nginx/sapl.conf`)

```nginx
# Proxied to Gunicorn — Django middleware + serve_media() run here.
location /media/ {
    limit_req zone=sapl_general burst=${NGINX_BURST_GENERAL} nodelay;
    proxy_pass http://sapl_server;
}

# Internal — only reachable via X-Accel-Redirect, not by external clients.
location /_accel/media/ {
    internal;
    alias /var/interlegis/sapl/media/;
    sendfile on;
    etag on;
}
```

### Django view (`sapl/base/media.py`)

`serve_media(request, path)` — registered at `^media/(?P<path>.*)$` in `sapl/urls.py`. Per-request steps:

1. **Path traversal guard** — `os.path.abspath` check; raises 404 on escape.
2. **Auth gate** — `documentos_privados/` paths require an authenticated session; redirects to login otherwise.
3. **Path counter** — increments `rl:{ns}:path:{sha256}:reqs` in Redis DB 1 (TTL = `MEDIA_PATH_COUNTER_TTL`).
4. **Content-type cache** — reads `file:{ns}:{sha256}` from the Django default cache (DB 0); on a miss, calls `mimetypes.guess_type` and stores the result (TTL = `MEDIA_FILE_CACHE_TTL`).
5. **Serve** — in DEBUG: `django.views.static.serve` directly. In production: `X-Accel-Redirect: /_accel/media/`.
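Steps 1 and 5 can be condensed into a framework-agnostic sketch. The real view returns a Django `HttpResponse`; `MEDIA_ROOT` here is assumed from the nginx `alias` above, and the counter/cache steps are omitted:

```python
import mimetypes
import os

MEDIA_ROOT = "/var/interlegis/sapl/media"   # assumed from the nginx alias

def accel_headers(path):
    """Build the response headers for an X-Accel-Redirect hand-off.

    Returns None on a path-traversal attempt (the caller would raise 404).
    """
    full = os.path.normpath(os.path.join(MEDIA_ROOT, path))
    if not full.startswith(MEDIA_ROOT + os.sep):
        return None                         # escaped MEDIA_ROOT — reject
    ctype = mimetypes.guess_type(full)[0] or "application/octet-stream"
    return {
        "Content-Type": ctype,
        # nginx intercepts this header, serves the file from the internal
        # /_accel/media/ location, and frees the Gunicorn worker immediately.
        "X-Accel-Redirect": "/_accel/media/" + path,
    }
```

The key point is that the response body stays empty: Django only decides *whether* and *as what* to serve; nginx moves the bytes.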
### Settings

| Setting | Default | Purpose |
|---------|---------|---------|
| `FILE_META_KEY` | `'file:{ns}:{sha256}'` | Key template for content-type cache (DB 0) |
| `MEDIA_PATH_COUNTER_TTL` | `60` s | Per-path counter window |
| `MEDIA_FILE_CACHE_TTL` | `3600` s | Content-type metadata TTL |

---

## Key schema reference

| DB | Use case | Key pattern | TTL | Constant |
|----|----------|-------------|-----|----------|
| 0 | Page / view cache | `cache:{ns}:*` | 300 s (default) | `CACHES['default']` KEY_PREFIX |
| 0 | Static file cache (logos) | `static:{ns}:{sha256}` | 3 – 24 h | *Future* (requires OpenResty/Lua) |
| 0 | Media file content-type cache | `file:{ns}:{sha256}` | 1 h | `FILE_META_KEY` |
| 1 | IP rate-limit counter | `rl:ip:{ip}:reqs` | 60 s | `RL_IP_REQUESTS` |
| 1 | IP blocked marker | `rl:ip:{ip}:blocked` | 300 s | `RL_IP_BLOCKED` |
| 1 | User rate-limit counter | `rl:{ns}:user:{uid}:reqs` | 60 s | `RL_USER_REQUESTS` |
| 1 | User blocked marker | `rl:{ns}:user:{uid}:blocked` | 300 s | `RL_USER_BLOCKED` |
| 1 | Namespace/IP sliding window | `rl:{ns}:ip:{ip}:w:{bucket}` | 120 s | `RL_NS_WINDOW` |
| 1 | Path counter (`/media/`) | `rl:{ns}:path:{sha256}:reqs` | 60 s | `RL_PATH_REQUESTS` |
| 1 | Path counter (`/static/`) | `rl:{ns}:path:{sha256}:reqs` | 60 s | *Future* (requires OpenResty/Lua) |
| 1 | UA deny list | `rl:bot:ua:blocked` | permanent SET | `RL_UA_BLOCKLIST` |
| 2 | Django Channels | `channels:*` | session TTL | *Future* |
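The UA deny-list row above stores SHA-256 token hashes. The tokenisation scheme described in the seeding section can be sketched as follows — an illustrative reimplementation, not the actual `_is_redis_blocked_ua` code:

```python
import hashlib
import re

def ua_tokens(ua):
    """Split a User-Agent string on '/', spaces, ';', '(' and ')'."""
    return [t for t in re.split(r"[/ ;()]+", ua) if t]

def is_blocked(ua, blocked_hashes):
    """True if any token's SHA-256 appears in the rl:bot:ua:blocked SET
    (here modelled as a plain set of hex digests)."""
    return any(
        hashlib.sha256(t.encode()).hexdigest() in blocked_hashes
        for t in ua_tokens(ua)
    )
```

Hashing whole tokens (rather than substring matching) is what lets the SET stay small and the membership check stay O(tokens) per request.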