# SAPL — Kubernetes Redis Manifests

Manifests for the shared Redis instance used by all SAPL pods for cross-pod rate limiting (DB 1) and view/static-file caching (DB 0).

---

## Directory layout

```
docker/k8s/
└── redis/
    ├── redis-configmap.yaml   # redis.conf — no persistence, allkeys-lru, 5 GB ceiling
    ├── redis-deployment.yaml  # Deployment (1 replica, redis:7-alpine)
    └── redis-service.yaml     # ClusterIP service on port 6379
```

---

## Prerequisites

- `kubectl` configured to talk to the target cluster.
- A `sapl-redis` namespace (created below if it doesn't exist).

---

## Deploy

```bash
# 1. Create the namespace (idempotent)
rancher kubectl create namespace sapl-redis --dry-run=client -o yaml | rancher kubectl apply -f -

# 2. Apply all three manifests
rancher kubectl apply -f docker/k8s/redis/redis-configmap.yaml
rancher kubectl apply -f docker/k8s/redis/redis-deployment.yaml
rancher kubectl apply -f docker/k8s/redis/redis-service.yaml

# 3. Verify the pod is Running
rancher kubectl -n sapl-redis get pods -l app=sapl-redis
```

Expected output:

```
NAME                          READY   STATUS    RESTARTS   AGE
sapl-redis-6d9f8b7c4d-xk2lm   1/1     Running   0          30s
```

---

## Verify the rate limiter

`scripts/test_ratelimiter.py` fires repeated GET requests at a SAPL URL and reports when the first 429 is returned.

### Usage

```
python scripts/test_ratelimiter.py URL [-n NUM] [-d DELAY] [-t TIMEOUT]
```

| Flag | Default | Meaning |
|------|---------|---------|
| `url` | *(required)* | Full URL including scheme, e.g. `http://localhost` |
| `-n`, `--num-requests` | `50` | Maximum requests to send |
| `-d`, `--delay` | `0.1` | Seconds between requests |
| `-t`, `--timeout` | `10` | Per-request timeout in seconds |

The script stops and prints a summary as soon as a 429 is received.
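The probe loop behind such a script can be sketched as follows — a hypothetical minimal version with an injectable `fetch` callable, not the actual implementation in `scripts/test_ratelimiter.py`:

```python
def probe(fetch, num_requests=50):
    """Send up to num_requests requests, stopping at the first 429.

    fetch: a zero-argument callable returning an HTTP status code
    (e.g. a wrapper around urllib.request that also sleeps `delay`).
    Returns (statuses, first_429) where first_429 is the 1-based index
    of the first 429 response, or None if the limit was never hit.
    """
    statuses = []
    for i in range(1, num_requests + 1):
        status = fetch()
        statuses.append(status)
        if status == 429:
            return statuses, i   # threshold reached — stop and report
    return statuses, None
```

Injecting `fetch` keeps the loop testable without a live server; the real script adds timing and the printed summary shown below.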
### Examples

```bash
# Hit the anonymous threshold (35 req/min) — fire 40 requests with minimal delay
python scripts/test_ratelimiter.py http://localhost -n 40 -d 0.05

# Slower pacing — check that legitimate traffic is not rate-limited
python scripts/test_ratelimiter.py http://localhost -n 20 -d 2

# Test against a staging pod via port-forward
rancher kubectl port-forward -n <namespace> deploy/sapl 8080:80 &
python scripts/test_ratelimiter.py http://localhost:8080 -n 40 -d 0.05
```

### Reading the output

```
Request 1: Status 200 | Time: 0.045s
...
Request 36: Status 429 | Time: 0.038s
-> Rate limited on request 36

Summary:
  Total requests attempted: 36
  Successful (200): 35
  Rate limited (429): 1
  First 429 occurred at request: 36
```

A first 429 near the configured anonymous threshold (35 req/min) confirms the middleware is wired correctly. A first 429 much earlier points to nginx `limit_req` firing before Django sees the request.

---

## Inject REDIS_URL into SAPL instances

`REDIS_URL` points at the shared instance:

```
redis://redis.sapl-redis.svc.cluster.local:6379
        ^^^^^ ^^^^^^^^^^
        svc   namespace
```

`start.sh` picks it up on every pod startup and sets the `REDIS_CACHE` waffle switch automatically — no further intervention needed.

### Fleet-wide rollout

The rollout loop uses the `app.kubernetes.io/name=sapl` pod label to discover every SAPL namespace automatically — onboarding a new municipality requires no script changes.
```bash
for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
    -o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
  rancher kubectl set env deployment/sapl \
    REDIS_URL=redis://redis.sapl-redis.svc.cluster.local:6379 \
    -n "$ns"
done
```

### Roll back

```bash
for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
    -o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
  rancher kubectl set env deployment/sapl REDIS_URL- -n "$ns"
done
```

`kubectl set env deployment/sapl REDIS_URL-` (trailing `-`) removes the variable. `start.sh` then falls back to the file-based cache automatically.

---

## Monitor

### Pod and events

```bash
# Pod status
rancher kubectl -n sapl-redis get pods -l app=sapl-redis -o wide

# Deployment events (useful right after apply)
rancher kubectl -n sapl-redis describe deployment sapl-redis

# Pod events (OOMKill, restarts, etc.)
rancher kubectl -n sapl-redis describe pod -l app=sapl-redis
```

### Logs

```bash
# Tail live logs
rancher kubectl -n sapl-redis logs -f deploy/sapl-redis

# Last 100 lines
rancher kubectl -n sapl-redis logs deploy/sapl-redis --tail=100
```

### Redis INFO

```bash
# Memory usage
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli info memory \
  | grep -E 'used_memory_human|maxmemory_human|mem_fragmentation_ratio'

# Connection pressure
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli info stats \
  | grep -E 'rejected_connections|instantaneous_ops_per_sec'

# Key distribution per DB
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli info keyspace

# Recent slow queries
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli slowlog get 10

# Latency sampling (1-second intervals)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli --latency-history -i 1
```

### Rate-limiter keys (DB 1)

```bash
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli -n 1 dbsize

rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli -n 1 --scan --pattern 'rl:ip:*' | head -20
```

---

## Seed the UA deny list (once after first deploy)

`rl:bot:ua:blocked` is a permanent Redis SET in DB 1. Each member is the SHA-256 of a **UA token** — the identifying fragment extracted after splitting on `/`, spaces, `;`, `(`, `)`, e.g.:

```
UA string:   "GPTBot/1.1 (+https://openai.com/gptbot)"
Tokens:      GPTBot  1.1  +https:  ...
Hash stored: sha256("GPTBot")
```

The middleware (`_is_redis_blocked_ua`) tokenises the incoming UA the same way and checks each token hash against the cached set. The SET is fetched from Redis at most once per `RATE_LIMITER_UA_BLOCKLIST_REFRESH` seconds (default 60) per worker process.

The bots in `BOT_UA_FRAGMENTS` (Python list, always active) and this Redis SET are **independent** — the Python list provides the baseline, and the Redis SET allows adding new offenders at runtime **without a code deploy**.

```bash
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
  SADD rl:bot:ua:blocked \
  "$(echo -n 'GPTBot' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'ClaudeBot' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'PerplexityBot' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'Bytespider' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'AhrefsBot' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'meta-externalagent' | sha256sum | cut -d' ' -f1)"

# Add a new offender at runtime (picked up within RATE_LIMITER_UA_BLOCKLIST_REFRESH seconds)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
  SADD rl:bot:ua:blocked "$(echo -n 'NewBot' | sha256sum | cut -d' ' -f1)"
```

---

## Local standalone Redis (development / testing)

No Kubernetes?
Run Redis directly with Docker:

```bash
sudo docker run --rm -p 6379:6379 redis:7-alpine \
  redis-server --save "" --appendonly no
```

Then point Django at it by exporting the env vars before starting the dev server:

```bash
export REDIS_URL="redis://localhost:6379"
export CACHE_BACKEND="redis"
python manage.py runserver
```

Or add them to your local `.env` file:

```
REDIS_URL=redis://localhost:6379
CACHE_BACKEND=redis
```

> **Note**: the waffle switch `REDIS_CACHE` must also be `on` in your local
> database for `start.sh` to activate the Redis backend. Run:
>
> ```bash
> python manage.py waffle_switch REDIS_CACHE on --create
> ```

---

## Update `redis.conf` without redeploying

```bash
# Edit the ConfigMap
rancher kubectl -n sapl-redis edit configmap redis-config

# Restart the pod to pick up the new config
rancher kubectl -n sapl-redis rollout restart deployment/sapl-redis
```

---

## Rate limiting — two layers, two jobs

SAPL enforces rate limits at two independent layers. They use different algorithms and protect different things; their thresholds must be tuned separately.

### Layer 1 — nginx `limit_req` (leaky bucket)

Defined in `docker/config/nginx/nginx.conf` (zones) and `sapl.conf` (burst).

```
sapl_general  rate=30r/m   # 1 token every 2 s
sapl_heavy    rate=10r/m   # 1 token every 6 s (PDF/report endpoints)
```

`burst=N nodelay` means nginx accepts up to N requests instantly above the current token level, then enforces the drip rate. Requests beyond the burst cap return 429 before reaching Gunicorn — **zero Python cost**.

Burst values are set at container startup via env vars:

| Env var | Default | Location |
|---------|---------|----------|
| `NGINX_BURST_GENERAL` | `60` | `location /`, `location /media/` |
| `NGINX_BURST_API` | `60` | `location /api/` |
| `NGINX_BURST_HEAVY` | `20` | `location /relatorios/` |

Defaults are 2× the zone's per-minute rate, so a user can spend a full minute's quota in a single burst before the leaky bucket takes over.
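The leaky-bucket accounting can be illustrated with a small simulation. This is an assumed model of `limit_req`'s excess counter (not nginx's actual source): accepted requests grow an "excess" that drains at the zone rate, and a request that would push excess past the burst cap gets a 429.

```python
class LeakyBucket:
    """Simplified model of nginx limit_req with burst=N nodelay."""

    def __init__(self, rate, burst):
        self.rate = rate        # drain rate in requests per second (30r/m -> 0.5)
        self.burst = burst      # how far excess may grow before 429
        self.excess = 0.0
        self.last = 0.0

    def allow(self, now):
        # Drain the excess accumulated since the previous request.
        self.excess = max(0.0, self.excess - (now - self.last) * self.rate)
        self.last = now
        if self.excess + 1 > self.burst:
            return False        # nginx would return 429 here
        self.excess += 1        # with nodelay the request is served immediately
        return True
```

With `rate=30r/m` and `burst=60`, 60 simultaneous requests pass and the 61st is rejected; two seconds later one token has drained and a new request passes again — the behaviour the defaults table above is tuned for.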
### Layer 2 — Django `RateLimitMiddleware` (sliding window)

Defined in `sapl/middleware/ratelimit.py`, backed by Redis DB 1.

Requests that pass nginx reach Python. The middleware counts them in a 60-second sliding window per IP (anonymous) or per user (authenticated):

| Env var | Default | Scope |
|---------|---------|-------|
| `RATE_LIMITER_RATE` | `35/m` | Anonymous IP |
| `RATE_LIMITER_RATE_AUTHENTICATED` | `120/m` | Authenticated user |
| `RATE_LIMITER_RATE_BOT` | `5/m` | *(reserved — bots are currently blocked outright, not counted)* |
| `RATE_LIMITER_UA_BLOCKLIST_REFRESH` | `60` s | How often each worker re-fetches `rl:bot:ua:blocked` from Redis |

When the window count hits the threshold, the IP/user is written to a Redis blocked-set with a 300 s TTL, and subsequent requests return 429 with `Retry-After: 300` — without touching the database.

Decision flow inside `RateLimitMiddleware._evaluate()`:

```
1.  IP in whitelist?                          → pass (no further checks)
1a. UA matches BOT_UA_FRAGMENTS list?         → 429 reason=known_ua
1b. UA token hash in rl:bot:ua:blocked SET?   → 429 reason=redis_ua
2.  IP in rl:ip:{ip}:blocked?                 → 429 reason=ip_blocked
3.  Authenticated user?
    3a. User in rl:{ns}:user:{uid}:blocked?   → 429 reason=user_blocked
    3b. Suspicious headers (no Accept/AL)?    → 429 reason=suspicious_headers_auth
    3c. User request count ≥ auth threshold?  → SET blocked, 429 reason=auth_user_rate
4.  Anonymous:
    4a. Suspicious headers?                   → 429 reason=suspicious_headers
    4b. IP request count ≥ anon threshold?    → SET blocked, 429 reason=ip_rate
    4c. NS/IP window count ≥ anon threshold?  → SET blocked, 429 reason=ua_rotation
→ pass
```

### Why they are not the same number

| | nginx burst | Django threshold |
|-|-------------|------------------|
| **Algorithm** | Leaky bucket — tokens refill over time | Sliding window — hard count per 60 s |
| **Protects** | Gunicorn workers from being flooded | Per-client fairness, business policy |
| **Tuned by** | Capacity of the server | Acceptable request volume per client |
| **Failure mode** | Workers overwhelmed | Legitimate user over-browsing |

A user loading a page quickly may fire 5–10 Django requests in two seconds. With `rate=30r/m` (1 token/2 s) and `burst=60` they absorb that fine; the leaky bucket refills before they click the next link. The Django threshold (35/m sliding window) catches sustained automated traffic from a single IP that looks like scraping, even if it arrives slowly enough to beat the nginx burst cap.

---

## Request routing — how nginx reaches Django

`proxy_pass http://sapl_server` forwards the HTTP request — with the original path intact — to the Gunicorn Unix socket. Django doesn't know or care that nginx is in front; it sees a standard HTTP request.

```
GET /media/foo.pdf
  │
  ▼
nginx (sapl.conf)
  location /media/ → proxy_pass to Unix socket
  │
  ▼
Gunicorn (WSGI server)
  receives raw HTTP, calls the Django WSGI application
  │
  ▼
Django middleware stack (settings.MIDDLEWARE)
  RateLimitMiddleware → pass or 429
  │
  ▼
Django URL router (sapl/urls.py)
  r'^media/(?P<path>.*)$' → serve_media
  │
  ▼
serve_media(request, path='foo.pdf')
  returns HttpResponse with X-Accel-Redirect: /_accel/media/foo.pdf
  │
  ▼
nginx sees X-Accel-Redirect header
  /_accel/media/ internal location → reads file from disk → sends to client
```

nginx does no routing beyond picking a `location` block. The mapping from URL path to Python function lives entirely in `sapl/urls.py`. `proxy_pass` is just a pipe.
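Returning to Layer 2 for a moment: the 60-second sliding window it enforces can be sketched in memory. This is an illustrative stand-in for the Redis counters the middleware actually uses, not its real code:

```python
from collections import deque

class SlidingWindow:
    """60-second sliding-window counter — in-memory stand-in for the
    rl:...:reqs counters the middleware keeps in Redis DB 1."""

    def __init__(self, limit, window=60.0):
        self.limit = limit      # e.g. RATE_LIMITER_RATE = 35/m
        self.window = window
        self.hits = deque()     # timestamps of requests inside the window

    def allow(self, now):
        # Evict timestamps that have slid out of the window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            return False        # middleware would also set a 300 s blocked marker
        self.hits.append(now)
        return True
```

Unlike the leaky bucket, there is no refill-over-time: the 36th anonymous request inside any 60-second span is rejected, however it is paced.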
---

## Media file serving — `serve_media` and X-Accel-Redirect

All `/media/` requests (public and private) are routed through Gunicorn so that Django middleware runs on every hit. nginx serves the file bytes via `X-Accel-Redirect` — the Gunicorn worker is freed as soon as it sends the response headers.

### nginx locations (`docker/config/nginx/sapl.conf`)

```nginx
# Proxied to Gunicorn — Django middleware + serve_media() run here.
location /media/ {
    limit_req zone=sapl_general burst=${NGINX_BURST_GENERAL} nodelay;
    proxy_pass http://sapl_server;
}

# Internal — only reachable via X-Accel-Redirect, not by external clients.
location /_accel/media/ {
    internal;
    alias /var/interlegis/sapl/media/;
    sendfile on;
    etag on;
}
```

### Django view (`sapl/base/media.py`)

`serve_media(request, path)` — registered at `^media/(?P<path>.*)$` in `sapl/urls.py`. Per-request steps:

1. **Path traversal guard** — `os.path.abspath` check; raises 404 on escape.
2. **Auth gate** — `documentos_privados/` paths require an authenticated session; redirects to login otherwise.
3. **Path counter** — increments `rl:{ns}:path:{sha256}:reqs` in Redis DB 1 (TTL = `MEDIA_PATH_COUNTER_TTL`).
4. **Content-type cache** — reads `file:{ns}:{sha256}` from the Django default cache (DB 0); on a miss, calls `mimetypes.guess_type` and stores the result (TTL = `MEDIA_FILE_CACHE_TTL`).
5. **Serve** — in DEBUG: `django.views.static.serve` directly. In production: `X-Accel-Redirect: /_accel/media/`.
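Steps 1 and 5 can be condensed into a framework-agnostic sketch. The real view returns a Django `HttpResponse`; `MEDIA_ROOT` here is assumed from the nginx `alias` above, and the counter/cache steps are omitted:

```python
import mimetypes
import os

MEDIA_ROOT = "/var/interlegis/sapl/media"   # assumed from the nginx alias

def accel_headers(path):
    """Build the response headers for an X-Accel-Redirect hand-off.

    Returns None on a path-traversal attempt (the caller would raise 404).
    """
    full = os.path.normpath(os.path.join(MEDIA_ROOT, path))
    if not full.startswith(MEDIA_ROOT + os.sep):
        return None                         # escaped MEDIA_ROOT — reject
    ctype = mimetypes.guess_type(full)[0] or "application/octet-stream"
    return {
        "Content-Type": ctype,
        # nginx intercepts this header, serves the file from the internal
        # /_accel/media/ location, and frees the Gunicorn worker immediately.
        "X-Accel-Redirect": "/_accel/media/" + path,
    }
```

The key point is that the response body stays empty: Django only decides *whether* and *as what* to serve; nginx moves the bytes.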
### Settings

| Setting | Default | Purpose |
|---------|---------|---------|
| `FILE_META_KEY` | `'file:{ns}:{sha256}'` | Key template for content-type cache (DB 0) |
| `MEDIA_PATH_COUNTER_TTL` | `60` s | Per-path counter window |
| `MEDIA_FILE_CACHE_TTL` | `3600` s | Content-type metadata TTL |

---

## Key schema reference

| DB | Use case | Key pattern | TTL | Constant |
|----|----------|-------------|-----|----------|
| 0 | Page / view cache | `cache:{ns}:*` | 300 s (default) | `CACHES['default']` KEY_PREFIX |
| 0 | Static file cache (logos) | `static:{ns}:{sha256}` | 3 – 24 h | *Future* (requires OpenResty/Lua) |
| 0 | Media file content-type cache | `file:{ns}:{sha256}` | 1 h | `FILE_META_KEY` |
| 1 | IP rate-limit counter | `rl:ip:{ip}:reqs` | 60 s | `RL_IP_REQUESTS` |
| 1 | IP blocked marker | `rl:ip:{ip}:blocked` | 300 s | `RL_IP_BLOCKED` |
| 1 | User rate-limit counter | `rl:{ns}:user:{uid}:reqs` | 60 s | `RL_USER_REQUESTS` |
| 1 | User blocked marker | `rl:{ns}:user:{uid}:blocked` | 300 s | `RL_USER_BLOCKED` |
| 1 | Namespace/IP sliding window | `rl:{ns}:ip:{ip}:w:{bucket}` | 120 s | `RL_NS_WINDOW` |
| 1 | Path counter (`/media/`) | `rl:{ns}:path:{sha256}:reqs` | 60 s | `RL_PATH_REQUESTS` |
| 1 | Path counter (`/static/`) | `rl:{ns}:path:{sha256}:reqs` | 60 s | *Future* (requires OpenResty/Lua) |
| 1 | UA deny list | `rl:bot:ua:blocked` | permanent SET | `RL_UA_BLOCKLIST` |
| 2 | Django Channels | `channels:*` | session TTL | *Future* |
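The UA deny-list row above stores SHA-256 token hashes. The tokenisation scheme described in the seeding section can be sketched as follows — an illustrative reimplementation, not the actual `_is_redis_blocked_ua` code:

```python
import hashlib
import re

def ua_tokens(ua):
    """Split a User-Agent string on '/', spaces, ';', '(' and ')'."""
    return [t for t in re.split(r"[/ ;()]+", ua) if t]

def is_blocked(ua, blocked_hashes):
    """True if any token's SHA-256 appears in the rl:bot:ua:blocked SET
    (here modelled as a plain set of hex digests)."""
    return any(
        hashlib.sha256(t.encode()).hexdigest() in blocked_hashes
        for t in ua_tokens(ua)
    )
```

Hashing whole tokens (rather than substring matching) is what lets the SET stay small and the membership check stay O(tokens) per request.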