# SAPL — Kubernetes Redis
Manifests for the shared Redis instance used by all SAPL pods for cross-pod rate limiting (DB 1) and view/static-file caching (DB 0).
## Directory layout

```
docker/k8s/
└── redis/
    ├── redis-configmap.yaml   # redis.conf — no persistence, allkeys-lru, 5 GB ceiling
    ├── redis-deployment.yaml  # Deployment (1 replica, redis:7-alpine)
    └── redis-service.yaml     # ClusterIP service on port 6379
```
## Prerequisites

- `kubectl` configured to talk to the target cluster.
- A `sapl-redis` namespace (created below if it doesn't exist).
## Deploy

```bash
# 1. Create the namespace (idempotent)
rancher kubectl create namespace sapl-redis --dry-run=client -o yaml | rancher kubectl apply -f -

# 2. Apply all three manifests
rancher kubectl apply -f docker/k8s/redis/redis-configmap.yaml
rancher kubectl apply -f docker/k8s/redis/redis-deployment.yaml
rancher kubectl apply -f docker/k8s/redis/redis-service.yaml

# 3. Verify the pod is Running
rancher kubectl -n sapl-redis get pods -l app=sapl-redis
```
Expected output:

```
NAME                          READY   STATUS    RESTARTS   AGE
sapl-redis-6d9f8b7c4d-xk2lm   1/1     Running   0          30s
```
## Verify the rate limiter

`scripts/test_ratelimiter.py` fires repeated GET requests at a SAPL URL and reports when the first 429 is returned.

### Usage

```bash
python scripts/test_ratelimiter.py <URL> [-n NUM] [-d DELAY] [-t TIMEOUT]
```
| Flag | Default | Meaning |
|---|---|---|
| `url` | (required) | Full URL including scheme, e.g. `http://localhost` |
| `-n, --num-requests` | `50` | Maximum requests to send |
| `-d, --delay` | `0.1` | Seconds between requests |
| `-t, --timeout` | `10` | Per-request timeout in seconds |
The script stops and prints a summary as soon as a 429 is received.
### Examples

```bash
# Hit the anonymous threshold (35 req/min) — fire 40 requests with minimal delay
python scripts/test_ratelimiter.py http://localhost -n 40 -d 0.05

# Slower fire — check that legitimate traffic is not rate-limited
python scripts/test_ratelimiter.py http://localhost -n 20 -d 2

# Test against a staging pod via port-forward
rancher kubectl port-forward -n <NAMESPACE> deploy/sapl 8080:80 &
python scripts/test_ratelimiter.py http://localhost:8080 -n 40 -d 0.05
```
### Reading the output

```
Request 1: Status 200 | Time: 0.045s
...
Request 36: Status 429 | Time: 0.038s
-> Rate limited on request 36

Summary:
  Total requests attempted: 36
  Successful (200): 35
  Rate limited (429): 1
  First 429 occurred at request: 36
```
A first-429 near the configured anonymous threshold (35 req/min) confirms the middleware is wired correctly. A first-429 much earlier points to nginx `limit_req` firing before Django sees the request.
## Inject REDIS_URL into SAPL instances

`REDIS_URL` points at the shared instance:

```
redis://redis.sapl-redis.svc.cluster.local:6379
        ^^^^^ ^^^^^^^^^^
        svc   namespace
```
`start.sh` picks it up on every pod startup and sets the `REDIS_CACHE` waffle switch automatically — no further intervention needed.
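
For orientation, a minimal sketch of how a Django settings module might consume `REDIS_URL` — assuming the `django-redis` backend; the `RATELIMIT_REDIS_URL` name and `NAMESPACE` variable are illustrative, and the real wiring lives in `start.sh` and SAPL's settings:

```python
import os

# Illustrative sketch only, not SAPL's actual settings module.
REDIS_URL = os.environ.get("REDIS_URL")  # e.g. redis://redis.sapl-redis.svc.cluster.local:6379

if REDIS_URL:
    CACHES = {
        "default": {
            "BACKEND": "django_redis.cache.RedisCache",
            "LOCATION": f"{REDIS_URL}/0",  # view/static-file cache lives in DB 0
            "KEY_PREFIX": os.environ.get("NAMESPACE", "sapl"),
        },
    }
    # The rate limiter opens its own connection to DB 1 (setting name illustrative).
    RATELIMIT_REDIS_URL = f"{REDIS_URL}/1"
```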
### Fleet-wide rollout

Uses the `app.kubernetes.io/name=sapl` pod label to discover every SAPL namespace automatically — onboarding a new municipality requires no script changes.

```bash
for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
    -o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
  rancher kubectl set env deployment/sapl \
    REDIS_URL=redis://redis.sapl-redis.svc.cluster.local:6379 \
    -n "$ns"
done
```
### Roll back

```bash
for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
    -o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
  rancher kubectl set env deployment/sapl REDIS_URL- -n "$ns"
done
```

`kubectl set env deployment/sapl REDIS_URL-` (trailing `-`) removes the variable. `start.sh` then falls back to the file-based cache automatically.
## Monitor

### Pod and events

```bash
# Pod status
rancher kubectl -n sapl-redis get pods -l app=sapl-redis -o wide

# Deployment events (useful right after apply)
rancher kubectl -n sapl-redis describe deployment sapl-redis

# Pod events (OOMKill, restarts, etc.)
rancher kubectl -n sapl-redis describe pod -l app=sapl-redis
```
### Logs

```bash
# Tail live logs
rancher kubectl -n sapl-redis logs -f deploy/sapl-redis

# Last 100 lines
rancher kubectl -n sapl-redis logs deploy/sapl-redis --tail=100
```
### Redis INFO

```bash
# Memory usage
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli info memory \
  | grep -E 'used_memory_human|maxmemory_human|mem_fragmentation_ratio'

# Connection pressure
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli info stats \
  | grep -E 'rejected_connections|instantaneous_ops_per_sec'

# Key distribution per DB
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli info keyspace

# Recent slow queries
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli slowlog get 10

# Live latency sampling (1-second windows)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli --latency-history -i 1
```
### Rate-limiter keys (DB 1)

```bash
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli -n 1 dbsize

rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli -n 1 --scan --pattern 'rl:ip:*' | head -20
```
## Seed the UA deny list (once after first deploy)

`rl:bot:ua:blocked` is a permanent Redis SET in DB 1. Each member is the SHA-256 of a UA token — the identifying fragment extracted after splitting on `/`, spaces, `;`, `(`, `)`, e.g.:

```
UA string:   "GPTBot/1.1 (+https://openai.com/gptbot)"
Tokens:      GPTBot  1.1  +https:  ...
Hash stored: sha256("GPTBot")
```
The middleware (`_is_redis_blocked_ua`) tokenises the incoming UA the same way and checks each token hash against the cached set. The SET is fetched from Redis at most once per `RATE_LIMITER_UA_BLOCKLIST_REFRESH` seconds (default 60) per worker process.

The bots in `BOT_UA_FRAGMENTS` (Python list, always active) and this Redis SET are independent — the Python list provides the baseline and the Redis SET allows adding new offenders at runtime without a code deploy.
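
A minimal Python sketch of that check, assuming `redis-py`; the function and cache names here are illustrative — the real implementation is `_is_redis_blocked_ua` in `sapl/middleware/ratelimit.py`:

```python
import hashlib
import re
import time

import redis

r = redis.Redis(db=1)
REFRESH = 60  # RATE_LIMITER_UA_BLOCKLIST_REFRESH
_cache = {"hashes": set(), "fetched": 0.0}  # per-worker cache of the SET

def _blocked_hashes():
    # Re-fetch the SET from Redis at most once per REFRESH seconds.
    now = time.monotonic()
    if now - _cache["fetched"] > REFRESH:
        _cache["hashes"] = {h.decode() for h in r.smembers("rl:bot:ua:blocked")}
        _cache["fetched"] = now
    return _cache["hashes"]

def is_blocked_ua(user_agent: str) -> bool:
    # Tokenise on '/', whitespace, ';', '(' and ')' — same split as the seeder below.
    tokens = [t for t in re.split(r"[/\s;()]+", user_agent) if t]
    blocked = _blocked_hashes()
    return any(hashlib.sha256(t.encode()).hexdigest() in blocked for t in tokens)

# is_blocked_ua("GPTBot/1.1 (+https://openai.com/gptbot)") → True once seeded
```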
```bash
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
  SADD rl:bot:ua:blocked \
  "$(echo -n 'GPTBot' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'ClaudeBot' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'PerplexityBot' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'Bytespider' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'AhrefsBot' | sha256sum | cut -d' ' -f1)" \
  "$(echo -n 'meta-externalagent' | sha256sum | cut -d' ' -f1)"

# Add a new offender at runtime (picked up within RATE_LIMITER_UA_BLOCKLIST_REFRESH seconds)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
  SADD rl:bot:ua:blocked "$(echo -n 'NewBot' | sha256sum | cut -d' ' -f1)"
```
## Local standalone Redis (development / testing)

No Kubernetes? Run Redis directly with Docker:

```bash
sudo docker run --rm -p 6379:6379 redis:7-alpine \
  redis-server --save "" --appendonly no
```

Then point Django at it by exporting the env vars before starting the dev server:

```bash
export REDIS_URL="redis://localhost:6379"
export CACHE_BACKEND="redis"
python manage.py runserver
```

Or add them to your local `.env` file:

```
REDIS_URL=redis://localhost:6379
CACHE_BACKEND=redis
```

Note: the waffle switch `REDIS_CACHE` must also be `on` in your local database for `start.sh` to activate the Redis backend. Run `python manage.py waffle_switch REDIS_CACHE on --create`.
## Update redis.conf without redeploying

```bash
# Edit the ConfigMap
rancher kubectl -n sapl-redis edit configmap redis-config

# Restart the pod to pick up the new config
rancher kubectl -n sapl-redis rollout restart deployment/sapl-redis
```
## Rate limiting — two layers, two jobs
SAPL enforces rate limits at two independent layers. They use different algorithms and protect different things; their thresholds must be tuned separately.
### Layer 1 — nginx limit_req (leaky bucket)

Defined in `docker/config/nginx/nginx.conf` (zones) and `sapl.conf` (burst).

```nginx
sapl_general  rate=30r/m   # 1 token every 2 s
sapl_heavy    rate=10r/m   # 1 token every 6 s (PDF/report endpoints)
```
`burst=N nodelay` means nginx accepts up to N requests instantly above the current token level, then enforces the drip rate. Requests beyond the burst cap return 429 before reaching Gunicorn — zero Python cost.

Burst values are set at container startup via env vars:
| Env var | Default | Location |
|---|---|---|
| `NGINX_BURST_GENERAL` | `60` | `location /`, `location /media/` |
| `NGINX_BURST_API` | `60` | `location /api/` |
| `NGINX_BURST_HEAVY` | `20` | `location /relatorios/` |
Defaults are 2× the zone's per-minute rate, so a user can spend a full minute's quota in a single burst before the leaky bucket takes over.
### Layer 2 — Django RateLimitMiddleware (sliding window)

Defined in `sapl/middleware/ratelimit.py`, backed by Redis DB 1.

Requests that pass nginx reach Python. The middleware counts them in a 60-second sliding window per IP (anonymous) or per user (authenticated):
| Env var | Default | Scope |
|---|---|---|
| `RATE_LIMITER_RATE` | `35/m` | Anonymous IP |
| `RATE_LIMITER_RATE_AUTHENTICATED` | `120/m` | Authenticated user |
| `RATE_LIMITER_RATE_BOT` | `5/m` | (reserved — bots are currently blocked outright, not counted) |
| `RATE_LIMITER_UA_BLOCKLIST_REFRESH` | `60 s` | How often each worker re-fetches `rl:bot:ua:blocked` from Redis |
When the window count hits the threshold, the IP/user is written to a Redis blocked-set with a 300 s TTL and subsequent requests return 429 with `Retry-After: 300` — without touching the database.
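
In outline, the counter-and-block step looks like this (a sketch assuming `redis-py`, with the sliding window simplified to an INCR-plus-EXPIRE window; key names follow the "Key schema reference" table below, the function name is illustrative):

```python
import redis

r = redis.Redis(db=1)
ANON_LIMIT = 35   # RATE_LIMITER_RATE
BLOCK_TTL = 300   # seconds — matches Retry-After: 300

def allow_anonymous(ip: str) -> bool:
    """Return True if the request may pass, False if it should get a 429."""
    # Already blocked? Reject without counting.
    if r.exists(f"rl:ip:{ip}:blocked"):
        return False
    # Count the request; the 60 s expiry approximates the window.
    key = f"rl:ip:{ip}:reqs"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 60)
    # Threshold reached → write the blocked marker and reject.
    if count >= ANON_LIMIT:
        r.set(f"rl:ip:{ip}:blocked", 1, ex=BLOCK_TTL)
        return False
    return True
```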
Decision flow inside `RateLimitMiddleware._evaluate()`:

```
1.  IP in whitelist?                          → pass (no further checks)
1a. UA matches BOT_UA_FRAGMENTS list?         → 429 reason=known_ua
1b. UA token hash in rl:bot:ua:blocked SET?   → 429 reason=redis_ua
2.  IP in rl:ip:{ip}:blocked?                 → 429 reason=ip_blocked
3.  Authenticated user?
    3a. User in rl:{ns}:user:{uid}:blocked?   → 429 reason=user_blocked
    3b. Suspicious headers (no Accept/AL)?    → 429 reason=suspicious_headers_auth
    3c. User request count ≥ auth threshold?  → SET blocked, 429 reason=auth_user_rate
4.  Anonymous:
    4a. Suspicious headers?                   → 429 reason=suspicious_headers
    4b. IP request count ≥ anon threshold?    → SET blocked, 429 reason=ip_rate
    4c. NS/IP window count ≥ anon threshold?  → SET blocked, 429 reason=ua_rotation
→ pass
```
### Why they are not the same number

| | nginx burst | Django threshold |
|---|---|---|
| Algorithm | Leaky bucket — tokens refill over time | Sliding window — hard count per 60 s |
| Protects | Gunicorn workers from being flooded | Per-client fairness, business policy |
| Tuned by | Capacity of the server | Acceptable request volume per client |
| Failure mode | Workers overwhelmed | Legitimate user over-browsing |
A user loading a page quickly may fire 5–10 Django requests in two seconds. With `rate=30r/m` (1 token/2 s) and `burst=60` they absorb that fine; the leaky bucket refills before they click the next link. The Django threshold (35/m sliding window) catches sustained automated traffic from a single IP that looks like scraping even if it arrives slowly enough to beat the nginx burst cap.
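
The difference is easy to see numerically. A toy simulation (hypothetical helpers, not SAPL or nginx code) of 40 requests arriving in 2 seconds:

```python
from collections import deque

def leaky_bucket(times, rate_per_sec, burst):
    """nginx-style limit_req with nodelay: allow while excess stays under burst."""
    excess, last, allowed = 0.0, None, []
    for t in times:
        if last is not None:
            excess = max(0.0, excess - (t - last) * rate_per_sec)  # tokens drip back
        last = t
        if excess + 1 > burst:
            allowed.append(False)
        else:
            excess += 1
            allowed.append(True)
    return allowed

def sliding_window(times, limit, window=60.0):
    """Hard count per rolling window, like the Django middleware."""
    seen, allowed = deque(), []
    for t in times:
        while seen and t - seen[0] >= window:
            seen.popleft()
        if len(seen) < limit:
            seen.append(t)
            allowed.append(True)
        else:
            allowed.append(False)
    return allowed

times = [i * 0.05 for i in range(40)]                        # 40 requests in 2 s
print(sum(leaky_bucket(times, rate_per_sec=0.5, burst=60)))  # 40 — all pass nginx
print(sum(sliding_window(times, limit=35)))                  # 35 — Django 429s the rest
```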
## Request routing — how nginx reaches Django

`proxy_pass http://sapl_server` forwards the HTTP request — with the original path intact — to the Gunicorn Unix socket. Django doesn't know or care that nginx is in front; it sees a standard HTTP request.
```
GET /media/foo.pdf
        │
        ▼
nginx (sapl.conf)
    location /media/ → proxy_pass to Unix socket
        │
        ▼
Gunicorn (WSGI server)
    receives raw HTTP, calls Django WSGI application
        │
        ▼
Django middleware stack (settings.MIDDLEWARE)
    RateLimitMiddleware → pass or 429
        │
        ▼
Django URL router (sapl/urls.py)
    r'^media/(?P<path>.*)$' → serve_media
        │
        ▼
serve_media(request, path='foo.pdf')
    returns HttpResponse with X-Accel-Redirect: /_accel/media/foo.pdf
        │
        ▼
nginx sees X-Accel-Redirect header
    /_accel/media/ internal location → reads file from disk → sends to client
```
nginx does no routing beyond picking a location block. The mapping from URL path to Python function lives entirely in `sapl/urls.py`. `proxy_pass` is just a pipe.
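
Concretely, that mapping is a single pattern in `sapl/urls.py` (excerpt sketched from the diagram above; the surrounding code is illustrative):

```python
# sapl/urls.py (excerpt, illustrative)
from django.urls import re_path

from sapl.base.media import serve_media

urlpatterns = [
    # Every /media/ hit lands here after passing the middleware stack.
    re_path(r'^media/(?P<path>.*)$', serve_media, name='serve_media'),
]
```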
## Media file serving — serve_media and X-Accel-Redirect

All `/media/` requests (public and private) are routed through Gunicorn so that Django middleware runs on every hit. nginx serves the file bytes via `X-Accel-Redirect` — the Gunicorn worker is freed as soon as it sends the response headers.
### nginx locations (docker/config/nginx/sapl.conf)

```nginx
# Proxied to Gunicorn — Django middleware + serve_media() run here.
location /media/ {
    limit_req zone=sapl_general burst=${NGINX_BURST_GENERAL} nodelay;
    proxy_pass http://sapl_server;
}

# Internal — only reachable via X-Accel-Redirect, not by external clients.
location /_accel/media/ {
    internal;
    alias /var/interlegis/sapl/media/;
    sendfile on;
    etag on;
}
```
### Django view (sapl/base/media.py)

`serve_media(request, path)` — registered at `^media/(?P<path>.*)$` in `sapl/urls.py`.

Per-request steps (a condensed sketch follows the list):

- Path traversal guard — `os.path.abspath` check; raises 404 on escape.
- Auth gate — `documentos_privados/` paths require an authenticated session; redirects to login otherwise.
- Path counter — increments `rl:{ns}:path:{sha256}:reqs` in Redis DB 1 (TTL = `MEDIA_PATH_COUNTER_TTL`).
- Content-type cache — reads `file:{ns}:{sha256}` from the Django default cache (DB 0); on a miss, calls `mimetypes.guess_type` and stores the result (TTL = `MEDIA_FILE_CACHE_TTL`).
- Serve — in DEBUG: `django.views.static.serve` directly. In production: `X-Accel-Redirect: /_accel/media/<path>`.
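
A condensed sketch of the view following those steps (illustrative — the real code is `sapl/base/media.py`; the Redis counter and content-type cache steps are elided here):

```python
import mimetypes
import os

from django.conf import settings
from django.contrib.auth.views import redirect_to_login
from django.http import Http404, HttpResponse

def serve_media(request, path):
    # 1. Path traversal guard — the resolved path must stay inside MEDIA_ROOT.
    root = os.path.abspath(settings.MEDIA_ROOT)
    full = os.path.abspath(os.path.join(root, path))
    if not full.startswith(root + os.sep):
        raise Http404

    # 2. Auth gate — private documents require an authenticated session.
    if path.startswith('documentos_privados/') and not request.user.is_authenticated:
        return redirect_to_login(request.get_full_path())

    # 3./4. Path counter and content-type cache elided (see the Settings table).
    content_type = mimetypes.guess_type(full)[0] or 'application/octet-stream'

    # 5. Hand the bytes back to nginx; the worker is freed after the headers.
    response = HttpResponse(content_type=content_type)
    response['X-Accel-Redirect'] = f'/_accel/media/{path}'
    return response
```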
### Settings

| Setting | Default | Purpose |
|---|---|---|
| `FILE_META_KEY` | `'file:{ns}:{sha256}'` | Key template for the content-type cache (DB 0) |
| `MEDIA_PATH_COUNTER_TTL` | `60 s` | Per-path counter window |
| `MEDIA_FILE_CACHE_TTL` | `3600 s` | Content-type metadata TTL |
## Key schema reference

| DB | Use case | Key pattern | TTL | Constant |
|---|---|---|---|---|
| 0 | Page / view cache | `cache:{ns}:*` | 300 s (default) | `CACHES['default']` `KEY_PREFIX` |
| 0 | Static file cache (logos) | `static:{ns}:{sha256}` | 3 – 24 h | Future (requires OpenResty/Lua) |
| 0 | Media file content-type cache | `file:{ns}:{sha256}` | 1 h | `FILE_META_KEY` |
| 1 | IP rate-limit counter | `rl:ip:{ip}:reqs` | 60 s | `RL_IP_REQUESTS` |
| 1 | IP blocked marker | `rl:ip:{ip}:blocked` | 300 s | `RL_IP_BLOCKED` |
| 1 | User rate-limit counter | `rl:{ns}:user:{uid}:reqs` | 60 s | `RL_USER_REQUESTS` |
| 1 | User blocked marker | `rl:{ns}:user:{uid}:blocked` | 300 s | `RL_USER_BLOCKED` |
| 1 | Namespace/IP sliding window | `rl:{ns}:ip:{ip}:w:{bucket}` | 120 s | `RL_NS_WINDOW` |
| 1 | Path counter (`/media/`) | `rl:{ns}:path:{sha256}:reqs` | 60 s | `RL_PATH_REQUESTS` |
| 1 | Path counter (`/static/`) | `rl:{ns}:path:{sha256}:reqs` | 60 s | Future (requires OpenResty/Lua) |
| 2 | Django Channels | `channels:*` | session TTL | Future |