Sistema de Apoio ao Processo Legislativo

SAPL — Kubernetes Redis

Manifests for the shared Redis instance used by all SAPL pods for cross-pod rate limiting (DB 1) and view/static-file caching (DB 0).


Directory layout

docker/k8s/
└── redis/
    ├── redis-configmap.yaml    # redis.conf — no persistence, allkeys-lru, 5 GB ceiling
    ├── redis-deployment.yaml   # Deployment (1 replica, redis:7-alpine)
    └── redis-service.yaml      # ClusterIP service on port 6379

Prerequisites

  • kubectl configured to talk to the target cluster.
  • A sapl-redis namespace (created below if it doesn't exist).

Deploy

# 1. Create the namespace (idempotent)
rancher kubectl create namespace sapl-redis --dry-run=client -o yaml | rancher kubectl apply -f -

# 2. Apply all three manifests
rancher kubectl apply -f docker/k8s/redis/redis-configmap.yaml
rancher kubectl apply -f docker/k8s/redis/redis-deployment.yaml
rancher kubectl apply -f docker/k8s/redis/redis-service.yaml

# 3. Verify the pod is Running
rancher kubectl -n sapl-redis get pods -l app=sapl-redis

Expected output:

NAME                          READY   STATUS    RESTARTS   AGE
sapl-redis-6d9f8b7c4d-xk2lm   1/1     Running   0          30s

Verify the rate limiter

scripts/test_ratelimiter.py fires repeated GET requests at a SAPL URL and reports when the first 429 is returned.

Usage

python scripts/test_ratelimiter.py <URL> [-n NUM] [-d DELAY] [-t TIMEOUT]
Flag                 Default     Meaning
url                  (required)  Full URL including scheme, e.g. http://localhost
-n, --num-requests   50          Maximum requests to send
-d, --delay          0.1         Seconds between requests
-t, --timeout        10          Per-request timeout in seconds

The script stops and prints a summary as soon as a 429 is received.

Examples

# Hit the anonymous threshold (35 req/min) — fire 40 requests with minimal delay
python scripts/test_ratelimiter.py http://localhost -n 40 -d 0.05

# Slower fire — check that legitimate traffic is not rate-limited
python scripts/test_ratelimiter.py http://localhost -n 20 -d 2

# Test against a staging pod via port-forward
rancher kubectl port-forward -n <NAMESPACE> deploy/sapl 8080:80 &
python scripts/test_ratelimiter.py http://localhost:8080 -n 40 -d 0.05

Reading the output

Request   1: Status 200 | Time: 0.045s
...
Request  36: Status 429 | Time: 0.038s
  -> Rate limited on request 36

Summary:
  Total requests attempted: 36
  Successful (200):          35
  Rate limited (429):        1
  First 429 occurred at request: 36

A first-429 near the configured anonymous threshold (35 req/min) confirms the middleware is wired correctly. A first-429 much earlier points to nginx limit_req firing before Django sees the request.


Inject REDIS_URL into SAPL instances

REDIS_URL points at the shared instance:

redis://redis.sapl-redis.svc.cluster.local:6379
         ^^^^^  ^^^^^^^^^^
         svc    namespace

start.sh picks it up on every pod startup and sets the REDIS_CACHE waffle switch automatically — no further intervention needed.
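Since REDIS_URL carries no database suffix, the application selects DB 0 (cache) and DB 1 (rate limiting) itself. A minimal sketch of how per-database URLs can be derived from the single env var; `per_db_urls` is a hypothetical helper, not a function in the SAPL codebase:

```python
# Illustrative only: derive the per-database URLs that the cache layer
# (DB 0) and the rate limiter (DB 1) would use from one base REDIS_URL.

def per_db_urls(redis_url: str) -> dict:
    """Append the Redis database index to a base redis:// URL."""
    base = redis_url.rstrip("/")
    return {
        "cache": f"{base}/0",       # view/static-file cache
        "ratelimit": f"{base}/1",   # cross-pod rate limiting
    }

urls = per_db_urls("redis://redis.sapl-redis.svc.cluster.local:6379")
print(urls["cache"])      # redis://redis.sapl-redis.svc.cluster.local:6379/0
print(urls["ratelimit"])  # redis://redis.sapl-redis.svc.cluster.local:6379/1
```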

Fleet-wide rollout

The rollout loop uses the app.kubernetes.io/name=sapl pod label to discover every SAPL namespace automatically — onboarding a new municipality requires no script changes.

for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
  -o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
  rancher kubectl set env deployment/sapl \
    REDIS_URL=redis://redis.sapl-redis.svc.cluster.local:6379 \
    -n $ns
done

Roll back

for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
  -o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
  rancher kubectl set env deployment/sapl REDIS_URL- -n $ns
done

kubectl set env deployment/sapl REDIS_URL- (trailing -) removes the variable. start.sh then falls back to file-based cache automatically.


Monitor

Pod and events

# Pod status
rancher kubectl -n sapl-redis get pods -l app=sapl-redis -o wide

# Deployment events (useful right after apply)
rancher kubectl -n sapl-redis describe deployment sapl-redis

# Pod events (OOMKill, restarts, etc.)
rancher kubectl -n sapl-redis describe pod -l app=sapl-redis

Logs

# Tail live logs
rancher kubectl -n sapl-redis logs -f deploy/sapl-redis

# Last 100 lines
rancher kubectl -n sapl-redis logs deploy/sapl-redis --tail=100

Redis INFO

# Memory usage
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli info memory \
  | grep -E 'used_memory_human|maxmemory_human|mem_fragmentation_ratio'

# Connection pressure
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli info stats \
  | grep -E 'rejected_connections|instantaneous_ops_per_sec'

# Key distribution per DB
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli info keyspace

# Recent slow queries
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli slowlog get 10

# Live command sampling (1-second window)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli --latency-history -i 1

Rate-limiter keys (DB 1)

rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli -n 1 dbsize

rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
  redis-cli -n 1 --scan --pattern 'rl:ip:*' | head -20

Seed the UA deny list (once after first deploy)

rl:bot:ua:blocked is a permanent Redis SET in DB 1. Each member is the SHA-256 of a UA token — the identifying fragment extracted after splitting on /, spaces, ;, (, ), e.g.:

UA string:  "GPTBot/1.1 (+https://openai.com/gptbot)"
Tokens:      GPTBot  1.1  +https:  ...
Hash stored: sha256("GPTBot")

The middleware (_is_redis_blocked_ua) tokenises the incoming UA the same way and checks each token hash against the cached set. The SET is fetched from Redis at most once per RATE_LIMITER_UA_BLOCKLIST_REFRESH seconds (default 60) per worker process.
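The tokenise-and-hash check can be sketched as follows. This is an illustrative stand-in, not the real `_is_redis_blocked_ua`; the helper names are invented, and in production the blocked set would come from the cached Redis SET rather than a local variable:

```python
# Sketch of the UA tokenise-and-hash check described above.
import hashlib
import re

def ua_token_hashes(ua: str) -> set:
    """Split the UA on '/', spaces, ';', '(' and ')' and hash each token."""
    tokens = [t for t in re.split(r"[/ ;()]+", ua) if t]
    return {hashlib.sha256(t.encode()).hexdigest() for t in tokens}

def is_blocked(ua: str, blocked_hashes: set) -> bool:
    """True if any token hash appears in the rl:bot:ua:blocked set."""
    return not ua_token_hashes(ua).isdisjoint(blocked_hashes)

blocked = {hashlib.sha256(b"GPTBot").hexdigest()}   # one seeded member
print(is_blocked("GPTBot/1.1 (+https://openai.com/gptbot)", blocked))  # True
print(is_blocked("Mozilla/5.0 (X11; Linux x86_64)", blocked))          # False
```

Note that only the exact token hash matches — hashing the full UA string would never hit the set, which is why the seeding commands below hash bare tokens like `GPTBot`.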

The bots in BOT_UA_FRAGMENTS (Python list, always active) and this Redis SET are independent — the Python list provides the baseline and the Redis SET allows adding new offenders at runtime without a code deploy.

rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
  SADD rl:bot:ua:blocked \
    "$(echo -n 'GPTBot'             | sha256sum | cut -d' ' -f1)" \
    "$(echo -n 'ClaudeBot'          | sha256sum | cut -d' ' -f1)" \
    "$(echo -n 'PerplexityBot'      | sha256sum | cut -d' ' -f1)" \
    "$(echo -n 'Bytespider'         | sha256sum | cut -d' ' -f1)" \
    "$(echo -n 'AhrefsBot'          | sha256sum | cut -d' ' -f1)" \
    "$(echo -n 'meta-externalagent' | sha256sum | cut -d' ' -f1)"

# Add a new offender at runtime (picked up within RATE_LIMITER_UA_BLOCKLIST_REFRESH seconds)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
  SADD rl:bot:ua:blocked "$(echo -n 'NewBot' | sha256sum | cut -d' ' -f1)"

Local standalone Redis (development / testing)

No Kubernetes? Run Redis directly with Docker:

sudo docker run --rm -p 6379:6379 redis:7-alpine \
  redis-server --save "" --appendonly no

Then point Django at it by exporting the env var before starting the dev server:

export REDIS_URL="redis://localhost:6379"
export CACHE_BACKEND="redis"
python manage.py runserver

Or add them to your local .env file:

REDIS_URL=redis://localhost:6379
CACHE_BACKEND=redis

Note: the waffle switch REDIS_CACHE must also be on in your local database for start.sh to activate the Redis backend. Run:

python manage.py waffle_switch REDIS_CACHE on --create

Update redis.conf without redeploying

# Edit the ConfigMap
rancher kubectl -n sapl-redis edit configmap redis-config

# Restart the pod to pick up the new config
rancher kubectl -n sapl-redis rollout restart deployment/sapl-redis

Rate limiting — two layers, two jobs

SAPL enforces rate limits at two independent layers. They use different algorithms and protect different things; their thresholds must be tuned separately.

Layer 1 — nginx limit_req (leaky bucket)

Defined in docker/config/nginx/nginx.conf (zones) and sapl.conf (burst).

sapl_general  rate=30r/m   # 1 token every 2 s
sapl_heavy    rate=10r/m   # 1 token every 6 s  (PDF/report endpoints)

burst=N nodelay means nginx accepts up to N requests instantly above the current token level, then enforces the drip rate. Requests beyond the burst cap return 429 before reaching Gunicorn — zero Python cost.

Burst values are set at container startup via env vars:

Env var               Default   Location
NGINX_BURST_GENERAL   60        location /, location /media/
NGINX_BURST_API       60        location /api/
NGINX_BURST_HEAVY     20        location /relatorios/

Defaults are 2× the zone's per-minute rate, so a user can spend a full minute's quota in a single burst before the leaky bucket takes over.
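A toy model of this behaviour, under the general-zone numbers (rate=30r/m, burst=60 nodelay). This is not nginx's implementation, just an illustration of why 60 simultaneous requests pass and the 61st is rejected until tokens drip back:

```python
# Toy leaky-bucket model: rate=30r/m (one token every 2 s), burst=60 nodelay.

def simulate(arrival_times, rate_per_min=30, burst=60):
    """Return one True (accepted) / False (429) per request."""
    interval = 60.0 / rate_per_min   # seconds per token (2 s here)
    excess = 0.0                     # requests above the drip rate
    last = None
    results = []
    for t in arrival_times:
        if last is not None:
            # Elapsed time refills the bucket at the drip rate.
            excess = max(0.0, excess - (t - last) / interval)
        if excess + 1 > burst:
            results.append(False)    # over the burst cap -> 429
        else:
            excess += 1
            results.append(True)
        last = t
    return results

# 61 requests at the same instant: the first 60 pass, the 61st gets a 429.
r = simulate([0.0] * 61)
print(r.count(True), r.count(False))  # 60 1
```

Requests spaced at the drip rate (one every 2 s) never accumulate excess, so steady traffic is never rejected regardless of how long it runs.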

Layer 2 — Django RateLimitMiddleware (sliding window)

Defined in sapl/middleware/ratelimit.py, backed by Redis DB 1.

Requests that pass nginx reach Python. The middleware counts them in a 60-second sliding window per IP (anonymous) or per user (authenticated):

Env var                             Default   Scope
RATE_LIMITER_RATE                   35/m      Anonymous IP
RATE_LIMITER_RATE_AUTHENTICATED     120/m     Authenticated user
RATE_LIMITER_RATE_BOT               5/m       (reserved — bots are currently blocked outright, not counted)
RATE_LIMITER_UA_BLOCKLIST_REFRESH   60 s      How often each worker re-fetches rl:bot:ua:blocked from Redis

When the window count hits the threshold the IP/user is written to a Redis blocked-set with a 300 s TTL and subsequent requests return 429 with Retry-After: 300 — without touching the database.
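An in-memory sketch of the counter-plus-block-marker pattern: increment a per-client key inside a 60 s window; once the count goes past the threshold, write a blocked marker with a 300 s TTL. The class and its bookkeeping are illustrative (the real middleware uses Redis keys from the schema below, not local dicts):

```python
import time

class Window:
    """Toy per-IP window counter with a 300 s block marker."""
    def __init__(self, threshold=35, window=60, block_ttl=300, clock=time.monotonic):
        self.threshold, self.window, self.block_ttl = threshold, window, block_ttl
        self.clock = clock
        self.counts = {}   # ip -> (count, window_expiry)   ~ rl:ip:{ip}:reqs
        self.blocked = {}  # ip -> block_expiry              ~ rl:ip:{ip}:blocked

    def hit(self, ip: str) -> int:
        """Return 200 if allowed, 429 if blocked."""
        now = self.clock()
        if self.blocked.get(ip, 0) > now:
            return 429                          # block marker still live
        count, expiry = self.counts.get(ip, (0, now + self.window))
        if expiry <= now:                       # window expired: reset
            count, expiry = 0, now + self.window
        count += 1
        self.counts[ip] = (count, expiry)
        if count > self.threshold:              # past the threshold: block 300 s
            self.blocked[ip] = now + self.block_ttl
            return 429
        return 200

t = [0.0]
w = Window(threshold=35, clock=lambda: t[0])
codes = [w.hit("10.0.0.1") for _ in range(40)]
print(codes.count(200), codes.count(429))  # 35 5
```

With 40 back-to-back requests the first 429 lands on request 36, matching the anonymous threshold and the sample test_ratelimiter.py output.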

Decision flow inside RateLimitMiddleware._evaluate():

1.  IP in whitelist?                          → pass (no further checks)
1a. UA matches BOT_UA_FRAGMENTS list?         → 429  reason=known_ua
1b. UA token hash in rl:bot:ua:blocked SET?   → 429  reason=redis_ua
2.  IP in rl:ip:{ip}:blocked?                 → 429  reason=ip_blocked
3.  Authenticated user?
    3a. User in rl:{ns}:user:{uid}:blocked?   → 429  reason=user_blocked
    3b. Suspicious headers (no Accept/AL)?    → 429  reason=suspicious_headers_auth
    3c. User request count ≥ auth threshold?  → SET blocked, 429  reason=auth_user_rate
4.  Anonymous:
    4a. Suspicious headers?                   → 429  reason=suspicious_headers
    4b. IP request count ≥ anon threshold?    → SET blocked, 429  reason=ip_rate
    4c. NS/IP window count ≥ anon threshold?  → SET blocked, 429  reason=ua_rotation
    → pass
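The flow above can be condensed into an ordered list of (reason, predicate) pairs evaluated top to bottom. The dict-based request and the flag names here are stand-ins; the real checks live inside RateLimitMiddleware._evaluate():

```python
# Illustrative condensation of the decision flow: first matching check wins.

def evaluate(req: dict) -> tuple:
    """Return (status, reason) for a request described as a plain dict."""
    if req.get("ip_whitelisted"):
        return 200, None                      # step 1: no further checks
    checks = [
        ("known_ua", req.get("ua_in_python_list")),      # 1a
        ("redis_ua", req.get("ua_hash_in_redis_set")),   # 1b
        ("ip_blocked", req.get("ip_blocked")),           # 2
    ]
    if req.get("authenticated"):              # step 3
        checks += [
            ("user_blocked", req.get("user_blocked")),
            ("suspicious_headers_auth", req.get("suspicious_headers")),
            ("auth_user_rate", req.get("over_auth_threshold")),
        ]
    else:                                     # step 4
        checks += [
            ("suspicious_headers", req.get("suspicious_headers")),
            ("ip_rate", req.get("over_anon_threshold")),
            ("ua_rotation", req.get("over_ns_window")),
        ]
    for reason, hit in checks:
        if hit:
            return 429, reason
    return 200, None

print(evaluate({"ip_whitelisted": True, "ua_in_python_list": True}))  # (200, None)
print(evaluate({"authenticated": True, "over_auth_threshold": True}))  # (429, 'auth_user_rate')
```

The ordering matters: a whitelisted IP short-circuits everything, and UA checks run before any counter is touched, so blocked bots never consume window quota.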

Why they are not the same number

              nginx burst                              Django threshold
Algorithm     Leaky bucket — token refills over time   Sliding window — hard count per 60 s
Protects      Gunicorn workers from being flooded      Per-client fairness, business policy
Tuned by      Capacity of the server                   Acceptable request volume per client
Failure mode  Workers overwhelmed                      Legitimate user over-browsing

A user loading a page quickly may fire 5–10 Django requests in two seconds. With rate=30r/m (1 token/2 s) and burst=60, nginx absorbs that burst easily; the leaky bucket refills before the user clicks the next link. The Django threshold (35/m sliding window) instead catches sustained automated traffic from a single IP that looks like scraping, even when it arrives slowly enough to stay under the nginx burst cap.


Request routing — how nginx reaches Django

proxy_pass http://sapl_server forwards the HTTP request — with the original path intact — to the Gunicorn Unix socket. Django doesn't know or care that nginx is in front; it sees a standard HTTP request.

GET /media/foo.pdf
      │
      ▼
   nginx (sapl.conf)
   location /media/ → proxy_pass to Unix socket
      │
      ▼
   Gunicorn (WSGI server)
   receives raw HTTP, calls Django WSGI application
      │
      ▼
   Django middleware stack (settings.MIDDLEWARE)
   RateLimitMiddleware → pass or 429
      │
      ▼
   Django URL router (sapl/urls.py)
   r'^media/(?P<path>.*)$' → serve_media
      │
      ▼
   serve_media(request, path='foo.pdf')
   returns HttpResponse with X-Accel-Redirect: /_accel/media/foo.pdf
      │
      ▼
   nginx sees X-Accel-Redirect header
   /_accel/media/ internal location → reads file from disk → sends to client

nginx does no routing beyond picking a location block. The mapping from URL path to Python function lives entirely in sapl/urls.py. proxy_pass is just a pipe.


Media file serving — serve_media and X-Accel-Redirect

All /media/ requests (public and private) are routed through Gunicorn so that Django middleware runs on every hit. Nginx serves the file bytes via X-Accel-Redirect — the Gunicorn worker is freed as soon as it sends the response headers.

nginx locations (docker/config/nginx/sapl.conf)

# Proxied to Gunicorn — Django middleware + serve_media() run here.
location /media/ {
    limit_req zone=sapl_general burst=${NGINX_BURST_GENERAL} nodelay;
    proxy_pass http://sapl_server;
}

# Internal — only reachable via X-Accel-Redirect, not by external clients.
location /_accel/media/ {
    internal;
    alias /var/interlegis/sapl/media/;
    sendfile on;
    etag on;
}

Django view (sapl/base/media.py)

serve_media(request, path) — registered at ^media/(?P<path>.*)$ in sapl/urls.py.

Per-request steps:

  1. Path traversal guard — os.path.abspath check; raises 404 on escape.
  2. Auth gate — documentos_privados/ paths require an authenticated session; redirects to login otherwise.
  3. Path counter — increments rl:{ns}:path:{sha256}:reqs in Redis DB 1 (TTL = MEDIA_PATH_COUNTER_TTL).
  4. Content-type cache — reads file:{ns}:{sha256} from Django default cache (DB 0); on miss, calls mimetypes.guess_type, stores result (TTL = MEDIA_FILE_CACHE_TTL).
  5. Serve — in DEBUG: django.views.static.serve directly. In production: X-Accel-Redirect: /_accel/media/<path>.
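Steps 1 and 5 can be sketched as plain Python. This is not the real serve_media() — it returns a (status, headers) tuple instead of a Django HttpResponse, and skips the auth gate, counters, and content-type cache:

```python
# Illustrative path-traversal guard and X-Accel-Redirect handoff.
import os

MEDIA_ROOT = "/var/interlegis/sapl/media"

def serve_media_sketch(path: str):
    # Step 1: resolve the requested path and refuse anything that
    # escapes MEDIA_ROOT after normalisation (e.g. '../' segments).
    full = os.path.abspath(os.path.join(MEDIA_ROOT, path))
    if not full.startswith(MEDIA_ROOT + os.sep):
        return 404, {}
    # Step 5: empty body plus the header; nginx's internal
    # /_accel/media/ location then streams the file from disk.
    return 200, {"X-Accel-Redirect": f"/_accel/media/{path}"}

print(serve_media_sketch("foo.pdf"))
# (200, {'X-Accel-Redirect': '/_accel/media/foo.pdf'})
print(serve_media_sketch("../../etc/passwd"))
# (404, {})
```

Because the worker only emits headers, the expensive byte-shuffling stays in nginx, which is the whole point of the X-Accel-Redirect design.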

Settings

Setting                  Default                Purpose
FILE_META_KEY            'file:{ns}:{sha256}'   Key template for content-type cache (DB 0)
MEDIA_PATH_COUNTER_TTL   60 s                   Per-path counter window
MEDIA_FILE_CACHE_TTL     3600 s                 Content-type metadata TTL

Key schema reference

DB  Use case                        Key pattern                  TTL               Constant
0   Page / view cache               cache:{ns}:*                 300 s (default)   CACHES['default'] KEY_PREFIX
0   Static file cache (logos)       static:{ns}:{sha256}         3 – 24 h          Future (requires OpenResty/Lua)
0   Media file content-type cache   file:{ns}:{sha256}           1 h               FILE_META_KEY
1   IP rate-limit counter           rl:ip:{ip}:reqs              60 s              RL_IP_REQUESTS
1   IP blocked marker               rl:ip:{ip}:blocked           300 s             RL_IP_BLOCKED
1   User rate-limit counter         rl:{ns}:user:{uid}:reqs      60 s              RL_USER_REQUESTS
1   User blocked marker             rl:{ns}:user:{uid}:blocked   300 s             RL_USER_BLOCKED
1   Namespace/IP sliding window     rl:{ns}:ip:{ip}:w:{bucket}   120 s             RL_NS_WINDOW
1   Path counter (/media/)          rl:{ns}:path:{sha256}:reqs   60 s              RL_PATH_REQUESTS
1   Path counter (/static/)         rl:{ns}:path:{sha256}:reqs   60 s              Future (requires OpenResty/Lua)
1   UA deny list                    rl:bot:ua:blocked            permanent SET     RL_UA_BLOCKLIST
2   Django Channels                 channels:*                   session TTL       Future