
Phase 5: X-Accel-Redirect for /media/, UA Redis deny list, per-path counters

nginx:
- /media/ proxied through Gunicorn (sapl_general rate limit) instead of
  direct alias — Django middleware now runs on every media request
- /_accel/media/ internal location serves file bytes via X-Accel-Redirect

sapl/base/media.py (new):
- serve_media() gate: path traversal guard, auth redirect for
  documentos_privados/, per-path Redis counter, content-type metadata
  cache, X-Accel-Redirect response; falls back to Django serve() in DEBUG

sapl/middleware/ratelimit.py:
- RL_PATH_REQUESTS, RL_UA_BLOCKLIST, FILE_META_KEY constants
- _incr_with_ttl() extracted to module level (reused by media.py)
- Runtime UA deny list: _refresh_ua_blocklist() fetches rl:bot:ua:blocked
  SET from Redis (SMEMBERS, cached per worker, TTL=RATE_LIMITER_UA_BLOCKLIST_REFRESH);
  _is_redis_blocked_ua() tokenises UA and checks sha256 of each token

sapl/settings.py:
- RATE_LIMITER_UA_BLOCKLIST_REFRESH, MEDIA_PATH_COUNTER_TTL,
  MEDIA_FILE_CACHE_TTL added (all env-tunable via config())

plan/RATE_LIMITER_PLAN.md:
- Key schema table updated; media file serving section added;
  decision flow documented; UA deny list seed section expanded

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Branch: rate-limiter-2026
Author: Edward Ribeiro (3 weeks ago)
Commit: 64c9b241fa
Changed files (lines changed):

  1. CLAUDE.md (145)
  2. docker/config/nginx/sapl.conf (34)
  3. docker/scripts/redis_inject_test_data.py (169)
  4. plan/RATE_LIMITER_PLAN.md (474)
  5. plan/rate-limiter-v2.md (1231)
  6. sapl/base/media.py (96)
  7. sapl/middleware/ratelimit.py (106)
  8. sapl/settings.py (104)
  9. sapl/urls.py (13)

CLAUDE.md (new file, 145 lines)

@@ -0,0 +1,145 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
SAPL (Sistema de Apoio ao Processo Legislativo) is a Django-based legislative management system used by Brazilian municipal and state legislative houses. It manages bills, parliamentary sessions, committees, norms, protocols, and related legislative workflows.
## Commands
### Development
```bash
# Run dev server
python manage.py runserver
# Docker (dev, without bundled DB)
docker-compose -f docker/docker-compose-dev.yml up
# Docker (dev, with PostgreSQL container)
docker-compose -f docker/docker-compose-dev-db.yml up
```
### Database Setup (local PostgreSQL)
```bash
sudo -u postgres psql -c "CREATE ROLE sapl LOGIN ENCRYPTED PASSWORD 'sapl' NOSUPERUSER INHERIT CREATEDB NOCREATEROLE NOREPLICATION;"
sudo -u postgres psql -c "CREATE DATABASE sapl WITH OWNER=sapl ENCODING='UTF8' LC_COLLATE='pt_BR.UTF-8' LC_CTYPE='pt_BR.UTF-8' CONNECTION LIMIT=-1 TEMPLATE template0;"
python manage.py migrate
```
### Testing
```bash
# All tests (reuses DB by default for speed)
pytest
# Single test file or test function
pytest sapl/materia/tests/test_materia.py
pytest sapl/materia/tests/test_materia.py::test_function_name
# Force DB recreation
pytest --create-db
# With coverage
pytest --cov=sapl
```
Tests require `DJANGO_SETTINGS_MODULE=sapl.settings` (set in `pytest.ini`). All tests must be marked with `@pytest.mark.django_db`. The `conftest.py` root fixture provides an `app` fixture (WebTest `DjangoTestApp`).
### Linting / Formatting
```bash
flake8 .
isort .
autopep8 --in-place <file.py>
```
### Restore Database from Backup
```bash
./scripts/restore_db.sh -f /path/to/dump
./scripts/restore_db.sh -f /path/to/dump -p 5433 # Docker port
```
## Architecture
### Django Apps
Apps are under `sapl/` and follow domain boundaries:
| App | Domain |
|-----|--------|
| `base` | `CasaLegislativa` (legislative house config), `AppConfig`, `Autor` (authorship) |
| `parlamentares` | `Parlamentar`, `Legislatura`, `SessaoLegislativa`, `Coligacao` |
| `materia` | Bills (`MateriaLegislativa`), types, tracking, annexes |
| `norma` | Laws/norms (`NormaJuridica`) and hierarchies |
| `sessao` | Plenary sessions, agenda, attendance, voting |
| `comissoes` | Committees (`Comissao`) and meetings (`Reuniao`) |
| `protocoloadm` | Administrative protocols and document intake |
| `compilacao` | Structured/articulated texts (LexML-like tree structure) |
| `lexml` | LexML XML standard integration |
| `audiencia` | Public hearings |
| `painel` | Real-time session display panel |
| `relatorios` | PDF report generation |
| `api` | REST API entry point (auto-generated ViewSets) |
| `crud` | Generic CRUD base views |
| `rules` | Business rules and permission definitions |
### REST API
The API uses a custom `drfautoapi` package (`drfautoapi/drfautoapi.py`) that auto-generates DRF ViewSets, Serializers, and FilterSets from Django models. Authentication is Token + Session. Permissions use a custom `SaplModelPermissions` class that maps HTTP methods to Django model permissions.
OpenAPI 3.0 docs are generated by drf-spectacular.
### Caching
- **Default:** File-based (`/var/tmp/django_cache`)
- **Production:** Redis via django-redis; configured at startup by `configure_redis_cache()` in `sapl/settings.py`
- **Cache key prefix:** `cache:{POD_NAMESPACE}:` (namespace-isolated for multi-tenant k8s)
- **Rate limiter state** is shared via Redis keys
### Feature Flags
django-waffle is used for feature flags. Switches (global on/off) can be toggled via:
```bash
python manage.py waffle_switch <switch_name> on|off
```
### Key Environment Variables
| Variable | Purpose |
|----------|---------|
| `DATABASE_URL` | PostgreSQL connection string |
| `SECRET_KEY` | Django secret key |
| `DEBUG` | Debug mode |
| `REDIS_URL` | Redis host:port |
| `CACHE_BACKEND` | `file` or `redis` |
| `POD_NAMESPACE` | K8s namespace (used in cache key prefix) |
| `USE_SOLR` | Enable Haystack/Solr full-text search |
| `SOLR_URL` / `SOLR_COLLECTION` | Solr connection |
### Docker Build
The production build requires a MaxMind GeoLite2-ASN license key (for nginx ASN-based bot blocking):
```bash
docker build --secret id=maxmind_key,src=.env -f docker/Dockerfile -t sapl:local .
```
Optional build args: `WITH_NGINX`, `WITH_GRAPHVIZ`, `WITH_POPPLER`, `WITH_PSQL_CLIENT`.
### Key File Locations
| File | Purpose |
|------|---------|
| `sapl/settings.py` | All Django settings, including cache/rate-limit setup |
| `pytest.ini` | Test configuration (DJANGO_SETTINGS_MODULE, addopts) |
| `conftest.py` | Root pytest fixtures |
| `drfautoapi/drfautoapi.py` | Auto-API generation logic |
| `docker/startup_scripts/start.sh` | Container entrypoint (migrations, waffle, gunicorn) |
| `requirements/requirements.txt` | Production deps |
| `requirements/test-requirements.txt` | Test deps |
| `requirements/dev-requirements.txt` | Dev/lint deps |

docker/config/nginx/sapl.conf (34 lines changed)

@@ -45,21 +45,28 @@ server {
     # ----------------------------------------------------------------
-    # Media files — FIX: add ETags and Cache-Control headers.
-    # sendfile on + etag on converts repeat bot requests to 304s.
+    # Media files — routed through Django for auth, rate counting,
+    # and content-type caching; served from disk via X-Accel-Redirect.
     # ----------------------------------------------------------------
     location /media/ {
-        alias /var/interlegis/sapl/media/;
-        sendfile on;
-        etag on;
-        add_header Cache-Control "public, max-age=86400, stale-while-revalidate=3600";
-        add_header X-Robots-Tag "noindex" always;
+        limit_req zone=sapl_general burst=${NGINX_BURST_GENERAL} nodelay;
+        limit_req_status 429;
+
+        proxy_set_header X-Request-ID $req_id;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+        proxy_set_header Host $http_host;
+        proxy_redirect off;
+        proxy_pass http://sapl_server;
     }

-    # Private documents — X-Accel-Redirect after auth check in Django.
-    location /media/documentos_privados/ {
+    # Internal location used exclusively by X-Accel-Redirect responses
+    # from serve_media(). Not reachable by external clients.
+    location /_accel/media/ {
         internal;
-        alias /var/interlegis/sapl/media/documentos_privados/;
+        alias /var/interlegis/sapl/media/;
         sendfile on;
         etag on;
     }

     # ----------------------------------------------------------------
@@ -67,7 +74,7 @@ server {
     # Tighter rate limit; extended timeout for uncached generation.
     # ----------------------------------------------------------------
     location /relatorios/ {
-        limit_req zone=sapl_heavy burst=5 nodelay;
+        limit_req zone=sapl_heavy burst=${NGINX_BURST_HEAVY} nodelay;
         limit_req_status 429;
         proxy_read_timeout 180s;
@@ -102,7 +109,7 @@ server {
     # /api/ — rate limited, CORS maintained from original config.
     # ----------------------------------------------------------------
     location /api/ {
-        limit_req zone=sapl_general burst=30 nodelay;
+        limit_req zone=sapl_general burst=${NGINX_BURST_API} nodelay;
         limit_req_status 429;
         add_header 'Access-Control-Allow-Origin' '*';
@@ -134,7 +141,7 @@ server {
     # General traffic — moderate rate limit.
     # ----------------------------------------------------------------
     location / {
-        limit_req zone=sapl_general burst=20 nodelay;
+        limit_req zone=sapl_general burst=${NGINX_BURST_GENERAL} nodelay;
         limit_req_status 429;
         proxy_set_header X-Request-ID $req_id;
@@ -147,6 +154,7 @@ server {
     error_page 429 /429.html;
     location = /429.html {
+        add_header Retry-After 60 always;
         root /var/interlegis/sapl/sapl/static/;
         internal;
     }

docker/scripts/redis_inject_test_data.py (new file, 169 lines)

@@ -0,0 +1,169 @@
#!/usr/bin/env python3
"""
redis_inject_test_data.py: inject synthetic rate-limiter entries into Redis.

Purpose: validate that RateLimitMiddleware reads the expected key schema,
that Redis CLI / RedisInsight shows the right structure, and that blocking
logic fires correctly without waiting for real traffic.

Usage:
    # Against docker-compose Redis (default)
    python3 docker/scripts/redis_inject_test_data.py

    # Against a different host/port
    REDIS_URL=redis://localhost:6379 python3 docker/scripts/redis_inject_test_data.py

    # Clear all synthetic keys written by a previous run
    CLEAR=1 python3 docker/scripts/redis_inject_test_data.py

Key schema (DB 1, rate limiter):
    rl:ip:{ip}:reqs             INCR counter  anonymous request count (TTL 60s)
    rl:ip:{ip}:blocked          string "1"    IP hard-blocked (TTL 300s)
    rl:{ns}:user:{uid}:reqs     INCR counter  auth user request count (TTL 60s)
    rl:{ns}:user:{uid}:blocked  string "1"    user hard-blocked (TTL 300s)
    rl:{ns}:ip:{ip}:w:{bucket}  INCR          namespace/IP sliding window (TTL 120s)
"""
import os
import sys
import time

from decouple import config

# ── dependency check ──────────────────────────────────────────────────────
try:
    import redis
except ImportError:
    print("ERROR: redis-py not installed. Run: pip install redis", file=sys.stderr)
    sys.exit(1)

# ── config ────────────────────────────────────────────────────────────────
REDIS_URL = config("REDIS_URL", default="redis://localhost:6379")
RATELIMIT_DB = 1  # DB 1 is the rate-limiter database
CLEAR = config("CLEAR", default="0").lower() in ("1", "true", "yes")

# Synthetic values — tweak to exercise different code paths
NAMESPACE = "sapl"   # POD_NAMESPACE value (hostname or k8s namespace)
ANON_WINDOW = 60     # seconds — must match settings.RATE_LIMITER_RATE period
AUTH_WINDOW = 60
BLOCK_TTL = 300

TEST_IPS = [
    "203.0.113.1",  # below threshold (20 reqs)
    "203.0.113.2",  # AT threshold (35 reqs — should trigger block)
    "203.0.113.3",  # already blocked
    "203.0.113.4",  # namespace/window counter near threshold
]

TEST_USERS = [
    {"uid": "42", "reqs": 50, "blocked": False},   # normal auth user
    {"uid": "99", "reqs": 120, "blocked": False},  # AT auth threshold
    {"uid": "7", "reqs": 10, "blocked": True},     # pre-blocked user
]


# ── helpers ───────────────────────────────────────────────────────────────
def key_ip_reqs(ip):
    return f"rl:ip:{ip}:reqs"


def key_ip_blocked(ip):
    return f"rl:ip:{ip}:blocked"


def key_user_reqs(ns, uid):
    return f"rl:{ns}:user:{uid}:reqs"


def key_user_blocked(ns, uid):
    return f"rl:{ns}:user:{uid}:blocked"


def key_ns_window(ns, ip, bucket):
    return f"rl:{ns}:ip:{ip}:w:{bucket}"


def write(r, key, value, ttl, label):
    r.set(key, value, ex=ttl)
    print(f"  SET {key!r} = {value!r} EX {ttl}s ({label})")


def delete_pattern(r, pattern):
    keys = r.keys(pattern)
    if keys:
        r.delete(*keys)
        print(f"  DEL {len(keys)} keys matching {pattern!r}")
    else:
        print(f"  (no keys matching {pattern!r})")


# ── main ──────────────────────────────────────────────────────────────────
def main():
    r = redis.from_url(REDIS_URL, db=RATELIMIT_DB, decode_responses=True)
    try:
        r.ping()
    except redis.ConnectionError as exc:
        print(f"ERROR: cannot connect to Redis at {REDIS_URL}: {exc}", file=sys.stderr)
        sys.exit(1)

    print(f"Redis: {REDIS_URL}  DB={RATELIMIT_DB}  clear={CLEAR}")
    print()

    # ── clear mode ────────────────────────────────────────────────────────
    if CLEAR:
        print("=== Clearing synthetic test keys ===")
        for ip in TEST_IPS:
            delete_pattern(r, f"rl:ip:{ip}:*")
            delete_pattern(r, f"rl:{NAMESPACE}:ip:{ip}:*")
        for u in TEST_USERS:
            delete_pattern(r, f"rl:{NAMESPACE}:user:{u['uid']}:*")
        print("Done.")
        return

    # ── anonymous IP counters ─────────────────────────────────────────────
    print("=== Anonymous IP request counters (DB1) ===")
    write(r, key_ip_reqs(TEST_IPS[0]), 20, ANON_WINDOW, "below threshold")
    write(r, key_ip_reqs(TEST_IPS[1]), 35, ANON_WINDOW,
          "AT threshold → middleware will block on next req")
    write(r, key_ip_reqs(TEST_IPS[3]), 30, ANON_WINDOW, "below threshold")
    print()

    # ── blocked IPs ───────────────────────────────────────────────────────
    print("=== Blocked IPs (DB1) ===")
    write(r, key_ip_blocked(TEST_IPS[2]), "1", BLOCK_TTL, "hard-blocked")
    print()

    # ── namespace/IP sliding window ───────────────────────────────────────
    print("=== Namespace/IP sliding window (DB1) ===")
    bucket = int(time.time() // ANON_WINDOW)
    write(r, key_ns_window(NAMESPACE, TEST_IPS[3], bucket), 34, ANON_WINDOW * 2,
          "near window threshold (next req triggers ua_rotation block)")
    print()

    # ── authenticated user counters ───────────────────────────────────────
    print("=== Authenticated user request counters (DB1) ===")
    for u in TEST_USERS:
        if not u["blocked"]:
            write(r, key_user_reqs(NAMESPACE, u["uid"]), u["reqs"], AUTH_WINDOW,
                  f"uid={u['uid']} reqs={u['reqs']}")
    print()

    # ── blocked users ─────────────────────────────────────────────────────
    print("=== Blocked users (DB1) ===")
    for u in TEST_USERS:
        if u["blocked"]:
            write(r, key_user_blocked(NAMESPACE, u["uid"]), "1", BLOCK_TTL,
                  f"uid={u['uid']} hard-blocked")
    print()

    # ── summary ───────────────────────────────────────────────────────────
    all_keys = r.keys("rl:*")
    print(f"=== DB{RATELIMIT_DB} now contains {len(all_keys)} rl:* keys ===")
    for k in sorted(all_keys):
        ttl = r.ttl(k)
        val = r.get(k)
        print(f"  {k!r:55s} val={val!r:5} ttl={ttl}s")


if __name__ == "__main__":
    main()

plan/RATE_LIMITER_PLAN.md (new file, 474 lines)

@@ -0,0 +1,474 @@
# SAPL — Kubernetes Redis
Manifests for the shared Redis instance used by all SAPL pods for
cross-pod rate limiting (DB 1) and view/static-file caching (DB 0).
---
## Directory layout
```
docker/k8s/
└── redis/
├── redis-configmap.yaml # redis.conf — no persistence, allkeys-lru, 5 GB ceiling
├── redis-deployment.yaml # Deployment (1 replica, redis:7-alpine)
└── redis-service.yaml # ClusterIP service on port 6379
```
---
## Prerequisites
- `kubectl` configured to talk to the target cluster.
- A `sapl-redis` namespace (created below if it doesn't exist).
---
## Deploy
```bash
# 1. Create the namespace (idempotent)
rancher kubectl create namespace sapl-redis --dry-run=client -o yaml | rancher kubectl apply -f -
# 2. Apply all three manifests
rancher kubectl apply -f docker/k8s/redis/redis-configmap.yaml
rancher kubectl apply -f docker/k8s/redis/redis-deployment.yaml
rancher kubectl apply -f docker/k8s/redis/redis-service.yaml
# 3. Verify the pod is Running
rancher kubectl -n sapl-redis get pods -l app=sapl-redis
```
Expected output:
```
NAME READY STATUS RESTARTS AGE
sapl-redis-6d9f8b7c4d-xk2lm 1/1 Running 0 30s
```
---
## Verify the rate limiter
`scripts/test_ratelimiter.py` fires repeated GET requests at a SAPL URL and reports
when the first 429 is returned.
### Usage
```
python scripts/test_ratelimiter.py <URL> [-n NUM] [-d DELAY] [-t TIMEOUT]
```
| Flag | Default | Meaning |
|------|---------|---------|
| `url` | *(required)* | Full URL including scheme, e.g. `http://localhost` |
| `-n`, `--num-requests` | `50` | Maximum requests to send |
| `-d`, `--delay` | `0.1` | Seconds between requests |
| `-t`, `--timeout` | `10` | Per-request timeout in seconds |
The script stops and prints a summary as soon as a 429 is received.
### Examples
```bash
# Hit the anonymous threshold (35 req/min) — fire 40 requests with minimal delay
python scripts/test_ratelimiter.py http://localhost -n 40 -d 0.05
# Slower fire — check that legitimate traffic is not rate-limited
python scripts/test_ratelimiter.py http://localhost -n 20 -d 2
# Test against a staging pod via port-forward
rancher kubectl port-forward -n <NAMESPACE> deploy/sapl 8080:80 &
python scripts/test_ratelimiter.py http://localhost:8080 -n 40 -d 0.05
```
### Reading the output
```
Request 1: Status 200 | Time: 0.045s
...
Request 36: Status 429 | Time: 0.038s
-> Rate limited on request 36
Summary:
Total requests attempted: 36
Successful (200): 35
Rate limited (429): 1
First 429 occurred at request: 36
```
A first-429 near the configured anonymous threshold (35 req/min) confirms the
middleware is wired correctly. A first-429 much earlier points to nginx `limit_req`
firing before Django sees the request.
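The probe loop described above can be sketched as follows (a hypothetical simplification; the real `scripts/test_ratelimiter.py` adds argument parsing, timing output, and actual HTTP calls):

```python
import time
from typing import Callable, Optional


def probe(fetch: Callable[[], int], num: int = 50, delay: float = 0.0) -> dict:
    """Fire up to `num` requests; stop at the first 429 and report its index."""
    statuses = []
    for i in range(1, num + 1):
        status = fetch()
        statuses.append(status)
        if status == 429:
            return {'attempted': i, 'ok': statuses.count(200), 'first_429': i}
        time.sleep(delay)
    return {'attempted': num, 'ok': statuses.count(200), 'first_429': None}


# Simulated server that rate-limits after 35 successful requests
counter = {'n': 0}

def fake_fetch() -> int:
    counter['n'] += 1
    return 200 if counter['n'] <= 35 else 429

probe(fake_fetch, num=40)  # → {'attempted': 36, 'ok': 35, 'first_429': 36}
```

Swapping `fake_fetch` for a real request function (e.g. one built on `urllib.request`) reproduces the summary shown above.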
---
## Inject REDIS_URL into SAPL instances
`REDIS_URL` points at the shared instance:
```
redis://redis.sapl-redis.svc.cluster.local:6379
^^^^^ ^^^^^^^^^^
svc namespace
```
`start.sh` picks it up on every pod startup and sets the `REDIS_CACHE` waffle switch
automatically — no further intervention needed.
### Fleet-wide rollout
Uses the `app.kubernetes.io/name=sapl` pod label to discover every SAPL namespace
automatically — onboarding a new municipality requires no script changes.
```bash
for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
-o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
rancher kubectl set env deployment/sapl \
REDIS_URL=redis://redis.sapl-redis.svc.cluster.local:6379 \
-n $ns
done
```
### Roll back
```bash
for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
-o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
rancher kubectl set env deployment/sapl REDIS_URL- -n $ns
done
```
`kubectl set env deployment/sapl REDIS_URL-` (trailing `-`) removes the variable.
`start.sh` then falls back to file-based cache automatically.
---
## Monitor
### Pod and events
```bash
# Pod status
rancher kubectl -n sapl-redis get pods -l app=sapl-redis -o wide
# Deployment events (useful right after apply)
rancher kubectl -n sapl-redis describe deployment sapl-redis
# Pod events (OOMKill, restarts, etc.)
rancher kubectl -n sapl-redis describe pod -l app=sapl-redis
```
### Logs
```bash
# Tail live logs
rancher kubectl -n sapl-redis logs -f deploy/sapl-redis
# Last 100 lines
rancher kubectl -n sapl-redis logs deploy/sapl-redis --tail=100
```
### Redis INFO
```bash
# Memory usage
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
redis-cli info memory \
| grep -E 'used_memory_human|maxmemory_human|mem_fragmentation_ratio'
# Connection pressure
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
redis-cli info stats \
| grep -E 'rejected_connections|instantaneous_ops_per_sec'
# Key distribution per DB
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli info keyspace
# Recent slow queries
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli slowlog get 10
# Live command sampling (1-second window)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli --latency-history -i 1
```
### Rate-limiter keys (DB 1)
```bash
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
redis-cli -n 1 dbsize
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
redis-cli -n 1 --scan --pattern 'rl:ip:*' | head -20
```
---
## Seed the UA deny list (once after first deploy)
`rl:bot:ua:blocked` is a permanent Redis SET in DB 1. Each member is the
SHA-256 of a **UA token** — the identifying fragment extracted after splitting
on `/`, spaces, `;`, `(`, `)`, e.g.:
```
UA string: "GPTBot/1.1 (+https://openai.com/gptbot)"
Tokens: GPTBot 1.1 +https: ...
Hash stored: sha256("GPTBot")
```
The middleware (`_is_redis_blocked_ua`) tokenises the incoming UA the same
way and checks each token hash against the cached set. The SET is fetched
from Redis at most once per `RATE_LIMITER_UA_BLOCKLIST_REFRESH` seconds (default 60)
per worker process.
The bots in `BOT_UA_FRAGMENTS` (Python list, always active) and this Redis
SET are **independent** — the Python list provides the baseline and the Redis
SET allows adding new offenders at runtime **without a code deploy**.
```bash
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
SADD rl:bot:ua:blocked \
"$(echo -n 'GPTBot' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'ClaudeBot' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'PerplexityBot' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'Bytespider' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'AhrefsBot' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'meta-externalagent' | sha256sum | cut -d' ' -f1)"
# Add a new offender at runtime (picked up within RATE_LIMITER_UA_BLOCKLIST_REFRESH seconds)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
SADD rl:bot:ua:blocked "$(echo -n 'NewBot' | sha256sum | cut -d' ' -f1)"
```
---
## Local standalone Redis (development / testing)
No Kubernetes? Run Redis directly with Docker:
```bash
sudo docker run --rm -p 6379:6379 redis:7-alpine \
redis-server --save "" --appendonly no
```
Then point Django at it by exporting the env var before starting the dev server:
```bash
export REDIS_URL="redis://localhost:6379"
export CACHE_BACKEND="redis"
python manage.py runserver
```
Or add them to your local `.env` file:
```
REDIS_URL=redis://localhost:6379
CACHE_BACKEND=redis
```
> **Note**: the waffle switch `REDIS_CACHE` must also be `on` in your local
> database for `start.sh` to activate the Redis backend. Run:
> ```bash
> python manage.py waffle_switch REDIS_CACHE on --create
> ```
---
## Update `redis.conf` without redeploying
```bash
# Edit the ConfigMap
rancher kubectl -n sapl-redis edit configmap redis-config
# Restart the pod to pick up the new config
rancher kubectl -n sapl-redis rollout restart deployment/sapl-redis
```
---
## Rate limiting — two layers, two jobs
SAPL enforces rate limits at two independent layers. They use different
algorithms and protect different things; their thresholds must be tuned
separately.
### Layer 1 — nginx `limit_req` (leaky bucket)
Defined in `docker/config/nginx/nginx.conf` (zones) and `sapl.conf` (burst).
```
sapl_general rate=30r/m # 1 token every 2 s
sapl_heavy rate=10r/m # 1 token every 6 s (PDF/report endpoints)
```
`burst=N nodelay` means nginx accepts up to N requests instantly above the
current token level, then enforces the drip rate. Requests beyond the burst
cap return 429 before reaching Gunicorn — **zero Python cost**.
Burst values are set at container startup via env vars:
| Env var | Default | Location |
|---------|---------|----------|
| `NGINX_BURST_GENERAL` | `60` | `location /`, `location /media/` |
| `NGINX_BURST_API` | `60` | `location /api/` |
| `NGINX_BURST_HEAVY` | `20` | `location /relatorios/` |
Defaults are 2× the zone's per-minute rate, so a user can spend a full
minute's quota in a single burst before the leaky bucket takes over.
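The burst behaviour can be modelled in a few lines (a simplified sketch; nginx's actual accounting works in millisecond excess units, so edge cases may differ slightly):

```python
def limit_req(timestamps, rate_per_min=30, burst=60):
    """nginx-style leaky bucket with nodelay: reject once excess exceeds burst."""
    rate = rate_per_min / 60.0      # tokens drained per second
    excess, last, verdicts = 0.0, None, []
    for t in timestamps:
        if last is not None:
            excess = max(excess - rate * (t - last), 0.0)  # bucket drains over time
        if excess + 1 > burst:
            verdicts.append(429)    # over the burst cap: rejected immediately
        else:
            excess += 1
            verdicts.append(200)
        last = t
    return verdicts


# 61 instantaneous requests: first 60 absorbed by the burst, request 61 rejected
limit_req([0.0] * 61)
```

At `rate=30r/m` a request every 2 seconds drains exactly one token per gap, so evenly spaced traffic never accumulates excess and is never rejected.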
### Layer 2 — Django `RateLimitMiddleware` (sliding window)
Defined in `sapl/middleware/ratelimit.py`, backed by Redis DB 1.
Requests that pass nginx reach Python. The middleware counts them in a
60-second sliding window per IP (anonymous) or per user (authenticated):
| Env var | Default | Scope |
|---------|---------|-------|
| `RATE_LIMITER_RATE` | `35/m` | Anonymous IP |
| `RATE_LIMITER_RATE_AUTHENTICATED` | `120/m` | Authenticated user |
| `RATE_LIMITER_RATE_BOT` | `5/m` | *(reserved — bots are currently blocked outright, not counted)* |
| `RATE_LIMITER_UA_BLOCKLIST_REFRESH` | `60` s | How often each worker re-fetches `rl:bot:ua:blocked` from Redis |
When the window count hits the threshold the IP/user is written to a Redis
blocked-set with a 300 s TTL and subsequent requests return 429 with
`Retry-After: 300` — without touching the database.
Decision flow inside `RateLimitMiddleware._evaluate()`:
```
1. IP in whitelist? → pass (no further checks)
1a. UA matches BOT_UA_FRAGMENTS list? → 429 reason=known_ua
1b. UA token hash in rl:bot:ua:blocked SET? → 429 reason=redis_ua
2. IP in rl:ip:{ip}:blocked? → 429 reason=ip_blocked
3. Authenticated user?
3a. User in rl:{ns}:user:{uid}:blocked? → 429 reason=user_blocked
3b. Suspicious headers (no Accept/AL)? → 429 reason=suspicious_headers_auth
3c. User request count ≥ auth threshold? → SET blocked, 429 reason=auth_user_rate
4. Anonymous:
4a. Suspicious headers? → 429 reason=suspicious_headers
4b. IP request count ≥ anon threshold? → SET blocked, 429 reason=ip_rate
4c. NS/IP window count ≥ anon threshold? → SET blocked, 429 reason=ua_rotation
→ pass
```
### Why they are not the same number
| | nginx burst | Django threshold |
|-|------------|-----------------|
| **Algorithm** | Leaky bucket — token refills over time | Sliding window — hard count per 60 s |
| **Protects** | Gunicorn workers from being flooded | Per-client fairness, business policy |
| **Tuned by** | Capacity of the server | Acceptable request volume per client |
| **Failure mode** | Workers overwhelmed | Legitimate user over-browsing |
A user loading a page quickly may fire 5–10 Django requests in two seconds.
With `rate=30r/m` (1 token/2 s) and `burst=60` they absorb that fine; the
leaky bucket refills before they click the next link. The Django threshold
(35/m sliding window) catches sustained automated traffic from a single IP
that looks like scraping even if it arrives slowly enough to beat the nginx
burst cap.
---
## Request routing — how nginx reaches Django
`proxy_pass http://sapl_server` forwards the HTTP request — with the original
path intact — to the Gunicorn Unix socket. Django doesn't know or care that
nginx is in front; it sees a standard HTTP request.
```
GET /media/foo.pdf
nginx (sapl.conf)
location /media/ → proxy_pass to Unix socket
Gunicorn (WSGI server)
receives raw HTTP, calls Django WSGI application
Django middleware stack (settings.MIDDLEWARE)
RateLimitMiddleware → pass or 429
Django URL router (sapl/urls.py)
r'^media/(?P<path>.*)$' → serve_media
serve_media(request, path='foo.pdf')
returns HttpResponse with X-Accel-Redirect: /_accel/media/foo.pdf
nginx sees X-Accel-Redirect header
/_accel/media/ internal location → reads file from disk → sends to client
```
nginx does no routing beyond picking a `location` block. The mapping from
URL path to Python function lives entirely in `sapl/urls.py`. `proxy_pass` is
just a pipe.
---
## Media file serving — `serve_media` and X-Accel-Redirect
All `/media/` requests (public and private) are routed through Gunicorn so that
Django middleware runs on every hit. Nginx serves the file bytes via
`X-Accel-Redirect` — the Gunicorn worker is freed as soon as it sends the
response headers.
### nginx locations (`docker/config/nginx/sapl.conf`)
```nginx
# Proxied to Gunicorn — Django middleware + serve_media() run here.
location /media/ {
limit_req zone=sapl_general burst=${NGINX_BURST_GENERAL} nodelay;
proxy_pass http://sapl_server;
}
# Internal — only reachable via X-Accel-Redirect, not by external clients.
location /_accel/media/ {
internal;
alias /var/interlegis/sapl/media/;
sendfile on;
etag on;
}
```
### Django view (`sapl/base/media.py`)
`serve_media(request, path)` — registered at `^media/(?P<path>.*)$` in `sapl/urls.py`.
Per-request steps:
1. **Path traversal guard** — `os.path.abspath` check; raises `Http404` on escape.
2. **Auth gate** — `documentos_privados/` paths require an authenticated session; redirects to login otherwise.
3. **Path counter** — increments `rl:{ns}:path:{sha256}:reqs` in Redis DB 1 (TTL = `MEDIA_PATH_COUNTER_TTL`).
4. **Content-type cache** — reads `file:{ns}:{sha256}` from Django default cache (DB 0); on miss, calls `mimetypes.guess_type`, stores result (TTL = `MEDIA_FILE_CACHE_TTL`).
5. **Serve** — in DEBUG: `django.views.static.serve` directly. In production: `X-Accel-Redirect: /_accel/media/<path>`.
### Settings
| Setting | Default | Purpose |
|---------|---------|---------|
| `FILE_META_KEY` | `'file:{ns}:{sha256}'` | Key template for content-type cache (DB 0) |
| `MEDIA_PATH_COUNTER_TTL` | `60` s | Per-path counter window |
| `MEDIA_FILE_CACHE_TTL` | `3600` s | Content-type metadata TTL |
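Both keys derive from the same URL-path digest, so a hot file's counter and its content-type entry share one hash; a sketch of the construction (namespace value illustrative):

```python
import hashlib

# Key templates from the schema table above
RL_PATH_REQUESTS = 'rl:{ns}:path:{sha256}:reqs'  # Redis DB 1, per-path counter
FILE_META_KEY = 'file:{ns}:{sha256}'             # Redis DB 0, content-type cache


def media_keys(ns: str, url_path: str) -> tuple:
    """Hash the URL path (not the filesystem path) so a future change of
    storage layout neither resets counters nor invalidates the cache."""
    digest = hashlib.sha256(url_path.encode()).hexdigest()
    return (RL_PATH_REQUESTS.format(ns=ns, sha256=digest),
            FILE_META_KEY.format(ns=ns, sha256=digest))


counter_key, meta_key = media_keys('sapl', '/media/2024/01/doc.pdf')
```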
---
## Key schema reference
| DB | Use case | Key pattern | TTL | Constant |
|----|----------|-------------|-----|----------|
| 0 | Page / view cache | `cache:{ns}:*` | 300 s (default) | `CACHES['default']` KEY_PREFIX |
| 0 | Static file cache (logos) | `static:{ns}:{sha256}` | 3 – 24 h | *Future* (requires OpenResty/Lua) |
| 0 | Media file content-type cache | `file:{ns}:{sha256}` | 1 h | `FILE_META_KEY` |
| 1 | IP rate-limit counter | `rl:ip:{ip}:reqs` | 60 s | `RL_IP_REQUESTS` |
| 1 | IP blocked marker | `rl:ip:{ip}:blocked` | 300 s | `RL_IP_BLOCKED` |
| 1 | User rate-limit counter | `rl:{ns}:user:{uid}:reqs` | 60 s | `RL_USER_REQUESTS` |
| 1 | User blocked marker | `rl:{ns}:user:{uid}:blocked` | 300 s | `RL_USER_BLOCKED` |
| 1 | Namespace/IP sliding window | `rl:{ns}:ip:{ip}:w:{bucket}` | 120 s | `RL_NS_WINDOW` |
| 1 | Path counter (`/media/`) | `rl:{ns}:path:{sha256}:reqs` | 60 s | `RL_PATH_REQUESTS` |
| 1 | Path counter (`/static/`) | `rl:{ns}:path:{sha256}:reqs` | 60 s | *Future* (requires OpenResty/Lua) |
| 1 | UA deny list | `rl:bot:ua:blocked` | permanent SET | `RL_UA_BLOCKLIST` |
| 2 | Django Channels | `channels:*` | session TTL | *Future* |

plan/rate-limiter-v2.md (1231 lines)

File diff suppressed because it is too large

sapl/base/media.py (new file, 96 lines)

@@ -0,0 +1,96 @@
"""
serve_media: X-Accel-Redirect gate for all /media/ files.

Production flow (nginx proxies /media/ to Gunicorn):
  1. Django middleware runs (IP rate-limit, bot UA check, etc.).
  2. serve_media() runs the auth check for documentos_privados/, writes a
     per-path counter to Redis DB 1, and caches the content type in Redis DB 0.
  3. Returns an empty 200 with X-Accel-Redirect pointing to the nginx
     internal location /_accel/media/<path>. Nginx serves the bytes
     directly from disk; the Gunicorn worker is freed immediately.

Development flow (DEBUG=True, nginx absent):
  Falls back to django.views.static.serve for live file serving.

Redis side-effects per request:
  DB 1  rl:{ns}:path:{sha256}:reqs  per-path access counter, TTL=MEDIA_PATH_COUNTER_TTL
  DB 0  file:{ns}:{sha256}          content-type metadata, TTL=MEDIA_FILE_CACHE_TTL
  (sha256 is of the URL path, e.g. sha256('/media/2024/01/doc.pdf'))

Key template: FILE_META_KEY (sapl/middleware/ratelimit.py); TTLs in sapl/settings.py.
"""
import hashlib
import mimetypes
import os

from django.conf import settings
from django.core.cache import caches
from django.http import Http404, HttpResponse
from django.views.static import serve

from sapl import settings as sapl_settings
from sapl.middleware.ratelimit import (
    _NAMESPACE,
    FILE_META_KEY,
    RL_PATH_REQUESTS,
    _incr_with_ttl,
)


def _safe_resolve(rel_path):
    """
    Return the absolute path of rel_path inside MEDIA_ROOT.

    Raises Http404 if the resolved path would escape the root
    (path traversal guard).
    """
    abs_root = os.path.abspath(settings.MEDIA_ROOT)
    abs_path = os.path.abspath(os.path.join(abs_root, rel_path))
    if not abs_path.startswith(abs_root + os.sep) and abs_path != abs_root:
        raise Http404
    return abs_path


def serve_media(request, path):
    """
    Registered in sapl/urls.py for both DEBUG and production.
    Route: ^media/(?P<path>.*)$
    """
    # Path traversal guard — raises Http404 on escape attempt.
    abs_path = _safe_resolve(path)

    # Auth gate for private documents — redirect to login if anonymous.
    if path.startswith('documentos_privados/'):
        user = getattr(request, 'user', None)
        if user is None or not user.is_authenticated:
            from django.contrib.auth.views import redirect_to_login
            return redirect_to_login(request.get_full_path())

    # Per-path rate counter (DB 1) — key uses the URL path so that storage
    # location changes in the next PR don't reset existing counters.
    path_hash = hashlib.sha256(f'/media/{path}'.encode()).hexdigest()
    _incr_with_ttl(
        RL_PATH_REQUESTS.format(ns=_NAMESPACE, sha256=path_hash),
        ttl=sapl_settings.MEDIA_PATH_COUNTER_TTL,
    )

    # Content-type metadata cache (DB 0) — avoids mimetypes.guess_type
    # and os.path.isfile on every hit for hot files.
    file_key = FILE_META_KEY.format(ns=_NAMESPACE, sha256=path_hash)
    content_type = caches['default'].get(file_key)
    if content_type is None:
        if not os.path.isfile(abs_path):
            raise Http404
        guessed, _ = mimetypes.guess_type(abs_path)
        content_type = guessed or 'application/octet-stream'
        caches['default'].set(file_key, content_type,
                              timeout=sapl_settings.MEDIA_FILE_CACHE_TTL)

    if settings.DEBUG:
        # Development: no nginx present; serve the file directly.
return serve(request, path, document_root=settings.MEDIA_ROOT)
# Production: tell nginx to serve the file from the internal location.
response = HttpResponse(content_type=content_type)
response['X-Accel-Redirect'] = f'/_accel/media/{path}'
response['Cache-Control'] = 'public, max-age=86400, stale-while-revalidate=3600'
response['X-Robots-Tag'] = 'noindex'
return response
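The containment check in `_safe_resolve()` can be exercised outside Django; a minimal sketch that raises ValueError instead of Http404 (the media root here is illustrative):

```python
import os

def safe_resolve(rel_path, root='/var/interlegis/sapl/media'):
    # Same containment check as _safe_resolve(): the joined, normalised
    # path must stay under the media root.
    abs_root = os.path.abspath(root)
    abs_path = os.path.abspath(os.path.join(abs_root, rel_path))
    if not abs_path.startswith(abs_root + os.sep) and abs_path != abs_root:
        raise ValueError('path escapes MEDIA_ROOT')
    return abs_path

safe_resolve('2024/01/doc.pdf')      # resolves inside the root
try:
    safe_resolve('../../etc/passwd')
except ValueError:
    pass                             # traversal attempt rejected
```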

106
sapl/middleware/ratelimit.py

@@ -2,7 +2,8 @@
 RateLimitMiddleware cross-pod rate limiting backed by shared Redis.
 Decision flow (per request):
-1. Known bot UA? 429
+1. Known bot UA? 429 (Python list substring match)
+1b. Redis UA deny list? 429 (runtime SET token hash match, refreshed every 60 s)
 2. IP in blocked set? 429
 3. Authenticated user?
    a. User blocked? 429
@@ -13,7 +14,6 @@ Decision flow (per request):
    b. IP rate 35/min? SET RL_IP_BLOCKED, 429
    c. NS/IP window hit? SET RL_IP_BLOCKED, 429
-All decisions are no-ops when RATELIMIT_DRY_RUN=True (logged only).
 Degrades gracefully to non-atomic counting when Redis is unavailable.
 _NAMESPACE is settings.POD_NAMESPACE, resolved once at startup:
@@ -27,9 +27,10 @@ no per-request lookup is needed or correct.
 import hashlib
 import logging
+import re
 import time
-from django.conf import settings
+from sapl import settings
 from django.core.cache import caches
 from django.http import HttpResponse
@@ -52,6 +53,9 @@ RL_IP_BLOCKED = 'rl:ip:{ip}:blocked'
 RL_USER_REQUESTS = 'rl:{ns}:user:{uid}:reqs'
 RL_USER_BLOCKED = 'rl:{ns}:user:{uid}:blocked'
 RL_NS_WINDOW = 'rl:{ns}:ip:{ip}:w:{bucket}'
+RL_PATH_REQUESTS = 'rl:{ns}:path:{sha256}:reqs'
+RL_UA_BLOCKLIST = 'rl:bot:ua:blocked'  # permanent SET — runtime UA deny list
+FILE_META_KEY = 'file:{ns}:{sha256}'   # content-type metadata cache (DB 0)
 # ---------------------------------------------------------------------------
 # Bot UA fragments
@@ -169,31 +173,67 @@ def _parse_rate(rate_str):
     return count, seconds
+def _incr_with_ttl(key, ttl):
+    """
+    Atomic INCR + EXPIRE via Redis Lua script (ratelimit cache, DB 1).
+    Falls back to non-atomic cache get/set when Redis is unavailable.
+    Exported at module level so sapl.base.media can reuse it for path counters.
+    """
+    try:
+        from django_redis import get_redis_connection
+        client = get_redis_connection('ratelimit')
+        return client.eval(_INCR_LUA, 1, key, ttl)
+    except Exception:
+        rl_cache = caches['ratelimit']
+        count = (rl_cache.get(key) or 0) + 1
+        rl_cache.set(key, count, timeout=ttl)
+        return count
 class RateLimitMiddleware:
     BLOCK_TTL = 300  # seconds an IP/user stays blocked after threshold breach
+    # In-process cache for the Redis UA deny list.
+    # Shared across all instances in the same worker process (one per worker).
+    # Refreshed every RATE_LIMITER_UA_BLOCKLIST_REFRESH seconds via SMEMBERS.
+    _ua_blocklist: set = set()
+    _ua_blocklist_fetched_at: float = 0.0
     def __init__(self, get_response):
         self.get_response = get_response
-        self.dry_run = settings.RATELIMIT_DRY_RUN
         self.anon_threshold, self.anon_window = _parse_rate(settings.RATE_LIMITER_RATE)
         self.auth_threshold, self.auth_window = _parse_rate(settings.RATE_LIMITER_RATE_AUTHENTICATED)
         self.whitelist = set(settings.RATE_LIMIT_WHITELIST_IPS)
         self._rl_cache = caches['ratelimit']
+        logger.info(
+            '[RATELIMIT] anon=%s auth=%s bot=%s whitelist=%s',
+            settings.RATE_LIMITER_RATE,
+            settings.RATE_LIMITER_RATE_AUTHENTICATED,
+            settings.RATE_LIMITER_RATE_BOT,
+            list(self.whitelist) or '(none)',
+        )
     def __call__(self, request):
         decision = self._evaluate(request)
         if decision['action'] == 'block':
             logger.warning(
-                'ratelimit_block reason=%s ip=%s path=%s dry_run=%s namespace=%s',
+                'ratelimit_block reason=%s ip=%s path=%s namespace=%s',
                 decision['reason'],
                 decision['ip'],
                 request.path,
-                self.dry_run,
                 _NAMESPACE,
                 extra={'ua': request.META.get('HTTP_USER_AGENT', '')},
             )
-            if not self.dry_run:
-                return HttpResponse(status=429)
+            response = HttpResponse(status=429)
+            response['Retry-After'] = self.BLOCK_TTL
+            return response
+        logger.debug(
+            'ratelimit_pass ip=%s path=%s user=%s namespace=%s',
+            decision['ip'],
+            request.path,
+            getattr(getattr(request, 'user', None), 'pk', 'anon'),
+            _NAMESPACE,
+        )
         return self.get_response(request)
     # ------------------------------------------------------------------
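`_incr_with_ttl()` yields fixed-window counting: the first INCR in a window sets the TTL, and later INCRs inherit it. A pure-Python model of those semantics (no Redis required; the `_INCR_LUA` script itself is defined elsewhere in the module and not shown in this hunk):

```python
import time

class FixedWindowCounter:
    """Pure-Python sketch of the INCR+EXPIRE pattern the middleware runs in Redis."""

    def __init__(self):
        self._store = {}  # key -> (count, expires_at)

    def incr_with_ttl(self, key, ttl, now=None):
        now = time.monotonic() if now is None else now
        count, expires_at = self._store.get(key, (0, 0.0))
        if now >= expires_at:                  # window elapsed: key "expired"
            count, expires_at = 0, now + ttl   # TTL is set only on the first INCR
        count += 1
        self._store[key] = (count, expires_at)
        return count

c = FixedWindowCounter()
assert c.incr_with_ttl('rl:ip:1.2.3.4:reqs', ttl=60, now=0.0) == 1
assert c.incr_with_ttl('rl:ip:1.2.3.4:reqs', ttl=60, now=10.0) == 2
assert c.incr_with_ttl('rl:ip:1.2.3.4:reqs', ttl=60, now=61.0) == 1  # new window
```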
@@ -206,12 +246,16 @@
         if ip in self.whitelist:
             return {'action': 'pass', 'ip': ip}
-        # Check 1: known bad UA
+        # Check 1: known bad UA (hardcoded Python list — substring match)
         ua = request.META.get('HTTP_USER_AGENT', '')
         for fragment in BOT_UA_FRAGMENTS:
             if fragment.lower() in ua.lower():
                 return {'action': 'block', 'reason': 'known_ua', 'ip': ip}
+        # Check 1b: runtime UA deny list (Redis SET — token hash match)
+        if self._is_redis_blocked_ua(ua):
+            return {'action': 'block', 'reason': 'redis_ua', 'ip': ip}
         # Check 2: IP already blocked
         if self._rl_cache.get(RL_IP_BLOCKED.format(ip=ip)):
             return {'action': 'block', 'reason': 'ip_blocked', 'ip': ip}
@@ -268,20 +312,46 @@
         return {'action': 'pass', 'ip': ip}
     # ------------------------------------------------------------------
-    # Helpers
+    # Helpers — delegate to module-level so media.py can reuse them
     # ------------------------------------------------------------------
     def _incr_with_ttl(self, key, ttl):
-        """
-        Atomic INCR + EXPIRE via Redis Lua script.
-        Falls back to non-atomic cache get/set when Redis is unavailable
-        (dry-run mode or file-based cache correct enough for logging).
-        """
-        try:
-            from django_redis import get_redis_connection
-            client = get_redis_connection('ratelimit')
-            return client.eval(_INCR_LUA, 1, key, ttl)
-        except Exception:
-            count = (self._rl_cache.get(key) or 0) + 1
-            self._rl_cache.set(key, count, timeout=ttl)
-            return count
+        return _incr_with_ttl(key, ttl)
+    def _refresh_ua_blocklist(self):
+        """
+        Fetch the full UA deny list from Redis DB 1 (SMEMBERS).
+        Stores sha256 hex-strings in the class-level set.
+        Falls back silently; an empty set means no runtime blocks.
+        """
+        try:
+            from django_redis import get_redis_connection
+            client = get_redis_connection('ratelimit')
+            raw = client.smembers(RL_UA_BLOCKLIST)
+            RateLimitMiddleware._ua_blocklist = {
+                m.decode() if isinstance(m, bytes) else m for m in raw
+            }
+            RateLimitMiddleware._ua_blocklist_fetched_at = time.time()
+            logger.debug('[RATELIMIT] ua_blocklist refreshed entries=%d', len(raw))
+        except Exception as exc:
+            logger.debug('[RATELIMIT] ua_blocklist refresh skipped: %s', exc)
+    def _is_redis_blocked_ua(self, ua):
+        """
+        Return True if any slash/space/semicolon token in `ua` has a sha256
+        that appears in the Redis UA deny list.
+        The SET stores sha256(fragment), e.g. sha256('GPTBot').
+        Tokenising by common UA separators means 'GPTBot/1.1 (OpenAI)'
+        produces token 'GPTBot' whose hash matches the seeded entry.
+        Degrades to False when Redis is unavailable.
+        """
+        if time.time() - self._ua_blocklist_fetched_at > settings.RATE_LIMITER_UA_BLOCKLIST_REFRESH:
+            self._refresh_ua_blocklist()
+        if not self._ua_blocklist:
+            return False
+        tokens = re.split(r'[\s/;()+,]+', ua)
+        return any(
+            hashlib.sha256(t.encode()).hexdigest() in self._ua_blocklist
+            for t in tokens if t
+        )
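End to end, the deny-list match works like this sketch, with a plain set standing in for the SMEMBERS fetch and 'GPTBot' as an illustrative seeded fragment:

```python
import hashlib
import re

# Hypothetical seeded deny list: sha256 of each blocked UA fragment,
# as redis_inject_test_data.py would SADD into rl:bot:ua:blocked.
blocklist = {hashlib.sha256(b'GPTBot').hexdigest()}

def is_blocked(ua):
    # Split on the separators common in UA strings, hash each token,
    # and check membership against the seeded set.
    tokens = re.split(r'[\s/;()+,]+', ua)
    return any(hashlib.sha256(t.encode()).hexdigest() in blocklist
               for t in tokens if t)

is_blocked('Mozilla/5.0 GPTBot/1.1 (OpenAI)')   # True
is_blocked('Mozilla/5.0 (X11; Linux x86_64)')   # False
```

Storing hashes rather than raw fragments keeps the Redis SET opaque and fixed-width, at the cost of only matching whole tokens, not substrings.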

104
sapl/settings.py

@@ -43,7 +43,7 @@ ALLOWED_HOSTS = ['*']
 LOGIN_REDIRECT_URL = '/'
 LOGIN_URL = '/login/?next='
-SAPL_VERSION = '3.1.165-RC2'
+SAPL_VERSION = '3.1.164-RC5'
 if DEBUG:
     EMAIL_BACKEND = 'django.core.mail.backends.console.EmailBackend'
@@ -146,8 +146,6 @@ MIDDLEWARE = [
     'sapl.middleware.endpoint_restriction.EndpointRestrictionMiddleware',
     'django.middleware.csrf.CsrfViewMiddleware',
     'django.contrib.auth.middleware.AuthenticationMiddleware',
-    # RateLimitMiddleware runs after AuthenticationMiddleware so it can
-    # distinguish authenticated users (higher threshold) from anonymous ones.
     'sapl.middleware.ratelimit.RateLimitMiddleware',
     'django.contrib.messages.middleware.MessageMiddleware',
     'django.middleware.clickjacking.XFrameOptionsMiddleware',
@@ -222,62 +220,69 @@ POD_NAMESPACE = config('POD_NAMESPACE', default=host)
 REDIS_URL = config('REDIS_URL', default='')
 CACHE_BACKEND = config('CACHE_BACKEND', default='file')
-_redis_ready = CACHE_BACKEND == 'redis' and bool(REDIS_URL)
-_redis_pool = {
-    'max_connections': 6,  # 1,200 pods × 2 workers × 6 = 14,400 peak
-    'socket_timeout': 0.5,
-    'socket_connect_timeout': 0.5,
-}
-CACHES = {
-    # DB0 — page / view / static-file cache
-    'default': {
-        'BACKEND': (
-            'django_redis.cache.RedisCache' if _redis_ready
-            else 'django.core.cache.backends.filebased.FileBasedCache'
-        ),
-        'LOCATION': REDIS_URL + '/0' if _redis_ready else '/var/tmp/django_cache',
-        'KEY_PREFIX': f'cache:{POD_NAMESPACE}',
-        **(
-            {
-                'OPTIONS': {
-                    'CLIENT_CLASS': 'django_redis.client.DefaultClient',
-                    'CONNECTION_POOL_KWARGS': _redis_pool,
-                    'IGNORE_EXCEPTIONS': True,  # degrades to cache miss on failure
-                },
-                'TIMEOUT': 300,
-            } if _redis_ready else {
-                'OPTIONS': {'MAX_ENTRIES': 10000},
-            }
-        ),
-    },
-    # DB1 — rate-limiter counters (raw keys, no KEY_PREFIX / version mangling)
-    'ratelimit': {
-        'BACKEND': (
-            'django_redis.cache.RedisCache' if _redis_ready
-            else 'django.core.cache.backends.filebased.FileBasedCache'
-        ),
-        'LOCATION': REDIS_URL + '/1' if _redis_ready else '/var/tmp/django_ratelimit_cache',
-        # Pass-through key function so django-ratelimit decorator keys ('rl:{hash}')
-        # are stored as-is, matching the 'rl:*' keys written directly by
-        # RateLimitMiddleware via get_redis_connection(). Without this, Django's
-        # default key function would produce ':1:rl:{hash}' (empty prefix + version).
-        'KEY_FUNCTION': 'sapl.middleware.ratelimit.make_ratelimit_cache_key',
-        **(
-            {
-                'OPTIONS': {
-                    'CLIENT_CLASS': 'django_redis.client.DefaultClient',
-                    'CONNECTION_POOL_KWARGS': _redis_pool,
-                    'IGNORE_EXCEPTIONS': True,
-                },
-            } if _redis_ready else {
-                'OPTIONS': {'MAX_ENTRIES': 5000},
-            }
-        ),
-    },
-}
+def _build_cache_layer(pod_namespace, cache_backend, redis_url):
+    """
+    Return the CACHES dict for the given runtime environment.
+    Two backends are always defined:
+      default    DB 0: page/view/static-file cache, KEY_PREFIX isolates tenants.
+      ratelimit  DB 1: rate-limiter counters; pass-through KEY_FUNCTION keeps
+                 raw 'rl:*' keys consistent between RateLimitMiddleware
+                 (get_redis_connection) and @ratelimit decorator paths.
+    Redis path: both backends share the same connection-pool settings so a
+    single pool object is created once and referenced by both caches.
+    File path: used in development and as a fallback when Redis is absent.
+    """
+    if cache_backend == 'redis' and bool(redis_url):
+        _pool = {
+            'max_connections': 6,  # 1,200 pods × 2 workers × 6 = 14,400 peak
+            'socket_timeout': 0.5,
+            'socket_connect_timeout': 0.5,
+        }
+        return {
+            'default': {
+                'BACKEND': 'django_redis.cache.RedisCache',
+                'LOCATION': f'{redis_url}/0',
+                'KEY_PREFIX': f'cache:{pod_namespace}',
+                'TIMEOUT': 300,
+                'OPTIONS': {
+                    'CLIENT_CLASS': 'django_redis.client.DefaultClient',
+                    'CONNECTION_POOL_KWARGS': _pool,
+                    'IGNORE_EXCEPTIONS': True,  # degrades to cache miss on Redis failure
+                },
+            },
+            'ratelimit': {
+                'BACKEND': 'django_redis.cache.RedisCache',
+                'LOCATION': f'{redis_url}/1',
+                'KEY_FUNCTION': 'sapl.middleware.ratelimit.make_ratelimit_cache_key',
+                'OPTIONS': {
+                    'CLIENT_CLASS': 'django_redis.client.DefaultClient',
+                    'CONNECTION_POOL_KWARGS': _pool,
+                    'IGNORE_EXCEPTIONS': True,
+                },
+            },
+        }
+    return {
+        'default': {
+            'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
+            'LOCATION': '/var/tmp/django_cache',
+            'KEY_PREFIX': f'cache:{pod_namespace}',
+            'OPTIONS': {'MAX_ENTRIES': 10000},
+        },
+        'ratelimit': {
+            'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
+            'LOCATION': '/var/tmp/django_ratelimit_cache',
+            'KEY_FUNCTION': 'sapl.middleware.ratelimit.make_ratelimit_cache_key',
+            'OPTIONS': {'MAX_ENTRIES': 5000},
+        },
+    }
+CACHES = _build_cache_layer(POD_NAMESPACE, CACHE_BACKEND, REDIS_URL)
 RATELIMIT_USE_CACHE = 'ratelimit'
 ROOT_URLCONF = 'sapl.urls'
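Because `_build_cache_layer()` is a pure function of its three arguments, the backend selection is easy to unit-test; a trimmed sketch keeping only the fields that drive the choice (names and URL are illustrative):

```python
def build_cache_layer(pod_namespace, cache_backend, redis_url):
    # Trimmed sketch of the selection logic: Redis when configured, file otherwise.
    if cache_backend == 'redis' and bool(redis_url):
        return {
            'default': {'BACKEND': 'django_redis.cache.RedisCache',
                        'LOCATION': f'{redis_url}/0'},
            'ratelimit': {'BACKEND': 'django_redis.cache.RedisCache',
                          'LOCATION': f'{redis_url}/1'},
        }
    return {
        'default': {'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
                    'LOCATION': '/var/tmp/django_cache'},
        'ratelimit': {'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
                      'LOCATION': '/var/tmp/django_ratelimit_cache'},
    }

caches = build_cache_layer('sapl-ns', 'redis', 'redis://redis:6379')
assert caches['ratelimit']['LOCATION'].endswith('/1')
caches = build_cache_layer('sapl-ns', 'file', '')
assert 'FileBased' in caches['default']['BACKEND']
```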
@@ -394,10 +399,6 @@ FILE_UPLOAD_TEMP_DIR = '/var/interlegis/sapl/tmp'
 # ---------------------------------------------------------------------------
 # Rate limiting — RateLimitMiddleware (sapl/middleware/ratelimit.py)
 # ---------------------------------------------------------------------------
-# Start with RATELIMIT_DRY_RUN=True; flip to False one check at a time
-# after validating in logs that no legitimate traffic is flagged.
-RATELIMIT_DRY_RUN = config('RATELIMIT_DRY_RUN', default=True, cast=bool)
 RATE_LIMITER_RATE = config('RATE_LIMITER_RATE', default='35/m')
 RATE_LIMITER_RATE_AUTHENTICATED = config('RATE_LIMITER_RATE_AUTHENTICATED', default='120/m')
 RATE_LIMITER_RATE_BOT = config('RATE_LIMITER_RATE_BOT', default='5/m')
@@ -410,6 +411,14 @@ RATE_LIMIT_WHITELIST_IPS = config(
     cast=lambda v: [x.strip() for x in v.split(',') if x.strip()],
 )
+# Seconds between re-fetches of the runtime UA deny list from Redis DB 1.
+# Lower values pick up new blocked UAs faster; higher values reduce Redis round-trips.
+RATE_LIMITER_UA_BLOCKLIST_REFRESH = config('RATE_LIMITER_UA_BLOCKLIST_REFRESH', default=60, cast=int)
+# Media file serving — serve_media (sapl/base/media.py) via X-Accel-Redirect.
+MEDIA_PATH_COUNTER_TTL = config('MEDIA_PATH_COUNTER_TTL', default=60, cast=int)  # seconds — per-path counter window
+MEDIA_FILE_CACHE_TTL = config('MEDIA_FILE_CACHE_TTL', default=3600, cast=int)  # seconds — content-type metadata TTL
 # ---------------------------------------------------------------------------
 # Anonymous page caching — AnonCachePageMixin (sapl/middleware/page_cache.py)
 # TTLs apply only to anonymous (unauthenticated) GET responses.
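The `cast=` lambda used for RATE_LIMIT_WHITELIST_IPS turns a comma-separated env var into a clean list; the same expression standalone (env values illustrative):

```python
# Same cast expression as RATE_LIMIT_WHITELIST_IPS in sapl/settings.py:
split_ips = lambda v: [x.strip() for x in v.split(',') if x.strip()]

split_ips('10.0.0.1, 10.0.0.2,,')  # ['10.0.0.1', '10.0.0.2']
split_ips('')                      # []
```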
@@ -422,6 +431,13 @@ PAGE_CACHE_TTL_DETAIL = config('PAGE_CACHE_TTL_DETAIL', default=300, cast=int)
 # High-stability detail views (parlamentar, comissão) — change only each term
 PAGE_CACHE_TTL_STABLE = config('PAGE_CACHE_TTL_STABLE', default=600, cast=int)
+logger.info(
+    '[PAGE_CACHE] list=%ds detail=%ds stable=%ds',
+    PAGE_CACHE_TTL_LIST,
+    PAGE_CACHE_TTL_DETAIL,
+    PAGE_CACHE_TTL_STABLE,
+)
 # Internationalization
 # https://docs.djangoproject.com/en/1.8/topics/i18n/
 LANGUAGE_CODE = 'pt-br'
@@ -512,7 +528,7 @@ CRISPY_FAIL_SILENTLY = not DEBUG
 FILTERS_HELP_TEXT_FILTER = False
 LOGGING_CONSOLE_VERBOSE = config(
-    'LOGGING_CONSOLE_VERBOSE', cast=bool, default=False)
+    'LOGGING_CONSOLE_VERBOSE', cast=bool, default=True)
 LOGGING = {
     'version': 1,

13
sapl/urls.py

@@ -82,6 +82,13 @@ urlpatterns += [
 ]
+# Media files — served via X-Accel-Redirect in production, directly in DEBUG.
+from sapl.base.media import serve_media  # noqa: E402
+urlpatterns += [
+    url(r'^media/(?P<path>.*)$', serve_media, name='serve_media'),
+]
 # Fix a static asset finding error on Django 1.9 + gunicorn:
 # http://stackoverflow.com/questions/35510373/
@@ -95,11 +102,7 @@ if settings.DEBUG:
     urlpatterns += static(settings.STATIC_URL,
                           document_root=settings.STATIC_ROOT)
-    urlpatterns += [
-        url(r'^media/(?P<path>.*)$', view_static_server, {
-            'document_root': settings.MEDIA_ROOT,
-        }),
-    ]
+    # media/ is handled by serve_media below (works in DEBUG too)
 # Make the rate limiter return 429 (Too Many Requests) instead of 403 (Forbidden Access)
