
Phase 6: scanner probe blocking, plan consolidation, and flow diagram

Code:
- Block IPs dynamically on scanner extension probes (.php, .asp, .aspx,
  .jsp, .cgi, .env) — writes rl:ip:{ip}:blocked on first hit; subsequent
  requests short-circuit at check 2 with zero counting overhead
- Add RATE_LIMIT_SCANNER_EXTENSIONS setting (space-separated, env-overridable)
- Import os in ratelimit.py for os.path.splitext

Plan (RATE_LIMITER_PLAN.md → RATE-LIMITER-PLAN.md):
- Rename to kebab-case for consistency with rate-limiter-v2.md
- Merge missing content from rate-limiter-v2.md: context & problem statement,
  component diagram (DB0/DB1 split), decision log, Gunicorn tuning, nginx
  real-IP fixes, upload settings, N+1 fix (synced to actual implementation),
  enforcement graduation order, decorator migration table, file serving
  decision matrix, dynamic page caching guidelines, open questions
- Add Mermaid decision flow diagram for RateLimitMiddleware._evaluate()
- Add rationale section for rl:{ns}:ip:{ip}:w:{bucket} namespace scoping
  (5 arguments covering attack pattern match, gaming resistance, key
  orthogonality, multi-portal fairness, and isolation contract)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rate-limiter-2026
Edward Ribeiro, 3 weeks ago
parent commit c5eea025ab
Changed files (lines changed):
1. plan/RATE-LIMITER-PLAN.md (967)
2. plan/RATE_LIMITER_PLAN.md (474)
3. sapl/base/media.py (40)
4. sapl/middleware/ratelimit.py (9)
5. sapl/settings.py (15)

plan/RATE-LIMITER-PLAN.md (967)

@@ -0,0 +1,967 @@
# SAPL — Rate Limiter & Redis Operations
> **Scope**: Django / Gunicorn / nginx / Kubernetes fleet of 1,200+ pods.
> Each pod has a dedicated PostgreSQL instance. A K8s Ingress sits in front of all tenants.
> **This document is canonical** — all earlier session notes are consolidated here.
---
## Context & Problem Statement
### Fleet
| Item | Detail |
|------|--------|
| System | SAPL — Django 2.2, legislative management for Brazilian municipal chambers |
| Fleet | ~1,200 Kubernetes pods, each with a dedicated PostgreSQL pod |
| Pod limits | 1 core CPU (limit) / 35m (request) · 1600Mi RAM (limit) / 800Mi (request) |
| Users | Legislative house staff, often behind NAT (many users, one public IP) |
| Workloads | PDF generation (synchronous, ReportLab), file uploads up to 150 MB, WebSocket voting panel |
### OOM Kill Pattern
Workers grow from ~35 MB at birth to 800–900 MB within 2–3 minutes, then are killed and replaced in a continuous cycle.
Root causes:
- Bot scraping triggers synchronous PDF generation — entire document built in RAM (ReportLab)
- `worker_max_memory_per_child` only checks **between requests**; workers blocked on long requests are never recycled
- `TIMEOUT=300` lets bots hold threads for up to 5 minutes while memory accumulates
- 3 workers × 300 MB each = ~900 MB — breaching the 800Mi request threshold
### Bot Traffic Profile (Barueri pod, 16 days, 662 k requests)
| Actor | Requests | % of total |
|-------|----------|-----------|
| Googlebot | ~154,000 | 23.2% |
| Chrome/98.0.4758 (spoofed scraper) | 90,774 | 13.7% |
| kube-probe (healthcheck) | 69,065 | 10.4% |
| meta-externalagent | 28,325 | 4.3% |
| GPTBot | 11,489 | 1.7% |
| bingbot | 7,639 | 1.1% |
| OAI-SearchBot + Applebot | 6,681 | 1.0% |
| **Total identified bots** | **~377,000** | **~56.9%** |
**Botnet fingerprint:**
- Rotates User-Agents (Chrome/121, Chrome/122, Firefox/123, Safari/17…) across requests
- Crawls all sub-endpoints of the same matéria within 1 second from different IPs
- Distributes crawling across tenants — each pod stays under the per-pod rate limit, never triggering it
- Primary targets: `/relatorios/{id}/etiqueta-materia-legislativa` (~40 KB PDF) and all `/materia/{id}/*` sub-endpoints
### Static File Traffic (from CSV analysis)
| Category | Requests | Transfers |
|----------|----------|----------|
| Logos / images | 62,776 | ~24 GB |
| PDFs | 8,869 | 5.1 GB |
| Parliamentarian photos | 11,856 | ~0.5 GB |
| **Total** | **83,501** | **~30 GB** |
Top offender: `Brasão - Foz do Iguaçu.png` — 14,512 requests, 5.6 GB from a single 392 KB file.
### Hard Constraints
| Constraint | Impact |
|------------|--------|
| Per-pod PostgreSQL | Rate-limit counters not shared across pods |
| NAT environments | IP-based rate limiting causes false positives |
| `TIMEOUT=300` / uploads to 150 MB | Must not be broken — intentional for slow workflows |
---
## Architecture Overview
### Component Diagram
```mermaid
graph TD
Client([Bot / Human Client])
nginx[nginx]
gunicorn[Gunicorn\n2 workers / 4 threads]
mw[Django Middleware\nRateLimitMiddleware]
view[View Layer\nCBV + decorators]
db0[(Redis DB0\npage cache)]
db1[(Redis DB1\nrate limiter)]
pg[(PostgreSQL\nper-pod)]
fs[Filesystem\nPDFs / media]
Client -->|HTTP| nginx
nginx -->|proxy_pass| gunicorn
gunicorn --> mw
mw -->|pass| view
mw -->|429| nginx
view --> pg
view --> fs
view -->|read/write cached pages| db0
mw -->|counters + blocked markers| db1
```
> DB2 is reserved for Django Channels (WebSocket — future).
### Redis Memory Budget
| Key type | Key schema | TTL | DB | Est. size |
|----------|-----------|-----|----|----------|
| Page / view cache | `cache:{ns}:*` | 60–600 s | 0 | ~0.5 GB |
| Static cache (images/logos) | `static:{ns}:{sha256}` | 3–24 h | 0 | ~2.4 GB |
| IP request counter | `rl:ip:{ip}:reqs` | 60 s | 1 | ~0.6 MB |
| IP blocked marker | `rl:ip:{ip}:blocked` | 300 s | 1 | ~0.06 MB |
| User request counter | `rl:{ns}:user:{uid}:reqs` | 60 s | 1 | negligible |
| User blocked marker | `rl:{ns}:user:{uid}:blocked` | 300 s | 1 | negligible |
| Path counter | `rl:{ns}:path:{sha256}:reqs` | 60 s | 1 | ~0.3 MB |
| UA deny list | `rl:bot:ua:blocked` | permanent SET | 1 | ~0.03 MB |
| NS/IP/window counter | `rl:{ns}:ip:{ip}:w:{bucket}` | 120 s | 1 | ~0.6 MB |
| Redis overhead (× 1.5) | | | | ~1.6 GB |
| **Total ceiling** | | | | **~5 GB** |
---
## Decision Log
| Decision | Chosen | Rationale |
|----------|--------|-----------|
| Redis topology | **Single pod** (no Sentinel, no Cluster) | 65 MB of active data fits comfortably; cluster complexity not justified |
| PDF caching in Redis | **No** — ETags + sendfile are sufficient | Once rate limiting + ETags are active, repeat requests become 304s with zero bytes transferred |
| Rate-limit enforcement | **Django middleware** with shared Redis | No nginx image changes required; solves cross-pod consistency immediately |
| `worker_max_memory_per_child` | **400 MB** | Pod limit 1600Mi, 2 workers × 400 MB = 800 MB — leaves 800 Mi headroom |
| `sendfile off` → `on` | **Bug** — flip to `on` | No valid production reason found for `off`; letting the kernel skip the userspace copy has no downside here |
| `/media/` serving | **X-Accel-Redirect** | Routes all `/media/` through Gunicorn so Django middleware runs; nginx serves bytes via internal location |
| Cache backend switch | **At pod startup** via `start.sh` + waffle switch | Pod restart is acceptable; avoids per-request runtime overhead |
---
## Directory layout
```
docker/k8s/
└── redis/
    ├── redis-configmap.yaml   # redis.conf — no persistence, allkeys-lru, 5 GB ceiling
    ├── redis-deployment.yaml  # Deployment (1 replica, redis:7-alpine)
    └── redis-service.yaml     # ClusterIP service on port 6379
```
---
## Prerequisites
- `kubectl` configured to talk to the target cluster.
- A `sapl-redis` namespace (created below if it doesn't exist).
---
## Deploy
```bash
# 1. Create the namespace (idempotent)
rancher kubectl create namespace sapl-redis --dry-run=client -o yaml | rancher kubectl apply -f -
# 2. Apply all three manifests
rancher kubectl apply -f docker/k8s/redis/redis-configmap.yaml
rancher kubectl apply -f docker/k8s/redis/redis-deployment.yaml
rancher kubectl apply -f docker/k8s/redis/redis-service.yaml
# 3. Verify the pod is Running
rancher kubectl -n sapl-redis get pods -l app=sapl-redis
```
Expected output:
```
NAME READY STATUS RESTARTS AGE
sapl-redis-6d9f8b7c4d-xk2lm 1/1 Running 0 30s
```
---
## Verify the rate limiter
`scripts/test_ratelimiter.py` fires repeated GET requests at a SAPL URL and reports
when the first 429 is returned.
### Usage
```
python scripts/test_ratelimiter.py <URL> [-n NUM] [-d DELAY] [-t TIMEOUT]
```
| Flag | Default | Meaning |
|------|---------|---------|
| `url` | *(required)* | Full URL including scheme, e.g. `http://localhost` |
| `-n`, `--num-requests` | `50` | Maximum requests to send |
| `-d`, `--delay` | `0.1` | Seconds between requests |
| `-t`, `--timeout` | `10` | Per-request timeout in seconds |
The script stops and prints a summary as soon as a 429 is received.
### Examples
```bash
# Hit the anonymous threshold (35 req/min) — fire 40 requests with minimal delay
python scripts/test_ratelimiter.py http://localhost -n 40 -d 0.05
# Slower fire — check that legitimate traffic is not rate-limited
python scripts/test_ratelimiter.py http://localhost -n 20 -d 2
# Test against a staging pod via port-forward
rancher kubectl port-forward -n <NAMESPACE> deploy/sapl 8080:80 &
python scripts/test_ratelimiter.py http://localhost:8080 -n 40 -d 0.05
```
### Reading the output
```
Request 1: Status 200 | Time: 0.045s
...
Request 36: Status 429 | Time: 0.038s
-> Rate limited on request 36
Summary:
Total requests attempted: 36
Successful (200): 35
Rate limited (429): 1
First 429 occurred at request: 36
```
A first-429 near the configured anonymous threshold (35 req/min) confirms the
middleware is wired correctly. A first-429 much earlier points to nginx `limit_req`
firing before Django sees the request.
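For reference, the core of the probe is only a few lines. The sketch below is illustrative, not the real script (which lives in `scripts/test_ratelimiter.py` and adds full argparse handling); it assumes only the `requests` library:
```python
# Minimal sketch of the test loop — illustrative, not scripts/test_ratelimiter.py itself.
import sys
import time

import requests

def probe(url, num=50, delay=0.1, timeout=10):
    for i in range(1, num + 1):
        t0 = time.monotonic()
        r = requests.get(url, timeout=timeout)
        print(f'Request {i}: Status {r.status_code} | Time: {time.monotonic() - t0:.3f}s')
        if r.status_code == 429:
            print(f'-> Rate limited on request {i}')
            return i
        time.sleep(delay)
    return None  # threshold never hit within `num` requests

if __name__ == '__main__':
    probe(sys.argv[1])
```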
---
## Inject REDIS_URL into SAPL instances
`REDIS_URL` points at the shared instance:
```
redis://redis.sapl-redis.svc.cluster.local:6379
        ^^^^^ ^^^^^^^^^^
        svc   namespace
```
`start.sh` picks it up on every pod startup and sets the `REDIS_CACHE` waffle switch
automatically — no further intervention needed.
### Fleet-wide rollout
Uses the `app.kubernetes.io/name=sapl` pod label to discover every SAPL namespace
automatically — onboarding a new municipality requires no script changes.
```bash
for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
-o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
rancher kubectl set env deployment/sapl \
REDIS_URL=redis://redis.sapl-redis.svc.cluster.local:6379 \
-n $ns
done
```
### Roll back
```bash
for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
-o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
rancher kubectl set env deployment/sapl REDIS_URL- -n $ns
done
```
`kubectl set env deployment/sapl REDIS_URL-` (trailing `-`) removes the variable.
`start.sh` then falls back to file-based cache automatically.
---
## Monitor
### Pod and events
```bash
# Pod status
rancher kubectl -n sapl-redis get pods -l app=sapl-redis -o wide
# Deployment events (useful right after apply)
rancher kubectl -n sapl-redis describe deployment sapl-redis
# Pod events (OOMKill, restarts, etc.)
rancher kubectl -n sapl-redis describe pod -l app=sapl-redis
```
### Logs
```bash
# Tail live logs
rancher kubectl -n sapl-redis logs -f deploy/sapl-redis
# Last 100 lines
rancher kubectl -n sapl-redis logs deploy/sapl-redis --tail=100
```
### Redis INFO
```bash
# Memory usage
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
redis-cli info memory \
| grep -E 'used_memory_human|maxmemory_human|mem_fragmentation_ratio'
# Connection pressure
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
redis-cli info stats \
| grep -E 'rejected_connections|instantaneous_ops_per_sec'
# Key distribution per DB
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli info keyspace
# Recent slow queries
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli slowlog get 10
# Live command sampling (1-second window)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli --latency-history -i 1
```
### Rate-limiter keys (DB 1)
```bash
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
redis-cli -n 1 dbsize
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
redis-cli -n 1 --scan --pattern 'rl:ip:*' | head -20
```
---
## Seed the UA deny list (once after first deploy)
`rl:bot:ua:blocked` is a permanent Redis SET in DB 1. Each member is the
SHA-256 of a **UA token** — the identifying fragment extracted after splitting
on `/`, spaces, `;`, `(`, `)`, e.g.:
```
UA string: "GPTBot/1.1 (+https://openai.com/gptbot)"
Tokens: GPTBot 1.1 +https: ...
Hash stored: sha256("GPTBot")
```
The middleware (`_is_redis_blocked_ua`) tokenises the incoming UA the same
way and checks each token hash against the cached set. The SET is fetched
from Redis at most once per `RATE_LIMITER_UA_BLOCKLIST_REFRESH` seconds (default 60)
per worker process.
The bots in `BOT_UA_FRAGMENTS` (Python list, always active) and this Redis
SET are **independent** — the Python list provides the baseline and the Redis
SET allows adding new offenders at runtime **without a code deploy**.
```bash
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
SADD rl:bot:ua:blocked \
"$(echo -n 'GPTBot' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'ClaudeBot' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'PerplexityBot' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'Bytespider' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'AhrefsBot' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'meta-externalagent' | sha256sum | cut -d' ' -f1)"
# Add a new offender at runtime (picked up within RATE_LIMITER_UA_BLOCKLIST_REFRESH seconds)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
SADD rl:bot:ua:blocked "$(echo -n 'NewBot' | sha256sum | cut -d' ' -f1)"
```
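The matching side can be pictured as the sketch below. This illustrates the tokenise-and-hash logic described above; it is not the actual `_is_redis_blocked_ua` implementation, and `blocked_hashes` stands in for the worker's cached copy of the SET:
```python
# Sketch of UA token matching. `blocked_hashes` represents the cached
# contents of the rl:bot:ua:blocked SET (refreshed periodically per worker).
import hashlib
import re

def ua_tokens(ua):
    # Split on '/', whitespace, ';', '(' and ')' — same rule used when seeding.
    return [t for t in re.split(r'[/\s;()]+', ua) if t]

def is_blocked_ua(ua, blocked_hashes):
    return any(
        hashlib.sha256(token.encode()).hexdigest() in blocked_hashes
        for token in ua_tokens(ua)
    )

blocked = {hashlib.sha256(b'GPTBot').hexdigest()}
assert is_blocked_ua('GPTBot/1.1 (+https://openai.com/gptbot)', blocked)
```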
---
## Local standalone Redis (development / testing)
No Kubernetes? Run Redis directly with Docker:
```bash
sudo docker run --rm -p 6379:6379 redis:7-alpine \
redis-server --save "" --appendonly no
```
Then point Django at it by exporting the env var before starting the dev server:
```bash
export REDIS_URL="redis://localhost:6379"
export CACHE_BACKEND="redis"
python manage.py runserver
```
Or add them to your local `.env` file:
```
REDIS_URL=redis://localhost:6379
CACHE_BACKEND=redis
```
> **Note**: the waffle switch `REDIS_CACHE` must also be `on` in your local
> database for `start.sh` to activate the Redis backend. Run:
> ```bash
> python manage.py waffle_switch REDIS_CACHE on --create
> ```
---
## Update `redis.conf` without redeploying
```bash
# Edit the ConfigMap
rancher kubectl -n sapl-redis edit configmap redis-config
# Restart the pod to pick up the new config
rancher kubectl -n sapl-redis rollout restart deployment/sapl-redis
```
---
## Gunicorn tuning
`docker/startup_scripts/gunicorn.conf.py` — resolved values for the current pod budget (1600Mi RAM, 1 CPU):
```python
NUM_WORKERS = int(os.getenv("WEB_CONCURRENCY", "2")) # was 3
THREADS = int(os.getenv("GUNICORN_THREADS", "4")) # was 8
TIMEOUT = int(os.getenv("GUNICORN_TIMEOUT", "120")) # was 300
max_requests = 1000
max_requests_jitter = 200
worker_max_memory_per_child = 400 * 1024 * 1024 # 400 MB — was 300 MB
```
**Per-location timeout strategy** — nginx overrides the global Gunicorn timeout per-path:
| Operation | Timeout | Rationale |
|-----------|---------|-----------|
| Normal page rendering | 60 s | No legitimate page should take > 60 s |
| API endpoints | 30 s | Stateless, fast by design |
| PDF download (cached / nginx) | 30 s | nginx serves from disk, worker not involved |
| PDF generation (uncached) | 180 s | Kept high — addressed in a future phase |
| Large file upload | 180 s | nginx buffers upload; worker processes after |
---
## nginx real-IP and core fixes
Added to `docker/config/nginx/nginx.conf` (http {} block):
```nginx
# Kernel bypass — was off (bug)
sendfile on;
tcp_nopush on;
tcp_nodelay on;
# Real client IP from X-Forwarded-For set by K8s Ingress
real_ip_header X-Forwarded-For;
real_ip_recursive on;
set_real_ip_from 10.0.0.0/8;
set_real_ip_from 172.16.0.0/12;
set_real_ip_from 192.168.0.0/16;
```
Without this block, `$remote_addr` inside the pod is always the Ingress IP, making IP-based rate limiting and blocking meaningless. `real_ip_recursive on` makes nginx walk `X-Forwarded-For` right to left, skipping the trusted proxy ranges in `set_real_ip_from` until it reaches the original client address.
---
## Django upload settings
Added to `sapl/settings.py` — files above 2 MB are streamed to disk rather than held in worker RAM. Critical for 150 MB upload support without OOM pressure:
```python
FILE_UPLOAD_MAX_MEMORY_SIZE = 2 * 1024 * 1024 # 2 MB
DATA_UPLOAD_MAX_MEMORY_SIZE = 10 * 1024 * 1024 # 10 MB
MAX_DOC_UPLOAD_SIZE = 150 * 1024 * 1024 # 150 MB
FILE_UPLOAD_TEMP_DIR = '/var/interlegis/sapl/tmp'
```
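To see the handler switch in action, a throwaway view like the sketch below can be used (hypothetical: the `upload_debug` view and the `arquivo` field name are not part of SAPL). Django hands the view an `InMemoryUploadedFile` below the 2 MB threshold and a disk-backed `TemporaryUploadedFile` above it:
```python
# Hypothetical debug view: shows which upload handler received the file.
from django.core.files.uploadedfile import (
    InMemoryUploadedFile,
    TemporaryUploadedFile,
)
from django.http import HttpResponse

def upload_debug(request):
    f = request.FILES['arquivo']  # hypothetical form field name
    if isinstance(f, TemporaryUploadedFile):
        # Above FILE_UPLOAD_MAX_MEMORY_SIZE: streamed to FILE_UPLOAD_TEMP_DIR.
        detail = f'on disk at {f.temporary_file_path()}'
    elif isinstance(f, InMemoryUploadedFile):
        detail = 'held in memory (below FILE_UPLOAD_MAX_MEMORY_SIZE)'
    else:
        detail = f.__class__.__name__
    return HttpResponse(f'{f.size} bytes, {detail}')
```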
---
## N+1 fix — `get_etiqueta_protocolos`
`sapl/relatorios/views.py` — previously called `MateriaLegislativa.objects.filter()` inside a loop over protocols. Fixed to **three queries total** regardless of volume (one for protocols, one for materias, one for documentos):
```python
# sapl/relatorios/views.py
from django.db.models import Q

from sapl.materia.models import MateriaLegislativa
from sapl.protocoloadm.models import DocumentoAdministrativo


def get_etiqueta_protocolos(prots):
    prot_list = list(prots)
    if not prot_list:
        return []
    # Pre-fetch MateriaLegislativa for all protocols in one query.
    materia_query = Q()
    for p in prot_list:
        materia_query |= Q(numero_protocolo=p.numero, ano=p.ano)
    materias_map = {
        (m.numero_protocolo, m.ano): m
        for m in MateriaLegislativa.objects.filter(
            materia_query).select_related('tipo')
    }
    # Pre-fetch DocumentoAdministrativo for all protocols in one query.
    documentos_map = {
        doc.protocolo_id: doc
        for doc in DocumentoAdministrativo.objects.filter(
            protocolo__in=prot_list).select_related('tipo')
    }
    protocolos = []
    for p in prot_list:
        dic = {}
        dic['titulo'] = str(p.numero) + '/' + str(p.ano)
        # ... timestamp / assunto / interessado / autor fields ...
        materia = materias_map.get((p.numero, p.ano))
        dic['num_materia'] = (
            materia.tipo.sigla + ' ' + str(materia.numero) + '/' + str(materia.ano)
            if materia else ''
        )
        documento = documentos_map.get(p.pk)
        dic['num_documento'] = (
            documento.tipo.sigla + ' ' + str(documento.numero) + '/' + str(documento.ano)
            if documento else ''
        )
        dic['ident_processo'] = dic['num_materia'] or dic['num_documento']
        protocolos.append(dic)
    return protocolos
```
---
## Rate limiting — two layers, two jobs
SAPL enforces rate limits at two independent layers. They use different
algorithms and protect different things; their thresholds must be tuned
separately.
### Layer 1 — nginx `limit_req` (leaky bucket)
Defined in `docker/config/nginx/nginx.conf` (zones) and `sapl.conf` (burst).
```
sapl_general rate=30r/m # 1 token every 2 s
sapl_heavy rate=10r/m # 1 token every 6 s (PDF/report endpoints)
```
`burst=N nodelay` means nginx accepts up to N requests instantly above the
current token level, then enforces the drip rate. Requests beyond the burst
cap return 429 before reaching Gunicorn — **zero Python cost**.
Burst values are set at container startup via env vars:
| Env var | Default | Location |
|---------|---------|----------|
| `NGINX_BURST_GENERAL` | `60` | `location /`, `location /media/` |
| `NGINX_BURST_API` | `60` | `location /api/` |
| `NGINX_BURST_HEAVY` | `20` | `location /relatorios/` |
Defaults are 2× the zone's per-minute rate, so a user can spend a full
minute's quota in a single burst before the leaky bucket takes over.
### Layer 2 — Django `RateLimitMiddleware` (sliding window)
Defined in `sapl/middleware/ratelimit.py`, backed by Redis DB 1.
Requests that pass nginx reach Python. The middleware counts them in a
60-second sliding window per IP (anonymous) or per user (authenticated):
| Env var | Default | Scope |
|---------|---------|-------|
| `RATE_LIMITER_RATE` | `35/m` | Anonymous IP |
| `RATE_LIMITER_RATE_AUTHENTICATED` | `120/m` | Authenticated user |
| `RATE_LIMITER_RATE_BOT` | `5/m` | *(reserved — bots are currently blocked outright, not counted)* |
| `RATE_LIMITER_UA_BLOCKLIST_REFRESH` | `60` s | How often each worker re-fetches `rl:bot:ua:blocked` from Redis |
When the window count hits the threshold the IP/user is written to a Redis
blocked-set with a 300 s TTL and subsequent requests return 429 with
`Retry-After: 300` — without touching the database.
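The counting step can be pictured as the sketch below, written against the key schema in this document rather than copied from the middleware; `rl_cache` stands for the Django cache backend bound to Redis DB 1:
```python
# Sketch of the count-then-block step for an anonymous IP (illustrative).
ANON_THRESHOLD = 35   # RATE_LIMITER_RATE
BLOCK_TTL = 300       # seconds — blocked-marker cooldown
WINDOW_TTL = 60       # seconds — counting window

def count_and_maybe_block(rl_cache, ip):
    reqs_key = f'rl:ip:{ip}:reqs'
    # add() only creates the key (with its TTL) if absent, so the 60 s
    # window starts rolling from the IP's first request.
    if rl_cache.add(reqs_key, 1, timeout=WINDOW_TTL):
        count = 1
    else:
        count = rl_cache.incr(reqs_key)
    if count >= ANON_THRESHOLD:
        rl_cache.set(f'rl:ip:{ip}:blocked', 1, timeout=BLOCK_TTL)
        return {'action': 'block', 'reason': 'ip_rate', 'ip': ip}
    return {'action': 'pass'}
```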
Decision flow inside `RateLimitMiddleware._evaluate()`:
```
1. IP in whitelist? → pass (no further checks)
1a. UA matches BOT_UA_FRAGMENTS list? → 429 reason=known_ua
1b. UA token hash in rl:bot:ua:blocked SET? → 429 reason=redis_ua
2. IP in rl:ip:{ip}:blocked? → 429 reason=ip_blocked
2b. Path extension in RATE_LIMIT_SCANNER_EXTENSIONS? → SET blocked, 429 reason=scanner_probe
3. Authenticated user?
3a. User in rl:{ns}:user:{uid}:blocked? → 429 reason=user_blocked
3b. Suspicious headers (no Accept/AL)? → 429 reason=suspicious_headers_auth
3c. User request count ≥ auth threshold? → SET blocked, 429 reason=auth_user_rate
4. Anonymous:
4a. Suspicious headers? → 429 reason=suspicious_headers
4b. IP request count ≥ anon threshold? → SET blocked, 429 reason=ip_rate
4c. NS/IP window count ≥ anon threshold? → SET blocked, 429 reason=ua_rotation
→ pass
```
### Decision flow diagram
```mermaid
flowchart TD
REQ([Request]) --> C1
C1{"Known bot UA?"}
C1 -- "yes — substring in BOT_UA_FRAGMENTS" --> R_UA([429\nknown_ua])
C1 -- no --> C1B
C1B{"Redis UA deny list?"}
C1B -- "yes — token hash in rl:bot:ua:blocked" --> R_RUA([429\nredis_ua])
C1B -- no --> C2
C2{"IP blocked?"}
C2 -- "yes — rl:ip:IP:blocked exists" --> R_IPB([429\nip_blocked])
C2 -- no --> C2B
C2B{"Scanner extension?\n.php .asp .aspx …"}
C2B -- yes --> SIPB["SET rl:ip:IP:blocked TTL 300 s"]
SIPB --> R_SCN([429\nscanner_probe])
C2B -- no --> C3
C3{"Authenticated?"}
C3 -- yes --> C3A
C3 -- no --> C4A
subgraph AUTH ["Authenticated"]
C3A{"User blocked?"}
C3A -- "yes — rl:ns:user:UID:blocked" --> R_UB([429\nuser_blocked])
C3A -- no --> C3B
C3B{"Suspicious headers?\nno Accept-Language + no Accept"}
C3B -- yes --> R_SH([429\nsuspicious_headers_auth])
C3B -- no --> C3C
C3C{"User rate ≥ 120/min?"}
C3C -- yes --> SUB["SET rl:ns:user:UID:blocked TTL 300 s"]
SUB --> R_AUR([429\nauth_user_rate])
C3C -- no --> PASS_A([✓ pass])
end
subgraph ANON ["Anonymous"]
C4A{"Suspicious headers?\nno Accept-Language + no Accept"}
C4A -- yes --> R_ASH([429\nsuspicious_headers])
C4A -- no --> C4B
C4B{"IP rate ≥ 35/min?"}
C4B -- yes --> SIPR["SET rl:ip:IP:blocked TTL 300 s"]
SIPR --> R_IPR([429\nip_rate])
C4B -- no --> C4C
C4C{"NS/IP window hit\n≥ 35 in bucket?"}
C4C -- yes --> SUAR["SET rl:ip:IP:blocked TTL 300 s"]
SUAR --> R_UAR([429\nua_rotation])
C4C -- no --> PASS_N([✓ pass])
end
```
### Enforcement graduation order
Roll out to canary pods first; promote check-by-check in order of false-positive risk:
| Order | Check | Reason | Risk | Condition to promote |
|-------|-------|--------|------|---------------------|
| 1st | `known_ua` | Substring in hardcoded `BOT_UA_FRAGMENTS` list | Zero | UA strings are deterministic |
| 2nd | `redis_ua` | Token hash in `rl:bot:ua:blocked` SET | Zero | Keys only set manually by operators |
| 3rd | `ip_blocked` | Marker set by prior proven-bad requests | Zero | Fast-path only, no new blocks created |
| 4th | `scanner_probe` | Path ext in `RATE_LIMIT_SCANNER_EXTENSIONS` | Zero | Django never legitimately serves `.php`/`.asp`/etc. |
| 5th | `ip_rate` | Rolling IP counter ≥ 35/min | Low | Threshold calibrated from canary logs |
| 6th | `suspicious_headers` | No Accept-Language **and** no Accept | Medium | Confirmed no legitimate clients omit both headers |
| 7th | `ua_rotation` (ns/window) | NS/IP clock-aligned bucket ≥ 35 | Medium | NAT IP whitelist in place (see Open Questions) |
### Decorator migration
For views where `django-ratelimit` decorators already exist:
| Endpoint type | Action | Reason |
|---------------|--------|--------|
| List views (GET) | Remove after middleware stable | Middleware covers equivalent threshold |
| Detail views (GET) | Remove after middleware stable | Middleware covers equivalent threshold |
| Search / filter views | Remove last | Expensive queries — keep stricter per-view limit until traffic data confirms safety |
| PDF / file generation | **Keep permanently** | Most expensive endpoint; per-view limit tighter than global |
| Write endpoints (POST/PUT/DELETE) | **Keep permanently** | Different abuse surface |
| Auth endpoints (login, reset) | **Keep permanently** | Credential stuffing; must be independent of IP rate |
### Why they are not the same number
| | nginx burst | Django threshold |
|-|------------|-----------------|
| **Algorithm** | Leaky bucket — token refills over time | Sliding window — hard count per 60 s |
| **Protects** | Gunicorn workers from being flooded | Per-client fairness, business policy |
| **Tuned by** | Capacity of the server | Acceptable request volume per client |
| **Failure mode** | Workers overwhelmed | Legitimate user over-browsing |
A user loading a page quickly may fire 5–10 Django requests in two seconds.
With `rate=30r/m` (1 token/2 s) and `burst=60` they absorb that fine; the
leaky bucket refills before they click the next link. The Django threshold
(35/m sliding window) catches sustained automated traffic from a single IP
that looks like scraping even if it arrives slowly enough to beat the nginx
burst cap.
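To make the algorithmic difference concrete, here is a toy leaky-bucket model using the `sapl_general` numbers. It illustrates the behaviour; it is not nginx's implementation:
```python
# Toy leaky bucket: tokens refill at `rate` per second up to `burst`;
# each request consumes one token or is rejected (nginx would answer 429).
import time

class LeakyBucket:
    def __init__(self, rate_per_min, burst):
        self.rate = rate_per_min / 60.0     # tokens per second
        self.capacity = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = LeakyBucket(rate_per_min=30, burst=60)  # sapl_general defaults
```
A fresh client can burn all 60 burst tokens instantly; after that one request is admitted roughly every 2 s, which is exactly the drip rate quoted above.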
---
## Request routing — how nginx reaches Django
`proxy_pass http://sapl_server` forwards the HTTP request — with the original
path intact — to the Gunicorn Unix socket. Django doesn't know or care that
nginx is in front; it sees a standard HTTP request.
```
GET /media/foo.pdf
  nginx (sapl.conf)
    location /media/ → proxy_pass to Unix socket
  Gunicorn (WSGI server)
    receives raw HTTP, calls Django WSGI application
  Django middleware stack (settings.MIDDLEWARE)
    RateLimitMiddleware → pass or 429
  Django URL router (sapl/urls.py)
    r'^media/(?P<path>.*)$' → serve_media
  serve_media(request, path='foo.pdf')
    returns HttpResponse with X-Accel-Redirect: /_accel/media/foo.pdf
  nginx sees X-Accel-Redirect header
    /_accel/media/ internal location → reads file from disk → sends to client
```
nginx does no routing beyond picking a `location` block. The mapping from
URL path to Python function lives entirely in `sapl/urls.py`. `proxy_pass` is
just a pipe.
---
## Media file serving — `serve_media` and X-Accel-Redirect
All `/media/` requests (public and private) are routed through Gunicorn so that
Django middleware runs on every hit. Nginx serves the file bytes via
`X-Accel-Redirect` — the Gunicorn worker is freed as soon as it sends the
response headers.
### nginx locations (`docker/config/nginx/sapl.conf`)
```nginx
# Proxied to Gunicorn — Django middleware + serve_media() run here.
location /media/ {
    limit_req zone=sapl_general burst=${NGINX_BURST_GENERAL} nodelay;
    proxy_pass http://sapl_server;
}

# Internal — only reachable via X-Accel-Redirect, not by external clients.
location /_accel/media/ {
    internal;
    alias /var/interlegis/sapl/media/;
    sendfile on;
    etag on;
}
```
### Django view (`sapl/base/media.py`)
`serve_media(request, path)` — registered at `^media/(?P<path>.*)$` in `sapl/urls.py`.
Per-request steps:
1. **Path traversal guard** — `os.path.abspath` check; raises 404 on escape.
2. **Auth gate** — `documentos_privados/` paths require an authenticated session; redirects to login otherwise.
3. **Path counter** — increments `rl:{ns}:path:{sha256}:reqs` in Redis DB 1 (TTL = `MEDIA_PATH_COUNTER_TTL`).
4. **Serve** — in DEBUG: `django.views.static.serve` directly. In production: `X-Accel-Redirect: /_accel/media/<path>`. Nginx sets `Content-Type` from its own `mime.types`.
### Settings
| Setting | Default | Purpose |
|---------|---------|---------|
| `MEDIA_PATH_COUNTER_TTL` | `60` s | TTL for both URL-path and storage-path counters (DB 1) |
### File serving decision matrix
| File type | Size | Strategy |
|-----------|------|----------|
| Logos / images | Any | nginx `alias` + `sendfile` + ETag + `Cache-Control` |
| Small PDFs | ≤ 360 KB | nginx direct + ETag |
| Medium PDFs | 360 KB – 2 MB | nginx direct + ETag + rate limit |
| Large PDFs | > 2 MB | nginx direct + strict rate limit; never Redis |
| LGPD-restricted | Any | Django `serve_media` → `X-Accel-Redirect` → nginx (access control enforced) |
| Public `/media/` | Any | Django `serve_media` → `X-Accel-Redirect` → nginx (middleware runs; path counter written) |
### Why Redis is not needed for PDFs
With the full mitigation stack active:
- **ASN blocking** drops datacenter bot traffic at nginx (zero Python cost)
- **UA blocking** drops known-UA bots at nginx (zero Python cost)
- **Shared Redis rate counters** enforce limits across all pods
- **ETags** convert repeat requests to 304 responses with zero bytes transferred
- **`sendfile on`** means disk reads bypass userspace entirely
Redis PDF caching would solve "high request volume reaching the file layer" — but that problem no longer exists once the above stack is active. For `Brasão - Foz do Iguaçu.png` (392 KB × 14,512 requests = 5.6 GB), a 50% conditional-request hit rate saves ~2.8 GB immediately — without any Redis.
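The arithmetic behind that estimate, kept here so it can be re-run against future CSV pulls (the 50% conditional-request hit rate is an assumption, not a measurement):
```python
# Savings estimate for the top offender, in decimal GB to match the figures above.
size_bytes = 392_000        # Brasão - Foz do Iguaçu.png
requests = 14_512
hit_rate = 0.50             # assumed share of requests answered with 304

total_gb = size_bytes * requests / 1e9   # ≈ 5.7 GB transferred today
saved_gb = total_gb * hit_rate           # ≈ 2.8 GB avoided
print(f'{total_gb:.1f} GB total, {saved_gb:.1f} GB saved')
```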
---
## Key schema reference
| DB | Use case | Key pattern | TTL | Threshold | Constant |
|----|----------|-------------|-----|-----------|----------|
| 0 | Page / view cache | `cache:{ns}:*` | 300 s (default) | — | `CACHES['default']` KEY_PREFIX |
| 0 | Static file cache (logos) | `static:{ns}:{sha256}` | 3 – 24 h | — | *Future* (requires OpenResty/Lua) |
| 0 | File content cache (≤ 360 KB) | `file:{ns}:{sha256}` | 1 h | — | *Future* |
| 1 | IP rate-limit counter | `rl:ip:{ip}:reqs` | 60 s | 35 (`RATE_LIMITER_RATE`) | `RL_IP_REQUESTS` |
| 1 | IP blocked marker | `rl:ip:{ip}:blocked` | 300 s | — | `RL_IP_BLOCKED` |
| 1 | User rate-limit counter | `rl:{ns}:user:{uid}:reqs` | 60 s | 120 (`RATE_LIMITER_RATE_AUTHENTICATED`) | `RL_USER_REQUESTS` |
| 1 | User blocked marker | `rl:{ns}:user:{uid}:blocked` | 300 s | — | `RL_USER_BLOCKED` |
| 1 | Namespace/IP sliding window | `rl:{ns}:ip:{ip}:w:{bucket}` | 120 s | 35 (`RATE_LIMITER_RATE`) | `RL_NS_WINDOW` |
| 1 | Path counter (`/media/`) | `rl:{ns}:path:{sha256}:reqs` | 60 s | — (observability only) | `RL_PATH_REQUESTS` |
| 1 | Path counter (`/static/`) | `rl:{ns}:path:{sha256}:reqs` | 60 s | — | *Future* (requires OpenResty/Lua) |
| 1 | UA deny list | `rl:bot:ua:blocked` | permanent SET | — (block on match) | `RL_UA_BLOCKLIST` |
| 2 | Django Channels | `channels:*` | session TTL | — | *Future* |
### What each counter catches — and misses
**`rl:ip:{ip}:reqs` — global rolling IP counter**
Catches: any sustained anonymous volume from a single IP regardless of namespace,
path, or User-Agent — pure request rate.
Misses: a user legitimately accessing several municipality SAPLs simultaneously;
their requests accumulate across namespaces into one global count and may trip the
threshold even though no individual SAPL is being abused. Also misses a
timing-aware scraper that paces exactly 34 req/min: the 60 s TTL resets from the
first request, so the attacker can safely send 34, wait for reset, repeat forever.
---
**`rl:ip:{ip}:blocked` — IP short-circuit marker**
Written when `rl:ip:{ip}:reqs` hits the anonymous threshold (step 4b) or when the
namespace/IP bucket hits the threshold (step 4c). Checked at step 2 — before any
counting — so a blocked IP never increments any counter on subsequent requests.
Catches: saves Redis INCR + EXPIRE calls for every request from an already-blocked
IP; the 300 s TTL is a hard cooldown regardless of how many requests arrive.
Misses: the TTL is fixed — a persistent attacker simply waits 300 s and gets
another full window quota. Also, because the key is global (no namespace), an IP
blocked for one municipal SAPL is blocked for all SAPLs on the same pod —
collateral effect for shared IPs.
---
**`rl:{ns}:ip:{ip}:w:{bucket}` — namespace-scoped clock-aligned bucket**
Catches: sustained scraping against a *specific* municipal SAPL that stays just
under the global threshold; a scraper pacing 34 req/min globally across namespaces
still accumulates in the per-namespace bucket. Clock alignment (bucket =
`time() // 60`) means a burst straddling a minute boundary still contributes to
the *next* bucket for 120 s (2× TTL), making precise timing attacks harder.
Misses: an IP that floods one namespace to exactly 34 req/min: it never reaches 35
in the bucket either. Cross-namespace legitimate traffic that happens to land
within the same clock minute — same blind spot as `rl:ip:*` but scoped lower.
**Why this key is namespace-scoped**
Five arguments for `rl:{ns}:ip:{ip}:w:{bucket}` over a global `rl:ip:{ip}:w:{bucket}`:
1. **Matches the observed attack pattern.** The botnet in §Bot Traffic Profile targets one SAPL at a time, not the fleet evenly. A scraper hammering `fortaleza-ce` at 34 req/min has a namespace counter of 34 and a global counter of 34. Without the namespace the two keys are redundant — the window adds no new signal. With it, a scraper that legitimately distributes across 5 SAPLs (7 req/min each, 35 globally) is caught globally but *not* per-SAPL — correct behaviour, since no single SAPL is being abused.
2. **Two counters defeat two different gaming strategies.** `rl:ip:{ip}:reqs` uses a rolling TTL (starts on the first INCR). A scraper that knows this can send 34 requests, wait ~61 s for the key to expire, and repeat indefinitely. The clock-aligned window resets at wall-clock minute boundaries. To game *both* simultaneously the attacker must time bursts to expire the rolling key *and* land entirely within one clock window — two independent constraints that are hard to satisfy together.
3. **Without the namespace it duplicates the global counter.** All pods share the same Redis. A global `rl:ip:{ip}:w:{bucket}` would aggregate that IP's traffic from every pod — exactly what `rl:ip:{ip}:reqs` already does, just with different reset timing. Two keys measuring the same dimension is wasted INCR overhead with no added signal.
4. **Multi-SAPL legitimate IPs are not penalised.** Municipal IT departments, ISP shared exit nodes, and Googlebot all produce high global request rates while being individually harmless to any one SAPL. A namespaced window lets them access 10 SAPLs at 3 req/min each without triggering a per-SAPL block, while the global counter still catches them if their total rate is abusive.
5. **Consistent with the established `{ns}` isolation contract.** All user-keyed (`rl:{ns}:user:{uid}:*`) and path-keyed (`rl:{ns}:path:{sha256}:reqs`) entries are namespace-scoped. A global window key would break the invariant that per-tenant data is isolated — complicating key-space inspection, `SCAN`-based dashboards, and future per-tenant rate adjustments.
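For clarity, the bucket computation all five arguments assume looks like the sketch below (illustrative; the middleware's key template is the `RL_NS_WINDOW` constant):
```python
# Sketch: clock-aligned window key for the namespace-scoped counter.
import time

RL_NS_WINDOW = 'rl:{ns}:ip:{ip}:w:{bucket}'

def ns_window_key(ns, ip, now=None):
    # All requests landing in the same wall-clock minute share one bucket.
    bucket = int(now if now is not None else time.time()) // 60
    return RL_NS_WINDOW.format(ns=ns, ip=ip, bucket=bucket)

# The key lives for 120 s (2× the window), so a burst straddling a minute
# boundary is still visible while the next bucket fills.
print(ns_window_key('fortaleza-ce', '203.0.113.7', now=1_700_000_000))
# -> rl:fortaleza-ce:ip:203.0.113.7:w:28333333
```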
---
**`rl:{ns}:user:{uid}:reqs` — authenticated user counter**
Catches: an authenticated account being used as a scraping credential — even if
the requests come from many different IPs (e.g., distributed proxy pool), all
requests share the same `uid` and accumulate in one counter.
Misses: a credential that is shared across multiple legitimate users in the same
office; all their activity adds up to one counter and can trip the 120/min
threshold during a busy session.
---
**`rl:{ns}:user:{uid}:blocked` — authenticated user short-circuit marker**
Written when `rl:{ns}:user:{uid}:reqs` hits the authenticated threshold (step 3c).
Checked at step 3a — before counting — so a blocked user never increments their
counter on subsequent requests during the 300 s cooldown.
Catches: credential-stuffing or runaway automation using a valid session — once the
120/min threshold is hit, the account is locked out immediately for 300 s. Unlike
the IP marker, the block is namespace-scoped, so the same user account can be
blocked on one SAPL but still active on another.
Misses: same fixed-TTL weakness as the IP marker — a persistent attacker resumes
after 300 s. An account shared by multiple legitimate users (e.g., a departmental
login) can be locked out during peak collaborative use.
---
**`rl:{ns}:path:{sha256}:reqs` — per-media-file URL counter**
Currently observability-only (no threshold enforced). Intended for future
hot-file detection: a single document being hammered by many IPs would show
a spike in this counter even if no individual IP exceeds the IP threshold.
Misses: nothing is blocked today. Once a threshold is added, it will miss
distributed access where many IPs each download the file once (legitimate CDN
pre-warming or public interest event).
---
**`rl:bot:ua:blocked` — runtime UA deny list**
Catches: new bot UA tokens added at runtime via `redis-cli SADD` without a code
deploy; picked up within `RATE_LIMITER_UA_BLOCKLIST_REFRESH` seconds (default 60)
per worker. Complements the hardcoded `BOT_UA_FRAGMENTS` Python list.
Misses: bots that rotate UA tokens on every request (no single token accumulates);
bots that impersonate a valid browser UA completely (no known fragment to match).
---
## Dynamic page caching
**Goal**: Eliminate ORM queries for anonymous bot requests on list views.
**Prerequisite**: Phase 1 (shared Redis, `CACHE_BACKEND=redis`).
Many SAPL list views (`pesquisar-materia`, `norma`, etc.) are not truly dynamic for anonymous users between edits. A bot hammering `?page=1` through `?page=100` triggers 100 ORM queries per pod. With Redis page cache, each unique URL is queried once per TTL across the entire fleet.
```python
# Apply to anonymous list views only — AnonCachePageMixin is already wired
# to materia/sessao detail views.
from django.utils.decorators import method_decorator
from django.views.decorators.cache import cache_page
from django_filters.views import FilterView  # SAPL list views build on django-filter


@method_decorator(cache_page(60 * 5), name='dispatch')  # 5-minute TTL
class PesquisarMateriaView(FilterView):
    ...
```
> **Safety check**: `cache_page` sets `Cache-Control: private` for authenticated sessions automatically.
> Verify this is working before deploying — accidentally caching a session-aware response is a data leak.
### Cache TTL guidelines
| View type | TTL | Reasoning |
|-----------|-----|-----------|
| Matéria list (anonymous) | 300 s | Changes infrequently between sessions |
| Norma list (anonymous) | 300 s | Same |
| Parlamentar list | 3600 s | Changes rarely |
| Search results | 60 s | Query-dependent; shorter TTL safer |
| Authenticated views | Never | `cache_page` respects this automatically |
| PDF generation | Never | Too large — serve from disk via nginx |
---
## Open Questions
| # | Question | Status | Blocks |
|---|----------|--------|--------|
| 1 | Does Chrome/98.0.4758 impersonator appear consistently in nginx access logs? | Needs investigation | UA block safety |
| 2 | Which legislative house IPs can be pre-whitelisted in `RATE_LIMIT_WHITELIST_IPS`? | No list yet — obtain in the future. Setting is **optional / future**. | Enforcement safety for NAT users |
| 3 | `CONN_MAX_AGE` tuning | Currently **300 s** (`sapl/settings.py`). Evaluate whether to reduce given worker recycling at 400 MB. | Gunicorn tuning |
| 4 | WebSocket voting panel priority | Separate project. Resumes after Redis is on k8s, bot siege addressed, and OOM pressure reduced. | Phase 5 sequencing |

plan/RATE_LIMITER_PLAN.md (474)

@@ -1,474 +0,0 @@
# SAPL — Kubernetes Redis
Manifests for the shared Redis instance used by all SAPL pods for
cross-pod rate limiting (DB 1) and view/static-file caching (DB 0).
---
## Directory layout
```
docker/k8s/
└── redis/
    ├── redis-configmap.yaml   # redis.conf — no persistence, allkeys-lru, 5 GB ceiling
    ├── redis-deployment.yaml  # Deployment (1 replica, redis:7-alpine)
    └── redis-service.yaml     # ClusterIP service on port 6379
```
---
## Prerequisites
- `kubectl` configured to talk to the target cluster.
- A `sapl-redis` namespace (created below if it doesn't exist).
---
## Deploy
```bash
# 1. Create the namespace (idempotent)
rancher kubectl create namespace sapl-redis --dry-run=client -o yaml | rancher kubectl apply -f -
# 2. Apply all three manifests
rancher kubectl apply -f docker/k8s/redis/redis-configmap.yaml
rancher kubectl apply -f docker/k8s/redis/redis-deployment.yaml
rancher kubectl apply -f docker/k8s/redis/redis-service.yaml
# 3. Verify the pod is Running
rancher kubectl -n sapl-redis get pods -l app=sapl-redis
```
Expected output:
```
NAME READY STATUS RESTARTS AGE
sapl-redis-6d9f8b7c4d-xk2lm 1/1 Running 0 30s
```
---
## Verify the rate limiter
`scripts/test_ratelimiter.py` fires repeated GET requests at a SAPL URL and reports
when the first 429 is returned.
### Usage
```
python scripts/test_ratelimiter.py <URL> [-n NUM] [-d DELAY] [-t TIMEOUT]
```
| Flag | Default | Meaning |
|------|---------|---------|
| `url` | *(required)* | Full URL including scheme, e.g. `http://localhost` |
| `-n`, `--num-requests` | `50` | Maximum requests to send |
| `-d`, `--delay` | `0.1` | Seconds between requests |
| `-t`, `--timeout` | `10` | Per-request timeout in seconds |
The script stops and prints a summary as soon as a 429 is received.
### Examples
```bash
# Hit the anonymous threshold (35 req/min) — fire 40 requests with minimal delay
python scripts/test_ratelimiter.py http://localhost -n 40 -d 0.05
# Slower fire — check that legitimate traffic is not rate-limited
python scripts/test_ratelimiter.py http://localhost -n 20 -d 2
# Test against a staging pod via port-forward
rancher kubectl port-forward -n <NAMESPACE> deploy/sapl 8080:80 &
python scripts/test_ratelimiter.py http://localhost:8080 -n 40 -d 0.05
```
### Reading the output
```
Request 1: Status 200 | Time: 0.045s
...
Request 36: Status 429 | Time: 0.038s
-> Rate limited on request 36
Summary:
Total requests attempted: 36
Successful (200): 35
Rate limited (429): 1
First 429 occurred at request: 36
```
A first-429 near the configured anonymous threshold (35 req/min) confirms the
middleware is wired correctly. A first-429 much earlier points to nginx `limit_req`
firing before Django sees the request.
---
## Inject REDIS_URL into SAPL instances
`REDIS_URL` points at the shared instance:
```
redis://redis.sapl-redis.svc.cluster.local:6379
        ^^^^^ ^^^^^^^^^^
        svc   namespace
```
`start.sh` picks it up on every pod startup and sets the `REDIS_CACHE` waffle switch
automatically — no further intervention needed.
### Fleet-wide rollout
Uses the `app.kubernetes.io/name=sapl` pod label to discover every SAPL namespace
automatically — onboarding a new municipality requires no script changes.
```bash
for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
-o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
rancher kubectl set env deployment/sapl \
REDIS_URL=redis://redis.sapl-redis.svc.cluster.local:6379 \
-n $ns
done
```
### Roll back
```bash
for ns in $(rancher kubectl get pods -A -l app.kubernetes.io/name=sapl \
-o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
rancher kubectl set env deployment/sapl REDIS_URL- -n $ns
done
```
`kubectl set env deployment/sapl REDIS_URL-` (trailing `-`) removes the variable.
`start.sh` then falls back to file-based cache automatically.
---
## Monitor
### Pod and events
```bash
# Pod status
rancher kubectl -n sapl-redis get pods -l app=sapl-redis -o wide
# Deployment events (useful right after apply)
rancher kubectl -n sapl-redis describe deployment sapl-redis
# Pod events (OOMKill, restarts, etc.)
rancher kubectl -n sapl-redis describe pod -l app=sapl-redis
```
### Logs
```bash
# Tail live logs
rancher kubectl -n sapl-redis logs -f deploy/sapl-redis
# Last 100 lines
rancher kubectl -n sapl-redis logs deploy/sapl-redis --tail=100
```
### Redis INFO
```bash
# Memory usage
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
redis-cli info memory \
| grep -E 'used_memory_human|maxmemory_human|mem_fragmentation_ratio'
# Connection pressure
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
redis-cli info stats \
| grep -E 'rejected_connections|instantaneous_ops_per_sec'
# Key distribution per DB
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli info keyspace
# Recent slow queries
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli slowlog get 10
# Live command sampling (1-second window)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli --latency-history -i 1
```
### Rate-limiter keys (DB 1)
```bash
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
redis-cli -n 1 dbsize
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- \
redis-cli -n 1 --scan --pattern 'rl:ip:*' | head -20
```
---
## Seed the UA deny list (once after first deploy)
`rl:bot:ua:blocked` is a permanent Redis SET in DB 1. Each member is the
SHA-256 of a **UA token** — the identifying fragment extracted after splitting
on `/`, spaces, `;`, `(`, `)`, e.g.:
```
UA string: "GPTBot/1.1 (+https://openai.com/gptbot)"
Tokens: GPTBot 1.1 +https: ...
Hash stored: sha256("GPTBot")
```
The middleware (`_is_redis_blocked_ua`) tokenises the incoming UA the same
way and checks each token hash against the cached set. The SET is fetched
from Redis at most once per `RATE_LIMITER_UA_BLOCKLIST_REFRESH` seconds (default 60)
per worker process.
The bots in `BOT_UA_FRAGMENTS` (Python list, always active) and this Redis
SET are **independent** — the Python list provides the baseline and the Redis
SET allows adding new offenders at runtime **without a code deploy**.
```bash
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
SADD rl:bot:ua:blocked \
"$(echo -n 'GPTBot' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'ClaudeBot' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'PerplexityBot' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'Bytespider' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'AhrefsBot' | sha256sum | cut -d' ' -f1)" \
"$(echo -n 'meta-externalagent' | sha256sum | cut -d' ' -f1)"
# Add a new offender at runtime (picked up within RATE_LIMITER_UA_BLOCKLIST_REFRESH seconds)
rancher kubectl exec -n sapl-redis deploy/sapl-redis -- redis-cli -n 1 \
SADD rl:bot:ua:blocked "$(echo -n 'NewBot' | sha256sum | cut -d' ' -f1)"
```
---
## Local standalone Redis (development / testing)
No Kubernetes? Run Redis directly with Docker:
```bash
sudo docker run --rm -p 6379:6379 redis:7-alpine \
redis-server --save "" --appendonly no
```
Then point Django at it by exporting the env var before starting the dev server:
```bash
export REDIS_URL="redis://localhost:6379"
export CACHE_BACKEND="redis"
python manage.py runserver
```
Or add them to your local `.env` file:
```
REDIS_URL=redis://localhost:6379
CACHE_BACKEND=redis
```
> **Note**: the waffle switch `REDIS_CACHE` must also be `on` in your local
> database for `start.sh` to activate the Redis backend. Run:
> ```bash
> python manage.py waffle_switch REDIS_CACHE on --create
> ```
---
## Update `redis.conf` without redeploying
```bash
# Edit the ConfigMap
rancher kubectl -n sapl-redis edit configmap redis-config
# Restart the pod to pick up the new config
rancher kubectl -n sapl-redis rollout restart deployment/sapl-redis
```
---
## Rate limiting — two layers, two jobs
SAPL enforces rate limits at two independent layers. They use different
algorithms and protect different things; their thresholds must be tuned
separately.
### Layer 1 — nginx `limit_req` (leaky bucket)
Defined in `docker/config/nginx/nginx.conf` (zones) and `sapl.conf` (burst).
```
sapl_general rate=30r/m # 1 token every 2 s
sapl_heavy rate=10r/m # 1 token every 6 s (PDF/report endpoints)
```
`burst=N nodelay` means nginx accepts up to N requests instantly above the
current token level, then enforces the drip rate. Requests beyond the burst
cap return 429 before reaching Gunicorn — **zero Python cost**.
Burst values are set at container startup via env vars:
| Env var | Default | Location |
|---------|---------|----------|
| `NGINX_BURST_GENERAL` | `60` | `location /`, `location /media/` |
| `NGINX_BURST_API` | `60` | `location /api/` |
| `NGINX_BURST_HEAVY` | `20` | `location /relatorios/` |
Defaults are 2× the zone's per-minute rate, so a user can spend a full
minute's quota in a single burst before the leaky bucket takes over.
### Layer 2 — Django `RateLimitMiddleware` (sliding window)
Defined in `sapl/middleware/ratelimit.py`, backed by Redis DB 1.
Requests that pass nginx reach Python. The middleware counts them in a
60-second sliding window per IP (anonymous) or per user (authenticated):
| Env var | Default | Scope |
|---------|---------|-------|
| `RATE_LIMITER_RATE` | `35/m` | Anonymous IP |
| `RATE_LIMITER_RATE_AUTHENTICATED` | `120/m` | Authenticated user |
| `RATE_LIMITER_RATE_BOT` | `5/m` | *(reserved — bots are currently blocked outright, not counted)* |
| `RATE_LIMITER_UA_BLOCKLIST_REFRESH` | `60` s | How often each worker re-fetches `rl:bot:ua:blocked` from Redis |
When the window count hits the threshold the IP/user is written to a Redis
blocked-set with a 300 s TTL and subsequent requests return 429 with
`Retry-After: 300` — without touching the database.
Decision flow inside `RateLimitMiddleware._evaluate()`:
```
1. IP in whitelist? → pass (no further checks)
1a. UA matches BOT_UA_FRAGMENTS list? → 429 reason=known_ua
1b. UA token hash in rl:bot:ua:blocked SET? → 429 reason=redis_ua
2. IP in rl:ip:{ip}:blocked? → 429 reason=ip_blocked
3. Authenticated user?
3a. User in rl:{ns}:user:{uid}:blocked? → 429 reason=user_blocked
3b. Suspicious headers (no Accept/AL)? → 429 reason=suspicious_headers_auth
3c. User request count ≥ auth threshold? → SET blocked, 429 reason=auth_user_rate
4. Anonymous:
4a. Suspicious headers? → 429 reason=suspicious_headers
4b. IP request count ≥ anon threshold? → SET blocked, 429 reason=ip_rate
4c. NS/IP window count ≥ anon threshold? → SET blocked, 429 reason=ua_rotation
→ pass
```
### Why they are not the same number
| | nginx burst | Django threshold |
|-|------------|-----------------|
| **Algorithm** | Leaky bucket — token refills over time | Sliding window — hard count per 60 s |
| **Protects** | Gunicorn workers from being flooded | Per-client fairness, business policy |
| **Tuned by** | Capacity of the server | Acceptable request volume per client |
| **Failure mode** | Workers overwhelmed | Legitimate user over-browsing |
A user loading a page quickly may fire 5–10 Django requests in two seconds.
With `rate=30r/m` (1 token/2 s) and `burst=60` they absorb that fine; the
leaky bucket refills before they click the next link. The Django threshold
(35/m sliding window) catches sustained automated traffic from a single IP
that looks like scraping even if it arrives slowly enough to beat the nginx
burst cap.
---
## Request routing — how nginx reaches Django
`proxy_pass http://sapl_server` forwards the HTTP request — with the original
path intact — to the Gunicorn Unix socket. Django doesn't know or care that
nginx is in front; it sees a standard HTTP request.
```
GET /media/foo.pdf
  nginx (sapl.conf)
    location /media/ → proxy_pass to Unix socket
  Gunicorn (WSGI server)
    receives raw HTTP, calls Django WSGI application
  Django middleware stack (settings.MIDDLEWARE)
    RateLimitMiddleware → pass or 429
  Django URL router (sapl/urls.py)
    r'^media/(?P<path>.*)$' → serve_media
  serve_media(request, path='foo.pdf')
    returns HttpResponse with X-Accel-Redirect: /_accel/media/foo.pdf
  nginx sees X-Accel-Redirect header
    /_accel/media/ internal location → reads file from disk → sends to client
```
nginx does no routing beyond picking a `location` block. The mapping from
URL path to Python function lives entirely in `sapl/urls.py`. `proxy_pass` is
just a pipe.
---
## Media file serving — `serve_media` and X-Accel-Redirect
All `/media/` requests (public and private) are routed through Gunicorn so that
Django middleware runs on every hit. Nginx serves the file bytes via
`X-Accel-Redirect` — the Gunicorn worker is freed as soon as it sends the
response headers.
### nginx locations (`docker/config/nginx/sapl.conf`)
```nginx
# Proxied to Gunicorn — Django middleware + serve_media() run here.
location /media/ {
    limit_req zone=sapl_general burst=${NGINX_BURST_GENERAL} nodelay;
    proxy_pass http://sapl_server;
}

# Internal — only reachable via X-Accel-Redirect, not by external clients.
location /_accel/media/ {
    internal;
    alias /var/interlegis/sapl/media/;
    sendfile on;
    etag on;
}
```
### Django view (`sapl/base/media.py`)
`serve_media(request, path)` — registered at `^media/(?P<path>.*)$` in `sapl/urls.py`.
Per-request steps:
1. **Path traversal guard** — `os.path.abspath` check; raises 404 on escape.
2. **Auth gate** — `documentos_privados/` paths require an authenticated session; redirects to login otherwise.
3. **Path counter** — increments `rl:{ns}:path:{sha256}:reqs` in Redis DB 1 (TTL = `MEDIA_PATH_COUNTER_TTL`).
4. **Content-type cache** — reads `file:{ns}:{sha256}` from Django default cache (DB 0); on miss, calls `mimetypes.guess_type`, stores result (TTL = `MEDIA_FILE_CACHE_TTL`).
5. **Serve** — in DEBUG: `django.views.static.serve` directly. In production: `X-Accel-Redirect: /_accel/media/<path>`.
### Settings
| Setting | Default | Purpose |
|---------|---------|---------|
| `FILE_META_KEY` | `'file:{ns}:{sha256}'` | Key template for content-type cache (DB 0) |
| `MEDIA_PATH_COUNTER_TTL` | `60` s | Per-path counter window |
| `MEDIA_FILE_CACHE_TTL` | `3600` s | Content-type metadata TTL |
---
## Key schema reference
| DB | Use case | Key pattern | TTL | Constant |
|----|----------|-------------|-----|----------|
| 0 | Page / view cache | `cache:{ns}:*` | 300 s (default) | `CACHES['default']` KEY_PREFIX |
| 0 | Static file cache (logos) | `static:{ns}:{sha256}` | 3 – 24 h | *Future* (requires OpenResty/Lua) |
| 0 | Media file content-type cache | `file:{ns}:{sha256}` | 1 h | `FILE_META_KEY` |
| 1 | IP rate-limit counter | `rl:ip:{ip}:reqs` | 60 s | `RL_IP_REQUESTS` |
| 1 | IP blocked marker | `rl:ip:{ip}:blocked` | 300 s | `RL_IP_BLOCKED` |
| 1 | User rate-limit counter | `rl:{ns}:user:{uid}:reqs` | 60 s | `RL_USER_REQUESTS` |
| 1 | User blocked marker | `rl:{ns}:user:{uid}:blocked` | 300 s | `RL_USER_BLOCKED` |
| 1 | Namespace/IP sliding window | `rl:{ns}:ip:{ip}:w:{bucket}` | 120 s | `RL_NS_WINDOW` |
| 1 | Path counter (`/media/`) | `rl:{ns}:path:{sha256}:reqs` | 60 s | `RL_PATH_REQUESTS` |
| 1 | Path counter (`/static/`) | `rl:{ns}:path:{sha256}:reqs` | 60 s | *Future* (requires OpenResty/Lua) |
| 1 | UA deny list | `rl:bot:ua:blocked` | permanent SET | `RL_UA_BLOCKLIST` |
| 2 | Django Channels | `channels:*` | session TTL | *Future* |

sapl/base/media.py (40)

@@ -4,34 +4,26 @@ serve_media — X-Accel-Redirect gate for all /media/ files.
 Production flow (nginx proxies /media/ to Gunicorn):
 1. Django middleware runs (IP rate-limit, bot UA check, etc.).
 2. serve_media() runs auth check for documentos_privados/, writes
-   per-path counter to Redis DB 1, caches content-type in Redis DB 0.
-3. Returns an empty 200 with X-Accel-Redirect pointing to the nginx
-   internal location /_accel/media/<path>. Nginx serves the bytes
-   directly from disk — Gunicorn worker is freed immediately.
+   URL-path counter to Redis DB 1, then returns X-Accel-Redirect.
+   Nginx serves the bytes directly from disk — Gunicorn worker freed immediately.
 Development flow (DEBUG=True, nginx absent):
 Falls back to django.views.static.serve for live file serving.
-Redis side-effects per request:
-DB 1 rl:{ns}:path:{sha256}:reqs per-path access counter, TTL=MEDIA_PATH_COUNTER_TTL
-DB 0 file:{ns}:{sha256} content-type metadata, TTL=MEDIA_FILE_CACHE_TTL
-(sha256 is of the URL path, e.g. sha256('/media/2024/01/doc.pdf'))
-Key template: FILE_META_KEY (sapl/middleware/ratelimit.py); TTLs in sapl/settings.py
+Redis side-effects per request (DB 1, TTL=MEDIA_PATH_COUNTER_TTL):
+rl:{ns}:path:{sha256('/media/<path>')}:reqs URL-path access counter
 """
 import hashlib
-import mimetypes
 import os
 from django.conf import settings
-from django.core.cache import caches
 from django.http import Http404, HttpResponse
 from django.views.static import serve
 from sapl import settings as sapl_settings
 from sapl.middleware.ratelimit import (
     _NAMESPACE,
-    FILE_META_KEY,
     RL_PATH_REQUESTS,
     _incr_with_ttl,
 )
@@ -65,31 +57,23 @@ def serve_media(request, path):
         from django.contrib.auth.views import redirect_to_login
         return redirect_to_login(request.get_full_path())
-    # Per-path rate counter (DB 1) — key uses URL path so that storage
-    # location changes in the next PR don't reset existing counters.
-    path_hash = hashlib.sha256(f'/media/{path}'.encode()).hexdigest()
+    # 404 before writing any counters.
+    if not os.path.isfile(abs_path):
+        raise Http404
+    # URL-path counter (DB 1).
     _incr_with_ttl(
-        RL_PATH_REQUESTS.format(ns=_NAMESPACE, sha256=path_hash),
+        RL_PATH_REQUESTS.format(ns=_NAMESPACE, sha256=hashlib.sha256(f'/media/{path}'.encode()).hexdigest()),
         ttl=sapl_settings.MEDIA_PATH_COUNTER_TTL,
     )
-    # Content-type metadata cache (DB 0) — avoids mimetypes.guess_type
-    # and os.path.isfile on every hit for hot files.
-    file_key = FILE_META_KEY.format(ns=_NAMESPACE, sha256=path_hash)
-    content_type = caches['default'].get(file_key)
-    if content_type is None:
-        if not os.path.isfile(abs_path):
-            raise Http404
-        guessed, _ = mimetypes.guess_type(abs_path)
-        content_type = guessed or 'application/octet-stream'
-        caches['default'].set(file_key, content_type, timeout=sapl_settings.MEDIA_FILE_CACHE_TTL)
     if settings.DEBUG:
         # Development: no nginx present; serve the file directly.
         return serve(request, path, document_root=settings.MEDIA_ROOT)
     # Production: tell nginx to serve the file from the internal location.
-    response = HttpResponse(content_type=content_type)
+    # Nginx sets Content-Type from its own mime.types when serving the file.
+    response = HttpResponse()
     response['X-Accel-Redirect'] = f'/_accel/media/{path}'
     response['Cache-Control'] = 'public, max-age=86400, stale-while-revalidate=3600'
     response['X-Robots-Tag'] = 'noindex'

sapl/middleware/ratelimit.py (9)

@@ -5,6 +5,7 @@ Decision flow (per request):
 1. Known bot UA? 429 (Python list substring match)
 1b. Redis UA deny list? 429 (runtime SET token hash match, refreshed every 60 s)
 2. IP in blocked set? 429
+2b. Path extension in scanner set? SET RL_IP_BLOCKED, 429
 3. Authenticated user?
 a. User blocked? 429
 b. Suspicious hdrs? 429
@@ -27,6 +28,7 @@ no per-request lookup is needed or correct.
 import hashlib
 import logging
+import os
 import re
 import time
@@ -55,7 +57,6 @@ RL_USER_BLOCKED = 'rl:{ns}:user:{uid}:blocked'
 RL_NS_WINDOW = 'rl:{ns}:ip:{ip}:w:{bucket}'
 RL_PATH_REQUESTS = 'rl:{ns}:path:{sha256}:reqs'
 RL_UA_BLOCKLIST = 'rl:bot:ua:blocked'  # permanent SET — runtime UA deny list
-FILE_META_KEY = 'file:{ns}:{sha256}'  # content-type metadata cache (DB 0)
 # ---------------------------------------------------------------------------
 # Bot UA fragments
@@ -260,6 +261,12 @@ class RateLimitMiddleware:
         if self._rl_cache.get(RL_IP_BLOCKED.format(ip=ip)):
             return {'action': 'block', 'reason': 'ip_blocked', 'ip': ip}
+        # Check 2b: scanner probe (e.g. .php, .asp) — Django never serves these.
+        ext = os.path.splitext(request.path)[1].lower()
+        if ext in settings.RATE_LIMIT_SCANNER_EXTENSIONS:
+            self._rl_cache.set(RL_IP_BLOCKED.format(ip=ip), 1, timeout=self.BLOCK_TTL)
+            return {'action': 'block', 'reason': 'scanner_probe', 'ip': ip}
         user = getattr(request, 'user', None)
         if user is not None and user.is_authenticated:
             return self._evaluate_authenticated(request, ip)

sapl/settings.py (15)

@@ -415,9 +415,20 @@ RATE_LIMIT_WHITELIST_IPS = config(
 # Lower values pick up new blocked UAs faster; higher values reduce Redis round-trips.
 RATE_LIMITER_UA_BLOCKLIST_REFRESH = config('RATE_LIMITER_UA_BLOCKLIST_REFRESH', default=60, cast=int)
+# File extensions that indicate a scanner probe (e.g. PHP/ASP app fingerprinting).
+# Requests for these extensions are blocked immediately and the IP is written to
+# rl:ip:{ip}:blocked for BLOCK_TTL seconds — Django never legitimately serves them.
+RATE_LIMIT_SCANNER_EXTENSIONS = frozenset(
+    config(
+        'RATE_LIMIT_SCANNER_EXTENSIONS',
+        default='.php .asp .aspx .jsp .cgi .env',
+        cast=lambda v: [x.strip() for x in v.split() if x.strip()],
+    )
+)
 # Media file serving — serve_media (sapl/base/media.py) via X-Accel-Redirect.
-MEDIA_PATH_COUNTER_TTL = config('MEDIA_PATH_COUNTER_TTL', default=60, cast=int)  # seconds — per-path counter window
-MEDIA_FILE_CACHE_TTL = config('MEDIA_FILE_CACHE_TTL', default=3600, cast=int)  # seconds — content-type metadata TTL
+# TTL for both URL-path and storage-path access counters (DB 1).
+MEDIA_PATH_COUNTER_TTL = config('MEDIA_PATH_COUNTER_TTL', default=60, cast=int)
 # ---------------------------------------------------------------------------
 # Anonymous page caching — AnonCachePageMixin (sapl/middleware/page_cache.py)
