Sistema de Apoio ao Processo Legislativo
SAPL — Work Queues & Real-Time: Async PDF + WebSocket Voting

Status: Planned follow-up mini-project.
Prerequisite: Redis A (cache + rate-limiter pod, rate-limiter-2026 branch) must be deployed to production, stable, and OOM pressure confirmed reduced before starting this work.
Scope: Django 2.2 / Gunicorn / Celery / Django Channels — same fleet of 1,200+ pods.


Table of Contents

  1. Context & Motivation
  2. Redis Topology for Work Queues
  3. Phase 1 — Async PDF via Celery
  4. Phase 2 — Django Channels (WebSocket Voting Panel)
  5. Open Questions

1. Context & Motivation

After rate-limiter-2026 ships:

| Remaining pain point | Current behaviour | Target |
|---|---|---|
| PDF generation | Holds a Gunicorn worker thread for the full build duration (10–60 s); workers are capped at 400 MB, so one PDF request burns a worker slot for up to a minute | Enqueue via Celery; respond 202 immediately; the worker is freed |
| WebSocket voting panel | Not implemented; councillors use a polling page | Persistent connection via Django Channels backed by Redis |

2. Redis Topology for Work Queues

Critical constraint: Celery broker must be a separate Redis instance (Redis B) with noeviction policy.
Redis A (cache + rate-limiter) uses allkeys-lru — tasks enqueued there would be silently evicted under memory pressure, causing jobs to vanish without error.

| Instance | Role | Eviction policy | Persistence |
|---|---|---|---|
| Redis A (existing) | Page cache (DB0), rate limiter (DB1), Django Channels (DB2) | allkeys-lru | none |
| Redis B (new) | Celery broker + result backend | noeviction | AOF + RDB snapshot |
```yaml
# docker/k8s/redis-celery-configmap.yaml
data:
  redis.conf: |
    # Never evict tasks: a full broker must error out, not drop jobs.
    # (redis.conf does not allow trailing comments on directive lines,
    # and `save` takes two bare arguments, not a quoted string.)
    maxmemory-policy noeviction
    # AOF persistence on, plus an RDB snapshot every 15 min if ≥1 change.
    appendonly yes
    save 900 1
    # DB0 = broker queue, DB1 = result backend.
    databases 2
```
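Pointing Celery at Redis A by mistake would reproduce exactly the silent-eviction failure described above, so a startup guard is cheap insurance. A minimal sketch using redis-py (the function names and where the check is called from are assumptions, not existing SAPL code):

```python
def broker_policy_is_safe(policy: str) -> bool:
    """A Celery broker must never evict queued tasks."""
    return policy == "noeviction"


def check_broker(url: str) -> None:
    """Fail fast at startup if the broker Redis is misconfigured."""
    import redis  # redis-py; imported lazily so the guard stays optional

    client = redis.Redis.from_url(url, decode_responses=True)
    policy = client.config_get("maxmemory-policy")["maxmemory-policy"]
    if not broker_policy_is_safe(policy):
        raise RuntimeError(
            "Celery broker uses '%s'; queued tasks would be silently "
            "evicted under memory pressure" % policy
        )
```

Calling `check_broker(CELERY_BROKER_URL)` once at worker boot turns a silent data-loss mode into a loud deploy failure.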

3. Phase 1 — Async PDF via Celery

3.1 Current (synchronous) flow

Holds worker memory for the entire PDF build:

```mermaid
sequenceDiagram
    participant B as Browser
    participant G as Gunicorn worker
    participant ORM as PostgreSQL
    participant RL as ReportLab

    B->>G: GET /pdf/materia/12345
    G->>ORM: N+1 queries (get_etiqueta_protocolos)
    ORM-->>G: data
    G->>RL: build entire PDF in RAM
    RL-->>G: PDF bytes (held in worker memory)
    G-->>B: stream response
    note over G: worker blocked for full duration
```

3.2 Target (async) flow

Worker freed immediately after enqueueing:

```mermaid
sequenceDiagram
    participant B as Browser
    participant G as Gunicorn worker
    participant Q as Redis B (Celery queue)
    participant W as Celery worker
    participant D as Disk / nginx

    B->>G: POST /pdf/materia/12345
    G->>Q: enqueue task
    G-->>B: 202 Accepted + task_id
    W->>W: build PDF (out of band)
    W->>D: write PDF to /media/pdf/task_id.pdf
    B->>G: GET /pdf/status/task_id
    G-->>B: 302 → nginx /media/pdf/task_id.pdf
```

3.3 Celery settings

```python
# sapl/settings.py additions
CELERY_BROKER_URL     = config('CELERY_BROKER_URL',     default='')
CELERY_RESULT_BACKEND = config('CELERY_RESULT_BACKEND', default='')

# Recycle a Celery worker child once it exceeds 400 MB resident memory,
# keeping Celery workers inside the same memory envelope as the Gunicorn
# workers (400 MB cap).
CELERY_WORKER_MAX_MEMORY_PER_CHILD = 400 * 1024  # KiB (= 400 MB)
CELERY_TASK_SOFT_TIME_LIMIT = 120   # seconds; raises SoftTimeLimitExceeded
CELERY_TASK_TIME_LIMIT      = 180   # seconds; the worker child is SIGKILLed
```

3.4 k8s manifests

New files to be created under docker/k8s/:

  • redis-celery-configmap.yaml — Redis B config (noeviction, AOF)
  • redis-celery-deployment.yaml — single-replica Redis B pod
  • redis-celery-service.yaml — ClusterIP service
  • celery-deployment.yaml — Celery worker deployment (same image as SAPL)
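For orientation, celery-deployment.yaml could look roughly like this. Every label, image tag, and resource number below is an assumption to be settled during review, not a decided value:

```yaml
# docker/k8s/celery-deployment.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sapl-celery
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sapl-celery
  template:
    metadata:
      labels:
        app: sapl-celery
    spec:
      containers:
        - name: celery
          image: interlegis/sapl:latest          # same image as the SAPL pods
          # Celery 4 CLI order (Django 2.2 era); Celery 5 would be
          # `celery -A sapl worker`.
          command: ["celery", "worker", "-A", "sapl", "--concurrency=2"]
          envFrom:
            - secretRef:
                name: sapl-env                   # carries CELERY_BROKER_URL etc.
          resources:
            limits:
              memory: 512Mi                      # headroom over the 400 MB child cap
```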

3.5 Environment variables (per-namespace Secret)

| Variable | Example value | Notes |
|---|---|---|
| CELERY_BROKER_URL | redis://sapl-redis-celery.redis.svc:6379/0 | Redis B, DB0 |
| CELERY_RESULT_BACKEND | redis://sapl-redis-celery.redis.svc:6379/1 | Redis B, DB1 |

4. Phase 2 — Django Channels (WebSocket Voting Panel)

Uses Redis A DB2 (reserved in the existing key-layout table — no new infra needed beyond what ships in rate-limiter-2026).

4.1 Channel layer settings

```python
# sapl/settings.py additions
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            # DB2 is reserved for Channels (see rate-limiter-v2.md §0.2).
            # channels_redis has no top-level "db" option; the database
            # is selected via the host URL instead.
            "hosts": ["redis://sapl-redis.redis.svc.cluster.local:6379/2"],
            "capacity": 1500,
            "expiry": 10,
        },
    }
}
```

4.2 Prerequisites before starting

  • Redis A stable in production (rate limiter + cache confirmed working)
  • OOM kill rate reduced to near-zero
  • Bot siege resolved (Phase 0–2 metrics reviewed)
  • Decision on ASGI server (Daphne vs Uvicorn + channels) — Gunicorn alone cannot serve WebSockets
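Whichever ASGI server is chosen, the routing shell looks the same. A sketch assuming Channels 2 (the Django 2.2 era release; Channels 3 would route `PainelVotacaoConsumer.as_asgi()` instead), with the consumer module and URL pattern purely illustrative:

```python
# sapl/routing.py (sketch)
from channels.auth import AuthMiddlewareStack
from channels.routing import ProtocolTypeRouter, URLRouter
from django.urls import re_path

from sapl.consumers import PainelVotacaoConsumer  # hypothetical module

application = ProtocolTypeRouter({
    # Plain HTTP keeps flowing through Gunicorn/WSGI; only WebSocket
    # connections are handled by the ASGI server (Daphne or Uvicorn).
    "websocket": AuthMiddlewareStack(
        URLRouter([
            re_path(r"^ws/votacao/(?P<sessao_id>\d+)/$", PainelVotacaoConsumer),
        ])
    ),
})
```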

5. Open Questions

| # | Question | Blocks |
|---|---|---|
| 1 | Which PDF endpoints are highest priority for async migration (/relatorios/, /materia/pdf/, other)? | Phase 1 scope |
| 2 | Should the Celery worker run in the same pod as Gunicorn (sidecar) or in a dedicated deployment? | Phase 1 k8s design |
| 3 | Result-backend TTL: how long should generated PDFs be retained before cleanup? | Phase 1 storage design |
| 4 | ASGI server selection for Channels (Daphne vs Uvicorn) | Phase 2 |
| 5 | WebSocket voting panel: is per-session or per-pod state acceptable? | Phase 2 architecture |

Planned work — begins after rate-limiter-2026 is stable in production.