mirror of https://github.com/interlegis/sapl.git
11 changed files with 975 additions and 219 deletions
@ -0,0 +1,151 @@

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

SAPL (Sistema de Apoio ao Processo Legislativo) is a Django-based legislative management system used by Brazilian municipal and state legislative houses. It manages bills, parliamentary sessions, committees, norms, protocols, and related legislative workflows.

## Design references

@/Users/eribeiro/projects/sapl-docs/rfc/files-metafields-impl.md

## Commands

### Development

```bash
# Run dev server
python manage.py runserver

# Docker (dev, without bundled DB)
docker-compose -f docker/docker-compose-dev.yml up

# Docker (dev, with PostgreSQL container)
docker-compose -f docker/docker-compose-dev-db.yml up
```

### Database Setup (local PostgreSQL)

```bash
sudo -u postgres psql -c "CREATE ROLE sapl LOGIN ENCRYPTED PASSWORD 'sapl' NOSUPERUSER INHERIT CREATEDB NOCREATEROLE NOREPLICATION;"
sudo -u postgres psql -c "CREATE DATABASE sapl WITH OWNER=sapl ENCODING='UTF8' LC_COLLATE='pt_BR.UTF-8' LC_CTYPE='pt_BR.UTF-8' CONNECTION LIMIT=-1 TEMPLATE template0;"
python manage.py migrate
```

### Testing

```bash
# All tests (reuses the test DB by default for speed)
pytest

# Single test file or test function
pytest sapl/materia/tests/test_materia.py
pytest sapl/materia/tests/test_materia.py::test_function_name

# Force DB recreation
pytest --create-db

# With coverage
pytest --cov=sapl
```

Tests require `DJANGO_SETTINGS_MODULE=sapl.settings` (set in `pytest.ini`). All tests must be marked with `@pytest.mark.django_db`. The root `conftest.py` provides an `app` fixture (a WebTest `DjangoTestApp`).
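A minimal sketch of a test written to these conventions (the URL and assertion are illustrative, not taken from the repo):

```python
import pytest


@pytest.mark.django_db
def test_home_page_loads(app):
    # `app` is the WebTest DjangoTestApp fixture from the root conftest.py;
    # the URL below is a hypothetical example.
    response = app.get('/')
    assert response.status_code == 200
```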
### Linting / Formatting

```bash
flake8 .
isort .
autopep8 --in-place <file.py>
```

### Restore Database from Backup

```bash
./scripts/restore_db.sh -f /path/to/dump
./scripts/restore_db.sh -f /path/to/dump -p 5433  # Docker port
```

## Architecture

### Django Apps

Apps live under `sapl/` and follow domain boundaries:

| App | Domain |
|-----|--------|
| `base` | `CasaLegislativa` (legislative house config), `AppConfig`, `Autor` (authorship) |
| `parlamentares` | `Parlamentar`, `Legislatura`, `SessaoLegislativa`, `Coligacao` |
| `materia` | Bills (`MateriaLegislativa`), types, tracking, annexes |
| `norma` | Laws/norms (`NormaJuridica`) and hierarchies |
| `sessao` | Plenary sessions, agenda, attendance, voting |
| `comissoes` | Committees (`Comissao`) and meetings (`Reuniao`) |
| `protocoloadm` | Administrative protocols and document intake |
| `compilacao` | Structured/articulated texts (LexML-like tree structure) |
| `lexml` | LexML XML standard integration |
| `audiencia` | Public hearings |
| `painel` | Real-time session display panel |
| `relatorios` | PDF report generation |
| `api` | REST API entry point (auto-generated ViewSets) |
| `crud` | Generic CRUD base views |
| `rules` | Business rules and permission definitions |

### REST API

The API uses a custom `drfautoapi` package (`drfautoapi/drfautoapi.py`) that auto-generates DRF ViewSets, Serializers, and FilterSets from Django models. Authentication is Token + Session. Permissions use a custom `SaplModelPermissions` class that maps HTTP methods to Django model permissions.

OpenAPI 3.0 docs are generated by drf-spectacular.
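A sketch of the HTTP-method-to-permission mapping pattern this implies, using DRF's stock `perms_map` hook (illustrative only; the actual `SaplModelPermissions` source and its permission codenames may differ):

```python
from rest_framework.permissions import DjangoModelPermissions


class MethodToModelPermissions(DjangoModelPermissions):
    # Hypothetical mapping: unlike stock DjangoModelPermissions, read
    # methods also require a permission. Codenames are assumptions.
    perms_map = {
        'GET': ['%(app_label)s.list_%(model_name)s'],
        'OPTIONS': [],
        'HEAD': [],
        'POST': ['%(app_label)s.add_%(model_name)s'],
        'PUT': ['%(app_label)s.change_%(model_name)s'],
        'PATCH': ['%(app_label)s.change_%(model_name)s'],
        'DELETE': ['%(app_label)s.delete_%(model_name)s'],
    }
```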
### Caching

- **Default:** file-based (`/var/tmp/django_cache`)
- **Production:** Redis via django-redis; configured at startup by `configure_redis_cache()` in `sapl/settings.py`
- **Cache key prefix:** `cache:{POD_NAMESPACE}:` (namespace-isolated for multi-tenant k8s)
- **Rate limiter state** is shared via Redis keys
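A minimal sketch of the django-redis `CACHES` block this setup implies (standard django-redis settings; the actual `configure_redis_cache()` body may differ):

```python
import os

POD_NAMESPACE = os.environ.get('POD_NAMESPACE', 'default')

CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        # REDIS_URL is documented as host:port, so prepend the scheme.
        'LOCATION': 'redis://' + os.environ.get('REDIS_URL', 'localhost:6379'),
        # Django joins the prefix with ':', yielding keys like
        # cache:<namespace>:1:<key>, so each k8s namespace gets its own keyspace.
        'KEY_PREFIX': f'cache:{POD_NAMESPACE}',
        'OPTIONS': {'CLIENT_CLASS': 'django_redis.client.DefaultClient'},
    }
}
```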
### Feature Flags

django-waffle provides feature flags. Switches (global on/off) can be toggled via:

```bash
python manage.py waffle_switch <switch_name> on|off
```
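Checking a switch in application code uses the standard waffle API (the switch name below is hypothetical):

```python
import waffle


def new_flow_enabled():
    # switch_is_active() is waffle's standard check for a global switch.
    return waffle.switch_is_active('novo_fluxo_protocolo')
```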
### Key Environment Variables

| Variable | Purpose |
|----------|---------|
| `DATABASE_URL` | PostgreSQL connection string |
| `SECRET_KEY` | Django secret key |
| `DEBUG` | Debug mode |
| `REDIS_URL` | Redis host:port |
| `CACHE_BACKEND` | `file` or `redis` |
| `POD_NAMESPACE` | K8s namespace (used in the cache key prefix) |
| `USE_SOLR` | Enable Haystack/Solr full-text search |
| `SOLR_URL` / `SOLR_COLLECTION` | Solr connection |

### Docker Build

The production build requires a MaxMind GeoLite2-ASN license key (used by nginx for ASN-based bot blocking):

```bash
docker build --secret id=maxmind_key,src=.env -f docker/Dockerfile -t sapl:local .
```

Optional build args: `WITH_NGINX`, `WITH_GRAPHVIZ`, `WITH_POPPLER`, `WITH_PSQL_CLIENT`.
### Key File Locations

| File | Purpose |
|------|---------|
| `sapl/settings.py` | All Django settings, including cache/rate-limit setup |
| `pytest.ini` | Test configuration (`DJANGO_SETTINGS_MODULE`, addopts) |
| `conftest.py` | Root pytest fixtures |
| `drfautoapi/drfautoapi.py` | Auto-API generation logic |
| `docker/startup_scripts/start.sh` | Container entrypoint (migrations, waffle, gunicorn) |
| `requirements/requirements.txt` | Production deps |
| `requirements/test-requirements.txt` | Test deps |
| `requirements/dev-requirements.txt` | Dev/lint deps |
@ -0,0 +1,111 @@

"""
Phase 2 backfill — fills file_size_bytes and content_hash for FileMetadata rows
where backfilled_at IS NULL. I/O-bound; designed to run at low priority over
days or weeks without affecting production traffic.

Safe to interrupt and resume: sets backfilled_at on each completed row.

Usage:
    python manage.py backfill_file_metadata_hashes [--batch-size=200] [--rate-limit=20]

Progress query:
    SELECT
        COUNT(*) FILTER (WHERE backfilled_at IS NULL) AS pending,
        COUNT(*) FILTER (WHERE backfilled_at IS NOT NULL) AS done
    FROM base_file_metadata;
"""
import hashlib
import os
import time

from django.conf import settings
from django.core.management.base import BaseCommand
from django.utils import timezone


def _compute_hash(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks
    so memory stays bounded regardless of file size."""
    h = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()
class Command(BaseCommand):
    help = (
        'Phase 2 backfill: fill file_size_bytes and content_hash for FileMetadata rows '
        'where backfilled_at IS NULL. Resumable and rate-limited.'
    )

    def add_arguments(self, parser):
        parser.add_argument(
            '--batch-size', type=int, default=200,
            help='Rows per iteration (default: 200).',
        )
        parser.add_argument(
            '--rate-limit', type=int, default=0,
            help='Max rows per second (0 = unlimited).',
        )
        parser.add_argument(
            '--skip-hash', action='store_true',
            help='Only populate file_size_bytes, skip SHA-256 (faster).',
        )
        parser.add_argument(
            '--dry-run', action='store_true',
            help='Report counts without writing.',
        )

    def handle(self, *args, **options):
        from sapl.base.models import FileMetadata

        batch_size = options['batch_size']
        rate_limit = options['rate_limit']
        skip_hash = options['skip_hash']
        dry_run = options['dry_run']

        if dry_run:
            self.stdout.write(self.style.WARNING('DRY RUN — nothing will be written.'))

        qs = FileMetadata.objects.filter(backfilled_at__isnull=True)
        total = qs.count()
        self.stdout.write(f'Phase 2: {total} rows to process...')

        processed = 0
        errors = 0
        start_time = time.time()  # overall start, used for average-rate pacing

        for meta in qs.iterator(chunk_size=batch_size):
            full_path = os.path.join(settings.MEDIA_ROOT, meta.storage_name)
            try:
                stat = os.stat(full_path)
                file_size_bytes = stat.st_size
                content_hash = _compute_hash(full_path) if not skip_hash else ''
            except OSError as e:
                self.stdout.write(self.style.WARNING(
                    f'  uuid={meta.uuid}: cannot read {full_path}: {e}'))
                errors += 1
                continue

            if not dry_run:
                # Fill only fields that are still empty, so values written by
                # the application since the row was created are never clobbered.
                update_fields = ['backfilled_at']
                meta.backfilled_at = timezone.now()
                if meta.file_size_bytes is None:
                    meta.file_size_bytes = file_size_bytes
                    update_fields.append('file_size_bytes')
                if not meta.content_hash and not skip_hash:
                    meta.content_hash = content_hash
                    update_fields.append('content_hash')
                meta.save(update_fields=update_fields)

            processed += 1

            if rate_limit > 0:
                # Pacing: after N rows, at least N / rate_limit seconds must
                # have elapsed overall; sleep off any surplus speed.
                elapsed = time.time() - start_time
                target = processed / rate_limit
                if target > elapsed:
                    time.sleep(target - elapsed)

        self.stdout.write(self.style.SUCCESS(
            f'Phase 2 complete — processed: {processed}, errors: {errors}'))
        if dry_run:
            self.stdout.write(self.style.WARNING('(dry run — nothing was written)'))
@ -0,0 +1,153 @@

"""
Phase 1 backfill — creates FileMetadata rows for parent-model instances whose
_metadata FK is NULL. Pure DB work, no disk I/O. Completes in seconds.

Safe to re-run (idempotent). Safe to run from multiple workers simultaneously
(bulk_create with ignore_conflicts=True handles races).

Usage:
    python manage.py backfill_file_metadata_structural [--batch-size=1000]
"""
from pathlib import Path

from django.apps import apps
from django.core.management.base import BaseCommand
from django.db import transaction
# (app_label, model_name, file-field name) triples covered by the backfill.
METADATA_FILE_FIELDS = [
    ('materia', 'materialegislativa', 'texto_original'),
    ('materia', 'documentoacessorio', 'arquivo'),
    ('materia', 'proposicao', 'texto_original'),
    ('protocoloadm', 'documentoadministrativo', 'texto_integral'),
    ('protocoloadm', 'documentoacessorioadministrativo', 'arquivo'),
    ('norma', 'normajuridica', 'texto_integral'),
    ('norma', 'anexonormajuridica', 'anexo_arquivo'),
    ('comissoes', 'reuniao', 'upload_pauta'),
    ('comissoes', 'reuniao', 'upload_ata'),
    ('comissoes', 'reuniao', 'upload_anexo'),
    ('comissoes', 'documentoacessorio', 'arquivo'),
    ('audiencia', 'audienciapublica', 'upload_pauta'),
    ('audiencia', 'audienciapublica', 'upload_ata'),
    ('audiencia', 'audienciapublica', 'upload_anexo'),
    ('audiencia', 'anexoaudienciapublica', 'arquivo'),
    ('sessao', 'sessaoplenaria', 'upload_pauta'),
    ('sessao', 'sessaoplenaria', 'upload_ata'),
    ('sessao', 'sessaoplenaria', 'upload_anexo'),
    ('sessao', 'justificativaausencia', 'upload_anexo'),
    ('sessao', 'orador', 'upload_anexo'),
    ('sessao', 'oradorexpediente', 'upload_anexo'),
    ('sessao', 'oradorordemdia', 'upload_anexo'),
]
class Command(BaseCommand):
    help = (
        'Phase 1 backfill: create FileMetadata rows for existing files (no disk I/O). '
        'Idempotent — skips instances where _metadata FK is already set.'
    )

    def add_arguments(self, parser):
        parser.add_argument(
            '--batch-size', type=int, default=1000,
            help='Rows per bulk_create batch (default: 1000).',
        )
        parser.add_argument(
            '--dry-run', action='store_true',
            help='Report counts without writing.',
        )

    def handle(self, *args, **options):
        from sapl.base.models import FileMetadata

        batch_size = options['batch_size']
        dry_run = options['dry_run']

        if dry_run:
            self.stdout.write(self.style.WARNING('DRY RUN — nothing will be written.'))

        total_created = 0

        for app_label, model_name, field_name in METADATA_FILE_FIELDS:
            try:
                Model = apps.get_model(app_label, model_name)
            except LookupError:
                self.stdout.write(self.style.ERROR(
                    f'  {app_label}.{model_name} not found — skipping.'))
                continue

            meta_fk = f'{field_name}_metadata'
            # Rows that have a file but no metadata FK yet.
            qs = (
                Model.objects
                .filter(**{f'{field_name}__isnull': False, f'{meta_fk}__isnull': True})
                .exclude(**{field_name: ''})
                .only('pk', field_name, meta_fk)
            )
            count = qs.count()
            if count == 0:
                self.stdout.write(f'  {app_label}.{model_name}.{field_name}: up to date.')
                continue

            self.stdout.write(f'  {app_label}.{model_name}.{field_name}: {count} rows...')

            if dry_run:
                total_created += count
                continue

            # Process in batches: bulk_create metadata rows, then bulk_update
            # the owning instances so their _metadata FK points back at them.
            batch = []
            instances = []
            for instance in qs.iterator(chunk_size=batch_size):
                field_file = getattr(instance, field_name)
                storage_name = field_file.name
                batch.append(FileMetadata(
                    storage_name=storage_name,
                    original_filename=Path(storage_name).name,
                    app_label=app_label,
                    model_name=model_name,
                    field_name=field_name,
                    owner_pk=instance.pk,
                ))
                instances.append(instance)

                if len(batch) >= batch_size:
                    total_created += self._flush(
                        batch, instances, meta_fk, field_name, Model)
                    batch = []
                    instances = []

            if batch:
                total_created += self._flush(batch, instances, meta_fk, field_name, Model)

            self.stdout.write(self.style.SUCCESS('  done.'))

        self.stdout.write(f'Structural backfill complete — created: {total_created}')
        if dry_run:
            self.stdout.write(self.style.WARNING('(dry run — nothing was written)'))

    def _flush(self, batch, instances, meta_fk, field_name, Model):
        from sapl.base.models import FileMetadata

        with transaction.atomic():
            created = FileMetadata.objects.bulk_create(batch, ignore_conflicts=True)

        # Re-query to get PKs for the rows we just inserted (bulk_create may not
        # return PKs on all DB backends, and ignore_conflicts rows have pk=None).
        storage_names = [m.storage_name for m in batch]
        meta_map = {
            m.storage_name: m.pk
            for m in FileMetadata.objects.filter(storage_name__in=storage_names)
        }

        update_instances = []
        for instance in instances:
            field_file = getattr(instance, field_name)
            pk = meta_map.get(field_file.name)
            if pk:
                setattr(instance, f'{meta_fk}_id', pk)
                update_instances.append(instance)

        if update_instances:
            with transaction.atomic():
                Model.objects.bulk_update(update_instances, [f'{meta_fk}_id'])

        # Note: with ignore_conflicts=True, bulk_create returns the whole batch,
        # so this counts attempted rows, including any a concurrent worker won.
        return len(created)
@ -0,0 +1,43 @@

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('base', '0061_file_metadata'),
    ]

    operations = [
        migrations.AddField(
            model_name='filemetadata',
            name='app_label',
            field=models.CharField(blank=True, default='', max_length=100),
        ),
        migrations.AddField(
            model_name='filemetadata',
            name='model_name',
            field=models.CharField(blank=True, default='', max_length=100),
        ),
        migrations.AddField(
            model_name='filemetadata',
            name='field_name',
            field=models.CharField(blank=True, default='', max_length=100),
        ),
        migrations.AddField(
            model_name='filemetadata',
            name='owner_pk',
            field=models.PositiveIntegerField(blank=True, null=True),
        ),
        migrations.AlterField(
            model_name='filemetadata',
            name='storage_name',
            field=models.CharField(editable=False, max_length=512, verbose_name='Storage name'),
        ),
        migrations.AddIndex(
            model_name='filemetadata',
            index=models.Index(
                fields=['app_label', 'model_name', 'field_name'],
                name='filemetadata_owner_context_idx',
            ),
        ),
    ]
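Read together with this migration and the two backfill commands, the `FileMetadata` model comes out roughly as the sketch below. The `uuid`, `file_size_bytes`, `content_hash`, and `backfilled_at` fields are inferred from the Phase 2 command and are not shown in this diff, so their exact types are assumptions:

```python
import uuid as uuid_lib

from django.db import models


class FileMetadata(models.Model):
    # Fields inferred from the Phase 2 backfill; types are assumptions.
    uuid = models.UUIDField(default=uuid_lib.uuid4, editable=False)
    file_size_bytes = models.BigIntegerField(null=True, blank=True)
    content_hash = models.CharField(max_length=64, blank=True, default='')  # SHA-256 hex
    backfilled_at = models.DateTimeField(null=True, blank=True)

    # Fields visible in this migration.
    storage_name = models.CharField('Storage name', max_length=512, editable=False)
    app_label = models.CharField(max_length=100, blank=True, default='')
    model_name = models.CharField(max_length=100, blank=True, default='')
    field_name = models.CharField(max_length=100, blank=True, default='')
    owner_pk = models.PositiveIntegerField(blank=True, null=True)

    class Meta:
        indexes = [
            models.Index(fields=['app_label', 'model_name', 'field_name'],
                         name='filemetadata_owner_context_idx'),
        ]
```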