@ -0,0 +1,151 @@

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

SAPL (Sistema de Apoio ao Processo Legislativo) is a Django-based legislative management system used by Brazilian municipal and state legislative houses. It manages bills, parliamentary sessions, committees, norms, protocols, and related legislative workflows.

## Design references

@/Users/eribeiro/projects/sapl-docs/rfc/files-metafields-impl.md

## Commands

### Development

```bash
# Run dev server
python manage.py runserver

# Docker (dev, without bundled DB)
docker-compose -f docker/docker-compose-dev.yml up

# Docker (dev, with PostgreSQL container)
docker-compose -f docker/docker-compose-dev-db.yml up
```

### Database Setup (local PostgreSQL)

```bash
sudo -u postgres psql -c "CREATE ROLE sapl LOGIN ENCRYPTED PASSWORD 'sapl' NOSUPERUSER INHERIT CREATEDB NOCREATEROLE NOREPLICATION;"
sudo -u postgres psql -c "CREATE DATABASE sapl WITH OWNER=sapl ENCODING='UTF8' LC_COLLATE='pt_BR.UTF-8' LC_CTYPE='pt_BR.UTF-8' CONNECTION LIMIT=-1 TEMPLATE template0;"
python manage.py migrate
```

### Testing

```bash
# All tests (reuses DB by default for speed)
pytest

# Single test file or test function
pytest sapl/materia/tests/test_materia.py
pytest sapl/materia/tests/test_materia.py::test_function_name

# Force DB recreation
pytest --create-db

# With coverage
pytest --cov=sapl
```

Tests require `DJANGO_SETTINGS_MODULE=sapl.settings` (set in `pytest.ini`). All tests must be marked with `@pytest.mark.django_db`. The root `conftest.py` provides an `app` fixture (a WebTest `DjangoTestApp`).

### Linting / Formatting

```bash
flake8 .
isort .
autopep8 --in-place <file.py>
```

### Restore Database from Backup

```bash
./scripts/restore_db.sh -f /path/to/dump
./scripts/restore_db.sh -f /path/to/dump -p 5433  # Docker port
```

## Architecture

### Django Apps

Apps are under `sapl/` and follow domain boundaries:

| App | Domain |
|-----|--------|
| `base` | `CasaLegislativa` (legislative house config), `AppConfig`, `Autor` (authorship) |
| `parlamentares` | `Parlamentar`, `Legislatura`, `SessaoLegislativa`, `Coligacao` |
| `materia` | Bills (`MateriaLegislativa`), types, tracking, annexes |
| `norma` | Laws/norms (`NormaJuridica`) and hierarchies |
| `sessao` | Plenary sessions, agenda, attendance, voting |
| `comissoes` | Committees (`Comissao`) and meetings (`Reuniao`) |
| `protocoloadm` | Administrative protocols and document intake |
| `compilacao` | Structured/articulated texts (LexML-like tree structure) |
| `lexml` | LexML XML standard integration |
| `audiencia` | Public hearings |
| `painel` | Real-time session display panel |
| `relatorios` | PDF report generation |
| `api` | REST API entry point (auto-generated ViewSets) |
| `crud` | Generic CRUD base views |
| `rules` | Business rules and permission definitions |

### REST API

The API uses a custom `drfautoapi` package (`drfautoapi/drfautoapi.py`) that auto-generates DRF ViewSets, Serializers, and FilterSets from Django models. Authentication is Token + Session. Permissions use a custom `SaplModelPermissions` class that maps HTTP methods to Django model permissions.
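That mapping works in the same spirit as DRF's `DjangoModelPermissions`. A minimal sketch of the idea — the exact map and names used by `SaplModelPermissions` may differ:

```python
# Hedged sketch: HTTP method -> Django model-permission codenames, in the
# style of DRF's DjangoModelPermissions perms_map. Illustrative only.
PERMS_MAP = {
    'GET': ['{app_label}.view_{model_name}'],
    'OPTIONS': [],
    'HEAD': [],
    'POST': ['{app_label}.add_{model_name}'],
    'PUT': ['{app_label}.change_{model_name}'],
    'PATCH': ['{app_label}.change_{model_name}'],
    'DELETE': ['{app_label}.delete_{model_name}'],
}


def required_perms(method, app_label, model_name):
    """Permission codenames a user must hold for this request method."""
    return [p.format(app_label=app_label, model_name=model_name)
            for p in PERMS_MAP[method]]
```

For example, a `DELETE` against a `materialegislativa` endpoint would require the `materia.delete_materialegislativa` permission.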

OpenAPI 3.0 docs are generated by drf-spectacular.

### Caching

- **Default:** File-based (`/var/tmp/django_cache`)
- **Production:** Redis via django-redis; configured at startup by `configure_redis_cache()` in `sapl/settings.py`
- **Cache key prefix:** `cache:{POD_NAMESPACE}:` (namespace-isolated for multi-tenant k8s)
- **Rate limiter state** is shared via Redis keys
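The namespace prefix can be expressed as a Django cache `KEY_FUNCTION`. A sketch under the assumption that the namespace comes straight from the `POD_NAMESPACE` env var (the real wiring lives in `configure_redis_cache()` in `sapl/settings.py`):

```python
import os


def sapl_make_key(key, key_prefix, version):
    # Hypothetical KEY_FUNCTION sketch: prepend the k8s namespace so that
    # multiple SAPL tenants sharing one Redis instance never collide.
    # Django's default key function returns f'{key_prefix}:{version}:{key}';
    # this variant only adds the 'cache:{namespace}:' prefix in front.
    namespace = os.environ.get('POD_NAMESPACE', 'default')
    return f'cache:{namespace}:{key_prefix}:{version}:{key}'
```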

### Feature Flags

django-waffle is used for feature flags. Switches (global on/off) can be toggled via:

```bash
python manage.py waffle_switch <switch_name> on|off
```

### Key Environment Variables

| Variable | Purpose |
|----------|---------|
| `DATABASE_URL` | PostgreSQL connection string |
| `SECRET_KEY` | Django secret key |
| `DEBUG` | Debug mode |
| `REDIS_URL` | Redis host:port |
| `CACHE_BACKEND` | `file` or `redis` |
| `POD_NAMESPACE` | K8s namespace (used in cache key prefix) |
| `USE_SOLR` | Enable Haystack/Solr full-text search |
| `SOLR_URL` / `SOLR_COLLECTION` | Solr connection |

### Docker Build

The production build requires a MaxMind GeoLite2-ASN license key (for nginx ASN-based bot blocking):

```bash
docker build --secret id=maxmind_key,src=.env -f docker/Dockerfile -t sapl:local .
```

Optional build args: `WITH_NGINX`, `WITH_GRAPHVIZ`, `WITH_POPPLER`, `WITH_PSQL_CLIENT`.

### Key File Locations

| File | Purpose |
|------|---------|
| `sapl/settings.py` | All Django settings, including cache/rate-limit setup |
| `pytest.ini` | Test configuration (DJANGO_SETTINGS_MODULE, addopts) |
| `conftest.py` | Root pytest fixtures |
| `drfautoapi/drfautoapi.py` | Auto-API generation logic |
| `docker/startup_scripts/start.sh` | Container entrypoint (migrations, waffle, gunicorn) |
| `requirements/requirements.txt` | Production deps |
| `requirements/test-requirements.txt` | Test deps |
| `requirements/dev-requirements.txt` | Dev/lint deps |

@ -0,0 +1,111 @@
"""
Phase 2 backfill — fills file_size_bytes and content_hash for FileMetadata rows
where backfilled_at IS NULL. I/O-bound; designed to run at low priority over
days or weeks without affecting production traffic.

Safe to interrupt and resume: sets backfilled_at on each completed row.

Usage:
    python manage.py backfill_file_metadata_hashes [--batch-size=200] [--rate-limit=20]

Progress query:
    SELECT
        COUNT(*) FILTER (WHERE backfilled_at IS NULL) AS pending,
        COUNT(*) FILTER (WHERE backfilled_at IS NOT NULL) AS done
    FROM base_file_metadata;
"""
import hashlib
import os
import time

from django.conf import settings
from django.core.management.base import BaseCommand
from django.utils import timezone


def _compute_hash(path):
    """Stream the file in 64 KiB chunks to keep memory usage flat."""
    h = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()


class Command(BaseCommand):
    help = (
        'Phase 2 backfill: fill file_size_bytes and content_hash for FileMetadata rows '
        'where backfilled_at IS NULL. Resumable and rate-limited.'
    )

    def add_arguments(self, parser):
        parser.add_argument(
            '--batch-size', type=int, default=200,
            help='Rows per iteration (default: 200).',
        )
        parser.add_argument(
            '--rate-limit', type=int, default=0,
            help='Max rows per second (0 = unlimited).',
        )
        parser.add_argument(
            '--skip-hash', action='store_true',
            help='Only populate file_size_bytes, skip SHA-256 (faster).',
        )
        parser.add_argument(
            '--dry-run', action='store_true',
            help='Report counts without writing.',
        )

    def handle(self, *args, **options):
        from sapl.base.models import FileMetadata

        batch_size = options['batch_size']
        rate_limit = options['rate_limit']
        skip_hash = options['skip_hash']
        dry_run = options['dry_run']

        if dry_run:
            self.stdout.write(self.style.WARNING('DRY RUN — nothing will be written.'))

        qs = FileMetadata.objects.filter(backfilled_at__isnull=True)
        total = qs.count()
        self.stdout.write(f'Phase 2: {total} rows to process...')

        processed = 0
        errors = 0
        batch_start = time.time()

        for meta in qs.iterator(chunk_size=batch_size):
            full_path = os.path.join(settings.MEDIA_ROOT, meta.storage_name)
            try:
                stat = os.stat(full_path)
                file_size_bytes = stat.st_size
                content_hash = _compute_hash(full_path) if not skip_hash else ''
            except OSError as e:
                self.stdout.write(self.style.WARNING(
                    f'  uuid={meta.uuid}: cannot read {full_path}: {e}'))
                errors += 1
                continue

            if not dry_run:
                update_fields = ['backfilled_at']
                meta.backfilled_at = timezone.now()
                if meta.file_size_bytes is None:
                    meta.file_size_bytes = file_size_bytes
                    update_fields.append('file_size_bytes')
                if not meta.content_hash and not skip_hash:
                    meta.content_hash = content_hash
                    update_fields.append('content_hash')
                meta.save(update_fields=update_fields)

            processed += 1

            if rate_limit > 0:
                # Sleep just enough to hold the average throughput at
                # rate_limit rows/sec since the run started.
                elapsed = time.time() - batch_start
                target = processed / rate_limit
                if target > elapsed:
                    time.sleep(target - elapsed)

        self.stdout.write(self.style.SUCCESS(
            f'Phase 2 complete — processed: {processed}, errors: {errors}'))
        if dry_run:
            self.stdout.write(self.style.WARNING('(dry run — nothing was written)'))

@ -0,0 +1,153 @@
"""
Phase 1 backfill — creates FileMetadata rows for parent-model instances whose
_metadata FK is NULL. Pure DB work, no disk I/O. Completes in seconds.

Safe to re-run (idempotent). Safe to run from multiple workers simultaneously
(bulk_create with ignore_conflicts=True handles races).

Usage:
    python manage.py backfill_file_metadata_structural [--batch-size=1000]
"""
from pathlib import Path

from django.apps import apps
from django.core.management.base import BaseCommand
from django.db import transaction

METADATA_FILE_FIELDS = [
    ('materia', 'materialegislativa', 'texto_original'),
    ('materia', 'documentoacessorio', 'arquivo'),
    ('materia', 'proposicao', 'texto_original'),
    ('protocoloadm', 'documentoadministrativo', 'texto_integral'),
    ('protocoloadm', 'documentoacessorioadministrativo', 'arquivo'),
    ('norma', 'normajuridica', 'texto_integral'),
    ('norma', 'anexonormajuridica', 'anexo_arquivo'),
    ('comissoes', 'reuniao', 'upload_pauta'),
    ('comissoes', 'reuniao', 'upload_ata'),
    ('comissoes', 'reuniao', 'upload_anexo'),
    ('comissoes', 'documentoacessorio', 'arquivo'),
    ('audiencia', 'audienciapublica', 'upload_pauta'),
    ('audiencia', 'audienciapublica', 'upload_ata'),
    ('audiencia', 'audienciapublica', 'upload_anexo'),
    ('audiencia', 'anexoaudienciapublica', 'arquivo'),
    ('sessao', 'sessaoplenaria', 'upload_pauta'),
    ('sessao', 'sessaoplenaria', 'upload_ata'),
    ('sessao', 'sessaoplenaria', 'upload_anexo'),
    ('sessao', 'justificativaausencia', 'upload_anexo'),
    ('sessao', 'orador', 'upload_anexo'),
    ('sessao', 'oradorexpediente', 'upload_anexo'),
    ('sessao', 'oradorordemdia', 'upload_anexo'),
]


class Command(BaseCommand):
    help = (
        'Phase 1 backfill: create FileMetadata rows for existing files (no disk I/O). '
        'Idempotent — skips instances where _metadata FK is already set.'
    )

    def add_arguments(self, parser):
        parser.add_argument(
            '--batch-size', type=int, default=1000,
            help='Rows per bulk_create batch (default: 1000).',
        )
        parser.add_argument(
            '--dry-run', action='store_true',
            help='Report counts without writing.',
        )

    def handle(self, *args, **options):
        from sapl.base.models import FileMetadata

        batch_size = options['batch_size']
        dry_run = options['dry_run']

        if dry_run:
            self.stdout.write(self.style.WARNING('DRY RUN — nothing will be written.'))

        total_created = 0

        for app_label, model_name, field_name in METADATA_FILE_FIELDS:
            try:
                Model = apps.get_model(app_label, model_name)
            except LookupError:
                self.stdout.write(self.style.ERROR(
                    f'  {app_label}.{model_name} not found — skipping.'))
                continue

            meta_fk = f'{field_name}_metadata'
            qs = (
                Model.objects
                .filter(**{f'{field_name}__isnull': False, f'{meta_fk}__isnull': True})
                .exclude(**{field_name: ''})
                .only('pk', field_name, meta_fk)
            )
            count = qs.count()
            if count == 0:
                self.stdout.write(f'  {app_label}.{model_name}.{field_name}: up to date.')
                continue

            self.stdout.write(f'  {app_label}.{model_name}.{field_name}: {count} rows...')

            if dry_run:
                total_created += count
                continue

            # Process in batches: bulk_create rows, then bulk_update the FK back.
            batch = []
            instances = []
            for instance in qs.iterator(chunk_size=batch_size):
                field_file = getattr(instance, field_name)
                storage_name = field_file.name
                batch.append(FileMetadata(
                    storage_name=storage_name,
                    original_filename=Path(storage_name).name,
                    app_label=app_label,
                    model_name=model_name,
                    field_name=field_name,
                    owner_pk=instance.pk,
                ))
                instances.append(instance)

                if len(batch) >= batch_size:
                    total_created += self._flush(
                        batch, instances, meta_fk, field_name, Model)
                    batch = []
                    instances = []

            if batch:
                total_created += self._flush(batch, instances, meta_fk, field_name, Model)

            self.stdout.write(self.style.SUCCESS(' done.'))

        self.stdout.write(f'Structural backfill complete — created: {total_created}')
        if dry_run:
            self.stdout.write(self.style.WARNING('(dry run — nothing was written)'))

    def _flush(self, batch, instances, meta_fk, field_name, Model):
        from sapl.base.models import FileMetadata

        with transaction.atomic():
            created = FileMetadata.objects.bulk_create(batch, ignore_conflicts=True)

        # Re-query to get PKs for the rows we just inserted (bulk_create may not
        # return PKs on all DB backends, and ignore_conflicts rows have pk=None).
        storage_names = [m.storage_name for m in batch]
        meta_map = {
            m.storage_name: m.pk
            for m in FileMetadata.objects.filter(storage_name__in=storage_names)
        }

        update_instances = []
        for instance in instances:
            field_file = getattr(instance, field_name)
            pk = meta_map.get(field_file.name)
            if pk:
                setattr(instance, f'{meta_fk}_id', pk)
                update_instances.append(instance)

        if update_instances:
            with transaction.atomic():
                # Pass the FK field name, not the '_id' attname: bulk_update
                # resolves each entry via _meta.get_field(), which only knows
                # field names.
                Model.objects.bulk_update(update_instances, [meta_fk])

        # With ignore_conflicts=True some backends return every object passed
        # in, including pre-existing rows, so treat this count as an upper bound.
        return len(created)

@ -0,0 +1,43 @@
from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('base', '0061_file_metadata'),
    ]

    operations = [
        migrations.AddField(
            model_name='filemetadata',
            name='app_label',
            field=models.CharField(blank=True, default='', max_length=100),
        ),
        migrations.AddField(
            model_name='filemetadata',
            name='model_name',
            field=models.CharField(blank=True, default='', max_length=100),
        ),
        migrations.AddField(
            model_name='filemetadata',
            name='field_name',
            field=models.CharField(blank=True, default='', max_length=100),
        ),
        migrations.AddField(
            model_name='filemetadata',
            name='owner_pk',
            field=models.PositiveIntegerField(blank=True, null=True),
        ),
        migrations.AlterField(
            model_name='filemetadata',
            name='storage_name',
            field=models.CharField(editable=False, max_length=512, verbose_name='Storage name'),
        ),
        migrations.AddIndex(
            model_name='filemetadata',
            index=models.Index(
                fields=['app_label', 'model_name', 'field_name'],
                name='filemetadata_owner_context_idx',
            ),
        ),
    ]