Data model

luplo’s schema was frozen on 2026-04-13 as twelve tables split across three concerns: core domain (6), sync and history (3), and glossary (3). A post-freeze refactor promoted items into a general-purpose substrate; tasks and QA checks stopped being new tables and became item types with typed context JSONB. A small item_types registry gives the database a language-agnostic contract for extension.

The twelve tables

Core (6)

Table

Purpose

projects

Top-level scope. Most queries are WHERE project_id = ?.

actors

Attribution registry (v0.7.0+). UUID id, NOT-NULL email as a label (git-committer style). No passwords, no auth — identity belongs to whatever wraps luplo.

systems

Named components inside a project with optional dependency edges. “Auth”, “Payments”, “Notifications”. Items and work units tag into these.

items

The substrate. One row per decision / knowledge / policy / document / task / qa_check / research (and any user-registered type). See below for columns.

links

Typed edges — item↔item, item↔system, item↔work_unit.

work_units

User-facing intent grouping. Spans multiple sessions. Replaces the earlier sessions concept.

Sync & history (3)

Table

Purpose

items_history

Immutable row per semantic edit of an item. Carries semantic_impact (see Semantic impact categories).

audit_log

Every write through the core, with actor_id, action, payload. The authoritative “who did what”.

sync_jobs

Debounced outbound-sync queue drained by the worker.

Glossary (3)

Table

Purpose

glossary_groups

A normalized term plus its approved aliases. Expands queries at search time.

glossary_terms

Pending candidates awaiting curation (plus rejected history for auditability).

glossary_rejections

Permanent “do not suggest this again” list, scoped per group.

Items as substrate

The big shift after 2026-04-13 was giving up the three-tier split (items / tasks / qa_checks as separate tables) and promoting items into a general-purpose row. The reasons:

  • items already had item_type, system_ids, tags, supersedes_id, deleted_at, actor, project — every cross-cutting field a domain object needs.

  • An OSS user wanting a custom type (sprints, retros, stand-ups) can now INSERT into item_types and start using it. Forking luplo is no longer the entry fee.

  • The DB, not Python, is the contract. Raw SQL and non-Python clients can add a type without importing anything.

The item_types registry

CREATE TABLE item_types (
  key          TEXT PRIMARY KEY,   -- 'task', 'qa_check', 'decision', ...
  display_name TEXT NOT NULL,
  schema       JSONB NOT NULL,     -- JSON Schema, validates context
  owner        TEXT NOT NULL,      -- 'system' or 'user'
  created_at   TIMESTAMPTZ DEFAULT now()
);

items.item_type carries a foreign key into this table. Seven system types ship in the migrations:

key

Schema policy

Notes

decision

loose (additionalProperties permitted)

free-form rationale + alternatives

knowledge

loose

“how X works”, gotchas

policy

loose

organisational must-do / must-not

document

loose

specs, RFCs

research

loose + source_url NOT NULL CHECK

cached external references with TTL

task

strict (additionalProperties=false)

status machine, sort_order, block reason

qa_check

strict

coverage, area, severity, target arrays

Strict vs loose is by intent, not by owner. Types that gate behavior (task, qa_check) enforce their context shape so the state machines stay honest. Types that exist for human prose (decision, policy) stay permissive so new fields don’t require a migration. See P6 in the project decision log.

The items row

Key columns shared across every item type:

Column

Type

Role

id

UUID

primary key

project_id

UUID

scope

item_type

TEXT FK → item_types.key

domain discriminator

title

TEXT

human label (indexed for tsquery)

body

TEXT

main content (indexed)

rationale

TEXT

why this decision (indexed)

alternatives

TEXT

what was considered and rejected

system_ids

UUID[]

systems touched (GIN indexed)

tags

TEXT[]

free tags (GIN indexed)

context

JSONB

type-specific fields, validated against item_types.schema

source_url

TEXT

external reference (research type requires it)

actor_id

UUID

who wrote this row

work_unit_id

UUID NULL

optional grouping

supersedes_id

UUID NULL

previous chain head when edited

deleted_at

TIMESTAMPTZ NULL

soft delete

ts

tsvector

generated — full-text search vector

embedding

vector(1024)

only when pgvector is installed

Supersede chain

Decisions are immutable. They get superseded, never edited. Your mistakes are your most valuable data.

Edits never mutate an existing row. update_item writes a new row with supersedes_id pointing at the previous head. Readers follow the chain to the current head; auditors walk it backwards. The principle: a wrong decision teaches more than a right one — overwriting it destroys the lesson.

Two deliberate exceptions where luplo updates in place with an audit entry instead of creating a new row:

  • Task sort_order — a presentation concern; N=100 reorders would otherwise create 3N writes and bloat the chain (P10).

  • QA-check revalidation trigger — a system-initiated “please re-look” flag, not a human-decided change (P9).

These exceptions are recorded as explicit policy decisions in the project log. Everything else obeys the rule “human decision → new row”.

Soft delete

deleted_at marks rows as gone without physically removing them. get_item and default listings hide deleted rows; history and audit readers still see them. This is the compliance floor: no write is ever lost, and any past state can be reconstructed.

Raw captures

captures is a raw private intake stream outside the project / item / system graph. It intentionally has no project_id, system_ids, or work_unit_id: capture-time input should succeed with text alone, and raw backlog material must not pollute curated item search.

Core stores caller-provided text and optional BYOLLM annotations:

  • summary is supplied by the caller, never generated by core.

  • sensitivity_hint is supplied by the caller and limited to none, possible, or sensitive.

  • signals is a JSON object of caller-provided hints such as tags, project guesses, mood labels, or confidence values. Core validates only that the value is an object.

Default capture search/list stays separate from items. A capture becomes curated memory only through explicit promotion. Promotion creates a normal items row through the existing item creation path, then records a capture_promotions bridge row:

Column

Role

capture_id

Raw capture used as source material.

target_item_id

Curated item created from the promotion.

promoted_as

Requested item type at promotion time. The item row remains authoritative.

created_by, created_at

Promotion attribution and timestamp.

The raw capture row is preserved after promotion and its review_state becomes promoted. Redaction is the privacy exception: capture redaction replaces text and summary with [redacted], clears signals, stamps redacted_at / redacted_by, and rebuilds the search vector so original capture content no longer matches normal search.

Relationships

  • links carries typed edges. Common kinds: supersedes, implements, conflicts_with, belongs_to (item→work_unit), touches (item→system).

  • systems can declare dependencies on other systems — a small graph captured per project so briefs and search can include neighbours.

  • work_units attach to items via items.work_unit_id and to the creator/closer via two actor_id columns — see Work units for the A→B handoff pattern.

Where the design is written down

Migrations are the executable spec. They ship inside the wheel under src/luplo/_db_assets/migrations/ so lp migrate works from a PyPI install without a source checkout:

src/luplo/_db_assets/migrations/
├── 0001_init_schema.py               # 12 tables, frozen 2026-04-13
├── 0002_auth_redesign.py             # actors TEXT → UUID, email-first
├── 0003_item_types_and_context.py    # substrate refactor + registry
├── 0004_add_research_item_type.py    # research type + URL CHECK
├── 0005_auth_reset_tokens.py
├── 0006_drop_auth.py
└── 0007_work_units_context.py        # work_units.context jsonb + GIN index

See Changelog for the narrative version.