# Data model

luplo's schema was frozen on 2026-04-13 as **twelve tables** split across
three concerns: core domain (6), sync and history (3), and glossary (3).
A post-freeze refactor promoted `items` into a general-purpose substrate;
tasks and QA checks stopped being new tables and became **item types**
with typed `context` JSONB. A small `item_types` registry gives the
database a language-agnostic contract for extension.

## The twelve tables

### Core (6)

| Table | Purpose |
|---|---|
| `projects` | Top-level scope. Most queries are `WHERE project_id = ?`. |
| `actors` | Attribution registry (v0.7.0+). UUID id, NOT-NULL email as a label (git-committer style). No passwords, no auth — identity belongs to whatever wraps luplo. |
| `systems` | Named components inside a project with optional dependency edges. "Auth", "Payments", "Notifications". Items and work units tag into these. |
| `items` | The substrate. One row per decision / knowledge / policy / document / task / qa_check / research (and any user-registered type). See below for columns. |
| `links` | Typed edges — item↔item, item↔system, item↔work_unit. |
| `work_units` | User-facing intent grouping. Spans multiple sessions. Replaces the earlier `sessions` concept. |

### Sync & history (3)

| Table | Purpose |
|---|---|
| `items_history` | Immutable row per semantic edit of an item. Carries `semantic_impact` (see {doc}`../reference/semantic-impact`). |
| `audit_log` | Every write through the core, with `actor_id`, `action`, payload. The authoritative "who did what". |
| `sync_jobs` | Debounced outbound-sync queue drained by the worker. |

### Glossary (3)

| Table | Purpose |
|---|---|
| `glossary_groups` | A normalized term plus its approved aliases. Expands queries at search time. |
| `glossary_terms` | Pending candidates awaiting curation (plus rejected history for auditability). |
| `glossary_rejections` | Permanent "do not suggest this again" list, scoped per group. |

## Items as substrate

The big shift after 2026-04-13 was giving up the three-tier split (items /
tasks / qa_checks as separate tables) and promoting `items` into a
general-purpose row. The reasons:

- `items` already had `item_type`, `system_ids`, `tags`, `supersedes_id`,
  `deleted_at`, actor, project — every cross-cutting field a domain
  object needs.
- An OSS user wanting a custom type (sprints, retros, stand-ups) can now
  `INSERT` into `item_types` and start using it. Forking luplo is no
  longer the entry fee.
- The DB, not Python, is the contract. Raw SQL and non-Python clients
  can add a type without importing anything.

### The `item_types` registry

```sql
CREATE TABLE item_types (
  key          TEXT PRIMARY KEY,   -- 'task', 'qa_check', 'decision', ...
  display_name TEXT NOT NULL,
  schema       JSONB NOT NULL,     -- JSON Schema, validates context
  owner        TEXT NOT NULL,      -- 'system' or 'user'
  created_at   TIMESTAMPTZ DEFAULT now()
);
```

`items.item_type` carries a foreign key into this table. Seven system
types ship in the migrations:

| key | Schema policy | Notes |
|---|---|---|
| `decision` | loose (`additionalProperties` permitted) | free-form rationale + alternatives |
| `knowledge` | loose | "how X works", gotchas |
| `policy` | loose | organisational must-do / must-not |
| `document` | loose | specs, RFCs |
| `research` | loose + `source_url NOT NULL` CHECK | cached external references with TTL |
| `task` | strict (`additionalProperties=false`) | status machine, sort_order, block reason |
| `qa_check` | strict | coverage, area, severity, target arrays |

**Strict vs loose is by intent, not by owner.** Types that gate behavior
(task, qa_check) enforce their context shape so the state machines stay
honest. Types that exist for human prose (decision, policy) stay permissive
so new fields don't require a migration. See P6 in the project decision log.

## The `items` row

Key columns shared across every item type:

| Column | Type | Role |
|---|---|---|
| `id` | UUID | primary key |
| `project_id` | UUID | scope |
| `item_type` | TEXT FK → `item_types.key` | domain discriminator |
| `title` | TEXT | human label (indexed for tsquery) |
| `body` | TEXT | main content (indexed) |
| `rationale` | TEXT | why this decision (indexed) |
| `alternatives` | TEXT | what was considered and rejected |
| `system_ids` | UUID[] | systems touched (GIN indexed) |
| `tags` | TEXT[] | free tags (GIN indexed) |
| `context` | JSONB | type-specific fields, validated against `item_types.schema` |
| `source_url` | TEXT | external reference (research type requires it) |
| `actor_id` | UUID | who wrote this row |
| `work_unit_id` | UUID NULL | optional grouping |
| `supersedes_id` | UUID NULL | previous chain head when edited |
| `deleted_at` | TIMESTAMPTZ NULL | soft delete |
| `ts` | `tsvector` | generated — full-text search vector |
| `embedding` | `vector(1024)` | only when `pgvector` is installed |

### Supersede chain

> **Decisions are immutable. They get superseded, never edited. Your
> mistakes are your most valuable data.**

Edits **never mutate** an existing row. `update_item` writes a new row
with `supersedes_id` pointing at the previous head. Readers follow the
chain to the current head; auditors walk it backwards. The principle:
a wrong decision teaches more than a right one — overwriting it destroys
the lesson.

Two deliberate exceptions where luplo **updates in place with an audit
entry** instead of creating a new row:

- Task `sort_order` — a presentation concern; N=100 reorders would
  otherwise create 3N writes and bloat the chain (P10).
- QA-check revalidation trigger — a system-initiated "please re-look"
  flag, not a human-decided change (P9).

These exceptions are recorded as explicit policy decisions in the project
log. Everything else obeys the rule "human decision → new row".

### Soft delete

`deleted_at` marks rows as gone without physically removing them.
`get_item` and default listings hide deleted rows; history and audit
readers still see them. This is the compliance floor: no write is ever
lost, and any past state can be reconstructed.

## Raw captures

`captures` is a raw private intake stream outside the project / item /
system graph. It intentionally has no `project_id`, `system_ids`, or
`work_unit_id`: capture-time input should succeed with text alone, and
raw backlog material must not pollute curated item search.

Core stores caller-provided text and optional BYOLLM annotations:

- `summary` is supplied by the caller, never generated by core.
- `sensitivity_hint` is supplied by the caller and limited to `none`,
  `possible`, or `sensitive`.
- `signals` is a JSON object of caller-provided hints such as tags,
  project guesses, mood labels, or confidence values. Core validates
  only that the value is an object.

Default capture search/list stays separate from items. A capture becomes
curated memory only through explicit promotion. Promotion creates a
normal `items` row through the existing item creation path, then records
a `capture_promotions` bridge row:

| Column | Role |
|---|---|
| `capture_id` | Raw capture used as source material. |
| `target_item_id` | Curated item created from the promotion. |
| `promoted_as` | Requested item type at promotion time. The item row remains authoritative. |
| `created_by`, `created_at` | Promotion attribution and timestamp. |

The raw capture row is preserved after promotion and its `review_state`
becomes `promoted`. Redaction is the privacy exception: capture
redaction replaces `text` and `summary` with `[redacted]`, clears
`signals`, stamps `redacted_at` / `redacted_by`, and rebuilds the search
vector so original capture content no longer matches normal search.

## Relationships

- **`links`** carries typed edges. Common kinds: `supersedes`, `implements`,
  `conflicts_with`, `belongs_to` (item→work_unit), `touches` (item→system).
- **`systems`** can declare dependencies on other systems — a small graph
  captured per project so briefs and search can include neighbours.
- **`work_units`** attach to items via `items.work_unit_id` and to the
  creator/closer via two `actor_id` columns — see
  {doc}`../guides/work-units` for the A→B handoff pattern.

## Where the design is written down

Migrations are the executable spec. They ship inside the wheel under
``src/luplo/_db_assets/migrations/`` so ``lp migrate`` works from a
PyPI install without a source checkout:

```
src/luplo/_db_assets/migrations/
├── 0001_init_schema.py               # 12 tables, frozen 2026-04-13
├── 0002_auth_redesign.py             # actors TEXT → UUID, email-first
├── 0003_item_types_and_context.py    # substrate refactor + registry
├── 0004_add_research_item_type.py    # research type + URL CHECK
├── 0005_auth_reset_tokens.py
├── 0006_drop_auth.py
└── 0007_work_units_context.py        # work_units.context jsonb + GIN index
```

See {doc}`../project/changelog` for the narrative version.